The machine learning discourse has understandably been dominated of late by language modeling tools such as ChatGPT. But machine learning can be applied to a much wider range of contexts, inputs, and applications. One of the most fascinating, and most useful, of these is computer vision: the machine learning field focused on analyzing and making sense of image and video data. From red-eye removal in smartphone cameras to medical imaging diagnostics to object recognition for self-driving cars, computer vision tools have become ubiquitous in day-to-day life, thanks not just to powerful cameras and sensors but also to the availability of high-throughput computer hardware.
Yet, despite the utility and growing prevalence of these fascinating tools and methods, establishing trust in the computer vision domain remains especially challenging. At Galois, one branch of our Data Science and Machine Learning (DSML) team is focusing its research on exploring how and when users might trust computer vision models in high-risk scenarios, studying the extent to which a data-driven model can be deployed reliably. In particular, we have been exploring challenges in using machine learning tools to analyze satellite imagery for port security.
More Challenging than Meets the Eye
Applying machine learning methods to satellite images is a task fraught with challenges. Most issues in computer vision arise when users deploy models in contexts different from those in which the models were trained. For example, advanced imaging tools are sensitive to changes in their settings as well as to exterior conditions such as lighting and capture angles. Applying a computer vision model to data from a differently tuned sensor can produce subpar results, diminishing consumer trust in the model's reliability. In addition, obtaining new, accurate, and useful data can be laborious, and subsequent processing of such data to tune the model is computationally expensive.
Typical imaging media are optical, meaning that they rely on light. However, light itself is often unreliable: clouds or other weather patterns may disrupt steady sunlight, not to mention the sun setting every day! Those interested in 24-hour, weather-independent observation need a different tool. Security personnel interested in reliable monitoring of port activity turn instead to a form of radar imaging known as synthetic aperture radar (SAR) to observe and track ship movements in and out of ports.
SAR is a non-optical method of imaging; that is, it does not use visible light. Instead, it uses longer-wavelength radio waves. Radar is an acronym for radio detection and ranging: radar systems "see" by using an antenna (an aperture) to emit radio waves, which reflect off a surface and are then received by the antenna. The distance between the satellite and an encountered object is proportional to the round-trip travel time of the pulse. The "synthetic aperture" in SAR comes from combining returns collected along the satellite's flight path, simulating a much larger antenna and thereby sharpening resolution. These data are used to render an intensity image of the scene, resembling a heat map of the objects in it.
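The ranging principle can be made concrete with a few lines of arithmetic: the pulse travels to the object and back at the speed of light, so the one-way distance is half the round-trip time times c. This is a simplified sketch (real SAR processing involves much more, and the function name is illustrative):

```python
C = 299_792_458  # speed of light, m/s

def slant_range(round_trip_seconds: float) -> float:
    """Distance from antenna to reflecting object: the pulse
    covers the satellite-object distance twice, so halve the path."""
    return C * round_trip_seconds / 2

# A return received 4 ms after emission implies an object
# roughly 600 km away, a typical low-Earth-orbit distance.
print(slant_range(0.004))  # 599584.916 (meters)
```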
Building an Object Detection Model
In our application, a satellite directed at a port takes periodic SAR images capturing port infrastructure, docked ships, and ships in open water. Time-lapse images of ports are used to measure traffic and identify suspicious ships. Our work focused on finding ships in all the contexts a port scene contains. Currently, we do not attempt to predict the type of ship; instead, our goal is to find all ships, capturing their presence, location, and movement. This information could be helpful to security-minded port authorities.
Future research could also seek to classify types of ships and even attempt to identify suspicious behaviors. However, because misidentified port infrastructure or consistently undetected ships would propagate errors downstream, any subsequent analysis requires an accurate ship detector. Thus, our work is an important first step.
We trained state-of-the-art neural networks based on the Mask R-CNN family of models to understand the appearance of ships relative to port infrastructure and other objects in an image; in other words, we built a model to automatically detect and identify ships in SAR image data. Expert-annotated images highlighting ships were shown to a neural network repeatedly until the model learned the patterns that define ships in SAR images. We also leveraged a recent object detection model known as Hybrid Task Cascade, which first scans an image to find candidate ships, then applies a more powerful pixel-wise locator to extract each ship down to the pixel level.
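Detectors in the R-CNN family propose many overlapping candidate boxes per object and keep only the strongest via non-maximum suppression (NMS). The filtering step can be illustrated with a self-contained NumPy sketch (not code from our pipeline):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box that overlaps it above the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        best = order[0]
        keep.append(int(best))
        order = order[1:][
            [iou(boxes[best], boxes[i]) < iou_thresh for i in order[1:]]
        ]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```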
Our trained model achieved results comparable to the best-performing vessel detection models on unseen data from the same annotation set. Figure 1 shows five ships alongside port infrastructure. Red boxes denote ships, while blue boxes with green highlights are model predictions. In this image, our model accurately identified all ships without erroneously detecting docks or other port structures. We then set out to understand how the model would perform on cutting-edge images from commercial SAR providers such as Capella Space and ICEYE.
Figure 1: Our trained model applied to an image from the held-out test set. Small blue text depicts the model predicted likelihood of each detection being a ship.
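Comparing detections like those in Figure 1 against expert annotations is typically done by matching boxes at an intersection-over-union (IoU) threshold and computing precision and recall. A minimal sketch of such a matching metric (the helper names are illustrative; this is not our evaluation code):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def precision_recall(preds, truths, iou_thresh=0.5):
    """Greedily match each prediction to an unused ground-truth box;
    a match above the IoU threshold counts as a true positive."""
    unmatched = list(range(len(truths)))
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda i: iou(p, truths[i]), default=None)
        if best is not None and iou(p, truths[best]) >= iou_thresh:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]  # one hit, one false alarm
print(precision_recall(preds, truths))  # (0.5, 0.5)
```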
An ICEYE-provided image capturing ships moving in the Singapore Strait serves as a helpful example of the model in action. The single 1 GB image contains multiple harbors, each with docked ships, ships in open water, and land mass. Figures 2 and 3 depict cropped portions of the full image; the former shows three ships in open water, while the latter shows multiple docked ships. Our trained model successfully identified all ships in Figure 2 but missed some ships in Figure 3. Certain facets of the image, such as the type of satellite and the angle of capture, can be difficult to mitigate. Despite these differences, our preprocessing steps (scaling, normalization, and tiling) were effective in transforming the ICEYE image to better resemble the examples our model saw during training. The model performed comparably to the unseen test data on portions of the image resembling what it routinely encountered during training, e.g., ships in open water. However, portions of harbor infrastructure were at times erroneously detected as ships, and some especially small ships went undetected. Further model tuning on more similar data would improve performance.
Figures 2 (left) and 3 (right): Two cropped portions from the ICEYE Singapore Strait image. The model performed better on ships in open water than on ships around port infrastructure. Small changes in the illumination caused errors in model performance.
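To make the preprocessing concrete: assuming a SAR scene arrives as one large 2-D intensity array, per-scene normalization and fixed-size tiling might look like the following sketch (names and parameters are illustrative, not our actual pipeline):

```python
import numpy as np

def normalize(image: np.ndarray) -> np.ndarray:
    """Rescale raw SAR intensities to zero mean, unit variance
    so tiles better resemble the statistics seen during training."""
    return (image - image.mean()) / (image.std() + 1e-8)

def tile(image: np.ndarray, size: int):
    """Cut a large scene into non-overlapping size x size tiles,
    discarding the ragged border for simplicity."""
    h, w = image.shape
    return [
        image[r:r + size, c:c + size]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]

scene = np.random.rand(2048, 2048).astype(np.float32)
tiles = tile(normalize(scene), 512)
print(len(tiles))  # 16 tiles of 512 x 512
```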
As we continue to refine the accuracy and reliability of our SAR imagery ship detection model, the future looks promising. By understanding when and why machine learning models like this one perform well or poorly, and then using that information to address shortcomings, we can continuously refine our models, improve performance, broaden potential applications, and ultimately establish trust. Already, SAR imagery analysis is an in-demand computational task with enormous potential for maintaining and improving global port security.
As machine learning techniques and computer vision technologies improve, we anticipate even greater contributions, including automated ship classification and behavior analysis for threat prediction. Machine learning for image analytics is a complex domain, but a fascinating one. We look forward to expanding our tooling and expertise in the computer vision space—one more piece of the puzzle in our quest to build trustworthy systems.