Autonomous vehicles rely on computer vision to perceive their environment and operate safely within it, but weather and lighting conditions such as rain, fog, and nighttime driving can significantly degrade the performance of vision systems. For reliable integration into traffic, autonomous vehicles’ computer vision must be robust to these varying conditions. We measured the effects of weather on semantic segmentation and tracking models with simulated data, and then implemented three approaches to mitigate those effects: domain adaptation via fine-tuning, de-weathering model inputs, and multimodal sensor fusion. We collected image and lidar data of city driving scenes in the CARLA simulator across four scenarios: clear day, foggy day, rainy day, and clear night. After obtaining baseline performance for models trained on each of these scenarios, we evaluated our mitigation strategies. We show improvements in cross-domain performance for each method and compare the strengths and weaknesses of each approach.
We used the CARLA simulator to generate synthetic datasets for each of our four weather scenarios, training our models on each weather dataset and evaluating each model across all weathers. Initially, CARLA provided a simple route featuring basic instances of vehicles and pedestrians. To support our weather-focused project, we generated data for each weather scenario using deterministic simulation and implemented user-friendly configuration for a variety of relevant CARLA options, such as the number of vehicles. Based on the needs of the instance segmentation and semantic segmentation teams, we also expanded our data modalities to include lidar and added options to record videos and output bounding boxes. This effort produced over one million data files, which formed the foundation of our project.
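As an illustration of how a scenario can be set up, the sketch below shows one way to apply a weather preset and deterministic, fixed-step simulation settings through the CARLA Python API. The specific weather parameter values, timestep, and traffic-manager seed are assumptions for illustration, not our exact configuration.

```python
import carla

# Illustrative weather presets for the four scenarios; parameter values are assumptions.
WEATHER_SCENARIOS = {
    "clear_day":   carla.WeatherParameters(cloudiness=5.0,  precipitation=0.0,
                                           fog_density=0.0,  sun_altitude_angle=70.0),
    "foggy_day":   carla.WeatherParameters(cloudiness=40.0, precipitation=0.0,
                                           fog_density=65.0, sun_altitude_angle=45.0),
    "rainy_day":   carla.WeatherParameters(cloudiness=90.0, precipitation=80.0,
                                           precipitation_deposits=60.0, sun_altitude_angle=45.0),
    "clear_night": carla.WeatherParameters(cloudiness=5.0,  precipitation=0.0,
                                           fog_density=0.0,  sun_altitude_angle=-80.0),
}

def configure_world(client: carla.Client, scenario: str, seed: int = 0) -> carla.World:
    """Apply a weather scenario and deterministic simulation settings."""
    world = client.get_world()
    world.set_weather(WEATHER_SCENARIOS[scenario])

    # Synchronous mode with a fixed timestep keeps runs reproducible across scenarios.
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = 0.05
    world.apply_settings(settings)

    # Seed the traffic manager so spawned traffic behaves identically on every run
    # (requires a recent CARLA release).
    client.get_trafficmanager().set_random_device_seed(seed)
    return world

# Usage: client = carla.Client("localhost", 2000); configure_world(client, "foggy_day")
```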
We investigated the impact of weather on the performance and reliability of semantic segmentation for autonomous driving. We used the PyTorch models released by MIT CSAIL for their ADE20K dataset. Our final results use their implementation of the HRNetV2 encoder with a single convolutional layer (C1) as the decoder. We trained our models on each of the four weather datasets using an NVIDIA A2000 with 12GB of VRAM. As expected, each model performed best when evaluated on the weather it was trained on, with significant performance degradation when evaluated on other weathers. The magnitude of this drop varied by the test weather, with models generally showing the worst cross-domain performance on foggy scenes. These results formed the baseline against which we measured our weather mitigation strategies.
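The sketch below shows roughly how the HRNetV2 encoder and C1 decoder can be assembled with the mit_semseg package from MIT CSAIL's semantic-segmentation-pytorch repository. The class count here is a hypothetical value for a CARLA label set rather than our exact configuration.

```python
import torch.nn as nn
from mit_semseg.models import ModelBuilder, SegmentationModule

NUM_CLASSES = 23  # hypothetical number of CARLA semantic classes

# HRNetV2 encoder paired with the C1 decoder (a single convolutional layer).
encoder = ModelBuilder.build_encoder(arch="hrnetv2", fc_dim=720, weights="")
decoder = ModelBuilder.build_decoder(arch="c1", fc_dim=720, num_class=NUM_CLASSES, weights="")

# Pixel-wise negative log-likelihood loss; index -1 marks unlabeled pixels.
criterion = nn.NLLLoss(ignore_index=-1)
model = SegmentationModule(encoder, decoder, criterion)
```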
As an extension of the semantic segmentation task, we also trained a video instance segmentation model on CARLA data to see how weather would affect its ability to distinguish between multiple instances of the same object types and track them across the frames of a video. We used the MMDetection framework and the MaskTrackRCNN model, which is typically trained on the large YouTube-VIS 2021 dataset. We trained our models on each of the four weather datasets using an NVIDIA A2000 with 12GB of VRAM. We generally saw lower performance than the original implementation, but we were still able to show significant differences in model performance when each model was tested on weathers it was not trained on.
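Adapting MaskTrackRCNN to our data mostly amounts to pointing its configuration at the CARLA-generated annotations instead of YouTube-VIS. A minimal sketch is shown below; the config filename, dataset paths, class count, and attribute layout are hypothetical placeholders, since the exact structure depends on the MaskTrackRCNN config in use.

```python
from mmcv import Config

# Load a MaskTrackRCNN config (hypothetical filename) and retarget its dataset paths.
cfg = Config.fromfile("configs/masktrack_rcnn_r50_fpn_carla.py")

cfg.data.train.ann_file = "data/carla_clear_day/annotations/train.json"
cfg.data.train.img_prefix = "data/carla_clear_day/train/"
cfg.data.val.ann_file = "data/carla_clear_day/annotations/val.json"
cfg.data.val.img_prefix = "data/carla_clear_day/val/"

# The detection and mask heads must also match the CARLA class list
# (the count and attribute path here are assumptions).
cfg.model.roi_head.bbox_head.num_classes = 8
cfg.model.roi_head.mask_head.num_classes = 8
```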
Models trained on only one weather perform best when evaluated on data from that same condition. For instance, a model trained on clear day images performs poorly when evaluated on night or foggy images. This limited generalization occurs because the model learns features specific to clear day scenes, including sharp edges, bright colors, and strong contrasts, whereas foggy images and videos appear desaturated and blurry, and night images and videos shift in color due to artificial lighting, altering key visual features. While adverse weather scenarios present a clear challenge to computer vision models, we found that models trained on an adverse condition performed just as well as the clear day model when evaluated on in-domain data. This finding indicates that the degraded performance on cross-domain evaluation stems not necessarily from the increased difficulty of vision in adverse weather, but from the poor generalizability of features learned from only one weather. For our semantic segmentation model (HRNetV2 + C1), our baseline results show mIoU scores around 0.65 when models are tested on the same weather condition they were trained on, with a roughly 0.2 to 0.3 drop when tested across weather conditions. For instance segmentation, our models achieve an mAP score of approximately 0.15 under same-weather conditions, with a performance drop of 0.05 to 0.13 when evaluated on different weather types.
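For reference, the sketch below shows the per-class intersection-over-union averaging behind the mIoU numbers reported above; our actual evaluation used the framework's built-in metric, so this is only the underlying computation.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes for integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0
```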
We evaluated three different approaches to mitigating the poor cross-domain performance: domain adaptation, de-weathering, and sensor fusion. After generating new datasets for each approach, we applied them to our models and compared the resulting metrics against our baselines. Domain adaptation produced the best results of the three, but all three approaches reduced the cross-domain performance gap to some degree.
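Of the three, domain adaptation is the simplest to sketch: starting from a source-weather checkpoint, the model is fine-tuned for a short pass on target-weather data at a reduced learning rate. The loop below illustrates the idea; the loss function, epoch count, and learning rate are illustrative assumptions rather than our exact training setup.

```python
import torch
import torch.nn as nn

def finetune_on_target_weather(model: nn.Module,
                               loader: torch.utils.data.DataLoader,
                               epochs: int = 5,
                               lr: float = 1e-3) -> nn.Module:
    """Fine-tune a source-weather checkpoint on a small target-weather dataset."""
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 = unlabeled (assumption)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Usage (hypothetical): start from the clear-day checkpoint, adapt on a foggy split.
# model.load_state_dict(torch.load("checkpoints/hrnetv2_c1_clear_day.pth"))
# finetune_on_target_weather(model, foggy_train_loader)
```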
Special thanks to our advisor, Tanya Amert, and to Mike Tie.