DMFuser: Distilled Multi-Task Learning for End-to-End Transformer-Based Sensor Fusion in Autonomous Driving
In the context of end-to-end autonomous driving, current sensor fusion techniques for imitation learning fall short in challenging scenarios involving multiple dynamic agents, leading to accidents. To tackle this issue, we introduce DMFuser, a transformer-based algorithm that employs knowledge distillation between a multi-task student and single-task teachers to fuse multiple RGB-D camera representations and produce vehicular navigational commands: throttle, steering, and brake. Our model comprises two modules. The first, the perception module, encodes data from RGB-D cameras for tasks such as semantic segmentation, semantic depth cloud (SDC) mapping, and traffic light state recognition. To enhance feature extraction and fusion from both RGB and depth sources, we harness the local and global capabilities of convolution and transformer modules, employing an attention-CNN fusion structure to effectively learn and fuse RGB and SDC map features. Subsequently, the control module decodes the encoded features, along with supplementary data such as a coarse simulation of static and dynamic environments, to predict waypoints in an underlying feature space. We evaluate the model and conduct a comparative analysis across various scenarios, weather conditions, and traffic situations, spanning from normal to adversarial, using the CARLA simulator to approximate real-world conditions. We achieve better or comparable results in terms of driving score (DS) and other metrics relative to our baselines.
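To make the distillation idea more concrete, here is a minimal PyTorch-style sketch of a standard Hinton-style knowledge-distillation loss of the kind that could link a single-task teacher to the multi-task student; the function name, `alpha`, and `temperature` values are illustrative assumptions, not the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5,
                      temperature: float = 2.0) -> torch.Tensor:
    # Hard-label term: supervised loss against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened
    # student and teacher distributions (classic distillation).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Weighted blend; alpha is an illustrative assumption.
    return alpha * hard + (1.0 - alpha) * soft
```

Similarly, a rough sketch of an attention-CNN block that fuses RGB and SDC feature maps might look like the following; the class name `AttentionCNNFusion`, channel count, and head count are hypothetical, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentionCNNFusion(nn.Module):
    # Hypothetical fusion block: cross-attention gives RGB tokens global
    # context from the SDC map, then a small CNN fuses local detail.
    def __init__(self, channels: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat: torch.Tensor, sdc_feat: torch.Tensor) -> torch.Tensor:
        # Flatten spatial maps to token sequences: (B, C, H, W) -> (B, H*W, C).
        b, c, h, w = rgb_feat.shape
        rgb_tok = rgb_feat.flatten(2).transpose(1, 2)
        sdc_tok = sdc_feat.flatten(2).transpose(1, 2)
        # RGB queries attend over SDC keys/values (global context).
        attended, _ = self.attn(rgb_tok, sdc_tok, sdc_tok)
        attended = attended.transpose(1, 2).reshape(b, c, h, w)
        # Concatenate and fuse with convolutions (local context).
        return self.fuse(torch.cat([attended, rgb_feat], dim=1))
```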
How does the model handle unseen classes? Do you have any suggestions on how to handle them? The current barrier to full AD is behaviour in adversarial situations, where performance needs to be noticeably better than that of humans.