DMFuser: Distilled Multi-Task Learning for End-to-end Transformer-Based Sensor Fusion in Autonomous Driving

In the context of end-to-end autonomous driving, current sensor fusion techniques for imitation learning are insufficient in challenging scenarios involving multiple dynamic agents, and they result in multiple accidents. To tackle this issue, we introduce DMFuser, a transformer-based algorithm that employs knowledge distillation between a multi-task student and single-task teachers to fuse multiple RGB-D camera representations and produce vehicular navigational commands comprising throttle, steering, and brake. Our model encompasses two modules. The first module, perception, encodes data from RGB-D cameras for tasks such as semantic segmentation, semantic depth cloud (SDC) mapping, and traffic light state recognition. To enhance feature extraction and fusion from both RGB and depth sources, we harness the local capabilities of convolution modules and the global capabilities of transformer modules. We employ an attention-CNN fusion structure to effectively learn and fuse RGB and SDC map features. The second module, control, then decodes the encoded features along with supplementary data, including a coarse simulator for static and dynamic environments, to predict waypoints in an underlying feature space. Using the CARLA simulator, we evaluate the model and conduct a comparative analysis across various scenarios, weather conditions, and traffic situations, ranging from normal to adversarial, to approximate real-world driving. We achieve better or comparable results in terms of driving score (DS) and other metrics with respect to our baselines.
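To make the distillation idea concrete, here is a minimal sketch of a soft-target loss between one multi-task student and several single-task teachers. It is not the DMFuser implementation: it assumes a generic temperature-scaled KL divergence per task, summed with hand-picked weights. The function names, task names, weights, and logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss for one task: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def multi_task_distillation_loss(student_outputs, teacher_outputs,
                                 task_weights, temperature=2.0):
    """Weighted sum of per-task losses.

    student_outputs: task name -> logits from the single multi-task student
    teacher_outputs: task name -> logits from that task's own teacher
    task_weights:    task name -> assumed scalar weight (hypothetical values)
    """
    return sum(
        task_weights[task]
        * distillation_loss(student_outputs[task], teacher_outputs[task],
                            temperature)
        for task in student_outputs
    )


# Hypothetical logits for two of the perception tasks named above.
student = {"segmentation": [2.0, 0.5, -1.0], "traffic_light": [0.1, 0.3]}
teacher = {"segmentation": [2.5, 0.2, -1.5], "traffic_light": [-1.0, 1.5]}
weights = {"segmentation": 1.0, "traffic_light": 0.5}

loss = multi_task_distillation_loss(student, teacher, weights)
print(f"combined distillation loss: {loss:.4f}")
```

In a real training loop this term would typically be added to the supervised task losses, and the per-task weights would be tuned or learned rather than fixed by hand.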

One thought on “DMFuser: Distilled Multi-Task Learning for End-to-end Transformer-Based Sensor Fusion in Autonomous Driving”

  • Yundon

    Great work!
    How does the model handle unseen classes? Do you have any suggestions on how to handle them? The current burden for full AD is behaviour in adversarial situations, where performance should be noticeably higher than that of humans.
