Top papers: LTE, Dust3r, 3D DP, Classification DRL

April 14, 2024 | April 2, 2024 | admin

In this edition, we go over LoRA-the-Explorer (LTE), reconstructing 3D scenes from images with DUSt3R, 3D Diffusion Policy (DP3), and training value functions via classification for scalable deep RL.

LoRA-the-Explorer (LTE)

This paper introduces LoRA-the-Explorer (LTE), a novel optimization algorithm that extends low-rank adaptation (LoRA) from fine-tuning to the pre-training of neural networks. The goal is to ease the compute, memory, and communication bottlenecks that limit how far current deep learning models can scale.

By enabling parallel training of multiple low-rank heads across computing nodes with minimal need for synchronization, LTE offers a solution for efficient model training on lower-memory devices and in bandwidth-constrained environments.
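To make the idea concrete, here is a minimal toy sketch of parallel low-rank heads with infrequent merging, written in NumPy on a linear regression problem. It is an illustration under stated assumptions, not the paper's algorithm: the function name `lte_toy`, the plain averaging of head updates, the re-initialization of each head after a merge, and all hyperparameters are choices made here for brevity.

```python
import numpy as np

def lte_toy(num_heads=4, rank=2, dim=8, rounds=30, local_steps=10, lr=0.05, seed=0):
    """Toy run: several low-rank heads train independently, then get merged."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=(dim, dim))            # target linear map
    x = rng.normal(size=(512, dim))
    y = x @ w_true.T
    shards = np.array_split(np.arange(len(x)), num_heads)  # one data shard per head

    w = np.zeros((dim, dim))                        # shared full-rank weights
    losses = []
    for _ in range(rounds):
        updates = []
        for idx in shards:                          # each head could live on its own device
            xs, ys = x[idx], y[idx]
            a = rng.normal(scale=1.0 / np.sqrt(dim), size=(rank, dim))
            b = np.zeros((dim, rank))               # B starts at zero, as in LoRA
            for _ in range(local_steps):            # local steps, no communication
                err = xs @ (w + b @ a).T - ys
                grad_w = err.T @ xs / len(xs)       # gradient w.r.t. the effective weight
                grad_b = grad_w @ a.T
                grad_a = b.T @ grad_w
                b -= lr * grad_b
                a -= lr * grad_a
            updates.append(b @ a)
        # Synchronize once per round: fold the averaged low-rank updates into w.
        w += np.mean(updates, axis=0)
        losses.append(float(np.mean((x @ w.T - y) ** 2)))
    return losses

if __name__ == "__main__":
    history = lte_toy()
    print(f"loss after first merge: {history[0]:.4f}, after last merge: {history[-1]:.4f}")
```

Each head only exchanges a low-rank product once per round, which is where the communication savings would come from. Even though every head is rank 2, the sum of many merges can build up a full-rank weight matrix.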

In the authors' words:

In this work, we investigated the feasibility of using low-rank adapters for model pre-training. We introduced LTE, a bi-level optimization method that capitalizes on the memory-efficient properties of LoRA. Although we succeeded in matching performance on moderately sized tasks, several questions remain unresolved. These include: how to accelerate convergence during the final 10% of training; how to dynamically determine the number of ranks or heads required; whether heterogeneous parameterization of LoRA is feasible, where each LoRA head employs a variable rank; and leveraging merging strategies to accompany higher local optimization steps. Our work serves as a proof-of-concept, demonstrating the viability of utilizing low-rank adapters for neural network training from scratch. However, stress tests on larger models are essential for a comprehensive understanding of the method’s scalability. Addressing these open questions will be crucial for understanding the limitations of our approach. We anticipate that our work will pave the way for pre-training models in computationally constrained or low-bandwidth environments, where less capable and low-memory devices can collaboratively train a large model, embodying the concept of the “wisdom of the crowd”.

ArXiv

Reconstruct 3D scenes from images (DUSt3R)

The DUSt3R repository introduces a novel approach called Dense and Unconstrained Stereo 3D Reconstruction (DUSt3R). It generates 3D models from 2D images without requiring camera calibration or viewpoint data.

Key Capabilities:

Operates on arbitrary image collections
Integrates monocular and binocular reconstruction methods via pointmap regression
Aligns multi-view pointmaps into a common reference frame
Utilizes transformer encoders/decoders with pre-trained models
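
As a rough illustration of what "aligning pointmaps into a common reference frame" involves, the sketch below fits a scale, rotation, and translation between two pointmaps whose pixels correspond one-to-one. It uses the classic closed-form Umeyama/Procrustes solution, which is swapped in here for simplicity and is not necessarily how the repository performs its alignment; the function name `align_pointmaps` is hypothetical.

```python
import numpy as np

def align_pointmaps(src, dst):
    """Fit scale s, rotation R, translation t so that s * R @ src_i + t ~= dst_i.

    src and dst are pointmaps of shape (H, W, 3) or (N, 3) with matching pixels.
    """
    src = np.asarray(src, dtype=float).reshape(-1, 3)
    dst = np.asarray(dst, dtype=float).reshape(-1, 3)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d                 # center both point sets
    cov = xd.T @ xs / len(src)                      # 3x3 cross-covariance
    u, sing, vt = np.linalg.svd(cov)
    d = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:    # avoid returning a reflection
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt
    var_s = (xs ** 2).sum() / len(src)              # spread of the source points
    scale = (sing * d).sum() / var_s
    trans = mu_d - scale * rot @ mu_s
    return scale, rot, trans

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pointmap = rng.normal(size=(4, 5, 3))           # stand-in for a predicted pointmap
    angle = 0.3
    rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                      [np.sin(angle), np.cos(angle), 0.0],
                      [0.0, 0.0, 1.0]])
    other = 2.0 * pointmap.reshape(-1, 3) @ rot_z.T + np.array([1.0, -2.0, 0.5])
    s, r, t = align_pointmaps(pointmap, other)
    print("recovered scale:", round(float(s), 6))
```

Fitting a scale alongside the rigid motion matters because predictions made without calibrated cameras are generally only defined up to an unknown scale.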

The approach simplifies previous multi-view stereo methods, which required intricate camera parameter estimation. DUSt3R's unified formulation handles single, dual, and multi-image inputs seamlessly.

Includes:

Pre-trained models for various resolutions
Interactive demos with Docker setup
Data preparation and custom training guides

The authors report new state-of-the-art results for monocular and multi-view depth estimation and for relative pose estimation, tasks that are critical to 3D reconstruction.

Github

ArXiv paper

3D Diffusion Policy

The study introduces 3D Diffusion Policy (DP3), a visual imitation learning algorithm that combines 3D visual representations with diffusion policies to teach robots dexterous skills. DP3 builds a compact 3D visual representation from sparse point clouds. According to the authors, it outperforms prior methods on both simulated and real-world tasks while needing fewer demonstrations, and it generalizes well across different conditions.
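
The "compact representation from sparse point clouds" idea can be sketched in a few lines of NumPy. This is a guess at the general shape, not the authors' code: the function names, the greedy farthest point sampling used for downsampling, the two-layer per-point MLP, and the random weights are all assumptions made for illustration, and the diffusion policy that would consume the feature vector is left out.

```python
import numpy as np

def farthest_point_sample(points, k):
    """Greedily pick k indices that are spread out across the cloud."""
    points = np.asarray(points, dtype=float)
    chosen = [0]                                    # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                  # farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def encode_point_cloud(points, w1, w2):
    """Shared per-point MLP followed by max pooling over points."""
    hidden = np.maximum(points @ w1, 0.0)           # ReLU applied to every point
    return (hidden @ w2).max(axis=0)                # pooling makes the result order-invariant

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(2000, 3))              # stand-in for a depth-camera point cloud
    sparse = cloud[farthest_point_sample(cloud, 64)]
    w1 = rng.normal(size=(3, 32))                   # random weights, for illustration only
    w2 = rng.normal(size=(32, 16))
    feature = encode_point_cloud(sparse, w1, w2)
    print("sparse cloud:", sparse.shape, "feature:", feature.shape)
```

Max pooling is what lets an unordered set of points collapse into a single fixed-size vector that a policy network can condition on.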

ArXiv paper

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

This paper explores whether deep reinforcement learning (RL) scales and performs better when value functions are trained with a classification objective instead of the traditional regression one. Across a diverse set of domains, the method achieves state-of-the-art results. It addresses common challenges in value-based RL, such as noisy targets and non-stationarity, and improves scalability at minimal additional cost.
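
To show what "classification instead of regression" can look like in practice, here is a minimal NumPy sketch that turns scalar value targets into categorical targets with a two-hot encoding and scores logits with cross-entropy. Two-hot is used because it is one of the simplest such encodings to write down; it is not claimed to be the variant the paper recommends, and the function names and bin settings are assumptions.

```python
import numpy as np

def two_hot(targets, v_min, v_max, num_bins):
    """Spread each scalar target over its two nearest bin centers."""
    centers = np.linspace(v_min, v_max, num_bins)
    t = np.clip(np.atleast_1d(np.asarray(targets, dtype=float)), v_min, v_max)
    pos = (t - v_min) / (v_max - v_min) * (num_bins - 1)     # fractional bin index
    lower = np.minimum(np.floor(pos).astype(int), num_bins - 2)
    frac = pos - lower
    probs = np.zeros((t.size, num_bins))
    rows = np.arange(t.size)
    probs[rows, lower] = 1.0 - frac
    probs[rows, lower + 1] = frac
    return probs, centers

def cross_entropy(logits, target_probs):
    """Mean cross-entropy between softmax(logits) and the target distributions."""
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(target_probs * log_probs).sum(axis=1).mean())

if __name__ == "__main__":
    probs, centers = two_hot([0.0, 2.5, 10.0], v_min=0.0, v_max=10.0, num_bins=11)
    print("decoded values:", probs @ centers)       # expectation over the bin centers
    print("loss for uniform logits:", cross_entropy(np.zeros((3, 11)), probs))
```

The network then outputs one logit per bin, and a scalar value estimate is read back as the expectation of the bin centers under the predicted distribution.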

ArXiv