Top 10 topics of NeurIPS 2024
NeurIPS 2024 has once again set the stage for transformative breakthroughs in AI. This year’s workshops covered an impressive range of topics, from autonomous driving systems to optimization strategies for large-scale models. Here’s a sneak peek into the key discussions:
- Autonomous Driving: We explored concepts like BEV generalization, trajectory modeling as “conversations” with MotionLM, and the RL vs. imitation learning debate. These advancements are steering us toward safer, smarter vehicles.
- Foundation Models: The focus was on scaling high-quality data and innovations like RMSNorm and QK-Norm. Learning-rate annealing was a highlight, showcasing how upsampling premium data at a low learning rate refines results.
- Transformer Updates: SwiGLU and other architectural shifts improve how transformers capture long-range dependencies while keeping attention efficient (see the RMSNorm/QK-Norm/SwiGLU sketch after this list).
- LLMs and Benchmarking: From pretraining strategies using billions of tokens to fine-tuning with LLaMA stacks, we delved into pushing boundaries for long-context processing.
- Sequential Modeling: Time series analysis got a boost with methods like factoid-based evaluation, offering more precise financial insights.
- RAG Advancements: Llama Stack and RagChecker redefine retrieval-augmented generation and agentic applications.
- Optimization Papers: New algorithms like adaptive proximal gradient methods and Gauss-Newton optimization are set to streamline computation-heavy tasks (a proximal-gradient sketch also follows this list).
- Reinforcement Learning: Offline RL techniques showed promise in enhancing exploration and tackling sequential decision-making challenges.
- Computer Vision: SAM2 and refined decoder designs led discussions on efficient test-time training and multimodal learning.
- Knowledge Distillation and Distribution Shifts: Techniques like synthetic data distillation and adversarial handling underscore AI’s adaptability to real-world challenges.
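For readers who want to see what those transformer components actually look like, here is a minimal PyTorch sketch of RMSNorm, QK-Norm, and a SwiGLU feed-forward layer. This is an illustrative reimplementation of these standard building blocks, not code from any particular NeurIPS paper; the module names, shapes, and hyperparameters are chosen only for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales features without centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def qk_norm_attention(q, k, v, q_norm, k_norm):
    # QK-Norm: normalize queries and keys before the dot product, which
    # keeps attention logits in a stable range during training.
    return F.scaled_dot_product_attention(q_norm(q), k_norm(k), v)

if __name__ == "__main__":
    x = torch.randn(2, 16, 64)                      # (batch, seq, dim)
    print(SwiGLU(64, 256)(RMSNorm(64)(x)).shape)    # torch.Size([2, 16, 64])
    q = k = v = torch.randn(2, 4, 16, 16)           # (batch, heads, seq, head_dim)
    print(qk_norm_attention(q, k, v, RMSNorm(16), RMSNorm(16)).shape)
```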
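Likewise, for the optimization bullet, here is a small sketch of a proximal gradient method (soft-thresholding plus a backtracking step size) applied to L1-regularized least squares. It illustrates the general proximal-gradient family rather than the specific adaptive method presented at the conference; the problem setup, step-size rule, and constants are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_l1(A, b, lam, iters=200, step=1.0, beta=0.5):
    """Proximal gradient for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    f = lambda z: 0.5 * np.sum((A @ z - b) ** 2)
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        t = step
        while True:
            # Gradient step on the smooth part, then the L1 prox.
            x_new = soft_threshold(x - t * grad, t * lam)
            diff = x_new - x
            # Backtracking: shrink t until the quadratic upper bound holds.
            if f(x_new) <= f(x) + grad @ diff + diff @ diff / (2 * t):
                break
            t *= beta
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.5, 1.0]                  # sparse ground truth
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(prox_grad_l1(A, b, lam=0.5), 2)[:5])
```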
Curious about what these mean for AI in 2025? Watch our video for an in-depth breakdown and discover how you can leverage these insights for your projects. Don’t miss out—subscribe for more updates!
1- Autonomous driving:
non-reactive autonomous vehicle simulation
2- Data and evaluation for foundation models:
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation
Benchmarking Large Language Models for Task Automation
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks
Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models
DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors
3- Transformer architecture:
Why warmup the learning rate?
Mixture of Experts (MoE):
Papers: Measuring Déjà Vu Memorization Efficiently
Spiking transformer with experts mixture
Grokked transformers are implicit reasoners: a mechanistic journey to the edge of generalization
Weight decay induces low-rank attention layers
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Pretrained Transformer Efficiently Learns Low-dimensional Target Functions In-context
4- Large language models
CCA: Mitigating Object Hallucination via Concentric Causal Attention
Online Weighted Paging With Unknown Weights
Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
Ad Auctions for LLMs via Retrieval Augmented Generation
Wings: Learning Multimodal LLMs without Text-only Forgetting
Evaluating Numerical Reasoning in Text-to-Image Models
LoQT: Low-Rank Adapters for Quantized Pretraining
Mixture of Experts Meets Prompt-based Continual Learning
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Sirius: Contextual Sparsity with Correction for Efficient LLMs
ReFT: Representation Finetuning for Language Models
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
Instruction Tuning Large Language Models to Understand Electronic Health Records
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
Online Adaptation of Language Models with a Memory of Amortized Contexts
Time-Reversal Provides Unsupervised Feedback to LLMs
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
5- Sequential and time series
TFT: Temporal Fusion Transformer
TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting
The FinBen: A Holistic Financial Benchmark for Large Language Models
RTUs: a recurrent architecture that allows efficient real-time recurrent learning
6- Agents and RAG
RagChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
Embodied agent interface: benchmarking LLMs for embodied decision making
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
CRAG: Comprehensive RAG Benchmark
On the Effects of Data Scale on UI Control Agents
7- Optimization
Optimal Parallelization of Boosting
The Road Less Scheduled
Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization
A New Efficient Scale-Invariant Version of AdaGrad
Adaptive Proximal Gradient Method for Convex Optimization
A Simple and Optimal Approach for Universal Online Learning with Gradient Variations
DAGER: Exact Gradient Inversion for Large Language Models
Can Models Learn Skill Composition from Examples?
Heavy-Tailed Class Imbalance and Why Adam Outperforms SGD on Language Models
How to Boost Any Loss Function
How to reduce the memory usage of the Adam optimizer?
Private Online Learning via Lazy Algorithms
8- Reinforcement learning
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline RL
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based RL
PPO suffers from a deteriorating representation that breaks its trust region
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
GenRL: Multimodal-foundation world models for generalization in embodied agents
Generative trajectory augmentation with guidance for offline RL
Constrained latent action policies for model-based offline RL
Parallelizing Model-Based RL Over the Sequence Length
Rethinking Model-Based, Policy-Based, and Value-Based RL via the Lens of Representation Complexity
Optimal design for human preference elicitation
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
Subwords as Skills: Tokenization for Sparse-Reward RL
Can Learned Optimization Make RL Less Difficult?
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction
First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
Robust Reinforcement Learning from Corrupted Human Feedback
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
Sample Complexity Reduction via Policy Difference Estimation in Tabular RL
Ensemble Sampling for Linear Bandits: Small Ensembles Suffice
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
9- Computer vision
Asynchronous perception machine for efficient test time training
On the Comparison between Multi-modal and Single-modal Contrastive Learning
Rethinking decoders for transformer-based semantic segmentation: compression is all you need
CLIPCEIL: Domain Generalization through CLIP via Channel Refinement and Image-Text Alignment
WATT: Weight Average Test-Time Adaptation of CLIP
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Calibrated Self-Rewarding Vision Language Models
10- Others
10-1 Knowledge distillation
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Understanding the Gains from Repeated Self-Distillation
10-2 Distribution shift
Out-Of-Distribution Detection with Diversification (Provably)
Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization