
Top 10 topics of NeurIPS 2024

NeurIPS 2024 has once again set the stage for transformative breakthroughs in AI. This year’s workshops covered an impressive range of topics, from autonomous driving systems to optimization strategies for large-scale models. Here’s a sneak peek into the key discussions:

  1. Autonomous Driving: We explored concepts like BEV generalization, trajectory modeling as “conversations” with MotionLM, and the RL vs. imitation learning debate. These advancements are steering us toward safer, smarter vehicles.
  2. Foundation Models: The focus was on scaling high-quality data and innovations like RMSNorm and QK-Norm. Annealing was a highlight, showcasing how upsampling premium data during the low-learning-rate phase refines results.
  3. Transformer Updates: SwiGLU and other architectural shifts optimize long-range dependency capture while speeding up attention mechanisms (see the minimal sketch after this list).
  4. LLMs and Benchmarking: From pretraining strategies using billions of tokens to fine-tuning with LLaMA stacks, we delved into pushing boundaries for long-context processing.
  5. Sequential Modeling: Time series analysis got a boost with methods like factoid-based evaluation, offering more precise financial insights.
  6. RAG Advancements: Llama Stack and RagChecker redefine retrieval-augmented generation and agentic applications.
  7. Optimization Papers: New algorithms like adaptive proximal gradient methods and Gauss-Newton optimization are set to streamline computation-heavy tasks (a minimal proximal-gradient sketch appears at the end of section 7 below).
  8. Reinforcement Learning: Offline RL techniques showed promise in enhancing exploration and tackling sequential decision-making challenges.
  9. Computer Vision: SAM2 and refined decoder designs led discussions on efficient test-time training and multimodal learning.
  10. Knowledge Distillation and Distribution Shifts: Techniques like synthetic data distillation and adversarial handling underscore AI’s adaptability to real-world challenges.
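
The normalization and activation tweaks called out in items 2 and 3 are easiest to see in code. Below is a minimal, self-contained PyTorch sketch of a pre-norm transformer block combining RMSNorm, QK-Norm (RMS-normalizing queries and keys per head before attention), and a SwiGLU feed-forward layer. The dimensions and layer sizes are illustrative choices, not taken from any specific paper listed here.

```python
# Minimal sketch (not any particular paper's code): RMSNorm, QK-Norm and a SwiGLU
# feed-forward block. Sizes and the use of F.scaled_dot_product_attention are
# illustrative choices only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: a SiLU-gated linear unit in place of a plain MLP."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class Block(nn.Module):
    """Pre-norm attention block with QK-Norm: queries and keys are RMS-normalized
    per head before the dot product, which helps keep attention logits stable."""
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.norm1, self.norm2 = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.q_norm = RMSNorm(self.head_dim)   # QK-Norm on queries
        self.k_norm = RMSNorm(self.head_dim)   # QK-Norm on keys
        self.mlp = SwiGLU(dim, 4 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        # reshape to (batch, heads, time, head_dim)
        q = self.q_norm(q.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
        k = self.k_norm(k.view(b, t, self.n_heads, self.head_dim)).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(attn.transpose(1, 2).reshape(b, t, d))
        return x + self.mlp(self.norm2(x))


if __name__ == "__main__":
    out = Block()(torch.randn(2, 16, 256))
    print(out.shape)  # torch.Size([2, 16, 256])
```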

Curious about what these mean for AI in 2025? Watch our video for an in-depth breakdown and discover how you can leverage these insights for your projects. Don’t miss out—subscribe for more updates!


1- Autonomous driving:

non-reactive autonomous vehicle simulation

2- Data and evaluation for foundation models:

RepLiQA: a question-answering dataset for benchmarking LLMs on unseen reference content

SureMap: simultaneous mean estimation for single-task and multi-task disaggregated evaluation

Benchmarking Large Language Models for Task Automation

Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks

Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors

3- Transformer architecture:

Why warmup the learning rate?

Mixture of experts (MoE):

Papers: Measuring Déjà Vu memorization efficiently

Spiking transformer with experts mixture

Grokked transformers are implicit reasoners: a mechanistic journey to the edge of generalization

Weight decay induces low-rank attention layers

One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Pretrained Transformer Efficiently Learns Low-dimensional Target Functions In-context

4- Large language models

CCA: Mitigating Object Hallucination via Concentric Causal Attention

Online Weighted Paging With Unknown Weights

Bias and Volatility: A Statistical Framework for Evaluating Large Language Model’s Stereotypes and the Associated Generation Inconsistency

Ad Auctions for LLMs via Retrieval Augmented Generation

Wings: Learning multimodal LLMs without text-only forgetting

Evaluating numerical reasoning in text-to-image models

LoQT: Low-Rank Adapters for Quantized Pretraining

Mixture of Experts Meets Prompt-based Continual Learning

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Sirius: Contextual Sparsity with Correction for Efficient LLMs

ReFT: Representation Finetuning for Language Models

Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts

Instruction Tuning Large Language Models to Understand Electronic Health Records

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages

Online Adaptation of Language Models with a Memory of Amortized Contexts

Time-Reversal Provides Unsupervised Feedback to LLMs

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

5- Sequential and time series

TFT: temporal fusion transformer

TEMPO: Prompt-based generative pre-trained transformer for time series forecasting

The FinBen: A Holistic Financial Benchmark for Large Language Models

RTUs: A recurrent architecture that allows efficient real-time recurrent learning.

6- Agents and RAG

RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation

Embodied agent interface: benchmarking LLMs for embodied decision making

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets

Secret collusion among AI agents: multi-agent deception via steganography

CRAG: Comprehensive RAG benchmark

On the Effects of Data Scale on UI Control Agents

7- Optimization

Optimal Parallelization of Boosting

The Road Less Scheduled

Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization

A New Efficient Scale-Invariant Version of AdaGrad

Adaptive Proximal Gradient Method for Convex Optimization

A Simple and Optimal Approach for Universal Online Learning with Gradient Variations

DAGER: Exact gradient inversion for large language models

Can models learn skill composition from examples?

Heavy-tailed class imbalance and why Adam beats SGD on language models

How to boost any loss function

How to reduce the memory usage of the Adam optimizer?

Private Online Learning via Lazy Algorithms
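
Since adaptive proximal gradient methods come up both in the intro and in the titles above, here is a minimal, generic proximal-gradient (ISTA-style) sketch for L1-regularized least squares with a simple backtracking step size. It illustrates the general technique only; the function names and problem setup are illustrative assumptions, not drawn from any specific paper in this list.

```python
# Generic proximal-gradient sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# Illustrative only: an ISTA-style loop with backtracking, not a specific paper's method.
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def proximal_gradient(A, b, lam=0.1, iters=200, t=1.0, beta=0.5):
    x = np.zeros(A.shape[1])
    f = lambda z: 0.5 * np.sum((A @ z - b) ** 2)  # smooth part of the objective
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        # Backtracking: shrink the step until the quadratic upper bound holds.
        while True:
            x_new = soft_threshold(x - t * grad, t * lam)
            diff = x_new - x
            if f(x_new) <= f(x) + grad @ diff + np.sum(diff ** 2) / (2 * t):
                break
            t *= beta
        x = x_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [2.0, -1.0, 0.5]           # sparse ground truth (illustrative)
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(proximal_gradient(A, b)[:5], 2))
```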

8- Reinforcement learning

Worst-case offline RL with arbitrary data support

Entropy-regularized Diffusion Policy with Q-Ensembles for Offline RL

The surprising ineffectiveness of pre-trained visual representations for model-based RL

PPO suffers from a deteriorating representation that breaks its trust region

Mitigating partial observability in sequential decision processes via the Lambda discrepancy

GenRL: Multimodal-foundation world models for generalization in embodied agents

Generative trajectory augmentation with guidance for offline RL

Constrained latent action policies for model-based offline RL

Parallelizing model-based RL over the sequence length

Rethinking model-based, policy-based and value-based RL via the lens of representation complexity

Optimal design for human preference elicitation 

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Subwords as skills: tokenization for sparse-reward RL

Can learned optimization make RL less difficult?

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction

First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs

Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear qπ-Realizability and Concentrability

Exclusively Penalized Q-learning for Offline Reinforcement Learning

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Robust Reinforcement Learning from Corrupted Human Feedback

C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning

Sample Complexity Reduction via Policy Difference Estimation in Tabular RL

Ensemble sampling for linear bandits: small ensembles suffice

Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation

9- Computer vision

Asynchronous perception machine for efficient test time training

On the Comparison between Multi-modal and Single-modal Contrastive Learning

Rethinking decoders for transformer-based semantic segmentation: compression is all you need

CLIPCEIL: Domain generalization through CLIP via channel refinement and image-text alignment

WATT: Weight average test-time adaptation of CLIP

Unibench: visual reasoning requires rethinking vision-language beyond scaling

Calibrated Self-Rewarding Vision Language Models

10- Others

10-1 Knowledge distillation 

DDK: Distilling domain knowledge for efficient large language models

Understanding the Gains from Repeated Self-Distillation

10-2 Distribution shift 

Out-Of-Distribution Detection with Diversification (Provably)

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
