Deep dive: Transformers via Gemma, Iterative Reasoning Preference Optimization, and the inner workings of Transformers
Demystifying Transformers with Google’s Gemma, boosting reasoning tasks with Meta’s Iterative Reasoning Preference Optimization, and enhancing understanding of Transformer models with a unified interpretability framework. These are the latest strides in AI, making complex concepts accessible and improving model performance. Stay tuned for more!
Understanding Transformers by breaking down Gemma
Transformer-based LLMs can seem mysterious, but they don't need to be. This tutorial breaks down a modern transformer LLM, Google's Gemma, providing bare-bones PyTorch code and some intuition for why each step is there.
If you're a programmer and a casual ML enthusiast, this is written for you. It covers the following components (a minimal code sketch of two of them follows the list):
- Tokenization
- Embedding lookup
- Post-embedding rescaling
- Transformer layer
- RMSNorm
- Attention
- Rotary positional encoding (RoPE)
- Multi-layer perceptron (MLP, GeGLU)
- Final norm
- Output projection
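To give a flavor of the bare-bones PyTorch code the tutorial walks through, here is a minimal sketch of two of the listed building blocks, RMSNorm and a GeGLU MLP. The dimensions, `eps` value, and class names are illustrative assumptions, not Gemma's exact configuration.

```python
# Illustrative sketch only: shapes and hyperparameters are assumptions,
# not Gemma's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the features, then scale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x / rms) * self.weight

class GeGLUMLP(nn.Module):
    """MLP with a GELU-gated linear unit, as used in Gemma-style blocks."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate with GELU, multiply elementwise, then project back down.
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

if __name__ == "__main__":
    x = torch.randn(2, 8, 256)      # (batch, seq_len, dim), toy sizes
    x = RMSNorm(256)(x)
    y = GeGLUMLP(256, 1024)(x)
    print(y.shape)                  # torch.Size([2, 8, 256])
```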
Iterative Reasoning Preference Optimization
Problem: Iterative preference optimization methods work well for general instruction tuning but typically yield little improvement on reasoning tasks, which remain challenging despite advances in language models.
Solution: Meta introduces Iterative Reasoning Preference Optimization. Each iteration generates chain-of-thought candidates, constructs preference pairs based on whether the final answer is correct, and trains with a modified DPO loss that adds an NLL term on the winning responses (sketched below). This boosts accuracy significantly without extra training data.
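A rough sketch of such a combined objective: a standard DPO term over the preference pairs plus a length-normalized NLL term on the chosen (correct-answer) chains of thought. The function name, tensor arguments, and the `beta`/`alpha` coefficients are my own assumptions, not the paper's exact code.

```python
# Hedged sketch of a DPO + NLL objective over chain-of-thought preference pairs.
import torch
import torch.nn.functional as F

def dpo_plus_nll_loss(
    policy_chosen_logps: torch.Tensor,    # sum log p_theta(chosen CoT + answer)
    policy_rejected_logps: torch.Tensor,  # sum log p_theta(rejected CoT + answer)
    ref_chosen_logps: torch.Tensor,       # same sums under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # token counts, to length-normalize the NLL
    beta: float = 0.1,
    alpha: float = 1.0,
) -> torch.Tensor:
    # DPO: push the policy's log-ratio for chosen responses above rejected ones.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    dpo = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

    # NLL on the chosen sequences keeps correct chains of thought high-likelihood.
    nll = -(policy_chosen_logps / chosen_lengths).mean()

    return dpo + alpha * nll
```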
Results: With Llama-2-70B-Chat, accuracy improves from 55.6% to 81.6% on GSM8K, from 12.5% to 20.8% on MATH, and from 77.8% to 86.7% on ARC-Challenge, outperforming baselines and underscoring the method's effectiveness for enhancing reasoning in LLMs.
A Primer on the Inner Workings of Transformer-based Language Models
Problem: Understanding the inner workings of Transformer-based language models is challenging. Existing interpretability methods are scattered across the literature, lack comprehensive coverage, and often fail to provide actionable insights.
Solution: The paper introduces a unified framework for interpreting Transformer models, categorizing methods into those that localize behavior to inputs or model components and those that decode the information stored in learned representations.
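One well-known member of the "decoding learned representations" family is the logit lens: project an intermediate residual-stream state through the model's final norm and unembedding matrix to see which token it already encodes. A minimal sketch using Hugging Face transformers, assuming a GPT-2-style checkpoint where the attribute paths `transformer.ln_f` and `lm_head` exist; other architectures name these modules differently.

```python
# Minimal logit-lens sketch: decode intermediate hidden states through the
# final layer norm and the unembedding. Attribute paths assume GPT-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, dim)
for layer, h in enumerate(out.hidden_states):
    # Project the last position's residual state into vocabulary space.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    top_token = tok.decode(logits.argmax(-1))
    print(f"layer {layer:2d} -> {top_token!r}")
```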
Results: The primer consolidates what is currently known about the types of attention heads, the roles played by individual neurons, and the circuits identified inside these models, giving AI researchers and ML engineers a common vocabulary and a practical map of interpretability techniques for Transformer-based language models.