Top papers: Deleting 40% Without Accuracy Drop, Turbo Sketch, AnimateDiff
In an era where technological advancement drives competitive edge, CEOs must stay informed about transformative developments in Artificial Intelligence (AI). Cutting-edge research has uncovered a way to significantly reduce the computational weight of Large Language Models (LLMs) without sacrificing accuracy.
Meta, Cisco, and MIT researchers have joined forces to demonstrate that up to 40%–50% of an LLM’s layers can be pruned, revealing previously untapped efficiency. This breakthrough points toward leaner, more powerful AI systems and a future where smaller models perform as effectively as their larger counterparts, with a substantial reduction in costs and resource use.
From the boardroom to the server room, this pioneering research is not just about streamlining operations—it’s about spearheading a movement towards a faster, smarter, and more economical AI-driven landscape. CEOs looking to stay ahead of the curve need to understand how these innovations can transform their business and the tech industry as a whole.
Gear up to explore the findings that are set to redefine AI, learn about the imaginative Img2Img Turbo Sketch that brings new tasks to life with unprecedented efficiency, and be amazed by AnimateDiff-Lightning, a model that turns text into video at blistering speed.
Deleting 40% of LLM Layers Without Drop in Accuracy
Meta, Cisco, and MIT researchers demonstrated that large language models (LLMs) could have up to 40%-50% of their layers pruned with minimal impact on accuracy (paper: https://arxiv.org/html/2403.17887v1).
The process combined pruning with quantization and parameter-efficient finetuning (PEFT), tested on models ranging from 2B to 70B parameters across the Llama, Qwen, Mistral, and Phi families.
Performance Impact:
- Llama 70B and Llama 13B models showed slight accuracy loss after 40% and 50% layer pruning, respectively.
- Other models experienced minimal accuracy declines with 20-30% of layers removed.
Pruning Process:
- Identification of Layers for Removal: The team used a similarity score to find redundant or less important layers. Layers with the lowest angular distance between their input and output representations were targeted for pruning (see the sketch after this list).
- Layer Pruning Strategy: They progressively deleted the blocks of layers whose outputs changed least relative to their inputs.
- Fine-tuning Post-pruning: The pruned models were fine-tuned with QLoRA on the C4 dataset to recover lost performance on benchmarks like MMLU and BoolQ (a minimal configuration sketch also follows below).
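To make the selection heuristic concrete, here is a minimal sketch of the angular-distance criterion, assuming per-layer hidden states have already been collected from a forward pass (e.g. with output_hidden_states=True in Hugging Face transformers); the function names and the pooling over tokens are our illustrative choices, not the paper’s reference code.

```python
import torch
import torch.nn.functional as F

def angular_distance(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """Mean angular distance (in units of pi) between two activation tensors.

    h_in, h_out: (num_tokens, hidden_dim) hidden states at two depths.
    """
    cos = F.cosine_similarity(h_in, h_out, dim=-1).clamp(-1.0, 1.0)
    return (torch.arccos(cos) / torch.pi).mean().item()

def most_redundant_block(hidden_states: list[torch.Tensor], n: int) -> int:
    """Return the start index of the n-layer block whose removal should
    change the residual stream the least: the block whose input and
    output representations are most similar."""
    scores = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(len(hidden_states) - n)
    ]
    return min(range(len(scores)), key=scores.__getitem__)

# Usage: hidden_states[l] holds the layer-l activations for a sample of text;
# delete layers [l*, l* + n) where l* = most_redundant_block(hidden_states, n).
```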
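And a minimal sketch of the post-pruning “healing” step, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the checkpoint path, LoRA hyperparameters, and target modules are illustrative placeholders rather than the paper’s exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the (already layer-pruned) model in 4-bit NF4 -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/pruned-llama",            # placeholder: your pruned checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters; only these are updated during healing.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then fine-tune briefly on C4 with your usual Trainer loop.
```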
Cost Benefits: The research highlighted that memory and compute requirements for inference fall roughly linearly with the number of layers pruned, showcasing a straightforward way to make LLMs less resource-intensive.
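As a back-of-the-envelope illustration (our arithmetic, not the paper’s figures), if per-layer weights, KV-cache, and FLOPs dominate the footprint, dropping a fraction of decoder layers shrinks inference cost by roughly that same fraction:

```python
def savings(total_layers: int, layers_removed: int) -> float:
    """Approximate fractional reduction in weight memory, KV-cache, and
    per-token FLOPs, assuming decoder layers dominate the footprint
    (embeddings and LM head ignored -- our simplification)."""
    return layers_removed / total_layers

# e.g. Llama-2-70B has 80 decoder layers; pruning 32 of them (40%):
print(f"~{savings(80, 32):.0%} smaller and faster at inference")  # ~40%
```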
Critical Role of Shallow Layers: Findings indicated that shallow layers had a disproportionate impact on model outcomes, suggesting deep layers could be removed with negligible effects. This points to potential inefficiencies in how deep layers are currently utilized.
Why This Matters
The findings suggest AI models carry more redundancy than previously thought, opening paths to faster, cheaper AI, challenging the necessity of ever-larger models, and indicating a need to refine training methods to fully utilize model capacity.
Community Feedback
Tomas Miskov: “Interesting stuff. So why can’t we just train a smaller network and get the same result?”
Stuart Watson: “The LLM AI field is so new that proper optimisation is just starting. Papers like this show there are massive leaps available”
Img2Img Turbo Sketch
The space proposes a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning, making it possible to leverage the internal knowledge of pre-trained diffusion models while keeping inference efficient.
You can try it directly within the space by entering a text prompt, sketching a first draft, choosing the style and the level of sketch guidance, and generating an image based on your settings.
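If you would rather script something similar than use the space’s UI, here is a sketch of a related single-step image-to-image setup using SD-Turbo in diffusers; note this is not the space’s exact pix2pix-turbo pipeline, and the prompt, strength, and sketch file are placeholders.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# SD-Turbo: a distilled model designed to generate in 1-4 steps.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

sketch = load_image("my_sketch.png").resize((512, 512))  # placeholder input

# strength plays the role of "sketch guidance": lower keeps more of the sketch.
# For turbo models, num_inference_steps * strength should be >= 1.
image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    image=sketch,
    num_inference_steps=2,
    strength=0.5,
    guidance_scale=0.0,  # turbo models are trained without CFG
).images[0]
image.save("result.png")
```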
AnimateDiff-Lightning
AnimateDiff-Lightning is a lightning-fast text-to-video generation model that generates videos more than ten times faster than the original AnimateDiff. Distilled from AnimateDiff SD1.5 v2, it turns a text prompt into a video in just a few seconds.
You can try it in the space by entering a text prompt and selecting the base model, the type of motion you want and the number of inference steps.
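For programmatic use, the pattern below follows the diffusers recipe from the ByteDance/AnimateDiff-Lightning model card as we understand it; the base model and prompt are illustrative choices, and the step count must match the checkpoint you download.

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device, dtype = "cuda", torch.float16
step = 4  # distilled checkpoints exist for 1, 2, 4, and 8 steps
repo = "ByteDance/AnimateDiff-Lightning"
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"
base = "emilianJR/epiCRealism"  # illustrative SD1.5 base model

# Load the distilled motion module into a standard AnimateDiff pipeline.
adapter = MotionAdapter().to(device, dtype)
adapter.load_state_dict(load_file(hf_hub_download(repo, ckpt), device=device))
pipe = AnimateDiffPipeline.from_pretrained(
    base, motion_adapter=adapter, torch_dtype=dtype
).to(device)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)

output = pipe(
    prompt="a corgi running on the beach, golden hour",
    guidance_scale=1.0,  # lightning models skip classifier-free guidance
    num_inference_steps=step,
)
export_to_gif(output.frames[0], "animation.gif")
```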