
Technow: OpenELM, FineWeb, OpenVoice

Apple has released OpenELM, a family of small yet efficient language models designed for on-device applications, built around a “layer-wise scaling” architecture. Hugging Face’s FineWeb offers more than 15T tokens of cleaned, deduplicated English web data optimized for LLM performance. OpenVoice V2, from MIT CSAIL and MyShell.ai, is a text-to-speech model that enables instant voice cloning and supports multiple languages with enhanced audio quality.

Apple Releases OpenELM

What’s New
Apple has unveiled OpenELM, a family of small yet efficient language models tailored for on-device applications. These models range from 270M to 3B parameters, making them suitable for deployment on mobile devices and computers.

Core Innovation: Layer-wise Scaling Architecture
The key innovation lies in OpenELM’s “layer-wise scaling” architecture. It strategically allocates fewer parameters to the initial transformer layers near the input and gradually increases the parameter count towards the output layers. This approach optimizes compute resources based on the varying information complexity at each layer.
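
To make the scheme concrete, here is a minimal sketch of layer-wise scaling in Python. The interpolation ranges below (alpha for attention heads, beta for the FFN width multiplier) are illustrative assumptions, not Apple’s published values.

# Sketch: per-layer widths grow linearly from the input toward the output,
# so early layers get fewer parameters than late ones.
def layer_wise_scaling(num_layers, base_heads, base_ffn_mult,
                       alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)            # 0.0 at input, 1.0 at output
        a = alpha[0] + t * (alpha[1] - alpha[0])  # attention-head scaling
        b = beta[0] + t * (beta[1] - beta[0])     # FFN-width scaling
        configs.append((max(1, round(base_heads * a)), base_ffn_mult * b))
    return configs

print(layer_wise_scaling(num_layers=4, base_heads=12, base_ffn_mult=4))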

Model Performance and Benchmarks
OpenELM-1.1B outperforms AI2’s OLMo-1B by 2.36% accuracy while using half as many pre-training tokens.

On the ARC-C, MMLU, and HellaSwag benchmarks, the largest 3B model scored 42.24%, 26.76%, and 73.28%, respectively.

Training Data and Availability
OpenELM models were trained on 1.8T tokens drawn from RefinedWeb, a deduplicated version of The PILE, and subsets of RedPajama and Dolma v1.6.

Both pre-trained and instruction-tuned checkpoints are available for all four model sizes (270M, 450M, 1.1B, 3B).

Open-Source and Licensing
Apple released OpenELM under a permissive “sample code” license that allows commercial use and modification, provided the license text is retained on redistribution.

The CoreNet library used for pre-training, along with the training recipes, is also open-sourced to enable reproducibility.

Hardware Requirements
Benchmarks cited an Intel i9 workstation with an RTX 4090 GPU and an M2 Max MacBook Pro as capable hardware for running OpenELM inference.

Why It Matters
As a vertically integrated hardware and software company, Apple is uniquely placed to ship on-device AI, and the open-source OpenELM models pave the way for local AI assistants and language capabilities without privacy trade-offs, potentially setting the stage for more advanced device-centric AI experiences across Apple’s ecosystem.

Community Feedback
Indira Negi: “Can’t wait for Apple to step into the LLM arena
They own the hardware in all our pockets. They have to be the one to do this
Fingers crossed that they deliver the ability to run a decent model locally”

The AI Edge: “It seems like everyone is joining the trend of creating compact models, and this launch is another hint towards Apple’s possible advancements in on-device AI, which might be revealed at WWDC”

Vikash K Prasad: “Interesting that they have 270m and 460m odd parameters model finetuned that too instruction fine tuned. Parameter wise this is similar to Bert(non instruction) the first language model released by google in 2018.”

Access
You can also run Apple’s new OpenELM models in MLX LM:
pip install -U mlx-lm
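
Once installed, a few lines of Python load a model and generate text. Below is a minimal sketch using mlx_lm’s load/generate API; the model ID (a community MLX conversion of OpenELM) is an assumption, so substitute whichever checkpoint you prefer.

from mlx_lm import load, generate

# Assumed community conversion, not an official Apple model path.
model, tokenizer = load("mlx-community/OpenELM-270M-Instruct")
text = generate(model, tokenizer, prompt="Write a haiku about autumn.",
                max_tokens=100)
print(text)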

Hugging Face FineWeb

The 🍷 FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and runs on 🏭 datatrove, Hugging Face’s large-scale data processing library.

🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release of the full dataset under the ODC-By 1.0 license. However, by carefully adding further filtering steps, the team managed to push the performance of 🍷 FineWeb well above that of the original 🦅 RefinedWeb, and models trained on the dataset also outperform models trained on other commonly used high-quality web datasets (like C4, Dolma-v1.6, The Pile, SlimPajama, and RedPajama2) on the team’s aggregate group of benchmark tasks.
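
Because the full corpus is far too large to download casually, streaming is the practical way to sample it. Below is a minimal sketch using the datasets library; the snapshot name (“CC-MAIN-2024-10”) is just one example dump and an assumption about which crawl you want.

from datasets import load_dataset

# Stream rather than download: the full dataset is tens of terabytes.
fw = load_dataset("HuggingFaceFW/fineweb",
                  name="CC-MAIN-2024-10",  # one CommonCrawl snapshot (assumed)
                  split="train",
                  streaming=True)

for i, doc in enumerate(fw):
    print(doc["text"][:200])               # each record carries a "text" field
    if i == 2:
        break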

OpenVoice

OpenVoice V2 is an open-source text-to-speech model released by MIT CSAIL and MyShell.ai researchers. It can instantly clone a person’s voice from a short audio sample and generate highly realistic speech in that cloned voice.

Core Features

  • Accurate voice cloning reproducing speaker’s tone color
  • Control over stylistic attributes: emotion, accent, rhythm, intonation
  • Zero-shot cross-lingual voice cloning for unseen languages
  • Native multi-lingual support: English, Spanish, French, Chinese, Japanese, Korean

Enhanced Audio Quality

A new training approach in V2 delivers significantly improved audio fidelity and more natural-sounding synthesized speech compared to V1.

Zero-Shot Cross-Lingual Cloning

The standout feature of OpenVoice V2 is its zero-shot cross-lingual voice cloning capability, which allows it to produce speech in languages not present in the training data. This functionality is crucial for applications requiring broad linguistic versatility, utilizing just a short audio sample to accurately replicate a speaker’s voice in multiple languages.
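
In code, V2 separates base speech generation (via MeloTTS) from tone-color conversion. The sketch below follows the structure of the repository’s demo notebook; the checkpoint paths, device choice, and base-speaker key are assumptions about a local setup rather than fixed API values.

import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
from melo.api import TTS  # MeloTTS supplies the multi-lingual base speech

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tone-color converter from a local V2 checkpoint (assumed layout).
converter = ToneColorConverter("checkpoints_v2/converter/config.json", device=device)
converter.load_ckpt("checkpoints_v2/converter/checkpoint.pth")

# Extract the target speaker's tone-color embedding from a short reference clip.
target_se, _ = se_extractor.get_se("reference.mp3", converter, vad=True)

# Generate base speech in the desired language, then re-color it with the
# cloned tone color; the language need not match the reference clip.
tts = TTS(language="EN", device=device)
speaker_id = tts.hps.data.spk2id["EN-US"]  # assumed base-speaker key
tts.tts_to_file("Hello from the cloned voice.", speaker_id, "tmp.wav")

source_se = torch.load("checkpoints_v2/base_speakers/ses/en-us.pth",
                       map_location=device)
converter.convert(audio_src_path="tmp.wav", src_se=source_se,
                  tgt_se=target_se, output_path="output.wav")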

Open Source

The model’s technical report is available on arXiv, and the full source code is published on GitHub. As of April 2024, both V1 and V2 are released under the MIT License for free commercial and research use.

System Requirements

The model runs locally on any computer, without the need for specialized hardware.
