
Technow: Phi-3, Open LLM iOS App, Mini-Gemini, LLaMA-Factory

Microsoft is shaking things up with Phi-3, a series of open-source large language models (LLMs) designed for accessibility and performance. Available in mini, small, and medium sizes, these models run efficiently on both mobile devices and PCs. Let’s explore what makes Phi-3 unique!

Phi-3 Open LLM by Microsoft

What’s New
Microsoft has launched the Phi-3 series, which includes three sizes: mini (3.8 billion parameters), small (7 billion parameters), and medium (14 billion parameters). These models are designed to run efficiently on both mobile devices and PCs, using advanced datasets to achieve high performance.

Technical Details and Performance:
Architecture: All models feature a transformer decoder architecture.

Performance: The mini model achieves 69% on the MMLU and 8.38 on MT-bench, showing performance on par with larger models such as Mixtral 8x7B and GPT-3.5.

Context Support: Supports a default 4K context length, expandable to 128K through LongRoPE technology.

Training and Data Strategy: The models are trained on a combination of heavily filtered web data and synthetic data, using a two-phase approach that enhances both general knowledge and specialized skills like logical reasoning.

Architecture and Compatibility: Each model uses the same tokenizer as Llama 2, ensuring compatibility with existing packages. This design focuses on robustness, safety, and effective interaction in a variety of formats, including chat.
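The chat interaction mentioned above relies on a simple tagged prompt format. The `<|user|>`/`<|assistant|>`/`<|end|>` tags below follow the format shown on the Phi-3 Hugging Face model card; in practice the tokenizer's `apply_chat_template()` handles this for you, but a hand-rolled sketch makes the format explicit:

```python
# Sketch: build a Phi-3-style chat prompt by hand.
# Tag format taken from the Phi-3 model card; normally you would call
# tokenizer.apply_chat_template() instead of constructing this yourself.

def build_phi3_prompt(messages):
    """Render a list of {"role", "content"} dicts into a Phi-3 chat prompt."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to generate a reply
    return "".join(parts)

prompt = build_phi3_prompt([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```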

Access and Deployment:
Availability: The entire Phi-3 series is accessible under an MIT license on the Hugging Face platform, allowing for widespread use and integration.

Mobile Optimization: The mini model is particularly optimized for mobile use, requiring only about 1.8GB when quantized to 4 bits and processing over 12 tokens per second on devices like the iPhone 14.

Post-Training and Context Extension: Post-training enhancements have improved the models’ abilities in specific domains such as math and coding. The mini model also offers an extended context version capable of handling up to 128K tokens for complex tasks.
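The ~1.8GB figure above is easy to sanity-check: 4-bit weights cost half a byte per parameter, so the 3.8-billion-parameter mini model needs roughly 1.9 GB for the weights alone (ignoring the KV cache and quantization overhead, so this is a lower bound):

```python
# Back-of-envelope weight memory: parameter_count * bits_per_weight / 8 bytes.

def weight_memory_gb(params: float, bits: int) -> float:
    """Approximate weight storage in GB for a quantized model."""
    return params * bits / 8 / 1e9

print(f"{weight_memory_gb(3.8e9, 4):.2f} GB")  # 4-bit Phi-3-mini -> 1.90 GB
```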
Why It Matters
Following the release of the Llama 3 models, this is an unprecedented time for the open-source community.

There is a strong trend of models achieving better performance at smaller sizes. Microsoft anticipates that companies will increasingly adopt a dual approach, using both small and large models depending on the use case.
Community Feedback
Bindu Reddy: “Phi-3 7B just dropped and beats Llama-3 7B handily. With an MMLU of 75.3, it’s coming close to 70B SOTA models!! 🤯 I wouldn’t be surprised if we ended up with a 7B model that beats GPT-4 by the end of the year.”

Nathan Lambert: “I really hope phi 3 proves us wrong about evaluation doping and it is actually an amazing model. But, being an outlier on log compute <-> MMLU plots is a little sus”

Aaron Ng: “if these scores hold up everyone could have something nearly as good as gpt-3.5-turbo on a phone phi-3-mini q4 should only need something like ~2gb of ram (mixtral 8x7b needs ~22gb for comparison)”

Hugging Face iOS App for Local LLMs

Hugging Face has released an iOS app that lets you chat with open LLMs directly on your phone, including Llama 3 and Phi-3.

Mini-Gemini

Mini-Gemini improves vision-language models by pairing high- and low-resolution visual encoders with large language models ranging from 2B to 34B parameters. You can train and fine-tune on its specialized datasets to enhance image understanding and generation, and download pre-trained weights and detailed training scripts from its GitHub repository.

LLaMA-Factory

LLaMA-Factory allows you to fine-tune over 100 large language models efficiently. You can employ methods like supervised fine-tuning, reward modeling, and policy-based optimizations such as PPO, DPO, and ORPO. It supports 16-bit and 32-bit full-tuning and LoRA adjustments, with options for 2/4/8-bit QLoRA to reduce GPU memory use.
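A fine-tuning run in LLaMA-Factory is typically driven by a YAML config. The sketch below shows what a 4-bit QLoRA supervised fine-tuning config might look like; field names follow the project's example configs, while the model, dataset, and output paths are illustrative placeholders:

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                 # supervised fine-tuning
do_train: true
finetuning_type: lora
lora_target: all
quantization_bit: 4        # QLoRA: 4-bit base weights

### dataset
dataset: alpaca_en_demo    # placeholder dataset name
template: llama3
cutoff_len: 1024

### output and training
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

A config like this would then be passed to the project's training CLI (e.g. `llamafactory-cli train config.yaml`).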
