TechNow: Open source Grok-1, PrivateGPT, Apple Multimodal Models
In this article, we’ve got Grok-1, the colossal 314-billion-parameter open-source LLM from xAI, now the largest openly released model available to researchers. For private on-device AI, PrivateGPT lets you build confidential applications that run large language models over your local documents. And giants like Apple are joining the fray with MM1, a new family of multimodal models that process both text and visual data.
Open source Grok-1
xAI has finally open-sourced Grok-1, making it the largest open LLM released to date.
With 314 billion parameters in total, the Mixture-of-Experts (MoE) model activates roughly 86 billion parameters for any given token, keeping inference cost well below that of a dense model of the same size.
Instead of fixed absolute positional embeddings, Grok-1 uses rotary positional embeddings (RoPE), which encode token positions as rotations of the query and key vectors and generalize more gracefully across sequence positions.
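To make the idea concrete, here is a minimal NumPy sketch of rotary position embeddings applied to one attention head; the head size of 128 matches the spec below, but the function and variable names are illustrative, not taken from the Grok-1 code.

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply rotary position embeddings (RoPE) to query or key vectors.

    x:         (seq_len, head_dim) array, head_dim must be even
    positions: (seq_len,) integer token positions
    """
    head_dim = x.shape[-1]
    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, head_dim, 2) / head_dim)    # (head_dim/2,)
    angles = positions[:, None] * freqs[None, :]                # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Example: 8 tokens, head size 128 (as in the spec below).
q = np.random.randn(8, 128)
q_rot = rotary_embed(q, np.arange(8))
```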
Key Specifications:
- Parameters: 314 billion, with 25% of weights active per token.
- Architecture: Mixture of 8 Experts, using 2 per token.
- Layers: 64 transformer layers, each combining multi-head attention with dense (feed-forward) blocks.
- Tokenization: SentencePiece tokenizer with a vocabulary of 131,072 tokens.
- Embedding and Positional Encoding: embedding size of 6,144, paired with rotary positional embeddings.
- Attention: 48 heads for queries and 8 for keys/values, each with a head size of 128.
- Context Length: 8,192 tokens, with weights in bf16 precision.
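As a rough illustration of the "8 experts, 2 per token" routing listed above, here is a toy top-2 MoE layer in NumPy. The expert count and top-k mirror the spec; the tiny embedding size and single-matrix "experts" are simplifications for readability, not xAI's implementation.

```python
import numpy as np

NUM_EXPERTS, TOP_K = 8, 2     # 8 experts, 2 active per token, as in the spec
EMBED = 64                    # toy size; Grok-1's embedding size is 6,144

rng = np.random.default_rng(0)
router_w = rng.standard_normal((EMBED, NUM_EXPERTS)) * 0.02
# Each "expert" is reduced to a single linear map here, purely for illustration.
experts = [rng.standard_normal((EMBED, EMBED)) * 0.02 for _ in range(NUM_EXPERTS)]

def moe_layer(tokens):
    """tokens: (n_tokens, EMBED). Each token is processed only by its top-2 experts."""
    logits = tokens @ router_w                           # (n_tokens, NUM_EXPERTS)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]        # pick the 2 highest-scoring experts
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))   # softmax over the chosen experts
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        for k in range(TOP_K):
            out[i] += weights[i, k] * (tok @ experts[top[i, k]])
    return out

y = moe_layer(rng.standard_normal((4, EMBED)))           # route 4 toy tokens
```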
Performance Metrics:
With an MMLU score of 73%, Grok-1 outperforms Llama 2 70B and Mixtral 8x7B, showcasing its efficiency and accuracy across a range of tests.
Implementation Details:
Running the model requires substantial GPU memory due to its sheer size.
The released code uses a deliberately naive MoE layer implementation that avoids the need for custom kernels, prioritizing easy validation of model correctness over speed.
The release also supports activation sharding and 8-bit quantization to reduce memory use and improve throughput.
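To show what 8-bit quantization amounts to in practice, here is a minimal sketch of symmetric per-row int8 weight quantization and dequantization; this is a generic scheme for illustration, not necessarily the exact one used in the Grok-1 release.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-row int8 quantization: w is approximated by q * scale."""
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 127.0  # one scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print(np.abs(w - w_hat).max())   # small reconstruction error relative to each row's max
```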
Open-Source Availability:
Released under the Apache 2.0 license, Grok-1’s weights and architecture are accessible for community use and contribution.
PrivateGPT
PrivateGPT helps you build private, context-aware AI applications that run large language models over your local documents, without an internet connection. It provides an API for ingesting documents, generating embeddings, retrieving relevant context, and generating responses through retrieval-augmented generation (RAG) pipelines built on LlamaIndex abstractions. It supports completions and streaming, and follows and extends the OpenAI API specification.
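Assuming a PrivateGPT server running locally with its OpenAI-style API (the port, endpoint paths, and extra request fields below reflect a typical setup and may differ across PrivateGPT versions; report.pdf is a placeholder document), querying your own documents can look roughly like this:

```python
import requests

BASE = "http://localhost:8001"   # default local PrivateGPT address in many setups; adjust to yours

# Ingest a local document so it becomes part of the retrieval context.
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/v1/ingest/file", files={"file": f}).raise_for_status()

# Ask a question; use_context asks the server to run retrieval-augmented generation
# over the ingested documents (field names may vary by PrivateGPT version).
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize the key findings in report.pdf"}],
        "use_context": True,
        "include_sources": True,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```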
Apple Multimodal Models
Apple has released MM1, a new family of multimodal AI models designed to process both visual and textual data. The release comes with an unusual level of technical detail for Apple.
Model Composition: MM1 spans models with up to 30 billion parameters, trained on a mix of image-caption, interleaved image-text, and text-only data.
Learning Efficiency: The 30-billion-parameter version shows strong in-context few-shot learning, picking up new tasks from only a handful of examples.
Benchmarking: MM1 competes with existing models like GPT-4V and Gemini Pro in pre-training and fine-tuning performance.
Technical Details:
- Performance improves markedly with a stronger image encoder and higher input image resolution.
- Optimal data combination for training includes image-caption, interleaved image-text, and text-only data.
- The vision-language connector has a smaller effect on performance compared to other factors.
Performance Metrics: The MM1-30B model achieves a 39.4 score in zero-shot settings and 44.4 in eight-shot settings on the MathVista benchmark, demonstrating strong few-shot and reasoning abilities.
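MM1 itself is not publicly available, but the difference between the zero-shot and eight-shot numbers above comes down to how the evaluation prompt is assembled. The sketch below shows that assembly for a hypothetical multimodal generate interface; every name in it is a placeholder rather than MM1's actual API.

```python
# Hypothetical sketch of zero-shot vs. few-shot multimodal prompting.
# "few_shot_examples", "build_prompt", and the commented model call are
# placeholders for illustration, not MM1's API.

few_shot_examples = [
    {"image": "ex1.png", "question": "What is the slope of the line?", "answer": "2"},
    # ... up to 8 solved examples for the eight-shot setting
]

def build_prompt(examples, image, question):
    """Interleave (image, question, answer) demonstrations before the test query."""
    parts = []
    for ex in examples:
        parts += [("image", ex["image"]),
                  ("text", f"Q: {ex['question']}\nA: {ex['answer']}")]
    parts += [("image", image), ("text", f"Q: {question}\nA:")]
    return parts

zero_shot = build_prompt([], "chart.png", "What value does the 2021 bar show?")
eight_shot = build_prompt(few_shot_examples, "chart.png", "What value does the 2021 bar show?")
# answer = model.generate(eight_shot)   # placeholder call on a hypothetical model
```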