TechNow: Open source Grok-1, PrivateGPT, Apple Multimodal Models

In this article, we've got Grok-1, the colossal 314-billion-parameter open-source LLM from xAI, shattering records and empowering researchers. For secure on-device AI, PrivateGPT lets you build confidential applications using large language models on your local documents. And giants like Apple are joining the fray with MM1, a new family of multimodal models that can process both text and visual data.

Open source Grok-1

xAI has finally open-sourced Grok-1, making it the largest open LLM ever released.

Of its 314 billion parameters, the Mixture-of-Experts (MoE) model activates roughly 86 billion at any given time, keeping per-token compute well below that of a dense model of the same size.

Unlike models that learn a fixed table of absolute positions, Grok-1 employs rotary positional embeddings (RoPE), which encode relative positions by rotating query and key vectors and so avoid baking a positional limit into the weights.
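
To make this concrete, here is a minimal NumPy sketch of the rotary-embedding idea; it is a generic illustration under standard RoPE assumptions, not xAI's actual code:

```python
# Minimal sketch of rotary positional embeddings (RoPE): each consecutive
# pair of query/key features is rotated by an angle proportional to the
# token's position. Generic illustration, not Grok-1's implementation.
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """x: (seq_len, head_dim) slice of queries or keys; head_dim must be even."""
    seq_len, dim = x.shape
    # One frequency per feature pair: theta_i = base^(-2i / dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # standard 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Dot products between rotated queries and keys depend only on the distance
# between positions, so no fixed position table limits the context.
q = rotary_embed(np.random.randn(8192, 128))  # 128 matches Grok-1's head size
```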

Key Specifications:

  • Parameters: 314 billion, with 25% of the weights active per token.
  • Architecture: Mixture of 8 Experts, routing each token through 2 of them (see the sketch after this list).
  • Layers: 64 transformer layers, each combining multihead attention and dense blocks.
  • Tokenization: SentencePiece tokenizer with a vocabulary of 131,072 tokens.
  • Embedding and Positional Encoding: embedding size of 6,144, paired with rotary positional embeddings of the same dimension.
  • Attention: 48 heads for queries and 8 for keys/values, each with a head size of 128.
  • Context Length: 8,192 tokens, processed in bf16 precision.
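
To illustrate the routing scheme in the specs above, here is a toy NumPy sketch of top-2 Mixture-of-Experts dispatch; the router and experts are hypothetical stand-ins, not Grok-1's real modules:

```python
# Toy top-2 MoE layer: 8 experts, 2 chosen per token, so only ~25% of the
# expert weights participate in any forward pass. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 64

# Stand-in experts: one weight matrix each instead of a full dense block.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
           for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """tokens: (n_tokens, D_MODEL); each token is sent to its top-2 experts."""
    logits = tokens @ router_w                       # (n_tokens, NUM_EXPERTS)
    top2 = np.argsort(logits, axis=-1)[:, -TOP_K:]   # indices of the 2 best experts
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        scores = logits[i, top2[i]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                         # softmax over the 2 winners
        for gate, e in zip(gates, top2[i]):
            out[i] += gate * (tok @ experts[e])      # gate-weighted expert outputs
    return out

y = moe_layer(rng.standard_normal((4, D_MODEL)))
```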


Performance Metrics:

Outperforms Llama 2 70B and Mixtral 8x7B with an MMLU score of 73%, showcasing strong efficiency and accuracy across standard benchmarks.

Implementation Details:

  • Requires significant GPU resources due to its sheer size.
  • Uses a deliberately inefficient MoE layer implementation, chosen to avoid the need for custom kernels while validating the model's correctness.
  • Supports activation sharding and 8-bit quantization to reduce memory requirements (a minimal quantization sketch follows this list).
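
The 8-bit quantization mentioned above works roughly as follows; this is a generic sketch of symmetric int8 weight quantization, not xAI's actual code:

```python
# Symmetric per-tensor int8 quantization: store weights as int8 plus one
# float scale, then dequantize on the fly. Generic illustration only.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single scale per tensor."""
    scale = np.abs(w).max() / 127.0          # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale      # recover approximate weights

w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int8(w)
print("bytes:", w.nbytes, "->", q.nbytes)    # 4x smaller than fp32, 2x vs bf16
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```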

Open-Source Availability:

Released under the Apache 2.0 license, Grok-1’s weights and architecture are accessible for community use and contribution.

PrivateGPT

PrivateGPT helps you build private, context-aware AI applications that run large language models over your local documents, entirely without an internet connection. It provides an API for ingesting documents, generating embeddings, retrieving relevant context, and producing answers through retrieval-augmented generation (RAG) pipelines built on LlamaIndex abstractions. It supports completions and streaming, and it follows and extends the OpenAI API standard.
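
Below is a hedged sketch of driving such a server from Python, assuming a local PrivateGPT instance at its default http://localhost:8001 address and the ingest and chat endpoints described in the project docs (exact paths can vary between versions):

```python
# Sketch of a PrivateGPT client: ingest a local file, then ask a question
# with retrieval over the ingested documents. Endpoint paths are assumptions
# based on the project docs and may differ across PrivateGPT versions.
import requests

BASE = "http://localhost:8001"

# 1. Ingest a local document so it becomes available as retrieval context.
with open("report.pdf", "rb") as f:
    requests.post(f"{BASE}/v1/ingest/file", files={"file": f}).raise_for_status()

# 2. Ask a question; use_context asks the RAG pipeline to retrieve relevant
#    chunks from the ingested documents before generating an answer.
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize the report."}],
        "use_context": True,
        "stream": False,
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # OpenAI-style payload
```

Because the responses follow the OpenAI schema, existing OpenAI client code can often be pointed at the local server with little more than a base-URL change.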

Apple Multimodal Models

Apple has released MM1, a new family of multimodal AI models designed to process both visual and textual data. Unusually for Apple, the release includes detailed information about how the models were built.

Model Composition: MM1 encompasses models with up to 30 billion parameters, trained on a mix of image-caption pairs, interleaved image-text documents, and text-only data.

Learning Efficiency: The 30-billion-parameter version shows strong few-shot learning, picking up a task from only a handful of worked examples supplied in the prompt.
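
For readers unfamiliar with the term, "few-shot" means the worked examples are placed directly in the prompt. The sketch below assembles such a prompt; the <image:...> placeholder format is purely hypothetical, since MM1's prompt format has not been published:

```python
# Assemble a k-shot multimodal prompt: k worked (image, question, answer)
# examples precede the real query so the model can infer the task in-context.
# The <image:...> placeholder is hypothetical, not MM1's real format.
def build_few_shot_prompt(examples, query, k=8):
    """examples: list of (image_ref, question, answer); query: (image_ref, question)."""
    parts = [f"<image:{img}> Q: {q} A: {a}" for img, q, a in examples[:k]]
    img, q = query
    parts.append(f"<image:{img}> Q: {q} A:")  # the model completes the answer
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    [("chart1.png", "What is the highest bar?", "42")],
    ("chart2.png", "What is the lowest bar?"),
)
```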

Benchmarking: MM1 is competitive with existing models such as GPT-4V and Gemini Pro in both pre-training and fine-tuning performance.

Technical Details:

  • Performance increases significantly with enhancements in the image encoder and adjustments in image resolution.
  • Optimal data combination for training includes image-caption, interleaved image-text, and text-only data.
  • The vision-language connector has a smaller effect on performance compared to other factors.

Performance Metrics: The MM1-30B model achieves a 39.4 score in zero-shot settings and 44.4 in eight-shot settings on the MathVista benchmark, demonstrating strong few-shot and reasoning abilities.
