Devin: The First AI Software Engineer
In this post we talk about Devin, the first AI software engineer. We also cover Figure's partnership with OpenAI to give its robots a voice, Anthropic's latest model, Haiku, and the open-sourcing of Grok.
Devin is a new model that can complete jobs on Upwork, pass interviews at top AI companies, and excel at the SWE-Bench coding benchmark. It was launched by Cognition, which is backed by prominent investors such as Patrick and John Collison, Elad Gil, and Peter Thiel’s Founders Fund. Devin’s launch video reached 27M views on X.
Key features:
- Handles all aspects of software development, from coding to deployment.
- Operates in a sandboxed environment with access to standard developer tools.
- Facilitates user interaction through a natural language interface for real-time monitoring and commands.
- Demonstrates superior performance in SWE-Bench, resolving 13.86% of issues unassisted.
Devin showcases significant advancements in autonomous software problem-solving and project execution. This model’s introduction to the market indicates a shift towards more sophisticated AI roles in software development.
Currently, access to Devin is limited to a select group of users, with Cognition planning broader availability in the future. Interested individuals or organizations can reach out for early access through Cognition’s contact channels, such as their official website or direct email at info@cognition-labs.com.
The model’s underlying technology, attributed to advances in long-term reasoning and planning, remains proprietary. However, its performance in benchmarks and real-world applications highlights the potential for AI to take on more integral roles in software engineering tasks.
Figure partners with OpenAI to give a voice to their robots
“With OpenAI, Figure 01 can now have full conversations with people”
Figure, in collaboration with OpenAI, released Figure 01, a humanoid robot that uses a multimodal model for real-time interaction and task execution. The model processes visual and textual data, enabling tasks such as identifying objects and picking up trash. “Our robot can describe its visual experience, plan future actions, reflect on its memory, and explain its reasoning verbally.”
Anthropic releases their fastest model yet: Haiku
Anthropic released Claude 3 Haiku, a model aimed at enterprise workloads. It processes 21K tokens per second for prompts under 32K tokens, and its pricing follows a 1:5 input-to-output token ratio, which keeps analysis of large datasets cost-effective. Anthropic also cites rigorous security protocols for enterprise-grade protection. Haiku is available through the Claude API and to Claude Pro subscribers, and it can efficiently process long documents such as Supreme Court cases.
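To make these figures concrete, here is a rough back-of-the-envelope sketch. It reads the 1:5 ratio as a price ratio (one output token costing five times one input token), which is an assumption, and it expresses cost in relative units because no absolute prices are given here. The function names are hypothetical.

```python
# Figures quoted above: 21K tokens/second for prompts under 32K tokens,
# and a 1:5 input-to-output token ratio (read here as a price ratio).
TOKENS_PER_SECOND = 21_000
OUTPUT_PRICE_MULTIPLIER = 5


def prompt_seconds(prompt_tokens):
    """Approximate time to process a prompt at the quoted throughput."""
    return prompt_tokens / TOKENS_PER_SECOND


def relative_cost(input_tokens, output_tokens):
    """Cost measured in units of 'the price of one input token'."""
    return input_tokens + OUTPUT_PRICE_MULTIPLIER * output_tokens


if __name__ == "__main__":
    # A long document summarized into a short answer.
    print(prompt_seconds(21_000))       # seconds to process the prompt
    print(relative_cost(10_000, 500))   # input-token-equivalents
```

Under this reading, a workload with long prompts and short answers is dominated by the cheaper input side, which fits the large-document use case described above.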
xAI open-sources Grok-1, making it the largest open LLM to date
Grok-1 is a 314-billion-parameter Mixture of Experts (MoE) model. Only about 86 billion of those parameters (roughly a quarter) are active for any given token, so the compute per token is far lower than for a dense model of the same size.
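The idea of activating only a few experts per token can be sketched as top-k routing. This is a generic illustration in pure Python, not Grok-1's actual code: the router scores, the toy experts, and the choice to renormalize weights over the selected experts are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Assumption: weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)  # the other experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy setup: 8 experts, each scaling its input by a different constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]

if __name__ == "__main__":
    print(top_k_route(logits))
    print(moe_forward([1.0, 2.0], experts, logits))
```

With 2 of 8 experts selected, only a quarter of the expert weights take part in each token's forward pass, which is where the gap between total and active parameters comes from.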
Unlike models that add a fixed position vector to each token, Grok-1 uses rotary positional embeddings (RoPE), which encode position by rotating the query and key vectors so that attention scores depend on the relative distance between tokens.
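A minimal sketch of rotary embeddings in pure Python follows. It uses the common formulation with a base of 10000 and rotates adjacent pairs of dimensions; real implementations differ in how they pair dimensions, and nothing here is taken from the Grok-1 code.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 2.0]
    k = [1.1, 0.4, -0.5, 0.9]
    # Same relative offset (2) at different absolute positions.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

The two printed scores agree up to floating-point error because rotating the query by position m and the key by position n leaves a dot product that depends only on m - n.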
Key Specifications:
Parameters: 314 billion, with 25% of weights active per token.
Architecture: Mixture of 8 Experts, using 2 per token.
Layers: 64 transformer layers, each combining multi-head attention and a dense block.
Tokenization: SentencePiece tokenizer with a vocabulary size of 131,072.
Embedding and Positional Encoding: Embedding size of 6,144, with rotary positional embeddings of matching size.
Attention: 48 heads for queries, 8 for keys/values, each with a size of 128.
Context Length: Capable of processing 8,192 tokens with bf16 precision.
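The specifications above fit together arithmetically, which the short sketch below checks. The key/value cache estimate is a back-of-the-envelope figure for a single sequence at full context; it assumes the cache is stored in bf16 (2 bytes per value) and ignores implementation overhead.

```python
# Values taken from the specification list above.
LAYERS = 64
QUERY_HEADS = 48
KV_HEADS = 8
HEAD_SIZE = 128
EMBED_SIZE = 6144
CONTEXT = 8192
BYTES_BF16 = 2  # assumption: cache held in bf16


def kv_cache_bytes(kv_heads, tokens=CONTEXT):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers keys plus values, for every layer, head, and token.
    return 2 * LAYERS * kv_heads * HEAD_SIZE * tokens * BYTES_BF16


# 48 query heads of size 128 span the full embedding width.
heads_span_embedding = QUERY_HEADS * HEAD_SIZE == EMBED_SIZE

# Each key/value head is shared by a group of query heads.
queries_per_kv_head = QUERY_HEADS // KV_HEADS

if __name__ == "__main__":
    print(heads_span_embedding, queries_per_kv_head)
    print(kv_cache_bytes(KV_HEADS) / 2**30, "GiB with 8 key/value heads")
    print(kv_cache_bytes(QUERY_HEADS) / 2**30, "GiB if all 48 heads had their own keys/values")
```

Sharing 8 key/value heads among 48 query heads cuts the cache to one sixth of what a separate key/value head per query head would need.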
Performance Metrics:
Outperforms Llama 2 70B and Mixtral 8x7B on MMLU, with a score of 73%.
Implementation Details:
Requires significant GPU resources due to its size.
The MoE layer implementation in the release is not efficient; it was chosen to avoid the need for custom kernels while validating the model's correctness.
The model supports activation sharding and 8-bit quantization to optimize performance.
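To see why 8-bit quantization matters at this scale, the sketch below estimates the weight footprint at 2 bytes versus 1 byte per parameter and shows a generic symmetric absmax quantizer. The quantizer is an illustrative technique only; the scheme Grok-1 actually uses is not described here, and the function names are hypothetical.

```python
PARAMS = 314_000_000_000  # parameter count quoted above


def weights_gb(bytes_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return PARAMS * bytes_per_param / 10**9


def quantize_int8(weights):
    """Symmetric absmax quantization: floats -> (int8-range values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map quantized integers back to approximate floats."""
    return [v * scale for v in q]


if __name__ == "__main__":
    print(weights_gb(2), "GB at 16 bits per weight")
    print(weights_gb(1), "GB at 8 bits per weight")
    q, scale = quantize_int8([0.02, -1.5, 0.75, 0.0])
    print(q, dequantize_int8(q, scale))
```

Halving the bytes per weight halves the memory needed just to hold the model, at the cost of a rounding error of at most half a quantization step per weight in this scheme.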
Open-Source Availability:
Released under the Apache 2.0 license, Grok-1’s weights and architecture are accessible for community use and contribution.