Technow: Cost and run time to train GPT, RT-DETR, Tarsier, PyTorch’s “bottleneck”
This article covers several recent AI developments, from training costs to model efficiency. First, we look at the cost and run time of training GPT models: a tiny GPT-2 model (124M parameters) can be trained for around $20 in about 90 minutes, while scaling to the 1.6B version takes a week and roughly $2.5k. Next, RT-DETR, an end-to-end object detection transformer with variants ranging from 20M to 76M parameters, outperforms YOLO models in real-time detection. We then introduce Tarsier, a tool suite that bridges LLMs and web interaction by converting web pages into formats a text-only model can read. Lastly, PyTorch's torch.utils.bottleneck utility helps locate performance bottlenecks in your code. Together, these offer insights into cost-effective training, real-time object detection, and model optimization.
Cost and run time to train GPT
It is useful to know roughly how long it would take to train a GPT model and how much it would cost. For the sake of illustration, let's consider the simple case of GPT-2.
Training a tiny GPT-2 (124M parameters) model takes about 90 minutes and $20 using an 8xA100 GPU node. The 350M version requires 14 hours and around $200. Training the full 1.6B model takes one week and $2.5k. These estimates assume the following training configuration:
- Batch size: 64
- Sequence length: 1024 tokens
- Total batch size per update: 524,288 tokens
This configuration achieves about 60% model FLOPs utilization (MFU) on the A100 GPU; the quick sanity check below shows how the numbers fit together.
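The arithmetic below is purely illustrative; the split between data parallelism and gradient accumulation is an assumption, not something stated in the original write-up.

# Illustrative arithmetic: how the batch configuration above fits together.
batch_size = 64                            # sequences per micro-batch
seq_len = 1024                             # tokens per sequence
micro_batch_tokens = batch_size * seq_len  # 65,536 tokens
total_batch_tokens = 524_288               # tokens per optimizer update

steps = total_batch_tokens // micro_batch_tokens
print(steps)  # 8 -> e.g. one micro-batch per GPU on an 8xA100 node, or 8 grad-accumulation steps on one GPU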
RT-DETR
This repository is the official implementation of the paper DETRs Beat YOLOs On Real-Time Object Detection.
It presents the Real-Time DEtection TRansformer (RT-DETR, aka RTDETR), the first real-time end-to-end object detector, which outperforms previously state-of-the-art YOLO models in both speed and accuracy.
The repository contains the code and pretrained weights for the different RT-DETR variants, which have between 20 and 76 million parameters.
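If you just want to try the detector, one convenient path is the Ultralytics package, which also distributes RT-DETR weights. This is a sketch of that convenience route, not the official repository's training code, and the checkpoint name below is one of the variants Ultralytics publishes.

# Minimal inference sketch via the Ultralytics package (not the official repo's code).
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")   # pretrained RT-DETR-L checkpoint published by Ultralytics
results = model("bus.jpg")      # run detection on an image of your choice
results[0].show()               # display boxes, labels, and confidences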
Tarsier
Tarsier is a tool suite that addresses the following problems that arise when using LLMs for web interaction:
- Feeding the webpage to the LLM (HTML, Accessibility Tree, Screenshot)
- Mapping LLM responses back to web elements
- Informing a text-only LLM about the page’s visual structure
Tarsier visually tags the interactable elements on a page (buttons, links, and input fields) so that an LLM can act on them. The repo also provides an OCR-based algorithm that converts a page screenshot into a whitespace-structured string (almost like ASCII art) that even a text-only LLM can understand.
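Below is a rough usage sketch in the spirit of the project's README. The exact class and method names (Tarsier, GoogleVisionOCRService, page_to_text) and the OCR credential setup are assumptions and should be checked against the current repository.

# Hypothetical sketch: tag a live page with Tarsier and get back LLM-readable text.
# Class/method names are assumed from the project's README and may differ in newer versions.
import asyncio
from playwright.async_api import async_playwright
from tarsier import Tarsier, GoogleVisionOCRService  # assumed imports

async def main():
    credentials = {}  # supply your Google Cloud Vision service-account credentials here
    tarsier = Tarsier(GoogleVisionOCRService(credentials))

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://news.ycombinator.com")

        # page_text: whitespace-structured snapshot of the page with numbered element tags
        # tag_to_xpath: maps those tag IDs back to the underlying web elements
        page_text, tag_to_xpath = await tarsier.page_to_text(page)
        print(page_text)

asyncio.run(main())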
PyTorch’s “bottleneck”
The "torch.utils.bottleneck" module is a tool that can serve as a first step for identifying performance bottlenecks in your PyTorch code, helping you optimize your models and training loops.
It is invoked from the command line and runs your script under both the Python profiler (cProfile) and PyTorch's autograd profiler, the latter in CPU-only mode and, if a GPU is available, in CUDA mode, so you can pinpoint bottlenecks in different parts of your pipeline.
When the script finishes, it prints an environment summary followed by the most time-consuming operations, along with their execution times and related statistics.
Here's how you can use it on a small example script (the file name train_step.py is just for illustration):
# train_step.py - a minimal, self-contained script to profile
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
inputs = torch.randn(64, 1024)
targets = torch.randint(0, 10, (64,))

outputs = model(inputs)                    # forward pass
loss = F.cross_entropy(outputs, targets)
loss.backward()                            # backward pass

# Profile the script from the command line:
#   python -m torch.utils.bottleneck train_step.py