0xSojalSec/airllm - GitHub Repository Preview
AI & Machine Learning ★ 2.7k Python

0xSojalSec/airllm

by @0xSojalSec

2.7k Stars
193 Forks
0 Issues
Python Language

AirLLM runs inference for 70B-parameter large language models on a single 4GB GPU. It computes the model layer by layer, loading each layer's weights into GPU memory only while that layer runs, so massive models that would normally require an expensive multi-GPU setup can run on consumer hardware. This makes state-of-the-art models accessible to researchers and developers with limited GPU resources.
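The memory arithmetic behind that claim is a quick back-of-the-envelope check (fp16 weights; the 80-decoder-layer count comes from the Llama-2-70B architecture, not from this repository):

```python
# Back-of-the-envelope memory math: why one layer at a time fits on 4 GB.
params = 70e9
bytes_per_param = 2                      # fp16 weights
total_gb = params * bytes_per_param / 1e9

num_layers = 80                          # Llama-2-70B decoder layers
per_layer_gb = total_gb / num_layers

print(int(total_gb))     # whole model: 140 GB, far beyond a 4 GB card
print(per_layer_gb)      # one layer: 1.75 GB, fits comfortably in 4 GB
```

So even though the full model is roughly 35x larger than the card's memory, any single layer fits with room to spare for activations.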

0xSojalSec (@0xSojalSec), project maintainer on GitHub
git clone https://github.com/0xSojalSec/airllm.git

Quick Start Example

python
from airllm import AutoModel

# Load a 70B model; AirLLM streams layers from disk so it fits on a 4GB GPU
model = AutoModel.from_pretrained(
    "meta-llama/Llama-2-70b-hf"
)

# AirLLM's generate() expects token ids, so tokenize the prompt first
input_tokens = model.tokenizer(
    ["Explain quantum computing"],
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

# Generate text and decode it back to a string
generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=200,
    use_cache=True,
)
print(model.tokenizer.decode(generation_output[0]))
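What happens under the hood during that call can be illustrated with a toy forward pass. This is a conceptual sketch only, not AirLLM's actual implementation; every function and variable name below is made up:

```python
# Layer-by-layer execution: only one layer's weights are ever resident.

def load_layer_from_disk(layer_id):
    """Stand-in for reading one transformer layer's weights off disk."""
    return [layer_id + 1] * 4            # fake weight vector

def apply_layer(weights, activations):
    """Stand-in for running one layer on the current activations."""
    return [a + w for a, w in zip(activations, weights)]

def forward(num_layers, activations):
    # Load one layer, run it, free it, then move to the next layer.
    for layer_id in range(num_layers):
        weights = load_layer_from_disk(layer_id)   # load this layer only
        activations = apply_layer(weights, activations)
        del weights                                # free before the next load
    return activations

print(forward(4, [0] * 4))  # [10, 10, 10, 10]
```

Peak memory is one layer plus the activations, independent of model depth, which is the trade AirLLM makes: much lower memory in exchange for repeated disk reads.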

Tags

#llm #inference #gpu #optimization #ai #deep-learning
