AI & Machine Learning
0xSojalSec/airllm
2.7k Stars · 193 Forks · 0 Issues · Python
AirLLM enables inference of 70B-parameter large language models on a single 4GB GPU. It uses layer-by-layer computation and memory-efficient techniques to run massive models on consumer hardware that would otherwise require expensive multi-GPU setups, making state-of-the-art AI accessible to researchers and developers with limited GPU resources.
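The layer-by-layer idea can be sketched in a few lines: load one transformer layer's weights at a time, push the hidden state through it, then free the weights before loading the next layer, so peak memory stays near a single layer's footprint rather than the whole model's. This is a minimal toy illustration of the concept, not AirLLM's actual implementation; `load_layer` is a hypothetical callback standing in for AirLLM's disk-to-GPU weight streaming.

```python
import numpy as np

def layer_by_layer_forward(load_layer, num_layers, hidden):
    """Run a forward pass while holding only one layer in memory.

    load_layer(i) is a hypothetical callback that materializes layer i's
    weights on demand; AirLLM applies the same idea to stream 70B-model
    layers through a 4GB GPU.
    """
    for i in range(num_layers):
        layer = load_layer(i)   # materialize only this layer's weights
        hidden = layer(hidden)  # forward the hidden state through it
        del layer               # release the weights before the next layer
    return hidden

# Toy demo: each "layer" is just a small weight matrix applied to the state.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) * 0.1 for _ in range(3)]
out = layer_by_layer_forward(lambda i: (lambda h: weights[i] @ h), 3, np.ones(4))
print(out.shape)
```

The key property is that only one element of `weights` is "live" inside the loop at a time; in the real library the loading step dominates runtime, which is the trade-off for fitting in 4GB of VRAM.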
git clone https://github.com/0xSojalSec/airllm.git
Quick Start Example
python
from airllm import AutoModel

# Load a 70B model on a 4GB GPU; AirLLM streams weights layer by layer
model = AutoModel.from_pretrained(
    "meta-llama/Llama-2-70b-hf"
)

# Tokenize first: generate() expects token ids, not a raw string
input_tokens = model.tokenizer(
    ["Explain quantum computing"],
    return_tensors="pt",
)

# Generate text
generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=200,
)
print(model.tokenizer.decode(generation_output[0]))