Lumen Language Model

Overview

A 128M-parameter language model built from scratch for education and research purposes.

Category

Artificial Intelligence

A 128M-Parameter Language Model

LumenBase is a 128M-parameter transformer language model built from scratch using PyTorch, featuring a custom tokenizer and GQA-based architecture. It includes a complete training and evaluation pipeline, achieving competitive scores on ARC and HellaSwag reasoning benchmarks.

LumenBase is a 128M-parameter transformer language model developed entirely from scratch using PyTorch. The project includes a custom tokenizer, complete data pipeline, and a transformer architecture featuring Grouped Query Attention (GQA). Training was performed on an NVIDIA H100 GPU for around 10 hours using mixed precision (FP16/BF16), gradient accumulation, and standard optimization techniques.
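
The repository holds the actual layer code, but a minimal PyTorch sketch shows the idea behind GQA: fewer key/value heads than query heads, with each K/V head shared by a group of query heads. All dimensions below are illustrative placeholders, not LumenBase's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA sketch: n_kv_heads < n_heads, K/V shared per query group.

    Hyperparameters here are illustrative, not LumenBase's real config.
    """
    def __init__(self, d_model=768, n_heads=12, n_kv_heads=4):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every group of query heads shares one K/V head.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out)

# Example: attn = GroupedQueryAttention(); y = attn(torch.randn(2, 16, 768))
```

Sharing K/V heads shrinks the K/V projection parameters and the KV-cache at inference time, which is why GQA is a common choice for small, resource-constrained models like this one.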

The model was evaluated on multiple reasoning benchmarks, achieving ARC-Easy 39.48%, ARC-Challenge 23.55%, and HellaSwag 32.62%. The repository provides training scripts, inference utilities, model checkpoints, and evaluation notebooks, enabling reproducibility and experimentation with lightweight transformer architectures.
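
The repository's evaluation notebooks contain the actual harness, but benchmarks like ARC and HellaSwag are conventionally scored by having the model rank each answer choice by log-likelihood. A sketch of that scoring loop, assuming a causal LM that returns `(batch, seq, vocab)` logits and a tokenizer with an `encode()` method (both hypothetical interfaces, not LumenBase's actual API):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_choice(model, tokenizer, context, choice, device="cpu"):
    """Total log-prob the model assigns to the choice tokens given the context."""
    ctx_ids = tokenizer.encode(context)
    cho_ids = tokenizer.encode(choice)
    ids = torch.tensor([ctx_ids + cho_ids], device=device)
    logits = model(ids)                      # assumed shape: (1, T, vocab)
    logp = F.log_softmax(logits, dim=-1)
    # Logits at position i predict token i+1, so the choice tokens at
    # positions start..T-1 are scored by logits at start-1..T-2.
    start = len(ctx_ids)
    tok_logp = logp[0, start - 1:-1, :].gather(
        1, ids[0, start:].unsqueeze(-1)).squeeze(-1)
    return tok_logp.sum().item()

def predict(model, tokenizer, question, choices):
    # Leading space so each choice tokenizes as a continuation
    # (whether this is needed depends on the tokenizer).
    scores = [score_choice(model, tokenizer, question, " " + c) for c in choices]
    return scores.index(max(scores))
```

Accuracy on the benchmark is then just the fraction of questions where `predict` returns the gold answer index; length-normalizing each score by the number of choice tokens is a common variant.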

Designing a future I want to see

Hariom Jangra

Think Different, Build Different

Hit me up if you have any questions.

Hariom.profile
