
visualizeml

LLaMA 2 70B
[Interactive diagram: the full stack, Embed (32,000 × 8,192) → transformer layers 0–79 (855.7M params each) → Norm → LM Head (32,000), annotated with hidden=8,192, 64 heads (8 KV), FFN=28,672.]
Architecture
Model: LLaMA 2 70B
Params: ~70B
Layers: 80
Hidden: 8,192
Heads: 64 (8 KV)
Head dim: 128
FFN dim: 28,672
Vocab: 32,000
Context: 4,096
Per Layer
Attention: 151.0M (150,994,944)
FFN: 704.6M (704,643,072)
Norms: 16.4K
Layer total: 855.7M
All layers: 69.0B (80 layers + embed + LM head)
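
These counts fall out of the shapes above. A short Python sanity check, assuming the standard LLaMA 2 structure (SwiGLU FFN with gate/up/down projections, grouped-query KV, untied LM head); variable names are illustrative:

```python
# Back-of-the-envelope parameter count for LLaMA 2 70B,
# reproducing the per-layer and whole-model numbers above.
hidden, ffn = 8192, 28672
n_heads, n_kv_heads, head_dim = 64, 8, 128
n_layers, vocab = 80, 32000

kv_dim = n_kv_heads * head_dim                    # 1,024 (grouped-query KV)
attn = hidden * hidden * 2 + hidden * kv_dim * 2  # W_q, W_o + W_k, W_v
ffn_params = hidden * ffn * 3                     # gate, up, down (SwiGLU)
norms = hidden * 2                                # two RMSNorms per layer

layer = attn + ffn_params + norms
# 80 layers, plus embedding, untied LM head, and the final norm
total = n_layers * layer + vocab * hidden * 2 + hidden

print(f"attention/layer: {attn:,}")        # 150,994,944
print(f"ffn/layer:       {ffn_params:,}")  # 704,643,072
print(f"layer total:     {layer:,}")       # 855,654,400
print(f"model total:     {total:,}")       # 68,976,648,192 ≈ 69.0B
```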
LoRA

Injects trainable low-rank matrices alongside frozen weights. Only the small A and B matrices are updated during training.
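
A minimal PyTorch sketch of the idea, following the standard LoRA formulation (frozen weight W plus a trainable low-rank update B·A, scaled by alpha/r); the class name and init scheme here are illustrative, not tied to any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=16, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)       # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))  # up-projection, zero-init
        self.scale = alpha / r                        # standard LoRA scaling

    def forward(self, x):
        # y = x W^T + scale * (x A^T) B^T; only A and B receive gradients,
        # and the zero-initialized B means the update starts as a no-op
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```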

Rank: 16
Targets: q, k, v, o
Query (W_q): 262.1K (A: 8192×16, B: 16×8192)
Key (W_k): 147.5K (A: 8192×16, B: 16×1024)
Value (W_v): 147.5K (A: 8192×16, B: 16×1024)
Output (W_o): 262.1K (A: 8192×16, B: 16×8192)
Per layer: 819.2K
Total LoRA: 65.5M (65,536,000)
% of model: 0.095%
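
Each adapted matrix adds d_in·r + r·d_out parameters, so the table's totals can be checked in a few lines (the full-model count 68,976,648,192 is taken from the sketch under Per Layer above):

```python
# Sanity check of the LoRA parameter counts, assuming rank-16 adapters
# on q/k/v/o in all 80 layers; kv_dim reflects the 8 grouped KV heads.
r, hidden, kv_dim, n_layers = 16, 8192, 8 * 128, 80

def lora_params(d_in, d_out, r):
    return d_in * r + r * d_out        # A: d_in×r, B: r×d_out

per_layer = (
    lora_params(hidden, hidden, r)     # W_q: 262,144
    + lora_params(hidden, kv_dim, r)   # W_k: 147,456
    + lora_params(hidden, kv_dim, r)   # W_v: 147,456
    + lora_params(hidden, hidden, r)   # W_o: 262,144
)
total = per_layer * n_layers
print(f"{per_layer:,} per layer, {total:,} total")   # 819,200 / 65,536,000
print(f"{100 * total / 68_976_648_192:.3f}% of model")  # 0.095%
```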