
visualizeml

LLaMA 2 70B
[Interactive diagram: the full stack, Embed (32,000 × 8,192) → transformer layers 0–79 (855.7M params each) → Norm → LM Head (32,000), annotated with hidden=8,192, 64 heads (8 KV), FFN=28,672.]
Architecture
Model: LLaMA 2 70B
Params: ~70B
Layers: 80
Hidden: 8,192
Heads: 64 (8 KV)
Head dim: 128
FFN dim: 28,672
Vocab: 32,000
Context: 4,096
Per Layer
Attention: 151.0M (150,994,944)
FFN: 704.6M (704,643,072)
Norms: 16.4K
Layer total: 855.7M
All layers: 69.0B (80 layers + embed + LM head)
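
These counts fall out of the shapes above. A short Python sanity check, assuming the standard LLaMA 2 structure (SwiGLU FFN with gate/up/down projections, grouped-query KV, untied LM head); variable names are illustrative:

```python
# Back-of-the-envelope parameter count for LLaMA 2 70B,
# reproducing the per-layer and whole-model numbers above.
hidden, ffn = 8192, 28672
n_heads, n_kv_heads, head_dim = 64, 8, 128
n_layers, vocab = 80, 32000

kv_dim = n_kv_heads * head_dim                    # 1,024 (grouped-query KV)
attn = hidden * hidden * 2 + hidden * kv_dim * 2  # W_q, W_o + W_k, W_v
ffn_params = hidden * ffn * 3                     # gate, up, down (SwiGLU)
norms = hidden * 2                                # two RMSNorms per layer

layer = attn + ffn_params + norms
# 80 layers, plus embedding, untied LM head, and the final norm
total = n_layers * layer + vocab * hidden * 2 + hidden

print(f"attention/layer: {attn:,}")        # 150,994,944
print(f"ffn/layer:       {ffn_params:,}")  # 704,643,072
print(f"layer total:     {layer:,}")       # 855,654,400
print(f"model total:     {total:,}")       # 68,976,648,192 ≈ 69.0B
```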
LoRA

Injects trainable low-rank matrices alongside frozen weights. Only the small A and B matrices are updated during training.
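
A minimal PyTorch sketch of the idea, following the standard LoRA formulation (frozen weight W plus a trainable low-rank update B·A, scaled by alpha/r); the class name and init scheme here are illustrative, not tied to any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=16, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)       # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))  # up-projection, zero-init
        self.scale = alpha / r                        # standard LoRA scaling

    def forward(self, x):
        # y = x W^T + scale * (x A^T) B^T; only A and B receive gradients,
        # and the zero-initialized B means the update starts as a no-op
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```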

Rank: 16
Targets: q, k, v, o
Query (W_q): 262.1K (A: 8192×16, B: 16×8192)
Key (W_k): 147.5K (A: 8192×16, B: 16×1024)
Value (W_v): 147.5K (A: 8192×16, B: 16×1024)
Output (W_o): 262.1K (A: 8192×16, B: 16×8192)
Per layer: 819.2K
Total LoRA: 65.5M (65,536,000)
% of model: 0.095%
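
Each adapted matrix adds d_in·r + r·d_out parameters, so the table's totals can be checked in a few lines (the full-model count 68,976,648,192 is taken from the sketch under Per Layer above):

```python
# Sanity check of the LoRA parameter counts, assuming rank-16 adapters
# on q/k/v/o in all 80 layers; kv_dim reflects the 8 grouped KV heads.
r, hidden, kv_dim, n_layers = 16, 8192, 8 * 128, 80

def lora_params(d_in, d_out, r):
    return d_in * r + r * d_out        # A: d_in×r, B: r×d_out

per_layer = (
    lora_params(hidden, hidden, r)     # W_q: 262,144
    + lora_params(hidden, kv_dim, r)   # W_k: 147,456
    + lora_params(hidden, kv_dim, r)   # W_v: 147,456
    + lora_params(hidden, hidden, r)   # W_o: 262,144
)
total = per_layer * n_layers
print(f"{per_layer:,} per layer, {total:,} total")   # 819,200 / 65,536,000
print(f"{100 * total / 68_976_648_192:.3f}% of model")  # 0.095%
```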