
Introducing VisualizeML: An ML Architecture Zoo

5 min read
machine-learning · visualization · transformers · lora · vibe-coding

Today I'm introducing VisualizeML, an ML architecture zoo designed to help understand current and emerging concepts in machine learning. The advent of coding agents makes this project particularly enjoyable for me: being able to spin up an interactive visualization while reading a paper is deeply satisfying, and the process makes the learning stick better than reading the paper alone.

The project lives at /projects/visualizeml and is growing steadily as I encounter interesting ideas worth visualizing.

The Idea

ML concepts are hard to internalize from text and equations alone. Diagrams help, but they're static on paper. What if you could click into a transformer layer, see real weight matrix dimensions, and watch a LoRA adapter decompose a 67M-parameter projection into two tiny low-rank matrices? That's the premise: interactive visualizations with real numbers from real architectures, where each concept lets you drill down and build intuition at your own pace.
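To make the 67M figure concrete, here's a back-of-envelope sketch. It assumes the 8192×8192 projection shape of LLaMA 2 70B; the rank of 16 is an illustrative choice, not a fixed part of the architecture:

```python
# One attention projection in LLaMA 2 70B is an 8192 x 8192 matrix (~67M
# parameters). LoRA freezes it and trains two small matrices instead:
# A (rank x d_model) and B (d_model x rank), whose product is the update.
d_model, rank = 8192, 16

full_params = d_model * d_model      # frozen base projection W
lora_params = rank * d_model * 2     # trainable A plus trainable B

print(f"{full_params:,}")            # 67,108,864 -- the ~67M projection
print(f"{lora_params:,}")            # 262,144 -- about 0.4% of the projection
```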

What's Built So Far

LLM + LoRA Architecture Explorer

The flagship visualization (still a work in progress) is an interactive walkthrough of the transformer architecture as implemented in models like LLaMA 2 (70B, 13B, 7B) and Mistral 7B. It offers three levels of detail:

  1. Full Model Pipeline. A horizontal flow showing the token embedding, all transformer layers, and the language model head. Each layer is a compact block with color-coded sub-components. Click any layer to zoom in.

  2. Transformer Block Detail. The internals of a single layer: RMSNorm, multi-head attention (with grouped query attention for 70B), feed-forward network with SwiGLU, and residual connections. LoRA adapter attachment points are highlighted when a fine-tuning technique is selected.

  3. LoRA Weight Decomposition. Click a weight projection (Q, K, V, or O) to see the mathematical decomposition: how LoRA replaces a full-rank update with two small matrices. The frozen base weights are shown in blue, trainable adapters in orange, with exact parameter counts for the selected model.
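As a taste of what the block view depicts, RMSNorm is small enough to sketch in a few lines. This is a minimal NumPy version of the normalization used in LLaMA-style models; the `gain` parameter name is mine:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: scale by the root-mean-square of the vector, no mean-centering
    (unlike LayerNorm), then apply a learned per-dimension gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.array([3.0, -4.0])              # rms = sqrt((9 + 16) / 2) ~= 3.536
print(rms_norm(x, gain=np.ones(2)))    # ~= [0.8485, -1.1314]
```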

You can switch between models and fine-tuning techniques (full fine-tuning, LoRA, QLoRA) to see how the numbers change. For LLaMA 2 70B with rank-16 LoRA, you're training roughly 65.5 million parameters, just 0.094% of the model's 70 billion. Seeing that ratio rendered visually next to the full weight matrices makes the efficiency of parameter-efficient fine-tuning click immediately.
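That 65.5M figure can be reproduced from the published 70B shapes. A sketch, assuming rank-16 adapters on the Q, K, V, and O projections of every layer:

```python
# Back-of-envelope for the 0.094% figure. Shapes follow the published
# LLaMA 2 70B config: hidden size 8192, 80 layers, grouped-query attention
# with 8 KV heads (so K and V project down to 8 * 128 = 1024 dims).
d_model, kv_dim, layers, rank = 8192, 1024, 80, 16

def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)          # A is r x d_in, B is d_out x r

per_layer = (lora_params(d_model, d_model, rank)     # Q projection
             + lora_params(d_model, kv_dim, rank)    # K projection (GQA)
             + lora_params(d_model, kv_dim, rank)    # V projection (GQA)
             + lora_params(d_model, d_model, rank))  # O projection

total = per_layer * layers
print(f"{total:,}")                    # 65,536,000 -- the ~65.5M above
print(f"{100 * total / 70e9:.3f}%")    # 0.094%
```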

Manifold Geometry

Inspired by Anthropic's research on how models manipulate data in latent space ("When Models Manipulate Manifolds"), this concept visualizes the geometric transformations that neural networks learn. It's the kind of paper that's fascinating to read but hard to hold in your head. The visualization gives you something concrete to anchor the intuition.

XGBoost Decision Trees

A visualization of gradient-boosted decision trees: how individual weak learners combine, how splits are chosen, and how the ensemble builds its predictions iteratively. This one came from a quick experiment after seeing a post about XGBoost and wanting to understand the mechanics beyond the API.
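The iterative residual-fitting loop at the heart of gradient boosting fits in a few lines. This is a simplified squared-error version with depth-1 stumps, not XGBoost's actual regularized, second-order algorithm:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split minimizing squared error on the residuals."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def boost(x, y, rounds=50, lr=0.3):
    """Each round fits a stump to the current residuals (the negative gradient
    of squared error) and adds a shrunken copy to the ensemble prediction."""
    pred = np.full_like(y, y.mean())
    for _ in range(rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + lr * stump(x)
    return pred

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
print(np.abs(pred - y).mean())   # small: the ensemble of weak stumps fits the curve
```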

Why Coding Agents Make This Fun

Coding agents have significantly changed the mental economics of tasks like this. Building a one-off interactive SVG visualization used to be a multi-day endeavor (if it happened at all). Now the bottleneck is understanding the concept and deciding how to present it. Going from reading a paper to a working interactive visualization in a single session is very rewarding :). It feels more like sketching than engineering, which is exactly the kind of flow you want for exploratory learning.

What's Next

The backlog for VisualizeML is rich. Here's what I'm planning to add as the architecture zoo grows:

More architectures to visualize:

  • Autoencoder. Input to bottleneck to reconstruction, with an interactive view of latent space compression. Trained on MNIST or similar, letting you watch data flow through the encoding/decoding pipeline.
  • Variational Autoencoder (VAE). Extending the autoencoder with a probabilistic latent space. The interesting part to visualize here is sampling and interpolation between latent points, showing how smoothly the model generates new outputs as you move through the space.
  • GAN (Generative Adversarial Network). The training dynamics between generator and discriminator are inherently visual. I want to show the adversarial dance: generator output evolving over training steps, loss curves for both networks, and the moment the generator starts producing convincing outputs.
  • Attention Heatmaps. The transformer block view already shows the multi-head attention mechanism structurally, but I haven't yet built a token-level attention weight visualization. A heatmap showing which tokens attend to which across a real input sequence would complete the picture.
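The attention-heatmap bullet boils down to one formula: softmax(QKᵀ/√d) with a causal mask. A minimal sketch with random weights, not a real model's activations:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_head = 5, 8

# Stand-ins for the query/key activations of one attention head.
Q = rng.standard_normal((seq_len, d_head))
K = rng.standard_normal((seq_len, d_head))

scores = Q @ K.T / np.sqrt(d_head)
scores[np.triu_indices(seq_len, k=1)] = -np.inf   # causal mask: no future tokens

# Numerically stable softmax over each row.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(weights.round(2))   # each row sums to 1; token i attends only to tokens <= i
```

The `weights` matrix is exactly what the heatmap would render: rows are query tokens, columns are the tokens they attend to.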

New visualization approaches:

  • Three.js 3D Visualizations. Exploring Three.js for 3D interactive views of neural network graphs, latent space point clouds, and data manifold landscapes. Some concepts are inherently three-dimensional (like latent spaces), and being able to orbit, zoom, and explore them in 3D would add a dimension (literally) that 2D SVGs can't capture.

Research directions I'm watching:

  • Google's LangExtract and PaperBanana tools, still evaluating how these fit into the workflow

  • Reinforcement learning visualization (possibly tying into TensorTrade for financial RL experiments)

  • Market regime detection using Hidden Markov Models, another concept that would benefit from interactive visualization

The beauty of a zoo-style project is that every interesting paper or concept I encounter is a potential new exhibit. The bar for adding one is low: if I can explain it better with an interactive diagram than with words, it belongs here.

Try It

Head over to /projects/visualizeml and click around. Start with the LLM + LoRA explorer, select LLaMA 2 70B, and drill down into a layer. If you've ever wondered what's actually inside a transformer or why LoRA is so efficient, this should make it concrete.