All Classes and Interfaces

Class Descriptions
AdamW optimizer with per-parameter state.
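The per-parameter state mentioned above (first- and second-moment estimates, plus decoupled weight decay) can be sketched as follows. `AdamWSketch` and its `step` signature are illustrative only, not this library's API:

```java
public class AdamWSketch {
    // One AdamW step for a single parameter vector; m and v are the
    // per-parameter state (moment estimates) carried between steps.
    static void step(double[] p, double[] g, double[] m, double[] v,
                     double lr, double beta1, double beta2, double eps,
                     double weightDecay, int t) {
        for (int i = 0; i < p.length; i++) {
            m[i] = beta1 * m[i] + (1 - beta1) * g[i];
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i];
            double mHat = m[i] / (1 - Math.pow(beta1, t)); // bias correction
            double vHat = v[i] / (1 - Math.pow(beta2, t));
            // Decoupled weight decay: applied directly to the weights,
            // not folded into the gradient (the "W" in AdamW).
            p[i] -= lr * (mHat / (Math.sqrt(vHat) + eps) + weightDecay * p[i]);
        }
    }
    public static void main(String[] args) {
        double[] p = {1.0}, g = {0.5}, m = {0.0}, v = {0.0};
        step(p, g, m, v, 0.1, 0.9, 0.999, 1e-8, 0.0, 1);
        System.out.println(p[0]); // parameter moved opposite the gradient
    }
}
```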
One language-modeling batch: x holds the input token ids, y holds the targets (next-token ids).
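The input/target relationship described above is a one-position shift: y[t] is the token that follows x[t]. A standalone sketch (`BatchSketch` is a hypothetical name, not a class in this library):

```java
public class BatchSketch {
    // Build one (x, y) pair: y is x shifted one position ahead, so that
    // y[t] is the "next token" the model must predict after seeing x[0..t].
    static int[][] makeBatch(int[] tokens, int blockSize) {
        int[] x = new int[blockSize], y = new int[blockSize];
        for (int t = 0; t < blockSize; t++) {
            x[t] = tokens[t];
            y[t] = tokens[t + 1];
        }
        return new int[][]{x, y};
    }
    public static void main(String[] args) {
        int[][] xy = makeBatch(new int[]{10, 11, 12, 13, 14}, 4);
        System.out.println(java.util.Arrays.toString(xy[0])); // [10, 11, 12, 13]
        System.out.println(java.util.Arrays.toString(xy[1])); // [11, 12, 13, 14]
    }
}
```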
Minimal byte-level tokenizers (0-255).
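A byte-level tokenizer as described above maps each UTF-8 byte to a token id in 0-255 and back. This is a minimal sketch under that assumption; `ByteTokenizerSketch` is not this library's class:

```java
import java.nio.charset.StandardCharsets;

public class ByteTokenizerSketch {
    // Encode: each UTF-8 byte becomes one token id in 0-255.
    static int[] encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] ids = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) ids[i] = bytes[i] & 0xFF; // unsigned
        return ids;
    }
    // Decode: map ids back to bytes and reassemble the UTF-8 string.
    static String decode(int[] ids) {
        byte[] bytes = new byte[ids.length];
        for (int i = 0; i < ids.length; i++) bytes[i] = (byte) ids[i];
        return new String(bytes, StandardCharsets.UTF_8);
    }
    public static void main(String[] args) {
        int[] ids = encode("Hi");
        System.out.println(java.util.Arrays.toString(ids)); // [72, 105]
        System.out.println(decode(ids));                    // Hi
    }
}
```

The vocabulary is fixed at 256 ids, which keeps the tokenizer trivially lossless at the cost of longer sequences than subword schemes.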
Wiring helpers for causal language model training.
Collects GPU operations lazily and flushes them as a single command buffer.
Cross-entropy loss with integer class targets.
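Cross-entropy with integer class targets is the mean negative log-softmax probability of each row's target index. A numerically stable sketch (illustrative names, not this library's API):

```java
public class CrossEntropySketch {
    // Mean negative log-probability of the integer target class per row.
    // logits: [n x classes], targets: one class index per row.
    static double loss(double[][] logits, int[] targets) {
        double total = 0.0;
        for (int i = 0; i < logits.length; i++) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : logits[i]) max = Math.max(max, v);
            double sumExp = 0.0;
            for (double v : logits[i]) sumExp += Math.exp(v - max);
            // log-softmax of the target class = (logit - max) - log(sumExp)
            total -= (logits[i][targets[i]] - max) - Math.log(sumExp);
        }
        return total / logits.length;
    }
    public static void main(String[] args) {
        // Uniform logits over 2 classes: loss = ln(2) ≈ 0.6931
        System.out.println(loss(new double[][]{{0.0, 0.0}}, new int[]{0}));
    }
}
```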
Token embedding: ids -> vectors.
Flexible fully-connected neural network (MLP) built from Linear projections.
Gaussian Error Linear Unit (GELU), using the tanh approximation popularized by GPT-2.
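The tanh approximation mentioned above is gelu(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³))). A standalone sketch, not this library's implementation:

```java
public class GeluSketch {
    // GELU, tanh approximation as popularized by GPT-2:
    // 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    static double gelu(double x) {
        double c = Math.sqrt(2.0 / Math.PI);
        return 0.5 * x * (1.0 + Math.tanh(c * (x + 0.044715 * x * x * x)));
    }
    public static void main(String[] args) {
        System.out.println(gelu(0.0)); // 0.0
        System.out.println(gelu(1.0)); // ≈ 0.8412
    }
}
```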
Minimal GPT-style decoder-only transformer for educational/training use.
Handle to a GPU-resident float buffer managed by a ComputeGraph.
Abstraction over a GPU compute runtime (Metal, CUDA, Vulkan, etc.).
Differentiable module mapping Tensor -> Tensor.
LayerNorm over feature dimension (cols) with trainable gamma/beta exposed as Parameters.
Fully-connected layer: y = xW + b, where x: [n x dIn], W: [dIn x dOut], b: [1 x dOut].
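The forward pass y = xW + b with those shapes can be sketched as a plain matrix multiply with a broadcast bias (`LinearSketch` is illustrative, not this library's `Linear` class):

```java
public class LinearSketch {
    // Forward pass of a fully-connected layer: y = xW + b.
    // x: [n x dIn], w: [dIn x dOut], b: [1 x dOut] (broadcast over rows).
    static double[][] forward(double[][] x, double[][] w, double[] b) {
        int n = x.length, dIn = w.length, dOut = w[0].length;
        double[][] y = new double[n][dOut];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < dOut; j++) {
                double s = b[j]; // bias broadcast over rows
                for (int k = 0; k < dIn; k++) s += x[i][k] * w[k][j];
                y[i][j] = s;
            }
        return y;
    }
    public static void main(String[] args) {
        double[][] x = {{1, 2}};               // [1 x 2]
        double[][] w = {{1, 0}, {0, 1}};       // [2 x 2] identity
        double[] b = {10, 20};
        double[][] y = forward(x, w, b);
        System.out.println(y[0][0] + " " + y[0][1]); // 11.0 22.0
    }
}
```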
Multi-head causal self-attention for a single sequence (no batch dimension).
Simple mutable parameter holder for optimizers.
Optimizer that updates a set of Parameters once per training step.
Learnable positional embeddings added to token embeddings.
Row-wise softmax for 2D tensors: applies softmax independently to each row.
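Row-wise softmax exponentiates each row and normalizes it to sum to 1; subtracting the row max first keeps the exponentials from overflowing. A standalone sketch (not this library's class):

```java
public class SoftmaxSketch {
    // Row-wise softmax with max-subtraction for numerical stability.
    static double[][] softmax(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : x[i]) max = Math.max(max, v);
            double sum = 0.0;
            double[] row = new double[x[i].length];
            for (int j = 0; j < row.length; j++) {
                row[j] = Math.exp(x[i][j] - max);
                sum += row[j];
            }
            for (int j = 0; j < row.length; j++) row[j] /= sum;
            out[i] = row;
        }
        return out;
    }
    public static void main(String[] args) {
        // Each row is normalized independently; large logits stay finite.
        double[][] p = softmax(new double[][]{{0, 0}, {1000, 1000}});
        System.out.println(p[0][0] + " " + p[1][1]); // 0.5 0.5
    }
}
```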
Helpers to train classic Tensor->Tensor supervised models (e.g., FNN) using the unified Trainer wrapper.
Utilities for converting between Tensor (double[][]) and flat float[] arrays used by GPU runtimes.
Simple in-memory dataset that samples random contiguous chunks from token ids.
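Sampling a random contiguous chunk for language modeling typically takes blockSize + 1 tokens, so inputs and one-step-shifted targets come from the same window. A sketch under that assumption (`ChunkSamplerSketch` is a hypothetical name):

```java
import java.util.Random;

public class ChunkSamplerSketch {
    // Sample one random contiguous window of blockSize + 1 token ids, so a
    // caller can slice x = chunk[0..blockSize-1] and y = chunk[1..blockSize].
    static int[] sampleChunk(int[] tokens, int blockSize, Random rng) {
        // start is bounded so the +1 target shift stays in range
        int start = rng.nextInt(tokens.length - blockSize);
        int[] chunk = new int[blockSize + 1];
        System.arraycopy(tokens, start, chunk, 0, blockSize + 1);
        return chunk;
    }
    public static void main(String[] args) {
        int[] tokens = {1, 2, 3, 4, 5, 6};
        int[] chunk = sampleChunk(tokens, 3, new Random(42));
        System.out.println(java.util.Arrays.toString(chunk)); // 4 consecutive ids
    }
}
```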
Simple autoregressive text generation for GPTModel.
Example: classic MLP training using Linear + activation (via FNN) and the unified Trainer wrapper.
A small, reusable training loop wrapper.
Example: tiny GPT training on a small text file using byte-level tokens.
Pre-LN Transformer block: x = x + Attn(LN(x)); x = x + MLP(LN(x)).
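The pre-LN residual wiring above (normalize first, then add the sub-layer output back) can be sketched with the attention and MLP passed in as placeholders; the `layerNorm` here omits the trainable gamma/beta and everything is illustrative, not this library's API:

```java
import java.util.function.UnaryOperator;

public class PreLnSketch {
    // LayerNorm over the feature dimension (no learnable gamma/beta here).
    static double[] layerNorm(double[] x) {
        double mean = 0, var = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        for (double v : x) var += (v - mean) * (v - mean);
        var /= x.length;
        double std = Math.sqrt(var + 1e-5);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / std;
        return out;
    }
    // Pre-LN residual pattern, applied twice per block:
    // x = x + attn(LN(x)); x = x + mlp(LN(x))
    static double[] block(double[] x, UnaryOperator<double[]> attn, UnaryOperator<double[]> mlp) {
        double[] h = add(x, attn.apply(layerNorm(x)));
        return add(h, mlp.apply(layerNorm(h)));
    }
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }
    public static void main(String[] args) {
        // Identity sub-layers just demonstrate the residual wiring.
        double[] y = block(new double[]{1.0, 2.0, 3.0}, v -> v, v -> v);
        System.out.println(y.length); // 3
    }
}
```

Normalizing before each sub-layer (rather than after, as in the original post-LN formulation) keeps gradients better behaved in deeper stacks, which is why GPT-style models use it.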
Convenience builder for assembling transformer stacks.
A simple sequential stack of TransformerBlocks.