A Mental Model of TPUs for Performance Engineering
A visual mental model for understanding TPU architecture and how it relates to ML workloads.
Pretraining SmolLM-360M on a single A100 GPU within a 30-hour window, focusing on feasibility analysis, throughput measurement, and hardware efficiency optimization.
Extending Lox beyond the original spec with len(), explicit initialization checks, and break statements.
Adding single inheritance with method overriding and super keyword support to complete the Lox class system.
Implementing object-oriented programming in Lox with classes, instances, methods, and the this keyword.
Fixing closure scoping bugs with a static resolver pass that computes variable binding distances.
Adding functions as first-class values, closures, and native functions to the Lox interpreter.
Adding control flow structures that make Lox Turing complete.
Extending the Lox interpreter with statements, variables, environments, and lexical scoping.
Implementing expression evaluation by converting AST nodes into runtime values in the Lox interpreter.
Building an expression parser for Lox using recursive descent, with ASTs and the Visitor pattern.
Building a scanner for the Lox language that converts source code into tokens, following Crafting Interpreters.
Analyzing the dot product operation through the roofline model on NVIDIA H100 GPU hardware.
Building a functional HTTP server using Python, from basic TCP connections through file handling and gzip compression.
Exploring whether language model agents can enhance the performance of other LLM agents through a meta-benchmark approach.
Exploring what it means for a set to be countable, with proofs and examples from set theory.