A deep dive into hermes-agent, claw-code, codex, opencode, and openclaw: their one-turn control flow, the Tools / Skills / Harness stack, and why training-friendliness is the new axis of elegance. It goes from zero to deep, with interactive flowcharts that tie every step to a real source line.
A compact map of the three algorithms worth mastering first: PPO for classic online RLHF, DPO for offline preference optimization, and GRPO for critic-free reasoning RL.
A two-hour review plan for Hot 100 patterns, plus clean PyTorch interview templates for multi-head attention, one-dimensional convolution, and two-dimensional convolution.