Blog - Yuxuan Zhang

AI Agents 2026 · bilingual · interactive

How 5 coding agents actually run: a harness architecture comparison

A deep dive into hermes-agent, claw-code, codex, opencode, and openclaw: their one-turn control flow, the Tools / Skills / Harness stack, and why training-friendliness is the new axis of elegance. It goes from zero to deep, with interactive flowcharts in which every step is tied to a real source line.

LLM Training 2026 · Chinese · RLHF

Core LLM RL Algorithms: PPO, DPO, GRPO

A compact map of the three algorithms worth mastering first: PPO for classic online RLHF, DPO for offline preference optimization, and GRPO for critic-free reasoning RL.

Coding Interview 2026 · Chinese · 2-hour review

Coding Test Cheat Sheet: LeetCode Hot 100 + Hand-Written MHA / Conv1D / Conv2D

A two-hour review plan for Hot 100 patterns, plus clean PyTorch interview templates for multi-head attention, one-dimensional convolution, and two-dimensional convolution.