Can your humanoid walk up and hand you a full cup of beer without spilling a drop? While humanoids are increasingly featured in flashy demos (dancing, delivering packages, traversing rough terrain), fine-grained control during locomotion remains a significant challenge. In particular, stabilizing a filled end-effector (EE) while walking is far from solved, due to a fundamental mismatch in task dynamics: locomotion demands slow-timescale, robust control, whereas EE stabilization requires rapid, high-precision corrections. To address this, we propose SoFTA, a Slow-Fast Two-Agent framework that decouples upper-body and lower-body control into separate agents operating at different frequencies and with distinct rewards. This temporal and objective separation mitigates policy interference and enables coordinated whole-body behavior. SoFTA executes upper-body actions at 100 Hz for precise EE control and lower-body actions at 50 Hz for robust gait. It reduces EE acceleration by 2-5x relative to baselines and performs much closer to human-level stability, enabling delicate tasks such as carrying nearly full cups, capturing steady video during locomotion, and rejecting disturbances while keeping the EE stable.
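The slow-fast split can be pictured as two policies stepped on a shared clock. The following is a minimal sketch, not the SoFTA implementation: only the 100 Hz and 50 Hz rates come from the text above. The function `run_two_rate_loop`, the placeholder policies, the observation callback, and the choice to hold the last lower-body action between its updates are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Action = List[float]
Policy = Callable[[Sequence[float]], Action]

UPPER_HZ = 100  # upper-body action rate stated in the text
LOWER_HZ = 50   # lower-body action rate stated in the text


def run_two_rate_loop(
    upper_policy: Policy,
    lower_policy: Policy,
    get_obs: Callable[[int], Sequence[float]],
    num_ticks: int,
) -> List[Tuple[Action, Action]]:
    """Step two agents on a shared 100 Hz clock (hypothetical helper).

    The upper-body policy is queried on every tick. The lower-body policy is
    queried every UPPER_HZ // LOWER_HZ ticks, and its most recent action is
    held in between (an assumed zero-order hold).
    """
    decimation = UPPER_HZ // LOWER_HZ  # base ticks per lower-body step
    lower_action: Action = []
    commands: List[Tuple[Action, Action]] = []
    for tick in range(num_ticks):
        obs = get_obs(tick)
        if tick % decimation == 0:
            # Slow agent: refresh the gait command on every second tick.
            lower_action = lower_policy(obs)
        # Fast agent: refresh the EE-stabilizing command on every tick.
        upper_action = upper_policy(obs)
        commands.append((upper_action, lower_action))
    return commands


if __name__ == "__main__":
    calls = {"upper": 0, "lower": 0}

    def upper(obs: Sequence[float]) -> Action:
        calls["upper"] += 1
        return [obs[0]]  # placeholder action, not a trained policy

    def lower(obs: Sequence[float]) -> Action:
        calls["lower"] += 1
        return [obs[0]]  # placeholder action, not a trained policy

    # One simulated second of control at the 100 Hz base rate.
    run_two_rate_loop(upper, lower, lambda t: [float(t)], UPPER_HZ)
    print(calls)
```

Over one simulated second the fast agent is queried twice as often as the slow one, which is the timing relationship the abstract describes. How the real system shares observations or synchronizes the two agents is not specified here.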
Without our stabilization control
With our stabilization control
A fair comparison
Clearer compensation behavior
Unitree Default Controller
Tapping in place with large gait period
Tapping in place with large gait period (another view)
Tapping in place with small gait period
Turning in place
Walking + slight turn
Without ours (tapping)
Without ours (slightly forward)
Ours (tapping)
Ours (slightly forward)
Ours (sudden start)
Turning
Go forward (1)
Go forward (2)
Walk toward the windowsill and then turn
Circle around
Step out of the elevator sideways
Handover
Ours (with or w/o stabilization controller)
Booster T1 Default Controller
Unitree G1 backward
Unitree G1 tapping
Compensation behavior in Unitree G1 forward
Booster T1