Training duration and token counts don't guarantee better LLM generalization. What matters is how sequence lengths are structured during training. Learn why variable-length training beats raw scale and how to avoid common pitfalls.
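To make "how sequence lengths are structured" concrete, here is a minimal sketch of one common way to implement variable-length training: length-bucketed batching, where each batch pads only to its own bucket's maximum instead of a fixed global context. This is an illustrative assumption, not the article's specific method; the function names, bucket size, and pad token (`0`) are all hypothetical.

```python
import random
from collections import defaultdict

def bucket_by_length(sequences, bucket_size=64):
    """Group token sequences into buckets of similar length so each
    batch pads only to its bucket's boundary, not a fixed max length."""
    buckets = defaultdict(list)
    for seq in sequences:
        # Round the length up to the nearest bucket boundary.
        key = -(-len(seq) // bucket_size) * bucket_size
        buckets[key].append(seq)
    return buckets

def make_batches(sequences, batch_size=8, bucket_size=64, seed=0):
    """Build padded batches of similar-length sequences, then shuffle
    the batches so the model sees a mix of lengths throughout training."""
    rng = random.Random(seed)
    batches = []
    for bucket in bucket_by_length(sequences, bucket_size).values():
        rng.shuffle(bucket)
        for i in range(0, len(bucket), batch_size):
            batch = bucket[i:i + batch_size]
            pad_to = max(len(s) for s in batch)
            # Pad within the batch only; waste is bounded by bucket_size.
            batches.append([s + [0] * (pad_to - len(s)) for s in batch])
    rng.shuffle(batches)  # interleave short and long batches across the run
    return batches

if __name__ == "__main__":
    lengths = random.Random(1).choices(range(5, 300), k=100)
    data = [[1] * n for n in lengths]
    for b in make_batches(data)[:3]:
        print(len(b), "sequences, padded to length", len(b[0]))
```

The final shuffle is the key design choice here: it keeps the length distribution mixed over the whole run rather than training on short sequences first and long ones last, which is one of the pitfalls the teaser alludes to.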