"summary": "Sustained Mode: explicit long-context native-MTP path with chunked contiguous prefill, final-token logits, and repaged decode KV." "tail_preview": "# Final user request Write code only.
This project implements a speculative decoding engine designed specifically for the EAGLE-2 architecture. Speculative decoding accelerates Large Language Model (LLM) inference by generating a "draft" ...
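As a rough illustration of the draft-then-verify idea mentioned above, here is a minimal sketch of generic greedy speculative decoding. It is not the project's EAGLE-2 implementation, which the excerpt does not show. The names `speculative_step`, `generate`, `target_model`, and `draft_model` are hypothetical, and the two "models" are toy functions over integer tokens. The verification loop calls the target once per token for clarity; a real engine would typically score all drafted tokens in one batched forward pass.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. Draft k tokens autoregressively with the cheap model.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. Verify: keep drafted tokens while they match the target's greedy choice.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: take the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # All k drafts matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Generate max_new tokens; output equals plain greedy decoding with target."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new]


# Toy models (assumptions for illustration only).
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    # Agrees with the target except after a token whose value is 4.
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(target_model, draft_model, [1], max_new=12))
```

Because every accepted token is either confirmed or supplied by the target, the output matches what the target would produce on its own under greedy decoding; the draft model only affects how many target tokens are settled per round.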