SGLang enhances its capabilities with a new Diffusion engine and adds support for Devstral 2 and LLaDA 2.0 models. NVIDIA's AI platform achieves a new performance milestone in graph processing.
Main Content
SGLang Feature Update: Introduces a new Diffusion engine supporting text-to-image, text-to-video, image-to-image, and image-to-video generation. The update includes multi-GPU acceleration, parallel CFG (classifier-free guidance) for faster inference, and both a CLI and a Python API.
SGLang New Release: Miles now enables true on-policy training for Vision Language Models (VLMs) under FSDP, aligning training with SGLang's inference so that log probabilities match exactly and the KL divergence between the two is zero.
SGLang Integration Update: Adds support for the LLaDA 2.0 model, including 16B and 100B MoE versions. This integration delivers a 2.1x inference speedup and shows excellent performance in code, math, and agentic tasks.
SGLang Integration Update: Adds full support for the Devstral 2 series of coding models, providing end-to-end automation capabilities for developers.
NVIDIA AI Platform Performance Optimization: Achieves 410 trillion traversed edges per second (TEPS) on the Graph500 benchmark by leveraging CUDA, Spectrum-X networking, Hopper GPUs, and a new active messaging library. This reduces hardware requirements and significantly accelerates graph processing.
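
As a rough illustration of the log probability matching mentioned in the Miles item above, the sketch below estimates the KL divergence between an inference policy and a training policy from per-token log probabilities of the same sampled tokens. It is a minimal example under stated assumptions, not Miles or SGLang code: the function name `kl_estimate` and the sample values are hypothetical, and the estimator is the plain Monte Carlo mean of log-probability differences.

```python
def kl_estimate(train_logprobs, infer_logprobs):
    """Monte Carlo estimate of KL(inference || training).

    Both lists hold log probabilities of the SAME tokens, which are assumed
    to have been sampled from the inference policy. The estimate is the mean
    of (inference log prob - training log prob) over those tokens.
    """
    if len(train_logprobs) != len(infer_logprobs):
        raise ValueError("log-probability lists must have the same length")
    if not train_logprobs:
        raise ValueError("need at least one token")
    diffs = [q - p for p, q in zip(train_logprobs, infer_logprobs)]
    return sum(diffs) / len(diffs)


# Hypothetical per-token log probabilities (illustrative values only).
infer = [-0.5, -1.2, -0.3]
train_matched = [-0.5, -1.2, -0.3]     # training engine reproduces inference exactly
train_mismatched = [-0.6, -1.2, -0.3]  # one token differs slightly

matched_kl = kl_estimate(train_matched, infer)
mismatched_kl = kl_estimate(train_mismatched, infer)
print(matched_kl, mismatched_kl)
```

When the training side reproduces the inference log probabilities exactly, every difference is zero and so is the estimate; any numerical mismatch between the two engines shows up as a nonzero value.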