
Soul App Open-Sources SoulX-FlashTalk, a Real-Time Avatar Generation Model

Soul App, a Chinese AI social platform, announced that its AI research team, Soul AI Lab, has open-sourced SoulX-FlashTalk, a real-time avatar generation model. The model is built for low latency and high visual quality in interactive video.

SoulX-FlashTalk is a 14B-parameter model. It is the first model of this scale to achieve sub-second latency (0.87 seconds), maintain 32 FPS real-time generation, and support stable ultra-long video output. These capabilities mark an important step toward practical commercial use of large-scale real-time generative avatars.

Sub-Second Latency for Real-Time Interaction

Latency is critical to interactive video experiences. Through a full-stack inference acceleration suite, SoulX-FlashTalk reduces time-to-first-frame to 0.87 seconds, enabling near-instant response even with a 14B-parameter model. This capability eliminates the lag typically associated with large generative models and allows avatars to respond naturally in live interactions.
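As a rough illustration of what time-to-first-frame means, the sketch below times how long a streaming generator takes to yield its first frame. It does not use SoulX-FlashTalk itself: `fake_avatar_stream` is a hypothetical stand-in that sleeps to imitate model startup, and `time_to_first_frame` is an illustrative helper, not part of any released API.

```python
import time
from typing import Any, Iterator, Tuple


def time_to_first_frame(frames: Iterator[Any]) -> Tuple[float, Any]:
    """Return (seconds until the first frame arrived, the first frame)."""
    start = time.perf_counter()
    first = next(frames)  # blocks until the stream produces its first frame
    return time.perf_counter() - start, first


def fake_avatar_stream(startup_s: float, n_frames: int) -> Iterator[str]:
    """Hypothetical stand-in for a real-time avatar model (assumption)."""
    time.sleep(startup_s)  # imitates the work done before the first frame
    for i in range(n_frames):
        yield f"frame-{i}"


if __name__ == "__main__":
    ttff, first = time_to_first_frame(fake_avatar_stream(0.05, 3))
    print(f"first frame {first!r} after {ttff:.3f} s")
```

With a real model, the same timing wrapper would go around whatever call starts generation, so the measured value includes every step before the first frame is available.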

Sustained 32 FPS High-Throughput

Despite its large 14B-parameter DiT-based framework, SoulX-FlashTalk delivers sustained inference throughput of 32 FPS in both short-form and long-form video tasks. This exceeds the widely accepted live-streaming benchmark of 25 FPS and outperforms comparable models in inference efficiency. The throughput supports smooth interactions across scenarios such as video calls, live streaming, and real-time customer service.
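To put the two frame rates in perspective, the short calculation below converts them into a per-frame time budget and a headroom ratio. It is plain arithmetic on the figures quoted above; the function names are illustrative only.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given rate, in milliseconds."""
    return 1000.0 / fps


def realtime_headroom(generated_fps: float, playback_fps: float) -> float:
    """A ratio above 1.0 means frames are produced faster than they are played."""
    return generated_fps / playback_fps


if __name__ == "__main__":
    print(f"{frame_budget_ms(32):.2f} ms per frame at 32 FPS")   # 31.25 ms
    print(f"{frame_budget_ms(25):.2f} ms per frame at 25 FPS")   # 40.00 ms
    print(f"headroom: {realtime_headroom(32, 25):.2f}x")         # 1.28x
```

At 32 FPS each frame must be ready in 31.25 ms, compared with 40 ms at 25 FPS, so generation runs about 1.28 times faster than a 25 FPS stream consumes frames.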

Stable Ultra-Long Video Generation

Long-duration avatar video generation often suffers from identity drift and visual degradation. SoulX-FlashTalk addresses this challenge with a proprietary Self-Correcting Bidirectional Distillation strategy. The system simulates error accumulation in long sequences and corrects it in real time. It keeps full bidirectional attention, allowing each frame to reference both past context and implicit future information.

This design effectively suppresses identity drift, ensuring consistent facial features, lip movements, and background details during live streams. In long-video generation exceeding five minutes, the model achieved a Sync-C score of 1.61, showing strong audiovisual synchronization stability.

Audio-Driven Full-Body Motion

SoulX-FlashTalk goes beyond facial rendering. It supports audio-driven full-body motion generation, producing natural and realistic human movement. In short-video evaluations, it achieved an ASE score of 3.51 and an IQA score of 4.79, setting new benchmarks for visual fidelity, along with a Sync-C score of 1.47 for precise lip synchronization.

The 14B-parameter framework reduces common issues such as hand distortion and motion blur. At the same time, it preserves clear structure and sharp visual details. Even with large movements, the model maintains high identity consistency, achieving a Subject-C score of 99.22.

Practical Industry Applications

Traditional avatar generation solutions often face long rendering times, high latency, and unstable output. Quantitative evaluations on the TalkBench-Short and TalkBench-Long datasets show that SoulX-FlashTalk outperforms comparable models in visual quality, synchronization accuracy, and generation speed.

With its open-source release, SoulX-FlashTalk enables deployment across multiple real-world scenarios. In e-commerce, it supports 24/7 AI-powered live streaming, addressing common issues such as lip-sync drift and image degradation during prolonged broadcasts. The system maintains high visual fidelity even under high-frequency real-time interaction, and continuous automated streaming can significantly reduce operational costs. The model also provides production-ready solutions for short-video creation, AI-assisted education, interactive NPC experiences, and AI-powered customer service.

Ongoing Open-Source Work at Soul AI Lab

The release of SoulX-FlashTalk marks a new stage in Soul AI Lab’s open-source efforts. In October 2025, the team open-sourced its speech synthesis model SoulX-Podcast. The model topped the Hugging Face Text-to-Speech trending list and has received more than 3,100 stars on GitHub.

By continuing to advance speech and visual interaction technologies and collaborating with the global developer community, Soul aims to contribute to the evolution of AI-driven social interaction.

Brian Meyer

