In face-to-face conversations, individuals need to switch between speaking and listening roles seamlessly.
Existing 3D talking head generation models focus solely on either speaking or listening, neglecting the natural
dynamics of interactive conversation, which leads to unnatural interactions and awkward role transitions.
To address this issue, we propose a new task—multi-round dual-speaker interaction for
3D talking head generation—which requires models to handle and generate both speaking
and listening behaviors in continuous conversation.
To solve this task, we introduce DualTalk,
a novel unified framework that integrates the dynamic behaviors of speakers and listeners
to simulate realistic and coherent dialogue interactions. This framework not only
synthesizes lifelike talking heads when speaking but also generates continuous and
vivid non-verbal feedback when listening, effectively capturing the interplay between
the roles. We also create a new dataset featuring 50 hours of multi-round conversations
with over 1,000 characters, where participants continuously switch between speaking
and listening roles.
Extensive experiments demonstrate that our method significantly
enhances the naturalness and expressiveness of 3D talking heads in dual-speaker conversations.
We recommend watching the supplementary video.
Comparison of single-role models (Speaker-Only and Listener-Only) with DualTalk.
Unlike single-role models, which lack key interaction elements, DualTalk supports
transitions between speaking and listening roles, multi-round conversations, and natural interaction.
Overview of DualTalk. DualTalk consists of four components: (a) Dual-Speaker Joint Encoder,
(b) Cross-Modal Temporal Enhancer, (c) Dual-Speaker Interaction Module,
and (d) Expressive Synthesis Module, enabling the generation of smooth and natural dual-speaker interactions.
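To make the four-stage design above more concrete, the following is a minimal sketch of how the named components might compose into a single forward pass. The component names come from the overview; all inputs, layer choices, dimensions, and internals are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class DualTalkSketch(nn.Module):
    """Illustrative composition of the four components named in the overview.
    All shapes and layers are assumptions for the sake of the sketch."""

    def __init__(self, audio_dim=768, motion_dim=64, hidden=256):
        super().__init__()
        # (a) Dual-Speaker Joint Encoder: fuses both speakers' audio and
        #     past facial-motion features into a shared space (assumed inputs).
        self.joint_encoder = nn.Linear(2 * (audio_dim + motion_dim), hidden)
        # (b) Cross-Modal Temporal Enhancer: models temporal context over the
        #     fused features (a small Transformer encoder is an assumption).
        self.temporal_enhancer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2)
        # (c) Dual-Speaker Interaction Module: lets the target speaker's
        #     features attend to the interlocutor's features.
        self.interaction = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # (d) Expressive Synthesis Module: decodes interaction-aware features
        #     into per-frame 3D facial parameters (e.g., blendshape coefficients).
        self.synthesis = nn.Linear(hidden, motion_dim)

    def forward(self, feats_self, feats_other):
        # feats_self / feats_other: (batch, frames, audio_dim + motion_dim)
        x = self.joint_encoder(torch.cat([feats_self, feats_other], dim=-1))   # (a)
        x = self.temporal_enhancer(x)                                          # (b)
        ctx = self.joint_encoder(torch.cat([feats_other, feats_self], dim=-1))
        x, _ = self.interaction(x, ctx, ctx)                                   # (c)
        return self.synthesis(x)                                               # (d)

In this sketch, the same network produces facial motion whether the target character is currently speaking or listening; the role is implied by which speaker's audio is active in the fused input, which is one plausible reading of how a unified dual-speaker framework avoids hard switches between separate speaker and listener models.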
@inproceedings{peng2025dualtalk,
title={DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations},
author={Ziqiao Peng and Yanbo Fan and Haoyu Wu and Xuan Wang and Hongyan Liu and Jun He and Zhaoxin Fan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025},
}