CVPR 2025

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

1Renmin University of China, 2Ant Group, 3Tsinghua University, 4Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing

DualTalk is a 3D talking head interaction model that supports multi-round dual-speaker conversations, switches seamlessly between speaking and listening roles, and provides adaptive non-verbal feedback, enhancing the naturalness and realism of virtual dialogue.

Abstract

In face-to-face conversations, individuals need to switch between speaking and listening roles seamlessly. Existing 3D talking head generation models focus solely on speaking or listening, neglecting the natural dynamics of interactive conversation, which leads to unnatural interactions and awkward transitions.
To address this issue, we propose a new task—multi-round dual-speaker interaction for 3D talking head generation—which requires models to handle and generate both speaking and listening behaviors in continuous conversation.
To tackle this task, we introduce DualTalk, a novel unified framework that integrates the dynamic behaviors of speakers and listeners to simulate realistic and coherent dialogue interactions. This framework not only synthesizes lifelike talking heads when speaking but also generates continuous and vivid non-verbal feedback when listening, effectively capturing the interplay between the two roles. We also create a new dataset featuring 50 hours of multi-round conversations with over 1,000 characters, where participants continuously switch between speaking and listening roles.
Extensive experiments demonstrate that our method significantly enhances the naturalness and expressiveness of 3D talking heads in dual-speaker conversations. We recommend watching the supplementary video.



Proposed Method



Comparison of single-role models (Speaker-Only and Listener-Only) with DualTalk. Unlike single-role models, which lack key interaction elements, DualTalk supports transitions between speaking and listening roles, multi-round conversations, and natural interaction.




Overview of DualTalk. DualTalk consists of four components: (a) Dual-Speaker Joint Encoder, (b) Cross-Modal Temporal Enhancer, (c) Dual-Speaker Interaction Module, and (d) Expressive Synthesis Module, enabling the generation of smooth and natural dual-speaker interactions.
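
To make the data flow concrete, below is a minimal PyTorch sketch of the four-stage pipeline named above. Everything in it (module internals, the Transformer/attention layer choices, feature dimensions such as audio_dim=768 and motion_dim=64, and the I/O conventions) is an illustrative assumption, not the authors' released implementation.

  # Minimal sketch of the four-stage DualTalk pipeline described above.
  # All layer choices, dimensions, and I/O shapes are illustrative
  # assumptions, not the authors' implementation.
  import torch
  import torch.nn as nn


  class DualTalkSketch(nn.Module):
      def __init__(self, audio_dim=768, motion_dim=64, d_model=256, n_heads=4):
          super().__init__()
          # (a) Dual-Speaker Joint Encoder: embeds each speaker's audio and
          # past facial motion into a shared feature space.
          self.audio_proj = nn.Linear(audio_dim, d_model)
          self.motion_proj = nn.Linear(motion_dim, d_model)
          # (b) Cross-Modal Temporal Enhancer: models temporal dependencies
          # along the fused audio-motion sequence.
          self.temporal = nn.TransformerEncoder(
              nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
              num_layers=2,
          )
          # (c) Dual-Speaker Interaction Module: lets the target speaker's
          # features attend to the interlocutor's stream.
          self.interaction = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
          # (d) Expressive Synthesis Module: decodes fused features into 3D
          # facial motion (e.g., blendshape or vertex-offset coefficients).
          self.decoder = nn.Linear(d_model, motion_dim)

      def forward(self, self_audio, other_audio, self_motion):
          # self_audio / other_audio: (B, T, audio_dim); self_motion: (B, T, motion_dim)
          h_self = self.audio_proj(self_audio) + self.motion_proj(self_motion)
          h_other = self.audio_proj(other_audio)
          h_self = self.temporal(h_self)
          h_other = self.temporal(h_other)
          # Cross-attention from the target speaker to the interlocutor is
          # what can drive reactive feedback while the other party speaks.
          fused, _ = self.interaction(h_self, h_other, h_other)
          return self.decoder(fused)  # (B, T, motion_dim)


  if __name__ == "__main__":
      model = DualTalkSketch()
      a1 = torch.randn(2, 100, 768)   # target speaker's audio features
      a2 = torch.randn(2, 100, 768)   # interlocutor's audio features
      m1 = torch.randn(2, 100, 64)    # target speaker's past motion
      print(model(a1, a2, m1).shape)  # torch.Size([2, 100, 64])

The point this sketch mirrors is stage (c): because the target speaker's stream always attends to the interlocutor's stream, a single model can produce speech-driven motion when speaking and reactive non-verbal feedback when listening, without an explicit role switch.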

BibTeX


  @inproceedings{peng2025dualtalk,
    title={DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations},
    author={Peng, Ziqiao and Fan, Yanbo and Wu, Haoyu and Wang, Xuan and Liu, Hongyan and He, Jun and Fan, Zhaoxin},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
  }