I am a Ph.D. student at Show Lab, National University of Singapore, advised by Prof. Mike Zheng Shou and Prof. Wynne Hsu. I received my B.Eng. in Computer Science from Shen Yuan Honors College of Beihang University. My research interests lie in generative models for images, videos, 3D and 4D.
email: jay.zhangjie.wu [at] gmail.com
google scholar · github · linkedin · twitter
news
| Date | News |
|---|---|
| Jun 2025 | Difix3D+ is recognized as a Best Paper Award Candidate at CVPR 2025. |
| Feb 2025 | We're organizing WorldModelBench: The First Workshop on Benchmarking World Foundation Models at CVPR 2025. |
| Feb 2024 | Our tutorial Diffusion-based Video Generative Models will appear at CVPR 2024. |
| Oct 2023 | Code and model weights of Show-1 are released. |
| May 2023 | Organized the LOVEU-TGVE (Text-Guided Video Editing) competition at CVPR 2023. |
| Apr 2023 | Searching for papers on video diffusion models? Check out our GitHub repo Awesome-Video-Diffusion. |
publications
(*) denotes equal contribution.
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation · Technical Report
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models · CVPR 2025 (Best Paper Award Candidate)
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models · White Paper
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control · White Paper
Cosmos World Foundation Model Platform for Physical AI
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video · ICCV 2025
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats · NeurIPS 2024
MotionDirector: Motion Customization of Text-to-Video Diffusion Models · ECCV 2024 (Oral)
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks · ECCV 2024
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence · CVPR 2024
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing · CVPR 2024
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation · IJCV 2024
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models · NeurIPS 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation · ICCV 2023