Zhuofan Zong

Cited by

	All	Since 2020
Citations	1131	1129
h-index	13	13
i10-index	13	13

Citations per year: 2021: 13; 2022: 18; 2023: 110; 2024: 566; 2025: 419

Public access

7 articles available, 0 articles not available

Based on funding mandates

Co-authors

Hongsheng Li (李鸿升), The Chinese University of Hong Kong. Verified email at ee.cuhk.edu.hk
Dongzhi Jiang, MMLab, CUHK. Verified email at link.cuhk.edu.hk
Zeyue Xue (薛泽岳), The University of Hong Kong. Verified email at connect.hku.hk
Hao Shao, CUHK, MMLab. Verified email at link.cuhk.edu.hk
Ping Luo (羅平), Associate Professor, The University of Hong Kong; MMLAB@HKU. Verified email at hku.hk
Bingqi Ma, Sensetime Research. Verified email at sensetime.com
Kunchang Li, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; ByteDance Seed. Verified email at siat.ac.cn
Yu Qiao, Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS. Verified email at siat.ac.cn
Guanglu Song
Dazhong Shen, Nanjing University of Aeronautics and Astronautics

Zhuofan Zong

MMLab, The Chinese University of Hong Kong

Verified email at link.cuhk.edu.hk

Large Models, Multimodal, Object Detection, 3D Object Detection


Title	Cited by	Year
DETRs with collaborative hybrid assignments training. Z Zong, G Song, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	487	2023
Raphael: Text-to-image generation via large mixture of diffusion paths. Z Xue, G Song, Q Guo, B Liu, Z Zong, Y Liu, P Luo. Advances in Neural Information Processing Systems 36, 41693-41706, 2023	174	2023
Visual cot: Advancing multi-modal language models with a comprehensive dataset and benchmark for chain-of-thought reasoning. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 8612-8642, 2024	74	2024
Graph attention based proposal 3D convnets for action detection. J Li, X Liu, Z Zong, W Zhao, M Zhang, J Song. Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 4626-4633, 2020	62	2020
Mova: Adapting mixture of vision experts to multimodal context. Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu. arXiv preprint arXiv:2404.13046, 2024	59	2024
Visual cot: Unleashing chain-of-thought reasoning in multi-modal language models. H Shao, S Qian, H Xiao, G Song, Z Zong, L Wang, Y Liu, H Li. arXiv e-prints, arXiv:2403.16999, 2024	59	2024
Self-slimmed vision transformer. Z Zong, K Li, G Song, Y Wang, Y Qiao, B Leng, Y Liu. European Conference on Computer Vision, 432-448, 2022	42	2022
Temporal enhanced training of multi-view 3D object detector via historical object prediction. Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	39	2023
Comat: Aligning text-to-image diffusion model with image-to-text concept matching. D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li. Advances in Neural Information Processing Systems 37, 76177-76209, 2024	28	2024
RCNet: Reverse feature pyramid and cross-scale shift network for object detection. Z Zong, Q Cao, B Leng. Proceedings of the 29th ACM International Conference on Multimedia, 5637-5645, 2021	24	2021
Exploring the role of large language models in prompt encoding for diffusion models. B Ma, Z Zong, G Song, H Li, Y Liu. arXiv preprint arXiv:2406.11831, 2024	22	2024
Temporal enhanced training of multi-view 3D object detector via historical object prediction. Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu. arXiv preprint arXiv:2304.00967 2, 2023	18	2023
T2i-r1: Reinforcing image generation with collaborative semantic-level and token-level cot. D Jiang, Z Guo, R Zhang, Z Zong, H Li, L Zhuo, S Yan, PA Heng, H Li. arXiv preprint arXiv:2505.00703, 2025	16	2025
DETRs with collaborative hybrid assignments training (2023). Z Zong, G Song, Y Liu. arXiv preprint arXiv:2211.12860	9
Easyref: Omni-generalized group image reference for diffusion models via multimodal LLM. Z Zong, D Jiang, B Ma, G Song, H Shao, D Shen, Y Liu, H Li. arXiv preprint arXiv:2412.09618, 2024	7	2024
Large-batch optimization for dense visual predictions: Training Faster R-CNN in 4.2 minutes. Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo. Advances in Neural Information Processing Systems 35, 18694-18706, 2022	6	2022
Large-batch optimization for dense visual predictions. Z Xue, J Liang, G Song, Z Zong, L Chen, Y Liu, P Luo. Advances in Neural Information Processing Systems 1, 2022	5	2022
ADT: Tuning Diffusion Models with Adversarial Supervision. D Shen, G Song, Y Zhang, B Ma, L Li, D Jiang, Z Zong, Y Liu. arXiv preprint arXiv:2504.11423, 2025		2025
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping. H Shao, S Wang, Y Zhou, G Song, D He, S Qin, Z Zong, B Ma, Y Liu, H Li. arXiv preprint arXiv:2412.11279, 2024		2024