|
Sheng Zhou (周晟)
[Email]
[Github]
[Google Scholar]
|
|
|
I am actively seeking research discussions and collaboration opportunities, so feel free to contact me!
My group at KAUST is actively recruiting visiting/remote students,
with several openings available until July 2026.
Students interested in MLLM for Healthcare can contact me!
|
|
Research
My research focuses on visual-language understanding and reasoning,
primarily on scene-text visual question answering and visual grounding.
I am currently expanding my research scope to egocentric video understanding and multimodal large language models for human assistance.
|
|
🔥News
2026.02: One paper is accepted by CVPR. 🎉
2025.11: Successfully defended my Ph.D.🎓
Thesis: Research on Scene Text-Driven Visual Question Answering.
2025.05: Our work EgoTextVQA will be presented at the Egocentric Vision (EgoVis) Workshop and Vision-based Assistants in the Real-World (VAR) Workshop @ CVPR 2025! 😄
2025.05: One paper is accepted by IEEE TMM. 🎉
2025.02: One paper is accepted by CVPR. 🎉
2025.02: I was awarded the Tat-Seng Chua Scholarship.
2024.09: I will be a visiting student at NUS for one year, collaborating with Dr. Junbin Xiao.
2024.01: One paper is accepted by ACM TOMM. 🎉
2023.07: Started my studies at USTC, supervised by Prof. Xun Yang.
2023.09: One paper is accepted by IEEE TIP. 🎉
|
|
Publications and Preprints
|
(† Corresponding Author, # Core Contributor)
|
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
Yang Shi#, Yuhao Dong#, Yue Ding#, Yuran Wang#, Xuanyu Zhu#, Sheng Zhou#, Wenting Liu#, Haochen Tian#, Rundong Wang#, Huanqian Wang, Zuyan Liu, Bohan Zeng, Ruizhe Chen, Qixun Wang, Zhuoran Zhang, Xinlong Chen, Chengzhuo Tong, Bozhou Li, Qiang Liu, Haotian Wang†, Wenjing Yang, Yuanxing Zhang†, Pengfei Wan, YiFan Zhang†, Ziwei Liu†.
CVPR'26
[arXiv]
[Code]
[Dataset]
|
|
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou,
Junbin Xiao†,
Qingyun Li,
Yicong Li,
Xun Yang,
Dan Guo,
Meng Wang,
Tat-Seng Chua,
Angela Yao.
CVPR'25
[arXiv]
[Project Page]
[Code]
[Dataset]
|
|
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou,
Junbin Xiao†,
Xun Yang†,
Peipei Song,
Dan Guo†,
Angela Yao,
Meng Wang,
Tat-Seng Chua.
IEEE TMM'25
[arXiv]
[Code]
[Dataset]
|
|
Graph Pooling Inference Network for Text-based VQA
Sheng Zhou,
Dan Guo†,
Xun Yang†,
Jianfeng Dong,
Meng Wang†.
ACM TOMM'24
[Paper]
[Code]
|
|
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou, Dan Guo†, Jia Li, Xun Yang†, Meng Wang†.
IEEE TIP'23
[Paper]
[Code]
|
|
Selected Honors and Awards
[2025.02] Tat-Seng Chua Scholarship
[2022 - 2025] First Class Academic Scholarship (three times)
[2020 - 2022] Second Class Academic Scholarship (two times)
[2020] Outstanding Graduate of Innovation and Entrepreneurship in Hunan Province
[2016 - 2019] National Encouragement Scholarship (three times)
|
|
Services
Reviewer for Conferences: CVPR (2026), ACM MM (2025, 2026), ECCV (2026), IJCNN (2025, 2026)
Reviewer for Journals: IEEE TIP, ACM TOMM, Information Fusion, Neurocomputing, ...
|
|