Research
My research focuses on vision-language understanding and reasoning, primarily on scene-text visual question answering.
I am currently expanding my research scope to egocentric video understanding and multimodal large language models for human assistance.
🔥 News
2025.02: One paper is accepted by CVPR. 🎉
2025.02: I received the Tat-Seng Chua Scholarship.
2024.09: I will be a visiting student at NUS for one year, collaborating with Dr. Junbin Xiao.
2024.01: One paper is accepted by TOMM.
2023.09: One paper is accepted by TIP.
2023.07: I started my study at USTC, supervised by Prof. Xun Yang.
Publications and Preprints
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou,
Junbin Xiao,
Qingyun Li,
Yicong Li,
Xun Yang,
Dan Guo,
Meng Wang,
Tat-Seng Chua,
Angela Yao.
CVPR'25
[arXiv]
[Project Page]
[Code]
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou,
Junbin Xiao,
Xun Yang,
Peipei Song,
Dan Guo,
Angela Yao,
Meng Wang,
Tat-Seng Chua.
arXiv'24
[arXiv]
[Code]
Graph Pooling Inference Network for Text-based VQA
Sheng Zhou,
Dan Guo,
Xun Yang,
Jianfeng Dong,
Meng Wang.
TOMM'24
[Paper]
[Code]
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou,
Dan Guo,
Jia Li,
Xun Yang,
Meng Wang.
TIP'23
[Paper]
[Code]
Selected Honors and Awards
[2025.02] Tat-Seng Chua Scholarship
[2022 - 2025] First-Class Academic Scholarship (three times)
[2020 - 2022] Second-Class Academic Scholarship (two times)
[2020] Outstanding Graduate of Innovation and Entrepreneurship in Hunan Province
[2016 - 2019] National Encouragement Scholarship (three times)