Sheng Zhou (ε‘¨ζ™Ÿ)

I obtained my Ph.D. degree in Computer Science at Hefei University of Technology (HFUT) through a direct M.S.–Ph.D. program from Sep. 2020 to Dec. 2025, advised by Prof. Dan Guo and Prof. Meng Wang. From Aug. 2023 to Aug. 2024, I studied at University of Science and Technology of China (USTC) for one year under the guidance of Prof. Xun Yang. From Sep. 2024 to Aug. 2025, I studied at National University of Singapore (NUS) as a visiting student under the guidance of Dr. Junbin Xiao, Prof. Angela Yao, and Prof. Tat-Seng Chua.

I am actively seeking research discussions and collaboration opportunities, so feel free to contact me!

My group at KAUST is actively recruiting visiting/remote students, with several openings available until July 2026. Students interested in MLLMs for Healthcare are welcome to contact me!

[Email] [Github] [Google Scholar]

Research

My research focuses on visual-language understanding and reasoning, primarily on scene-text visual question answering and visual grounding. I am currently expanding my research scope to egocentric video understanding and multimodal large language models for human assistance.

πŸ”₯News

  • 2025.12: I will soon join King Abdullah University of Science and Technology (KAUST) as a Postdoctoral Fellow.
  • 2025.11: Successfully defended my Ph.D. πŸŽ“ Thesis: Research on Scene Text-Driven Visual Question Answering.
  • 2025.05: Our work EgoTextVQA will be presented at the Egocentric Vision (EgoVis) Workshop and Vision-based Assistants in the Real-World (VAR) Workshop @ CVPR 2025! πŸ˜„
  • 2025.05: One paper is accepted by IEEE TMM. πŸŽ‰
  • 2025.02: One paper is accepted by CVPR. πŸŽ‰
  • 2025.02: I was honored with the Tat-Seng Chua Scholarship.
  • 2024.09: I will be a visiting student at NUS for one year, collaborating with Dr. Junbin Xiao.
  • 2024.01: One paper is accepted by ACM TOMM. πŸŽ‰
  • 2023.09: One paper is accepted by IEEE TIP. πŸŽ‰
  • 2023.07: Started my study at USTC, supervised by Prof. Xun Yang.
  • Publications and Preprints

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao.
CVPR'25 [arXiv] [Project Page] [Code] [Dataset]

Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua.
TMM'25 [arXiv] [Code] [Dataset]

Graph Pooling Inference Network for Text-based VQA
Sheng Zhou, Dan Guo, Xun Yang, Jianfeng Dong, Meng Wang.
TOMM'24 [Paper] [Code]

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang.
TIP'23 [Paper] [Code]

Selected Honors and Awards

  • [2025.02] Tat-Seng Chua Scholarship
  • [2022 - 2025] First Class Academic Scholarship (three times)
  • [2020 - 2022] Second Class Academic Scholarship (two times)
  • [2020] Outstanding Graduate of Innovation and Entrepreneurship in Hunan Province
  • [2016 - 2019] National Encouragement Scholarship (three times)