Hi, I’m Hani Alomari šŸ‘‹

Ph.D. student in Computer Science at Virginia Tech, advised by Dr. Chris Thomas. I work on multimodal learning and vision-language models, with an emphasis on building retrieval systems that capture diverse, non-literal meaning across images, text, video, and audio.

Research Interests

✨ Vision-Language Models šŸ”„ Cross-Modal Retrieval 🧠 Multimodal Reasoning

Recent News

Feb '26 Lenses accepted to CVPR 2026
Jan '26 3 papers under review at ACL Rolling Review
May '25 MaxMatch accepted to ACL 2025

Selected Publications
Lenses: Toward Polysemous Vision-Language Understanding
Hani Alomari, Ali Asgarov, Chris Thomas
CVPR 2026
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Hani Alomari, Anushka Sivakumar, Andrew Zhang, Chris Thomas
ACL 2025
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark
Zhecan Wang, Junzhang Liu, Chia-Wei Tang, Hani Alomari, et al.
NeurIPS 2024
ENTER: Event-Based Interpretable Reasoning for VideoQA
Hammad Ayyubi, Junzhang Liu, Ali Asgarov, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Zhecan Wang, Chia-Wei Tang, Hani Alomari, et al.
NeurIPS 2024 MAR Workshop · Spotlight
See the publications page for the complete list and links.