Hani Alomari
I am a third-year Ph.D. student in Computer Science at Virginia Tech, affiliated with the Sanghani Center for Artificial Intelligence and Data Analytics. I work at the intersection of computer vision and natural language processing, with a focus on multimodal learning and vision-language models. My research centers on developing robust, semantically aligned representations across modalities to support cross-modal retrieval, reasoning, and interpretability. Under the guidance of Dr. Chris Thomas, I design embedding models and architectures that capture nuanced, non-literal cross-modal relationships while preserving modality-specific details and promoting semantic diversity.
Before joining Virginia Tech, I earned my M.Sc. and B.Sc. in Computer Science from Jordan University of Science and Technology. There, I collaborated with Prof. Rehab M. Duwairi and Dr. Malak A. Abdullah on projects involving information retrieval, emotion classification, and propaganda detection, and contributed to deep learning research for medical imaging tasks.
My specific research interests include:
Cross-modal retrieval across images, text, video, and audio
Learning diverse and semantically meaningful embeddings for multimodal alignment
Structured information extraction and representation from multimodal data
Knowledge structures and reasoning in vision-language models