Hani Alomari

I’m a third-year Ph.D. student in Computer Science at Virginia Tech, affiliated with the Sanghani Center for Artificial Intelligence and Data Analytics. I work at the intersection of computer vision and natural language processing, with a focus on multimodal learning and vision-language models. My research centers on developing robust, semantically aligned representations across modalities to support cross-modal retrieval, reasoning, and interpretability. Under the guidance of Dr. Chris Thomas, I design embedding models and architectures that capture nuanced, non-literal relationships while preserving modality-specific details and promoting semantic diversity.

Before joining Virginia Tech, I earned both my M.Sc. and B.Sc. in Computer Science from Jordan University of Science and Technology. I have collaborated with Prof. Rehab M. Duwairi and Dr. Malak A. Abdullah on projects involving information retrieval, emotion classification, and propaganda detection. I have also contributed to deep learning research for medical imaging tasks.

Specific research interests include:

  • Cross-modal retrieval across images, text, video, and audio

  • Learning diverse and semantically meaningful embeddings for multimodal alignment

  • Structured information extraction and representation from multimodal data

  • Knowledge structures and reasoning in vision-language models