About

I am a Ph.D. student in Computer Science at Virginia Tech, advised by Dr. Chris Thomas at the Sanghani Center for Artificial Intelligence and Data Analytics. My research focuses on multimodal learning and vision-language models, with an emphasis on building retrieval systems that capture diverse, non-literal meaning across images, text, video, and audio.

Research

My primary line of work addresses a core limitation in cross-modal retrieval: standard contrastive objectives collapse representations into single-point embeddings that miss figurative, cultural, and abstract associations. To counter this, I develop multi-prompt embedding strategies and maximal matching objectives that preserve semantic diversity during training. This line of work led to MaxMatch (ACL 2025, Main), which improves cross-modal retrieval by up to 7.1% while preventing representation collapse. Most recently, Lenses (CVPR 2026) introduces a dataset and retrieval model that capture meaning through five interpretive perspectives (literal, figurative, abstract, emotional, and background), showing that a single image can support multiple valid interpretations.
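To make the idea concrete, the sketch below shows one way a maximal-matching contrastive objective can be written when each image is paired with several diverse text embeddings (e.g., from multiple prompts). The function name, tensor shapes, and symmetric InfoNCE formulation are illustrative assumptions of mine, not the exact objective used in MaxMatch.

```python
import torch
import torch.nn.functional as F

def max_match_contrastive_loss(img_emb, txt_embs, temperature=0.07):
    """Illustrative maximal-matching contrastive loss (a sketch, not the paper's objective).

    img_emb:  (B, D)    one embedding per image
    txt_embs: (B, K, D) K diverse text embeddings per image (e.g., multiple prompts)

    Each image is scored against a text set by its best-matching candidate,
    so the K candidates are not all pulled onto a single point.
    """
    img = F.normalize(img_emb, dim=-1)          # (B, D)
    txt = F.normalize(txt_embs, dim=-1)         # (B, K, D)

    # Similarity of every image to every candidate of every text set: (B, B, K)
    sim = torch.einsum("bd,nkd->bnk", img, txt) / temperature

    # Maximal matching step: keep only the best-matching candidate per pair,
    # leaving the other candidates free to encode alternative meanings.
    logits = sim.max(dim=-1).values             # (B, B)

    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric InfoNCE over the max-matched scores (image-to-text and text-to-image).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example usage with random embeddings: 8 images, 4 prompt variants each.
loss = max_match_contrastive_loss(torch.randn(8, 512), torch.randn(8, 4, 512))
```

The key design choice in this sketch is taking the maximum over candidate similarities before the softmax, which rewards at least one candidate for matching well rather than forcing all candidates toward the same target.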

I also work on grounded reasoning for vision-language models, contributing to benchmarks and evaluation pipelines that test whether VLMs truly understand compositional and abstract visual content. This includes JourneyBench (NeurIPS 2024), a challenging vision-language understanding benchmark, and ENTER (NeurIPS 2024 MAR Workshop, Spotlight), which introduces event-based interpretable reasoning for video question answering.

Selected Publications

  • [CVPR 2026] Lenses: Toward Polysemous Vision-Language Understanding.
  • [ACL 2025] Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval.
  • [NeurIPS 2024] JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark.
  • [NeurIPS 2024 Workshop, Spotlight] ENTER: Event-Based Interpretable Reasoning for VideoQA.
  • [CVPR 2025 Workshop] Real-Time Ultra-Fine-Grained Surgical Instrument Classification.

See the full list on the publications page.