CV

Research Statement

My work lies at the intersection of computer vision and natural language processing, with a primary focus on multimodal learning and vision-language models. I develop diverse, semantically aligned representations for cross-modal retrieval across image, text, video, and audio, using multi-embedding strategies to capture non-literal and abstract relationships. I also work on structured knowledge and grounded reasoning for interpretable vision-language models, supported by new benchmarks and evaluation pipelines. This research has led to publications at top-tier venues such as ACL 2025.

Education

Publications

See the publications page for the complete list and links.

Research Experience

Academic Contributions

Honors & Awards

Conferences & Presentations

Technical Skills