I am a PhD student at MBZUAI. My research focuses on reliable and efficient language technologies for real-world deployment, with emphasis on domain adaptation, robust and uncertainty-aware learning, and safety for large language models. I received my B.Tech. in Electrical Engineering (Minor: Machine Learning) from IIT Kanpur. I am fortunate to be advised by Yova Kementchedjhieva and Kentaro Inui.
Current research focus
My main research investigates vision–language models (VLMs), specifically how visual and textual tokens align across layers in the LLM component of a VLM. We observe that alignment happens late, which may prevent visual tokens from engaging the task-specific circuits in the LLM's early layers. I am working on validating this hypothesis and exploring a recurrent back-patching approach that injects late-layer visual representations back into earlier layers.
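To make the intervention concrete, here is a minimal sketch of back-patching on a toy transformer stack. The layer indices, the visual-token span, and the two-pass structure are illustrative assumptions, not our actual experimental setup:

```python
# Sketch of back-patching: run the stack once to cache hidden states,
# then re-run it, overwriting the visual-token positions at an early
# layer with their late-layer representations from the first pass.
import torch
import torch.nn as nn

d_model, n_layers = 64, 8
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(n_layers)
)

def forward_with_backpatch(x, vis_slice, source_layer=6, patch_layer=2):
    # Pass 1: normal forward, caching the output of every layer.
    h, cache = x, []
    for layer in layers:
        h = layer(h)
        cache.append(h)
    # Pass 2: at patch_layer, inject the source_layer representations
    # of the visual tokens into the residual stream.
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == patch_layer:
            h = h.clone()
            h[:, vis_slice] = cache[source_layer][:, vis_slice]
    return h

x = torch.randn(1, 20, d_model)                  # first 8 tokens are "visual"
out = forward_with_backpatch(x, vis_slice=slice(0, 8))
print(out.shape)                                 # torch.Size([1, 20, 64])
```

This shows a single back-patch step; the recurrent variant would iterate the second pass. In practice the intervention would be implemented with forward hooks on a pretrained VLM rather than a hand-rolled loop.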
I am also working on personalization in VLMs, building a benchmark that evaluates whether VLMs have the capabilities personalization requires: tracking a person's behavior, preferences, characteristics, habits, and so on. Instead of treating personalization as static, we treat it as temporal, since these attributes keep changing over time.
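As a sketch of what "temporal" means here, one could represent user knowledge as facts with validity intervals, so the correct answer depends on when a query is asked. The schema below is purely illustrative, not the benchmark's actual format:

```python
# Hypothetical schema: user facts carry validity intervals, so the
# right personalization answer depends on the query's timestep.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserFact:
    attribute: str        # e.g. "preference", "habit", "behavior"
    value: str
    start: int            # timestep at which the fact becomes true
    end: Optional[int]    # None = still true

facts = [
    UserFact("preference", "drinks coffee", start=0, end=5),
    UserFact("preference", "drinks tea", start=5, end=None),
]

def facts_at(t, facts):
    """Return the facts that hold at time t."""
    return [f for f in facts if f.start <= t and (f.end is None or t < f.end)]

print([f.value for f in facts_at(3, facts)])  # ['drinks coffee']
print([f.value for f in facts_at(7, facts)])  # ['drinks tea']
```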
In parallel, I am collaborating with a master’s student on a Q-Former–based visual RAG architecture that compresses retrieved images into compact semantic tokens for efficient grounding. With this approach, we aim to extend in-context learning in VLMs, which currently struggle to make use of more than one example.
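A minimal sketch of the compression step, assuming a Q-Former-style module in which a small set of learnable queries cross-attends to the patch features of each retrieved image (all dimensions and names here are illustrative):

```python
# k learnable queries cross-attend to the patch features of each
# retrieved image, compressing n_patches vectors into k tokens per image.
import torch
import torch.nn as nn

d, k, n_patches = 64, 8, 196        # e.g. 196 ViT patches -> 8 tokens

queries = nn.Parameter(torch.randn(1, k, d))
cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

def compress(image_feats):
    # image_feats: (batch, n_patches, d) from a frozen vision encoder
    q = queries.expand(image_feats.size(0), -1, -1)
    out, _ = cross_attn(q, image_feats, image_feats)  # queries attend to patches
    return out                                        # (batch, k, d)

feats = torch.randn(3, n_patches, d)    # features of 3 retrieved images
print(compress(feats).shape)            # torch.Size([3, 8, 64])
```

Compressing each retrieved image to a handful of tokens is what should let many in-context examples fit in the LLM's context window.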
Updates
- Jan. 2026: Started validating late-layer alignment behavior and testing recurrent back-patching interventions.
- Nov. 2025: Drafted evaluation protocol for temporal personalization in VLMs (preferences/behavior/habits over time).
- Sep. 2025: Built an early prototype of Q-Former–based visual RAG compression for efficient grounding.
- Jul. 2025: Set up layerwise alignment analysis to track visual–text fusion across the LLM stack.
- May 2025: Consolidated papers/projects and refreshed the website structure for easier sharing.