
Bangalore, Karnataka

India, 560066

Hey, Voila! πŸ‘‹

I’m currently a Research Engineer @ AMD GenAI, where my focus is mostly on building in-house LMMs (Large Multimodal Models) πŸ€–. Previously, I was affiliated with the CVIT Lab at IIIT Hyderabad πŸ›οΈ, where I completed my MS by Research in 2024. I was part of the Katha-AI group, where I had the privilege of working with Prof. Makarand Tapaswi.

I am currently focusing 🎯 on RL, Diffusion, and, in part, Efficient Training ⚑, trying to transition πŸ”„ from my usual research area of Multimodal Learning 🎭.

πŸ› οΈ Before IIITH:
I was a Research Intern at the Indian Statistical Institute, Kolkata (2020-21) πŸ“, where I worked with Prof. B. Uma Shankar in the Machine Intelligence Unit on the topic of Multi-label Classification of Remote Sensing Images πŸ›°οΈ.

πŸ“š I graduated with a BS-MS degree in Mathematics & Statistics πŸ“Š from IISER Kolkata in 2021. In my spare time, I enjoy playing badminton 🏸, swimming 🏊, and sometimes biking 🏍️.

⚑ Riding the wave of AGI innovation! πŸš€

Hiring πŸ”₯

AMD’s GenAI team is driving the future of foundation models πŸ€– and is hiring!

πŸ“’ We have multiple open roles, including:

  1. Research Scientist (Senior & Junior)
  2. Research Engineer
  3. Research Intern

Feel free to reach out βœ‰οΈ if you’re interested!

news

Aug 05, 2024 πŸš€ Joined the AMD GenAI team as a Research Engineer πŸ§‘β€πŸ’».
πŸ”₯ Building fully open-source LMMs from scratch on AMD Instinct GPUs ⚑ (MI300 / MI250).
Jul 13, 2024 πŸŽ‰ Graduated with an MS by Research degree in CSE πŸ–₯️ from IIIT Hyderabad πŸŽ“.
Jun 18, 2024 🌎 Visited Seattle, USA πŸ‡ΊπŸ‡Έ for my poster presentation πŸ† at the 41st CVPR conference πŸŽ‰.
Apr 23, 2024 πŸ›‘οΈ Defended my Master’s thesis πŸ“œ, completing my MS degree πŸŽ“ at IIIT-H! 🎯
Apr 18, 2024 πŸ”₯ Paper accepted at FSE 2024! Topic: leveraging LLMs to automatically recommend OCEs for quickly identifying and mitigating critical issues (RCA). Read More
Mar 06, 2024 πŸ”₯πŸ”₯ Best Paper Award πŸ”₯πŸ”₯ at FOSS-CIL 2024! Read More

selected publications

  1. "Previously on ..." From Recaps to Story Summarization

    In IEEE Conference on Computer Vision and Pattern Recognition, 2024

    We introduce multimodal story summarization by leveraging TV episode recaps – short video sequences interweaving key story moments from previous episodes to bring viewers up to speed. We propose PlotSnap, a dataset featuring two crime thriller TV shows with rich recaps and long episodes of 40 minutes. Story summarization labels are unlocked by matching recap shots to corresponding substories in the episode. We propose a hierarchical model TaleSumm that processes entire episodes by creating compact shot and dialog representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Unlike traditional summarization, our method extracts multiple plot points from long videos. We present a thorough evaluation on story summarization, including promising cross-series generalization. TaleSumm also shows good results on classic video summarization benchmarks.

  2. How you feelin’? Learning Emotions and Mental States in Movie Scenes

    In IEEE Conference on Computer Vision and Pattern Recognition, 2023

    Movie story analysis requires understanding characters’ emotions and mental states. Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character. We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. By leveraging annotations from the MovieGraphs dataset, we aim to predict classic emotions (e.g. happy, angry) and other mental states (e.g. honest, helpful). We conduct experiments on the most frequently occurring 10 and 25 labels, and a mapping that clusters 181 labels to 26. Ablation studies and comparison against adapted state-of-the-art emotion recognition approaches show the effectiveness of EmoTx. Analyzing EmoTx’s self-attention scores reveals that expressive emotions often look at character tokens while other mental states rely on video and dialog cues.