About me

Hi, my name is Oscar Mañas.

I am a PhD student at Mila and Université de Montréal, advised by Prof. Aishwarya Agrawal. I am also a research scientist intern at Meta AI, advised by Dr. Michal Drozdzal and Prof. Adriana Romero.

My research interests lie at the intersection of computer vision and natural language processing. I believe that, like humans (and other animals), AI systems should have a holistic understanding of the world around them. This means working with multiple sensory modalities, among which vision and language stand out as particularly interesting. On the one hand, they are complementary: vision is a low-level perceptual modality, while language is an abstract human construct. On the other hand, they are believed to be two essential modalities for solving AI-complete problems.

I am generally interested in multimodal vision-language generative models, i.e. models capable of generating images and/or text conditioned on multimodal inputs. Generating new content requires learning and composing patterns from existing data, that is, modeling the underlying data distribution. When this data represents the real world, generative models become effective “world models”. This idea has numerous applications. For example, text-conditioned image generation models can synthesize data on demand for training recognition and representation learning models on new tasks or skills. Furthermore, given the semantic and compositional nature of language, (large) language models can serve as reasoning engines. By aligning language models with vision encoders, we can build powerful multimodal systems capable of both perceiving and reasoning, which can be deployed as multimodal assistants (e.g. to aid visually impaired users).

Previously, I was a research intern at Element AI in Montreal, advised by Dr. Pau Rodríguez and Dr. David Vázquez. I obtained an M.Sc. in Computer Vision from Universitat Autònoma de Barcelona, and I carried out my master’s thesis at the Image Processing Group, advised by Prof. Xavier Giró. Before that, I obtained a B.Sc. in Computer Science from Universitat Politècnica de Catalunya, and I carried out my bachelor’s thesis at the Architectures and Compilers Group, advised by Prof. Antonio Gonzalez and Dr. Jose-Maria Arnau.