Hola! My name is Oscar Mañas. I am a PhD Candidate at Mila and Université de Montréal, advised by Prof. Aishwarya Agrawal. I am also a Visiting Researcher at Meta FAIR, advised by Dr. Michal Drozdzal and Prof. Adriana Romero.

My research explores the intersection of computer vision and natural language processing. I believe that AI systems should have a holistic understanding of their environment, mirroring human perception. This requires seamless integration of multiple sensory modalities, with vision and language emerging as particularly significant and complementary: vision provides raw perceptual data, while language is an abstract human construct for efficient communication. Together, these modalities are essential for solving AI-complete problems. My work focuses on advancing multimodal vision-language generative models, i.e., models capable of generating images and/or text conditioned on multimodal inputs.

Previously, I was a Research Intern at Element AI in Montreal, advised by Dr. Pau Rodríguez and Dr. David Vázquez. I obtained an M.Sc. in Computer Vision from Universitat Autònoma de Barcelona, and I carried out my master’s thesis at the Image Processing Group, advised by Prof. Xavier Giró. Before that, I obtained a B.Sc. in Computer Science from Universitat Politècnica de Catalunya, and I carried out my bachelor’s thesis at the Architectures and Compilers Group, advised by Prof. Antonio Gonzalez and Dr. Jose-Maria Arnau.