Hola! My name is Oscar Mañas. I am a Research Scientist at Meta Superintelligence Labs on the Media Generation team. I recently completed my PhD at Mila and Université de Montréal, advised by Prof. Aishwarya Agrawal.
My research explores the intersection of computer vision and natural language processing. I believe that AI systems should develop a holistic understanding of their environment, mirroring human perception and cognition. This requires the seamless integration of multiple modalities, with vision and language being especially complementary: vision provides raw perceptual data, while language is an abstract human construct for efficient communication and reasoning. I focus on multimodal vision-language generative models: systems capable of generating images, videos, and text conditioned on multimodal inputs. My long-term vision is that treating modalities as interconnected channels of reasoning will enable AI systems that learn continuously from multimodal interactions, become deeply embedded in human society, and potentially achieve artificial general intelligence, or even superintelligence.
Previously, I was a Visiting Researcher at Meta FAIR, advised by Dr. Michal Drozdzal and Prof. Adriana Romero. I was also a Research Intern at Element AI in Montreal, advised by Dr. Pau Rodríguez and Dr. David Vázquez. I obtained an M.Sc. in Computer Vision from Universitat Autònoma de Barcelona and carried out my master’s thesis at the Image Processing Group, advised by Prof. Xavier Giró. Before that, I obtained a B.Sc. in Computer Science from Universitat Politècnica de Catalunya and carried out my bachelor’s thesis at the Architectures and Compilers Group, advised by Prof. Antonio Gonzalez and Dr. Jose-Maria Arnau.
