Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.
The AI Platform team is on a mission to advance the state of the art in AI and deliver on our company’s vision for how intelligent cloud and intelligent edge will shape the next phase of innovation. The team includes top scientists and researchers from across Microsoft who are creating a center of excellence in speech, computer vision, and natural language.
Within the AI Platform, the Multi-modal Intelligence team (MMI) mission is to make fundamental contributions to advancing the state-of-the-art in AI technology related to Video, Image, Document, and other multimodality inputs. “Documents”, for example, stand at the intersection between NLP and Vision research. To fully understand a document, one needs to borrow from both language and visual (Layout) elements of the document. We explore both single and multimodality inputs – and their synergy - to conduct research on forward-looking topics such as Video Understanding, Information Retrieval, Key-Value extraction, few-shot Named Entity Recognition (NER), hierarchical layout analysis, and many others.
We are looking for Research Interns to work on cutting edge research in Multimodal AI. We are particularly interested in Research Interns with background in AI, NLP, and/or CV, including topics like Video/image understanding, document layout analysis, chart understanding, multi-page multi-document question answering, novel ways of leveraging LLMs for document/video/image understanding and solving problems inherent to large language models (grounding, retrieval-based generation, etc.). Familiarity with modern LLMs/VLMs is a plus, but not required.