Department Summary
Prescient Design is seeking exceptional graduate student interns with a strong research background in machine learning (ML), a passion for independent exploration, and the ability to develop and implement innovative ideas. Ideal candidates excel at conducting independent research, solving complex technical problems, and collaborating effectively within cross-functional teams.
Prescient Design, now an integral part of Genentech's Research and Early Development (gRED) organization, leverages cutting-edge machine learning technologies to revolutionize drug discovery and development. Our team comprises experts in machine learning and computational biology, dedicated to pioneering innovative solutions that expedite the creation of life-saving therapies.
This internship position is located in New York City, NY (on-site).
The Opportunity
As a Graduate Research Intern in Multimodal Representation Learning for Drug Discovery, you will:
Design Multimodal Encoder Models: Architect and implement language models that integrate diverse data modalities—such as textual information, genomic sequences, protein structures, and small molecules—into a unified embedding space.
Model Training: Use our large GPU clusters to train models on large-scale multimodal datasets, ensuring robust representation learning across diverse biological data types.
Performance Evaluation: Assess model performance using biotech-specific benchmarks, focusing on generating meaningful embeddings for complex, high-dimensional data.
Applied Use Case Analysis: Demonstrate the practical utility of the models in real-world scenarios, such as Retrieval-Augmented Generation (RAG), to showcase their effectiveness in biotechnological applications.
Program Highlights
Intensive 12-week, full-time (40 hours per week) paid internship.
Program start dates are in May/June (Summer).
A stipend, based on location, will be provided to help alleviate costs associated with the internship.
Ownership of challenging and impactful business-critical projects.
Opportunity to work with some of the most talented professionals in the biotechnology industry.
Who You Are
Required Education:
Must be currently enrolled in and pursuing a Master's degree or a PhD.
Required Majors:
Computer Science, Statistics, Applied Math, Physics, Bioinformatics, Computational Biology, or a related technical field.
Required Skills:
Programming Proficiency: Strong programming skills, particularly in Python; experience with machine learning frameworks such as TensorFlow or PyTorch.
Machine Learning Expertise: Solid understanding of machine learning concepts, including supervised and unsupervised learning, neural networks, and representation learning techniques.
Communication Skills: Excellent written and verbal communication abilities, with a capacity to work collaboratively in a multidisciplinary team environment.
Preferred Knowledge, Skills, and Qualifications
Experience with large language models (LLMs) and their development for representation learning.
Familiarity with processing and integrating diverse data types, such as text, genomic sequences, protein structures, and small molecules.
Prior exposure to drug discovery processes and biotechnological data analysis.
Experience with unsupervised and supervised contrastive learning techniques.
Excellent communication, collaboration, and interpersonal skills.
Complements our culture and the standards that guide our daily behavior & decisions: Integrity, Courage, and Passion.
Relocation benefits are not available for this job posting.
The expected salary range for this position, based on the primary location of New York, is $45-$50 per hour. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. This position also qualifies for paid holiday time off benefits.
#GNE-R&D-Interns-2025
Genentech is an equal opportunity employer, and we embrace the increasingly diverse world around us. Genentech prohibits unlawful discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin or ancestry, age, disability, marital status and veteran status.