Belgium
Bell Labs Internship on Source-aware Language Models (PhD)

Large Language Models (LLMs) have changed the landscape of AI use in personal and industrial applications. They present an opportunity to increase automation and improve the workflows of people in many different domains. However, LLMs can often misinterpret queries or training data, and sometimes mislead their users with confident-sounding yet erroneous answers. This issue arises in part because it is difficult to reliably identify the origin of the knowledge behind an LLM answer.

One way to increase users' confidence in an LLM answer is to determine where the knowledge in that answer comes from. For example, a user might learn whether a particular answer comes from internal Nokia technical documents, external vendor-specific documents, or non-technical documents, and use this to judge how much confidence to place in the answer and where to find related information. By extension, determining the source of information in this way can also help protect users from misinformation. Furthermore, a source-aware model enables accountability, by giving credit to the sources that contributed most to answering a particular question.

In this project, you will learn about specific methods that offer potential avenues for tracking the source of LLM knowledge, and you will be tasked with refining, implementing, testing, and extending those techniques to enable information source tracking in LLMs. This task requires familiarity with how LLMs are trained and how they encode information, as well as implementing new training methods that test the validity of assumptions about source tracking. You can also expect to research potential new application domains that can be used to test and evaluate the benefits of source-aware LLMs.
 

Duration: flexible, to be agreed (typically 3-4 months). The start date is also flexible.

Location: Antwerp (Belgium) or Stuttgart (Germany).

Student enrolled in a Ph.D. program in Computer Science or Engineering. Strong programming skills in Python. Some experience with LLMs. Experience with Explainable AI or training of LLMs is a plus.

Language skills: English

This is a paid internship.

You will learn how to leverage existing LLM techniques that offer potential avenues for designing source-aware LLMs. You will explore use cases and datasets that provide suitable training data for tracking the origin of LLM knowledge. You will implement new LLM training methods that capture information on knowledge origin. You will evaluate your results and iterate on the training method design to improve source learning. Ideally, this project leads to a publication at an academic venue.