With the rise of the Internet of Things (IoT), vast amounts of data are produced by sensors. To reduce latency and bandwidth requirements, systems that process this data are typically geo-distributed, with processing occurring on the device itself, on a nearby edge device, in a regional edge cloud, and in a central cloud. This has led to the emergence of data processing platforms in which multiple stakeholders contribute data or provide services that use that data. Such platforms involve collaboration across many parties: an infrastructure provider hosts the platform, different data owners contribute their data, and service providers develop and deploy services. For example, in an intelligent transport system, a road operator acts as the infrastructure provider, processing data from road users (the data owners) on behalf of service providers such as emergency services, mapping applications, or road maintenance companies. Similarly, an electricity grid operator (infrastructure provider) collects data from electricity meters installed in households and companies (data owners), which is processed by service providers such as governments, electricity producers, and electricity consumers.
In these scenarios, the different parties all benefit from the collaboration, but in practice they are often held back by concerns about trust. The data owners want to keep control over where and by whom their data is used, requiring data confidentiality. The service providers want to ensure their computations run as specified, requiring code integrity, and may need their proprietary code (e.g., ML models) to be kept confidential, requiring code confidentiality. The infrastructure provider must provide these guarantees to both parties.
A Trusted Execution Environment (TEE), such as Intel SGX, is a secure area of a processor into which code and data can be loaded. It guarantees confidentiality, i.e., that the code and data cannot be accessed from outside the TEE, and integrity, i.e., that the code is executed as specified. These guarantees are implemented in hardware and backed by the hardware vendor. In this project, we will leverage TEEs to provide the required trust guarantees for a distributed data processing platform. Such an application would consist of TEEs running in various locations – edge, edge cloud, cloud – each containing a part of a distributed stream processing platform (e.g., Apache Flink, Apache Spark).
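To make this trust guarantee concrete, the sketch below illustrates the remote-attestation idea that underpins TEEs: a data owner releases its data (or the key protecting it) only to an enclave whose code measurement matches a value it has approved. This is a toy illustration in plain Python with assumed names (enclave_report, provision_data_key); a real deployment would rely on hardware-signed attestation reports from the TEE, e.g., Intel SGX, rather than plain hashes.

```python
# Toy illustration of the trust primitive behind TEEs (not real SGX code):
# an enclave reports a cryptographic measurement (hash) of the code it loaded,
# and the data owner only provisions secrets if that measurement matches a
# value it already trusts. All names and structures here are illustrative.
import hashlib

APPROVED_SERVICE_CODE = b"def process(stream): ..."  # code the data owner audited
APPROVED_MEASUREMENT = hashlib.sha256(APPROVED_SERVICE_CODE).hexdigest()


def enclave_report(loaded_code: bytes) -> str:
    """Stand-in for the hardware-signed measurement a TEE produces at launch."""
    return hashlib.sha256(loaded_code).hexdigest()


def provision_data_key(reported_measurement: str) -> bytes:
    """The data owner releases the decryption key only to the approved code."""
    if reported_measurement != APPROVED_MEASUREMENT:
        raise PermissionError("enclave is not running the approved service code")
    return b"data-owner-secret-key"  # in reality: sent over an attested channel


# The approved code obtains the key; tampered code does not.
print(provision_data_key(enclave_report(APPROVED_SERVICE_CODE)))
try:
    provision_data_key(enclave_report(b"def process(stream): exfiltrate(stream)"))
except PermissionError as err:
    print("rejected:", err)
```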
In this project, you will explore and solve the challenges of building such a system. We plan to adapt an existing distributed stream processing framework to run within TEEs across distributed nodes, including secure communication between them. We envision that you will need to design a protocol that can convince a data owner that only pre-approved services can access their data. Moreover, it should remain possible to exploit typical features of stream processing platforms, such as scalability (adding/removing nodes), failure recovery, and dynamic migration of workloads, while maintaining the trust guarantees. This will lead to a TEE-based “trusted orchestrator” such that (1) service providers can deploy and upgrade services, (2) the infrastructure provider can scale and adapt its nodes dynamically, and (3) data owners can be certain their data remains confidential.
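As a minimal sketch of the orchestrator role described above (assumed class and method names, not a committed design): before a new processing node joins the pipeline, whether for scale-out, failure recovery, or workload migration, the orchestrator checks the node's attestation evidence against the measurements the data owners pre-approved. A real system would use hardware-backed attestation and attested channels instead of this stand-in.

```python
# Minimal sketch of a "trusted orchestrator" policy check: only attested nodes
# running pre-approved service code are admitted, and workloads only move
# between attested nodes. Names (TrustedOrchestrator, AttestationEvidence,
# admit, migrate) are illustrative assumptions, not an existing API.
from dataclasses import dataclass, field


@dataclass
class AttestationEvidence:
    node_id: str
    measurement: str  # hash of the code loaded into the node's TEE


@dataclass
class TrustedOrchestrator:
    approved_measurements: set[str]  # agreed upon with the data owners
    admitted_nodes: dict[str, str] = field(default_factory=dict)

    def admit(self, evidence: AttestationEvidence) -> bool:
        """Admit a node only if it runs pre-approved service code."""
        if evidence.measurement not in self.approved_measurements:
            return False
        self.admitted_nodes[evidence.node_id] = evidence.measurement
        return True

    def migrate(self, workload: str, src: str, dst: str) -> None:
        """Move a workload only between nodes that are already attested."""
        if src not in self.admitted_nodes or dst not in self.admitted_nodes:
            raise PermissionError("both endpoints must be attested before migration")
        print(f"migrating {workload}: {src} -> {dst}")


orch = TrustedOrchestrator(approved_measurements={"a1b2c3"})
orch.admit(AttestationEvidence("edge-1", "a1b2c3"))
orch.admit(AttestationEvidence("cloud-1", "a1b2c3"))
orch.migrate("sensor-aggregation", "edge-1", "cloud-1")
```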
This project is at the intersection of research in confidential computing (using TEEs) and distributed computing (using stream processing frameworks); we welcome applicants with experience in either. During this internship, you will work closely with us and become familiar with industrial applications of the research. We aim to find a fit such that the internship can become an integral part of your PhD (e.g., as an application case) and lead to a paper written in collaboration between you and us.
Duration: flexible, to be agreed upon (typically 3-4 months); the starting date is also flexible
Location: Antwerp (Belgium)
Profile: You should be enrolled in, or hold, a Ph.D. in Computer Science or Engineering. We are open to starting PhD students, students further along in their PhD, and post-docs. Experience with either stream processing platforms (Apache Flink, Spark, Beam) or trustworthy computing (Intel SGX/TDX, ARM TrustZone, AMD SEV) is required. Previous (or pending) publications in related domains (confidential computing, distributed computing) are a strong plus. Programming skills in Python, JavaScript/TypeScript, or Java are a plus. Language skills: English. You must relocate to Belgium for the duration of the internship; hybrid work is possible. This is a paid internship.
What you will do: You will explore how TEEs can be used to provide confidentiality and integrity guarantees for distributed stream processing systems. You will design and implement a protocol or algorithm to provide these guarantees, and build a prototype on top of an existing stream processing platform (either an open-source platform or our in-house platform). You will implement benchmarks and evaluate your results. Ideally, this project leads to a publication at a top academic venue.