IPE 05-2023 Master Thesis or Internship: Development of a Large-Scale Data Monitoring System for High-Throughput Streaming and Real-Time Analysis of Data-Intensive Scientific Equipment

  • Institute for Data Processing and Electronics (IPE)
  • Fixed-Term
  • Full-time

Your Tasks

Industrial and scientific applications increasingly rely on data-intensive scientific equipment (e.g., high-throughput cameras). Real-time data evaluation is therefore highly valuable for interactive monitoring. However, processing large amounts of data is challenging: it requires orchestrating an efficient data flow and implementing efficient parallel algorithms suited to modern hardware architectures.

In this project, you will build a data monitoring system for data-intensive scientific equipment using data processing and statistical methods. The real-time data processing will identify high-value targets for further analysis, thereby accelerating scientific discovery compared with traditional methods. The system will be used in several scientific pilot experiments. Notably, the KATRIN experiment (Karlsruhe Tritium Neutrino Experiment) has numerous sensor instruments that generate data at an unprecedented rate. You will work closely with our team of experienced researchers and engineers to design and implement a robust system that can handle the challenges of processing and analyzing large volumes of data in real time.

The goal of the work is to develop a data monitoring system that can manage such demanding experiments. You are expected to design the architecture of the system and integrate it with the real-time data processing pipelines of the experiment. In particular, you will implement diverse communication interfaces (e.g., REST or gRPC) and use Redis as the intermediate data storage.

Responsibilities:

  • Collaborate with the team to understand the requirements of the data monitoring system for the different scientific experiments.
  • Design and develop scalable and efficient algorithms and data structures for high-throughput data streaming and real-time analysis.
  • Implement software components and modules that integrate with existing data acquisition systems.
  • Test and debug the system to ensure its reliability, performance, and accuracy.
  • Optimize the system to handle large-scale data and maximize its efficiency and scalability.
  • Document the development process, including design decisions, implementation details, and troubleshooting steps.
  • Collaborate with the scientists to integrate the data monitoring system into the overall data analysis workflow.

Your Profile

  • Very good knowledge of and practical experience with Python and web development languages
  • Good understanding of cloud architecture development and/or database storage is a plus

Interested?

Carrasco Sanchez, Raquel

Tel: +49 721 608-42016

Contract duration

6 months

Contact person in line-management

For further information, please contact Dr. Nicholas Tan Jerome (e-mail: nicholas.tanjerome@kit.edu) or Dr. Suren Chilingaryan (e-mail: suren.chilingaryan@kit.edu).