Large language models to drive credible digital twins
PhD Project
Precision medicine through Digital Twins relies on the effective integration and processing of vast amounts of experimental and clinical data, computational models, and knowledge. Semantic annotations play a crucial role in automating knowledge retrieval, extraction, and integration. However, creating high-quality semantic annotations can be labour-intensive and subjective. As the field of computational biology generates an increasing amount of data and models, scaling up the annotation process to retrospectively annotate existing models is not feasible without a large investment of resources. This project aims to automate the semantic annotation of biological computational models by developing novel large language models (LLMs).
Natural language processing (NLP) can be employed to automatically extract knowledge from scientific literature and research papers. Structured model encoding formats (e.g., CellML, SBML, SED-ML) and formal descriptions of knowledge offer opportunities to develop more informed machine-learning methods. Students will explore innovative approaches to represent and incorporate prior knowledge into the machine-learning process, thus overcoming the limitations of purely data-driven approaches. The resulting LLMs will minimize the effort needed to build and extend knowledge graphs from the annotated models, simulation results, experimental data, etc., for future integration with other research projects.
Desired skills
Exposure to and interest in machine learning. Strong computational skills, including proficiency in at least one programming language (e.g., Python, C++). A computer science background is preferred but not necessary.
Contact and supervisors
Contact/Main supervisor
Supporting supervisor(s)
- David Nickerson
- Yuda Munarko
Page expires: 6 February 2025