Computer Science

Applications for 2024-2025 open on 1 July 2024.

Nearest Neighbor Search with GPUs

Project code: SCI075

Supervisor:

Ninh Pham

Discipline: School of Computer Science

Project description

Approximate nearest neighbor search (ANNS) is a central problem in many areas of computer science, e.g. recommender systems, large-scale classification, information retrieval, and multimodal LLM applications built on vector databases. While most ANNS solvers are CPU-based, this project will exploit advances in GPU parallel computing to accelerate search.

Objectives:
We will study and advance state-of-the-art randomized algorithms, which often parallelize well. We will design and implement these solutions with CUDA and compare their performance with the state-of-the-art GPU Faiss library [1].

[1] https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU
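As an illustration of the kind of randomized, parallel-friendly method referred to above, the sketch below implements random-hyperplane locality-sensitive hashing (SimHash) for cosine similarity in plain Python. It is an assumed example, not the project's method and not the Faiss API: the class name `HyperplaneLSH` and the parameters `n_bits` and `n_tables` are hypothetical. Each hash bit is an independent dot product, which is what makes this family of methods attractive on GPUs.

```python
import math
import random


def simhash(vector, planes):
    """One bit per random hyperplane: which side of the plane the vector lies on."""
    return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in planes)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index for cosine similarity."""

    def __init__(self, data, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        dim = len(data[0])
        self.data = data
        self.tables = []
        for _ in range(n_tables):
            # Gaussian normal vectors give hyperplanes through the origin in random directions
            planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            buckets = {}
            for idx, vec in enumerate(data):
                buckets.setdefault(simhash(vec, planes), []).append(idx)
            self.tables.append((planes, buckets))

    def query(self, q, k=1):
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(simhash(q, planes), []))
        if not candidates:
            candidates = range(len(self.data))  # fall back to an exact scan
        # re-rank the (small) candidate set exactly by cosine similarity
        return sorted(candidates, key=lambda i: -cosine(q, self.data[i]))[:k]
```

Candidates are collected from every table and then re-ranked exactly: more tables raise recall at the cost of memory, while more bits per table make buckets smaller. In a CUDA port, the projections for a batch of queries could be computed as one matrix multiplication.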

Prerequisites: CUDA library, C/C++, Python

Desired Output:
An open-source implementation of the proposed approach and an evaluation against GPU-based Faiss.

AI policy and regulations

Project code: SCI076

Supervisor:

Gillian Dobbie

Discipline: School of Computer Science

Project description

This project will produce a survey of legislation, regulations and policies governing the use of AI internationally. Countries and institutions are developing legislation, regulations and policies around the use of AI, and some have separate regulations for using AI and for research that uses AI. In this project, you will review various information sources to build a map of current legislation, regulations and policies, describe how NZ fits within the map, and suggest a way forward for NZ.
The skills you require are an interest in the application of AI in society and solid research skills.

Using deidentified CT brain scan data to predict brain health in older people

Project code: SCI077

Supervisor:

Gillian Dobbie

Discipline: School of Computer Science

Project description

This summer studentship is part of a larger study that aims to discover whether machine learning can be used with routinely collected health information to predict future brain health. As part of an ongoing HRC-funded project, the research team have collected de-identified data from users of older adult services in Te Whatu Ora Counties Manukau. Working with the Health Informatics department based at Middlemore Hospital, they have already established that they can download de-identified data from routinely collected health records, including sociodemographics, hospitalisation data, discharge destinations, ICD-10 diagnostic codes, delirium screening scores, care home placements, pharmacy data, lab results (e.g. HbA1c) and mortality data.

The aim of this summer studentship project is to add potential radiological predictors, specifically CT brain scan data, to the dataset. The project will explore the feasibility of using raw data from de-identified CT brain scans to augment the predictive dataset, helping the machine learning model identify patterns that may single out groups of individuals at higher risk of developing dementia. The study will focus on a random sample of patients who have had at least one CT brain scan, and will use machine learning methods that combine the scans with other information to predict dementia.

This study has the potential to significantly contribute to our understanding of brain health and advance our ability to predict and address dementia risk.

Skills required: Strong motivation and a willingness to learn. Python programming capabilities and the ability to pick up existing tools and knowledge. Machine learning knowledge is essential.

Generative AI and policy

Project code: SCI078

Supervisor:

Gillian Dobbie

Discipline: School of Computer Science

Project description

This project will produce a survey of legislation, regulations, policies and tools governing the use of Generative AI internationally. Countries and institutions are developing legislation, regulations, policies and tools around the use of Generative AI, and some have different regulations for different uses of Generative AI and for research that uses Generative AI. In this project, you will review various information sources to build a map of current legislation, regulations, policies and tools, describe how NZ fits within the map, and suggest a way forward for NZ.
The skills you require are an interest in the uses of Generative AI in different domains and solid research skills.

AI in Health policy and tools

Project code: SCI079

Supervisor: 

Gillian Dobbie

Discipline: School of Computer Science

Project description

This project will produce a survey of legislation, regulations, policies and tools governing the use of AI in health internationally. Countries and institutions are developing legislation, regulations, policies and related tools to ensure the safety of AI in health, and there may be different regulations for using AI in health and for research that uses AI in health. In this project, you will review various information sources to build a map of current legislation, regulations, policies and tools, and suggest what would enable healthy and thriving use of AI in health, in both society and research.
The skills you require are an interest in the uses of AI in Health and solid research skills.

Unconditional hardness of weak computational models

Project code: SCI080

Supervisor:

Marc Vinyals

Discipline: School of Computer Science

Project description

One of modern mathematics' greatest problems is whether P equals NP or, in other words, whether deterministic and nondeterministic polynomial-time Turing machines have the same computational power. We seem far away from answering this question, but we are able to prove that some problems are hard for limited computational models.

The first part of the project consists of the candidate becoming acquainted with one weak computational model, such as decision trees, communication protocols, propositional proofs, or monotone Boolean circuits: that is, surveying the cornerstone results and the research techniques used to prove hardness results in that model. This will involve reading books and research articles with guidance from the supervisor.

The second part consists of applying these techniques to attempt to prove that some particular task cannot be solved efficiently in the model of choice. This will involve research discussions with the supervisor consisting of brainstorming ideas and discussing technical roadblocks, developing ideas and formalising them into proofs individually, and writing down finished proofs. An example of a problem that may be attempted during the second part is designing a function whose DAG-like query complexity differs from its certificate complexity.
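To make one of these measures concrete, the brute-force sketch below computes the certificate complexity C(f) of a small Boolean function: for each input x, the size of the smallest set of variables whose values at x force the value f(x), maximised over all x. The function name `certificate_complexity` is hypothetical, and the search is exponential, so it is only a way to sanity-check candidate functions on a handful of variables (the project itself requires no programming). It does not compute DAG-like query complexity.

```python
from itertools import combinations, product


def certificate_complexity(f, n):
    """Brute-force C(f) for f: {0,1}^n -> {0,1}, given as a function of a tuple."""
    inputs = list(product((0, 1), repeat=n))

    def certificate_size(x):
        fx = f(x)
        for size in range(n + 1):
            for fixed in combinations(range(n), size):
                # `fixed` is a certificate if every input agreeing with x on it has the same value
                if all(f(y) == fx for y in inputs if all(y[i] == x[i] for i in fixed)):
                    return size
        return n  # unreachable: fixing all n variables is always a certificate

    return max(certificate_size(x) for x in inputs)
```

For example, OR on three variables has certificate complexity 3 (the all-zeros input needs every variable fixed), while a dictator function x -> x_i has certificate complexity 1.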

Skills required: mathematical maturity, including the ability to read and write formal proofs, and a strong background in discrete mathematics. Prior knowledge of theory of computation is strongly recommended but not essential. Programming experience is not required.

Algorithmic drills

Project code: SCI081

Supervisor:

Marc Vinyals
Michael Dinneen

Discipline: School of Computer Science

Project description

The goal of the project is to set up a number of drill exercises for students to practice algorithms. Each drill covers one algorithm, which students should be able to run by hand on as many different inputs as they wish, and obtain immediate feedback.

For example, in a drill on selection sort, a student gets a list of integers as input and enters the state of the list after the maximum element has been moved to its correct place. The student's answer is checked and the student is shown the correct state. The student then enters the list after the second-largest element has been moved into place, and the process repeats until all the necessary steps are done.
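A minimal sketch of the drill logic described above, assuming the max-selection variant in which each step swaps the largest unsorted element into the last unsorted position. The names `selection_sort_states` and `run_drill` are hypothetical; a real drill would also need a fixed convention for ties (here, the first occurrence of the maximum).

```python
def selection_sort_states(values):
    """List states after each step of selection sort that moves the maximum to the end."""
    a = list(values)
    states = []
    for end in range(len(a) - 1, 0, -1):
        # position of the largest element in the unsorted prefix a[0..end] (first one on ties)
        m = max(range(end + 1), key=lambda i: a[i])
        a[m], a[end] = a[end], a[m]
        states.append(list(a))
    return states


def run_drill(values, answers):
    """Check each student answer against the expected state; return (correct, expected) per step."""
    feedback = []
    for step, expected in enumerate(selection_sort_states(values)):
        given = list(answers[step]) if step < len(answers) else None
        feedback.append((given == expected, expected))
    return feedback
```

Steps in which the maximum is already in place still count here, so the student re-enters an unchanged list; a real drill might choose to keep or skip such steps.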

Each student should receive a different random input each time that they attempt an exercise. For each student, the system should record which exercises were successfully solved at least once by a certain deadline, and provide a way to export that information (e.g. as a text file containing UPIs and solved exercises).
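The record-keeping requirement could look roughly like this, assuming each attempt is stored as a hypothetical `(upi, exercise, correct, timestamp)` tuple. Only exercises solved correctly at or before the deadline count, and the export is one tab-separated line per student; students with no qualifying solutions are simply omitted, which a real export might handle differently.

```python
from datetime import datetime


def solved_by_deadline(attempts, deadline: datetime):
    """Map each UPI to the set of exercises solved correctly at or before the deadline."""
    solved = {}
    for upi, exercise, correct, when in attempts:
        if correct and when <= deadline:
            solved.setdefault(upi, set()).add(exercise)
    return solved


def export_lines(solved):
    """One tab-separated line per student: UPI followed by sorted exercise names."""
    return ["\t".join([upi] + sorted(exercises)) for upi, exercises in sorted(solved.items())]


def write_export(path, solved):
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(export_lines(solved)) + "\n")
```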

The first part of the project consists of evaluating and selecting an appropriate framework. Candidates to explore include CodeRunner, Canvas, building on a similar existing open-source system, or designing a new system from scratch.

The second part consists of implementing a number of drills covering a few of the following algorithms:

* Selection sort
* Insertion sort
* Merge sort
* Quicksort
* Heap (insert/delete)
* BST (insert/find/delete)
* Hashing (chaining) (same ops)
* Hashing (open address) (same ops)
* DFS (time at which each node is opened/closed)
* BFS (same)
* Dijkstra (same)
* Tarjan's SCC
* Bellman–Ford
* Prim's MST
* Kruskal's MST

Inputs, while random, should be carefully generated in a way that the algorithm being simulated takes a reasonable number of steps.
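One way to keep random inputs at a reasonable size, sketched here for the insertion sort drill: the number of element shifts insertion sort performs equals the number of inversions in its input, so inputs can be sampled at random and rejected until the inversion count falls in a chosen range. The function names and the value range 10 to 99 are assumptions for illustration; other drills would need their own measure of work (e.g. swaps, edge relaxations, or tree height).

```python
import random


def count_inversions(values):
    """Number of pairs i < j with values[i] > values[j]; O(n^2) is fine for drill sizes."""
    return sum(1 for i in range(len(values))
               for j in range(i + 1, len(values)) if values[i] > values[j])


def random_drill_input(n, min_inv, max_inv, rng=None, max_tries=10_000):
    """Sample distinct two-digit integers until insertion sort would do min_inv..max_inv shifts."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        values = rng.sample(range(10, 100), n)
        if min_inv <= count_inversions(values) <= max_inv:
            return values
    raise ValueError("no suitable input found; widen the inversion range")
```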

Skills required: good programming skills, preferably in Python, and familiarity with the algorithms above. Web development skills can be helpful but are not required.

Improve functionality and performance of assignment automarker used in algorithms classes

Project code: SCI082

Supervisor:

Michael J. Dinneen

Discipline: School of Computer Science

Project description

We need to add new features to one of the automated marking platforms used in our computer science algorithms courses.
We also want to fix a few operational issues and improve performance for test and exam environments, moving from sequential to parallel marking as we migrate to the AWS cloud.
You will need to be fluent in (or learn quickly) Linux, Docker, Java, PHP and possibly JavaScript. You can focus on either the front end (web/GUI) or the back end (scheduling and marking performance) of the application.

Repository for Quantum Computing local resources

Project code: SCI083

Supervisor:

Michael J. Dinneen

Discipline: School of Computer Science

Project description

The computer science theory group wants an easy way to disseminate its research outputs on quantum computing: mainly published and local articles and student theses, but also code (e.g. for D-Wave quantum annealers) and research on random quantum strings.
A data scientist is needed to update the local web server to achieve these goals, with the main emphasis on easy maintenance and updating as new resources are added.

Using Generative AI to understand extreme rainfall under climate change

Project code: SCI084

Supervisor:

Yun Sing Koh (CS)
Neelesh Rampal (NIWA)

Discipline: School of Computer Science

Project description

What would events like Cyclone Gabrielle and the Auckland Anniversary flooding look like under climate change? Will they become more extreme? This project aims to use Generative AI techniques such as Generative Adversarial Networks (GANs) and diffusion models to increase the spatial resolution of climate models, which are typically run at spatial resolutions of only ~100 km. This means that a single grid cell is the size of the Auckland Region! To better understand how extreme events will change in a warmer climate, we need higher-resolution climate projections (<12 km). The main advantage of Generative AI is that it is cost-effective and can be over 10,000 times faster than a physics-based model. The successful student will need a strong programming background, preferably in Python. Experience with TensorFlow or PyTorch will be advantageous but is not necessary. A sound understanding of physics and/or mathematics will also be beneficial but is not necessary.

Extremal number of bipartite graphs

Project code: SCI085

Supervisor:

Rajko Nenadov

Discipline: School of Computer Science

Project description

A central problem in extremal combinatorics is to determine how many edges a graph with n vertices can have without containing a given graph H as a subgraph. When H has chromatic number at least 3, this question was resolved asymptotically by Erdős and Stone in 1946. However, when H is bipartite, the problem is still wide open. There has been significant progress in the last decade, and the goal of the project is to further advance some of these recent results. For example, Bradac, Janzer, Sudakov, and Tomon (https://people.math.ethz.ch/~sudakovb/turan-number-of-grid.pdf) studied this problem in the case where H is a 2-dimensional grid graph. Natural extensions would be to consider grids in higher dimensions, Cartesian products of trees, and so on.
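For reference, the standard statements behind this paragraph, writing ex(n, H) for the maximum number of edges in an n-vertex graph with no copy of H. The first line is the Erdős–Stone theorem in the chromatic-number form derived by Erdős and Simonovits; when H is bipartite (chromatic number 2) it only gives ex(n, H) = o(n^2), which is why the order of magnitude is the open question. The second line is the classical Kővári–Sós–Turán upper bound, which also bounds ex(n, H) for any H contained in some K_{s,t}.

```latex
\begin{align*}
\mathrm{ex}(n,H) &= \left(1 - \frac{1}{\chi(H)-1} + o(1)\right)\frac{n^2}{2}
  && \text{(Erd\H{o}s--Stone)}\\
\mathrm{ex}(n,K_{s,t}) &= O\!\left(n^{2-1/s}\right) \quad (s \le t)
  && \text{(K\H{o}v\'ari--S\'os--Tur\'an)}
\end{align*}
```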

Skills: COMPSCI 120, COMPSCI 225 or MATHS 254/255, and ideally MATHS 326 and STATS 120. Working on this type of problem combines algorithmic thinking with mathematical ideas. The project has no programming component.

Online Ramsey Games

Project code: SCI086

Supervisor:

Rajko Nenadov

Discipline: School of Computer Science

Project description

Consider the following one-player game. Start with the empty graph (or hypergraph) on n vertices. In each round, one new edge chosen uniformly at random is presented to the player, who must colour it with one of r colours. The game finishes once a monochromatic copy of some given graph H appears in the player's coloured graph. The main problem is to design a good strategy for the player and to analyse how long the game typically lasts if the player follows that strategy.
The graph case has been resolved, but the proof is long and technical. Using some new tools, the goal would be to provide a simplified proof of the graph case, and then extend it to the hypergraph case.
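The game is easy to simulate, which can help build intuition before attempting proofs (the project itself has no programming component). The sketch below assumes H is a triangle and uses a simple greedy rule chosen for illustration, not a strategy from the literature: colour each new edge with the colour that would complete the fewest monochromatic triangles. The function name `online_ramsey_triangle` is hypothetical.

```python
import itertools
import random


def online_ramsey_triangle(n, r, seed=None):
    """Play the game for H = triangle with a greedy strategy; return the losing round, or None."""
    rng = random.Random(seed)
    edges = list(itertools.combinations(range(n), 2))
    rng.shuffle(edges)  # revealing edges in a uniformly random order
    colour = {}  # frozenset({u, v}) -> colour index

    for round_no, (u, v) in enumerate(edges, start=1):
        # threats[c] = number of triangles that colouring (u, v) with c would complete
        threats = [0] * r
        for w in range(n):
            if w == u or w == v:
                continue
            cu = colour.get(frozenset((u, w)))
            cv = colour.get(frozenset((v, w)))
            if cu is not None and cu == cv:
                threats[cu] += 1
        c = min(range(r), key=lambda k: threats[k])
        colour[frozenset((u, v))] = c
        if threats[c] > 0:
            return round_no  # a monochromatic triangle has just appeared
    return None  # every edge was coloured without a monochromatic triangle
```

Since every 2-colouring of K_6 contains a monochromatic triangle, with n = 6 and r = 2 the function always returns a round between 3 and 15; for larger n, averaging the returned round over many seeds estimates how long the game lasts under this particular strategy.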

Skills: COMPSCI 120, COMPSCI 225 or MATHS 254/255, and ideally MATHS 326 and STATS 120. The problem belongs to the field known as “probabilistic combinatorics”. Working on this type of problem combines algorithmic thinking with mathematical ideas (discrete maths and probability). The project has no programming component.

Novel power-mean loss function for robust machine learning and AI

Project code: SCI087

Supervisor:

Ni Ding

Discipline: School of Computer Science

Project description

Deep learning models have significantly advanced the ability of machine learning (ML) to capture complex patterns, but practitioners often face difficulties at the training stage, e.g., class imbalance, outliers, and vanishing or exploding gradients. Instead of conventional interventions such as regularization, data augmentation or dropout, recent studies suggest revisiting and redesigning the loss function to address these training problems.

This project will propose a novel Alpha-power mean loss function for robust ML and AI training. By tuning the parameter Alpha, this power mean loss is expected to generalize existing ML loss/cost functions, e.g., least squares, SVM, and linear or logistic regression. We will explore how to calibrate Alpha to cope with various problems that arise at the model training stage. The relationship between the Alpha-power mean loss and Rényi cross-entropy will be further studied for deep learning and generative models. This project may also generalize the ELBO to Rényi measures for variational Bayesian methods, e.g., the variational auto-encoder (VAE).
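The description does not define the Alpha-power mean loss, so the following is only an assumed reading of its name: aggregate per-sample losses l_1, ..., l_n with the power mean M_alpha = ((1/n) * sum of l_i^alpha)^(1/alpha) instead of the arithmetic mean. The function names are hypothetical. With alpha = 1 and squared errors this reduces to the usual mean squared error; alpha -> 0 gives the geometric mean and alpha -> infinity the maximum loss.

```python
import math


def power_mean(values, alpha):
    """Power mean M_alpha of positive values, including the limiting cases."""
    n = len(values)
    if alpha == 0:
        # limit alpha -> 0: geometric mean
        return math.exp(sum(math.log(v) for v in values) / n)
    if math.isinf(alpha):
        # limits alpha -> +inf / -inf: maximum / minimum
        return max(values) if alpha > 0 else min(values)
    return (sum(v ** alpha for v in values) / n) ** (1.0 / alpha)


def squared_errors(y_true, y_pred):
    """Per-sample least-squares losses, to be aggregated with power_mean."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
```

Under this reading, alpha > 1 puts more weight on the largest per-sample losses (e.g. hard or minority-class examples), while alpha < 1 down-weights them and so limits the influence of outliers; alpha <= 0 requires strictly positive losses.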

Skills required: A strong background in statistical machine learning; knowledge of statistics (e.g., some experience in probability elicitation) is desirable; Python or R programming; self-motivated and eager to learn.

Improving data utility in pufferfish privacy

Project code: SCI088

Supervisor:

Ni Ding

Discipline: School of Computer Science

Project description

Pufferfish privacy generalizes the well-known differential privacy by guaranteeing statistical indistinguishability in the presence of intrinsic probabilistic correlations between sensitive attributes and published data. It provides a more realistic model of data-sharing environments.

While the conventional approach is to calibrate the noise-adding mechanism to the maximum Wasserstein distance, a recent study shows that this sufficient condition can be relaxed by a convolution operation.
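The conventional calibration mentioned above corresponds to the Wasserstein Mechanism of Song, Wang and Chaudhuri (2017): add Laplace noise with scale W/epsilon, where W is the largest infinity-Wasserstein distance between the distributions of the query output conditioned on two different secrets. The sketch below assumes a single data distribution, scalar queries, finite output distributions given as hypothetical `{value: probability}` dicts, and treats every pair of secrets as discriminative; it does not implement the convolution-based relaxation the project studies. On the real line, W_infinity can be read off the quantile functions because the monotone coupling is optimal.

```python
import itertools
import random


def quantile(dist, t):
    """Smallest value x with CDF(x) >= t, for a finite distribution {value: probability}."""
    acc = 0
    for x in sorted(dist):
        acc += dist[x]
        if acc >= t:
            return x
    return max(dist)  # guards against float rounding when t is close to 1


def w_infinity(p, q):
    """Infinity-Wasserstein distance between two finite distributions on the real line."""
    cuts = {0}
    for dist in (p, q):
        acc = 0
        for x in sorted(dist):
            acc += dist[x]
            cuts.add(acc)
    cuts = sorted(cuts)
    # both quantile functions are constant between consecutive cut points,
    # so one midpoint per interval covers every level t in (0, 1)
    return max(abs(quantile(p, (a + b) / 2) - quantile(q, (a + b) / 2))
               for a, b in zip(cuts, cuts[1:]))


def wasserstein_scale(conditionals, epsilon):
    """Laplace scale W / epsilon, maximising W over all pairs of secrets."""
    w = max(w_infinity(p, q) for p, q in itertools.combinations(conditionals, 2))
    return w / epsilon


def release(true_value, scale, rng=random):
    """Add Laplace(0, scale) noise, sampled as a difference of two exponentials."""
    if scale == 0:
        return true_value
    return true_value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
```

Any relaxation that certifies a smaller noise scale than `wasserstein_scale` for the same privacy guarantee translates directly into higher utility of the released value, which is the quantity the project proposes to measure.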

The purpose of this project is to investigate this relaxation by addressing the question: how much can the noise be reduced while still preserving pufferfish privacy? A quantitative study of the resulting improvement in data utility will then be conducted. The outcome will form the foundation for a novel utility-enhancing privacy-preserving mechanism, experimentally verified on real-world datasets.

Skills required: Strong in statistics and data science; capability to derive some theoretical results; Python or R programming; self-motivated and eager to learn.

Designing an Efficient RISC-V-based AI Accelerator

Project code: SCI089

Supervisor: 

Bruce Sham

Discipline: School of Computer Science

Project description

First, the candidate should understand the RISC-V instruction set architecture (ISA) and the specific AI algorithms to be accelerated, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or transformers. Then, the candidate should profile the execution of these algorithms on a general-purpose RISC-V processor to identify their computational and memory-access bottlenecks. Based on the identified bottlenecks, the candidate will design a specialized RISC-V-based AI accelerator. The design could involve adding custom instructions to the RISC-V ISA to accelerate the identified bottlenecks; for example, vector instructions for the matrix multiplications in neural networks. Next, the custom instructions will be implemented using high-level synthesis. Finally, the candidate will evaluate the accelerator by running the AI algorithms on it and comparing the performance with execution on a general-purpose RISC-V processor to quantify the speedup achieved.
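Before profiling, a back-of-the-envelope count of multiply-accumulate (MAC) operations and of arithmetic intensity can indicate which layers are likely to be compute-bound and which memory-bound. The helpers below are hypothetical; convolution MACs are counted from the output feature-map size (ignoring bias and activation), and the matrix-multiplication intensity assumes an idealised memory traffic in which each operand is read once and the result written once, with 4-byte elements by default.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k_h, k_w):
    """MACs for a dense 2D convolution, counted from the output feature-map size (no bias)."""
    return h_out * w_out * c_out * c_in * k_h * k_w


def dense_macs(n_in, n_out):
    """MACs for a fully connected layer (no bias)."""
    return n_in * n_out


def matmul_intensity(m, k, n, bytes_per_elem=4):
    """FLOPs per byte for (m x k) @ (k x n), assuming each matrix moves to/from memory once."""
    flops = 2 * m * k * n  # one multiply and one add per MAC
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic
```

For a square n x n x n multiplication the intensity works out to n/6 FLOPs per byte with 4-byte elements, so small matrices are likely to be memory-bound on a simple core, however fast the custom arithmetic instructions are.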

Skills required: Strong motivation and a willingness to learn. Strong programming skills. Computer architecture knowledge is not essential, although the candidate will need to pick up relevant computer architecture and machine learning skills along the way.

Expected Outcome: The expected outcome of this project is a RISC-V-based AI accelerator that can efficiently execute AI algorithms, providing significant speedup over a general-purpose RISC-V processor.

Leveraging AI for Efficient Physical Design in Electronic Design Automation (EDA)

Project code: SCI090

Supervisor:

Bruce Sham

Discipline: School of Computer Science

Project description

First, the candidate will study the EDA process, focusing in particular on physical design, which includes steps such as placement, routing, and timing analysis. The candidate then needs to identify the challenges in the physical design process, such as wirelength minimization, timing closure, and power reduction. The next step is to develop AI models to address these challenges. For example, you could use reinforcement learning to optimize the placement of cells, or a convolutional neural network (CNN) to predict the timing delay of a given placement. The candidate then trains the AI models on a dataset of past designs; the models should learn to predict the quality of a design (in terms of power, performance, and area) and suggest improvements. Next, the candidate will integrate the trained AI models into EDA tools so that they can guide the physical design process and make it more efficient. Finally, the candidate will evaluate the effectiveness of the AI-based approach.
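Wirelength in placement is commonly estimated with the half-perimeter wirelength (HPWL): for each net, half the perimeter of the bounding box around its pins. The sketch below assumes each pin sits at its cell's (x, y) position, ignoring pin offsets within cells, and its names are hypothetical.

```python
def hpwl(net, positions):
    """Half-perimeter wirelength of one net: width plus height of its pins' bounding box."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum of HPWL over all nets; lower is better for a placement."""
    return sum(hpwl(net, positions) for net in nets)
```

As an illustrative assumption rather than a method prescribed by the project, a reinforcement-learning placer could receive the negative change in `total_hpwl` as part of its reward after each cell move.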

Skills required: Strong motivation and a willingness to learn. Strong programming skills. Electronic Design Automation (EDA) knowledge is not essential, although the candidate will need to pick up relevant EDA and machine learning skills along the way.

Expected Outcome: The expected outcome of this project is an AI-based approach that can make the physical design process in EDA more efficient, producing high-quality designs in less time.

Full-Body Interaction within a Virtual Reality Art Installation

Project code: SCI091

Supervisor:

A/P Danielle Lottridge
Dr Becca Weber

Discipline: School of Computer Science

Project description

This project explores the future of interaction in VR: full-body interaction with AI agents within an immersive environment, where users experience responsive audio-visual feedback based on real-time body tracking. The student can follow their own passion within this project, which could mean integrating AI agents into the Unity code, creating custom sound effects, creating custom visual effects, or investigating how users feel and move in these environments. The research goal is to understand how the VR installation affects sensory perception, embodiment, and subjective experience. Over the summer, we will work with dance experts to iteratively develop the agents and interaction. This summer research project will contribute to an installation that will be made public. It is part of a larger project that is likely to lead to Masters and PhD topics and to collaborations with researchers at universities abroad.

Skills required: Interest in VR development. Experience with Unity is a plus.

Augmented reality stroke rehabilitation game from te whare tapa whā

Project code: SCI092

Supervisor:

A/P Danielle Lottridge

Discipline: School of Computer Science

Project description

We reconceptualise healthtech for elders through the Māori model ‘te whare tapa whā’ (the four cornerstones of health; Durie, 1994): it builds on iwi involvement and concurrently supports physical, mental, whānau (family) and spiritual health with interactive activities and experiences. Our team includes a partnership with ARA, a Māori augmented reality development company, with whom we iteratively co-designed the first version of the software with Māori communities. Together, we will interact directly with kaumatua (elders) to better understand their experiences and their motivation to engage in rehabilitation. The project will involve at-home interviews with people living with stroke, as well as analysis of usage analytics.

Automatic assessment of accessibility, visual design, and interactivity of websites

Project code: SCI093

Supervisor: 

A/P Danielle Lottridge
Dr Gerald Weber

Discipline: School of Computer Science

Project description

Web technologies are foundational and continue to be widespread, and front-end development skills are in high demand. This project pursues the automatic, dynamic assessment of web pages by executing them as a typical browser would, using the Selenium WebDriver framework. In this project you will write custom code to assess and interact with web components through browser-specific drivers, extending the functionality to assess visual Gestalt principles and interactivity programmatically. Selenium enables remote control of a browser and mimics user actions such as button clicks, drag and drop, selection, checkboxes, key presses, taps, and scrolling. The tool is intended for education: to help students better understand accessibility guidelines and develop visual design skills. There are options to explore how machine learning classifiers and/or LLMs can support student learning of visual design and accessibility in web technologies.
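As one example of a programmatic accessibility check, the sketch below computes the WCAG 2.x contrast ratio between text and background colours and tests it against the AA thresholds (4.5:1 for normal text, 3:1 for large text). In a Selenium-based checker, the colours could come from `WebElement.value_of_css_property("color")` and `value_of_css_property("background-color")`, which browsers typically report as `rgb(...)` or `rgba(...)` strings. The helper names are hypothetical, and a real checker would also need to resolve transparent backgrounds by walking up the element's ancestors.

```python
import re


def parse_css_rgb(css_value):
    """Parse 'rgb(r, g, b)' or 'rgba(r, g, b, a)' into an (r, g, b) tuple; alpha is ignored."""
    r, g, b = (int(float(x)) for x in re.findall(r"[\d.]+", css_value)[:3])
    return r, g, b


def _linearize(channel):
    """sRGB channel (0-255) to linear light, per the WCAG 2.x relative-luminance definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text=False):
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For instance, mid-grey text `#777777` on white falls just short of 4.5:1, while `#767676` passes, which makes such borderline cases useful teaching examples.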

Mobile-based Indoor Position Tracking for Elderly Care

Project code: SCI094

Supervisor:

Jing Sun

Discipline: School of Computer Science

Project description

Dementia is a complex and progressive neurodegenerative condition. Around 70,000 New Zealanders currently live with dementia, and there is evidence suggesting that Māori and Pacific peoples are at a greater risk of developing dementia, often at an earlier age. Symptoms such as cognitive decline, memory loss, confusion, and behavioral changes affect not only those living with dementia but also place a strain on whānau, caregivers, and healthcare systems. People with dementia often like to walk around their homes or leave to walk around the neighborhood, a behavior sometimes referred to as wandering. Wandering can result in the person with dementia getting lost or encountering dangers such as falls. This concern drives our project to develop an innovative indoor tracking system called SMART-Dementia.

SMART-Dementia has the potential to reduce the risks associated with wandering and to enhance safety, freedom, and dignity among people living with dementia. Our system comprises Ultra-Wideband (UWB) microcontrollers, a mobile app for whānau/caregivers, and an administrator dashboard. We preliminarily tested SMART-Dementia with healthy university students in 2023, and it is now ready for field testing with people with dementia in the real world. The aim of this project is to extend the software capabilities and functionality of the existing prototype, and to take the system through a real-world feasibility study, deployment, and evaluation. Please be advised that the research project/group responsible for SMART-Dementia development retains all intellectual property rights for the software and any subsequent extensions.
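The description does not say how SMART-Dementia computes positions, so the following is a generic sketch under an assumption: that the UWB hardware reports distances from a worn tag to fixed anchors at known coordinates. Subtracting the first anchor's range equation from the others gives a linear system, solved here by least squares for a 2D position. The names `trilaterate` and `simulated_ranges` are hypothetical, and a real deployment would also need filtering for noisy or non-line-of-sight ranges.

```python
import math
import random


def trilaterate(anchors, distances):
    """Least-squares 2D position from ranges to three or more non-collinear anchors."""
    (x0, y0), d0 = anchors[0], distances[0]
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        # subtracting anchor 0's circle equation from anchor i's removes x^2 + y^2
        a = 2 * (xi - x0)
        b = 2 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        rows.append((a, b, c))
    # normal equations (A^T A) p = A^T c for the two unknowns p = (x, y)
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is ambiguous")
    return (sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det


def simulated_ranges(point, anchors, noise_sd=0.0, rng=None):
    """Distances from point to each anchor, optionally with Gaussian ranging noise (for testing)."""
    rng = rng if rng is not None else random.Random()
    return [math.dist(point, a) + rng.gauss(0.0, noise_sd) for a in anchors]
```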

Skills required: Strong motivation and a willingness to learn. Experience in mobile and server-side application development, as well as AWS deployment, is preferred. Hardware programming knowledge is not essential, although the candidate will need to acquire relevant skills in using UWB devices.

Inferring Question Difficulty Levels in Isomorphic Questions

Project code: SCI095

Supervisor:

Mano Manoharan

Discipline: School of Computer Science

Project description

Two questions are isomorphic if they test the same learning outcome. They are useful as repeat assessments or as practice questions. If we use them for certain types of assessments – such as exams – we would like all of them to be of the same difficulty level.
Creating isomorphic questions by hand is time-consuming, so we use tools to generate them programmatically. In this case, we need a mechanism to judge the difficulty levels automatically and flag questions that don’t have the same level of difficulty.
In this project, you will look at how the difficulty levels can be inferred and reported in the context of Dividni (https://dividni.com).
A good knowledge of C# programming and MSIL is required.
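One simple, data-driven way to flag variants, assuming student response data are available (this is not a description of how Dividni works): compute each variant's facility index, the proportion of correct responses, and flag variants that deviate from the group mean by more than a tolerance. The function names, the 0.1 tolerance and the 20-response minimum are illustrative assumptions; a fuller treatment might use item response theory or a significance test instead of a fixed threshold.

```python
from statistics import mean


def facility_index(responses):
    """Proportion of correct responses (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def flag_variants(results, tolerance=0.1, min_responses=20):
    """Return {variant: deviation from group mean} for variants off by more than tolerance."""
    usable = {v: r for v, r in results.items() if len(r) >= min_responses}
    if not usable:
        return {}
    facility = {v: facility_index(r) for v, r in usable.items()}
    centre = mean(facility.values())
    return {v: round(p - centre, 3) for v, p in facility.items() if abs(p - centre) > tolerance}
```

A positive deviation marks a variant that students find easier than its siblings, and a negative one a harder variant.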

Modelling Causal Behaviour Using Inverse Reinforcement Learning

Project code: SCI096

Supervisor:

Michael Witbrock

Discipline: School of Computer Science

Project description

Humans and animals respond to their external and internal environments to reach specific objectives. This results in causal behaviour: a sequence of decisions in which earlier decisions influence later ones. Understanding this influence can help us predict the behaviour of humans or animals as a function of certain environmental factors. This project aims to use machine learning techniques, particularly inverse reinforcement learning, to model the causal behaviour of humans and animals. Inverse reinforcement learning provides a framework for building a utility function that recovers the dependencies between events in a sequence, under the assumption that each event is conditioned on past events and maximises future utility. By choosing this project, you will be supervised by Prof. Michael Witbrock and will work with the team in the Strong AI Lab led by him. The Strong AI Lab is one of the leading research groups that aims to promote AI in various fields, including but not limited to natural language processing, social good, ethical robotics and industrial manufacturing. The Strong AI Lab already has substantial experience and knowledge in modelling causal behaviour, reflected in the datasets and papers it has published in top-ranked AI venues.

This is a 10-week project that includes both research and development. By participating in this project, you will have a chance to publish your research in internationally prestigious AI venues, and potentially to turn your research outcome into a product or an open-source tool.

Enhancing Retrieval Accuracy and Efficiency for AI Lab Assistant Using Optimized Data Structures

Project code: SCI097

Supervisor:

Michael Witbrock

Discipline: School of Computer Science

Project description

Our AI lab assistant aims to support research activities by efficiently retrieving and presenting relevant information. To achieve this, the assistant relies on advanced retrieval techniques and robust data structures. This project focuses on enhancing the retrieval accuracy and efficiency of the AI lab assistant through the optimization of its retrieval model and underlying data structures.

Efficient data retrieval is crucial for providing accurate and timely information. Optimizing data structures such as inverted indexes, tries, and hash tables can significantly improve the speed and accuracy of information retrieval. Additionally, advanced retrieval techniques, including dense retrieval using BERT and vector embeddings, will be explored to enhance the assistant's performance.
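For the inverted-index part, a minimal sketch: map each term to the set of documents containing it, and answer a conjunctive (AND) query by intersecting posting sets, starting from the shortest. The tokenizer and function names are hypothetical and deliberately simple; production indexes typically store sorted, compressed posting lists and add ranking, and dense retrieval with embeddings would sit alongside this rather than replace it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; deliberately naive (no stemming or stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Documents containing every query term, intersecting the shortest posting set first."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result
```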

By choosing this project, you will be supervised by Prof. Michael Witbrock and will work with the team in the Strong AI Lab led by him. The Strong AI Lab is one of the leading research groups promoting AI and has extensive experience and knowledge in the field, reflected in the datasets and papers published in top-ranked AI venues.

This is a 10-week project that includes both research and development. By participating in this project, you will have the chance to contribute to improving a practical AI application, with the potential to publish your research work in internationally prestigious AI venues and make your research outcome a product or an open-source tool.

Using AI to predict behaviour in team sports

Project code: SCI098

Supervisor:

Patrice Delmas

Discipline: School of Computer Science

Project description

The use of AI for analysing team player behaviour during games, using video recordings and data analytics from sports channels (such as body sensors, which are already available for most professional sports), has the potential to improve player and team performance, among other applications for viewers.

The candidate will first provide a brief overview of the state of the art, both for behavioural analysis in team sports and for existing annotated datasets.

Leveraging the above, and using the expertise and datasets of our professional sports and broadcasting partners, the candidate will trial the best existing machine learning techniques for individual behaviour tracking and, depending on progress, will attempt to link this behaviour to recorded game events (as provided by our sports team partner) such as fouls, scoring, injuries and so on. The sports studied will be one or more of the following, depending on available datasets and task complexity: basketball, netball, rugby league, soccer, rugby union.

Skills required: Strong motivation and a willingness to learn. Some Python programming capabilities and the ability to pick up existing tools and knowledge. Computer vision knowledge is not essential, although the candidate will need to pick up relevant skills in computer vision and machine learning along the way.

Zero waste management using AI in the building construction industry

Project code: SCI099

Supervisor:

Patrice Delmas
Robert Amor

Discipline: School of Computer Science

Project description

The project seeks to minimize waste at the construction stage by automatically identifying and measuring leftover materials at building construction sites and estimating their potential for recycling or reuse. Building on existing intelligent recycling bins equipped with cameras that capture any newly deposited material, the goal is to identify, sort, and repurpose construction materials, starting with timber, which makes up as much as 40% of construction and demolition (C&D) waste and has significant repurposing potential (towards a zero-waste target). The project will use existing datasets created by our industry partner. The student will review the relevant literature and implement deep learning and machine learning algorithms, combined with simple computer vision techniques, to identify timber and plastic waste and infer information such as material type and volume.

Skills required: Strong motivation and a willingness to learn. Some Python programming experience and the ability to pick up existing tools and knowledge. Computer vision knowledge is not essential, although the candidate will need to pick up relevant skills in computer vision and machine learning along the way.

A Server Performance Prediction Tool

Project code: SCI100

Supervisor:

Xinfeng Ye

Discipline: School of Computer Science

Project description

Cloud computing has emerged as a dominant paradigm for delivering on-demand web services. Within this framework, IT operations engineers play a crucial role in continuously monitoring servers' key performance indicators (KPIs) and taking proactive measures to uphold service level agreements with clients. A key challenge is to predict these KPIs accurately so that client needs can be met more efficiently. The project aims to develop a machine learning model for forecasting server KPIs. With such predictive analytics, service providers can not only react to server performance issues but also proactively optimize resource allocation to meet clients' requirements more effectively.
Python programming skills are essential for the project. During the project, the student is expected to acquire knowledge of building neural networks to model the relationships between the various performance parameters of servers.
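As a point of reference before any neural network is built, a classical baseline for forecasting a single KPI series (for example, CPU utilisation sampled hourly) is an autoregressive model that predicts the next value from the previous few. The sketch below fits such a model by least squares with NumPy; the function names, the lag count and the synthetic daily-cycle series are illustrative assumptions, not part of the project specification.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Return (X, y): row t of X holds series[t:t+n_lags], and y[t] = series[t+n_lags]."""
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def fit_ar(series, n_lags=3):
    """Fit an autoregressive model with an intercept by ordinary least squares."""
    series = np.asarray(series, dtype=float)
    X, y = make_lagged(series, n_lags)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # n_lags lag weights followed by the intercept


def forecast_next(series, coef):
    """One-step-ahead forecast from the most recent observations."""
    n_lags = len(coef) - 1
    recent = np.asarray(series[-n_lags:], dtype=float)
    return float(recent @ coef[:-1] + coef[-1])


# Synthetic KPI (hypothetical): a 24-step daily cycle around 50% utilisation.
t = np.arange(200)
kpi = 50 + 10 * np.sin(2 * np.pi * t / 24)
# A pure sinusoid is exactly an AR(2) process, so two lags suffice here.
coef = fit_ar(kpi[:-1], n_lags=2)
print(forecast_next(kpi[:-1], coef), kpi[-1])
```

Real server KPIs are noisy and depend on each other, which is where the neural network models targeted by the project would come in; a baseline like this mainly gives a number to beat.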

Machine learning for mass spectrometry data analysis

Project code: SCI101

Supervisor:

Katerina Taskova

Discipline: School of Computer Science

Project description

As new mass spectrometry (MS) technologies are rapidly developed to cope with the complexity of biological samples in the environmental and biomedical sciences, standard MS analysis tools fail to exploit the full potential of the data these technologies produce. For example, top-down tandem MS has been extremely useful in studying metal-protein interactions and is relevant to the development of anti-cancer metal-based drugs. Manual identification of the binding sites of metal-based drugs is extremely difficult and error-prone, and often only the most intense peaks are assigned. Nevertheless, it remains the common approach in the absence of effective automated methods for this problem.

New computational methods are therefore needed, and machine learning (ML) algorithms would be particularly valuable for coping with the complexity, noise and volume of MS data. More specifically, the overall problem resembles challenges in ML for time series analysis. You will investigate the use of time series analysis techniques to match and identify specific peak patterns in MS data.
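One simple time series technique for locating a known peak pattern is normalized cross-correlation: slide the pattern along the spectrum and compute the Pearson correlation at each offset. The sketch below assumes the spectrum has already been binned onto an evenly spaced m/z grid, and the three-peak template is invented for illustration; real MS data would also need noise handling, tolerance to m/z shifts, and learned rather than hand-picked patterns.

```python
import numpy as np


def match_pattern(spectrum, template):
    """Pearson correlation between the template and every same-length window of the spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    template = np.asarray(template, dtype=float)
    m = len(template)
    z_t = (template - template.mean()) / template.std()
    scores = np.zeros(len(spectrum) - m + 1)
    for i in range(len(scores)):
        window = spectrum[i:i + m]
        sd = window.std()
        if sd > 0:  # flat windows carry no shape information; leave their score at 0
            # Mean product of z-scores (population std) equals Pearson's r.
            scores[i] = np.mean((window - window.mean()) / sd * z_t)
    return scores


# Hypothetical binned spectrum containing a scaled copy of the pattern at bin 40.
template = np.array([1.0, 0.6, 0.2])
spectrum = np.zeros(100)
spectrum[40:43] = 5 * template
scores = match_pattern(spectrum, template)
print(int(np.argmax(scores)), round(float(scores.max()), 3))  # 40 1.0
```

Because the correlation is invariant to scaling, the pattern is found regardless of peak intensity, which matters when low-intensity peaks are the ones manual assignment tends to miss.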

Recommended skills: Basic knowledge of machine learning, good knowledge of Python, and openness to learning about mass spectrometry data and collaborating with chemists.

AI for Climate Change: Detection of sea urchin barren reefs from underwater imagery

Project code: SCI102

Supervisor:

Katerina Taskova

Co-supervisor: Arie Spyksma (Institute of Marine Science)

Discipline: School of Computer Science

Project description

Kelp forests are among the most productive ecosystems on Earth, but climate-driven impacts are causing widespread kelp habitat loss. For example, the climate-driven proliferation of the long-spined sea urchin is one of the most urgent threats to kelp forests in south-eastern Australia and north-eastern New Zealand.

Assessing this threat requires the collection and (typically manual) analysis of underwater imagery spanning tens to hundreds of kilometres of reef. The high contrast of sea urchins against barren reef makes this task an ideal candidate for modern computer vision solutions based on machine learning (ML) algorithms, which could dramatically improve annotation and analysis.

Using existing image-based monitoring data, you will develop and test ML algorithms to detect the presence and extent of urchin barren expansion in Australia and New Zealand.

Recommended skills: This project is suitable for students with basic skills in maths, statistics, machine learning and image analysis, and intermediate programming skills in Python. Familiarity with convolutional neural networks and programming experience in PyTorch will be beneficial (but are not necessary and can be learned while working on the project).

Reliable machine learning for predator identification

Project code: SCI103

Supervisor:

Katerina Taskova

Discipline: School of Computer Science

Project description

Large datasets are now routinely collected from digital cameras and other sensing technologies, and these need to be integrated and analyzed efficiently and intelligently to address biosecurity problems (such as predator identification) successfully at operational scale. Recent advances in low-cost sensing technology, computer vision and deep learning methodology have enabled new opportunities for developing zero-tolerance technology for predator monitoring and trapping in large, forested and complex environments.

While deep learning models have seen enormous success in computer vision due to their high expressiveness compared with traditional shallow models, they lack well-motivated methods for accurately estimating their confidence in a prediction. They can be “overconfident” on images that a human would clearly rule out as irrelevant to the prediction task or as not even containing the object of interest.

This project will investigate methods for quantifying uncertainty in deep learning model predictions. The goal is to develop actionable deep learning models that are safe to deploy in real-world applications requiring reliable detection of predators in image-based sensing data.
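One family of methods the project could start from estimates uncertainty by running several stochastic forward passes on the same image (for example, Monte Carlo dropout or a deep ensemble) and measuring how spread out the resulting class probabilities are. The NumPy sketch below computes the predictive entropy and the mutual information (the part of the uncertainty due to the passes disagreeing) from those probabilities; the class labels, example probabilities and deferral threshold are hypothetical, and a real threshold would need to be calibrated on held-out data.

```python
import numpy as np


def predictive_uncertainty(probs):
    """probs: array of shape (n_passes, n_classes), softmax outputs from stochastic passes.

    Returns (mean_probs, predictive_entropy, mutual_information), entropies in nats.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoids log(0)
    mean_probs = probs.mean(axis=0)
    total = -np.sum(mean_probs * np.log(mean_probs + eps))
    expected = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return mean_probs, total, total - expected


def should_defer(probs, max_entropy=0.5):
    """Flag an image for human review when predictive entropy exceeds a (hypothetical) threshold."""
    _, total, _ = predictive_uncertainty(probs)
    return bool(total > max_entropy)


# Hypothetical passes over two images, classes = [predator, no predator].
confident = [[0.97, 0.03], [0.95, 0.05], [0.96, 0.04]]
conflicting = [[0.99, 0.01], [0.02, 0.98], [0.50, 0.50]]
print(should_defer(confident), should_defer(conflicting))  # False True
```

Deferring high-entropy images to a human reviewer is one way such scores could make a deployed detector "actionable"; whether entropy or mutual information is the better signal for out-of-scope images is the kind of question the project would examine.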

Recommended skills: Basic knowledge of machine learning and good knowledge of Python. A basic understanding of deep learning networks and programming experience with PyTorch (or TensorFlow/Keras) are beneficial.

Identification of Insects for Biosecurity Surveillance

Project code: SCI104

Supervisor:

Katerina Taskova
Darren Ward (Manaaki Whenua Landcare Research, School of Biological Sciences)

Discipline: School of Computer Science

Project description

Insects can be important economic pests for forestry and horticultural industries, costing many hundreds of millions in economic losses. There is an urgent need to develop more automated methods to identify insects and distinguish pest from non-pest species. Modern techniques in computer vision and machine learning (ML) provide an excellent opportunity for biosecurity surveillance.

Research objectives of this project include:
1. Build an ML model for identifying insects in colour images.
2. Determine how image quality and type affect ML performance. For example, compare museum specimens (controlled lighting and backgrounds) with real-life images (varied backgrounds, taken outdoors, objects at an angle).
3. Image segmentation: Different insect species are often defined by distinct features (e.g. wing patterns, colours on the body). Can ML be used to localise these distinct parts and then search large image datasets to detect their presence?

Recommended skills: Basic knowledge of machine learning and good knowledge of Python. A basic understanding of deep learning networks and computer vision tasks, and programming experience with PyTorch (or TensorFlow/Keras), are beneficial.

Auditing Artificial Intelligence with Adversarial Learning

Project code: SCI105

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

We aim to design and develop new methods to attack machine learning models, and to use these adversarial attacks to define a measure of reliability. Poor model performance caused by unrepresentative data sets or flaws in the training process is a common issue in machine learning, leading to misclassification and unfair models. We will develop a framework that identifies adversarial regions in the data space where models are prone to fail. The framework will not only identify these regions and data points, but also provide tools to improve the model and return a score that reflects its reliability. This score can be used to certify models without access to the training process and to estimate the applicability of models to specific use cases.

Recommended skills: Basic knowledge of machine learning and Python

Predicting Persistence of Environmental Pollutants

Project code: SCI106

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

Most chemicals currently produced end up in the environment sooner or later, many of them in rivers and other waters. It is essential to know their fate in terms of transformation and persistence. Harmful chemicals that degrade quickly might pose no great threat to the environment; persistent toxic compounds, however, can have lasting negative impacts. We will go beyond the prediction of specific biodegradation products, as done in state-of-the-art metabolic prediction systems (such as enviPath, https://envipath.org), and aim to predict reaction rates, that is, how long pollutants and their metabolites persist in the environment. We will develop and train machine learning models that use data on metabolic reactions under specific environmental conditions to predict reaction rates and the half-lives of compounds.
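If degradation of a compound is modelled as first-order (exponential) decay, a common simplification in environmental fate studies, a predicted rate constant k translates directly into a half-life: C(t) = C0 * exp(-k * t) falls to C0 / 2 at t = ln 2 / k. The sketch below shows that conversion; the first-order assumption and the example rate constant are illustrative, and the models developed in the project may need to handle more complex kinetics.

```python
import math


def time_to_degrade(k, fraction):
    """Time for a first-order process with rate constant k to degrade `fraction` of a compound.

    Solves exp(-k * t) = 1 - fraction for t; t is in the inverse of k's units.
    """
    if k <= 0 or not 0 < fraction < 1:
        raise ValueError("need k > 0 and 0 < fraction < 1")
    return -math.log(1 - fraction) / k


def half_life(k):
    """Time until half of the compound has degraded (often called DT50)."""
    return time_to_degrade(k, 0.5)


k = 0.0231  # hypothetical predicted rate constant, per day
print(round(half_life(k), 1), round(time_to_degrade(k, 0.9), 1))  # 30.0 99.7
```

Predicting k (or its distribution under given environmental conditions) therefore also yields persistence measures such as the time to 90% degradation.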

Recommended skills: Basic knowledge of chemistry, machine learning and Python

Design for Degradability - In-Silico Development of Sustainable Chemicals

Project code: SCI107

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

An important aspect in the development of novel chemicals is their environmental fate, that is, their ability to degrade when released into the environment. The goal is to design compounds that fulfill a certain function, for example as medication or pesticides, while at the same time degrading quickly into harmless metabolites. We will develop new algorithms to achieve this and evaluate them on large databases of existing compounds, using standard machine learning models to predict degradation products and pathways (see enviPath, https://envipath.org). Our approach will be to start with existing compounds and transform them using adversarial methods and generative models (such as GANs) so that their degradability increases while their original function is retained.

Recommended skills: Basic knowledge of chemistry, machine learning and Python

Adversarial Time Series

Project code: SCI108

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

Adversarial machine learning is a field of machine learning that focuses on exploiting model vulnerabilities using information that can be obtained from the model. Studying a model’s vulnerability to adversarial attacks not only helps researchers understand more about the model itself, but also allows them to defend against malicious attacks and prevent potentially fatal consequences after deployment. Adversarial machine learning was first proposed in the image classification domain, where an attack fools a model into misclassifying an image by adding carefully crafted noise that is barely detectable by a human. Recently, adversarial methods that target time series tasks have been introduced. We will develop and evaluate new adversarial attacks on time series, targeting specific time series tasks beyond forecasting.
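A classic example of this kind of attack is the fast gradient sign method (FGSM), which moves every input value by a small amount eps in the direction that increases the model's loss. The sketch below applies it to a univariate time series classified by a toy logistic-regression model, where the input gradient of the cross-entropy loss has the closed form (p - y) * w; the weights, series and eps are hypothetical, and attacks on deep time series models would obtain the gradient by automatic differentiation instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Class 1 if the logistic model's probability exceeds 0.5."""
    return int(sigmoid(np.dot(w, x) + b) > 0.5)


def fgsm(x, y, w, b, eps):
    """FGSM against logistic regression: step eps along sign(d loss / dx) = sign((p - y) * w)."""
    p = sigmoid(np.dot(w, x) + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)


# Hypothetical classifier whose weights respond to an upward trend (class 1).
n = 50
w = np.linspace(-1.0, 1.0, n)
b = 0.0
x = 1.0 + 0.02 * np.linspace(-1.0, 1.0, n)  # level around 1.0 with a slight rise; true label 1
x_adv = fgsm(x, 1, w, b, eps=0.05)
print(predict(x, w, b), predict(x_adv, w, b), float(np.max(np.abs(x_adv - x))))
```

Every time step moves by at most eps (5% of the series level here), yet the predicted class flips, because the small per-step changes all push the linear score in the same direction.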

Recommended skills: Basic knowledge of machine learning and Python

Image compression to support image processing

Project code: SCI109

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

This project aims to investigate the potential benefits of using our newly developed image compression technique, based on multivariate trees, to enhance machine learning models for image processing. The objective is to explore whether this technique can lead to faster and more efficient training of these models, requiring fewer iterations, layers and parameters. While previous research has shown improvements using superpixels, our approach offers a substantially more lightweight and simpler solution, reducing storage requirements while maintaining performance. In this project, the student will conduct empirical evaluations comparing the performance of models trained on compressed versus uncompressed images, and analyze the impact on training time, convergence rate and model accuracy.

Recommended skills: Basic understanding of machine learning and Python

Overspecialization bias in scientific databases

Project code: SCI110

Supervisor:

Joerg Wicker

Discipline: School of Computer Science

Project description

In scientific fields where gathering data requires time-intensive experiments, using machine learning to predict likely experimental outcomes helps concentrate effort on the most promising experiments.

However, predictive models learn from and specialize to the data provided to them.
While this specialization is useful up to the point where the desired domain is accurately captured, the models can over-specialize.
Starting from the initial dataset, a trained model will only be able to make reliable predictions in densely populated areas of the compound space, leaving the remaining areas outside of the model's applicability domain.
As a consequence, it will suggest a set of experiments well within its applicability domain, shifting the overall data distribution towards in-domain data.
Should the model be re-trained after the new experimental results are obtained, it will put more emphasis on the now densely populated areas, further shifting the data distribution.
After a few iterations of dataset growth, we can observe that the applicability domain stays the same or even shrinks despite the additional data, and potentially interesting new areas of the compound space are never explored.
This scenario is a self-reinforcing type of selection bias where the model chooses to obtain new results for compounds it can already predict reliably, and therefore slows down or even stops learning.
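To make the feedback loop concrete, the applicability domain is often approximated by the distance from a candidate compound to its nearest training compounds in some descriptor space. The NumPy sketch below uses that definition (the descriptors, k and the distance threshold are hypothetical, and the project may use a different domain definition) together with the exploitative selection step described above: choosing the candidates closest to existing data.

```python
import numpy as np


def knn_distance(candidates, train, k=1):
    """Mean Euclidean distance from each candidate to its k nearest training points."""
    diffs = candidates[:, None, :] - train[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n_candidates, n_train)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


def in_domain(candidates, train, threshold, k=1):
    """Boolean mask: True where a candidate lies inside the applicability domain."""
    return knn_distance(candidates, train, k) <= threshold


def select_most_confident(candidates, train, n, k=1):
    """Exploitative selection: indices of the n candidates closest to the training data."""
    return np.argsort(knn_distance(candidates, train, k))[:n]


# Hypothetical 1-D descriptors: training data clustered near 0, one distant candidate.
train = np.array([[0.0], [0.1], [0.2]])
candidates = np.array([[0.15], [5.0], [0.3]])
print(in_domain(candidates, train, threshold=0.5))
print(select_most_confident(candidates, train, n=2))
```

As long as closer candidates remain in the pool, this strategy never picks the compound at 5.0, so repeated rounds of selection and re-training add data only where the model is already reliable.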

This project will investigate this phenomenon on a concrete large scientific database and find ways to combat this overspecialization.

Recommended skills: Basic understanding of machine learning and Python

Retrosynthesis

Project code: SCI111

Supervisor: 

Joerg Wicker

Discipline: School of Computer Science

Project description

Retrosynthesis is a fundamental concept in chemistry and materials science that involves working backwards from a target molecule to identify the sequence of reactions needed to synthesize it. Recovering synthesis pathways that should have been documented (but were not) is useful. However, it is substantially more impactful to identify alternative sequences that use cheaper, more environmentally friendly or otherwise superior materials, involve fewer steps, improve yields, reduce risks, and so on. This project will review existing machine learning based retrosynthesis methods and enhance them or develop new ones.
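The backwards search at the heart of retrosynthesis can be sketched as a recursive expansion of a target into precursor sets until only purchasable starting materials remain, keeping the route with the fewest reaction steps. In the sketch below the reaction table, molecule names and purchasable set are entirely hypothetical; in machine learning based methods, a trained model would propose candidate precursor sets instead of a hand-written table, and the route cost could also reflect price, yield or risk.

```python
# Hypothetical reaction table: product -> alternative sets of precursors.
REACTIONS = {
    "T": [("A", "B"), ("C",)],
    "A": [("D", "E")],
    "C": [("F",)],
    "F": [("G", "H")],
}
PURCHASABLE = {"B", "D", "E", "G", "H"}


def best_route(molecule, reactions, purchasable, visiting=frozenset()):
    """Return (n_steps, route) for the route with the fewest reactions, or None if none exists.

    route is a list of (product, precursors) pairs, target first. The search is exhaustive,
    so it is only practical for small toy tables like the one above.
    """
    if molecule in purchasable:
        return 0, []
    if molecule in visiting:  # guard against cyclic reaction chains
        return None
    best = None
    for precursors in reactions.get(molecule, []):
        total, route = 1, [(molecule, precursors)]
        for p in precursors:
            sub = best_route(p, reactions, purchasable, visiting | {molecule})
            if sub is None:  # this precursor cannot be made, so drop this option
                break
            total += sub[0]
            route += sub[1]
        else:  # every precursor was resolvable
            if best is None or total < best[0]:
                best = (total, route)
    return best


print(best_route("T", REACTIONS, PURCHASABLE))
```

Here the route through A and B needs two reactions and beats the three-step route through C and F; swapping the step count for a learned or economic cost is where the "cheaper, greener, fewer steps" goals above would enter.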

Recommended skills: Basic understanding of machine learning and Python