Ihaka Lecture Series
In March 2017 the Department of Statistics launched an annual lecture series named after Associate Professor Ross Ihaka in honour of his contributions to the field. Find out about the 2024 lecture series below.
Ross Ihaka, along with Robert Gentleman, co-created R – a statistical programming language now used by the majority of the world’s practising statisticians. It is hard to over-emphasise the importance of Ross’s contribution to our field. We named this lecture series in his honour to recognise his work and contributions to our field in perpetuity.
Find out more about Ross Ihaka here.
2024 series
Engaging through data
Data analysis is no longer confined to individuals working in isolation. Modern connectivity and the complexity of problems require teamwork, reproducibility and collaboration. Similarly, analytic outputs must be actionable, understandable and intuitive, so that wider communities can benefit from the results by engaging with the analyses and data and participating in the outcome.
The 2024 Ihaka Lecture Series features three speakers who bring together visualisation, teamwork and reproducibility. They share their real-world experience of developing engaging visualisations and bringing people and tools together, along with a vision for reproducible data analysis.
Lectures commence at 6.30pm in MLT2/303-102, Building 303, 38 Princes Street.
Refreshments will be available before each lecture at 6pm.
Lecture 1: 12 September
Reimagining Literate Programming and Automated Report Generation
Yihui Xie, Independent consultant on R Markdown and R package development.
I first came across Sweave in R around 2007 and immediately fell in love with it. Later I learned about the idea behind Sweave, literate programming, and found it quite interesting, too. Literate programming may not be very useful for programming, but is a perfect paradigm for automated report generation. That is, generating (data analysis) reports automatically from computer code. In 2011, I started developing the R package knitr to explore further the potential of literate programming, which achieved some success (especially with the invention of R Markdown), but when looking back after thirteen years, I couldn’t believe that I missed some good ideas that should have been so obvious, and also implemented some ideas so awfully. In this lecture, I’ll share some thoughts along the design of a reimagined report generator in R after gaining first-hand experience with R Markdown users in the industry in 2024. I’ll also explain the philosophy and rationale behind some decisions when writing this software package.
Yihui Xie (https://yihui.org) is currently an independent consultant on R Markdown and R package development. Previously, he worked as a software engineer at Posit Software, PBC (formerly RStudio, PBC) from 2013 to 2023. He earned his PhD from the Department of Statistics, Iowa State University. As an active R user, he has authored several R packages, such as knitr, bookdown, blogdown, xaringan, animation, tinytex, and pagedown, among which the animation package won the 2009 John M. Chambers Statistical Software Award (ASA). He also co-authored a few other R packages, including shiny, markdown, rticles, and leaflet. He has published five books related to R Markdown, including “Dynamic Documents with R and knitr”, “R Markdown: The Definitive Guide”, and “R Markdown Cookbook”.
Lecture 2: 19 September
Putting feelings into figures
Farah Hancock, Data/Longform Journalist, RNZ
Ever have data which shows something so important you wanted to hit people right in the “feels” with your visualisation? There will always be a need for simple charts, which just show the facts, but sometimes when the numbers are compelling there’s a call to think beyond a bar chart and visualise data in a way which evokes emotion.
In this presentation, Farah will look at examples of this sort of data visualisation and walk through the visualisation choices made in two very different projects published on RNZ.co.nz.
Initially trained as a designer, Farah Hancock worked for many years in the advertising industry in New Zealand and abroad, primarily on digital campaigns. Experiencing a midlife career crisis, she pivoted to journalism and, as a visual thinker, inadvertently fell into data journalism when the Covid-19 pandemic started. Since then she’s visualised a range of topics from food exports to bus cancellations, fatal police shootings and political donations. She lives in Tāmaki Makaurau, Auckland and works at RNZ in its investigative In Depth team.
Lecture 3: 26 September
Making R work in government
Peter Ellis, Director of the Statistics for Development Division at the Pacific Community (SPC)
Managed well, R can be a critical component of a transformation of the effectiveness and efficiency of an analytical team in government. R’s competitor in government is not Julia or Python or even SAS, but overwhelmingly Excel. The keys to making the most of R are not the latest and fanciest R packages, but integrating it into a new workflow. That workflow also uses Git and (probably) SQL. It breaks down micro-silos and the “lone genius who understands the spreadsheet”, replacing them with teamwork, transparency, reproducible analytical pipelines, peer review, and home-grown R packages and rules for use.
Doing this successfully is difficult and depends on process changes, firm direction from management, and a nuanced understanding of public sector incentives and risk aversion. It means challenging assumptions that public servants, non-IT contractors and management consultants don’t write code; and changing recruitment and professional development. In this talk I’ll draw on experiences in several countries and very different environments to explore these issues; and see if we can identify the secret sauce to making R bloom in the potentially difficult soil of a public sector bureaucracy.
Peter Ellis is the Director of the Statistics for Development Division at the Pacific Community (SPC), where his team of around 30 statisticians, data scientists and analysts helps Pacific Island country and territory statistical offices collect, process, analyse and disseminate data for official statistics. He is an Accredited Statistician with the Statistical Society of Australia. He was previously the Chief Data Scientist at Australian-headquartered management consultancy Nous Group, where he led a transformation of its approach to analytics based on R, SQL and Git. Prior roles included Principal Data Scientist at Stats NZ, General Manager Evidence and Insights at the Social Investment Agency, Manager Sector Performance at the Ministry of Business, Innovation and Employment, and Director Program Evaluation for the Australian aid program. You can view his blog here.
2023 series
Bringing numbers to life
For many, data visualisation is an entry point into either providing or understanding statistical information.
Data visualisation provides a powerful tool for turning numbers into something that anyone can consume, whether for communication, for exploration, or just for pleasure.
The 2023 Ihaka Lecture Series brings together three speakers whose work focuses on making data visualisations that communicate well, provide user interaction, and look spectacular.
Lecture 1: 28 September
Interactive Graphics and Data Analysis
Antony Unwin, First Professor of Computer-Oriented Statistics and Data Analysis, Augsburg University, Germany
In the late 1980s there were already impressive software packages for Interactive Graphics. Little progress has been made since then, particularly in comparison with developments in other software such as R. Leaving aside all that R can do that Interactive Graphics cannot, there are several ways Interactive Graphics can augment what can be done with R. They are complementary approaches with different styles of working.
Two datasets are used to illustrate the possibilities: movie ratings, and recent German electoral and demographic data. The examples demonstrate Interactive Graphics in action, emphasising the key elements and how important they can be in illuminating what data mean.
Antony Unwin was the first Professor of Computer-Oriented Statistics and Data Analysis at the University of Augsburg in Germany. Earlier he was at Trinity College Dublin. He is a fellow of the American Statistical Association, co-author of the book Graphics of Large Datasets and co-editor of the Handbook of Data Visualisation. His research focuses on data visualisation, and his research group developed several pieces of interactive graphics software, "the Augsburg Impressionists", and wrote packages for R. Antony is the author of the book "Graphical Data Analysis with R", published in 2015 by CRC Press. He is Data Visualization Editor of the Harvard Data Science Review.
Lecture 2: 12 October
Unpredictable paintings: Making generative artwork in R using data visualisation tools
Danielle Navarro, Pharmacometrician, Certara
The R statistical programming language is one of the most widely used languages in data science and statistics. Among other things it provides powerful tools for data visualisation and graphics. The ggplot2 package, for example, provides an implementation of the grammar of graphics and allows users to compose flexible and beautiful data visualisations from reusable, composable parts.
In this talk I'll discuss how those same tools can be repurposed and used for purely artistic purposes. Though not explicitly designed for artistic use, the graphics and data visualisation tools in R turn out to be extremely well suited to artistic pursuits. I'll talk about the techniques and coding methods used when creating generative art in R, showcase some of my own work and that of other artists, and offer some thoughts about why I think there is such a tight connection between art and data visualisation.
Danielle Navarro is a data scientist, mathematical psychologist and generative artist. She has worked as an academic at the University of New South Wales and the University of Adelaide, studying human reasoning and behavioural statistics, and more recently as a developer advocate at Voltron Data, working on Apache Arrow. She's the author of "Learning Statistics with R" and a coauthor of the forthcoming 3rd edition of "ggplot2: Elegant Graphics for Data Analysis". She lives in Sydney with her two children and her Netflix subscription.
Lecture 3: 19 October
What’s Behind the Map: The Process of Data Visualisation
Chris McDowall, Surveillance and Intelligence Specialist, Te Whatu Ora
We often discuss data visualisation in terms of its outputs: maps, scatterplots, bar graphs, interactive graphics. However, beneath every chart and graph lies a web of thoughtful decisions. Which data should we include? What is omitted? How should we visually represent it? Does the intended message come across effectively? How might this visualisation be potentially misconstrued?
In this presentation, Chris will delve into mapping and data visualisation as processes, with a special emphasis on visual thinking and communication. The discussion will explore curatorial and design considerations with regard to various audiences. The talk will particularly focus on the roles of colour and text in effective data visualisation. Many examples of outstanding maps and graphs will be examined to uncover how and why they succeed. The session will conclude with a couple of end-to-end case studies, revealing the evolution of a graphic and the many invisible decisions involved in its creation.
Chris McDowall trained as a geographer with a focus on cartography and human geography. Over the last twenty years he has worked variously as a cartographer, environmental scientist, analyst and data journalist with roles at the University of Auckland, Manaaki Whenua - Landcare Research, the National Library of New Zealand and The New Zealand Herald.
He is a co-creator of the award-winning book, We Are Here, An Atlas of Aotearoa. His maps and data visualisations have been exhibited at the National Library and Auckland War Memorial Museum and featured widely in media such as the Spinoff, RNZ, New Zealand Geographic and the NZ Herald. His work runs a spectrum from large format wall maps to animations to interactives for mobile devices. It is unified by a desire to share geographic insights with readers.
Chris lives in Tāmaki Makaurau / Auckland and works at Te Whatu Ora as a surveillance and intelligence specialist. This role involves mapping communicable diseases and environmental risks.
2022 series
Building Building Blocks for Data Science
The field of Data Science is fortunate because the most popular software tools for Data Science are programming languages. The availability of such tools depends on people building effective, efficient, and open software tools for Data Science. This means that most Data Scientists learn to write code, and some Data Scientists are also developers, writing code so that other people can write code.
The 2022 Ihaka Lecture Series featured three speakers who develop software tools for Data Science, building systems that can be built upon in turn.
Lecture 1: Thursday 28 July 2022
The genesis of experimentation
Dr Emi Tanaka, Senior Lecturer in Statistics, Monash University
Experiments are essential endeavours to understand the process or phenomena around us via the analysis of experimental data. However, as a precursor to any analysis, the importance of the design of experiment and the data collection process cannot be emphasised enough.
There is no salvation for rubbish data, yet there is far more focus on the analysis of experimental data than any steps prior to the analysis. In this talk, I introduce the framework, called “the grammar of experimental designs”, implemented as the edibble R-package that puts the focus on capturing the user’s intention and understanding of the experimental structure to plan, design and simulate experiments. This approach differs considerably from standard, often recipe-driven, approaches and has potential to encourage users to reflect and revise designs tailored to their experimental need.
Dr. Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental designs, motivated primarily by problems in bioinformatics and agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter’s Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.
You can see the slides from this lecture here, and watch the lecture on YouTube here.
Lecture 2: Thursday 4 August 2022
New plumbing: Adding a pipe operator to base R
Professor Luke Tierney, Ralph E. Wareham Professor of Mathematical Sciences, the University of Iowa
The forward pipe operator ‘%>%’ was introduced to R by the ‘magrittr’ package and has since become an integral part of many data science workflows. Forward pipe operators have also been introduced in other languages in recent years. R 4.1.0 added a new forward pipe operator ‘|>’ to the base R language. This talk will review the history of forward pipe operators in R and other languages, and explain the motivation and design decisions behind the new operator.
Luke Tierney is Ralph E. Wareham Professor of Mathematical Sciences at the University of Iowa. He has been a member of the R Core Team since 1998. His research has focused mainly on two aspects of computational methods and tools to support statistical analysis. The first area involves developing computational methods, based on approximations and simulation methods, for carrying out Bayesian data analysis. The second involves designing, developing, and maintaining computing environments for statistics and data science.
You can watch the lecture on YouTube here.
2021 series
Looking on the bright side
We should be worried about how much of our personal data businesses are gathering, but are there benefits to be had from allowing our health system to know more about us? We are on constant guard to protect our computers from viruses, but when a virus strikes humanity, can our computers help to protect us? We know that giving teenagers the ability to communicate 24/7 can have negative outcomes, but what happens when scientists get hold of social media tools?
The 2021 Ihaka Lecture Series featured three speakers who described how modern computing can be used to positively impact the world.
The recordings of each lecture are available to view below.
Lecture 1: Thursday 29 July 2021
Data Science in the Connected Era
Dr Simon Urbanek, Senior Lecturer, Department of Statistics, University of Auckland
Our world is increasingly interconnected, which has several implications. On the one hand, it increases the amount and variety of data we can collect to make informed decisions and improve our lives; on the other, it allows us to perform data analyses without constraints related to the physical location of the data or compute infrastructure.
Modern computer technologies such as cloud computing and the Web have given rise to social media, but in this talk we will explore the possibilities of leveraging them for visualisation and data analysis, connecting people with data across the world and fostering collaboration.
We will illustrate the benefits of that approach using RCloud - a collaborative tool for data analysis and interactive visualisation which supports several data analytic languages, distributed computing, discovery, sharing and reproducible research. It allows us to analyse data collaboratively at a large scale and communicate results efficiently.
Dr Simon Urbanek is a Senior Lecturer in the Department of Statistics at the University of Auckland. Simon obtained his PhD in Statistics from Augsburg University, Germany in 2004 and has worked at AT&T Labs in Data Science and AI Research for 15 years, leading research and projects on large-scale data analysis in the areas of mobility networks, TV and advertising.
His main interests are visualisation, interactive graphics, big data analytics, and statistical and distributed computing. He is a member of the R Core Development Team and author of numerous popular R packages including Rserve, multicore, rJava, iPlots, RJDBC and iotools.
Lecture 2: Thursday 5 August 2021
Implementing a Machine-Learning Tool to Support High-Stakes Decisions in Child Welfare: A case study in Human Centred AI
Professor Rhema Vaithianathan, Centre for Social Data Analytics, AUT
Data analytics techniques like predictive risk modelling offer incredible opportunities to learn from rich data sets and make decisions supported by data. But while the private sector has been quick to realise the benefits of data analytics (especially as a tool to drive profitability), the public sector has moved much more slowly, despite needing new solutions to many wicked social problems.
Professor Rhema Vaithianathan will reflect on what we can learn about applying data analytics in a trusted way, from the very different experiences of the private and public sectors. In particular, she will talk about different approaches to key concepts like consent, transparency, fairness and community voice and how they can contribute to project success or failure. She will go on to talk about new ‘rules of engagement’ that are emerging for social good uses of data analytics, drawing on her experiences implementing the Allegheny Family Screening Tool, a machine learning tool used to support screening of child abuse calls in Allegheny County, PA (United States) since 2016, and scaling out of this work in California and Colorado.
Professor Vaithianathan is a Professor of Economics at Auckland University of Technology where she is director of the Centre for Social Data Analytics, a research centre focused on using data analytics for social impact. She is also a Professor of Social Data Analytics at the Institute for Social Science Research at The University of Queensland, where she leads a second node of the Centre for Social Data Analytics.
Lecture 3: Thursday 12 August 2021
Modelling to support the COVID-19 response in Aotearoa New Zealand
Dr Rachelle Binny, Manaaki Whenua - Landcare Research and Te Pūnaha Matatini
Mathematical models are playing an important role in the ongoing pandemic, providing insights into the spread of the virus and the effects of interventions to help inform response strategies. This seminar will give an overview of mathematical modelling by Te Pūnaha Matatini to support New Zealand’s COVID-19 response. We will describe the models used to simulate spread of COVID-19 in New Zealand, how they can help inform decisions on switching between Alert Levels, and how we are modelling the risk of new cases arriving at the border.
Rachelle Binny is a mathematical biology researcher at Manaaki Whenua - Landcare Research in Christchurch NZ, and a Principal Investigator in Te Pūnaha Matatini, the NZ Centre of Research Excellence for Complex Systems and Networks. Her research lies at the interface of mathematics, statistics and biology and is data-driven. Following a BSc in Mathematical Biology (University of Dundee, Scotland), she undertook a PhD (University of Canterbury, Christchurch) to develop new models of collective cell behaviour in wound healing, and calibrate these using experimental data. After completing her PhD in 2015, she spent two years as a postdoc at Manaaki Whenua (a Crown Research Institute for environment and biodiversity) before taking on a Researcher position there. Rachelle’s current research combines modelling theory with data from ecological systems to guide conservation management.
2020 series
The role of statistics and computing in public and social policy
Corporations are collecting and mining mountains of data to make better consumers of us all, but there are also vast quantities of data being gathered by public organisations for administrative and policy purposes.
The 2020 Ihaka Lecture Series brings together three experts to discuss the challenges and rewards of applying data science to societal issues.
Our thanks to the New Zealand Statistical Association, our official sponsor for the 2020 Ihaka Lecture Series.
The triumph of the quants?: Model-based poll aggregation for election forecasting
Professor Simon Jackman, Chief Executive Officer at the United States Studies Centre, will examine recent successes and failures of predictive models of election outcomes. Professor Jackman will also discuss trends and discontinuities in the evolution of public opinion over election campaigns, spatial smoothing and pollster biases.
Machine learning for causal inference: Magic elixir or fool’s gold?
Professor Jennifer Hill from New York University will review the conceptual issues involved in understanding causal mechanisms and describe the potential for machine learning to improve our understanding of these mechanisms.
Implementing a machine learning tool to support high-stakes decisions in child welfare: A case study in human centred AI (cancelled)
Professor Rhema Vaithianathan, from the Centre for Social Data Analytics at AUT, will reflect on what we can learn about applying data analytics in a trusted way, covering key concepts like consent, transparency, fairness and community voice, and how they can contribute to project success or failure.
2019 series
Rise of the machine learners: Statistical learning in the computational era
Whether labelled as machine learning, predictive algorithms, statistical learning, or AI, the ability of computers to make real-world decisions is rising every year.
The 2019 Ihaka Lecture Series brought together four experts at the interface of statistics and computer science to discuss how computers do it, and how much we should let them.
Our thanks to the New Zealand Statistical Association, our official sponsor for the 2019 Ihaka Lecture Series.
Open source Machine Learning @ Waikato
Professor Bernhard Pfahringer from the Machine Learning research group at the University of Waikato discusses open-source Machine Learning software suites. He reflects on their design and their position in the current international Machine Learning landscape.
Watch the lecture
Deep learning: why is it deep, and what is it learning?
University of Auckland Professor Thomas Lumley discusses the rise of neural networks. He provides insight into how deep convolutional nets are structured and how they can be effective, but also why they are brittle and can fail in remarkably alien ways.
Watch the lecture
Algorithmic fairness: Examples from predictive models for criminal justice
Dr Kristian Lum from the Human Rights Data Analysis Group discusses the use of predictive models in the criminal justice system. Using examples from predictive policing and recidivism risk assessment she demonstrates how such models could perpetuate and potentially amplify data-encoded biases.
Watch the lecture
Statistical learning and sparsity
Professor Robert Tibshirani from Stanford University reviews the lasso method for high dimensional supervised learning and discusses some new developments in the area, including the Pliable Lasso, and post-selection inference for understanding the important features.
Watch the lecture
2018 series
A thousand words: Visualising statistical data
A picture is worth a thousand words – or perhaps that should be a million numbers. The distillation of data into an honest and compelling graphic is an essential component of modern (data) science.
The 2018 Ihaka Lecture Series displayed the contributions of three experts across different facets of data visualisation.
Myth-busting and apophenia in data visualisation: Is what you see really there?
Plots of data are important tools for observing patterns, but it is easy to imagine patterns that may not exist. Using two protocols, the Rorschach and the lineup, Professor Dianne Cook of Monash University describes some simple tools for helping to decide if patterns are real.
Watch the lecture
Making colour accessible
University of Auckland Associate Professor Paul Murrell investigates the 'BrailleR' package for R and its difficulties with colour. By making a mountain out of that molehill, Paul embarks on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te reo, and XKCD.
Watch the lecture
Visual trumpery: How charts lie – and how they make us smarter
With facts and truth increasingly under assault, the use of graphs, charts, maps and infographics has become popular in supporting all manner of spin. Distinguishing information from misinformation is an important skill for any citizen. Alberto Cairo from the University of Miami teaches some guiding principles on how people can become more critical and better-informed readers of charts.
Watch the lecture
2017 series
Statistical Computing in the Data Age
Statistics has become essential in the data age. We have an increasing ability to collect vast quantities of data, but often still struggle to make sense of it.
The 2017 Ihaka lectures aimed to highlight the important role that both statistics and computing play in this endeavour.
Expressing yourself with R
Hadley Wickham, Chief Scientist at RStudio, discusses expressing yourself with R.
Watch the lecture
R and data journalism in New Zealand
Harkanwal Singh, Data Editor at the New Zealand Herald, discusses the use of R in New Zealand's data journalism landscape.
Watch the lecture
Interactive visualisation and fast computation of the solution path for convex clustering and biclustering
Genevera Allen, Dobelman Family Junior Chair in the Departments of Statistics and Electrical and Computer Engineering at Rice University, discusses clustering as a fundamental tool for exploratory analysis of big data.
Watch the lecture
Statistical computing in a (more) static environment
Ross Ihaka, Associate Professor in the Department of Statistics at the University of Auckland, discusses the spectrum of statistical computing systems from the dynamic to the very static.