Problem-solving, pits of success, and improving the R environment

Sometimes, when he arrives to give a talk and the equipment isn’t fit for purpose, Statistics and Computer Science alumnus Hadley Wickham is tempted to write a rider that describes – in minute detail – how he wants everything set up. “It would also demand a bowl of M&Ms with all the brown ones removed, Van Halen-style,” jokes Hadley.

This rigorous attention to detail, together with his commitment to bringing an understanding of data science to the masses, has earned Hadley the nickname ‘the Rock star of R’.

For the uninitiated, R is a statistical programming language developed by Department of Statistics academics Ross Ihaka and Robert Gentleman in the early 1990s, and now used by the majority of the world’s practising statisticians.

In his role as Chief Scientist for RStudio (an integrated development environment for R), Hadley works with his team to “make R a better environment for doing data science”, and to develop new, open source R packages that solve particular problems. He also helps people learn to use R most effectively, in order to make sense of data.

Data is one of the greatest resources of the 21st century; however, many people are baffled by the volume, variety and complexity of ‘big data’ that is collected about all aspects of our lives and the world around us, every single day. We’re fortunate, then, that Hadley is here to help us out.

“Better than ‘big data’ I like the term ‘too big data’. Too big data is when your data gets big enough that you can no longer handle it with your current tools. For most people, this is not very big: as soon as you have thousands of observations you usually need to learn some new tools (like R) to help work out what’s going on.

“It’s incredibly important to be data literate so that you can understand what data is being collected about you, and how that’s being used to drive important decisions. 

“If you read the newspaper or watch TV, you hear a lot about machine learning and AI [artificial intelligence] like they’re these magic wands that make problems go away. But the reality is that it’s very easy for these tools to amplify existing biases and inequalities,” explains Hadley.

As a passionate data science communicator and educator (he teaches in-person workshops at RStudio and speaks at conferences worldwide), Hadley hopes to see the field become more accessible for newcomers, through “better tools and better teaching.” 

Hadley’s own education took a few twists and turns: “In high school I wanted to be a genetic engineer and it seemed like the best way to do that was to become a doctor first!” So he began a Bachelor of Human Biology at the University of Auckland, before switching to a Bachelor of Science majoring in Statistics and Computer Science. “I didn’t really enjoy medicine, so I went back to what I enjoyed in high school: statistics and programming.”

A love of the problem-solving nature of statistics led Hadley to complete his master’s degree at the University of Auckland before travelling to the United States to undertake a PhD at Iowa State University.

“What I particularly enjoyed at the Department of Statistics at the University of Auckland was that it was grounded in real problems – and real problems require programming, which I also enjoyed. I was lucky enough to learn R very early in my statistics career, and it has formed the foundation of pretty much everything I’ve done since then,” says Hadley. 

And “since then” quite a bit has happened – following his PhD, Hadley worked at Rice University as an assistant professor of statistics for four years before joining RStudio as its Chief Scientist. This year, he can also add ‘Adjunct Professor of Statistics at the University of Auckland’ to his admirable list of achievements. A keen foodie, baker and cocktail maker, Hadley is now based in Houston, Texas, but visits New Zealand as often as he can.

“One of the things I love about where I live now is that I’m just a couple of blocks away from a 7500 square metre liquor store where I can get any cocktail ingredient I need.

“The thing I miss most about New Zealand is the food – there is so much good food – and coffee! – everywhere,” he says.

Hadley returned to the Faculty of Science for a fleeting visit in March this year, when he was invited to deliver a keynote address at the Department of Statistics’ inaugural Ihaka Lecture series.

He is also a generous donor to the Department of Statistics, as part of a wider appeal to extend the reach and impact of the department’s work in statistical computing by establishing a Centre for Advanced Data Science. The Centre will lay the foundations for future initiatives that will develop a rich environment of learning, discovery and innovation.

Hadley imagines the future of his field as full of “pits of success”. He explains, “It’s not like a peak of success, which you have to strive to climb to; rather, it’s something that you can fall into – almost by accident.”

While the ‘Rock star of R’ has no immediate plans to ditch the data science for the life of an on-the-road musician, he’s no stranger to the (mosh!)pit of creativity and innovation (“I spend most of my time writing, either R code or English prose”) and he’s a fan of Forrest yoga, an intense, internally focused practice. 

“My work is all about trying to dig to that ‘pit’ by developing new ways of looking at data science problems and then providing code tools to bring ideas to life.”

We’re hoping Hadley will bring his ideas for data science-themed cocktails with him when he next visits the University – we’ve been promised a mixology session and expect a Tibbleoni (Negroni + tibbles, a way of storing data in R) to be on the menu.

We’re sure, too, that he’ll continue to make the mysteries of big data seem surmountable. 

This article appears in the December 2017 edition of inSCight, the print magazine for Faculty of Science alumni.
