Reflections one year into working as a research software engineer

Nicholas Tierney

Telethon Kids Institute, Perth, Australia

UseR! 23rd June 2022

njt-user-2022.netlify.app

nj_tierney

How/where do I work?

I am a Research Software Engineer (RSE)
Working at Telethon Kids Institute
With the Malaria Atlas Project
Primarily with Nick Golding
Maintaining greta software
Embedded within a team
Not consulted out to teams (usually)
Develop software to help teams + for specific research problems
A mixture of remote work and working at the workplace

What sorts of things does an RSE do?

Create software to solve research problems
Develop tools that abstract the right components to facilitate research
Help researchers to find and learn good tools
Support researchers with (computational) reproducibility

(adapted from Heidi Seibold's useR! 2021 keynote talk)

njt-user-2022.netlify.app • @nj_tierney
The past year

Understanding, improving, maintaining greta
Develop new interfaces for statistical methods
COVID modelling for Australian Government

Professor Nick Golding

greta-stats.org

Why 'greta'?

Grete Hermann (1901 - 1984)

wrote the first algorithms for computer algebra

... without a computer

(To avoid people saying 'greet', the package is spelled greta instead)

What greta looks like

$\alpha \sim \text{Normal}(0, 5)$

$\beta \sim \text{Normal}(0, 3)$

$\sigma \sim \text{logNormal}(0, 5)$

$\mu = \alpha + \beta X$

$Y \sim \text{Normal}(\mu, \sigma)$

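library(greta)
library(palmerpenguins)
# drop rows with missing values before passing the data to greta
penguins_complete <- na.omit(penguins[, c("bill_length_mm", "flipper_length_mm")])
x <- penguins_complete$bill_length_mm
y <- penguins_complete$flipper_length_mm
# priors
alpha <- normal(0, 5)
beta <- normal(0, 3)
sd <- lognormal(0, 5)
# linear predictor and likelihood
mu <- alpha + beta * x
distribution(y) <- normal(mu, sd)
# define the model over the parameters of interest, then sample
m <- model(alpha, beta, sd)
draws <- mcmc(m)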

Designing new interfaces

Malaria modelling

yahtsee (Yet Another Hierarchical Time Series Extension + Expansion)

library(dplyr)

cleaned_data <- data %>%
    as_tibble() %>%
    group_by(who_region) %>%
    mutate(.who_region_id = cur_group_id()) %>%
    ungroup() %>%
    select(-who_region) %>%
    group_by(country) %>%
    mutate(.country_id = cur_group_id()) %>%
    ungroup() %>%
    select(-country)
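The grouping-and-numbering boilerplate above is the kind of step a new interface can hide. Below is a minimal base R sketch of that idea. `add_group_id()` is a hypothetical helper written for illustration, not part of yahtsee. It numbers groups in sorted order, which should match what `group_by()` plus `cur_group_id()` gives for simple character ids like these:

```r
# Hypothetical helper: add a ".<group>_id" column numbering each distinct
# value of `group` in sorted order
add_group_id <- function(data, group) {
  id_name <- paste0(".", group, "_id")
  group_levels <- sort(unique(data[[group]]))
  data[[id_name]] <- match(data[[group]], group_levels)
  data
}

# Toy data standing in for the malaria data set
toy <- data.frame(
  who_region = c("AFRO", "AFRO", "EMRO"),
  country = c("Kenya", "Uganda", "Sudan")
)

toy_ids <- add_group_id(add_group_id(toy, "who_region"), "country")
toy_ids$.who_region_id
#> [1] 1 1 2
toy_ids$.country_id
#> [1] 1 3 2
```

Each call handles one level of the hierarchy, so the user states *which* grouping they want rather than *how* to build the id columns.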

Malaria modelling

model <- inlabru::bru(
  formula = pr ~ avg_lower_age + Intercept +
    who_region(month_num,
               model = "ar1",
               group = .who_region_id,
               constr = FALSE) +
    country(month_num,
            model = "ar1",
            group = .country_id,
            constr = FALSE),
  family = "gaussian",
  data = malaria_africa_ts,
  options = list(
    control.compute = list(config = TRUE),
    control.predictor = list(compute = TRUE, link = 1)
  )
)

Malaria modelling

yahtsee (Yet Another Hierarchical Time Series Extension + Expansion)

m <- fit_hts(
  formula = pr ~ avg_lower_age +
    hts(who_region,
        country),
  .data = malaria_africa_ts,
  family = "gaussian"
)

Reflections and advice

greta is complex: where do you start?

11,177 lines of code
1,535 tests
~705 functions

Getting to grips with a new code base

Keep a notebook
Get familiar with the code - use it!
Go through the vignettes
Read the helpfiles
Use the code (again)
Read the vignettes (again)
Keep notes: questions, unexpected behaviour
Talk to the maintainer often and ask clarifying questions

Getting to grips with a new code base?

Sort the source files alphabetically, then read through every line of code (really)
Keep a document of anything I notice that could be improved
...16 pages of notes later, rearrange and organise into tasks/groups
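A small sketch of the "sort alphabetically, then read everything" plan: list a package's R source files in alphabetical order with their line counts, so you can see how much reading lies ahead. `list_source_files()` is a hypothetical helper, and it assumes the standard package layout with code in an `R/` directory:

```r
# Hypothetical helper: list a package's R source files alphabetically,
# with the number of lines in each, as a reading plan
list_source_files <- function(pkg_dir = ".") {
  r_files <- sort(list.files(
    file.path(pkg_dir, "R"),
    pattern = "\\.[Rr]$",
    full.names = TRUE
  ))
  # read each file, then count its lines with lengths()
  n_lines <- lengths(lapply(r_files, readLines, warn = FALSE))
  data.frame(
    file = basename(r_files),
    n_lines = n_lines,
    stringsAsFactors = FALSE
  )
}

# Usage, from the root of a package source directory:
# plan <- list_source_files()
# sum(plan$n_lines)  # total lines of R code to read
```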

You can smell without doing the cooking

Code smells (a term I first heard through Jenny Bryan's useR! 2018 keynote):

"...is an evocative term for that vague feeling of unease we get when reading certain bits of code. It's not necessarily wrong, but neither is it obviously correct."

You can identify code patterns and smells even without deeply understanding the code

You can smell without doing the cooking

Identifying repeated error messages
Re-wording error messages
Replacing verbose code with a simpler equivalent, e.g., this:

param_lengths <- vapply(
  params,
  function(x) length(x),
  FUN.VALUE = 1L
)

can become:

param_lengths <- lengths(params)
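A quick way to convince yourself that the two versions are interchangeable is to run both on a toy list and compare. The `params` values here are made up for illustration:

```r
# A toy list of parameters with different lengths
params <- list(mean = 1:3, sd = 2, cov = matrix(1:4, nrow = 2))

# Verbose version: apply length() to each element, insisting on an integer result
param_lengths_vapply <- vapply(params, function(x) length(x), FUN.VALUE = 1L)

# Concise version: lengths() does the same job in one call
param_lengths <- lengths(params)

# Same values, same names, same type
identical(param_lengths_vapply, param_lengths)
#> [1] TRUE
```

Both keep the element names and return an integer vector, so the swap does not change behaviour for code downstream.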


Use snapshot tests

You want to test messages or printed output, e.g., that your object prints like this:

greta array (data)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Use snapshot tests: before

# data arrays
# print method
ga_data <- as_data(matrix(1:9, nrow = 3))
expected_output <- paste0(
  "greta array (data)\n\n     [,1] [,2] [,3]\n[1,]",
  "    1    4    7\n[2,]    2    5    8\n[3,]    3",
  "    6    9")
result <- evaluate_promise(ga_data, print = TRUE)
expect_identical(result$output, expected_output)

Snapshot tests: after

# data arrays
# print method
ga_data <- as_data(matrix(1:9, nrow = 3))
expect_snapshot(
  ga_data
)

Snapshot test

# print and summary work
    Code
      ga_data
    Output
      greta array (data)
           [,1] [,2] [,3]
      [1,]    1    4    7
      [2,]    2    5    8
      [3,]    3    6    9

Snapshot tests: error message testing before

# wrong class of object
expect_error(
  as_data(NULL),
  "objects of class NULL cannot be coerced to greta arrays"
  )
expect_error(
  as_data(list()),
  "objects of class list cannot be coerced to greta arrays"
  )
expect_error(
  as_data(environment()),
  "objects of class environment cannot be coerced to greta arrays"
  )

Snapshot tests: error message testing after

# wrong class of object
expect_snapshot_error(
  as_data(NULL)
)
expect_snapshot_error(
  as_data(list())
)
expect_snapshot_error(
  as_data(environment())
)

Snapshot tests: error messages output

# as_data errors informatively
    Object cannot be coerced to <greta_array>
    Objects of class <NULL> cannot be coerced to a <greta_array>
---
    Object cannot be coerced to <greta_array>
    Objects of class <list> cannot be coerced to a <greta_array>
---
    Object cannot be coerced to <greta_array>
    Objects of class <environment> cannot be coerced to a <greta_array>
---

Snapshot tests also provide a useful way to review all of your error messages: you can read over errors, warnings, and messages in bulk and spot the ones that could be more informative.
E.g., change from:
- "Error: Wrong dimensions for X" to
- "Error: Dimensions for X must be Z, but we see X has dimensions Y"
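A sketch of the second, more informative style of message. It assumes the cli package (version 3.0.0 or later, which provides `cli_abort()`; rlang must also be installed), and `check_dims()` is a hypothetical helper, not part of greta:

```r
# Hypothetical helper (not part of greta): stop if `x` does not have the
# expected dimensions, saying both what was expected and what was found
check_dims <- function(x, expected) {
  observed <- dim(x)
  # treat a plain vector as having a single dimension: its length
  if (is.null(observed)) observed <- length(x)
  if (!identical(as.integer(observed), as.integer(expected))) {
    expected_txt <- paste(expected, collapse = "x")
    observed_txt <- paste(observed, collapse = "x")
    cli::cli_abort(
      "Dimensions for {.arg x} must be {expected_txt}, but {.arg x} has dimensions {observed_txt}."
    )
  }
  invisible(x)
}

check_dims(matrix(1:9, nrow = 3), expected = c(3, 3))  # passes silently
try(check_dims(matrix(1:8, nrow = 2), expected = c(3, 3)))
```

Compared with "Wrong dimensions for X", the user now learns both what is required and what they actually supplied.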

Use Version Control

Git/mercurial/SVN/whatever
github/bitbucket/gitlab/whatever
If you develop new changes, make a new branch.
The minor inconvenience is worth the relief.

The relief comes when you make changes to the code, make a bunch of mistakes, realise they would break a lot of things downstream, and know that you can safely leave that branch be without ruining your main code. It is very satisfying.

Write useful commit messages

Do:

Finish the sentence: "This commit will..."*
"This commit will use message instead of stop"

Don't:

"No hyphen, ugh"
"uhhhh, try putting dev mockery back?"

* heard from Adam Gruer

Save time with `pr_fetch()` and `pr_finish()`

These are amazing functions that help save me a bit of time every day

pr_fetch(101): Grabs pull request 101 from GitHub into your local session
pr_finish(101): After you've merged your PR on GitHub, deletes the branch locally and remotely, and makes sure you are back on the main branch
There are more (which I should probably learn); see the usethis docs

Continuous integration: it will save you time

Run your code on someone else's (GitHub's) machine
Run your tests and check that things work

Continuous integration: it will save you time...eventually

....Eventually

Sometimes it feels like you'll be doing a lot of waiting
Push...wait 23 minutes for it to finish building on Windows
Push...google some obscure error message
Realise you have specified `shell: RScript {0}` instead of `shell: Rscript {0}` (there is an upper-case S in the first one; thank you Jim Hester for finding this)
Realise you have spent several days debugging some issue with GitHub Actions that was actually just a missing `{`

....Eventually

I feel like I spent 90% of last year trying to get GH actions passing
I really wish I had learnt a bit about it before blindly pushing changes, e.g., that you can run actions locally
Some sample commit messages:
"Try windows old-rel instead of 3.6"
"No hyphen, ugh"
"use oldrel-1 and oldrel-2 instead of oldrel and R 3.5"
"ugh, mockery was there twice. Try removing the dev versions again?"
"uhhhh, try putting dev mockery back?"
"what happens on CI if we don't use mockery dev?"

... Eventually - the "always failing" paradox

If your tests fail, does your software actually work?
A single test failure doesn't mean it is broken!
greta stays in a place of "it isn't any more broken than before..."
Slightly brain-melting: trying to diff error messages and mentally regression-test them, which is not what CI is designed for.
Waiting 1-24 minutes for a build to finish can be a massive time suck.

Use reproducible examples

Wrap up small problems into small examples with reprex
The act of reprexing has solved many problems for me!
It helps others solve your problem as well
A "video reprex" can also be useful, or even better (e.g., to demonstrate spooky browser behaviour)


Prefer glue over paste/sprintf

dist_type <- "normal"
n_dim <- 6
paste0("Following a ", dist_type, " distribution with ", n_dim, " dimensions")

## [1] "Following a normal distribution with 6 dimensions"

library(glue)
glue("Following a {dist_type} distribution with {n_dim} dimensions")

## Following a normal distribution with 6 dimensions

See my blog post, "glue magic Part 1"
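The heading also mentions sprintf(), so for comparison here is the same message built all three ways, reusing the slide's variables. With sprintf() the values sit apart from the places they fill in the string, which is part of why the glue version reads more easily:

```r
library(glue)

dist_type <- "normal"
n_dim <- 6

# Three ways to build the same string
msg_paste <- paste0("Following a ", dist_type, " distribution with ", n_dim, " dimensions")
msg_sprintf <- sprintf("Following a %s distribution with %d dimensions", dist_type, n_dim)
msg_glue <- glue("Following a {dist_type} distribution with {n_dim} dimensions")

# glue() returns a "glue" object (a character vector with an extra class),
# so drop the class before comparing
identical(as.character(msg_glue), msg_paste)
#> [1] TRUE
identical(msg_sprintf, msg_paste)
#> [1] TRUE
```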

Use cli to construct messages: good

print_file_msg_paste <- function(n_file){
  msg <- ifelse(test = n_file == 1,
                yes = paste0("Found ", n_file, " file"),
                no = paste0("Found ", n_file, " files"))
  cat(msg)
}

print_file_msg_paste(0)

## Found 0 files

print_file_msg_paste(1)

## Found 1 file

print_file_msg_paste(2)

## Found 2 files

Use cli to construct messages: better

print_file_msg_cli <- function(n_file){
  cat(cli::format_message("Found {n_file} file{?s}"))
}

print_file_msg_cli(1)

## Found 1 file

print_file_msg_cli(2)

## Found 2 files

print_file_msg_cli(3)

## Found 3 files
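cli's pluralization markup can also build plain strings, without printing a message, through `pluralize()`, and `no()` turns a zero count into the word "no". A short sketch, assuming a recent cli release (check the cli pluralization docs for your installed version); cli is attached so that `no()` can be found when the string is interpolated:

```r
library(cli)

# {no(n_file)} prints "no" for zero and sets the quantity used by {?s}
n_file <- 0
as.character(pluralize("Found {no(n_file)} file{?s}"))

n_file <- 1
as.character(pluralize("Found {no(n_file)} file{?s}"))

n_file <- 2
as.character(pluralize("Found {no(n_file)} file{?s}"))
```

This keeps the "file" versus "files" logic inside the string itself, so there is no separate `if`/`ifelse()` branch to keep in sync.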

R Packages aren't always the answer

Not everything needs to be an R package
Analysis code isn't always appropriate to turn into package code. See Miles McBain's blog post "Project as an R package: An okay idea" on this.

The Future

RSEs and software are starting to get more credit

In the past 12 months:

Nature article: Why science needs more research software engineers, by Chris Woolston
Monash University Business School now recognises software as first class academic output
Statistical Society of Australia (SSA) hosted a panel session on RSEs
ACEMS podcast: acknowledging research software
I delivered a seminar, "Acknowledging research software in academia", at UNSW
SSA has developed two awards for statistical software:
- Di Cook Award ($1000): Student prize for Victoria and Tasmania
- Venables Award ($5000): National statistical software prize

Thanks

Nick Golding
Miles McBain
Heidi Seibold
Heather Turner

Dianne Cook
Rob Hyndman
Maëlle Salmon
Karthik Ram

Resources

Colophon

Slides made using xaringan
Extended with xaringanthemer
Colours taken + modified from lorikeet theme from ochRe
Header font is Josefin Sans
Body text font is Montserrat
Code font is Fira Mono
template available: njtierney/njt-talks

Learning more

talk link

nj_tierney

njtierney

nicholas.tierney@gmail.com

End.
