
Reflections one year into working as a research software engineer


Nicholas Tierney

Telethon Kids Institute, Perth, Australia

UseR! 23rd June 2022

njt-user-2022.netlify.app

nj_tierney

How/where do I work?

  • I am a Research Software Engineer (RSE)
  • Working at Telethon Kids Institute
  • With the Malaria Atlas Project
  • Primarily with Nick Golding
  • Maintaining greta software
  • Embedded within a team
  • Not consulted out to teams (usually)
  • Develop software to help teams and to tackle specific research problems
  • Mixture of remote and on-site work

What sorts of things does an RSE do?

  • Create software to solve research problems
  • Develop tools that abstract the right components to facilitate research
  • Help researchers to find and learn good tools
  • Support researchers with (computational) reproducibility

(adapted from Heidi Seibold's UseR! 2021 keynote)

The past year

  1. Understanding, improving, and maintaining greta
  2. Developing new interfaces for statistical methods
  3. COVID modelling for the Australian Government

Professor Nick Golding

Why 'greta'?

Grete Hermann (1901 - 1984)

wrote the first algorithms for computer algebra

... without a computer

(The package is spelled greta rather than grete, so that people don't pronounce it 'greet')

What greta looks like

$$ \alpha \sim \text{Normal}(0, 5) $$

$$ \beta \sim \text{Normal}(0, 3) $$

$$ \sigma \sim \text{LogNormal}(0, 3) $$

$$ \mu = \alpha + \beta X $$

$$ Y \sim \text{Normal}(\mu, \sigma) $$

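library(greta)
library(palmerpenguins)
# greta arrays can't hold missing values, so drop incomplete rows first
penguins_complete <- na.omit(penguins[, c("bill_length_mm", "flipper_length_mm")])
x <- penguins_complete$bill_length_mm
y <- penguins_complete$flipper_length_mm
alpha <- normal(0, 5)
beta <- normal(0, 3)
sd <- lognormal(0, 3)
mu <- alpha + beta * x
distribution(y) <- normal(mu, sd)
# define the model over the parameters of interest, then sample
m <- model(alpha, beta, sd)
draws <- mcmc(m)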

Designing new interfaces

Malaria modelling

yahtsee (Yet Another Hierarchical Time Series Extension + Expansion)

library(dplyr)

cleaned_data <- data %>%
  as_tibble() %>%
  # one integer ID per WHO region (mutate keeps the other columns; transmute would drop them)
  group_by(who_region) %>%
  mutate(.who_region_id = cur_group_id()) %>%
  ungroup() %>%
  select(-who_region) %>%
  # one integer ID per country
  group_by(country) %>%
  mutate(.country_id = cur_group_id()) %>%
  ungroup() %>%
  select(-country)
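The repeated group_by()/mutate()/ungroup() steps are exactly the kind of boilerplate that a new interface can hide. Below is a minimal sketch of wrapping the ID construction once and reusing it for each level of the hierarchy. add_group_id() is a hypothetical helper, not yahtsee's actual internals, and the toy data are made up:

```r
library(dplyr)

# hypothetical helper (not part of yahtsee): add an integer ID column
# named `.<var>_id`, numbering the groups of `var` in sorted order
add_group_id <- function(data, var) {
  id_name <- paste0(".", var, "_id")
  data %>%
    group_by(.data[[var]]) %>%
    mutate(!!id_name := cur_group_id()) %>%
    ungroup()
}

# made-up toy data standing in for the malaria time series
toy <- tibble(
  who_region = c("AFRO", "AFRO", "EMRO"),
  country = c("KEN", "UGA", "SOM")
)

toy %>%
  add_group_id("who_region") %>%
  add_group_id("country")
```

Unlike the pipeline above, this sketch keeps the original who_region and country columns. The IDs follow the sorted order of the group values, as cur_group_id() does.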

Malaria modelling

model <- inlabru::bru(
  formula = pr ~ avg_lower_age + Intercept +
    who_region(month_num,
               model = "ar1",
               group = .who_region_id,
               constr = FALSE) +
    country(month_num,
            model = "ar1",
            group = .country_id,
            constr = FALSE),
  family = "gaussian",
  data = malaria_africa_ts,
  options = list(
    control.compute = list(config = TRUE),
    control.predictor = list(compute = TRUE, link = 1)
  )
)

Malaria modelling

yahtsee (Yet Another Hierarchical Time Series Extension + Expansion)

m <- fit_hts(
  formula = pr ~ avg_lower_age +
    hts(who_region,
        country),
  .data = malaria_africa_ts,
  family = "gaussian"
)

Reflections and advice

greta is complex: where do you start?

  • 11,177 lines of code
  • 1,535 tests
  • ~705 functions

Getting to grips with a new code base

  • Keep a notebook
  • Get familiar with the code: use it!
  • Go through the vignettes
  • Read the help files
  • Use the code (again)
  • Read the vignettes (again)
  • Keep notes: questions, unexpected behaviour
  • Talk to the maintainer often and ask clarifying questions

Getting to grips with a new code base?

  • Sort the source files alphabetically and read through every line of code (really)
  • Keep a document of everything I notice that could be improved
  • ...16 pages of notes later, rearrange and organise them into tasks/groups
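The first step can be scripted. This is a minimal base R sketch that assumes a standard package layout with the sources in R/. code_reading_list() is a hypothetical helper, not part of any package:

```r
# hypothetical helper: list a package's R source files alphabetically,
# with the number of lines in each, as a plan for reading the code base
code_reading_list <- function(pkg_path = ".") {
  r_dir <- file.path(pkg_path, "R")
  files <- sort(list.files(r_dir, pattern = "[.][Rr]$"))
  n_lines <- vapply(
    file.path(r_dir, files),
    function(f) length(readLines(f, warn = FALSE)),
    FUN.VALUE = integer(1)
  )
  data.frame(file = files, n_lines = unname(n_lines), stringsAsFactors = FALSE)
}
```

For example, code_reading_list("path/to/greta") returns the files in reading order. Summing n_lines gives a rough size estimate, although it also counts comments and blank lines.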

You can smell without doing the cooking

Code smell (a term I first heard in Jenny Bryan's UseR! 2018 keynote): "an evocative term for that vague feeling of unease we get when reading certain bits of code. It's not necessarily wrong, but neither is it obviously correct."

  • You can identify code patterns and smells even without deeply understanding the code

You can smell without doing the cooking

  • Identifying repeated error messages
  • Re-wording error messages
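# before: loop over params, asking for the length of each element
param_lengths <- vapply(
  params,
  function(x) length(x),
  FUN.VALUE = 1L
)
# after: base R's lengths() does the same thing in one call
param_lengths <- lengths(params)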
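One quick way to check that a refactor like this is safe is to compare both versions on a toy input. The list below is made up for illustration; greta's real params object is not shown on the slide:

```r
# made-up list standing in for greta's `params`
params <- list(mean = 0, sd = c(1, 2, 3), cov = diag(2))

# the original vapply() version and the lengths() replacement
param_lengths_vapply <- vapply(params, function(x) length(x), FUN.VALUE = 1L)
param_lengths <- lengths(params)

# both give a named integer vector (mean = 1, sd = 3, cov = 4)
identical(param_lengths_vapply, param_lengths)
```

lengths() calls length() on each element, so it also respects any length() methods defined for the objects in the list.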

Use snapshot tests

You want to test messages or printed output, e.g., that your code prints this:

greta array (data)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Use snapshot tests: before

# data arrays
# print method
ga_data <- as_data(matrix(1:9, nrow = 3))
expected_output <- paste0(
  "greta array (data)\n\n     [,1] [,2] [,3]\n[1,]",
  "    1    4    7\n[2,]    2    5    8\n[3,]    3",
  "    6    9"
)
result <- evaluate_promise(ga_data, print = TRUE)
expect_identical(result$output, expected_output)

Snapshot tests: after

# data arrays
# print method
ga_data <- as_data(matrix(1:9, nrow = 3))
expect_snapshot(ga_data)

Snapshot test

# print and summary work

    Code
      ga_data
    Output
      greta array (data)
      
           [,1] [,2] [,3]
      [1,]    1    4    7
      [2,]    2    5    8
      [3,]    3    6    9

Snapshot tests: error message testing before

# wrong class of object
expect_error(
as_data(NULL),
"objects of class NULL cannot be coerced to greta arrays"
)
expect_error(
as_data(list()),
"objects of class list cannot be coerced to greta arrays"
)
expect_error(
as_data(environment()),
"objects of class environment cannot be coerced to greta arrays"
)

Snapshot tests: error message testing after

# wrong class of object
expect_snapshot_error(
as_data(NULL)
)
expect_snapshot_error(
as_data(list())
)
expect_snapshot_error(
as_data(environment())
)

Snapshot tests: error messages output

# as_data errors informatively
Object cannot be coerced to <greta_array>
Objects of class <NULL> cannot be coerced to a <greta_array>
---
Object cannot be coerced to <greta_array>
Objects of class <list> cannot be coerced to a <greta_array>
---
Object cannot be coerced to <greta_array>
Objects of class <environment> cannot be coerced to a <greta_array>
---

  • Snapshots also give you a useful way to review all your messages: you can read errors, warnings, and messages in bulk
  • E.g., change from:
    • "Error: Wrong dimensions for X" to
    • "Error: Dimensions for X must be Z, but X has dimensions Y"
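Here is one way to write an error that states both what was expected and what was found. This is a sketch using cli::cli_abort(). check_dims() is a hypothetical function, not greta's actual check, and the wording is an assumption:

```r
library(cli)

# hypothetical checker (not greta's internals): error if `x` does not have
# the expected dimensions, reporting both the expected and actual values
check_dims <- function(x, expected) {
  actual <- dim(x)
  if (!identical(as.integer(actual), as.integer(expected))) {
    expected_txt <- paste(expected, collapse = " x ")
    actual_txt <- paste(actual, collapse = " x ")
    cli_abort(c(
      "Dimensions of {.arg x} must be {expected_txt}.",
      "x" = "{.arg x} has dimensions {actual_txt}."
    ))
  }
  invisible(x)
}

check_dims(matrix(1:9, nrow = 3), c(3, 3))       # passes silently
try(check_dims(matrix(1:6, nrow = 2), c(3, 3)))  # informative error
```

Because the message lives in one place, a snapshot test of this error shows the full wording for review.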

Use Version Control

  • Git/Mercurial/SVN/whatever

  • GitHub/Bitbucket/GitLab/whatever

  • If you are developing new changes, make a new branch.

  • The minor inconvenience is worth the relief.

  • The relief: you can make changes to the code, make a bunch of mistakes, realise it would break a lot of things downstream, and safely leave it be, knowing it will not ruin your code. It is very satisfying.

Write useful commit messages

Do:

  • Finish the sentence: "This commit will..."*
  • "This commit will use message instead of stop"

Don't:

  • "No hyphen, ugh"
  • "uhhhh, try putting dev mockery back?"

* heard from Adam Gruer

Save time with pr_fetch() and pr_finish()

These are amazing functions that save me a bit of time every day

  • pr_fetch(101): checks out pull request 101 from GitHub as a local branch
  • pr_finish(101): after the PR has been merged on GitHub, deletes the branch locally and remotely and puts you back on the main branch
  • There are more (which I should probably learn): see the usethis docs

Continuous integration: it will save you time

  • Run your code on someone else's (GitHub's) machine
  • Run your tests to check that things work

Continuous integration: it will save you time...eventually

....Eventually

  • Sometimes it feels like you'll be doing a lot of waiting

  • Push...wait 23 minutes for it to finish building on Windows

  • Push...google some obscure error message

  • Realise you have specified shell: RScript {0} instead of shell: Rscript {0} (the first has an upper-case S; thank you Jim Hester for finding this)

  • Realise you have spent several days debugging an issue with GH Actions that was actually just a missing {

....Eventually

  • I feel like I spent 90% of last year trying to get GH actions passing

  • I really wish I had learnt a bit about it before blindly pushing changes (e.g., that you can run actions locally)

  • Some sample commit messages:
    • "Try windows old-rel instead of 3.6"
    • "No hyphen, ugh"
    • "use oldrel-1 and oldrel-2 instead of oldrel and R 3.5"
    • "ugh, mockery was there twice. Try removing the dev versions again?"
    • "uhhhh, try putting dev mockery back?"
    • "what happens on CI if we don't use mockery dev?"

... Eventually - the "always failing" paradox

  • If your tests fail, does your software actually work?

  • A single test failure doesn't mean it is broken!

  • greta stayed in a state of "it isn't any more broken than before..."

  • Slightly brain-melting: trying to diff error messages and mentally regression-test them. That's not what CI is designed for.

  • Waiting 1-24 minutes for a build to finish can be a massive time suck.

Use reproducible examples

  • Wrap up small problems into small examples with reprex
  • The act of reprexing has solved many problems for me!
  • It helps others solve your problem as well
  • A "video reprex" can also be useful, or even better (e.g., to demonstrate spooky browser behaviour)

Prefer glue over paste/sprintf

library(glue)
dist_type <- "normal"
n_dim <- 6
paste0("Following a ", dist_type, " distribution with ", n_dim, " dimensions")
## [1] "Following a normal distribution with 6 dimensions"
glue("Following a {dist_type} distribution with {n_dim} dimensions")
## Following a normal distribution with 6 dimensions

See my blog post, "glue magic Part 1"
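For comparison, here is the same message built with sprintf(). The sketch also shows two glue() features that help when building messages: braces can hold any R expression, and glue() is vectorised over its inputs. The dims vector is made up for illustration:

```r
library(glue)

dist_type <- "normal"
n_dim <- 6

# sprintf(): placeholders and values are kept apart, so you have to match them up by position
sprintf("Following a %s distribution with %d dimensions", dist_type, n_dim)

# glue() evaluates any R expression inside the braces...
glue("Following a {toupper(dist_type)} distribution")

# ...and is vectorised, returning one string per element of `dims`
dims <- c(2, 3)
glue("Following a {dist_type} distribution with {dims} dimensions")
```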


Use cli to construct messages: good

print_file_msg_paste <- function(n_file) {
  msg <- ifelse(test = n_file == 1,
                yes = paste0("Found ", n_file, " file"),
                no = paste0("Found ", n_file, " files"))
  cat(msg)
}

Use cli to construct messages: good

print_file_msg_paste(0)
## Found 0 files
print_file_msg_paste(1)
## Found 1 file
print_file_msg_paste(2)
## Found 2 files

Use cli to construct messages: better

library(cli)

print_file_msg_cli <- function(n_file) {
  # {?s} adds the plural "s" based on the value of n_file
  cat(format_message("Found {n_file} file{?s}"))
}

Use cli to construct messages: better

print_file_msg_cli(1)
## Found 1 file
print_file_msg_cli(2)
## Found 2 files
print_file_msg_cli(3)
## Found 3 files
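cli's pluralisation goes a bit further: wrapping the quantity in no() prints "no" instead of "0", and {?s} still pluralises. A minimal sketch follows. format_file_msg() is a hypothetical variant of print_file_msg_cli() that returns the string instead of printing it:

```r
library(cli)

# hypothetical variant of print_file_msg_cli(): returns the formatted string,
# using no() so that a count of zero reads as "no"
format_file_msg <- function(n_file) {
  format_message("Found {no(n_file)} file{?s}")
}

for (n in c(0, 1, 2)) cat(format_file_msg(n), "\n")
```

Returning the string rather than printing it makes the helper easy to test, e.g. with a snapshot.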

R packages aren't always the answer

The Future

RSEs and software are starting to get more credit

In the past 12 months:

Thanks

  • Nick Golding
  • Miles McBain
  • Heidi Seibold
  • Heather Turner
  • Dianne Cook
  • Rob Hyndman
  • Maëlle Salmon
  • Karthik Ram

Colophon

Learning more

talk link

nj_tierney

njtierney

nicholas.tierney@gmail.com

End.
