Overview

The purpose of this vignette is to walk one through how to calculate MHWs in both the R and Python languages. It will use code from both languages throughout. The secondary purpose of this vignette is to show that the outputs are identical. The tertiary purpose of this vignette is to run benchmarks on the code to compare their speeds.

Calculating events

First up, let’s look at the two different ways in which one may calculate MHWs. First we will look at the R code, then Python.

R code

Setup

The basic functionality for calculating marine heatwaves (MHWs) in R may be found in the heatwaveR package that may currently be downloaded and installed with the following lines of code. Note that if one has already installed these packages they do not need to be installed again. To run one of the lines in the following code chunk will require that the hash-tag first be removed.

With the necessary packages installed, we activate heatwaveR with the following line of code.

Default output

With everything downloaded and ready for us to use we may now calculate some events. The heatwaveR package has three built in time series (sst_WA, sst_Med, sst_NW_Atl) that we may use to more easily demonstrate how the code works. The general pipeline in R for calculating events is first to create a climatology from a chosen time series using ts2clm(), and then to feed that output into detect_event(), as seen below.

To look at these outputs we would use the following options. For now we’ll just look at the event output.

## # A tibble: 6 x 6
##   event_no index_start index_peak index_end duration date_start
##      <dbl>       <dbl>      <dbl>     <dbl>    <dbl> <date>    
## 1        1         884        887       895       12 1984-06-02
## 2        2         899        901       904        6 1984-06-17
## 3        3         908        922       926       19 1984-06-26
## 4        4        1008       1010      1012        5 1984-10-04
## 5        5        1023       1027      1029        7 1984-10-19
## 6        6        1033       1034      1052       20 1984-10-29
## # A tibble: 1 x 6
##   event_no index_start index_peak index_end duration date_start
##      <dbl>       <dbl>      <dbl>     <dbl>    <dbl> <date>    
## 1       42       10629      10651     10689       61 2011-02-06

Python code

Setup

To download and install the Python package for calculating MHWs one may run the following line of code in a console:

Or simply download the GitHub repository and follow the instructions there for downloading. Or if still lost, phone a friend!

Before we begin the calculations we need to create a time series. I’ve chosen to take the sst_WA time series from the heatwaveR package to ensure consistency between the results. The python code prefers to have the dates and the temperatures fed into it as two separate vectors, so I will first split these up in R before we feed them to Python.

Reticulate code

It is also possible to run the Python code through R with the use of the R package reticulate. This is particularly useful as it allows us to perform the comparisons and benchmarking all within the same language.

Setup

Here we load the reticulate package and choose the conda environment I’ve already created called py27. For help on how to set up a conda environment go here. I’ve ensured that all of the required python modules are installed within this environment.

Once we’ve told R which version/environment we would like to use for Python we may then load the necessary modules.

One may run python code in it’s native form within R by passing it as a character vector to py_run_string().

## This is an R chunk ##

py_run_string("t = np.arange(date(1982,1,1).toordinal(),date(2014,12,31).toordinal()+1)")
py_run_string("sst = np.loadtxt(open('data/sst_WA.csv', 'r'), delimiter=',', skiprows=1)")
# py_run_string("sst.flags.writeable = True")
# py_run_string("sst.setflags(write = True)")
# py_run_string("sst = np.array(sst).copy()")

Or we may just create the objects in native R.

Comparisons

With some climatologies and events calculated with both languages we may now compare the results of the two. I’ll do so here natively in R to avoid any potential hiccups from translating across languages. I’ll create the R output here, and load the Python output created in the Python code chunk above.

With these default results loaded in the same format in the same language we can now start to look at how they compare. For starters I am just doing some simple sums and correlations.

Climatologies

## This is an R chunk ##

cor(res_clim_R$seas, res_clim_Python$seas)
## [1] 0.9999999
sum(res_clim_R$seas) - sum(res_clim_Python$seas)
## [1] -2.053821
cor(res_clim_R$thresh, res_clim_Python$thresh)
## [1] 0.9999996
sum(res_clim_R$thresh) - sum(res_clim_Python$thresh)
## [1] -6.573385

The seasonal and threshold climatologies correlate very well, but adding them up we see that the Python values are consistently higher. This is due to rounding differences between the languages. Even though the sum difference between the thresholds may look large at a value of ~6.5, considering this is over 12053 days of data, the difference is only ~0.0005/day. So I think it is safe to move on and look at the events.

February 29th

It was also recently brought to my attention by Zijie Zhao, author of the MATLAB distribution for MHW code, that Python and R calculate February 29th differently. With Python calculating the climatologies first, and then filling in the missing days, whereas R calculates February 29th first, and then the climatologies from that. So with that being said let’s have a peek at the difference this may cause.

## [1] 0.0003321473
## [1] -0.005438159

Another thing pointed out by Zijie is that Python and MATLAB calculate percentiles differently, this then is likely also true for R and Python. This would explain why the seasonal climatologies (seas) are always much closer than the thresholds (thresh) for all of these tests.

Events

Below we have a for loop that will run a correlation on all rows with the same name, as well as find the sum difference between them.

## This is an R chunk ##

# Remove non-numeric columns
res_event_num <- res_event_R %>% 
  select_if(is.numeric)

# Run the loop
res_event <- data.frame()
for(i in 1:length(colnames(res_event_num))){
  if(colnames(res_event_num)[i] %in% colnames(res_event_Python)){
    x1 <- res_event_R[colnames(res_event_R) == colnames(res_event_num)[i]]
    x2 <- res_event_Python[colnames(res_event_Python) == colnames(res_event_num)[i]]
    x <- data.frame(r = cor(x1, x2, use = "complete.obs"),
                    difference = round(sum(x1, na.rm = T) - sum(x2, na.rm = T), 4),
                    var = colnames(res_event_num)[i])
    colnames(x)[1] <- "r"
    rownames(x) <- NULL
    } else {
      x <- data.frame(r = NA, difference = NA, var = colnames(res_event_num)[i])
      }
  res_event <- rbind(res_event, x)
  }
na.omit(res_event)
##            r difference                            var
## 2  1.0000000    60.0000                    index_start
## 3  1.0000000    60.0000                     index_peak
## 4  1.0000000    60.0000                      index_end
## 5  1.0000000     0.0000                       duration
## 6  0.9999991     0.0089                 intensity_mean
## 7  0.9999997     0.0087                  intensity_max
## 8  0.9996073     0.7347                  intensity_var
## 9  1.0000000     0.1630           intensity_cumulative
## 10 0.9999971     0.0160       intensity_mean_relThresh
## 11 0.9999994     0.0165        intensity_max_relThresh
## 12 0.9995693     0.7376        intensity_var_relThresh
## 13 0.9999996     0.4755 intensity_cumulative_relThresh
## 14 1.0000000     0.0000             intensity_mean_abs
## 15 0.9990733     1.6100              intensity_max_abs
## 16 0.9997263     0.7667              intensity_var_abs
## 17 1.0000000     0.0004       intensity_cumulative_abs
## 18 0.9999999    -0.0003                     rate_onset
## 19 0.9999999     0.0002                   rate_decline

It is a bit odd that there is such a large difference between intensity_var for the two language. This implies that R is detecting larger differences in the intensity around the threshold than Python is. This may again be due to the differences in thresholds being detected, but that makes a rather small difference. It is perhaps more likely that variance is calculated slightly differently between languages. Looking at the source code one sees that this is calculated by first finding the variance of the intensity, and then the square root of that. These two calculations likely differ enough between the languages to be causing this. Ultimately it is of little concern as the variance values found throughout both packages are not of much interest to anyone.

We may also see that the difference in intensity_max_abs is a bit higher in Python than in R. This is rather surprising as intensity_max_abs simply shows the maximum temperature (independent of any thresholds etc.) experienced during that event. And considering that these calculations were performed on the exact same time series, intensity_max_abs should always be exactly the same between the two languages. Looking at the climatology output one sees that the temperature is still correct, meaning that the difference must occur somewhere within the R function detect_event. Looking at the R source code we see that intensity_max_abs is calculated on line 298, for Python this is calculated on line 376. The reason for the difference is because R is looking for the maximum temperature throughout all dates for a given event, whereas Python is simply taking the temperature on the peak date of the event. This is a subtle but significant difference as this means this value is showing two different things. I’m not certain which is more appropriate, though I’m leaning towards the Python implementation.

With all of our overlapping columns compared, and our differences and Pearson r values looking solid, let’s finish off this basic comparison by finding which columns are not shared between the different language outputs. Also, please note that the apparently large differences between the index_start, index_peak, and index_end values are due to the different indexing methods of R and Python. R starts at 1 and Python at 0. The

## This is an R chunk ##

cols_R <- colnames(res_event_R)[!(colnames(res_event_R) %in% colnames(res_event_Python))]
cols_R
## [1] "event_no"
cols_Py <- colnames(res_event_Python)[!(colnames(res_event_Python) %in% colnames(res_event_R))]
cols_Py
## [1] "category"          "duration_extreme"  "duration_moderate"
## [4] "duration_severe"   "duration_strong"   "n_events"         
## [7] "time_end"          "time_peak"         "time_start"

Wonderful! Almost everything matches up exactly. The duration of categories of events is something added in R by another function category(), and will be left that way for now. The “time” columns in the Python output aren’t relevant as far as I can see in the present usage as these functions currently only take day values. The different methods of labelling the events will be left as they are for now as well.

It is also worth noting that the values for index_start, index_peak, and index_end are off by one between the two languages. This is due to the difference in indexing between the languages. Looking at the date_start, date_peak, and date_end values we see that they are the same.

Missing data

A bug was discovered in v0.3.3 of the R code with regards to the handling of missing data. Specifically, in the move to a C++ back-end, the R code stopped being able to handle missing data for the climatology calculations. This has since been corrected from v0.3.4 onwards, but it is worthwhile to ensure that missing data are handled the same way.

First we’ll create the dataframe with missing data:

## This is an R chunk ##

library(padr)
library(lubridate)
random_knockout <- function(df, prop){
  res <- df %>% 
    sample_frac(size = prop) %>% 
    arrange(t) %>%
    pad(interval = "day")
  return(res)
}

# Remove 10% of the data randomly
sst_WA_miss_random <- random_knockout(sst_WA, 0.9)
write.csv(sst_WA_miss_random$temp, file = "data/sst_WA_miss_random.csv", row.names = F, na = "NaN")

# Remove simulated ice cover
sst_WA_miss_ice <- sst_WA %>% 
  mutate(month = month(t, label = T),
         temp = ifelse(month %in% c("Jan", "Feb", "Mar"), NA, temp)) %>% 
  select(-month)
write.csv(sst_WA_miss_ice$temp, file = "data/sst_WA_miss_ice.csv", row.names = F, na = "NaN")

With the missing data saved, we can now calculate and compare the climatologies that the two different languages will create.

With the Python climatologies and events calculated from the missing data we’ll now do the same for R.

And then the comparisons of the seas and thresh values.

## This is an R chunk ##

# Compare clims/thresholds
cor(res_clim_random_R$seas, res_clim_random_Python$seas)
## [1] 0.9999996
sum(res_clim_random_R$seas) - sum(res_clim_random_Python$seas)
## [1] -2.372016
cor(res_clim_random_R$thresh, res_clim_random_Python$thresh)
## [1] 0.9999343
sum(res_clim_random_R$thresh) - sum(res_clim_random_Python$thresh)
## [1] -866.0401

We may see from the results above that the values still correlate very strongly, if slightly less so than with no missing data. We see again that the Python values are higher overall for both the seasonal climatology and the threshold. This time however the differences are much larger. The seasonal climatology is off by ~0.0004/day, which is acceptable, but the 90th percentile threshold is off by ~0.0695. This is too large of a difference and this issue will need to be investigated. It is most likely due to the difference in how quantiles are calculated between the languages though and so there may be little to do about it. Perhaps more importantly is how these differences may affect the events being detected.

## This is an R chunk ##

nrow(res_event_random_R)
## [1] 57
nrow(res_event_random_Python)
## [1] 53
## This is an R chunk ##

# Remove non-numeric columns
res_event_random_num <- res_event_random_R %>% 
  select_if(is.numeric)

# Run the loop
res_event_random <- data.frame()
for(i in 1:length(colnames(res_event_random_num))){
  if(colnames(res_event_random_num)[i] %in% colnames(res_event_random_Python)){
    x1 <- res_event_random_R[colnames(res_event_random_R) == colnames(res_event_random_num)[i]]
    x2 <- res_event_random_Python[colnames(res_event_random_Python) == colnames(res_event_random_num)[i]]
    x <- data.frame(difference = round(sum(x1, na.rm = T) - sum(x2, na.rm = T), 4),
                    var = colnames(res_event_random_num)[i])
    # colnames(x)[1] <- "r"
    rownames(x) <- NULL
    } else {
      x <- data.frame(difference = NA, var = colnames(res_event_random_num)[i])
      }
  res_event_random <- rbind(res_event_random, x)
  }
na.omit(res_event_random)
##    difference                            var
## 2  19349.0000                    index_start
## 3  19365.0000                     index_peak
## 4  19420.0000                      index_end
## 5     75.0000                       duration
## 6      3.7080                 intensity_mean
## 7      6.6284                  intensity_max
## 8      2.8726                  intensity_var
## 9     93.4348           intensity_cumulative
## 10     2.4525       intensity_mean_relThresh
## 11     5.4013        intensity_max_relThresh
## 12     2.8771        intensity_var_relThresh
## 13    57.0796 intensity_cumulative_relThresh
## 14    91.7193             intensity_mean_abs
## 15    95.5900              intensity_max_abs
## 16     3.0015              intensity_var_abs
## 17  1686.3577       intensity_cumulative_abs
## 18     3.4802                     rate_onset
## 19    -0.2195                   rate_decline

Because the count of events detected is not the same, we cannot run correlations. We can sum up the differences to get an idea of how far off things are, and we can see that the results are troubling. The differences in the index_* values can be discounted as we have already established that we have a different number of events. The next largest difference is that for intensity_cumulative_abs. Looking at the one event that is the same between the languages one may see that this value is the same for both, meaning that we may discount this difference as it is not due to calculation differences, but rather also due to the differing number of events. Regardless, it is an indicator that overall the much lower threshold in the R language is giving us results that look much more dire than the Python language. This is problematic and stems from the quantile calculation issue.

I looked into this in quite some detail and it appears to stem not only from the calculation of quantiles being different, but also from the R quantile calculations never giving more than one additional decimal place of precision over the initial data, whereas the Python results float near infinity. This explains why the R results always tend to be lower, but does not account for why the presence of missing data has such a dramatic effect.

## [1] 0.9999996
sum(clim_only_random_R$seas) - sum(clim_only_random_Python$seas)
## [1] -0.07149032
cor(clim_only_random_R$seas, clim_only_random_Python$seas)
## [1] 0.9999996
sum(clim_only_random_R$thresh) - sum(clim_only_random_Python$thresh)
## [1] -26.22067

Looking at only the values for one seasonal climatology or threshold cycle we see that the differences are the same as when we compare all of the data. I don’t know why they wouldn’t be… but I wanted to check.

After having gone over the R code very thoroughly I think the next course of action is to go through the step by step output of the Python code to see where exactly these changes begin to occur. It could be in the following places:

  • clim_calc_cpp: The calculation of quantile values
    • The rounding of the results to 0.001 will have an influence, but can’t explain the size of the difference when missing data are present
  • smooth_percentile: RcppRoll::roll_mean is used for the rolling means and this may differ greatly from Python when missing data are present
    • Line 290 in the Python code is where the function runavg is used for the same purpose

Benchmarks

The final thing we want to look at in this vignette is the speed differences in calculating MHWs between the two languages.

R

library(microbenchmark)
# The old R code
library(RmarineHeatWaves)
microbenchmark(detect(make_whole(data = sst_WA), climatology_start = "1982-01-01", climatology_end = "2014-12-31"), times = 10)
## Unit: seconds
##                                                                                                      expr
##  detect(make_whole(data = sst_WA), climatology_start = "1982-01-01",      climatology_end = "2014-12-31")
##       min       lq     mean   median       uq      max neval
##  1.040593 1.048529 1.211674 1.081987 1.190646 2.022516    10
# The new R code
microbenchmark(detect_event(ts2clm(data = sst_WA, climatologyPeriod = c("1982-01-01", "2014-12-31"))), times = 10)
## Unit: milliseconds
##                                                                                         expr
##  detect_event(ts2clm(data = sst_WA, climatologyPeriod = c("1982-01-01",      "2014-12-31")))
##       min       lq     mean   median       uq      max neval
##  192.4564 208.8704 279.6261 287.8735 319.2132 383.9136    10

Unsurprisingly, the average for running the heatwave analysis on heatwaveR 10 times comes out at ~0.194 seconds per calculations, which is faster than the average of ~0.730 with RmarineHeatWaves.

Python

## 0.173738

The average speed for running the Python code 10 times comes out at ~0.175 seconds, which is slightly faster than R. And the Python code is also calculating the event categories, which R is not. That is done in an additional step with category(). The R code is however allowing for a wider range of options for climatologies, and may be more easily multi-cored, as shown in this vignette.

Conclusion

Overall the basic applications of these two languages provide near identical results, and nearly the same computational time. I would not say that one is better enough than the other to warrant switching between languages. Rather this vignette supports the argument that people collaborating on a research project while using R and Python do not need to worry about their results. The exception currently being that large amounts of missing data appear to be influencing the calculation of the 90th percentile threshold differently. This is something that will be investigated further and corrected as necessary.