library(healthyR.ts) library(dplyr) library(ggplot2) library(tidyr) library(plotly) library(timetk) library(modeltime)
Today I am going to go over some quick yet practical examples of ways that you can use the healthyR.ts
package. This package is designed to help you analyze time series data in a more efficient and effective manner.
Let’s just jump right into it!
library(healthyR.ts) library(dplyr) library(ggplot2) library(tidyr) library(plotly) library(timetk) library(modeltime)
We are going to use the timeseries data called BJSales.lead
that comes with Base R. We will do this to showcase a couple of things like turning a ts object into a tibble and plotting the data.
# Load the data, which has no time series information other than it is # a time series object and 150 points in length, so we will go ahead and # create a date column for it and name it date_col. df <- BJsales.lead |> ts_to_tbl() |> mutate(date_col = seq.Date(from = as.Date("1991-01-01"), by = "month", length.out = 150)) |> select(date_col, everything()) # Print the first few rows of the data head(df)
# A tibble: 6 × 2 date_col value <date> <dbl> 1 1991-01-01 10.0 2 1991-02-01 10.1 3 1991-03-01 10.3 4 1991-04-01 9.75 5 1991-05-01 10.3 6 1991-06-01 10.1
So far, we have loaded the data and created a date column for it. Now, let’s plot the data. We are going to use the ts_vva_plot
function to do this.
# Plot the data plt_data <- ts_vva_plot(df, date_col, value) head(plt_data[["data"]][["augmented_data_tbl"]])
# A tibble: 6 × 3 date_col name value <date> <fct> <dbl> 1 1991-01-01 Value 10.0 2 1991-01-01 Velocity NA 3 1991-01-01 Acceleration NA 4 1991-02-01 Value 10.1 5 1991-02-01 Velocity 0.0600 6 1991-02-01 Acceleration NA
plt_data[["plots"]][["interactive_plot"]]
Now we have created the augmented data that gets the first order difference of the time series velocity and then the second order difference which gets us the acceleration. The function then creates a ggplot2
plot and a plotly
plot of the data. Let’s move on to see the growth rate of this data.
# Plot the growth rate of the data df_growth_augment_tbl <- ts_growth_rate_augment( df, value ) head(df_growth_augment_tbl)
# A tibble: 6 × 3 date_col value growth_rate_value <date> <dbl> <dbl> 1 1991-01-01 10.0 NA 2 1991-02-01 10.1 0.599 3 1991-03-01 10.3 2.48 4 1991-04-01 9.75 -5.52 5 1991-05-01 10.3 5.95 6 1991-06-01 10.1 -1.94
Let’s now view the data:
plt <- df_growth_augment_tbl |> pivot_longer(cols = -date_col) |> ggplot(aes(x = date_col, y = value, color = name)) + facet_wrap(~ name, ncol = 1, scales = "free") + geom_line() + theme_minimal() + labs( x = "Date", y = "Value", title = "Growth Rate of Time Series Data", color = "Variable" ) print(plt)
ggplotly(plt)
Is the data stationary? Meaning does the joint probability of the distribution change when shifted in time? Let’s find out.
ts_adf_test(df[["value"]])
$test_stat [1] -1.723664 $p_value [1] 0.6915227
The p-value from this test is 0.692. This means that we can accept the null hypothesis that the data is non-stationary. We can, however, make the data stationary by using a built in function in this package.
auto_stationary_df <- auto_stationarize(df[["value"]])
The time series is not stationary. Attempting to make it stationary...
stationary_vec <- auto_stationary_df[["stationary_ts"]] ndiffs <- auto_stationary_df[["ndiffs"]] trans_type <- auto_stationary_df[["trans_type"]] test_stat <- auto_stationary_df[["adf_stats"]][["test_stat"]] p_value <- auto_stationary_df[["adf_stats"]][["p_value"]]
The data is now stationary after 1 differencing. The transformation type used was diff. The test statistic was -4.839 and the p-value was 0.01.
Let’s now add the stationary data to the df_growth_augment_tbl and plot it. First in order to do this we are going to have to pad the data since it is shorter than the original data. We will simply add an NA to the vector then attach.
stationary_vec <- c(rep(NA, ndiffs), stationary_vec) df_growth_augment_tbl <- df_growth_augment_tbl |> mutate(stationary = stationary_vec) df_growth_augment_tbl |> pivot_longer(cols = -date_col) |> ggplot(aes(x = date_col, y = value, color = name)) + facet_wrap(~ name, ncol = 1, scales = "free") + geom_line() + theme_minimal() + labs( x = "Date", y = "Value", title = "Growth Rate/Value and Stationary Data of Time Series", color = "Variable" )
It’s close to the growth rate as it is the first order difference of the data.
Now, lets see if there is any lags that are present in the data.
output <- ts_lag_correlation(df_growth_augment_tbl, .date_col = date_col, .value_col = value, .lags = c(1,2,3,4,6,12,24)) output[["plots"]][["plotly_lag_plot"]]