
    Predicting the Real USD/TRY Rates with MARS

    Selcuk Disci, published on 2023-04-11 11:00:00
    [This article was first published on DataGeeek, and kindly contributed to R-bloggers.]

    The ongoing debate in Turkey recently is that the Turkish government has been suppressing the US Dollar/Turkish Lira exchange rate (USD/TRY) to prevent economic turmoil. Many people in the business world, especially exporters, think that the USD/TRY parity should be in the range of 24-25 Turkish lira.

    To examine this, we will predict the rates for the whole year ahead and see whether they fall within a rational range. But first, we will model our data with bagged multivariate adaptive regression splines (MARS) via the earth package. The predictors of our regression model are Turkey's current account (account) and producer price index (ppi).

    library(tidyverse)
    library(tidymodels)
    library(lubridate)
    library(timetk)
    library(tsibble)
    library(modeltime)
    library(baguette)
    library(fable)
    library(plotly)
    library(ggtext)
    library(systemfonts)
    library(showtext)
    
    df <- read_csv("https://raw.githubusercontent.com/mesdi/blog/main/usdtry_mars.csv")
    
    #Turkiye Current Account (USD)
    df_ca <- 
      df %>% 
      filter(type == "ca") %>% 
      mutate(date = 
               #removing parentheses and text within
               case_when(str_detect(date," \\(.*\\)") ~ str_remove(date," \\(.*\\)"),
                         TRUE ~ date)) %>% 
      mutate(
        date = parse_date(date, format = "%b %d, %Y") %>% 
          #subtracting 2 months from release date
          floor_date("month") %m-% months(2),
        value = str_remove(value, "B") %>% as.numeric()
             ) %>% 
      select(date, account = value)
    
    #Turkiye Producer Price Index (TRY)
    df_ppi <- 
      df %>% 
      filter(type == "ppi") %>% 
      mutate(date = 
               #removing parentheses and text within
               case_when(str_detect(date," \\(.*\\)") ~ str_remove(date," \\(.*\\)"),
                         TRUE ~ date)) %>% 
      mutate(
        date = parse_date(date, format = "%b %d, %Y") %>% 
          #subtracting 1 month from release date
          floor_date("month") %m-% months(1),
        value = str_remove(value, "B") %>% as.numeric()
      ) %>% 
      select(date, ppi = value)
    
    #USD/TRY - US Dollar Turkish Lira
    df_usdtry <- 
      df %>% 
      filter(type == "usdtry") %>% 
      mutate(
        date = parse_date(date, format = "%m/%d/%Y"),
        value = as.numeric(value)
      ) %>% 
      select(date, usdtry = value)
    
    #Merging all the datasets
    df_tidy <- 
      df_usdtry %>% 
      left_join(df_ppi, by = "date") %>% 
      left_join(df_ca, by = "date") %>% 
      na.omit()
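
    Before wrapping MARS in a bagged tidymodels workflow, it can help to see what a single MARS fit looks like. Below is a minimal sketch, not part of the original workflow, that fits an unbagged earth() model directly on the merged data; summary() prints the hinge (basis) functions the model selects.

    #Illustration only: a single (unbagged) MARS fit on df_tidy from above;
    #summary() shows the selected hinge functions and their coefficients
    library(earth)
    
    mars_single <- earth(usdtry ~ account + ppi, data = df_tidy, degree = 2)
    summary(mars_single)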
    
    

    Now that we have created our dataset, we can start modeling. Because the variables are on different scales, we normalize all of them. We also have to convert the categorical variables (the *.lbl factor columns created by the time series signature) into numeric dummy variables because of package/engine requirements.

    #Splitting the data for train and test data
    splits <- 
      df_tidy %>% 
      time_series_split(assess = "1 year", 
                        cumulative = TRUE,
                        date_var = date)
    
    df_train <- training(splits)
    df_test <- testing(splits)
    
    #Add time series signature
    df_rec_ts <- 
      recipe(usdtry ~ ., data = df_train) %>%
      step_timeseries_signature(date) 
    
    
    #Preprocessing
    df_rec_tidy <- 
      df_rec_ts %>%
      step_rm(date) %>%
      step_rm(contains("iso"), 
              contains("minute"), 
              contains("hour"),
              contains("am.pm"), 
              contains("xts")) %>%
      step_zv(all_numeric_predictors()) %>% 
      step_normalize(all_numeric_predictors()) %>%
      step_dummy(contains("lbl"), one_hot = TRUE) 
    

    We do hyperparameter tuning to find the optimal model for the data. As seen below, the best degree of interaction is 2, which means the model includes interaction terms, and it uses the backward pruning method.

    #Processed data for determining the number of model terms
    df_proc <- 
      df_rec_tidy %>% 
      prep() %>% 
      bake(new_data = NULL)
    
    #Model specification
    mars_spec <- 
      bag_mars(num_terms = min(200, max(20, 2 * ncol(df_proc))) + 1, 
               prod_degree = tune(), #marked for tuning
               prune_method = tune()) %>% 
      set_engine("earth") %>% 
      set_mode("regression")
    
    #Workflow
    mars_wflow <- 
      workflow() %>% 
      add_model(mars_spec) %>% 
      add_recipe(df_rec_tidy)
    
    
    #cross-validation for resamples
    set.seed(12345)
    df_folds <- vfold_cv(df_train)
    
    #Building the hyperparameter data frame for tuning
    hyper_grid <- crossing(
      prod_degree = 1:2,
      prune_method = c("backward", "forward")
    )
    
    #Hyperparameter Tuning
    mars_reg_tune <-
      mars_wflow %>%
      tune_grid(
        df_folds,
        grid = hyper_grid,
        metrics = metric_set(rsq)
      )
    
    #Selecting the best parameters according to the r-square
    mars_param_best <- 
      select_best(mars_reg_tune, metric = "rsq") %>% 
      select(-.config)
    
    mars_param_best
    # A tibble: 1 x 2
    #  prod_degree prune_method
    #        <int> <chr>       
    #1           2 backward    
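    
    Only the winning combination is shown above. To compare all four candidates from the grid, collect_metrics() summarises the resampled R-squared for each one; this is an optional check that is not part of the original post.
    
    #Optional: mean resampled R-squared for every grid combination
    mars_reg_tune %>% 
      collect_metrics() %>% 
      arrange(desc(mean))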
    

    We can now build our model with the best parameters we found in the earlier tuning process.

    #Final workflow with the object of best parameters
    final_df_wflow <- 
      workflow() %>% 
      add_model(mars_spec) %>% 
      add_recipe(df_rec_tidy) %>% 
      finalize_workflow(mars_param_best)
    
    
    #Model table
    set.seed(2023)
    model_table <- modeltime_table(
      final_df_wflow %>% 
        fit(df_train)
    ) 
    
    #Calibration
    calibration_table <- 
      model_table %>%
      modeltime_calibrate(df_test)
    
    #Accuracy
    calibration_table %>%
      modeltime_accuracy() %>%
      select(rmse, rsq)
    
    # A tibble: 1 x 2
    #   rmse   rsq
    #  <dbl> <dbl>
    #1  2.55 0.543
    

    When we look at the accuracy results, the coefficient of determination (rsq) looks low. That is because the accuracy measures are calculated on the test data in the calibration phase, and our test data covers only a short time range. The RMSE, on the other hand, looks fine considering the scale of the target variable.
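
    As a quick sanity check, which is not in the original code, we can confirm how small that test window really is:

    #Size and date range of the test set used in calibration
    nrow(df_test)
    range(df_test$date)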

    Before we can forecast, we have to create a future dataset of predictor values to feed into the regression model. We will use the automated ARIMA function to do that. When we analyze the fitted ARIMA models for the ppi and account variables, we can see that both have an annual seasonality.
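
    One way to see that seasonality is to print the automatically selected ARIMA specification with report() from fabletools; a seasonal part with period 12 indicates an annual pattern. This is a quick check, not part of the original code, shown here for the ppi series (the same applies to account).

    #Inspecting the auto-selected ARIMA model for ppi
    df_tidy %>%
      mutate(date = yearmonth(date)) %>% 
      as_tsibble() %>% 
      model(ARIMA(ppi)) %>% 
      report()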

    #Future dataset for the next 12 months
    date <- 
      df_tidy %>%
      mutate(date = yearmonth(date)) %>% 
      as_tsibble() %>% 
      new_data(12) %>% 
      as_tibble() %>% 
      mutate(date = as.Date(date))
    
    ppi <- 
      df_tidy %>%
      mutate(date = yearmonth(date)) %>% 
      as_tsibble() %>% 
      model(ARIMA(ppi)) %>% 
      forecast(h = 12) %>% 
      as_tibble() %>% 
      select(ppi = .mean)
    
    account <- 
      df_tidy %>%
      mutate(date = yearmonth(date)) %>% 
      as_tsibble() %>% 
      model(ARIMA(account)) %>% 
      forecast(h = 12) %>% 
      as_tibble() %>% 
      select(account = .mean)
    
    df_future <- 
      date %>% 
      bind_cols(ppi) %>% 
      bind_cols(account)
    

    Finally, we can plot the actual values from 2016 to 2023 and predictions for the next 12 months.

    #Forecasting plot dataset
    set.seed(1983)
    df_plot <- 
      calibration_table %>% 
      modeltime_refit(df_tidy) %>% 
      modeltime_forecast(
        new_data = df_future,
        actual_data = df_tidy %>% filter(year(date) > 2015)
        )
    
    #The observation dataset
    df_actual <- 
      df_plot %>% 
      filter(.key == 'actual') 
    
    #The prediction dataset
    df_pred <- 
      df_plot %>% 
      filter(.key == 'prediction') 
    
    
    #importing google fonts
    font_add_google("Nunito", "nuni" )
    showtext_auto()
    
    #Hoverinfo texts 
    text_actual <- glue::glue("{round(df_actual$.value, 2)} TRY\n{df_actual$.index}")
    text_pred <- glue::glue("{round(df_pred$.value, 2)} TRY\n{df_pred$.index}")
    
    df_plot %>% 
      ggplot(aes(.index, .value)) + 
      geom_area(
        data = . %>% filter(.key == "actual"),
        fill="#69b3a2", 
        alpha=0.5
        ) + 
      geom_line(
        data = . %>% filter(.key == "actual"),
        color ="#69b3a2"
        ) +
      geom_point(aes(text = text_actual),
        data = . %>% filter(.key == "actual"),
        color ="#69b3a2"
      ) +
      geom_area(
        data = . %>% filter(.key == "prediction"),
        fill="#cf2765", 
        alpha=0.5
      ) + 
      geom_line(
        data = . %>% filter(.key == "prediction"),
        color = "#cf2765"
        ) +
      geom_point(aes(text = text_pred),
        data = . %>% filter(.key == "prediction"),
        color = "#cf2765"
      ) +
      scale_y_continuous(limits = c(0,30))+
      labs(
        x = "",
        y = "",
        title = "The USD/TRY <span style = 'color:#69b3a2'>rates from 2016 to 2023</span>\nand <span style = 'color:#cf2765'>the predictions for the next 12 months</span>"
      ) +
      theme_minimal()+
      theme(
        plot.title = 
          element_markdown(
            hjust = 0.5, 
            face = "bold"
            ),
        plot.background = element_rect(fill = "#f3f6f4", color = NA),
        panel.background = element_rect(fill = "#f3f6f4", color = NA) 
        ) -> p
                  
    
    #setting font family for ggplotly 
    font <- list(
      family= "Nunito", 
      size=20
    )
    
    #setting font family for hover label
    label <- list(
      font = list(
        family = "Nunito", 
        size = 16
      ) 
      
    ) 
    
    #converts ggplot2 object to plotly for interactive chart
    ggplotly(p, tooltip = "text") %>% 
      style(hoverlabel = label) %>% 
      layout(font = font) %>% 
      #Remove plotly buttons from the mode bar
      config(displayModeBar = FALSE)
    

    When we hover over the points, we can see that the predicted April value is quite close to the expectations of Turkish exporters and the majority of economists. So, in the near future, the parity might explode, especially after the election to be held on May 14.


