IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Loading Financial Time Series

    R - datawookie发表于 2024-04-08 00:00:00
    love 0
    [This article was first published on R - datawookie, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    I’m going to be writing a series of posts which will look at some applications of R (and perhaps Python) to financial modelling. We’ll start here by pulling some stock data into R, calculating the daily returns and then looking at correlations and simple volatility estimates.

    Data

    I was looking for a chunky set of stock data, preferably something that was available at higher than daily time resolution. These data from Kaggle for NIFTY 100 stocks on the Indian Stock Market seemed a decent fit.

    📌 Although I wanted intraday data I won’t be using it for the moment and will start by aggregatig the data up to daily resolution.

    List of Data Frames

    Data for each of the stocks is in a separate CSV file. Start by iterating over these using map_dfr() and loading them into a single data frame.

    FILES <- list.files("data", pattern=".csv$", full.names = TRUE)
    
    load <- function(path) {
      read_csv(path, show_col_types = FALSE) %>%
        mutate(
          symbol = sub(".csv$", "", basename(path)),
          time = as.POSIXct(date),
          date = as.Date(time)
        ) %>%
        group_by(symbol, date) %>%
        arrange(time) %>%
        # Summarise the daily closing price.
        summarise(
          close = last(close),
          .groups = "drop"
        )
    }
    
    stocks <- map_dfr(FILES, load)
    

    How many records do we have and what does the data look like?

    nrow(stocks)
    
    [1] 151932
    
    head(stocks)
    
    # A tibble: 6 × 3
      symbol date       close
      <chr>  <date>     <dbl>
    1 ACC    2015-02-02 1515.
    2 ACC    2015-02-03 1512.
    3 ACC    2015-02-04 1490 
    4 ACC    2015-02-05 1500.
    5 ACC    2015-02-06 1513.
    6 ACC    2015-02-09 1504.
    

    We could work with them en masse, but it’s going to be easier to split into a list of data frames, one for each stock.

    stocks <- split(stocks, stocks$symbol)
    length(stocks)
    
    [1] 87
    
    names(stocks)
    
     [1] "ACC"        "ADANIENT"   "ADANIPORTS" "AMBUJACEM"  "APOLLOHOSP"
     [6] "ASIANPAINT" "AUROPHARMA" "AXISBANK"   "BAJAJ-AUTO" "BAJAJFINSV"
    [11] "BAJFINANCE" "BANKBARODA" "BERGEPAINT" "BHARTIARTL" "BIOCON"    
    [16] "BOSCHLTD"   "BPCL"       "BRITANNIA"  "CADILAHC"   "CHOLAFIN"  
    [21] "CIPLA"      "COALINDIA"  "COLPAL"     "DABUR"      "DIVISLAB"  
    [26] "DLF"        "DRREDDY"    "EICHERMOT"  "GAIL"       "GODREJCP"  
    [31] "GRASIM"     "HAVELLS"    "HCLTECH"    "HDFC"       "HDFCBANK"  
    [36] "HEROMOTOCO" "HINDALCO"   "HINDPETRO"  "HINDUNILVR" "ICICIBANK" 
    [41] "IGL"        "INDUSINDBK" "INDUSTOWER" "INFY"       "IOC"       
    [46] "ITC"        "JINDALSTEL" "JSWSTEEL"   "JUBLFOOD"   "KOTAKBANK" 
    [51] "LT"         "LUPIN"      "M_M"        "MARICO"     "MARUTI"    
    [56] "MCDOWELL-N" "MUTHOOTFIN" "NAUKRI"     "NESTLEIND"  "NIFTY50"   
    [61] "NIFTYBANK"  "NMDC"       "NTPC"       "ONGC"       "PEL"       
    [66] "PIDILITIND" "PIIND"      "PNB"        "POWERGRID"  "RELIANCE"  
    [71] "SAIL"       "SBIN"       "SHREECEM"   "SIEMENS"    "SUNPHARMA" 
    [76] "TATACONSUM" "TATAMOTORS" "TATASTEEL"  "TCS"        "TECHM"     
    [81] "TITAN"      "TORNTPHARM" "ULTRACEMCO" "UPL"        "VEDL"      
    [86] "WIPRO"      "YESBANK"   
    

    There are 87 distinct stocks and a number of names there that you might recognise.

    Time Series

    Now convert them into individual xts objects and concatenate to form a single xts object with the closing time series for each of the stocks.

    stocks <- stocks %>%
      map(~{
        ts <- xts(x = .[["close"]], order.by = .[["date"]])
        colnames(ts) <- unique(.$symbol)
        
        ts
      })
    
    stocks <- do.call(merge, stocks) %>% na.omit()
    stocks <- stocks["2016-01-01/2022-01-01"]
    

    We now have a multivariate time series. How many individual series and how many observations?

    dim(stocks)
    
    [1] 1485   87
    

    So that’s 87 time series and 1485 observations for each of them. Each time series looks like this:

    head(stocks$TATASTEEL)
    
               TATASTEEL
    2016-01-04    259.00
    2016-01-05    272.95
    2016-01-06    268.75
    2016-01-07    251.00
    2016-01-08    254.55
    2016-01-11    252.90
    

    We can print the closing prices but it’s more useful to have a plot.

    Returns

    For analytical purposes the absolute daily closing price is not as interesting as the daily return, which can be easily calculated but we’ll use CalculateReturns() from {PerformanceAnalytics} for convenience.

    returns <- CalculateReturns(stocks) %>% na.omit()
    

    Plot the corresponding return time series.

    Variability & Volatility

    There’s a lot of diversity in the return time series across the collection of stocks. To illustrate, here are the series for a collection of them.

    ACC has a roughly consistent low level of return variability over the six years. By contrast, Yes Bank has periods of increased variability starting in the latter part of 2018.

    Daily Volatility

    It would be useful if we had a number (or numbers) to describe the relative level of variability. Volatility is a statistical measure of the dispersion of returns and is calculated as either the standard deviation or variance of the returns. In general, the higher the volatility, the riskier the security.

    # Pull out the returns for two specific stocks.
    #
    ACC <- returns$ACC
    YESBANK <- returns$YESBANK
    

    Let’s start by calculating the standard deviations of those stocks over the entire data period.

    # Daily volatility.
    #
    sd(ACC)
    
    [1] 0.0184169
    
    sd(YESBANK)
    
    [1] 0.04682746
    

    As expected the volatility of Yes Bank is higher than that of ACC.

    Annualised Volatility

    These values can be thought of as the average daily variation in the stock’s return. However, what’s more commonly used is the annualised value, which takes into account the fact that there are on average 252 trading days in each year and reflects the average variation in the stock’s return over the course of a year.

    # Annualised volatility.
    #
    sqrt(252) * sd(ACC)
    
    [1] 0.2923593
    
    sqrt(252) * sd(YESBANK)
    
    [1] 0.7433629
    

    Varying Volatility

    It’s instructive to look at the volatility for shorter periods of time. Let’s compare the volatility of those two stocks in 2017 and 2020.

    sqrt(252) * sd(ACC["2017"])
    
    [1] 0.2344968
    
    sqrt(252) * sd(YESBANK["2017"])
    
    [1] 0.2665718
    

    In 2017 the volatilities of ACC and Yes Bank are comparable.

    sqrt(252) * sd(ACC["2020"])
    
    [1] 0.3900236
    
    sqrt(252) * sd(YESBANK["2020"])
    
    [1] 1.283764
    

    However, in 2020 Yes Bank is substantially more volatile than ACC.

    Of course you can make these volatility calculations progressively more granular. However, as time intervals get shorter you have less data, so each of the individual measurements becomes less reliable. We’ll be modelling the time variation of volatility in upcoming posts.

    Outliers

    The returns might include significant outliers that would likely bias any derived results. We can use winsorisation to reduce the effect of extreme outliers in the data. The Return.clean() function from {PerformanceAnalytics} can be used to apply clean.boudt() to the data.

    yesbank <- Return.clean(YESBANK, method = "boudt")
    

    In the plot below the raw data are in black and the cleaned data are in blue. The extrema are reduced in magnitude in the cleaned data.

    This can be appreciated by looking at the range (minimum and maximum values) of the rase and cleaned data.

    range(YESBANK)
    
    [1] -0.5537634  0.5964912
    
    range(yesbank)
    
    [1] -0.1621622  0.1601474
    

    Importance of Volatility

    Why is volatility important? It is an fundamental consideration in predicting future returns. Stocks with lower volatility are less likely to have substantial changes in return from day to day. However, stocks with higher volatility are literally more “volatile”: their returns can vary substantially from day to day.

    As we have seen, volatility varies from stock to stock (some companies are more volatile than others) and from period to period (a company may have times of low and high volatility).

    Volatility needs to be factored into future predictions. Of course, we also don’t know what future volatilities will be, so in addition to predicting future returns we will probably also want to predict future volatilities. We’ll see how this can be done in an upcoming post on GARCH models.

    To leave a comment for the author, please follow the link and comment on their blog: R - datawookie.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
    Continue reading: Loading Financial Time Series


沪ICP备19023445号-2号
友情链接