class: center, middle, inverse, title-slide

# Data Visualization
## Introduction to Time Series
### Dr Thiyanga Talagala
### 2020-03-12

---

## Time series

- A time series is a sequence of observations taken sequentially in time.

## Time series data vs Cross-sectional data

.pull-left[

- Time series data

<table>
<thead>
<tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

- a set of observations, along with some information about what times those observations were recorded.

- usually recorded at discrete, equally spaced time intervals.

]

.pull-right[

- Cross-sectional data

<table>
<thead>
<tr> <th style="text-align:left;"> ID </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 200 </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 350 </td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> 480 </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> 250 </td> </tr>
</tbody>
</table>

- observations that come from different individuals or groups at a single point in time.

]

---

## Deterministic vs Non-deterministic time series

.pull-left[

- **Deterministic time series:** future values can be exactly determined by using some mathematical function.

![](lecture5ts_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

`$$y_t = \cos(2\pi t)$$`

]

.pull-right[

- **Non-deterministic time series:** future values can be determined only in terms of a probability distribution.
![](lecture5ts_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

]

---

## Frequency of a time series: Seasonal periods

- Frequency: number of observations per natural time interval of measurement (usually a year, but sometimes a week, a day or an hour)

<table>
<thead>
<tr> <th style="text-align:left;"> Data </th> <th style="text-align:left;"> Frequency </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Annual </td> <td style="text-align:left;"> 1 </td> </tr>
<tr> <td style="text-align:left;"> Quarterly </td> <td style="text-align:left;"> 4 </td> </tr>
<tr> <td style="text-align:left;"> Monthly </td> <td style="text-align:left;"> 12 </td> </tr>
<tr> <td style="text-align:left;"> Weekly </td> <td style="text-align:left;"> 52 or 52.18 </td> </tr>
</tbody>
</table>

- Multiple frequency setting

<table>
<thead>
<tr> <th style="text-align:left;"> Data </th> <th style="text-align:left;"> Minute </th> <th style="text-align:left;"> Hour </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Year </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Daily </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> 365.25 </td> </tr>
<tr> <td style="text-align:left;"> Hourly </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> 24 </td> <td style="text-align:left;"> 168 </td> <td style="text-align:left;"> 8766 </td> </tr>
<tr> <td style="text-align:left;"> Half-Hourly </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> 48 </td> <td style="text-align:left;"> 336 </td> <td style="text-align:left;"> 17532 </td> </tr>
<tr> <td style="text-align:left;"> Minutes </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 1440
</td> <td style="text-align:left;"> 10080 </td> <td style="text-align:left;"> 525960 </td> </tr>
<tr> <td style="text-align:left;"> Seconds </td> <td style="text-align:left;"> 60 </td> <td style="text-align:left;"> 3600 </td> <td style="text-align:left;"> 86400 </td> <td style="text-align:left;"> 604800 </td> <td style="text-align:left;"> 31557600 </td> </tr>
</tbody>
</table>

---

.pull-left[

## Monthly time series

![](lecture5ts_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

- Length of the series: 72

- Monthly seasonality

]

.pull-right[

## Half-hourly time series

![](lecture5ts_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

- Length of the series: 4032

- Daily seasonality and weekly seasonality

]

--

Note: Monthly seasonality in high-frequency data (daily, hourly, etc.) is tricky because months vary in length, so it cannot be specified as a fixed seasonal period. It could possibly be handled using dummy variables.

---

class: duke-orange

# Your turn

- What are the frequencies for a monthly time series with semi-annual and annual patterns?
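---

## Seasonal periods from unit conversions

The entries in the frequency tables on the earlier slides can be read as simple unit conversions: the length of the seasonal cycle divided by the sampling interval. The following is a minimal sketch of that arithmetic in base R. The helper `seasonal_period()` is a hypothetical name introduced here for illustration, and the year is assumed to be 365.25 days long, as in the tables.

```r
# Hypothetical helper (not part of the original slides): number of
# observations that fit into one seasonal cycle, both given in seconds.
seasonal_period <- function(interval_sec, cycle_sec) cycle_sec / interval_sec

day  <- 24 * 60 * 60      # seconds in a day
week <- 7 * day           # seconds in a week
year <- 365.25 * day      # assumed average year length

# Half-hourly data: observations per day, per week and per year
half_hour <- 30 * 60
half_hourly <- c(Day  = seasonal_period(half_hour, day),
                 Week = seasonal_period(half_hour, week),
                 Year = seasonal_period(half_hour, year))
print(half_hourly)

# Weekly data: observations per year, rounded to two decimals
weekly_per_year <- seasonal_period(week, year)
print(round(weekly_per_year, 2))
```

Under these assumptions the half-hourly values agree with the Half-Hourly row of the table, and the weekly value agrees with the 52.18 alternative listed for weekly data.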
---

# `ts` object in R

- Annual time series

<table>
<thead>
<tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150), start = 2012)  # frequency defaults to 1 (annual)
y
```

```
Time Series:
Start = 2012 
End = 2015 
Frequency = 1 
[1] 120 122 140 150
```

---

# `ts` object in R

- Quarterly time series

<table>
<thead>
<tr> <th style="text-align:left;"> Quarter </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012-Q1 </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2012-Q2 </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2012-Q3 </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;"> 2012-Q4 </td> <td style="text-align:left;"> 150 </td> </tr>
<tr> <td style="text-align:left;"> 2013-Q1 </td> <td style="text-align:left;"> 200 </td> </tr>
</tbody>
</table>

```r
# start = c(year, quarter); frequency = 4 observations per year
y <- ts(c(120, 122, 140, 150, 200), start = c(2012, 1), frequency = 4)
y
```

```
     Qtr1 Qtr2 Qtr3 Qtr4
2012  120  122  140  150
2013  200
```

---

# `ts` object in R

- Monthly time series

<table>
<thead>
<tr> <th style="text-align:left;"> Month </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012-Jan </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2012-Feb </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2012-March </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;">
2012-April </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
# start = c(year, month); frequency = 12 observations per year
y <- ts(c(120, 122, 140, 150), start = c(2012, 1), frequency = 12)
y
```

```
     Jan Feb Mar Apr
2012 120 122 140 150
```

---

# `ts` object in R

- Weekly time series

<table>
<thead>
<tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012-W1 </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2012-W2 </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2012-W3 </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;"> 2012-W4 </td> <td style="text-align:left;"> 150 </td> </tr>
</tbody>
</table>

```r
# start = c(year, week); frequency = 52 observations per year
y <- ts(c(120, 122, 140, 150), start = c(2012, 1), frequency = 52)
y
```

```
Time Series:
Start = c(2012, 1) 
End = c(2012, 4) 
Frequency = 52 
[1] 120 122 140 150
```

---

## Time series plots

.pull-left[

<table>
<thead>
<tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Values </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 2012 </td> <td style="text-align:left;"> 120 </td> </tr>
<tr> <td style="text-align:left;"> 2013 </td> <td style="text-align:left;"> 122 </td> </tr>
<tr> <td style="text-align:left;"> 2014 </td> <td style="text-align:left;"> 140 </td> </tr>
<tr> <td style="text-align:left;"> 2015 </td> <td style="text-align:left;"> 150 </td> </tr>
<tr> <td style="text-align:left;"> 2016 </td> <td style="text-align:left;"> 200 </td> </tr>
<tr> <td style="text-align:left;"> 2017 </td> <td style="text-align:left;"> 250 </td> </tr>
</tbody>
</table>

```r
y <- ts(c(120, 122, 140, 150, 200, 250), start = 2012)
y
```

```
Time Series:
Start = 2012 
End = 2017 
Frequency = 1 
[1] 120 122 140 150 200 250
```

```r
class(y)
```

```
[1] "ts"
```

]

.pull-right[

```r
library(forecast)  # provides the autoplot() method for ts objects (also loaded by fpp2)
autoplot(y)
```

![](lecture5ts_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

]

---

## Add title and labels
.pull-left[

```r
autoplot(y)
```

![](lecture5ts_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

]

.pull-right[

```r
autoplot(y) +
  ylab("Number of sales") +
  xlab("Year") +
  ggtitle("Time series plot of sales from 2012 to 2017")
```

![](lecture5ts_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

]

---

class: duke-orange

## Your turn

Create plots of the following time series: dengue counts in Gampaha (use the `mozzie` package) and the a10 series (`fpp2` package).

Use help() to find out about the data in each series.

Modify the axis labels and title.

---

#### Time series patterns

**Trend**

- Long-term increase or decrease in the data.

**Seasonal**

- A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a **fixed** and **known period**. Hence, seasonal time series are sometimes called periodic time series.

- Period is unchanging and associated with some aspect of the calendar.

**Cyclic**

- A cyclic pattern exists when data exhibit rises and falls that are not of a fixed period. The duration of these fluctuations is usually at least 2 years.

In general,

- the average length of cycles is longer than the length of a seasonal pattern.

- the magnitude of cycles tends to be more variable than the magnitude of seasonal patterns.

---

## Cyclic pattern

![](lecture5ts_files/figure-html/unnamed-chunk-22-1.png)<!-- -->

---

## Cyclic and seasonal pattern

![](lecture5ts_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

---

## Multiple seasonal pattern

![](lecture5ts_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

## Seasonal pattern

![](lecture5ts_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

---

## Trend

![](lecture5ts_files/figure-html/unnamed-chunk-26-1.png)<!-- -->

---

## Trend and seasonal

![](lecture5ts_files/figure-html/unnamed-chunk-27-1.png)<!-- -->
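
---

## Trend and seasonal: a simulated sketch

To make the trend and seasonal definitions concrete, here is a minimal sketch that builds an artificial monthly series from a linear trend, a fixed 12-month seasonal pattern and random noise, then splits it apart again with `decompose()` from base R. All numbers (level, slope, amplitude, noise size) are made-up values chosen only for illustration. They are not taken from any data set used in these slides.

```r
set.seed(1)                              # reproducible noise
n <- 72                                  # six years of monthly observations
t <- seq_len(n)

trend    <- 100 + 0.8 * t                # long-term increase
seasonal <- 10 * sin(2 * pi * t / 12)    # fixed, known period of 12 months
noise    <- rnorm(n, sd = 3)             # irregular component

# Simulated monthly series starting in January 2012
y_sim <- ts(trend + seasonal + noise, start = c(2012, 1), frequency = 12)

# Classical decomposition into trend, seasonal and remainder components
parts <- decompose(y_sim)
plot(parts)
```

Because the seasonal period is fixed at 12, the estimated seasonal component repeats exactly from year to year. A cyclic pattern would not have this property.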