Sales Forecasting Using Classical and Machine Learning Approaches – A Comparative Study

University Canada West
MBAR 661 Consulting/Research Project

Prepared by:
Juan Manuel Rodriguez Victoria (2131508)
Niranjan Yellamelli (2115049)

Supervisor: Dr. Amirhossein Zaji
Date: November 10, 2023

Table of Contents

Abstract
Introduction
Literature Review
Methodology
    Overview
    Limitations
    Datasets
        Dataset 1 - Liquefied Gas sales in Bolivia
        Dataset 2 – Diesel Oil sales in Bolivia
        Dataset 3 – Brazilian Retailer
        Dataset 4 – The United Kingdom Retail Sales
        Descriptive Statistics
    Forecasting Models
        Naïve Method
        Moving Average
        Exponential Smoothing
        Autoregressive Model
        Autoregressive Integrated Moving Average (ARIMA)
        Seasonal Autoregressive Integrated Moving Average (SARIMA)
        Multilayer Perceptron (MLP)
        Decision Tree (DT)
        Support Vector Regression (SVR)
        Long Short-Term Memory (LSTM)
    Relation Coefficients and Error Metrics
        Coefficient of Determination (R2)
        Correlation Coefficient (r)
        Mean Squared Error (MSE)
        Root Mean Square Error (RMSE)
        Mean Absolute Error (MAE)
        Peak Similarity (PS)
        Rationale for Horizontal Error
Results and Discussion
    Relation Coefficients Analysis
    Vertical Error Analysis
    Horizontal Error Analysis
    Conclusion
References
APPENDIX A
    Plot and Correlation of Dataset 1 - Train Values and Predictions
    Plot and Correlation of Dataset 1 - Test Values and Predictions
APPENDIX B
    Plot and Correlation of Dataset 2 - Train Values and Predictions
    Plot and Correlation of Dataset 2 - Test Values and Predictions
APPENDIX C
    Plot and Correlation of Dataset 3 - Train Values and Predictions
    Plot and Correlation of Dataset 3 - Test Values and Predictions
APPENDIX D
    Plot and Correlation of Dataset 4 - Train Values and Predictions
    Plot and Correlation of Dataset 4 - Test Values and Predictions
APPENDIX E

List of Tables

Table 1. Researches of Forecasting Methods Comparison
Table 2. Descriptive Statistics of Datasets
Table E.1 Correlation and Error Metrics obtained for Dataset 1
Table E.2 Correlation and Error Metrics obtained for Dataset 2
Table E.3 Correlation and Error Metrics obtained for Dataset 3
Table E.4 Correlation and Error Metrics obtained for Dataset 4

List of Figures

Figure 1. Negative Correlation Between the sMAPE and Training Time of ML/DL Models in M4
Figure 2. The input imitation problem in time series forecasting
Figure 3. Flowchart of the research made by the author
Figure 4. Plot of the Dataset 1
Figure 5. Plot of the Dataset 2
Figure 6. Plot of the Dataset 3
Figure 7. Plot of the Dataset 4
Figure 8. Peak Similarity Flow Chart
Figure 9. Accuracy of Forecasting models considering relation coefficients – Dataset 1
Figure 10. Accuracy of Forecasting models considering horizontal error – Dataset 1
Figure 11. Accuracy of Forecasting models considering vertical error – Dataset 1
Figure A.1 Dataset 1 and Moving Average Predictions (36 last values)
Figure A.2 Correlation between Dataset 1 and Moving Average Predicted Values
Figure A.3 Dataset 1 and Exponential Smoothing Predictions (36 last values)
Figure A.4 Correlation between Dataset 1 and Exponential Smoothing Predicted Values
Figure A.5 Dataset 1 and Autoregressive Predictions (36 last values)
Figure A.6 Correlation between Dataset 1 and Autoregression Predicted Values
Figure A.7 Dataset 1 and Naive Predictions (36 last values)
Figure A.8 Correlation between Dataset 1 and Naive Predicted Values
Figure A.9 Dataset 1 and ARIMA Predictions (36 last values)
Figure A.10 Correlation between Dataset 1 and ARIMA Predicted Values
Figure A.11 Dataset 1 and SARIMA Predictions (36 last values)
Figure A.12 Correlation between Dataset 1 and SARIMA Predicted Values
Figure A.13 Dataset 1 and MLP Predictions (36 last values)
Figure A.14 Correlation between Dataset 1 and MLP Predicted Values
Figure A.15 Dataset 1 and Decision Tree Predictions (36 last values)
Figure A.16 Correlation between Dataset 1 and Decision Tree Predicted Values
Figure A.17 Dataset 1 and SVR Predictions (36 last values)
Figure A.18 Correlation between Dataset 1 and SVR Predicted Values
Figure A.19 Dataset 1 and LSTM Predictions (36 last values)
Figure A.20 Correlation between Dataset 1 and LSTM Predicted Values
Figure A.21 Dataset 1 and Moving Average Predictions (36 last values)
Figure A.22 Correlation between Dataset 1 and Moving Average Predicted Values
Figure A.23 Dataset 1 and Exponential Smoothing Predictions (36 last values)
Figure A.24 Correlation between Dataset 1 and Exponential Smoothing Predicted Values
Figure A.25 Dataset 1 and Autoregressive Predictions (36 last values)
Figure A.26 Correlation between Dataset 1 and Autoregressive Predicted Values
Figure A.27 Dataset 1 and Naive Predictions (36 last values)
Figure A.28 Correlation between Dataset 1 and Naive Predicted Values
Figure A.29 Dataset 1 and ARIMA Predictions (36 last values)
Figure A.30 Correlation between Dataset 1 and ARIMA Predicted Values
Figure A.31 Dataset 1 and SARIMA Predictions (36 last values)
Figure A.32 Correlation between Dataset 1 and SARIMA Predicted Values
Figure A.33 Dataset 1 and MLP Predictions (36 last values)
Figure A.34 Correlation between Dataset 1 and MLP Predicted Values
Figure A.35 Dataset 1 and Decision Tree Predictions (36 last values)
Figure A.36 Correlation between Dataset 1 and Decision Tree Predicted Values
Figure A.37 Dataset 1 and SVR Predictions (36 last values)
Figure A.38 Correlation between Dataset 1 and SVR Predicted Values
Figure A.39 Dataset 1 and LSTM Predictions (36 last values)
Figure A.40 Correlation between Dataset 1 and LSTM Predicted Values
Figure B.1 Dataset 2 and Moving Average Predictions (36 last values)
Figure B.2 Correlation between Dataset 2 and Moving Average Predicted Values
Figure B.3 Dataset 2 and Exponential Smoothing Predictions (36 last values)
Figure B.4 Correlation between Dataset 2 and Exponential Smoothing Predicted Values
Figure B.5 Dataset 2 and Autoregressive Predictions (36 last values)
Figure B.6 Correlation between Dataset 2 and Autoregression Predicted Values
Figure B.7 Dataset 2 and Naive Predictions (36 last values)
Figure B.8 Correlation between Dataset 2 and Naive Predicted Values
Figure B.9 Dataset 2 and ARIMA Predictions (36 last values)
Figure B.10 Correlation between Dataset 2 and ARIMA Predicted Values
Figure B.11 Dataset 2 and SARIMA Predictions (36 last values)
Figure B.12 Correlation between Dataset 2 and SARIMA Predicted Values
Figure B.13 Dataset 2 and MLP Predictions (36 last values)
Figure B.14 Correlation between Dataset 2 and MLP Predicted Values
Figure B.15 Dataset 2 and Decision Tree Predictions (36 last values)
Figure B.16 Correlation between Dataset 2 and Decision Tree Predicted Values
Figure B.17 Dataset 2 and SVR Predictions (36 last values)
Figure B.18 Correlation between Dataset 2 and SVR Predicted Values
Figure B.19 Dataset 2 and LSTM Predictions (36 last values)
Figure B.20 Correlation between Dataset 2 and LSTM Predicted Values
Figure B.21 Dataset 2 and Moving Average Predictions (36 last values)
Figure B.22 Correlation between Dataset 2 and Moving Average Predicted Values
Figure B.23 Dataset 2 and Exponential Smoothing Predictions (36 last values)
Figure B.24 Correlation between Dataset 2 and Exponential Smoothing Predicted Values
Figure B.25 Dataset 2 and Autoregressive Predictions (36 last values)
Figure B.26 Correlation between Dataset 2 and Autoregressive Predicted Values
Figure B.27 Dataset 2 and Naive Predictions (36 last values)
Figure B.28 Correlation between Dataset 2 and Naive Predicted Values
Figure B.29 Dataset 2 and ARIMA Predictions (36 last values)
Figure B.30 Correlation between Dataset 2 and ARIMA Predicted Values
Figure B.31 Dataset 2 and SARIMA Predictions (36 last values)
Figure B.32 Correlation between Dataset 2 and SARIMA Predicted Values
Figure B.33 Dataset 2 and MLP Predictions (36 last values)
Figure B.34 Correlation between Dataset 2 and MLP Predicted Values
Figure B.35 Dataset 2 and Decision Tree Predictions (36 last values)
Figure B.36 Correlation between Dataset 2 and Decision Tree Predicted Values
Figure B.37 Dataset 2 and SVR Predictions (36 last values)
Figure B.38 Correlation between Dataset 2 and SVR Predicted Values
Figure B.39 Dataset 2 and LSTM Predictions (36 last values)
Figure B.40 Correlation between Dataset 2 and LSTM Predicted Values
Figure C.1 Dataset 3 and Moving Average Predictions (60 last values)
Figure C.2 Correlation between Dataset 3 and Moving Average Predicted Values
Figure C.3 Dataset 3 and Exponential Smoothing Predictions (60 last values)
Figure C.4 Correlation between Dataset 3 and Exponential Smoothing Predicted Values
Figure C.5 Dataset 3 and Autoregressive Predictions (60 last values)
Figure C.6 Correlation between Dataset 3 and Autoregression Predicted Values
Figure C.7 Dataset 3 and Naive Predictions (60 last values)
Figure C.8 Correlation between Dataset 3 and Naive Predicted Values
Figure C.9 Dataset 3 and ARIMA Predictions (60 last values)
Figure C.10 Correlation between Dataset 3 and ARIMA Predicted Values
Figure C.11 Dataset 3 and SARIMA Predictions (60 last values)
Figure C.12 Correlation between Dataset 3 and SARIMA Predicted Values
Figure C.13 Dataset 3 and MLP Predictions (60 last values)
Figure C.14 Correlation between Dataset 3 and MLP Predicted Values
Figure C.15 Dataset 3 and Decision Tree Predictions (60 last values)
Figure C.16 Correlation between Dataset 3 and Decision Tree Predicted Values
Figure C.17 Dataset 3 and SVR Predictions (60 last values)
Figure C.18 Correlation between Dataset 3 and SVR Predicted Values
Figure C.19 Dataset 3 and LSTM Predictions (60 last values)
Figure C.20 Correlation between Dataset 3 and LSTM Predicted Values
Figure C.21 Dataset 3 and Moving Average Predictions (60 last values)
Figure C.22 Correlation between Dataset 3 and Moving Average Predicted Values
Figure C.23 Dataset 3 and Exponential Smoothing Predictions (60 last values)
Figure C.24 Correlation between Dataset 3 and Exponential Smoothing Predicted Values
Figure C.25 Dataset 3 and Autoregressive Predictions (60 last values)
Figure C.26 Correlation between Dataset 3 and Autoregressive Predicted Values
Figure C.27 Dataset 3 and Naive Predictions (60 last values)
Figure C.28 Correlation between Dataset 3 and Naive Predicted Values
Figure C.29 Dataset 3 and ARIMA Predictions (60 last values)
Figure C.30 Correlation between Dataset 3 and ARIMA Predicted Values
Figure C.31 Dataset 3 and SARIMA Predictions (60 last values)
Figure C.32 Correlation between Dataset 3 and SARIMA Predicted Values
Figure C.33 Dataset 3 and MLP Predictions (60 last values)
Figure C.34 Correlation between Dataset 3 and MLP Predicted Values
Figure C.35 Dataset 3 and Decision Tree Predictions (60 last values)
Figure C.36 Correlation between Dataset 3 and Decision Tree Predicted Values
Figure C.37 Dataset 3 and SVR Predictions (60 last values)
Figure C.38 Correlation between Dataset 3 and SVR Predicted Values
Figure C.39 Dataset 3 and LSTM Predictions (60 last values)
Figure C.40 Correlation between Dataset 3 and LSTM Predicted Values
Figure D.1 Dataset 4 and Moving Average Predictions (36 last values)
Figure D.2 Correlation between Dataset 4 and Moving Average Predicted Values
Figure D.3 Dataset 4 and Exponential Smoothing Predictions (36 last values)
Figure D.4 Correlation between Dataset 4 and Exponential Smoothing Predicted Values
Figure D.5 Dataset 4 and Autoregressive Predictions (36 last values)
Figure D.6 Correlation between Dataset 4 and Autoregression Predicted Values
Figure D.7 Dataset 4 and Naive Predictions (36 last values)
Figure D.8 Correlation between Dataset 4 and Naive Predicted Values
Figure D.9 Dataset 4 and ARIMA Predictions (36 last values)
Figure D.10 Correlation between Dataset 4 and ARIMA Predicted Values
Figure D.11 Dataset 4 and SARIMA Predictions (36 last values)
Figure D.12 Correlation between Dataset 4 and SARIMA Predicted Values
Figure D.13 Dataset 4 and MLP Predictions (36 last values)
Figure D.14 Correlation between Dataset 4 and MLP Predicted Values
Figure D.15 Dataset 4 and Decision Tree Predictions (36 last values)
Figure D.16 Correlation between Dataset 4 and Decision Tree Predicted Values
Figure D.17 Dataset 4 and SVR Predictions (36 last values)
Figure D.18 Correlation between Dataset 4 and SVR Predicted Values
Figure D.19 Dataset 4 and LSTM Predictions (36 last values)
Figure D.20 Correlation between Dataset 4 and LSTM Predicted Values
Figure D.21 Dataset 4 and Moving Average Predictions (36 last values)
Figure D.22 Correlation between Dataset 4 and Moving Average Predicted Values
Figure D.23 Dataset 4 and Exponential Smoothing Predictions (36 last values)
Figure D.24 Correlation between Dataset 4 and Exponential Smoothing Predicted Values
Figure D.25 Dataset 4 and Autoregressive Predictions (36 last values)
Figure D.26 Correlation between Dataset 4 and Autoregressive Predicted Values
Figure D.27 Dataset 4 and Naive Predictions (36 last values)
Figure D.28 Correlation between Dataset 4 and Naive Predicted Values
Figure D.29 Dataset 4 and ARIMA Predictions (36 last values)
Figure D.30 Correlation between Dataset 4 and ARIMA Predicted Values
Figure D.31 Dataset 4 and SARIMA Predictions (36 last values)
Figure D.32 Correlation between Dataset 4 and SARIMA Predicted Values
Figure D.33 Dataset 4 and MLP Predictions (36 last values)
Figure D.34 Correlation between Dataset 4 and MLP Predicted Values
Figure D.35 Dataset 4 and Decision Tree Predictions (36 last values)
Figure D.36 Correlation between Dataset 4 and Decision Tree Predicted Values
Figure D.37 Dataset 4 and SVR Predictions (36 last values)
Figure D.38 Correlation between Dataset 4 and SVR Predicted Values
Figure D.39 Dataset 4 and LSTM Predictions (36 last values)
Figure D.40 Correlation between Dataset 4 and LSTM Predicted Values

Abstract

This research compares sales predictions from classical statistical models with predictions from machine learning and deep learning models. Because sales forecasting affects an organization as a whole, the need for more accurate forecasting methods has prompted a large number of studies on the subject, which in turn have driven the creation of new models. One issue that has been sidelined, however, is the error metrics of these predictions and their impact on the business sector. Ten forecasting models were used to obtain predictions for four datasets. Their accuracy was measured using classical metrics, namely relation coefficients and vertical error metrics, and a newer approach, peak similarity, was used to establish the horizontal error of the predictions. The results show that the classical statistical models achieved the best metrics and coefficients, and that the model that performs best on one metric is not necessarily the best on the others. These results demonstrate the existence of under-studied error metrics whose results have an impact on the business world and may affect the choice of forecasting models.

Key Words: Sales forecasting, comparison, classical statistical models, machine learning, deep learning, horizontal error, vertical error, peak similarity.

Introduction

Forecasting is a much-needed tool in today's market. It is used across industries such as agriculture (Idigova et al., 2023), retail (Rogermann et al., 2023), and manufacturing (Kmiecik et al., 2022), whether to understand sales, demand, or inventory needs, and many operations and decisions are based on such predictions. Long-term forecasting requires more considerations, such as long-term impacts on the business, company, or area and how the market might change, whereas short-term forecasting may be subject to fewer factors (Ping et al., 2023). Short-term forecasting nevertheless has its own implications and applications, for example the water supply demand of a city or the sales spike that follows a new product launch or promotion (Guoxuan et al., 2023).

Any business wants to estimate its future sales so that it can prepare. Similarly, any company or organization wants to understand what its clients or users might need in the future so that it can start preparing now. Several concepts serve this purpose, such as the economic order quantity (EOQ), which can be used to improve profits and meet demand (Tesalonika et al., 2023). This is also a forecast, usually on a short-term basis.

A forecast may affect not only the immediate department or company; its effect can also be felt in various other areas. For example, to meet the demand for a particular type of beverage during summer, the beverage company should evidently increase its production.
To do that, however, the fruit suppliers and, by extension, the producers of plant seeds, fertilizers, and so on must all increase their production as well (Technavio Research, 2020). This can only happen if future demand is anticipated.

In sales forecasting, demand dictates how much product should be at the store or company to fulfill the needs of the clients. If the product on hand is in excess, the business has wasted time getting it to the store while holding back other products that could have met the clients' needs better. On the other hand, if the product is short of stock, the business loses the revenue that the unmet demand could have generated. Other aspects, such as inventory and labor costs, can also be controlled with a good forecast (Mascle et al., 2014).

When the demand or sales for a particular company or product are known or predicted, the company can decide how much product to have ready, and it can also decide how to influence that predicted future. For example, announcing a sales offer or promotion, reducing the price of the product, changing its placement, or advertising can be used to improve the company's profits (Rajaram et al., 2003). Companies are therefore interested in understanding their future sales not only to scale up or down but also to influence the market to their benefit.
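The cost of holding too much or too little stock, as described above, can be illustrated with a minimal sketch. The function name `forecast_error_cost`, its parameters, and all numeric values are hypothetical and chosen only for illustration; they do not come from the datasets or the methodology of this study.

```python
def forecast_error_cost(forecast, demand, holding_cost, lost_margin):
    """Cost of stocking `forecast` units when `demand` units are requested.

    holding_cost: assumed cost per unsold unit left on hand.
    lost_margin:  assumed revenue lost per unit of unmet demand.
    """
    excess = max(forecast - demand, 0)    # unsold units (over-forecast)
    shortage = max(demand - forecast, 0)  # unmet demand (under-forecast)
    return excess * holding_cost + shortage * lost_margin


# Hypothetical scenario: actual demand is 100 units.
for forecast in (80, 100, 120):
    cost = forecast_error_cost(forecast, demand=100,
                               holding_cost=2.0, lost_margin=5.0)
    print(f"forecast={forecast:>3}  cost={cost:.1f}")
```

Under these assumed unit costs, a forecast that is 20 units too low costs more than one that is 20 units too high, which suggests that errors of the same size can have different business consequences depending on their direction.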
Such knowledge of sales will have an impact on short and long-term decisions of a business, the importance of a high degree of certainty has been an important field of study in academic research, and the development of forecasting models has grown since statistical model to Machine Learning (ML) models, Deep Learning (DL) models, and hybrid (classic statistical with ML/DL) models (Makridakis & et al, 2020). Since 1979, when studies made by Makridakis and Hibon found that predictions made by Brown’s exponential smoothing adjusted by seasonality model were the most accurate over many others complex models, with the possibility of increasing this accuracy by averaging the predictions with other models (Makridakis et al, 2020), there is a question regarding the level of complexity willing to accept in a statistical model in order to increase its accuracy level; the use of advanced ML, DL or hybrid models represents a new challenge: the increasing complexity and cost at the moment of forecasting. 3 In order to illustrate the previously mentioned, we will use the results of the Makridakis Competition (M Competition) as an example; acknowledging the previously mentioned benefits of accurate forecasting in business, the M Competition proposes challenges in order to boost the studies and use of new trends in forecasting models for over 40 years (Makridakis et al, 2020). The M Competition, in its 4th edition (M4), made evident the increase in forecasting accuracy, which is represented by the decrease of the Symmetric Mean Absolute Percentage Error (sMAPE), due to the use of ML, DL and hybrid models; but also made evident the increasing complexity and cost of it, as it is seen in Figure 1; which, shows that increasing the training time of an ML, DL, or hybrid model reduces the sMAPE, thus increasing the accuracy of the model. Figure 1. 
Negative Correlation Between the sMAPE and Training Time of ML/DL Models in M4 (Makridakis, Spiliotis, & Assimakopoulos, 2020)

Moreover, academic researchers in different fields have also improved forecasting techniques, proposing new metrics for analyzing errors and for understanding the effect of peak values in time series data. In this sense, the concept of Input Imitation (Zaji et al., 2019) was developed, demonstrating that peak values of a data series affect the forecasted values and cause a horizontal bias (see Figure 2). Because peak values in the time series may induce bias in the forecasted values, this concept highlights the importance of a horizontal analysis, which should be measured and analyzed in parallel with the vertical analysis in order to obtain metrics that reflect the real situation of the model and the forecasted values.

Figure 2. The input imitation problem in time series forecasting (Bonakdari et al., 2019).

Considering the aforementioned, this study aims to analyze the accuracy of predicted values using classical methods (Naïve, Moving Average, Exponential Smoothing, Autoregressive, ARIMA, and SARIMA), machine learning methods (Multilayer Perceptron, Decision Tree, and Support Vector Regression), and a deep learning method (Long Short-Term Memory). Using these methods, the relation coefficients (Coefficient of Determination and Correlation Coefficient), vertical error metrics (Mean Squared Error, Root Mean Squared Error, and Mean Absolute Error), and a horizontal error metric (Peak Similarity) will be obtained for four different datasets and compared, in order to analyze comprehensively the consistency and accuracy of the predictions for each dataset.

Literature Review

Sales forecasting has changed over time, from generic or naïve forecasting to forecasting based on data (Prescott, 1922).
Prescott (1922) mentions how population growth was used as a basis to forecast the sales of other products in the USA. As data collection increased with population growth and its differences in habits, sales forecasting accuracy started dropping, the errors kept growing, and a need arose for better forecasting techniques (Rothe, 1978). This resulted in techniques such as time series forecasting, regression analysis, and classification and categorical forecasting. More accurate techniques like time series forecasting help managers and, by extension, the business make better decisions (Mircetic et al., 2022). As the initial intuitive methods evolved into more complex ones, forecasting not only became more efficient but also started to consider historical data in a systematic manner, becoming more accurate and more dependent on quantitative methods. However, as the volume of data grew and forecasts came to depend on multiple variables, accuracy started dropping again. Any manipulation of machine-generated forecasts also took work, which drove users away from quantitative techniques in the early period of computers (Lawrence, 1983).

Machine Learning accelerated the sophistication of forecasting as the world moved toward it. Programs that learn the intricate patterns of the data, and thus forecast probabilistic values, became more efficient with huge amounts of customer data based on different market segmentations (Chase, 2016). Although these programs can be somewhat biased, professionals overseeing the development of machine learning and predictive analysis techniques for sales forecasting help managers and businesses obtain the required results.
However, even with advanced forecasting techniques, it is important to note that no technique is 100% correct, due to factors that range from the time lag between the predicted and the actual data points to unforeseen events that increase the error in the predictions. The error is the difference between the observed and modelled values of the sales. Error measures such as the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R2 help models like the AutoRegressive Integrated Moving Average (ARIMA) and the Seasonal AutoRegressive Integrated Moving Average (SARIMA) become more accurate (Ramos et al., 2015). The vertical error, which is the difference between the actual and predicted values, has been discussed multiple times in research articles (Bannister, 2008; Neilson et al., 2022) with the aim of minimizing that gap, but the horizontal error is rarely discussed.

Considering the importance of forecasting the sales of products and services, countless works have been carried out to establish the appropriate forecasting method for the time series of each product and the appropriate error measures for each method, because the choice of error measure affects which method is preferred (Aras et al., 2017). Table 1 shows a brief summary of academic research performed to identify the best model for time series forecasting.

Table 1.
Research on Forecasting Methods Comparison

| Reference / Year of Publication | Statistic Model | ML/DL Model | Error Metric | Best Model |
|---|---|---|---|---|
| (Ren et al., 2016) | Autoregressive Integrated Moving Average (ARIMA); Pure Panel Data (PPD); Grey Models | Extreme Learning Machine (ELM) | Mean Squared Error (MSE); Symmetric Mean Absolute Percentage Error (sMAPE) | PPD |
| (Aras et al., 2017) | ARIMA; Exponential Smoothing (ETS); Autoregressive Fractionally Integrated Moving Average (ARFIMA) | Artificial Neural Networks (ANN); Artificial Neural Network Fuzzy Inference (ANFIS) | sMAPE; Theil-U; Root Mean Squared Error (RMSE); Mean Absolute Error (MAE); Mean Absolute Percentage Error (MAPE) | Combined Forecasting |
| (Elmasdotter & Nystromer, 2018) | ARIMA | Long Short Term Memory (LSTM) | RMSE; MAE | LSTM |
| (Benboubker et al., 2019) | ARIMA; ETS; TBATS model | Neural Network Autoregression (NNA) | Mean Absolute Scaled Error (MASE) | NNA |
| (Liu et al., 2020) | Markov Chain Grey Model | ELM; Support Vector Machines (SVM); Minimum Description Length Neural Network (MDL-NN) | sMAPE; MASE; Revised Mean Absolute Percentage Error (RMAPE) | MDL-NN |
| (Smolak et al., 2020) | ARIMA | Extra-Trees (ET); Random Forest (RF); Support Vector Regression (SVR) | RMSE; MAPE; Nash-Sutcliffe Index of Efficiency (EI) | RF |
| (Haselbeck et al., 2022) | ETS; Seasonal Autoregressive Integrated Moving Average (SARIMA); SARIMA with external factors (SARIMAX) | ANN; LSTM; Lasso Regression (LR); Ridge Regression (RR); Elastic Net Regression (ENR); Extreme Gradient Boosting (XGBoost); Bayesian Ridge Regression (BRR); Automatic Relevance Determination (ARD); Gaussian Process Regression (GPR) | RMSE; sMAPE; MAPE | XGBoost |
| (Ensafi et al., 2022) | ARIMA; SARIMA | Facebook Prophet; LSTM; Convolutional Neural Network (CNN) | MSE; RMSE; MAPE | LSTM |
| (Iaousse et al., 2023) | Autoregressive Moving Average (ARMA); ARIMA | SVR; LSTM; K-Nearest Neighbor (KNN) | MSE; MAE; RMSE | LSTM; KNN |

Regarding the information in Table 1, it is evident that no single statistical or ML/DL forecasting model fits every situation (Aras et al., 2017). In 2016 and 2017, researchers concluded that classical statistical models were more accurate than ML/DL models; since then, improvements in the field have increased the accuracy of ML/DL models, as well as public attention to and use of these forecasting methods (Alroomi et al., 2022). This is visible in the Best Model column for every study from 2018 onward. Another important point visible in Table 1 is the preference for vertical error metrics when assessing the accuracy of each method. Only two of the nine studies used MASE as an error metric; although it is a measure of vertical error relative to the naïve method, it can be considered a step toward error metrics that offer new perspectives for evaluating the values predicted by any method.

The growing awareness that a single error measure does not guarantee a correct analysis and the consequent selection of an adequate prediction model, because each error measure has strengths and weaknesses (Shcherbakov et al., 2013), resulted in the need to create new measures that can overcome these weaknesses. These metrics have grown in complexity up to, for example, the Scaled Pinball Loss Function for quantile forecasts (Makridakis et al., 2022); all of these analyses still take the vertical error as the axis of analysis. However, these approaches give little relevance to the analysis of the horizontal error. Considering that recent research has made the input imitation problem evident (Zaji et al., 2019), the use of horizontal error metrics will help to identify upper and lower peaks in the data, which affect the accuracy of the forecasting, with implications for real-world applications.
Methodology

Overview

This paper considers an issue that exists in the current retail world, whether at the store level or at the country level: the inability to perform accurate short-term forecasting based on the available data. This is due to various factors, such as the dependency on multiple variables like economic and trend impacts. Owing to all these, it becomes more complex to make effective predictions that can help management make decisions to sustain the company's growth.

Accordingly, four datasets were chosen from various levels of the retail sector, in different parts of the world and different areas of retail. Once chosen, the datasets were analysed for underlying issues like missing data, inconsistencies, or unreliable data. Depending on these findings, the data was cleaned, divided into training and test sets, and made ready for further analysis. To understand and compare the forecasting capabilities of existing methods, two types of forecasting methods were chosen: statistical methods (Naïve, Autoregressive, Exponential Smoothing, Moving Average, ARIMA, and SARIMA) and ML/DL methods (Multilayer Perceptron, Support Vector Regression, Decision Tree, and Long Short-Term Memory). Each of these methods was trained on the training set and evaluated on the test set, making sure that the parameters matched for forecasting purposes. The results were tabulated and converted into graphs for each method, and then compared with those of the other methods to understand the pattern of the predictions and their alignment with the goal of the paper. With peak similarity considered one of the important error metrics, the conclusions are drawn from the comparison of existing error metrics and peak similarity, and address how our ideas can help businesses make more informed decisions.

Figure 3. Flowchart of the research made by the author.
Limitations:

The present research has the following limitations:

1. As previously discussed, Horizontal Error is not valid in all time-series predictions. It depends on the case and needs to be chosen by the analyst accordingly.
2. Datasets used in the paper are adjusted by the provider for anonymity and business confidentiality purposes, so at times, we might be dealing with data points different from the original. This may lead to wrong predictions.
3. Datasets are limited; even though we have train and test sets, if the predictions need to be validated to date, it cannot be done.
4. The models and error metrics are not exhaustive; there might be other methods or error metrics that could give different results. Based on the scope and our discretion, particular methods and error metrics are used.
5. Real-time predictions could differ from the historical data; further research needs to be done in that area.
6. Horizontal error is not directly used to train the ML/DL models; rather, it is used to compare the results and the accuracy of predictions.

Datasets

In order to achieve the objectives of this research, the following datasets will be used:

Dataset 1 - Liquefied Gas sales in Bolivia

This research incorporates a dataset detailing monthly liquefied gas sales in Bolivia, sourced from the National Statistics Institute of Bolivia. Given the Bolivian government's long-term commitment to subsidizing petroleum derivatives in the domestic market (Ministerio de Hidrocarburos y Energias de Bolivia, 2023), accurately projecting sales volumes becomes crucial. This ensures a consistent domestic supply and helps anticipate the economic implications of the subsidy. Since the primary consumption of this fuel is by households, any shortage can have a significant social impact on the populace.

[Line plot: Quantity Sales (bbl) over Time, Jan-00 to Jan-21]

Figure 4.
Plot of the Dataset 1

Dataset 2 – Diesel Oil sales in Bolivia

This research utilizes a dataset detailing monthly diesel oil sales in Bolivia, sourced from the National Statistics Institute of Bolivia. Notably, these sales are subsidized by the Bolivian government, with projections extending long-term (Ministerio de Hidrocarburos y Energias de Bolivia, 2023). Diesel fuel is predominantly consumed by heavy transport and industrial machinery. Consequently, its availability has a direct impact on sectors such as agriculture, industry, and transportation.

[Line plot: Quantity Sales (bbl) over Time, Jan-00 to Jan-21]

Figure 5. Plot of the Dataset 2

Dataset 3 – Brazilian Retailer

This dataset was extracted from the website Kaggle, where it was uploaded by a profile named TEVEC Systems (Tevec Systems, 2017). According to the website, the data was obtained from a top Brazilian retailer and was modified in order to anonymize the retailer. Of the values provided on the site, the date and sales columns are used because of their relevance to the paper. The original author of this dataset intended it as a basis for implementing Machine Learning (ML) models, and this paper uses the dataset along similar lines. As discussed previously, short-term forecasting helps retailers in multiple ways, such as reducing wastage and solving inventory issues like space, ordering, and replenishing. The author of the dataset considered the same issue. The dataset has dates from 01-January-2014 to 31-July-2016 in a column named Date. The second column contains the sales data, with the column name Sales or Data. The dataset is taken as is, and no assumptions are made to adjust the units of the data to hundreds or thousands of dollars. All the data appears as it is, in direct dollar format.
In order to relate to a real-world retailer, the amounts could be scaled to thousands or millions depending on the size of the retailer, which was anonymized. The data has a total of 937 rows, and the sales values range from 0 to 542. The data is missing 6 dates at different places; no adjustments were made for these points. The zeroes in the data could be either low sales or sales information removed for confidentiality purposes.

[Line plot: Sales Revenue ($us) over Time, 01-Jan-14 to 01-May-16]

Figure 6. Plot of the Dataset 3

Dataset 4 – The United Kingdom Retail Sales

The dataset is extracted from the retail sales of Great Britain and covers all of the country's retail except automotive fuel. The dataset identifier code is J3L6. The source is the website of the Office for National Statistics (ONS, n.d.), the UK government's statistics producer; the data found on the site is under an open license and is released on a periodic basis. This particular data was obtained from the dataset version dated 18-Aug-2023; the latest version is available from 22-Sep-2023. It has two columns, Date and Data/Sales. The dates range from Jan-1994 to Jan-2016 at monthly intervals, for a total of 265 rows, with sales ranging from 16M to 34M. The values are taken as they are; currency conversion is not applied because this is univariate data, and it is directly considered as dollars. Since the data is seasonally adjusted, the effect of other external factors can be considered low, and the results of the predictions can be considered close to actual figures. The data keeps increasing from the start to the end of the available dates, with few variations. There are no missing or zero sales, which helps the prediction methods produce better results.
Since the data is for the whole of Great Britain and not for a single retail store or company, it cannot be seen in the context of direct inventory control; rather, it can be viewed as an aid to the allocation of resources or infrastructure at the country level by the government or private investors.

[Line plot: Sales Revenue ($us) over Time, Jan-94 to Jan-15]

Figure 7. Plot of the Dataset 4

Descriptive Statistics

Considering the above, the descriptive statistics of the datasets used for this research are presented below.

Table 2. Descriptive Statistics of Datasets

| Description | Mean | Median | Mode | Standard Deviation | Range | Minimum | Maximum | Count |
|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 372,169 | 370,712 | 407,825 | 65,341 | 285,862 | 229,443 | 515,305 | 281 |
| Dataset 2 | 770,805 | 745,235 | 690,176 | 258,594 | 1,066,278 | 313,187 | 1,379,465 | 281 |
| Dataset 3 | 91 | 76 | 0 | 81 | 542 | 0 | 542 | 937 |
| Dataset 4 | 23,991,410 | 24,335,865 | | 4,535,639 | 18,067,974 | 16,074,633 | 34,142,607 | 265 |

Forecasting Models

Naïve Method

The Naïve method may be considered one of the simplest forms of forecasting: it uses the immediate past actual output as the prediction for the current time period (Akpınar et al., 2017). For example, if at time t one needs to make a prediction $y_t$ and the actual output of the previous period is $x_{t-1}$, then

$$y_t = x_{t-1} \tag{1}$$

Moving Average

For this paper, the Moving Average is the average of the latest fixed number of data points (Hyndman, 2011). At any point in time, the newest data point is considered the latest and is used to calculate the average, starting from it and going backwards in the dataset. Because the datasets are time series, only the forward direction is considered as progression within the scope of the project. For example, the latest data point is t, the previous one is t-1, and the one before that is t-2; a total of 3 data points are considered for taking the average.
With $x_t$, $x_{t-1}$, and $x_{t-2}$ being the data points considered at a time, the moving average at a specific time is

$$MA_t = \frac{x_t + x_{t-1} + x_{t-2}}{3} \tag{2}$$

For this, the first three data points are ignored for making predictions due to the unavailability of the data. The rest are calculated under the column MA (Moving Average). The same calculations are used for both training and test data. MA6 and MA12 (a set of 6 and 12 values at a time, respectively) were also calculated to show the correlation changes.

Exponential Smoothing

This method uses a smoothing factor, alpha (α), to make the predictions. Exponential smoothing bases the predictions largely on recent data and less on older data (Billah et al., 2006). In this paper, the value of α is obtained by calculating the best possible value for a minimal error. Each prediction is then based on the previous prediction as well as the actual value of the previous time period. For example, if the current time period is t, the prediction is $y_t$, the previous prediction is $y_{t-1}$, and the previous actual value is $x_{t-1}$, then

$$y_t = \alpha x_{t-1} + (1 - \alpha)\, y_{t-1} \tag{3}$$

Autoregressive Model

The Autoregressive model forecasts by considering the historical data and their respective weights in predicting future values. The method employs different time lags; for example, Lag1 in this paper is the data point immediately preceding the current data point ($x_t$). Three lags are considered for each prediction, chosen based on the correlation with the actual data points (Maatallah et al., 2015). The three most highly correlated lags are used for each dataset, and the test dataset uses the same lags as the train set. To calculate the weights, each lag coefficient is used in the main equation as below,

$$y_t = \phi_a x_{t-a} + \phi_b x_{t-b} + \phi_c x_{t-c} \tag{4}$$

where a, b, and c are the selected lags (Lag a, Lag b, Lag c) and $\phi_a$, $\phi_b$, $\phi_c$ are their respective coefficients.
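As a minimal sketch of Equations (1) to (4), the four classical forecasts can be written in plain Python. The function names, the sample values, the seeding of the first smoothed prediction with the first actual value, and the grid search used to pick α are all assumptions made for illustration; the paper does not state how these details were implemented.

```python
def naive_forecast(x):
    """Equation (1): the forecast for period t is the actual value of t-1."""
    return [None] + list(x[:-1])


def moving_average(x, window=3):
    """Equation (2): mean of the latest `window` points, x_t back to x_{t-window+1}."""
    out = [None] * (window - 1)  # not enough history for the first points
    for t in range(window - 1, len(x)):
        out.append(sum(x[t - window + 1 : t + 1]) / window)
    return out


def exponential_smoothing(x, alpha):
    """Equation (3): y_t = alpha * x_{t-1} + (1 - alpha) * y_{t-1}."""
    y = [x[0]]  # assumption: the first prediction is seeded with the first actual
    for t in range(1, len(x)):
        y.append(alpha * x[t - 1] + (1 - alpha) * y[t - 1])
    return y


def best_alpha(x, steps=100):
    """Assumed grid search: pick the alpha in (0, 1] with the lowest MSE."""
    def mse(alpha):
        y = exponential_smoothing(x, alpha)
        return sum((a - p) ** 2 for a, p in zip(x[1:], y[1:])) / (len(x) - 1)
    return min((k / steps for k in range(1, steps + 1)), key=mse)


def autoregressive(x, coefs):
    """Equation (4): `coefs` maps each chosen lag to its weight (phi)."""
    max_lag = max(coefs)
    out = [None] * max_lag  # no forecast until the longest lag is available
    for t in range(max_lag, len(x)):
        out.append(sum(phi * x[t - lag] for lag, phi in coefs.items()))
    return out


sales = [10, 12, 14, 13, 15, 17]  # made-up values for illustration only
print(naive_forecast(sales))
print(moving_average(sales))
print(exponential_smoothing(sales, alpha=0.5))
print(autoregressive(sales, {1: 0.5, 2: 0.5}))
```

Positions without enough history are returned as `None` so that every output stays aligned with the input series; how the paper aligns each average with its forecast period is not stated, so this alignment is also an assumption.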
Autoregressive Integrated Moving Average (ARIMA)

This model combines AR and MA to get the benefits of both, and it is described by three parameters: the order of AR (p), which tells how many past lags are considered for the forecast; the degree of differencing (d), which is the number of times the data needs to be differenced (each value minus its past value) to make it stationary (that is, to have constant mean and variance); and the order of MA (q), which tells how many past forecast errors are included in the model (Ediger et al., 2007). An automated ARIMA package is employed to select the p, d, q values by minimizing the AIC (Akaike Information Criterion), which estimates the relative information lost when these particular values are used. Once the program finds the lowest AIC, those values are used in the ARIMA model to forecast the values of both training and test sets.

Seasonal Autoregressive Integrated Moving Average (SARIMA)

This method is an extension of the previous model. It adds the concept of seasonality, which captures patterns that repeat in the data over time. Considering that the datasets used in the paper are retail sales, seasonality should improve the predictions. Here, the seasonality 's' is considered for forecasting, and packages and methods similar to those used for ARIMA forecasting are applied. Once the p, d, q values are determined, the seasonal counterparts P, D, Q are selected for the seasonality s by minimizing the AIC, and the forecasting is done for both train and test datasets.

For all Machine Learning models, lags were prepared as a separate function. Similarly, graph and Peak Similarity (PS) functions were prepared so that they could be called under each model whenever required. The data is divided into training and test sets, with 70% and 30% of the data in each set, respectively.
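The lag preparation and the 70/30 division described above can be sketched as follows. This is only an illustration: the function names `make_lagged` and `chronological_split` are hypothetical, and the sketch assumes that the split is chronological (the earliest 70% of rows for training, without shuffling), which the paper does not state explicitly.

```python
def make_lagged(series, lags):
    """Build one feature row per period t, holding series[t - lag] for each lag."""
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets


def chronological_split(features, targets, train_fraction=0.7):
    """Assumed split: earliest rows for training, latest rows for testing."""
    cut = round(len(features) * train_fraction)
    return features[:cut], targets[:cut], features[cut:], targets[cut:]


series = list(range(1, 14))  # made-up series of 13 periods
X, y = make_lagged(series, lags=(1, 2, 3))
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))  # number of training and test rows
```

Keeping the rows in time order means the test set always lies after the training set, so no future value leaks into training.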
Different ML (Machine Learning) models were then applied using Python packages, with parameters selected to give the best results within the scope of the paper.

Multilayer Perceptron (MLP)

MLP is built from neurons, weighted connections, and activation functions; the neurons are connected to form different layers from input to output, which allows the network to learn the data and predict future sales accordingly. Different combinations of these elements were tried to get the optimal output (Armano et al., 2023), such as 1 layer with 15 neurons for datasets 1 and 2, and 2 layers with 10 neurons each. The number of epochs/iterations is set to 1000. The ReLU (rectified linear unit) activation function, the lbfgs and Adam (adaptive moment estimation) solvers, and a random state of 42 are used. Once all of these are defined by importing the MLP Python package, the model is trained on the train set and used to predict the test set.

Decision Tree (DT)

DT develops its predictions from the leaf nodes that are split from the main node, with each split made according to the significance of the decision. For example, if a decision between two states needs to be made, two leaves are developed from a sample, each decision is given its respective weight, and so on, thus forming a tree-like structure (Chen et al., 2017). In this paper, a similar approach is used for the prediction, with the parameters defined as follows. The maximum depth the tree can reach is 5 levels. Each node can be split into 3 child nodes. A leaf can have one sample. All the features in the data need to be considered. As done for MLP, a random initial state of 42 is selected.

Support Vector Regression (SVR)

The SVR model transforms the input data into a higher-dimensional space in order to identify patterns.
This can be done using kernel functions, and this paper uses the polynomial kernel, which raises the power of the input data to introduce non-linearity (Chen et al., 2017). A regularization parameter of C = 3 is used for datasets 1 and 2, and C = 1 for datasets 3 and 4, indicating moderate regularization strength. An epsilon of 0.1 is used, which sets the width of the tube, i.e., the error margin for the predictions. With these settings, the SVR function imported from the package is used to predict the training and test sets. A general SVR function can be represented as below,

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \cdot K(x, x_i) + b \tag{5}$$

where f(x) is the output for a given input x, N is the number of support vectors, $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers, K is the kernel function, and b is the bias term.

Long Short-Term Memory (LSTM)

This method draws on the advantages of both long-term and short-term memory by choosing the parameters appropriately, while also reducing the chances of overfitting when the historical data is used to train the model. It has different gates (forget, input, and output) which determine the retention and passing of information (Abbasimehr et al., 2020). In order to use the LSTM model, the data is reshaped into a three-dimensional array with a third dimension of size 1. As with the MLP, ReLU activation and the Adam solver are used. 100 units/neurons are defined for this function. For the regression task, Mean Squared Error is considered. As the information flows through each neuron, the memory is retained based on the parameters and adjusted as new information appears, eventually producing the final predictions.

Relation Coefficients and Error Metrics

In order to identify the relevance of the use of classical or machine learning forecasting methods, the following coefficients will be used.
In order to define each coefficient, we will refer to the Error (E) as the difference between the actual value (y) and the predicted value (ŷ) of a dataset, which can be defined by the following formula:

$$E = y - \hat{y} \tag{6}$$

Also, the mean of the actual values (ȳ) is the sum of all the actual values divided by the number of values (m), which can be defined by the following formula:

$$\bar{y} = \frac{1}{m} \sum_{1}^{m} y \tag{7}$$

Coefficient of Determination (R2)

The coefficient of determination indicates the proportion of the variance in the dependent variable explained by the independent variable. It is usually a value between 0 and 1; an R2 of 1 means that the independent variable explains all the variance of the dependent variable. When it is computed on data the model was not fitted to, such as a test set, it can also be negative, which means the predictions are worse than simply using the mean of the actual values. It can be obtained using the following formula:

$$R^2 = 1 - \frac{\sum_{1}^{m} (y - \hat{y})^2}{\sum_{1}^{m} (y - \bar{y})^2} \tag{8}$$

Correlation Coefficient (r)

The correlation coefficient indicates the linear relationship between two variables. It is a value between -1 and 1; a correlation coefficient of 1 indicates a perfect positive correlation, -1 indicates a perfect opposite correlation, and 0 indicates the absence of correlation between the values. To obtain the correlation coefficient, it is necessary to have two sets of data (a, b) of the same size (m); each pair of values represents a point within the linear relationship. It is also necessary to have the means of both sets ($\hat{a}$, $\hat{b}$). The coefficient can be defined by the following formula:

$$r = \frac{\sum_{1}^{m} (a - \hat{a})(b - \hat{b})}{\sqrt{\sum_{1}^{m} (a - \hat{a})^2 \sum_{1}^{m} (b - \hat{b})^2}} \tag{9}$$

In order to analyze the classical and machine learning approaches of forecasting methods, the following metrics will be analyzed.

Mean Squared Error (MSE)

Mathematically, the MSE is the sum of all the squared E of a dataset divided by the number of values.
The following formula can also define it:

$$MSE = \frac{1}{m} \sum_{1}^{m} E^2 \tag{10}$$

Root Mean Square Error (RMSE)

The RMSE is the square root of the MSE; the following formulas represent it:

$$RMSE = \sqrt{\frac{1}{m} \sum_{1}^{m} E^2} \tag{11}$$

$$RMSE = \sqrt{MSE} \tag{12}$$

Mean Absolute Error (MAE)

The MAE can be defined as the summation of all the absolute values of the error divided by the number of dates used in the dataset (m). It is represented by the following formula:

$$MAE = \frac{1}{m} \sum_{1}^{m} |E| \tag{13}$$

The metrics mentioned previously are vertical metrics, which consider the error between the predicted values and the actual values in each period of time. Zero is the best possible value for these metrics. Because they are obtained with different formulas, they can be inconsistent with one another: a lower MSE or RMSE does not necessarily imply a lower MAE. Thus, a comprehensive analysis of all metrics is needed (Chicco, Warrens, & Jurman, 2021).

To offer a new point of view about the relevance of the forecasted values of a dataset, the following metric will be used.

Peak Similarity (PS)

For the actual values (y), a peak is defined as a value that is higher or lower than both the previous value and the later value. For the predicted values (ŷ), a peak is defined in the same way, with two additional conditions: a peak must exist in the actual values in the same period, and it must be in the same direction (higher or lower).

Figure 8. Peak Similarity Flow Chart

Rationale for Horizontal Error

As seen in the literature review, there are multiple papers based on various types of error metrics and very few that are actually based on Horizontal Error. Considering a retail store as an example, they might want to understand when they are going to see peaks or troughs in their demand so that they can prepare well for it.
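The vertical metrics and the peak-matching rule defined in this section can be sketched in plain Python as follows. The vertical metrics follow the textual definitions of MSE, RMSE, MAE, and R2. For Peak Similarity, the scoring rule is an assumption: the text defines what counts as a matching peak but not how the final score is computed, so this sketch assumes the score is the number of matched peaks divided by the number of peaks in the actual values. All function names and sample values are hypothetical.

```python
import math


def vertical_errors(actual, predicted):
    """MSE, RMSE, and MAE computed from the error E = y - y_hat."""
    errors = [a - p for a, p in zip(actual, predicted)]
    m = len(errors)
    mse = sum(e * e for e in errors) / m
    mae = sum(abs(e) for e in errors) / m
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


def r_squared(actual, predicted):
    """Coefficient of determination; negative when worse than the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def peak_directions(values):
    """+1 for a value above both neighbours, -1 for below both, else 0."""
    marks = [0] * len(values)
    for t in range(1, len(values) - 1):
        if values[t] > values[t - 1] and values[t] > values[t + 1]:
            marks[t] = 1
        elif values[t] < values[t - 1] and values[t] < values[t + 1]:
            marks[t] = -1
    return marks


def peak_similarity(actual, predicted):
    """Assumed score: matched peaks (same period, same direction) / actual peaks."""
    a, p = peak_directions(actual), peak_directions(predicted)
    actual_peaks = sum(1 for mark in a if mark != 0)
    if actual_peaks == 0:
        return 0.0
    matched = sum(1 for x, y in zip(a, p) if x != 0 and x == y)
    return matched / actual_peaks


actual = [1, 3, 2, 5, 4, 6, 3]       # made-up values for illustration
naive = [actual[0]] + actual[:-1]    # one-step-lagged (naive) prediction
print(vertical_errors(actual, naive))
print(r_squared(actual, naive))
print(peak_similarity(actual, naive))
```

Under this assumed scoring rule, a one-step-lagged prediction places every peak one period late, so none of its peaks match an actual peak in both period and direction and the score is 0, while its vertical errors can still look moderate. This is the kind of disagreement between vertical and horizontal metrics that the section describes.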
Especially during special occasions like holidays or promotional days, even if they miss the prediction by one day, the prediction can be considered wrong due to the missed sales target. Or, on an occasion where there are signs of a flood, the government might want to know the precise time so that it can estimate the resources needed to evacuate people and work accordingly. Based on this idea, Horizontal Error is important in such cases. It might not be the case for all types of time-series predictions; in cases where the overall average error matters most, that error needs to be given higher importance than catering to surges in demand. Also, in cases where there is a lesser chance of sudden or unpredicted surges, Horizontal Error might not be an optimum choice (this case will be further discussed with data in the next sections of the report).

Results and Discussion

Once the predictions for the four datasets (shown in Appendix A, B, C, and D) were obtained, the relation coefficients and error metrics were also obtained (Appendix E). The following analysis uses the coefficients and metrics obtained from Dataset 1, considering that they cover the scope and goals of this research; important information obtained from the other datasets will be mentioned in the discussion section.

Relation Coefficients Analysis

The accuracy of the predicted values of Dataset 1, according to the coefficient of determination and the correlation coefficient, is shown in the following figure (values closer to +1 mean better accuracy).
| Model | Train Accuracy: Coefficient of Determination | Train Accuracy: Correlation Coefficient | Test Accuracy: Coefficient of Determination | Test Accuracy: Correlation Coefficient |
|---|---|---|---|---|
| SARIMA | 0.784 | 0.886 | 0.604 | 0.777 |
| Long-Short Term Memory | 0.860 | 0.936 | 0.400 | 0.636 |
| Autoregressive | 0.887 | 0.942 | 0.398 | 0.631 |
| Multilayer Perceptron | 0.868 | 0.941 | 0.385 | 0.629 |
| Moving Average | 0.850 | 0.922 | 0.345 | 0.588 |
| Exponential Smoothing | 0.853 | 0.923 | 0.277 | 0.527 |
| Naïve | 0.762 | 0.873 | 0.238 | 0.488 |
| Support Vector Regression | 0.873 | 0.935 | -0.197 | 0.649 |
| ARIMA | 0.759 | 0.871 | 0.000 | -0.008 |
| Decision Tree | 0.944 | 0.972 | -1.537 | -0.075 |

Figure 9. Accuracy of Forecasting models considering relation coefficients – Dataset 1

As shown in Figure 9 for Dataset 1, the high relation coefficients of the train set are not visible in the test set, which can be a sign of overfitting of the model or of extreme values affecting the relationship between predicted and actual values, especially for the Decision Tree and SVR models. SARIMA shows the highest relation coefficients on the test set, and the Decision Tree and ARIMA models show the lowest accuracy under the same parameters.

Vertical Error Analysis

Considering the vertical error metrics, the following figure presents the RMSE, MSE, and MAE obtained for Dataset 1 (values closer to 0 mean lower error).

| Model | Train MSE | Test MSE | Train RMSE | Test RMSE | Train MAE | Test MAE |
|---|---|---|---|---|---|---|
| SARIMA | 635,333,345 | 598,985,584 | 25,206 | 24,474 | 15,235 | 19,369 |
| Long-Short Term Memory | 220,763,521 | 655,309,255 | 14,858 | 25,599 | 11,053 | 19,666 |
| Multilayer Perceptron | 208,396,027 | 671,806,769 | 14,436 | 25,919 | 10,896 | 19,323 |
| Autoregressive | 178,699,518 | 802,764,215 | 13,368 | 28,333 | 9,942 | 22,635 |
| Moving Average | 350,902,765 | 855,485,447 | 18,732 | 29,249 | 15,061 | 24,594 |
| Exponential Smoothing | 364,457,228 | 877,695,131 | 19,091 | 29,626 | 15,282 | 25,123 |
| Naïve | 601,088,072 | 1,206,761,742 | 24,517 | 34,738 | 19,431 | 28,204 |
| Support Vector Regression | 199,925,667 | 1,307,716,383 | 14,140 | 36,162 | 10,585 | 28,687 |
| ARIMA | 744,139,936 | 2,388,334,371 | 27,279 | 48,871 | 17,262 | 40,451 |
| Decision Tree | 87,577,504 | 2,771,724,142 | 9,358 | 52,647 | 7,102 | 44,399 |

Figure 10.
Accuracy of Forecasting models considering vertical error – Dataset 1 26 MAE In Figure 10, it is visible that SARIMA shows the lowest error for the test data of Dataset 1, closely followed by LSTM and MLP models, same as relation coefficients, in the vertical error analysis the Decision Tree, and ARIMA model shows the higher error. Considering the relation coefficients and vertical error metrics’ analysis, it is observable that both analyses are highly related and both metrics are used as accuracy metrics of predicted values, excluding metrics that can enhance the analysis of predicted values’ accuracy. Also, it is observable that classical statistical forecasting models, are able to offer better accuracy than machine learning or deep learning models. Horizontal Error Analysis In order to perform the horizontal error analysis, the following information regarding peak similarity was obtained from Dataset 1 (values closer to +1 mean better accuracy). Train Accuracy 0.735 0.701 0.675 0.624 0.331 0.169 0.641 0.127 0.000 0.000 Model Autoregressive Multilayer Perceptron Support Vector Regression Long-Short Term Memory SARIMA ARIMA Decision Tree Moving Average Exponential Smoothing Naïve Test Accuracy 0.667 0.667 0.667 0.641 0.585 0.255 0.179 0.111 0.000 0.000 Peak Similarity Figure 11. Accuracy of Forecasting models considering horizontal error – Dataset 1 As mentioned in Figure 11, considering the peak similarity as a horizontal error metric, it is obtained that AR, MLP and SVR share the higher accuracy, and Exponential Smoothing and Naïve Method share the lower accuracy under this parameter. The results of the horizontal error analysis show a different result than relation coefficients and vertical error metrics and must be analyzed according to the forecasting need, this analysis can conclude in using a forecasting method with higher accuracy in relation metrics and vertical error metrics, higher accuracy in horizontal error metrics, or balanced accuracy. 
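As a reading aid for the relation coefficients and vertical error metrics compared for Dataset 1, the sketch below computes MSE, RMSE, MAE, the coefficient of determination, and the correlation coefficient under their standard textbook definitions. It is an assumption that these match the exact formulas used in this study, and the toy series is hypothetical, not taken from any of the four datasets. It illustrates how a forecast that tracks the shape of the data but is offset in level can have a correlation of 1 and a negative coefficient of determination, the kind of divergence visible for SVR on the test set (R2 of -0.197 with a correlation of 0.649).

```python
import math


def vertical_errors(actual, predicted):
    """Return (MSE, RMSE, MAE) for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, math.sqrt(mse), mae


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (standard definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical toy data: the forecast follows the shape of the series
# but sits 10 units too high at every step.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [a + 10.0 for a in actual]

mse, rmse, mae = vertical_errors(actual, predicted)
print(mse, rmse, mae)                # 100.0 10.0 10.0
print(r_squared(actual, predicted))  # -11.5
print(pearson_r(actual, predicted))  # 1.0
```

Because the coefficient of determination penalizes any squared error larger than that of simply predicting the mean, it can fall below zero, whereas the correlation coefficient only measures linear association and ignores a constant offset.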
For Dataset 1, it is worth highlighting that a classical statistical model (AR) offers the best accuracy regarding horizontal error, tied on the test set with MLP and SVR.

It is important to mention that metrics like MSE, RMSE, and R2 can be used to analyze the Vertical Error, but a low Horizontal Error is generally hard to achieve due to its nature, such as the dependency of the forecast on more than one variable. For example, sales may change due to holidays, new products, marketing, consumer habits, discounts, inflation, etc. These factors are not easily accounted for because of the simple nature of the forecasting models chosen, so accounting for Horizontal Error is also difficult. ML/DL models depend on previous inputs for the prediction, which can make it difficult for the model to place the peaks correctly. In our four datasets, this behaviour appeared with both statistical and ML/DL models: both achieved low peak similarity overall, with the highest value being just above 76% in Dataset 4. The concept of peak similarity itself can be further refined, for example with respect to its dependency on other variables and its coherence with other error metrics, alongside other ML/DL forecasting issues. In our datasets, neither the statistical nor the ML/DL models achieved a balance across the metrics. Hence, there is a need to develop methods which take all these metrics into consideration to build better predictions based on business requirements.

Conclusion

Although it intuitively appears impossible to achieve even a near-perfect Horizontal Error, a short-term forecasting horizon makes the idea more plausible. Unlike stock market changes or disaster forecasting, short-term retail sales could see less drastic changes from day to day or year to year. This leaves scope for improving peak similarity, which could benefit retail stores or even the retail sector of a country in various areas.
Inventory can be managed more effectively; for smaller stores in particular, it is not easy to manage or store high volumes of inventory in anticipation of sales peaks. Predicting the peaks with appropriate lead time could allow stores to plan accordingly and order inventory at the right times, thus reducing storage costs (Rockeman, 2022). Labour underutilization or shortages can be avoided by predicting the peaks. Other factors can be brought into the picture to improve the troughs or to take appropriate action at the right time. In larger economic areas, such as the retail sector of a country, the country's infrastructure can be arranged to accommodate the changes in sales, for example through imports and exports. On the other hand, predicting peaks at the wrong time can reverse these effects and lead to more wastage or shortage of resources than in non-peak periods. The combined effect of this shortage or wastage can have a larger impact on companies' profits.

References

Abbasimehr, H., Shabani, M., & Yousefi, M. (2020). An optimized model using the LSTM network for demand forecasting. Computers & Industrial Engineering, 143, 106435. https://doi.org/10.1016/j.cie.2020.106435 https://www.sciencedirect.com/science/article/abs/pii/S0360835220301698
Akpınar, M., & Yumuşak, N. (2017). Naive forecasting of household natural gas consumption with sliding window approach. Turkish Journal of Electrical Engineering and Computer Sciences, 25, 30–45. https://doi.org/10.3906/elk-1404-378 https://journals.tubitak.gov.tr/cgi/viewcontent.cgi?article=2398&context=elektrik
Alroomi, A., et al. (2022). Fathoming empirical forecasting competitions’ winners. International Journal of Forecasting, 38, 1519–1525. https://doi.org/10.1016/j.ijforecast.2022.03.010
Aras, S., Deveci, I., & Polat, C. (2017). Comparative Study on Retail Sales Forecasting Between Single and Combination Methods. Journal of Business Economics and Management, 18(5), 803–832.
https://doi.org/10.3846/16111699.2017.1367324
Armano, G., & Manconi, A. (2023). Devising novel performance measures for assessing the behaviour of multilayer perceptrons trained on regression tasks. PLoS ONE, 17(5), 1–15. https://doi.org/10.1371/journal.pone.0285471 https://search.ebscohost.com/login.aspx?direct=true&AuthType=sso&db=asn&AN=163793130&authtype=sso&custid=ns012452&site=eds-live&scope=site&custid=ns012452
Bannister, R. (2008). A review of forecast error covariance statistics in atmospheric variational data assimilation. I: Characteristics and measurements of forecast error covariances. Quarterly Journal of the Royal Meteorological Society, 134(637), 1951–1970. https://doi.org/10.1002/qj.339
Basu, S., & Schroeder, R. G. (1977). Incorporating Judgments in Sales Forecasts: Application of the Delphi Method at American Hoist & Derrick. Interfaces, 7(3), 18–27. https://doi-org.ezproxy.myucwest.ca/10.1287/inte.7.3.18
Benboubker, G., Kissani, I., & Mourhir, A. (2019). Comparative Analysis in Sales Forecasting: Classical Methods and Neural Networks. Proceedings of the International Conference on Industrial Engineering and Operations Management.
Billah, B., King, M. L., Snyder, R. D., & Koehler, A. B. (2006). Exponential smoothing model selection for forecasting. International Journal of Forecasting, 22(2), 239–247. https://doi.org/10.1016/j.ijforecast.2005.08.002 https://www.sciencedirect.com/science/article/abs/pii/S016920700500107X
Bonakdari, H., Zaji, A. H., Binns, A. D., & Gharabaghi, B. (2019). Integrated Markov chains and uncertainty analysis techniques to more accurately forecast floods using satellite signals. Journal of Hydrology, 572, 75–95.
Brookmire, J. H. (1913). Methods of Business Forecasting Based on Fundamental Statistics. American Economic Review, 3(1), 43.
Chase, J. C. W. (2016). Machine Learning Is Changing Demand Forecasting. Journal of Business Forecasting, 35(4), 43–45.
Chen, Y., Xu, P., Chu, Y., Li, W., Wu, Y., Ni, L., Bao, Y., & Wang, K. (2017). Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Applied Energy, 195, 659–670. https://doi.org/10.1016/j.apenergy.2017.03.034 https://www.sciencedirect.com/science/article/abs/pii/S0306261917302581
Chicco, D., Warrens, M., & Jurman, G. (2021). The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science. https://doi.org/10.7717/peerj-cs.623
Ediger, V. Ş., & Akar, S. (2007). ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy, 35(3), 1701–1708. https://doi.org/10.1016/j.enpol.2006.05.009 https://www.sciencedirect.com/science/article/abs/pii/S0301421506002291
Elmasdotter, A., & Nystromer, C. (2018). A Comparative study between LSTM and ARIMA for sales forecasting in retail. Vetenskap Och Konst.
Ensafi, Y. (2022). Time-series forecasting of seasonal items sales using machine learning – A comparative analysis. International Journal of Information Management Data Insights, 2.
Guoxuan Liu, Dragan Savic, & Guangtao Fu. (2023). Short-term water demand forecasting using data-centric machine learning approaches. Journal of Hydroinformatics, 25(3), 895–911. https://doi.org/10.2166/hydro.2023.163 https://iwaponline.com/jh/article/25/3/895/94111/Short-term-water-demand-forecasting-using-data
Haselbeck, F., et al. (2022). Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions. Machine Learning with Applications, 7.
Hyndman, R. (2011). Moving averages. In Springer eBooks (pp. 866–869). https://doi.org/10.1007/978-3-642-04898-2_380 https://link.springer.com/referenceworkentry/10.1007/978-3-642-04898-2_380
Iaousse, M., et al. (2023). A Comparative Simulation Study of Classical and Machine Learning Techniques for Forecasting Time Series Data. International Journal of Online and Biomedical Engineering, 19(08), 56–65. https://doi.org/10.3991/ijoe.v19i08.39853
Idigova, L. M., Rakhimova, B. H., & Amadaev, A. A. (2023). Foresiting as a tool for innovative development of the agroindustrial complex. In Nucleation and Atmospheric Aerosols. American Institute of Physics. https://doi.org/10.1063/5.0144933 https://pubs.aip.org/aip/acp/article/2526/1/030017/2901014
Kmiecik, M., & Zangana, H. (2022). Supporting of manufacturing system based on demand forecasting tool. LogForum, 18(1), 35–48. https://doi.org/10.17270/j.log.2022.637 http://www.logforum.net/volume18/issue1/abstract-4.html
Lawrence, M. J. (1983). An Exploration of Some Practical Issues in the Use of Quantitative Forecasting Models. Journal of Forecasting, 2(2), 169–179. https://doi-org.ezproxy.myucwest.ca/10.1002/for.3980020207
Liu, P., Ming, W., & Hu, B. (2020). Sales forecasting in rapid market changes using a minimum description length neural network. Neural Computing and Applications, 33, 937–948. https://doi.org/10.1007/s00521-020-05294-8
Maatallah, O. A., Achuthan, A., Janoyan, K. D., & Marzocca, P. (2015). Recursive wind speed forecasting based on Hammerstein Auto-Regressive model. Applied Energy, 145, 191–197. https://doi.org/10.1016/j.apenergy.2015.02.032 https://www.sciencedirect.com/science/article/abs/pii/S0306261915002093
Makridakis, S., Hogarth, R. M., & Gaba, A. (2009). Forecasting and uncertainty in the economic and business world. International Journal of Forecasting, 25(4), 794–812. https://doi.org/10.1016/j.ijforecast.2009.05.012
Makridakis, S., et al. (2020). The benefits of systematic forecasting for organizations: The UFO project. Foresight, 58(Fall), 45–56.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36, 54–74. https://doi.org/10.1016/j.ijforecast.2019.04.014
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2022). M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38, 1346–1364. https://doi.org/10.1016/j.ijforecast.2021.11.013
Mascle, C., & Gosse, J. (2014). Inventory management maximization based on sales forecast: case study. Production Planning & Control, 25(12), 1039–1057. https://doi.org/10.1080/09537287.2013.805343 https://www.tandfonline.com/doi/abs/10.1080/09537287.2013.805343?journalCode=tppc20
Ministerio de Hidrocarburos y Energías (2023). La subvención no se toca y el Gobierno garantiza la estabilidad económica. Ministerio de Hidrocarburos y Energías. Retrieved on October 13, 2023 from https://www.mhe.gob.bo/2023/04/22/la-subvencion-no-se-toca-y-el-gobierno-garantiza-la-estabilidad-economica/
Mircetic, D., Rostami-Tabar, B., Nikolicic, S., & Maslaric, M. (2022). Forecasting hierarchical time series in supply chains: an empirical investigation. International Journal of Production Research, 60(8), 2514–2533. https://doi-org.ezproxy.myucwest.ca/10.1080/00207543.2021.1896817
Nielsen, J. K., Gleisner, H., Syndergaard, S., & Lauritsen, K. B. (2022). Estimation of refractivity uncertainties and vertical error correlations in collocated radio occultations, radiosondes, and model forecasts. Atmospheric Measurement Techniques, 15(20), 6243–6256. https://doi.org/10.5194/amt-15-6243-2022
Office for National Statistics. Pounds data: Total retail sales. https://www.ons.gov.uk/businessindustryandtrade/retailindustry/datasets/poundsdatatotalretailsales
Ping Ai, Chuansheng Xiong, Ke Li, Yanhong Song, Shicheng Gong, & Zhaoxin Yue. (2023). Effect of Data Characteristics Inconsistency on Medium and Long-Term Runoff Forecasting by Machine Learning. IEEE Access, 11, 11601–11612. https://doi-org.ezproxy.myucwest.ca/10.1109/ACCESS.2023.3241995 https://ieeexplore.ieee.org/document/10035975
Prescott, R. B. (1922). Law of Growth in Forecasting Demand. Journal of the American Statistical Association, 18(140), 471–479. https://doi-org.ezproxy.myucwest.ca/10.2307/2276919
Rajaram, K., & Ahmadi, R. (2003). Flow Management to Optimize Retail Profits at Theme Parks. Operations Research, 51(2), 175–184. https://www.jstor.org/stable/4132399
Ramos, P., Santos, N., & Rebelo, R. (2015). Performance of state space and ARIMA models for consumer retail sales forecasting. Robotics and Computer-integrated Manufacturing, 34, 151–163. https://doi.org/10.1016/j.rcim.2014.12.015
Ren, S., Chan, H., & Ram, P. (2016). A Comparative Study on Fashion Demand Forecasting Models with Multiple Sources of Uncertainty. Springer, 257, 335–355. https://doi.org/10.1007/s10479-016-2204-6
Rockeman, O. (2022). Supply chain latest: inventory waste totals $163 billion a year. Bloomberg.com. https://www.bloomberg.com/news/newsletters/2022-11-10/supply-chain-latest-inventory-waste-totals-163-billion-a-year
Rogermann, K. C., & Shahtelab, A. A. (2023). Consumer retail industry and profitability: The role of analytics. Business & IT, XIII(1), 150–159. https://doi.org/10.14311/bit.2023.01.17 http://bit.fsv.cvut.cz/issues/01-23/full_01-23_17.pdf
Rothe, J. T. (1978). Effectiveness of sales forecasting methods. Industrial Marketing Management, 7(2), 114–118. https://doi.org/10.1016/0019-8501(78)90058-5
Shcherbakov, M., et al. (2013). A Survey of Forecast Error Measures. World Applied Sciences Journal, 24, 171–176.
Smolak, K., et al. (2020). Applying human mobility and water consumption data for short-term water demand forecasting using classical and machine learning models. Urban Water Journal, 17(1), 32–42. https://doi.org/10.1080/1573062X.2020.1734947
Technavio Research. (2020). Research Report with COVID-19 Forecasts-Technavio Evaluates the Impact of Rising Demand for Rigid Plastic Packaging from Food & Beverage Industry in its Rigid Plastic Packaging Market Analysis for the Forecast Period 2020-2024. Business Wire (English). https://search.ebscohost.com/login.aspx?direct=true&AuthType=sso&db=bwh&AN=bizwire.bw26636976&authtype=sso&custid=ns012452&site=eds-live&scope=site&custid=ns012452
Tesalonika F. Dagi, Jenny Morasa, & Victorina Z. Tirayoh. (2023). Raw Material Inventory Analysis Using the Economic Order Quantity (EOQ) Method To Maximize Profits at UD. Panca Putra. Indonesian Journal of Business Analytics, 3(1). https://doi-org.ezproxy.myucwest.ca/10.55927/ijba.v3i1.3307 https://journal.formosapublisher.org/index.php/ijba/article/view/3307
TEVEC Systems. (2017). Retail Sales Forecasting - Short term forecasting to optimize in-store inventories. Kaggle. https://www.kaggle.com/datasets/tevecsystems/retail-sales-forecasting?select=mock_kaggle.csv
Wheeler, F. C. (1937). Progress in Marketing Research. Journal of Marketing, 1(4), 401–413. https://doi-org.ezproxy.myucwest.ca/10.2307/1246797
Zaji, A. H., Bonakdari, H., & Gharabaghi, B. (2019). Developing an AI-based method for river discharge forecasting using satellite signals. Theoretical and Applied Climatology, 138, 347–362.
APPENDIX A

Plot and Correlation of Dataset 1 - Train Values and Predictions (Figures A.1 to A.20)
Plot and Correlation of Dataset 1 - Test Values and Predictions (Figures A.21 to A.40)

Each predictions figure is captioned "Dataset 1 and [model] Predictions (36 last values)" and each correlation figure "Correlation between Dataset 1 and [model] Predicted Values".

| Model | Train: predictions | Train: correlation | Test: predictions | Test: correlation |
|---|---|---|---|---|
| Moving Average | Figure A.1 | Figure A.2 | Figure A.21 | Figure A.22 |
| Exponential Smoothing | Figure A.3 | Figure A.4 | Figure A.23 | Figure A.24 |
| Autoregressive | Figure A.5 | Figure A.6 | Figure A.25 | Figure A.26 |
| Naive | Figure A.7 | Figure A.8 | Figure A.27 | Figure A.28 |
| ARIMA | Figure A.9 | Figure A.10 | Figure A.29 | Figure A.30 |
| SARIMA | Figure A.11 | Figure A.12 | Figure A.31 | Figure A.32 |
| MLP | Figure A.13 | Figure A.14 | Figure A.33 | Figure A.34 |
| Decision Tree | Figure A.15 | Figure A.16 | Figure A.35 | Figure A.36 |
| SVR | Figure A.17 | Figure A.18 | Figure A.37 | Figure A.38 |
| LSTM | Figure A.19 | Figure A.20 | Figure A.39 | Figure A.40 |

APPENDIX B

Plot and Correlation of Dataset 2 - Train Values and Predictions (Figures B.1 to B.20)
Plot and Correlation of Dataset 2 - Test Values and Predictions (Figures B.21 to B.40)

Each predictions figure is captioned "Dataset 2 and [model] Predictions (36 last values)" and each correlation figure "Correlation between Dataset 2 and [model] Predicted Values".

| Model | Train: predictions | Train: correlation | Test: predictions | Test: correlation |
|---|---|---|---|---|
| Moving Average | Figure B.1 | Figure B.2 | Figure B.21 | Figure B.22 |
| Exponential Smoothing | Figure B.3 | Figure B.4 | Figure B.23 | Figure B.24 |
| Autoregressive | Figure B.5 | Figure B.6 | Figure B.25 | Figure B.26 |
| Naive | Figure B.7 | Figure B.8 | Figure B.27 | Figure B.28 |
| ARIMA | Figure B.9 | Figure B.10 | Figure B.29 | Figure B.30 |
| SARIMA | Figure B.11 | Figure B.12 | Figure B.31 | Figure B.32 |
| MLP | Figure B.13 | Figure B.14 | Figure B.33 | Figure B.34 |
| Decision Tree | Figure B.15 | Figure B.16 | Figure B.35 | Figure B.36 |
| SVR | Figure B.17 | Figure B.18 | Figure B.37 | Figure B.38 |
| LSTM | Figure B.19 | Figure B.20 | Figure B.39 | Figure B.40 |

APPENDIX C

Plot and Correlation of Dataset 3 - Train Values and Predictions (Figures C.1 to C.20)
Plot and Correlation of Dataset 3 - Test Values and Predictions (Figures C.21 to C.40)

Each predictions figure is captioned "Dataset 3 and [model] Predictions (60 last values)" and each correlation figure "Correlation between Dataset 3 and [model] Predicted Values".

| Model | Train: predictions | Train: correlation | Test: predictions | Test: correlation |
|---|---|---|---|---|
| Moving Average | Figure C.1 | Figure C.2 | Figure C.21 | Figure C.22 |
| Exponential Smoothing | Figure C.3 | Figure C.4 | Figure C.23 | Figure C.24 |
| Autoregressive | Figure C.5 | Figure C.6 | Figure C.25 | Figure C.26 |
| Naive | Figure C.7 | Figure C.8 | Figure C.27 | Figure C.28 |
| ARIMA | Figure C.9 | Figure C.10 | Figure C.29 | Figure C.30 |
| SARIMA | Figure C.11 | Figure C.12 | Figure C.31 | Figure C.32 |
| MLP | Figure C.13 | Figure C.14 | Figure C.33 | Figure C.34 |
| Decision Tree | Figure C.15 | Figure C.16 | Figure C.35 | Figure C.36 |
| SVR | Figure C.17 | Figure C.18 | Figure C.37 | Figure C.38 |
| LSTM | Figure C.19 | Figure C.20 | Figure C.39 | Figure C.40 |

APPENDIX D

Plot and Correlation of Dataset 4 - Train Values and Predictions (Figures D.1 to D.20)
Plot and Correlation of Dataset 4 - Test Values and Predictions (Figures D.21 to D.40)

Each predictions figure is captioned "Dataset 4 and [model] Predictions (36 last values)" and each correlation figure "Correlation between Dataset 4 and [model] Predicted Values".

| Model | Train: predictions | Train: correlation | Test: predictions | Test: correlation |
|---|---|---|---|---|
| Moving Average | Figure D.1 | Figure D.2 | Figure D.21 | Figure D.22 |
| Exponential Smoothing | Figure D.3 | Figure D.4 | Figure D.23 | Figure D.24 |
| Autoregressive | Figure D.5 | Figure D.6 | Figure D.25 | Figure D.26 |
| Naive | Figure D.7 | Figure D.8 | Figure D.27 | Figure D.28 |
| ARIMA | Figure D.9 | Figure D.10 | Figure D.29 | Figure D.30 |
| SARIMA | Figure D.11 | Figure D.12 | Figure D.31 | Figure D.32 |
| MLP | Figure D.13 | Figure D.14 | Figure D.33 | Figure D.34 |
| Decision Tree | Figure D.15 | Figure D.16 | Figure D.35 | Figure D.36 |
| SVR | Figure D.17 | Figure D.18 | Figure D.37 | Figure D.38 |
| LSTM | Figure D.19 | Figure D.20 | Figure D.39 | Figure D.40 |

APPENDIX E

Table E.1 Correlation and Error Metrics obtained for Dataset 1

Train Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 350,902,765 | 18,732 | 15,061 | 0.850 | 0.922 | 0.127 |
| Exponential Smoothing | 364,457,228 | 19,091 | 15,282 | 0.853 | 0.923 | 0.000 |
| Autoregressive | 178,699,518 | 13,368 | 9,942 | 0.887 | 0.942 | 0.735 |
| Naïve | 601,088,072 | 24,517 | 19,431 | 0.762 | 0.873 | 0.000 |
| ARIMA | 744,139,936 | 27,279 | 17,262 | 0.759 | 0.871 | 0.169 |
| SARIMA | 635,333,345 | 25,206 | 15,235 | 0.784 | 0.886 | 0.331 |
| Multilayer Perceptron | 208,396,027 | 14,436 | 10,896 | 0.868 | 0.941 | 0.701 |
| Decision Tree | 87,577,504 | 9,358 | 7,102 | 0.944 | 0.972 | 0.641 |
| Support Vector Regression | 199,925,667 | 14,140 | 10,585 | 0.873 | 0.935 | 0.675 |
| Long-Short Term Memory | 220,763,521 | 14,858 | 11,053 | 0.860 | 0.936 | 0.624 |

Test Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 855,485,447 | 29,249 | 24,594 | 0.345 | 0.588 | 0.111 |
| Exponential Smoothing | 877,695,131 | 29,626 | 25,123 | 0.277 | 0.527 | 0.000 |
| Autoregressive | 802,764,215 | 28,333 | 22,635 | 0.398 | 0.631 | 0.667 |
| Naïve | 1,206,761,742 | 34,738 | 28,204 | 0.238 | 0.488 | 0.000 |
| ARIMA | 2,388,334,371 | 48,871 | 40,451 | 0.000 | -0.008 | 0.255 |
| SARIMA | 598,985,584 | 24,474 | 19,369 | 0.604 | 0.777 | 0.585 |
| Multilayer Perceptron | 671,806,769 | 25,919 | 19,323 | 0.385 | 0.629 | 0.667 |
| Decision Tree | 2,771,724,142 | 52,647 | 44,399 | -1.537 | -0.075 | 0.179 |
| Support Vector Regression | 1,307,716,383 | 36,162 | 28,687 | -0.197 | 0.649 | 0.667 |
| Long-Short Term Memory | 655,309,255 | 25,599 | 19,666 | 0.400 | 0.636 | 0.641 |

Table E.2 Correlation and Error Metrics obtained for Dataset 2

Train Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 5,383,329,552 | 73,371 | 58,633 | 0.859 | 0.927 | 0.073 |
| Exponential Smoothing | 25,431,330,666 | 67,513 | 53,194 | 0.881 | 0.939 | 0.000 |
| Autoregressive | 4,557,946,136 | 44,333 | 34,831 | 0.940 | 0.970 | 0.456 |
| Naïve | 13,396,884,770 | 69,970 | 54,340 | 0.880 | 0.938 | 0.000 |
| ARIMA | 1,965,451,061 | 67,711 | 51,063 | 0.888 | 0.942 | 0.275 |
| SARIMA | 24,573,233,649 | 55,450 | 40,100 | 0.924 | 0.961 | 0.431 |
| Multilayer Perceptron | 4,895,776,735 | 44,607 | 34,953 | 0.940 | 0.970 | 0.456 |
| Decision Tree | 888,048,931 | 29,800 | 22,318 | 0.973 | 0.986 | 0.478 |
| Support Vector Regression | 4,761,019,932 | 69,000 | 54,543 | 0.855 | 0.929 | 0.356 |
| Long-Short Term Memory | 2,016,912,094 | 44,910 | 35,532 | 0.939 | 0.970 | 0.456 |

Test Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 25,431,330,666 | 159,472 | 123,593 | 0.150 | 0.387 | 0.027 |
| Exponential Smoothing | 4,557,946,136 | 115,745 | 90,096 | 0.536 | 0.732 | 0.000 |
| Autoregressive | 13,396,884,770 | 156,759 | 111,857 | 0.250 | 0.500 | 0.233 |
| Naïve | 1,965,451,061 | 119,107 | 89,348 | 0.560 | 0.748 | 0.000 |
| ARIMA | 24,573,233,649 | 203,717 | 171,031 | 0.002 | 0.044 | 0.385 |
| SARIMA | 4,895,776,735 | 129,521 | 86,268 | 0.395 | 0.629 | 0.500 |
| Multilayer Perceptron | 14,186,540,018 | 187,119 | 130,763 | -0.103 | 0.354 | 0.200 |
| Decision Tree | 42,218,789,296 | 205,472 | 149,315 | -0.330 | 0.127 | 0.100 |
| Support Vector Regression | 39,854,418,586 | 199,636 | 156,970 | -0.256 | 0.625 | 0.300 |
| Long-Short Term Memory | 21,379,766,223 | 146,218 | 105,259 | 0.326 | 0.590 | 0.267 |

Table E.3 Correlation and Error Metrics obtained for Dataset 3

Train Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 4,380 | 66 | 44 | 0.224 | 0.473 | 0.158 |
| Exponential Smoothing | 3,835 | 62 | 40 | 0.325 | 0.570 | 0.000 |
| Autoregressive | 3,254 | 57 | 41 | 0.338 | 0.581 | 0.032 |
| Naïve | 4,133 | 64 | 41 | 0.334 | 0.578 | 0.000 |
| ARIMA | 3,250 | 57 | 41 | 0.337 | 0.580 | 0.031 |
| SARIMA | 3,142 | 56 | 39 | 0.364 | 0.603 | 0.188 |
| Multilayer Perceptron | 3,350 | 58 | 38 | 0.318 | 0.594 | 0.057 |
| Decision Tree | 2,406 | 49 | 33 | 0.510 | 0.714 | 0.098 |
| Support Vector Regression | 4,349 | 66 | 46 | 0.114 | 0.377 | 0.085 |
| Long-Short Term Memory | 1,008 | 32 | 19 | 0.795 | 0.895 | 0.688 |

Test Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 6,054 | 78 | 57 | 0.345 | 0.587 | 0.023 |
| Exponential Smoothing | 5,310 | 73 | 52 | 0.431 | 0.657 | 0.000 |
| Autoregressive | 5,083 | 71 | 51 | 0.448 | 0.670 | 0.044 |
| Naïve | 5,622 | 75 | 53 | 0.439 | 0.663 | 0.000 |
| ARIMA | 11,002 | 105 | 76 | 0.010 | 0.098 | 0.000 |
| SARIMA | 11,335 | 106 | 78 | 0.034 | 0.184 | 0.607 |
| Multilayer Perceptron | 4,855 | 70 | 49 | 0.419 | 0.677 | 0.068 |
| Decision Tree | 5,404 | 74 | 52 | 0.354 | 0.638 | 0.090 |
| Support Vector Regression | 8,929 | 94 | 67 | -0.068 | 0.531 | 0.083 |
| Long-Short Term Memory | 8,716 | 93 | 69 | -0.042 | 0.504 | 0.316 |

Table E.4 Correlation and Error Metrics obtained for Dataset 4

Train Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 6,405,591,436,236 | 2,530,927 | 2,324,985 | 0.629 | 0.793 | 0.150 |
| Exponential Smoothing | 7,278,920,766,915 | 2,697,948 | 2,299,969 | 0.599 | 0.774 | 0.000 |
| Autoregressive | 711,733,243,340 | 843,643 | 377,052 | 0.958 | 0.979 | 0.761 |
| Naïve | 18,017,971,388,424 | 4,244,758 | 3,465,873 | 0.227 | 0.477 | 0.000 |
| ARIMA | 4,206,419,621,162 | 2,050,956 | 1,101,831 | 0.782 | 0.884 | 0.610 |
| SARIMA | 15,135,797,491,007 | 3,890,475 | 1,598,260 | 0.649 | 0.805 | 0.642 |
| Multilayer Perceptron | 1,152,329,432,707 | 1,073,466 | 597,236 | 0.933 | 0.966 | 0.633 |
| Decision Tree | 77,769,857,766 | 278,872 | 189,849 | 0.995 | 0.998 | 0.600 |
| Support Vector Regression | 13,063,263,064,206 | 3,614,314 | 2,839,404 | 0.242 | 0.810 | 0.733 |
| Long-Short Term Memory | 822,337,568,033 | 906,828 | 342,954 | 0.952 | 0.976 | 0.758 |

Test Set

| Method | MSE | RMSE | MAE | R^2 | Correlation | Peak Similarity |
|---|---|---|---|---|---|---|
| Moving Average | 9,382,766,623,955 | 3,063,130 | 2,860,426 | 0.082 | 0.287 | 0.204 |
| Exponential Smoothing | 10,632,533,278,564 | 3,260,757 | 2,866,944 | 0.016 | 0.125 | 0.000 |
| Autoregressive | 875,828,216,859 | 935,857 | 422,850 | 0.915 | 0.957 | 0.711 |
| Naïve | 27,154,703,282,156 | 5,211,017 | 4,309,805 | 0.106 | -0.325 | 0.000 |
| ARIMA | 2,164,370,179,638 | 1,471,180 | 1,048,673 | 0.839 | 0.916 | 0.706 |
| SARIMA | 2,268,999,820,563 | 1,506,320 | 1,040,379 | 0.851 | 0.922 | 0.765 |
| Multilayer Perceptron | 1,514,201,441,978 | 1,230,529 | 693,868 | 0.852 | 0.925 | 0.673 |
| Decision Tree | 4,135,212,548,222 | 2,033,522 | 1,138,086 | 0.595 | 0.784 | 0.490 |
| Support Vector Regression | 22,091,323,687,702 | 4,700,141 | 3,561,140 | -1.164 | 0.428 | 0.653 |
| Long-Short Term Memory | 1,010,602,178,821 | 1,005,287 | 399,077 | 0.901 | 0.949 | 0.735 |
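Since RMSE is by definition the square root of MSE, the two columns of these tables can be cross-checked against each other. The minimal sketch below does this for the Dataset 1 test-set values of Table E.1; the helper name `rmse_matches_mse` is hypothetical, and it assumes the reported RMSE was rounded to the nearest integer.

```python
import math

# (method, test-set MSE, test-set RMSE) as listed in Table E.1
table_e1_test = [
    ("Moving Average", 855_485_447, 29_249),
    ("Exponential Smoothing", 877_695_131, 29_626),
    ("Autoregressive", 802_764_215, 28_333),
    ("Naïve", 1_206_761_742, 34_738),
    ("ARIMA", 2_388_334_371, 48_871),
    ("SARIMA", 598_985_584, 24_474),
    ("Multilayer Perceptron", 671_806_769, 25_919),
    ("Decision Tree", 2_771_724_142, 52_647),
    ("Support Vector Regression", 1_307_716_383, 36_162),
    ("Long-Short Term Memory", 655_309_255, 25_599),
]


def rmse_matches_mse(mse, rmse):
    """True when the reported RMSE equals sqrt(MSE) rounded to the nearest integer."""
    return round(math.sqrt(mse)) == rmse


# Collect any method whose reported RMSE disagrees with its reported MSE.
mismatches = [name for name, mse, rmse in table_e1_test
              if not rmse_matches_mse(mse, rmse)]
print(mismatches)  # []
```

The same check can be run on the train-set rows and on Tables E.2 to E.4, which is a quick way to spot transcription slips in long numeric tables.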