Optimal Time Series Length For Cryptocurrency Modeling
When modeling any time series, optimal parameters must be considered across the entire model production pipeline. Apart from the hyperparameters that determine how a model treats data, the time series data itself should be relevant, clean, and of reasonable length. Cryptocurrency time series used to train models are no exception.
One of the major challenges with modeling cryptocurrency time series is that there isn’t always enough data to model them. This is especially true for low-frequency series such as daily, weekly, and monthly data.
So, I decided to run an experiment to find the optimal time series length for cryptocurrency modeling. Of course, the optimal length depends heavily on the type of model, the time series frequency, and the cryptocurrency in question, so the results of this experiment may differ from your own or others’ studies.
Why Create Cryptocurrency Time Series Models?
It probably comes as no surprise that many people would love to model cryptocurrency time series as accurately as possible. While “all models are wrong,” that doesn’t mean some can’t be useful. When it comes to testing algorithmic trading strategies and forecasting future returns and volatility, cryptocurrency time series models can help us tremendously.
Multiple Backtesting Scenarios
Algorithmic trading strategies that trade cryptocurrencies can be catastrophic for your brokerage account if they’re not properly tested. The most popular way to test them is to backtest them in as realistic an environment as possible using backtesting software.
Unfortunately, it’s all too common to backtest strategies against a single historical time series. For example, say you build an algo that buys and sells Bitcoin based on some technical indicator, backtest it against Bitcoin’s historical time series, and discover it would’ve done phenomenally well. But once you push that strategy into the live market, it either underperforms or fails completely.
This is a symptom of overfitting, and it can be mitigated by backtesting your crypto trading strategy across tens to hundreds of thousands of possible scenarios. These scenarios are generated by properly modeling the cryptocurrency time series, which makes backtest results more realistic and robust.
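The article doesn’t say which model generates its scenarios, so here is a minimal sketch using a plain i.i.d. bootstrap of historical daily log returns, a deliberately simple stand-in rather than the author’s method. The helper name bootstrap_scenarios and its parameters are hypothetical; each row of the result is one simulated price path a strategy could be backtested against.

```python
import numpy as np


def bootstrap_scenarios(prices, n_scenarios, horizon, seed=None):
    """Hypothetical helper: build simulated price paths by resampling
    historical log returns with replacement (i.i.d. bootstrap)."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    # Draw one historical return per step for every scenario: shape (n_scenarios, horizon)
    sampled = rng.choice(log_returns, size=(n_scenarios, horizon), replace=True)
    # Compound the sampled returns starting from the last observed price
    return prices[-1] * np.exp(np.cumsum(sampled, axis=1))
```

An i.i.d. bootstrap ignores volatility clustering and autocorrelation, which is one reason a fitted time series model is usually preferred for generating scenarios.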
Forecasting Future Returns and Volatility
As cryptocurrency traders, we’re all familiar with technical analysis and charting techniques, which some traders use effectively to forecast future price action. However, these methods lack a quantitative foundation for forecasting, which cryptocurrency time series models can provide.
We can use the very same scenarios generated for robust backtesting to forecast future cryptocurrency returns and volatility. By simulating a massive number of possible paths a particular crypto time series could take, we can pull various statistics, such as average returns, standard deviation (i.e., volatility), Sharpe ratio, value at risk (VaR), and much more.
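Once scenarios exist, these statistics can be pulled straight from them. The sketch below assumes paths is an array of simulated prices with shape (n_scenarios, horizon), a 0% risk-free rate, and 365 periods per year for annualizing (crypto trades every day). The name scenario_stats is hypothetical, and VaR here is a simple historical-simulation estimate expressed as a log-return loss over the whole horizon.

```python
import numpy as np


def scenario_stats(paths, start_price, periods_per_year=365, alpha=0.05):
    """Hypothetical helper: summary statistics for simulated price paths
    of shape (n_scenarios, horizon)."""
    paths = np.asarray(paths, dtype=float)
    # Prepend the starting price so the first step's return is included
    full = np.hstack([np.full((paths.shape[0], 1), start_price), paths])
    step_returns = np.diff(np.log(full), axis=1)       # per-period log returns
    total_returns = np.log(paths[:, -1] / start_price)  # log return over the horizon
    mean_r = step_returns.mean()
    vol = step_returns.std(ddof=1)
    # Annualized Sharpe ratio, assuming a 0% risk-free rate
    sharpe = mean_r / vol * np.sqrt(periods_per_year)
    # Loss (as a positive log return) exceeded in only `alpha` of the scenarios
    var = -np.quantile(total_returns, alpha)
    return {"mean_return": float(mean_r), "volatility": float(vol),
            "sharpe": float(sharpe), "var": float(var)}
```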
Why Time Series Length Matters: Overfitting and Bias
As I mentioned before, there are plenty of parameters to consider when building cryptocurrency time series models. However, data quality is arguably the most important because of “garbage in, garbage out.” Time series models can also be sensitive to the quantity of data they receive because of the potential for overfitting and bias.
Give a time series model too much data and you risk producing an overfit model. This is because the model attempts to fit a potentially complex time series that extends across a large number of data points. With cryptocurrencies, these data points can differ wildly depending on how far back you look, so older data may describe market behavior that no longer applies.
Give a time series model too little data and you’ll end up with a biased model. With too few data points, the model has little to work with and only a shallow understanding of the cryptocurrency’s history. This is, of course, the opposite of the overfitting problem.
Cryptocurrency time series models need enough data, spanning far enough into the past, to give them something to work with. The right amount lets our models represent the more distant past without ignoring what’s been happening recently. Thus, we need to find the optimal time series length for our cryptocurrency models so we can hit that sweet spot.
Experiment: 3 Years vs. 1 Year Time Series Length
I ran an experiment to help uncover the optimal time series length for modeling cryptocurrencies, starting by comparing 3 years and 1 year of time series data. While parameters in any model should be periodically optimized and tested, this experiment aims to present an example of how to go about doing that.
The experiment used four time series models, one each for Bitcoin (BTC), Ethereum (ETH), Litecoin (LTC), and Ripple (XRP). Each model was trained twice, once on 3 years and once on 1 year of data (training data), across a large number of hyperparameter sets.
While each model was trained on each hyperparameter set, it was evaluated on 3 months of out-of-sample data (validation data). For each cryptocurrency, the hyperparameter set that produced the best model according to the chosen score, the root mean square error (RMSE), was selected to generate 100,000 scenarios over a 3-month time horizon.
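RMSE is the square root of the mean squared forecast error, and hyperparameter selection keeps the set with the lowest validation RMSE. The sketch below illustrates that loop with a deliberately toy model (forecast every validation return as the mean of the last window training returns) standing in for the unspecified models in the experiment; fit_and_forecast, select_hyperparameters, and the window grid are all hypothetical.

```python
import numpy as np


def rmse(forecast, actual):
    """Root mean square error between two equal-length sequences."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))


def fit_and_forecast(train_returns, horizon, window):
    """Hypothetical toy model: forecast every future return as the
    mean of the last `window` training returns."""
    train_returns = np.asarray(train_returns, dtype=float)
    return np.full(horizon, np.mean(train_returns[-window:]))


def select_hyperparameters(train_returns, val_returns, grid):
    """Score every candidate on the validation data; keep the lowest RMSE."""
    scores = {
        w: rmse(fit_and_forecast(train_returns, len(val_returns), w), val_returns)
        for w in grid
    }
    best = min(scores, key=scores.get)
    return best, scores
```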
The last stage of the experiment used another 3 months of out-of-sample data (testing data) to test the accuracy of the cryptocurrency scenarios. For each cryptocurrency, the average logarithmic returns of its 100,000 three-month scenarios were scored against the testing data using RMSE.
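One reading of this scoring step is: average the log returns across all scenarios at each time step, then compute the RMSE between that average path and the realized test returns. The sketch assumes scenario log returns arrive as an array of shape (n_scenarios, horizon); score_scenarios is a hypothetical name.

```python
import numpy as np


def score_scenarios(scenario_log_returns, test_log_returns):
    """Hypothetical helper: RMSE between the per-step average of all
    scenarios' log returns and the realized test log returns."""
    scenario_log_returns = np.asarray(scenario_log_returns, dtype=float)
    test_log_returns = np.asarray(test_log_returns, dtype=float)
    # Average log return at each step across scenarios: shape (horizon,)
    mean_path = scenario_log_returns.mean(axis=0)
    return float(np.sqrt(np.mean((mean_path - test_log_returns) ** 2)))
```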
It’s important to note that the training, validation, and testing data do not overlap. This prevents our cryptocurrency models from “peeking” into the future and makes time series length the independent variable in this experiment.
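A chronological split is one way to guarantee no overlap. In this sketch (the function name and the 365/90-day block sizes are assumptions, with 3 months approximated as 90 days), the validation and test blocks are anchored to the end of the series, so changing only train_len swaps 1 year of training data for 3 years while the validation and test data stay identical.

```python
import numpy as np


def chronological_split(series, train_len, val_len, test_len):
    """Hypothetical helper: split the most recent observations into
    consecutive, non-overlapping train -> validation -> test blocks."""
    series = np.asarray(series)
    needed = train_len + val_len + test_len
    if len(series) < needed:
        raise ValueError(f"need {needed} observations, got {len(series)}")
    tail = series[-needed:]  # oldest to newest
    train = tail[:train_len]
    val = tail[train_len:train_len + val_len]
    test = tail[train_len + val_len:]
    return train, val, test
```

For example, chronological_split(prices, 365, 90, 90) and chronological_split(prices, 3 * 365, 90, 90) return different training blocks but the same validation and test blocks.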
The results of the experiment show that, overall, 1 year’s worth of data for each cryptocurrency model provided more value than 3 years in terms of training time and forecasting ability.
When looking at model validation scores, which were the scores produced when training and optimizing for the best model hyperparameters, we find that a time series length of 1 year works best overall.
While the difference is negligible for ETH and LTC, the improvement for BTC and XRP is clear. On average, the percentage difference between the 3-year and 1-year scores came out to about 3.8%.
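The article doesn’t define how the percentage difference is computed. One common choice, assumed here, is the relative improvement of the 1-year RMSE over the 3-year RMSE, averaged across coins. Both function names are hypothetical, and the scores are passed in as dictionaries keyed by ticker.

```python
def pct_improvement(score_3y, score_1y):
    """Relative improvement (%) of the 1-year RMSE over the 3-year RMSE.
    Positive when the 1-year model has the lower (better) score."""
    return (score_3y - score_1y) / score_3y * 100.0


def average_pct_improvement(scores_3y, scores_1y):
    """Mean of the per-coin improvements; both dicts share the same tickers."""
    tickers = list(scores_3y)
    return sum(pct_improvement(scores_3y[t], scores_1y[t]) for t in tickers) / len(tickers)
```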
I also wanted to make sure our cryptocurrency models outperformed the baseline scores from the persistence model, which they clearly did with both time series lengths.
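A persistence model forecasts each value as the most recently observed one, so any useful model should beat its RMSE. The sketch below uses the one-step-ahead form, which is one common variant; the article doesn’t specify which form it used, and persistence_rmse is a hypothetical name.

```python
import numpy as np


def persistence_rmse(train, val):
    """Hypothetical helper: RMSE of a one-step-ahead persistence forecast
    over the validation data."""
    train = np.asarray(train, dtype=float)
    val = np.asarray(val, dtype=float)
    # Forecast each validation point as the previous observed value;
    # the first forecast uses the last training observation
    forecasts = np.concatenate([[train[-1]], val[:-1]])
    return float(np.sqrt(np.mean((forecasts - val) ** 2)))
```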
Not surprisingly, training, optimization, and validation also ran faster with 1 year of data. This is less exciting but still useful.
What’s more exciting are the results that come from the final stage of the experiment where we test the cryptocurrency models’ ability to forecast future prices and returns.
You can see the massive overall improvement the models show when using 1 year’s worth of data instead of 3 years, especially for BTC and XRP. This also suggests that the 3-year models were overfitting, since the gap between their validation and testing scores is much larger. The average score improvement was 33%.
Finally, it’s important to compare the final test results for the two time series lengths side by side.
The final verdict is that a time series length of 1 year outperforms a length of 3 years for these cryptocurrencies at this point in time. The difference between the final scores came out to about 47%.
At this point, you know why modeling cryptocurrency time series is important, as well as why time series length matters when training our models. This experiment will be a never-ending one since the optimal parameters for any model change over time, but for these cryptocurrencies and models, a time series length of 1 year is currently a better choice than 3 years.
Based on these results, I’ll be testing even more time series lengths and repeating this process periodically to keep our cryptocurrency models’ time series length optimal. I also plan on testing other model parameters, as well as other models that can be used in future backtesting and forecasting software systems.