Data science is difficult. Setting up a team of data scientists to build an AI application for the cryptocurrency market is even harder.
At Touch Digital Summit 2019, Oleksandr Honchar, CTO of Neurons Lab, shared his experience of building fintech projects: how to set up a team, and what that team runs into on the way to a profitable automated trading system for the crypto markets.
Here is the plan for the talk:
- How and why AI in finance differs from “normal” AI
- Typical pitfalls and working solutions (a blueprint of sorts)
- A workshop: a deep dive into the details
Other sources of information on how to set up such teams:
- Microsoft’s Team Data Science Process guide - written for classical data science projects, not for the unpredictable problems of this field.
- Medium articles - though some blog posts become irrelevant over time.
- Scientific papers - a large community of mathematicians and computer scientists is working on this problem, but their trading algorithms have little connection to real business.
Data Science in trading - Expectations
The workflow seems very simple: hire a team of data scientists, developers and business analysts -> mine some data -> train neural networks or other algorithms -> test them in conditions close to real life -> if they perform well, run them live -> set the goals, define the applications for your algorithm, and calculate your estimated profit.
That’s what you expect when creating an automated trading system.
But there are problems from the very beginning.
Data Science in trading - Reality
Hire a team
In today’s job market, data scientists, computer scientists and mathematicians are not ready to work with financial data.
They know about deep learning, neural networks, image processing, the latest tech and so on.
They know nothing about finance.
They believe that all they need is a lot of market data, and deep learning will predict the market.
It doesn’t work!
The other problem is that the engineers need domain knowledge too: how to work with financial exchanges.
So for the first few months, people are really just studying.
Mine some data
What happened to Neurons Lab two years ago:
Exchange APIs were broken - the team simply couldn’t download the data. They tried sending requests to place orders on the market, and that didn’t work either. How can you build an automated trading system in conditions like these?
Market patterns were random! Strategies that work in the classical stock markets, oil markets and the like are useless here.
Another problem - there is no fundamental analysis available for the cryptocurrency market (bloggers don’t count). Nothing of the kind exists.
Prices can’t be forecast - even if you apply machine learning or deep learning, you can’t forecast them. It’s impossible!
Anyone who claims they can may manage it for one week, but will fail the next. Deep learning doesn’t help either.
The only way to “succeed” was to overfit on a bullish stretch of the market. So modelling the data failed too.
You need to test these models really well and keep them updated.
Even with 60% prediction accuracy you can still get negative returns: a model this accurate may be losing you money.
By the time your order reaches the market, the price has already moved, so you end up buying at a higher price than the one you wanted.
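A quick back-of-the-envelope calculation shows how a model that is right 60% of the time can still lose money. This is only a sketch: the win size, loss size and per-trade cost below are made-up illustrative numbers, not figures from the talk, and `expected_return_per_trade` is a hypothetical helper.

```python
def expected_return_per_trade(hit_rate, avg_win, avg_loss, cost_per_trade):
    """Expected net return of one trade, as a fraction of capital traded.

    hit_rate       -- share of trades where the predicted direction was right
    avg_win        -- average gain on a correct trade (e.g. 0.010 for 1.0%)
    avg_loss       -- average loss on a wrong trade, as a positive number
    cost_per_trade -- slippage plus fees for the round trip (assumed figure)
    """
    gross = hit_rate * avg_win - (1.0 - hit_rate) * avg_loss
    return gross - cost_per_trade


# Illustrative numbers only: 60% hit rate, wins of 1.0%, losses of 1.2%,
# and 0.3% lost to slippage and fees on every round trip.
edge = expected_return_per_trade(0.60, 0.010, 0.012, 0.003)
print(f"expected return per trade: {edge:.4%}")
```

With these assumptions the gross edge is positive (about 0.12% per trade), but execution costs push the net result to roughly -0.18% per trade.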
Cross-validation, the typical way of testing models in machine learning, does not seem to work here. Backtests tell you nothing about future performance, and paper trading is horrible.
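The talk does not say what replaced cross-validation. One commonly used alternative for time-ordered data is walk-forward evaluation, where every test window lies strictly after its training window, so no future information leaks into training. The sketch below assumes that approach; `walk_forward_splits` and its parameters are hypothetical names chosen for illustration.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test window starts right after its training window, so the model
    is never evaluated on data older than what it was fitted on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide forward by one test window


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train", train, "test", test)
```

Unlike shuffled k-fold cross-validation, no fold here trains on observations that come after the ones it is tested on.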
Exchanges provide false data. You have to trade not on one exchange, but on ten of them. That’s how it works.
Commissions ate all the profits. And the trained machine learning model stopped working too…
But the team was learning, so they got another chance and redesigned the way they approached the problem.
Based on Alex’s experience, here is the solution.
The Fixing Stuff
Where did Neurons Lab fail first? At the very beginning.
Hire a team to achieve success:
- BAs / Traders: opportunity search
- Data Engineers: information curation
- Data Scientists: feature development
- Quant Researchers: strategy development
- BAs + Data Scientists: backtesting and interpretation
- Software Engineers: deployment
- BAs / Traders: portfolio and risk management
Then comes data mining. There are always alternative data sources. While scraping the data, make sure you can get real-time data sets too.
Another important question: once you have the input data, what is the output? What should the machine learning model be used for? To predict the price?
Remember that you are what you predict. Predicting the price itself is not the best option: volatility, returns and correlation are easier targets.
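To make these alternative targets concrete, here is a minimal sketch of how returns, volatility and correlation can be derived from raw prices using only the Python standard library. The price series are made up, and the function names are hypothetical; the talk does not prescribe specific definitions.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_volatility(returns):
    """Sample standard deviation of the returns (not annualized)."""
    return statistics.stdev(returns)


def correlation(xs, ys):
    """Pearson correlation of two equally long return series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up price series for two hypothetical assets.
asset_a = [100.0, 102.0, 101.0, 105.0, 104.0]
asset_b = [50.0, 50.5, 50.2, 52.0, 51.4]

ret_a = log_returns(asset_a)
ret_b = log_returns(asset_b)
print("volatility A:", realized_volatility(ret_a))
print("correlation A/B:", correlation(ret_a, ret_b))
```

Any of these derived series can then serve as the model’s prediction target in place of the raw price.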
Training models - the simpler, the better: interpretability is king, not the complexity of the neural network. The market changes all the time, so your models should change with it: retrain them constantly! At each moment in time you should have a different model, or rather a fleet of models tailored to different situations. Assets differ, so each needs its own approach or calibration.
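A minimal sketch of constant retraining, assuming a deliberately simple and interpretable model (a one-variable least-squares line) refitted on a trailing window before every prediction. The data is a toy series whose relationship flips halfway through; `fit_line` and `rolling_refit_predictions` are hypothetical helpers, not the models used by Neurons Lab.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # constant feature: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rolling_refit_predictions(features, targets, window):
    """Refit on the last `window` observations before every prediction."""
    predictions = []
    for t in range(window, len(features)):
        a, b = fit_line(features[t - window:t], targets[t - window:t])
        predictions.append(a + b * features[t])
    return predictions


# Toy data whose relationship flips halfway through (a "regime change").
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
targets = [2.0, 4.0, 6.0, 8.0, -5.0, -6.0, -7.0, -8.0]
preds = rolling_refit_predictions(features, targets, window=3)
print(preds)
```

The first predictions after the flip are poor, but once the window contains only post-flip data the refitted line matches the new regime again. A model trained once on the first half would never recover.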
Backtests -> stress tests - you can define hundreds of stressful situations for your models and check how your algorithms behave in each of them. Judge a strategy not on its average past performance, but on the full set of stress-test results.
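The idea can be sketched as a small scenario table: each scenario overwrites part of the return history with a shock, and the strategy is scored per scenario instead of on one average. Everything below (the buy-and-hold strategy, the fee, the scenario names and sizes) is an assumption made for illustration.

```python
def strategy_pnl(returns, position=1.0, fee=0.001):
    """Hypothetical strategy: hold a fixed position and pay a fee per period."""
    return sum(position * r - fee for r in returns)


def apply_shock(returns, shock, at_index):
    """Copy of `returns` with one period replaced by a shock return."""
    shocked = list(returns)
    shocked[at_index] = shock
    return shocked


def stress_report(returns, scenarios):
    """Map each scenario name to the strategy P&L under that scenario."""
    report = {"baseline": strategy_pnl(returns)}
    for name, (shock, at_index) in scenarios.items():
        report[name] = strategy_pnl(apply_shock(returns, shock, at_index))
    return report


# Made-up history and scenarios; a real suite would hold hundreds of them.
history = [0.01, -0.005, 0.02, 0.003, -0.01]
scenarios = {
    "flash_crash": (-0.30, 2),
    "flat_open": (0.0, 0),
    "sharp_rally": (0.25, 4),
}
report = stress_report(history, scenarios)
worst = min(report, key=report.get)
print(worst, report[worst])
```

Looking at the whole report, and at the worst row in particular, reveals fragility that an average over the historical backtest would hide.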
Repeated hypothesis testing - if you keep iterating on one idea, trying again and again to make it work, the probability of a mistake (a result that only looks good by chance) keeps growing. So limit the number of ideas you test, to ten for example. This also speeds up strategy development. Risks are everywhere. You can run different strategies, weight them and treat them as a portfolio of approaches to reduce both risk and overfitting.
Deploying the models on 10 exchanges means running more than a thousand algorithms at the same time: 1000+ bots.
And all of the above will finally lead to profit!
Keep in mind that people still make the same mistakes in AI-related projects.
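Why the risk grows with every retry can be shown with a standard probability argument. The sketch assumes each attempt is an independent test with a 5% chance of passing by luck alone; both the independence and the 5% level are simplifying assumptions, not figures from the talk.

```python
def prob_false_discovery(n_trials, alpha=0.05):
    """Chance that at least one of n independent tests passes by luck alone."""
    return 1.0 - (1.0 - alpha) ** n_trials


for n in (1, 10, 50, 100):
    print(n, round(prob_false_discovery(n), 3))
```

Under these assumptions, ten attempts already give roughly a 40% chance that at least one “working” strategy is a fluke, which is why capping the number of ideas matters.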
So here are some takeaways:
- Don’t be cheap on the team
- Always try to find alternative data or metadata
- Keep it simple and dynamic: you are what you predict
- Don’t trust backtests based on average performance
- Think about execution risks from the very start
- Choose a single right metric to optimize
Watch the full video here.