I like how the question asks how it is done in industry, then someone from industry answers and gets downvoted because the people saying “in my pet project I do this…” don’t like how things are done in industry.
OK, a lot of people are suggesting synthetic data. This strikes me as very odd. I have never done it and never seen it done successfully.* If there isn’t enough data from your process, you don’t know it well enough, and there are other things you should do. If this is high value, then use Bayesian methods, causal methods, or logistic regressions to limit your risk of being wrong by understanding which parts of your data you really depend on.
If you have enough modeling capacity to make synthetic data, you can just make the actual model. After all, it is a model that generates the synthetic data.
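A toy sketch of that point, not anything from a real project: the numbers are made up and a plain Gaussian stands in for the generator (both are assumptions for illustration). Fit a model to ten "real" observations, generate a large synthetic sample from it, and re-estimate. The synthetic data just hands back the parameters of the model that produced it, so it tells you nothing the fitted model didn't already say.

```python
import random
import statistics

def fit_gaussian(samples):
    """Estimate mean and standard deviation from a list of samples."""
    return statistics.mean(samples), statistics.stdev(samples)

def synthesize(mean, std, n, rng):
    """Draw n synthetic points from a Gaussian with the given parameters."""
    return [rng.gauss(mean, std) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" process; in practice these values are unknown.
TRUE_MEAN, TRUE_STD = 50.0, 10.0

# Only ten real observations are available.
real = [rng.gauss(TRUE_MEAN, TRUE_STD) for _ in range(10)]
fitted_mean, fitted_std = fit_gaussian(real)

# Generate a large synthetic data set from the fitted model and re-estimate.
synthetic = synthesize(fitted_mean, fitted_std, 100_000, rng)
synth_mean, synth_std = fit_gaussian(synthetic)

print(f"true:      mean={TRUE_MEAN:.2f} std={TRUE_STD:.2f}")
print(f"fitted:    mean={fitted_mean:.2f} std={fitted_std:.2f}")
print(f"synthetic: mean={synth_mean:.2f} std={synth_std:.2f}")
```

The synthetic estimates land on the fitted values, not the true ones. Whatever error the ten real points introduced is carried straight through, only now with 100,000 rows making it look more certain than it is.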
*I joined a project where a new energy market product was being introduced. The regulator had put out expected prices as a synthetic time series, and my employer had made big investment decisions based on it. I was then tasked with improving the trading logic, as it wasn’t making money. Turns out the synthetic data had zip all to do with the real data.
Yes! I know all the cool kids use PyTorch, but I find it has too much boilerplate. I like Keras, so this is great news.