Often when I read ML papers, the authors compare their results against a benchmark (e.g. using RMSE, accuracy, …) and say “our results improved with our new method by X%”. Nobody runs a significance test to check whether the new method Y actually outperforms benchmark Z. Is there a reason why? This seems especially important to me when the results are broken down further, e.g. into an analysis of individual classes in object classification. Or am I overlooking something?
It depends on which book or paper you pick up. Anyone who comes at it from a probabilistic background is more likely to discuss statistical significance. For example, the Hastie, Tibshirani, and Friedman textbook (The Elements of Statistical Learning) discusses it in detail, and the authors consider it in many of their examples, e.g. the neural net chapter uses boxplots in all the examples.
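To make the kind of test the question asks about concrete, here is a minimal sketch of one common choice for comparing two classifiers on the same test set: an exact McNemar test on the paired predictions. This is my own illustration, not something taken from the textbook above. The function name `mcnemar_exact` and the toy labels are made up for the example, and other tests (a paired bootstrap, a paired t-test over cross-validation folds) may fit a given setup better.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions (hypothetical helper).

    Returns (b, c, p_value), where
      b = number of cases method A gets right and method B gets wrong,
      c = number of cases method A gets wrong and method B gets right.
    Under the null hypothesis that both methods have the same error rate,
    each discordant case is equally likely to fall in b or c.
    """
    b = sum(1 for t, a, z in zip(y_true, pred_a, pred_b) if a == t and z != t)
    c = sum(1 for t, a, z in zip(y_true, pred_a, pred_b) if a != t and z == t)
    n = b + c
    if n == 0:
        # The two methods never disagree on correctness: no evidence either way.
        return b, c, 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of seeing k or fewer in the smaller cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return b, c, min(1.0, 2 * tail)


# Toy data, invented purely for illustration.
y_true = [1] * 20
new_method = [1] * 20            # correct on every case
benchmark = [0] * 8 + [1] * 12   # wrong on the first 8 cases

b, c, p_value = mcnemar_exact(y_true, new_method, benchmark)
print(f"A right / B wrong: {b}, A wrong / B right: {c}, p = {p_value:.4f}")
```

The test only looks at the cases where the two methods disagree, which is why it needs the per-example predictions rather than just the two headline accuracy numbers. The same function can be applied to the subset of examples from a single class, but with few examples per class the discordant counts get small and the test has little power.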