[Discussion] What are best practices when building/training very small models?

Snagnar@alien.top · 10 months ago

qalis@alien.top · 10 months ago

(I assume you are talking about convolutional models in the context of computer vision)

I had similar constraints (embedded devices in a specific environment) and we didn’t use deep learning at all. Instead, we used classical image descriptors from OpenCV, such as color histograms, HOG and SIFT, with an SVM as the classifier. It can work surprisingly well for many problems, and is blazing fast.
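To give a rough idea of what such hand-crafted features look like, here is a minimal NumPy-only sketch. It is not OpenCV's HOG implementation: `orientation_histogram` is a simplified, single-cell stand-in for HOG, and all function names and bin counts are hypothetical choices made for illustration. The resulting vectors could then be fed to any SVM implementation (for example scikit-learn's `LinearSVC`, not used here).

```python
import numpy as np

def gray_histogram(img, bins=16):
    """Normalized intensity histogram of a grayscale image with values in [0, 255]."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplified HOG-like descriptor: one cell covering the whole image,
    no block normalization (an assumption made to keep the sketch short).
    """
    gy, gx = np.gradient(img.astype(np.float64))   # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)      # fold directions into [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

def describe(img):
    """Concatenate both descriptors into one fixed-length feature vector."""
    return np.concatenate([gray_histogram(img), orientation_histogram(img)])

# Toy image whose brightness increases from left to right.
img = np.tile(np.arange(8, dtype=np.float64) * 30, (8, 1))
features = describe(img)   # 16 intensity bins + 9 orientation bins
```

The feature vector has a fixed length (25 here) regardless of image size, which is what makes it convenient input for a small classical classifier.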

Consider how you can make the problem easier. Maybe you can do binary classification instead of multiclass, or use only grayscale images. Anything that will make the task itself easier will be a good improvement.

If your problem absolutely requires neural networks, I would use all the tools available:

Skip connections, either residual ones (like ResNet) or connections to all subsequent layers (like DenseNet)
Sharpness-Aware Minimization (SAM) or one of its variants
Label smoothing
Data augmentation with a few really problem-relevant transformations
Extensive hyperparameter tuning with a Gaussian process or a multivariate Tree-structured Parzen Estimator (TPE) sampler (see e.g. Optuna)
You can concatenate those classical features, like color histograms or HOG, to the flattened output of the CNN, before the MLP head. This way you reduce what the CNN needs to learn, so you can get away with fewer parameters
Go for more convolutional layers instead of a large MLP head. Convolutional layers use far less of the parameter budget than fully connected layers, because their weights are shared across all spatial positions.

You can also consider training a larger network and then applying compression techniques, such as knowledge distillation, quantization or pruning.
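As one example of such compression, here is a minimal sketch of post-training weight quantization in NumPy. It uses symmetric per-tensor int8 quantization, which is just one simple scheme among several; the function names are hypothetical, and real deployments would normally rely on the quantization tooling of their framework instead.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8.

    Returns the int8 tensor and the scale needed to map it back to floats.
    """
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of a trained network.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage shrinks 4x (float32 -> int8); the rounding error per weight
# is bounded by roughly half a quantization step.
max_error = float(np.max(np.abs(w - w_hat)))
```

Whether the resulting accuracy drop is acceptable depends on the model and task, so it is worth re-evaluating on a validation set after quantizing.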