• 0 Posts
  • 9 Comments
Joined 11 months ago
Cake day: October 28th, 2023



  • You shouldn’t do that, for multiple reasons (I can elaborate if needed). Whatever you train, the result is basically a binary file holding the learned parameters (weights). Once you write it to disk, typically with the framework’s built-in serialization (e.g. pickle for scikit-learn, or the .pth format for PyTorch), there are lots of frameworks to deploy it.
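    A minimal sketch of that round trip for scikit-learn, assuming a toy LogisticRegression fitted on made-up data (the file name model.pkl is hypothetical). PyTorch works analogously with torch.save(model.state_dict(), "model.pth") and model.load_state_dict(torch.load("model.pth")).

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data: one feature, label is 1 when the feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the fitted estimator with pickle ("model.pkl" is a hypothetical name).
path = Path(tempfile.mkdtemp()) / "model.pkl"
with path.open("wb") as f:
    pickle.dump(model, f)

# Later, e.g. inside the serving process, load it back.
# Only unpickle files you trust: pickle can execute arbitrary code on load.
with path.open("rb") as f:
    restored = pickle.load(f)

print(restored.predict(np.array([[-3.0], [3.0]])))  # [0 1]
```

    The deployment framework then only needs the file plus the code that loads it and calls predict.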

    The easiest to use and most generic one is BentoML, which packages your model and serving code into a Docker image and serves it through REST and gRPC endpoints. It has a lot of integrations and is probably the most popular option. There are also more specialized solutions, e.g. TorchServe for PyTorch models.

    However, if you care about inference speed, you should also compile or optimize the model for the target hardware and runtime before packaging it behind the API, e.g. with ONNX (and ONNX Runtime), Apache TVM, Treelite (for tree ensembles) or NVIDIA TensorRT.





  • (I assume you are talking about convolutional models in the context of computer vision)

    I had similar constraints (embedded devices in a specific environment) and we didn’t use deep learning at all. Instead, we used classical image descriptors from OpenCV, such as color histograms, HOG and SIFT, with an SVM as the classifier. It can work surprisingly well for many problems and is blazing fast.
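    A rough sketch of that kind of pipeline, assuming synthetic 32x32 RGB images whose class is simply "red-heavy or not" (all data and helper names here are made up). To stay dependency-light it uses NumPy's np.histogram in place of OpenCV's cv2.calcHist; HOG or SIFT descriptors (e.g. from cv2.HOGDescriptor or cv2.SIFT_create) would be concatenated into the same feature vector.

```python
import numpy as np
from sklearn.svm import SVC


def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel histograms of an HxWx3 uint8 image, each normalized to sum to 1."""
    feats = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


rng = np.random.default_rng(0)


def make_image(red_heavy: bool) -> np.ndarray:
    """Hypothetical synthetic data: dim noise, optionally with a bright red channel."""
    img = rng.integers(0, 100, size=(32, 32, 3), dtype=np.uint8)
    if red_heavy:
        img[..., 0] = rng.integers(156, 256, size=(32, 32), dtype=np.uint8)
    return img


labels = np.array([i % 2 for i in range(40)])  # 0 = plain, 1 = red-heavy
X = np.stack([color_histogram(make_image(bool(y))) for y in labels])

# Small, fast classical classifier on top of the hand-crafted features.
clf = SVC(kernel="rbf").fit(X, labels)

test = np.stack([color_histogram(make_image(False)), color_histogram(make_image(True))])
print(clf.predict(test))  # [0 1]
```

    Each image becomes a 24-value vector (3 channels x 8 bins), so both training and inference are tiny compared to a CNN.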

    Consider how you can make the problem easier. Maybe you can do binary classification instead of multiclass, or use only grayscale images. Anything that makes the task itself easier is a good improvement.
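    Both simplifications are cheap to try. A small sketch with hypothetical helper names and made-up labels: grayscale conversion with the ITU-R BT.601 luma weights (the ones OpenCV documents for its RGB-to-gray conversion) cuts the input to a third of its size, and a multiclass label set can be collapsed into "positive vs. everything else".

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 uint8 -> HxW uint8 using Y = 0.299 R + 0.587 G + 0.114 B."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights).round().astype(np.uint8)


def to_binary(labels: np.ndarray, positive_classes: set) -> np.ndarray:
    """Hypothetical relabeling: 1 for any class in positive_classes, else 0."""
    return np.isin(labels, list(positive_classes)).astype(np.int64)


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 255, 255]
gray = to_grayscale(rgb)
print(rgb.nbytes, gray.nbytes)  # 12 4

binary = to_binary(np.array(["cat", "dog", "car", "cat"]), {"cat", "dog"})
print(binary)  # [1 1 0 1]
```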

    If your problem absolutely requires neural networks, I would use all tools available:

    1. Skip connections, either residual (as in ResNet) or dense, connecting each layer to all subsequent ones (as in DenseNet)
    2. Sharpness-Aware Minimization (SAM) or one of its variants
    3. Label smoothing
    4. Data augmentation with a few transformations that are genuinely relevant to your problem
    5. Extensive hyperparameter tuning with Gaussian Processes or the multivariate Tree-structured Parzen Estimator (TPE) (see e.g. Optuna)
    6. You can concatenate classical features like color histograms or HOG to the flattened output of the CNN, before the MLP head. This reduces what the CNN needs to learn, so you can get away with fewer parameters.
    7. Use more convolutional layers instead of a large MLP head. Convolutional layers consume far less of the parameter budget than fully connected layers, because each kernel’s weights are shared across all spatial positions.
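    To make the parameter-budget point concrete, here is the arithmetic for one hypothetical layer size: a 3x3 convolution mapping 64 channels to 64 channels, versus a dense layer that maps a flattened 32x32x64 feature map to just 128 units. The convolution's cost does not depend on the spatial size of the input, while the dense layer's grows with it.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    """A k x k kernel for every (input, output) channel pair, plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def dense_params(in_features: int, out_features: int) -> int:
    """A full weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


conv = conv2d_params(64, 64, 3)          # 36,928
dense = dense_params(32 * 32 * 64, 128)  # 8,388,736
print(f"conv: {conv:,}  dense: {dense:,}  ratio: {dense / conv:.0f}x")
```

    Roughly 227 such convolutional layers fit into the budget of that single dense layer.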

    You can also consider training a larger network and then applying compression techniques, such as knowledge distillation, quantization or pruning.
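    Of those, quantization is the easiest to illustrate. Below is a minimal NumPy sketch of symmetric, per-tensor int8 post-training quantization of a weight matrix (the random weights are made up): storage drops 4x versus float32, and the rounding error per weight stays within half a quantization step. Production toolchains (e.g. PyTorch's quantization tooling or TensorRT's INT8 mode) additionally calibrate activations and often use per-channel scales.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # made-up weights

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

max_err = float(np.abs(w - w_hat).max())
print(w.nbytes // q.nbytes)            # 4
print(max_err <= scale / 2 + 1e-5)     # True: error is at most half a step
```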