
  • Without going into some of the fallacies that people posted in the thread, I’ll share some basic strategies I personally use to validate my work:

    • Bootstrap sampling to train and test the model.
    • Modifying the random seed.
    • Using inferential statistics: confidence intervals (CI) if you’re a fan of frequentist statistics, or the region of practical equivalence (ROPE) if you’re a fan of Bayesian statistics.

    I repeat the experiment at least 30 times (using small datasets), plot the distribution of the resulting metric, and analyze the results.
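A minimal sketch of that loop, in case it helps. Everything here is an assumption for illustration, not my actual pipeline: the toy one-feature dataset, the nearest-mean "model", and the helper names (`make_data`, `fit_threshold`, `bootstrap_run`, `run_experiment`) are all hypothetical. Each run draws a bootstrap sample with its own seed for training, tests on the rows that were left out, and the 30 scores are then summarized with a rough normal-approximation 95% CI for the mean.

```python
import random
import statistics


def make_data(n, rng):
    """Hypothetical toy data: one feature, two classes with shifted means."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)  # class 1 is centered 2 units higher
        data.append((x, label))
    return data


def fit_threshold(train):
    """'Train' a nearest-mean classifier: cut halfway between the class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2.0, m1 >= m0


def accuracy(model, test):
    """Fraction of test rows the threshold model labels correctly."""
    threshold, one_is_high = model
    hits = 0
    for x, y in test:
        pred = int(x > threshold) if one_is_high else int(x <= threshold)
        hits += pred == y
    return hits / len(test)


def bootstrap_run(data, seed):
    """One repetition: bootstrap sample for training, left-out rows for testing."""
    rng = random.Random(seed)  # the seed changes on every repetition
    n = len(data)
    idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
    chosen = set(idx)
    train = [data[i] for i in idx]
    test = [data[i] for i in range(n) if i not in chosen]  # out-of-bag rows
    return accuracy(fit_threshold(train), test)


def run_experiment(n_runs=30, n_samples=200, data_seed=0):
    """Repeat the bootstrap experiment and summarize the score distribution."""
    data = make_data(n_samples, random.Random(data_seed))
    scores = [bootstrap_run(data, seed) for seed in range(n_runs)]
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / n_runs ** 0.5
    # Normal approximation; the runs share one dataset, so treat this as rough.
    return scores, mean, (mean - 1.96 * sem, mean + 1.96 * sem)


if __name__ == "__main__":
    scores, mean, (low, high) = run_experiment()
    print(f"mean accuracy {mean:.3f}, approx. 95% CI [{low:.3f}, {high:.3f}]")
```

Swap in your real model and metric inside `bootstrap_run`, then plot `scores` as a histogram to see the spread rather than a single number.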

    This is very basic and easy. If someone complains about compute, it can be automated to run overnight on commodity hardware, or you can use a smaller dataset, or build a simple benchmark and compare performance against it.

    As to OP’s question, I personally feel that ML is more focused on optimizing metrics to achieve a goal, and less focused on inferential analysis or on whether the results are feasible. As an example, I see a majority of Kaggle notebooks using logistic regression without checking its assumptions.