“Bad for the environment” is too vague, IMO, to drive meaningful action and change. Some products use machine learning to detect illegal logging or to capture useful environmental data; in those cases, ML is being used to HELP the environment.
So I would zoom in more on the specific issues and externalities you want to resolve.
One simple shortcut is to electrify your entire setup and then ensure that only renewable energy is providing your electricity.
In my experience, it honestly depends on the task at hand and what you want the models to learn.
- Spend lots of time cleaning up your data and doing feature engineering. Regulated industries like insurance, for example, spend significantly more time on feature engineering than on tuning fancy models.
- I would recommend trying regression and random forest models first, or even XGBoost.
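To make the "try simple baselines first" advice concrete, here is a minimal sketch using scikit-learn; the synthetic dataset and hyperparameters are illustrative, not from the thread:

```python
# Quick baseline comparison: linear regression vs. random forest.
# Dataset is synthetic (make_regression) purely for illustration.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

for name, model in [
    ("linear regression", LinearRegression()),
    ("random forest", RandomForestRegressor(n_estimators=100, random_state=0)),
]:
    # 5-fold cross-validated R^2: a quick sanity check before anything fancier
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

If the simple models already get close to the ceiling, spend your remaining time on data cleaning and features rather than model tuning.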
Good questions:
- Versus DVC: there are no new commands to learn (we extend Git), and you don’t need S3.
- Versus Git LFS: we inject useful views into your large files inside GitHub itself, in commits and PRs (e.g. check this model diff: https://youtu.be/lAyymscJUvI?t=87); we scale to much larger sizes (100 terabytes); and we deduplicate better (Git LFS treats a 1-line change to a large CSV file as an entirely new file, while our technique stores only the differences).
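A toy sketch of why chunk-level dedup beats file-level dedup. This is not the actual implementation: real systems use content-defined chunking so that insertions don’t shift chunk boundaries, while this sketch uses fixed-size chunks and a same-length edit to keep things simple.

```python
# File-level dedup (Git LFS style) keys storage on the whole file's hash,
# so a tiny edit re-stores the entire file. Chunk-level dedup only
# re-stores the chunks that actually changed.
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 64) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

original = b"\n".join(b"row-%04d,value" % i for i in range(1000))
# Same-length edit of one row (content-defined chunking would also
# handle insertions/deletions gracefully; fixed-size chunks would not).
edited = original.replace(b"row-0500", b"ROW-0500")

# File-level: the hashes differ, so the whole edited file is stored again.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(edited).hexdigest())  # False

# Chunk-level: only the chunk covering the edited row is new.
old, new = chunk_hashes(original), chunk_hashes(edited)
changed = sum(a != b for a, b in zip(old, new))
print(f"{changed} of {len(old)} chunks changed")
```

With file-level dedup you pay for the full CSV on every edit; with chunk-level dedup you pay roughly for one chunk, which is why the gap widens as files grow.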