Hey guys, beginner's question:
I am preparing a dataframe for a machine learning model. The purpose of the model is to predict whether people infected with COVID will die or not.
To do this, I am looking at some conditions and symptoms, such as sore throat, cough, comorbidities, gender, and others, and binarizing them as “yes”/“no” or “male”/“female”.
I have a problem. One of the variables is “pregnant”, but only individuals of the female sex can be pregnant. How can I deal with this variable?
Can I keep it in the dataframe and assign the value “not pregnant” to all male individuals? Or could this harm the model?
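For reference, here is a minimal sketch of the option described above: keeping the column and encoding every male row as "not pregnant" (0). It assumes pandas; the toy data, the column names (`sex`, `pregnant`, `cough`), and the helper `encode` are all made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female"],
    "pregnant": ["yes", "no", None, None, "no"],  # missing for male rows
    "cough": ["yes", "no", "yes", "yes", "no"],
})

def encode(frame: pd.DataFrame) -> pd.DataFrame:
    """Binarize the columns; anything other than "yes" (including missing) becomes 0."""
    out = pd.DataFrame(index=frame.index)
    out["is_female"] = (frame["sex"] == "female").astype(int)
    out["pregnant"] = (frame["pregnant"] == "yes").astype(int)
    out["cough"] = (frame["cough"] == "yes").astype(int)
    # Sanity check: no row should be both male and pregnant.
    assert not ((out["is_female"] == 0) & (out["pregnant"] == 1)).any()
    return out

encoded = encode(df)
print(encoded)
```

With this encoding, `pregnant == 1` always implies `is_female == 1`, so the column only adds information among the female rows. Whether that helps or hurts would depend on the model and the data, so it is probably worth comparing results with and without the column.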
There is no need for all that effort. I can tell you with 100% accuracy that they will die for certain, just like those who were not infected. /s
TIL: ML redditors don’t understand sarcasm or the downsides of “accuracy” in model evaluation.
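To spell out the accuracy point, here is a small sketch with invented numbers (95 survivors and 5 deaths, not real COVID figures). A "model" that always predicts the majority class scores high accuracy while catching none of the deaths. The helpers `accuracy` and `recall` are written by hand here just for illustration.

```python
# Hypothetical labels: 1 = died, 0 = survived. The 95/5 split is made up.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # constant "model": always predicts "survived"

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

On imbalanced outcomes like this, metrics such as recall, precision, or a confusion matrix tend to say more than accuracy alone.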