Hey guys, beginner's question:

I am preparing a dataframe for a machine learning model. The purpose of the model is to predict whether people infected with COVID will die or not.

To do this, I am collecting conditions and symptoms such as sore throat, cough, comorbidities, and gender, among others, and binarizing them into “yes”/“no” or “male”/“female”.

I have a problem. One of the variables is “pregnant”, but only individuals of the female sex can be pregnant. How can I deal with this variable?

Can I keep it in the dataframe and assign the value “not pregnant” to all male individuals? Or could this harm the model?
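
To make the question concrete, here is a minimal sketch of the encoding I am describing. It assumes pandas, and the column names and values (`sex`, `cough`, `pregnant`) are made up for illustration, not my real data:

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "cough": ["yes", "no", "yes", "yes"],
    "pregnant": [None, "yes", "no", None],  # not recorded for male patients
})

# Treat "not applicable" as "not pregnant": fill the missing values with "no".
df["pregnant"] = df["pregnant"].fillna("no")

# Map each binary category to 0/1.
encoded = pd.DataFrame({
    "is_female": (df["sex"] == "female").astype(int),
    "cough": (df["cough"] == "yes").astype(int),
    "pregnant": (df["pregnant"] == "yes").astype(int),
})

# Sanity check: no male row should ever be flagged as pregnant.
assert not ((encoded["is_female"] == 0) & (encoded["pregnant"] == 1)).any()
```

Here every male row ends up with `pregnant = 0`, the same value as a non-pregnant woman; the `is_female` column is what still lets the model tell the two apart.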

  • VoidRippah@alien.topB · 1 year ago

    The purpose of the model is to predict whether people infected with COVID will die or not.

    There is no need for all that effort. I can tell you with 100% accuracy that they will die for certain, just like those who were not infected. /s

    • grudev@alien.topB · 1 year ago

      TIL: ML redditors don’t understand sarcasm or the downsides of “accuracy” in model evaluation.