Missing Data 3#
Handling Missing Data in Decision: A probabilistic approach#
key ideas#
A decision tree’s structure and notation
Review of imputation
Predictive value imputation
mean, median or mode
make assumption that features are indpendent
surrogate splits, partition data using another feature to
XG Boost
Expected Predictions:
impute all possible completions as once to avoid strong dist assumptions
consistent for MCAR and MAR
expensive, but density can help reduce
tractably compute the exact expected predictions
loss minimization
Experiments
for a single dataset, outperforms in general
Discussion#
generally easier
given single dataset, of results, how much do we trust this?
what does this provide as an advantage
NP hard
How to miss data?: Reinformcent leanring for environgments iwith high obseration cost#
Key points#
Reinforcement learning
cost associated with making accurate observations
goal directed
RL agent tries to
Problem setting:
\(o \sim p_0(o_t |s_t; \beta)\)
beta is accuracy og obs
r is old reward
Scenario A:
observed cangle vs
Big picture: manipulating how the data collection
Discussion#
survivorship bias?
right left imbalance for figure 3
simple pendulum example helped overcome the background lacking
figures
General#
Try writing out a missingness graph for a problem of choice, some scenario where you imagine there would be missing data, or an example dataset that you can find.