Missing Data 2:#
Graph theory foundation#
A DAG
shapes are nodes
nodes generally represent a random variable
nodes are connected with edges
edges may be directed (with an arrow)
path is a sequence of edges
a cycle is a path that starts and ends at the same node
we will focus on acyclic graphs
directed edges connect parent nodes to child nodes (following the arrow)
Why graphs: useful representation of joint distributions
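d-connected: two nodes are d-connected if at least one path between them has no collider (with nothing conditioned on)
d-separation: two nodes are d-separated if every path between them is blocked; with nothing conditioned on, a path is blocked when it contains a collider. d-separated nodes are independent
collider: a node on a path where both arrows point into it, e.g. a → c ← b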
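The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the edge list and node names are made up, and `d_connected` only handles the case where nothing is conditioned on. It enumerates every path, so it is only suitable for small graphs.

```python
# Edges stored as (parent, child) pairs; node names are illustrative.
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "e")]

def parents(edges, node):
    return {p for p, c in edges if c == node}

def children(edges, node):
    return {c for p, c in edges if p == node}

def is_acyclic(edges):
    """Kahn's algorithm: repeatedly remove nodes with no incoming edge."""
    nodes = {n for e in edges for n in e}
    remaining = set(edges)
    free = [n for n in nodes if not parents(remaining, n)]
    removed = 0
    while free:
        n = free.pop()
        removed += 1
        for c in children(remaining, n):
            remaining.discard((n, c))
            if not parents(remaining, c):
                free.append(c)
    # If a cycle exists, its nodes never become free.
    return removed == len(nodes)

def simple_paths(edges, start, goal, path=None):
    """Yield every path from start to goal, ignoring arrow direction."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in parents(edges, start) | children(edges, start):
        if nxt not in path:
            yield from simple_paths(edges, nxt, goal, path + [nxt])

def has_collider(edges, path):
    """A collider is an interior node with both path neighbours pointing into it."""
    return any(
        (prev, mid) in edges and (nxt, mid) in edges
        for prev, mid, nxt in zip(path, path[1:], path[2:])
    )

def d_connected(edges, a, b):
    """Unconditional case only: some path between a and b has no collider."""
    return any(not has_collider(edges, p) for p in simple_paths(edges, a, b))
```

In this example graph, `a` and `b` meet only at the collider `c`, so `d_connected(EDGES, "a", "b")` is false, while `a` and `d` are joined by the chain a → c → d and are d-connected.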
Missingness graphs#
x,y are variables
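y* is a proxy for y: the version we actually observe, equal to y where it was recorded and a missing marker otherwise
R_y: the causal mechanism for missingness, an indicator node that determines whether y* shows the value of y or is missing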
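A small sketch of how the proxy relates to y and R_y. The coding `r_y = 1` for "missing" is an assumption here (the opposite coding is also used), and the marker `MISSING`, the helper `proxy`, and the numbers are hypothetical, chosen only for illustration.

```python
MISSING = None  # hypothetical marker for an unobserved value

def proxy(y, r_y):
    """Build y* from the true values y and the missingness indicator r_y.

    Assumed coding: r_y = 1 means the entry is missing, r_y = 0 means observed.
    """
    return [MISSING if r == 1 else v for v, r in zip(y, r_y)]

y = [14.2, 15.1, 13.8, 16.0]   # made-up true values
r_y = [0, 1, 0, 0]             # second entry is missing
y_star = proxy(y, r_y)
```

In practice we only ever see `y_star` and `r_y`; the true `y` is available here because the example constructs it.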
Recoverability for MCAR#
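A quantity is recoverable if it can be computed from the observed data alone. Under MCAR, R_y is independent of the variables, so the distribution of y among the observed rows matches the distribution of y overall. The simulation below is a rough check of that idea for the mean of y. The data-generating model, the 30% missingness rate, and the coding `r_y = 1` for "missing" are all assumptions made for the sketch.

```python
import random

random.seed(0)
n = 20000

# x causes y; r_y is drawn independently of both, which is the MCAR assumption.
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 1) for xi in x]
r_y = [1 if random.random() < 0.3 else 0 for _ in range(n)]  # 1 = missing (assumed coding)

# The proxy y* hides the value wherever r_y marks it as missing.
y_star = [None if r else yi for yi, r in zip(y, r_y)]

def mean(values):
    return sum(values) / len(values)

# Estimate using only what is observed.
observed = [v for v in y_star if v is not None]
observed_mean = mean(observed)

# Reference value, available only because this is a simulation.
full_mean = mean(y)
```

With missingness unrelated to x and y, `observed_mean` should land close to `full_mean`. Making `r_y` depend on `y` instead would break the MCAR assumption, and the two means would generally drift apart.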
Discussion#
proxy
example with ocean data temp sensor, cloud cover images
For next week#
Choose one: https://artemiss-workshop.github.io/#program
Information Theoretic Approaches for Testing Missingness in Predictive Models https://openreview.net/forum?id=6Y05VJfGlFM