Missing Data 2

Graph theory foundation

  • a DAG (directed acyclic graph)

  • shapes in the diagram are nodes

    • each node generally represents a random variable

  • nodes are connected with edges

    • edges may be directed (with an arrow)

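  • a path is a sequence of edges in which consecutive edges share a node; for d-separation, the edges on a path may point either way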

  • a cycle is a directed path that starts and ends at the same node

  • we will focus on acyclic graphs

  • a directed edge connects a parent node to a child node (following the arrow: parent → child)

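  • why graphs: they give a compact representation of joint distributions, since the joint distribution factorizes as the product over nodes of P(node | parents of node)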

  • d-connected: with nothing conditioned on, two nodes are d-connected if at least one path between them contains no collider

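  • d-separated: with nothing conditioned on, two nodes are d-separated if every path between them passes through a collider; d-separated nodes are independent in any distribution that factorizes according to the DAG (conditioning on a collider, or on one of its descendants, opens the path again)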

  • a collider is a node on a path where two arrowheads meet head-to-head (→ C ←)
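The definitions above can be made concrete with a small pure-Python sketch. It assumes a graph is given as a list of (parent, child) pairs, checks acyclicity with Kahn's topological-sort algorithm, and tests d-separation only for the unconditional case (empty conditioning set) by enumerating every path between two nodes and looking for a collider. The function names are hypothetical, and the path enumeration is exponential in the worst case, so this is only meant for small teaching graphs.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be
    removed in topological order (repeatedly remove nodes with no parents).
    `nodes` must contain every endpoint that appears in `edges`."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(nodes)


def simple_paths(a, b, edges):
    """Yield every path from a to b that ignores edge direction and
    never visits a node twice."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    stack = [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            yield path
            continue
        for n in neighbors.get(path[-1], ()):
            if n not in path:
                stack.append(path + [n])


def is_collider(prev, node, nxt, edges):
    """node is a collider on the path if both arrows point into it: prev -> node <- nxt."""
    return (prev, node) in edges and (nxt, node) in edges


def d_separated(a, b, edges):
    """Unconditional d-separation: a and b are d-separated iff every path
    between them contains at least one collider."""
    edges = set(edges)
    for path in simple_paths(a, b, edges):
        blocked = any(
            is_collider(path[i - 1], path[i], path[i + 1], edges)
            for i in range(1, len(path) - 1)
        )
        if not blocked:
            return False  # an open path with no collider: a and b are d-connected
    return True


collider = [("X", "C"), ("Y", "C")]   # X -> C <- Y
chain = [("X", "M"), ("M", "Y")]      # X -> M -> Y
print(d_separated("X", "Y", collider))                   # True
print(d_separated("X", "Y", chain))                      # False
print(is_acyclic(["A", "B"], [("A", "B"), ("B", "A")]))  # False
```

In the collider graph the only path between X and Y is blocked at C, so they are d-separated; in the chain the path is open. A non-empty conditioning set would need the full d-separation rules, which this sketch does not implement.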

Missingness graphs

  • X, Y are variables

  • Y* is the observed proxy for Y: Y* = Y when Y is observed, and Y* = m (missing) otherwise

  • R_y: the missingness indicator, i.e. the causal mechanism that determines whether Y is observed (whether Y* equals Y or is missing)
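To make the roles of Y, Y* and R_y concrete, here is a hypothetical helper that builds the proxy from the true values and the missingness indicator. It assumes the coding used in Mohan and Pearl's missingness graphs (R_y = 0 means observed, R_y = 1 means missing) and uses Python's None for the missing value m; the lecture may code R_y the other way round.

```python
from typing import List, Optional, Sequence

MISSING = None  # stands in for the "missing" value m


def proxy(y: Sequence[float], r_y: Sequence[int]) -> List[Optional[float]]:
    """Build the observed proxy Y* from the true values Y and the indicator R_y.
    Assumed coding: R_y = 0 -> Y observed (Y* = Y); R_y = 1 -> Y* is missing."""
    if len(y) != len(r_y):
        raise ValueError("y and r_y must have the same length")
    return [yi if ri == 0 else MISSING for yi, ri in zip(y, r_y)]


print(proxy([2.0, 3.5, 1.0], [0, 1, 0]))  # [2.0, None, 1.0]
```

In practice only X, Y* and R_y are available; Y itself is never seen where R_y = 1, which is why recoverability questions arise.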

Recoverability for MCAR
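Under MCAR, R_y is independent of X and Y (in the m-graph, R_y has no parents among the variables), so P(Y) = P(Y | R_y = 0) = P(Y* | R_y = 0): the distribution of Y is recoverable from the complete cases alone. The minimal NumPy simulation below illustrates this, and contrasts it with a missing-not-at-random mechanism where R_y depends on Y itself. The Gaussian data, the 30% missing rate, and the logistic MNAR mechanism are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

y = rng.normal(loc=5.0, scale=2.0, size=n)  # full data Y (not fully observed in practice)

# MCAR: missingness is drawn independently of Y (no edges into R_y)
r_mcar = rng.random(n) < 0.3                # True means missing (R_y = 1)

# Counterexample (MNAR): larger values of Y are more likely to be missing (Y -> R_y)
r_mnar = rng.random(n) < 1 / (1 + np.exp(-(y - 5.0)))


def complete_case_mean(y, missing):
    """Estimate E[Y] using only the rows where Y* = Y, i.e. R_y = 0."""
    return y[~missing].mean()


print(f"true mean:          {y.mean():.3f}")
print(f"MCAR complete-case: {complete_case_mean(y, r_mcar):.3f}")
print(f"MNAR complete-case: {complete_case_mean(y, r_mnar):.3f}")
```

The MCAR complete-case mean should land close to the true mean, while the MNAR estimate is pulled downward because high values of Y are preferentially dropped.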

Discussion

  • proxy

  • example with ocean data: temperature sensor readings and cloud-cover images

For next week

Choose one paper from the ARTEMISS workshop program: https://artemiss-workshop.github.io/#program

Information Theoretic Approaches for Testing Missingness in Predictive Models https://openreview.net/forum?id=6Y05VJfGlFM