Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators

Jianhao Yuan1* Francesco Pinto1* Adam Davies2* Philip Torr1


1University of Oxford   2University of Illinois Urbana-Champaign   *Equal Contribution  

[Code] [PDF]


Demos


Abstract

Neural image classifiers are known to undergo severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. A general interventional data augmentation (IDA) mechanism that simulates arbitrary interventions over spurious variables has often been conjectured as a theoretical solution to this problem and approximated with varying degrees of success. In this work, we study how well modern Text-to-Image (T2I) generators and associated image-editing techniques can solve the problem of IDA. We experiment across a diverse collection of benchmarks in domain generalization, ablating key dimensions of T2I generation, including interventional prompts, conditioning mechanisms, and post-hoc filtering, and show that T2I-based augmentation substantially outperforms previously state-of-the-art image augmentation techniques regardless of how each dimension is configured. We discuss the comparative advantages of using T2I generators for image editing versus synthesis, and find that a simple retrieval baseline is a surprisingly effective alternative, which raises interesting questions about how generative models should be evaluated in the context of domain generalization.
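To make the idea of interventional prompting concrete, the following is a minimal sketch of how prompts for a T2I generator might be constructed: the class label (the causal variable) is held fixed while a spurious style/domain variable is intervened on. The template string and style list here are illustrative assumptions, not the exact prompts used in the paper.

```python
def interventional_prompts(class_names, style_interventions,
                           template="a photo of a {cls}, {style}"):
    """Build one prompt per (class, intervention) pair.

    Each prompt keeps the class label fixed while intervening on a
    spurious style/domain variable, so the generated images vary the
    domain but preserve the label. (Hypothetical template for
    illustration only.)
    """
    return {
        cls: [template.format(cls=cls, style=style)
              for style in style_interventions]
        for cls in class_names
    }

# Example: two classes, three style interventions -> 3 prompts per class.
prompts = interventional_prompts(
    ["dog", "elephant"],
    ["in a sketch style", "as a cartoon", "in a painting"],
)
```

Each resulting prompt would then be passed to a T2I generator (or image-editing pipeline) to synthesize label-preserving training images across the intervened domains.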

Citation

@article{yuan2022not,
  title={Not just pretty pictures: Text-to-image generators enable interpretable interventions for robust representations},
  author={Yuan, Jianhao and Pinto, Francesco and Davies, Adam and Gupta, Aarushi and Torr, Philip},
  journal={arXiv preprint arXiv:2212.11237},
  year={2022}
}