Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Description
This paper introduces an experimental recipe for interventional analyses that study how training data affects the behavior of language models (LMs). The methodology, termed "Rewriting History," has three stages: selecting target evaluation items, matching relevant pretraining documents to those items, and modifying those documents before retraining the model to measure the effect. The authors demonstrate the approach through case studies on factual knowledge acquisition in LMs, examining how both term cooccurrence and information retrieval (IR) methods relate to a model's ability to learn and report facts. The overall aim is to give researchers a standardized, flexible method for testing fine-grained hypotheses about the relationship between pretraining data and specific model behaviors, moving beyond purely observational studies.
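The three stages described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Fact` schema, the function names, the case-insensitive substring test used as a stand-in for term cooccurrence matching, and the choice of document removal as the intervention are all assumptions made for illustration. Retraining and evaluation are left out, since they depend on the training setup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Fact:
    """Stage 1: a target evaluation item (hypothetical schema)."""
    subject: str
    answer: str


def cooccurs(doc: str, fact: Fact) -> bool:
    """Assumed matching rule: both terms appear somewhere in the document."""
    text = doc.lower()
    return fact.subject.lower() in text and fact.answer.lower() in text


def match_documents(corpus: List[str], facts: List[Fact]) -> Dict[Fact, List[int]]:
    """Stage 2: map each fact to the indices of documents that match it."""
    return {
        fact: [i for i, doc in enumerate(corpus) if cooccurs(doc, fact)]
        for fact in facts
    }


def rewrite_corpus(
    corpus: List[str],
    matches: Dict[Fact, List[int]],
    intervene: Callable[[str], Optional[str]],
) -> List[str]:
    """Stage 3: apply `intervene` to matched documents.

    Returning None from `intervene` drops the document; unmatched
    documents are copied through unchanged.
    """
    targeted = {i for indices in matches.values() for i in indices}
    rewritten = []
    for i, doc in enumerate(corpus):
        if i not in targeted:
            rewritten.append(doc)
            continue
        new_doc = intervene(doc)
        if new_doc is not None:
            rewritten.append(new_doc)
    return rewritten


# Toy corpus, invented for illustration only.
corpus = [
    "Paris is the capital of France.",
    "France is known for its cheese.",
    "The Eiffel Tower is in Paris, France.",
    "Berlin is the capital of Germany.",
]
facts = [Fact(subject="France", answer="Paris")]

matches = match_documents(corpus, facts)
# Assumed intervention: remove every matched document.
ablated_corpus = rewrite_corpus(corpus, matches, lambda doc: None)
```

In this sketch, a model retrained on `ablated_corpus` would then be evaluated on the items in `facts` and compared against a model trained on the original `corpus`. Swapping `cooccurs` for an IR-based scorer, or swapping the removal lambda for an edit function, changes the hypothesis under test without changing the surrounding pipeline.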