Abstract |
National statistical offices and other organizations collect data on individual subjects (persons, businesses, organizations), typically while assuring the subject that data pertaining to them will be held confidential. These data provide the raw material for statistical data products (tabular summaries, microdata files comprising data records pertaining to individual subjects, and, potentially, public statistical data bases and statistical query systems) which the statistical office disseminates to multiple, broad user communities. Statistical disclosure limitation (SDL) refers to the problem of, and methods for, thwarting re-identification of a subject and the divulging of the subject's confidential data through analysis or manipulation of disseminated data products. SDL methods abbreviate or modify the data product sufficiently to thwart disclosure. SDL problems are typically computationally demanding; several have been shown to be NP-hard. Many SDL methods draw upon statistical, mathematical or optimization theory, but at the same time heuristic and partial approaches abound. Contributions from a Bayesian perspective have been few but are increasing. A strong theoretical connection between definitions of statistical disclosure, measurement of disclosure risk, and evaluation of SDL methods is lacking. This suggests opportunities for Bayesian and hierarchical approaches. Selected opportunities and associated SDL methodological issues are discussed. |