Privacy protection and aggregate health data: a review of tabular cell suppression methods (not) employed in public health data systems

Abstract: Public health research often relies on individuals' confidential medical data. Data-collecting entities, such as states, therefore seek to disseminate these data as widely as possible while still maintaining individual privacy for legal and ethical reasons. One common way in which these data are released is through Web-based Data Query Systems (WDQS). In this article, we examined the WDQS listed by the National Association for Public Health Statistics and Information Systems (NAPHSIS), specifically reviewing how they prevent statistical disclosure in queries that produce a tabular response. One of the most common methods to combat this type of disclosure is suppression: if a cell count in a table is below a certain threshold, the true value is suppressed. This technique does prevent the direct disclosure of small cell counts; however, primary suppression by itself is not always enough to preserve privacy in tabular data. Here, we present several real examples of tabular response queries that employ suppression but for which we are able to infer the values of the suppressed cells, including cells with counts of 1, which could be linked to auxiliary data sources and thus have the potential to create an identity disclosure. We seek to stimulate awareness of the potential for disclosure of information that individuals may wish to keep private through an online query system. This research is undertaken ...
Source: Health Services and Outcomes Research Methodology - Category: Statistics - Source Type: research
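
The inference risk described in the abstract can be illustrated with a small sketch. It is not taken from the article: the table values, the threshold of 5, and the function names `primary_suppress` and `recover_from_row_totals` are all hypothetical, chosen only to show how a published row total can undo primary suppression when a row contains a single suppressed cell.

```python
from typing import List, Optional

Table = List[List[Optional[int]]]


def primary_suppress(table: List[List[int]], threshold: int) -> Table:
    """Replace every count below the threshold with None (suppressed)."""
    return [[c if c >= threshold else None for c in row] for row in table]


def recover_from_row_totals(suppressed: Table, row_totals: List[int]) -> Table:
    """Fill in a suppressed cell whenever it is the only one missing in its row.

    With exactly one unknown in a row, the published row total minus the
    visible cells gives the hidden count exactly.
    """
    recovered = [row[:] for row in suppressed]
    for row, total in zip(recovered, row_totals):
        missing = [j for j, c in enumerate(row) if c is None]
        if len(missing) == 1:
            visible_sum = sum(c for c in row if c is not None)
            row[missing[0]] = total - visible_sum
    return recovered


# Hypothetical counts (rows: two regions, columns: three age groups).
true_counts = [[12, 1, 30], [8, 15, 22]]
# Assumption: the query system publishes unsuppressed row totals.
row_totals = [sum(row) for row in true_counts]

suppressed = primary_suppress(true_counts, threshold=5)  # hypothetical threshold
recovered = recover_from_row_totals(suppressed, row_totals)
```

In this made-up table the single cell with a count of 1 is hidden by suppression, yet subtracting the visible cells from the row total restores it. Guarding against this generally requires suppressing additional cells or totals (complementary suppression), which the sketch does not attempt.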