We have made five critical achievements in our privacy research, which represent important theoretical and practical improvements over the existing body of work:
1. We have demonstrated empirically, for the first time, that there are serious leaks of personal health information in Canada, and that re-identification risks are real and can be significant if not appropriately managed.
2. We have developed a method to de-identify sample data. Although the theoretical concept (called k-map) has existed for some time, no one had devised an estimation method for it that works with minimal information loss. This is practically relevant because many (even most) data sets are not population registries. We applied some of these results to de-identify the Canadian Discharge Abstract Database, in collaboration with the Canadian Institute for Health Information, and to de-identify a large public data set in the US.
3. We have developed a new de-identification algorithm that performs better, in terms of both information loss and speed, than any existing algorithm. This new algorithm also produces a globally optimal solution.
4. We have developed empirical models from Canadian census data to manage the re-identification risk of geospatial information, and used these models to build an algorithm that optimally aggregates small geographic areas. Ad hoc heuristics have been used in the past, but the privacy risks posed by small geographic areas have been a persistent source of problems for many data custodians and users of health data for many years.
5. To solve recurring data-sharing problems in public health, we have developed secure computation protocols that allow real-time surveillance with strong identity-protection guarantees. Together with Public Health Ontario, we have completed a pilot implementation of these protocols for infectious-organism surveillance in all 611 long-term care homes in Ontario.
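The distinction behind the sample-data achievement can be made concrete with a small sketch. k-anonymity judges a release by the sizes of the equivalence classes within the sample itself. k-map instead judges each sample record by how many people in the underlying population share its quasi-identifier values. The Python below is a minimal illustration of that criterion only, not our estimation method. It assumes the population counts are already known, whereas in practice they are exactly what must be estimated when the data set is not a population registry. The quasi-identifiers, function names, and counts are all hypothetical.

```python
from collections import Counter

# Hypothetical quasi-identifiers chosen for illustration only.
QUASI_IDENTIFIERS = ("age_group", "sex", "region")


def equivalence_classes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def satisfies_k_anonymity(sample, k, quasi_identifiers=QUASI_IDENTIFIERS):
    """k-anonymity judged on the sample itself: every class needs >= k sample records."""
    return min(equivalence_classes(sample, quasi_identifiers).values()) >= k


def satisfies_k_map(sample, population_counts, k, quasi_identifiers=QUASI_IDENTIFIERS):
    """k-map: every class present in the sample must match >= k people in the population."""
    classes = equivalence_classes(sample, quasi_identifiers)
    return all(population_counts.get(key, 0) >= k for key in classes)


def max_kmap_risk(sample, population_counts, quasi_identifiers=QUASI_IDENTIFIERS):
    """Worst-case re-identification probability: 1 / smallest matching population class.

    Assumes every sample record belongs to the population, so each count is >= 1.
    """
    classes = equivalence_classes(sample, quasi_identifiers)
    return max(1 / population_counts[key] for key in classes)


if __name__ == "__main__":
    # Invented toy data: three sampled patients and assumed-known population counts.
    sample = [
        {"age_group": "40-49", "sex": "F", "region": "R1"},
        {"age_group": "40-49", "sex": "M", "region": "R1"},
        {"age_group": "40-49", "sex": "M", "region": "R1"},
    ]
    population_counts = {("40-49", "F", "R1"): 12, ("40-49", "M", "R1"): 9}

    print(satisfies_k_anonymity(sample, k=5))                  # False: one sample class has 1 record
    print(satisfies_k_map(sample, population_counts, k=5))     # True: 12 and 9 people in population
    print(round(max_kmap_risk(sample, population_counts), 3))  # 0.111
```

With k = 5, the toy sample fails k-anonymity because one class contains a single record, yet it satisfies k-map because that record still matches 12 people in the population. This is why judging sample data against the population can require less generalization, and hence less information loss.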