

There’s an old saying that if you want to hide a tree, put it in a forest. An analogous principle in privacy is that a record preserves privacy if it’s like a lot of other records.

The idea of k-anonymity is that every database record appears at least k times when you restrict your attention to quasi-identifiers. If you have a lot of records and few fields, your value of k could be high. But as you get more fields, it becomes more likely that a combination of fields is unique. If k = 1, then k-anonymity offers no anonymity.

Another problem with k-anonymity is that it doesn’t offer group privacy. Even when k is large, k-anonymity might prevent re-identification but still suffer from attribute disclosure. A database could be k-anonymous but reveal information about a group if that group is homogeneous with respect to some field. That is, the method is subject to a homogeneity attack. Or going the other way around, if you already know something that stands out about a group, this could help you identify the record belonging to an individual. That is, the method is subject to a background knowledge attack.

One way to address this shortcoming is l-diversity. This post won’t go into l-diversity because it’s an intermediate step to where we want to go, which is t-closeness.

The idea of t-closeness is that the distribution of sensitive data in every group is not too far from the distribution in the full population. The “t” comes from requiring that the distributions be no more than a distance t apart in a sense that we’ll define below.
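To make the k-anonymity definition and the homogeneity attack described above concrete, here is a minimal Python sketch. The function names, the field names (`zip`, `age`, `diagnosis`), and the toy records are all hypothetical, chosen only for illustration. It assumes records are dictionaries and that you have already decided which fields count as quasi-identifiers.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which the records are k-anonymous."""
    # Project each record onto its quasi-identifier fields only.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    # k is the size of the smallest group sharing the same projection.
    return min(Counter(keys).values())

def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Return the groups whose sensitive field takes only one value."""
    values = {}
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values.setdefault(key, set()).add(r[sensitive])
    # A group with a single sensitive value leaks that value for every member.
    return [key for key, seen in values.items() if len(seen) == 1]

# Hypothetical toy data: two quasi-identifiers and one sensitive field.
records = [
    {"zip": "77001", "age": "30-39", "diagnosis": "flu"},
    {"zip": "77001", "age": "30-39", "diagnosis": "flu"},
    {"zip": "77002", "age": "40-49", "diagnosis": "flu"},
    {"zip": "77002", "age": "40-49", "diagnosis": "asthma"},
]

print(k_anonymity(records, ["zip", "age"]))
print(homogeneous_groups(records, ["zip", "age"], "diagnosis"))
```

In this toy table every combination of quasi-identifiers appears twice, so the data is 2-anonymous, yet the first group is homogeneous: anyone known to be in it has their diagnosis revealed without being re-identified. Treating one more field as a quasi-identifier can only shrink the groups, which is why k tends to fall as the number of fields grows.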
