Facebook is too optimistic when it comes to the extent of Cambridge Analytica.

Sorry for this post on a fairly old topic. I just did not get around to writing this up earlier.

Several media outlets (e.g., Bloomberg) ran the story that Facebook privacy policy director Stephen Satterfield claimed in an EU hearing that “Europeans’ data” may not have been accessed by Cambridge Analytica.

This claim is nonsense. It is almost a lie - except that he used the weasel word “may”.

For fairly trivial reasons, you can be sure that the data of at least some Europeans has been accessed, largely because it’s pretty much impossible to perfectly separate U.S. and EU users. People move. People use proxies. People enter wrong locations. People forget to update their location. Location implies neither residency nor citizenship. People may have multiple nationalities. On Facebook, people may make up all of this, too.

Even if Dr. Aleksandr Kogan did try his best to provide only U.S. users to Cambridge Analytica, there are bound to be some mistakes. Even if he only provided the data of users he could map to U.S. voter records, there likely is someone in there who has both U.S. and EU citizenship, or who has become an EU citizen since.

After all, they shared the data of 87 million people. According to some numbers I found, there are around 70,000 people with both U.S. and German citizenship. That is “just” a tiny 0.02% of U.S. citizens. Since Facebook users are younger than average, and kids in particular will often have both citizenships if their parents have different nationalities, we can expect the rate among Facebook users to be higher than that. If you now draw 87 million random samples, the chance of not having at least one of these U.S.-EU citizens in your sample is effectively 0. This does not even take other EU nationalities into account yet.

Even a random sample of 100,000 U.S. citizens will, with very high probability, contain at least one EU citizen (in fact, at least one German citizen, because I did not include any numbers other than the 70,000 above). In 87 million, you likely even have several accounts created for a cat.
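For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses the 70,000 figure from above. The U.S. population of roughly 325 million is my assumption (picked so that the rate comes out at about 0.02%), and so is the simplification that every sampled person is an independent random draw. The function name is just for illustration.

```python
import math

# 70,000 U.S.-German dual citizens is the figure quoted in the post.
# The U.S. population of ~325 million is an assumption, chosen to match
# the roughly 0.02% rate mentioned in the text.
DUAL_CITIZENS = 70_000
US_POPULATION = 325_000_000


def prob_no_dual_citizen(sample_size, rate):
    """Probability that a sample of independent draws contains no dual citizen.

    This is (1 - rate) ** sample_size, computed via logarithms so that
    very large samples do not cause numerical trouble.
    """
    return math.exp(sample_size * math.log1p(-rate))


rate = DUAL_CITIZENS / US_POPULATION

for n in (100_000, 87_000_000):
    expected = n * rate
    p_none = prob_no_dual_citizen(n, rate)
    print(f"sample of {n:>10,}: expected dual citizens = {expected:,.1f}, "
          f"P(none) = {p_none:.3g}")
```

Under these assumptions, a sample of 100,000 is expected to contain roughly 21 dual citizens, and the chance of it containing none is below one in a billion. For 87 million, the probability is so small that it underflows to 0 in floating point.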

Says math.

To anyone trained in statistics, this should be an obvious version of the birthday paradox.

So yes, I bet that at least one EU citizen was affected.

Just because the data is too big (and too unreliable) to be able to rule this out.

Apparently, neither the U.S. nor Germany (nor the EU) even has reliable numbers on how many people have multiple nationalities. So do not trust Facebook’s (or Kogan’s) data to be better here…