Social Engineering to the extreme: the Cambridge Analytica case

Details: Category: Social Engineering; Published: Friday, 23 March 2018 15:04

Written by Davide Andreoletti, SUPSI and Enrico Frumento, CEFRIEL

In our post about Privacy Issues in Social Media, we highlighted how our data-driven world is built on the acceptance of a compromise: the value of services offered over the Internet comes at the price of users’ privacy. In fact, the more is known about users, the higher the quality of the offered services. As an example, consider how valuable a service that suggests the most attended events within a given area can be. The more users make their location available to the service engine, the more attractive and valuable the service itself becomes. If users do not agree to expose this information, which many people consider sensitive, the service inevitably performs poorly.

Online Social Networks have turned out to be revolutionary platforms also because of their role as intermediaries between users and business-oriented third parties. Such entities analyse users’ data in order to run business campaigns and, in exchange, foster the economic growth of the Social Network itself, thus contributing to realizing one of the initial dreams of the Internet pioneers: developing a digital network where information is freely accessible for the welfare and economic growth of the entire society.

Not surprisingly, however, data analysis can easily be misused, for instance by exploiting detailed information about users for morally questionable objectives (e.g., tailored persuasion techniques, for which we refer to another post of this blog). In addition, once disclosed to the acquiring party, data are no longer under the control of the Social Network and, as such, might be illicitly forwarded to other parties. Given this scenario, we try to briefly explain the current capabilities and consequences of such capillary data production and analysis; that is, how much can be done starting from our digital shadow[1]?

Nowadays, the combination of psychology and data analysis is so powerful that 70 likes on Facebook are enough to infer more about a user’s personality than their friends know; 300 likes are enough to know that user better than their partner does[2]. Hence, Online Social Networks are so privacy-invasive that a person’s daily life and digital shadow almost coincide. Artificial Intelligence techniques are today’s state of the art in many data analysis tasks and, whilst they already perform excellently, their growth is not expected to stop.

Considering that the Internet pervades every level of our lives, with Online Social Networks acting as a giant magnifying lens on society and being particularly suited to fostering political discussion, the inferences performed on our data should raise serious concerns. Data can be used to easily profile users, to contact them in a very tailored fashion and, consequently, to induce them to do something they would not do on their own: precisely, to perform social engineering to the extreme. The more is known about users, the easier it is to employ persuasion techniques that offer them exactly what they like, or are scared of, thus opening the door to a plague of our time: the widespread diffusion of fake news, which, in turn, has detrimental effects on a country’s democracy. In fact, a group of attackers with sufficient resources can spread misconceptions and fake news on a global scale to influence the outcome of major events by hacking the voters (which, ironically, has the same effect as vote rigging!).

Very recently, the case of an alleged misuse of data by a company operating in the marketing sector, Cambridge Analytica, came under the media spotlight. It is a case worth discussing because it embodies many of the issues described throughout this post. First of all, some details about the facts: Cambridge Analytica is accused of having been involved in an illicit sharing of data with Aleksandr Kogan, a researcher who developed a Facebook-based application to gather information about users’ personalities. Before 2014, Facebook’s rules about data sharing were not as strict as they are now. Specifically, a user who agreed to disclose some of his data could also reveal pieces of his friends’ information. In this way, from the 270K users who deliberately shared their data with the application, it was possible to profile up to 50 million American voters. With such information in hand, Cambridge Analytica is accused of having run micro-targeting campaigns to favour the election of Donald Trump by employing unscrupulous means, such as the spread of fake news to create a significant shift in public opinion.
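To give a feeling for the amplification that the friends mechanism allowed, the two figures quoted above can be divided by one another. The snippet below is only back-of-the-envelope arithmetic on those two numbers; it takes the "up to 50 million" upper bound at face value and ignores overlaps between friend lists, so the result is an average and not a measurement.

```python
# Figures quoted in the text; everything else is simple arithmetic.
consenting_users = 270_000      # users who deliberately installed the application
profiled_users = 50_000_000     # upper bound on the profiles obtained

# Average number of profiles obtained per consenting user
# (an upper-bound average, since 50 million is itself an upper bound).
amplification = profiled_users / consenting_users

print(f"about {round(amplification)} profiles per consenting user")
```

In other words, under these assumptions every single consent exposed the data of roughly 185 people.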

It is nonetheless important to underline that the problem was not the collection of data per se, but their abuse to perform illicit deception campaigns on a large scale. As an example of fair and good usage of such types of data, we can cite the experience of Duolingo, the well-known mobile app for foreign-language courses.

At an event in March 2016, “Big Data and the Ambivalence of Freedom to Irrationality”, Viktor Mayer-Schönberger said that more than 10 million people learn foreign languages with Duolingo every day. By collecting this vast amount of data, Duolingo was able to identify patterns in how people learn languages. One of the findings was that native Spanish speakers learn English in the wrong way: they learn English quite well up to a certain lesson (a certain grammatical phenomenon), and then a disproportionately large number of them drop out and quit the language course. After the course was restructured, with this particular lesson shifted further back, the success rate went up. This approach illustrates that big data changes patterns of thinking and approaches. Instead of formulating a hypothesis and then substantiating or disproving it, researchers search the data for patterns that indicate a phenomenon which would probably not be recognized through the traditional approach[3].
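The kind of pattern search described above can be illustrated with a very small sketch. It is not Duolingo's actual method, which is not described in the source: the function names, the threshold and the numbers are all made up for illustration. The idea is simply to compute, for each lesson, the share of still-active learners who quit there, and to flag lessons whose dropout rate is far above the typical one.

```python
import statistics

def dropout_rates(started, quits_per_lesson):
    """Share of still-active learners who quit at each lesson (hypothetical helper)."""
    rates = []
    active = started
    for quits in quits_per_lesson:
        rates.append(quits / active if active else 0.0)
        active -= quits
    return rates

def suspicious_lessons(rates, factor=3.0):
    """1-based lesson numbers whose dropout rate exceeds `factor` times the median rate.

    The factor of 3 is an arbitrary assumption, not a value taken from the source.
    """
    baseline = statistics.median(rates)
    return [i + 1 for i, rate in enumerate(rates) if rate > factor * baseline]

# Invented data: 1000 learners, six lessons, number of learners quitting at each lesson.
quits = [50, 45, 40, 300, 30, 25]
rates = dropout_rates(1000, quits)
print(suspicious_lessons(rates))  # prints [4]
```

With these invented figures only the fourth lesson stands out, which is the sort of signal that would suggest moving that lesson further back in the course. No hypothesis about grammar is needed beforehand: the anomaly emerges from the data.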

In our view, four main lessons should be learnt from this story:

Today’s data-driven business models come at the cost of sacrificing privacy and require a high level of trust in the entities managing our data. Once data have been disclosed, in fact, there is no guarantee that the party entitled to use them (e.g., the legitimate application) will not illegally forward them to other entities.
Although rules can be imposed to limit the control that users have over their friends’ information (as Facebook did in 2014), the issue is inherent in Online Social Networks, since they are based on the friends/followers paradigm. Because of this model, in fact, the boundaries between users’ information spaces have become blurred. Just think of a picture in which a user is inadvertently tagged. Moreover, it has been shown that a target user’s information (e.g., location) can be accurately inferred from the analysis of his friends’ profiles[4].
Social Engineering benefits from the heterogeneity and volume of the available data, and widely employs persuasion techniques[5]. The data-centric, fully interconnected world we live in is a favourable scenario for extreme social engineering: people can be easily profiled, contacted and deceived to induce effects that go far beyond traditional industrial espionage. As a matter of fact, Social Engineering has the potential to spread ideologies and influence the outcome of major political events by exploiting the structure of democracy itself.
The Duolingo case, as also explained in our project, is an excellent example of how tracking people’s behaviour on a large scale and inferring behavioural habits can improve the efficiency not only of attack patterns, but also of training systems.
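The second lesson, about blurred boundaries between users’ information spaces, can be made concrete with a toy example. The sketch below is not the technique of the work cited in [4]; it is a deliberately naive majority vote, with hypothetical names and invented data, showing that a user who never disclosed his location can still be placed on the map if enough of his friends disclosed theirs.

```python
from collections import Counter

def infer_location(user, friends_of, known_location):
    """Guess a user's location as the most common location among his friends.

    Naive majority vote, for illustration only; returns None when no
    friend has a known location.
    """
    votes = Counter(
        known_location[friend]
        for friend in friends_of.get(user, [])
        if friend in known_location
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Invented social graph: the target never shared a location, three friends did.
friends_of = {"target": ["anna", "bruno", "carla", "dario"]}
known_location = {"anna": "Milan", "bruno": "Milan", "carla": "Lugano"}

print(infer_location("target", friends_of, known_location))  # prints Milan
```

Even this crude heuristic leaks information that the target never agreed to share, which is why restricting access to friends’ data mitigates the problem without removing it.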

[1] A digital shadow is defined as follows: “A digital shadow, a subset of a digital footprint, consists of exposed personal, technical or organisational information that is often highly confidential, sensitive or proprietary. As well as damaging the brand, a digital shadow can leave your organisation vulnerable to corporate espionage and competitive intelligence. Worse still, criminals and hostile groups can exploit a digital shadow to find your organisation’s vulnerabilities and launch targeted cyber-attacks against them”. See "Cyber Situational awareness", Digital Shadows, 2015. [Online]. Available: http://bit.ly/2wyLMhk

[2] https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win

[3] See also “The Future of Education. VIKTOR MAYER-SCHÖNBERGER and KENNETH CUKIER. Learning with BIG DATA” (Chapter 2: Change, p.9-11), Available: https://www.hmhco.com/~/media/sites/home/educators/webinars/summer-session/LearningWithBigData-shortened.pdf

[4] https://www.semanticscholar.org/paper/Inferring-Twitter-user-locations-with-10-km-Ryoo-Moon/911f8de9745acd8f4e30a2bac6b89053e7bfa6e0

[5] Source: http://www.counter-currents.com/2016/09/creating-the-meme-superweapon/
