Stealing Reality: When Criminals Become Data Scientists (or Vice Versa)
Published in IEEE Journal of Intelligent Systems
Social network analysis, Privacy, Social Networks, Reality Mining, Mobile Networks, Network Security, and Stealing Reality
Social and Economic computing
Yaniv Altshuler, Nadav Aharony, and Alex Pentland, Massachusetts Institute of Technology
Yuval Elovici, Ben Gurion University of the Negev, Israel
Manuel Cebrian, University of California, San Diego
Stealing-reality attacks attempt to steal social network and behavioral information through data collection and inference techniques, making them more dangerous than other types of identity theft.
We live in the age of social computing. Social networks are everywhere, exponentially increasing in volume, and changing how we do business and how we understand ourselves and the world around us. The challenges and opportunities inherent in the social-oriented ecosystem have overtaken scientific, financial, and popular discourse. With the growing emphasis on personalization, personal recommendation systems, and social networking, there is a growing interest in understanding personal and social behavior patterns. This trend is manifested in the increased demand for data scientists and data-mining experts, which derives from the increasing number of social data-driven start-up companies as well as the social-inference-related research sponsored by commercial entities and various nongovernmental organizations (NGOs).

This article explores a "what if" scenario. History has shown that whenever something exhibits a tangible value, someone will try to steal it for profit. Along this line of thought, and based on current trends in the data ecosystem coupled with the emergence of advanced tools for social and behavioral pattern detection and inference, we ask the following: What will happen when criminals become data scientists?

We conjecture that the world will increasingly see malware that integrates tools and mechanisms from network science, as well as attacks that directly target human-network information as a goal rather than a means. Paraphrasing Marshall McLuhan's "the medium is the message" [1], we have reached the stage where "the network is the message." Specifically, a new type of information security threat involves a class of malware whose goal is not to corrupt and take control of the machines it infects or to steal explicit information stored on them (such as credit card information and personal records). Rather, the goal is to steal social network and behavioral information through data collection and network science inference techniques. We call this a stealing-reality attack.

After exploring this new kind of attack, we analyze how it could be carried out. We show the optimal strategy for attackers interested in learning a social network and its hidden underlying social principles. Remarkably, our analysis shows that, in many cases, such an optimal strategy should follow an extremely slow spreading pattern. Counterintuitively, such attacks generate far greater damage in the long term compared to attacks that spread more aggressively. In addition, such attacks will likely avoid detection by many of today's network security mechanisms, which tend to focus on detecting network traffic anomalies, such as a traffic-volume increase. We demonstrate this discovery using several real-world social network datasets.

1541-1672/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society. IEEE Intelligent Systems, November/December 2011.

Related Work in Reality Mining

The social sciences have been undergoing a digital revolution, heralded by the emerging field of computational social science, which combines the leading techniques from network science [1–3] with new machine learning and pattern recognition tools specialized for the understanding of people's behavior and social interactions [4,5]. David Lazer and his colleagues described the potential of computational social science to increase our knowledge of individuals, groups, and societies, with an unprecedented breadth, depth, and scale [6].

The pervasiveness of mobile phones has made them ubiquitous social sensors of location, proximity, and communications. The phrase "reality mining" describes the collection of sensor data pertaining to human social behavior [7]. Using call records, cellular-tower IDs, and Bluetooth proximity logs collected via mobile phones at the individual level, the subjects' social network can be accurately detected, as well as patterns in their daily activities [4,7].

Mobile phone records from telecommunications companies have proven valuable for uncovering human-level insights. For example, researchers have used cell-tower location information to characterize human mobility [8]. Nathan Eagle, Michael Macy, and Rob Claxton found that the diversity of individuals' relationships is strongly correlated with the economic development of communities [9]. Expanding on an earlier work [7], Anmol Madan and his colleagues showed how to use mobile social sensing to help measure and predict individuals' health status based on mobility and communication patterns [10].

Companies such as Sense Networks are already putting such tools to use in the commercial world to understand customer churn, enhance targeted advertisements, and offer improved personalization and other services. The technical advancements in mobile phone platforms and the availability of mobile software development kits (SDKs) are making it easier than ever to collect reality-mining data.

References
1. A.-L. Barabasi and R. Albert, "Emergence of Scaling in Random Networks," Science, vol. 286, no. 5429, 1999, pp. 509–512.
2. D. Watts and S. Strogatz, "Collective Dynamics of 'Small-World' Networks," Nature, vol. 393, no. 6684, 1998, pp. 440–442.
3. M. Newman, "The Structure and Function of Complex Networks," SIAM Rev., vol. 45, 2003, pp. 167–256.
4. N. Eagle, A. Pentland, and D. Lazer, "Inferring Social Network Structure Using Mobile Phone Data," Proc. Nat'l Academy of Sciences, vol. 106, no. 36, 2009, pp. 15274–15278.
5. C. Au Yeung et al., "Measuring Expertise in Online Communities," IEEE Intelligent Systems, vol. 26, no. 1, 2011, pp. 26–32.
6. D. Lazer et al., "Social Science: Computational Social Science," Science, vol. 323, no. 5915, 2009, pp. 721–723.
7. N. Eagle and A. Pentland, "Reality Mining: Sensing Complex Social Systems," Personal and Ubiquitous Computing, vol. 10, 2006, pp. 255–268.
8. M.C. Gonzalez, C.A. Hidalgo, and A.-L. Barabasi, "Understanding Individual Human Mobility Patterns," Nature, vol. 453, no. 7196, 2008, pp. 779–782.
9. N. Eagle, M. Macy, and R. Claxton, "Network Diversity and Economic Development," Science, vol. 328, no. 5981, 2010, pp. 1029–1031.
10. A. Madan et al., "Social Sensing for Epidemiological Behavior Change," Proc. 12th ACM Int'l Conf. Ubiquitous Computing, ACM Press, 2010, pp. 291–300.
Stealing Reality Threat Model

In our discussion, we refer to reality information as inferred information about personal and social behavior. This includes three elements:

• information about individuals, which we refer to as node information (including any parameter on a node that can be learned from available data, such as occupation, income level, health state, or personality type);
• dyadic information, which includes information on relationships and other parameters connecting two nodes (we call this edge information); and
• network-level information, or information on groups of nodes, communities, and general network properties.
www.computer.org/intelligent
The full network information also includes all data on nodes and edges. We do not refer here to explicitly stated information that can be found in (and stolen from) existing databases, such as names, social security numbers, or credit card numbers. Whereas reality mining is the legitimate collection and analysis of such information, reality stealing is the illegitimate accrual of it. (See the "Related Work in Reality Mining" sidebar for previous research in this area.)
Motivation for Attackers
Secondary markets for the resale of stolen identities already exist, such as www.infochimps.com or black market sites and chat rooms for the resale of other illegal datasets [2]. It is reasonable to assume that an email address of a social hub would be worth more to an advertiser than that of a social leaf, or that a person meeting a student profile might be priced differently than a person meeting a corporate executive profile.

Stolen reality information could be used for several malicious goals:

• selling to the highest bidder (legitimate bidders such as advertisers, or other attackers on the black market),
• bootstrapping other attacks (as part of a complex advanced persistent threat (APT) attack [3,4]), and
• business espionage (for example, to analyze a competitor's customer base, profile high-yielding customers for targeted marketing [5], or produce high-quality predictions [6]).

Companies are already operating in this area, collecting email and demographic information with the intent to sell it. Methods for social network analysis and trend recognition have already been published in many leading venues [7]. Why should attackers work hard when they can use automatic agents to collect the same data and possibly much higher-quality information?
Dangers of Reality-Stealing Attacks

Users can modify their communication network topologies and networked-device identifiers with the click of a button. In the event of a security breach, users can also change their passwords, usernames, and credit card numbers; easily replace their email and other online accounts; and quickly warn their contacts. However, it is much harder to change one's social network, person-to-person relationships, friendships, or family ties. If a chronic health condition is uncovered through such an attack, there is no going back.

Victims of a behavioral-pattern theft cannot change their behaviors and life patterns. This type of information, once out, would be difficult to contain. A second component accentuating this danger is that real-life information can be deduced from seemingly safe data, such as accelerometer and location information, which users already freely allow many mobile applications to access. Because we believe this is a concrete threat, our research goal is to analyze potential attacks from the attackers' perspective to better understand them and develop proper defenses.

Past Attacks on Real-World Information

To help understand the risk of attacks on inferred real-world information, we reviewed prior attacks on explicit data. In 2008, real identity information of millions of Korean citizens was stolen in a series of malicious attacks and posted for sale. In 2007, the Israel Ministry of Interior's database, which contained information on every Israeli citizen, was leaked and posted on the Web [8]. In the US, a court is ruling whether a database from a bankrupt gay-dating site for teenagers can be sold to raise money to help repay its creditors; the site includes the personal information of more than a million teenage boys [9]. In all these cases, once the information is out, there is no way to get it back, and the damage might be felt for an extended period.

In a recent Wall Street Journal interview, former Google CEO Eric Schmidt referred to the possibility that people in the future might choose to legally change their name to detach themselves from embarrassing "reality" information publicly exposed in social networking sites. This demonstrates the sensitivity and challenges in recovering from leaked real-life information, whether by youthful carelessness or malicious extraction through an attack [10].

Many existing viruses and worms use primitive forms of social engineering [11], which attempt to gain the trust of their next victims and then convince them to click on a link or install an application. For example, Happy99 was one of the first viruses to attach itself to outgoing emails, thus increasing the chances of the recipient opening an attachment to a seemingly legitimate message from an acquaintance. (More information concerning security and privacy leakage in social networks is available elsewhere [12,13].)
Social Attack Model

We model a social network as an undirected graph G(V, E). A stealing-reality attacker's first goal is to inject a single malware agent into one of the network's nodes. Upon such injection, the agent starts to learn about this node (and its interactions with its neighbors). Periodically, the agent tries to copy itself into one of the original node's neighbors. The probability that an agent will try to copy itself to a neighboring node at any given time step determines the aggressiveness of the attack, ρ. Namely, aggressive agents have higher ρ values (and hence take less time between each two spreading attempts). Less aggressive agents are less likely to try to spread at any given time and generally will wait longer between attempts to copy themselves to one of the neighbors of their current host.

As the information about the network becomes worthy of an attack, the attacker's motivation is stealing as many properties related to the network's social topology as possible. We denote the percentage of vertex-related information acquired at time t as $\Lambda_V(t)$ and the percentage of edge-related information acquired at time t as $\Lambda_E(t)$.

The duration of the stealing-reality attack's learning process refers to the time it takes the attacking agent to identify with high probability the properties of a node's behaviors or of some of its social interactions. We model this process using a standard Gompertz function in the parametric form $y(t) = a e^{b e^{ct}}$ (for some parameters a, b, and c). This model is flexible enough to fit various social-learning mechanisms, while providing the following important features:

• The longer such an agent operates, the more precise its conclusions will be. We call this the sigmoidal advancement.
• The rate at which information is gathered is smallest at the start and end of the learning process.
• For any value of T, the amount of information gathered in the first T time steps is greater than the amount of information gathered in the last T time steps. We call this the asymmetry of the asymptotes.

Previous research demonstrated the applicability of the Gompertz function for modeling the evolution of locally learning the preferences and behavior patterns of users [14]. The authors attempted to predict which applications mobile users would install on their phones using an ongoing learning process. This experiment showed that this process can be best modeled using the function $1 - e^{-x}$. Because we know that $1 - t \le e^{-t}$ (achieving tight results for most t < 1), substituting $t = e^{-x}$ shows that $1 - e^{-x} \approx e^{-e^{-x}}$, which is an instance of the Gompertz function (for a = 1, b = c = -1).

Users and administrators are more likely to detect an aggressive spreading pattern, resulting in the subsequent blocking of the attack. On the other hand, attacks that spread slowly might evade detection for a longer period of time, but the amount of data they gather would be limited. To predict the detection probability of the attack at time t, we use Richards' curve [15], a generalized logistic function often used for modeling the detection of security attacks:

$$p_{\mathrm{detect}}(t) = \frac{1}{\left(1 + e^{-\rho (t - M)}\right)^{1/\sigma}}$$

where ρ is the attack aggressiveness, σ is a normalizing constant for the detection mechanism, and M denotes the normalizing constant for the system's initial state. Let $I_u(t)$ be the infection indicator of u at time t, $T_u$ be the initial infection time of u, and $p(u, t)$ be the Gompertz function. Defining

$$\Lambda_V(t) = \frac{1}{|V|} \sum_{u \in V} I_u(t) \cdot p(u, t - T_u)$$

we get

$$\Lambda_V(\rho) = \int_0^{\infty} \frac{\partial \Lambda_V(t)}{\partial t} \cdot \left(1 - p_{\mathrm{detect}}(t)\right) dt$$

Social Learnability

We defined a mathematical measure that predicts an attacker's ability to steal or acquire a given social network, which we call a network's social learnability. This measure reflects both the information contained in the network and the broader context from which the network was derived. Using this measure, we can sort real-world social networks according to their complexity (which is known) and even group two different social networks that were generated by the same group of people. The optimal learning process with respect to this new measure in many cases involves nonaggressive attacks.

Information Complexity of Social Networks

A network's Kolmogorov complexity represents the basic amount of information contained in a social network [16]. For example, a military organization's network has many homogeneous links and hierarchical structures. We would expect it to require a much shorter minimal description than, say, the social network of the residents of a metropolitan suburb. In the latter, we would expect to see a highly heterogeneous network, consisting of many types of relationships such as work relationships, physical proximity, family ties, and other intricate types of social relationships and group affiliations. Let $K_E$ denote a network's Kolmogorov complexity, or the minimal number of bits required to "code" the network in such a way that it could later be completely restored.

Social Entropy of Social Networks
Every social-reality network belongs to one or more social families, each of which has its own consistency (or versatility). Some families might contain a great variety of possible networks, each having roughly a similar probability of occurring, while others might consist of a limited number of possible networks. Each network's complexity, however, does not necessarily correlate with its entropy. For example, families of low variety might in fact be highly complicated networks, while other families might contain a great variety of relatively simple networks.

Let us define $\mathcal{G}_n$ to contain n random instances of networks of |V| nodes that belong to the same social family as G. Let $X_n$ be a discrete random variable with possible values

$$\left\{ x_1, x_2, \ldots, x_{2^{\frac{1}{2}|V|(|V|-1)}} \right\}$$

(corresponding to all possible graphs over |V| nodes), taken according to the distribution of $\mathcal{G}_n$. The normalized social entropy of the network G would therefore be calculated by dividing the entropy of the variable $X_n$ by the maximal entropy for graphs of |V| nodes:

$$\lambda_n(G) \triangleq \frac{H(X_n)}{\log_2 \zeta_{|V|}}$$

where $\zeta_{|V|}$ denotes the number of distinct nonisomorphic simple graphs of |V| nodes. $\lambda(G)$ is then defined as $\lim_{n \to \infty} \lambda_n(G)$.

Stealing a Network's Social Essence

Reed's law asserts that the utility of large networks (and particularly social networks) can scale exponentially with the size of the network. This is because the number of possible subgroups of network participants is exponential in N (where N is the number of participants), stretching far beyond the $N^2$ utilization of Metcalfe's law that was used to represent the value of telecommunication networks. Extending this notion, we assert that a strong value emerges from learning the $2^I$ social principles behind a network, where I is the information encapsulated in a network.

Assuming that at time t an attacker has stolen $|E| \Lambda_E(t)$ edges, then taking $K_E$ as the maximal amount of information that can be coded in the network G, we normalize it by the fraction of edges acquired thus far. Because $K_E$ is measured in bits, the appropriate normalization should maintain this scale. Multiplying by $\lambda(G)$, the normalized social entropy of the network G, the network information can be written as follows:

$$I = \lambda(G) \cdot K_E \cdot \frac{\log_2\left(|E| \Lambda_E(t)\right)}{\log_2 |E|}$$

After normalizing by the network's overall social essence ($\Lambda_E = 1$), we achieve the following measurement for the social essence of the subnetwork acquired:

$$\Lambda_S(t) = \frac{2^{\lambda(G) \cdot K_E \cdot \frac{\log_2\left(|E| \Lambda_E(t)\right)}{\log_2 |E|}}}{2^{\lambda(G) \cdot K_E}} = 2^{\lambda(G) \cdot K_E \cdot \frac{\log_2 \Lambda_E(t)}{\log_2 |E|}}$$

which yields

$$\Lambda_S(t) = \Lambda_E(t)^{\frac{\lambda(G) \cdot K_E}{\log_2 |E|}}$$

$K_E$ represents the network complexity, whereas $\lambda(G)$ represents the complexity of the network's social family.

At this point, we assert that our social-learnability measure is a valuable property for measuring network attacks. To support this, we demonstrate the values of this measure for several real-world networks. Figure 1 presents an analysis of the networks derived from the Social Evolution experiment [17], the Reality Mining network [18], and the Friends and Family experiment [19]. We can easily see the logic behind the predictions received using the social-learnability measure concerning the difficulty of learning each network. Specifically, we determined that the Social Evolution network is harder to steal than the Reality Mining network, but it is easier to steal than the Friends and Family networks. Whereas the Reality Mining experiment tracked people within a relatively static work environment, the Social Evolution experiment took place at Massachusetts Institute of Technology undergraduate dorms, involving students with (apparently) much more complicated mobility and interaction patterns. The Friends and Family dataset involved even more complicated interactions because it includes a heterogeneous community of couples, increasing the amount of information encapsulated within the network.

In addition, the social-learnability measure places the two Friends and Family networks directly on top of each other, despite the fact that the two networks contain significantly different information in terms of volume, meaning, and network information. Still, because the two networks essentially represent the same social group of people, their social-learnability measure has a similar value.

Figure 2 demonstrates the importance of a network's social entropy, analyzing the Reality Mining network for various possible values of social entropy. We approximated the value for the network's Kolmogorov complexity using an LZW compression. Figure 3 demonstrates the progress of the network-essence-stealing process for various network-complexity values. As the amount of information contained in a network increases (that is, as the network represents more complicated social structures), the network becomes much more difficult to acquire.
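The social-essence measure above can be explored numerically. The following Python is a minimal sketch under stated assumptions, not the authors' implementation: it estimates $K_E$ as the bit length of a textbook LZW encoding of a serialized edge list (the text says only that an LZW compression was used, so the serialization format and code widths here are assumptions), and it evaluates $\Lambda_S = \Lambda_E^{\lambda(G) K_E / \log_2 |E|}$. The names `lzw_bits`, `edge_list_bytes`, and `social_essence` are hypothetical.

```python
import math


def lzw_bits(data: bytes) -> int:
    """Bit length of a textbook LZW encoding of `data` (a rough stand-in for K_E)."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    bits = 0
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the current phrase
        else:
            # Emit the code for `current`; its width grows with the table size.
            bits += (len(table) - 1).bit_length()
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        bits += (len(table) - 1).bit_length()
    return bits


def edge_list_bytes(edges) -> bytes:
    """Serialize an undirected edge list canonically (assumed encoding)."""
    canonical = sorted(tuple(sorted(edge)) for edge in edges)
    return ";".join(f"{u},{v}" for u, v in canonical).encode("ascii")


def social_essence(edge_fraction: float, social_entropy: float,
                   complexity_bits: float, num_edges: int) -> float:
    """Lambda_S = Lambda_E ** (lambda(G) * K_E / log2|E|)."""
    if not 0.0 < edge_fraction <= 1.0:
        raise ValueError("edge_fraction must be in (0, 1]")
    if num_edges < 2:
        raise ValueError("num_edges must be at least 2")
    exponent = social_entropy * complexity_bits / math.log2(num_edges)
    return edge_fraction ** exponent


if __name__ == "__main__":
    # Toy 64-node ring; real networks would be far larger and less regular.
    ring = [(i, (i + 1) % 64) for i in range(64)]
    k_e = lzw_bits(edge_list_bytes(ring))
    for fraction in (0.25, 0.5, 1.0):
        print(fraction, social_essence(fraction, 0.1, k_e, len(ring)))
```

Because the exponent grows with both $\lambda(G)$ and $K_E$, a larger complexity estimate or a higher social entropy pushes the curve down for any edge fraction below 1, which matches the qualitative behavior described for Figures 2 and 3.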
[Figure 1: three panels, (a) to (c). x-axis: percentage of edges acquired; y-axis: percentage of network acquired; curves: Reality Mining, Random Hall, Friends and Family.]

Figure 1. Reality-stealing process. We used three values of social entropy, (a) λ(G) = 1, (b) λ(G) = 0.1, and (c) λ(G) = 0.02, for four networks: the Random Hall network [17], Reality Mining network [18], Friends and Family self-reporting network, and Friends and Family Bluetooth network [19]. Using this example, we can see that the Reality Mining network is easier to steal than the Random Hall network, which in turn is easier to steal than the Friends and Family networks.

Experimental Results

We evaluated our model on data derived from a real-world cluster of mobile phone users drawn from the call records of a major city within a developed western country. The data consists of approximately 200,000 nodes and 800,000 edges. Figure 4 shows the attack efficiency (namely, the maximal amount of network information acquired) as a function of its aggressiveness, that is, the attack's infection rate. The two curves represent the amount of information (related to edges and vertices) that can be obtained as a function of the aggressiveness value ρ. Although a local optimum exists for an aggressiveness value of a little less than ρ = 0.5 (namely, a relatively aggressive attack), it is preceded by a global optimum achieved by a much more subtle attack, for an aggressiveness value of ρ = 0.04.

To further validate our analytic model for predicting the success of stealing-reality attacks, we simulated attacks for random subnetworks of our real-world 200,000-node mobile network using various attack-aggressiveness values. We used numerous sets of values for the attack properties, and for each, we empirically measured the overall expected amount of information stolen by the attack. Although the actual percentage of stolen information varied significantly between the various simulations, demonstrating the influence of changes made to the attack's properties, many displayed the same interesting phenomenon: a global optimum for the attack's performance located around a low value of ρ. Figure 5 presents some of these scenarios.

To further validate our theoretical attack model, we used a small-scale real-world social network we obtained from the Friends and Family study containing data derived from a multitude of mobile-mounted sensors (including call logs, accelerometers, Bluetooth, and WiFi interactions).
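The kind of simulation described in this section can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' experimental code and not a reproduction of their results: the Gompertz parameters, the Richards-style detection curve and its parameters, the ring topology, and all function names (`gompertz`, `p_detect`, `stolen_vertex_information`, `ring_graph`) are assumptions chosen for readability. It spreads a single agent with per-step copy probability ρ and accumulates a discrete analogue of the vertex-information measure, weighting each learning gain by the probability that the attack is still undetected.

```python
import math
import random


def gompertz(t: float, a: float = 1.0, b: float = -5.0, c: float = -0.05) -> float:
    """Fraction learned about a node t steps after infection: a * exp(b * exp(c * t))."""
    return a * math.exp(b * math.exp(c * t))


def p_detect(t: float, rho: float, sigma: float = 1.0, m: float = 50.0) -> float:
    """Richards-style detection probability (assumed form and parameter values)."""
    return 1.0 / (1.0 + math.exp(-rho * (t - m))) ** (1.0 / sigma)


def stolen_vertex_information(adjacency, rho, steps=200, seed=0):
    """Sum of per-step learning gains, each weighted by (1 - p_detect)."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    infected_at = {rng.choice(nodes): 0}  # a single injected agent at time 0
    stolen = 0.0
    previous = 0.0
    for t in range(1, steps + 1):
        for host in list(infected_at):
            neighbors = adjacency[host]
            if neighbors and rng.random() < rho:
                # Copy attempt; already-infected targets keep their infection time.
                infected_at.setdefault(rng.choice(neighbors), t)
        learned = sum(gompertz(t - t0) for t0 in infected_at.values()) / len(nodes)
        stolen += (learned - previous) * (1.0 - p_detect(t, rho))
        previous = learned
    return stolen


def ring_graph(n):
    """Toy undirected ring used only for illustration."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    graph = ring_graph(200)
    for rho in (0.02, 0.1, 0.5, 0.9):
        print(rho, stolen_vertex_information(graph, rho))
```

Sweeping ρ over a grid and averaging over several seeds gives a curve of the same kind as the one plotted against attack aggressiveness; where its maximum falls depends entirely on the assumed parameters and topology.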
Using this Friends and Family data, we confirmed our assumptions concerning the learning process [14]. Our research currently focuses on the empirical implementation and measurement of the model we presented here.

[Figure 2: x-axis: percentage of edges acquired; y-axis: percentage of network acquired; curves: social entropy = 0.02, social entropy = 0.1, social entropy = 1.]

Figure 2. Network social entropy. This example illustrates a network's social entropy λ(G) for the Reality Mining network [18]. The curves represent an approximation of the social essence measure calculated using an LZW compression of the Reality Mining network. If we assume that the network is derived from a family of the maximal entropy (namely, having a uniform distribution of all possible networks), the evolution of the stealing-reality attack differs significantly from that for networks derived from a family of lower social entropy. In fact, even for λ(G) = 0.1, stealing the network would be materially easier, as additional information is gained from every edge acquired.

[Figure 3: x-axis: percentage of edges acquired; y-axis: network essence acquired; curves range from lowest to highest Kolmogorov complexity.]

Figure 3. Evolution of Λ_S as a function of the overall percentage of acquired edges. For networks of the same number of edges (|E| = 1,000,000), we assume the same social entropy λ(G) = 0.1, with different levels of Kolmogorov complexity.

[Figure 4: x-axis: attack aggressiveness; y-axis: percentage of the information collected; curves: vertices information, edges information.]

Figure 4. Overall amount of data that can be captured by a stealing-reality attack. This example illustrates the phenomenon where the most successful attack possible (namely, an attack capable of stealing the maximal amount of information) is produced by a low attack aggressiveness value ρ. The upper curve represents Λ_E(ρ), the overall percentage of edge-related information stolen. The lower curve represents Λ_V(ρ), the overall percentage of vertex-related information stolen. The local maximum around ρ = 0.5 is outperformed by the global maximum at ρ = 0.04.

[Figure 5: x-axis: attack spreading rate; y-axis: percentage of network acquired (×10^-3); inset: double logarithmic representation.]

Figure 5. Extensive study of a real-life mobile network, simulating stealing-reality attacks. The performance of each scenario is measured as the percentage of information acquired, as a function of the infection rate ρ. The scenarios presented in this figure demonstrate a global optimum of the attack performance for low ρ values, stressing the fact that in many cases an extremely nonaggressive attack achieves the maximal amount of stolen information.

The new concept of stealing-reality attacks might provide an explanation for observed evidence in the process of investigating recent advanced persistent threat (APT) attacks, as well as suggest that such attacks might have occurred in the past and gone undetected. Such attacks are difficult to detect because most existing network monitoring methods focus on detecting other, noisier attack attempts. Systems such as the Network Telescope [20] are designed to detect activity in IP segments that are supposed to contain no such activities. Other widely used methods rely on detecting anomalies in network activity [21,22], for which a considerable amount of data is required. As a result, a nonaggressive attack might avoid detection. Finally, the attack is sensitive to the accuracy of the selection of the optimal aggressiveness value (Figure 4), which further hints at the usefulness of the attack for entities such as global hacking organizations or national defense agencies that have the resources needed to gather the information required for such accurate estimation.

Acknowledgments

This material is based upon work supported by the US National Science Foundation under grant number 0905645. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

References
1. M. McLuhan, Understanding Media: The Extensions of Man, Mentor, 1964.
2. C. Herley and D. Florencio, "Nobody Sells Gold for the Price of Silver: Dishonesty, Uncertainty and the Underground Economy," Economics of Information Security and Privacy, T. Moore, D. Pym, and C. Ioannidis, eds., Springer, 2010, pp. 33–53.
3. "The Advanced Persistent Threat (APT)," white paper, Solutionary, 2011.
4. B.E. Binde, R. McRee, and T.J. O'Connor, Assessing Outbound Traffic to Uncover Advanced Persistent Threat, tech. report, Sans Inst., 2011.
5. M. Brunner et al., Infiltrating Critical Infrastructures with Next-Generation Attacks, tech. report, Fraunhofer-Inst. for Secure Information Technology, 2010.
6. L. Tang and H. Liu, "Toward Predicting Collective Behavior via Social Dimension Extraction," IEEE Intelligent Systems, vol. 25, no. 5, 2010, pp. 19–25.
7. D. Barbieri et al., "Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics," IEEE Intelligent Systems, vol. 25, no. 6, 2010, pp. 32–41.
8. N. Jeffay, "Israel Poised to Pass National I.D. Database Law," The Jewish Daily Forward, 2009; www.forward.com/articles/112033.
9. D. Emery, "Privacy Fears over Gay Teenage Database," BBC News, 2010; www.bbc.co.uk/news/10612800.
10. R.M. Stana and D.R. Burton, Identity Theft: Prevalence and Cost Appear to Be Growing, tech. report GAO-02-363, US General Accounting Office, 2002.
11. S. Granger, "Social Engineering Fundamentals, Part I: Hacker Tactics," Symantec, 2001; www.symantec.com/connect/articles/social-engineering-fundamentals-part-i-hacker-tactics.
12. R. Gross and A. Acquisti, "Information Revelation and Privacy in Online Social Networks," Proc. 2005 ACM Workshop Privacy in the Electronic Society (WPES), ACM Press, 2005, pp. 71–80.
13. A. Korolova et al., "Link Privacy in Social Networks," Proc. 17th ACM Conf. Information and Knowledge Management (CIKM), ACM Press, 2008, pp. 289–298.
14. W. Pan, N. Aharony, and A. Pentland, "Composite Social Network for Predicting Mobile Apps Installation," Proc. 25th Conf. Artificial Intelligence, 2011, pp. 821–827.
15. N.A. Christakis and J.H. Fowler, "Social Network Sensors for Early Detection of Contagious Outbreaks," PLoS ONE, vol. 5, 2010, p. e12948.
16. A. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Problems Information Transmission, vol. 1, no. 1, 1965, pp. 1–7.
17. A. Madan et al., "Social Sensing for Epidemiological Behavior Change," Proc. 12th ACM Int'l Conf. Ubiquitous Computing, ACM Press, 2010, pp. 291–300.
18. N. Eagle, A. Pentland, and D. Lazer, "Inferring Social Network Structure Using Mobile Phone Data," Proc. Nat'l Academy of Sciences, vol. 106, no. 36, 2009, pp. 15274–15278.
19. N. Aharony et al., "The Social fMRI: Measuring, Understanding, and Designing Social Mechanisms in the Real World," Proc. 13th ACM Int'l Conf. Ubiquitous Computing (Ubicomp), ACM Press, 2011, pp. 445–454.
20. D. Moore et al., "Inside the Slammer Worm," IEEE Security & Privacy, vol. 1, no. 4, 2003, pp. 33–39.
21. F. Apap et al., "Detecting Malicious Software by Monitoring Anomalous Windows Registry Accesses," Recent Advances in Intrusion Detection, Springer, 2002, pp. 36–53.
22. R. Moskovitch et al., "Host Based Intrusion Detection Using Machine Learning," Proc. IEEE Int'l Conf. Intelligence and Security Informatics (ISI), IEEE Press, 2007, pp. 107–114.

The Authors

Yaniv Altshuler is a postdoctoral associate at the Massachusetts Institute of Technology Media Lab and an adjunct faculty member in the Computer Science Department at the Technion, Israel Institute of Technology. His research interests include information flow dynamics, social network analysis, and swarm algorithms in dynamic communities. Altshuler has a PhD in computer science from Technion. Contact him at yanival@media.mit.edu.

Nadav Aharony is a doctoral student in the Human Dynamics Group at the MIT Media Lab. His research revolves around the intersection between communication networks, social dynamics, and systems that learn. Aharony has a BS in electrical engineering from Technion. Contact him at nadav@media.mit.edu.

Alex "Sandy" Pentland is the Toshiba Professor of Media, Arts, and Sciences at MIT and directs the MIT Human Dynamics Lab. He also directs the Media Lab Entrepreneurship Program, spinning off companies to bring MIT technologies into the real world. He is a pioneer in computational social science, organizational engineering, and mobile information systems and is developing computational social science and using this new science to guide organizational engineering. Pentland has a PhD in AI and psychology from MIT. Contact him at sandy@media.mit.edu.

Yuval Elovici is an associate professor in the Department of Information Systems Engineering and the director of the Deutsche Telekom Research Laboratories at Ben Gurion University of the Negev, Israel. His research interests include computer and network security, privacy and anonymity in an electronic society, social network security, complex networks, and detection of malicious code using machine learning techniques. Elovici has a PhD in information systems management from Tel Aviv University. Contact him at elovici@bgu.ac.il.

Manuel Cebrian is a research scientist in the Department of Computer Science and Engineering at the University of California, San Diego. His research focuses on the advancement of computational social science, an emerging field that collects and analyzes data at a scale capable of revealing patterns of individual and group behaviors. Cebrian has a PhD in computer science from the Autonomous University of Madrid. Contact him at mcebrian@ucsd.edu.
Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.