Reblog network of an anonymous tumblr post.
Animated network graphs, oh my!
For the past nine months, Janet Vertesi, assistant professor of sociology at Princeton University, tried to hide from the Internet the fact that she’s pregnant — and it wasn’t easy. Pregnant women are incredibly valuable to marketers. For example, if a woman decides between Huggies and Pampers diapers, that’s a valuable, long-term decision that establishes a consumption pattern. According to Vertesi, the average person’s marketing data is worth 10 cents; a pregnant woman’s data skyrockets to $1.50. And once targeted advertising finds a pregnant woman, it won’t let up. […] First, Vertesi made sure there were absolutely no mentions of her pregnancy on social media, which is one of the biggest ways marketers collect information. She called and emailed family directly to tell them the good news, while also asking them not to put anything on Facebook. She even unfriended her uncle after he sent a congratulatory Facebook message. She also made sure to only use cash when buying anything related to her pregnancy, so no information could be shared through her credit cards or store-loyalty cards. For items she did want to buy online, Vertesi created an Amazon account linked to an email address on a personal server, had all packages delivered to a local locker and made sure only to use Amazon gift cards she bought with cash. […] Genius, right? But not exactly foolproof. Vertesi said that by dodging advertising and traditional forms of consumerism, her activity raised a lot of red flags. When her husband tried to buy $500 worth of Amazon gift cards with cash in order to get a stroller, a notice at the Rite Aid counter said the company had a legal obligation to report excessive transactions to the authorities. “Those kinds of activities, when you take them in the aggregate … are exactly the kinds of things that tag you as likely engaging in criminal activity, as opposed to just having a baby,” she said.
Facebook uses face recognition software to identify its users in photos. This works via a ‘template’ of your facial features that is created from your profile images. These features — the distance between your eyes, the symmetry of your mouth — generally do not change over time. Unlike a photograph, which captures some ephemeral expression of who you are at a particular moment, a face recognition template forever remains your portrait. It is all possible photos, taken and untaken, by which you, or someone else, might document your life.
These templates are Facebook’s proprietary data. For a brief period in 2013, users could access their template using the “Download a copy of your Facebook data” option in the settings (it is no longer included in the download). The information is unusable in its raw form without knowing the specifics of Facebook’s algorithm. But as an irrevocable corporate byproduct, the future implications of such data remain unclear.
In our work as social media researchers we are regularly answering clients’ questions about online influence and influencers. They know that they’re not the only force influencing perceptions of their brands, and they want to reach out to the other people who are. This could mean identifying the right bloggers to bring on board to increase the likelihood of a successful social campaign, or tracking who is most shaping a discussion about a brand or topic.
Pinning down who is influential isn’t straightforward. The data hardly ever exists to connect a social media message with the actions it may have inspired, such as products purchased or businesses boycotted. Instead what we can really assess is ‘potential to influence’: who’s reaching a big audience, who’s engaging that audience the most and getting a lot of interaction, and who’s demonstrating consistent expertise on a topic. So influence is complex, an outcome of a combination of properties about people, contexts and relationships.
That’s why here at FACE we developed our own proprietary metric to analyse which messages were reaching the biggest audience. Our visibility algorithm assigns each piece of content a visibility score, taking into account the properties of the channel it’s on (e.g. blog content lasts longer than Twitter), the size of the author or website’s audience, and the virality of the post – how many times it’s been shared.
Alongside visibility, we also use Social Network Analysis to understand influence through analyzing the dynamics of online behaviours and relationships. It provides the theory, the algorithms and the software to capture, visualize and explore the data gathered using Pulsar. This can enable us to take influencer analysis to the next level – and it’s what we’re going to discuss in today’s blog.
The role of influencers
Previous research carried out here at FACE by Francesco D’Orazio and Jess Owens highlighted the role of influencers in how information spreads through social media. It discovered that while influencers may only represent a small percentage of an overall conversation, their role does ultimately shape how information spreads. Tapping into close communities makes content shareable, but top-down influence is essential for content to achieve truly viral speed and scale.
We’ll cover communities in more detail in our next blog, but for the moment let’s understand that influencers play a vital role in shaping conversations, and insight into how their influence is structured can also prove important.
In essence Network Analysis views relationships as connections. Some people in the network might have only one or two connections (e.g. they only have 1 or 2 Twitter followers), and others might have hundreds or thousands.
So hubs or influencers in networks can be identified by looking for people who are highly connected in comparison to the remainder of the network. Because they’re better connected, these are the people who you may wish to bring on board with an online campaign, to help maximize its chance of successfully reaching the greatest number of people.
So let’s look at an example that demonstrates how networks can help us investigate relationships between nodes and identify influencers.
Investigating my ego network
When compiling a list of influencers you may start with a very basic measure, the number of friends/followers. Using Network Analysis and my social graph, we’ll explore the limitations of this metric, and how we might do a better job.
Introducing my friends & family…..
In this visualisation the nodes are people who are my friends on Facebook, and the edges are the friend relationships between them. It’s important to note that I’m not on the chart – so the connections aren’t their relationships with me. Instead, the connections shown are the friendships that they have with each other e.g. I’m friends with Amy and Bob, and if Amy and Bob are also friends, there’d be a connection between them. If they’re not friends, no connection.
We can rank nodes by a number of measures; in this instance I’ve chosen degree centrality, which is the number of connections each person has. I’ve used this to determine the size of each node: the larger the node the greater the number of connections. This makes the highly-connected people easier to spot.
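As a rough illustration of the ranking described above, here is a minimal sketch that counts each person's connections from a list of friendships. The names, the `friendships` list and the `degree_counts` helper are all hypothetical, made up for the example; they are not taken from the real graph.

```python
from collections import defaultdict

def degree_counts(edges):
    """Return {person: number of distinct friends} for an undirected friendship list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {person: len(friends) for person, friends in neighbours.items()}

# Hypothetical friendships between the graph owner's friends (the owner is not a node).
friendships = [("Amy", "Bob"), ("Amy", "Cara"), ("Bob", "Cara"), ("Cara", "Dev")]

degrees = degree_counts(friendships)
# Sort by connection count (highest first), then by name to break ties.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
```

In a chart, the count for each person would then drive the node size, so Cara (three connections in this toy data) would be drawn largest.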
We’ve also used what’s called a “force directed layout algorithm” to visualize the graph. This means that linked nodes attract each other and non-linked nodes are pushed apart. So the most-connected people tend to end up towards the middle of the chart.
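To give a feel for how such a layout behaves, here is a small sketch in the spirit of the well-known Fruchterman-Reingold approach. It is an assumption on my part, not necessarily the algorithm used for the chart: in this version every pair of nodes repels, and linked nodes additionally pull together. The constants (`iterations`, the starting `temperature`, the cooling factor) and the example names are illustrative choices.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple force-directed relaxation."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # rough target edge length
    temperature = 0.1  # caps how far a node may move in one iteration

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                f = k * k / dist
                force[a][0] += dx / dist * f
                force[a][1] += dy / dist * f
                force[b][0] -= dx / dist * f
                force[b][1] -= dy / dist * f
        # Attraction: linked nodes pull together, more strongly when far apart.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            f = dist * dist / k
            force[a][0] -= dx / dist * f
            force[a][1] -= dy / dist * f
            force[b][0] += dx / dist * f
            force[b][1] += dy / dist * f
        # Move each node along its net force, limited by the current temperature.
        for n in nodes:
            fx, fy = force[n]
            length = max(math.hypot(fx, fy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += fx / length * step
            pos[n][1] += fy / length * step
        temperature *= 0.97  # cool down so the layout settles

    return pos

# Hypothetical example: Dev has no links, so only repulsion acts on him.
friends = ["Amy", "Bob", "Cara", "Dev"]
links = [("Amy", "Bob"), ("Bob", "Cara")]
layout = force_layout(friends, links)
```

The returned coordinates can be handed to any plotting tool; graph software normally does this step for you.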
The first analysis that can be taken from the graph is that a lot of nodes share connections. This is why there is one giant component in the centre of the graph, with lots of highly-connected people all clustered together. This is to be expected, as the sample of individuals is taken from my Facebook account, and the majority of them share common acquaintances.
The thing is, we can also see that the biggest nodes are basically the same size, meaning that they’ve got the same number of connections. This isn’t really telling us the story we need – but using network analysis we can go further.
Here we’ve taken the same graph and ranked nodes by betweenness centrality. A betweenness centrality algorithm starts by finding all the shortest paths between any two individuals in the network. It then counts the number of these shortest paths that go through each node. Nodes with high betweenness centrality can be considered information brokers that can connect disparate parts of the network.
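The counting described above can be sketched in a few lines. This version follows Brandes' well-known algorithm for unweighted, undirected graphs. It is my own illustration rather than the code behind the chart, and the names in `friendships` are hypothetical. When several shortest paths tie between a pair, each one contributes a fraction, which is the usual convention.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Return {node: betweenness score} for an unweighted, undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search: count the shortest paths from source to everyone.
        dist = {source: 0}
        paths = defaultdict(int)  # number of shortest paths reaching each node
        paths[source] = 1
        preds = defaultdict(list)  # previous nodes on those shortest paths
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting the nodes in between.
        credit = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]

    # Each pair was counted once from each end, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical graph: two small groups that only meet through Gia.
friendships = [
    ("Amy", "Bob"), ("Amy", "Gia"), ("Bob", "Gia"),
    ("Gia", "Xan"), ("Gia", "Yui"), ("Xan", "Yui"),
]
scores = betweenness(friendships)
```

In this toy graph every shortest path between the two groups runs through Gia, so she is the only node with a non-zero score (4.0, one for each cross-group pair).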
The result is a smaller list of potential influencers, pin-pointing the people who are vital in connecting the different sub-networks (i.e. the different social groups) in the wider graph. We have identified four people who are now shown to hold a position of influence on the graph. And the layout of the graph begins to tell us how their spheres of influence are structured.
The person over on the right for example is crucial in connecting two small clusters of individuals to the rest of the graph. I know network analysis has correctly identified this node as an influencer – because she happens to be my girlfriend! So she’s the key person connecting both our families to the larger network of my friends.
How can this work for you?
Admittedly there’s a very short list of people who are interested in the finer details of the network structure of my Facebook graph! Nonetheless it’s an interesting example to demonstrate some of the principles of Social Network Analysis.
What can we take from this example? Using network analysis it is possible to study social groups in-depth, not just as homogeneous wholes but as sets of dynamic relationships between different individuals. And using data visualization and data exploration it is possible to reach a level of understanding that would otherwise be difficult to get without real-world personal knowledge of the individuals involved.
Using Pulsar TRAC it’s possible to scale this analysis up significantly, sampling mentions by keyword, content or user, and applying network analysis we can powerfully:
Exactly the same methods would apply if we were studying, for example, the community of people talking online about beauty & make-up, or audiophile hi-fi equipment, or photography. We could first find the best-connected people, who a brand might want to target to promote their product to the largest number of people. But we could also find the connectors, the people that allow discussions to travel into new communities and ultimately travel further.
In the next blog in our series we’re going to dive into this further, explore how we can identify communities in network structures and get stuck into some more network analysis previously carried out here at FACE.
A piece I worked on for my company blog. Definitely worth checking out the site if you haven’t already. http://www.facegroup.com/identifying-influencers-with-social-network-analysis.html
Taking an introductory class to data science. Searched my computer to see if i’ve already downloaded Git, found this, fuck data science.
Outkast: Git Up, Git Out.
Classification is a key value proposition for text analytics - it allows users to quickly drill into articles of interest and look at trends over time. Setting up a classification scheme can be a lot of work.
The common techniques are:
Model-based categorization starts with humans marking content by all of the categories they satisfy. Something like “Buying kimonos while on holiday” should fit into Travel and Fashion. Generally, you need hundreds of examples per category. Then you set a machine-learning algorithm loose on the data set. The machine analyzes all of the text in each marked document and constructs some kind of signature for each category. This looks easy on the surface - deciding whether something should go in one category or another is a relatively easy decision to make. A big issue with this approach though is what they call “overfitting” where the algorithm performs really well on the exact content you fed it but sucks elsewhere. Changing its behavior is not easy, nor is it obvious WHY something got categorized this way.
Query-based categorization starts with humans trying out search terms that should define the category and seeing how their retrieval works. So, in the above taxonomy some human gets to decide the keywords appropriate for Sports, Travel, Fashion, etc. Often this is done by people looking at articles and trying to figure out which words they should choose for their queries - they might pick “kimono” and “japan” for instance in my above example. This is difficult work to do - skilled labor is required plus a lot of time. However, queries are very transparent - you can see immediately why something matched - and they are easy to change. When you realize that “kimono” is not very predictive of fashion articles in general, you can delete that word and put in something else if you like.
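To make the transparency point concrete, here is a toy sketch of query-based categorization. The keyword lists, the prefix matching (so that "kimonos" matches "kimono") and the `categorize` helper are all assumptions for illustration; a real query language would be much richer.

```python
import re

# Hypothetical keyword queries, one set of terms per category.
CATEGORY_QUERIES = {
    "Sports": {"league", "match"},
    "Travel": {"holiday", "japan"},
    "Fashion": {"kimono"},
}

def categorize(text, queries):
    """Return {category: matched terms} for every category whose query hits."""
    words = re.findall(r"[a-z]+", text.lower())
    results = {}
    for category, terms in queries.items():
        # A term matches when any word starts with it (a crude stand-in for stemming).
        hits = sorted(t for t in terms if any(w.startswith(t) for w in words))
        if hits:
            results[category] = hits
    return results

matches = categorize("Buying kimonos while on holiday", CATEGORY_QUERIES)

# Changing a query is a one-line edit: drop "kimono" and add a different term.
revised = dict(CATEGORY_QUERIES, Fashion={"runway"})
revised_matches = categorize("Buying kimonos while on holiday", revised)
```

Because the result lists the terms that fired, you can see immediately why the article landed in Travel and Fashion, and after the edit it matches Travel only.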
There are pros and cons to each. Building queries requires a fair amount of thought and a lot of iteration, but queries are easy to change if the data changes, and the results are transparent. On the other hand, models just require users to tag documents and the machine does the heavy lift, but the results are opaque and adjusting a model for data changes requires a user to re-tag content - potentially quite a lot of content.
full article here: http://lexalytics.com/lexablog/2014/classification-queries-vs-models