I'm a professional FB marketer and have managed large budgets for private companies and also worked on political campaigns. I guarantee the Trump campaign doesn't use the low-quality Cambridge Analytica scraped data in its targeting. They either use voter file records or lookalikes (like everyone else does). All CA data released so far has been useless, untargeted stuff.
Think about where the data originally came from. People in 2015 downloaded an app, and the app scraped their friends lists. You know what's better than targeting 87 mil loosely connected people? Using the FB algorithm, which targets 330m much, much more accurately and with more connections!
My understanding is that the most significant nonconventional heuristic Cambridge Analytica got from FB data was using FB like data to approximate OCEAN personality scores for almost all voting Americans using this technique [1]. They then exploited correlations between people's Big Five ratings and their susceptibility to different types of marketing, such as targeting people high in neuroticism with emotionally charged ads, generally meant to instill fear. Am I off base?
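To make the idea of "approximating a personality score from likes" concrete, here is a toy sketch. It is not the technique cited as [1], which isn't reproduced in this thread; it is a deliberately crude stand-in that assumes you have a small set of users who took a questionnaire (so their score is known) plus everyone's page likes. The page names, scores, and function names are all made up for illustration.

```python
from collections import defaultdict

def fit_like_scores(labeled_users):
    """For each page, average the known trait score of the users who liked it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for likes, score in labeled_users:
        for page in likes:
            totals[page] += score
            counts[page] += 1
    return {page: totals[page] / counts[page] for page in totals}

def predict(like_scores, likes, default):
    """Estimate an unlabeled user's score as the mean score of the pages they like."""
    known = [like_scores[page] for page in likes if page in like_scores]
    return sum(known) / len(known) if known else default

# Hypothetical data: (set of liked pages, neuroticism score in [0, 1]).
labeled = [
    ({"horror_movies", "news_alerts"}, 0.8),
    ({"news_alerts", "gardening"}, 0.6),
    ({"gardening", "hiking"}, 0.2),
]
model = fit_like_scores(labeled)

# A user who never took the questionnaire, scored only from their likes.
estimate = predict(model, {"news_alerts", "hiking"}, default=0.5)
print(round(estimate, 2))
```

Whether an estimate like this is accurate enough to change how someone responds to an ad is exactly what the rest of this thread disputes; the sketch only shows that the scoring step itself is mechanically simple.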
I'm sure people claim you can find the best voters using their special astrology method also. The FB algorithm already takes in 1000s of black box factors that are going to work better than some unproven personality score hogwash.
Having worked in that industry, that's backwards. OCEAN is solid, well-understood science; we have decades of figures for test-retest reliability, we know what correlates and what doesn't. FB's black box factors are not public, not understood, and if they stopped working tomorrow no-one would be able to tell you why (or why they worked in the first place).
It takes surprisingly few data points to draw small and detailed psychographic categories of people. This has been known in the advertising world since the 50s, we just didn't have the tools to make microtargeting practical at scale until recently.
We can and do draw detailed psychographic categories with a few data points, but it's far from clear whether the results (and especially the details) are actually correct.
I think "since the 50s" should be taken as evidence against these models. Myers-Briggs first came into vogue in the late 1950s, and has been used for career counseling and hiring since despite being utterly unfit for purpose. Priming work dates to the 70s, and now it appears that many of the long-term uses advertising relies on don't replicate. The 'decoy effect' that drives many product strategies was formalized in the 1980s, and recent work suggests it exists only under very narrow conditions. Modern industry leaders like the Food and Brand Lab have apparently spent the last 20 years publishing absolute nonsense. Even results 'validated' with A/B testing are in many cases just noise from misusing statistics.
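One common way A/B results turn into noise is checking for significance repeatedly and stopping at the first "win". The sketch below is an illustration under assumed parameters (a 10% conversion rate, 1,000 users per arm, a peek every 100 users), not a description of any specific study mentioned above. It simulates A/A tests, where both arms are identical, so every "significant" result is a false positive.

```python
import math
import random

def z_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-proportion z-test with n users per arm; True if |z| exceeds z_crit."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    return abs(conv_a / n - conv_b / n) / se > z_crit

def false_positive_rates(runs=300, n=1000, step=100, rate=0.1, seed=0):
    """Simulate A/A tests and compare 'peek every step' with 'test once at the end'."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked = False
        for i in range(1, n + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # An analyst who peeks would have stopped and declared a winner here.
            if i % step == 0 and z_significant(conv_a, conv_b, i):
                peeked = True
        peeking_hits += peeked
        fixed_hits += z_significant(conv_a, conv_b, n)
    return peeking_hits / runs, fixed_hits / runs

peeking, fixed = false_positive_rates()
print(f"declared a winner at some peek: {peeking:.2f}; tested once at the end: {fixed:.2f}")
```

Because the final look is also one of the peeks, the peeking rate can never be lower than the single-test rate, and in practice it tends to be several times higher than the nominal 5%.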
Precisely because we didn't have microtargeting or consumer-level feedback, all we've had since the 1950s is the belief that we can build and use these models. We know ads basically work, they improve brand recognition and reputation, but the Don Draper psychological rationales are essentially just-so stories written in the absence of data.
(As far as CA, no one seems to have dug up any seriously unusual patterns in 2016 voting. So unless they paired high-impact psychological targeting with an elaborate statistical coverup, what they actually did with the data wasn't exceptional.)
Psychographic segmentation is an evolution of psychoanalysis; in particular Jacques Lacan, whose work in the 50s took the general ideas of Freudian psychoanalysis and applied them to larger phenomena -- namely how language and symbolism can be used to pluck emotional strings and influence the minds of groups of like-minded people. An ad man in the 1950s would certainly have been aware of his work. The folks from CA have gone on record about the influence of Lacan, so it's not remotely a stretch.
This is a little dense, but the preface has a nice statement on neurobiology, and Wikipedia has some interesting articles on neuroscience and cognitive psychology. I suppose I'm looking for a popsci book on how computer science, psychology, neuroscience, etc. all came together in the last decade to become so effective in hacking our brains and influencing our decisions. Or perhaps it's been there all along just now it's getting more attention.
> Or perhaps it's been there all along just now it's getting more attention.
It's been a slow build to add layers of targeting on as the media machine grows. It started out with time-based targeting by showing ads for home goods during the daytime (e.g. soap operas were used to sell soap to housewives). Cable TV was a big step forward -- you could craft shows that appealed to narrower demographics like 8-14 year old boys and then sell ads targeting those demographics.
Psychographic segmentation became prevalent along with cable TV and direct mail, but it was limited to a few dozen "personas" until Google came along and allowed keyword targeting, which then gave way to social targeting. It got exponentially more effective with each step, which is why it seemed to come out of nowhere.
The Facebook algorithm optimizes for Facebook's preferences, not yours as an advertiser. If your goal is to scare people, then click-through is no longer your KPI, for example. I think you shouldn't be so quick to dismiss the huge potential advantage, particularly when it comes to fear-based political advertising, of backing out psychological profiles of people to refine targeting.
1) That kind of targeting is even less useful outside of FB. You can upload gmail addresses to Google to target, or use something like LiveRamp to target on display networks, but both options suck compared to FB.
2) Nothing beyond what's publicly available.
3) It's not unusual to test different audiences; I've definitely tested all sorts. I'm sure that's how it started for Cambridge. Of course, tin pot dictatorships all around the world now hire them to be basically a subpar FB agency, so they're happy with the PR.
> 1) That kind of targeting is even less useful outside of FB. You can upload gmail addresses to Google to target, or use something like LiveRamp to target on display networks, but both options suck compared to FB.
Both Google and FB are financially incentivized to provide as much granularity in targeting as possible so they can charge more money to advertisers, who'll get a better return and get promoted. All up until the point that it becomes a liability. That's the line they're walking: you can totally target quite a few things that end up correlating to, say, neurotic people, if you know your audience is neurotic people. You have enough of the dataset at that point.
CA as an agency might be effective, but this big data scrape is not part of that (beyond marketing themselves as nefarious propagandists to skeezy buyers).
Doesn't that "unproven personality score hogwash" have something like 9 decades of research behind it, and isn't it considered the gold standard of personality testing by textbooks on the psychology of personality?
Mostly the standards of rigour in the field of psychology are deemed flimsy and a lot of findings have failed to replicate.
It's my understanding that Big Five has been replicated consistently across different languages and cultures over the last 90 years and is one of the only things we're fairly sure of in psychology at this point.
The accusation was that it was "unproven hogwash". That doesn't check out.
There's a strategy/tactics divide to consider. Relying on FB's targeting algorithms is tactical - that'll give you great targeting once the time comes to execute your campaign.
But CA data, even if noisy, has been exfiltrated out from under privacy controls, so that you can get a much more direct look at it. That allows an analyst to get a more detailed sense of what social networks actually look like. I imagine that is more useful for figuring out what kinds of people you should be targeting in the first place, in order to maximize the leverage of your campaign.
But at that point couldn't you just upload an email list and target by that? I could be miles off the mark, but I assumed a lot of this came from getting people to take a "personality test" and then bucketing that person for particular ads.
Admittedly I based a bunch of this on Alexander Nix's utterly fascinating OMR presentation (https://youtu.be/6bG5ps5KdDo). As per the parent, it always amazes me at how easy it is to influence someone if you know the right levers to pull.
You can still do that, but 1) much better email lists exist that the RNC was already sharing with the Trump campaign, 2) algorithmic targeting almost always beats uploaded lists nationally anyway.
I assumed email lists of people who'd taken the personality tests, then maybe LAL off the back of that. How was the algorithmic targeting back then? I know I wouldn't want to manually optimise against it now (with the right spend).
Just to caveat again - not my specialty, but very interesting all the same!
It's even better now, but in 2016 it was still really good. In 2016 I was using FB targeting to find business owners who were in the market for loans, which is far more specific than Democrat, independent, or Republican!
I’m really at a loss to understand how to judge between those who say it does and doesn’t work: but this is super helpful, don’t use CA’s crappy partial dataset, use FB’s!
But one question I wanted to ask, as very much a non-expert: was the value of the CA dataset that you can see how people are linked? I understand that you’re far more likely to believe a message if it comes from someone you know. So if CA could identify “sharers”, they could simply hit them again and again and again with information that would be forwarded. Or does FB give this functionality too?
> You know what's better than targeting 87 mil loosely connected people? Using the FB algorithm, which targets 330m much, much more accurately and with more connections!
Depends on how much of a premium you pay for that precision.
> All CA data released so far has been useless, untargeted stuff.
It’s well known that the data is crap, and much crappier than what both the DNC and RNC were using going into 2016.
So the question remains: why does CA generate so many headlines? And why is it associated so strongly with something magical, nefarious... something that potentially transformed 2016?
The answer is simple: to fit a narrative that 2016 was something other than voters duly electing a legitimate president.
The CA data is a BS story. But it's also more or less definitive that Russia's slow leaking of otherwise innocuous emails really hurt Clinton's campaign. It's easy to get confused because there's so much going on.
I still remain unconvinced that was really “Russia” (as opposed to, say, some bored Russian teens).
The reason for my skepticism is the crappy podunk nature of the leaked material. The FSB spends billions every year spying on US politicians, surely they’ve managed to find stuff a lot more embarrassing than the boring emails of some dude whose password was “passw0rd”?
I'm extremely interested in learning more about the CrowdStrike arrangement. As someone who is interested in both politics and cybersecurity, I think it leaves a lot of open questions.
Here are some dangling questions raised by the Mueller report:
* Mueller’s decision not to interview Assange – a central figure who claims Russia was not behind the hack – suggests an unwillingness to explore avenues of evidence on fundamental questions.
* U.S. intelligence officials cannot make definitive conclusions about the hacking of the Democratic National Committee computer servers because they did not analyze those servers themselves. Instead, they relied on the forensics of CrowdStrike, a private contractor for the DNC that was not a neutral party, much as “Russian dossier” compiler Christopher Steele, also a DNC contractor, was not a neutral party. This puts two Democrat-hired contractors squarely behind underlying allegations in the affair – a key circumstance that Mueller ignores.
* Lawyers for Stone discovered that CrowdStrike submitted three forensic reports to the FBI that were redacted and in draft form. When Stone asked to see CrowdStrike's un-redacted versions, prosecutors made the explosive admission that the U.S. government does not have them.
They never even turned over the servers to the FBI? They only submitted a draft report?
Also, why use a Ukrainian company for something so important? Do we not have the expertise here in the US to do this? If there's anything we've learned about Ukraine in the last few months, it's that many politicians' children were making a lot of money there through... let's just say questionable arrangements.
"Dmitri Alperovitch is co-founder and Chief Technology Officer (CTO) of CrowdStrike – responsible for the company’s overall technology vision, strategy, and architecture as well as R&D initiatives."
It does seem odd that the FBI has claimed in legal filings not to have taken the DNC computers into its possession for forensic analysis, relying entirely on 3rd party analysis, though. I believe this info can be sourced from filings in the Flynn case, which I really wish someone would put up a full archive of, it's been pretty crazy.
Add to the list of your suspect observations those of Steele and his dossier. He was hired by Republicans before he was hired by Democrats. And his dossier was largely accurate—surprisingly so given that the target was (at the time) a wealthy, media-savvy businessman and television celebrity.
The dossier was oppo research. There’s literally nothing controversial about any of it, other than it was leaked and its target successfully spun the few disputed claims into evidence of mass political conspiracy.
The DNC hired them to do a forensic analysis of their email servers after they learned they may have been hacked. Forensic analysis is one of the security services CrowdStrike provides, they also provide defensive services to detect intrusions or attempts.
Voter registration information is public, and both parties have huge, manually curated lists that they sell access to. But giving FB a seed audience of 10,000 voters and asking them to find the 100 million most similar people works better a lot of the time. The effectiveness of FB's machine learning on this stuff would blow your mind.
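For anyone unfamiliar with lookalikes, the basic idea can be sketched in a few lines. FB's actual model is a black box, so this is only an assumed toy version: average the seed audience into a single profile, then rank everyone else by cosine similarity to it. The feature names and values here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookalikes(seed, population, k):
    """Return the ids of the k people in `population` closest to the seed's average profile."""
    dims = len(next(iter(seed.values())))
    centroid = [sum(vec[i] for vec in seed.values()) / len(seed) for i in range(dims)]
    # Skip anyone who is already in the seed audience.
    candidates = {pid: vec for pid, vec in population.items() if pid not in seed}
    ranked = sorted(candidates, key=lambda pid: cosine(candidates[pid], centroid), reverse=True)
    return ranked[:k]

# Hypothetical features: [engages with gun-rights pages, engages with climate pages, age bucket]
seed = {"s1": [1.0, 0.0, 0.6], "s2": [0.9, 0.1, 0.7]}
population = {
    "a": [0.95, 0.05, 0.65],
    "b": [0.0, 1.0, 0.2],
    "c": [0.5, 0.5, 0.5],
}
top_two = lookalikes(seed, population, 2)
print(top_two)
```

The real system presumably works from thousands of behavioral signals rather than three hand-picked ones, which is the point being made above: the platform sees far more than any uploaded list does.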
I have some tangential experience here in analytics and advertising, including non-ad social campaigns, and from what I've seen the reporting in the press is likely just the tip of the iceberg. Additionally, much of the focus on the advertising element has distracted from broader and more powerful influence campaigns that operated outside of ads.
From an advertising performance standpoint yes, lookalikes and segmented email lists might perform best on the basis of maximizing engagement, shares, etc. that traditional reporting would suggest. But for a political campaign, enhancing the precision of issue-based messaging to target specific groups should theoretically have yielded gains beyond what the FB algorithm would optimize for – which would likely be more generalized based on in-platform behavior, and muddied by swaths of broadly viral content. While the desired signals might lie somewhere within Facebook's black box, I doubt that the algorithm is generally optimized for this type of political work, when FB's real money comes from driving product sales, signups, etc. In the cases of these political campaigns a lot of this content extended beyond spin to blatantly false information (i.e. made up / completely fake news, inflammatory memes, etc.) – therefore the variance in messaging could be unfathomably massive, limited only by the ability to generate huge amounts of content, which is where content / troll farms came into play.
And advertising aside, the Senate intelligence reports from a couple months ago suggested that the most powerful and far-reaching operations run by CA and similar parties were based not on advertising but on Facebook groups, Twitter bot / follower networks, and Instagram influencer spheres: recruiting real people into them, then propagating information within (e.g., a 2nd amendment group, run by false avatar accounts aligned with this voter profile, recruiting similar actual users in, and then propagating the desired information within). I've personally seen earlier examples of these types of mass influence campaigns, namely in the 2014 Senate race in North Carolina, where the seat was flipped to Tillis (R) from the incumbent Hagan (D).
If they had account information / emails, psychographic data, voter rolls, and were operating across platforms – the richness of the data would dictate how far you could go with it. According to the Senate report many of the hired trolls maintained multiple accounts across multiple platforms, were given marching orders each day for messaging, and quotas for performance. Match that with sheer manpower to create content to feed across these platforms, and enough horsepower on the analytics side to optimize based on performance, and you're staring into a frightening abyss.
We'll likely never know how far this went, but based on the data points I've seen from my own work and various reports on the CA / Russia campaigns, I see no reason why something this powerful couldn't exist with enough data, and enough money to put it to work.