Journalism, Through the Eyes of a Data-Focused Entrepreneur

Fueled by wine and delicious food, the table was full of energy. The journalists and politicians at my table were eager to outdo one another. They exchanged candid personal stories about famous TV newsmen and potential presidential candidates. They recounted tales of the shocking political corruption they’d uncovered. They told their colleagues what had really gone on at that recent big event. Unable to compete with their stories, I nodded politely.

Somehow, I’d found myself at the upscale Rialto restaurant in Harvard Square, feeling like a fly who’d landed on the wrong wall. I was twenty-four, a quiet and unassuming fraud R&D scientist at a small Silicon Valley startup called PayPal. The eight or nine people at my table were quite unlike me: all big names from journalism and public policy, all far more extroverted than I, all at least twice my age.

It was January 2002, and I was sure that two worlds couldn’t be any more different. Here in the journalism sphere were politicians and journalists, extroverted and boisterous, who told great stories; back at PayPal, analytical, introverted nerds whose skills were mostly technical. At the dinner table in Cambridge were those who could talk to sources and get the scoop; back in Silicon Valley, engineers who automated processes and crunched data.

Within about a decade, all of that would completely flip. News organizations would embrace the data-heavy, analytical approach more common to tech companies. Many of 2013’s top stories would use data at a level that was unfathomable in 2002.

The Goldsmith Awards

My first journey to Cambridge was set in motion only a few days before that dinner, when I got an urgent call from my grandfather.

Can you be in Boston on Saturday to help select the winners of the Goldsmith Awards?

I had only a vague sense of what the Goldsmith Awards were; why exactly should I fly across the country on a few days’ notice?

The Goldsmith Awards, he explained, were something he had set up with the Shorenstein Center at Harvard, using assets from the estate of the late Berda Goldsmith (his legal client). The awards honored great journalism, but their true goal was to foster better public policy. He wanted to reward journalism that shines a light on government, highlighting bad regulations and bad policymakers for the benefit of ordinary citizens.

Being a lawyer who was both thoughtful and crafty, my grandfather Bob put in place a contract that would maintain close ties between the foundation and the Shorenstein Center. A key clause in the contract stated that one of the awards’ judges must be a foundation representative.

Bob had called me because he wanted me to take over as the foundation representative. He hoped that I could go to Cambridge on Friday to see how everything worked. Then, the following year, I could represent the Greenfield Foundation on the Goldsmith selection panel.

Of course, I said. I’d long been interested in public policy — I minored in Political Science at Stanford — and this would be a real honor.

2002: Reporting

At the judging session, before the dinner at Rialto, I got my first jolt of culture shock. The Goldsmith judges evaluated dozens of newspaper submissions, and their criteria often surprised me. I’d long been a consumer of the news; that day I saw a news professional’s perspective for the first time.

Some stories were very impressive on the surface: engaging, in-depth, surprising reports on policy topics I knew nothing about. It turned out, however, that they closely resembled stories told earlier by someone else. The first story a journalist told about toxins in the local drinking water was probably very impressive; the twelfth such story, reported using the same template as the first, doesn’t deserve an award.

Other stories were dinged for reasons I wouldn’t have fathomed as a mere news consumer. Some were largely the product of a single leak: they came from an insider who wanted his story told, rather than from sleuthing by the reporter. Others were impressive investigative feats, but pointed to flaws in public policies that had virtually no chance of being changed.

The best pieces that year were original, impressive in the depth of their investigations, and had substantial impact on policy. The winner, about hospital care at the Hutch in Seattle, stood out for the sheer amount of manual work it required: the reporters had to wade through “100 interviews and 10,000 pages of documents” to tell their story. The story was amazing, and was notable to me for the set of skills used by the reporters: their methods were a far cry from the algorithm coding I was doing at PayPal.

2013: Reporting and Data

Serving on the Goldsmith panel soon became a tradition for me. Last week, for the twelfth time, I found my heavy winter jacket in the back of the closet (it’s useless in the Bay Area) and packed up for a January weekend in Boston. I’m now the veteran; this year I served with several people who had never judged a competition like the Goldsmith Awards. I have yet to judge with a panelist younger than me, but I no longer elicit the “what’s that little kid doing at the table?” stares I saw a decade ago.

But that change is predictable: I knew I wouldn’t stay twenty-four forever.

The big surprise is that investigative journalism, so different from my PayPal day job in 2002, now feels like a natural project for a Silicon Valley startup data guy like me.

Journalism has changed a lot in the past decade. In 2002, almost all investigative stories were anecdotal. A story about ineffective schools started and ended with interviews of teachers, parents and students. An investigation into medical treatment told stories of the travesties patients had endured, without using terms like “probability” or “percentage”, let alone “false positive”. Occasionally, a Goldsmith submission would talk about the painstaking work that reporters had done to piece together hundreds of paper records to assemble some basic statistics.

Today, by contrast, data analysis plays a huge role in many of the top stories. Of this year’s six finalists, at least three would have been unlikely or impossible ten years ago:

  • Cheating our Children, from the Atlanta Journal-Constitution, is a story about cheating by teachers and schools on standardized tests. The team looked at thousands of districts across the country for highly suspicious anomalies, like every student in a class (supposedly) erasing an incorrect answer to question #27 and then filling in the correct answer. They found several hundred patterns of student improvement that were most likely the result of fraud.
  • State Integrity Investigation, from the Center for Public Integrity, looks at the laws of each of the fifty states and grades each on its risk of corruption. To do this, a reporter in each state examined that state’s practices and regulations (a far more manual approach than Cheating our Children) and assembled a database of information about that state. The end result is both a great way of pressuring states (Utah, don’t you want to improve your D?) and an incredible Wikipedia-like online resource for others (especially journalists) interested in tackling related topics in the future.
  • The Shame of the Boy Scouts, from the LA Times, is the sad story of thousands of documented cases of child sexual abuse in the Boy Scouts. The Times pulled together thousands of newly released Boy Scout records, using them to tell many unbelievable and sad stories about children who were molested. But the series complemented those stories with a feature that would have been unlikely a decade ago: the Times posted all of the documents online, for anyone to search and see.

These data-centered stories are distinguished by three new attributes. The first relates to how a story is uncovered: many stories today are first found not through a tip, but through a database search. In Cheating our Children, cheating was uncovered not because of a tip from a parent or a teacher, but because of a search for suspicious trends in the data. At a high level, the steps to get the story were similar to what I was doing at PayPal in 2002: using algorithms to identify a handful of likely fraudsters.
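To make that comparison concrete, here is a minimal sketch of the kind of anomaly search involved. It is not the Atlanta Journal-Constitution’s actual method; it assumes a hypothetical table of per-classroom score gains and flags classrooms whose gain sits far above the rest, using the standard median-based "modified z-score" with its commonly used 3.5 cutoff. The function name `flag_suspicious` and all of the data are made up for illustration.

```python
from statistics import median


def flag_suspicious(gains, threshold=3.5):
    """Return classrooms whose score gain is an extreme upward outlier.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Median-based statistics keep a few suspicious
    classrooms from inflating the baseline and hiding themselves.
    """
    values = list(gains.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when over half the classrooms share the same gain;
        # the score is undefined, so this sketch simply flags nothing.
        return []
    return sorted(
        room
        for room, gain in gains.items()
        if 0.6745 * (gain - med) / mad > threshold  # one-sided: only big jumps
    )


# Hypothetical year-over-year gains (in test-score points) for seven classrooms.
gains = {
    "room_a": 2.0, "room_b": 3.5, "room_c": 1.0, "room_d": 4.0,
    "room_e": 2.5, "room_f": 3.0, "room_g": 30.0,
}
print(flag_suspicious(gains))  # ['room_g']
```

As with fraud detection, a flagged classroom is a lead for reporting, not proof of cheating; the investigative work starts there.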

The second attribute is quantifiability. Historically, journalism has not been a quantitative field, relying instead on an anecdote or two, along with an assumption that “there are many others like them”. And while quantification would be silly for many stories (either Nixon’s people broke into Watergate or they didn’t), it’s an important part of any broader societal story. In 2011, the Goldsmith winner informed the public of local hospital practices that were quantifiably worse than others out there. This year, there were some great not-quite-finalist stories that found and measured the effect of cops speeding and explained just how harmful overly prevalent pain medications can be.

Finally, many of the top stories today are complemented by a structured, searchable database. Each of the three stories above features an interactive tool allowing anyone to find the information most useful to them. On ajc.com, I can look at my local school district for evidence of cheating; on publicintegrity.org I can see how my state fares with respect to corruption risk factors; on latimes.com I can see whether there were any reports of sexual abuse at a specific Boy Scout troop.

Though the world of journalism has its challenges, these are three great developments. They widen the range of stories journalists can tell, they raise the bar on their quality, and they make them individually relevant to the reader.

The Great Bifurcation

The landscape of news and other content is bifurcating, with increasing separation between work that aims to be high-traffic and work that aims to be high-impact. On one side is entertaining content, aimed at driving page views. That content may be news, opinion, or something else, but its goal is very simple: to be part of a traffic machine that underlies an ad-supported online operation.

Traffic machine content is most successful if it arouses curiosity (yes, I do want to check out the six ways that olive oil can help me lose weight!), and even more so if it’s also something the reader identifies with and wants to share (this is why people who voted for my political candidate are smart!). Such a story, while cheap to produce, is usually not especially deep or insightful, and it may not even be true. Thus it has little or no positive impact on our institutions.

The other side of the coin, high-impact journalism, is a very different animal. It takes a lot of work and money to produce, but often doesn’t generate a lot of traffic. It may have a great impact on society, but it’s tough to justify for a business. And that’s why, increasingly, it’s the domain of non-profit entities and organizations that are only nominally for profit.

Journalism via a non-profit can be a good thing: those organizations, which today operate at both local and national scale, can focus on the highest-impact work rather than trying to mix unpopular high-impact stories with popular low-impact ones.

2023: Reporting, Data, and Software

In the new non-profit news organization, there is a simple question to ask: “how can we do work that will have the largest positive impact on public policy?” That is essentially the same question my grandfather asked when he set up the Goldsmith Awards over two decades ago.

To understand how that question will be answered a decade from now, one must first understand the roles played by three different people in today’s professional world:

  • The investigative reporter skillfully combs through documents and asks the right people the right questions to find information. He then turns that information into a compelling story for his audience.
  • The data scientist takes the data available to her and mines it to understand a subject quantitatively. She then shares that understanding with others through words, numbers, and data visualization, usually with less verbal skill than the journalist.
  • The software developer takes a process that is done manually and figures out how to first generalize it and then automate it. For instance, if you have a meeting in your calendar with an accompanying address, how can software automatically send you directions at the appropriate time? A human can do it manually; the developer writes the software that will automate the process many times over.

A decade ago, the stories I read for the Goldsmith Awards were solely the work of reporters from the first group. They were executed by skilled journalists who knew how to comb through documents, convince insiders to give them secret information, and write stories elegantly.

Today, the data scientist is a key part of journalism: data skills are nearly as important for producing Goldsmith-caliber work as classic investigative skills. Data skills help early in a story, by finding anomalies worth writing about, and later, by moving beyond anecdotes to quantify trends. That anomaly-finding widens the range of stories that can be told; quantification makes the stories better.

Still, today’s journalism has a one-off quality that would frustrate a typical software developer. Sure, I can read a story about cheating in schools — or even look at how it affects my hometown — but will the story be automatically updated in three years so it’s still relevant?

In the next decade, it’s likely that we’ll see investigative reporting evolve and improve in several ways:

  • More and more journalism will be automated and updated regularly. District scores will be mined every week; state corruption will be automatically assessed monthly. In some cases, there will be written stories that complement the new data; in other cases the automated jobs will simply feed into an interactive database available to readers.
  • Investigative reporters will get better at soliciting information from their readers and viewers. It’s become a lot easier for readers to contact reporters with tips than it was a few decades ago, but there’s still a lot of room for improvement. Facebook, LinkedIn, Quora, and Twitter make it easier to find and contact the person likely to know a specific piece of information, but they’re not ideal. One could, for instance, imagine a world where citizens could record any suspicious or unacceptable government actions in a form that could be searched by reporters in the future; this would markedly improve many investigative stories.
  • The number of journalists with data skills is increasing rapidly, and that trend isn’t going to reverse any time soon: my Twitter feed is filled with data+government+journalism enthusiasts from many different backgrounds. They’re offering online courses, pushing for open data, and much more.
  • More and more data — particularly from governments — will come online. The picture today is awful: most government documents are still posted in unstructured form as PDFs and Word docs, making data analysis a lot tougher. That will change.

These changes will allow journalists to find important stories more quickly and tell them more accurately. At a time when some news organizations are slashing budgets and others are still defining themselves, that’s important.

Merging Worlds

When I went to Harvard eleven years ago, I couldn’t help feeling like I didn’t quite belong. It was an honor to be part of the Goldsmith Awards, but I was there because I happened to be the grandson of the awards’ founder.

This year, I flew to Boston a day early and spent time with Alex Jones, the longtime Director of the Shorenstein Center. As always, I learned a few things. Alex told me about Journalist’s Resource, a great online tool which lets journalists freely access research on complex topics. He highlighted the increasing role of data in journalism and among many of the top Goldsmith Awards contenders.

While there, I also chatted with Nicco Mele and John Wihbey, both staff members at Shorenstein. Nicco lectures on technology and simultaneously runs a web consultancy; John is the developer behind Journalist’s Resource. Both were full of ideas on how data, journalism, and technology can come together to improve public policy, telling me about cool projects like Journalist’s Resource and Nearby FYI. It was inspiring, and in my conversations I saw a new take on my grandfather’s vision for journalistic impact.

That Saturday night, after a full day selecting the Goldsmith finalists, seven of us met for dinner at Rialto. I was still the introverted techie, and I still didn’t come armed with personal stories about Clintons or Bushes. But having just discussed such a strong set of data-heavy stories, I knew something was different. The landscape has shifted, and journalists have caught on to many of the skills my friends and I value in Silicon Valley. Once just a fly on the wall, the data geek is now an important part of the story.

Thanks to Ben Greenfield for his great feedback on this post.

Mike Greenfield founded Circle of Moms and Team Rankings, led LinkedIn’s analytics team from 2004 to 2007, and built much of PayPal’s early fraud detection technology. Ping him at [first_name] at mikegreenfield.com.