Best of Numerate Choir

Here are my best posts from the past few years:

Founder Stories

Silicon Valley

Data

Data + Product + Growth

Crazy Ideas

Design vs. Darwinism. Data vs. Darkness.

The data geek, it’s said, wants to make every decision based only on the numbers. Test this shade of blue against that shade. Pick the winner. Test something else.

The designer is a creative artist, creating something beautiful, something people love. The antithesis of the data geek.

I’ve been thinking about this because I’m a data geek, I just started a new company, and I know that a skilled UX designer could help our product immensely. Are my data-loving values in conflict with the values of those who are UX-focused?

No. As someone who spent lots of time painting in college, I assert that the artist vs. data geek model is an overly simplistic view of the world.

That dichotomy assumes that the data geek cares only about superficial numbers, and lacks the thoughtfulness and creativity to understand things that are hard (or impossible) to measure. It also assumes that the designer cares only about beauty and creativity, and not about whether what they're building actually works in the real world.

It’s easier to understand ourselves with these two questions:

1) Do you want to scientifically understand the way people are using your product, and use that understanding as part of your decision making process?

2) Is your business an automatically shifting, evolutionary machine that moves itself purely based on numbers, or is someone guiding it in a specific direction?

My answer to the first question is a very strong yes: I want my company to deeply understand how people are using its products.

The second question is a bit tougher for me. I like evolution, and I understand that natural selection can yield great outcomes. On the other hand, guidance and clear direction can be a far more efficient way to get to the best outcomes.

My first blog post, The Visionary and The Pivoter, discussed my experience building a company that wound up being more evolutionary than directed, and the challenges of that.

Here's how I see things now, along two axes: how scientific a company is about its data, and whether it's more evolutionary or more directed:

Sites focused purely on viral content — Buzzfeed, Upworthy, et al — are in the top left: impressive (to me) for their ability to iterate based on data, but far more reactive than visionary.

My last startup, Circle of Moms, was focused on improving the lives of a specific audience (moms!), but we too were more evolutionary than visionary.

Amazon is a very data-centered company, but one with clear visions on where their product and business will move the world. Apple, on the other hand, possesses clarity of vision and an intent to push the world in a certain direction, but is seemingly less data-focused. Clearly, both of those models can yield tremendous successes.

Being reactive/evolutionary and in the dark with respect to data is the worst combination: you don’t know where you want to go, but you can’t see anything around you to help you find a good path. I’ve seen a few startups like that — they change their strategy every month based not on data but on a (bad) blog post someone writes — and it’s ugly.

Many companies move up on the scientific scale over time. There’s a real cost to collecting and analyzing data, and it’s easier to invest in doing it correctly with 100 employees than with ten.

I’d like to be in the brown box that has my picture: deeply scientific, but more directed than evolutionary.

Long term, I suspect that most great user experience people won’t be too far from me. They’ll use data to help them design things that more people like. But they’ll be thoughtful in the application of that data, so they won’t feel forced into a massive, evolutionary pinball game that throws them around randomly.

Your Metrics Are Bad and Why “Data Driven” Isn’t Enough

Being “data driven” is all the rage these days.

We all — businesses, government entities, sensor-equipped individuals — have more and more data that can help with decisions. The era of Big Data is here, yada yada yada: you know the annoying cliches as well as I do.

There are more and better tools. Dozens of startups are working on better ways to collect data, process it, query it, visualize it.

I recently talked with an entrepreneur who, fresh off of raising a big round of funding, was told by his investors that he needed to make his company more data driven. He wasn’t sure what “more data driven” actually meant, and he told me he wasn’t sure his investors did either.

It sure sounds nice, though — doesn’t it?

Honestly, I don’t know how I’d define “data driven”, and I’m not sure I care enough about the term to really think it through. But I’m pretty sure I know what’s missing.

Very, very few companies know what questions to ask of their data. They have metrics that are beautifully plotted on their real-time data dashboards. They’re calculated in technologically scalable ways, using something that’s much simpler than SQL, and they’re accessible by everyone inside the company.

But more often than not, the metrics are superficial and poorly thought through. They’re not reflective of the health of the product or the business.

I’ve certainly been guilty of this: for months if not years at my last startup, anything other than new user registrations barely mattered to me. For Circle of Moms, getting new users was extremely important, but at times distracted us from more important long-term goals.

And I see this again and again with tech companies. There’s a focus on one or two superficial metrics, rather than a deep understanding of what it will take to build out the broader ecosystem necessary to make the company successful.

I don’t want to be too negative: the understanding of these ecosystems has significantly improved in the decade-plus I’ve been in Silicon Valley. Ten years ago, entrepreneurs building consumer startups barely thought about distribution (if we build a great product, people will come to us!). Five years ago, entrepreneurs (myself included) started to realize that distribution mattered, but rarely took the next step (Facebook is the notable exception). Today, more and more entrepreneurs understand that both distribution and engagement matter, even if they can’t get at all of the underlying components.

Today, a few of the strongest consumer companies — Facebook, LinkedIn, Twitter — have built out growth and data teams that collectively measure and understand the key dynamics.

But there are still huge areas of our society — small non-tech businesses, government at all levels, medicine, academic studies, many startups — where there are lots of data, but not much understanding of what the data actually mean.

And that’s a big problem: I’ve long felt that having bad metrics is often worse than having no metrics at all.

If I were trying to gauge a basketball player’s skill level, my top preference would be to use a well-structured metric incorporating an entire season’s worth of extremely detailed, second-by-second data, looking at his impact on all aspects of the game. My second choice would be a good coach’s purely qualitative assessment of his skill. And my last choice would be a simple stat — say points per game — that was state of the art in 1950.

Today, most businesses are using the equivalents of the coach’s qualitative assessment and points per game to make their decisions. And quite frequently, “data driven” effectively means “we’re using points per game.”

Most of the new “Big Data” companies are focused on the relatively simple stuff: speed of processing data, ease of accessing data, beauty of data presentation. Those are all valuable, but they aren’t enough.

So how will the "bad metric" problem be solved? Certainly with some mix of better data training for everyone, plus tools that automatically discover and surface the important metrics. Both are important, and I'm not sure whether training trumps technology or technology trumps training.

Either way, if we want these new data to improve our collective decision-making, the good metric-bad metric problem badly needs to be solved.

To Avoid the Perils of Get-Rich-Quick, Work With People Who’ll Play the Game Again (and Again)

Why is it so irritating to be nickel and dimed?

There is of course a financial aspect (you’re trying to take more of my money) and one of expectations (I might have paid $15 if you’d quoted me that, but you told me it was $10 and now you’re asking for $15).

But perhaps more important is the message you’re sending: a few extra bucks on this transaction is more important to you than our relationship.

In an extreme case, that behavior is completely rational (even if dishonest). If you weren’t well off and had limited future opportunities, you’d probably opt to “nickel and dime” Bill Gates for a million bucks.

In most real world cases, it’s a little more dubious.

Prisoner’s Dilemma, Over and Over

Everyone knows about the Prisoner's Dilemma, in which the rational strategy for both arrested parties is to testify against their partner. This results in a situation where both wind up worse off than if they'd colluded and remained silent.

In this situation, the players only play the game one time, and almost inevitably wind up screwing one another over.

However, the optimal strategy in other variations of the game gets a lot more complex. If rather than just playing once, players play an unknown number of repeated games, a “tit for tat” strategy can be a very good one.

In a tit for tat strategy, a player starts off playing nice. He plays nice in subsequent rounds if his opponent just played nice with him, but he plays mean if his opponent just played mean with him. If both players adopt this strategy, they'll wind up colluding, consistently helping one another out.
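The tit-for-tat dynamic can be sketched in a few lines of Python. This is a minimal simulation using the standard textbook payoff values (3 points each for mutual cooperation, 5 and 0 when one player defects against a cooperator, 1 each for mutual defection); the specific numbers are an assumption, not something from this post:

```python
# Minimal iterated Prisoner's Dilemma: strategies map the opponent's
# previous move to a new move.

COOPERATE, DEFECT = "C", "D"

# Standard payoff matrix: (my move, opponent's move) -> my points.
PAYOFF = {
    (COOPERATE, COOPERATE): 3,  # mutual cooperation ("playing nice")
    (COOPERATE, DEFECT): 0,     # sucker's payoff
    (DEFECT, COOPERATE): 5,     # temptation to defect
    (DEFECT, DEFECT): 1,        # mutual defection
}

def tit_for_tat(opponent_last):
    # Start nice; then mirror whatever the opponent just did.
    return COOPERATE if opponent_last is None else opponent_last

def always_defect(opponent_last):
    return DEFECT

def play(strategy_a, strategy_b, rounds=100):
    score_a = score_b = 0
    last_a = last_b = None
    for _ in range(rounds):
        move_a, move_b = strategy_a(last_b), strategy_b(last_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        last_a, last_b = move_a, move_b
    return score_a, score_b

# Two tit-for-tat players settle into permanent cooperation: 300 points each.
print(play(tit_for_tat, tit_for_tat))    # (300, 300)
# Against a defector, tit-for-tat loses only the first round, then retaliates.
print(play(tit_for_tat, always_defect))  # (99, 104)
```

Note how little the cooperative player gives up against a "mean" opponent, while two cooperative players both come out far ahead.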

Playing the Business Game Many Times

Business is usually both more enjoyable and more successful when your co-players expect to play the game repeatedly.

I can try to extract as much value from you as possible for the thing I'm working on right now. That may mean taking your money, driving you to overwork yourself on my behalf, or getting you to do me favors. I'll probably be better off tomorrow than I would have been, but you won't trust me and I won't be as well positioned the next time I play the game.

Or I can work with the expectation that we’ll play the game again: by doing a little more for you and asking a little less, you’ll treat me well in the future and we’ll both wind up better than we otherwise would have.

Oddly, those who likely have the most games in front of them — new professionals just out of school — are generally more likely to act like they’re only playing the game once. When I first started working, I didn’t have the experience of working with the same people at different companies.

Sure, I might have thought, I’m working with that guy now, but could I really have imagined that I would be part of his company or he of mine ten to twenty years down the road? Probably not: I didn’t fully internalize the importance of investing in relationships.

When, later in your career, you’ve had the experience of working with one person two or three times, you see the pattern. You realize that for any of your colleagues, this may not be the last time you work together.

LinkedIn is, at its core, an embodiment of the importance of ongoing professional relationships. Keith Rabois recently mentioned that an amazing five of LinkedIn’s first twenty-seven people (Keith, Lee Hower, Reid Hoffman, Josh Elman, and Matt Cohler) are partners at VC firms; in large part, that’s because each plays the game collaboratively, with the expectation that there will be many future iterations. Keith, Lee, Reid, Josh, and Matt are all in it for the long term.

It’s very easy to adopt a mentality that prioritizes the thing that’s in front of you. In business, that often manifests itself as a get rich quick scheme.

If your goal is success right now (screw the future), you can spend time with the nickel-and-dime, get-rich-quick types. If you’re interested in the longer term, I suggest working with people like Keith, Lee, Reid, Josh, and Matt.

How to Build a Quant Venture Fund

We’re Moneyball for carpet cleaning!

It’s easy to mock the mindless business metaphors of the day, and many of the “Moneyball for _____” ideas deserve that mockery.

But in the field of venture capital, there’s a huge opportunity for data — used correctly — to help investors make better decisions. Last month TechCrunch published a story suggesting that it’s already happening.

My somewhat informed guess is that most of what’s in the article is unsophisticated and/or vaporware. The “deep data mines” Leena Rao mentions are probably more like spreadsheets filled in by interns and/or Python scripts. Such spreadsheets — complete with stats from app stores, Alexa, and AngelList — would certainly be useful, but hardly qualify as analytically brilliant.

So what could an innovative VC do instead? I will walk through a model, mainly tailored for an early stage fund.

Data Collection

Probably the most important piece of building a predictive model is the collection of a great set of features that are likely to be predictive.

In the context of a venture fund, you’d likely want to collect a bunch of data at the time of an investment. You’d then have to wait a while (probably a couple of years) to see how those data predicted winners and losers.

For the data to be predictive, you need (1) underlying data structure that doesn’t change much and (2) a world where the same factors are likely to predict success.

That’s relatively easy if you’re trying to figure out whether a restaurant’s second location will be successful: you can look at a bunch of common metrics like customers, revenue per customer, employees per customer, demographics of the old location and the new one, etc.

It’s a lot tougher in the startup world, where the rules of the game are constantly changing. A million users today is very different from a million users in 2003; most of today’s important platforms didn’t even exist ten years ago.

That then begs the question: what are the “metrics” that are most likely to be predictive in the world of 2018 or 2023? To create a model that survives for more than a year or two, one would have to look at variables that will look similar in five or ten years.

Perhaps surprisingly, that means ignoring (or at least de-emphasizing) the variables of 2013 being tracked in that Python spreadsheet.

Instead, the model would use scores generated by (hopefully insightful) VCs themselves. Those VCs — and possibly other reviewers — would score each startup on traits like the following:

– how charismatic is the best founder? (1-5)
– how business-smart is the best founder?
– how tech-smart is the best founder?
– how well do the founders seem to complement one another?
– do you think the best founder could be “world class” at something some day (even if not today)?
– what about the worst founder?
– how well do the founders understand the market?
– how scrappy and hard-working do the founders seem?
– how would you rate the potential size of the business (independent of founders)?
– how would you rate the company’s user traction?
– how would you rate the company’s revenue traction?

This is just a set of example questions.

If you got the same (say) 5-6 people to rate all of those companies, you'd have 50+ data points for each startup. It might be the case that averaging scores (e.g., the average founder charisma score) would wind up being key. Or it might turn out that one VC's "how scrappy" score is incredibly predictive while another's is not at all.
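As a sketch of how those scores might be stored and turned into model features (all reviewer names, trait names, and values below are hypothetical), you could keep both the raw per-reviewer scores and the per-trait averages, and let the model later decide which are predictive:

```python
from statistics import mean

# Hypothetical pre-investment data: {startup: {(reviewer, trait): 1-5 score}}.
reviews = {
    "startup_a": {("vc_1", "charisma"): 4, ("vc_2", "charisma"): 5,
                  ("vc_1", "scrappiness"): 3, ("vc_2", "scrappiness"): 4},
    "startup_b": {("vc_1", "charisma"): 2, ("vc_2", "charisma"): 3,
                  ("vc_1", "scrappiness"): 5, ("vc_2", "scrappiness"): 5},
}

def feature_vector(scores):
    """Keep each reviewer's individual score as its own feature, plus a
    per-trait average, so a later model can learn whether one VC's
    "scrappiness" rating is predictive on its own or only the consensus is."""
    features = {f"{reviewer}:{trait}": s
                for (reviewer, trait), s in scores.items()}
    for trait in {t for (_, t) in scores}:
        features[f"avg:{trait}"] = mean(
            s for (_, t), s in scores.items() if t == trait)
    return features

for startup, scores in reviews.items():
    print(startup, feature_vector(scores))
```

With 5-6 reviewers and a dozen traits, each startup becomes a 50+-dimensional feature vector that just sits in the database until outcomes arrive.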

Ideally, you’d do the actual reviews in a very standardized way: reviewers would either always talk to people in person for a certain amount of time, always talk to them on Skype, etc.

You’d also compile the answers to more straightforward questions: how many founders were there, how much revenue did they have, how many months had they been working together, had they started companies before, what colleges did they attend, etc. Some of those are probably predictive.

You'd have to do this for a bunch of companies (over 100, ideally a lot more), but not actually do anything with the data: if a firm invests in a startup and VC #1 rates the founder's charisma a 4, business smarts a 3, and so on, you just stick those scores in a database and let them sit.

The Model

When you build a predictive model, you use those inputs to predict something. Sometimes, what you’re trying to predict — who will win a basketball game, whether a customer will pay for your service — is fairly straightforward. In this case, it’s not.

The obvious value to be predicted is the long-term valuation of the company (or the fraction thereof that an investment would capture): the money it returns to investors when it folds, its acquisition price, or its valuation at IPO. This would, after accounting for details like liquidation preferences, dilution, and taxes, reflect the return for a potential investor.

It’s not necessarily clear, however, that this would yield the best predictive model. As any VC knows, returns are shaped in large part by one or two exceptional wins: many funds have one very successful investment that provides the majority of returns. If there are only a few such companies a year, even an all-knowing VC is unlikely to have the data scale to make a good predictive model.

Instead, an investor might look at a few different scores:

1) How good did we think this investment would be when they first pitched us?
2) How good did we think they'd be 2 months after we invested? In two months of working together, the VC should learn a lot about whether this was a sound investment. This is probably only a useful data point for actual investments, as opposed to rejected deals.
3) How good did we think they’d be 1 year after we invested?

etc., etc.

It may be the case that VCs (et al.) are pretty good at #2 but not so good at #1. To gauge that, they could build a model that predicts how they'll feel a few months in, given the dozens of measurements they took pre-investment. This intermediate gauge is subjective, but it's still more rigorous than what's being done today: what predicts how I'll assess this opportunity once I really understand it?

Because returns are driven by a few outliers, asserting an exact expected long term valuation for early stage companies is difficult. If a promising but young company has a 0.1% chance of some day having a $100 billion valuation, they’re ostensibly worth at least $100 million. If, however, those odds are only 0.001%, they may be worth as little as $1 million. And a venture firm has virtually no chance of having enough data to distinguish between 0.1% odds and 0.001% odds.
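The arithmetic in that paragraph is worth making concrete; a two-line sketch of the expected-value calculation:

```python
# Tiny differences in outlier odds swing an early-stage company's
# expected value by orders of magnitude.
def expected_value(p_huge_win, huge_valuation=100e9):
    """Expected value from the (unlikely) massive-outcome branch alone."""
    return p_huge_win * huge_valuation

print(expected_value(0.001))    # 0.1% odds of a $100B outcome -> ~$100 million
print(expected_value(0.00001))  # 0.001% odds                  -> ~$1 million
```

A hundredfold difference in value rests entirely on odds no data set of a few hundred investments can estimate.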

However, a venture firm should have enough data to distinguish between larger tiers of companies: those at the 98th percentile or higher, those between the 95th and 98th, between the 90th and 95th, between the 75th and 90th, and everyone else. Some of the companies in the top tier — defined by competence and potential more than ultimate outcome — will fail; others will be incredibly successful and go public.

The goal of this model would be to predict the tier, rather than the ultimate outcome.

Implementation

Step one of this process is about recording data, not about changing decision making. And step two would be to take months or years of data and build a model.

Only after months or years, when the firm has actual outcome data and a predictive model, would it actually use the model to help with decisions.

An early stage fund that makes dozens of investments a year is best positioned to execute on this strategy. There are two reasons for this:

1) The metrics defined are more about people and potential, and less about business fundamentals; except in rare cases it would be irresponsible for a late-stage fund to give a company a $100 million valuation based only on those scores.
2) They have a lot more data.

All of the “human” factors that make an angel fund successful today — dealflow, partners’ skill at evaluating founders, helpfulness to entrepreneurs — would be equally important in this sort of fund. However, a formalized predictive decision-making process could improve returns significantly.

See also: Data Scale: Why Big Data Trumps Small Data

Why ‘A’ Players Should Collude

They should have been paying me more.

That was my thinking, at least. I was building predictive models for PayPal to detect fraud, and my models were effective at saving the company money. From 2002 through 2004, that work was likely saving PayPal at least ten million dollars a year.

Knowing how much I was helping the company, I figured I’d have some pretty sweet leverage if I ever wanted to try to negotiate a better pay package. My work was easily quantified, and it was making a big difference.

If my models were capturing an extra ten million bucks, surely the company could reward me with half that, or at least a quarter of it, right? Especially since PayPal was being valued relative to earnings; at a multiplier of 30x earnings, those models had created an extra $300 million in shareholder value. Even a few percent of $300 million would make my day!

(At the time, I was not the type to “lean in” and negotiate harder, so that never came to fruition — but that’s a separate story.)

The Non-Zero Baseline

Sadly, my thinking was flawed. My assumption was that the baseline — the business equivalent of the "replacement" player in baseball — was a business that was exactly break-even. I could add $10 million in profits, four others could do the same, and the company could effectively divvy up those $50 million in profits among the five of us.

The problem is that the baseline for PayPal — what would have happened with average players at every position — wasn’t zero. The baseline was a business that was losing hundreds of millions of dollars a year.

I was saving the company $10 million per year, but so were perhaps nineteen of my colleagues. My statistical models were 'A' work and a huge improvement over the baseline, but so were our fraud policies, our viral growth channels, our eBay integration, our legal maneuverings, and many other areas.

Without all of those accomplishments, PayPal would have gone on losing hundreds of millions of dollars until we went out of business. If the baseline of mediocrity was a business that lost $180 million per year, adding twenty strong people (or teams) whose “above replacement” value was each $10 million per year could collectively improve our bottom line by $200 million — but that would still only lead to a business that made $20 million in profits.
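That back-of-the-envelope math, made concrete (all figures are the illustrative ones from above, not actual PayPal financials):

```python
# Twenty contributors, each worth $10M/year "above replacement,"
# stacked on a baseline business losing $180M/year.
baseline = -180e6                 # what a team of average players would earn
value_above_replacement = 10e6    # each strong person/team's annual impact
contributors = 20

bottom_line = baseline + contributors * value_above_replacement
print(f"${bottom_line:,.0f}")  # a $200M collective improvement still yields
                               # only a modestly profitable business
```

No individual contribution looks like half the profits once the deeply negative baseline is accounted for.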

In that sort of business — which is roughly what the pre-acquisition PayPal of 2002 looked like — my original reasoning made no sense.

Collusion Is Good

Recently, I’ve been evaluating a new startup idea. It’s something that’s as ambitious as PayPal, but also just as fraught with challenges: there are a lot of ways that it could lose money. And that’s encouraged me to revisit some of my PayPal memories.

One of my biggest lessons is the importance of team quality for such a difficult and ambitious company. Looking back, it’s as if a bunch of smart and hard-working people colluded and decided to work together on a business that would have failed if we hadn’t all worked on it. Without that collusion, PayPal wouldn’t even have a Wikipedia page, let alone a huge business or a mafia.

2002 was the ideal time for that sort of collusion, because there weren’t a whole lot of options in Silicon Valley. No one was starting their own company, and the list of hot private companies I was aware of had exactly two entries: PayPal and Google.

To solve the toughest problems — and building a slightly more elegant social network doesn't qualify — one still needs that type of collusion from 'A' players. The good news for founders is that Silicon Valley is attracting more talent than it was in 2002; the bad news is that starting a company has become cool again (I'm guilty too!) and the hot company list has grown from two to dozens.

Collusion generally has a negative connotation, but in this context it can be a very good thing. If, rather than spread themselves among ten mediocre companies, ten all-stars can be like LeBron (and Dwyane Wade and Chris Bosh) and collude, they can see better results and solve bigger problems. And unlike LeBron, they don’t have to do it in zero-sum games.

Happy 10th Birthday, LinkedIn!

Ten years ago today, LinkedIn was born. It’s radically changed my life over the past decade.

I had no knowledge of LinkedIn’s existence on the day it launched in 2003. But a few days later, I got an invitation from my former PayPal colleague Keith Rabois to join this new site, LinkedIn. I signed up (as user number 1400 or so), saw a few familiar names on it, and was immediately intrigued.

In the months to come, I used LinkedIn a handful of times, connecting to colleagues and meeting a couple of entrepreneurs who reached out to me for advice on fraud prevention.

In late 2003 or early 2004, my former PayPal colleague Lee Hower reached out to me to see if I “knew anyone” who might be interested in working on data-type problems for this company LinkedIn. Lee and I had lunch, and I learned that LinkedIn had built its network out to be a couple of hundred thousand users. This was pretty cool, and something I could certainly see working on.

A week or so later, I was sitting in a Mountain View conference room with Reid Hoffman — with whom I’d spoken exactly once when he was a PayPal exec — and Jean-Luc Vaillant, brainstorming cool stuff we could do with data.

Not long after that, in February 2004, Jean-Luc set me up with a Mac laptop, and I started delving into the data. For the next few months, I was essentially moonlighting at LinkedIn, coming into the office one or two days a week, while mostly still at PayPal.

When I found that I was consistently more excited to get out of bed on the LinkedIn days than on the PayPal days, I decided I would (after biking around France for a month!) join LinkedIn full-time.

Thus began my LinkedIn employment odyssey. I worked full-time at LinkedIn from October 2004 until January 2007, leading a small analytics team with two awesome hires, Jonathan Goldman and Shirley Xu.

I learned a ton at LinkedIn, worked on some interesting and important products, and got to collaborate with lots of great people I now consider friends (and of course, LinkedIn connections).

After I left, my team became the large, influential and highly-regarded (kudos to Jonathan and DJ Patil) Data Science team.

Though I was no longer employed at LinkedIn from 2007 on, the company has continued to play a crucial role in my life. The first two engineers we hired at Circle of Moms, Brian Leung and Louise Magno, came in through the same LinkedIn job post in 2008.

Looking back at the InMails I've sent, I see a number of people I reached out to in an effort to hire, and whom I now know well; in many cases we didn't wind up working together, but they became friends and valuable connections.

LinkedIn has prepared me for meetings with hundreds and hundreds of people. I wish I still had database access so I could run the query to figure out just how many.

When I first started poking around LinkedIn in 2003, I had a couple dozen connections. I looked at the profiles of people like Lee and Reid, seeing well over 100 connections. I figured I was simply not the kind of person who’d ever amass that number of professional contacts.

Today, I have over 900 connections on LinkedIn; the vast majority of those are people I’d feel comfortable reaching out to for an important professional purpose. Part of that increase is a reflection of my evolution, but a lot of it is thanks to LinkedIn.

To Reid, Jean-Luc, Lee, Allen, Chris, Sarah, Matt, and the millions of others who have helped to build LinkedIn, thanks and happy birthday!

Defending the Brash Arrogance of Silicon Valley

The best thing about being a statistician is that you get to play in everyone’s backyard.

I read this quote in a New York Times obituary in 2000, and it’s stuck with me ever since. As a data guy (I’m too much of a hack to be called a statistician), I love the idea of playing in lots of backyards.

If statisticians are playing in everyone’s backyard, the best Silicon Valley entrepreneurs are knocking down all of the houses and businesses in the neighborhood and putting in place something completely different.

Oh, and by the way, this is their first construction project, and it’s all going to be finished next month.

It’s pretty arrogant to attempt that sort of thing, isn’t it? It’s arrogant even if you’re not being forced to reinvent your neighborhood — or get the latest smartphone and move half of your communications to Facebook/Twitter/LinkedIn.

Early on at PayPal, the founders brashly spoke of reinventing the way people paid one another, even describing PayPal as a new world currency. Who were these founders who wanted to reinvent payments?

Peter Thiel was a thirty-something former lawyer and hedge fund manager who had little experience with either payments or tech companies. Max Levchin was a recent college grad with coding skills and no special knowledge of payments.

So did they hire other “experts” to do all of the detailed work? Not really. To solve the fraud problems that were draining the company, they hired people like me: a recent Stanford grad who’d never thought about, let alone worked on, understanding fraud patterns.

They did the same thing in other areas central to the company’s success. Peter brashly spoke of PayPal’s lack of “adult supervision”; clearly he reveled in being the cocky first-timer, destroying the experts at their own game.

Today, there are at least two Silicon Valleys. One is the Silicon Valley of yesteryear, building faster and smaller processors, bigger and clearer screens, and lighter and longer lasting batteries. The other is the Silicon Valley I got to know at PayPal. That Silicon Valley arrogantly tries to reinvent industries with a mix of deep technology, persuasive marketing, appealing products, and data-driven insights.

In the past few months, I’ve listened to entrepreneurs’ pitches for shaking up a vast array of industries, everything from diabetes care to home-buying, from photography to car insurance, from restaurant payments to government budgeting. Most of these entrepreneurs — like Elon Musk with SpaceX and Tesla, and Max and Peter fifteen years ago — have little or no experience in the industries they’re trying to upend.

In most of the world, people wouldn’t have the guts to do that. But Silicon Valley encourages a special type of arrogance, a type that claims a few smart “kids” can solve problems that have been vexing experts for generations.

Inevitably, that arrogance can be off-putting: is the slightly awkward 22-year-old computer science major who was spending half his free time at frat parties two months ago really the right person to reinvent health care?

Most likely, he isn’t the right person. But like science — a set of theories which are sometimes individually wrong but collectively get closer and closer to truth over time — Silicon Valley is a system.

By going after the big stuff — sometimes unjustifiably or arrogantly — the system collectively increases the probability of big breakthroughs. For that system to work well, you need a culture that encourages smart people who don’t know everything to brashly assert that they can do better.

I’ve seen that pattern again and again: like many in Silicon Valley, I’ve worked in a range of areas (payments, social networking, parenting) where most of my colleagues and I started with no expertise whatsoever. And the results — both for me and for many others — have been astounding, reinventing industry after industry.

Every now and then, someone like Chamath Palihapitiya bemoans the lack of big innovation in today’s startups.

I suspect that this claim is quantitatively wrong: though there are many frivolous me-too startups, there are probably more ambitious (arrogant) work-on-a-big-problem startups than ever.

Nevertheless, I completely agree with Chamath on where we need to go. Silicon Valley at its best is both arrogant and thoughtful: brashly trying to conquer problems others couldn’t solve, while thinking seriously about the societal ramifications.

Let’s go knock down some neighborhoods and build them up to be a whole lot better. Metaphorically… right?

In Search of a Happy Medium Between Academic Papers and Shlocky Business Insider Top 10 Lists

Here’s the state of the art in home page design for an academic website:

And here’s the state of the art in home page design for a rapidly growing media site:

Suffice it to say that I don’t want to emulate either the blandness of the first interface or the tabloid feel of the second one.

Now that Numerate Choir is a year old, I’ve been thinking about these extremes of online content, and trying to figure out where sites like Numerate Choir fit.

Virality

In the world of virality, headline usually matters more than content: the headline is what people click on, and it strongly influences what people share.

In the world of virality, short, simple, and universally entertaining is key. How something is presented matters more than what is presented. That top 10 list can be utter crap, thrown together in five minutes, but hey, look at the pretty slides! And my friend is #8 on the list!

The world of virality is superficial, but it has one very valuable characteristic. It understands what normal people value, and it concerns itself with what they will respond to in the real world. It doesn’t concern itself with “could” or “should”; the key term is “actually does”.

Art and Academia

In the world of art and academia, depth and truth should matter above all else. The headline is superficial: what matters is who reads something, not how many people read it.

In the world of art and academia, entertainment is looked down upon. It’s far better to be profound — and full of jargon — than amusing or captivating.

The world of art and academia is also superficial, but in a different way: superficiality is manifested as the desire to prove one’s sophistication. Those in the academic and artistic sphere generally don’t understand how to create content for the masses, but they are far more likely to discover things that matter.

Numerate Choir and a Little of Each

As I wrote about in my post on the state of journalism, there’s a growing dichotomy between 1) pseudo-news that can be both popular and profitable, and 2) deeper content that often fits better into a non-profit model.

I have a love/hate relationship with both of these extremes. And my work on Numerate Choir has oscillated between the two.

Like an academic, I write blog posts that I — and anyone with editing experience — know are way too long even for a geeky Silicon Valley audience. And then I float to the other side: how can I craft a headline that will maximize sharing on Twitter? And why can’t I easily A/B test the stupid thing?

Like most academics, I do a poor job marketing and selling my work. I naively hit the publish button and hope for the best: beyond a quick post on the major social networks, I do nothing to publicize my writings. But then like the growth-loving impatient exec I criticize, I keep a close eye on Twitter to see if any of the cool kids have shared my new post.

Like an academic, I hope that what I write will help people understand an issue and maybe think a little more deeply. Like a growth hacker, I care about how many people read my posts (even if I recognize that they aren’t for everyone).

It’s amazing yet discouraging that my most-read post — by a factor of two — was a relatively lightweight rant on not needing a real-time dashboard. That post was written in response to a friend’s emailed question; I wrote and published the whole thing in under two hours.

(Credit where it’s due: that distribution was helped in large part by Andrew Chen cross-publishing it on his blog — thanks, Andrew.)

I often write with a lofty goal: capture some of the truth and quality of academia done right, and reach a larger audience. Sometimes that’s worked: The Visionary and the Pivoter and A Founder’s Constant State of Rejection (also on founderdating.com) were both read by lots of people. Two other posts I was especially proud of, Why Big Data Trumps Small Data and In a Data-Driven World, Honesty is the Fundamental Virtue, were far less successful metrics-wise.

My blog posts have generally followed one of two patterns:

  • I post something, then link to it on Facebook, Twitter, LinkedIn, etc. A few of my friends read it, and one or two of them might retweet the link. The post is seen by at most a couple thousand people.
  • I post something and link to it and a few people click through. That’s the “soft launch.” One or two of the clickers happens to be a friend who’s an order of magnitude better known than I am. When he tweets it out himself, we have a “real launch” (thanks Naval/Keith/Andrew/Eric/Jeremy/Dave). That tech celebrity share serves not just as a driver of traffic but as social proof to others: this is not a sleazy Business Insider post.

Hence Twitter is essentially an oligarchy: a handful of people have most of the power. While many could exercise it better, it winds up being an acceptable model to propagate this sort of semi-intellectual, semi-popular content.

Metrics for Success

What metrics define success for this blog?

The business model for most online media sites is pretty simple: visitors and page views translate directly into ad dollars. In most cases, the revenue per user is constant: it doesn’t matter if the viewer is President Obama or my one-year-old daughter banging on the keyboard.

For this blog, revenue is and will be zero, regardless of how many readers there are. I’m not sure what the most important metric is; I can think of at least five that matter to me.

Several times, I’ve written about topics I’ll almost surely never touch professionally. When I do this, it’s often with the hope that somehow it will reach someone far more influential than I, and effect a positive change in the world.

Other times, I’ll write to crystallize my own thoughts. If I get great feedback, that’s nice, but I’m writing more for myself.

Sometimes, my posts can be a good way to tell a bunch of my friends about my experiences; they then inspire deeper conversations about interesting topics.

A couple of times, I’ve been able to link people I know to a post I wrote explaining my thoughts on a particular issue. This means I don’t need to write it up again and again. As any engineer knows, re-usability is good.

Finally, I hope to share my personal learnings with others. I can think of only two places where a 35-year-old can be considered wise: a society where the life expectancy is 40, and Silicon Valley. I’ll drift off that sentiment if I can.

In a Data-Driven World, Honesty is the Fundamental Virtue

In times of war, there are no greater virtues than loyalty and bravery. A country with disloyal citizens is likely to lose every battle. Bravery is an essential trait to overcome the harshness of war.

Likewise, for most of human history, sexual purity was promoted as an essential virtue. Because contraceptives were not an option, a promiscuous society would be one with frequent unwanted pregnancies. As a result, many societies developed strong cultural norms to discourage physical relationships before marriage.

Today, the world is relatively free of wars and effective contraceptives are widely available, so these traits are valued less. Our culture constantly re-evaluates its norms.

Meanwhile, as more and more of our lives are recorded, the data we collect facilitate better decision-making. I’ve written about many aspects of that: data help journalists find better stories, help predict the future, and much more.

However, a data-driven society is only functional when people follow the right cultural norms. In a country at war, a culture of loyalty helps ensure that everyone is in line. In a capitalist society, a culture that discourages theft can allow small businesses to prosper without fear of losing property. In a data-driven society, we must stay intellectually honest.

Without intellectual honesty, the data are flawed and unreliable. Flawed data lead to poor decision-making; it’s usually better to use only your gut than to rely on a poorly formed data set. And unfortunately, many people are using data in intellectually dishonest ways.

Schools and Cheating

In a data-driven world, we must not cheat.

One of this year’s Goldsmith Award finalists is an astonishing data-driven series which uncovered high levels of school cheating. I’m proud that we honored that story, but I’ve been somewhat taken aback by some of the responses I’ve heard to it.

Many people I’ve spoken with, when told of this investigation, immediately blamed the reward structure. No Child Left Behind, they say, created an overly pressurized education system. The sort of large scale cheating exposed by the Atlanta Journal-Constitution was inevitable given the high stakes of the tests.

That is nonsense. One wouldn’t excuse a CEO stealing money from others because there was so much pressure on him to improve his company’s performance — even if the CEO thought the means of evaluating him were unfair. No Child Left Behind and other education policies aren’t perfect (my suggestions), but they’re a starting point.

To improve, we’ll need to refine our testing system and get better at measuring progress. We’ll also need a culture of integrity from teachers and administrators. Without that integrity, our system will consist of results we can’t trust — and a terrible example for students.

Guns and Intellectual Curiosity

In a data-driven world, we must approach major issues with an open-minded, intellectually curious approach.

Of course people use data dishonestly for political arguments. But it’s not just sleazy politicians: I see intelligent friends on both sides completely misrepresenting the data on gun violence in the US. Anti-gun advocates point out that the U.S. has more gun ownership and more gun deaths than other Western countries, and jump to the “obvious” conclusion that more guns mean more violence. Pro-gun advocates point out that the U.S. has far higher gun ownership rates than many (non-Western) countries that are much more violent than the U.S.; they jump to the equally “obvious” conclusion that criminals will find a way to purchase guns regardless of gun policy, so reducing gun ownership would have no impact on gun violence.

It’s likely that each side is at least partially correct. Millions more guns in Americans’ hands mean at least a few more deaths; many of the most violent criminals will find a way to kill regardless of gun laws.

Yet my friends who post these stats do so with a lack of intellectual curiosity. In most cases, they haven’t looked at the numbers with an open mind, and they don’t really understand how gun dynamics work. I’ve never seen, for instance, someone point out that gun ownership in states is highly correlated with suicide rates but minimally correlated with murder rates. That fact — which implies that less restrictive gun laws may lead to more suicides but not to more murders — doesn’t fit neatly into anyone’s pro- or anti-gun view of the world.
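That kind of state-level comparison is simple to run yourself. Here is a minimal sketch using entirely made-up state figures (the real data would come from public health and survey sources), computing the Pearson correlation of gun ownership against each outcome:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Entirely hypothetical per-state figures, for illustration only:
# gun ownership rate (%), suicide rate, and murder rate (per 100k).
ownership = [20, 35, 50, 60, 45]
suicides  = [10, 14, 19, 22, 17]
murders   = [5.1, 4.8, 5.3, 4.9, 5.2]

print(pearson(ownership, suicides))  # close to 1: strongly correlated
print(pearson(ownership, murders))   # near 0: barely correlated
```

With real data the numbers would differ, but the exercise — correlating the same variable against two different outcomes — is the sort of minimal homework that separates an informed opinion from a reposted talking point.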

I’m realistic: I don’t expect that everyone is going to gather data sets on gun violence on their own. However, because the data are out there, I ask smart people to raise their bar: if you haven’t looked at the data closely enough to have an informed, nuanced opinion, please keep quiet. Either do some real research, or don’t spread your uninformed perspective.

Pitching Investors and Misleading

In a data-driven world, we must not mislead or be misled.

Working with many early stage companies, I see a lot of investor pitches.

Startups have gotten better at crafting an appealing pitch, by throwing out numbers like these:

  • a) We’ve increased revenue by 25%, month over month
  • b) Our user base is growing: we had 500,000 users last July and now have 800,000 (often with an attached graph showing total users at the end of each month)
  • c) Our monthly retention rate is 75%

Most of the time, these stats aim to mislead.

(a) doesn’t have a baseline or a time horizon. It might mean that revenue was $12 last month and is $15 this month. Does that sound as impressive?

(b) looks at cumulative signups rather than monthly signups. Cumulative signups are always going up, and it takes a lot more effort for the viewer to see whether the second derivative (signups this month versus last month) was positive or negative. We did this in investor presentations for Circle of Moms, because we knew it would obfuscate some of our negative trends. I advocated that approach, and I don’t feel great about it.
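The trick in (b) is easy to undo. A minimal sketch, with made-up numbers, that recovers monthly signups from cumulative totals and checks whether growth is accelerating or slowing:

```python
# Hypothetical cumulative signup totals at the end of each month (made-up numbers).
cumulative = [500_000, 580_000, 650_000, 710_000, 760_000, 800_000]

# Monthly signups are the first difference. They are always positive here,
# so the cumulative chart slopes smoothly upward.
monthly = [b - a for a, b in zip(cumulative, cumulative[1:])]
print(monthly)  # [80000, 70000, 60000, 50000, 40000]

# The "second derivative" is the month-over-month change in signups:
# every value is negative, so growth is decelerating despite the rising total.
acceleration = [b - a for a, b in zip(monthly, monthly[1:])]
print(acceleration)  # [-10000, -10000, -10000, -10000]
```

The cumulative chart and the monthly chart are built from the same data; only one of them makes the slowdown visible at a glance.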

(c) may be a useful stat, but it’s almost always calculated in an obfuscated, company-friendly way.
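One hard-to-game way to compute a stat like (c) is cohort-based: of the users who first showed up in a given month, what fraction were active again the next month? A minimal sketch, with hypothetical activity data:

```python
# Hypothetical activity log: user id -> set of months in which the user was active.
activity = {
    "u1": {"2013-01", "2013-02", "2013-03"},
    "u2": {"2013-01"},
    "u3": {"2013-01", "2013-02"},
    "u4": {"2013-02", "2013-03"},
}

def cohort_retention(activity, signup_month, next_month):
    """Fraction of users first seen in signup_month who were also active in next_month."""
    cohort = [u for u, months in activity.items() if min(months) == signup_month]
    if not cohort:
        return 0.0
    retained = [u for u in cohort if next_month in activity[u]]
    return len(retained) / len(cohort)

print(cohort_retention(activity, "2013-01", "2013-02"))  # 2 of 3 January users returned
```

An honest pitch would define retention this explicitly; the company-friendly versions usually blur the cohort, the activity threshold, or both.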

Pitches like these are “just the way it is” — as is the case for misleading political data analysis. But in a data-driven world, we need to aim higher.

Entrepreneurs should assume their audience is intelligent and mature, cognizant that not all numbers go up.

And investors should understand what they’re looking at, and should call BS on entrepreneurs who surface their numbers like this.

Conclusion

Access to data is, on the whole, a very good thing. Deep, data-driven knowledge allows us to make better decisions and preserve resources. With the right data, we can better reward the best teachers, fund the top companies, and create better public policy.

However, for that to work, our society needs to create a stronger culture of honesty around data. We can’t cheat to get around failures. We must seek out all of the facts, and not promote only those that fit into a narrow ideology. And we must use data to inform rather than mislead. If we don’t do those things, we’ll make decisions that are driven by flawed data — lies — and many will suffer.

The good news is that we’re still early in an age of data-driven decision-making. Our collective culture has developed to better discourage practices like stealing and killing. In this wonderful age of data and better decision making, can we become more honest?