How AngelList Quantitatively Changes the Investing Game

Late in 2007, Circle of Friends was adding hundreds of thousands of users a day, and Ephraim and I knew the time was right to expand our office beyond my kitchen table and raise some money.

We each reached out to a few of our friends, and quickly got a number of introductions to angel investors and venture capitalists. My friend Jared introduced us to Mike Maples. Friends of Ephraim introduced us to Jeff Clavier and Naval Ravikant.

Mike, Jeff, and Naval all wound up investing in our company. Immediately after signing on, and then for years after, each of them introduced us to a number of other well-regarded investors (thanks, guys!).

That’s been a pretty typical experience in Silicon Valley over the past couple of decades. When someone wants to raise money, they reach investors through trusted contacts. When one investor signs on, she then introduces the entrepreneur to other investors she knows well. These introductions can have a strong weight, especially when the initial investor is a trusted source.

This often works pretty well, but it means that companies often raise money from just a small number of highly connected cliques. That has positive implications — those involved likely have more trust in one another — but it also leads to a process that’s relatively closed. Investors often form clusters and invest together. So if, for instance, Mike sits in one cluster and Naval and Jeff are in another, it’s likely that many of our future investors will also hail from one of those two clusters.

To understand this story quantitatively, I looked at angels’ co-investment patterns, using AngelList’s investment data to group the most prolific investors into clusters. I put the top 870 investors into 25 distinct clusters. Each cluster represents a group of people from whom co-investments are more common, for any of a number of reasons — geographic, industry-based, philosophical, or reputational.

(Methodology note: when companies listed multiple investors from a venture firm like 500 Startups, I only counted an investment from one of the partners.)
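For anyone curious about the mechanics, here's a minimal sketch of how one might cluster a co-investment graph in Python using networkx. The deal data, investor names, and the particular community-detection algorithm (greedy modularity) are illustrative stand-ins rather than AngelList's actual data or my exact method:

from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical input: each startup maps to its (deduplicated) list of investors.
deals = {
    "startup_a": ["mike", "naval", "jeff"],
    "startup_b": ["mike", "dave"],
    "startup_c": ["naval", "jeff", "dave"],
}

g = nx.Graph()
for investors in deals.values():
    for a, b in combinations(sorted(set(investors)), 2):
        # Edge weight = number of startups the pair has co-invested in.
        weight = g.get_edge_data(a, b, {"weight": 0})["weight"]
        g.add_edge(a, b, weight=weight + 1)

# Group investors into "clusters" based on co-investment density.
for i, cluster in enumerate(greedy_modularity_communities(g, weight="weight")):
    print(i, sorted(cluster))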

So do the clusters reflect real investor patterns? To answer this, we can compare to a “random” world, where investors find startups and make decisions on their own, without any social input.

In this random world, the second investor in a startup would sit in a different cluster from the first investor 82% of the time. But the real world is very different from that. In fact, when a second investor comes on board, there’s a 57% chance that he’ll fall into a different cluster from the first investor. In other words, in the real world, investor #2 is almost 2.5x as likely to fall in the same cluster (18% vs. 43%) as in random world.

This trend continues as the number of investments grows. When the existing pool of capital comes from two, three, or four distinct clusters, the odds that the next investor comes from a new cluster are 47%. If the existing investors span five to eight of the 25 clusters, the odds drop to 43%. All of these are considerably lower than one would expect in a random world: the universe of traditional angel investment is well-networked and influential upon itself (pejoratively: an old boys' club).
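The "random world" baseline itself is easy to compute: if each cluster accounts for a given share of all investments, the chance that two independently chosen investors share a cluster is the sum of the squared shares. Here's a minimal sketch with made-up shares (the real per-cluster breakdown isn't reproduced here):

# Hypothetical shares of all investments accounted for by each of the 25 clusters.
cluster_shares = [0.30, 0.20, 0.10] + [0.40 / 22] * 22

p_same = sum(s ** 2 for s in cluster_shares)
print(f"P(two random investors share a cluster) = {p_same:.0%}")
print(f"P(they fall in different clusters)      = {1 - p_same:.0%}")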

Enter AngelList, which is increasingly important in the startup ecosystem (disclosure: I’m an adviser). AngelList functions much more like an open marketplace for startups than the traditional model. In most cases, startups make public to all investors that they are seeking investment, effectively widening their pool of potential investors. So does this actually change the makeup of a startup’s investors?

In a word, yes.

There’s a simple way to test whether AngelList is truly opening up funding (and investing) opportunities. If it is, founder-investor connections that come via AngelList should lead to more cluster diversity than those that come through other channels.

Recall that 57% of second investments come from someone in a different cluster from the first investment. If, however, the second investor is someone the founder met via an AngelList introduction, the odds rise to 63%. For startups raising money the "old-fashioned" way with 2-4 clusters already represented, the odds that the next investor comes from a new cluster are 47%; if the investor comes via an AngelList intro, the odds are 59%. And with 5-9 clusters represented, where a new traditional investor has only a 43% chance of adding cluster diversity, the odds for an AngelList intro are 55%.

When two or more clusters are already represented, investments via AngelList introductions are about 30% more likely to yield a relationship with a new cluster than their non-AngelList equivalents.

The numbers for AngelList intro investors are still a long way from the “random” equivalents — location, reputation of other investors, sector, and other factors matter on AngelList too — but they’re clearly indicative of a big shift.

Many technology advances over the past decade or so can be put in one of two categories:

1) They make the world more social, allowing us to see what our friends do and like. Basically, Facebook.
2) They make the world more efficient, by giving people access to information and markets. Basically, Google.

AngelList sits in between those extremes. On the friends side, its follower model means that much of what people see is from the people they already know or at least know of. But on the information side, it opens up something that was almost entirely governed by word of mouth, and creates something that at least takes a big step toward being a marketplace.

I don’t pretend to know enough about the macro dynamics of investor management to predict how this will affect company operations. But the effect on the investor pool is clear. As AngelList and crowdsourcing grow, the impact of the old boys’ clubs will shrink. For companies, the pool of investors is growing.

The Four Things That Motivate Me

There I was, a medium-sized fish in a pretty big pond. There was nothing wrong with that, but I found myself uninterested in fish size or pond size: I wanted to create a new pond. It was 2007, and I left my job at LinkedIn because I wanted to start a company.

At the time, creating the pond was my goal above all else. The company I'd start wouldn't need to be anything specific; building a business was an end in itself. Of course, there were some restrictions: I had no desire to build office chairs or games, and I wanted to use at least some of my more prominent skills in social network data, ranking systems, and predictive modeling.

A friend of mine — who’d already built a large company — told me he was only interested in starting something that could be huge and world-changing, and couldn’t conceive of doing anything less than that. By contrast, I just wanted to achieve some success as an entrepreneur working on an interesting problem. I was excited to venture out on my own and test my skills as a founder.

After I left LinkedIn, I co-founded a company that allowed me to achieve much of what I set out to do. Ephraim (my co-founder) and I led a team that built out one of the world’s top few mom-focused websites, Circle of Moms. Our list of accomplishments is substantial: we built a product that helps millions of moms, a profitable business, a positive team culture, and a bunch of cool technology. And all of those things were attractive to Sugar, Inc., which acquired Circle of Moms this past February after 4.5 years as an independent company.

I left when the Circle of Moms acquisition closed, and have since been (among other things) thinking about my next big move. One option would be to spread my time across companies, as VCs and a handful of others do.

This is what I’ve largely done over the past six months, albeit in a more scattered form. I’ve spent time working with many founders: some in my role with 500 Startups, others who started companies I individually invested in or advise. That’s been fun and educational. It’s highlighted areas where I feel investor/adviser types can add real value, by allocating resources effectively and then spreading expertise across a number of companies.

But that experience also reinforced my initial leanings: I want to start another company.

My second time through, I'm approaching things differently. I'm less excited about just starting a company; I'm being more deliberate about what it is that I start. I also recognize that many of the first decisions you make, from business model to company vacation policy, can have far-reaching implications. And perhaps more than anything else, I know myself a little better, understanding both what really gets me excited and where my strengths and weaknesses are.

As part of that, I’m asking myself to think through and answer a handful of questions fundamental to the founder’s existence. Though these questions are hugely important, they’re difficult to answer when you’re actually running a startup: preparing that investor pitch, pushing out that next feature, and wooing that candidate all seem like better uses of time today.

Though it may be more individual detail than some want to see, I’ve decided to write up my answers as blog posts. There are a few reasons for this.

First, it makes me accountable: writing up my thoughts for an audience will force me to be crisper than I would be in notes jotted down only for myself.

Second, I feel this is a process all founders should go through in some form: a weekend spent thinking about these questions can lead to good decisions that will pay off for years to come. I’m currently in the fortunate position where I don’t have any competitors, so I’m happy to share things that can hopefully “raise all ships” without fear of helping the competition.

Finally, this will clearly frame my view of the world for potential collaborators; perhaps through this blog, I’ll connect with one or two readers who see the world similarly.

Here are some of those questions. I’ll answer the first in this post, and others in posts over the next few weeks.

1) What matters most as you evaluate a startup idea?

2) What are you good at, and who complements you well?

3) What kind of company culture do you want to encourage?

4) What are some areas that have room for significant innovation?

What matters most as you evaluate a startup idea?

Several experienced entrepreneurs I respect have asked me this question.

To answer it properly, a founder must choose among priorities. "I want to save the world and make billions of dollars and design the most beautiful product ever and build the coolest technology ever and become famous and …" may sound ideal, but it's not realistic and it doesn't inform choices. Do you take the quick, safe route because you want to sell your company to Facebook in a year, or do you take a bigger chance to make more of an impact?

Here are some common traits a founder might value in a potential company:

  • Is disruptive: makes a market more efficient, by stamping out longstanding un-innovative types (US Postal Service, realtors, the taxi medallion system, etc.)
  • Affects the lives of many people
  • Significantly improves the world
  • Builds a product people love
  • Gives the founder a shot at a huge financial payout
  • Sets the founder up for an acquihire-level financial payout
  • Could lead to the respect of ______ (peers, mentors, parents, etc.)
  • Allows the founder to do better (in terms of fame, finances, etc.) than a personal rival
  • Has a natural strategy for growth/distribution
  • Has a natural strategy for profitability
  • Is technology-focused
  • Is design-focused
  • Is sales-focused
  • Is brand-focused
  • Is timely with respect to available technologies
  • Fits the skills of the founder
  • Is intellectually interesting to the founder
  • Fits in with the founder’s world view

As you can tell, this list encompasses a wide range of characteristics. Some are in opposition to one another: I’d run away from a startup that told the world it wanted to be technology-focused AND design-focused AND sales-focused AND brand-focused. Others are independent: it’s easy to imagine design-focused startups that have natural strategies for distribution and/or profitability, and others that don’t have such strategies.

There’s of course no right answer to which traits the global set of startups should prioritize, but there may well be a right answer to which traits YOUR startup should prioritize. If you’re going to spend three, five, ten, fifteen years building a company, it should be something that’s going to get you excited every day.

Having a list this long forces one to choose. Here’s how I think of each of the above; everyone will be different.

Is disruptive: makes a market more efficient, by stamping out longstanding un-innovative types (US Postal Service, realtors, the taxi medallion system, etc.)
I like the idea of building a disruptive startup, but it’s hard to imagine the idea of disruption being the central one that gets me out of bed in the morning. For other people, beating the crap out of a privileged, ossified sector of the economy would be a laudable goal. For me, it might be fun, but is not something to which I aspire.

Affects the lives of many people
Affecting people’s lives — without regard to whether the effect is good or bad — doesn’t rate for me as a criterion. I’d say that Facebook and YouTube have both clearly had big effects on the world, by allowing people to find and share videos and photos online. But I’m not completely convinced that either makes the world a better place, and it would be difficult to make a strong argument either way.

Significantly improves the world
On the other hand, a startup that has a positive impact on the world — and for me, the magnitude is key — is one of the central drivers of what I choose to work on. Being an introverted nerdy type, I’m happy and comfortable to abstract this out a couple of levels: I don’t need to physically see that I’m solving someone’s hunger problems. I’d give LinkedIn high marks on this for its role in facilitating professional relationship building, which allows the economy to grow more quickly. Likewise, Wikipedia propagates free, generally high-quality information, which is useful in many respects for everyone who’s online. This was also one of the appeals of building a product to help moms.

Builds a product people love
Building a product people love — and getting positive feedback — is nice, but it’s not ultimately what drives me. I’m just as happy to do something that helps people’s lives without them directly realizing it.

Gives the founder a shot at a huge financial payout / Sets the founder up for an acquihire-level financial payout
I wouldn’t turn down a large financial payout, but it’s not why I’m playing the startup game. Having been at least a small part of three financial successes (PayPal, LinkedIn, Circle of Moms), I’m financially comfortable, if not super wealthy. So an acquihire-type startup outcome wouldn’t be financially life-changing for me. And though it would be great to build a very valuable company, I’d much, much rather go to my deathbed having built Wikipedia than Zynga, even if Zynga would be much more lucrative.

Could lead to the respect of ______ (peers, mentors, parents, etc.) / Allows the founder to do better (in terms of fame, finances, etc.) than a personal rival
Respect from others and competition with rivals are emotions that drive me on short-term projects, but they don't underlie my long-term motivations. At times, I've worked harder and smarter to impress someone I look up to; at others, I've worked my tail off to outdo someone I didn't want to lose to. But both were over the course of a month or three: year to year, I'm not really driven by the mentorship of others, the desire to have someone's approval, or long-term competition. It's hard to imagine consistently waking up in the morning and jumping out of bed to get to work because I want to beat or impress someone: it's just not who I am.

Has a natural strategy for growth/distribution
As is likely obvious from my background, I think a lot about distribution and the creation of strong and sustainable online ecosystems. While I wouldn't place distribution and ecosystem at the very top of my "must have for my next startup" list, they're among the top things I think about. If I'm considering a consumer product and don't believe there's a cost-effective way to scale it, I'll probably pass.

Has a natural strategy for profitability
The same isn’t true for profitability: I’m comfortable with short-term ambiguity around the monetization of a product, as long as my intuition tells me there’s a way to bring in revenue. Circle of Moms and LinkedIn both fell into this bucket, as neither had a clear early revenue model. Others gravitate more toward ideas with clarity around business model.

Is technology-focused / Is design-focused / Is sales-focused / Is brand-focused

I’m less particular about whether a company is technology-, design-, brand-, or sales-focused than many are. I like technology hurdles and intellectual challenges (more on that soon), but was excited to work for a company like LinkedIn which (in the early days at least) never really felt like a pure technology company.

Is timely with respect to available technologies
Timeliness is, in my opinion, a very valuable tool in finding large businesses. Most of the largest technology businesses around today couldn’t have been formed two years earlier, because something — technology, infrastructure, culture shift — hadn’t existed. To that end, it’s an important part of brainstorming, and something I consider in evaluating a business’ viability, but it’s not a core part of my checklist telling me what I’d be happy working on.

Fits the skills of the founder
Matching a company with the skills I have is something that’s high on my list. I’m not a top notch developer, I’m certainly not a sales person, I’m not going to be a talking head on TV, and I doubt I’d be strong as a dealmaker. But I’m skilled with data, am not completely full of crap (I hope!), can understand product ecosystems better than most, and can pull together marketing, technology, and product skills in ways many others cannot. If you’ve gotten this far, it’s perhaps an indication that I can write competently. Since I get A’s on parts of my self-evaluation and D’s and F’s on others (more on this in a future post), I place a high priority on making sure that the good stuff comes out. That doesn’t mean that I don’t want to push myself — I do — but I want my company to use the unfair advantages that I have.

Is intellectually interesting to the founder
One of my not-so-good traits is a tendency to get bored. Without challenges, particularly intellectual ones, I get antsy. When I'm bored, I tend to search for difficult solutions to simple problems, because they keep me entertained. That's not a great characteristic, but it's who I am. So it's better for me to work on problems I find intellectually captivating. That way, I won't get bored and can focus on the best, simplest solutions rather than the most interesting ones.

Fits in with the founder’s world view
Fitting in with the founder's world view is valuable to founders who have a very specific view of where the world is going. At PayPal, Peter Thiel would speak at company meetings about PayPal supplanting government-controlled currencies; that fit into his libertarian view of the world. I am passionate about moving toward a world where better decisions are made with the help of data, and I'm passionate about the notion that the standard rhetoric of both the left and the right oversimplifies in unfortunate ways. However, though these views help inform my direction, they won't drive it.

As you can tell, there’s a lot involved in going through that exercise.

My Top Four

Ordering my selections above, I get the following as my foremost concerns as I think about my next startup:

1) Intellectually interesting to me
2) Significantly improves the world
3) Fits my skillset
4) Can create a strong and sustainable ecosystem

This is a reflection of a good balance for me: what would keep me engaged, what would I look back upon with pride, what’s a good use of my skills, and what can really work.

When evaluating a startup idea, I measure the concept against each of my top four characteristics to gauge its appeal; I expect my next (TBD) startup to rate highly on at least three of the four. Of course, the people I might work with on a new project factor in considerably; I’ll address that in a future post.

As you’ll likely see, going through this exercise is both fun and insightful. What’s most important to you?

Job Creation Stats Under Republicans and Democrats

Former President Clinton mentioned last night in his speech that there had been 42 million jobs created in the last 24 years with Democratic presidents, compared to only 24 million jobs created in the last 28 years with Republican presidents.

That kind of statistic can be misleading in lots of ways; the two most obvious are that he'd cherry-picked a time period or that a few very good or very bad years would sway the results.

So I took a look at the BLS data on nonfarm employees. They’re a little different from the numbers Clinton cited, so I imagine he’s using a slightly different definition of jobs. Nevertheless, the trends are the same, so I’m comfortable using the numbers for a comparison.

I looked, year by year, at the net change in jobs, for every year since 1953 (Eisenhower’s first year in office). This was defined as January 31 to January 31, to best coincide with the presidential term. Overall, nearly 48 million jobs were created under 23 years of Democratic presidents (over 2 million per year; I excluded 2012) and nearly 35 million under 36 years of Republican presidents (just under 1 million per year).

I sorted the years by the net percentage change in jobs, to look at whether good and bad years are more likely under presidents of one party.
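For anyone who wants to reproduce this, the computation is straightforward once you have the BLS monthly payrolls series. Here's a rough Python/pandas sketch; the file name, column names, and the party lookup are placeholders rather than the actual files I used:

import pandas as pd

# Placeholder file: monthly total nonfarm payroll employment (thousands), one row per month.
jobs = pd.read_csv("payems_monthly.csv", parse_dates=["date"]).set_index("date")

# January-to-January net change, credited to the year it starts in.
jan = jobs[jobs.index.month == 1]["payrolls"]
summary = pd.DataFrame({
    "net_change_thousands": jan.diff().shift(-1),
    "pct_change": jan.pct_change().shift(-1) * 100,
}).dropna()

# Placeholder lookup of the sitting president's party for each year, 1953-2011.
party = {1953: "R", 1954: "R", 1955: "R"}  # ...fill in the rest
summary["party"] = summary.index.year.map(party)

print(summary.groupby("party")["net_change_thousands"].agg(["sum", "mean"]))
print(summary.sort_values("pct_change", ascending=False).head(20))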

The twenty best years of the last 59 were:

1955 (Eisenhower, R)
1978 (Carter, D)
1965 (Johnson, D)
1977 (Carter, D)
1966 (Johnson, D)
1972 (Nixon, R)
1983 (Reagan, R)
1984 (Reagan, R)
1968 (Johnson, D)
1964 (Johnson, D)
1994 (Clinton, D)
1959 (Eisenhower, R)
1973 (Nixon, R)
1988 (Reagan, R)
1987 (Reagan, R)
1997 (Clinton, D)
1976 (Ford, R)
1999 (Clinton, D)
1996 (Clinton, D)
1993 (Clinton, D)

1955 had the best jobs numbers with a 5% growth rate; 1993 saw a 2.5% growth rate.

In eleven of those twenty, the president was a Democrat. The years are relatively evenly spread across decades other than the 2000s: two in the 1950s (out of seven years), four in the 1960s, five in the 1970s, four in the 1980s, and five in the 1990s.

If we take out the first year after a party change in the White House, we remove two years that cast Democrats in a favorable light (1993 and 1977) and none that cast Republicans in a favorable light. That leaves each party with nine of the remaining eighteen "good" years.

The worst twenty years tell a different story:

2008 (GW Bush, R)
2009 (Obama, D)
1982 (Reagan, R)
1957 (Eisenhower, R)
2001 (Bush, R)
1953 (Eisenhower, R)
1960 (Eisenhower, R)
1974 (Nixon/Ford, R)
1991 (GHW Bush, R)
1981 (Reagan, R)
1970 (Nixon, R)
2002 (GW Bush, R)
1990 (GHW Bush, R)
1954 (Eisenhower, R)
2003 (GW Bush, R)
1980 (Carter, D)
2007 (GW Bush, R)
1958 (Eisenhower, R)
2010 (Obama, D)
2000 (Clinton, D)

The worst year for job growth was 2008, with a net loss of 3.2%; 2000 saw a slight gain of 1.2%.

You'll notice that on this list, we see a lot of R's: fourteen of the fifteen worst years, and sixteen of the twenty worst years, came with a Republican in office. Taking out the transition years of 2009, 2001, 1953, and 1981, we still see the same trend: all eleven of the worst years, and thirteen of the worst sixteen, happened during a Republican presidency.

The middle tier of nineteen years of roughly average job growth shows a Democrat-Republican split in between those of good and bad: eight years with Democrats in office and eleven years with Republicans in office. Excluding transition years, the numbers are seven and ten.

Clearly, these results indicate stronger jobs numbers under Democratic presidents. That raises the question of whether the trend is just random variation or something statistically meaningful.

So I ran some statistical tests. In a completely random world, what's the likelihood that at least eleven of the twenty best years would fall under a Democrat while four or fewer of the twenty worst years did? About 2.5%. What if you look only at the comparable numbers for non-transition years? It's higher: around 6%. What if you define good and bad as only the top and bottom fifteen? 4% and 1.7% for all years and non-transition years, respectively. All of these are one-sided tests, meaning they answer the question "what is the likelihood of Democrats doing as well or better than ___ by chance?"
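For the curious, here's a minimal sketch of the kind of permutation test behind those numbers: shuffle the party labels across the 59 years and count how often chance alone produces a split at least as lopsided as the real one. (The exact figure depends on precisely how the joint event is defined, so this won't necessarily match the numbers above to the decimal.)

import random

N_YEARS, N_DEM = 59, 23        # 1953-2011; 23 years with a Democratic president
N_BEST = N_WORST = 20          # sizes of the "best" and "worst" buckets
trials, hits = 200_000, 0

for _ in range(trials):
    # Randomly reassign which 23 of the 59 years get a Democratic label.
    labels = ["D"] * N_DEM + ["R"] * (N_YEARS - N_DEM)
    random.shuffle(labels)
    best = labels[:N_BEST]                     # stand-ins for the 20 best years
    worst = labels[N_BEST:N_BEST + N_WORST]    # stand-ins for the 20 worst years
    if best.count("D") >= 11 and worst.count("D") <= 4:
        hits += 1

print(f"P(a split at least this lopsided by chance) = {hits / trials:.1%}")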

Statistical significance is conventionally judged at the 5% (i.e., 95%) threshold: an outcome in the middle 95% of the probability distribution is treated as consistent with chance, while one in the outer 5% is significant. Splitting that 5% across two tails means a one-sided result should fall at or below 2.5% to qualify.

The numbers above suggest borderline statistical significance: with some parameters, the differences between Republicans and Democrats look significant; by tweaking them slightly, they appear insignificant.

Based on all that, it’s certainly plausible that there’s a real difference between job creation under Democratic presidents and Republican presidents. After this fairly cursory analysis, I wouldn’t feel comfortable defending such a relationship, but the claim is closer to reality than much of what was stated at the recent conventions.

However, when comparing Obama and Romney, both of whom seem to fit within the historical mainstream of their parties on economic policy, the story is different. If my goal were strictly to generate more job growth, there's no question that I'd pick Obama. While the trend is borderline from a scientific perspective, it's strong enough to carry significant weight in a real-world, gut-based decision.

A Primer on A/B Testing (Yummy Candy!)

I think I know how it feels to be a nagging dentist.

I spend lots of time helping startup founders figure out how to increase the number of people using their product. Sometimes, founders think that because a few silly folks labelled me with the (soon-to-be-cliched) title of growth hacker, I am “magical” like an Apple product. With one quick suggestion from me, they can get to a million users!

Unfortunately, it doesn’t usually work that way. Instead, I tell them, they need to (among other things) rigorously A/B test a dozen interface changes on their three or four most important pages. And then I get that “what-do-you-mean-I-need-to-floss-every-single-night?” kind of look.

I’ve run hundreds of A/B tests over the years, and in the process I’ve learned a lot about what messages people respond to. After seeing the results of those tests, I present a shocking hypothesis: “you should try this yummy candy!” will be more effective than “you really need to start flossing every day.”

So… I need to tell you why A/B testing is like yummy candy. Fortunately, I can make that argument without being misleading: running A/B tests can be really fun and addictive (like Skittles!). You’ve probably experienced an eager expectation that something new would immediately improve your world in a significant way. Maybe as part of a website — a new, beautiful signup flow will mean a super engaged user base — or, in your personal life: a new hairstyle will encourage people to respond to you in a better way.

A/B testing can provide that spark of hope on a very frequent basis: at Circle of Moms we’d have dozens of tests running at any given time, each serving as a quantitatively sound way to understand our usage and improve our product. Pushing out new tests multiple times a week, getting rapid feedback on each, is like regularly handing out chocolate to your team. Each test is a yummy morsel of hope: it has the potential to bring users in, excite and engage existing users, and make money. Frequent testing is like frequent chocolate consumption. Yum!

Frequent chocolate consumption has risks, and so does frequent A/B testing. With A/B testing, it’s important to be holistic and patient about collecting data. But a product development strategy involving A/B testing is generally both more fun and more effective than the alternative “change and pray” approach.

Now that we've established that A/B testing is fun, we get to the real questions. Why does it actually matter to your business? What should you be testing? When does it make sense to do it? (Brief answer: not always.) And how, technically, should you do it?

Let’s tackle each of those.

WHY

The reason to A/B test is simple: newer doesn't always mean better, and everyone I've met is mediocre at predicting how effective a new experience will be. There's often an implicit assumption that ______ in my product isn't very good, and that by spending time on it, we can only make it better. In extreme cases (the current version is a 404 page not found error), that's very likely to be true. But in more common cases (the signup flow is a little bit ugly and awkward), product changes don't always mean progress.

We saw this time and again at Circle of Moms. We had a new homepage that looked cleaner and more usable… and users who saw it stopped contributing to conversations. We had a signup flow that seemed much simpler and more professional… but fewer people got through it, and those who did get through it didn't invite their friends to join our site. Surely asking people to share their answers on Facebook would be good, right? Turns out no: very few moms actually shared their activity, while many others were scared off by the thought of us making content too public (this only applied to some content types).

Okay, you say, that's all well and good, but how about just making a change and seeing how it affects overall metrics for the product? There is a case where this is a good approach, and I'll walk through it in the "When" section. But most of the time, it's the wrong way to go.

To work, serial "testing" requires three things: the rest of the world staying steady, large changes, and a close eye on metrics. Let's say you're looking at how a new homepage design affects activity, and all of a sudden your email-sending IP is blacklisted by Yahoo. Your numbers will almost certainly go down, regardless of the effectiveness of your new homepage. New signup flow, and all of a sudden you get a surge of search traffic that broadens your audience but decreases its quality? Same type of issue. Major site downtime or technical issues can have the same impact.

If you have a huge increase or decrease, and you know that the outside world is more or less the same over the test period, and you measure different cohorts properly, and of course you only measure one thing at a time… serial testing can work. If you really think all of those conditions will hold consistently, you're a lot more optimistic than I am.

WHAT

There are two reasons to A/B test something:

1) You have a product enhancement that might improve your metrics at a level material to your business, and want to try it.

2) You have a radically revamped piece of your product, and want to verify that it’s at least as effective as the current version.

Generally, #1 is about iteration and optimization, while #2 is about design and vision. The thought processes for the two are very different.

Optimization is only useful on products close enough to “good” to be optimized. Overused but apropos cliche: A/B testing something that’s badly broken is akin to rearranging the deck chairs on the Titanic. Here are a couple of cases where you may or may not want to use optimization:

  • Viral signup flows. If your current signup flow features 1000 signups inviting 3000 people, 900 of whom register for your product, you're very close to being viral (K=0.9). A/B testing would be a good use of time. If your current flow features 1000 signups inviting 600 people, 80 of whom join (K=0.08), then you aren't in the ballpark: optimizing button text is likely a waste of time. Go bigger. (The K-factor arithmetic is sketched just after this list.)
  • Email content. Subject lines and link text can have a huge impact on email clickthrough rates. One typical example: an email with the subject “5 Embarrassing Kid Moments” gets 2.5 times as many clicks as one with the subject “The Craziest Thing My Child Has Done.” But again, being close to “good” is key: if that 2.5x is the difference between 50 clicks a week and 125 clicks a week, does it matter? If it doesn’t matter (and good estimation is key), no point spending time A/B testing it.
  • Purchase funnel. Much of Team Rankings' revenue comes from subscriptions, and some purchase funnels can be much more effective than others. Last year, for instance, our March Madness product, BracketBrains, generated 30% more sales when we prompted people to "Get 2011 Picks" than to "Get BracketBrains". Same caveats apply here, though: test if and only if the differences are likely to matter for your business.
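As promised in the signup-flow bullet, the K-factor arithmetic is just invites per user multiplied by the invite conversion rate. A quick sketch in Python:

def viral_funnel(signups, invites_sent, invitees_joined):
    """Break a viral loop into its two stages and the resulting K-factor."""
    invites_per_user = invites_sent / signups
    invite_conversion = invitees_joined / invites_sent
    k = invites_per_user * invite_conversion    # equals invitees_joined / signups
    return invites_per_user, invite_conversion, k

print(viral_funnel(1000, 3000, 900))   # roughly (3.0, 0.30, 0.90): close to viral, worth optimizing
print(viral_funnel(1000, 600, 80))     # roughly (0.6, 0.13, 0.08): button text won't save this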

The thought process for a new design is very different. At Circle of Moms, I’d explain to my team that we tested a new home page not because we wanted it to improve our metrics but because we didn’t want to tank them. I wrote an entire post on why A/B testing vs. holistic design is a false dichotomy; the TL;DR is “test entire designs, see how each does on a variety of metrics, then make an informed judgment call on how to go forward.”

WHEN

If you thought flossing was exciting, wait until I start talking about statistical significance!

The most important type of significance in assessing when to A/B test isn’t statistical significance, it’s business significance. Can a new version of this page make a real difference to our business? Size of user base, development team, and revenues inform what’s useful for different companies. Facebook can move the needle with hundreds of 0.1% improvements. A small startup with no revenue and 100 users doesn’t care about 0.1%, nor will they be able to detect it. In a small product with no usage, serial testing is fine: there’s no chance that the business will be built around the existing product, so rapid change is more important than scientific understanding.

Statistical significance is the second most important variable in assessing when to A/B test. Making a decision between two options without enough data can undermine the entire point of A/B testing. Doing statistical significance properly can be difficult, but the 80-20 solution is pretty simple: plug estimates of your traffic and conversion rates into an online split-test calculator to see whether you're likely to reach statistical significance for a single output variable.
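If you'd rather not rely on an online calculator, the underlying check is a standard two-proportion z-test. Here's a minimal sketch; the traffic and conversion numbers are made up:

from math import sqrt, erf

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test: 400 of 10,000 control users convert vs. 460 of 10,000 in the variant.
z, p = two_proportion_z_test(400, 10_000, 460, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")   # p below 0.05 clears the usual significance bar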

A big caveat: a little common sense regarding statistical significance can go a long way. If you’ve been running a test for a while and don’t have a clear winner but have some other ideas that might move the needle a lot more, you might be well-served by resolving the test now and trying something new. You’re running a company, not trying to publish in an academic journal.

At Team Rankings, we regularly do this with BracketBrains. Most of our sales happen over a four day period, so we have a limited window in which to test things. The “cost” of resolving slightly sub-optimally — say choosing the 5% option rather than the 5.1% option — is likely to be lower than the opportunity cost of not running an additional test. And since resolving a test when 99% of sales have occurred does us no good, we’re more aggressive than traditional statistical tests would dictate. If, on the other hand, you’re early in a product’s lifetime, conservative decision-making might be more appropriate.

HOW

There’s one way to A/B test properly: build your own system. Lots of people probably don’t want to hear that, but products like Optimizely are too simplistic and optimization-focused to be broadly useful. Outsourcing your A/B testing is like outsourcing your relationship with your users: you need to understand how people are using your product, and the A/B testing services currently available don’t cut it.

I wish I could recommend an open source A/B testing framework to avoid re-inventing the wheel; ping me if you know of a good one or are creating one (if so, I’d be happy to help). A/Bingo is the closest.

The good news is that it’s pretty simple to get your own very basic A/B testing system up and running, and it’s easy to build up functionality over time. Here’s the bare minimum:

Data structure:
AB_TESTS (id, name, time_created)
AB_TEST_OPTIONS (id, ab_test_id, weight, name)
USER_AB_TEST_OPTIONS (id, user_id/visitor_id, ab_test_option_id, time_created)

Code

// CONFIG FILE //
$ab_tests = array(
    "home_design" => array(
        array("name" => "2_columns", "weight" => 1),
        array("name" => "1_column",  "weight" => 9),
    ),
);

// VIEW FILE //
if ($user->has_ab("home_design", "2_columns")) {
  // show new 2-column layout
}
if ($user->has_ab("home_design", "1_column")) {
  // show old 1-column layout
}

// USER OR VISITOR OBJECT //
function has_ab($test_name, $option_name) {
  // check if this test exists
  // if not, create it in the DB
  // (one row in AB_TESTS, multiple rows in AB_TEST_OPTIONS)

  // check if this user/visitor already has an A/B test option selected for this test
  // if not, select a random number
  // use the weighting to decide which option this user/visitor should get
  // record it in the USER_AB_TEST_OPTIONS table

  // return whether the stored option for this test matches $option_name
}
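The heart of has_ab is the weighted random assignment. Here's a minimal sketch of just that piece, in Python for brevity; the option structure mirrors the hypothetical config above:

import random

def pick_option(options):
    """Pick an option for a new user/visitor, proportionally to its weight."""
    # options looks like [{"name": "2_columns", "weight": 1}, {"name": "1_column", "weight": 9}]
    total = sum(o["weight"] for o in options)
    roll = random.uniform(0, total)
    for option in options:
        roll -= option["weight"]
        if roll <= 0:
            return option["name"]
    return options[-1]["name"]   # guard against floating-point edge cases

# With weights 1 and 9, roughly 10% of new users get the 2-column layout.
print(pick_option([{"name": "2_columns", "weight": 1}, {"name": "1_column", "weight": 9}]))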

Reporting (SQL)
I’m assuming you have a USER_ACTIVITY table that records different types of activity, with user, time, and activity type. A table like that makes A/B test reporting a whole lot easier.

select
  AB_TEST_OPTION_ID,
  ACTIVITY_ID,
  count(distinct a.USER_ID) USERS_DOING_ACTIVITY,
  count(1) TOTAL_ACTIVITIES
from USER_ACTIVITY a, USER_AB_TEST_OPTIONS b
where a.USER_ID=b.USER_ID
  and AB_TEST_OPTION_ID in (…)
  and ACTIVITY_ID in (…)
  and a.time_created>b.time_created
GROUP BY AB_TEST_OPTION_ID, ACTIVITY_ID;

Scaling
If your site is or becomes massive, scaling the framework will entail some additional work. USER_AB_TEST_OPTIONS may require a large number of writes, and that little query joining USER_ACTIVITY and USER_AB_TEST_OPTIONS might take a while. Writing to the table in batch, using a separate tracking database, and/or using non-SQL options may all help to scale everything.

At Circle of Moms, we built out a system to automatically report lots of stats for every test. This was awesome, but it takes some work to scale, and I would never recommend it as a first step.

So…
As I said, A/B testing is like candy: fun and sometimes addictive. Done correctly, it can be part of the best form of mature and thoughtful product development. It builds a culture of testing and measuring. It lets you understand what works and what doesn’t. It forces you to get smarter about what actually moves metrics. Most important, it fosters an environment where data trumps opinions… anyone want to volunteer to try to take that to D.C.?