The Epidemiology of an Internet Business

Imagine life as a bacterial plague.

Your world is pretty simple. You aren’t motivated by fancy cars, attention from attractive people, or the respect of your peers. All that matters is the propagation and survival of your species.

But being good at both propagation and survival is tougher than it sounds. Say you’re really successful at spreading. You quickly infect everyone in the world. Awesome news, right? Well, maybe not: one of two bad things might happen. Either you’re deadly and the source of an epidemic that destroys the world’s population… and with it your source of places to hang out! Or you aren’t deadly, and everyone’s immune system figures out how to get rid of you at once… and then you also don’t have anywhere to hang out.

Successful plagues therefore need a more nuanced approach: propagation at the right speed, a non-lethal effect on targets, and a means of mutating to adapt to changing conditions and antibodies.

As it turns out, Internet businesses are a lot like bacterial plagues. In both cases, results vary widely.

MySpace excelled at propagation: it launched in August 2003 and grew to 100 million users within three years. But its survival mechanisms have been less effective, meaning that those 100 million users have largely managed to expunge the MySpace bug from their system. Hence MySpace usage in 2012 is a small fraction of what it was in 2007.

Some of the first Facebook apps were even better at propagation and even worse at survival. Circle of Friends, which I developed, launched in September 2007. By November 2007, without any semblance of press, we were adding half a million users a day. Today, Facebook reports that Circle of Friends has 300 monthly active users. Not three hundred thousand. Three hundred.

By contrast, Facebook itself has excelled at both propagation and survival. Eight and a half years after launching, Facebook has around a billion monthly active users. To get there, they’ve done astoundingly well at both spreading and maintaining their role in existing users’ lives.

Most aspiring Internet entrepreneurs know about viral coefficients. But a viral coefficient mainly reflects the propagation piece of Internet epidemiology.

The other half, equally important for a sustainable business, is frequently referred to as retention. This number is usually expressed as the percentage of registered users active in the last month. Five million registered users, one million of whom have been on the site in the past 30 days? That’s a “retention rate” of 20%.
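A minimal sketch of that arithmetic, using a hypothetical helper name (`retention_rate` is not from any analytics library):

```python
def retention_rate(active_last_30_days: int, registered: int) -> float:
    """Share of registered users who were active in the last 30 days."""
    if registered <= 0:
        raise ValueError("registered must be positive")
    return active_last_30_days / registered


# The example from the text: 1 million active out of 5 million registered.
rate = retention_rate(1_000_000, 5_000_000)
print(f"{rate:.0%}")  # prints "20%"
```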

In the world of epidemiology, a retained user is like a person still being affected by bacteria.

The retention number doesn’t explain a lot, though. On the Internet, are those 20% super active, or are they only using the product once or twice a month? Were those 20% the same 20% as the month before?

It’s possible to look at the entire system from a fairly simple epidemiological perspective by instead asking two key questions:

1) What percentage of users this month (or week) fall into one of a few key categories? For Facebook, those might be inactive users (no visits), lightly active users (1-3 visits), moderately active users (4-7 visits), highly active non-contributors (8+ visits, 0-1 comments/posts), and highly active contributors (8+ visits, 2+ comments/posts).

(I’m sure there are biological parallels reflecting the progression of bacteria through someone’s system, but my lack of knowledge means crafting an analogy would almost certainly entail me putting my foot in my mouth… if I haven’t already done so.)

2) How are users transitioning between the different states? For example, what percentage of last month’s inactives stayed inactive, and what percentage jumped into each of the other four categories?
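The two questions above could be answered from raw activity counts roughly as follows. This is a sketch under stated assumptions: the bucket thresholds are the Facebook-style ones from question 1, while the function names, the short bucket labels, and the sample data are all made up for illustration.

```python
from collections import Counter, defaultdict


def classify(visits: int, contributions: int) -> str:
    """Assign one user-month to a bucket using the thresholds from the text."""
    if visits == 0:
        return "inactive"
    if visits <= 3:
        return "light"
    if visits <= 7:
        return "moderate"
    # 8+ visits: split on comments/posts (0-1 vs. 2+).
    return "high_contributor" if contributions >= 2 else "high_non_contributor"


def transition_rates(last_month: dict, this_month: dict) -> dict:
    """Estimate P(bucket this month | bucket last month) from per-user buckets.

    Both arguments map user id -> bucket name. Users missing from either
    month are skipped in this sketch.
    """
    counts = defaultdict(Counter)
    for user, before in last_month.items():
        if user in this_month:
            counts[before][this_month[user]] += 1
    rates = {}
    for before, row in counts.items():
        total = sum(row.values())
        rates[before] = {after: n / total for after, n in row.items()}
    return rates


# Made-up activity logs: user id -> (visits, comments/posts).
march = {"a": (0, 0), "b": (0, 0), "c": (2, 0), "d": (9, 5)}
april = {"a": (0, 0), "b": (1, 0), "c": (5, 1), "d": (10, 0)}

last = {u: classify(*v) for u, v in march.items()}
this = {u: classify(*v) for u, v in april.items()}
rates = transition_rates(last, this)
print(rates["inactive"])  # {'inactive': 0.5, 'light': 0.5}
```

Each row of the result is one row of the transition matrix: the rates out of a given bucket sum to 1.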

Here’s what these numbers might look like for Facebook (note that these are completely fictional):

In this world, Facebook has a billion active users, but still another half billion inactive users. The billion users are divided relatively evenly between the four active categories.

And here are the transition probabilities between states of activity. This (again fictional) matrix defines Facebook’s epidemiological framework:

I make the fairly conservative assumption that 90% of inactive users will stay inactive, while 4% will become lightly active, 0.4% highly active contributors, etc. Meanwhile, nearly half of highly active contributors will stay in that bucket next month.
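Projecting one month ahead is a vector-matrix product: each bucket's new size is the sum of the inflows from every bucket. The sketch below assumes a placeholder matrix. Only the percentages quoted in the prose of this article are taken from it; every other entry and three of the five starting bucket sizes are invented so the rows sum to 1, so the output will not match the article's tables.

```python
STATES = ["inactive", "light", "moderate", "high_non_contrib", "high_contrib"]

# Rows: last month's bucket. Columns: this month's bucket. Same order as STATES.
# From the prose: 0.90, 0.04 and 0.004 (row 1), 0.12 (row 2), 0.05 (row 3),
# the 53% of row 4 that becomes less active (0.03 + 0.15 + 0.35), 0.485 (row 5).
# Every other entry is a placeholder chosen so each row sums to 1.
P = [
    [0.900, 0.040, 0.030, 0.026, 0.004],
    [0.120, 0.500, 0.250, 0.100, 0.030],
    [0.050, 0.200, 0.450, 0.200, 0.100],
    [0.030, 0.150, 0.350, 0.370, 0.100],
    [0.020, 0.080, 0.165, 0.250, 0.485],
]

# Bucket sizes in millions. 500 (inactive) and 200 (high_contrib) are quoted in
# the text; the other three are placeholders that bring active users to 1,000.
counts = [500.0, 300.0, 250.0, 250.0, 200.0]


def step(counts, matrix):
    """Next month's bucket sizes: each new bucket sums inflows from every old bucket."""
    n = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


next_month = step(counts, P)
for name, before, after in zip(STATES, counts, next_month):
    print(f"{name:>16}: {before:7.1f} -> {after:7.1f}")
```

Because every row sums to 1, the total number of users is conserved; only their distribution across buckets changes.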

If this model is correct, here’s how Facebook’s world will look the next two months:

As you can tell, with these transition probabilities, Facebook’s bucket stats will stay stable.
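"Stable" here means that applying the matrix one more time leaves the bucket shares unchanged, which is what a stationary distribution of a Markov chain is. One way to find it, sketched below, is to keep applying the matrix until nothing moves. The two-bucket matrix is a toy assumption chosen because its answer is easy to verify by hand (two thirds inactive, one third active); it is not the article's matrix.

```python
def step(shares, matrix):
    """Apply the transition matrix once to a vector of bucket shares."""
    n = len(shares)
    return [sum(shares[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary(matrix, tol=1e-12, max_iter=10_000):
    """Iterate from a uniform start until one more step changes nothing."""
    n = len(matrix)
    shares = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(shares, matrix)
        if max(abs(a - b) for a, b in zip(shares, nxt)) < tol:
            return nxt
        shares = nxt
    raise RuntimeError("did not converge")


# Toy two-bucket world: 10% of inactives wake up, 20% of actives lapse.
toy = [
    [0.9, 0.1],  # inactive -> inactive, active
    [0.2, 0.8],  # active   -> inactive, active
]
print([round(x, 3) for x in stationary(toy)])  # [0.667, 0.333]
```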

So let’s see what happens if we change the transition probabilities just a bit. In the matrix below, Facebook gets a bit better at keeping lightly active and moderately active users from becoming inactive — moving from 12% and 5% to 4% and 2% — and also better at keeping highly active contributors in the high contributor bucket — increasing from 48.5% to 75%. Everything else stays the same.

The effect seems significant but not overwhelming. For next month’s stats, here’s what we see:

In other words, a slight decrease in inactive users, from 500 million to 472 million, and a larger increase in highly active contributors from 200 million to 260 million.

But like viral growth, that difference gets a lot bigger as its effects fan out. Here’s what things look like two years from now:

Inactives have been cut almost in half; super-actives have more than doubled.

And this doesn’t take into account the effect of that increased activity on other users, both those already on Facebook and those who aren’t.

In other words, small changes in transition probabilities can have large long-term effects on a business’ usage patterns.
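The compounding can be sketched by running a baseline matrix and a tweaked one forward 24 months. Everything here is an illustrative assumption: a three-bucket toy model with made-up numbers, hypothetical helper names, and one particular way of rebalancing a row after an entry changes (the remaining entries are rescaled proportionally so the row still sums to 1).

```python
def step(counts, matrix):
    """Apply the transition matrix once to a vector of bucket sizes."""
    n = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def project(counts, matrix, months):
    """Bucket sizes after applying the matrix `months` times."""
    for _ in range(months):
        counts = step(counts, matrix)
    return counts


def with_entry(matrix, row, col, value):
    """Copy of matrix with one entry replaced; the rest of that row is
    rescaled proportionally so the row still sums to 1 (an assumption)."""
    others = sum(p for j, p in enumerate(matrix[row]) if j != col)
    if others == 0:
        raise ValueError("cannot rescale a row whose other entries are all zero")
    scale = (1.0 - value) / others
    new = [r[:] for r in matrix]
    new[row] = [value if j == col else p * scale for j, p in enumerate(matrix[row])]
    return new


STATES = ["inactive", "casual", "core"]  # toy buckets, not the article's five
baseline = [
    [0.90, 0.08, 0.02],
    [0.12, 0.70, 0.18],
    [0.05, 0.45, 0.50],
]
# Two small improvements: core users stay core 75% of the time instead of 50%,
# and casual users lapse to inactive 4% of the time instead of 12%.
improved = with_entry(baseline, 2, 2, 0.75)
improved = with_entry(improved, 1, 0, 0.04)

start = [500.0, 300.0, 200.0]  # millions, made up
for label, matrix in (("baseline", baseline), ("improved", improved)):
    after = project(start, matrix, 24)
    print(label, dict(zip(STATES, (round(x) for x in after))))
```

Comparing the two printed lines shows how far apart the bucket sizes drift after two years, even though only two entries of the matrix were changed directly.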

So how do you define the buckets for your business? The details aren’t really that important — I could have defined “lightly active” as 1-4 visits a month. What’s most important is separating users who are meaningfully different from a business perspective. So if a tiny number of contributors are generating 90% of the content for everyone else, it’s important to give them their own bucket.

It isn’t just users who matter in an ecosystem. Tweets and the way they travel around are extremely important to the way Twitter works: a minor change in the way retweets work could completely change Twitter’s tweet ecosystem. Even without any changes on Twitter’s end, the dynamics could shift because of changes in users’ culture.

Once defined, the transition probabilities allow a company to understand how people are moving through their system. More important, they serve as a set of baselines to try and beat. If I were Facebook and saw the probabilities in the initial matrix above, two numbers would stand out to me: 90% of inactive users stay inactive, and 53% of highly active non-contributors will become less active the following month.

In a company Facebook’s size, I’d probably have a person or small team focus on each of those two areas. Each could dive into the data to search for any easy wins, then spend a few months building features aimed at improving those percentages.

In a few months, we’d hope to see better results for those two metrics. By then, the ecosystem of users and technologies will have changed, and some new problems will have surfaced. This is much like bacteria suddenly confronting new medicines or even changes in their host species.

In the search to find a perfect system that can propagate and sustain itself, bacteria have a big advantage: orders of magnitude more scale. There are billions and billions of them, and many, many opportunities for subtle mutations to come up with the perfect plague. Even fast-moving Internet companies can’t compare.

On the other hand, Internet companies can be guided by smart, data-focused leaders who, unlike bacteria, aren’t governed purely by randomness. For that reason, I’d rather be the founder of an Internet company than a bacterial plague… even if the company was MySpace.

Mike Greenfield founded Bonafide, Circle of Moms, and Team Rankings, led LinkedIn's analytics team, and built much of PayPal's early fraud detection technology. Ping him at [first_name] at