The Slow Road to Entrepreneurship: Learnings From Early PayPal

We all know the stories: Zuckerberg, Jobs, and Gates dropped out of school and founded three of the companies that define our world. They went from college students to entrepreneurs, no transition required. But that’s the exception; to generalize Dave McClure’s instant classic, those of us in Silicon Valley’s 99.9% have to content ourselves with being relatively late bloomers.

That reflects a larger narrative that dominates much of the talk around entrepreneurship: one is either 0% entrepreneur or 100% entrepreneur, so entrepreneurs have to go from 0 to 100 in one step. There’s some truth in that (you won’t completely understand the thrills, stresses, and demands of starting a company unless you do it yourself), but you can certainly prepare yourself for future entrepreneurship by being a part of early-stage companies with people who can help you learn quickly.

My own story certainly reflects that: I wasn’t ready for entrepreneurship at age 21 or 22. By my late 20s, I was aching and ready to start and grow a company. Here’s how that change happened.

My first substantial job experience after college (following eight months as an engineer at a failed startup) was at early-ish PayPal. I was part of the company’s growth from small in 2000 (a very unprofitable hundred-person startup) to huge in 2004 (a massive subsidiary of massive Ebay).

As a new employee, I knew nothing about fraud detection and received minimal guidance. Somewhat lost, I added very little value to the company my first few months. Three months after joining PayPal, Sports Illustrated wrote an article about my side project, Team Rankings. I sensed some irritation from my boss Max: I was doing only so-so work at PayPal, but was getting significant publicity for a side project. That irritation seemed unfair to me at the time; having now been a founder myself, I can completely relate. Founders want people on their team executing at the highest level; seeing someone perform better on a side project than in the office doesn’t send that signal.

Fortunately, I soon found my way, and (with some help) figured out how to turn massive amounts of data into statistical models that could accurately predict fraud. I felt like I was living on the edge because I eschewed the “easy” off-the-shelf enterprise tools others were using. Some 23-year-olds rebel by using mind-altering drugs or traveling the world; for me, rebellion was writing software from scratch to build statistical models. And this defined me: any job other than predictive modeling struck me as superficial, scientifically empty, and not worth doing.

My job at PayPal focused and insulated me. In 3.5 years at the company, I did one thing: build technology to predict fraud. Most of what was going on at the company (operations, usage, product, competition, finance) was of no concern to me whatsoever. I became very skilled in a few very specific areas, but knew very little about the goings-on across PayPal.

If I had a publicist, I’m sure he or she would tell me to broadcast that I magically learned how to build a business while I was at PayPal: surely that magic would cement my place as part of an all-knowing PayPal Mafia. Alas, I don’t have a publicist, so the truth will have to do: PayPal taught me very little about how to build a successful startup.

But the PayPal experience was formative in two key ways. First, it showed me that talented, driven, resourceful people with virtually no knowledge of an industry could become skilled in areas they’d never known about before (for me: fraud detection) and collectively build a large Internet business and change the world. Second, my colleagues at PayPal set a bar for the caliber of thought and effort that I now expect from those I work with.

After several great and educational years, by early 2004 my job at PayPal had become routine. Fifteen months after Ebay acquired us, the company’s combative, execution-focused culture had been swallowed by Ebay’s relentless drive to maximize employee time spent in PowerPoint meetings. I didn’t have deep insight into the business of PayPal and Ebay, but I knew I didn’t want to play the big company game. Yet for financial reasons, I was motivated to stay around to vest my remaining stock options.

Researching my choices, I discovered that Ebay had a policy which allowed employees to work just 24 hours a week, while continuing to fully vest their stock. This held a lot of appeal — especially since I was interested in spending some time helping out some friends who were working on a new site called LinkedIn.

Ebay’s lenient vesting policy was likely designed for new mothers or those with health issues — not 26-year-old males interested in moonlighting at a startup. That didn’t dissuade me: I soon shifted my schedule to one where I worked 3 days a week at PayPal and spent the rest of my time at LinkedIn. And the LinkedIn days were a lot of fun, as I got to work with a small team, focus on a completely different set of data problems, and understand how a social network could rapidly grow.

At this point, in the first half of 2004, my boss Nathan at PayPal was trying (struggling) to find cool stuff to work on, so we spent some time with other groups at Ebay looking for interesting data problems to solve. But I soon realized that once I started to look down upon my employer, it would be difficult for me to do top-quality work. I was still building good fraud models, but I was no longer psyched about my job at PayPal, and the caliber of my work certainly suffered. I admire those who can be completely professional and work at full intensity for anyone at any time, but I’m not like that. When I’m excited about a project and a company, I’m hard-working, clever, and efficient. When I’m coasting, I’m none of those things.

Trying to foster more commitment, Nathan came to me in July 2004 and told me that I had to choose between working full-time at PayPal and leaving. I’m guessing he thought this would push me to increase my commitment, but it had the opposite effect: I was bored, disgruntled and antsy, and I was going to leave. I left PayPal in August and darted off to France for a month of cycling.

At this point, I had aspirations of starting a company some day. I’d had some entrepreneurial experience with Team Rankings (more on that later), and had built up some skills at PayPal. At PayPal, I’d worked on some other side projects that could have turned into their own companies (none amounted to much). But I didn’t have in mind a specific company that needed to be started, nor was I compelled to start a company just to start something. And LinkedIn seemed like an attractive place to be: a strong 15-20 person team, an innovative and useful product, a really interesting data set. So I decided I’d join LinkedIn full-time.

I’d spend two and a half years at LinkedIn. I was the first analytics scientist and would lead what’s now called the Data Science team. Unlike my insulated time at PayPal, my years at LinkedIn would get me very close to the business and the product, piquing my interest in a much wider array of topics. Ultimately, that experience would nudge me to jump off the entrepreneurship cliff and start my own company. In my next post (follow me on Twitter), I’ll tell the story of those two and a half years and of the key experiences that led me to start something myself.

Data Scale – why big data trumps small data

As I walk into a coffee shop, the guy behind the counter sees that I’m in a hurry and that I’m by myself. He’s seen me a few times before, and knows that I don’t usually order sweet snacks. He knows I tip reasonably well. He’s likely to treat me a certain way: efficiently, without small talk, and not trying to sell me a muffin.

In the “real world”, his behavior, and with it my user experience, is largely the result of subconscious adjustment (in Daniel Kahneman’s terrific book this is called System 1). Online, personalization and improvement of my experience usually come from lots of data. Offline, the cashier’s “data set” is the personal experiences he’s had. Online, it’s the same thing for the site I’m visiting. The big difference is the range: most working humans are between 15 and 75, a difference of 5x, so no cashier has vastly more experience to draw on than another. Online, Facebook has nearly a billion users, and my blog has… less than one fifth that number. Online differences are orders of magnitude larger.

That advantage compounds over time, as companies with many millions of users attain data scale. Data scale is the millions of pieces of information that allow a company to improve the user experience in ways that competitors with fewer users cannot. I saw this firsthand at PayPal, LinkedIn, and Circle of Moms: all three companies were able to provide features and additional value to new and returning users because of what we’d learned from millions of others.

Network Effects and Big Data

Network effects are well-known and understood in the consumer Internet world. As Facebook grows more popular, more of your friends are on the site with you, and it becomes more and more useful (or at least, entertaining) for you. And that size distinguishes it from an upstart: why sign up for a new site with just three of your friends when you can be on Facebook with almost everyone? Network effects have clear and well-defined values for both websites and users.

By contrast, consider the opening sentence for the big data Wikipedia entry:

In information technology, big data consists of data sets that grow so large and complex that they become awkward to work with using on-hand database management tools

The entry depicts a purely technical set of requirements, with no bearing on the product or user. But lots of data is more than just awkwardness and data management tools. Companies with data scale can create a set of features and processes (prediction, testing, understanding, and segmentation) that aren’t possible for those with small user bases. Collectively, they allow a company to block access to fraudsters, tailor products to users, and understand them in deep ways. Data scale improves both the user experience and the bottom line.

The 4 Advantages of Data Scale

In my twelve years working with consumer Internet data, I’ve seen four things that companies with data scale can do much better than smaller competitors:

  1. Predict

    Fraud nearly destroyed PayPal’s business in its early years. Fortunately, we figured out how to accurately detect it, and wound up reducing fraud rates by 80-90%. After predicting which transactions were most risky, we’d block and reverse the bad ones — helping PayPal move from bleeding money in 2000 to profitability+IPO in 2002.

    Data scale was necessary for that detection. More transactions — and more fraudulent transactions — give smart scientists the data they need to discover complex but statistically valid predictors of fraud. Start with a set of only 10,000 transactions and 100 fraudulent transactions, and you can put together a few simple rules to find fraud. But with millions of transactions and tens of thousands of fraudulent transactions, our fraud analytics team could find subtler patterns and detect fraud more accurately. A mini-PayPal might have the world’s smartest predictive modelers, but without a large data set, there’s only so much they could do.
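To make the scale argument concrete, here is a rough sketch in Python. It is not PayPal’s method: the “subtle rule” is hypothetical (it flags 10% of transactions, with a 1.5% fraud rate among them against a 1% base rate), and the check simply asks whether a Wilson confidence interval for the flagged fraud rate excludes the base rate. The base rate itself is treated as known, which is a simplification.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

BASE_FRAUD_RATE = 0.01  # 100 frauds per 10,000 transactions, as in the example above

# Hypothetical subtle rule: flags 10% of transactions, 1.5% of which are fraud.
small = wilson_interval(successes=15, trials=1_000)       # out of 10,000 transactions
large = wilson_interval(successes=1_500, trials=100_000)  # out of 1,000,000 transactions

for label, (low, high) in (("10k transactions", small), ("1M transactions", large)):
    verdict = "distinguishable" if low > BASE_FRAUD_RATE else "not distinguishable"
    print(f"{label}: {low:.4f} to {high:.4f} -> {verdict} from the base rate")
```

Under these assumptions, the same rule that looks like noise in a 10,000-transaction sample becomes a clear signal at a million transactions, which is the sense in which subtler patterns need data scale.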

    Incidentally, this was a major reason PayPal needed to raise lots of capital. Losing a lot of money to fraud was a necessary byproduct in gathering the data needed to understand the problem and build good predictive models. A “lean startup” approach makes sense in some cases, but wouldn’t have cut it for PayPal.

    User perspective: if a company can figure out that you’re very likely a good, non-fraudulent customer, they can provide you with services they’d never want to offer their riskier users. That figuring out process is much more accurate when they have data scale.

  2. Understand

    Most websites start off with less structured data: their databases contain lots of text. Free-form fields are easier for developers to code, and (early on) often make it easier for users to enter information. But unstructured data quickly get messy, and without scale, they don’t allow for easy inferences. With enough unstructured data, plus a clever data scientist or two, a company can intelligently impose structure and build the corresponding features and insights. A few examples:

    • Until 2006, LinkedIn had no structure around company names. Users could type anything they wanted into a company field, and we had no way of automatically detecting that HP, H-P, Hewlett Packard, and HP, Inc. were all the same company. By parsing the data and matching it against other sources like email domains and address books, we were able to detect that those four names were synonyms. Without manual intervention, those processes are only possible with data scale: one person at “HP, Inc.” with a particular email domain could be random, but when 97 out of 100 users have that property it’s a safe bet that it’s not a fluke.

      Having an accurate list of companies allows LinkedIn to better guess who you know, it facilitates good company pages, and by using autocomplete, it improves data quality going forward.

    • Before we built Circle of Moms, we built a Facebook app called Circle of Friends. Less than a year in, with millions of users but weakening growth and minimal revenue, we started to search for ways we might shift our business. We found that moms were creating “circles of moms” and using them more than anyone else was using their circles.

      Data scale enabled us to find that trend, and understand what was going on. And that wound up being the insight that ultimately pushed us toward being a successful company.
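The company-name example can be sketched roughly as follows. This is an illustration, not LinkedIn’s actual code: the function name, the `(company_name, email_domain)` record shape, and both thresholds are assumptions made for the example.

```python
from collections import Counter, defaultdict

def names_by_domain(records, min_users=50, min_share=0.9):
    """Map each email domain to the company-name variants that reliably use it.

    records: iterable of (company_name, email_domain) pairs, one per user.
    A name is tied to a domain only if at least `min_users` users typed that
    name and at least `min_share` of them share its most common domain.
    """
    domains_per_name = defaultdict(Counter)
    for name, domain in records:
        domains_per_name[name][domain] += 1

    synonyms = defaultdict(set)
    for name, domain_counts in domains_per_name.items():
        total = sum(domain_counts.values())
        domain, count = domain_counts.most_common(1)[0]
        if total >= min_users and count / total >= min_share:
            synonyms[domain].add(name)
    return dict(synonyms)

# Made-up data: 97 of 100 "HP, Inc." users share one domain.
records = (
    [("HP, Inc.", "hp.com")] * 97 + [("HP, Inc.", "gmail.com")] * 3
    + [("Hewlett Packard", "hp.com")] * 200
    + [("Harry's Pizza", "hp.com")]  # a single user: could be a fluke
)
result = names_by_domain(records)
print(result)
```

Names that end up under the same domain are candidate synonyms. A real system would also need to exclude webmail domains, which many unrelated company names would otherwise share.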

    User perspective: younger, smaller companies don’t really know what users want, and thus have to keep their product open-ended. When you use the product of a company with lots of data, they’ve learned what people actually want to do, and you get a cleaner, more structured experience.

  3. Test

    Circle of Moms was fanatical about a/b testing from day one (LinkedIn was not, much to my chagrin, but I digress). But in order to decide between a and b, you need meaningful differences in outcomes. If there’s a large difference (say, 40-50%), then 100 outcomes (signups, clicks, whatever the company is optimizing for) in each group is often sufficient for establishing statistical significance. If the difference is 10% or less, you’ll need on the order of 1000+ outcomes.
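A back-of-the-envelope check on those thresholds, as a sketch under stated assumptions: the groups are the same size and outcomes are rare, so the two outcome counts a and b behave roughly like Poisson counts and z ≈ (b − a) / sqrt(a + b). The function name is made up for this example.

```python
import math

def outcomes_needed(relative_lift: float, z: float = 1.96) -> int:
    """Baseline-group outcome count at which an observed lift of this size reaches z.

    With b = a * (1 + lift), solving (b - a) / sqrt(a + b) >= z for a gives
    a >= z^2 * (2 + lift) / lift^2.
    """
    return math.ceil(z**2 * (2 + relative_lift) / relative_lift**2)

for lift in (0.45, 0.10):
    print(f"{lift:.0%} lift: ~{outcomes_needed(lift)} outcomes per group at 95%, "
          f"~{outcomes_needed(lift, z=2.576)} at 99%")
```

Under these assumptions a 45% lift clears the bar with well under 100 outcomes per group, while a 10% lift needs somewhere around a thousand. These are the counts at which an observed difference is just significant; detecting a true difference reliably takes more.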

    Let’s take a graphical look. Below are overall simulated clickthrough rates (CTRs) for different-sized user bases.

    [Figure: clickthrough by population size]

    [Technical details: I ran a simulation where a company a/b tests a variety of emails or subject lines. Each subject line has a clickthrough rate between 0.75% and 3%, randomly selected with a uniform distribution. All tests are pairwise, so A is tested against B, the winner is tested against C, that winner against D, etc. Tests are resolved at p=.99 (i.e., 99% confidence). A few aspects of this are unrealistic: uniformly distributed CTRs, subject lines that don’t improve on average, only tests with two participants, the same rules for resolving tests for big user bases and small, etc. Still, it’s close enough for these purposes.]

    With a small user base, CTR will be mediocre: about 1.9%, and slow to improve. As a user base gets bigger and bigger, a higher and higher percentage of users wind up receiving a very good, well-tested subject line: big companies see a CTR very close to 3%. The largest improvement comes between 100,000 users and 1,000,000 users — in this case, that represents data scale. Most of our successful emails at Circle of Moms would go to a few hundred thousand or a few million people; we were right on the edge of having data scale. If we’d had fewer users, a high percentage of our users would have been “guinea pigs”. With millions of registered moms, we had (roughly) the same number of guinea pigs, but many more users for whom we could use our guinea pig learnings and send the very best content.
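For readers who want to experiment, here is a minimal Python sketch of a simulation in the spirit of the one described above. It is not the original code and is not expected to reproduce the chart’s numbers. Only the uniform CTR range, the pairwise tests, and the 99% threshold come from the description. The rest are assumptions: 20 candidate subject lines, batches of 1,000 sends per arm, a pooled two-proportion z-test checked after every batch (which inflates false positives), and sending the champion to everyone once the candidates run out.

```python
import math
import random

def draw_clicks(rng, n, p):
    """Clicks among n sends at click probability p, sampled via geometric gaps."""
    clicks, pos, log_q = 0, 0, math.log1p(-p)
    while True:
        # Jump straight to the next click instead of simulating every send.
        pos += int(math.log(1.0 - rng.random()) / log_q) + 1
        if pos > n:
            return clicks
        clicks += 1

def simulate_overall_ctr(n_users, n_subjects=20, batch=1000, z_crit=2.576, seed=0):
    """Overall CTR (n_users > 0) when subject lines are tested champion vs challenger."""
    rng = random.Random(seed)
    true_ctr = [rng.uniform(0.0075, 0.03) for _ in range(n_subjects)]
    champion, challenger = 0, 1
    sends, hits = [0, 0], [0, 0]  # current test only: [champion, challenger]
    total_clicks, remaining = 0, n_users

    while remaining > 0:
        if challenger >= n_subjects or remaining < 2:
            # Nothing left to test: everyone else gets the current champion.
            total_clicks += draw_clicks(rng, remaining, true_ctr[champion])
            break
        n = min(batch, remaining // 2)
        for arm, subject in enumerate((champion, challenger)):
            c = draw_clicks(rng, n, true_ctr[subject])
            sends[arm] += n
            hits[arm] += c
            total_clicks += c
        remaining -= 2 * n

        # Pooled two-proportion z-test on the current champion/challenger pair.
        pooled = (hits[0] + hits[1]) / (sends[0] + sends[1])
        se = math.sqrt(pooled * (1 - pooled) * (1 / sends[0] + 1 / sends[1]))
        z = (hits[0] / sends[0] - hits[1] / sends[1]) / se if se > 0 else 0.0
        if abs(z) >= z_crit:
            if z < 0:
                champion = challenger  # the challenger won
            challenger += 1
            sends, hits = [0, 0], [0, 0]
    return total_clicks / n_users

def mean_ctr(n_users, seeds=range(30)):
    """Average overall CTR across several independent simulated companies."""
    return sum(simulate_overall_ctr(n_users, seed=s) for s in seeds) / len(seeds)

results = {n: mean_ctr(n) for n in (10_000, 100_000, 1_000_000)}
for n, ctr in results.items():
    print(f"{n:>9,} users: {ctr:.2%}")
```

Averaging over several seeds smooths out the luck of which subject lines happen to be drawn first. The qualitative pattern to look for is the one in the text: larger user bases spend a smaller share of their sends on losing subject lines.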

    Note the magnitudes of these differences. With data scale, testing can mean a bump of 50% or more; the testing bump is much less for a small operation. For a product close to being viral, an additional 10% — a “small data” bump — might be huge and a/b testing worthwhile. For a product with millions of users, a 50% jump is large almost regardless of application. On the other hand, for a small product where 10-20% doesn’t represent the difference between success and failure, time is best spent somewhere else. In other words, a/b testing is something every company with a large user base should do; for smaller companies the value varies.

    User perspective: if you’re part of a large group, you likely get better content because of feedback from those before you. If you’re part of a small group, you are more likely to be giving feedback rather than profiting from the feedback of those before you.

  4. Segment

    At Circle of Moms, segmenting was essentially a mix of predicting and testing. After we tested emails and subject lines with small batches of users, we’d create predictive models to figure out which future users would be likely to click on them.

    This meant we could figure out the odds that someone would click on each of twenty possible emails we might send. And we’d send the very best one for her.

    For Circle of Moms, predictive models were relatively simple and didn’t need as many users/observations as PayPal’s did. But because we were testing twenty different emails at a time and didn’t want to test everything on everyone, scale still mattered. 50,000 people is usually enough to create a model; multiply that by 20 and you have a million. That calls for a million people just in the training set (i.e., the guinea pigs). If you only have 1.5 million users, the benefit of this type of segmentation will be small — 2/3 of users will have received a “random” email to gather data for the models. At 5 million, a company is at data scale, and the vast majority of its users (80%) will get a personalized email.
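The arithmetic in that paragraph can be written out as a small sketch. The function and parameter names are made up for illustration; the 20 emails and 50,000 users per model are the figures from the text.

```python
def personalized_share(total_users: int, n_emails: int = 20,
                       users_per_model: int = 50_000) -> float:
    """Fraction of users left over to receive a model-chosen email."""
    training_users = n_emails * users_per_model  # the "guinea pigs"
    return max(0, total_users - training_users) / total_users

for users in (1_500_000, 5_000_000):
    print(f"{users:,} users: {personalized_share(users):.0%} personalized")
# 1,500,000 users: 33% personalized
# 5,000,000 users: 80% personalized
```

The training set is a fixed cost of a million users, so the personalized share climbs quickly once the user base is several times that size.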

    I got started on a segmentation-type problem at LinkedIn — matching people to jobs. Job matching means segmenting people into thousands of buckets (each job is a bucket), rather than only 20. Back in 2006, the quality and quantity of LinkedIn’s data made the job very difficult: 5 million users and only a few thousand past job listings was not enough data to do matching well. Today, with 20-30 times as many users, 7 years of job listings, and some scientists who are likely much better than I, LinkedIn does much better finding jobs for people than I did in 2006. That’s data scale (plus a talent upgrade) at work.

    Automated segmentation is harder to simulate and precisely quantify than testing. But the overall picture is clear: it’s useless at small scale, but usually far more valuable than testing at data scale.

    User perspective: when a company can figure out what you like, they can provide you with content uniquely suited to your needs and interests. The more data they have — both on you and on others — the better they can perform this service.

Things I don’t know I don’t know?

I’m both a data guy and an early stage startup guy, and that generally constrains the problems I see. I left PayPal a little while after the company was acquired by Ebay; I left LinkedIn when it was an 80-person company that I found too slow-moving; I left Circle of Moms after Sugar’s acquisition. That means I’ve never worked on a product with over ten million users. No doubt I’m missing out on some of the advantages that truly massive companies have. Others have more firsthand knowledge on the topic of really big data scale — those of you in that category, ping me about your favorite post and I’ll add a link.

Why An Asocial Geeky Dude is Building Technology for Moms

In early 2007, I left one of the hottest companies in Silicon Valley. LinkedIn had millions of users and top investors, and was already profitable. Our 80-person team had grown from twelve when I’d joined in 2004, and was poised to grow by another factor of ten over the next 3-4 years. I’d been leading the LinkedIn analytics team, and for a quantitative Internet guy, the data opportunities don’t get much better than LinkedIn’s.

I left LinkedIn because I had an itch to start my own company, though I wasn’t sure what that company should be. Let’s do a little entrepreneurial counseling exercise. Here was my background:

  • Mathematical and Computational Science degree from Stanford
  • Built PayPal’s early statistical modeling technology for fraud detection
  • Started Team Rankings to rank sports teams and help needy gamblers beat the Vegas spread

And my demographic/psychographic characteristics at the time:

  • 29 year-old married male without kids
  • only marginally less touchy feely than Dick Cheney

Which of the following would you have suggested I build?

  • a) a socially optimized search engine
  • b) a cloud-based collaborative filtering system
  • c) an algorithmically personalized form of social networking
  • d) a high frequency, automated stock trading system
  • e) a company with the tagline “motherhood, shared and simplified”

Yeah, I wouldn’t have guessed (e) either.

How we found moms (and moms found us)

In September 2007, I launched a Facebook application with Ephraim Luft, who was in my year at Stanford. Our app, Circle of Friends, allowed me to create one circle for my math geek friends, another for my developer friends, and a third for my sports analytics friends (ah, diversity).

The application took off pretty quickly, attracting millions of users within a few months, and I got a crash course on scaling a web app (tip: don’t use MyISAM for tables that might become huge, unless you prefer learning MySQL DBA tricks to sleeping). Thanks in large part to our rapid growth, we secured funding in January 2008 from several top “micro cap” investors — Mike Maples, Jeff Clavier, and Naval Ravikant.

Around the time of our funding, we accidentally did something very clever. We built a small feature to allow users to upload their own (tacky clip art) circle icons. This was well-received, and we soon started to prioritize the new circles/icons that were most popular. I added some Bayesian logic to tie circles to the appropriate demographic, and we soon had a product that would upsell “Drinking Buddies” circles to 20-something males, and “special friends with heart in my life…” circles to 16-year-old girls.

Still, as satisfying as it was to see hundreds of thousands of people creating a “WARNING NAUGHTY WHEN DRUNK” circle (see image on right) with 15 of their naughty-when-drunk-iest friends, we had no illusions that we’d discovered the future of human social interaction. We had some traction and some interesting data, but Circle of Friends’ usage was about as deep as a backyard kiddie pool, and revenue prospects were dim. We weren’t quite sure how to proceed, so I did what I do best: dug into the data.

I noticed that several hundred thousand people had created a “Circle of Moms”. It turned out that those circles were way more active than any others — moms shared more photos, had longer and deeper conversations, and accepted more of one another’s invitations.

That was interesting enough to us that we started to think about building out a product just for moms. We asked some questions around consumer demand, revenue opportunity, and our ability to provide real value to mothers. We got deep answers to these questions; I’ll spare the details and summarize in one word: “yes.”

We launched Circle of Moms in October 2008, and haven’t looked back since.

Moms need community, empathy, support… and some kickass algorithms

One of the best things about building Internet technology is the opportunity to do work that touches millions of lives. But a million times ten minutes of mindless game play is nothing but ten million wasted minutes. On the other hand, effectively touching the lives of millions of mothers can help to raise a great next generation of humans.

When a mom wakes up at 3 AM for the sixth straight night, needing to comfort her toddler who had been sleeping through until morning, she needs both support and information. This is hardly a new problem, but the Internet has done relatively little to improve moms’ collective support system or knowledge. Fast forward a few years, and her seven-year-old is having trouble concentrating in school and Mom is concerned he’s falling behind. By combining data and community, we can help a mom — and indirectly, her child — through these tough situations.

Being focused on moms keeps us honest as technologists. Shocking though it may be, many techies — myself included — might on occasion build something that’s cool or interesting but not especially useful. But when you have an audience that actually needs your product, it forces you to keep your eye on the ball.

To fully address these needs, we’re able to combine two great styles of Internet product development. The Google-style algorithmic approach to product development is often great for spam filters, optimized search results, and ad targeting. Facebook-style social incentives are great at encouraging users to do “work” like tagging photos, translating text into different languages, and indicating their likes and interests.

I love this balance. In our case, it means that Circle of Moms is fundamentally about community and connecting people, but we’re not just a social site. To take the next step beyond social, we aim to provide our users with good information and guide them in the right direction based on our understanding of them. Social cues encourage our users to share more information about their kids; we then use that information to improve their content experience in emails and on the site. Likewise, we derived a list of important milestones children accomplish via moms’ contributions; we then algorithmically customize the list each user sees based on what we know about them.

When we decided to build out Circle of Moms, we made a conscious decision to constrain our problem space. That makes this sort of pragmatic approach to technology a lot easier. We can focus on building a playgroup finder for your area, a baby name chooser tailored to your preferences, and a guide to the development of your child, all as separate products.

Like LinkedIn, we’re a vertical social network with an awesome data set. We discovered that at 18 weeks, moms on the East Coast are 40% more likely to give their children solid food than moms on the West Coast. We learned that San Francisco has lots of babies but very few school children. And our data tell us that conservative moms name their kids Reagan and Sarah, while liberal moms prefer Jalen and Jada.

The examples above are interesting stories, but they’re just a start. Connecting moms to knowledgeable peers with shared experiences and values is a hard problem, and to get it right we need to be successful across multiple avenues. Parsing conversations for keywords and underlying meaning (which we do) is one step in that direction; getting users to self-identify in intuitive ways is another. One thing that defines us as a company is that we keep iterating in data-driven ways to create a great product and help to solve those 3 AM problems.

And fortunately, I work with an awesome group of engineers who are rapidly pushing our technology forward to build useful products for moms. Brian’s a low-key dad and a brilliant engineer who does everything from architecting and building complex search systems to writing email copy. Chris is a gold-buying, Tetris-dominating, cloud engineer extraordinaire, who uses the latest EC2 technologies to automate everything except his daily five cups of coffee. Regina, our in-house yoga master, has an awesome mix of front-end and back-end web development skills and puts together different technologies in clever and scalable ways. Hoi Ying is a fun and talented young developer, who’s become a force for us as we beef up our back-end technology (and our Hello Kitty collection!). Emma is a sponge of new information, with a great mix of engineering, product, analytics, and marketing talents: she crunches numbers, builds features, and improves the content our users see.

Collectively, they (and the rest of our eighteen-person team) have made us profitable, reached millions of moms, and created a product and community that’s helping both moms and the high school class of 2027. That’s not what I expected when I left LinkedIn four years ago, but it’s pretty darned rewarding.