Analyzing Billions of Transactions to Answer Questions on Consumer Behavior – Michael Babineau of Second Measure with Kevin Hale
Michael Babineau is cofounder and CEO of Second Measure. Second Measure analyzes billions of credit card transactions to answer real-time questions on consumer behavior. They were in the Summer 2015 batch of YC and you can check them out at SecondMeasure.com.
Kevin Hale is a Partner at YC. Before working at YC he cofounded Wufoo.
00:00 – What idea did Mike apply to YC with?
00:45 – Where did the idea come from?
04:00 – From project to company
09:45 – What info did investors want to know that Second Measure could provide?
11:30 – Their first customers
14:00 – The primary use case of Second Measure for VCs
16:45 – What questions are they trying to answer?
19:00 – Data examples from their blog
27:00 – Second Measure’s product development process
29:00 – Finding good data scientists who work from first principles
36:30 – Why is credit card data so messy?
41:30 – Cleaning data
43:45 – Using their product for competitive analysis
47:00 – Their sales process
48:30 – Raising money from Goldman Sachs and Citi
51:30 – Focusing on a specific problem
53:30 – Keeping the product compelling when it’s table stakes
Craig Cannon [00:00] – Hey, how is it going? This is Craig Cannon and you’re listening to Y Combinator’s podcast. Today’s episode is with Michael Babineau and Kevin Hale. Michael is co-founder and CEO of Second Measure. Second Measure analyzes billions of credit card transactions to answer real-time questions on consumer behavior. They were in the Summer 2015 batch of YC, and you can check them out at secondmeasure.com. Kevin is partner at YC. Before working at YC, he co-founded Wufoo. You can find Michael on Twitter @mikebabineau and Kevin is @ilikevests. All right, here we go. Mike, Kevin was your group partner when you did YC in the Summer 2015 batch. What idea did you apply with?
Kevin Hale [00:43] – Our basic idea at the time was really to use credit card data to help investors make better investment decisions. That is actually not really far from what we do today. The main evolution is that now we work with companies as well, not just investors. A big part of the idea though is not just to look at credit card data and try to find interesting things and then tell investors about it, but instead to build an analytics platform, throw that in front of investors, and then let them answer their own questions.
Craig Cannon [01:15] – What led you to coming up with that idea?
Kevin Hale [01:18] – That is a good question. I don’t come from an investing background or I don’t come from finance at all. I actually worked in video games and the same is true of my co-founder, Lillian. She and I met at Electronic Arts. We worked together there and then at another gaming start-up. Before that I was in ad tech and I’ve always been… We’re both software engineers. We’ve always been in the tech world, but we’ve got plenty of friends in finance and one of those friends just out of the blue called me one day, and was like, “Mike, I need your help. I’ve got two terabytes of data on a hard drive. How do I load this into Excel?” It was one of those moments where, again, as a software engineer, right? I get this question like, “Why? Why are you asking me this?” He’s in New York, I’m in the Bay Area, it’s the middle of the afternoon. Why am I fielding this? I wasn’t feeling particularly helpful. I was like, “Did you ask an engineer? Did you ask your dev team, did you ask your engineering team?” I just hear silence, and then, “Mike, what are you talking about? We’ve got an IT guy,” and that’s it. And that blew my mind because he was at a 30 billion dollar hedge fund. I just assumed that all hedge funds look like Two Sigma or RenTech or just these places that have hundreds of quants and hundreds of engineers. In reality, most hedge funds have a handful of analysts and just some back office support, right? They don’t have any coders in house.
Kevin Hale [03:01] – That’s when we realized there’s this huge opportunity because investors make money off of having an information edge, right? Off of knowing things that other people don’t. A lot of people who work at hedge funds are very, very clever. They’re looking for this edge wherever they can find it. Over recent years, we’re increasingly seeing them look at things like Google Trends to see like, “Oh, is there some leading indicator in search terms that would indicate some bigger shift in consumer sentiment about,” I don’t know, some company.
Michael Babineau [03:39] – Very unsophisticated sort of analysis.
Kevin Hale [03:42] – At the same time, like a clever idea and oftentimes it works. You’ve got investors subscribing to things like Comscore, looking at how many visits to a website are happening because that is roughly correlated with actual sales and it’s also this nice leading indicator in the sense that public companies only come out with, they only report metrics once a quarter. It’s like not right at the end of the quarter, it’s actually some time afterwards. You can actually look at how many people visit. If you can see how many people visited amazon.com over the past quarter, then you can look at the full quarter of information, and then you can see how well that correlates with the resulting reported performance.
Michael Babineau [04:32] – How’d you go from helping someone with like a two terabyte Excel problem and working on video games to being like, “Okay, this is now time for us,” like, “Quit our jobs and go solve this problem.” What were you doing there at EA? What were both your roles?
Kevin Hale [04:49] – Yeah, so we are not video game programmers. We were working at a video game company, but my specialty was building high-scale infrastructure and Lillian’s specialty is building data pipelines and analytics teams. When you look at the video game space, take a company like Zynga. I think Zynga epitomizes this, right? They’re like very metrics-driven, very data-driven. One of the things that they did very well is they optimize the hell out of their games. When you think about an online game and you think about what you want to optimize for, right? You want to, actually let’s talk about this in terms of fun. If your game is too hard, no one’s going to play it. If the game is too easy, no one’s going to play it. You have to find this balance where the game is not too hard and not too easy. If you have an online game, then you have this amazing leg up over games that are pressed to disc and then shipped out because you get to update them. And the best way to tell if your game is too hard or too easy is to simply look at how far people make it in the game. For instance, you can look at how many players make it from level one to level two. If there’s a severe drop off… If not enough people are doing it, then it’s a signal that, “Hey, you know what? Maybe we need to tweak this.” Of course, the person who needs to answer that question is a game designer, and usually game designers aren’t writing SQL. You’ve got all these, you’re tracking all these events, right, of a player.
Kevin Hale [06:32] – They pass level one or like player died or whatever. All of these events are being tracked and you’re like this is like a standard sort of analytics pipeline, right? You instrument your application, you have all these events streaming out, you store them somewhere, you do some sort of processing on them, and then you dump them into some place where you can query. But then you’ve got these people who typically aren’t coders, like a game designer or product manager. They want to answer questions about how people are behaving in the game. You basically have two paths at this point, right? If the game designer says like, “How many people made it to level two?” Then as somebody on the data team, you can say, “Okay, let me run that report for you.” Then you’d go and you query it and you put together the results and you send it back. Then they look and they say, “Oh, this is great. How many people made it to level three?” And you’re like, “Oh.” You roll your eyes and you’re like, “I see where this is going.” At that point you’re like, “Okay. I have a choice. I can either play this go between,” right, and like fetching data over and over again. Or I can build tools, right? And if I build a tool and hand that tool to the designer and say, “Here, answer this yourself,” then I can focus on doing much cooler things and much more interesting things. Also I’m out of the way now. I’m no longer in the way of this person answering their own questions. This is exactly what we did in the video game space and this is a pattern that we recognize could be really useful in the investment space.
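The level-by-level drop-off described above can be sketched in a few lines of Python. This is a minimal illustration only, not anything taken from EA or Second Measure: the event names (`level_passed`, `player_died`), the tuple layout, and both function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical event log: (player_id, event_name, level).
# The event names and layout are illustrative assumptions.
events = [
    ("p1", "level_passed", 1),
    ("p1", "level_passed", 2),
    ("p2", "level_passed", 1),
    ("p2", "player_died", 2),
    ("p3", "level_passed", 1),
    ("p3", "level_passed", 2),
    ("p3", "level_passed", 3),
    ("p4", "player_died", 1),
]


def level_funnel(events):
    """Count the distinct players who passed each level."""
    passed = defaultdict(set)
    for player_id, event_name, level in events:
        if event_name == "level_passed":
            passed[level].add(player_id)
    return {level: len(players) for level, players in sorted(passed.items())}


def conversion_rates(funnel):
    """Ratio of players who passed level N+1 to players who passed level N.

    Assumes nobody can pass level N+1 without first passing level N.
    """
    rates = {}
    for level, count in funnel.items():
        if level + 1 in funnel and count > 0:
            rates[(level, level + 1)] = funnel[level + 1] / count
    return rates


funnel = level_funnel(events)
rates = conversion_rates(funnel)
print(funnel)
print(rates)
```

A low ratio between two adjacent levels is the "severe drop off" signal from the conversation. Wrapping a query like this in a tool is what lets a designer ask the level three question without going back to the data team.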
Kevin Hale [08:11] – Right, you got all these investment analysts and they know so much about the companies that they are making investment decisions on and they know what questions they want to answer. You can either put yourself in a position where you’re trying to guess the questions and like writing, prewriting reports and trying to sell them reports, or you can just give them some sort of tool that actually empowers them to answer all the crazy questions that they thought they couldn’t answer.
Michael Babineau [08:40] – This is the thing that’s fascinating. You guys are building tools for understanding how to improve video games. How does that become all of a sudden the skillset needed to sell financial and analytics software and insights to people who run hedge funds and investment firms or even to do corporate competitive tracking. Because to me it’s just like, I imagine they’re going to ask about, like, “What’s your background? How do I know? How does that start?” What made you realize that, “We could probably do this?”
Kevin Hale [09:12] – It comes down to, what is the fundamental problem being solved? The fundamental problem is that you’ve got somebody who probably isn’t a coder and they want to answer a question of behavioral data.
Michael Babineau [09:27] – You write video games and then how do you decide what was the first product going to be?
Kevin Hale [09:31] – For us it was really digging in further to understand what types of data investors were most interested in. What we found is that transactional data, like specifically credit card transaction data, is one of the things that they’re really excited about but they were banging their heads against it, right? This is fundamentally, like credit card transaction data, it’s a messy data set with unstructured data problems baked into it. The skillsets of investors, even the more technical ones, those tend to lean more towards time series analysis as opposed to dealing with large, messy data sets.
Michael Babineau [10:16] – What kind of questions were the investors interested in? From that data set?
Kevin Hale [10:22] – One of the main things is just how is Chipotle doing, right? Like are they, so they famously had a food poisoning incident a couple of years ago. Actually I think they had several, but eager investors wanted to know what is the impact to their actual revenue?
Michael Babineau [10:44] – How come there was no way to answer this before you guys came onto the scene?
Kevin Hale [10:50] – This is one of the interesting things. There actually was a way to answer it. It was just a terrible path. The way to answer this before was with a survey. Right, so you go to some market research company and you say hey, there’s this Chipotle, the whole food poisoning thing. Can you help me understand how many people stopped going to Chipotle?
Michael Babineau [11:13] – And they just have to like try to find a bunch of people that match a demographic and then hope these people answer. Are you going to represent, like…?
Kevin Hale [11:20] – Exactly, it takes weeks or months. It costs tens or hundreds of thousands of dollars. You end up with this tiny sample of, you know, like, “Oh good, we got 100 respondents and they said that they,” you know, “From this pool of 100, 20 of them said that they have considered stopping their Chipotle dining altogether.”
Michael Babineau [11:44] – And then what do you guys do instead?
Kevin Hale [11:46] – For us, because we have direct observations of millions of U.S. consumers, we see all the purchases. We can just look it up right away. In fact, we don’t even need to look it up. We can just give you a tool and then you can find the answer yourself.
Craig Cannon [12:00] – You got your first customers during YC. Correct?
Kevin Hale [12:04] – Yes.
Craig Cannon [12:05] – How did you go about even getting them?
Kevin Hale [12:07] – I have to think back. This is 2015. Our very first customer was a VC who also ended up investing in us. One of the things that was interesting is that actually this is one of the things where we got to, I feel like we got to cheat a little bit because we were in YC. Because we were in YC, VCs are always excited to talk to YC companies and that’s…
Michael Babineau [12:36] – They’re trying to figure out who is in the batch and then try to invest before Demo Day.
Kevin Hale [12:40] – That’s exactly right. We had a whole bunch of these funny meetings where we’re trying to get in front of them to pitch them on a product and they’re happy to take the meeting because they want to hear about what we do. It ends up being this dual purpose thing where they’re like, “Okay, show me the product. Now tell me your business model.” And you’re like, “Well, would you like to buy the product?” And fortunately a lot of times the answer did end up being yes. Now most of the VCs here in the Bay Area, they are our customers. But it was really interesting navigating those early conversations.
Michael Babineau [13:17] – What were they excited about? Because with credit card data, there’s some things that it’s really good at showing and identifying and some things that are not so good. For example it tends to be great for predicting consumer trends but…
Kevin Hale [13:29] – You just have to keep in mind like what is it we’re actually seeing? What we’re seeing is spending for a large proportion of U.S. consumers. If you want to understand a company that doesn’t target consumers, if it doesn’t target specifically U.S. consumers and more specifically if it doesn’t sell things directly to them, right? We’re not going to see General Mills, right? That’s all sold through grocery stores. But if it’s something that you might see on your credit card statement then those are the things that we can help with.
Michael Babineau [14:04] – Like Uber, Lyft.
Kevin Hale [14:06] – Exactly.
Michael Babineau [14:07] – All the meal kits, like Gobble, et cetera. And then what you’re not going to see is like B2B enterprise companies, et cetera. But it tends to be like lots of people are interested in consumer stuff because they’re like the fastest growing, most interesting segment.
Kevin Hale [14:19] – Exactly, yeah exactly. The market is more than big enough.
Craig Cannon [14:24] – Are they using it as a market sizing tool first? Because if you’re investing in a seed stage company…
Kevin Hale [14:29] – Yeah so probably the primary use case among VCs is actually diligence, right? And when you think about it, like put yourself in the shoes of a venture capitalist. So some company walks in and they say, they throw some numbers on some slides, they show it to you, and you’re like, “Okay, great. I have lots of follow-on questions. Do I try to get the numbers from you? And then additionally there’s a whole bunch of questions I have about your market which you may not even know the answers to.” A good example of this would be if you are, so as a VC somebody comes in and pitches you. They’re in, actually let’s just talk about Bird and Lime. Right, so imagine you’re a VC. Bird comes in and pitches you. And they show you this chart and it’s the perfect hockey stick chart. And you’re like, “This is amazing. I’ve never seen growth like this before.” At the same time though, you’ve heard of other companies, you know Lime is out there, you’ve heard of maybe Jump bikes.
Michael Babineau [15:33] – You want to pick the best one.
Kevin Hale [15:34] – Exactly, like are you talking to number one, are you talking to number two? You know, like what? And then also fundamentally are the, like if Bird is showing good unit economics, is that best in class? Or could it be even better? And this is an area, like this is one of the key areas where we help VCs, is in giving them visibility not just into the company they’re talking to, but into their competitors, right? Into each of those, like every company in that space in relation to one another. We can say like, “Oh yeah, Bird, Lime. Like here’s where Bird’s winning, here’s where Lime’s winning, right? Here are where the differences are in how well those customers perform. How much they spend and so on and so forth.”
Craig Cannon [16:23] – When you say unit economics, how do you uncover that data?
Kevin Hale [16:26] – So we don’t see unit economics, I’m sorry. So obviously we don’t see the cost side, we just see the spending side.
Craig Cannon [16:33] – Right, so you could say an average Bird customer spends $40 a week versus a Lime customer that might be 20.
Kevin Hale [16:39] – Exactly, and generally, like again, if you’re a VC you have your own ideas for how to estimate the cost side of the equation.
Craig Cannon [16:50] – Okay, gotcha.
Michael Babineau [16:51] – What other metrics are you able to show? I was always impressed when looking at the dashboard inside of Second Measure, like, “Wow, I can not just see how much revenue is being pulled in, but also things like cohorts, lifetime value, et cetera.” What metrics get investors super excited? You got to share them.
Kevin Hale [17:14] – Taking a step back, let’s think about like what are the main problems that we’re trying to solve? One is generally focused on company performance, right? And this includes things like competitive intelligence and benchmarking, right? Like show me, what is, I don’t know. What is the relative market share of the various meal kit players? How long do their customers stick around? Right, how much do they spend over time, right? Like what are the lifetime sales after 12 months? And again, if we split those into different cohorts, are newer cohorts performing better or worse than older cohorts? There’s all of these things in and around company performance. Then separately, there’s stuff around consumer behavior, right? These are things like where else do my customers shop? Things intended to help you get a better picture of who your customers are and really help you hone in on who your best customer is. I’m saying you but really it could be you, it could be your competitor, it could be a company you’re doing diligence on, some target company.
Craig Cannon [18:31] – What are some good examples of that? Because your blog is basically just this, right? It’s like just insights.
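As a rough illustration of the "lifetime sales after 12 months, split by cohort" question, here is a small Python sketch. It is not Second Measure's methodology. It assumes a hypothetical transaction layout of (customer, date, amount), defines a cohort as the calendar month of a customer's first purchase, and counts whole calendar months rather than exact days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount_usd).
transactions = [
    ("a", date(2017, 1, 5), 60.0),
    ("a", date(2017, 3, 9), 60.0),
    ("a", date(2018, 2, 1), 60.0),   # outside the 12-month horizon
    ("b", date(2017, 1, 20), 40.0),
    ("c", date(2017, 6, 2), 50.0),
    ("c", date(2017, 7, 2), 50.0),
    ("c", date(2017, 8, 2), 50.0),
]


def months_between(start, end):
    """Whole calendar months from start to end (ignores day of month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_sales_by_cohort(transactions, horizon_months=12):
    """Average per-customer spend within the horizon, keyed by (year, month) cohort."""
    first_purchase = {}
    for customer, day, _ in transactions:
        if customer not in first_purchase or day < first_purchase[customer]:
            first_purchase[customer] = day

    spend = defaultdict(float)
    for customer, day, amount in transactions:
        if months_between(first_purchase[customer], day) < horizon_months:
            spend[customer] += amount

    cohort_totals = defaultdict(list)
    for customer, total in spend.items():
        start = first_purchase[customer]
        cohort_totals[(start.year, start.month)].append(total)

    return {
        cohort: sum(totals) / len(totals)
        for cohort, totals in sorted(cohort_totals.items())
    }


cohort_sales = average_sales_by_cohort(transactions)
print(cohort_sales)
```

Comparing the values across cohort keys is one way to ask whether newer cohorts are performing better or worse than older ones. A fair comparison would also need every cohort to have had the full 12 months to spend, which this sketch does not check.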
Kevin Hale [18:36] – Yeah, yeah, it’s interesting, right? Because our core product is really about empowerment and saying, “Hey, you as a user, you can answer whatever questions you want within the space of U.S. consumer spending.” But then, and we don’t sell research.
Michael Babineau [18:54] – Oh, so you don’t answer questions for people directly?
Kevin Hale [18:58] – We’ll do it on a case by, on like a project by project basis, but we’re not the ones coming up with the questions, right? If somebody comes to us and says I have this specific question. I tried this in your application. I can’t quite answer it yet. I have this more specific question. Could it be answered? Those are cases where we can do a one-off research project. Those are paid projects, but we don’t publish those. The thing we don’t do is we don’t proactively do research and go out and call up 10 of our clients and try to sell it to them.
Michael Babineau [19:32] – Gotcha, what’s some stuff that you guys have put on the blog recently that’s your favorite?
Kevin Hale [19:36] – One thing we’ve started doing is, actually if we talk about our blog, we also need to talk about our press mentions. So we actually work with the press a whole ton. Right, and so we keep getting quoted in like Wall Street Journal, Financial Times, et cetera. This has been great for us. It’s great for the reporters too because they’re trying to write about the upcoming potential Lyft IPO or whatever and they want to support their reporting with more information and we can help provide them with that information. We’re happy to do so. The Uber, Lyft thing is like a recurring topic and so in our blog we decided, you know what? We’re just going to keep publishing periodically. Publishing updates on that.
Michael Babineau [20:23] – When you choose a question you want to ask about Uber versus Lyft, did you guys come up with the initial questions and now you listen to what the press are asking you that they want to verify, or is it always you guys coming up with them?
Kevin Hale [20:36] – I’d say it is us always coming up with it. We actually have a dedicated editorial team. We literally have a team of data scientists and writers who just pay attention to what’s going on in the news, what’s going on with companies that could potentially be interesting to others. The person who runs it, she has a journalistic background. This is their core focus, right? Is find interesting things to write about and then write about them.
Craig Cannon [21:09] – Let’s talk about some examples. So before we started recording, one you mentioned was Stitch Fix and where the customers of Stitch Fix do and do not maybe spend.
Kevin Hale [21:20] – This is a really interesting thing, right? Because part of understanding what questions people are asking is just going out and talking to people. One recurring question we heard about Stitch Fix was, is Stitch Fix cannibalizing department store sales? Are they competitive with department stores? We decided to dig in. We had no idea what the answer was, but we decided to dig in and we attacked the problem by basically saying okay, let’s look at people’s spending at department stores before and after they become a Stitch Fix customer. What we found is that Stitch Fix had no impact on department store spend, right? People just started spending more on clothes period. Right and in fact, Stitch Fix’s best customers actually spent even more on clothes after becoming a Stitch Fix customer than before.
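The before-and-after comparison described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the analysis Second Measure ran. The category labels, the 180-day window, and the function name are all hypothetical, and the sketch makes no attempt to control for seasonality or for how much history each customer has.

```python
from datetime import date

# Hypothetical transactions: (customer_id, merchant_category, purchase_date, amount_usd).
transactions = [
    ("a", "subscription_box", date(2018, 4, 1), 80.0),
    ("a", "department_store", date(2018, 2, 10), 100.0),
    ("a", "department_store", date(2018, 5, 15), 110.0),
    ("b", "subscription_box", date(2018, 6, 1), 80.0),
    ("b", "department_store", date(2018, 3, 3), 50.0),
    ("b", "department_store", date(2018, 8, 20), 50.0),
    ("c", "department_store", date(2018, 1, 1), 70.0),  # never subscribes, so ignored
]


def before_after_spend(transactions, new_category, tracked_category, window_days=180):
    """Total tracked-category spend in the window before and after each
    customer's first purchase in new_category."""
    first_new = {}
    for customer, category, day, _ in transactions:
        if category == new_category:
            if customer not in first_new or day < first_new[customer]:
                first_new[customer] = day

    before = 0.0
    after = 0.0
    for customer, category, day, amount in transactions:
        if category != tracked_category or customer not in first_new:
            continue
        delta = (day - first_new[customer]).days
        if -window_days <= delta < 0:
            before += amount
        elif 0 <= delta < window_days:
            after += amount
    return before, after


before, after = before_after_spend(transactions, "subscription_box", "department_store")
print(before, after)
```

If the new service were displacing department store spend, the "after" total would be expected to fall relative to "before". A flat or rising number is consistent with the friend-rather-than-foe reading in the conversation.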
Michael Babineau [22:24] – Oh, like Stitch Fix inspired them to go out and find more clothes or to buy more.
Kevin Hale [22:28] – Yeah, I think one way to characterize it is that it piques their interest in fashion and so they don’t spend any less, they just…
Michael Babineau [22:38] – But part of it is like it probably jump starts a variety. They’re like, “Oh, I’m introduced to a variety of stuff I never would have considered beforehand,” and now it’s like, “Oh, now when I’m out there in the real world looking at stuff,” I’m like, “Oh, there’s more things that might appeal to me because I’ve been exposed to them.”
Kevin Hale [22:53] – The key thing is that it’s not displacing the spend, right? That was a real surprise. It’s also a really important question to answer because if you’re at a department store and you’re trying to figure out is this Stitch Fix friend or foe? Right, like this really points more to friend.
Craig Cannon [23:12] – Do you actively track the rise and fall of brands? Because I’m wondering there must be instances of certain things being swapped out. On a recent post was Peloton memberships going up ahead of SoulCycle, right? That’s really interesting. Are there trades happening that you can follow?
Kevin Hale [23:29] – So sorry, when you say trades, do you mean people…
Craig Cannon [23:34] – Sign up for Peloton instead of SoulCycle.
Kevin Hale [23:36] – We, again, this is something we will attack from an editorial perspective. But again, like our core business is about putting a product in front of our clients through which they can answer their own questions. Now on the blog side, yeah. I mean the Peloton and SoulCycle story is super interesting. Like Peloton is a beast. SoulCycle has some interesting… Actually, so after we came out with this article, SoulCycle, basically they had a nice non-denial denial where they basically said we don’t know what they’re talking about. Our numbers are great, but didn’t actually dispute the metrics.
Michael Babineau [24:21] – To give some context, what did your blog post say and then what was it that SoulCycle was nervous about?
Kevin Hale [24:26] – The short version is that Peloton has now surpassed SoulCycle in terms of the number of active Peloton members, right? And this is based on spending behavior. Active Peloton members on a monthly basis have surpassed the number of SoulCycle active riders on a monthly basis.
Michael Babineau [24:48] – Is there an overlap, like a Venn diagram of people who used to be SoulCycle and they’ve switched to Peloton?
Kevin Hale [24:54] – There is. There’s both a current overlap and there’s the Sankey diagram-type thing of people who used to be one and now are another.
Craig Cannon [25:02] – Have you been following how Amazon Basics has developed their products?
Kevin Hale [25:10] – I am generally familiar with it. I’d say for us, that is not something we have a lot of visibility into because–
Craig Cannon [25:20] – It’s Amazon.
Kevin Hale [25:20] – At the end of the day, we just see an Amazon.
Michael Babineau [25:20] – Just like a general Amazon.
Kevin Hale [25:23] – Exactly.
Michael Babineau [25:23] – But you’ve done some research about Amazon Prime people.
Kevin Hale [25:26] – We did. This is a case where we did a much deeper dive and we actually gave several talks on this. One thing, and this is spearheaded, again, by our editorial team. One of our data scientists, Brandon, so he dug into Amazon’s customer base and specifically he wanted to understand really the differences in behavior between Amazon Prime members and non-Prime members and how that’s changed over time and really how important Amazon Prime’s members are to Amazon. And I think one of the interesting takeaways is that increasingly Amazon is looking more and more like a subscription business. They’re increasingly reliant on Amazon Prime customers for their revenue. Then another interesting thing is that even people who became an Amazon Prime subscriber, even if they lapse, right? Even if they are no longer a subscriber, they’re still spending more on Amazon after than they did before.
Michael Babineau [26:43] – How do you get to that conclusion? What was the evidence that showed that oh, Amazon is more focused on subscriber? How did you guys get to that conclusion?
Kevin Hale [26:54] – I would characterize it that they’re less… It’s not that they’re more focused on subscribers, but instead that an increasing proportion of their revenue is derived from people who are Amazon subscribers.
Michael Babineau [27:04] – I got you. It’s one of these things where it’s like, “Oh, it’s turning out like Amazon’s most valuable revenue comes from the Amazon Prime subscribers.” And we don’t know the reason why, but there’s obvious things that people can say is important. For example, it’s just like, “Hey. They already pay for this membership, so they might as well use it when they’re ordering and buying stuff.” It’s an excuse to have something delivered to your house versus go to the store. Because I’m already paying for the membership. It’s like I cost something.
Craig Cannon [27:34] – When it comes to product development on your side, are you incorporating this data in any way or is it just talking to your users, developing product from there?
Kevin Hale [27:44] – When we think about improving our product, we have a few different streams for really feeding the backlog. One is internally driven, right? It’s based on where we know we want to take our application and also factors in us going out and proactively speaking with our own customers, doing that user research and really digging into their use cases and then figuring out where the gaps are and then attacking those. That’s one. Another is, I mentioned earlier that we do some custom research for customers. Think of it as a professional-services-like approach. This is something that also helps feed our backlog because if we see recurring requests, then you know what? This is probably something we should add to our product. Then finally we have the editorial side which for us is the best form of dogfooding, right? We can go in and try to use our app to answer a question. If we find that we hit a wall, right? It’s like, “Well, we’ve dug as far as we can go and now we have to go to the data behind it to answer the question.” That’s a great signal that this is something we should probably build.
Michael Babineau [29:03] – One thing that’s interesting to me is that I feel like we just recently talked to Jake Klamka at Insight Data Science and I feel like for data scientists, hearing about your company, this seems like a dream job. Like, “I work on interesting problems and questions and then even if it’s with your editorial board that’s figuring that stuff out, it seems fascinating to me,” is like, “Oh, every problem is going to be kind of different. We put that out there and whether it’s solving this stuff for your customers or stuff to promote the company.” How do you look at finding… Because you guys are hiring right now, right?
Kevin Hale [29:38] – Yes.
Michael Babineau [29:39] – Like how do you find a good data scientist and what are traits that you’re looking for that you know are going to be a good fit for this kind of nebulous work?
Kevin Hale [29:47] – Yeah, it’s such a good question. I feel like data scientist is such an overloaded and I think a bit overused, like an overstretched, term. I think for us specifically what we’re looking for are people who are scientists with a capital S who have very strong quantitative backgrounds and can understand from first principles the problems that they’re trying to solve. Very frequently what you find are people interested in data science, they learn a lot of the tools but maybe skip over the fundamentals.
Michael Babineau [30:26] – When you say are able to think from first principles, I think this is something I hear as a common theme also for people who’re looking for good engineers or product managers, et cetera. Like what does that mean exactly?
Kevin Hale [30:37] – Let’s think about it this way. A third of our company have PhDs, right? We’re basically equally, so most of the team is technical.
Michael Babineau [30:49] – How big is the team?
Kevin Hale [30:49] – We’re 60 people today. Most of the team is technical and it’s about a 50 50 split between engineers and data scientists. Now on the data side, what you’ll find is that we have people with backgrounds ranging from statistical genetics to cognitive neuroscience to string theory to earth science to climate science. Really all over the place. The common theme though is that all of them are extremely good in statistics, right? You’ve got this, there’s sort of this statistical foundation that in our opinion everything is built on top of. And it’s our view that if you come in with that strong math-y foundation, that learning the tools… The tools can be taught, right? We’re happy to help people get onboard with using Python. Like, “Okay, cool. You’ve only used R? Like that’s fine,” right? We can help you learn to switch over to Jupyter Notebooks. But the thing that we’re not going to teach you is we’re not going to teach you how to do math.
Michael Babineau [32:01] – And then how does that translate like into the first principles? Because I usually think of it like someone who’s willing to challenge… I will give someone a task and sometimes they will come back and say, like, actually can we just dive down as like what’s the reason behind this task and maybe just be able to be like, “Oh, actually I think I can improve the question we need to be looking into… Instead.”
Kevin Hale [32:22] – A lot of this ties in with the nature of the types of problems we’re trying to solve, right? You can’t, like there’s no playbook of best practices for dealing with the problems associated with transactional data, right? There’s no playbook on building an analytics platform focused on consumer spending behavior. A lot of the things that we’re doing, we’re either we’re doing them for the first time and in some cases maybe they are simply being done for the first time. It’s something where we benefit from people who can approach these big nebulous and open-ended problems. Come in and figure out how to structure and decompose the problem and then tackle it piece by piece.
Craig Cannon [33:10] – Do you train for that or you just hope that they have it? Like, what is the test, is my question really. Because it’s really just like, here’s a problem. But then before you get overwhelmed by the problem because often you’re told like hey, you have to take route A or B. Usually there’s options like C through infinity. You have to ask why. How do you, whether it’s through interviews or training, get that out of employees?
Kevin Hale [33:36] – I think of this as less something that we train people to do and more something that we hire for, like we screen for in the hiring process. So we’ve taken great care in designing and actually iterating on our interview process and I’d say that there is a significant technical evaluation where we’re trying to test for exactly these types of things. For a data scientist, one of the things that we do is we actually give them a big messy data set and we say it’s open-ended. Do some research and then present it to us. Tell us what you were looking for and tell us what you found.
Michael Babineau [34:19] – What’s some common mistakes that people do that end up not working out so well? What’s some stuff that the really great applicants have been able to do? I know I’m trying to help people like cheat on their…
Kevin Hale [34:31] – I’d say the number one mistake that people make is that they assume too much of the data. They assume the data’s perfect, right? They assume that what we give them, like, “Oh, this is easy. All I have to do is load it into whatever.” Like into Pandas or load it in, like, throw it on a database and just start running queries, get the answers, and then throw it into a slide and be done with it. Like that never really works because, like, and this just isn’t how data in our world works. There are always dragons somewhere. And so a big part of this exercise is, well how diligent were you in looking for dragons and anticipating these problems? And then you don’t necessarily need to solve all of them, but you need to be aware of them because they actually can distort your findings. As long as you, if you identify them and even if you have findings that are invalid but you’re able to identify that, “Hey, I found this thing, but I deliberately made the simplifying assumption so I could complete it in a reasonable amount of time,” like that’s fine.
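As an illustration of the kind of "looking for dragons" described here, the following is a minimal sketch of a pre-analysis audit. It is not Second Measure's interview rubric or tooling; the field names (`card_id`, `date`, `description`, `amount`) and the three checks are assumptions chosen for the example.

```python
def audit(rows):
    """Count a few common problems in transaction rows before analysis.

    `rows` is a list of dicts; the keys used here are hypothetical.
    """
    seen = set()
    report = {"missing_amount": 0, "negative_amount": 0, "duplicate": 0}
    for row in rows:
        # Treat a row as a duplicate if all four fields repeat exactly.
        key = (row.get("card_id"), row.get("date"),
               row.get("description"), row.get("amount"))
        if key in seen:
            report["duplicate"] += 1
        seen.add(key)

        amount = row.get("amount")
        if amount is None:
            report["missing_amount"] += 1
        elif amount < 0:
            # Could be a refund rather than an error; flag it either way.
            report["negative_amount"] += 1
    return report


sample = [
    {"card_id": 1, "date": "2018-01-02", "description": "S BUCKS", "amount": 4.5},
    {"card_id": 1, "date": "2018-01-02", "description": "S BUCKS", "amount": 4.5},
    {"card_id": 2, "date": "2018-01-03", "description": "MW SAN CARLOS", "amount": None},
    {"card_id": 3, "date": "2018-01-04", "description": "MACYS", "amount": -20.0},
]
report = audit(sample)
```

The point of such a report is less to fix every issue than to be able to state which simplifying assumptions the findings rest on.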
Michael Babineau [35:49] – So the good people, what they’re good at is not starting from their own assumption, but actually trying to query and figure out what were the assumptions that I’m working with?
Kevin Hale [35:59] – Yeah, exactly. Whether it’s in the data, the question, et cetera. Once you have that, it helps you understand how strong or how weak is my ultimate conclusion going to be as a result. It’s sort of like building a house, right? If you were to hire a construction crew to come out and build a house and they just came out on site and they just started erecting walls and then they hand over the keys, you slam the front door and the whole thing falls over because it was on a shaky foundation, right? Then like clearly they failed. And so for us, what we like is to find people who really like to understand the foundation that they’re working with to make sure that it will be sound when they build the house.
Craig Cannon [36:38] – I’ve never done a project involving credit card data. But then I use these tools like Mint and it consistently classifies things as the wrong thing. Can you explain to me why this stuff is not normalized because it seems incredibly valuable, potentially not that difficult, obviously it is difficult, but why isn’t it normalized? Why do you have to clean it all?
Kevin Hale [37:02] – I guess maybe the easiest place to start is think about your last credit card statement, right? Think about the time where you’ve looked at your credit card statement and you saw a transaction on there and it says something like S Bucks or, I don’t know, MW San Carlos, which would be like Men’s Wearhouse San Carlos. It doesn’t say Men’s Wearhouse, it doesn’t say Starbucks. It says something which if you squint at it and scratch your head a little bit, like you as a human can probably figure out what it is. Now the problem is that there are many different companies all putting in some piece. Actually, the fundamental problem here is that some human decided how to represent that store in a credit card statement and they’re working within this constraint of a limited space. They only have a certain number of characters and they have to type something in, which again communicates to a human that yeah, you were at Walmart so you don’t dispute the charge. But it was never designed for a machine to read. And so the result of this is that you end up with this cardinality problem, right? You end up with many different variants for a single merchant. And part of our job is to find all the variants and to map them back to that singular merchant.
Craig Cannon [38:39] – You’re saying there are multiple text strings associated with Men’s Wearhouse in San Jose or San Carlos or whatever.
Kevin Hale [38:46] – Correct. Within our data set, so we’re looking at 50 plus billion transactions. We have one billion unique transaction descriptions. I’ll tell you what, there are not one billion merchants in the U.S.
Craig Cannon [39:05] – Right, okay.
Kevin Hale [39:07] – Macy’s alone has like three million different representations.
Craig Cannon [39:13] – Yeah, I’m just kind of baffled that it was never like hey, Macy’s you’re store number 12,000, whatever, done.
Kevin Hale [39:19] – There’s basically two layers of problems. One is the human layer, right? Where somewhere you’ve got a human and they’re setting up the point of sale system like the swiping device for a certain Macy’s store. Actually, let’s just talk about McDonald’s for a second. McDonald’s, you’ve got franchises. When somebody sets up the franchise, they work with a point of sale provider and they get their point of sale set up. And like, “Okay, what should this be?” It should be like McDonald’s, I don’t know, like F139. “Okay, great.” Right, now we’ve got this one location. The problem is, depending on how the transaction is processed, the apostrophe that you expected to appear in McDonald’s could be a space, could be a star, it could be deleted, right? Could just be McDonald nothing S. Right, and like basically the two problems, one is a human one where different humans could describe things differently. They can even typo the name of their own company, which happens. And then the second problem is there are various perturbations that can take place in the processing chain.
Michael Babineau [40:37] – Part of it was that the corrections had to happen by users of Mint. And I think humans don’t want to correct that data diligently. And also if it turns out like oh, I can see a human getting really frustrated where it’s like this is the 50th time I had to correct that this was coming from McDonald’s and therefore I no longer want to correct this anymore because this is just not any good. The problem actually is all of them are so different and so humans are giving up on the classification when really it’s like this is actually–
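The apostrophe example can be made concrete with a small sketch. This is not Second Measure's entity resolution method, which the conversation does not describe; it only shows how stripping punctuation and spacing collapses the processing-chain variants mentioned above. The `CANONICAL` table and the function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical lookup from a normalized prefix to a canonical merchant name.
CANONICAL = {
    "mcdonalds": "McDonald's",
    "starbucks": "Starbucks",
}


def normalize(descriptor: str) -> str:
    """Lowercase and keep only letters, so punctuation variants collapse."""
    return re.sub(r"[^a-z]", "", descriptor.lower())


def resolve(descriptor: str) -> Optional[str]:
    """Map a raw statement descriptor to a canonical merchant, if known."""
    key = normalize(descriptor)
    for prefix, name in CANONICAL.items():
        if key.startswith(prefix):
            return name
    return None


variants = ["McDonald's F139", "MCDONALD S F139", "MCDONALD*S F139", "MCDONALDS F139"]
resolved = [resolve(v) for v in variants]
```

All four variants resolve to the same merchant, but a human-chosen abbreviation such as "S BUCKS" does not, and two different businesses that share a name would be merged. Simple normalization handles the processing-chain layer only, not the human layer.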
Craig Cannon [41:08] – I have such limited incentive to classify my own data. I don’t really care. I mean I’m sure some people do, but I don’t really care.
Kevin Hale [41:15] – I mean the problem gets even worse, right?
Michael Babineau [41:17] – Sometimes I don’t want to know. It’s like I need to sit in a fast food denial.
Craig Cannon [41:21] – Yeah, if Amazon was all classified in one category, that would not be good.
Kevin Hale [41:25] – If you’re coming into this with, I don’t know, a software mindset, right? You’re thinking oh yeah, there should be some unique identifier for Blue Apron. Right but if you actually just look at all the Blue Apron transactions, what you’re going to find out is that there’s actually more than one Blue Apron. Did you know that there’s a Blue Apron grocery store that’s in Brooklyn?
Michael Babineau [41:47] – Oh, that’s very close.
Kevin Hale [41:49] – Yeah, things like that. Or like United. United Airlines of course, but then there’s also a United grocery store. And they show up, in some cases they show up the exact same on your credit card statement.
Michael Babineau [42:03] – How much time are you guys spending cleaning up data? Is it like perpetual and nonstop?
Kevin Hale [42:08] – We don’t think of it as a fundamentally human… There are human elements of it, but really it’s something that we try to use machine-based approaches to operate as a giant lever. I guess we think of it this way, right? We’ve basically had to build two different products. One is this pipeline which ingests raw transactional data and then outputs something useful. The things that we do in that process are things like this entity resolution, which is what we’ve just been talking about with merchants. But it also includes other things like figuring out for an Uber transaction it says San Francisco, it always says San Francisco. But, you know, obviously not all Uber rides are in the city.
Michael Babineau [42:54] – Oh, looking at other transactions around it to see like, “Oh, maybe this originated somewhere else…”
Kevin Hale [42:59] – Exactly. We figure out the location of the purchaser based on where their other purchases are and that lets us fill in the gaps. So we say like, “Oh, you know what? Ignore this location for Uber and instead use this computed location.” There are other things that we needed to solve and then there’s this whole other thing around de-biasing, right? Because we basically have this longitudinal study going on. Right, we have this panel. The panel of consumers and obviously it’s not going to be a perfectly representative sample of the U.S. so we endeavor to figure out all the ways in which it isn’t representative and then apply corrections to make sure that whatever results you get do represent the greater population. Anyway, so that’s one thing that we’re building is this pipeline and we’ve got 10 to 15 people working on that. But then we also have our analytics platform, right? This is the, think of it as the hyper-specialized Tableau where we’ve built in lots of different analyses that operate on this nice clean data set that the pipeline has outputted.
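Both ideas in this answer, a computed location and panel de-biasing, can be sketched in a few lines. The conversation does not say how Second Measure implements either, so this is only one plausible reading: the location is taken as the most common city among a cardholder's other purchases, and the correction is simple post-stratification weighting. All names, fields, and numbers below are assumptions.

```python
from collections import Counter


def infer_location(transactions, unreliable_merchants):
    """Most common city among a cardholder's purchases, skipping merchants
    whose reported city is known to be unreliable (for example, a head office)."""
    cities = Counter(
        t["city"] for t in transactions
        if t["merchant"] not in unreliable_merchants
    )
    return cities.most_common(1)[0][0] if cities else None


def poststratification_weights(panel_counts, population_shares):
    """Weight each group so the weighted panel matches known population shares."""
    total = sum(panel_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in panel_counts.items()
    }


purchases = [
    {"merchant": "Uber", "city": "San Francisco"},
    {"merchant": "Grocer", "city": "Oakland"},
    {"merchant": "Cafe", "city": "Oakland"},
    {"merchant": "Uber", "city": "San Francisco"},
    {"merchant": "Gym", "city": "Berkeley"},
]
home = infer_location(purchases, unreliable_merchants={"Uber"})

# Made-up panel that over-represents one group relative to the population.
weights = poststratification_weights(
    panel_counts={"urban": 80, "rural": 20},
    population_shares={"urban": 0.5, "rural": 0.5},
)
```

Here the Uber rides are attributed to Oakland, and the under-represented group receives a weight above 1 while the over-represented group receives a weight below 1.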
Michael Babineau [44:12] – One increasingly growing set of customers for you guys are corporations doing this for I guess competitive analysis. How did that come up and so like why is that? I mean like I can see why it would be interesting to them, but I’m just wondering are they looking at questions very differently when they’re looking on your platform to answer them?
Kevin Hale [44:35] – This is a really interesting journey for us because we started out building a platform that was focused on helping investors understand company performance. YC hammers in that you need to focus, focus right? That it’s not, like it’s better to have something a small number of people love than something that many people just like. We really took that to heart and we didn’t want to work with companies for a long time because we were afraid that it would spread out our focus. One of the things that changed our thinking was this. So there’s a book from Clayton Christensen. He’s a professor at HBS and he wrote Innovator’s Dilemma. More recently, he published a book called Competing Against Luck and in it he talks about the theory of jobs to be done. The basic premise is that when you’re thinking about substitutes for your product, you shouldn’t be thinking about things that just look similar to your product. Instead you should be thinking about fundamentally what is the job that your customer is hiring your product to do? This changed the way we thought about focus because this whole time we’ve been thinking investors, investors, investors. But in truth, there are many different use cases for investors. A fundamental discretionary hedge fund, right? Think of it as a group of analysts who are working in Excel and trying to figure out is Stitch Fix a good, like poised for growth in the longer term? They have a very different use case from a quant investor who’s focused on
Kevin Hale [46:26] – like someone who has a purely systematic strategy and is trying to trade on the daily, weekly, or even like just quarter-to-quarter based on where they think companies are likely to beat or miss relative to expectation. These are different use cases. Now if we think about one of our core use cases as helping people understand company performance, then that’s when we begin to understand like okay, well investors want to know how a company is performing, but so do other companies. Companies want to know how their competitors are doing. We had a really convenient way into this because we were working with so many VCs, they were actually bringing our product into the boardroom. They were showing their portfolio companies and then the CEO would raise their hand and say, “Wait, how do I get that?”
Craig Cannon [47:23] – It’s an interesting sales strategy. Maybe you could speak to that a little bit more because there are so many YC companies and oftentimes people just think YC is just consumer. Very much not true. YC’s just software, also not true. How do you guys think about your sales process?
Kevin Hale [47:40] – Yeah, I mean this is an area of focus for us now. We were very, very fortunate to have just a ton of virality, which is like a funny thing to talk about in the context of really enterprise sales. But we actually haven’t done any outbound sales yet. We have 150 clients, every single one of them came to us through inbound, right? They basically… Somebody signed up and then they told their friend about us. Their friend reached out, loved what they saw, signed up, told their friends, and so on.
Michael Babineau [48:18] – It’s a box of secrets. To me it’s just like, “Hey, I have this thing and it lets me see stuff that I’ve never been able to see before.” That’s a very remarkable thing that’s easy to spread around.
Kevin Hale [48:31] – Exactly. Everyone knows that Uber is bigger than Lyft, but like how much? We can actually quantify it. I think that it’s a lot of fun and for certain people, right, it sort of unlocks a new way of doing their job and so it’s become like table stakes. That’s been great for us, but now we just raised our Series A, so that was led by Bessemer and co-led by Goldman Sachs and then we also had participation from Citi.
Michael Babineau [49:03] – Like Citibank?
Kevin Hale [49:04] – Correct.
Michael Babineau [49:05] – Goldman Sachs is huge. Those are such interesting partners or investors to be leading a round. Why were they super excited?
Kevin Hale [49:13] – I’d say the reasons are different for each. So we fall into this general category. When you’re talking about the investment world, we fall into this category of companies generally known as alternative data companies. Alternative data basically refers to any information that can help you understand how companies are performing that isn’t just the traditional reported fundamentals or stock prices or things like that. This collectively it’s referring to credit card data, satellite imagery, web traffic data, geolocation data from mobile devices, and so on.
Michael Babineau [50:01] – Gotcha.
Kevin Hale [50:02] – Goldman Sachs has made a big push into the alternative data space and they had not made an investment in any company dealing with credit card data and so we’re their horse in that race if you will.
Michael Babineau [50:18] – Awesome.
Kevin Hale [50:19] – They’ve been just phenomenal. I think here in the Bay Area there’s so much of… Everybody is focused on working with big traditional VCs, but I think we’ve actually had tremendous success working with sort of, like, I don’t know, less expected players out here. Our seed round was actually led by Jefferies, another investment bank. One thing that we found to be true for both Jefferies and Goldman is that they are extraordinarily well-connected in New York City, on the East Coast, with not just investors, but also with companies because they’re investment banks. They’ve been just tremendous in terms of helping us get in front of more of the types of clients we want. Now for Citi, of course they have a ton of transactional data and this is something that they, like, this is a pain point that they feel internally. All the things that I described about messy transactional data, they understand that.
Michael Babineau [51:33] – It seems odd to me that they wouldn’t have a handle on this already themselves.
Kevin Hale [51:39] – It’s a really, really hard problem. I can’t overstate that enough.
Michael Babineau [51:44] – Why are they so bad? Why is everyone else so bad?
Kevin Hale [51:47] – I wouldn’t say that everyone else is so bad. I think it’s just that–
Michael Babineau [51:52] – You’re so good?
Craig Cannon [51:53] – Or that their products are so profitable?
Kevin Hale [51:55] – I think it’s that people are focused on solving specific problems. And so I wouldn’t say that Mint is terrible at identifying, at understanding transactions, right? They’re good at different things because they’re focused on solving a different problem. Like mint.com is not trying to, sorry. They’re trying to solve the problem of, “You know what? We need a best guess as to what this transaction is, but we need to do it for all the transactions,” right? We flip that problem upside down. We say, you know what? We don’t care about most transactions. We only care about the 5,000 or so companies that we track and growing. We care about that and we can’t be wrong because if we’re wrong, somebody’s going to lose millions of dollars.
Michael Babineau [52:39] – So the constraints actually help make it much easier as a result of not having to focus on everything.
Kevin Hale [52:44] – Exactly, it makes the problem tractable. And because we’re focused on that, what we’re discovering is that there are surprisingly interesting applications of this thing that we built for this hyper specific use case. Suddenly we’re finding out that, “Oh, this could help this type of company,” I don’t know, find new customers. It’s a company that sells to other businesses and they want to find fast growing businesses so they can sell to them. I think this has been one of the interesting parts about our journey, is discovering really by accident all of these additional use cases that we really didn’t anticipate.
Michael Babineau [53:30] – One thing that’s tricky and it’s probably one of these great problems to have as a company is that if you’re people’s secret weapon and it becomes table stakes to be like, “Hey, if we want to stay ahead of the game and I have to,” like Bloomberg is a good example. It’s like, “Oh, I have to sign up for Bloomberg if I’m a trader to use this.” And I think Second Measure might easily fall into that category as well for a lot of investors. I feel like the tricky part is then if all of a sudden now everyone is using it, like, how do you evolve the product? Like how do you keep it interesting?
Kevin Hale [54:03] – Yeah so–
Michael Babineau [54:04] – Like keep people on board versus jumping ship or trying to find some other solution.
Kevin Hale [54:08] – This is a really, really good point in particular for the investment audience. Because investors are looking, like they make money off of information edge. They make money off of knowing things, things that other people don’t. This actually informed a lot about how we tackle this problem because we could have very easily focused on selling “insights” or “signal” to hedge funds, right? Where we say, “Oh, here are the most interesting,” I don’t know, trading signals, and we send those out. But as we add more and more customers, then the value to each one becomes significantly diluted. We took the view that in particular because transactional data, there’s no single owner of transactional data. There’s no way to control how many people have access to it. Why not just assume everybody’s going to have access to it one day and then focus on building a tool to help people answer more creative questions. Our view is that even if everybody has access to the same data, that if they simply focus on asking better questions, they’ll still find their own edge. Now that’s for the investment community though. On the corporate side, really the fact that somebody else uses the product.
Michael Babineau [55:41] – That would be delightful. It’s like every major corporate company is like we have to use this for competitive analysis.
Craig Cannon [55:47] – If the worst case scenario was you were Bloomberg, you’d be okay.
Kevin Hale [55:52] – Bloomberg’s doing just fine.
Craig Cannon [55:54] – All right, awesome Mike. Thanks for coming in.
Michael Babineau [55:56] – Oh definitely, thank you.
Craig Cannon [55:58] – All right, thanks for listening. As always, you can find the transcript and video at blog.ycombinator.com and if you have a second it would be awesome to give us a rating and review wherever you find your podcasts. See you next time.