Building Dota Bots That Beat Pros - OpenAI's Greg Brockman, Szymon Sidor, and Sam Altman

by Y Combinator, 11/8/2017

Greg Brockman is the CTO and cofounder of OpenAI.

Szymon Sidor is a Research Scientist at OpenAI.

Sam Altman is the President of Y Combinator and Co-Chairman of OpenAI.

Watch their bot compete at The International.





Transcript

Craig Cannon [00:00:00] – Hey, this is Craig Cannon, and you’re listening to Y Combinator’s podcast. Today’s guests are Greg Brockman, Szymon Sidor, and Sam Altman. Greg is the CTO and co-founder of OpenAI. Szymon is also at OpenAI. He’s a research scientist there, and before we get going, if you haven’t yet subscribed or reviewed the podcast yet, that would be awesome if you did. Alright, here we go.

Greg Brockman [00:00:22] – Now if you look forward to what’s going to happen over the upcoming years, the hardware for these applications, for running neural nets really, really quickly, is going to get fast, faster than people expect, and I think that what that’s going to unlock is you’re going to be able to scale up these models, and you’re going to see qualitatively different behaviors from what you’ve seen so far. At OpenAI we see this sometimes. For example, we had a paper on unsupervised learning where you train a model to predict the next character in Amazon reviews, and just by learning to predict the next character in Amazon reviews somehow it learned a state of the art sentiment analysis classifier. And so it’s kind of crazy if you think about it, right? You were just told, hey, predict the next character, and if you were told to do this, well, the first thing you’d do is you’d learn the spelling of words and you’d learn punctuation. The next thing you do is you start to learn semantics, right? If you have extra capacity there. And this effect goes away if you use a slightly smaller model. And what happens if you have a slightly larger model? Well, we don’t know because we can’t run those models yet. But in upcoming years we’ll be able to.
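
For concreteness, here is a minimal, illustrative sketch of the next-character prediction objective Greg is describing, written as a small PyTorch training step. It assumes a plain LSTM over raw bytes; the actual OpenAI model and data pipeline were different. The point is just the objective: cross-entropy on the next character, with no sentiment labels anywhere.

```python
# Illustrative next-character language model (a sketch, not the OpenAI model).
import torch
import torch.nn as nn

VOCAB = 256                      # treat raw bytes as the character vocabulary

class CharLM(nn.Module):
    def __init__(self, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, VOCAB)

    def forward(self, x):                     # x: LongTensor [batch, time]
        h, _ = self.lstm(self.embed(x))
        return self.head(h)                   # logits for the next character

model = CharLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(batch):                        # batch: LongTensor [batch, time] of bytes
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```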

Sam Altman [00:01:25] – What do you guys think are the most promising under explored areas in AI? If we’re trying to make it come faster, what should people be working on that they’re not?

Szymon Sidor [00:01:34] – Yeah, so, there are many areas of AI that we have already developed quite a bit; there is some basic research in just classification, planning, and reinforcement learning. And what people do is they kind of try to invent problems, such as solving some complicated games, and they try to add extra features to their models to combat those problems, but I think there’s very little research happening on actually understanding the existing methods and their limits. For example, it was a long held belief in deep learning that to parallelize your computation, you need to cram as small a batch as possible onto every device. And people did this impressive engineering feat where they took recurrent networks and implemented them in kind of GPU assembly code to make sure that you can fit small batch sizes on every GPU, and you know, despite all the smart people working on this problem, only recently did Facebook take a look at the very basic problem of classification. In their great effort, called ImageNet in one hour, they showed that if you actually take the code that does image classification and you fix all the bugs, you actually can get away with a much larger batch size and therefore finish the training much faster. And it’s not the kind of sexy research that people want to see, where you have hierarchies or some other add-ons. But actually this kind of research, I think, at this point will advance the field the most.

Craig Cannon [00:03:31] – Greg you mentioned hardware in your initial answer, in the near term what are the actual innovations that you foresee happening?

Greg Brockman [00:03:39] – The big change is that the kinds of computers that we’ve been trying to make really fast are general purpose computers that are built on the von Neumann architecture. You basically have a processor, you have a big memory, and you have some bottleneck between the two. With the applications that we’re starting to do now, suddenly you can start making use of massively parallel computers. The architecture that these models run on fastest is going to look kind of like the brain, where you have a bunch of neurons that all have their own memory right near to them, and they all talk to their neighbors. Maybe there’s some kind of longer range skip connections. And no one’s really had incentive to develop hardware like this, and so what we’ve seen is, well, you move your neural networks from running on a CPU to a GPU and now suddenly you have a thousand CUDA cores running in parallel and you can get a massive performance boost there. Now if you moved to specialized hardware that is much more brain like and that runs in parallel with a bunch of tiny little cores, you’re going to be able to run these models sort of insanely faster.

Craig Cannon [00:04:52] – One of the most common questions or threads of questions that we’re asked on Twitter and Facebook was generally how to get into AI. Could you guys give us just a primer of where someone should start if they’re just a CS major in college?

Szymon Sidor [00:05:09] – Yeah, absolutely. It really depends on the nature of the project that you would like to do. I can tell you a bit about our project, which is essentially developing large scale reinforcement learning for Dota 2. And the majority of the work is actually engineering. And you know, essentially taking the algorithms that we have already implemented and trying to scale them is usually the fastest way to get improvement in our experiments. Essentially, becoming a good engineer is much more valuable for our team than, for example, people spending months upon months implementing exotic models…

Sam Altman [00:06:07] – Just to echo this, because I hear this come up all the time. People say it’s like my dream to work at OpenAI but I’ve got to go get an AI PhD, so I’ll see you in like five or seven years. If someone is a really solid engineer but has no experience at all with AI, how long does it take someone like that to become productive for the kind of work at OpenAI that we’re looking for?

Greg Brockman [00:06:25] – Someone like that can actually become productive from day one, and for different engineers who show up at OpenAI, there’s a spectrum of where they end up specializing. There are some people who focus on building out infrastructure, and that infrastructure can range from, well, we have a Kubernetes deployment that we run on top of Cloud Platform, and building tooling and monitoring and sort of managing this underlying layer. It actually looks quite a bit like running a startup, in that a lot of the people who are most successful at that have quite a bit of experience running things at large scale in a startup or production environment. There’s kind of a next level of getting to the actual machine learning, where if you think of how machine learning systems look, they tend to have this like magical black box of machine learning at the core, and you actually try to make that core be as small as possible because machine learning is really hard. It needs a lot of compute, it’s really hard to tell what’s going on there, and so you want it to be as simple as possible, but then you surround it by as much engineering as you possibly can.

Sam Altman [00:07:29] – What percent of the work on the Dota 2 project would you guys say was what people would think of as machine learning science versus engineering?

Szymon Sidor [00:07:38] – As far as day to day work goes, this kind of work was almost nonexistent. It might have been a few person-weeks spent on that, compared to the time people spent on engineering. And I think maybe placing some good bets was one part of it.

Sam Altman [00:08:00] – Good bets on the machine learning side?

Szymon Sidor [00:08:02] – On the machine learning side, yeah. And they’re often more about what not to do rather than what to do.

Greg Brockman [00:08:10] – At the very beginning of the project, we knew we wanted to solve a game, a hard game. We didn’t know exactly which one we wanted to do because these are great test beds for pushing the limits of our algorithms and one of the great things about it too–

Sam Altman [00:08:24] – And just to be clear you guys are two of the key people. The entire team was like 10 people?

Greg Brockman [00:08:27] – 10 people. And these things are good test beds for algorithms, to see what the limits are, to really push the limits of what’s possible. And you know for sure that when you’ve done it, that you’ve done it. It’s very binary, testable. And so actually the way that we selected the game was we went on Twitch and just looked down the list of the most popular games in the world, and starting at number one is League of Legends. The thing about League of Legends is it doesn’t run on Linux and it doesn’t have a game API. And little things like that actually are the biggest barrier to making AI progress in a lot of ways. Looking down the list, Dota actually was the first one that had all the right properties. It runs on Linux, it has a big community around replay parsing, there’s a built in Lua API. This API was meant for building mods rather than building bots, and we were like, but we could probably use it to build bots. And one of the great things about Valve as a company is that they’re very into having these open, hackable games where people can go and do a bunch of custom things. And so kind of philosophically it was very much the right kind of company to be working with. We actually did this initial selection back in November, and we were working on some other projects at the time and so didn’t really get started until late in December, and one of the funny things is, by total coincidence in mid-December, Valve released a new bot focused API. And they were saying, hey, our bots are famously bad, maybe the community can solve this problem.

Greg Brockman [00:10:00] – We’ll actually build an API specifically for people to do this, and that was just one of those coincidences of the universe that just worked out extremely well. We were kind of in close contact with the developer of this API all throughout. At the very beginning of the project, well, what are you going to do, right? The first thing was we had to become very familiar with this game API, to make sure we understood all the little semantics and all of the different corner cases. And also to make sure that we could run this thing at large scale and turn it into a pleasant development environment. And so at the time it was just two of us. One person was working with the bot API, building a scripted bot, and so basically had to learn all the game rules and think really hard about how it works. The particular person who wrote it, Rafal (Jozefowicz), has played about three or four games of Dota in his life, but he’s watched over 1,000 hours of Dota gameplay and has now written the best scripted Dota bot in the world. That was a lot of just writing this thing in Lua, getting very intimately familiar with all of those details. In the meanwhile, what I was working on was trying to figure out how do you turn this thing into a Docker container, and so we had this whole build process. Turns out that Steam can only be in offline mode for two weeks at a time, and they push new patches all the time, so you needed to go from manually downloading the game and whatever to actually having an automated, repeatable process.

Greg Brockman [00:11:29] – It turns out that the full game files are about 17 gigabytes and that our Docker registry can only support five gigabyte layers, and so I had to write a thing to chunk up things into five gigabyte tarballs and put those in S3 and suck them back down. A bunch of things there where it was really just about figuring out what the right workflow is, what the right abstractions are. And then the next step was, well, we know we want to be writing our bots in TensorFlow in Python. How do you get that?
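
The chunking step Greg mentions can be sketched in a few lines; this is purely illustrative, not OpenAI’s build tooling, and the directory, bucket, and key names are hypothetical.

```python
# Illustrative sketch: split a large game directory into ~5 GB tarballs and
# upload them to S3 so each piece stays under a registry layer limit.
import os
import tarfile
import boto3

CHUNK_LIMIT = 5 * 1024**3          # ~5 GB per tarball
GAME_DIR = "/opt/dota"             # hypothetical path to the game files
BUCKET = "my-build-artifacts"      # hypothetical S3 bucket

s3 = boto3.client("s3")

def iter_files(root):
    for dirpath, _, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)

chunk_idx, chunk_size = 0, 0
tar = tarfile.open(f"chunk-{chunk_idx}.tar", "w")
for path in iter_files(GAME_DIR):
    size = os.path.getsize(path)
    # Start a new tarball once the current one would exceed the limit.
    if chunk_size + size > CHUNK_LIMIT and chunk_size > 0:
        tar.close()
        s3.upload_file(f"chunk-{chunk_idx}.tar", BUCKET, f"dota/chunk-{chunk_idx}.tar")
        chunk_idx, chunk_size = chunk_idx + 1, 0
        tar = tarfile.open(f"chunk-{chunk_idx}.tar", "w")
    tar.add(path, arcname=os.path.relpath(path, GAME_DIR))
    chunk_size += size
tar.close()
s3.upload_file(f"chunk-{chunk_idx}.tar", BUCKET, f"dota/chunk-{chunk_idx}.tar")
```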

Craig Cannon [00:11:53] – Why was that?

Greg Brockman [00:11:55] – Because with machine learning, you know, it’s actually quite interesting, a lot of the highest order bit on progress, just like having the game API is a high order bit, is also: can you use tools that are familiar and sort of easy to iterate with? Before the world of kind of modern machine learning, everyone had to write their code in MATLAB. If you had a new idea it’d take you two months to do it. Good luck making progress. And so it was really all about iteration speed, and so if you can get into the Python world, well, we have these large code bases that we’ve built up of high quality algorithms, and there’s just so much tooling built around it that that’s the optimal experience. And so the next step was to port the scripted bot into Python, and the way I did that was I literally just renamed all of the dot lua files to dot py, commented out the code and then started uncommenting function by function. And then you know, you run the function, you get an exception, you go and uncomment whatever code it depends on, as mechanically as possible. I tried to be like a human transpiler, and Lua’s one-indexed, Python’s zero-indexed, so you have to do that. Lua doesn’t distinguish between an array type and a dictionary type, so you kind of have to separate those two. But for the most part I did something that could have been totally mechanically done, and it’s great because I didn’t have to understand any of the game logic. I didn’t have to understand anything that was going on under the hood, I could just basically port

Greg Brockman [00:13:13] – over and it just kind of came together. But then you end up with a small set of functions that you do not have the implementations of, which are all of the actual API calls. And so I ended up with a file with a bunch of dummy calls, and I knew exactly which calls I needed, and then implemented, on top of gRPC, a protobuf based protocol where on every tick the game would dump the full game state, send the diff over the wire, and reassemble that into an in-memory state object in Python. And then all of these API methods would be implemented in Python. And so at the end of this, you know, it sounds like a bit of a Frankenstein process, but it actually worked really well. And in the end we had something that looked just like a typical OpenAI Gym environment. And so all you have to do is you say gym dot make this Dota environment ID. And suddenly, you’re playing Dota, and your Python code just has to call into some object that implements the Lua API. And suddenly these characters are running around the screen doing what you want. And so this was a lot of the kind of thing that I was working on, on the pure engineering side. And as time went on, Szymon and Jakub, Ben and Jay and others joined the project. And most people were building on top of this API, and really didn’t have to dig into any of the underlying implementation details. Personally, my one machine learning contribution to the project, I’ll tell you about that. ’Cause you know, my background is primarily startup engineering, building large infrastructure, not sort of machine learning, definitely not

Greg Brockman [00:14:45] – a machine learning PhD, I didn’t even finish college. I kind of reached a point where I’d gotten the infrastructure to a pretty stable point, and I felt like, alright, I don’t have to be fighting the fires here constantly, I have some time to actually focus on digging into some of the machine learning. One particular piece that we were interested in doing was behavioral cloning. One of the systems we had built was to go and download all of the replays that are published each day, and the way this works is that there are about 1.5 million replays available for public download. Valve clears them out after two weeks, and so you have to have some discovery process, you have to stick them in S3 somewhere. Originally we were downloading all of them every day and realized that was about two terabytes worth of data a day. That adds up quite quickly, so we ended up filtering down to the most expert players. But we wanted to actually take this data, parse it and use it to clone the behavior for a bot. I spent a lot of time with… It’s basically, you need this whole pipeline to download the replays, to parse them, to kind of iterate on that. To then train a model and try to predict what the behavior would be. And first, one thing I find very interesting is the sort of different workflow that you end up with when doing machine learning. Like there are a bunch of things that, when software engineers join OpenAI, are just very surprising. For example, if you look at a typical research

Greg Brockman [00:16:10] – workflow you’ll see a lot of files named like whatever the name of the experiment is one, two, three, four, and you look at them and they’re just like slight forks of the same thing and you’re like isn’t this what version control’s for, why do you do that? And after doing this cloning project, I learned exactly why. Because the things is if you have a new idea for okay I kind of got this thing working, and now I’m going to try something slightly different. As you’re doing the new thing, well machine learning is to some extent very binary, at the start it just like doesn’t work at all and you don’t know why. Or it kind of works but it has a weird performance and you’re not sure exactly is it a bug, how this data set works, like you just don’t know. And so if you’ve got it working at all, then you make a change. You’re always going to want to go back and compare to the previous thing you had running and so you actually do want the new thing running side by side with the old thing. And if you’re constantly stashing and unstashing and checking out whatever, then you’re just going to be sad. And there are a lot of kind of like workflow issues like that, that you just got to bang your head against the wall and then you see like I’ve been enlightened.

Craig Cannon [00:17:12] – Before we progress further on the story, can you just explain the basics of training a bot in a game like how are you actually giving it the feedback?

Szymon Sidor [00:17:20] – On a high level we are using reinforcement learning with self play, so what that means is, it’s not rocket science even though reinforcement learning sounds so fancy. Essentially what’s happening is we have a bot which observes some state of the environment and performs some actions based on that state. And based on those actions that it executes in the environment, it eventually, you know, either does well or poorly, and that’s something that we can quantify in a number. That’s one of the engineering problems: how to quantify how well the bot is doing, you need to come up with some metric. And then, you know, the bot gets feedback on whether it’s doing well or not, and then it tries to select the actions that yield that positive feedback, that high reward.
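
A hedged sketch of the “quantify how well the bot is doing” step Szymon describes: turn raw game signals into a single scalar reward. The signal names and weights below are made up for illustration, not the real metric.

```python
# Illustrative shaped reward from game-state snapshots (hypothetical signals).
def shaped_reward(prev, cur):
    r = 0.0
    r += 1.0 * (cur["last_hits"] - prev["last_hits"])   # reward farming progress
    r += 0.01 * (cur["hp"] - prev["hp"])                # penalize taking free damage
    r += 5.0 if cur.get("won") else 0.0                 # big terminal bonus for winning
    return r

# Example: two consecutive game-state snapshots from the bot's perspective.
prev = {"last_hits": 10, "hp": 600}
cur = {"last_hits": 12, "hp": 580, "won": False}
print(shaped_reward(prev, cur))   # 2.0 - 0.2 = 1.8
```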

Sam Altman [00:18:15] – To give us a sense for how well that works: the bot plays against itself to get better, so once you had everything working, how would the bot from day N do against the bot from day N minus one?

Szymon Sidor [00:18:28] – We have a story that kind of illustrates what to expect from those techniques. When we started this project, our goal wasn’t really to do research. I mean, on some high level it was, but we were very goal oriented. All we wanted to do was solve the problem. And the way it started, it was just Greg and Rafal, and Rafal was writing the scripted bot, so he just wrote out the logic: I think this is what the bot should do, if it sees a creep it attacks it, yada yada. And he spent like three months of his time. And Rafal’s actually a really good engineer, so we got a really good scripted bot. What happened then, he got to the point where he thought he couldn’t improve it much more, so we said, let’s try some reinforcement learning. And I was actually on vacation at the time, but an engineer, Jakub, worked on it throughout my vacation, and what happened I found super surprising. I leave, there is nothing; I come back, there is this reinforcement learning bot. And actually it’s beating our scripted bot. After like a week’s worth of engineering. Possibly two weeks. But it was something very small compared to the development of the scripted bot, and actually our bot didn’t have any assumptions about the game; it figured out the underlying game structure well enough to beat anything that we could code by hand, which was pretty amazing to see.

Craig Cannon [00:19:59] – And at what point do you decide to compete in the tournament?

Greg Brockman [00:20:03] – Well so maybe I should finish up my story. Sorry if it’s running a bit long.

Craig Cannon [00:20:06] – No it’s good.

Greg Brockman [00:20:08] – It’ll get good shortly. Just to finish up my machine learning contribution, so I basically spent about a month really learning the workflow, got something that like was able to do some signs of life like run to the middle. And like oh it knows what it’s doing, it’s so good. And it’s very clear when you’re just doing cloning that these algorithms learn to imitate what it sees rather than the actual intent, and so it would get kind of confused and kind of run, try to do some creep blocking or something. But the creeps wouldn’t be around so it would be zig zagging back and forth. And anyway I got this to the point where it was actually creep blocking reliably pretty well, and then at that point I turned it over to Jay who’s also working on the project, and he used reinforcement learning to fine tune that. And so suddenly it went from only understanding the actions rather than the intent, to suddenly it really knew what it was doing and kind of has the best creep block that anyone has seen. And that was my one machine learning contribution to the project.

Craig Cannon [00:21:04] – Congrats.

Greg Brockman [00:21:05] – So time went on, and one of the most important parts of the project was having a scoreboard. We had a metric on the wall, which was the TrueSkill of our best bot. TrueSkill is basically like an Elo rating that measures the win rate of your bot versus others. You put that on the wall, and each week people try all the ideas, and some of them work. Some of them improve the performance. And we actually ended up with this very smooth, almost linear curve, which we posted in a blog post. That really means an exponential increase in the strength of this bot over time. And part of that is, sometimes these data points were just you trained the same experiment for longer, and typically our experiments last up to two weeks. But also a lot of those were where we had a new idea, we tried something else, we made this tweak, we added this feature, removed this other component that wasn’t necessary. And so we chose the goal of 1v1. I don’t recall exactly when, but it must have been in the spring or maybe even early summer. And we really didn’t know, are we actually going to be able to make it? And unlike normally, when you’re building an engineering system, you think really hard about all the components, and it’s like, well, you decompose them into this subsystem, that subsystem, that subsystem, and you can measure your progress as what percent of the components are built. Here, you really have ideas that you need to try out. And it’s sort of unpredictable in some sense.
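
As an aside, the scoreboard Greg describes maps directly onto the open-source `trueskill` Python package; the sketch below is illustrative rather than OpenAI’s internal tooling, and the bot version names are made up.

```python
# Illustrative TrueSkill scoreboard: each bot version gets a rating, and every
# evaluation game between versions updates the two ratings involved.
import trueskill

ratings = {}                                   # bot version name -> Rating

def get(name):
    return ratings.setdefault(name, trueskill.Rating())

def record_game(winner, loser):
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(get(winner), get(loser))

record_game("bot-v42", "bot-v41")              # hypothetical version names
print({name: round(r.mu, 1) for name, r in ratings.items()})
```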

Greg Brockman [00:22:29] – And actually one of the most important changes to the project in terms of making progress was in how the project management was happening. Initially we had written down our milestones, of let’s beat this person by this date, let’s beat this other person by this date, these kind of outcome based milestones on a weekly or bi-weekly basis. Those things would come and go, and you wouldn’t have hit them, and then what are you supposed to do? It’s completely unactionable, right? It’s not like there’s anything else you could have done, it’s just you have more ideas you need to try. So instead we shifted it to: what are all the things we’re going to try by next week?

Sam Altman [00:23:03] – That’s a good insight.

Greg Brockman [00:23:05] – And then you do that, and then yeah, if you didn’t actually do everything you said you were going to do, then you should feel bad, you should do more of it. And if you did all of those and it didn’t work, then fair enough, but you achieved what you wanted to achieve. And so even going into The International, two weeks before The International was kind of our cutoff, for at this point there’s not much more we can do, we’re going to do our biggest experiment ever, put all our compute into one basket and see where it goes.

Sam Altman [00:23:34] – And at that point like at two weeks out, how good was the bot?

Szymon Sidor [00:23:38] – It was sometimes, barely, beating the semi-professionals that we had testing it.

Sam Altman [00:23:45] – But not even always.

Szymon Sidor [00:23:48] – No, no, it sometimes happened.

Greg Brockman [00:23:53] – So to be specific, I’m just pulling this back up. July 8th is when we had our first win against our semi-pro tester, and then–

Szymon Sidor [00:24:01] – A sequence of losses.

Greg Brockman [00:24:05] – And then we were kind of more consistent with it. And then he went on vacation, and so he was on some laptop somewhere that was not very good, and then we were consistently beating him. But that was not very reliable data. This was the week before The International. And so we didn’t really know how good we were getting. We knew that TrueSkill was going up.

Sam Altman [00:24:20] – When was the last time that an OpenAI employee beat the bot? How far out was that?

Szymon Sidor [00:24:26] – I think like a month or two before TI.

Greg Brockman [00:24:33] – We’re not very good at Dota.

Sam Altman [00:24:35] – And so a month or two out, it could beat all the OpenAI people. Two weeks out it could one time beat a semi-pro.

Greg Brockman [00:24:41] – Four weeks out was the first time that it beat the semi pro, and then two weeks out we don’t know, we still can’t really find out. I mean, I guess we could rewind that bot, but you know, we really didn’t know how good it was at the time, we just knew, hey, we’re able to beat our semi pro occasionally. And going into The International, we figured that, hey, there’s a 50/50 shot. And I think we were telling Sam the whole way, with these things you never really trust the probabilities, you just trust the trend of the probabilities. That was just swinging wildly. You guys would text me every night and I would be, oh, no chance we’re going to win, or, we’re definitely going to win every game. Yeah, and so it was very clear their own estimates of what was going to happen were miscalibrated.

Szymon Sidor [00:25:27] – And throughout the week of TI, actually we still didn’t know. And what was happening is we–

Sam Altman [00:25:33] – You guys all went to Seattle for this week?

Szymon Sidor [00:25:35] – Most of the team went there, yes.

Sam Altman [00:25:38] – And so you’re like holed up in a hotel or conference center or something.

Szymon Sidor [00:25:41] – Well, actually, the reality of it was that we were holed up near the stadium.

Greg Brockman [00:25:47] – Let me describe how we were holed up. We were given a locker room in the basement of KeyArena, so we all had production badges so you feel very special as you walk in, you’re just like oh yeah, you know I just get to kind of skip the line and go to the backstage area. But it was literally a locker room that they converted into a filming area. And we all had our laptops in there, and they would also bring in pro players every so often. We had a whole filming set up. And then we’d play against the pros. And we had a partition that we set up which was just like a black cloth basically, between like the whole team sitting there being like are we going to be able to beat this pro? Maybe, and trying to be as quiet as possible. And these you know these pros we were playing, and on Monday they brought I think two pros and one very high ranked analyst by, and we had our first game, and we really didn’t know what was going to happen, and we beat this person 3-0. And this was actually a very exciting thing for everyone at OpenAI where it’s about the time I was kind of live slacking the updates, this person just said this and I’m like now it’s this many last hits.

Craig Cannon [00:26:59] – And were you winning by a large margin?

Greg Brockman [00:27:01] – Do you remember the details of that one? This is Blitz.

Szymon Sidor [00:27:06] – Oh, Blitz, I think we won every game.

Greg Brockman [00:27:09] – Yeah we did, 3-0. I don’t know exactly what the margin was, we have all the data, but Valve brought in the second pro, this professional named Pajkatt, and he played the bot and we beat him once, we beat him twice, and then he beat us.

Craig Cannon [00:27:22] – Oh, okay.

Greg Brockman [00:27:23] – And looking at the game, we knew exactly what had happened.

Szymon Sidor [00:27:28] – Yeah, essentially what happened is he accumulated a bunch of wand charges. There’s this item that accumulates charges. And he had accumulated more charges than our bot had ever seen in a game. Because our bots just don’t do that. It turns out that there was a small, I think it’s safe to say, a bug in our setup.

Craig Cannon [00:27:53] – Oh, okay. Passed some threshold that your bot was not ready for.

Greg Brockman [00:27:58] – Very specifically, the root cause here was that he had gone for an item, an early wand build, and we had just never done an early wand build. And so it’s just like our bot had just never seen this particular item build before. And so it never had a chance to really explore. What does it mean, and so it had never learned to save up stick charges and to use them and whatever and so it was very good at calculating who’s going to win a fight.

Craig Cannon [00:28:22] – Wild.

Greg Brockman [00:28:23] – But Pajkatt kind of recognized it, he’s like, I wonder what happens if I push on this axis. And sure enough it was an axis the bot hadn’t seen. So then we played a third match against another pro, and went 3-0 on that. And it’s actually very interesting getting the pros’ reactions, ’cause we also didn’t really know, are they going to have fun? Are they going to be cool with it? Are they going to hate it? And we got a mix of reactions. Some of the pros were like, this is the coolest thing ever, I want to learn more about it. One of the pros was like, this thing’s stupid, I would never use it. But apparently after the pros left that night, they spent four hours just talking about the bot and what it meant.

Szymon Sidor [00:28:56] – And the players were highly emotional in their reactions to the bot. They had never been beaten by a computer before. For example, the one player who actually managed to eventually beat the bot, he was like, okay, this bot is f’ing useless, I’d never want to see… And then he kind of calmed down, and after like five or ten minutes he was like, okay, this is actually great, this is going to improve my practice a lot.

Craig Cannon [00:29:23] – And so after your bot lost that first time, did they start talking about counter intuitive strategies to beat it?

Greg Brockman [00:29:30] – Well actually I don’t know. Maybe you can answer that particular question.

Szymon Sidor [00:29:38] – Yes, I don’t think the pro players are that interested in that; the pro players are mostly interested in the aspect where it lets them get better at the game. Which means that… but there was a point after the event where we set up this big LAN party, where we had like 50 computers running the bot, and it kind of unleashed this swarm of humans that went at our bot. And they found all the exploits, and we kind of expected them to be there, because the bot can only learn as well as the environment in which it plays allows it to. So there are some things that it has just never seen. And of course, those will be exploitable. And we are kind of excited about our next step, which is 5v5, because 5v5 is one giant exploit. Essentially it’s somewhat about exploiting the other team, like being where they don’t expect you to be, doing other unexpected things. So naturally we will have to solve those problems head on for 5v5.

Greg Brockman [00:30:40] – One thing I think was pretty interesting about the training process is that a lot of our job while we were doing this was seeing what the exploits were, and then making a small tweak that fixes them. And like the way that I now think of machine learning systems is that they’re really a way to make the leverage of human programmers go way up.

Craig Cannon [00:30:59] – Okay.

Greg Brockman [00:31:00] – Right, ’cause again, normally when you’re building a system you build component one, component two, component three, and your marginal return on building component four is similar to your marginal return on component one. Whereas here, a lot of the early stuff that we did is just, your thing goes from being crappy to slightly less crappy. But once we were at The International and we had just lost to Pajkatt, we knew, okay, well, the root cause here is just that it’s never seen this item build before. Well, all we had to do was make a tweak to add that to our list of item builds. And then it played out this scenario.

Craig Cannon [00:31:31] – Oh, okay.

Greg Brockman [00:31:33] – For the next however long.

Craig Cannon [00:31:35] – Can you walk me through how that tweak works on the technical side? Because my impression is, kind of what you guys have been saying, it’s just been in a million games. So it kind of has learned all this stuff, and some people talk about how these networks are just very gray boxes and they don’t actually know how to manipulate what… How are you guys getting in there and changing things?

Szymon Sidor [00:31:54] – Yeah, so it’s kind of funny, in some sense at a high level you can compare this task to teaching a human. You see a kid doing math and, suppose it’s kind of confusing addition with subtraction, and you need to say, kind of, look here at this symbol, this is what you’re not seeing clearly, right? And it’s the same with those tweaks to our bot. So clearly our bot had never seen this wand build that Greg mentioned. And, you know, all we had to do is say that when the bot plays games and chooses what items to purchase, we just need to add some probability of sampling that specific build that it has never seen. And when it plays a couple of games against opponents that use that build, and when it uses this build a couple of times itself, then it kind of becomes more comfortable with the idea of what happens, what are the in game consequences of that build.
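
A minimal sketch of the kind of tweak Szymon is describing, assuming a weighted list of item builds that self-play games sample from. The builds and weights here are placeholders, not the real configuration.

```python
# Illustrative weighted sampling of item builds for self-play games.
import random

ITEM_BUILDS = [
    (["tango", "salve", "branches"], 0.7),   # the usual build
    (["wand", "circlet", "branches"], 0.3),  # newly added early-wand build
]

def sample_build():
    builds, weights = zip(*ITEM_BUILDS)
    return random.choices(builds, weights=weights, k=1)[0]

print(sample_build())
```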

Craig Cannon [00:32:58] – Okay.

Greg Brockman [00:32:59] – I have a couple different levels to that answer that I think are pretty interesting. One is at a very object level: the way these models work is you basically do have a black box which takes in some list of numbers and outputs a list of numbers. And it’s very smart in how it does that mapping, but that’s what you get. And then you think of this as, this is my primitive. Now what do I build on top of that, so that as little work as possible has to be done inside of the learning here? And a lot of your job is, well, one thing that we noticed on Monday, well, it wasn’t that we’d forgotten it, we just hadn’t gotten around to it, was passing in the data that corresponds to the visibility of a teleport. As a human, you can see when someone’s teleporting out; our bot just did not have that feature. So the list of numbers passed in did not have that feature. And so one of the things you need to do is you need to add it. And your feature vector goes from however long it was to having one more feature on it.
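
A small illustration of the observation-engineering step Greg describes: the model’s input is a flat vector of numbers, so supporting teleport visibility means appending one more entry. The field names below are made up for illustration.

```python
# Illustrative observation encoder with one newly added feature.
import numpy as np

def encode_observation(state):
    features = [
        state["hero_hp"],
        state["hero_mana"],
        state["enemy_hp"],
        # ... many more existing features ...
        float(state["enemy_teleport_visible"]),   # the newly added feature
    ]
    return np.asarray(features, dtype=np.float32)
```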

Craig Cannon [00:33:58] – And the bot wasn’t recognizing that as an on-screen thing?

Sam Altman [00:34:01] – So, it doesn’t see the screen. It’s passed data from the bot API.

Craig Cannon [00:34:05] – Oh, okay.

Sam Altman [00:34:06] – And so, it really is given whatever data we give it. And so it’s kind of on us to do some of this feature engineering. And you want to do as much as you can, to make it as easy as possible, so it has to do as little work inside as possible so it can spend… You know you think of it as you got some fixed capacity, do you want to spend that on learning the strategy? Do you want to spend it on learning how to map, choose which creep you want to hit? Do you want to spend that on trying to parse pixels?

Greg Brockman [00:34:33] – At the end of the day, I think that a lot of our job as the system designers here is to push as much of that model capacity, and as much of that learning, towards the interesting parts of the problem that you can’t script, that you can’t possibly do any processing for. And so that’s kind of one level, that a lot of the work ends up being identifying which features aren’t there, or kind of engineering the observation and action spaces in an appropriate way. Another level, if we zoom out, is the way that this actually happened: we were there on Monday, then people got dinner, and then Szymon and Jakub and Rafal and I and Saiho and maybe one or two others stayed up all night to do surgery on our running experiment. And so it was very much like, you’ve got your production outage, and everyone’s there, all hands on deck, trying to go and make the improvements.

Szymon Sidor [00:35:29] – Yeah, specifically, kind of zooming in to give you a bit of a feel for what it felt like working on the model, this was a very tiring week. Every day, the day was just dealing with the problems and kind of watching our bot, getting excited. And the nights were spent kind of getting out the next part of the experiment. Because actually, it’s a little known fact, but from day to day, each day’s experiment was not good enough to beat the next day’s professional. Just that one more day: you could download the new parameters of the network and it would be good enough to beat them, where the day before it wasn’t.

Craig Cannon [00:36:04] – How are you discerning that?

Greg Brockman [00:36:05] – This was, again, something of almost a coincidence, though there might be something a little bit deeper there, but the full story of the week was: we did the Monday play, and there we lost to Pajkatt.

Craig Cannon [00:36:17] – And so just to clarify, are you guys in the competition or not in the competition?

Greg Brockman [00:36:21] – The thing that we did, was we did a special event to play against Dendi, who’s one of the best players of all time, and while we were there we were also like let’s test this out against all of the other pros, ’cause they were physically here right now and let’s see how we do.

Craig Cannon [00:36:37] – Got it, alright so Monday happens, you start training it.

Greg Brockman [00:36:40] – Actually, this experiment we kicked off you know maybe sometime the prior week, and…

Szymon Sidor [00:36:46] – Two weeks before, I think.

Greg Brockman [00:36:47] – Something like that, and we’d been running this experiment for a while, and our infrastructure’s really meant for running an experiment from scratch: you start from complete randomness and then you run it, and then two weeks later, go and see how it does. We didn’t have two weeks anymore. And so we had to do this surgery and this very careful read-every-single-character-of-your-commit to make sure that you’re not going to have any bugs, because if you mess it up, we’re out of time, there’s nothing you can do. And it’s not one of those things where, if you’re just a little more clever, you can go around and do a hot patch and have everything be good. It’s just literally the case that you’ve got to let this thing sit there, and it’s got to bake. And so Monday came and went, we were running this experiment that we performed surgery on, and the next day we got a little bit of a reprieve where we just played against some lower ranked players who were kind of commentators and popular in the community but were not pushing the limit of our bot. On Wednesday at 1:00 pm our contact from Valve came by and said, hey, I’m going to get you Arteezy and SumaiL, who are basically the top players in the world. And I was like, could we push them off to Thursday maybe? And he was like, their schedule is booked, you’re going to get them when you get them. And we were scheduled to get them at 4:00 pm. So we looked at our bot to see how it was doing, and we had kind of been gauging it along the way. We tested it against our semi pro player. And he said, this bot is completely broken.

Greg Brockman [00:38:17] – Oh no. And pictures of, maybe we had a bug during the surgery, went through our heads. And he showed us the issue. He said, “Look, first wave, this bot takes a bunch of damage it doesn’t have to take. There’s no advantage to that. I’m going to run in and I’m going to go kill it, to show you how easy it is.” He ran in to kill it and he lost.

Craig Cannon [00:38:37] – But don’t jump ahead, explain what happened?

Greg Brockman [00:38:39] – He played it five times, and he lost each time. Until he finally did figure out how to exploit it. And we realized what was going on, was that this bot had learned this strategy of baiting. You pretend to be a really dumb bot, that you don’t know what you’re doing, and then when the person comes in to kill you, you just turn around and you go super bot. It was legitimately a bad strategy, if you’re really really good, but I guess it was good against the whole population of bots it was playing against.

Craig Cannon [00:39:03] – And you had never seen it until that day.

Greg Brockman [00:39:05] – We had not seen that behavior.

Szymon Sidor [00:39:07] – And we did not at all expect this. It was one of the major examples of things that we didn’t have an explicit incentive for, and the bot actually learned them. It was kind of funny, because of course when the bot played against its other versions, baiting was just a good strategy, it was kind of the solution. But it had a very interesting psychological effect on humans, because the right strategy is not to fall for the bait, kind of to wait it out a little bit, because the bot is already at a disadvantage. But he’s like, okay, look at how stupid this bot is, I’m going to go for a kill. Right? It kind of had an interesting psychological effect on humans, which I thought was like…

Craig Cannon [00:39:59] – It almost knows it’s a bot. It knows how it’s attacked.

Szymon Sidor [00:40:02] – Yeah, it’s funny to see a bot which kind of seems like it’s playing with the emotions of the player. Of course that’s not what actually happened, but it seemed that way.

Greg Brockman [00:40:11] – Now we were faced with a dilemma, it’s 1:00 pm Wednesday, these best players are going to be showing up at 4:00 pm, we have a broken bot. What are we going to do? And we know that our Monday bot is not going to be good enough, we know it’s not going to cut it. And so, the first thing we do is we’re like well, Monday bot, it is pretty good at the first wave. This new bot is a super bot thereafter.

Craig Cannon [00:40:30] – Okay.

Greg Brockman [00:40:31] – Can we stitch the two together? We had some code for doing something similar, so we kind of revived that, and then over the three hours Jay spent his time doing a very careful stitch where you run the first bot, and then you cut over at the right time to the second bot, and…

Sam Altman [00:40:49] – And this is literally just like bot one plays the first X amount of time, and then bot two takes over.

Greg Brockman [00:40:54] – Literally just that. And he finished it 20 minutes before we had to be at the venue. We ran it by our semi-pros, and the semi-pros were like, this is great. So we at least got that done in the nick of time. But the other question was, how do we actually fix this bot?
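
The stitch itself amounts to something like the following sketch: run the older bot for the opening phase, then hand control to the newer one at a fixed cutover time. The threshold and policy objects here are placeholders, not the actual implementation.

```python
# Illustrative "Franken-bot": switch policies after a fixed cutover time.
CUTOVER_SECONDS = 90     # hypothetical: roughly after the opening phase

def stitched_policy(obs, game_time, early_bot, late_bot):
    # early_bot handles the first wave, late_bot takes over afterwards.
    return early_bot(obs) if game_time < CUTOVER_SECONDS else late_bot(obs)
```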

Szymon Sidor [00:41:07] – Actually, just to finish your story, there’s one more aspect, because we were also kind of uncertain what happens when you switch over from one bot to the other. So I was actually standing by the pro who was playing it, and I was looking at the timer, and at the moment when it was switching I was like… ready to kind of distract the guy in case something silly happened there.

Sam Altman [00:41:28] – Just to try to distract him for one second.

Szymon Sidor [00:41:28] – Yeah. And of course, it was probably completely unnecessary but we weren’t sure what would happen there.

Greg Brockman [00:41:35] – I didn’t know about that part of the story. So, the question of how do you actually fix it? There was a little bit of debate of, like, maybe we should abandon ship on this, switch back to our old experiment, run that one for longer. And I forget who suggested it, but someone was like, I think we just have to let it run for longer, because if you learn a strategy of baiting, well, the counter strategy for that is just don’t bait, play well the whole time. And so we let that run for the additional three hours, and we first played Arteezy, who showed up, on our switch bot, kind of the Franken-bot, and that beat him three times. We were like, alright, let’s try out this other bot and just see what happened with the additional three hours of training, ’cause our semi-pro tester at least validated that it looks like it’s fixed.

Craig Cannon [00:42:19] – And so in that three hours of training, how many games is it actually playing simultaneously?

Greg Brockman [00:42:23] – That’s a good question, quite a bit.

Craig Cannon [00:42:25] – Yeah, okay.

Greg Brockman [00:42:26] – And, so we played this new bot against Arteezy, didn’t know how it was going to do, and sure enough it beats him and he loved it. He was having a lot of fun. He ended up playing 11 games that day, maybe it was ten but I think that he was just like oh this is so cool. We were supposed to have SumaiL that day as well, but due to a scheduling snafu, he had to be at some panel and so like timing didn’t work out. But Arteezy and his coach, who also coaches SumaiL both said SumaiL’s going to beat this bot, it’s going to happen. Maybe he’ll have a little bit of trouble to figure it out for the first game, but after that you’re in trouble. And so, we’re like alright we’ve got one more day to figure out what to do. And so what do we do?

Szymon Sidor [00:43:11] – I don’t know, kind of like some nice dinner.

Sam Altman [00:43:17] – Kind of what did you say?

Szymon Sidor [00:43:18] – We kind of went for some nice dinner, we kind of rested, we kind of chatted, Slacked with some people at home, and then in the morning we downloaded the new parameters of the network and just let it play.

Craig Cannon [00:43:32] – You just hung out and just let it go?

Greg Brockman [00:43:34] – Just let it play. It’s the exact opposite of how I’m used to engineering deadlines happening. Normally it’s you work right up until the minute.

Sam Altman [00:43:41] – You guys weren’t like, you guys were getting like full nights of sleep, nice and relaxed.

Szymon Sidor [00:43:45] – Oh no no. Absolutely not. Okay, so to make this clear, two nights before the day where we got the rest and the relaxation, the night looked something like the following: we had a full day of dealing with the problems and kind of emotional highs and so on, which is absolutely not good. Come midnight we start working, okay, we need to make all those changes, like the ones that we talked about. Around midnight we start with four people, and we are all so tired that we looked through all the commits that we were going to add to the experiment with two people looking at them, because we didn’t trust a single person given how tired we all were. So they’re looking at those commits till 6:00 am. I was doing just, like, updating the model, which is a lot of nasty off-by-one indexing things, so even though it’s a short task, it took me like six hours to do. Somewhere around 3:00 am we had a phone call with Azure, because it turns out that the number of machines we had spun up started exceeding some limits, so we had to try to get them to raise the limits. And around 6:00 am we’re like, okay, we are ready to deploy this. And then, deploying is just like a one-man job, so Jakub was just clicking deploy and kind of fixing all the issues that came up. I was staying around exclusively just to make sure that Jakub didn’t fall asleep. And eventually at 11:00 am the experiment was running and we kind of went to sleep, woke up at 4:00 pm or something, and then it was like, all…

Sam Altman [00:45:27] – So it had over 24 hours to train?

Szymon Sidor [00:45:30] – I think it ended up being a day and a half until the game against SumaiL.

Greg Brockman [00:45:36] – Sorry, just to repeat the timeline. So this was Monday when we played the first set of games, had the loss, and did the surgery that night. The experiment was running starting at 11 on Tuesday. Then Wednesday 4:00 pm is when we played Arteezy, and then it trained for longer; I don’t think we made any changes after that. Maybe we made some small ones. But then on Thursday is when we played SumaiL.

Craig Cannon [00:45:59] – Okay.

Greg Brockman [00:46:00] – And so that…

Szymon Sidor [00:46:01] – I think Tuesday to Wednesday was the night where we made last changes.

Greg Brockman [00:46:04] – Yeah. And there was quite a bit of different work going on that all kind of came together at once. Like, one thing I think that was really important was that one of our team members, whose handle is siho, who is a very well known programming competition competitor, was spending a lot of time just watching the bot play and seeing, why does it do this weird thing in this case? What are all the weird tweaks? And really getting intuitions for, oh, because we’re representing this feature in this way, if we change it to this other thing then it’s going to work in a different way. And I think that this is really, it’s almost this human-like process of watching this expert playing the game and trying to figure out what are all the little micro-decisions that are going into this macro choice. And it’s kind of interesting starting to have this very different relationship to the system you build, because normally the way that you do it is, well, your goal is to have everything be very observable, and so yeah, you want to put metrics on everything, and if something’s not understandable, add more logging. You know, that’s how you design the systems. Whereas here, you do have that for the surrounding bits, but for the machine learning core there, you really do have to understand it at more of a behavioral level.

Craig Cannon [00:47:24] – Was it ever stumping you where you’re just like ugh it’s being creative in a way that we didn’t expect it to and maybe even working but you don’t know how or why it decided to make that choice?

Szymon Sidor [00:47:34] – Yeah, I think the baiting story that we shared is the main one like this. We had a few small ones, like in some early phase of the project where we had professionals playing against the next version of the bot, and they were like, mmm, everybody’s really good at crippling. And we were like, oh, what is crippling?

Greg Brockman [00:47:58] – There was also one other part of the story that I think is interesting, and then I think we can probably wrap up this part. Our semi-pro tester had played hundreds of games against this bot over the past couple of months, and so we wanted to see just how he benchmarks relative to Arteezy. And so we had him play against Arteezy, and you know, Arteezy was up the whole game. He was just beating him to the last hit by like 500 milliseconds every single time. And so our semi-pro was like, alright, I’ve got one last ditch effort: go try this strategy that the bot always does to me. And it’s some strategy where you do something complicated and you like triple-wave your opponent, you get him out of the tower, you have regen, you go in for the kill. And he did it and it worked.

Craig Cannon [00:48:41] – Whoa.

Greg Brockman [00:48:42] – And this was, the bot had taught him this strategy that he could use against a human, and I think that was very interesting and a good example of the kinds of things you can get out of these systems. They can discover these very non-obvious strategies that can actually be taught to humans.

Craig Cannon [00:48:59] – And how did it go with SumaiL?

Greg Brockman [00:49:01] – With SumaiL, we went undefeated, I think it was 5-0 that day. One thing that was actually interesting, so we’ll probably blog about this in upcoming weeks, but we’ve actually been playing against a bunch of pros since then, so our bot has been in high demand. And some of these pros have been live streaming it and so we’ve gotten a better sense of kind of watching as humans go from being completely unable to beat it, to if you play against it for long enough, you can actually get pretty good. And so there’s actually a very interesting set of stats there that I will be pulling and analyzing in a bit.

Sam Altman [00:49:39] – Are there humans that consistently beat the bot today?

Greg Brockman [00:49:41] – Yeah, so I think there was one who has like a 20% win rate or something?

Szymon Sidor [00:49:44] – I think it might be actually 30, and that player played hundreds of games.

Sam Altman [00:49:50] – And just finds strategies to exploit?

Szymon Sidor [00:49:51] – Uh, no actually. He becomes essentially as good as the bot.

Sam Altman [00:49:58] – Really?

Greg Brockman [00:49:59] – At what the bot is doing. Which we find is extremely surprising. But it turns out, that he played hundreds of games with it, so it’s actually…

Craig Cannon [00:50:06] – And is he a top player? Does he beat most humans?

Greg Brockman [00:50:09] – Yeah, these are all professionals.

Craig Cannon [00:50:11] – It’s not just some random kid who’s good at beating the bot.

Greg Brockman [00:50:13] – The way to think about this is that being a professional video game player is a pretty high bar. Yeah. I think everyone who plays these games wants to be a professional video game player, and the number of pros is very small. And when you’re playing hundreds of games against it, you’re going to get very very good at the things that it does. And so talking to Arteezy, I was asking him, has it changed your play style at all? And he said he thinks that the thing that it’s done for him is it’s helped him focus more, ’cause while you’re just there in lane last hitting, now suddenly that’s just so rote, right? ’Cause you’ve just been doing it so much, you’ve gotten so good at it. And I think that one really interesting thing to see is going to be, how can you improve, can you improve human play style? Can you change human play style? And I think that we’re starting to see some positive answers in that direction.

Sam Altman [00:51:04] – I know we’re almost out of time, I could do like a little lightning round, just quickly go through something special with these guys.

Szymon Sidor [00:51:10] – Actually, to the question of what kind of skills you need to work at OpenAI, could we have like a very small…

Sam Altman [00:51:18] – That was going to be the first lightning round question.

Szymon Sidor [00:51:20] – A specific list of things that we found very useful, at least on the Dota team. One is some knowledge of distributed systems, because we build a lot of those and those are easy to not do properly. And another thing that we found very important is actually writing bug-free code. Essentially, I know it’s kind of taken for granted in the computer science community that everybody makes bugs and so on. But here it’s even more important than in other projects that you minimize them, because they are very hard to debug. Specifically, many bugs manifest as kind of lower training performance, where to get that number it takes a day, and across hundreds of lines of code it’s really easy to miss. And the primary way of debugging this is actually reading the code. So every bug has a very high cost associated with it. So actually writing correct, bug-free code is quite important to us. And we sometimes actually kind of sacrifice good engineering habits, good code modularity, to make our code shorter and simpler, having essentially fewer lines where you can make bugs. And I guess, lastly, I should mention the primary skill is good engineering, but if somebody really feels like, gosh, I really need to brush up on my maths, I really need to go in there and feel comfortable, not have somebody ask me a question about maths that I don’t understand: I think mostly getting good basics in linear algebra and basic statistics, because especially when doing experiments it’s easy to make elementary statistics mistakes.

Szymon Sidor [00:53:23] – And linear algebra, plus some basic optimization, is most of what you need to know to follow what's happening in those models. But compared to becoming a good engineer, that's quite easy to pick up, at least for a project like the one we are doing.
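
(To make Szymon's point about elementary statistics mistakes concrete, here is a minimal sketch. This is purely an illustration, not OpenAI code: instead of comparing two training configurations on a single run each, compare them across a handful of random seeds and report the mean with a standard error. The configuration names and reward numbers are hypothetical.)

```python
import statistics

# Hypothetical final episode rewards from 5 random seeds per configuration.
baseline = [212.0, 198.5, 220.1, 205.3, 209.8]
candidate = [216.4, 201.0, 223.9, 207.7, 212.5]

for name, runs in [("baseline", baseline), ("candidate", candidate)]:
    mean = statistics.mean(runs)
    # Standard error of the mean: how much the average itself fluctuates.
    sem = statistics.stdev(runs) / len(runs) ** 0.5
    print(f"{name}: {mean:.1f} ± {sem:.1f}  (n={len(runs)})")

# If the two intervals overlap heavily, a single-seed comparison would be
# reading signal into what is mostly noise.
```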

Greg Brockman [00:53:44] – I wanted to talk about some non-technical skills that I think are really important. One is that there's a real humility required if you're coming from an engineering background like I am, because in these projects you're no longer the technical expert in the way that you're used to. Let's say you want to build a product for doctors: you can talk to ten doctors, and honestly, whatever you build is probably going to be a pretty valuable addition to their workflow, because doctors can't really build their own software tools. Some can, but as a general rule, no. Whereas with machine learning research, everyone you're working with is very technical and can build their own tools. But if you inject engineering discipline in the right place, if you build the right tool at the right time, if you look at the workflow and think, oh, we could do this another way, that's where you can really add a bunch of value.

Sam Altman [00:54:34] – And so I think it's about knowing when to inject the engineering discipline, but also knowing when not to. To Szymon's point, sometimes we really just want the really short code, because we're terrified of bugs, and that can yield different choices than you might expect for a pure production system. Who writes the least bugs at all of OpenAI? Oh, that's a contentious question. Actually, that's a good…

Szymon Sidor [00:54:57] – What was the question?

Sam Altman [00:54:58] – Who writes the least bugs per line of code at all of OpenAI?

Greg Brockman [00:55:01] – I’m definitely not going to say me.

Szymon Sidor [00:55:05] – Possibly Jakub.

Greg Brockman [00:55:07] – Yeah. It’s hard to say.

Szymon Sidor [00:55:09] – But it’s really hard to say.

Sam Altman [00:55:11] – It could be Greg.

Greg Brockman [00:55:13] – I write a lot of bugs.

Szymon Sidor [00:55:15] – I've caught Jakub on bugs the fewest times.

Greg Brockman [00:55:20] – It’s more okay to have bugs that are going to cause exceptions.

Sam Altman [00:55:22] – Right.

Greg Brockman [00:55:23] – And my bugs usually cause exceptions.

Greg Brockman [00:55:25] – That’s fine.

Szymon Sidor [00:55:26] – That’s fine.

Greg Brockman [00:55:26] – Yeah. What you don't want are the bugs that cause correctness issues, where things just get 10% worse.
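
(A small illustration of the distinction Greg is drawing, again a sketch rather than anything from the OpenAI codebase: cheap runtime assertions can turn the quiet "correctness got 10% worse" class of bug into the loud, exception-raising kind. The `check_batch` helper and its inputs are hypothetical.)

```python
import math

def check_batch(rewards, advantages):
    # Mismatched lengths often broadcast silently and only show up as
    # slightly worse training curves; assert on them instead.
    assert len(rewards) == len(advantages), "reward/advantage length mismatch"
    # NaNs rarely crash on their own; they just quietly poison updates.
    assert all(not math.isnan(a) for a in advantages), "NaN advantage detected"

check_batch([1.0, 0.0, 0.5], [0.2, -0.1, 0.05])  # passes; a bad batch would raise
```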

Craig Cannon [00:55:30] – Yeah. There was another question related to skills, but this one is for non-technical people.

Greg Brockman [00:55:36] – Yep.

Craig Cannon [00:55:37] – So Tim Bekoe asks, how can non-technical people be helpful to AI startups?

Greg Brockman [00:55:43] – Well, I was going to say, I think one important thing is that for AI generally right now there's a lot of noise, and it can be hard to distinguish what is real from what's not. Simply educating yourself is a pretty important thing. It's very clear that AI is going to have a pretty big impact. Just look at what's already been created and extrapolate that, without any new technology development or any new research: it's pretty clear that it's going to be baked into lots of different systems. There are a lot of ethical issues to work through, and being a voice in those conversations, and educating yourself, is a really important thing. Then you look to, well, what are we going to be able to develop next? I think that's where the really transformative stuff is going to come from.

Sam Altman [00:56:28] – Okay. I once saw a post of Greg’s RescueTime report and was pretty shocked. Do you have advice for working such long, focused hours?

Greg Brockman [00:56:38] – I think it’s not a good goal. I would not have a goal of trying to maximize the number of hours you sit at your computer. For me, I do it because I love it. The thing, the activity that I love most in the world is when you’re in the zone, writing code, producing it for something that’s meaningful and worthwhile. So I think that as a second-order effect, it can be good. But I wouldn’t say that that is the way to have an impact.

Sam Altman [00:57:01] – I will also say more specifically, the only way I’ve ever seen people be super productive is if they’re doing something they love. There is nothing else that will sustain you for a long enough period of time. Okay. Is the term AI overused by many startups just to look good in the press?

Greg Brockman [00:57:14] – Yes.

Szymon Sidor [00:57:16] – Indeed.

Sam Altman [00:57:17] – Okay. What is the last job that will remain as AI starts to do everything else? The last human job? What is going to be the hardest thing for AI to do?

Greg Brockman [00:57:31] – It's a hard question to answer in general, because I think it's actually not AI researcher. The AI researcher will go before…

Craig Cannon [00:57:40] – Yeah, yeah, yeah.

Greg Brockman [00:57:41] – It’s actually very interesting when you ask people this question. I think that everyone tends to say whatever their job is.

Craig Cannon [00:57:46] – Yeah.

Greg Brockman [00:57:47] – As the hardest one. But I actually think that AI researcher is going to be one of the jobs you're going to want to make these systems very good at doing.

Sam Altman [00:57:53] – Totally.

Craig Cannon [00:57:54] – I think the last question, maybe this is obvious, is can you just connect the dots between how playing video games is relevant to building AGI?

Greg Brockman [00:58:01] – Yeah. It's actually maybe one of the most surprising things to me, the degree to which games end up being used for AI research. The real thing you want is algorithms operating in complex environments where they can learn skills, and you want to increase the complexity of the skills they learn. Either you push the environment, you push the complexity of the algorithms, or you scale these things up, and that's really the path you want to take to building really powerful systems. Games are great because they are a prepackaged environment that some other humans have spent time making: putting in a lot of complexity, making sure there are actual intellectual things to solve there, and not even just intellectual but interesting mechanical challenges. You can get human-level baselines on them, so you know exactly how hard they are. And they're very nice because, unlike something like robotics, you can run them entirely virtually. That means you can scale them up and run many copies of them. They're a very convenient test bed. I think what you're going to see is that a lot of work is going to be done in games, but the goal is, of course, to bring it out of the game and actually use it to solve problems in the real world, and to be able to interact with humans and do useful things there. They're a very good starting point. I think one thing that I really like about this DOTA project

Greg Brockman [00:59:25] – and bringing it to all these pros is that we're all going to be interacting with super-advanced AI systems in the future. Right now, I don't think we really have good intuitions as to how they operate, where they fail, what it's like to interact with them. This is a very low-stakes way of having your first interaction with very advanced AI technology.
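
(For readers who want to see the "games as prepackaged environments" idea in code, here is a minimal sketch using OpenAI's Gym library with CartPole as a stand-in task. It assumes the classic pre-0.26 Gym API, where `reset()` returns just the observation and `step()` returns four values, and it uses a random policy in place of a learned one; it illustrates the general observe-act-reward loop, not the Dota system itself.)

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()                        # classic Gym API: observation only
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()   # random policy as a placeholder for a learned one
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```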

Craig Cannon [00:59:42] – Cool. If someone wants to get involved with OpenAI, what should they do?

Szymon Sidor [00:59:47] – Well, we have a job posting on our website. The tips we've been giving about how to get a job at OpenAI are very geared towards a specific job posting that we have there, which is Large-Scale Reinforcement Learning Engineer.

Sam Altman [01:00:03] – Cool.

Greg Brockman [01:00:04] – Yep. And in general we look for people who are very good at whatever technical axis they specialize in and we can use lots of different specialties.

Craig Cannon [01:00:13] – Great. All right. Thanks guys.

Sam Altman [01:00:15] – Just to echo that. Everyone thinks they have to be an AI Ph.D. Not true. Neither of these guys are. All right, thanks a lot.

Greg Brockman [01:00:22] – Cool. Thanks. Thank you.

Szymon Sidor [01:00:23] – Thank you.

Craig Cannon [01:00:25] – Alright, thanks for listening. As always, the video and transcript are at blog.ycombinator.com and if you have a second, please subscribe and review the show. Alright, see you next week.
