Unity 3D’s ML-Agents

If you’ve made any sort of game, odds are it has some sort of artificial intelligence in it. Think about the old Mega Man games or Super Mario Bros. Sure, the enemies only move left and right, maybe they pop up, maybe they shoot something. That’s been good enough for indie game makers, AAA titles and those of us who play them. Few, if any, games have tried to implement what scientists would consider artificial intelligence or machine learning.

Unity 3D is a game engine that’s got a lot of bells and whistles. Great graphics, lighting, post-processing. Great UI system (on the third attempt). Now you can add machine learning to the list.

What is machine learning?

https://en.wikipedia.org/wiki/Machine_learning

Wikipedia says, “Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.”  That pretty much sums up the intention of ml-agents.  The way I understand it is that for the algorithm that ml-agents is using, you input a bunch of numbers that describe the world around your agent.  These numbers are messed around with inside the internal workings of a “neural network” and then it spits out an action.
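To make the "numbers in, action out" idea concrete, here is a toy sketch in plain Python. It is not ml-agents code and not how the real network is built. It is a single layer with made-up weights, and the names forward and choose_action are my own for illustration. It only shows the shape of the idea: a list of floats goes in, each possible action gets a score, and the highest score wins.

```python
def forward(state, weights, biases):
    """Score each action as a weighted sum of the state values plus a bias."""
    scores = []
    for weight_row, bias in zip(weights, biases):
        score = sum(w * s for w, s in zip(weight_row, state)) + bias
        scores.append(score)
    return scores


def choose_action(scores):
    """Return the index of the highest-scoring action."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical numbers: three state values, three possible actions.
state = [0.5, -1.0, 2.0]
weights = [
    [1.0, 0.0, 0.0],  # action 0 only looks at state[0]
    [0.0, 1.0, 0.0],  # action 1 only looks at state[1]
    [0.0, 0.0, 1.0],  # action 2 only looks at state[2]
]
biases = [0.0, 0.0, 0.0]

scores = forward(state, weights, biases)
action = choose_action(scores)
```

Training is the process of adjusting those weights so that the actions that earn rewards come out on top more often.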

Unity3D released a beta of its machine learning SDK.  It’s complicated.  I can tell you that from the start.  There are multiple steps to take because the basic set-up has you building and running a Unity3D game on one side and a Python program on the other side.  It isn’t for the faint of heart.

I went through the process and managed to get it up and running despite some snafus.  One thing to look out for is that Google’s TensorFlow pulls in a bad version of a package called html5lib.  It breaks pip, Python’s package installer.  But fixing pip breaks TensorFlow, and you need TensorFlow for ml-agents.  The solution is to install TensorFlow last and not try to install anything else afterward.

The first game I tried making was a little space shooter.  Sort of like Asteroids with two spaceships trying to shoot each other.  It actually worked to the point that the spaceships were shooting at each other, pulled off some great dodges and managed to land some shots I wouldn’t have expected.  The most interesting behavior was that ships seemed to use their shots to block incoming shots.  Something a human player would have a hard time pulling off, I think.

Machine learning controlled spaceships try to learn how to play capture the flag.

The workflow is like this:  First, you follow https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Making-a-new-Unity-Environment.md to create a new scene.  Then you open up your Agent script and edit the CollectState function and the AgentStep function.  Once you feel they are where you want them to be, you then build your game.  You load up a specific Python script and run all but the last section.  The Python script loads your game, creates a network connection with it and starts training the agent(s).

CollectState holds a list of floats that describe the world to your agent.  Things like position of the agent, position of a goal or things that are important.  CollectState is run on each step.  So you add variables to it and those variables can change over time.  It “collects the state,” or the information from those variables each step.
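Here is a rough sketch of that idea, written in Python instead of the C# you would actually use in Unity. The function name collect_state and the choice of values (agent position, offset to the goal, heading) are my own assumptions for illustration, not anything the SDK requires. The point is just that every step you rebuild one flat list of floats from whatever the world looks like right now.

```python
def collect_state(agent_pos, agent_heading_deg, goal_pos):
    """Build the flat list of floats that describes the world for one step.

    agent_pos and goal_pos are (x, y) tuples; the heading is in degrees.
    All of these inputs are hypothetical examples.
    """
    state = []
    # Where the agent is.
    state.append(agent_pos[0])
    state.append(agent_pos[1])
    # Where the goal is relative to the agent.
    state.append(goal_pos[0] - agent_pos[0])
    state.append(goal_pos[1] - agent_pos[1])
    # Heading scaled to the 0..1 range so it is on a similar scale to the rest.
    state.append(agent_heading_deg / 360.0)
    return state


# Called once per step with the current values.
state = collect_state((1.0, 2.0), 90.0, (4.0, 6.0))
```

The order of the values has to stay the same from step to step, since the network only sees positions in the list and not names.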

One thing to know is that you can set up a camera as eyes for the machine learning agent.  Look at the GridWorld example and the GridBrain specifically for a demonstration of how to do this.  The resolution of the camera has to be small and square (32 by 32 pixels, for example).

AgentStep is where you define the actions the agent takes each step.  This is a LOT like Unity’s Update function as far as I can tell.  It just runs on machine learning steps, that’s all.  One thing you do here is grab the action or actions chosen for the Agent.  The state was the input and now the neural network will spit out an action to take.

This is where everything really gets fuzzy.

I am not 100% sure, but it seems like if you choose discrete as your action space, the neural network will choose one action and output that action.  A demonstration of this can be found in the TennisAgent script, which is an example that comes with the ml-agents download.  Then you can use an if statement.  If act[0] == 0f (zero as a float), then write the action it will take.  You do this for all your actions.  In my case I made 0 = left, 1 = right, 2 = forward, 3 = stop and 4 = fire weapon.
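As a sketch of that mapping, here it is in Python instead of Unity C#. The function name agent_step and the string labels are made up for illustration, and a lookup table stands in for the chain of if statements. The numbering is the one from my space game.

```python
# Action numbering from the space shooter: one index per possible action.
ACTIONS = {
    0: "left",
    1: "right",
    2: "forward",
    3: "stop",
    4: "fire",
}


def agent_step(act):
    """Turn the discrete action in act[0] into a named command.

    act is a list of floats, so act[0] arrives as 0.0, 1.0, and so on.
    """
    index = int(act[0])
    if index not in ACTIONS:
        raise ValueError(f"unexpected action index: {index}")
    return ACTIONS[index]


command = agent_step([4.0])
```

In the real Agent script each branch would move the ship or fire the weapon instead of returning a label.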

If you set your action space to continuous, again, not completely sure, but it seems you can grab act[0], act[1], act[2] etc.  They will be floats.  Take for instance what I did to try to create a lottery predicting program.  I downloaded the drawings from one particular lottery; the file I used had about 800 drawings in it.  Each step I asked the machine learning agent to guess the day’s numbers.  I grabbed the actions as those guesses, compared them with the actual drawing for that day and rewarded the agent when it got a number right.  The whole experiment has been a failure so far but it’s just an example of what you can do with continuous action space.
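The lottery idea can be sketched roughly like this, again in Python and not the actual Unity code. Everything specific here is an assumption for illustration: that the action floats fall between -1 and 1, that the lottery numbers run from 1 to 49, that each matched number is worth a reward of 1, and the function name score_guesses.

```python
def score_guesses(act, drawing, lowest=1, highest=49):
    """Map continuous actions to lottery numbers and count the matches.

    act: list of floats, assumed here to fall roughly in the range -1..1.
    drawing: the numbers actually drawn that day.
    Returns the reward as a float, one point per matched number.
    """
    guesses = set()
    for value in act:
        # Clamp in case the network outputs something outside -1..1.
        clamped = max(-1.0, min(1.0, value))
        # Rescale -1..1 onto lowest..highest and round to a whole number.
        number = lowest + round((clamped + 1.0) / 2.0 * (highest - lowest))
        guesses.add(number)
    return float(len(guesses & set(drawing)))


# Three hypothetical action values compared with a hypothetical drawing.
reward = score_guesses([-1.0, 0.0, 1.0], [1, 25, 30])
```

Here -1.0 maps to 1, 0.0 maps to 25 and 1.0 maps to 49, so two of the three guesses match the drawing.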

The documentation on the ml-agents is so sparse and so full of machine learning jargon that I’m having a hard time understanding it myself.

I’ve done two other experiments.  One is an expansion of the space game into capture the flag.  There was some interesting progress on the machine learning until suddenly all the ships stopped doing anything and just spun.  The other one is a basketball game.  The agent may aim and then “shoot” a basketball at the hoop.  So far I’ve had no luck getting the agent to do anything except spin in circles.

One of the best things that I witnessed in the CTF game is one agent shooting his teammate who had a hold of the flag. Shots at teammates do not hurt them but still push them. He shot his teammate most of the way to their base where his teammate proceeded to capture the flag.

One thing the documentation gives no real help on is how to actually train your agent.  Do you run the PPO script once?  Ten times?  Or do you set the max step count extremely high?  There are so many variables between the PPO Python script, the ML brain, the ML academy and the agent, and almost none of them come with a description.

Anyway, I really suggest that anyone who likes Unity, has experience getting Python up and running and is really interested in artificial intelligence give it a shot.  The first night, I was having an absolute blast.  I’m still interested in playing around with it, but it’s taking a back seat to other projects until they work on the documentation.