Game development continues to evolve every day, and a major lift in the industry has come from AI and machine learning, writes Daniele Gravina, Ph.D., AI Researcher at modl.ai. An essential, but often overlooked, part of machine learning in game development is data cleaning. Unfortunately, even the most sophisticated game studios with their own data scientists struggle with this because it involves some heavy lifting. Data cleaning may involve standardising data sets, correcting mistakes such as blank fields and spelling or syntax errors, and recognising duplicated data points.
Clean data is a fundamental prerequisite for artificial intelligence (AI) to support game developers in their quest to take their games to the next level.
Gaming’s new generation relies on data not only to analyse players’ behaviours but also to incorporate AI and machine learning for optimised development. It’s a missed opportunity if a games studio doesn’t bring in a data scientist at the beginning of the development process. Data scientists create mathematical and automated models for analysing the game and identifying where it can be optimised.
For game development to be successful, the game needs to be data-ABLE. That means its data has to meet two main conditions: it has to be cleaned, and it has to be versioned.
Cleaning the Data
Games are tricky and complex. To keep the game stable and avoid unnecessary patching, game developers need the collected data to be cleaned, normalised and versioned. The data may look correct at first glance, but ‘dirty’ data can jeopardise the accuracy of the AI algorithms.
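As a rough illustration of the basic cleaning steps mentioned earlier (blank fields, inconsistent formatting, duplicated data points), a minimal Python sketch might look like the following. The field names player_id, level_id and score are hypothetical, chosen only for this example; a real telemetry schema will differ.

```python
def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def clean_records(records, required=("player_id", "level_id", "score")):
    """Drop rows with blank required fields, standardise IDs, remove duplicates.

    The field names are assumptions made for this sketch.
    """
    seen = set()
    cleaned = []
    for row in records:
        # Skip rows where any required field is missing or blank.
        if any(is_blank(row.get(name)) for name in required):
            continue
        # Standardise identifiers: trim whitespace and lower-case them.
        tidy = dict(row)
        tidy["player_id"] = str(row["player_id"]).strip().lower()
        tidy["level_id"] = str(row["level_id"]).strip().lower()
        # Rows that are identical after standardising count as duplicates.
        key = tuple(sorted(tidy.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(tidy)
    return cleaned


raw = [
    {"player_id": " P1 ", "level_id": "L3", "score": 120},
    {"player_id": "p1", "level_id": "l3", "score": 120},  # duplicate after tidy-up
    {"player_id": "p2", "level_id": "", "score": 80},     # blank level_id
    {"player_id": "p3", "level_id": "l1", "score": 0},    # a score of 0 is not blank
]
cleaned = clean_records(raw)
```

Note that a score of 0 is kept: a zero is a real measurement, whereas an empty field is not.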
It’s not uncommon for gamers to let other people play their games, especially gamers with kids. When a player hands the game to somebody else, it immediately contaminates the game data, because the new player may not have the same level of expertise as the original player. The sessions this produces are what we call outliers, and it’s important to filter them out of the data set to get the results needed from a machine learning bot.
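One way to sketch this kind of filtering is to compare each session with the player’s own typical performance. The example below uses the median absolute deviation, a common robust statistic; this technique and the 3.5 cut-off are assumptions made for illustration, not necessarily what any particular studio uses.

```python
from statistics import median


def filter_outlier_sessions(scores, threshold=3.5):
    """Keep scores whose modified z-score is within the threshold.

    The modified z-score is 0.6745 * |score - median| / MAD, where MAD is
    the median absolute deviation. The threshold of 3.5 is a widely used
    rule of thumb and is an assumption here.
    """
    if not scores:
        return []
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return list(scores)
    return [s for s in scores if 0.6745 * abs(s - med) / mad <= threshold]


# Hypothetical scores for one player; the last session was played by a child.
sessions = [980, 1010, 995, 1005, 990, 120]
kept = filter_outlier_sessions(sessions)
```

The median is used instead of the mean because a single extreme session would otherwise drag the baseline towards itself and make the outlier harder to spot.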
It’s also necessary to keep the data in similar ranges; in other words, to normalise your data. Many machine learning algorithms are sensitive to the scale of their inputs, so the numbers need to be in a similar range. For example, if a bot sees one really big number, it may pay far more attention to that number than to the others, simply because it is hugely different from the rest of your data.
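A minimal sketch of one common normalisation technique, min-max scaling, is shown below. It maps every feature to the range 0 to 1. The feature names moves and milliseconds are hypothetical, and other schemes (such as standardising to zero mean and unit variance) may suit a given model better.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so they fall between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_features(rows, features):
    """Return copies of the rows with the named features min-max scaled."""
    scaled = [dict(row) for row in rows]
    for name in features:
        column = min_max_scale([row[name] for row in rows])
        for row, value in zip(scaled, column):
            row[name] = value
    return scaled


# Hypothetical per-level data: move counts are tiny next to millisecond timings.
rows = [
    {"moves": 10, "milliseconds": 30000},
    {"moves": 20, "milliseconds": 90000},
    {"moves": 30, "milliseconds": 60000},
]
scaled_rows = scale_features(rows, ["moves", "milliseconds"])
```

After scaling, both features live in the same range, so the timing values no longer dominate just because they are measured in milliseconds.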
Versioning the Data
Games get patched, and they change a lot. For example, a puzzle game can start with 15 levels and grow to 4,000. It’s common to see studios getting creative and changing a level without changing the level’s ID. So now you have data that says this level is hard and data that says the level is easy, but it’s probably not even the same level. Another example: is Player A’s performance on Level X comparable to Player B’s performance on Level X six months later? It’s hard to know with un-versioned data.
A good data system can dynamically adapt to new game versions and updates so that you can ask new questions without changing the game or the way the data is collected. In conclusion, before getting into the complexity of artificial intelligence and machine learning, your data needs to be clean and versioned so that it is ready for adding functional, smart bots to a game.
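One possible way to guard against the reused-ID problem is to store a fingerprint of the level’s content alongside the level ID in every event. The sketch below is an assumption about how this could be done, not a description of any specific studio’s pipeline; the event fields level_id, level_fingerprint and completed are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def level_fingerprint(level_definition):
    """Short, stable hash of a level's content.

    The fingerprint changes whenever the level changes, even if its ID
    stays the same. Assumes the definition is JSON-serialisable.
    """
    canonical = json.dumps(level_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def completion_rate_by_version(events):
    """Group events by (level_id, fingerprint) so edited levels are not mixed."""
    totals = defaultdict(lambda: [0, 0])  # key -> [completed, attempts]
    for event in events:
        key = (event["level_id"], event["level_fingerprint"])
        totals[key][1] += 1
        if event["completed"]:
            totals[key][0] += 1
    return {key: done / attempts for key, (done, attempts) in totals.items()}


# Level 7 was redesigned but kept its ID; the fingerprints tell the two apart.
old_fp = level_fingerprint({"grid": [[1, 0], [0, 1]], "moves": 20})
new_fp = level_fingerprint({"grid": [[1, 1], [0, 1]], "moves": 12})
events = [
    {"level_id": 7, "level_fingerprint": old_fp, "completed": True},
    {"level_id": 7, "level_fingerprint": old_fp, "completed": True},
    {"level_id": 7, "level_fingerprint": new_fp, "completed": False},
    {"level_id": 7, "level_fingerprint": new_fp, "completed": True},
]
rates = completion_rate_by_version(events)
```

With the fingerprint in the key, the easy and hard variants of level 7 are reported separately instead of being averaged into one misleading number.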