shibboleet's Blog — AI...Good or bad?

Overview
An interesting conversation recently took place in the decompilation Discord server for the Super Mario Galaxy series, questioning whether AI should assist with, or even completely take over, the decompilation effort. Plenty of great points were made, and I want to go over a few that I found interesting and share my own opinion on the topic.

AI Contribution

AI can do a lot of things, like generate images, answer prompts, follow commands, and more. But...how could it possibly contribute to a decompilation effort? How much human intervention would an AI decompilation need?
How AI could contribute to a game's decompilation depends on how much you want the AI to do. Do you want it to properly name variables? Functions? Try to optimize the code while still matching? Or match the entire game? There are many approaches, so let's break down some of them.

Variables / Functions

This area makes sense. You can ask the AI to examine your code, and given function names (and perhaps some context), it may be able to suggest variable names that are better suited. However, I have not had much luck in this area, as the names ChatGPT gives me tend to be pretty far from what the variables actually represent.

Optimizations

This area is where things get a little shaky. CodeWarrior, the compiler used on the GameCube and Wii, is very picky about how functions are written for them to be considered fully matching, meaning that the assembly generated by the compiler matches the retail game code. With this option, you would have to "train" an AI to recognize what patterns CodeWarrior will generate based on the code you give it (which is something we refer to as codegen). So not only would an AI have to optimize code down as it sees fit, but it would also have to adhere to CodeWarrior's tendency to change entire instructions and registers in response to a small code difference. This would likely work best when the function already matches, which is another area where human intervention will likely be needed.
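To make "fully matching" concrete, here is a minimal sketch of the kind of check involved, assuming you already have the raw bytes of a function from the retail game and from your own build. The function names and the sample bytes are hypothetical and only for illustration; real projects use dedicated diffing tools for this. The one fact it relies on is that PowerPC instructions are fixed-width, big-endian 4-byte words.

```python
import struct

WORD_SIZE = 4  # PowerPC instructions are fixed-width: 4 bytes each


def mismatched_words(retail, built):
    """Return (byte_offset, retail_word, built_word) for each differing instruction.

    Hypothetical helper for illustration; not taken from any real decomp tool.
    """
    if len(retail) != len(built):
        raise ValueError("function sizes differ, so the function cannot match")
    if len(retail) % WORD_SIZE != 0:
        raise ValueError("length is not a multiple of the instruction size")

    diffs = []
    for offset in range(0, len(retail), WORD_SIZE):
        # ">I" reads one big-endian unsigned 32-bit word
        (retail_word,) = struct.unpack_from(">I", retail, offset)
        (built_word,) = struct.unpack_from(">I", built, offset)
        if retail_word != built_word:
            diffs.append((offset, retail_word, built_word))
    return diffs


def is_matching(retail, built):
    """A function matches only if every byte is identical."""
    return retail == built


# Made-up three-instruction example: mflr r0 / li r3, 1 / blr
retail = bytes.fromhex("7c0802a6" "38600001" "4e800020")
# Same code, but the compiler picked r4 instead of r3 for the second instruction
built = bytes.fromhex("7c0802a6" "38800001" "4e800020")

for offset, retail_word, built_word in mismatched_words(retail, built):
    print(f"+0x{offset:02x}: retail {retail_word:08x} != built {built_word:08x}")
```

A single different register is enough to fail the check, which is why a small change in the source can turn a matching function into a non-matching one.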

AI Independent Decompilation

This is where we get into the most disagreement: how AI could be used to accelerate the decompilation process. While this may seem like a good way to finish the decompilation side of things quicker, there are many more downsides to this method than there are upsides.
  • Readability. If you are forcing an AI to decompile an entire game, you are forcing it to make a lot of assumptions, and some of those assumptions will be faulty. This can cause a lot of issues when it comes to reading the code, as misleading variable and method names (more specifically inlines, as we already have the names of functions) spread misinformation when the AI's assumptions are incorrect. Correcting these errors will involve extensive human intervention and can even delay the process, as a human decompiling the same code would not have made these mistakes in the first place, since a human is better at applying context to variable names than an AI.
  • User interaction. A lot of people approach the decompiling community to learn more about reverse engineering games and how they are coded. Many people use the opportunity to get better at languages such as C and C++, and even at the machine language they are translating the game from, which in our case is PowerPC. I have seen many people in the decompiling community benefit from decompiling by hand, as it extends their knowledge. Users interacting with the compiler and learning how it generates code is vital to completing any project that involves matching decompilation, and it even helps other decompilation projects succeed. If you put AI in the way of this, many people are shut out of the learning process, as there is a computer "trained" to do this already. And when you shut out new users, you lower the number of people interested in learning, because the "hard work" has already been done for them, which defeats the entire purpose.
  • Satisfaction. Matching functions and entire files is extremely satisfying. It also gives users a sense of accomplishment, as decompiling is not an easy task to complete. Allowing an AI into the picture takes away from this sense of accomplishment, as using AI to do all of the work just feels lazy to a lot of people. It is also very satisfying to discover how the game truly works at your own pace, instead of having an AI document the structures, hastily make assumptions, and go from there.
  • Time. This is quite literally the only upside that I can think of when allowing an entire decompilation to be led by an AI. It is simply faster. However, people fail to see that faster is not always better. While you will get a fully "complete" byte-for-byte decompilation, it will be a mess of wrongly named variables and functions, faulty assumptions about code (fakematches), and much more headache down the road.

So...is it bad?

Overall, I think that AI-led decompilations are a terrible idea and that they will gatekeep new reverse engineers from entering the scene, as they take away a lot of people's main motivation to even attempt this feat in the first place: learning how things work. There is immense satisfaction when you work hard at something and accomplish great goals in the process. I feel like an AI decompilation will take all of this away and will kill people's motivation.

However, there is some justification for some people's support of an AI-led effort. They simply want to port the game to other platforms, as happened with Super Mario 64 once its decompilation was completed. Still, I believe that going down a destructive path of laziness is ultimately not worth it just to get a hastily assembled port with many probable faulty assumptions in the codebase. It even has the potential to kill off other decompilation efforts, as it removes the human interaction that people enjoy about decompiling.
© 2019-2023 shibboleet