# How to solve the unsolvable? – A learning story from the field of machine learning

Antti Rauhala • Lead Data Scientist

Have you ever tried to solve the unsolved and unsolvable? Have you ever tried to tackle the great problems of our age: the ones that even greater and wiser men have tried to solve in vain? Have you ever tried it with real effort?

There is an idea that it is challenges that force us to develop our thinking. And even if, as happened to me, the unsolvable remains unsolved, the challenge alone did guide me to the fountain of curious thought, where I found something very new and unique: a powerful new approach to machine learning.

My tilting at windmills was about solving the harder version of the AI problem, in which you learn the (game) world entirely from (simplistic) sensory information. I believe I have not been alone on this quest; quite a number of computer science students and graduates have certainly gone after the same problem. And their results likely haven't been very impressive, or at least mine were not.

My attempt was to use established machine learners to tackle the problem, but it soon became obvious that neural networks, KNNs, and Bayesian classifiers were not very useful for modeling the way the games worked. There was something very troubling in the idea that otherwise powerful machine learners would struggle so horribly with problems and patterns that seem easy or even trivial to us. Moreover, it became obvious that the algorithms used could not replicate and learn the simple rules of the games in the game setting. Nor were the classification values they provided very useful, as what you really wanted were probabilities. And you didn't just want to predict a few values; you wanted to build a more comprehensive model of the world, and you wanted the learned model to be formal and clear to better support a decision-making algorithm. And the list of issues went on and on.

Eventually a thought emerged: the problem the machine learners were created to solve was fundamentally different from the 'world learning' problem at hand. But if the 'world learning' problem was fundamentally different, then not only the problem but also the approach to solving it, and the solution itself, would need to be reinvented, and the whole attempt would go back to square one.

To deal with the new problems, many things had to be reconsidered and rethought. I wanted no artificial limits on what could be learned, while keeping the model clear, formal, and probabilistic. I also wanted the method to perform modeling in a clean, unsupervised way.

Coincidentally, at the time, I had also been experimenting with language learning and with a special 'smart database' that acted as a giant sparse naive Bayesian predictor. While playing with this database, I started to wonder how to make it smarter and better at predicting outcomes. Then a few simple observations struck me:

- First, you could often modify the data itself to improve results.
- You could use a language learning kind of algorithm to do such modifications.
- You could integrate such an algorithm easily with the database.
- It would make predictions really fast.
- Even more, it seemed likely to have the attributes I needed for the 'world modeling' problem.
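The post describes the 'smart database' only as a giant sparse naive Bayesian predictor. To make that phrase concrete, here is a minimal sketch of what such a predictor might look like, not the author's implementation. It assumes each record is a set of active features, stores counts in dictionaries so that absent features cost nothing, and applies Laplace smoothing. The class `SparseNaiveBayes`, its methods, and the toy weather data are hypothetical illustrations.

```python
import math
from collections import defaultdict


class SparseNaiveBayes:
    """Hypothetical sparse naive Bayes predictor (illustrative sketch only).

    Each sample is a set of active feature names. Only features that actually
    occur are stored, so the count tables stay sparse.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = defaultdict(int)  # outcome -> number of samples
        # outcome -> feature -> number of samples with that feature present
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def add(self, features, outcome):
        """Record one observation; only the active features are touched."""
        self.class_counts[outcome] += 1
        for f in features:
            self.feature_counts[outcome][f] += 1

    def predict_proba(self, features):
        """Return P(outcome | active features) for every known outcome.

        Simplification for sparse data: only the features present in the
        query contribute; absent features are ignored.
        """
        total = sum(self.class_counts.values())
        log_scores = {}
        for outcome, n in self.class_counts.items():
            score = math.log(n / total)  # log prior P(outcome)
            counts = self.feature_counts[outcome]
            for f in features:
                # Smoothed estimate of P(feature present | outcome)
                p = (counts.get(f, 0) + self.alpha) / (n + 2 * self.alpha)
                score += math.log(p)
            log_scores[outcome] = score
        # Normalize in log space for numerical stability
        m = max(log_scores.values())
        exp_scores = {o: math.exp(s - m) for o, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {o: s / z for o, s in exp_scores.items()}


# Toy usage with made-up observations
nb = SparseNaiveBayes()
nb.add({"cloudy", "humid"}, "rain")
nb.add({"cloudy"}, "rain")
nb.add({"sunny"}, "dry")
nb.add({"sunny", "humid"}, "dry")
probs = nb.predict_proba({"cloudy", "humid"})
print({o: round(p, 3) for o, p in probs.items()})  # → {'rain': 0.75, 'dry': 0.25}
```

Because prediction only touches the counts of the query's active features, its cost grows with the number of features in the query rather than with the size of the whole vocabulary, which is one reason a sparse design like this can give fast predictions.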

The grand realization seemed to be that you could transform a machine-learning problem into a language-learning problem and create a solution for it with very curious and useful properties. Another realization, which came later, was that the language learning problem could be generalized even further to learn patterns in ordered data such as text and images, and even patterns over time and space, all of which was very relevant to the original problem of 'world learning'. You could also mix ordered and unordered data, and it seemed that the algorithm could be generalized and extended even further.

I started working on this interesting idea and, well, work there was. I had previously implemented a simple text learning algorithm, which I consider the first version. I re-implemented the algorithm for the database I had built, but it was slow and didn't work. I threw the database away and started the algorithm from scratch (again).

The third version was fast, it worked, and it produced wonderful results on classic machine learning data sets. I also tried rewriting it as a fourth version to solve a separate problem, but when I realized that I could extend the algorithm to also learn 'grammar' in text, images, video, and so forth, I abandoned the previous versions to build a fifth version, which was quickly abandoned, and then a sixth version, which showed that the idea worked but was a dead end design-wise.

It was the seventh version of the algorithm that both had the extended capabilities and actually worked. It was extremely fast (much faster than the working third version), and it not only provided great results in machine learning but could also, for example, easily parse a Space Invaders screenshot into the graphical primitives of creatures, ships, numbers, and letters.

Eventually, the originally fuzzy idea of languages and data transformations crystallized into the re-expression method, with new formal concepts, constructs, and mathematical equations. In the renewed model, the method seeks an ideal expression for the original information, be it variables, images, or similar, in a process where patterns are eliminated from the data and captured in the generated language/grammar. In the process, the bits of data that once described variable states, letters, or pixels became bits for classes, words, and graphical primitives.
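The post does not spell out the re-expression method itself, so the following is not the author's algorithm. To give a rough feel for the general idea of moving patterns out of the data and into a generated grammar, here is a sketch of a different, well-known technique: greedy pair replacement in the spirit of byte-pair encoding (Re-Pair), where the most frequent adjacent pair of symbols is repeatedly replaced by a new rule symbol. The functions `build_grammar` and `expand`, the `min_count` threshold, and the `R0`, `R1`, ... rule names are hypothetical.

```python
from collections import Counter


def build_grammar(symbols, min_count=2):
    """Greedy pair replacement (in the spirit of byte-pair encoding / Re-Pair).

    Repeatedly replaces the most frequent adjacent pair in the sequence with a
    new rule symbol, so repeated patterns move from the data into the grammar.
    Assumes input symbols never collide with the generated names R0, R1, ...
    Returns the rewritten sequence and the rules {new_symbol: (left, right)}.
    """
    seq = list(symbols)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        # Ties are broken by first occurrence in the sequence
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break  # no pattern repeats often enough to be worth a rule
        new_symbol = f"R{next_id}"
        next_id += 1
        rules[new_symbol] = pair
        # Rewrite left to right, replacing non-overlapping occurrences
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a rule symbol back into the original symbols."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


seq, rules = build_grammar("abcabcabc")
print(seq)    # → ['R2', 'R1']
print(rules)  # → {'R0': ('a', 'b'), 'R1': ('R0', 'c'), 'R2': ('R1', 'R1')}
```

Here nine letters are re-expressed as two grammar symbols, and the repeated pattern "abc" now lives in the rules rather than in the data. Expanding the symbols recovers the original sequence exactly, so no information is lost.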

So the idea had reached formality and clarity, the results were impressive, and the method performed extremely well and was very generic. Even further, it became clear that it was a new and unique contribution to the field. At this point, the big question became: ‘What next?’ Following the beloved hacker ethic, I first published the method on GitHub and posted about it on a few public forums, where it raised curiosity and even heated discussion. I have also discussed the method with Professor Hollmen from Aalto University, who seemed genuinely interested and supportive of pushing the idea further. Most likely, the next logical step would be to bring the results to academic circles through a proper publication. And maybe in the future, with some additional research, it will evolve into something even more wonderful and curious, as there seems to be great potential for further improvement and extension of its abilities.

Someday, the solution may find its way into real-life applications. Because of its probabilistic nature, it could be employed in applications like credit risk rating. With some sparse optimizations, its raw power and speed might also make it suitable for ranking in search engines. But who knows? Maybe the solution will solve a totally different problem sometime in the future. Whatever the destination may be, the journey itself has been rewarding and has taught me many interesting things. If it has shown anything, it is that creativity flourishes most in the face of challenges and, even more, that inventing is fun!