This is the first in an open ended series of blog posts in which I describe my approach to working on a large AI project. At the time of writing I have just started with this endeavor. But whether I achieve my goal of creating an algorithm that solves Rubik’s Cube without prior human knowledge, or whether my research goes down a different path, my hope is that this blog will be helpful to anyone who is new to machine learning or research in general.

Rubik’s Cube

The Rubik’s Cube is a popular combination puzzle. Like any other cube it has six sides, but each side consists of six panels. In the sorted (solved) state, there is exactly one color on each side of the cube. You can turn each side independently and thereby re-order the colored panels. The problem is to sort the cube again, after you’ve scrambled it.

There are 43.252.003.274.489.856.000 possible configurations of the Rubik’s Cube. This makes this puzzle very hard to solve with pure reinforcement learning (RL) techniques, because once you are in a scrambled state, it is exceedingly unlikely that you stumble over the solution by taking random actions. Taking random actions, however, is exactly what a typical untrained RL agent does.

Recently, McAleer et al. managed to write a program (“DeepCube”) that learns to solve Rubik’s cube without prior human knowledge , except for an abstract representation of the cube. Check out my other blog post for a summary.

In the present project, my goal is to go one step further and solve a somewhat more difficult problem than that of McAleer et al., as I describe in the next paragraph. For instance, in my problem the algorithm is only provided with a description of the goal, and cannot learn by starting from the goal state, as DeepCube does.

The objective

The idea is this: I want to create an algorithm that is given only images of the cube, as well as a set of actions that turn its six sides. After each turn it can observe a new set of images of the (now) modified cube.

In addition to the set of images of the present state of the cube, the algorithm is also given a set of images of the solved cube. Thus, there is no external reward signal, but only a description (image) of what it is supposed to achieve.

Drafting a solution

To solve the objective above, I plan to construct an algorithm with three components.

A representation component that generates an abstract representation of the environment / Rubik’s Cube. This might be achieved with a generative query network , or some variation of it.

An action manager component that learns to reliably modify specific parts of that representation by taking actions in the environment and updating the representation. (It could also learn using an internal action-consequence predictor.) Once trained, this component should be able to return a set of useful algorithms (action-sequences) when it is given a set of positions in the representation that are to be manipulated. An algorithm is useful if:

  • it reliably changes a specific subset of the representation with little to no effect on the other parts and
  • the algorithms are “almost orthogonal” in the sense that linear combinations of algorithms approximately affect the union of the subsets of the representation.

The actor component uses algorithms generated by the manager component to move the current state of the environment towards the goal state (e.g. using Monte Carlo tree search and some distance heuristic). The latter is given as a list of images or textural description of the solved cube, as mentioned above. This description is encoded using the representation component.

Outlook

My Rubik’s Cube project is a playground for trying out different machine learning techniques. But the specific long-term goal outlined in this post should act as a guide, as well as a source of new problems to solve.

This is my first blog post (ever!), and also my first large machine learning project. I hope you enjoy reading about my journey into the exciting field of artificial intelligence, and I very much appreciate your constructive feedback.

Click here to return to the project overview.

References