The Rubik's Cube project - Johannes E. M. Mosig

This is the first in an open ended series of blog posts in which I describe my approach to working on a large AI project. At the time of writing I have just started with this endeavor. But whether I achieve my goal of creating an algorithm that solves Rubik’s Cube without prior human knowledge, or whether my research goes down a different path, my hope is that this blog will be helpful to anyone who is new to machine learning or research in general.

Rubik’s Cube

The Rubik’s Cube is a popular combination puzzle. Like any other cube it has six sides, but each side consists of six panels. In the sorted (solved) state, there is exactly one color on each side of the cube. You can turn each side independently and thereby re-order the colored panels. The problem is to sort the cube again, after you’ve scrambled it.

There are 43.252.003.274.489.856.000 possible configurations of the Rubik’s Cube. This makes this puzzle very hard to solve with pure reinforcement learning (RL) techniques, because once you are in a scrambled state, it is exceedingly unlikely that you stumble over the solution by taking random actions. Taking random actions, however, is exactly what a typical untrained RL agent does.

Recently, McAleer et al. managed to write a program (“DeepCube”) that learns to solve Rubik’s cube without prior human knowledge , except for an abstract representation of the cube. Check out my other blog post for a summary.

In the present project, my goal is to go one step further and solve a somewhat more difficult problem than that of McAleer et al., as I describe in the next paragraph. For instance, in my problem the algorithm is only provided with a description of the goal, and cannot learn by starting from the goal state, as DeepCube does.

The objective

The idea is this: I want to create an algorithm that is given only images of the cube, as well as a set of actions that turn its six sides. After each turn it can observe a new set of images of the (now) modified cube.

In addition to the set of images of the present state of the cube, the algorithm is also given a set of images of the solved cube. Thus, there is no external reward signal, but only a description (image) of what it is supposed to achieve.

Drafting a solution

To solve the objective above, I plan to construct an algorithm with three components.

A representation component that generates an abstract representation of the environment / Rubik’s Cube. This might be achieved with a generative query network , or some variation of it.

An action manager component that learns to reliably modify specific parts of that representation by taking actions in the environment and updating the representation. (It could also learn using an internal action-consequence predictor.) Once trained, this component should be able to return a set of useful algorithms (action-sequences) when it is given a set of positions in the representation that are to be manipulated. An algorithm is useful if:

it reliably changes a specific subset of the representation with little to no effect on the other parts and
the algorithms are “almost orthogonal” in the sense that linear combinations of algorithms approximately affect the union of the subsets of the representation.

The actor component uses algorithms generated by the manager component to move the current state of the environment towards the goal state (e.g. using Monte Carlo tree search and some distance heuristic). The latter is given as a list of images or textural description of the solved cube, as mentioned above. This description is encoded using the representation component.

Outlook

My Rubik’s Cube project is a playground for trying out different machine learning techniques. But the specific long-term goal outlined in this post should act as a guide, as well as a source of new problems to solve.

This is my first blog post (ever!), and also my first large machine learning project. I hope you enjoy reading about my journey into the exciting field of artificial intelligence, and I very much appreciate your constructive feedback.

Click here to return to the project overview.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Johannes E. M. Mosig

The Rubik’s Cube project

Rubik’s Cube

The objective

Drafting a solution

Outlook

References

Leave a ReplyCancel reply