Enumerating all of 3x3x3 cube space - required performance characteristics

I've been playing around with what it would take to enumerate the entire cube space for the standard 3x3x3 cube, using a large number of computers working together.  I've come up with what I think is a reasonable approach, a bit past the bare edge of what might be possible with today's technology, but perhaps not too far past.  The assumption will be that we want to solve the problem in one year, with all the machines involved in the project fully dedicated to the problem for the whole year.

The following is not a complete description of my proposed approach, but rather an analysis of the performance characteristics that would be required.  There is not much advantage in coming up with an approach that would require billions of years to compute.

Cube space contains the well-known 4.3E19 positions (approximately).  This can be reduced to about 4.5E17 with symmetry and antisymmetry.  My proposed algorithm would require 934778 computers (a little less than a million), the 934778 figure being the number of unique corner positions reduced by symmetry and antisymmetry.  Each computer would have to store the distance from Start for about 4.9E11 edge positions, where 490497638400 is one half the size of the edge group.  At one byte per position, 4.9E11 positions come to about half a terabyte, which is at the level of a large disk drive.  Using memory would be better than using disk, but it would surely be easier to find a million machines with a half-terabyte disk than a million machines with a half terabyte of memory.  Storage requirements could in principle be reduced to two bits per position, but I worry that if this project were put in place for real, it would be necessary to use the full byte per position in order to allow the computers to take checkpoints and to be restartable from a checkpoint.
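
As a rough check on these figures, here is a back-of-envelope sketch (Python; the counts are the ones quoted above, with the 934778 corner-class figure taken as given):

    # Storage back-of-envelope for the proposed distribution of the problem.
    corner_positions = 40320 * 3**7          # 8! * 3^7 = 88,179,840
    edge_positions   = 479001600 * 2**11     # 12! * 2^11 = 980,995,276,800

    machines          = 934778               # corner classes after symmetry/antisymmetry
    edges_per_machine = edge_positions // 2  # 490,497,638,400 = half the edge group

    print(machines, "machines,", edges_per_machine, "edge positions each")
    print(edges_per_machine / 2**40, "TiB at one byte per position")     # ~0.45 TiB
    print(edges_per_machine / 4 / 2**40, "TiB at two bits per position")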

There are 31536000 seconds in a year.  About 1.43E10 positions would have to be calculated per second overall, and about 15283 positions would have to be calculated per second per machine.  The figure of about 15000 positions per second per machine certainly seems doable based on my experience through the years in enumerating entire cube spaces.  This is a very different order of magnitude of speed as compared to finding optimal solutions for particular positions, which typically takes much more than one second per position.
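
The rate figures follow directly; a quick sketch using the rounded numbers above:

    # Required enumeration rate, overall and per machine.
    seconds_per_year = 365 * 24 * 3600        # 31,536,000
    reduced_space    = 4.5e17                 # positions after symmetry and antisymmetry
    machines         = 934778

    overall_rate     = reduced_space / seconds_per_year  # ~1.43e10 positions/second
    per_machine_rate = overall_rate / machines           # ~15,000 positions/second
    print(overall_rate, per_machine_rate)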

Finally, all the machines would have to send large amounts of data to each other.  log2(490497638400), or 39 bits, would be required to represent each edge position; essentially, that means each edge position could be stored in five bytes.  Each machine would be sending and receiving about 15000*5 bytes of data per second to and from other machines, or about 75000 bytes.  Network bandwidth is usually stated in bits per second rather than bytes per second, so each machine would have to send and receive about 600000 bits of data per second.  (Within a machine, what would have to be stored is distances from Start, so either one byte or maybe even two bits would suffice for each position.  The five bytes per position would arise only when information had to be exchanged between machines.)
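
The corresponding traffic estimate, again just restating the arithmetic above:

    import math

    edge_half_group    = 490497638400
    bits_per_position  = math.ceil(math.log2(edge_half_group))  # 39 bits
    bytes_per_position = 5                                       # rounded up to whole bytes

    per_machine_rate = 15000                                     # positions per second
    bytes_per_sec    = per_machine_rate * bytes_per_position     # ~75,000 bytes/second
    bits_per_sec     = bytes_per_sec * 8                         # ~600,000 bits/second
    print(bits_per_position, bytes_per_sec, bits_per_sec)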

Each of the required computing resources - processing speed, memory, disk space, number of computers, and network bandwidth - would be challenging but not utterly unreasonable, perhaps within the next decade or two.  As all the required computing resources get bigger and faster and cheaper, I think the key obstacle would be coming up with the million machines in something like the style of seti@home.  seti@home runs as a screensaver in the background using otherwise idle resources.  The approach I'm talking about would require dedicating many more resources on each machine than the seti@home approach does.  It seems to me that to enumerate all of cube space with something like the seti@home approach, optimal solutions would have to be found one position at a time, which seems prohibitively slow.  On the other hand, the seti@home approach would allow all the machines to run pretty much independently and asynchronously of each other, whereas the approach I'm talking about would require all million machines to run cooperatively and somewhat synchronously with each other at all times.

Enumerating all of cube space

Why not consider a simple coset approach? It's already been shown that a coset approach can be very effective; it eliminates all network communication (except for splitting up the computation and communicating final results); it eliminates the need (for most cosets) to "explore" the last ply (since that last ply is typically reachable from positions in the previous ply plus a move in the subgroup H). Best of all, you can "approach" the full enumeration piecemeal. Say this year you explore fully to ply 16 (138M cosets at 20 min each is only 5,300 CPU years) and follow that up with plies 17, 18, and 19 as resources permit. Indeed, I strongly suspect almost all cosets will be "finished" at ply 18, and the number of cosets that will need to be explored at ply 19 will be minimal.
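
For reference, here is roughly where the 138M and 5,300 figures come from, assuming the cosets are those of the usual Kociemba subgroup H = <U,D,L2,R2,F2,B2>; the division by 16 is the 16-way symmetry reduction mentioned below, and symmetric cosets make the true reduced count slightly larger:

    cube_space = 43252003274489856000
    kociemba_H = 40320 * 40320 * 24 // 2       # |H| = 8! * 8! * 4! / 2 = 19,508,428,800

    cosets         = cube_space // kociemba_H  # 2,217,093,120 cosets
    cosets_reduced = cosets // 16              # ~138.6M after 16-way symmetry

    cpu_minutes = cosets_reduced * 20          # 20 minutes per coset at ply 16
    cpu_years   = cpu_minutes / (60 * 24 * 365)
    print(cosets_reduced, round(cpu_years))    # ~5,300 CPU years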

And the coset approach, for the most part, fits the SETI model: asynchronous, independent computations. True, the computation is currently about 20 min long and requires about 3.5GB of RAM (if you use the Kociemba coset at ply 16) and gets much longer at deeper plies, but it *could* be suspended and resumed.

I think the key question is, what do we hope to achieve with all this data? If it's just the count at each depth, along with perhaps some key positions (all depth-20 positions, all depth-21 positions, etc.), that's one thing, but for me at least the primary interest is just the diameter; once we have shown that, the other data is of much less interest. And I believe we are very close to computing the diameter.

My hope is a little different

My hope is a little different. I hope to calculate the count at each depth, from which the diameter could easily be derived.

I'm not sure I understand exactly how the coset model fits into my goal of obtaining the count at each depth. As I understand it, the coset approach allows you to place an upper limit on a coset that then applies to every element of the coset. For example, placing an upper limit of 20 on every coset in the face turn metric would prove that the diameter of the Cayley graph for the cube group is 20 in that metric (given that there are positions known to require 20 face turns). And you are pretty close to achieving that result already.

But having an upper limit for every element of a coset is not the same thing as having an exact distance from Start for every element of the coset. Could you explain a little further how to get from an upper limit for the coset to an exact distance from Start for every element of the coset?

Coset search

I should also mention that using the coset search (at least, using the Kociemba group), it's not clear you can take advantage of nearly as much symmetry. Using it in a straightforward fashion, you can only take advantage of 16-way symmetry, rather than 96-way combined symmetry and anti-symmetry.

It's OK, you can get the counts

Although I can't find a post where the process was described explicitly, I imagine this must have been explained before, when Tom and Silviu were reporting coset searches early in 2006.

Suppose that H is a subgroup of the full cube group G and that its cosets are {C_i}. Make an iterated, depth-first search starting from any position g_i in coset C_i. Each time you encounter a position h_k in H that you haven't visited before, check it off your list and add 1 to the count for the current depth - you just found an optimal solution to g_i h_k^-1, which is one of the cubes in C_i. Continue the search till you've visited all of H. This process is relatively fast because you can prune the search using a table of (lower bounds on) the shortest distance to H: Tom has previously given counts/distance info from a number of coset searches of this kind.

By repeating for all cosets C_i, you build up the complete distance information for all cubes in G.
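
Here is a Python sketch of that search loop. The cube-specific pieces (applying a move, testing membership in H, indexing elements of H, and the pruning-table lookup that gives a lower bound on the distance to H) are passed in as helpers and assumed rather than implemented; a real solver would also add move-sequence pruning, coordinate tables, and symmetry reductions.

    def enumerate_coset(g_i, moves, size_of_H, apply_move, in_H, index_in_H,
                        lower_bound_to_H, max_depth=20):
        """Iterated-deepening DFS from g_i: each element of H first reached at
        depth d corresponds to a position of the coset whose optimal solution
        has length d."""
        seen = set()                      # elements of H already credited
        counts = [0] * (max_depth + 1)    # coset positions found at each depth

        def dfs(pos, depth, limit):
            if in_H(pos):
                k = index_in_H(pos)
                if k not in seen:
                    seen.add(k)
                    counts[depth] += 1    # optimal solution of length `depth`
            if depth == limit or lower_bound_to_H(pos) > limit - depth:
                return                    # depth budget exhausted, or H unreachable in it
            for m in moves:
                dfs(apply_move(pos, m), depth + 1, limit)

        limit = 0
        while len(seen) < size_of_H and limit <= max_depth:
            dfs(g_i, 0, limit)            # repeat with a deeper limit each pass
            limit += 1
        return counts

Because the pruning table only ever underestimates the distance to H, every element of H is first encountered at its true distance from g_i, so each increment of counts records an optimal length.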

If we define the permutation

If we define the permutation x*y in the way that we do x before y, we should write h_k^-1 g_i, which is an element of C_j.

The problem with the coset approach is that it takes about 20 minutes to find the 20-move upper bound, but getting the exact distribution takes longer (how much longer?), even with the H-moves "prescan" method Tom describes.

Analysis of coset approach

Okay, here are some numbers on runtime. This message is quite long.

These numbers are from an analysis of 300K coset runs, each with phase
1 run to level 16, from earlier this year. These positions are not
randomly selected, but they are well distributed and, I believe, fully
representative.

These runtimes are from a variety of machines, some multi-core, some
not, some faster than others. Since I do not have information on what
specific machine was used for what run, just consider these results to
be for a variety of reasonably modern machines.

Time is split into phase 1 and phase 2. For the implementation used
for this run, phase 1 is not threaded, but phase 2 is. On average,
phase 2 took 684.9s; this should not be significantly impacted by how
deep we run phase 1. (The 684.9s is for runs on machines with varying
numbers of cores, but most were run with one or two cores, and for
searches of depth 18 or greater, this time becomes inconsequential.)

Average time for phase 1 (for all searches through depth 16) is
477.5s. For those searches that averaged more than 1 second, we have:

Ply Average time
14 2.303s
15 31.98s
16 440.9s

The branching factor (using the full precision) works out to be
13.786. This is slightly higher than expected, and may reflect
changing locality. Using this value to project to deeper plies, we
expect to see:

Ply Average time
17 101m
18 23.3h
19 13.4d

This does not include the phase 2 time, but the phase 2 time is
negligible for plies 18 and 19.
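
The projection is just repeated multiplication by the branching factor; a quick sketch reproducing the table above:

    # Project phase 1 runtime to deeper plies with the measured branching factor.
    branching = 13.786
    t = 440.9                  # seconds: measured average phase 1 time at ply 16

    for ply in (17, 18, 19):
        t *= branching
        print(ply, round(t / 60), "min =", round(t / 3600, 1), "h =", round(t / 86400, 1), "d")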

A ply 19 search followed by a phase 2 scan should complete every coset
unless there is a distance 20 with sufficiently few solutions that
none of them start with a move in H; I believe this will be
sufficiently rare to not affect the total runtime.

If we assume the upper bound is indeed 20, then clearly about 138M x
14.4d completely suffices; in reality, fewer cosets are needed, but so
far I have only been able to reduce this to 90,209,384. (I expect
further reductions as I continue work on this.) Multiplying 90.2M by
14.4d gives 3.55M computer years, but I believe we can do much better
than this.

My expectation is that after a ply 18 search followed by the phase 2
scan (which would prove all positions at distance 18 or less, and prove
almost all distance-19 positions), very few positions, if any, should
remain. If any do, they can quickly be run through a six-axis
Kociemba algorithm to find a depth-19 solution; if none is found in a
reasonable time, they can be run through a full optimal solver
(starting at a depth of 19, since we have already shown them to be at
distance 19 or greater). If the time required to analyze the
remaining positions is relatively small, as I suspect it will be, and
we assume ply 18 searches are sufficient, then we need only 90.2M x
25.1h or about 258K computer years.
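
The 14.4d and 25.1h per-coset figures appear to be the cumulative iterated-deepening times (the final 13.4d or 23.3h iteration plus all the shallower ones, i.e. scaled by b/(b-1) for branching factor b); a sketch checking the totals under that assumption:

    # Rough check of the computer-year totals.
    b = 13.786
    cumulative = b / (b - 1)                    # ~1.078

    per_coset_ply19_days  = 13.4 * cumulative   # ~14.4 days
    per_coset_ply18_hours = 23.3 * cumulative   # ~25.1 hours

    cosets = 90.2e6
    print(cosets * per_coset_ply19_days / 365)          # ~3.6M computer years
    print(cosets * per_coset_ply18_hours / (24 * 365))  # ~258K computer years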

For those rare cosets that have many possible distance-20 positions
after a ply 18 search and a phase 2 scan, simply running a full ply 19
search may be better than trying to analyze each individual remaining
position. But I expect this to be sufficiently rare that it would
not be a significant factor.

Careful tracking of symmetry and coset overlap will be needed of
course, but this should be straightforward.

So I am pretty confident that using today's machines, 258K core years
should suffice to find the counts of positions at all depths (suitably
categorized by symmetry class) along with a full explicit enumeration
of all distance 20 (and perhaps distance-21 or distance-22) positions.

By comparison, simply proving a diameter of 20 (if true) would require
perhaps only 3500 core years.

Thank you for supplying all the details

Thank you for supplying all the details. Only one sentence is not clear to me: "...unless there is a distance 20 with sufficiently few solutions that none of them start with a move in H." I would say "...none of the solutions end with a move in H."

Correct

Yes, thanks for the correction!

Further

This is correct, and there's one other thing that helps.

If the subgroup H is generated by cube moves, then before starting the depth-first search for a given depth, first extend all the positions found so far by each of the generators of H and cross those positions off too. This will (very nearly always, though I believe I have found at least one exception) eliminate the need to do the last depth-first search.

So if the coset is truly at distance 20, a distance-19 depth-first search will suffice; if it is at distance 19, a distance-18 depth-first search will suffice.

Indeed, if after doing a distance-18 depth-first search followed by this extension (the phase 2 extension) there are but a few positions left, you can hand those few positions first to the six-axis Kociemba solver and, if needed, to an optimal solver, rather than doing a full depth-19 search.
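
For what it's worth, here is a sketch of that extension step in the same style as the earlier search sketch; found holds the H-elements found through depth d-1, counts the per-depth totals, and h_generators the moves that stay inside H (for the Kociemba subgroup, the ten moves U, U2, U', D, D2, D', L2, R2, F2, B2). apply_move is again an assumed helper, and which side the move is applied on depends on the composition convention discussed above.

    def h_move_extension(found, counts, d, h_generators, apply_move):
        """After the depth-(d-1) search, extend every H-element found so far by
        each H-move; anything new must lie at distance exactly d."""
        new = set()
        for h in found:
            for m in h_generators:
                h2 = apply_move(h, m)        # stays inside H, since h and m both are
                if h2 not in found and h2 not in new:
                    new.add(h2)
                    counts[d] += 1           # optimal length is exactly d
        found |= new
        return found, counts

The rare positions this misses are exactly the ones discussed above, where no optimal path from the coset representative into H ends with a move of H; those are the few positions that would then be handed to the Kociemba or optimal solvers.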