# Algorithm for Counting Identities

I've been thinking about writing a program to calculate and count duplicate positions - roughly speaking, those positions that are half way through an identity.  What I have in mind will probably be a more time consuming program to write than I would prefer.  So I wonder if I could ask Herbert Kociemba and/or mdlazreg to post a little something about the programs they have already written to find identities.  It may well be that there is a much simpler approach to calculating duplicate positions than what I have in mind.

What I have in mind is an iterative deepening depth first search beginning at the Start position.  If that's all I did, the search would simply count 12n maneuvers for each distance from n, and it would not extract any useful information about how many duplicate positions there are for each n.  To solve these problems, I propose to store all the duplicate positions and not to store those positions that are not duplicate.  This would be for the quarter turn metric.  The program I have in mind would not be able to handle the face turn metric.

By storing the duplicate positions, the following should be possible:

1. After making a move, determine the EndsWith value for the resulting position.  EndsWith is an attribute of positions rather than of maneuvers.  For example, the maneuver FF yields a position that has an EndsWith value of {F,F'} because FF=F'F'.  By knowing the EndsWith value associated with the position that results from a maneuver, it is possible to prune a depth first search down to maneuvers that are the same length as the associated position.  For example, maneuvers such as FF' and FFF would be pruned from the search.
2. After making a move, determine whether the resulting position is a duplicate position or not.  And if the position is duplicate, determine whether the current maneuver is the first maneuver for the position that is encountered in the depth first search.  For example, suppose FF comes before F'F' in the search.  The program I have in mind would be able to determine that FF is a duplicate position and that FF is the first maneuver in the depth first search that produces the position.  The program would also be able to determine that F'F' is a duplicate position and that F'F' is not the first maneuver in the depth first search that produces the position.  So the program would count FF and not count F'F'.  Upon encountering duplicate positions, they would be counted and stored.  Upon encountering non-duplicate positions, they would be counted but they would not be stored or otherwise processed.

## Comment viewing options

### Duplicate Positions within Duplicate Positions

Thanks for the responses from both mdlazreg and from Tom Rokicki.  I'll certainly take a look at Herbert's program to see how he generates identities.

I have in mind something very similar but slightly different than what you both describe.  The algorithm I've worked out (without any computer code to back it up yet) would treat duplicate positions as only a single position when they become embedded into a longer sequence.  For example, LFB'U would be treated as (L)(FB')(U) - a unique decomposition into shorter syllables.  The sequence LB'FU would never appear, and LFB'U and LB'FU would not be treated as duplicate positions.  And it's not just commuting.  LFFU would be uniquely decomposed as (L)(FF)(U).  The sequence LF'F'U would never appear, and LFFU and LF'F'U would not be treated as duplicate positions.  Longer duplicate positions such as those of length 6q that are associated with identities of length 12q would be treated in the same manner.

I'm nowhere near as interested in identities as I am in duplicate positions.  But it seems to me that if my algorithm would work (for example) to identify 7q duplicate positions that were uniquely decomposable into shorter duplicate positions (i.e., syllables), then the 7q duplicate positions would create 14q identities that did not contain any shorter embedded identities.

### Enumeration of cosets

My program that counts positions at each depth (used to get the new 12f*, 13f*, 14q*, and 15q* results) can trivially be modified to list all duplicate positions at a given depth (where we define a duplicate position x as a position that has more than one canonical sequence leading to it, where a canonical sequence is a sequence for which all commuting moves have been placed in some canonical order). Indeed, the program works right now by finding all positions and then eliminating duplicates; it would be an absolute trivial modification to tell it to print the duplicates it is eliminating as it eliminates them.

Of course there are *many* such positions (how many is easy to compute). It would work fine in both face turn metric and quarter turn metric.

I could run through 11f* and 13q* easily in well under a day if you want me to generate those results. 12f* and 14q* would take probably a day each (or a little more) depending on the disk requirements for storing the duplicate positions. 13f* and 15q* would take a couple of weeks, mostly because the programs as they stand right now beat the heck out of the virtual memory system on the computer.

On the other hand, this technique, without at least some modification, would list duplicates that are trivially derived from shorter duplicates (i.e., if p_1 and p_2 are duplicates, and neither end with a move of U or D, then it would list p_1 U and p_2 U as a duplicate, even though it's just an extension of a shorter duplicate.)

### Kociemba's QTM solver

The program I have been using is Herbert Kociemba's QTM solver which can be found here:

http://kociemba.org/optiqtmSrc.zip

If you give it the UU' sequence as an input it will generate all 12q identities, it can also generate 14q and up identities as suboptimal solutions.

The generated 14q identities will however include trivial identities that can be deduced directly from the 12q identities, for example FAF' where A is some 12q identity.

I do not know if Herbert Kociemba's program can be improved to generate only non trivial identities at each level.

### The QTM-solver is just an opt

The QTM-solver is just an optimal solver which computes all optimal and suboptimal solutions for a given input, in this case for the identity UU'. Of course it would be possible to implement some code to suppress the output if for example a substring is already the identity - or some other rules which depend only on the current string.

### Kociemba's identities

Hi Herbert,

I am currently using your QTM-solver to generate identities at higher levels, trying to categorize them in an attempt to understand how they account for missing positions.

For example I have generated all 14q identities and calculated that they are 9944 missing positions that I need to account for.

So far I was able to understand why 8880+288 positions are missing, but I am left with 9944 - 8880 - 288 = 776 positions that I can not figure it out yet.

I think the reason for that is probably because your QTM-solver does not generate all the identities, for example :

initializing memory.
initializing tables.........................................................
enter cube (x to exit): U

solving optimal: U
Cube has 4 symmetries.

U' (1q*)

enter cube (x to exit): UU

solving optimal: UU
Cube has 8 symmetries.

U U (2q*)

enter cube (x to exit): UUU

solving optimal: UUU
Cube has 4 symmetries.

U (1q*)

enter cube (x to exit): UUUU

solving optimal: UUUU
Cube has 48 symmetries.

depth 2 completed, 1 nodes, 1 tests
depth 4 completed, 12 nodes, 3 tests
depth 6 completed, 100 nodes, 17 tests
depth 8 completed, 1159 nodes, 176 tests
depth 10 completed, 14240 nodes, 1960 tests
U U R U R' F' U U L' U' L F (12q*)

enter cube (x to exit):

So as you can see for the UUUU input it did not generate the following identities for example:

I (0q*)
UU' (2q*)
UUUU (4q*)
UDU'D'(4q*)

it immediatly generated 12q identities but missed the 4q identities...

Can you please explain to me why is that the case? and if there is anything I can change in your code so that such identities are generated?

Regards

### If the position has some symm

If the position has some symmetry, with the flag -s, all solutions are computed, even those which are related by the symmetry. But the program *never* prints out strings, which can be reduced trivially like the examples you gave. UDU'D' is the same as UDD'U'. XX', XXX and X'X'X' are the same as I, X' and X for any move X and hence not printed out because they trivially can be reduced to shorter maneuvers.

### Trivial identities at higher levels

Correct. And of course you did it this way because it is after all an _optimal_ solver...

However I am trying to see if it can be tweaked so that it does generate even what it considers trivial maneuvers.

The reason I need this is that I am adopting a new approach of counting the positions at each level. For example at level N, the number of possible maneuvers is simply pow(12,N), to get the number of positions I need to remove from this number all maneuvers that should not appear because of some identity.

P(n) = sum(R(k)*P(n-k)) for 1 <= k <= n

so the approach I am taking is to calculate R(K) from which I can then deduce P(n).

I think your QTM solver can be used to calculate the R(k), for example at level 6 it immediately returns 1440 identities leading to a 744 duplicate positions, but at level 7 it returns 8880 duplicate positions of length 7 and 288 positions of length 5 but the missing positions number at level 7 is 9944. So somehow the list of 14q identities is not generating all maneuvers which is causing me not to figure out how to account for the 9944 - 8880 - 288 = 776 missing positions...

Probably I should explain the full process I am following:

First I generate the 14q identities list using your QTM solver, I take then half of each identity and I solve it optimally which gives me a list of positions that are 7q* and another list of 5q*. The list of 5q* has 288 positions and I consider all of them to be removed from the 7th level as duplicate maneuvers. Out of the 7q* positions I keep only one representative of each duplicate positions set, all the rest is considered to be duplicate maneuvers, the number I get is 8880. So in a sense the 8880 at level 7 is similar to 744 at level 6.

So in total I get 8880+288 maneuvers that I can remove from the 7th level. But to match the exact number at level 7 I need to be able to remove 9944.

So right now I am missing 776.

Since I am fully relying on your QTM solver I thought that I can blame this on the fact that it is not generating all identities that it considers as trivial...

For example it can not generate 14q identities like this :

ABCDEFGHIJKLMO=I

such that ABCDEFG is of optimal length 3. It does generate however all identities that have ABCDEFG of length 7 and 5...

Sorry If I am not making myself clear.

### Instead of using 12^k, why no

Instead of using 12^k, why not use a slightly smarter formula that takes into account the required ordering of adjacent commutative moves?

### I do use them

I used 12^k just to explain the approach but what I use actually is 12*P(k-1) which takes into account not only identities due to the commutative moves but also all other lower identities...