# Numerical formula

I wrote a program that counts cube positions by taking into account only the identities of length 4 and the identities of length 12. The results of this program are in the second column below:

``` d                  positions I4     positions I4&I12   positions  ALL
--                  ------------     ----------------   --------------

0                             1                    1                1
1                            12                   12               12
2                           114                  114              114
3                          1068                 1068             1068
4                         10011                10011            10011
5                         93840                93840            93840
6                        879624               878880           878880
7                       8245296              8231400          8221632
8                      77288598             77094824         76843595
9                     724477008                             717789576
10                    6791000856                            6701836858
11                   63656530320                           62549615248
12                  596694646092                          583570100997
13                 5593212493440                         5442351625028
14                52428869944896                        50729620202582
15               491450379709824
16              4606688566257048
17             43181530471120320
18            404768967341615520
19           3794166513675844032
20          17118481264698063536
21	   185802363081369801472
```

The first column is based on Dan's recurrent formula that takes into account only identities of length 4.
The third column is based on Rokicki's results up to 14q*. [I thought he posted the results up to 15q* but I do not find that post!]. Of course his results take into account ALL identities.

The second column are my program results. I was only able to reach 8q, my program would take a month of calculations to produce the number at 9q...

The above results however show that either my program has a bug or the recent new formula developed by Jerry Bryan is wrong. In fact he predicts that based on the I4 and I12 identities P is equal to 8225857. My program predicts 8231400.

If we look at Jerry's formula :

P[n] <= s1*(2/3)*(749376/750720)*P[n-1] + s2*(2/3)*(120036/120132)*P[n-2] + s3*(2/3)*P[n-3] + s4*(2/3)*P[n-4] + s5*P[n-5] + s6*P[n-6], n >= 6

The strange looking numbers are what Jerry calls fudge factors. For example 2/3 is a fudge factor to account for the fact that each position at P[n-1] may follow only 12*2/3 = 8 paths.

This however can not be told about 749376/750720 which in my opinion has no "physical" or mathematical explanation... This leads me to have doubts about the correctness of this new formula.

## Comment viewing options

### A Further Clarification

This will say the same thing as my first post in this thread, but I see that I can make it a little clearer.

We let P[n] be the number of positions of length n.  Knowledge of identities of length up to and including 2n allow us to develop a formula that is an equation for P[i] for 0 <= i <= n.  We can extend the formula to values of i that are greater than n without changing any terms in the formula, but to do so we must change the formula into an inequality.  The formula is still mathematically correct, and we shouldn't say that it is "incorrect".  But as an inequality it is less useful than if it were an equation.

We say that Dan Hoey's original formula is based on 4q identities.  Dan's formula is really based on 8q identities, and the 8q identities are really double 4q identities such as FFFFBBBB.  Therefore, Dan's analysis supports the equation version of the formula up through P.

Dan was very careful to change his P[i] formula from an equation to an inequality for values of i that were larger than could be supported by identities of length up to and including 2n.  Because identities up to a length of 8q were known, Dan's formula remained an equation up through P.  And because there are no 10q identities, Dan's formula remained an equation for P.  But there are 12q identities and his formula did not take them into account.  Therefore, he wrote his formula as in inequality for P and above.

My version of the formula takes 12q identities into account, so it is an equality for P.  But my formula does not take 14q identities or greater into account, so it is an inequality for P and above.  Because it takes 12q identities into account, my formula is a slightly tighter inequality for P and above than is Dan's formula, but it is still in inequality.  That's why the P results from your program for the "positions I4&I12" case do not match the Rokicki results for P.

Finally, I will repeat my assertion that using identities of length up to and including 2n to develop a formula for P[i] for 0 <= i <= n  is a little problematic.  What is really needed in order to develop the P[i] formula for 0 <= i <= n  is duplicate positions of length up to and including n.  We can get back and forth between duplicate positions of length n and identities of length 2n, but only by taking any "middle syllables" that appear in the identities into account.

### Thoughts about the recurrent formula

If you look at Dan's recurrent formula we notice that it is only at depth 4 that the formula becomes really recurrent with no new terms appearing. The reason for that is that after depth 4 no longer syllable can be found.

If you take the ABCD sequence such that AB, BC, CD are smaller sylables, there is no way to choose E such that DE is a syllable. This is the secret behind Dan's formula. So if you take an arbitrary sequence such as ABCDEFGHIJKLMN you know for sure that this sequence can belong to one of the following 4 categories:

1) MN, LM and KL are syllables. [JK MUST not be a syllable]
2) MN and LM are syllables. KL is not a syllable.
3) MN is a syllable. LM is not a syllable. KL can be or not a syllable.
4) MN, LM and KL are not syllables.

The fact that ANY sequence belong to one and only one category of the above is what makes Dan's formula possible.

I have come up with a notation to describe each category:

1) 2-1+2-1+2
2) 2-1+2+1
3) 2+1+1
2) 1+1+1+1

2 means a syllable of length 2.
-1 means that the previous syllable of length 2 shares its last move with the next syllable of length 2.
1 means that the move does not have any adjacent move that can turn both of them into a syllable.

Notice that the sum above of each category is 4.

Now if you take a sequence of length 5 and using the above notation you will be able to write it in one of the following form:

1+1+1+1+1 [no syllables that share their moves]
2+1+1+1 [one syllable]
2-1+2+1+1 [two syllables]
2-1+2-1+2+1 [three syllables]

The category 2-1+2-1+2-1+2+1 is impossible because there are 0 positions that can be described this way. It is the fact that they are no positions with 4 adjacent syllables that limits the Dan's formula to 4 terms only.

With this in mind, how one can develop a recurrent formula for the 12q identities. if we take a position of length 6 we can write it as follow :

1)1+1+1+1+1+1
2)6

The first category is of positions that are not a half sequence of a 12q identity. The second category is of positions that are half sequence of a 12q identity.

Now if we take a position of length 7 we can write is as follow:

1)1+1+1+1+1+1+1
2)6+1
3)6-5+6

The category 3) is of sequences ABCDEFG such that ABCDEF is a syllable of length 6 and BCDEFG is a syllable of length 6.

The category 2) is of sequences ABCDEFG such that ABCDEF is a syllable of length 6 and BCDEFG is NOT a syllable of length 6.

I hope you are following my notation and what it means...

Developing a recurrent formula for the 12q identities will need to figure out how many categories they are. For example the category:

6-5+6-5+6-5+6-5+6-5+...... which can be written as N*(6-5) will stop for some N, in such a way that there is 0 position of the form (N+1)(6-5). The same thing for the category:

6-4+6-4+6-4+......

Of course they are many other categories, they are in the form:

6-a+6-b+6-c+6-d+.... where a,b,c,d,.... are >=1 and <=5.

The recurrent formula will be the sum of terms where each term correspond to one category.

I am working on a program that can list all categories for 12q identities. My current program was able to rediscover Dan's formula by numerical means only. This means that without having knowledge about the cube or its physics and based on the 4q identities it was able to infer it. I am hoping to do the same thing with the 12q identities, my guess however is that this recurrent formula will have new terms at each new depth and it will take so many depths before the number of terms becomes constant... Actually the depth at which it might stop might be bigger than God's diameter. In comparison Dan's formula has a new term at depths 1,2,3,4. After depth 4 the number of terms is constant and equals 4.

### Numerical formula versus Mathematical formula

I think I do understand the logic you are using to develop your new formula. The issue I am having with it is that what it predicts does not much what my brute force program is producing.

My program produces the exact same numbers as Dan's formula when I take into account all 4q identities. I would expect my program [If I assume it is free of bugs] to produce the exact numbers as your new formula. Unfortunately it did not so either your new forumla is not correct or my program is buggy.

My program simply generates all sequences up to a certain depth then starts removing all sequences that should not appear because of a certain identity. What is left is the number of positions at that depth.

### Some things to check with your program

I certainly could have a mistake with my formula, but I think it is correct.  In any case, here are some things to check with your program.

First of all, your program should match Dan Hoey's formula exactly out to P (and really, out to P because there are no 10q identities).  By "exactly", I don't mean just the final total.  The individual terms should match Dan's formula as well.

Here is what I think you should see from your program.  For our purposes here, we do not need to break a maneuver down to all of its constituent syllables.  We just need to break a maneuver down as y=xs, where s is the rightmost syllable and x is everything to the left of the rightmost syllable.  For example, consider RUFB.  In your terminology, I think this is 1+1+2, which I have been writing by syllable as (R)(U)(FB).  For our purposes here, we just need to break it down as far as RU(FB).  We don't really need to know that RU is 1+1.

As another example, your program should find 80088 positions that I describe as 4+1 in the table below.  That is, I'm talking about a position of length 5 consisting of a position of length 4 followed by a syllable of length 1.  For the purposes of this table, it doesn't matter whether the 4 is broken down as 1+1+1+1 or 2+1+1 or 1+2+1 etc.  Through P, this table has been known to be correct for many, many years.  Other things to note in this table are that the 0+(n-0) column represents syllables of length n, and that the 0 for n=5 in the 0+(n-0) column reflects the fact that there are no 10q identities.

```                       Dan Hoey's Formula Through P

y=xs           4+(n-4)  3+(n-3) 2+(n-2)  1+(n-1)  0+(n-0)      P[n]

n

0                                                                 1
1                                                     12         12
2                                            96       18        114
3                                  912      144       12       1068
4                        8544     1368       96        3      10011
5               80088   12816      912       24        0      93840
```

Now we get to the good stuff.  First, let's just extend Dan's formula through P.

```                       Dan Hoey's Formula Through P

y=xs  5+(n-5)  4+(n-4)  3+(n-3) 2+(n-2)  1+(n-1)  0+(n-0)      P[n]

n

0                                                                 1
1                                                     12         12
2                                            96       18        114
3                                  912      144       12       1068
4                        8544     1368       96        3      10011
5               80088   12816      912       24        0      93840
6     750720   120132    8544      228        0        0     879624
```

Dan's formula says that P≤879624 when in fact we know that P=878880.  So Dan's formula is not incorrect.  It's just an inequality.  In order to make Dan's formula into an equality, we need to adjust the 5+1 column, the 4+2 column, and the 0+6 column.  That is, we need to adjust the 750720, the 120132, and the rightmost 0 in the row for P.  For example, suppose we could find a duplicate position of length 6 which was of the form 5+1.  By duplicate position, we mean 2 or more different maneuvers of length 6 that yield the same position.  And let's suppose we find exactly 2 such duplicate positions.  In that case, we would subtract the 2 from 750720, and we would add the 1 to the 0.  The result would be a follows.

```                       Dan Hoey's Formula Through P
Adjusted for one duplicate position of the form 5+1

y=xs  5+(n-5)  4+(n-4)  3+(n-3) 2+(n-2)  1+(n-1)  0+(n-0)      P[n]

n

0                                                                 1
1                                                     12         12
2                                            96       18        114
3                                  912      144       12       1068
4                        8544     1368       96        3      10011
5               80088   12816      912       24        0      93840
6     750718   120132    8544      228        0        1     879623
```

But there is more than one duplicate position.  Some of them are of the form 5+1, which is where we need to adjust the 750720 figure, and some of them are of the form 4+2, which is where we need to adjust the 120132 figure.  Here is what the table looks like when all necessary adjustments are made to make P into an equality.

```                       Jerry Bryan's Formula Through P

y=xs  5+(n-5)  4+(n-4)  3+(n-3) 2+(n-2)  1+(n-1)  0+(n-0)      P[n]

n

0                                                                 1
1                                                     12         12
2                                            96       18        114
3                                  912      144       12       1068
4                        8544     1368       96        3      10011
5               80088   12816      912       24        0      93840
6     749376   120036    8544      228        0      696     878880
```

I figured out which adjustments were necessary by looking at the list of all 12q identities not reduced by symmetry and antisymmetry that you posted, and also by looking at the list of all 12q identities reduced by symmetry and antisymmetry that Herbert Kociemba posted.  However, the 12q identities could not be used directly to make the necessary adjustments.  That's because making the adjustments correctly really requires the 6q duplicate positions rather than the 12q identities.  It's possible to determine the 6q duplicate positions from the 12q identities, but the correspondence between the two is not exact because of what I call the middle syllable problem in the 12q identities.

You should be able to make your program produce all the results in the table above, and not just the 878880 total for P.  You really should be able to produce the 749376, the 120036, etc. But to do so, I think you are going to have to convert your 12q identities into 6q duplicate positions, taking the middle syllable problem into account.  I don't think you are going to be able to use your 12q identities directly.

### 4 or 1+1+1+1

The only reason I did not use 4+1 is that according to my notation it would be interpreted as a syllable of length 4 plus a syllable of length 1.

I think I should improve my notation by adding something like:

4'+1 where 4' would mean any one of 1+1+1+1 or 2+1+1 or 1+2+1 etc

I think if we continue the calculation at further depths using the format of your last table, we will reach a point where no new terms are appearing, it is at that moment that we can find the recurrent formula...

### The Funny Fudge Factors

Tom Rockicki did post his 15q results recently.  They were lost by the corruption of the Drupal database.  I hope he reposts them now that the Drupal database has been repaired.

"Fudge factors" may not have been the best terminology to use, but I didn't know what else to call them.  Nevertheless, the "fudge factors" simply take into account the identities of 12q.  Dan Hoey's original formula only took into account identities of length 4q such as FFFF plus the double 4q identities such as FBFBFBFB=FFFFBBBB that arise because moves of opposite faces commute.

The 749376/750720 factor takes into account a part of the 12q identities and the 120036/120132 factor takes into account the rest of them.  The way I think of it is that Hoey's formula is making a number of predictions, and the predictions are correct as long as we only have to deal with 4q identities and don't have to deal with 12q identities.

For example, the Hoey version of the first term in my formula would be s1*(2/3)*P[n-1].  That's a prediction of the number of positions there are of length 6q that are of the form 5q+1q - that is, a 5q position followed by a 1q syllable.  The Hoey version of the second term in my formula would be s2*(2/3)*P[n-2].  That's a prediction of the number of positions of length 6q there are that are of the form 4q+2q - that is, a 4q position followed by a 2q syllable.  Both predictions are very close to correct but are ultimately incorrect because they do not take 12q identities into account.  And indeed, the predictions of the Hoey formula are completely correct for positions of length 5q or less because the 12q identities do not come into play until we reach positions of 6q or more.

I would describe 749376/750720 as being the proportion of the 5q+1q positions that do not involve 12q identities, and 120036/120132 as being the proportion of the 4q+2q positions that do not involve 12q identities.  Also, my formula would need additional "fudge factors" further from Start to take into account 14q identities, 16q identities, etc.

Finally, I don't think that searching for identities is the best way to approach this problem.  Rather, I think it's more productive to search for sequences of moves that yield duplicate positions.  In principle, sequences of moves that yield duplicate positions can be derived from identities and vice versa.  But discrepancies between identities and duplicate positions creep into the calculations when identities have what I call a middle syllable.  For example, if a sequence X that ends with U produces the same position as a sequence Y that ends with U', then the corresponding identity XY' can have either a (UU) or a (U'U') syllable in the middle and the correspondence between identities and duplicate positions becomes very muddled.  Dan Hoey called this the "4q within the 12q" problem, and tried to solve it by considering only identities.  I think it's better to give up considerations of identities altogether and focus only on duplicate positions.

### Reposted

I've reposted the 15q* results; sorry it took so long.