create a balanced index

December 8, 200619 yr

Newbies

I do not use Filemaker much - so please excuse me if I have put this in the wrong area

I have a database of artist that is used to export an xml for flash, I need a way to create a balanced index from the names.

the index needs to have 6 divisions — a-d, e-f etc

which if I had a normal person asking for this - it would be easy and i would just divide the alpha and call it done

instead - and I am not sure it can be easily done - I need the splits to be so that the number of artist in each split is as close to being equal as possible with out splitting up the artist in a letter group

ie: if there are 300 artist there should be as close to 50 artist in each split as possible

I can get the number of artists in each letter group

but I am not sure how to get from there to something like this

1 = a, b = 48 artist

2 = c, d, e, f, g = 52

3 = h, i, j, k, l, m, n = 49

4 = o, p, q, r = 54

5 = s = 49

6 = t, u, v, w, x, y, z = 48

any help is greatly appreciated

December 9, 200619 yr

Welcome pmnot,

I've come up with a basic iterative algorithm that loops through once and assigns the division based on the number of records seen previously in the current division (see attached). It's a start, but I'm afraid it's not perfect.

The trouble is when looking at the first part of a new group (say the "c"s) if there are 4 c's, the algorithm doesn't know whether those should be part of division 1 or division 2. So what I have it doing is assuming if the additional count for the new group would bring the current division count above the average number (total / number of divisions), it puts them in the next division. This has the effect of weighing down the last division with more records than all the others.

I'm not sure how to get around this. Perhaps it can be made a little smarter by checking how much over the current division would be with the additional group. But I suspect a better algorithm might be out there. Maybe something that goes through twice, shifting the groups as needed.

BalancedDivisions.fp7.zip

December 9, 200619 yr

This may seem simple at first glance, but it is in fact a member of the nasty knapsack problem family. It's easy to show that in some situations no optimal solution is possible. Suppose you have the following letter groups:

A - 100

B - 100

C - 100

D - 100

E - 100

F - 100

G - 100

The only way to divide these into 6 without splitting the groups is to have one group that is twice as large as all the others.

The problem is made worse by the constraint of keeping the original order. For example:

A - 50

B - 100

C - 100

D - 100

E - 100

F - 100

G - 50

Since you cannot combine A and G, you need to increase the allowable group size to 150. But this will also allow F and G to be merged together, so the result is only 5 groups instead of the required 6.

That said, given a reasonably large and diverse set, one could hope for an acceptable result by gradually increasing the allowed group size, until the number of the groups is reduced to 6.

BalanceGroups.fp7.zip