Jump to content

create a balanced index


 Share

This topic is 5765 days old. Please don't post here. Open a new topic instead.

Recommended Posts

  • Newbies

I do not use Filemaker much - so please excuse me if I have put this in the wrong area

I have a database of artist that is used to export an xml for flash, I need a way to create a balanced index from the names.

the index needs to have 6 divisions — a-d, e-f etc

which if I had a normal person asking for this - it would be easy and i would just divide the alpha and call it done

instead - and I am not sure it can be easily done - I need the splits to be so that the number of artist in each split is as close to being equal as possible with out splitting up the artist in a letter group

ie: if there are 300 artist there should be as close to 50 artist in each split as possible

I can get the number of artists in each letter group

but I am not sure how to get from there to something like this

1 = a, b = 48 artist

2 = c, d, e, f, g = 52

3 = h, i, j, k, l, m, n = 49

4 = o, p, q, r = 54

5 = s = 49

6 = t, u, v, w, x, y, z = 48

any help is greatly appreciated

Link to comment
Share on other sites

Welcome pmnot,

I've come up with a basic iterative algorithm that loops through once and assigns the division based on the number of records seen previously in the current division (see attached). It's a start, but I'm afraid it's not perfect.

The trouble is when looking at the first part of a new group (say the "c"s) if there are 4 c's, the algorithm doesn't know whether those should be part of division 1 or division 2. So what I have it doing is assuming if the additional count for the new group would bring the current division count above the average number (total / number of divisions), it puts them in the next division. This has the effect of weighing down the last division with more records than all the others.

I'm not sure how to get around this. Perhaps it can be made a little smarter by checking how much over the current division would be with the additional group. But I suspect a better algorithm might be out there. Maybe something that goes through twice, shifting the groups as needed.

BalancedDivisions.fp7.zip

Link to comment
Share on other sites

This may seem simple at first glance, but it is in fact a member of the nasty knapsack problem family. It's easy to show that in some situations no optimal solution is possible. Suppose you have the following letter groups:

A - 100

B - 100

C - 100

D - 100

E - 100

F - 100

G - 100

The only way to divide these into 6 without splitting the groups is to have one group that is twice as large as all the others.

The problem is made worse by the constraint of keeping the original order. For example:

A - 50

B - 100

C - 100

D - 100

E - 100

F - 100

G - 50

Since you cannot combine A and G, you need to increase the allowable group size to 150. But this will also allow F and G to be merged together, so the result is only 5 groups instead of the required 6.

That said, given a reasonably large and diverse set, one could hope for an acceptable result by gradually increasing the allowed group size, until the number of the groups is reduced to 6.

BalanceGroups.fp7.zip

Link to comment
Share on other sites

  • Newbies

Thank both of you for your input and sample solutions - I readily admit both are a lot farther than I ever would have gotten without your help.

what I have ended up with is using the solution from comment (thank you very much), with the ability to finetune the balance by checkboxes

not very elegant but it works

BalanceGroups_pm.zip

Link to comment
Share on other sites

This topic is 5765 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.