Newbies pmnot Posted December 8, 2006 Newbies Posted December 8, 2006 I do not use Filemaker much - so please excuse me if I have put this in the wrong area I have a database of artist that is used to export an xml for flash, I need a way to create a balanced index from the names. the index needs to have 6 divisions — a-d, e-f etc which if I had a normal person asking for this - it would be easy and i would just divide the alpha and call it done instead - and I am not sure it can be easily done - I need the splits to be so that the number of artist in each split is as close to being equal as possible with out splitting up the artist in a letter group ie: if there are 300 artist there should be as close to 50 artist in each split as possible I can get the number of artists in each letter group but I am not sure how to get from there to something like this 1 = a, b = 48 artist 2 = c, d, e, f, g = 52 3 = h, i, j, k, l, m, n = 49 4 = o, p, q, r = 54 5 = s = 49 6 = t, u, v, w, x, y, z = 48 any help is greatly appreciated
Ender Posted December 9, 2006 Posted December 9, 2006 Welcome pmnot, I've come up with a basic iterative algorithm that loops through once and assigns the division based on the number of records seen previously in the current division (see attached). It's a start, but I'm afraid it's not perfect. The trouble is when looking at the first part of a new group (say the "c"s) if there are 4 c's, the algorithm doesn't know whether those should be part of division 1 or division 2. So what I have it doing is assuming if the additional count for the new group would bring the current division count above the average number (total / number of divisions), it puts them in the next division. This has the effect of weighing down the last division with more records than all the others. I'm not sure how to get around this. Perhaps it can be made a little smarter by checking how much over the current division would be with the additional group. But I suspect a better algorithm might be out there. Maybe something that goes through twice, shifting the groups as needed. BalancedDivisions.fp7.zip
comment Posted December 9, 2006 Posted December 9, 2006 This may seem simple at first glance, but it is in fact a member of the nasty knapsack problem family. It's easy to show that in some situations no optimal solution is possible. Suppose you have the following letter groups: A - 100 B - 100 C - 100 D - 100 E - 100 F - 100 G - 100 The only way to divide these into 6 without splitting the groups is to have one group that is twice as large as all the others. The problem is made worse by the constraint of keeping the original order. For example: A - 50 B - 100 C - 100 D - 100 E - 100 F - 100 G - 50 Since you cannot combine A and G, you need to increase the allowable group size to 150. But this will also allow F and G to be merged together, so the result is only 5 groups instead of the required 6. That said, given a reasonably large and diverse set, one could hope for an acceptable result by gradually increasing the allowed group size, until the number of the groups is reduced to 6. BalanceGroups.fp7.zip
Newbies pmnot Posted December 13, 2006 Author Newbies Posted December 13, 2006 Thank both of you for your input and sample solutions - I readily admit both are a lot farther than I ever would have gotten without your help. what I have ended up with is using the solution from comment (thank you very much), with the ability to finetune the balance by checkboxes not very elegant but it works BalanceGroups_pm.zip
Recommended Posts
This topic is 6616 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now