burdenslifted Posted March 8, 2010 Posted March 8, 2010 I'm trying to produce a report for payment of royalties by a company that sells both digital and physical (CDs) music. Sales can be based on tracks (individual songs) or albums (i.e., collections of songs); royalties are paid out based on sales numbers. The problem is that the report from which the sales information is extracted doesn't directly link the albums with the songs contained on the album. The song database contains a record for each song and relevant info: kp_song_id (a number that uniquely idetifies each song or song part on which the royalties need to be paid) title artist ISRC (a number that identifies the song) UPC (a number that identifies the album the song appears on) publisher royalty rate The sales report they get from the digital distributor contains the sales information based on whether it was an individual track sale or an album sale. So the info in that report is: kp_record_id title (track or album) artist ISRC (reported for songs sold as tracks ONLY) UPC (reported for both tracks and albums) Unit Count (i.e., number sold in that transaction) The objective is to create a report that summarizes for each publisher each song (kp_song_id) the total unit count (i.e., the total number of sales, whether by track or by album, for a particular sales period), e.g. publisher 1 -- sum of total sales song 1 - title - total sales - amt due song 2 - title - total sales - amt due publisher 2 -- sum of total sales song 1 - title - total sales - amt due song 2 - title - total sales - amt due (amt due = sales*rate) Sales for a particular period can be 100000+, so importing and summarizing totals unwieldly. Since this is a many-to-many relationship (i.e., each song can be sold many times and because of the album sales a sales record can be of many songs), must I create a join table? How do I best associate the individual track data and the album sales information? (BTW, I'm working with a legacy database and trying not to redevelop the entire solution.)
Vaughan Posted March 8, 2010 Posted March 8, 2010 (edited) From what you write it sounds as though you get a file from the distributors with sales information, and it looks to me as though this contains *almost* all the information you need. Let me see if I understand the problem: the sales info contains the ISRC only if the song was sold as a separate track. If a song was sold as part of an album then it won't be listed: only the album UPC will be listed. That's a show-stopper for getting reports on songs sold. You need to convert the album sale data into song sales. To do this you'll need an Albums table holding the album information and a listing of each song. Create a table for the imported Sales data, and another table related to it for the SongsSold. Import the data into the Sales table, and add the distributor info. Then process each sales record: if the sale was a individual song then create a SongsSold record for that song; if it was an album then create a SongsSold record for each song on the album. Do your reporting from the SongsSold table. Edited March 8, 2010 by Guest
burdenslifted Posted March 8, 2010 Author Posted March 8, 2010 Thanks -- you've got the problem right and I'm glad to know I'm not crazy to think this is a bear, even though it seems like all the data I need is there. The path you've described is just the one went down before realizing that there was 100,000 records to import. If only 10% are album sales, and each album averages 10 songs, that's a heck of a lot of processing to do on a regular basis. It is server based, but I'm working remotely, so it's not really practical. I'm now working on an alternative solution based on the songs database, with two filtered portals to the imported sales data table, one for tracks (based on ISRC) and one for albums (based on UPC eliminating tracks). Since the final report is based on songs anyway, this seems to be a viable solution... isn't it? Is there a fatal flaw I'm not yet seeing?
Vaughan Posted March 9, 2010 Posted March 9, 2010 This is not a difficult problem, nor is it uncommon. Don't baulk at importing 100,000 records regularly.* I don't see any other way to do it: I cannot see how using portals will help because you need to generate the sales records to get the reports you've said you need, especially if you need historical data kept. The SongSales table is basically a join between Sales and Songs, because it's a many-to-many. * I did a job a couple of years ago on an importer file that processed government stats data. Before my mods the data took over 2 days and nights continuous to process, so it only ever got run on a Friday evening to churn over the weekend and be ready for Monday morning. I got the processing down to less that 7 hours so it could run overnight.
comment Posted March 9, 2010 Posted March 9, 2010 I'm now working on an alternative solution based on the songs database, with two filtered portals to the imported sales data table It should work, but you will be quite limited in your reporting ability. Selling a product both individually and as part of a "kit" is a tough issue and I don't know of a really good solution for it. Breaking the album sales into individual song sales is a tedious and rather vulnerable process - but ultimately it provides full reporting potential (for example, sales by song by month).
burdenslifted Posted March 9, 2010 Author Posted March 9, 2010 Thanks both of you -- the client doesn't much care about the more flexible reporting options at the moment, so I think I'm gonna avoid the tedious and potentially error-prone join table for now. I am comforted to know that it's a sticky problem, and not something with an obvious solution I couldn't figure out. Maybe I can convince them to rebuild the database at some point in the "right" way.
comment Posted March 9, 2010 Posted March 9, 2010 There's no way to rebuild the database that will eliminate the problem. Consider an album with 10 songs that has sold 50 copies - but none of its songs sold a single copy. You have one sale record that affects 10 songs - but there is no way to summarize the sales by song, because none of the songs has a record in the sales table.
burdenslifted Posted March 9, 2010 Author Posted March 9, 2010 True, it would do nothing to solve that problem, but it would solve some other issues I'm tiptoeing around and make me feel better about the vulnerabilities introduced the join table process.
Vaughan Posted March 10, 2010 Posted March 10, 2010 Thanks both of you -- the client doesn't much care about the more flexible reporting options at the moment, so I think I'm gonna avoid the tedious and potentially error-prone join table for now. The join table is not error prone, nor is it tedious; it's the right data structure. What will be tedious will be doing more and more more elaborate hacks to get around the bad data structure. With a good data structure the answers just fall out of the database.
Recommended Posts
This topic is 5432 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now