
Processing XML Data & Script Performance


Klypto



Hello

 

 

I'm working on a personal side project that processes game character information and will (hopefully) provide a set of leaderboards where you can find your ranking with a given weapon or vehicle.  The issue I've run into: even with only the 10,000 characters with the most play time, and only a small sample set of around 30 weapons from a single class of vehicle, it would take the server 4-7 days to finish a single run of my script processes that update that set of character weapon information.  I'm looking for ideas on optimizing the script process, or alternatives to get the data processed faster.  Currently the largest chunk of processing is parsing information out of XML data.

 

 

Background information:

 

Character data is available through the Sony Online Entertainment API:

 

http://census.soe.com/

 

The type of data request I am currently pulling is:

 

http://census.soe.com/xml/get/ps2/single_character_by_id/?id=5428010618040144225

 

Characters can use a fairly large number of weapons, including weapons they do not own (e.g. gunning for another player's vehicle).  There are about 12 categories of data I'm interested in for each weapon id the character has used (kills, deaths, usage time, shots fired, etc.).  Statistics for weapons used against that player are also listed, but do not include all categories (roughly 3).

 

There are 3 factions within the game, each with its own "variant" of a weapon.  Some weapons are generic and completely identical across factions, but appear under a different id number within the API to distinguish which faction the weapon is being used by.

 

 

The server I'm using is a Dell PowerEdge 2850

  • 4x146GB drives  (I believe it's RAID 5, which might limit write speeds)
  • 8GB RAM
  • 2 Xeon 3.6GHz (w/HT? not sure if they are physically dual cores or HT enabled, but there are 4 threads)

I have the FileMaker Server RAM cache set to 3,072 MB.

 

 

Right now, I have 5 scripts that pull updated character data into the database, scheduled to run at different frequencies and times.  The data is dumped into each character record for other scripts to process.

 

Characters are broken up into 5 levels of ranking based on play time in the past week.  This determines the frequency at which their data is pulled, to limit both overuse of the API and the number of records my server has to process.  The rankings are as follows:

  • Not logged in for the past week - dropped from updating until the end of the month.
  • Logged on, but have hardly played at all - checked at the end of the week.
  • Logged on for 5 hours - checked Sunday and Wednesday.
  • Logged on for 10 hours - checked every other day.
  • Logged on for 15+ hours - checked every day.

It's a bit more restrictive than I want it to be, but until it's possible to process more records within 24 hours, I don't see how I will be able to allow more frequent updates.

 

There is another script that runs periodically, checks whether character data was imported, and updates the found set.  It only covers generic character-specific data like total score, in-game rank, and play time.  This script also updates the rankings that determine the data-pull frequency.

 

The final script is the weapons update script.  All it does is look for records where the timestamp recorded when the character data was last processed is greater than the timestamp for the last time the weapon data was processed.  This is the slowest script on the server, since it has to loop through each specified weapon for each character.

 

The sample set of weapons I am currently using is the faction-specific Main Battle Tanks (MBTs).  Each faction's MBT has 10 possible weapons listed, with up to 12 categories of data each, and 6 of those categories take 3 XML queries each because they are broken up on a per-faction basis.  The 20 MBT weapons belonging to the other 2 factions each have 3 single-query sets of data, plus 2 categories that take 3 queries each.  So in total you have 20*(2*3+3)+10*(6*3+6) = up to 420 XML queries per character.  This leaves each character taking about 40-60 seconds to process, which is too long.

 

 

Reasons it may take so long:

  • The XML I pull includes extra data that isn't relevant to what I want to do, bloating the XML document by about 15-20% (see the example request below this list)
  • XML in itself isn't the most efficient format to parse.  You can also get the data in JSON, but I don't know how to parse that into FileMaker
  • I have little experience with XPath and might be writing inefficient queries
  • The server hardware I'm using might not be able to handle it all
  • There's just freaking too much data to crunch and I'm insane for thinking I can do this as a side project
  • FileMaker sucks at doing large-scale stuff like this and it should be processed externally
  • I'm approaching the entire project the wrong way.
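
On the first point: the Census API documents a set of query commands, and if I'm reading them right, c:hide should let me strip fields I never use straight off the response.  The field names below are just examples; I'd have to check the actual response for what's safe to drop:

http://census.soe.com/xml/get/ps2/single_character_by_id/?id=5428010618040144225&c:hide=daily_ribbon,certs,times

That alone won't fix a 4-7 day run, but 15-20% less XML to download and parse is basically free.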

 

Some things I have done to speed things up:

  • The weapons processed are currently limited to only weapons the character owns (this cuts out weapons they may have used but do not own, and weapons used against them).
  • Switched the XPath queries from 360Works to the FM.NEXUS Web Services plugin, which seemed to be faster.

 

I will probably break up the weapons processing script to use the same ranking system, and schedule all of the resulting scripts to run simultaneously, each focusing on its own ranking group, so that at least "priority players" will be updated in a semi-reasonable amount of time.

 

I also had the theory that it might be faster to pre-combine the XPath queries in 360Works and then output the data, but then it has to package the info > dump it back to FileMaker > parse and unpack the info into fields, which might defeat any optimization gained.

 

 

Right now with this post I'm looking for higher-level suggestions or general guidance.  We can get more detailed as this moves along.  So if you have any preliminary suggestions, or need more information, let me know.


Alright, that definitely got me going in the right direction. I rebuilt some of the simpler scripts to import the data directly into FileMaker (didn't even know it could do that) with some XSL files, one for static data and one for dynamic data.

The problem I'm now hung up on is the actual import of the weapons data. They store everything under the same tag in multiple locations, for example:

<weapon_stat character_id="5428010618040144225" item_id="3700" last_save="1375814217" last_save_date="2013-08-06 18:36:57.0" stat_name="weapon_hit_count" value="5887" vehicle_id="5"/>

<weapon_stat character_id="5428010618040144225" item_id="3700" last_save="1375814217" last_save_date="2013-08-06 18:36:57.0" stat_name="weapon_play_time" value="226281" vehicle_id="5"/>

<weapon_stat character_id="5428010618040144225" item_id="73" last_save="1371069716" last_save_date="2013-06-12 20:41:56.0" stat_name="weapon_play_time" value="149511" vehicle_id="0"/>

In the example, the first two have the statistics for "weapon_hit_count" and "weapon_play_time" in separate tags for the same item; the last one is an entirely separate item with the same stat name as the 2nd entry.

Should I be trying to grab it all at once in a series of non-nested loops, or should I break them into separate imports, each focusing on one specific stat type?

<xsl:for-each select="weapon_stat[@stat_name='weapon_play_time']" >

To me it seems like the all-at-once approach would be best, because it doesn't cause multiple HTTP requests, which are probably the slowest part of the process. However, since I just started teaching myself XSL in the past few days, I don't really know how to do that "correctly", or if it is even possible.  I'll still try different things in the meantime.
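
One idea I want to try (an untested sketch, so take it with a grain of salt): drop the stat_name filter entirely and carry stat_name through as its own column, so a single import pulls every stat row for the character in one pass. The Stat_Name field here is hypothetical; it doesn't exist in my tables yet:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Untested sketch: one pass over every weapon_stat node; stat_name
     becomes a column, so one import covers all stat types at once. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/single_character_by_id_list/single_character_by_id/stats/weapon_stat_list">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <PRODUCT BUILD="" NAME="FileMaker Pro XML Import" VERSION="fake" />
    <DATABASE DATEFORMAT="yyyy.MM.dd" LAYOUT="" NAME="" RECORDS="" TIMEFORMAT="k:mm:ss" />
    <METADATA>
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Character_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Item_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Vehicle_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Stat_Name" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Value" TYPE="TEXT" />
    </METADATA>
    <RESULTSET>
        <!-- no stat_name predicate: every weapon_stat row comes through -->
        <xsl:for-each select="weapon_stat">
        <ROW MODID="{position()}" RECORDID="{position()}">
            <COL><DATA><xsl:value-of select="@character_id" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@item_id" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@vehicle_id" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@stat_name" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@value" /></DATA></COL>
        </ROW>
        </xsl:for-each>
    </RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>

The downside is the records land "tall" (one record per stat), so I'd either store them that way or pivot after import; but it's one HTTP request and one import per character.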

 

If it must use multiple imports, can I reference XML stored in a field instead of an external data source?

 

 

 

Edit:

 

So I have the new scripts running for importing the weapon data, broken into 11 independent imports.  This allows all weapon data to be imported, instead of a specific few weapons per character, at a much, much faster rate than my original method would have managed, but the overall result is worse: each character now takes even more time to process.  Basically it's greatly improved in a horizontal sense (scope), but slightly worse vertically, where it counts the most.

 

 

 

The problem is still the same as I mentioned before, though: the different stat types for a single weapon are listed as separate entries with the same node tags, just different attributes.  The server is now bottlenecked on the latency of HTTP requests to the API server, and with the XSL imports broken into 11 separate imports for a single character's weapon data, it makes 11 independent HTTP requests & imports.  I would rather have one or two XSL imports run per character, as that should be much faster (and far less drain on Sony's API servers, but I'm sure they can handle it).

 

Currently, I identify unique weapons on character records (where the character weapon statistics are being stored) with a combined key of the character id, the item id, and the vehicle id.  The imports match on these fields and add new records should there be no matches.  That part works pretty well, and the actual import into FileMaker is quick.

 

XSL example for weapon Fire Count:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">


<xsl:template match="/single_character_by_id_list/single_character_by_id/stats/weapon_stat_list">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <PRODUCT BUILD="" NAME="FileMaker Pro XML Import" VERSION="fake" />
    <DATABASE DATEFORMAT="yyyy.MM.dd" LAYOUT="" NAME="" RECORDS="" TIMEFORMAT="k:mm:ss" />
    <METADATA>
		<FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Character_ID" TYPE="TEXT" />
		<FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Item_ID" TYPE="TEXT" />
		<FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Vehicle_ID" TYPE="TEXT" />
		<FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Value" TYPE="TEXT" />
    </METADATA>
    <RESULTSET>
        <xsl:for-each select="weapon_stat[@stat_name='weapon_fire_count']" >
            <xsl:variable name="c">
                <xsl:value-of select="position()" />
            </xsl:variable>
        <ROW MODID="{$c}" RECORDID="{$c}" > <!-- braces make this an attribute value template; a bare "$c" outputs the literal text -->
		<COL><DATA><xsl:value-of select="@character_id" /></DATA></COL>
		<COL><DATA><xsl:value-of select="@item_id" /></DATA></COL>
		<COL><DATA><xsl:value-of select="@vehicle_id" /></DATA></COL>
		<COL><DATA><xsl:value-of select="@value" /></DATA></COL>
        </ROW>
        </xsl:for-each>
    </RESULTSET>
</FMPXMLRESULT>
    </xsl:template>
</xsl:stylesheet> 

Pretty basic, but I have hardly any experience with XSL.  I was trying to figure out a way of using "choose" to loop through all 'weapon_stat' nodes and then select the imported fields based on the 'stat_name' attribute, but I'm a bit stumped by the linear way the metadata maps to the selected values.  I'm still looking into options and reading around to see what I can do with this (maybe variables or key selects?).  The solution is probably a lot simpler than I think it is; the key-based idea I want to test is sketched below.
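
Here's that sketch (untested, and the extra field names are hypothetical): an xsl:key indexes every weapon_stat by character/item/vehicle, Muenchian grouping visits each unique weapon once, and each stat column is filled by a key lookup filtered on stat_name. One request, one import, one record per weapon:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Untested sketch: one ROW per unique character/item/vehicle combo,
     with each stat pulled into its own column via a key lookup. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- index every weapon_stat node by its combined identity key -->
<xsl:key name="wkey" match="weapon_stat"
    use="concat(@character_id, '|', @item_id, '|', @vehicle_id)" />

<xsl:template match="/single_character_by_id_list/single_character_by_id/stats/weapon_stat_list">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <PRODUCT BUILD="" NAME="FileMaker Pro XML Import" VERSION="fake" />
    <DATABASE DATEFORMAT="yyyy.MM.dd" LAYOUT="" NAME="" RECORDS="" TIMEFORMAT="k:mm:ss" />
    <METADATA>
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Character_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Item_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="NO" MAXREPEAT="1" NAME="Vehicle_ID" TYPE="TEXT" />
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Fire_Count" TYPE="TEXT" />
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Hit_Count" TYPE="TEXT" />
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Play_Time" TYPE="TEXT" />
    </METADATA>
    <RESULTSET>
        <!-- Muenchian grouping: visit only the first node of each key group -->
        <xsl:for-each select="weapon_stat[generate-id() = generate-id(key('wkey', concat(@character_id, '|', @item_id, '|', @vehicle_id))[1])]">
            <xsl:variable name="k" select="concat(@character_id, '|', @item_id, '|', @vehicle_id)" />
        <ROW MODID="{position()}" RECORDID="{position()}">
            <COL><DATA><xsl:value-of select="@character_id" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@item_id" /></DATA></COL>
            <COL><DATA><xsl:value-of select="@vehicle_id" /></DATA></COL>
            <!-- each column: the group member carrying that stat_name -->
            <COL><DATA><xsl:value-of select="key('wkey', $k)[@stat_name='weapon_fire_count']/@value" /></DATA></COL>
            <COL><DATA><xsl:value-of select="key('wkey', $k)[@stat_name='weapon_hit_count']/@value" /></DATA></COL>
            <COL><DATA><xsl:value-of select="key('wkey', $k)[@stat_name='weapon_play_time']/@value" /></DATA></COL>
        </ROW>
        </xsl:for-each>
    </RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>

If a weapon has no entry for a given stat, the key lookup just returns nothing and the column imports empty, which is why those fields are marked EMPTYOK.  The remaining stat categories would extend the METADATA and COL lists the same way.  I don't know yet whether FileMaker's built-in XSLT processor handles xsl:key, so that's the first thing to verify.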

 

The other part I have also considered is caching the XML data locally before the import process, as a semi-cheesy way of getting around the latency during the import, but then I would have the data twice: in XML form on the server, and then again stored in FileMaker.  It would also eat up some disk read/write time to cache ~6 GB of XML data on a daily basis.  A rough outline of what I mean is below.
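
Roughly, the script would fetch the character's XML once, write it to a temp file, and run every import against that one file, so the HTTP latency is paid once per character no matter how many imports follow. This is just an outline; the field and file names are made up, and Insert from URL needs FileMaker 12 or a plugin equivalent:

# one HTTP round trip per character
Set Variable [ $url ; Value: "http://census.soe.com/xml/get/ps2/single_character_by_id/?id=" & Characters::Character_ID ]
Insert from URL [ Characters::XML_Cache ; $url ]
# dump the field to a temp file the XML imports can read
Set Variable [ $path ; Value: Get ( TemporaryPath ) & "census.xml" ]
Export Field Contents [ Characters::XML_Cache ; "$path" ]
# every import now reads the local copy - no further HTTP requests
Import Records [ XML data source: "$path" ; XSL: "fire_count.xsl" ]
Import Records [ XML data source: "$path" ; XSL: "hit_count.xsl" ]
# (repeat for the remaining stat imports, or use the single combined XSL above)

This would also sort of answer my own earlier question: as far as I can tell the XML import only reads from a file or an HTTP source, so exporting the field contents to a temp file is the workaround for importing XML stored in a field.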

 

I could combine both approaches for the most efficient import speed, I guess, but I still don't know how to address the first issue.

 

 

 

Side Note:

 So far the current system has imported the weapons data of 195 of 14,087 characters, resulting in ~86,000 weapon stat records.  From that I'd guesstimate 440-450 weapon stat records per character, putting the total at around 6,339,150 records once the current database is finished.  I'm hoping the server doesn't run out of disk space when I expand the character count, but I wouldn't be able to database all 1,314,300 characters, since that would leave me at 590,000,000+ weapon stat records.  Unfortunately, at the current rate there is no way I could keep up with even the daily login count (estimated at 15K characters), or even a simple goal of 2-3K overly active characters.