image sync speed improvement


This topic is 2755 days old. Please don't post here. Open a new topic instead.


I was running some speed tests with EasySync recently and was surprised to see how long a sync took after I added a few images to the sample record set that ships with EasySync. It was taking over 5 minutes to download those records from FileMaker Server, which was installed on the same machine as FileMaker Pro. A competitor's product was doing the same sync in under 10 seconds.

The interesting part was that all records in the sync took much longer to process once the images were added. In other words, the fact that I added some images to record #1 made records #2-100 take dramatically longer to process. So I did some digging and found that the entire payload was being saved to a local script $variable in the Pull Payload script. When the payload included images, that $variable was really large; too large to fit into FileMaker's available memory. This caused the number of page faults to increase by about 600,000 while the records were being processed.

My theory was that if I could reduce the amount of data stored in memory (in variables) at once, then I could reduce the time it takes to process a large payload. It turns out this basic concept worked: the sync that previously took over 5 minutes now took only 30 seconds, and page faults barely increased at all!

I've attached a file with the changes that prove this theory. I ended up saving each segment returned from the server to a new record in the EasySync table, then extracting only a single record at a time from the payload, reading from each payload segment as necessary.
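In pseudocode, the change looks roughly like this (a sketch of the idea only; the field and variable names are placeholders, not necessarily what's in the attached file):

//Pull Payload (modified) -- rough sketch
Loop
	//fetch the next payload segment from the server
	Set Variable [ $segment ; /* result of the existing server call */ ]
	Exit Loop If [ IsEmpty ( $segment ) ]
	//store the segment in its own record instead of appending it to one big $payload variable
	New Record/Request
	Set Field [ EasySync::payload_segment ; $segment ]
End Loop

//When processing, read one <record>...</record> at a time out of the stored
//segment records, so only a single record's worth of data ever has to sit in
//a variable while it is being parsed.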

NOTE: the attached file is a proof of concept ONLY! Please do not use it as-is in production; it has not been tested and will probably fail in all but the most obvious circumstances. Also, my system crashed once while I was working on this file, so it has been closed improperly at least once. DO NOT USE IT IN PRODUCTION!

I think I may start a fork of this project which includes this change; I'll post back here if I do.

FM_Surveys_Mobile_v1r3_DSMOD.fmp12

  • Like 4
Link to comment
Share on other sites

Excellent work! I am also seeing very slow syncs once images are included, even "small" images from the iPad camera. I'm interested in your fork and would be willing to do some testing on our image-heavy sync systems.

I think I saw an effect of this problem when debugging EasySync myself. If my Data Viewer is open, the entire system crawls, sometimes taking 30+ seconds just to redraw the Data Viewer window on a fast Mac. The only other time I saw such slow behavior from the FMP client was in another system that was setting and parsing very large variables.

Link to comment
Share on other sites

  • 3 months later...

I'm in the middle of modifying EasySync right now. I was close to done when I had to switch to other work, but I'll be picking it up again before too long. You can see the current progress on the dev branch of my GitHub repo: https://github.com/dansmith65/FileMaker-EasySync/tree/dev

  • Like 2
Link to comment
Share on other sites

  • 2 months later...
  • Newbies

Dan,

Curious if you have revisited this recently. Your last note on this thread said you were still in the middle of modifying EasySync, and that was a couple of months after the last commit to GitHub.

We have been testing out EasySync and are running into some of the performance issues you talk about. We don't have any container fields, but we do have a solution where different users will sync different sets of customers and products. The product tables have tens of thousands of records, but each user will only sync a few hundred to a few thousand. Testing on some of the small tables, it goes pretty fast, but as soon as the payload gets above 1 MB, things slow down. At 3 MB, performance becomes unacceptable, taking over an hour to process a couple thousand records. It's odd, because gathering and downloading the payload from the server doesn't take that long; it's the looping through, setting all of the fields, and creating the records locally that bogs down. Eliminating unused fields helped a lot, but it's still not enough.

Have you done testing on larger sets of records, with or without container data?

Are the modifications you have done easy to apply to a solution already set up with EasySync, or would you recommend tearing ES out and starting over?

Have you tried any of the paid sync solutions? Curious if they have similar performance issues.

Appreciate the research and work that you have done!

Thanks!
-Shawn

Link to comment
Share on other sites

I've gotten sidetracked with other work so I haven't worked on this much since the last commit on GitHub. I will get back to it eventually, though.
 
I haven't done testing with as many records as you are using, but I have every reason to believe a large record count has the same impact as including large container data in a sync. That is to say, the changes I'm working on should speed up your syncing as well.
 
I'm not yet sure what the upgrade path from the last version of EasySync to my first release will look like. There shouldn't be any reason you can't modify your current EasySync installation with my changes, but I also don't expect that process to be simple.
 
I did a comparison between EasySync and GoZync, and GoZync did not have the same performance issues with large data sets that EasySync does.

FYI: so far, I've mostly been refactoring the code to follow my development practices and adding logging and timing code. The next task on my list is to apply the payload-processing technique I mentioned in the original post of this thread.

Link to comment
Share on other sites

  • 2 weeks later...

Hey Dan (and anyone else interested),

I did some fiddling recently and found another method that seems to avoid the issue of $variable memory buildup when reassembling multiple payloads. Unfortunately the code I refactored was in EasyDeploy, not EasySync, so I don't have any code to share as it relates to this topic (yet).

It turns out that using a repeating global field accomplishes the same objective. The part I like is that there's no real record management to deal with: fill up the repetitions with the payloads, do the parsing (I reassembled the payloads into a single global field using a dynamically constructed Evaluate(), which seems to be quick so far), then clear them out when done. All nice and neat in two globals.
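Very roughly, the idea looks like this (a sketch only; the table, field, and variable names below are made up for illustration, not my actual EasyDeploy code):

//as each segment arrives, drop it into the next repetition of a global repeating field
Set Field [ Sync::payload_segment_g[$i] ; $segment ]

//once all segments are in, build the expression
//"Sync::payload_segment_g[1] & Sync::payload_segment_g[2] & ..."
//and evaluate it to reassemble the whole payload in one go
Set Variable [ $i ; 1 ]
Set Variable [ $expr ; "" ]
Loop
	Exit Loop If [ $i > $segment_count ]
	Set Variable [ $expr ; $expr & If ( $i > 1 ; " & " ; "" ) & "Sync::payload_segment_g[" & $i & "]" ]
	Set Variable [ $i ; $i + 1 ]
End Loop
Set Field [ Sync::payload_g ; Evaluate ( $expr ) ]

//clear the repetitions when done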

...some food for thought.

The above being said, I still hate repeating fields. ;)

 

Link to comment
Share on other sites

  • Newbies

Perren,

Interesting find. The only thing I wonder about is how many repetitions you have. That has to be predefined in the schema, correct? What happens if your sync is too large for the number of repetitions you have? Or, with EasyDeploy, what if your new solution file adds a lot of great features and ends up being larger than you anticipated?

We did some playing with dansmith65's proof of concept file (the one on the OP, not the one on GitHub) and tweaked it to get it working for a small project. Our project did not have a need for the "replace" method, so we didn't have to worry about that incomplete piece. We experimented with the max pull size setting and found a sweet spot for us around 250,000. At 500,000, we could see the processing of each segment slow down and then get back up to speed at the beginning of the next segment. Much lower than that (we also tried down to 100,000) and the processing went a little faster, but there were more round trips to the server, so overall performance declined.

Looked at GoZync, and I can see why it doesn't have the slow-down issue. With the intermediary file making the connection between the hosted and mobile files, it does all of its Set Field operations there, directly from one to the other, so it is only dealing with one record/field at a time and not storing a big payload in a field or variable.

Great to see all of the experimenting going on!
-Shawn

Link to comment
Share on other sites

Thanks for mentioning this, @Perren. I do like the idea of not having to add another portal to the layout and jump between that new one and the existing one, like my original proof of concept did.

My initial concern was the same as @flybynight's, but then I ran some numbers and realized that you can easily set the max repetitions high enough to account for any amount of data you would want to sync. A field can have a max of 32,000 repetitions, which means it can store roughly 15 GB of data if each segment is limited to 500,000 characters:

500000 * 32000
/ 1024  /* KB */
/ 1024  /* MB */
/ 1024  /* GB */
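/* ≈ 14.9 GB */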

 If you want to figure out an appropriate number of repetitions for your system, here's a handy FM calc:

Let ( [
	/* set your desired max payload size and segment size */
	megabytes = 50 ;
	segmentSize = 250000
] ;
	Ceiling ( megabytes * 1024 * 1024 / segmentSize ) & " repetitions required"
)

I wonder if there are any negative or unexpected side effects of using many repetitions? I've never used more than a dozen or so.

  • Like 1
Link to comment
Share on other sites

  • 4 months later...

I think we found an issue with the storing of segments. It seems you assume that a segment won't get cut in the middle of <record>, since you search for the position of "<record>" when you reassemble the segment records. Or am I missing how you accounted for that possibility?

Link to comment
Share on other sites

I stored segments the same way they were returned by the server, but I combine them, as necessary, in the "Loop over the records" section. So I loop through all the segments, if necessary, until "<record>" is found, adding the result to the $record variable...

At least that's what I was trying to do, and I think it's working.
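In rough terms, that section does something like this (a sketch of the idea with assumed names, not the exact code from the attached file):

//"Loop over the records" section -- sketch only
Set Variable [ $record ; "" ]
Loop
	//stop once the record delimiter shows up in what has been gathered so far
	Exit Loop If [ Position ( $record ; "<record>" ; 1 ; 1 ) > 0 ]
	//otherwise append the next stored segment and check again
	Go to Record/Request/Page [ Next ]
	Set Variable [ $record ; $record & EasySync::payload_segment ]
End Loop
//...then split the completed record off the front of $record and process it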

Link to comment
Share on other sites

I believe the problem is that you assume the tag is in one piece. It could have been split across segments: "<rec" at the end of one and "ord>" at the start of the next saved segment. So you'll never find the end of that record.

In Tim's original code, he built his payload immediately, so when the next segment was appended, the tag would be in one piece again.

We too are trying to avoid the huge $var. So we're looking at creating the segments with more care, so as to never break a tag.

Link to comment
Share on other sites

So, we decided to create segments more carefully so that they never split a tag. I would advise EasySyncers to do the same if they intend to use this modification.

Link to comment
Share on other sites

  • 4 weeks later...

I didn't expect this and hadn't run into it yet.

I solved it by checking whether the segment position = 1, then concatenating the leftover $record part onto the beginning of the next segment and modifying the final $record assignment.

I'm sure the "segments never split a tag" solution would be better, but I don't know how you did it.

Link to comment
Share on other sites

  • 2 weeks later...

 

What Barbara and I did was this:

 

 

//In the EasySync script "Push payload"

//From the original script, given:
$temp_recs = Evaluate ( $dyn_esql )
//and:
$$max_push_segment_size    //set to some value in the ES settings script

$start = 1
$find = "</"

//set a variable
$string = Middle ( $temp_recs ; $start ; $$max_push_segment_size )

//set a variable
$length    //as seen in the attached "Parse Data" pdf

//Then use that variable to figure out how much text to grab from the incoming string
$segment_recs = Middle ( $temp_recs ; $start ; $length )

//then
$start = $start + $length

//See the attached Script.tiff for a screen capture of the script section above...
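If you can't open the attachments, the gist of the $length calculation is something along these lines (a simplified, hypothetical sketch only; the "</record>" tag and the names here are illustrative, and the real calc is in the attached Parse data.pdf):

/* length of the largest chunk, starting at $start, that ends on a complete end tag,
   so a tag is never split across segments */
Let ( [
	~chunk = Middle ( $temp_recs ; $start ; $$max_push_segment_size ) ;
	~tag = "</record>" ;
	/* starting position of the last complete end tag inside the chunk */
	~lastTag = Position ( ~chunk ; ~tag ; Length ( ~chunk ) ; -1 )
] ;
	If ( ~lastTag = 0 ;
		Length ( ~chunk ) ;               /* no complete tag found: take the whole chunk */
		~lastTag + Length ( ~tag ) - 1    /* otherwise cut just after the last end tag */
	)
)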

 

Hope this is helpful.

mark

 

 

Parse data.pdf

Script.tiff

  • Like 1
Link to comment
Share on other sites

  • 3 weeks later...
  • 1 month later...

... is anyone able to contribute a sample file with all the above proposed changes, i.e. one that does not overload a single variable on the mobile (client) side and also takes care not to break embedded tags?

I may try to assemble the above two parts on my own, but I and others would really appreciate it if anyone were willing to share their complete implementation here.

Link to comment
Share on other sites

Totally understand. Actually, if anyone can find the time, we'd probably only need a sort of extract of what has changed and where. If it's based on dansmith65's sample file, then only the further additions would be needed.

I will probably try to do it on my own this weekend if I have enough time :)

Link to comment
Share on other sites
