dkey Posted September 2, 2015 (edited)

Hi all,

I need to extract certain dates from my students' PDFs. They originally appear as follows:

Printed: 28/08/2015 01.15

The seconds are missing and the separator is a full stop rather than a ":".

In the TOOL menu I successfully tested the calculation, and it works across the hundred files I have in the MONITOR window. This is the function:

Substitute ( Middle ( Prova::Mail ; Position ( Prova::Mail ; "Printed: " ; 1 ; Prova::PcntPrinted ) + 9 ; 30 ) ; [ "." ; ":" ] ; [ ¶ ; "" ] ) & ":00"

The Mail text field, extracted from the PDF files, contains several "Printed: " patterns, hence I use the field "PcntPrinted", which identifies the last occurrence of the "Printed: " pattern in the file.

Creating a calculation field with a timestamp result using the very same function instead returns everything except the final "00" seconds. I tried to use the GetAsTimestamp function, but it only returns the correct digits if the calculated field is a TEXT field. As such I cannot use it for sorting, which is essential as I receive several papers daily.

Any hints?

Thanks

Edited September 2, 2015 by dkey
Raybaudi Posted September 2, 2015 (edited)

Try this calculation:

Let ( [
  text = Prova::Mail ;
  count = PatternCount ( text ; "Printed:" ) ;
  string = Middle ( text ; Position ( text ; "Printed:" ; 1 ; count ) + 9 ; 16 ) ;
  d = Left ( string ; 2 ) ;
  m = Middle ( string ; 4 ; 2 ) ;
  y = Middle ( string ; 7 ; 4 ) ;
  t = Substitute ( Middle ( string ; 12 ; 5 ) ; "." ; ":" )
] ;
  Timestamp ( Date ( m ; d ; y ) ; t )
)

Then you can trash the calculated field "Prova::PcntPrinted".

Edited September 2, 2015 by Raybaudi (reason: t was wrong)
comment Posted September 2, 2015

Quote: t = Substitute ( Middle ( string ; 12 ; 16 ) ; "." ; ":" )

16? Why not 5?
Raybaudi Posted September 2, 2015

Oh, of course! Thank you, I'm going to edit it.
dkey (Author) Posted September 3, 2015 (edited)

Thanks, I will try. However, what puzzles me is the difference between the TOOL menu result and the "question mark" I receive while using the very same formula in the actual calculation field. It seems to happen in other cases too.

My process is always the same: I receive the PDF files, which I convert to text using the Automator conversion function. As Automator leaves a lot of more or less hidden text garbage from the PDF files (such as HTML code and more), I use TextWrangler to clean it up. The garbage always follows the same patterns, and TextWrangler's great AppleScript implementation allows me to do it in a breeze.

However, FileMaker (v13) and TextWrangler don't really talk the same language. I am not sure about the Unicode encodings. FMP only offers me UTF-8 or UTF-16, while TextWrangler has several different Unicode variants of both UTF-8 and UTF-16.

Could this be one of the problems? Such as hidden characters which confuse the conversion, the functions and the calculation?

Edited September 3, 2015 by dkey
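On the encoding question, a minimal Python sketch (run outside FileMaker, purely as an illustration) shows why a UTF-8 versus UTF-16 mix-up can leave invisible characters behind: plain ASCII text contains no zero bytes in UTF-8 but one zero byte per character in UTF-16. The `stamp` value is copied from the first post; whether such a mismatch actually occurred in dkey's files is an assumption that this thread does not confirm.

```python
stamp = "Printed: 28/08/2015 01.15"

utf8_bytes = stamp.encode("utf-8")
utf16_bytes = stamp.encode("utf-16-le")  # little-endian, no BOM

# Plain ASCII text has no zero bytes in UTF-8, but one per character in UTF-16.
print(utf8_bytes.count(0))   # 0
print(utf16_bytes.count(0))  # 25

# Reading UTF-16 bytes as if they were UTF-8 keeps the zero bytes as NUL characters.
misread = utf16_bytes.decode("utf-8")
print(repr(misread[:6]))  # 'P\x00r\x00i\x00'
```

A text saved in one of these encodings and read as the other would therefore look normal on screen in many editors while still carrying NUL characters between the visible ones.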
comment Posted September 3, 2015 (edited)

Quote: what puzzles me is the difference between the TOOL menu result and the "question mark" I receive while using the very same formula in the actual calculation field.

You didn't say anything about a question mark in your original post. If your calculation is set to return a result of type Timestamp, and the actual result is a question mark, then the result is not a valid timestamp. So most likely you do have some additional garbage in the field.

Why don't you post a small example file?

Edited September 3, 2015 by comment
dkey (Author) Posted September 3, 2015 (edited)

OK, this is one sample as I clean it up in TextWrangler before importing it into FMP 13:

Aiko Class 25798, 2015-2016

This is a sample file. The patterns I use are the year "2015", to collect the dates I need for the first mail sent, and "Printed", although I only care about the last one. We all deal through the College Site, where all messages are posted. As you can see, the dates are at times without leading zeros and the time is separated with a ".".

I created 3 formulas: the first gives me the initial date, as each page is about a specific topic; the second gives the last date, which is always different from the "printed" date; the third gives the last printed date.

Film Project: Back to the same old ways: a memory
Profile: http://•••••••••••••••••/ID/KU23AVJ1981

Hello Sir
Our team is "almost" ready
We still don't have the wardrobe and props set up
However locations and casting are all set
We should be able to send you everything in 3 days
Aiko
2/08/2015 23.45

Dear Aiko
Don't worry! However I feel your schedule is too tight. Please be careful with dates times and permissions
2/08/2015 23.48

Printed: 3/08/2015 1.04

Thank you for your reply
We will be careful: can we talk to you directly at your earliest convenience? Still waiting for the props
Aiko and Lou
9/08/2015 14.13

OK we'll talk about it in person
Best
10/08/2015 18.56

Lou I didn't receive the script ... please mail again
The whole team should come to my office Tuesday
Everyone !!!!! and with the script updates
Best
10/08/2015 08.14

Printed: 17/08/2015 21.24

Edited September 3, 2015 by dkey (reason: some editing and additions)
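The sample contains stamps such as "Printed: 3/08/2015 1.04", where the day and the hour have a single digit, so fixed character offsets would not line up for those values. Whether that is what produced the question marks in this thread is not established. As a rough sketch only, here is how the last "Printed:" stamp could be picked out with a width-tolerant pattern in Python, outside FileMaker. The names `PRINTED_RE` and `last_printed` are hypothetical, and the day/month/year order is assumed from the sample.

```python
import re
from datetime import datetime

# Matches "Printed: D/MM/YYYY H.MM"; day, month and hour may have one or two digits.
PRINTED_RE = re.compile(
    r"Printed:\s*(\d{1,2})/(\d{1,2})/(\d{4})\s+(\d{1,2})\.(\d{2})"
)

def last_printed(text):
    """Return the last 'Printed:' stamp in text as a datetime, or None if absent."""
    matches = PRINTED_RE.findall(text)
    if not matches:
        return None
    day, month, year, hour, minute = (int(part) for part in matches[-1])
    return datetime(year, month, day, hour, minute)  # seconds default to 0

sample = (
    "Aiko 2/08/2015 23.45\n"
    "Printed: 3/08/2015 1.04\n"
    "Best 10/08/2015 08.14\n"
    "Printed: 17/08/2015 21.24\n"
)
print(last_printed(sample))  # 2015-08-17 21:24:00
```

Splitting on the "/" and "." separators instead of counting characters is the design choice here; the same idea could be carried over to a FileMaker calculation.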
comment Posted September 3, 2015

I am afraid this is not very helpful for detecting hidden characters. Could you zip the text file and post it here as an attachment? And explain the exact procedure you use to import it?
dkey (Author) Posted September 3, 2015

Thanks. I just found out what could be the problem.

The original process:

1-Automator extracts the text, which I print as PDF from Chrome
2-I clean up the text in TextWrangler and save it as UTF-8 with no additional options (Little Endian, with BOM, etc.)
3-Import into FMP 13 as UTF-8 text

I am now certain FMP has its own way to read Unicode, as every wrong date I found, once re-exported to TextWrangler, shows a variety of different codes, mainly "\0x00". They are invisible in FMP and I am unable to understand where they come from. I am not sure what "\0x00" stands for, but this is what appears in certain text files. Not in all of them. Probably each student uses a different encoding when they post their topics. While everything stays invisible, the various conversions generate hidden "\0x00" or other codes in certain PDF files. Probably FMP, while calculating on the text, also evaluates what is hidden.

So even if the dates appear as a sequence of 16 characters, following the same pattern in the Mail text field, when I calculate them as you suggested, many dates return the question mark.

I searched for all the records returning an error in the calculated field and exported them back to TextWrangler. Playing around with the "Zap Gremlins" function, I was able to see what was hidden and what I believe generates the errors.

So the question is how to import into FMP using an option which will not insert this hidden garbage. But I don't know if this is the proper forum to find the reply.

Thanks again, and hopefully you still have a suggestion

Dante
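To make the hidden characters visible before import, something along these lines could be run on the exported text. This is a hedged sketch in Python, not a description of what TextWrangler's "Zap Gremlins" does internally; the function names `find_gremlins` and `strip_gremlins` are hypothetical, and treating the Unicode categories Cc (control) and Cf (format) as the "gremlins" is an assumption.

```python
import unicodedata

KEEP = {"\n", "\r", "\t"}  # whitespace controls that are normally wanted

def find_gremlins(text):
    """Return (index, code point) pairs for invisible control/format characters."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch not in KEEP and unicodedata.category(ch) in ("Cc", "Cf")
    ]

def strip_gremlins(text):
    """Remove the characters that find_gremlins would report."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )

# Made-up example: a NUL inside the date and a zero-width space after the time.
dirty = "Printed: 17/08\x00/2015 21.24\u200b\n"
print(find_gremlins(dirty))         # [(14, 'U+0000'), (26, 'U+200B')]
print(repr(strip_gremlins(dirty)))  # 'Printed: 17/08/2015 21.24\n'
```

Listing the positions first, rather than deleting blindly, also helps narrow down which step of the Automator, TextWrangler and FileMaker chain introduces the characters.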
comment Posted September 3, 2015

Quote: I am now certain FMP has its own way to read Unicode

IMHO you are jumping to conclusions. Why don't you post that file as I suggested so that we can have a look at it?
dkey (Author) Posted September 21, 2015 (edited)

Thanks. I solved the problem by re-exporting the files as text to TextWrangler and cleaning up all the gremlins. After reimporting the "clean" tab file, the FileMaker scripts worked perfectly.

I don't know how they happened to appear, but they made the original script useless. I might be wrong, but the first import into the DB using UTF-8, followed by re-exporting as UTF-8 to TextWrangler, gave the visible "gremlins" and a slightly larger file than the originally imported file.

However, for privacy reasons I was unable to send students' files with personal details as an example ....

Edited September 21, 2015 by dkey
comment Posted September 21, 2015

Quote: I might be wrong

I suspect you might be, but we won't know for sure until we can reproduce the problem.
Lee Smith Posted September 21, 2015 (edited)

Quote: However, for privacy reasons I was unable to send students' files with personal details as an example ....

There are ways around this, read this.

Save a copy of your file with no records. If some data is needed to show your problem, then create a few bogus records. We are not interested in your confidential information; we are only interested in solving the problem as you see it.

Next, you need to zip the demo file and add it to a reply to this thread.

If you have questions about this, contact me by Private Message.

Lee

Edited September 21, 2015 by Lee Smith