David Jondreau Posted November 13, 2010
I'm trying to scrape a web page and clean it up. I've substituted out spaces, tabs, pilcrows, and something that showed up in a text editor as a diamond. But I'm still getting a couple hundred line breaks. What character am I missing? I take the page source and apply this code:
Let([ source = GetLayoutObjectAttribute("viewer"; "content"); clean=Substitute(source; [" "; ""]; ["¶";""]; [" ";""];[" ";""]) ]; clean)
Before I get to the initial text of -//W3C, I get this: " " and so on, for a ValueCount() of 209. It's driving me nuts!
comment Posted November 13, 2010
Hey David, give us something to chew on. -- BTW, congrats on the certifications. Your replies are much smarter now. :P
fseipel Posted November 13, 2010 (edited)
It sounds like the literal characters cannot be pasted directly into the Substitute() function. In any event, to determine the Unicode code point values of the offending characters, use Show Custom Dialog with a calculation such as
Code(Middle ( clean ; 1; 1 )) & "," & Code(Middle ( clean ; 2; 1 )) & "etc"
Once the code point values are known, the substitutions can be made with those values instead of the literal characters, using Char(), the inverse of Code():
clean=Substitute(source; [Char(codepoint1); ""]; [Char(codepoint2);""])
You may also consider filtering the text down to alphanumeric characters, brackets, etc. to clip other values, but that may be slow. I'm assuming here that simply pasting the characters from the field into the Substitute calculation isn't working.
You might also want to look at ScriptMaster: it supports regular expressions, so it is easy to clip all the tags and other characters, and the SM demo file has an excellent example of stripping all tags. Unfortunately, the basic character code functions were only added in FM10, so if this needs to run in earlier versions, you can store the 'un-pasteable' character values you need to substitute in global fields.
Edited November 13, 2010 by Guest
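The diagnostic step suggested above could be written out roughly as follows. This is only a sketch: it assumes the web viewer object is named "viewer" as in the first post, that Code() is available (FM10 or later), and that looking at the first three leftover characters is enough. The variable names c1, c2, and c3 are made up for illustration.

```
// Sketch only: list the code points of the first three characters that
// survive the cleanup. "viewer" is the object name from the first post;
// c1..c3 are illustrative names, not part of the original calculation.
Let ( [
  source = GetLayoutObjectAttribute ( "viewer" ; "content" ) ;
  clean = Substitute ( source ; [ " " ; "" ] ; [ "¶" ; "" ] ) ;
  c1 = Code ( Middle ( clean ; 1 ; 1 ) ) ;
  c2 = Code ( Middle ( clean ; 2 ; 1 ) ) ;
  c3 = Code ( Middle ( clean ; 3 ; 1 ) )
] ;
  c1 & ", " & c2 & ", " & c3
)
```

Passing one character at a time to Code() keeps each result a plain code point that can be fed straight back into Char().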
David Jondreau (Author) Posted November 13, 2010
comment: When it's Friday at 6pm and I've been stuck on a stupid problem for an hour, I just need to get it into the aether, good examples or no. Funny thing is, I know more about FileMaker from taking the exams. I certainly know what I don't know, which is considerable!
fseipel: Thanks, I just blanked on Code(). It was line feed/LF/Code 10, but that doesn't paste into FileMaker. Instead it pastes as a space/32 (!), which was really throwing me for a loop.
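Given that the culprit turned out to be the line feed, the cleanup calculation could presumably be rewritten along these lines. It is a sketch under the same assumptions as the first post (a web viewer object named "viewer"), and it builds the control characters with Char() so that nothing has to be pasted into the calculation. The tab and carriage return entries are included only because the original post was already removing tabs and pilcrows.

```
// Sketch only: strip line feeds by code point instead of pasting them.
Let ( [
  source = GetLayoutObjectAttribute ( "viewer" ; "content" ) ;
  clean = Substitute ( source ;
    [ Char ( 10 ) ; "" ] ;  // line feed, the character that would not paste
    [ Char ( 13 ) ; "" ] ;  // carriage return, which FileMaker shows as ¶
    [ Char ( 9 ) ; "" ] ;   // tab
    [ " " ; "" ]            // ordinary space, as in the original post
  )
] ;
  clean
)
```

Removing every space, as the original calculation does, also collapses the spaces between words, so that last pair may be worth dropping if the text itself needs to stay readable.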
fseipel Posted November 13, 2010
I'm glad you were able to resolve this. I have had this problem before, but didn't remember which character(s) were "un-pasteable" (is that even a word?). It may be only Char(10). It was particularly frustrating for me too, because when you paste this, no alert is given that FM changed what was in the clipboard. One would think any character that can be stored in a field can be stored in a calculation, but that isn't the case.
It just occurred to me you ought to be able to type Char(10) using the keyboard escape (Alt+0010) under Windoze, or alt+fn on a laptop with no numeric keypad, but that doesn't work, either -- if you type it into the field, it gives Char(13) instead. If you type it into the calculation, it is transformed into Char(32), the space character, which is consistent with what happens when you paste.