Removing Unwanted characters

Ascarine · November 19, 2013

Hey all,

I've been looking around the internet trying to find the answer to this but can't seem to find someone that's having the same problem I am. What I'm looking to do is remove every character to the right of a specific point in a string. The string has Markup tags and I want to remove everything after the last one.

Here's an example: the string I pull down has a long value, and it ends with the following:

"</stringName>

Æö± µþ"

The ending is a seemingly random run of UTF-8 characters from the transmission, which I want to cut. As it's random, it doesn't have a fixed length, so I can't take the left part of the string minus X characters from the right, and as the characters aren't whitespace I can't use Trim.

If anyone knows a way to remove characters after a certain point it would be greatly appreciated.

comment · November 19, 2013

Does the string always end with the exact tag "</stringName>"? And is this the only time this tag appears in the string? If yes to both, try =

Left ( string  ; Position ( string ; "</stringName>" ; 1 ; 1 ) + 12 )

Ascarine · November 19, 2013

Can you explain what that would do? It may or may not work, I've not tried it yet, but I'd like to understand the purpose of each part.

Edit:

Have just tested, it almost works. Here's a live example:

"</stringName>

N†cÒÄ>•Æ‰êcÒÄ>cÒÄ>"

Having run the command you suggested I get down to:

"</stringName>

N"

Double Edit:

Changed it from 12 to 11 and it now works, but can someone please explain why, if possible?

comment · November 19, 2013

Are you sure you are using this exact tag "</stringName>"? The number 12 is calculated from the length of this tag; if your actual tag is different, you must adjust it accordingly.

Ascarine · November 19, 2013

No, it wasn't that exact tag. It's a project with my company and I can't give out details, so I used a replacement tag. I've got it working with the real tag now, though. If you can explain how it works, I'd be grateful.

And if it was the length of that exact tag, shouldn't it have been 11?

comment · November 19, 2013

Why 11? The length of the tag is 13, and the general formula is =

Left ( string  ; Position ( string ; tag ; 1 ; 1 ) + Length ( tag ) - 1 )

The - 1 is required because the Position() is the position of the first character of tag.
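For anyone who wants to check the arithmetic outside FileMaker, here is a minimal Python sketch of the same formula. The helpers `position`, `left` and `keep_through_tag` are hypothetical stand-ins written for this illustration, not FileMaker's own implementation, and the choice to return the string unchanged when the tag is missing is an assumption (the formula above does not guard against that case).

```python
def position(text, search, start=1, occurrence=1):
    """Rough stand-in for FileMaker's Position(): 1-based index of the
    given occurrence of search at or after start, or 0 if not found.
    Only forward searches (occurrence >= 1) are handled here."""
    index = start - 1  # convert to Python's 0-based indexing
    for _ in range(occurrence):
        index = text.find(search, index)
        if index == -1:
            return 0
        index += 1  # step past this match's first character
    return index  # already 1-based after the final increment


def left(text, count):
    """Rough stand-in for FileMaker's Left(): the first count characters."""
    return text[:max(count, 0)]


def keep_through_tag(text, tag):
    """Keep everything up to and including the first occurrence of tag."""
    start = position(text, tag)
    if start == 0:
        return text  # assumption: leave the string alone if the tag is missing
    # Position + Length - 1 is the index of the tag's last character
    return left(text, start + len(tag) - 1)


tag = "</stringName>"
sample = "long value" + tag + "\nÆö± µþ"
print(len(tag))                       # 13, so the fixed offset is 13 - 1 = 12
print(keep_through_tag(sample, tag))  # long value</stringName>
```

If your real tag has a different length, the fixed offset changes with it, which is why computing it from the tag's length is safer than hard-coding 12.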

Ascarine · November 19, 2013

Ok I understand why you said 12 then, but can you explain how it works?

comment · November 19, 2013

Suppose your string is "abcSTOPgarbage":

Position ( string ; "STOP" ; 1 ; 1 ) = 4

Left ( string ; 4 ) = "abcS"

Left ( string ; 4 - 1 ) = "abc"

Now, if you want to include the "STOP" tag in the result, you must add 4 more characters (the length of "STOP") to get to =

Left ( string ; 7 ) = "abcSTOP"

So you can see that we have eventually arrived at the number 7 by:

Position ( string ; tag ; 1 ; 1 ) - 1 + Length ( tag )

LaRetta · November 19, 2013

Left ( string ; Position ( string ; "</stringName>" ; 1 ; 1 ) + 12 )

You always start with the inner set of parentheses, so:

Position ( string ; "</stringName>" ; 1 ; 1 )

This finds the starting position of "</stringName>" within string, searching from position 1 for the first occurrence.

Then you look to the outer set of parentheses:

Left takes characters from the left of the string, as many as specified in its second parameter.

Left ( string ; 'the result from the inner parentheses' + 12 )

--------------

I see Michael just responded but I'll include mine as well. :-)

Justin Close · January 30, 2014

Bit late to the party, but wanted to add something. A slightly more robust version would be to start the Position() search at the end of the string instead of at the start. Searching from the start assumes that the first occurrence of "</stringName>" is the ONLY (and thus the last) occurrence of it. That may in reality be true, depending on the contents of the input, but just in case:

Position ( string ; "</stringName>" ; Length(string) ; -1 )

This will go to the end of the string and search backwards for the value "</stringName>".
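As a rough illustration of the backward search, here is a small Python sketch. `position_from_end` and `keep_through_last_tag` are hypothetical helper names, `"</t>"` is a made-up short tag, and Python's `rfind` is used as a stand-in for Position() with a negative occurrence; returning the string unchanged when the tag is absent is an assumption.

```python
def position_from_end(text, search):
    """1-based position of the last occurrence of search, or 0 if absent.
    Meant to mirror Position ( text ; search ; Length ( text ) ; -1 )."""
    return text.rfind(search) + 1  # rfind returns -1 when absent, giving 0


def keep_through_last_tag(text, tag):
    """Keep everything up to and including the last occurrence of tag."""
    start = position_from_end(text, tag)
    if start == 0:
        return text  # assumption: leave the string alone if the tag is missing
    return text[:start + len(tag) - 1]


sample = "a</t>b</t>junk"
print(position_from_end(sample, "</t>"))      # 7
print(keep_through_last_tag(sample, "</t>"))  # a</t>b</t>
```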

--C

comment · January 30, 2014

Justin Close wrote:

Searching from the start assumes that the first occurrence of "</stringName>" is the ONLY (and thus the last) occurrence of it.

See post #2.

Justin Close · February 1, 2014

Oops! Yep, you covered that. But mine can apply if the answer to 'is this the only time it appears in the string' question is 'No'. And it can be used also if the answer to the question is 'yes'.

comment · February 1, 2014

Not to beat a dead horse, but your answer is guilty of an assumption, too. If the text contains:

"Some text we need for certain</stringName>borderline case</stringName>undisputed garbage"

then your answer assumes we want the "borderline case" part, and mine assumes we don't. OP was somewhat ambiguous on this point, hence it's best to ask (or at least state the assumption explicitly).
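To make the difference concrete, here is a short Python sketch that cuts the example text both ways. It uses Python's `find` and `rfind` as stand-ins for the forward and backward Position() searches, and it assumes the tag is present in the text.

```python
text = ("Some text we need for certain</stringName>"
        "borderline case</stringName>undisputed garbage")
tag = "</stringName>"

# Cut after the FIRST occurrence of the tag (search from the start).
first_cut = text[:text.find(tag) + len(tag)]

# Cut after the LAST occurrence of the tag (search from the end).
last_cut = text[:text.rfind(tag) + len(tag)]

print(first_cut)  # Some text we need for certain</stringName>
print(last_cut)   # Some text we need for certain</stringName>borderline case</stringName>
```

Neither result is wrong in itself; which one you want depends on whether "borderline case" counts as data or as garbage.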

Justin Close · February 2, 2014

Ah, true.
