FMForums.com

Removing extra carriage returns

I have text that I occasionally have to copy & paste from a newspaper PDF file. The text comes in formatted like this:

ELON—Elon University

announced it has surpassed

a $100 million fundraising

goal in what the school

calls its largest-ever

fundraising effort.

The private college has

raised nearly $106 million

in its “Ever Elon” campaign,

which was launched

three years ago. The college

will continue collecting

donations through the

end of the year.

I need to find a way to strip out the "extra" carriage returns without removing the ones that are supposed to be there. It seems like there is usually a letter, followed by a carriage return, followed by a letter. But sometimes there is a comma, sometimes a hyphen, a number, etc. If there is a period or a quote followed by a carriage return, that is usually one that is supposed to be there. I have tried pasting unformatted text into various programs, but it seems that whatever program the newspaper uses to create the PDF file hard-codes the carriage returns.

I would like the text to look like this:

ELON—Elon University announced it has surpassed a $100 million fundraising goal in what the school calls its largest-ever fundraising effort.

The private college has raised nearly $106 million in its “Ever Elon” campaign, which was launched three years ago. The college will continue collecting donations through the end of the year.

Has anyone dealt with a similar problem? Is there some kind of pattern count or similar function that I can use? Or some way to find each carriage return and evaluate the character preceding it before either keeping it or substituting it with nothing?

Right now I am removing all the carriage returns, but then I have to go back in and add the ones where each paragraph starts. Not a big deal for a short story, but a major pain for a long article!

Any help will be greatly appreciated!
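A minimal Python sketch of the rule described in the question: look at the last character before each line break and keep the break only when it is a period, question mark, exclamation mark, or closing quote; otherwise join the lines with a space. The function name `unwrap` and the exact set of ending characters are assumptions for illustration, and the blank lines that appear between every line of the pasted sample are simply discarded.

```python
# Characters assumed to mark a real paragraph end when they are the last
# character on a line (period, ?, !, straight quote, curly closing quote).
PARAGRAPH_ENDINGS = (".", "!", "?", '"', "\u201d")


def unwrap(text, endings=PARAGRAPH_ENDINGS, sep="\n"):
    """Join hard-wrapped lines; keep a break only after an assumed paragraph ending."""
    # splitlines() handles \r, \n and \r\n; strip spaces and drop blank lines.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    paragraphs, current = [], []
    for line in lines:
        current.append(line)
        # Evaluate the character before the break: a paragraph ending closes
        # the current paragraph, anything else means "join with the next line".
        if line.endswith(endings):
            paragraphs.append(" ".join(current))
            current = []
    if current:  # leftover text that did not end with a paragraph ending
        paragraphs.append(" ".join(current))
    return sep.join(paragraphs)


sample = (
    "The private college has\r\n"
    "raised nearly $106 million\r\n"
    "in its campaign.\r\n"
    "The college will\r\n"
    "continue collecting."
)
print(unwrap(sample))
# The private college has raised nearly $106 million in its campaign.
# The college will continue collecting.
```

As the first reply points out, any sentence that happens to end at a line break will start a new paragraph with this rule, and a line ending in a hyphen is joined with a space, which may not be what you want for words split across lines.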

I'd go back to the original text and examine it at the character level for any differences between line and paragraph separators.
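One way to do that examination, sketched in Python: count each distinct run of break characters in the pasted text and print it with `repr()` so the invisible characters become visible. If paragraph breaks turn out to differ from line breaks (for example `'\r\r'` versus `'\r'`, or a Unicode separator character), you can split on that instead of guessing. The set of break characters checked here and the name `break_runs` are assumptions.

```python
import re
from collections import Counter

# Characters treated as breaks. This set is an assumption: CR, LF,
# vertical tab (used by some word processors for manual line breaks),
# and the Unicode LINE SEPARATOR / PARAGRAPH SEPARATOR characters.
BREAK_RUN = re.compile(r"[\r\n\x0b\u2028\u2029]+")


def break_runs(text):
    """Count every distinct run of consecutive break characters in the text."""
    return Counter(BREAK_RUN.findall(text))


pasted = "Elon University\rannounced it has surpassed\r\rThe private college has\r"
for run, count in break_runs(pasted).items():
    # repr() shows invisible characters, e.g. '\r' or '\u2028'
    print(repr(run), count)
# '\r' 2
# '\r\r' 1
```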

If they are indeed the same, you are playing a guessing game, and there's no way you can win them all. For example, you could assume that any period, question mark or exclamation mark followed by a carriage return marks the end of a paragraph, but then all your paragraphs would be one sentence long. OTOH, a list like:

a) this;

b) that;

c) another

will be mangled into one line.
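It stays a guess, but the guess can look at both sides of the break. Below is a hypothetical Python helper, `keep_break`, that keeps a break after an assumed sentence ending or before a line that starts with something like a list marker (`a)`, `1.`, `iv)`). The marker pattern is an assumption and will still misfire on some text.

```python
import re

# Assumed list-item marker at the start of a line: a letter, a number or a
# lowercase roman numeral, followed by "." or ")" and whitespace.
LIST_MARKER = re.compile(r"^\s*(?:[A-Za-z]|\d+|[ivxlc]+)[.)]\s")

# Assumed sentence/paragraph endings (period, ?, !, straight or curly quote).
ENDINGS = (".", "!", "?", '"', "\u201d")


def keep_break(prev_line, next_line):
    """Guess whether the line break between two lines is a real one."""
    if prev_line.rstrip().endswith(ENDINGS):
        return True  # previous line looks like the end of a sentence
    if LIST_MARKER.match(next_line):
        return True  # next line appears to start a list item
    return False  # otherwise treat it as a wrapping break
```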

I agree.

Hi rekates,

I have text that I occasionally have to copy & paste from a newspaper PDF file. The text comes in formatted like this:

ELON—Elon University

announced it has surpassed

.... snip

As comment said, parsing or extracting text can be a mixed bag. The tool I use for this is TextWrangler by Bare Bones Software (http://www.barebones...gler/index.html), a free text editor with grep pattern capabilities, which can come in handy in this type of situation. Once I see what your text looks like, I can supply you with the grep Find and Replace patterns you'll need.
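For comparison, the same heuristic can be written as a single regular-expression find-and-replace. This sketch uses Python's `re` dialect rather than TextWrangler's grep syntax (the two are similar but not identical, so check the editor's documentation before pasting a pattern there). The function name `unwrap_regex` and the set of ending characters are assumptions.

```python
import re

# A break is replaced when the character before it is NOT an assumed
# paragraph ending (period, ?, !, straight or curly closing quote) and the
# character after it is more text rather than another break.
_JOIN_BREAK = re.compile(r'(?<=[^.!?"\u201d\n])\n(?=[^\n])')


def unwrap_regex(text):
    """Single find-and-replace pass: turn 'extra' line breaks into spaces."""
    # Normalize Windows (\r\n) and bare CR (\r) breaks to \n first.
    text = re.sub(r"\r\n?", "\n", text)
    return _JOIN_BREAK.sub(" ", text)


print(unwrap_regex("fundraising\rgoal.\rThe college\rwill continue"))
# fundraising goal.
# The college will continue
```

This assumes one break per wrapped line. If every line is also followed by a blank line, as in the sample as it appears in this thread, those double breaks are left untouched, and collapsing them first would erase any difference between line and paragraph breaks.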

There are many threads on FMForums dealing with this kind of need. To research further, search for Parse or Extract and Text.

BTW, I went to ELON's website, captured one of the articles and pasted it into TextWrangler, and all of the paragraph returns were normal. I did the same with our local newspaper's site, and again all of the paragraph returns were normal. Perhaps you could give me the link to the particular text, and I'll try it with TextWrangler to see if I get the same thing.

Lee
