
Cleaning up text files for import



Hi, I need help cleaning up text EDL files.

They output like this:

0016 BLK V D 060 00:00:00:00 00:00:02:00 01:06:45;26 01:06:47;26

0017 AUX V C 00:00:00:03 00:00:08:25 01:09:35;29 01:09:44;21

0018 AUX V C 00:00:00:00 00:00:10:17 01:09:47;11 01:09:57;28

0019 AUX V C 00:00:00:00 00:00:10:08 01:09:59;24 01:10:10;02

0020 6408 V C 00:26:26;28 00:26:29;17 01:10:41;07 01:10:57;13

PEG A 016 00:00:00 6408

* REPAIR: FROM SOURCE TRUE SPEED IS 4.869000 FPS

0021 6408 V C 00:26:26;28 00:26:27;29 01:11:00;17 01:11:07;01

PEG A 016 00:00:00 6408

* REPAIR: FROM SOURCE TRUE SPEED IS 4.869000 FPS

0022 6408 V C 00:26:28;00 00:26:28;00 01:11:07;01 01:11:07;01

0022 BLK V D 060 00:00:00:00 00:00:02:00 01:11:07;01 01:11:09;01

And I need them to be cleaned up like this:

0008 AUX 00:00:00:00 00:00:05:28 01:04:40;00 01:04:45;28

0009 AUX 00:00:05:28 00:00:05:28 01:04:45;28 01:04:45;28

0009 AUX 00:00:00:22 00:00:06:18 01:04:45;28 01:04:51;24

0010 BLK 00:00:00:00 00:00:00:00 01:05:09;19 01:05:09;19

I have used Tex-Edit Plus to do half of the cleanup work so far. But these files are HUGE, and it takes too long to do this by hand.

Are there any links about text parsing and that kind of thing?

Once the file is clean, I replace each space with a tab so I can import it into FileMaker Pro. I know that part works; I just have to automate the text cleanup somehow.

Version: v6.x

Platform: Mac OS X Panther
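For anyone who would rather script the parsing outside an editor, here is a minimal Python sketch (Python is my substitution; nobody in this thread uses it). It picks the event number, reel and four timecodes out of one event line and joins them with tabs, ready for a FileMaker import. The field layout is an assumption inferred from the sample lines above, and the names `TC`, `EVENT_RE` and `to_tab_row` are hypothetical.

```python
import re

# Timecode such as 01:06:45;26 (";" before the frames marks drop-frame).
TC = r"\d\d:\d\d:\d\d[:;]\d\d"

# Assumed event-line layout, inferred from the sample lines in this post:
# event  reel  track  transition  [duration]  src-in  src-out  rec-in  rec-out
EVENT_RE = re.compile(
    r"^(?P<event>\d{4})\s+(?P<reel>\S+)\s+(?P<track>\S+)\s+(?P<trans>[CD])"
    r"(?:\s+(?P<dur>\d{3}))?"  # dissolves ("D") carry a 3-digit duration
    r"\s+(?P<src_in>" + TC + r")\s+(?P<src_out>" + TC + r")"
    r"\s+(?P<rec_in>" + TC + r")\s+(?P<rec_out>" + TC + r")\s*$"
)


def to_tab_row(line):
    """Return event, reel and the four timecodes joined by tabs, or None
    if the line is not an event line (e.g. "PEG A ..." or "* REPAIR: ...")."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    keep = ("event", "reel", "src_in", "src_out", "rec_in", "rec_out")
    return "\t".join(m.group(name) for name in keep)
```

Because `to_tab_row` returns `None` for the `PEG A` and `* REPAIR:` lines, skipping the `None` results drops those lines in the same pass.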

I wrote an OS X text-parsing AppleScript that you can modify to do this. It's attached. There is no documentation; I made it by cutting down something much more complex that I wrote.

The downside is that it's slow. Walk away & eat dinner...

Nice applet. I've just glanced through it. I would suggest to tripdragon that he might want to get hold of either Tex-Edit Plus ($10?, shareware) or TextWrangler ($50). Both of them come with some AppleScript examples.

Both are also "recordable." TextWrangler is the little brother of BBEdit: http://www.barebones.com. It is awesomely powerful and fast. I once edited a 14 MB text file with no problem, using BBEdit Lite.

TextWrangler supports "grep," which, once you learn a little, can do wonders, especially separating text from numbers; basically any kind of pattern matching.

You could also probably do this stuff with grep in Unix for free. But there would be more of a learning curve.


CyborgSam said:

I wrote an OS X text parsing AppleScript that you can modify to do this. It's attached. There is no documentation, I hacked up something a lot more complex I wrote to make this.

The downside is that it's slow. Walk away & eat dinner...

I tried the applet. It didn't change anything in the file; it did create a new file, but with no changes.

TextWrangler is very nice. It does almost all of what I need. One question, though: with the Find All feature that shows me all the matching lines, how do I get it to select all of those lines at once so I can delete them? That would complete half of the work right there.

Hi tripdragon,

Your problem touches upon one of my favorite areas.

I'll start by saying that I agree with Fenton: the simplest way to go about this is to use one of Bare Bones Software's products, or another editor with the abilities that BBEdit and TextWrangler have.

Since I'm not familiar with Tex-Edit Plus, I don't know its capabilities; the site information doesn't mention grep at all, and it does mention a file size limit.

On the other hand, Bare Bones' products have some very handy tools, including, as Fenton said, the ability to use grep patterns. Grep lets you find and replace multiple patterns, and you can also use it to cut the lines containing a pattern.

Looking at your example, it's not clear to me what you do with some of the lines, because the "before" text isn't the same as your "cleaned up" text.

Perhaps a little more input here could save you a lot of time in your cleanup.

HTH

Lee


Version: v6.x

Platform: Mac OS 9

Sure!

I've almost figured out the method to use.

In BBEdit or TextWrangler, I found that I can select text vertically (a column selection)! WOO HOO! That will help so much with other stuff too.

The messy text looks like this:

* REPAIR: TO SOURCE TRUE SPEED IS 4.869000 FPS

0004 BLK V C 00:00:00:00 00:00:00:00 01:02:46;02 01:02:46;02

0004 AUX V D 020 00:00:18:03 00:00:23:03 01:02:46;02 01:02:51;02

0005 AUX V C 00:00:23:03 00:00:23:03 01:02:51;02 01:02:51;02

0005 BLK V D 020 00:00:00:00 00:00:00:20 01:02:51;02 01:02:51;22

0006 BLK V C 00:00:00:00 00:00:00:00 01:03:01;25 01:03:01;25

0006 AUX V D 020 00:00:18:03 00:00:22:27 01:03:01;25 01:03:06;19

0007 BLK V C 00:00:00:00 00:00:00:00 01:04:15;17 01:04:15;17

0007 AUX V D 020 00:00:18:03 00:00:22:27 01:04:15;17 01:04:20;11

0008 BLK V C 00:00:00:00 00:00:00:00 01:05:09;19 01:05:09;19

0008 AUX V D 020 00:00:18:03 00:00:22:27 01:05:09;19 01:05:14;13

0009 6510 V C 01:26:19;15 01:26:27;18 01:10:41;07 01:10:57;13

PEG A 050 00:00:00 6510

0010 6510 V C 01:26:19;15 01:26:22;22 01:11:00;17 01:11:07;01

PEG A 050 00:00:00 6510

0011 6510 V C 01:26:22;22 01:26:22;22 01:11:07;01 01:11:07;01

0011 BLK V D 060 00:00:00:00 00:00:02:00 01:11:07;01 01:11:09;01

Studying that text, I have found that the "V", "C" and "D" in the center are always in the same column, and so are the three-digit numbers, like this: V D 060

V D 060

V D 060

V D 060

V D 060

V D 060

They are also always at character positions 13-27 of the line.

So if I can remove characters 13-27 on those lines, that will get half of it done.

Then I need to remove all lines that begin with strings I choose as not useful, like "* REPAIR: ...".

If I could wildcard all lines that begin with something like "* REPAIR:" and "PEG A", that would do the full, complete cleanup! Yahooo! No more copy and paste till I'm blue in the face.

Version: v6.x

Platform: Mac OS X Panther
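Removing a fixed character range is a one-line slice. A minimal Python sketch (my substitution for illustration; the function names are hypothetical), assuming, as tripdragon observed, that the block always sits at character positions 13-27 (1-based, inclusive) of each event line. The sample lines in this thread had their spacing collapsed by the forum, so those positions can't be checked against them.

```python
def cut_columns(line, first=13, last=27):
    """Remove characters first..last (1-based, inclusive) from one line.

    A line shorter than `first` characters comes back unchanged.
    """
    # Python slices are 0-based and end-exclusive: keep [0, first-1) and [last, end).
    return line[: first - 1] + line[last:]


def cut_event_lines(lines, first=13, last=27):
    """Apply cut_columns only to lines that start with a digit (event lines)."""
    return [cut_columns(l, first, last) if l[:1].isdigit() else l for l in lines]
```

Restricting the cut to lines that start with a digit leaves `PEG A` and `* REPAIR:` lines intact, so they can still be matched and deleted in a separate step.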

You write a grep pattern using the pipe character to separate the alternatives. In other words, you can cut the lines containing REPAIR: and PEG A in one fell swoop. Using the tools, select "Process Lines Containing", then copy this into the box:

PEG|REPAIR:

Then select these options:

Use Grep

Case Sensitive

Delete Match Lines

I own BBEdit 5.1, and it doesn't have the vertical text selection ability. How do you do this?

Lee
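Outside BBEdit, the same "delete matched lines" step can be sketched in Python (a substitution for illustration; the script name `drop_lines.py` and the function name are hypothetical). It reads one line at a time, so large files are never held in memory all at once.

```python
import re
import sys

# The pattern from the BBEdit dialog: "|" means "either side may match".
# re is case-sensitive by default, like the "Case Sensitive" option.
DROP = re.compile(r"PEG|REPAIR:")


def delete_matched_lines(lines, pattern=DROP):
    """Yield only the lines that do not contain a match for `pattern`."""
    for line in lines:
        if not pattern.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python drop_lines.py input.edl output.txt
    # Text mode accepts CR, LF or CRLF line endings, and the generator
    # processes the file one line at a time instead of loading it whole.
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(delete_matched_lines(src))
```

On a Unix command line, `grep -v -E 'PEG|REPAIR:' input.edl > output.txt` does the same thing. Note that the unanchored pattern drops any line containing `PEG` anywhere; `^(PEG|\* REPAIR:)` would match only at the start of a line.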

I'm guessing that the "Terminal" is Unix or something.

I can't get it to work either. Is this Perl or something? It does contain some regex, but it appears to be a script that processes your file.

Did you ask the other person helping you about it not working?

Lee

Sweet! Happy!

That did it!

A few things now.

To do the vertical select, hold down the Option key.

How can I automate this as an AppleScript or something similar?

What command do I use to convert all spaces into tabs, and all tabs into spaces?

If there is a pattern of more than one space (2 or 3) between the data, you can use Entab under the Text menu. But for a one-for-one conversion, you have to search for a space (just type it from the keyboard) and replace it with \t (\t = tab in BBEdit).
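Both conversions, and the reverse, can be sketched in Python (hypothetical function names; the `runs_only` mode is a plain regex stand-in for the multi-space case, not a reimplementation of BBEdit's Entab command):

```python
import re


def spaces_to_tabs(text, runs_only=False):
    """Convert spaces to tabs.

    runs_only=False: every single space becomes a tab (one for one).
    runs_only=True: only runs of two or more spaces become one tab;
    single spaces are left alone.
    """
    if runs_only:
        return re.sub(r" {2,}", "\t", text)
    return text.replace(" ", "\t")


def tabs_to_spaces(text):
    """Turn each tab back into a single space."""
    return text.replace("\t", " ")
```

For a FileMaker import, the one-for-one mode turns every double space into an empty field, which is why replacing any run of spaces (single or multiple) with one tab is usually the safer choice.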

When you want to automate conversion of a large text file you don't want to be selecting columns with the mouse (though this is a cool feature, very useful for fixing up irregular spreadsheet files or tables copied from the internet). Here's what you want to do:

Find lines beginning with any text (not numbers)

^[^\d\r].*

Replace with nothing

Find a space, then V C or V D, optionally followed by a space and 3 digits, and then a space (watch the spaces inside the quotes; grep):

" V [CD]( \d\d\d)? "

Replace with \t

Find spaces (single or multiple; the pattern is a space followed by a plus sign; grep)

" +"

Replace with \t

[OK, the above is missing leading spaces in places, 'cause HTML strips them. Use the AppleScript below.]

When I said BBEdit (or TextWrangler) is "recordable," I meant that you can manually turn on AppleScript recording, under the Scripts icon, then do the preceding search/replaces, and it will record them as an AppleScript script and then ask you to save it. You can then put the resulting file into BBEdit's Scripts folder and it will be available there. You can even assign it a command key.

This is what it looks like. It adds all the default parameters, which you can generally ignore; or even remove, but there's no need:

tell application "BBEdit"
	activate
	-- match words is false in every call: with it on, lines that start with "*" are never matched
	-- 1. Empty every line that doesn't start with a digit (PEG A, * REPAIR: and so on)
	replace "^[^\\d\\r].*" using "" searching in text 1 of text window 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
	-- 2. Replace " V C " or " V D 060 " (the duration is optional) with a tab
	replace " V [CD]( \\d\\d\\d)? " using "\\t" searching in text 1 of text window 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:true, match words:false, extend selection:false}
	-- 3. Turn each remaining run of spaces into a tab
	replace " +" using "\\t" searching in text 1 of text window 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
	-- 4. Collapse the blank lines left behind by step 1
	replace "\\r\\r+" using "\\r" searching in text 1 of text window 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
end tell
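Fenton's find/replace steps, including the final blank-line cleanup in his AppleScript, can be sketched in Python as one function (my substitution for illustration; `clean_edl` is a hypothetical name). Two assumptions: the file has been read in Python's text mode, so old Mac CR line endings arrive as `\n` (BBEdit's grep uses `\r` instead), and the 3-digit duration after `V D` is optional, since the cut (`V C`) lines in the samples have none.

```python
import re


def clean_edl(text):
    """Apply the grep steps above to a whole EDL file's text."""
    # 1. Empty lines that do not start with a digit ("PEG A ...", "* REPAIR: ...").
    #    [^\d\n] stops the match from running onto the next line.
    text = re.sub(r"^[^\d\n].*", "", text, flags=re.MULTILINE)
    # 2. Replace " V C " or " V D 060 " (duration optional) with a tab.
    text = re.sub(r" V [CD](?: \d\d\d)? ", "\t", text)
    # 3. Turn every remaining run of spaces into a single tab.
    text = re.sub(r" +", "\t", text)
    # 4. Collapse the empty lines left behind by step 1.
    text = re.sub(r"\n\n+", "\n", text)
    return text.strip("\n") + "\n"
```

The result is one tab-delimited record per event (event, reel, four timecodes), which matches the layout tripdragon imports into FileMaker Pro.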

I should have been clearer that the applet, as I posted it, does NOT do any processing. You have to enter the AppleScript code to do this.

Sounds like you're finding BBEdit the easiest way to go.

Version: v7.x

Platform: Mac OS X Panther

Fenton said: [quoted AppleScript from his post above]

This stuff is great!

I am very busy right now. I have more questions and will post them here soon. Thank you!