Advanced text calculation - nut to crack...

innodat · September 27, 2007

Hey guys!

I've been racking my brain about this... The idea is to be able to break down the lines of a theater play into separate records (to take line-specific notes, change the order, etc.).

The starting point is a text field with the source text, as follows:

RECORD 1 (global field)

Kelly. Hi George, how is it going?

George. I had a long day

Debby. Aww, George!

Kelly. Shut it.

George. Ok now, that’s enough.

The end result needs to look like this:

RECORD 1

field1: Kelly

field2: Hi George, how is it going?

RECORD 2

field1: George

field2: I had a long day.

RECORD 3

field1: Debby

field2: Aww, George!

...

You get the idea. All published plays have one thing in common: whenever a different character starts speaking, it's after a line break, and the line starts with the character name and a period:

¶George. or ¶Kelly. or ¶Debby. etc.
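To make the rule concrete, here is a rough Python sketch of the logic. It runs outside FileMaker and only illustrates the idea. It assumes each speech fits on a single line and that a name never contains a period. `parse_speeches` is a hypothetical helper name.

```python
import re

# A speech line: the name (no period, no line break), a period, then the dialog.
SPEECH = re.compile(r"^([^.\n]+)\.\s*(.*)$")

def parse_speeches(text):
    """Return (name, dialog) pairs, one per non-empty line of text.

    Assumes every speech fits on one line; lines that do not match
    the pattern are skipped.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between speeches
        m = SPEECH.match(line)
        if m:
            records.append((m.group(1), m.group(2)))
    return records

sample = """Kelly. Hi George, how is it going?

George. I had a long day

Debby. Aww, George!"""

for name, dialog in parse_speeches(sample):
    print(name, "|", dialog)
```

Each pair corresponds to one target record: field1 gets the name and field2 gets the dialog.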

I've been experimenting with a lot of different text functions - just couldn't get it right. I'm not going to list all my efforts here... I don't think that would be helpful.

Does anyone have an idea?

Thank you so much for your input!!!

Michael

Edited September 27, 2007 by Guest

innodat · September 27, 2007

PS: I know it also takes several script steps to make this work - I'm not worried about those, just the text calcs...

comment · September 27, 2007

Actually, this is quite easy. The difficult question is: do you have records that are NOT dialog lines, and if so, how can you tell them apart. For example: "Night. Julia walks onto the balcony."

BTW, I believe there are applications much better suited for this type of 'data'.

innodat · September 27, 2007

Hey there, I'm not worried about the 2% of lines that could (will) be parsed wrong.

I'm working on a database for stage managers. This separation of text will lay the groundwork for taking notes on blocking and other things, printing reports, etc., which is just one section of the entire program.

Easy or not, I'm stuck in my thought process and would appreciate a tip.

Thanks!

Edited September 27, 2007 by Guest

Fenton · September 27, 2007

Also, what about long lines of dialog? Do any of them have a carriage return in them? If so, each extra line will be imported as a separate record (with no name at the beginning).

You might want to look into using a grep (or regex) capable text editor, to go through and put a tab between the names and the dialog.

Find: ^[^.]+\.

This finds the text up to and including the first period. One could get fancier and omit the period itself; you can then add a tab.

Fancier:

Find: (^[^.]+)(\. )

Replace: \1\t
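The same find/replace can be tried outside a text editor. Below is a sketch using Python's `re` module. `re.MULTILINE` makes `^` match at the start of every line, as it does in grep-style editors. One change from Fenton's pattern is deliberate: `\n` is added to the character class. In Python, `[^.]` also matches a newline, so without it a line with no period could be swallowed into the next match.

```python
import re

text = "Kelly. Hi George, how is it going?\nGeorge. I had a long day\n"

# Group 1 is the text before the first period on each line (the name);
# the period and the single space after it are replaced by a tab.
tabbed = re.sub(r"^([^.\n]+)\. ", r"\1\t", text, flags=re.MULTILINE)

print(tabbed)
```

Each line can then be split into name and dialog with `line.split("\t", 1)`.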

comment · September 27, 2007

Well, OK. But this point is not clear: do you now have the entire text in a single field of a single record, or have you broken each line into a separate record?

aholtzapfel · September 27, 2007

Here is a sample file to look at. It was easier for me to just throw together a sample than to explain it. It does assume that the only carriage returns are at the end of each line. If that isn't always true, it won't always work. It does work on the sample you posted. If this doesn't work for you, let me know how it fails and I'll try to help.

PLAY.zip

innodat · September 27, 2007

Hello aholtzapfel,

Thank you for the file, amazing! It does just what I need. EXCEPT...

As you pointed out: there *will* be line breaks within dialog.

I wish there were a way to include an additional condition for the value count, rather than just line breaks.

If there were a field containing a list of character names, could the value count react to occurrences like 'line-break & characterXX.', 'line-break & characterXY.', 'line-break & characterXZ.'? There might be up to 30 characters in a play... But this is a 99% accurate condition:

"¶characterXY."

Thank you so much for your help!

Michael

Edited September 27, 2007 by Guest
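Michael's "¶name." condition can be sketched outside FileMaker as follows. A new speech starts only where a line begins with one of the listed character names followed by a period. Everything up to the next such line belongs to the same speech, including wrapped dialog lines. `split_by_characters` and the cast list are hypothetical. Any text before the first recognized name, such as a title, is ignored.

```python
import re

def split_by_characters(text, characters):
    """Split text into (name, speech) pairs, starting a new speech only
    where a line begins with a known character name followed by a period."""
    names = "|".join(re.escape(c) for c in characters)
    # With MULTILINE, ^ matches at the start of every line ("¶name.")
    start = re.compile(rf"^({names})\.[ \t]*", re.MULTILINE)
    matches = list(start.finditer(text))
    records = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Join the wrapped lines of one speech with single spaces
        speech = " ".join(text[m.end():end].split())
        records.append((m.group(1), speech))
    return records

cast = ["Kelly", "George", "Debby"]
sample = "Kelly. Hi George,\nhow is it going?\nGeorge. I had a long day\nDebby. Aww, George!\n"
records = split_by_characters(sample, cast)
print(records)
```

"Hi George," in the middle of Kelly's line does not start a new speech. The name is not at the start of a line, and it is not followed by a period.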

innodat · September 27, 2007

The source text is in one (global) field.

A script/play is scanned with OCR, then copied into a (global) field in the database. That's the starting point.

Edited September 27, 2007 by Guest

innodat · September 27, 2007

I understand your idea: add a new character (a tab) between the character name and the spoken text to identify values. But what if there are tabs in other places? I would also like this to be a "no-brainer" for the user, and to stay in FileMaker.

comment · September 28, 2007

Having a list of characters would certainly help, though it still wouldn't be 100% safe. You could check that the line starts with a character name, followed by a period. But that could also happen with spoken text, e.g.

Kelly. Hi George, how is it going?

George. I had a long day. I think I want to kill

Debby. She's impossible.

Fenton · September 28, 2007

You could do it like aholtzapfel's file. But instead of creating a new record right away, you could load the text one line at a time into a global field, then figure out what to do with it (rather than immediately setting fields with it). If there were line breaks in the dialog, the next line would not start with a name, so you would know that the line still belonged to the last record you'd created, and you'd append it instead of making a new record.

When you came to another line starting with a name, you'd know to make a new record for it.

The idea being that you would just import the entire text file into 1 field, likely a global field. Then run through it 1 line at a time. This is very flexible, but it is not very fast.
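Fenton's line-at-a-time approach can be sketched like this in Python. The loop over `splitlines()` plays the role of stepping through the global field with GetValue in a script. Whether a line starts a new record is decided by the text before its first period. That text must appear in a list of character names, which is an assumption borrowed from later in the thread. `parse_line_by_line` is a hypothetical name.

```python
def parse_line_by_line(text, characters):
    """Walk the text one line at a time: a line whose text before the
    first period is a known character starts a new record; any other
    line is appended to the previous record."""
    known = set(characters)
    records = []  # each record is [name, speech]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, dot, rest = line.partition(".")
        if dot and name.strip() in known:
            records.append([name.strip(), rest.strip()])
        elif records:
            # No name: continuation of the previous speech
            records[-1][1] = (records[-1][1] + " " + line).strip()
        # Lines before the first speech (e.g. a title) are dropped
    return [tuple(r) for r in records]

cast = {"Kelly", "George", "Debby"}
text = "Kelly. Hi George,\nhow is it going?\n\nGeorge. I had a long day\n"
print(parse_line_by_line(text, cast))
```

As Fenton notes, this is flexible but not fast. Each line is examined individually, which in a FileMaker script means one loop pass per line.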

You could even put the first section, before the first period, into a global first, then check whether it's a "name." If not, the line is something else, as comment said. I'm not sure how you'd decide what a "name" is unless you had a list of the characters; you could also check the number of words, since most regular sentences would have more than 2 words, whereas a name wouldn't.
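The word-count check works without a character list. Here is a small Python sketch. The limit of two words comes from Fenton's suggestion, and `looks_like_name` is a hypothetical helper.

```python
def looks_like_name(line, max_words=2):
    """Heuristic: treat the text before the first period as a character
    name if there is a period and that text has 1..max_words words."""
    prefix, dot, _ = line.partition(".")
    return bool(dot) and 0 < len(prefix.split()) <= max_words

print(looks_like_name("Kelly. Hi George, how is it going?"))
print(looks_like_name("I had a long day. Then I went home."))
```

This is only a heuristic. comment's stage-direction example, "Night. Julia walks onto the balcony.", still passes because "Night" is a single word.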

innodat · September 28, 2007

That's great, Fenton.

Mr. aholtzapfel came pretty close, except that his value count function only reacts to line breaks. I wish I knew how to use a

¶name.

as an indicator for the value count. I think that would pretty much solve it.

Gosh, I wish I was better with calculations! Anything else FileMaker I'm pretty solid in...

comment · September 28, 2007

As I said, it can be done if you have a list of characters - see attached. But there's no way a computer can tell that the last line is NOT a new speech.

ParsePlay.fp7.zip

comment · September 28, 2007

On second thought, one could check if the previous line ended with some kind of punctuation. Still not fail-safe, but it could probably eliminate quite a few mistakes.
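The punctuation check can be combined with the character list. Below is a Python sketch under these assumptions: "end punctuation" means `.`, `!` or `?`, and the caller passes the previous non-empty line. `is_new_speech` is a hypothetical name.

```python
def is_new_speech(line, previous_line, characters, enders=".!?"):
    """A line starts a new speech if it begins with a known character name
    and a period, AND the previous non-empty line ended with punctuation.
    Not fail-safe: it only filters out some false positives."""
    name, dot, _ = line.partition(".")
    if not dot or name.strip() not in characters:
        return False
    prev = previous_line.rstrip()
    # An empty previous line means this is the first line of the text
    return prev == "" or prev[-1] in enders

cast = {"Kelly", "George", "Debby"}
# comment's example: "kill" has no end punctuation, so Debby is not a new speaker
print(is_new_speech("Debby. She's impossible.",
                    "George. I had a long day. I think I want to kill", cast))
```

The check has a cost. In the original sample, "George. I had a long day" has no final period, so Debby's real speech after it would be merged wrongly. That is why comment calls the check "still not fail-safe."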

aholtzapfel · September 28, 2007

Sorry, I'm at home and can't download the file, so I'm doing this from memory (and that's as bad as my spelling), but Fenton is on the right track, I think. You will need another table to hold the character names. (The relationship should be from a global field so all character records are related, unless someone has a better idea.)

Add an If statement inside the loop in the script, so it looks something like this:

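Set Variable [ $counter ; Value: 1 ]
Loop
  Set Variable [ $line ; Value: GetValue ( text ; $counter ) ]
  # first word of the line, e.g. "Kelly" from "Kelly. Hi George, ..."
  Set Variable [ $Char ; Value: LeftWords ( $line ; 1 ) ]
  # List() collects the names from all related CharTable records;
  # FilterValues returns matches followed by ¶, so test for "not empty" rather than "= $Char"
  If [ not IsEmpty ( FilterValues ( List ( CharTable::Char ) ; $Char ) ) ]
    New Record/Request
    Set Field [ Char ; $Char ]
    # dialog = everything after the first period
    Set Field [ Line ; Trim ( Middle ( $line ; Position ( $line ; "." ; 1 ; 1 ) + 1 ; Length ( $line ) ) ) ]
  Else
    # no known name: the line continues the previous speech
    Set Field [ Line ; Trim ( Line & " " & $line ) ]
  End If
  Set Variable [ $counter ; Value: $counter + 1 ]
  Exit Loop If [ $counter > ValueCount ( text ) ]
End Loop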

If this doesn't work (and I have NOT tested it), I'd be happy to look at the file again, make these changes, and test it. Fenton is right that this is not the most efficient way of doing things, and comment's example is a problem (sooner or later that will happen), but imperfect solutions are often "good enough" in this imperfect world.

(Sorry, it took me a while to write this post and I have not looked at comment's file yet; sounds like he beat me to it. I would add the check he suggested, looking for punctuation at the end of lines; it would eliminate at least some errors.)

Edited September 28, 2007 by Guest

innodat · September 28, 2007

Hey comment!

This is fantastic!!! If a couple of lines need to be fixed manually, that's no big deal.

Thank you so much for your help! All of you. I hope I can return the favor soon - well, I actually doubt that, you're so super savvy... but I'll try. ;-)
