Advanced text calculation - nut to crack...


This topic is 6114 days old. Please don't post here. Open a new topic instead.

Hey guys!

I've been racking my brain about this... The idea is to be able to break down the lines of a theater play into separate records (to take line-specific notes, change the order, etc.).

The starting point is a text field with the source text, as follows:

RECORD 1 (global field)

Kelly. Hi George, how is it going?

George. I had a long day.

Debby. Aww, George!

Kelly. Shut it.

George. Ok now, that’s enough.

The end result needs to look like this:

RECORD 1

field1: Kelly

field2: Hi George, how is it going?

RECORD 2

field1: George

field2: I had a long day.

RECORD 3

field1: Debby

field2: Aww, George!

...

You get the hang of it. All plays published have one thing in common: whenever a different character starts speaking, it's after a line-break and the line starts with the character name and a period:

¶George. or ¶Kelly. or ¶Debby. etc.

I've been experimenting with a lot of different text functions - just couldn't get it right. I'm not going to list all my efforts here... I don't think that would be helpful.

Does anyone have an idea?

Thank you so much for your input!!!

Michael


Actually, this is quite easy. The difficult question is: do you have lines that are NOT dialog lines, and if so, how can you tell them apart? For example: "Night. Julia walks onto the balcony."

BTW, I believe there are applications much better suited for this type of 'data'.


hey there, I'm not worried about the 2% of lines which could (will) pull in wrong.

I'm working on a database for stage managers. This separation of text will lay the ground work for taking notes on blocking and other things, print reports etc. - which is just one section of the entire program.

Easy or not, I'm stuck in my thought process, and would appreciate a tip.

Thanks!


Also, what about long lines of dialog? Do any of them have a carriage return in them? If so, the part after the return will be imported as a separate record (with no name at the beginning).

You might want to look into using a grep (or regex) capable text editor, to go through and put a tab between the names and the dialog.

Find: ^[^.]+.

Will find the text up to and including the 1st period. One could get fancier and omit the period itself. You can then add a tab.

Fancier:

Find: (^[^.]+)(. )

Replace: \1\t
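For readers without a grep-capable editor, the same find/replace can be sketched in Python. This is only an illustration of the pattern above, not part of the FileMaker solution; the function name `tab_separate` is made up for the example. Two small changes are assumptions on my part: the period is escaped as `\.` to make the intent explicit, and `\n` is excluded from the character class because Python's `[^.]` would otherwise run across line breaks.

```python
import re


def tab_separate(text: str) -> str:
    """Replace the first ". " on each line with a tab.

    Group 1 captures everything from the start of the line up to (but not
    including) the first period; the period and the following space are
    dropped and a tab is written in their place.
    """
    return re.sub(r"^([^.\n]+)\. ", r"\1\t", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = "Kelly. Hi George, how is it going?\nGeorge. I had a long day."
    print(tab_separate(sample))
```

Lines that contain no period followed by a space are left untouched, so a continuation line such as "how is it going?" passes through unchanged.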


Here is a sample file to look at. It was easier for me to just throw together a sample than to explain it. It does assume that the only carriage returns are at the end of each line. If that isn't always true, it won't always work. It does work on the sample you posted. If this doesn't work for you, let me know how it fails and I'll try to help.

PLAY.zip


Hello aholzapfel,

thank you for the file, amazing! It does just what I need. EXCEPT...

As you pointed out: there *will* be line breaks within dialog.

I wish there was a way to include an additional condition, as opposed to just line breaks for the value count.

If there was a field containing a list of character names, could the value count react to occurrences like 'line-break & characterXX.', 'line-break & characterXY.', 'line-break & characterXZ.'? There might be up to 30 characters in a play... But this is a 99% accurate condition:

"¶characterXY."

Thank you so much for your help!

Michael
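The "line-break & characterXY." condition can be sketched outside FileMaker as a regular expression built from the list of character names. The following Python is a minimal illustration under stated assumptions, not the method used in any of the attached files: `split_by_names` is a hypothetical helper, the name list is assumed to be non-empty, and any text before the first recognized name is simply dropped.

```python
import re


def split_by_names(text, names):
    """Split a script into (name, spoken text) pairs.

    A new record starts wherever a line begins with one of the known
    character names followed by a period. Line breaks inside a speech
    are collapsed into single spaces.
    """
    # Try longer names first so that "Kelly Ann" wins over "Kelly".
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(n) for n in ordered)
    pattern = re.compile(rf"^({alternatives})\.[ \t]*", flags=re.MULTILINE)

    matches = list(pattern.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # The speech runs until the next "name." marker or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        speech = " ".join(text[m.end():end].split())
        records.append((m.group(1), speech))
    return records


if __name__ == "__main__":
    script = (
        "Kelly. Hi George,\nhow is it going?\n"
        "George. I had a long day.\n"
        "Debby. Aww, George!"
    )
    for name, speech in split_by_names(script, ["Kelly", "George", "Debby"]):
        print(name, "|", speech)
```

Because the name must sit at the start of a line, "George," in the middle of Kelly's speech does not start a new record. A speech that happens to wrap right before a name and a period would still be split incorrectly.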


The source text is in one (global) field.

A script/play is scanned with OCR, then copied into a (global) field in the database. That's the starting point.


I understand your idea: add a new character (tab) between character name and spoken text to identify values. But what if there are tabs in other places? I would also like this to be a "no-brainer" for the user, and stay in FileMaker.


Having a list of characters would certainly help, though it still wouldn't be 100% safe. You could check that the line starts with a character name, followed by a period. But that could also happen with spoken text, e.g.:

Kelly. Hi George, how is it going?

George. I had a long day. I think I want to kill

Debby. She's impossible.


You could do it like aholtzapfel's file. But instead of creating a new record right away, you could load it 1 line at a time into a global field, then figure out what to do with it (rather than immediately setting fields with it). So, if there were line breaks in the dialog, the next line would not start with a name, and you would know that that line still belonged to the last record you'd done; so you'd append it instead of making a new record.

When you came to another line with a name starting it, you'd know to make a new record for it.

The idea being that you would just import the entire text file into 1 field, likely a global field. Then run through it 1 line at a time. This is very flexible, but it is not very fast.

You could even put the 1st section, before the 1st period, into a global first, then check it to see if it's a "name." If not, then the line is one like comment said, something else. I'm not sure how you'd decide what a "name" is, unless you had a list of the characters; you could also check for # of words; most regular sentences would have more than 2 words, whereas a name wouldn't.
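The line-at-a-time approach described in this post can be sketched in Python as follows. This is only an illustration of the logic, not FileMaker code; `parse_lines` and `max_name_words` are names invented for the example, and the "a name has at most two words" rule is the heuristic suggested above, used here in place of a character list.

```python
def parse_lines(text, max_name_words=2):
    """Walk the text one line at a time and build [name, speech] records.

    A line opens a new record when the part before its first period is
    short enough to look like a name. Any other line is treated as a
    continuation and appended to the most recent record.
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between speeches

        head, sep, rest = line.partition(".")
        looks_like_name = bool(sep) and 0 < len(head.split()) <= max_name_words

        if looks_like_name:
            records.append([head.strip(), rest.strip()])
        elif records:
            # No name at the start: the line still belongs to the last record.
            records[-1][1] = (records[-1][1] + " " + line).strip()
        # Lines that appear before the first name are ignored in this sketch.
    return records


if __name__ == "__main__":
    script = "Kelly. Hi George,\nhow is it going?\n\nGeorge. I had a long day."
    for name, speech in parse_lines(script):
        print(name, "|", speech)
```

The word-count test is cheap but crude. A stage direction such as "Night. Julia walks onto the balcony." still passes as a speaker named "Night", and a continuation line beginning with a short sentence like "Yes." would open a spurious record, so a character list remains the more reliable check.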


That's great, Fenton.

Mr. aholzapfel came pretty close, except that his value count function only reacts to line breaks. I wish I knew how to have a

¶name.

as an indicator for value count. I think that would pretty much solve it.

Gosh, I wish I was better with calculations! Anything else FileMaker I'm pretty solid in...


Sorry, I'm at home and can't download the file, so I'm doing this from memory (and that's as bad as my spelling), but Fenton is on the right track, I think. You will need another table to hold the character names. (The relationship should be from a global field so that all Char records are related, unless someone has a better idea.)

Add an if statement in the script, within the loop so it looks something like this.

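Loop
   # First word of the current line (a character name is assumed to be a single word)
   Set Variable [ $Char ; LeftWords ( GetValue ( text ; $counter ) ; 1 ) ]
   # FilterValues returns its matches with a trailing return, so test for a non-empty result rather than equality.
   # List ( ) gathers the names from all related Char records, not just the first one.
   If [ not IsEmpty ( FilterValues ( List ( CharTable::Char ) ; $Char ) ) ]
      New Record
      Set Char..
      Set Line...
   Else
      Set Line = Line & " " & GetValue ( text ; $counter )
   End If
   Exit Loop If...
End Loop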

If this doesn't work (and I have NOT tested it), I'd be happy to look at the file again, make these changes, and test it. Fenton is right that this is not the most efficient way of doing things, and comment's example is a problem (sooner or later that will happen), but imperfect solutions are often "good enough" in this imperfect world.

(Sorry, it took me a while to write this post and I have not looked at comment's file yet. Sounds like he beat me to it. I would add the check he suggested, 'looking for punctuation at the end of lines'; it would eliminate at least some errors.)
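One way to read the punctuation check is: a line that starts with a character name only opens a new record if the previous line ended with sentence-ending punctuation. The Python below sketches that reading combined with a character list. The interpretation itself is an assumption, since the post that proposed the check is not quoted in this thread, and `split_with_punctuation_check` and `enders` are names made up for the illustration.

```python
def split_with_punctuation_check(text, names, enders=".!?"):
    """Build [name, speech] records from a script, one line at a time.

    A line opens a new record only if it starts with a known name plus a
    period AND the previous non-blank line ended a sentence. Otherwise it
    is appended to the current record.
    """
    records = []
    prev_closed = True  # the very first line may always open a record
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue

        head, sep, rest = line.partition(".")
        starts_with_name = bool(sep) and head in names

        if starts_with_name and prev_closed:
            records.append([head, rest.strip()])
        elif records:
            records[-1][1] = (records[-1][1] + " " + line).strip()

        # Remember whether this line finished its sentence.
        prev_closed = line[-1] in enders
    return records


if __name__ == "__main__":
    script = (
        "Kelly. Hi George, how is it going?\n"
        "George. I had a long day. I think I want to kill\n"
        "Debby. She's impossible."
    )
    for name, speech in split_with_punctuation_check(script, {"Kelly", "George", "Debby"}):
        print(name, "|", speech)
```

On the problem example from earlier in the thread, "Debby. She's impossible." stays inside George's speech because the preceding line ends mid-sentence. The trade-off is that a speech whose final period was lost (for instance in OCR) would swallow the next speaker's line, so some manual cleanup is still to be expected.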


Hey comment!

This is fantastic!!! If a couple of lines need to be fixed manually, that's no big deal.

Thank you so much for your help! All of you. I hope I can return the favor soon - well I actually doubt that, you're so super savvy... but I'll try. :(-)

