ScriptMaster erroring on processing scanned PDFs

ignotum · December 2, 2014

We have an SM process that combines PDFs into one. Most of these PDFs were created via a print or "save as PDF" step from a word processor or the like, but SOME of them are scans of printed pages. When our process encounters these scanned pages it often (always?) errors and causes a failure.

Has anyone else encountered this and, more importantly, does anyone have any suggestions on how to get SM to process these scanned PDFs properly?

More info on our process is available if it might help to diagnose the problem...

Thanks,

mark

Ocean West · December 3, 2014

Mark,

What are the settings when saving scanned documents?

Are you using a current version of iText and SM to merge the PDFs?

john renfrew · December 3, 2014

Can you post the code you are using to concatenate??

Can you post a simple example file that fails??

ignotum · December 3, 2014

I can't post a "bad" file because of privacy concerns; these are parts of applications for a major grant competition. I will get the code we are using and post that up though. We CAN see that the ones that ALWAYS fail were done on Konica/Minolta scanners. Attached is the Acrobat file info.

Mark

john renfrew · December 3, 2014

What about scanning any old piece of printed paper on the scanner, then? It's what's inside the file that is likely to be causing the problems, not what's in the info...

If you open the failing file in Adobe Reader first and then try to concatenate, does it still fail??

ignotum · December 4, 2014

The "bad" scans are not coming from us... they are being uploaded into our system by applicants from all over. We have no control over those source files.

I can test a bad file to see if some pre-processing would fix the problem, but in production this would not be possible (at least not manually), as the system processes somewhere in the neighborhood of 20,000 pages per competition (per year).

We are using this to do the PDF merging:

RegisterGroovy( "mergePDFs( files ; pdfOut )" ; "import java.io.FileOutputStream;¶
import java.util.ArrayList;¶
import java.util.List;¶
¶
import com.lowagie.text.pdf.*;¶
import com.lowagie.text.*;¶
¶
String[] inFiles = files.split(\"\n\");¶
int fileIndex = 0;¶
int pageOffset = 0;¶
Document document = null;¶
PdfCopy copy = null;¶
ArrayList bookmarks = new ArrayList();¶
¶
while (fileIndex < inFiles.length) {¶
¶
    // Create a reader for the next document¶
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(inFiles[fileIndex]), null);¶
    reader.consolidateNamedDestinations();¶
¶
    // Retrieve the total number of pages¶
    int numberOfPages = reader.getNumberOfPages();¶
¶
    // Create the master document¶
    if (fileIndex == 0) {¶
        // step 1: Create the document-object¶
        document = new Document(reader.getPageSizeWithRotation(1));¶
        // step 2: Create the copy that listens to the document¶
        copy = new PdfCopy(document, new FileOutputStream(pdfOut));¶
        // step 3: Open the document¶
        document.open();¶
¶
        // cache bookmarks from the first file¶
        ArrayList temp = SimpleBookmark.getBookmark(reader);¶
        if( temp != null ) bookmarks.addAll(temp);¶
    } else {¶
        // cache bookmarks from subsequent files and adjust the page number references¶
        ArrayList tmp = SimpleBookmark.getBookmark(reader);¶
        if( tmp != null ) {¶
            SimpleBookmark.shiftPageNumbers(tmp, pageOffset, null);¶
            bookmarks.addAll(tmp);¶
        }¶
    }¶
¶
    // step 4: Add content (pages are numbered from 1)¶
    PdfImportedPage page;¶
    for (int i = 0; i < numberOfPages; ) {¶
        ++i;¶
        page = copy.getImportedPage(reader, i);¶
        copy.addPage(page);¶
    }¶
    // update counters¶
    pageOffset += numberOfPages;¶
    fileIndex++;¶
}¶
// add cached bookmarks¶
if(bookmarks.size() > 0) copy.setOutlines(bookmarks);¶
¶
// close document¶
document.close();¶
return true;" )
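
Since checking files by hand is not realistic at that volume, one option might be an automated triage pass that tries each uploaded file on its own and records the ones that throw, so that a single bad scan does not take down the whole merge. The following is only a rough sketch under assumptions: `MergeTriage`, `FileStep` and `failedFiles` are hypothetical names, not part of ScriptMaster or iText. In real use the step would wrap the reader and page-import calls from the code above; here a stand-in that rejects file names starting with "scan" is used so the example runs without the iText jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs a per-file step and collects the files that fail.
class MergeTriage {

    /** Stand-in for the real per-file work (open the reader, import pages). */
    interface FileStep {
        void process(String path) throws Exception;
    }

    /** Runs the step on every path and returns the paths whose step threw. */
    static List<String> failedFiles(List<String> paths, FileStep step) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            try {
                step.process(path);
            } catch (Exception e) {
                // Record the failure and keep going with the remaining files.
                System.err.println(path + " failed: " + e);
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated step: pretends that files named scan* cannot be read.
        FileStep fake = path -> {
            if (path.startsWith("scan")) {
                throw new java.io.IOException("simulated unreadable scan");
            }
        };
        List<String> uploads = Arrays.asList("a.pdf", "scan1.pdf", "b.pdf");
        System.out.println(failedFiles(uploads, fake)); // prints [scan1.pdf]
    }
}
```

The files that come back in the failed list could then be set aside for a separate repair or re-upload step while the rest are merged as usual.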

john renfrew · December 4, 2014

OK, it's code that is more than 6 years old, written for a version that is long past being supported. That in itself might be enough to cause your method to fail.

Read some stuff here about why the latest version is a better deal: http://itextpdf.com/salesfaq

There will be some news soon about licensing for FileMaker users... watch this space.

I would start by removing the bookmarks code and just add the pages to see what happens.

Also take a PDF that breaks your code and add it just one more page at a time (+1, +2, +3, etc.) to see if there is a specific page that causes the problem.
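
The page-at-a-time test could be automated roughly like this. It is a minimal sketch, not a drop-in fix: `PageProbe`, `PageCopier` and `firstFailingPage` are hypothetical names, not part of ScriptMaster or iText. In the real merge, the copier would wrap the existing `copy.addPage(copy.getImportedPage(reader, i))` call from the posted code (with the bookmark handling removed); a stand-in that fails on page 3 is used here so the example runs without the iText jar.

```java
// Hypothetical helper: finds the first page of a document that cannot be copied.
class PageProbe {

    /** Stand-in for the real per-page copy step; expected to throw on a bad page. */
    interface PageCopier {
        void copyPage(int pageNumber) throws Exception;
    }

    /**
     * Tries pages 1..numberOfPages in order.
     * Returns the 1-based number of the first page whose copy throws,
     * or -1 if every page copies cleanly.
     */
    static int firstFailingPage(int numberOfPages, PageCopier copier) {
        for (int page = 1; page <= numberOfPages; page++) {
            try {
                copier.copyPage(page);
            } catch (Exception e) {
                // Report the page and the exception so the cause can be inspected.
                System.err.println("Page " + page + " failed: " + e);
                return page;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated 5-page document in which page 3 cannot be imported.
        PageCopier fake = page -> {
            if (page == 3) {
                throw new java.io.IOException("simulated bad page");
            }
        };
        System.out.println(firstFailingPage(5, fake)); // prints 3
    }
}
```

If the same page fails every time, the exception printed for that page is probably the most useful thing to post back here.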
