Ocean West
Posted May 24, 2012

I haven't had time to investigate the possibility yet, but we routinely grab a federal document from a website, and we only care about including the second page of that document with our own document. Is there an easy, automated way to extract just that page from the PDF and place it into a container or super container? Thanks.
clemhoff
Posted May 25, 2012

Hi Stephen,

Page extraction is very straightforward using iText's PdfCopy class.

    /**
     * iText_ExtractSinglePage ( fm_pathToSrc ; fm_pathToDest ; fm_getPageNum )
     * by clem 2010-11-04
     * Extracts a single page from a PDF document.
     *
     * === Parameters ===
     * fm_pathToSrc: path to pdf input file.
     * fm_pathToDest: path to pdf output file.
     * fm_getPageNum: the page number to be extracted.
     **/
    import com.itextpdf.text.Document
    import com.itextpdf.text.DocumentException
    import com.itextpdf.text.pdf.PdfCopy
    import com.itextpdf.text.pdf.PdfReader

    try {
        def reader = new PdfReader(fm_pathToSrc)
        def document = new Document()
        def copy = new PdfCopy(document, new FileOutputStream(fm_pathToDest))
        document.open()
        // Import the requested page from the source and append it to the output
        copy.addPage copy.getImportedPage(reader, fm_getPageNum.toInteger())
        document.close()
        reader.close()
        return true
    } catch (IOException ioe) {
        return "ERROR: $ioe.message"
    } catch (DocumentException de) {
        return "ERROR: $de.message"
    }
john renfrew
Posted May 25, 2012

Stephen

Same thing, but using the PdfSmartCopy class:

    // PDFextractPage2 ( fm_fileIn ; fm_fileOut ; fm_num )
    // 11_09_12 JR
    // v1.4A
    import com.itextpdf.text.Document
    import com.itextpdf.text.pdf.PdfSmartCopy
    import com.itextpdf.text.pdf.PdfReader

    document = new Document()
    copy = new PdfSmartCopy(document, new FileOutputStream(fm_fileOut))
    try {
        reader = new PdfReader(fm_fileIn)
    } catch (Exception e) {
        if (e.toString().contains('BadPassword')) {
            return 'PASSWORD ERROR'
        } //end if
        // Any other failure leaves reader undefined, so stop here
        return 'ERROR'
    } //end try
    document.open()
    try {
        copy.addPage(copy.getImportedPage(reader, fm_num.toInteger()))
        document.close()
    } catch(e) {
        //return e
        return 'ERROR'
    } //end try
    return true

PdfSmartCopy has the same functionality as PdfCopy, but when resources (such as fonts, images, ...) are encountered, a reference to these resources is saved in a cache so that they can be reused. This requires more memory, but reduces the file size of the resulting PDF document. The effect is greater on multi-page PDF files, but it is a better class to use routinely.
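To make the caching idea concrete, here is a minimal sketch in plain Groovy. It is only an illustration of the trade-off described above and is not iText's code: the class name `ResourceCache`, the `intern` method, and the sample byte strings are all made up for the example. Each resource is hashed; the bytes are counted as written only the first time a given hash is seen, and every later occurrence just gets the existing reference back.

```groovy
import java.security.MessageDigest

// Hypothetical illustration only: this is NOT how iText implements PdfSmartCopy.
class ResourceCache {
    // Maps a content hash to the reference number of the first stored copy.
    // Keeping this map around is the extra memory cost of the approach.
    private final Map<String, Integer> refsByHash = [:]
    private int nextRef = 1

    // Running total of bytes that would actually go into the output file.
    int bytesWritten = 0

    // Returns a reference for the resource; its bytes are stored only once.
    int intern(byte[] resource) {
        String key = MessageDigest.getInstance('SHA-256')
                                  .digest(resource)
                                  .encodeHex()
                                  .toString()
        Integer ref = refsByHash[key]
        if (ref == null) {
            ref = nextRef++
            refsByHash[key] = ref
            bytesWritten += resource.length
        }
        return ref
    }
}

def cache = new ResourceCache()
byte[] font = 'pretend font data'.bytes
byte[] logo = 'pretend logo image'.bytes

// The font appears twice (say, on two pages) but is only stored once.
def refs = [font, logo, font].collect { cache.intern(it) }
```

In this sketch the two uses of the font share one reference, so the duplicate adds nothing to `bytesWritten`. When only a single page is extracted there are usually few duplicates to find, so the saving would be correspondingly small.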
clemhoff
Posted May 25, 2012

Hi John,

'PdfSmartCopy' is of course a better class to use if you concatenate PDF documents containing duplicate resources, but it has very little impact when splitting PDF documents. No?
Ocean West (Author)
Posted May 25, 2012

I tried both versions. However, the 'Smart' one threw an error, and both the dialog and the subsequent "more info" were empty. The first one seems to work.