Extracting a single page from a PDF - iText?

Ocean West · May 24, 2012

I haven't had time to investigate this yet, but we routinely grab a federal document from a website and only care about including its second page with our own document. Is there an easy, automated way to extract just that page from the PDF and place it into a container or super container?

Thanks

clemhoff · May 25, 2012

Hi Stephen,

Page extraction is very straightforward using iText's PdfCopy class.


/**
 * iText_ExtractSinglePage ( fm_pathToSrc ; fm_pathToDest ; fm_getPageNum )
 * by clem 2010-11-04
 * Extracts a single page from a PDF document.
 *
 * === Parameters ===
 * fm_pathToSrc: path to the PDF input file.
 * fm_pathToDest: path to the PDF output file.
 * fm_getPageNum: the number of the page to extract (1-based).
 **/



import com.itextpdf.text.Document
import com.itextpdf.text.DocumentException
import com.itextpdf.text.pdf.PdfCopy
import com.itextpdf.text.pdf.PdfReader

try {
	def reader = new PdfReader(fm_pathToSrc)
	def document = new Document()
	def copy = new PdfCopy(document, new FileOutputStream(fm_pathToDest))

	document.open()
	// Import the requested page from the source and append it to the output
	copy.addPage(copy.getImportedPage(reader, fm_getPageNum.toInteger()))
	document.close()
	reader.close()
	return true

} catch (IOException ioe) {
	return "ERROR: $ioe.message"

} catch (DocumentException de) {
	return "ERROR: $de.message"
}

john renfrew · May 25, 2012

Stephen,

Same thing, but using the PdfSmartCopy class:


// PDFextractPage2 ( fm_fileIn ; fm_fileOut ; fm_num )
// 11_09_12 JR
// v1.4A
import com.itextpdf.text.Document
import com.itextpdf.text.pdf.PdfSmartCopy
import com.itextpdf.text.pdf.PdfReader

// Open the source first, so a bad file or password fails before the output file is created
try {
  reader = new PdfReader(fm_fileIn)
} catch (Exception e) {
  if (e.toString().contains('BadPassword')) {
    return 'PASSWORD ERROR'
  } //end if
  return 'ERROR' // any other failure to open the source
} //end try

try {
  document = new Document()
  copy = new PdfSmartCopy(document, new FileOutputStream(fm_fileOut))
  document.open()
  copy.addPage(copy.getImportedPage(reader, fm_num.toInteger()))
  document.close()
} catch(e) {
  //return e
  return 'ERROR'
} //end try
return true

PdfSmartCopy has the same functionality as PdfCopy, but when resources (such as fonts or images) are encountered, a reference to them is saved in a cache so that they can be reused. This requires more memory, but reduces the file size of the resulting PDF document.

The effect is greater on multi-page PDF files, but it is a better class to use routinely.
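
For the multi-page case, here is a rough sketch of merging several PDFs with PdfSmartCopy, assuming iText 5 is on the classpath as in the scripts above. The helper name `mergePdfs` and its parameters `sources` and `dest` are made up for illustration and are not part of either script in this thread.

```groovy
import com.itextpdf.text.Document
import com.itextpdf.text.pdf.PdfReader
import com.itextpdf.text.pdf.PdfSmartCopy

// Hypothetical helper: appends every page of each source PDF to one output file.
// PdfSmartCopy reuses identical resources (fonts, images) across the copied pages.
def mergePdfs(List<String> sources, String dest) {
    def document = new Document()
    def copy = new PdfSmartCopy(document, new FileOutputStream(dest))
    document.open()
    for (String path : sources) {
        def reader = new PdfReader(path)
        // iText page numbers start at 1
        for (int n = 1; n <= reader.numberOfPages; n++) {
            copy.addPage(copy.getImportedPage(reader, n))
        }
        // Release the reader once its pages have been copied
        copy.freeReader(reader)
        reader.close()
    }
    document.close()
    return true
}
```

Swapping PdfSmartCopy for PdfCopy in the sketch should give the same pages; any difference would show up in output size and memory use when the sources share resources.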

clemhoff · May 25, 2012

Hi John,

`PdfSmartCopy` is of course the better class to use if you concatenate PDF documents containing duplicate resources, but it has very little impact when splitting PDF documents, no?

Ocean West · May 25, 2012

I tried both versions; however, the 'Smart' one threw an error, and both the dialog and the subsequent "more info" were empty.

The first one seems to work.
