the bloggard

PDF generation in Java w/ App Engine

Posted in Technology by conorpower on January 11, 2010

Adobe’s Portable Document Format (PDF) has become the de facto standard for sharing documents across operating systems. It is encountered extensively on the web, both when documents need to be downloaded for offline use and when web-based content needs to be presented in a more traditional, professional format with better print quality than printing web pages directly.

I’ve been looking into using the PDF format to create my resume on demand and also to generate documents dynamically in an application I’ve been building on Google App Engine. I had previously used Apache FOP for PDF generation but wanted to do a little more research to see what other options were available, and came up with the following shortlist:

  1. Apache FOP (Formatting Object Processor)
  2. iText PDF Library
  3. PDF my URL
  4. PDFJet

The following briefly describes my experiences with each and what I finally settled on …

Apache FOP

Apache FOP is a print formatter driven by XSL formatting objects (XSL-FO): it takes a tree representation of a document and renders it to a target output format such as PostScript, PDF, PNG or RTF, to name but a few. PDF generation appears to be the most common use case.

The XML passed to FOP must be in the XSL-FO (formatting objects) format, i.e. an XML document that adheres to the tags and attributes defined by the XSL-FO specification. The compliance page provides a comprehensive list of the supported tags and attributes that can be used in the XML document.

Given this, a standardized and extensible approach to the transformation would be as follows:

  1. Generate common XML representation of the entity to be converted to PDF
  2. Using XSL, transform this to its XSL-FO representation
  3. Using FOP, transform the XSL-FO representation to the final PDF document
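
Steps 1 and 2 can be tried with nothing but the XSLT support that ships with the JDK. The following is a minimal sketch, not a definitive implementation: the resume XML, its element names, the stylesheet and the ResumeToXslFo class are all hypothetical and exist only for illustration. Step 3 (rendering the XSL-FO to PDF) is left to FOP and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ResumeToXslFo {

    // Step 1: a hypothetical XML representation of the entity (element names are invented).
    static final String RESUME_XML =
        "<resume><name>Jane Doe</name><title>Software Engineer</title></resume>";

    // Step 2: a hypothetical stylesheet that maps the entity XML to XSL-FO.
    static final String RESUME_TO_FO_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/resume\">"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"page\""
      + " page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"page\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<fo:block font-size=\"18pt\" font-weight=\"bold\">"
      + "<xsl:value-of select=\"name\"/></fo:block>"
      + "<fo:block font-size=\"12pt\"><xsl:value-of select=\"title\"/></fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the entity XML and returns the XSL-FO document as a string.
    static String toXslFo(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the XSL-FO document that would then be handed to FOP (step 3).
        System.out.println(toXslFo(RESUME_XML, RESUME_TO_FO_XSL));
    }
}
```

Even this small stylesheet hints at reason 2 in the list that follows: the page master, page sequence and flow all have to be spelled out before a single line of content is laid out.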

To get started, there are a number of examples available at the following URL that show how XSL-FO documents are transformed into their PDF representations. I decided not to go with this approach for the following reasons:

  1. The entity to be converted into PDF was not already available in an XML format
  2. The necessity to learn the XSL-FO syntax in detail to get the flexibility in layout that would be required
  3. Given past experience with FOP, it was quite time-consuming to get the generated PDF document to have a professional-quality feel to it

NOTE: From an App Engine specific perspective, Apache FOP has a number of dependencies on the AWT classes, which are not currently on the list of whitelisted classes available with Google App Engine. It seems that this is soon to be resolved by adding support for the AWT dependencies, but timelines are not known. You can monitor this update at the following issue. There are some workarounds available, but unfortunately they seem to be in conflict with Sun licensing restrictions.

iText

iText is a freely available Java library that can be used to generate PDF documents programmatically. As with any library, its usage is very straightforward once you have added the necessary JAR file to the classpath for your application. The documentation and javadocs are very thorough, which eliminates the need to constantly search online for specific examples of its usage.

In combination with the Spring framework, generating a PDF document becomes very straightforward if you are familiar with Spring MVC:

  1. Copy the iText JAR file to your classpath
  2. Create a controller to handle your request to download the PDF document
  3. Within the controller return the entity and any other dependencies to render in the PDF document
  4. Create a class which extends AbstractPdfView and implements the buildPdfDocument() method, which builds the actual PDF document using the iText library

A good article covering the basic usage of the iText library is available on the IBM developerWorks site. As with Apache FOP, this approach was not a desirable option for me, for the following reasons:

  1. I am personally opposed to writing presentation code in the view class to create the document, as the code becomes less reusable when compared to a typical templating framework.
  2. The approach would be used longer term for all PDF generation needs, and maintaining the look and feel in compiled classes does not lend itself to rapid prototyping.
  3. Every entity that might be required in PDF form would need its rendering duplicated, above and beyond the base HTML representation that already exists, increasing the development time required for new entities or changes to existing entities.

NOTE: From an App Engine specific perspective, and similarly to Apache FOP, iText has a number of dependencies on the AWT classes, which are not currently on the list of whitelisted classes available with Google App Engine. See the note above for more details regarding this issue.

PDF my URL

PDF my URL is a freely available web site (“service”) that can be very easily used to create PDF documents from a given URL. There are no limits or quotas (at least at the moment) on the number of PDF documents that can be generated. The only requirement is that the URL to be transformed into PDF is available on the internet. The process to generate the PDF document is as follows:

  1. Invoke the PDF generation using a URL similar to: http://www.pdfmyurl.com/?url=www.myurl.com

In this way any URL can be passed to the service, and additional query parameters can be used to control the PDF document more finely, e.g. orientation and character encoding. Refer to the PDF my URL advanced options section for full details of the additional options available. Parameters can also be passed to the URL that will be transformed to PDF, with the caveat that they must be URL encoded when invoking the pdfmyurl.com URL. Consequently, dynamic documents can be very easily converted into PDF documents.
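
The URL encoding caveat is easy to get wrong by hand, so here is a minimal sketch of building the link in Java. The PdfMyUrlLink class, the forPage() method and the example page URL are hypothetical; only the http://www.pdfmyurl.com/?url= prefix comes from the example above, and none of the service’s advanced options are shown because their parameter names are not covered in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PdfMyUrlLink {

    // Service prefix taken from the example URL above.
    static final String SERVICE = "http://www.pdfmyurl.com/?url=";

    // Returns a link that asks the service to render the given page as PDF.
    // The page URL, including its own query string, is URL encoded so that
    // its '?', '&' and '=' characters are not read as parameters of the service URL.
    static String forPage(String pageUrl) {
        try {
            return SERVICE + URLEncoder.encode(pageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical dynamic page with its own query parameters.
        System.out.println(forPage("www.myurl.com/resume?id=42&lang=en"));
        // prints http://www.pdfmyurl.com/?url=www.myurl.com%2Fresume%3Fid%3D42%26lang%3Den
    }
}
```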

Due to its simplicity, elegance and elimination of coding in the creation of the document, this was the option I chose for generating PDF documents. There are two obvious downsides. First, the URLs must be available on the internet, which is unlikely to be acceptable for internal enterprise applications. Second, generating the PDF document costs an additional request from pdfmyurl.com to retrieve the HTML to be converted, although this may be of minimal importance in most situations.

PDFJet

The final approach listed here is something I came across recently and thought it worth mentioning for the fact that it solves the compatibility issue that both Apache FOP and iText have with Google App Engine (see notes above).

PDFJet is a Java and .NET library in a similar vein to iText for creating PDF documents programmatically. The advantage it may provide readers of this article is that it claims to be fully compatible with Google App Engine i.e. it has no dependencies on any classes which are not on the Java whitelist.

For the same reasons that iText was not a preferred option for my requirements, I did not take the time to explore PDFJet in any more detail than mentioned in this post.

Summary

In summary, the simplicity of pdfmyurl.com, together with its extensive options for controlling the generation process, made it by far the best solution for my needs. As a longer-term solution, the alternatives have little to offer when you consider the amount of work a developer saves by not having to recreate a PDF presentation and layout programmatically or in XML.

The main concern to keep in mind longer term is that the support and maintenance of the service are out of your hands: what guarantees are given for availability and uptime, what support exists when you encounter issues outside your control, and what strategic direction and longer-term plans the owners have for monetizing the service are all unknown. These are certainly things to keep in mind and monitor when using any third-party service.
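
One cheap way to monitor a third-party PDF service, if your application fetches the generated document itself rather than simply linking to it, is to check that the bytes coming back really are a PDF. PDF files begin with the marker "%PDF-", so a sketch along these lines could flag an error page or outage before it reaches a user. The PdfResponseCheck class and its method are hypothetical, and how the bytes are fetched is deliberately left out.

```java
final class PdfResponseCheck {

    // Every PDF file starts with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F', '-'};

    // Returns true if the response body starts with the PDF marker.
    // An HTML error page or an empty response returns false.
    static boolean looksLikePdf(byte[] body) {
        if (body == null || body.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (body[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        byte[] html = {'<', 'h', 't', 'm', 'l', '>'};
        System.out.println(looksLikePdf(pdf));  // prints true
        System.out.println(looksLikePdf(html)); // prints false
    }
}
```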

Update (12/14/2010) – I came across two other utilities recently, that may be of use:

I haven’t used either, but they seem to work well in the LinkedIn Labs resume builder.

2 Responses

  1. Cristian Nicanor said, on May 7, 2010 at 2:25 am

    Hi,

    I got FOP working on GAE. For details, check this out:
    http://nicanorcristian.blogspot.com/2009/11/apache-fop-on-google-application-engine.html
    Contact me if you have any questions.

