Suvarna Garge (Editor)

Apache PDFBox

Written in: Java
License: Apache License 2.0
Operating system: Cross-platform
Developer(s): Apache Software Foundation
Stable release: 2.0.4 (December 16, 2016)
Type: Portable Document Format (PDF)

Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter, and verify PDF files, and to extract their text and metadata.
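As a rough illustration of the "create" and "extract text" capabilities mentioned above, the following is a minimal sketch that assumes a PDFBox 2.0.x JAR (the release line named in this article) is on the classpath. The class name `PdfBoxSketch` and the method `roundTrip` are hypothetical names chosen for this example. Later major versions changed some of these calls, so treat this as a sketch for 2.0.x only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical example class; assumes PDFBox 2.0.x on the classpath.
public class PdfBoxSketch {

    // Writes a one-page PDF containing `message` into memory,
    // reloads it, and returns the text PDFBox extracts from it.
    static String roundTrip(String message) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        // Create: build a document with a single page of text.
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(72, 700);
                content.showText(message);
                content.endText();
            }
            doc.save(buffer);
        }

        // Extract: load the saved bytes and strip the text back out.
        try (PDDocument loaded = PDDocument.load(buffer.toByteArray())) {
            return new PDFTextStripper().getText(loaded);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("Hello PDFBox").trim());
    }
}
```

The extracted string may carry trailing line separators, which is why the usage in `main` trims it before printing.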

Open Hub reports more than 4,000 commits (counted from the project's start at Apache) by 17 contributors, representing more than 120,000 lines of code. PDFBox has a well-established, mature codebase maintained by an average-sized development team with stable year-over-year commits. Under the COCOMO model, the codebase corresponds to an estimated 34 person-years of effort.

Structure

Apache PDFBox has these components:

  • PDFBox: the main part
  • FontBox: handles font information
  • XmpBox: handles XMP metadata
  • Preflight (optional): checks PDF files for PDF/A-1b conformity.

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene. It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.

Preflight was originally named PaDaF and was developed by Atos Worldline, which donated it to the project in 2011.

In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.
