Use PyPDF2 - which should be used, PyPDF2 or PyPDF3?

Introduction

In the previous article, we extracted text from a PDF file using PyPDF2.

In this article, I will introduce PyPDF3.

Both PyPDF2 and PyPDF3 exist

While looking up how to use PyPDF2, I found the following comment on StackOverflow.

[Image: stack_overflow]

PyPDF2 has not been updated for 3 years?! And a new version, PyPDF3, exists?! Really?

Which should I use, PyPDF2 or PyPDF3?

Check PyPI

Does PyPDF3 exist on PyPI? Let's check with the pip command.

This is PyPDF2.

pip search PyPDF2
> PyPDF2 (1.26.0)   - PDF toolkit

This is PyPDF3.

pip search PyPDF3
> PyPDF3 (1.0.1)  - Pure Python PDF toolkit

Both really exist!!
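The same check can also be scripted, which is handy because `pip search` no longer works against PyPI. Below is a minimal sketch, assuming the PyPI JSON API at `https://pypi.org/pypi/<name>/json`. The helper names `pypi_url`, `latest_version`, and `lookup` are my own, not part of pip or PyPDF2.

```python
import json
import urllib.error
import urllib.request


def pypi_url(package):
    """Build the PyPI JSON API URL for a package name."""
    return f"https://pypi.org/pypi/{package}/json"


def latest_version(payload):
    """Return the latest version string from a decoded PyPI JSON payload."""
    return payload["info"]["version"]


def lookup(package, timeout=10):
    """Return the latest version of `package` on PyPI, or None if it is not found.

    Needs network access; any HTTP error other than 404 is re-raised.
    """
    try:
        with urllib.request.urlopen(pypi_url(package), timeout=timeout) as resp:
            return latest_version(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Calling `lookup("PyPDF2")` and `lookup("PyPDF3")` should return a version string for each if both packages are still published. The versions returned today may differ from the ones shown above.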

What is PyPDF3?

In this section, I describe my understanding of PyPDF3, based on the roadmap on GitHub and other resources.

  • Volunteers started the PyPDF3 project, based on PyPDF2, because PyPDF2 had not been updated for 3 years.

  • The initial goals are to fully implement existing features and fix some of the most critical bugs/performance issues from PyPDF2 before moving on to new functionality.

  • However, judging from the commit log, development is not active.
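The commit log is one signal of activity. The upload dates of releases on PyPI are another, rougher one. Here is a minimal sketch, assuming a payload shaped like the PyPI JSON API response, where `releases` maps each version to a list of files that each carry an `upload_time` field. The function name `last_upload` is hypothetical.

```python
from datetime import datetime


def last_upload(payload):
    """Return the most recent upload time across all releases, or None.

    `payload` is assumed to be a decoded PyPI JSON API response.
    Versions with no uploaded files are skipped.
    """
    times = [
        datetime.fromisoformat(file_info["upload_time"])
        for files in payload.get("releases", {}).values()
        for file_info in files
    ]
    return max(times) if times else None
```

A result that is several years old suggests that no new release has been published, although work may still be going on in the repository.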

The whole story is discussed in a GitHub issue

Investigating further, I came across a GitHub issue.

[Image: reboot_pypdf2]

In summary:

  • The PyPDF2 core maintainer had not updated it because he was busy
  • However, he has decided to resume updating PyPDF2
  • Developers also discuss PyPDF3 in that issue

Conclusion

We can use PyPDF2 without problems.

I checked the issues and pull requests in the PyPDF2 repository, and it looks like PyPDF2 is still alive.