PDF Images Remover

A simple web application that removes images from PDF files.

Where?

The web app runs at https://www.milanlaslop.dev/app/pdf.

Technologies Used

Implementation Details and Challenges

File Loading and Saving

The whole application, including file processing, runs in the browser. There is no server-side code.

The PDF files are not sent to any server; they are:

File Processing in the Browser (Using C++)

The processing is done in C++, mainly with the regular expression facilities of the C++11 standard library (the <regex> header). Regular expressions make it convenient to locate the relevant parts of a PDF file. Rather than parsing and analyzing the whole file, I only look for patterns that lead to the image objects to be removed.
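A minimal sketch of this pattern-based approach, not the application's actual code: it assumes that an image is stored as an indirect object whose dictionary contains "/Subtype /Image", and that the dictionary has no nested dictionaries. The function name findImageObjectNumbers and the regular expression are hypothetical and only illustrate how std::regex can locate such objects without a full PDF parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the object numbers of all objects in `pdf` whose dictionary
// contains "/Subtype /Image".
// Assumption: the dictionary is flat (no nested "<< ... >>"), so "[^>]*"
// is enough to stay inside a single dictionary.
std::vector<int> findImageObjectNumbers(const std::string& pdf) {
    // "<number> <generation> obj << ... /Subtype /Image ... >>"
    static const std::regex imageObject(
        R"((\d+)\s+\d+\s+obj\s*<<[^>]*/Subtype\s*/Image[^>]*>>)");

    std::vector<int> numbers;
    for (std::sregex_iterator it(pdf.begin(), pdf.end(), imageObject), end;
         it != end; ++it) {
        // Capture group 1 holds the object number.
        numbers.push_back(std::stoi((*it)[1].str()));
    }
    return numbers;
}
```

Calling the function on the text of a PDF file yields the numbers of the candidate image objects. A real implementation would also have to handle what comes next, such as removing the stream data and keeping the rest of the file consistent.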

The C++ code is then compiled to JavaScript using Emscripten.
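As a rough illustration of how a C++ function can be made callable from JavaScript after compilation, the sketch below exports one function with EMSCRIPTEN_KEEPALIVE. The function countImageMarkers is a hypothetical stand-in, not the application's real entry point, and the no-op macro fallback is only there so that the same file also builds with a native compiler.

```cpp
#include <cstring>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE  // no-op when building natively
#endif

extern "C" {

// Hypothetical exported function: counts non-overlapping occurrences of
// the marker "/Subtype /Image" in a byte buffer of the given length.
// extern "C" prevents name mangling, and EMSCRIPTEN_KEEPALIVE keeps the
// function from being removed as unused.
EMSCRIPTEN_KEEPALIVE
int countImageMarkers(const char* data, int length) {
    static const char marker[] = "/Subtype /Image";
    const int markerLength = static_cast<int>(sizeof(marker) - 1);

    int count = 0;
    for (int i = 0; i + markerLength <= length; ++i) {
        if (std::memcmp(data + i, marker, markerLength) == 0) {
            ++count;
            i += markerLength - 1;  // skip past this match
        }
    }
    return count;
}

}  // extern "C"
```

With the Emscripten SDK installed, a file like this can be built with the emcc compiler driver, and the exported function then becomes reachable from the generated JavaScript module. The exact build flags and output format depend on the Emscripten version and are not specified here.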

UI

The UI is a React application that uses the Bootstrap library through React-Bootstrap.

Background Processing

Since processing a PDF file can take a long time, I do all of this work in a Web Worker. The work then runs in a separate thread, so it does not freeze the page (or, in some cases, the whole browser). Moreover, the processing can be canceled at any time.