PDF Images Remover
A simple web application that removes images from PDF files.
Where?
The web app is running at https://www.milanlaslop.dev/app/pdf.
Used Technologies
- ReactJS
- JavaScript, Web Workers
- Bootstrap
- modern C++
- emscripten
Implementation Details and Challenges
Files Loading and Saving
The whole application, including file processing, runs in the browser. There is no server-side code.
The PDF files are not sent to any server; they are:
Files Processing in Browser (using C++)
The processing is done in C++, mainly using the C++11 regular expression library (std::regex). Regular expressions are a convenient way to find the relevant parts of the PDF file (I do not parse and analyze the whole PDF file; I only look for patterns that lead to the image objects to remove).
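To illustrate the pattern-based approach, here is a minimal sketch (not the app's actual code) that collects the numbers of image objects without parsing the whole file. The function name findImageObjects is hypothetical, and the sketch assumes object dictionaries are stored as plain text; objects packed inside compressed object streams (PDF 1.5+) would be missed.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not the app's real code): returns the numbers of all
// objects whose dictionary declares /Subtype /Image.
// Assumption: dictionaries are stored as plain text, i.e. not inside
// compressed object streams, so some PDFs will not be handled.
std::vector<int> findImageObjects(const std::string& pdf) {
    // "N G obj" starts an indirect object, e.g. "12 0 obj".
    static const std::regex objHeader(R"((\d+)\s+(\d+)\s+obj\b)");
    // Matches both "/Subtype /Image" and "/Subtype/Image".
    static const std::regex imageSubtype(R"(/Subtype\s*/Image\b)");

    std::vector<int> imageObjects;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(pdf.begin(), pdf.end(), objHeader); it != end; ++it) {
        const std::size_t bodyStart =
            static_cast<std::size_t>(it->position(0) + it->length(0));

        // Limit the check to the dictionary: stop at "stream" or "endobj",
        // whichever comes first, so binary stream data is not inspected.
        std::size_t bodyEnd = pdf.find("endobj", bodyStart);
        const std::size_t streamStart = pdf.find("stream", bodyStart);
        if (streamStart < bodyEnd) {
            bodyEnd = streamStart;
        }

        // substr clamps the length if "endobj" was not found (npos).
        const std::string dictionary = pdf.substr(bodyStart, bodyEnd - bodyStart);
        if (std::regex_search(dictionary, imageSubtype)) {
            imageObjects.push_back(std::stoi((*it)[1].str()));
        }
    }
    return imageObjects;
}
```

The sketch only finds the image objects; actually removing them (and whatever refers to them) is a separate step it does not attempt.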
The C++ code is then compiled to JavaScript using emscripten.
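A minimal sketch of how a C++ function can be exposed to JavaScript with emscripten. The name countImageObjects and its (pointer, length) signature are hypothetical, not taken from the app. extern "C" avoids C++ name mangling, and EMSCRIPTEN_KEEPALIVE keeps the function exported instead of being removed as unused; the #ifdef lets the same file also compile natively, for example for tests.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#else
// Native builds have no Emscripten headers, so the marker expands to nothing.
#define EMSCRIPTEN_KEEPALIVE
#endif

// Hypothetical C-ABI entry point: counts "/Subtype /Image" occurrences in a
// buffer. With emscripten it shows up as Module._countImageObjects.
extern "C" EMSCRIPTEN_KEEPALIVE
int countImageObjects(const char* data, int length) {
    // Construct with an explicit length: PDF bytes may contain '\0'.
    const std::string pdf(data, static_cast<std::size_t>(length));
    static const std::regex imageSubtype(R"(/Subtype\s*/Image\b)");
    return static_cast<int>(std::distance(
        std::sregex_iterator(pdf.begin(), pdf.end(), imageSubtype),
        std::sregex_iterator()));
}
```

After compiling with emcc, JavaScript can call the function directly as Module._countImageObjects, or through Module.ccall / Module.cwrap if those runtime methods are exported (for example with -sEXPORTED_RUNTIME_METHODS=ccall,cwrap). In either case the PDF bytes have to be copied into the module's memory first.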
UI
The UI is a ReactJS application that uses the Bootstrap library through React-Bootstrap.
Background Processing
Since processing a PDF file can take a long time, I do all of this work in a Web Worker. That way it runs in a separate thread instead of freezing the whole page (and sometimes the whole browser). Moreover, the processing can be conveniently canceled at any time.
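Part of what makes the Web Worker convenient is that the page can stop it with Worker.terminate(), without any cooperation from the code running inside it. For comparison only, here is a sketch of a native C++ analog (not part of the app) using std::thread and std::atomic: a std::thread cannot be safely killed, so cancellation has to be cooperative, with the work loop polling a flag. The BackgroundJob class is hypothetical, and copying characters stands in for the real processing.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical native analog of running PDF processing in a Web Worker:
// the work runs on its own thread, and the owner may cancel it at any time.
class BackgroundJob {
public:
    // Starts processing `input` immediately; `onDone` is called on the
    // background thread with the result, unless the job was cancelled first.
    BackgroundJob(std::string input, std::function<void(std::string)> onDone)
        : worker_([this, input = std::move(input), onDone = std::move(onDone)] {
              std::string output;
              output.reserve(input.size());
              for (char c : input) {
                  // Cooperative cancellation: check the flag between steps.
                  if (cancelled_.load()) {
                      return;
                  }
                  output.push_back(c);  // placeholder for the real per-step work
              }
              onDone(std::move(output));
          }) {}

    // Requests cancellation; the thread stops at its next check of the flag.
    void cancel() { cancelled_.store(true); }

    // Blocks until the background thread has finished or stopped.
    void wait() {
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    // Destroying the job cancels any unfinished work and waits for the thread.
    ~BackgroundJob() {
        cancel();
        wait();
    }

private:
    std::atomic<bool> cancelled_{false};  // declared first: initialized before worker_ starts
    std::thread worker_;
};
```

The extra flag checks are exactly the bookkeeping that terminating a Web Worker avoids.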