Select a folder first:
. all JSON files inside the folder are collected automatically
Toggle the [Write Next] button to process the next file automatically
Processed data will be written to a newly created or existing SQLite DB
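
The folder-to-SQLite flow above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the JSON layout (a list of objects with "page" and "text" keys), the table name "paragraphs", and the function names are all assumptions made for the example.

```python
import json
import sqlite3
from pathlib import Path


def list_json_files(folder):
    """Return every *.json file directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.json"))


def write_file_to_db(json_path, db_path):
    """Append the paragraphs of one JSON file to the DB; return the row count."""
    # Assumed (hypothetical) JSON layout: a list of {"page": int, "text": str}.
    paragraphs = json.loads(Path(json_path).read_text(encoding="utf-8"))
    conn = sqlite3.connect(db_path)  # creates the DB file if it does not exist
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS paragraphs ("
                "id INTEGER PRIMARY KEY, source TEXT, page INTEGER, text TEXT)"
            )
            conn.executemany(
                "INSERT INTO paragraphs (source, page, text) VALUES (?, ?, ?)",
                [(Path(json_path).name, p["page"], p["text"]) for p in paragraphs],
            )
    finally:
        conn.close()
    return len(paragraphs)


def process_folder(folder, db_path, write_next=True):
    """Process the first file only, or every file when `write_next` is on."""
    files = list_json_files(folder)
    if not write_next:
        files = files[:1]
    return sum(write_file_to_db(f, db_path) for f in files)
```

Here `write_next` stands in for the [Write Next] toggle: when it is off, only the first file is written and the rest wait for the next run.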
2024-11-29 Add Chapter Header & Title Extraction Feature :
Process the selected PDF file with Mozilla PDF.js:
. the PDF file is also opened in the PDF viewer, and pages flip synchronously
. a self-developed algorithm detects the header and subtitle of each chapter
. Compromise-NLP detects and extracts keywords, terms and numbers
. (optionally, text is formatted into paragraphs for easier review)
The processed paragraphs will be saved in JSON
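
The self-developed detection algorithm is not described here, so the following is only a stand-in heuristic showing how headers and subtitles could be tagged and the result saved as JSON. The numbering patterns, the length limit, and the output keys ("header", "subtitle", "text") are assumptions for illustration.

```python
import json
import re

# Hypothetical rules: "Chapter 3 Title" or "3 Title" is a chapter header,
# "3.1 Title" or "3.1.2 Title" is a subtitle. Real PDFs will need more rules.
CHAPTER_RE = re.compile(r"^(?:chapter\s+)?(\d+)\.?\s+(\S.*)$", re.IGNORECASE)
SUBTITLE_RE = re.compile(r"^(\d+(?:\.\d+)+)\.?\s+(\S.*)$")
MAX_HEADING_LEN = 80  # assumed cutoff: longer lines are treated as body text


def classify_line(line):
    """Return 'header', 'subtitle' or 'body' for one line of extracted text."""
    text = line.strip()
    if not text or len(text) > MAX_HEADING_LEN or text.endswith("."):
        return "body"
    if SUBTITLE_RE.match(text):
        return "subtitle"
    if CHAPTER_RE.match(text):
        return "header"
    return "body"


def build_paragraphs(lines):
    """Attach the most recent header and subtitle to each body line."""
    header, subtitle, out = None, None, []
    for line in lines:
        kind = classify_line(line)
        if kind == "header":
            header, subtitle = line.strip(), None  # a new chapter resets the subtitle
        elif kind == "subtitle":
            subtitle = line.strip()
        elif line.strip():
            out.append({"header": header, "subtitle": subtitle, "text": line.strip()})
    return out


def to_json(paragraphs):
    """Serialize the processed paragraphs for the later DB-writing step."""
    return json.dumps(paragraphs, ensure_ascii=False, indent=2)
```

A numbered body line without a trailing period (for example "12 samples were collected") would be misread as a header by this sketch, which is why manual review in the viewer remains useful.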
Use another web app, <pdf processor>, to write the data into the SQLite DB
2024-11-16 Add Table & Image Extraction Feature :
Extract tables with pdf-table-extractor
Extract images with PyMuPDF, or export them manually using Acrobat
The extracted items are inserted into the corresponding paragraphs
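
How an extracted item is matched to its paragraph is not spelled out above. One plausible approach, sketched here under the assumption that both paragraphs and items carry a page number and a vertical position, is to attach each item to the nearest paragraph on the same page. The keys "page", "y", "kind" and "ref" are hypothetical.

```python
def attach_items(paragraphs, items):
    """Attach each table/image to the nearest paragraph on its page (in place)."""
    for para in paragraphs:
        para.setdefault("items", [])
    for item in items:
        same_page = [p for p in paragraphs if p["page"] == item["page"]]
        if not same_page:
            continue  # no paragraph on that page; the item stays unattached
        # Nearest paragraph by vertical distance on the page.
        nearest = min(same_page, key=lambda p: abs(p["y"] - item["y"]))
        nearest["items"].append({"kind": item["kind"], "ref": item["ref"]})
    return paragraphs
```

Items on a page with no paragraphs are skipped rather than guessed at, so they can be placed by hand afterwards.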
2024-09-22 WPF PDF Extractor Test :
Click the open button or drag a PDF file onto the app
Process the PDF by pages and paragraphs:
. a self-developed algorithm detects and formats text by keywords, terms and numbers
. the PDF file is also opened in the PDF viewer, and pages flip synchronously
Click on a paragraph to generate a new title for the chapter