Use automatic OCR for PDF files

If you have PDF files and need to extract the text with optical character recognition (OCR), follow these steps:

Step-by-step guide

  1. Log in to Nextcloud - see Access shared files remotely
  2. Upload your PDF files to your Nextcloud home folder. DO NOT upload them to the Shared Drive or Private Drive - OCR will not work from these locations.

    If you're not sure where your Nextcloud home is, look for a folder called Documents and upload files there.




  3. Click "..." next to a file you uploaded, or right-click the file name, and choose Details from the menu


  4. Click "..." in the upper-right corner of the Details pane, then click Tags



  5. Click the box labeled Collaborative tags and select needs-ocr
  6. Wait 5-10 minutes for OCR to run
  7. Check your OCR folder

    If you do not already have an OCR folder in your Nextcloud home folder, one is created automatically.


  8. Review the processed PDF files for accuracy

    Each processed file will match the name of the original file but with "-ocr" appended to the file name.

  9. Copy processed PDF files to the Shared Drive or Private Drive for archiving
  10. Remove the needs-ocr tag from the original file, or delete the original file from your Nextcloud home folder
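
When reviewing many files in step 8, it can help to work out in advance which file name to look for in the OCR folder. The following is a minimal sketch only. The helper name expected_ocr_name is hypothetical, and it assumes that "-ocr" is placed before the ".pdf" extension; the guide only says the suffix is appended to the file name, so check against one real processed file first.

```python
from pathlib import PurePosixPath


def expected_ocr_name(original: str, suffix: str = "-ocr") -> str:
    """Return the file name a processed PDF is expected to have.

    Assumption: the suffix goes between the base name and the extension,
    e.g. "scan.pdf" becomes "scan-ocr.pdf". Adjust if your server differs.
    """
    path = PurePosixPath(original)
    # path.stem is the name without its final extension; path.suffix is ".pdf"
    return f"{path.stem}{suffix}{path.suffix}"


if __name__ == "__main__":
    for name in ["scan.pdf", "invoice-2023.pdf"]:
        print(name, "->", expected_ocr_name(name))
```

If processed files on your server turn out to be named differently, only the return line needs to change.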


To see all of your files tagged with needs-ocr, select Tags from the left-hand menu in Nextcloud and type needs-ocr in the box labeled Select tags...