
How to Extract Text from a PDF (3 Methods for Any Situation)

March 2025 · 5 min read · PDFHub Free Editorial

You open a PDF, try to select some text, and nothing happens. Or the text copies but pastes as garbled characters. Or the document is a scan and there's literally no text in it — just an image of a page.

This is frustrating, but fixable. The right approach depends on what type of PDF you have. Here's how to identify it and extract the text in each case.

First: identify what type of PDF you have

The extraction method depends entirely on the type of PDF:

Text PDF: Created digitally from Word, Google Docs, InDesign or similar. The text is embedded as real, selectable data, so extraction is usually instant and accurate.
Scanned PDF: Pages are photographs of paper documents. There is no real text, only pixels. It requires OCR (Optical Character Recognition) to convert the image to text.
Protected PDF: May be either type above, but with a permission password that blocks copying. You need to unlock it first (if you're authorized to do so).

Quick test: try selecting any word in your PDF. If you can highlight individual words, the file has embedded text and you can treat it as a text PDF. If clicking does nothing, or highlights the entire page as one image block, it's a scanned PDF.
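The same quick test can be scripted when you have many files to sort. The following is a minimal sketch, not part of any tool named in this article. It assumes the third-party pypdf library is installed, and the function names and the 20-character threshold are illustrative choices.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 20) -> str:
    """Label a PDF based on the text extracted from each of its pages.

    min_chars is an arbitrary, illustrative threshold: a page with fewer
    non-whitespace characters than this is treated as having no real text.
    """
    if not page_texts:
        return "empty"
    has_text = [len("".join(t.split())) >= min_chars for t in page_texts]
    if all(has_text):
        return "text"
    if not any(has_text):
        return "scanned"
    return "mixed"


def read_page_texts(path: str) -> List[str]:
    """Extract text page by page. Assumes pypdf is installed (pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily so classify_pages works alone

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

A call such as classify_pages(read_page_texts("report.pdf")) would return "text", "scanned" or "mixed". A "mixed" result suggests that only some pages need OCR.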

Method 1: PDFHub Free — Instant for text PDFs (no upload)

FASTEST · FREE · PRIVATE

For PDFs with embedded text, the Extract Text tool reads all text directly in your browser and exports a clean .txt file. No server upload, no account needed.

  1. Open the Extract Text tool.
  2. Select your PDF. The file is opened locally in your browser and is not sent to a server.
  3. Choose plain text output or text with page markers.
  4. Click Process and download your .txt file.

The extracted text preserves paragraph breaks and page structure. It's ideal for searching a large document, feeding content into another application, or quickly finding specific information without reading the whole file.

This method only works for PDFs with embedded text. If your PDF is a scan (or if the extracted .txt file comes back empty), use Method 2 or 3 below.
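To make the difference between the two output options concrete, here is a small sketch of how per-page text might be joined with or without page markers. The marker format and the function name are assumptions for illustration only; the actual output of the Extract Text tool may look different.

```python
from typing import List


def join_pages(page_texts: List[str], with_markers: bool = False) -> str:
    """Combine per-page text into one string suitable for a .txt file."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        body = text.strip()
        if with_markers:
            # Hypothetical marker format, chosen only for this example.
            parts.append(f"--- Page {number} ---\n{body}")
        else:
            parts.append(body)
    # A blank line between pages keeps the page structure visible.
    return "\n\n".join(parts) + "\n"
```

Page markers are useful when you need to cite where a passage came from. Plain output is easier to feed into other applications.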

Extract text from your PDF — free, no upload

Processes locally in your browser. Your file never leaves your device.

Extract Text →

Method 2: Adobe Acrobat — Best for scanned PDFs (OCR)

SCANNED DOCUMENTS · BEST ACCURACY

For scanned PDFs, you need OCR: software that analyzes the page image and converts it to machine-readable text. Adobe Acrobat's Recognize Text feature is one of the more accurate OCR options available. It supports 40+ languages, handles complex layouts (tables, columns, mixed text/image pages) and produces searchable PDFs that look identical to the original.

In Acrobat: Tools → Scan & OCR → Recognize Text → In This File. After processing, the document contains real text you can copy, search and export normally.

Adobe Acrobat Pro — OCR in 40+ Languages

Industry-leading OCR accuracy for scanned documents. Handles complex layouts, tables and mixed-content pages. Makes any scanned PDF fully searchable and copyable.

Try Adobe Acrobat →

Affiliate link — we may earn a commission at no extra cost to you.

Method 3: Google Drive — Free OCR for scanned documents

FREE · SCANNED PDFs · GOOGLE ACCOUNT REQUIRED

If you have a Google account, you can use Google Drive's built-in OCR for free — no paid software required:

  1. Upload the scanned PDF to Google Drive.
  2. Right-click the file → Open with → Google Docs.
  3. Google Docs automatically runs OCR and places the recognized text below the original page images.
  4. Copy the text or export as .txt or .docx.

Accuracy is good for standard printed text in major languages. Complex layouts, handwriting, or non-Latin scripts may produce errors. For critical documents, verify the output before relying on it.

Google Drive OCR works best on clean, high-resolution scans (150 dpi or above). Faded, skewed or low-quality scans produce more errors — Acrobat's OCR handles poor scan quality significantly better.
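If you know the pixel width of a scanned page image, you can estimate its resolution yourself. PDF page sizes are measured in points, with 72 points to the inch. The sketch below assumes the scanned image fills the full page width; the function names are illustrative.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def effective_dpi(image_width_px: int, page_width_pt: float) -> float:
    """Resolution of a scanned image that spans the full page width."""
    if page_width_pt <= 0:
        raise ValueError("page width must be positive")
    page_width_in = page_width_pt / POINTS_PER_INCH
    return image_width_px / page_width_in


def meets_minimum(image_width_px: int, page_width_pt: float,
                  min_dpi: float = 150) -> bool:
    """Check a scan against the 150 dpi guideline mentioned above."""
    return effective_dpi(image_width_px, page_width_pt) >= min_dpi
```

For example, a US Letter page is 612 points (8.5 inches) wide, so a scan that is 1275 pixels wide works out to 150 dpi, right at the guideline.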

Which method should you use?

Situation | Best Method
Text PDF, quick extraction | PDFHub Free (Method 1)
Text PDF, sensitive document (no upload) | PDFHub Free (Method 1)
Scanned PDF, high accuracy needed | Adobe Acrobat OCR (Method 2)
Scanned PDF, free option | Google Drive OCR (Method 3)
Multiple scanned documents, batch processing | Adobe Acrobat Pro (batch OCR)
Non-English scanned document | Adobe Acrobat (40+ language support)

What to do with extracted text

Once you have the text as a .txt file, you can do things that are awkward or impossible inside a PDF:

  • Search with Ctrl+F across hundreds of pages instantly
  • Feed it into AI tools (summarization, translation, Q&A)
  • Import into word processors for editing and reformatting
  • Index it in a content management or knowledge base system
  • Run text analysis, word counts or keyword extraction
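As an example of the last item in the list above, a word count and a rough keyword list need only a few lines of code. This is a minimal sketch using the Python standard library. The stop-word list is deliberately tiny and illustrative, and the function names are assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def word_count(text: str) -> int:
    return len(tokenize(text))


def top_keywords(text: str, n: int = 5) -> List[Tuple[str, int]]:
    """Most frequent words, skipping stop words and very short tokens."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS and len(w) > 2]
    return Counter(words).most_common(n)
```

Reading the file with open("document.txt", encoding="utf-8").read() and passing the result to top_keywords gives a quick sense of what a long document is about.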

If you're building workflows that involve processing large volumes of PDFs — for a business, a content platform, or data analysis — having reliable text extraction is foundational. Tools like SEMrush or Ahrefs can also help if your goal is to publish extracted and reformatted content as optimized web pages.

SEMrush — Turn Your Content into Search Traffic

Keyword research, on-page SEO audits, competitor analysis and content optimization. If you're publishing web content based on extracted or reformatted documents, SEMrush helps it rank.

Try SEMrush →

Affiliate link — we may earn a commission at no extra cost to you.

Frequently Asked Questions

Why does text come out garbled or with weird characters when I copy it?

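This is usually a font encoding issue: the PDF uses a custom or embedded font that doesn't map correctly to Unicode when copied. The PDFHub Free Extract Text tool often fixes this by reading the raw character data rather than relying on clipboard copy. If the font carries no usable Unicode mapping at all, the extracted text will be garbled too, and running OCR on the pages (Method 2 or 3) is the fallback.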
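If you process many files, a rough automated check can flag output that is probably garbled. The sketch below is a heuristic under stated assumptions: it counts replacement characters, private-use code points and stray control characters, which are common symptoms of a broken font mapping. The 10% threshold is arbitrary, and text that maps to the wrong but valid letters will not be caught.

```python
def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like encoding damage."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = 0
    for c in chars:
        code = ord(c)
        is_replacement = code == 0xFFFD            # U+FFFD REPLACEMENT CHARACTER
        is_private_use = 0xE000 <= code <= 0xF8FF  # Private Use Area
        is_control = code < 0x20                   # stray control characters
        if is_replacement or is_private_use or is_control:
            bad += 1
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.1) -> bool:
    """Illustrative cutoff: flag text when over 10% of characters are suspect."""
    return suspicious_ratio(text) > threshold
```

Files flagged this way are candidates for OCR instead of direct extraction.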

Can I extract text from just a few pages, not the whole PDF?

The easiest approach is to split the PDF to extract specific pages first (using the Split PDF tool), then run the extract text tool on those pages only.
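In a scripted workflow, the same idea (pick the pages first, then extract only those) might start by parsing a page selection. The sketch below is illustrative: the "1-3,7" syntax and the function name are assumptions, not the input format of the Split PDF tool.

```python
from typing import List


def parse_page_ranges(spec: str, page_count: int) -> List[int]:
    """Turn a string like "1-3,7" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

The resulting page numbers can then be used to select which pages to pass to whatever extraction step you use.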

How accurate is OCR text extraction?

For clean, high-resolution scans of standard printed text, modern OCR (especially Adobe Acrobat's) can achieve 99%+ accuracy for Latin-script languages. Accuracy drops with low-quality scans, handwriting, complex table layouts, or non-standard fonts. Always proofread critical documents.
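To put a number on accuracy for your own documents, you can compare OCR output against a passage you have typed or verified by hand. The sketch below uses edit distance (Levenshtein distance) to compute character-level accuracy. This is one common way to measure it, not necessarily how any vendor's published figure was calculated, and the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character from b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered correctly, between 0 and 1."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))
```

Keep the scale in mind: at 99% character accuracy, a page of 2,000 characters still contains about 20 wrong characters, which is why proofreading matters for critical documents.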

Does extracting text preserve formatting like bold and tables?

Plain text extraction (.txt) does not preserve formatting — you get the content without styling. If you need to preserve formatting, Adobe Acrobat's export to Word (.docx) or the Google Docs OCR output does a better job of maintaining structure.

What if my PDF is protected and I can't copy text?

If the PDF has a permissions password restricting copying, you can remove it (if you're authorized and have the password) using the Remove Password tool. You cannot bypass a password on a file you don't have authorization to unlock.

The bottom line

For text PDFs: PDFHub Free extracts everything instantly, privately and for free. For scanned PDFs: Adobe Acrobat gives the best accuracy, or Google Drive works as a free alternative for standard documents. The key is identifying which type of PDF you have before choosing your method.
