How to Install Tesseract OCR on Windows, macOS, and Linux


Introduction

Tesseract is one of the most widely used open-source Optical Character Recognition (OCR) engines. Python projects typically call it through the pytesseract wrapper to extract text from images. PDF pages must first be converted to images before Tesseract can read them.

Before using it with Python, you need to install Tesseract on your system and ensure it's correctly configured.

What Is Tesseract?

Tesseract is a command-line OCR engine originally developed at HP. Google later sponsored its development, and it is now maintained as an open-source project on GitHub. It supports many languages and runs on Windows, macOS, and Linux. pytesseract is only a wrapper that calls the Tesseract executable, so you must install the engine separately.

Prerequisites

  • Python (already installed)

  • Internet connection

  • Basic familiarity with the command line

Installation Guide by Operating System

Windows

Step 1: Download the Installer

Visit the UB Mannheim installer page:
https://github.com/UB-Mannheim/tesseract/wiki

These installers are built and maintained by UB Mannheim (Mannheim University Library) and are a common choice for running Tesseract on Windows.

Step 2: Run the Installer

  1. Download and run the .exe file.

  2. During setup:

    • Choose the destination folder (e.g., C:\Program Files\Tesseract-OCR).

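    • Make sure Tesseract ends up on your PATH. If the installer offers an "Add to PATH" option, check it; otherwise add the destination folder to the PATH environment variable yourself after setup.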

    • Install additional language packs if needed.

Step 3: Verify Installation

Open Command Prompt and type:

tesseract --version

You should see version info, which confirms it's working.

If Tesseract is not on your PATH, point pytesseract at the executable explicitly in your Python script:

import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
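
As an alternative to hard-coding the path, a script can look for the executable on PATH first and only then fall back to a known location. The sketch below is one way to do that. The helper name find_tesseract and the fallback list are illustrative assumptions, not part of pytesseract.

```python
import shutil
from pathlib import Path

# Default install location suggested in this guide (adjust if you chose another folder).
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallbacks=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if it cannot be found.

    Hypothetical helper: checks PATH first, then each fallback location in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in fallbacks:
        if Path(candidate).is_file():
            return candidate
    return None
```

If find_tesseract() returns a path, it can be assigned to pytesseract.pytesseract.tesseract_cmd. If it returns None, the engine is probably not installed or lives in a folder the helper does not know about.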

macOS

Step 1: Install with Homebrew

If you have Homebrew installed, run:

brew install tesseract

Step 2: (Optional) Install Language Packs

brew install tesseract-lang

This installs support for additional languages.

Step 3: Verify Installation

Run:

tesseract --version

You should see version details.

Linux (Ubuntu/Debian)

Step 1: Install Tesseract via APT

sudo apt update
sudo apt install tesseract-ocr

Step 2: (Optional) Install Language Packs

sudo apt install tesseract-ocr-[lang]

Replace [lang] with the desired language code (e.g., deu for German, fra for French).
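
To confirm that a language pack was picked up, you can ask Tesseract which languages it sees with tesseract --list-langs. The sketch below runs that command from Python and parses the result. The function names are illustrative, and the parser assumes the output is a header line starting with "List of" followed by one language code per line, which may differ between versions.

```python
import subprocess


def parse_languages(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes.

    Assumes a header line beginning with "List of", then one code per line.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages(command="tesseract"):
    """Return the language codes Tesseract reports, or [] if it cannot be run."""
    try:
        result = subprocess.run(
            [command, "--list-langs"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Raised when the executable is missing or not runnable.
        return []
    # Use stderr only when stdout is empty, in case a build prints the list there.
    return parse_languages(result.stdout or result.stderr)
```

Once a code such as deu shows up in the list, it can be passed to the command line with -l deu or to pytesseract with image_to_string(image, lang="deu").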

Step 3: Verify Installation

tesseract --version
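
The same version check works on all three platforms and can also be run from a script, which is handy for failing early with a clear message. This is a minimal sketch using only the standard library. The function names are illustrative, and the regular expression assumes the first line of output looks like "tesseract 5.3.0" or "tesseract v5.0.0-alpha".

```python
import re
import subprocess


def parse_version(output):
    """Extract the numeric version from `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def installed_version(command="tesseract"):
    """Return the installed Tesseract version string, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Raised when the executable is missing or not runnable.
        return None
    # Search both streams, since some builds may print the version to stderr.
    return parse_version(result.stdout + result.stderr)
```

A return value of None usually means the executable is not on PATH. If pytesseract is already installed, its get_tesseract_version() function offers a similar check.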

Test Tesseract from the Terminal

After installation, you can test it directly by converting an image to text:

tesseract example.png output

This reads example.png and saves the extracted text in a file called output.txt.
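
The same command can be driven from Python without pytesseract, which is a quick way to confirm the engine itself works. In the sketch below, build_command and run_ocr are hypothetical helper names. It relies on two command-line features: the -l flag to select a language, and the special output base stdout, which prints the text instead of writing a file.

```python
import subprocess


def build_command(image_path, output_base="stdout", lang=None):
    """Build the argument list for a tesseract command-line call."""
    command = ["tesseract", str(image_path), output_base]
    if lang:
        # -l selects the recognition language, e.g. "eng" or "deu".
        command += ["-l", lang]
    return command


def run_ocr(image_path, lang=None):
    """Run tesseract on an image and return the text it prints to standard output.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    result = subprocess.run(
        build_command(image_path, "stdout", lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, build_command("example.png", "output") produces the same command shown above, and run_ocr("example.png") returns the recognized text as a string.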

Python Integration (with pytesseract)

After installing Tesseract, you can use it in Python:

pip install pytesseract pillow

Example usage:

from PIL import Image
import pytesseract

image = Image.open("example.png")
text = pytesseract.image_to_string(image)
print(text)

Final Notes

  • Tesseract works best with clear, high-contrast images.

  • For scanned documents, consider preprocessing with OpenCV for better results.

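  • Language models are stored as .traineddata files in the tessdata directory of your Tesseract installation. Run tesseract --list-langs to see which languages are installed.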
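
As a rough illustration of the preprocessing idea, the sketch below converts an image to grayscale, stretches its contrast, and applies a fixed black-and-white threshold. It uses Pillow rather than OpenCV, because Pillow is already installed for pytesseract. The function names and the cutoff of 128 are assumptions to tune per document, not recommended settings.

```python
def build_threshold_table(cutoff=128):
    """Return a 256-entry lookup table mapping gray levels to pure black or white."""
    return [255 if value >= cutoff else 0 for value in range(256)]


def preprocess(path, cutoff=128):
    """Load an image and return a high-contrast black-and-white copy.

    Hypothetical helper; the cutoff is an assumption and depends on the scan.
    """
    # Imported here so the lookup-table helper stays usable without Pillow.
    from PIL import Image, ImageOps

    with Image.open(path) as image:
        gray = ImageOps.grayscale(image)
    # Stretch the histogram so faint text uses the full black-to-white range.
    gray = ImageOps.autocontrast(gray)
    # Apply the lookup table to every pixel of the grayscale image.
    return gray.point(build_threshold_table(cutoff))
```

The image returned by preprocess("example.png") can be passed straight to pytesseract.image_to_string. A fixed threshold tends to struggle with uneven lighting, which is where OpenCV's adaptive thresholding may do better.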

Summary

Platform              | Command / Tool
Windows               | Installer from UB Mannheim
macOS                 | brew install tesseract
Linux (Ubuntu/Debian) | sudo apt install tesseract-ocr

Installing Tesseract is quick and easy, and once set up, it opens the door to powerful text recognition capabilities in your Python applications.
