Build a FastAPI Image OCR API: Extract Text from Images Using Python

Optical Character Recognition (OCR) is a powerful technology that converts text in images into machine-readable text. From scanning invoices and documents to extracting text from screenshots, OCR plays a crucial role in many modern applications.

In this tutorial, we’ll explore a GitHub project that demonstrates how to build a FastAPI-based OCR API service capable of extracting text from images. The project exposes a simple API endpoint where users can upload an image and receive the extracted text in response.

The repository we’ll explore is:
https://github.com/sf-co/5-fastapi-imageocr-api-fastapi-optical-character-recognition-service

By the end of this guide, you’ll understand how the project works, how to run it locally, and how to use the API for extracting text from images.

What is OCR?

OCR (Optical Character Recognition) is a technology that analyzes images and detects textual information within them. It converts scanned documents, screenshots, and photographs into editable and searchable text. OCR is widely used in document processing systems, automation workflows, and data extraction pipelines.

This project implements OCR as a REST API using FastAPI, so any application that can send an HTTP request can add text extraction from images.

Project Overview

This project provides a lightweight OCR microservice that accepts image files and returns the extracted text through an API endpoint. It relies on Python libraries for image processing and text recognition, which makes it a practical building block for document processing or automation systems.

The API workflow typically follows these steps:

1. A user uploads an image file.
2. The FastAPI endpoint receives the image.
3. The OCR engine processes the image.
4. The extracted text is returned in the API response.

Such OCR APIs are commonly used in applications that require automated data extraction from documents, receipts, invoices, or screenshots.

Technologies Used

This project leverages several technologies commonly used in Python-based AI and API development.

1. Python

Python is the core programming language used to build the OCR service. Its ecosystem includes powerful libraries for machine learning, computer vision, and automation.

2. FastAPI

FastAPI is a modern Python web framework used for building high-performance APIs. It automatically generates interactive documentation and supports asynchronous processing.

3. Tesseract OCR

Tesseract is one of the most widely used open-source OCR engines. It processes images and extracts textual content from them.
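Python code usually reaches Tesseract through a wrapper library. The sketch below assumes the pytesseract package, which the repository may or may not use, and the helper names build_config and extract_text are hypothetical. Tesseract's --oem option selects the engine mode and --psm the page segmentation mode; which --psm works best depends on the image layout (6 assumes a single uniform block of text).

```python
from typing import Any


def build_config(psm: int = 6, oem: int = 3) -> str:
    """Build a Tesseract command-line config string (hypothetical helper).

    --oem 3 uses the default engine based on what is available;
    --psm 6 treats the image as a single uniform block of text.
    """
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    if not 0 <= oem <= 3:
        raise ValueError("Tesseract OCR engine modes range from 0 to 3")
    return f"--oem {oem} --psm {psm}"


def extract_text(image: Any, lang: str = "eng", psm: int = 6) -> str:
    """Run Tesseract on a Pillow image via pytesseract (assumed dependency)."""
    # Imported lazily so build_config can be used even where pytesseract is absent.
    import pytesseract

    return pytesseract.image_to_string(image, lang=lang, config=build_config(psm)).strip()
```

Calling extract_text(image, psm=7) would treat the image as a single text line, which can suit cropped labels or captions.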

4. Pillow (PIL)

Pillow is a Python imaging library used to open, convert, and manipulate images before they are passed to the OCR engine.

5. OpenCV

OpenCV helps with image preprocessing such as resizing, filtering, and noise reduction to improve OCR accuracy.

6. Uvicorn

Uvicorn is an ASGI server used to run FastAPI applications efficiently.

Project Architecture

The OCR API follows a simple, linear pipeline:

Client
  |
  | Upload Image
  |
FastAPI API Endpoint
  |
Image Processing (OpenCV / PIL)
  |
OCR Engine (Tesseract)
  |
Extracted Text Response

The API receives the uploaded image, preprocesses it with OpenCV or Pillow, and passes the result to Tesseract, which extracts the text that the endpoint returns in its response.

How to Use the Project (Step-by-Step)

Follow the steps below to run the OCR API locally.

Step 1: Clone the Repository

First, clone the GitHub repository.

git clone https://github.com/sf-co/5-fastapi-imageocr-api-fastapi-optical-character-recognition-service.git
cd 5-fastapi-imageocr-api-fastapi-optical-character-recognition-service

Step 2: Create a Virtual Environment

Create a Python virtual environment.

python -m venv venv

Activate it:

Windows:

venv\Scripts\activate

Linux / macOS:

source venv/bin/activate

Step 3: Install Dependencies

Install the required Python libraries.

pip install -r requirements.txt

This installs FastAPI, OCR libraries, and image-processing dependencies required by the project.
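If a later step fails with an import error, a quick check like the following shows which packages are missing. The module-to-package mapping below is an assumption about what requirements.txt contains, so compare it with the actual file.

```python
import importlib.util
from typing import Mapping

# Assumed import name -> PyPI package name; adjust to match requirements.txt.
EXPECTED: Mapping[str, str] = {
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "pytesseract": "pytesseract",
    "PIL": "Pillow",
    "cv2": "opencv-python",
}


def missing_packages(expected: Mapping[str, str] = EXPECTED) -> list:
    """Return the PyPI names of packages whose modules cannot be found."""
    return sorted(
        package
        for module, package in expected.items()
        if importlib.util.find_spec(module) is None
    )
```

Running print(missing_packages()) inside the activated virtual environment prints an empty list when everything in the mapping is installed.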

Step 4: Install Tesseract OCR

Python OCR wrappers such as pytesseract only call the Tesseract executable, so the OCR engine itself must also be installed on your system.

Ubuntu:

sudo apt install tesseract-ocr

macOS:

brew install tesseract

Windows:

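Download an installer linked from the official Tesseract documentation, then make sure the installation directory is on your PATH so other programs can find the tesseract executable.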
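To confirm the engine is reachable, a check along these lines looks for the tesseract executable on PATH and reads its version. The helper name is hypothetical. If Tesseract is installed but not on PATH (common on Windows), pytesseract lets you point to it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Depending on the release, Tesseract writes its version to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

print(tesseract_version()) should show something like a "tesseract" version line after a successful install, and None otherwise.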

Step 5: Run the FastAPI Server

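Start the API server using Uvicorn. The argument main:app refers to the app object defined in main.py; if the repository's entry module has a different name, adjust it accordingly. The --reload flag restarts the server whenever the code changes and is intended for development only.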

uvicorn main:app --reload

The server will start at:

http://127.0.0.1:8000

Step 6: Access the API Documentation

FastAPI automatically provides interactive API documentation.

Open:

http://127.0.0.1:8000/docs

Here you can test the OCR endpoint by uploading an image.
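The interactive page is generated from an OpenAPI schema that FastAPI also serves at /openapi.json by default, so scripts can discover the available endpoints too. A minimal standard-library sketch, assuming the server from Step 5 is running locally; the helper names are hypothetical.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI schema published by the running FastAPI app."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_operations(schema: dict) -> list:
    """Flatten the schema's paths into sorted 'METHOD /path' strings."""
    return sorted(
        f"{method.upper()} {path}"
        for path, item in schema.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS  # skip non-method keys such as "parameters"
    )
```

With the server running, print(list_operations(fetch_openapi())) lists every route, which should include the OCR endpoint.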

Step 7: Test the OCR API

Send an image file to the OCR endpoint, either through the /docs page or from any HTTP client as a multipart/form-data upload.

Example API request:

POST /ocr

The response contains the extracted text, for example:

{
  "text": "Hello World"
}

This makes it easy to integrate the OCR API into other systems such as document processing pipelines or automation tools.
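From Python, for example, the endpoint can be called with only the standard library. This sketch assumes the upload field is named "file" (check the parameter name shown on /docs) and that the server from Step 5 is running; encode_multipart and ocr_file are hypothetical helpers.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str, boundary: str = "") -> tuple[bytes, str]:
    """Build a multipart/form-data body containing one file field."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_file(path: str, url: str = "http://127.0.0.1:8000/ocr", field: str = "file") -> str:
    """POST an image to the OCR endpoint and return the "text" field of the reply."""
    file_path = Path(path)
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    body, header = encode_multipart(field, file_path.name, file_path.read_bytes(), content_type)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["text"]
```

Usage: print(ocr_file("sample.png")), where sample.png is any image containing text. Third-party clients such as requests build the multipart body for you.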

Use Cases for OCR APIs

OCR APIs like this one can be used in many real-world applications:

Document Digitization

Convert scanned documents into searchable digital text.

Invoice Processing

Extract information from invoices automatically.

Receipt Scanning Apps

Capture purchase information from receipt images.

Data Entry Automation

Reduce manual data entry by extracting text automatically.

Image Content Analysis

Analyze screenshots or images containing text.
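OCR output is raw text, so use cases such as invoice processing usually add a parsing step on top of the API response. The sketch below uses hypothetical regular expressions and is deliberately simple; real invoices vary widely and OCR output often contains misread characters.

```python
import re

# Hypothetical patterns for illustration; tune them for your documents.
INVOICE_NO = re.compile(r"\binvoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
# \b keeps "Subtotal" from matching as "total".
TOTAL = re.compile(r"\btotal\s*(?:due)?\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE)


def parse_invoice(text: str) -> dict:
    """Pull an invoice number and total out of OCR text, or None when absent."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }
```

For example, parse_invoice applied to a response containing "Invoice No: INV-2041" and "Total: $1,234.50" yields the number "INV-2041" and the total 1234.5.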

Advantages of Using FastAPI for OCR Services

There are several reasons why FastAPI is well suited to building OCR microservices.

High Performance

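FastAPI's asynchronous request handling lets a single server accept many concurrent uploads. OCR itself is slow, blocking work, so it should run outside the event loop; FastAPI does this automatically for endpoints declared with plain def, which it executes in a threadpool.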
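Because pytesseract runs Tesseract as a separate process and waits for it, that waiting can be moved off the event loop. The sketch below simulates this with a hypothetical blocking_ocr function and the standard library's asyncio.to_thread; inside FastAPI, a plain def endpoint or starlette.concurrency.run_in_threadpool has a similar effect.

```python
import asyncio
import time


def blocking_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for a blocking call such as pytesseract.image_to_string."""
    time.sleep(0.2)  # simulate waiting on the Tesseract subprocess
    return f"{len(image_bytes)} bytes processed"


async def handle_upload(image_bytes: bytes) -> dict:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # leaving the event loop free to accept other requests in the meantime.
    text = await asyncio.to_thread(blocking_ocr, image_bytes)
    return {"text": text}


async def simulate_concurrent_uploads(sizes: list) -> list:
    """Handle several uploads concurrently instead of one after another."""
    return await asyncio.gather(*(handle_upload(b"x" * n) for n in sizes))
```

Three simulated uploads finish in roughly the time of one. Threads help here because the work happens in an external process; for heavy preprocessing written in Python, running several Uvicorn worker processes (the --workers option) is the usual way to use more CPU cores.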

Automatic Documentation

The framework automatically generates interactive documentation from the endpoint definitions: Swagger UI at /docs and ReDoc at /redoc.

Easy Integration

REST APIs allow easy integration with web apps, mobile apps, and automation systems.

Lightweight Architecture

FastAPI requires minimal boilerplate code, making development faster.

Conclusion

Building an OCR API with FastAPI is a practical way to create services that extract text from images. The GitHub project demonstrates how to combine the Tesseract OCR engine with Python libraries such as OpenCV and Pillow behind a FastAPI endpoint to create a simple yet capable OCR microservice.

With a few setup steps, you can run your own OCR API and integrate it into applications such as document processing systems, automation workflows, or AI-powered tools.

If you’re looking to build practical AI-powered APIs with Python, this project is a great starting point.
