Optical Character Recognition (OCR) is a powerful technology that converts text in images into machine-readable text. From scanning invoices and documents to extracting text from screenshots, OCR plays a crucial role in many modern applications.
In this tutorial, we’ll explore a GitHub project that demonstrates how to build a FastAPI-based OCR API service capable of extracting text from images. The project exposes a simple API endpoint where users can upload an image and receive the extracted text in response.
The repository we’ll explore is:
https://github.com/sf-co/5-fastapi-imageocr-api-fastapi-optical-character-recognition-service
By the end of this guide, you’ll understand how the project works, how to run it locally, and how to use the API for extracting text from images.
What is OCR?
OCR (Optical Character Recognition) is a technology that analyzes images and detects textual information within them. It converts scanned documents, screenshots, and photographs into editable and searchable text. OCR is widely used in document processing systems, automation workflows, and data extraction pipelines.
This project implements OCR as a REST API using FastAPI, allowing developers to integrate image text extraction functionality into applications easily.
Project Overview
This project provides a lightweight OCR microservice that accepts image files and returns the extracted text via an API endpoint. The application uses Python libraries for image processing and text recognition, making it suitable for building document processing or automation systems.
The API workflow typically follows these steps:
- A user uploads an image file.
- The FastAPI endpoint receives the image.
- The OCR engine processes the image.
- Extracted text is returned in the API response.
Such OCR APIs are commonly used in applications that require automated data extraction from documents, receipts, invoices, or screenshots.
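The four steps above can be sketched without any web framework. This is a minimal illustration, not the project's code: the names sniff_format and handle_upload, the error payload, and the injectable ocr_engine callable are all hypothetical, chosen so the flow can be read and tested on its own.

```python
from typing import Callable, Optional

# Leading "magic" bytes of two common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return a format name if the bytes look like a known image, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def handle_upload(data: bytes, ocr_engine: Callable[[bytes], str]) -> dict:
    """Validate the upload, run OCR, and build the response body."""
    # Steps 1-2: the endpoint receives the uploaded bytes and rejects non-images.
    if sniff_format(data) is None:
        return {"error": "unsupported or corrupt image"}  # hypothetical error shape
    # Step 3: hand the image to the OCR engine (any callable taking bytes).
    text = ocr_engine(data)
    # Step 4: return the extracted text.
    return {"text": text.strip()}
```

In a real service the ocr_engine argument would wrap Tesseract; passing it in keeps the validation and response logic independent of the engine.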
Technologies Used
This project leverages several technologies commonly used in Python-based AI and API development.
1. Python
Python is the core programming language used to build the OCR service. Its ecosystem includes powerful libraries for machine learning, computer vision, and automation.
2. FastAPI
FastAPI is a modern Python web framework used for building high-performance APIs. It automatically generates interactive documentation and supports asynchronous processing.
3. Tesseract OCR
Tesseract is one of the most widely used open-source OCR engines. It processes images and extracts textual content from them.
4. Pillow (PIL)
Pillow is a Python library used for opening and manipulating images before they are passed to the OCR engine.
5. OpenCV
OpenCV helps with image preprocessing such as resizing, filtering, and noise reduction to improve OCR accuracy.
6. Uvicorn
Uvicorn is an ASGI server used to run FastAPI applications efficiently.
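To make the preprocessing item above more concrete, here is a small pure-Python illustration of binarization using Otsu's method, a common step before OCR. It is not taken from the project; in practice OpenCV does this on whole images with cv2.threshold and the cv2.THRESH_OTSU flag. The function names and the flat list of grayscale values are assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_score = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor.
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def binarize(pixels, level):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if value > level else 0 for value in pixels]


gray = [30, 35, 40, 200, 210, 220]  # dark ink values, then light paper values
black_and_white = binarize(gray, otsu_threshold(gray))
```

Turning gray values into clean black and white removes shading and faint noise, which is the kind of cleanup that tends to help an OCR engine.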
Project Architecture
The OCR API is a simple linear pipeline: each request passes through the stages below in order.
Client
|
| Upload Image
|
FastAPI API Endpoint
|
Image Processing (OpenCV / PIL)
|
OCR Engine (Tesseract)
|
Extracted Text Response
The API receives the image, preprocesses it with OpenCV or Pillow, and passes the result to the Tesseract engine, which extracts the text that is returned in the response.
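The pipeline above could be wired into FastAPI roughly as follows. This is a sketch under assumptions, not the repository's actual code: it assumes the pytesseract wrapper and Pillow are used, that the upload field is called file, and that grayscale conversion is the only preprocessing. The create_app and preprocess names are hypothetical.

```python
import io


def preprocess(image):
    """Image-processing stage: convert to grayscale (Pillow mode "L")."""
    return image.convert("L")


def create_app():
    """Build the FastAPI app; third-party imports are kept local to this factory."""
    from fastapi import FastAPI, File, HTTPException, UploadFile
    from PIL import Image, UnidentifiedImageError
    import pytesseract

    app = FastAPI(title="OCR API (sketch)")

    # A plain "def" endpoint: FastAPI runs it in a worker thread, so the
    # blocking OCR call does not stall the event loop.
    @app.post("/ocr")
    def ocr(file: UploadFile = File(...)):
        data = file.file.read()
        try:
            image = Image.open(io.BytesIO(data))
        except UnidentifiedImageError:
            raise HTTPException(status_code=400, detail="Not a readable image")
        # OCR stage: Tesseract, called through the pytesseract wrapper.
        text = pytesseract.image_to_string(preprocess(image))
        return {"text": text.strip()}

    return app
```

Adding app = create_app() at the bottom of main.py would let the uvicorn main:app command from Step 5 serve this sketch. File uploads in FastAPI also require the python-multipart package.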
How to Use the Project (Step-by-Step)
Follow the steps below to run the OCR API locally.
Step 1: Clone the Repository
First, clone the GitHub repository.
git clone https://github.com/sf-co/5-fastapi-imageocr-api-fastapi-optical-character-recognition-service.git
cd 5-fastapi-imageocr-api-fastapi-optical-character-recognition-service
Step 2: Create a Virtual Environment
Create a Python virtual environment.
python -m venv venv
Activate it:
Windows:
venv\Scripts\activate
Linux / macOS:
source venv/bin/activate
Step 3: Install Dependencies
Install the required Python libraries.
pip install -r requirements.txt
This installs FastAPI, OCR libraries, and image-processing dependencies required by the project.
Step 4: Install Tesseract OCR
Tesseract itself is a native program, not a Python package, so pip does not install it. Install it separately with your system's package manager.
Ubuntu:
sudo apt install tesseract-ocr
macOS:
brew install tesseract
Windows:
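Download and run the installer linked from the official Tesseract repository (https://github.com/tesseract-ocr/tesseract), then make sure the install directory is on your PATH so the tesseract command can be found.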
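Before starting the server, it can help to confirm that both the Python packages and the Tesseract binary are visible. The script below is a small sketch using only the standard library; the module names in MODULES are assumptions based on the libraries listed in this article, and the project's requirements.txt remains the authoritative list.

```python
import importlib.util
import shutil

# Import names assumed from the technologies described in this article.
MODULES = ["fastapi", "uvicorn", "PIL", "cv2", "pytesseract"]


def check_environment(modules=MODULES, binary="tesseract"):
    """Report which Python modules are importable and whether the binary is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    report[binary] = shutil.which(binary) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```

If the tesseract entry is reported as missing on Windows even after installation, the install directory is probably not on PATH. If the project uses pytesseract, the executable location can also be set explicitly through pytesseract.pytesseract.tesseract_cmd.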
Step 5: Run the FastAPI Server
Start the API server using Uvicorn.
uvicorn main:app --reload
By default, the server listens at:
http://127.0.0.1:8000
Step 6: Access the API Documentation
FastAPI automatically provides interactive API documentation.
Open:
http://127.0.0.1:8000/docs
Here you can test the OCR endpoint by uploading an image.
Step 7: Test the OCR API
Upload an image file through the interactive documentation page or with any HTTP client.
Example API request:
POST /ocr
The response is a JSON object containing the extracted text, for example:
{
  "text": "Hello World"
}
This makes it easy to integrate the OCR API into other systems such as document processing pipelines or automation tools.
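As one way to call the endpoint from another program, the sketch below builds a multipart upload with only the Python standard library. The form field name file is an assumption; the real name must match the endpoint's parameter, which the /docs page shows. The helper names build_multipart and ocr_image are hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type):
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(path, url="http://127.0.0.1:8000/ocr", field="file"):
    """POST an image file to the OCR endpoint and return the extracted text."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body, header = build_multipart(field, os.path.basename(path), fh.read(), content_type)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With the server from Step 5 running, calling ocr_image with the path to a local image would return the value of the text field from the JSON response.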
Use Cases for OCR APIs
OCR APIs like this one can be used in many real-world applications:
Document Digitization
Convert scanned documents into searchable digital text.
Invoice Processing
Extract information from invoices automatically.
Receipt Scanning Apps
Capture purchase information from receipt images.
Data Entry Automation
Reduce manual data entry by extracting text automatically.
Image Content Analysis
Analyze screenshots or images containing text.
Advantages of Using FastAPI for OCR Services
There are several reasons why FastAPI is a good fit for building OCR microservices.
High Performance
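FastAPI supports asynchronous request handling, so the server can keep accepting uploads while earlier requests are still being processed. The OCR step itself is CPU-bound, so it should run in a worker thread or process rather than block the event loop.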
Automatic Documentation
The framework automatically generates Swagger UI documentation at /docs from the endpoint definitions.
Easy Integration
REST APIs allow easy integration with web apps, mobile apps, and automation systems.
Lightweight Architecture
FastAPI requires minimal boilerplate code, making development faster.
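The High Performance point above depends on keeping blocking work off the event loop. The sketch below shows one general way to do that with the standard library's asyncio.to_thread (Python 3.9+). The slow_ocr function is a hypothetical stand-in that sleeps instead of calling Tesseract, and none of this is taken from the project.

```python
import asyncio
import time


def slow_ocr(name: str) -> str:
    """Stand-in for a blocking OCR call; sleeps instead of reading pixels."""
    time.sleep(0.2)
    return f"text from {name}"


async def handle_many(names):
    """Run each blocking call in a worker thread so the calls overlap."""
    tasks = [asyncio.to_thread(slow_ocr, name) for name in names]
    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)


def run_demo(names):
    """Return the results and the wall-clock time taken."""
    start = time.perf_counter()
    results = asyncio.run(handle_many(names))
    return results, time.perf_counter() - start
```

Because the calls run in separate threads, the total time for several images should be close to the time for one rather than the sum. FastAPI applies the same idea automatically to endpoints declared with plain def instead of async def.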
Conclusion
Building an OCR API with FastAPI is a practical way to create services that extract text from images. The GitHub project demonstrates how to combine the Tesseract OCR engine with Python libraries such as OpenCV and Pillow behind a FastAPI endpoint to create a simple yet capable OCR microservice.
With just a few steps, you can deploy your own OCR API and integrate it into applications such as document processing systems, automation workflows, or AI-powered tools.
If you’re looking to build practical AI-powered APIs with Python, this project is a great starting point.





