👍 What we like
- ✓ Automatic OCR for scanned PDFs and images
- ✓ Self-hosted with no subscription fees
- ✓ Organizes documents by correspondent, type, and date
- ✓ Runs on VPS, NAS, or mini-PC
👎 What to watch
- ✕ OCR is resource-intensive, requiring at least 2 GB RAM
- ✕ Requires manual setup of Docker, DNS, and reverse proxy
- ✕ Complex configuration with multiple services (PostgreSQL, Redis, Tika)
Contents
- Prerequisites
- Step 1: Install Docker and Docker Compose
- Step 2: Configure DNS
- Step 3: Create the shared Docker network
- Step 4: Prepare the folder and .env file
- Step 5: The Paperless-ngx docker-compose.yml
- Step 6: The Caddyfile for automatic HTTPS
- Step 7: Start services and create the admin account
- Step 8: The consume folder
- Step 9: Setting up automatic filing
- Step 10: Backups
- Final verification
- Security and hardening
- Common pitfalls and troubleshooting
- FAQ
  - What is OCR and why is it essential here?
  - How exactly does the consume folder work?
  - Can I scan directly from my multifunction printer to Paperless?
  - What are Gotenberg and Tika for?
  - Are my documents stored in plain text on the server?
  - Paperless-ngx, Mayan EDMS, or Docspell: which one to choose?
- Related topics
Are you drowning in paper or PDF invoices, contracts, payslips, and administrative mail scattered everywhere? Paperless-ngx turns this chaos into a clean, searchable, and backed-up document library. It is an open-source DMS (Document Management System) that OCRs your documents automatically — it recognizes text in scanned PDFs and images — and then categorizes them by correspondent, type, and date. You drop in a scan, and a few seconds later it is indexed, searchable, and filed according to your rules.
All of this runs on your own server, with no subscription. In this tutorial, we deploy Paperless-ngx with Docker Compose, including OCR and the consume folder (a watched folder where any dropped document is ingested automatically), behind a Caddy reverse proxy serving HTTPS.
Prerequisites
- A server or VPS running Ubuntu 24.04 (or Debian 12) with 2 GB of RAM minimum (OCR is resource-intensive; 4 GB is more comfortable for large volumes). A VPS from Hetzner or Infomaniak works well, but Paperless-ngx also runs very well on a NAS or a mini-PC at home.
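- Docker and Docker Compose installed (commands in step 1; they target Ubuntu, so on Debian 12 replace linux/ubuntu with linux/debian in both repository URLs).
- A domain name for which you control the DNS. We will use docs.example.com.
- Ports 80 and 443 open for Let’s Encrypt and HTTPS traffic.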
Step 1: Install Docker and Docker Compose
sudo apt update
sudo apt install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
Log out and log back in, then verify:
docker --version && docker compose version
Step 2: Configure DNS
At your DNS provider, create an A record (and AAAA if IPv6):
| Type | Name | Value |
|---|---|---|
| A | docs.example.com | 203.0.113.10 |
Verify propagation:
dig +short docs.example.com
Step 3: Create the shared Docker network
docker network create web
If you have already followed our Caddy reverse proxy tutorial, this network may already exist. In that case the command fails with an "already exists" error, which you can safely ignore.
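To make this step safe to re-run, the network can be created only when it is missing. The helper below is a minimal sketch: the function name ensure_network is hypothetical, and it relies only on the standard docker network inspect and docker network create subcommands.

```shell
# ensure_network NAME: create the Docker network only if it does not exist yet.
# (ensure_network is a hypothetical helper name, not part of Docker.)
ensure_network() (
  name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null && echo "network $name created"
  fi
)

# Usage: ensure_network web
```

Calling it a second time simply reports that the network exists instead of raising an error.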
Step 4: Prepare the folder and .env file
Paperless-ngx relies on five containers: the web application, a PostgreSQL database, Redis (task queue), and Gotenberg and Tika, which convert and read certain formats. Configuration is handled via an environment file, paperless.env.
mkdir -p ~/paperless && cd ~/paperless
nano paperless.env
# --- Security ---
# Django secret key (generated in the next step)
PAPERLESS_SECRET_KEY=replace_with_generated_key
# --- Public URL (essential behind an HTTPS reverse proxy) ---
PAPERLESS_URL=https://docs.example.com
PAPERLESS_CSRF_TRUSTED_ORIGINS=https://docs.example.com
# --- OCR: recognized languages (fra = French, eng = English) ---
PAPERLESS_OCR_LANGUAGE=fra+eng
# --- PostgreSQL Database ---
PAPERLESS_DBHOST=postgres
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=replace_with_a_long_password
# --- Redis ---
PAPERLESS_REDIS=redis://redis:6379
# --- Tika / Gotenberg for Office documents ---
PAPERLESS_TIKA_ENABLED=1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT=http://tika:9998
# --- Miscellaneous ---
PAPERLESS_TIME_ZONE=Europe/Paris
# Also watch subfolders of the consume folder, not only its top level
PAPERLESS_CONSUMER_RECURSIVE=true
USERMAP_UID=1000
USERMAP_GID=1000
Generate the secret key and a database password:
echo "PAPERLESS_SECRET_KEY : $(openssl rand -base64 48)"
echo "PAPERLESS_DBPASS : $(openssl rand -base64 32)"
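Copy these values into paperless.env. Docker Compose substitutes ${PAPERLESS_DBPASS} in docker-compose.yml only from the shell environment or from a file named .env in the project folder, not from env_file entries, so expose the same file under that name:
ln -s paperless.env .env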
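Before starting the stack, it can help to confirm that no placeholder was left in the file. The following is a minimal sketch, assuming the key names and the replace_with_ placeholder prefix used in paperless.env above; check_env is a hypothetical helper name.

```shell
# check_env FILE: fail if a required key is missing, empty, or still a placeholder.
# (check_env is a hypothetical helper; the keys are those from paperless.env above.)
check_env() (
  file="$1"
  status=0
  for key in PAPERLESS_SECRET_KEY PAPERLESS_DBPASS PAPERLESS_URL; do
    # Keep the last assignment of the key, since later lines override earlier ones.
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
    case "$value" in
      "" | replace_with_*)
        echo "$key is missing or still a placeholder" >&2
        status=1
        ;;
    esac
  done
  exit "$status"
)

# Usage: check_env ~/paperless/paperless.env && echo "paperless.env looks complete"
```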
Step 5: The Paperless-ngx docker-compose.yml
nano docker-compose.yml
services:
redis:
image: docker.io/library/redis:7-alpine
container_name: paperless_redis
restart: unless-stopped
volumes:
- redisdata:/data
networks:
- paperless-internal
postgres:
image: docker.io/library/postgres:16-alpine
container_name: paperless_postgres
restart: unless-stopped
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: ${PAPERLESS_DBPASS}
env_file:
- paperless.env
volumes:
- pgdata:/var/lib/postgresql/data
networks:
- paperless-internal
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
container_name: paperless
restart: unless-stopped
depends_on:
- postgres
- redis
- gotenberg
- tika
env_file:
- paperless.env
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
networks:
- web
- paperless-internal
gotenberg:
image: docker.io/gotenberg/gotenberg:8
container_name: paperless_gotenberg
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-javascript=true"
- "--chromium-allow-list=file:///tmp/.*"
networks:
- paperless-internal
tika:
image: docker.io/apache/tika:latest
container_name: paperless_tika
restart: unless-stopped
networks:
- paperless-internal
volumes:
data:
media:
pgdata:
redisdata:
networks:
web:
external: true
paperless-internal:
The webserver service is on two networks: paperless-internal (database, Redis, Gotenberg, Tika) and web (for Caddy). The ./consume and ./export volumes are local folders: the first is the consume folder, the second receives exports and backups. The webserver has no ports: section, so it is reachable only through the reverse proxy.
Step 6: The Caddyfile for automatic HTTPS
If you already have a Caddy setup, add the site block. Otherwise, create a Caddy folder:
mkdir -p ~/caddy && cd ~/caddy
nano Caddyfile
{
email admin@example.com
}
docs.example.com {
encode gzip zstd
header {
Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
X-Content-Type-Options "nosniff"
Referrer-Policy "strict-origin-when-cross-origin"
-Server
}
# Paperless accepts large scanned PDF files
request_body {
max_size 200MB
}
reverse_proxy paperless:8000
}
The internal port of Paperless-ngx is 8000.
Then the Caddy docker-compose.yml:
nano docker-compose.yml
services:
caddy:
image: caddy:2-alpine
container_name: caddy
restart: unless-stopped
ports:
- "80:80"
- "443:443"
- "443:443/udp"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile:ro
- caddy_data:/data
- caddy_config:/config
networks:
- web
volumes:
caddy_data:
caddy_config:
networks:
web:
external: true
Step 7: Start services and create the admin account
Start Paperless (the first launch downloads several images and OCR models), then Caddy:
cd ~/paperless && docker compose up -d
cd ~/caddy && docker compose up -d
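The first start can take a while, so it may be convenient to wait until the healthcheck defined in docker-compose.yml reports the container as healthy. This is a minimal sketch: wait_healthy is a hypothetical helper, the container name paperless comes from the compose file above, and the retry count and delay are arbitrary defaults.

```shell
# wait_healthy CONTAINER [TRIES] [DELAY_SECONDS]: poll the Docker health status.
# (wait_healthy is a hypothetical helper; 30 tries and 10 s are arbitrary defaults.)
wait_healthy() (
  container="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  state=""
  while [ "$i" -lt "$tries" ]; do
    # .State.Health.Status is "starting", "healthy", or "unhealthy".
    state=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || state=""
    if [ "$state" = "healthy" ]; then
      echo "healthy"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out (last state: ${state:-unknown})" >&2
  exit 1
)

# Usage: wait_healthy paperless
```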
Create the superuser (administrator account) using the dedicated command:
cd ~/paperless
docker compose run --rm webserver createsuperuser
Enter a username, email, and strong password. Then visit https://docs.example.com and log in: the Paperless-ngx interface will appear.
Step 8: The consume folder
This is the key feature. Any file dropped into the ~/paperless/consume folder is automatically ingested, OCR’d, indexed, and then deleted from the folder (the original is kept in the internal media library). Test it:
# Copy a test PDF into the consume folder
cp ~/a-document.pdf ~/paperless/consume/
# Follow ingestion in the logs
docker compose -f ~/paperless/docker-compose.yml logs -f webserver
A few seconds later, the document appears in the interface, and its text is searchable.
To go further, you can mount this consume folder as a network share (Samba/NFS): your scanner or multifunction printer can then “scan to folder,” and every scan lands directly in Paperless. You can also automate the import of PDFs received by email via the application’s rules.
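When a script feeds the consume folder, one cautious pattern is to copy the file to a staging folder first and then rename it into place, so the file appears in the consume folder in a single step. This is a sketch under assumptions: drop_into_consume and the .consume-staging folder are hypothetical names, and the rename is only atomic when both folders are on the same filesystem.

```shell
# drop_into_consume SRC CONSUME_DIR: stage a copy next to the consume folder,
# then rename it into place. (Hypothetical helper and staging folder name.)
drop_into_consume() (
  src="$1"
  consume_dir="$2"
  staging="$(dirname "$consume_dir")/.consume-staging"
  mkdir -p "$staging" || exit 1
  name=$(basename "$src")
  cp "$src" "$staging/$name" || exit 1
  # mv is a plain rename when source and target share a filesystem.
  mv "$staging/$name" "$consume_dir/$name"
)

# Usage: drop_into_consume ~/a-document.pdf ~/paperless/consume
```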
Step 9: Setting up automatic filing
Paperless-ngx organizes documents by correspondents, document types, and tags. In Administration → Processing Rules, create rules that automatically apply a correspondent or tag based on OCR content (for example: any document containing “EDF” gets the correspondent “EDF” and the tag “Energy”). The more documents you process, the better the auto-suggestion engine becomes.
Step 10: Backups
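The most robust backup is the native Paperless-ngx export, which produces a complete archive (documents, database contents, and metadata) that can be re-imported. The script below runs the export, archives the result with a timestamp, and prunes archives older than 14 days: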
nano ~/paperless/backup.sh
#!/bin/bash
set -euo pipefail
cd "$HOME/paperless"
STAMP=$(date +%Y%m%d-%H%M%S)
# Native Paperless export (documents + metadata) into ./export
docker compose exec -T webserver document_exporter ../export -d
# Archive the dated export
tar -czf "$HOME/paperless/paperless-export-$STAMP.tar.gz" -C "$HOME/paperless/export" .
# Keep 14 days of archives
find "$HOME/paperless" -maxdepth 1 -name 'paperless-export-*.tar.gz' -mtime +14 -delete
chmod +x ~/paperless/backup.sh
(crontab -l 2>/dev/null; echo "0 3 * * * $HOME/paperless/backup.sh") | crontab -
Your administrative documents are irreplaceable: store encrypted copies of these archives off-site, as described in our guide automatic backup with Restic and Backblaze.
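A backup is only useful if it can be read back, so a quick integrity check on the newest archive is worth adding. The sketch below assumes the archive naming from backup.sh and that the export contains a manifest.json file, which the exporter normally writes; verify_latest_backup is a hypothetical helper name.

```shell
# verify_latest_backup DIR: check the newest paperless-export-*.tar.gz in DIR.
# (Hypothetical helper; assumes the file naming used by backup.sh above.)
verify_latest_backup() (
  dir="$1"
  latest=""
  # The glob expands in sorted order and the timestamps sort chronologically,
  # so the last match is the newest archive.
  for f in "$dir"/paperless-export-*.tar.gz; do
    if [ -e "$f" ]; then latest="$f"; fi
  done
  if [ -z "$latest" ]; then
    echo "no archive found in $dir" >&2
    exit 1
  fi
  # Listing the archive fails if it is truncated or corrupt.
  if ! listing=$(tar -tzf "$latest" 2>/dev/null); then
    echo "$latest is unreadable" >&2
    exit 1
  fi
  if ! printf '%s\n' "$listing" | grep -q 'manifest\.json$'; then
    echo "manifest.json missing from $latest" >&2
    exit 1
  fi
  echo "$latest"
)

# Usage: verify_latest_backup "$HOME/paperless"
```

On success the function prints the path of the archive it checked, which makes it easy to log from cron.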
Final verification
# Containers are running
docker compose -f ~/paperless/docker-compose.yml ps
# HTTP redirects to HTTPS (code 308)
curl -sI http://docs.example.com | head -1
# HTTPS responds
curl -sI https://docs.example.com | head -1
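To turn these manual checks into something scriptable, the status code can be extracted from the first header line and compared with the expected value. This is a small sketch; status_code is a hypothetical helper that only parses the output of curl -sI.

```shell
# status_code: read HTTP response headers on stdin and print the numeric status.
# (Hypothetical helper; it parses lines such as "HTTP/1.1 308 Permanent Redirect"
# or "HTTP/2 200".)
status_code() {
  tr -d '\r' | awk 'NR == 1 { print $2; exit }'
}

# Usage, once the stack is running:
#   [ "$(curl -sI http://docs.example.com | status_code)" = "308" ] && echo "redirect OK"
```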
Security and hardening
- Strict UFW firewall. Open only 80, 443, and SSH. Internal ports (PostgreSQL, Redis, Tika, Gotenberg) are never exposed.
- Strong admin password and, ideally, separate user accounts with permissions per document.
- Protected secret key. paperless.env contains PAPERLESS_SECRET_KEY and the database password: chmod 600 paperless.env, never commit it to version control.
- Controlled updates. Some versions migrate the database: do an export (step 10) first, then docker compose pull && docker compose up -d. See also install and secure an Ubuntu VPS.
Common pitfalls and troubleshooting
- French documents OCR'd as English. PAPERLESS_OCR_LANGUAGE does not contain fra. Set it to fra+eng and restart; for existing documents, re-process them.
- CSRF error on login. PAPERLESS_URL and PAPERLESS_CSRF_TRUSTED_ORIGINS do not match the public HTTPS URL. Correct them and restart.
- Consume folder not ingesting anything. Check permissions (USERMAP_UID/GID must match the folder owner) and logs. On a network share, set PAPERLESS_CONSUMER_POLLING (an interval in seconds), as inotify does not always work there.
- Out of memory during OCR. OCRing large PDFs is heavy: add RAM or swap, or limit parallel OCR tasks, for example with PAPERLESS_TASK_WORKERS.
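The permissions check for the consume folder can be sketched as a comparison between the folder owner and the USERMAP values in paperless.env. This assumes GNU stat (as on Ubuntu and Debian); check_consume_owner is a hypothetical helper name.

```shell
# check_consume_owner ENV_FILE CONSUME_DIR: compare USERMAP_UID/GID with the folder owner.
# (Hypothetical helper; stat -c is the GNU coreutils form.)
check_consume_owner() (
  env_file="$1"
  dir="$2"
  want_uid=$(sed -n 's/^USERMAP_UID=//p' "$env_file" | tail -n 1)
  want_gid=$(sed -n 's/^USERMAP_GID=//p' "$env_file" | tail -n 1)
  have_uid=$(stat -c '%u' "$dir")
  have_gid=$(stat -c '%g' "$dir")
  if [ "$want_uid" = "$have_uid" ] && [ "$want_gid" = "$have_gid" ]; then
    echo "ok"
  else
    echo "mismatch: folder is $have_uid:$have_gid, env file says $want_uid:$want_gid"
    exit 1
  fi
)

# Usage: check_consume_owner ~/paperless/paperless.env ~/paperless/consume
```

If it reports a mismatch, either chown the folder or adjust USERMAP_UID and USERMAP_GID, then restart the webserver container.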
FAQ
What is OCR and why is it essential here?
OCR (Optical Character Recognition) extracts text from a scanned document or image. This is what makes a scanned PDF searchable: without OCR, a scan is just an image, impossible to find by its content. Paperless-ngx automatically OCRs every ingested document, in the languages you configure, allowing you to find any paper by a word it contains.
How exactly does the consume folder work?
It is a watched folder: any file you drop into it (manually, via scanner, via network share) is automatically ingested, OCR’d, indexed, and filed according to your rules, then removed from the folder. It is the simplest way to feed your DMS: “scan to folder” and forget.
Can I scan directly from my multifunction printer to Paperless?
Yes. Mount the consume folder as a network share (Samba/NFS), then configure your scanner or multifunction printer to “scan to this folder.” Every scan is processed automatically. Remember to enable polling if inotify does not detect files on a network share.
What are Gotenberg and Tika for?
They extend supported formats: Tika extracts content from Office documents (Word, Excel…) and Gotenberg converts them to PDF for uniform archiving. Without them, Paperless handles PDFs and images perfectly, but not office files. Optional but recommended if you archive Office documents.
Are my documents stored in plain text on the server?
Yes, files are stored as-is in the media volume (no encryption at rest by default). Protection relies on disk encryption and encrypted off-site backups. For very sensitive documents, encrypt the underlying volume at the filesystem level.
Paperless-ngx, Mayan EDMS, or Docspell: which one to choose?
Paperless-ngx is the most popular and easiest to deploy for personal or family use; Mayan EDMS targets enterprise needs with advanced workflows; Docspell stands out in automatic processing and matching. For most homelabs, Paperless-ngx is the best starting point. Our comparison Paperless-ngx vs Mayan vs Docspell details the differences.
Related topics
- Paperless-ngx vs Mayan vs Docspell: which self-hosted DMS
- Automatic HTTPS reverse proxy with Caddy and Docker
- Install and secure an Ubuntu VPS from A to Z
- Automatic backup with Restic and Backblaze
You now have a complete DMS, with multilingual OCR, consume folder, and automatic filing, in HTTPS and backed up — your administrative papers finally under control. This is one of the services that reduces mental load the most for a household. To follow Paperless-ngx updates, filing tips, and best practices for self-hosting, subscribe to our Telegram watch bot.