🛠️ Tutorials · ⏱ 9 min read

Deploy Paperless-ngx with Docker in 2026: Complete Guide (OCR, Consume Folder, HTTPS)

Step-by-step 2026 tutorial to self-host Paperless-ngx, the open-source DMS, using Docker Compose. Features include multilingual OCR, consume folder, smart classification, HTTPS via Caddy and Let's Encrypt, backups, and hardening. Ready-to-use configs included.

By Selfhostr Team · independent tests
ⓘ This article may contain affiliate links (no extra cost to you, it supports our tests). See the disclosure.
Min RAM: 2 GB
Recommended RAM: 4 GB
OCR languages: fra+eng
Deployment: Docker Compose

👍 What we like

  • Automatic OCR for scanned PDFs and images
  • Self-hosted with no subscription fees
  • Organizes documents by correspondent, type, and date
  • Runs on VPS, NAS, or mini-PC

👎 What to watch

  • OCR is resource-intensive, requiring at least 2 GB RAM
  • Requires manual setup of Docker, DNS, and reverse proxy
  • Complex configuration with multiple services (PostgreSQL, Redis, Tika)

Are you drowning in paper or PDF invoices, contracts, payslips, and administrative mail scattered everywhere? Paperless-ngx turns this chaos into a clean, searchable, and backed-up document library. It is an open-source DMS (Document Management System) that OCRs your documents automatically — it recognizes text in scanned PDFs and images — and then categorizes them by correspondent, type, and date. You drop in a scan, and a few seconds later it is indexed, searchable, and filed according to your rules.

All of this runs on your server, with no subscription. In this tutorial, we deploy Paperless-ngx via Docker Compose with OCR and its consume folder (a watched folder where any document you drop is ingested automatically), behind a Caddy reverse proxy over HTTPS.

Prerequisites

  • A server or VPS running Ubuntu 24.04 (or Debian 12) with 2 GB of RAM minimum (OCR is resource-intensive; 4 GB is more comfortable for large volumes). A VPS from Hetzner or Infomaniak works well, but Paperless-ngx also runs very well on a NAS or a mini-PC at home.
  • Docker and Docker Compose installed (command in step 1).
  • A domain name for which you control the DNS. We will use docs.example.com.
  • Ports 80 and 443 open for Let’s Encrypt and HTTPS traffic.

Step 1: Install Docker and Docker Compose

sudo apt update
sudo apt install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER

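These commands target Ubuntu; on Debian 12, replace linux/ubuntu with linux/debian in both download.docker.com URLs. Log out and log back in so the docker group membership takes effect, then verify: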

docker --version && docker compose version

Step 2: Configure DNS

At your DNS provider, create an A record (and AAAA if IPv6):

Type | Name             | Value
A    | docs.example.com | 203.0.113.10

Verify propagation:

dig +short docs.example.com
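
To compare the answer with the address you expect, a small helper can do the matching for you. This is a minimal sketch: the function name check_dns is hypothetical, the IP is the example value from the table above, and it assumes dig is installed (package dnsutils on Debian and Ubuntu).

```shell
# Hypothetical helper (bash): succeed only if DOMAIN resolves to EXPECTED_IP.
check_dns() {
  local domain="$1" expected_ip="$2"
  # dig may print several lines (CNAME chain, multiple A records);
  # -F -x matches the expected IP as one exact, literal line.
  if dig +short "$domain" | grep -Fxq "$expected_ip"; then
    echo "OK: $domain -> $expected_ip"
  else
    echo "Not propagated yet: $domain does not resolve to $expected_ip" >&2
    return 1
  fi
}

# Usage (example values from this tutorial):
# check_dns docs.example.com 203.0.113.10
```

If the check fails right after you create the record, wait a few minutes and retry before touching the configuration: Caddy cannot obtain a certificate until the name resolves to your server.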

Step 3: Create the shared Docker network

docker network create web

If you have already followed our Caddy reverse proxy tutorial, this network may already exist; Docker then reports an error saying so, which you can safely ignore.
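
If you prefer a command that can be re-run without an error, the creation can be made conditional. A minimal sketch, assuming the network name web used in this tutorial; ensure_network is a hypothetical helper name.

```shell
# Hypothetical helper (bash): create the shared network only if it is missing.
ensure_network() {
  local name="$1"
  # "docker network inspect" exits non-zero when the network does not exist.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null
    echo "network $name created"
  fi
}

# Usage:
# ensure_network web
```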

Step 4: Prepare the folder and .env file

Paperless-ngx relies on five containers: the web application, a PostgreSQL database, Redis (task queue), and Gotenberg and Tika to convert and read certain formats. Configuration is handled via an environment file, paperless.env.

mkdir -p ~/paperless && cd ~/paperless
nano paperless.env
# --- Security ---
# Django secret key (generated in the next step)
PAPERLESS_SECRET_KEY=replace_with_generated_key

# --- Public URL (essential behind an HTTPS reverse proxy) ---
PAPERLESS_URL=https://docs.example.com
PAPERLESS_CSRF_TRUSTED_ORIGINS=https://docs.example.com

# --- OCR: recognized languages (fra = French, eng = English) ---
PAPERLESS_OCR_LANGUAGE=fra+eng

# --- PostgreSQL Database ---
PAPERLESS_DBHOST=postgres
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=replace_with_a_long_password

# --- Redis ---
PAPERLESS_REDIS=redis://redis:6379

# --- Tika / Gotenberg for Office documents ---
PAPERLESS_TIKA_ENABLED=1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT=http://tika:9998

# --- Miscellaneous ---
PAPERLESS_TIME_ZONE=Europe/Paris
# Also watch subfolders of the consume folder
PAPERLESS_CONSUMER_RECURSIVE=true
USERMAP_UID=1000
USERMAP_GID=1000

Generate the secret key and a database password:

echo "PAPERLESS_SECRET_KEY : $(openssl rand -base64 48)"
echo "PAPERLESS_DBPASS : $(openssl rand -base64 32)"

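Copy these values into paperless.env. Compose substitutes ${PAPERLESS_DBPASS} in docker-compose.yml (step 5) only from the shell environment or from a file named .env in the project folder, not from env_file entries. Link the two names, otherwise PostgreSQL receives an empty password and refuses to initialize:

ln -s paperless.env .env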
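
Copying by hand works, but the values can also be written into the file directly. The sketch below is one way to do it, not the only one: set_env_value is a hypothetical helper, and it assumes GNU sed (the default on Ubuntu and Debian) and values made of base64 characters only.

```shell
# Hypothetical helper (bash): replace the value of KEY in an env file, in place.
# Safe for base64 values: they never contain '|', '&' or '\', the only
# characters that would be special in this sed expression.
set_env_value() {
  local file="$1" key="$2" value="$3"
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Usage, from ~/paperless:
# set_env_value paperless.env PAPERLESS_SECRET_KEY "$(openssl rand -base64 48)"
# set_env_value paperless.env PAPERLESS_DBPASS "$(openssl rand -base64 32)"
# chmod 600 paperless.env
```

Writing the values this way keeps them out of your terminal scrollback, which the echo commands above do not.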

Step 5: The Paperless-ngx docker-compose.yml

nano docker-compose.yml
services:
  redis:
    image: docker.io/library/redis:7-alpine
    container_name: paperless_redis
    restart: unless-stopped
    volumes:
      - redisdata:/data
    networks:
      - paperless-internal

  postgres:
    image: docker.io/library/postgres:16-alpine
    container_name: paperless_postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: ${PAPERLESS_DBPASS}
    env_file:
      - paperless.env
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - paperless-internal

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    depends_on:
      - postgres
      - redis
      - gotenberg
      - tika
    env_file:
      - paperless.env
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    networks:
      - web
      - paperless-internal

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8
    container_name: paperless_gotenberg
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless-internal

  tika:
    image: docker.io/apache/tika:latest
    container_name: paperless_tika
    restart: unless-stopped
    networks:
      - paperless-internal

volumes:
  data:
  media:
  pgdata:
  redisdata:

networks:
  web:
    external: true
  paperless-internal:

The webserver service is on two networks: paperless-internal (database, Redis, Gotenberg, Tika) and web (for Caddy). The ./consume and ./export volumes are local folders: the first is the consume folder, the second is used for exports/backups. No ports: section on the webserver: everything goes through the reverse proxy.
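
Before the first start, it can be worth checking that no placeholder is left in the environment file and that the Compose file parses. A minimal sketch under those assumptions; preflight is a hypothetical name, and the replace_with_ prefix is the placeholder convention used in step 4.

```shell
# Hypothetical pre-flight check (bash), run from the folder holding both files.
preflight() {
  local env_file="${1:-paperless.env}"
  # 1. No "replace_with_..." placeholder may remain in the env file.
  if grep -q 'replace_with_' "$env_file"; then
    echo "placeholders left in $env_file" >&2
    return 1
  fi
  # 2. docker-compose.yml must parse; --quiet validates without printing it.
  docker compose config --quiet || return 1
  echo "preflight OK"
}

# Usage:
# cd ~/paperless && preflight
```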

Step 6: The Caddyfile for automatic HTTPS

If you already have a Caddy setup, add the site block. Otherwise, create a Caddy folder:

mkdir -p ~/caddy && cd ~/caddy
nano Caddyfile
{
    email admin@example.com
}

docs.example.com {
    encode gzip zstd

    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        Referrer-Policy "strict-origin-when-cross-origin"
        -Server
    }

    # Paperless accepts large scanned PDF files
    request_body {
        max_size 200MB
    }

    reverse_proxy paperless:8000
}

The internal port of Paperless-ngx is 8000.

Then the Caddy docker-compose.yml:

nano docker-compose.yml
services:
  caddy:
    image: caddy:2-alpine
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    networks:
      - web

volumes:
  caddy_data:
  caddy_config:

networks:
  web:
    external: true
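
A syntax error in the Caddyfile only shows up when the container starts, so validating it first can save a restart loop. The sketch below runs caddy validate in a throwaway container of the same image; validate_caddyfile is a hypothetical helper name, and the path must be absolute for the bind mount to work.

```shell
# Hypothetical helper (bash): validate a Caddyfile with the caddy:2-alpine image.
validate_caddyfile() {
  local caddyfile="${1:-$PWD/Caddyfile}"
  # --rm removes the temporary container; :ro mounts the file read-only.
  docker run --rm \
    -v "$caddyfile:/etc/caddy/Caddyfile:ro" \
    caddy:2-alpine \
    caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
}

# Usage:
# cd ~/caddy && validate_caddyfile
```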

Step 7: Start services and create the admin account

Start Paperless (the first launch downloads several images and OCR models), then Caddy:

cd ~/paperless && docker compose up -d
cd ~/caddy && docker compose up -d

Create the superuser (administrator account) using the dedicated command:

cd ~/paperless
docker compose run --rm webserver createsuperuser

Enter a username, email, and strong password. Then visit https://docs.example.com and log in: the Paperless-ngx interface will appear.
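
If the page does not load right away, the application may still be starting (database migrations run on first launch). Since the webserver service defines a healthcheck, you can poll it rather than guess. A minimal sketch; wait_healthy is a hypothetical helper, and paperless is the container_name from step 5.

```shell
# Hypothetical helper (bash): poll a container's health status until "healthy".
wait_healthy() {
  local container="$1" timeout="${2:-300}" interval="${3:-5}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    # Reads the status reported by the container's healthcheck.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$container not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}

# Usage:
# wait_healthy paperless 300
```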

Step 8: The consume folder

This is the key feature. Any file dropped into the ~/paperless/consume folder is automatically ingested, OCR’d, indexed, and then deleted from the folder (the original is kept in the internal media library). Test it:

# Copy a test PDF into the consume folder
cp ~/a-document.pdf ~/paperless/consume/
# Follow ingestion in the logs
docker compose -f ~/paperless/docker-compose.yml logs -f webserver

A few seconds later, the document appears in the interface, and its text is searchable.
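
Because consumed files are removed from the folder, the disappearance of the file is a simple signal that it was picked up. The sketch below waits for that; wait_consumed is a hypothetical helper, and the logs remain the reference if a document fails to import.

```shell
# Hypothetical helper (bash): wait until a file dropped in the consume folder
# is gone, which suggests Paperless-ngx has picked it up.
wait_consumed() {
  local file="$1" timeout="${2:-120}" waited=0
  while [ -e "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still waiting after ${timeout}s: $file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "consumed: $file"
}

# Usage:
# cp ~/a-document.pdf ~/paperless/consume/
# wait_consumed ~/paperless/consume/a-document.pdf
```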

To go further, you can mount this consume folder as a network share (Samba/NFS): your scanner or multifunction printer can then “scan to folder,” and every scan lands directly in Paperless. You can also automate the import of PDFs received by email via the application’s rules.

Step 9: Setting up automatic filing

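Paperless-ngx organizes documents by correspondents, document types, and tags. Each of these has its own matching settings: edit a correspondent, document type, or tag and choose a matching algorithm (any word, all words, exact, regular expression, fuzzy, or auto) so that it is assigned automatically based on OCR content. For example, a correspondent “EDF” and a tag “Energy” that both match the word “EDF” are applied to any document containing it. For rules that depend on the source or the file name, use Workflows. With the auto algorithm, suggestions improve as you file more documents.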

Step 10: Backups

The most robust approach is Paperless-ngx’s native export, which produces a complete archive (documents plus the database content and metadata, serialized to a manifest) that can be re-imported with document_importer:

nano ~/paperless/backup.sh
#!/bin/bash
set -euo pipefail
cd "$HOME/paperless"
STAMP=$(date +%Y%m%d-%H%M%S)

# Native Paperless export (documents + metadata) into ./export
docker compose exec -T webserver document_exporter ../export -d

# Archive the dated export
tar -czf "$HOME/paperless/paperless-export-$STAMP.tar.gz" -C "$HOME/paperless/export" .

# Keep 14 days of archives
find "$HOME/paperless" -maxdepth 1 -name 'paperless-export-*.tar.gz' -mtime +14 -delete

Make the script executable and schedule it every night at 3 a.m.:

chmod +x ~/paperless/backup.sh
(crontab -l 2>/dev/null; echo "0 3 * * * $HOME/paperless/backup.sh") | crontab -

Your administrative documents are irreplaceable: send these archives off-site, encrypted, following our guide to automatic backups with Restic and Backblaze.
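
A backup is only useful if it can be read back. As a cheap sanity check, you can confirm that each archive opens and contains manifest.json, the file document_exporter writes at the root of the export. A minimal sketch; verify_backup is a hypothetical helper name.

```shell
# Hypothetical helper (bash): check that an export archive is readable and
# contains manifest.json at its root.
verify_backup() {
  local archive="$1" listing
  # tar -tzf lists entries without extracting; a corrupt archive makes it fail.
  listing="$(tar -tzf "$archive")" || { echo "unreadable: $archive" >&2; return 1; }
  # backup.sh archives "." so entries are prefixed with "./".
  if printf '%s\n' "$listing" | grep -qE '^(\./)?manifest\.json$'; then
    echo "OK: $archive"
  else
    echo "manifest.json missing in $archive" >&2
    return 1
  fi
}

# Usage:
# verify_backup ~/paperless/paperless-export-20260101-030000.tar.gz
```

This does not replace a real restore test: importing an export into a fresh instance with document_importer at least once is the only way to be sure.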

Final verification

# Containers are running
docker compose -f ~/paperless/docker-compose.yml ps

# HTTP redirects to HTTPS (code 308)
curl -sI http://docs.example.com | head -1

# HTTPS responds
curl -sI https://docs.example.com | head -1
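
The two curl checks can be folded into one command that fails when something is off, which is handy in a monitoring script. A minimal sketch: http_status and verify_site are hypothetical names, and accepting any 2xx or 3xx answer over HTTPS is an assumption (the application typically redirects to its login page).

```shell
# Hypothetical helper (bash): print only the HTTP status code for a URL.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical check: plain HTTP must answer 308 (Caddy's redirect to HTTPS),
# and HTTPS must answer with a 2xx or 3xx status.
verify_site() {
  local domain="$1" plain secure
  plain="$(http_status "http://$domain")"
  secure="$(http_status "https://$domain")"
  echo "http=$plain https=$secure"
  [ "$plain" = "308" ] || return 1
  case "$secure" in
    2*|3*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# verify_site docs.example.com
```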

Security and hardening

  • Strict UFW firewall. Open only 80/tcp, 443/tcp, 443/udp (HTTP/3), and SSH. Internal ports (PostgreSQL, Redis, Tika, Gotenberg) are never exposed.
  • Strong admin password and, ideally, separate user accounts with permissions per document.
  • Protected secret key. paperless.env contains PAPERLESS_SECRET_KEY and the database password: chmod 600 paperless.env, never commit it to version control.
  • Controlled updates. Some versions migrate the database: run an export (step 10) first, then docker compose pull && docker compose up -d. See also our guide to installing and securing an Ubuntu VPS.
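
The firewall policy from the first point can be sketched as follows. It assumes SSH listens on the default port (the OpenSSH profile) and that UFW is installed; apply_firewall is a hypothetical wrapper name. Note that Docker publishes ports through its own iptables rules, so UFW does not filter ports listed under ports: in a Compose file. In this setup only Caddy publishes any, which is why the other services stay unreachable.

```shell
# Sketch of the firewall policy described above (bash); review before running.
apply_firewall() {
  sudo ufw default deny incoming
  sudo ufw default allow outgoing
  sudo ufw allow OpenSSH      # assumes SSH on the default port 22
  sudo ufw allow 80/tcp       # HTTP: Let's Encrypt challenge and redirect
  sudo ufw allow 443/tcp      # HTTPS
  sudo ufw allow 443/udp      # HTTP/3 (QUIC), published by the Caddy service
  sudo ufw --force enable     # --force skips the interactive confirmation
}

# Usage:
# apply_firewall && sudo ufw status verbose
```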

Common pitfalls and troubleshooting

  • OCR in English on French documents. PAPERLESS_OCR_LANGUAGE does not contain fra. Set it to fra+eng and restart; for existing documents, re-process them.
  • CSRF error on login. PAPERLESS_URL and PAPERLESS_CSRF_TRUSTED_ORIGINS do not match the public HTTPS URL. Correct them and restart.
  • Consume folder not ingesting anything. Check permissions (USERMAP_UID/GID must match the folder owner) and logs. On a network share, set PAPERLESS_CONSUMER_POLLING to an interval in seconds (for example 60), as inotify does not always work there.
  • Lack of RAM during OCR. OCRing large PDFs is heavy: add RAM/swap, or limit parallel OCR tasks.
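
The last two fixes come down to a few extra lines in paperless.env. The sketch below appends them; the values are assumptions to adjust, append_tuning is a hypothetical helper, and the variable names are those documented for Paperless-ngx 2.x, so check the configuration reference for your version.

```shell
# Hypothetical helper (bash): append low-memory and network-share settings
# to an env file.
append_tuning() {
  local env_file="${1:-paperless.env}"
  cat >> "$env_file" <<'EOF'

# --- Tuning (assumed values, adjust to your hardware) ---
# Poll the consume folder every 60 s instead of relying on inotify
PAPERLESS_CONSUMER_POLLING=60
# One background worker, one OCR thread per worker: slower, lower peak RAM
PAPERLESS_TASK_WORKERS=1
PAPERLESS_THREADS_PER_WORKER=1
EOF
}

# Usage:
# cd ~/paperless && append_tuning && docker compose up -d
```

Running docker compose up -d afterwards recreates the webserver container so that it reads the new values.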

FAQ

What is OCR and why is it essential here?

OCR (Optical Character Recognition) extracts text from a scanned document or image. This is what makes a scanned PDF searchable: without OCR, a scan is just an image, impossible to find by its content. Paperless-ngx automatically OCRs every ingested document, in the languages you configure, allowing you to find any paper by a word it contains.

How exactly does the consume folder work?

It is a watched folder: any file you drop into it (manually, via scanner, via network share) is automatically ingested, OCR’d, indexed, and filed according to your rules, then removed from the folder. It is the simplest way to feed your DMS: “scan to folder” and forget.

Can I scan directly from my multifunction printer to Paperless?

Yes. Mount the consume folder as a network share (Samba/NFS), then configure your scanner or multifunction printer to “scan to this folder.” Every scan is processed automatically. Remember to enable polling if inotify does not detect files on a network share.

What are Gotenberg and Tika for?

They extend supported formats: Tika extracts content from Office documents (Word, Excel…) and Gotenberg converts them to PDF for uniform archiving. Without them, Paperless handles PDFs and images perfectly, but not office files. Optional but recommended if you archive Office documents.

Are my documents stored in plain text on the server?

Yes, files are stored as-is in the media volume (no encryption at rest by default). Protection relies on disk encryption and encrypted off-site backups. For very sensitive documents, encrypt the underlying volume at the filesystem level.

Paperless-ngx, Mayan EDMS, or Docspell: which one to choose?

Paperless-ngx is the most popular and easiest to deploy for personal or family use; Mayan EDMS targets enterprise needs with advanced workflows; Docspell stands out in automatic processing and matching. For most homelabs, Paperless-ngx is the best starting point. Our comparison Paperless-ngx vs Mayan vs Docspell details the differences.

You now have a complete DMS, with multilingual OCR, consume folder, and automatic filing, in HTTPS and backed up — your administrative papers finally under control. This is one of the services that reduces mental load the most for a household. To follow Paperless-ngx updates, filing tips, and best practices for self-hosting, subscribe to our Telegram watch bot.

Tags: Paperless-ngx, Docker, Docker Compose, OCR, Caddy, Let's Encrypt, Self-hosting, Open Source, Document Management, DevOps
