Keycloak Docker Scaleout
Docker Compose Configuration Details
This project uses several Docker Compose features for production-grade deployments:
Logging Configuration
All services use the json-file logging driver with the following settings:
- Max Size: 10MB per log file
- Max Files: 3 files (rotation)
This prevents log files from consuming excessive disk space. View logs with:
docker logs keycloak-proxy
docker logs postgres
docker logs keycloak
docker logs keycloak-db-backup
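As a rough sizing aid, the rotation settings above put an upper bound on the disk space logs can take. The sketch below assumes the four containers listed above; log_budget_mb and show_log_config are hypothetical helper names, and show_log_config only reads back the driver settings through docker inspect when you call it.

```shell
#!/bin/sh
# Upper bound on log disk usage: containers x files per container x MB per file.
log_budget_mb() {
  containers=$1 max_files=$2 max_size_mb=$3
  echo $((containers * max_files * max_size_mb))
}

# Print the logging driver and its options for one container (requires Docker).
show_log_config() {
  docker inspect -f '{{.HostConfig.LogConfig.Type}} {{json .HostConfig.LogConfig.Config}}' "$1"
}

# Four containers, 3 files each, 10 MB per file
log_budget_mb 4 3 10   # prints 120
```

Running show_log_config keycloak on the host should show json-file together with the max-size and max-file options if the settings were applied.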
Restart Policies
All services are configured with restart: always which ensures:
- Containers automatically restart if they crash
- Containers restart after system reboot
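- Services come back without manual intervention (the restart policy itself does not enforce start order)
View container status with docker compose ps, and a container's restart count with:
docker inspect -f '{{.RestartCount}}' keycloak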
Health Checks
PostgreSQL uses the pg_isready command to verify the database is accepting connections. Keycloak waits for PostgreSQL to pass its health check before starting.
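For scripts that need to wait on the same health signal from the host, a small polling loop is usually enough. This is a minimal sketch, not part of the repository: wait_until and container_healthy are hypothetical helpers, and it assumes the container defines a health check so that docker inspect reports a State.Health.Status value.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...: retry COMMAND until it succeeds.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# True when Docker reports the named container's health check as passing.
container_healthy() {
  [ "$(docker inspect -f '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example, wait_until 30 2 container_healthy postgres && echo "postgres is ready" polls for up to about a minute before giving up.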
A production-ready solution for deploying Keycloak with PostgreSQL using Docker Compose, designed for seamless transition from a single VPS to a multi-region cluster architecture.
Architecture Overview
This setup uses:
- Caddy as a reverse proxy with automatic TLS certificate management via Let's Encrypt
- Keycloak 26 with a multi-stage Docker build for optimized production deployment
- PostgreSQL 17 with physical streaming replication support
- Sidecar backup container for automated database backups
- JDBC-based clustering for automatic node discovery
- HTTP enabled on port 8080 (management health on port 9000)
- Production optimizations via the --optimized flag
- Separate management port for health checks (KC_HTTP_MANAGEMENT_HEALTH_ENABLED=true)
Prerequisites
Before you begin, ensure you have:
- Domain Name: A domain name pointing to your primary VPS IP address
- VPS Requirements:
- Linux-based VPS (Ubuntu 22.04+ or similar)
- At least 4GB RAM (8GB recommended for production)
- Docker and Docker Compose installed
- Network Access: Ports 1080 and 1443 must be open for Caddy (or configure your firewall for the ports you choose)
Preparation
Step 1: Directory Setup
Put all files in a single directory on your primary VPS:
mkdir -p ~/keycloak-cluster
cd ~/keycloak-cluster
# Copy all files from this repository into this directory
Note: This setup stores Caddy data and config in a local directory (./caddy_volume/), and uses local directories for PostgreSQL data (postgres_data) and backups (backups).
Step 2: Domain Configuration
Ensure your domain name is already pointing to your VPS IP address:
# Verify DNS is working
nslookup your-auth-domain.com
# or
dig your-auth-domain.com
Let's Encrypt validation requires the domain to resolve correctly before certificate issuance.
Step 3: Environment Configuration
Copy and configure the sample environment file:
cp sample.env .env
nano .env # or use your preferred editor
Configure the following variables:
| Variable | Description | Example |
|---|---|---|
| DOMAIN | Your Keycloak hostname | auth.example.com |
| LETSENCRYPT_EMAIL | Email for Let's Encrypt certificates | admin@example.com |
| POSTGRES_DB | Database name for Keycloak | keycloak |
| POSTGRES_USER | Database user for Keycloak | keycloak |
| POSTGRES_PASSWORD | Password for the database user | Secure password |
| REPLICATION_USER | User for PostgreSQL replication | replicator |
| REPLICATION_PASSWORD | Password for replication user | Secure password |
| KEYCLOAK_ADMIN | Keycloak admin username | admin |
| KEYCLOAK_ADMIN_PASSWORD | Keycloak admin password | Secure password |
| KC_DB_PASSWORD | Database password for Keycloak | Same as POSTGRES_PASSWORD |
| BACKUP_SCHEDULE | Cron schedule for backups | @daily |
| BACKUP_KEEP_DAYS | Number of days to retain backups | 7 |
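Before the first start it can help to confirm that every variable in the table is present and that KC_DB_PASSWORD matches POSTGRES_PASSWORD. The sketch below is an illustration only: get_env_value and check_env are hypothetical helpers, and it assumes .env holds plain KEY=value lines without quoting or export statements.

```shell
#!/bin/sh
# get_env_value FILE KEY: print the value of KEY from a plain KEY=value file.
get_env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# check_env FILE: report missing variables and a password mismatch.
# Returns 0 when everything is present and consistent, 1 otherwise.
check_env() {
  file=$1
  status=0
  for key in DOMAIN LETSENCRYPT_EMAIL POSTGRES_DB POSTGRES_USER \
             POSTGRES_PASSWORD REPLICATION_USER REPLICATION_PASSWORD \
             KEYCLOAK_ADMIN KEYCLOAK_ADMIN_PASSWORD KC_DB_PASSWORD \
             BACKUP_SCHEDULE BACKUP_KEEP_DAYS; do
    if [ -z "$(get_env_value "$file" "$key")" ]; then
      echo "missing: $key"
      status=1
    fi
  done
  if [ "$(get_env_value "$file" POSTGRES_PASSWORD)" != "$(get_env_value "$file" KC_DB_PASSWORD)" ]; then
    echo "mismatch: KC_DB_PASSWORD differs from POSTGRES_PASSWORD"
    status=1
  fi
  return "$status"
}
```

Calling check_env .env prints nothing and returns 0 when the file is complete.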
Service Ports
| Service | Internal Port | External Port | Purpose |
|---|---|---|---|
| Caddy (HTTP) | 80 | 1080 | Web proxy for Keycloak |
| Caddy (HTTPS) | 443 | 1443 | TLS-terminated web proxy |
| PostgreSQL | 5432 | - | Internal database access |
| Keycloak | 8080 | - | Internal application server |
| Keycloak Management | 9000 | - | Internal health checks |
Deployment
Step 1: Start the Initial Cluster
Run Docker Compose to deploy all services:
docker compose up -d
This will:
- Build the optimized Keycloak Docker image
- Start PostgreSQL with replication-ready configuration
- Initialize the database with replication user and slot
- Start Keycloak connected to the database with HTTP and management health endpoints enabled
- Start Caddy on ports 1080 (HTTP) and 1443 (HTTPS) which automatically requests TLS certificates
- Start the backup container
Note: Caddy is configured to use ports 1080 and 1443 instead of the standard 80 and 443. Adjust your firewall rules accordingly. The Caddy container image has been updated to caddy:latest for the most recent features and security patches.
Step 2: Verify Deployment
Check that all services are running:
docker compose ps
Expected output shows all containers as "Up":
NAME IMAGE STATUS
keycloak-proxy caddy:latest Up
postgres postgres:17-alpine Up
keycloak keycloak-docker-scaleout Up
keycloak-db-backup prodrigestivill/... Up
Note: All containers are configured with restart: always to ensure automatic recovery after failures or system reboots.
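The same check can be scripted. The sketch below reads a list of running container names on standard input and prints any of the four expected names that are absent; missing_containers is a hypothetical helper, and the names are the ones shown in the expected output above.

```shell
#!/bin/sh
# Read running container names (one per line) on stdin and print each
# expected container that is not in the list.
missing_containers() {
  running=$(cat)
  for name in keycloak-proxy postgres keycloak keycloak-db-backup; do
    printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
  done
}
```

Feed it the live list with docker ps --format '{{.Names}}' | missing_containers; empty output means all four containers are running.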
Step 3: Access Keycloak
Wait 1-2 minutes for Let's Encrypt certificates to be issued, then access:
https://your-domain.com:1443
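Note: Because Caddy listens on port 1443 instead of 443, include the port in the URL unless a firewall rule or upstream proxy redirects 443 to 1443. Let's Encrypt's HTTP-01 and TLS-ALPN-01 challenges arrive on public ports 80 and 443, so certificate issuance also depends on those ports being forwarded to 1080 and 1443.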
You should see the Keycloak login page with a valid HTTPS certificate.
Database Replication Setup
Understanding the Replication Strategy
For scaling to multiple regions, PostgreSQL uses physical streaming replication where:
- The primary VPS (initial deployment) acts as the primary server
- Additional VPS instances act as standby servers that stream changes
- Read queries can be distributed to standbys, writes go to primary
Step 1: Prepare the Primary Database
The initial deployment automatically:
- Creates a replication user (REPLICATION_USER)
- Creates a physical replication slot
- Updates pg_hba.conf to allow replication connections
Verify the replication slot exists:
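# Connect as your POSTGRES_USER (keycloak in the example .env); the postgres image
# does not create a separate "postgres" role when POSTGRES_USER is set
docker exec postgres psql -U keycloak -d keycloak -c "SELECT * FROM pg_replication_slots;"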
Step 2: Create Standby Nodes
On each secondary VPS (future standby server):
- Install Docker and Docker Compose
- Create a directory and copy the necessary files (Dockerfile is not needed)
- Create a standby-compose.yml file:
version: '3.8'
services:
  postgres-standby:
    image: postgres:17-alpine
    container_name: keycloak-db-standby
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: your_secure_password
    command: >
      postgres -c wal_level=replica
      -c max_wal_senders=10
      -c max_replication_slots=10
      -c hot_standby=on
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    networks:
      - keycloak-net
networks:
  keycloak-net:
    driver: bridge
- Clone the primary and run the standby database:
# Stop and remove any existing container
docker compose -f standby-compose.yml down
# Clone the primary database using pg_basebackup (./postgres_data must be empty).
# Replace <primary-vps-ip> and <replication-password> with your own values;
# the primary must accept connections on port 5432 from this host.
docker run --rm \
  -e PGPASSWORD=<replication-password> \
  -v "$(pwd)/postgres_data:/var/lib/postgresql/data" \
  postgres:17-alpine \
  pg_basebackup -h <primary-vps-ip> -U replicator -D /var/lib/postgresql/data -P -Xs -R --slot=replication_slot_primary
# Start the standby database
docker compose -f standby-compose.yml up -d
The -R flag creates a standby.signal file and postgresql.auto.conf with connection info.
- Verify replication is working by querying the primary (pg_stat_replication is populated on the sending side):
docker exec postgres psql -U keycloak -d keycloak -c "SELECT * FROM pg_stat_replication;"
The standby connection should be listed.
Scaling Out Keycloak Nodes
Step 1: Deploy Additional Keycloak Instances
On each additional VPS where you want to run Keycloak:
- Copy the Dockerfile and create a keycloak-compose.yml:
version: '3.8'
services:
  keycloak:
    build: .
    container_name: keycloak
    command: start --optimized
    environment:
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://<primary-vps-ip>:5432/keycloak
      KC_DB_USERNAME: keycloak
      KC_DB_PASSWORD: your_db_password
      KC_HOSTNAME: your-cluster-domain.com
      KC_PROXY_HEADERS: xforwarded
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: your_admin_password
      # Clustering configuration: distributed caches with jdbc-ping discovery
      KC_CACHE: ispn
      KC_CACHE_STACK: jdbc-ping
    networks:
      - keycloak-net
    restart: unless-stopped
networks:
  keycloak-net:
    driver: bridge
- Start the additional Keycloak node:
docker compose -f keycloak-compose.yml up -d
Step 2: Verify Clustering
Keycloak 26 uses JDBC-based clustering with jdbc-ping for automatic node discovery. All nodes connect to the same PostgreSQL database and automatically discover each other.
Check that nodes are discovering each other by viewing the logs:
docker logs keycloak | grep -i cluster
You should see messages about cluster membership and node discovery.
Step 3: Add Standby Nodes to Caddy Load Balancing
Update your primary Caddyfile to include all Keycloak nodes:
{$DOMAIN} {
    reverse_proxy keycloak:8080 your-secondary-node-ip:8080 {
        header_up X-Forwarded-Proto {scheme}
        lb_policy cookie
        # Health endpoints are served on the management port, not on 8080
        health_uri /health/live
        health_port 9000
        health_interval 10s
        health_status 200
    }
}
Reload Caddy configuration:
docker exec keycloak-proxy caddy reload --config /etc/caddy/Caddyfile
Note: When scaling to multiple Keycloak nodes, each node should be configured with the same KC_DB_URL pointing to your primary PostgreSQL database.
Backup and Recovery
Automated Backups
The pg-backup container runs on a schedule defined in .env:
# View backup logs
docker logs keycloak-db-backup
# List backups
ls -la backups/
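To confirm that retention is actually pruning old dumps, you can list files older than BACKUP_KEEP_DAYS. This is a minimal sketch: stale_backups is a hypothetical helper, and it makes no assumption about how the backup container names or nests files under backups/.

```shell
#!/bin/sh
# stale_backups DIR DAYS: list files under DIR whose age exceeds DAYS.
# find rounds file age down to whole days, so +7 matches files that are
# at least 8 days old.
stale_backups() {
  find "$1" -type f -mtime +"$2"
}
```

With the example settings, stale_backups backups 7 should normally print nothing; any output points at files the retention job has not removed.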
Manual Backup
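Trigger a backup on demand. Run pg_dump inside the postgres container, and set POSTGRES_USER and POSTGRES_DB in your shell to the values from .env first:
docker exec postgres pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" > backup.sql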
Restore from Backup
# Stop Keycloak so nothing writes during the restore (PostgreSQL must stay running)
docker stop keycloak
# Restore the backup into the running 'postgres' container
docker exec -i postgres psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" < backup.sql
# Start services
docker compose up -d
Monitoring and Health Checks
Keycloak Health Endpoints
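Keycloak serves health checks on the management port (9000), which is not published outside the Docker network:
- Liveness: /health/live
- Readiness: /health/ready
- Metrics: /metrics (available only when metrics are enabled)
These paths are reachable under https://your-domain.com only if the Caddyfile forwards them to the management port.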
Database Health
Check PostgreSQL replication status on primary:
docker exec postgres psql -U keycloak -d keycloak -c "SELECT * FROM pg_stat_replication;"
Check if standby is in recovery mode (run on the standby host):
docker exec keycloak-db-standby psql -U keycloak -d keycloak -c "SELECT pg_is_in_recovery();"
Troubleshooting
Let's Encrypt Certificate Issues
If certificates fail to issue:
# Check Caddy logs
docker logs keycloak-proxy
# Verify DNS and port 80 access
curl -I http://your-domain.com/.well-known/acme-challenge/
Database Connection Failures
# Check PostgreSQL is accepting connections
docker exec postgres pg_isready
# Check Keycloak logs
docker logs keycloak | grep -i postgres
Replication Issues
# Check replication lag on primary
docker exec postgres psql -U keycloak -d keycloak -c "SELECT * FROM pg_stat_replication;"
# Check if standby is receiving WAL (run on the standby host)
docker exec keycloak-db-standby psql -U keycloak -d keycloak -c "SELECT * FROM pg_stat_wal_receiver;"
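To turn the lag check into a number you can alert on, one option is to compare the primary's current WAL position with each standby's replay position. The sketch below is an assumption-laden illustration: lag_sql and lag_exceeds are hypothetical helpers, the threshold is arbitrary, and the database user and name should be replaced with your POSTGRES_USER and POSTGRES_DB values.

```shell
#!/bin/sh
# SQL that reports, per connected standby, the bytes of WAL not yet replayed.
lag_sql() {
  echo "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;"
}

# lag_exceeds LIMIT: read one lag value (bytes) per line on stdin and
# succeed if any value is greater than LIMIT. Empty lines are skipped.
lag_exceeds() {
  limit=$1
  while read -r bytes; do
    [ -n "$bytes" ] || continue
    if [ "$bytes" -gt "$limit" ]; then
      return 0
    fi
  done
  return 1
}
```

On the primary, docker exec postgres psql -U keycloak -d keycloak -At -c "$(lag_sql)" | lag_exceeds 16777216 && echo "a standby is more than 16 MB behind" would flag a lagging standby.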
Keycloak Clustering Issues
# Verify JDBC-Ping is enabled
docker logs keycloak | grep -i "jdbc-ping"
# Check cluster membership
docker logs keycloak | grep -i "JGroups"
Production Checklist
Before going to production:
- All passwords are strong and unique
- TLS certificates are issued and valid
- Database replication is working
- Backups are running on schedule
- Monitoring and alerting are configured
- Firewall rules are restrictive (only allow necessary ports)
- Regular security updates are planned
- Disaster recovery procedures are documented