Modular Backup Architecture: From Centralized to Application-Level with Restic
After I launched Platform One in July with 6 applications across 3 VMs, my centralized backup strategy broke almost immediately: Stackback couldn’t discover volumes across different Docker Compose contexts.
The Problem
My initial architecture:
- Single Stackback (Restic wrapper) container
- Two shared S3 buckets
- Centralized backup configuration via Ansible
The catch: Stackback relies on Docker labels (`stack-back.volumes=true`) to auto-discover backup targets.
When your backup container runs in a separate docker-compose.yml from your application stacks, it can’t see what it’s supposed to back up.
Real-world impact:
- ❌ Inconsistent backup coverage (some volumes discovered, others missed)
- ❌ No visibility into PostgreSQL backups
- ❌ Single point of failure for credentials
- ❌ Resource contention when all backups ran simultaneously
The Solution
Each application stack gets its own dedicated backup container.
Architecture Evolution
Before:
```yaml
# centralized-stackback/docker-compose.yml
services:
  stackback:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      RESTIC_REPOSITORY: s3:minio.internal/shared-bucket
```
After:
```yaml
# mattermost/docker-compose.yml
services:
  mattermost:
    labels:
      stack-back.volumes: "true"
      stack-back.postgres: "true"

  stackback:
    image: ghcr.io/lawndoc/stack-back:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      RESTIC_REPOSITORY: s3:minio.internal/mattermost-backup-bucket
      RESTIC_PASSWORD: ${RESTIC_PASSWORD_MATTERMOST}
      BACKUP_CRON: "0 2 * * *"
```
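One nuance worth calling out: in the upstream restic-compose-backup project that stack-back descends from, the postgres label belongs on the database service itself, which supplies dump credentials through its standard `POSTGRES_*` environment variables. A sketch of that side (illustrative values, not my actual stack):

```yaml
  postgres:
    image: postgres:16
    labels:
      # assumption based on upstream restic-compose-backup behavior: stack-back
      # dumps the database it finds behind this label, using the POSTGRES_*
      # variables below instead of copying raw data files
      stack-back.postgres: "true"
    environment:
      POSTGRES_USER: mattermost
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: mattermost
```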
Key Design Decisions
1. Dedicated S3 Buckets Per Application
Using Terraform’s MinIO provider:
```hcl
# terraform/modules/minio/main.tf
resource "minio_s3_bucket" "stackback_per_app" {
  for_each = var.applications

  bucket = "restic-stackback-${each.key}-bucket"
  acl    = "private"
}

resource "minio_ilm_policy" "stackback_lifecycle" {
  for_each = minio_s3_bucket.stackback_per_app

  bucket = each.value.bucket

  rule {
    id         = "delete-old-backups"
    expiration = "30d" # the MinIO provider takes a duration string here
  }
}
```
Why 30 days? Without pruning, GitLab backups alone grew without bound. Automated lifecycle policies prevent the “set-and-forget-until-disk-full” trap.
2. IAM Credential Isolation
Each application receives unique S3 credentials:
resource "minio_iam_user" "stackback_per_app" {
for_each = var.applications
name = "restic-${each.key}-user"
}
resource "minio_iam_policy" "stackback_per_app" {
policy = jsonencode({
Statement = [{
Effect = "Allow"
Action = ["s3:*"]
Resource = [
"arn:aws:s3:::restic-stackback-${each.key}-bucket/*"
]
}]
})
}
Security win: A compromised application can only access its own backup bucket.
3. Staggered Backup Schedules
Running all backups simultaneously caused I/O storms on NFS storage. Solution: offset schedules.
| Application | Schedule | Resource Group |
|---|---|---|
| Mattermost | 2:00 AM | Green |
| N8N | 2:20 AM | Green |
| Vault | 2:40 AM | Green |
| Linkwarden | 2:00 AM | Blue |
| Solidtime | 2:20 AM | Blue |
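In the Ansible inventory, this table lives as a per-application variable map. The structure below is a sketch (file path and layout are illustrative), but `backup_enabled` and `backup_schedule` are exactly the keys the deployment task in the next section reads:

```yaml
# group_vars/platform_one/applications.yml (sketch)
applications:
  mattermost:
    backup_enabled: true
    backup_schedule: "0 2 * * *"   # Green, 2:00 AM
  n8n:
    backup_enabled: true
    backup_schedule: "20 2 * * *"  # Green, 2:20 AM
  vault:
    backup_enabled: true
    backup_schedule: "40 2 * * *"  # Green, 2:40 AM
  linkwarden:
    backup_enabled: true
    backup_schedule: "0 2 * * *"   # Blue, 2:00 AM
  solidtime:
    backup_enabled: true
    backup_schedule: "20 2 * * *"  # Blue, 2:20 AM
```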
4. Vault Integration for Secrets
Backup credentials stored in HashiCorp Vault:
```jinja
# ansible/roles/platform_one/templates/stackback.env.j2
RESTIC_REPOSITORY=s3:https://{{ minio_endpoint }}/{{ backup_bucket }}
# field names (password/access_key/secret_key) must match what's stored in Vault
RESTIC_PASSWORD={{ lookup('community.hashi_vault.hashi_vault', 'secret=ansible/data/stackback_' ~ app_name ~ ':password') }}
AWS_ACCESS_KEY_ID={{ lookup('community.hashi_vault.hashi_vault', 'secret=ansible/data/stackback_' ~ app_name ~ ':access_key') }}
AWS_SECRET_ACCESS_KEY={{ lookup('community.hashi_vault.hashi_vault', 'secret=ansible/data/stackback_' ~ app_name ~ ':secret_key') }}
BACKUP_CRON={{ backup_schedule }}
```
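The lookups assume the secrets already exist in Vault. Seeding them is a one-time task; a sketch using the `community.hashi_vault` collection (the `vault_addr` variable and the right-hand-side values are placeholders):

```yaml
- name: Store stackback credentials in Vault (one-time seeding)
  community.hashi_vault.vault_kv2_write:
    url: "{{ vault_addr }}"
    engine_mount_point: ansible       # matches the 'ansible/data/...' lookup path
    path: "stackback_{{ app_name }}"
    data:
      password: "{{ restic_password }}"
      access_key: "{{ minio_access_key }}"
      secret_key: "{{ minio_secret_key }}"
```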
Implementation with Ansible
Dynamic template generation per application:
```yaml
# roles/platform_one/tasks/deploy_application.yml
- name: Generate stackback environment file
  template:
    src: stackback.env.j2
    dest: "{{ container_data }}/{{ app_name }}/stackback.env"
    mode: '0600'
  vars:
    backup_bucket: "restic-stackback-{{ vm_name }}-{{ app_name }}-bucket"
    backup_schedule: "{{ applications[app_name].backup_schedule | default('0 2 * * *') }}"
  when: applications[app_name].backup_enabled | default(false)
```
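This is also where repository initialization got automated (see Lessons Learned below). A sketch of the idempotent follow-up task, assuming the `community.docker` collection and the container name from the compose template: `restic cat config` fails when the repository doesn’t exist yet, so `restic init` only runs on first deployment.

```yaml
- name: Initialize Restic repository on first deployment
  community.docker.docker_container_exec:
    container: "{{ app_name }}_stackback"
    # the container's env file already supplies RESTIC_REPOSITORY and credentials
    command: /bin/sh -c "restic cat config || restic init"
  register: restic_init
  changed_when: "'created restic repository' in restic_init.stdout"
  when: applications[app_name].backup_enabled | default(false)
```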
Docker Compose integration:
```jinja
# templates/docker-compose.yml.j2
{% if app.backup_enabled | default(false) %}
  stackback:
    image: ghcr.io/mittbachweg/stack-back:2024.11.1
    container_name: {{ app_name }}_stackback
    env_file: ./stackback.env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped
{% endif %}
```
Lessons Learned
What worked:
- ✅ Application-level isolation caught backup failures early
- ✅ Staggered schedules eliminated I/O contention
- ✅ Lifecycle policies prevented storage exhaustion
- ✅ Vault integration centralized credential management
What didn’t:
- ❌ Initial 7-day retention was too short (extended to 30 days)
- ❌ Forgot to monitor backup success (added Prometheus metrics; see the alert sketch below)
- ❌ Manual Restic repository initialization (automated via Ansible)
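For the monitoring piece, the shape of the alert matters more than the exact metric. A sketch of a Prometheus rule, where `restic_last_backup_timestamp` is an assumed metric name exposed per application by whatever exporter publishes backup results:

```yaml
groups:
  - name: backup-freshness
    rules:
      - alert: BackupTooOld
        # fires when an application has gone 48h without a successful backup
        expr: time() - restic_last_backup_timestamp > 2 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "No recent backup for {{ $labels.application }}"
```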
The modular approach trades simplicity for reliability. Worth it.