DSO Runtime Operation & Systemd Integration
Overview
This document covers how DSO operates as a runtime service, with emphasis on the systemd-managed agent mode (Phase 4 integration).
v3.5 Runtime Enhancements
Automatic Crash Recovery
On agent startup, DSO v3.5 automatically recovers from incomplete rotations:
- Detection Phase: Scans the state file for rotations older than 5 minutes
- Recovery Phase: Identifies orphaned containers by their naming patterns (_dso_backup_, _dso_new_)
- Cleanup Phase: Removes orphaned containers automatically
- Validation Phase: Verifies the original container's state
- Completion Phase: Marks the recovery in the state tracker
Result: Most agent crashes require zero operator intervention.
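The detection and recovery phases can be sketched in Python as follows. This is an illustration, not DSO's code: the record fields (`id`, `status`, `started_at`), the `in_progress` status, and both function names are assumptions. Only the 5-minute threshold and the `_dso_backup_`/`_dso_new_` markers come from the list above.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the description above: rotations older than 5 minutes.
STALE_AFTER = timedelta(minutes=5)
# Name fragments that mark temporary containers left behind by a rotation.
ORPHAN_MARKERS = ("_dso_backup_", "_dso_new_")


def find_incomplete(rotations, now):
    """Return IDs of rotations still in progress after STALE_AFTER.

    Hypothetical record shape: {"id": str, "status": str, "started_at": datetime}.
    """
    return [
        r["id"]
        for r in rotations
        if r["status"] == "in_progress" and now - r["started_at"] > STALE_AFTER
    ]


def find_orphans(container_names):
    """Return container names containing one of the temporary-name markers."""
    return [n for n in container_names if any(m in n for m in ORPHAN_MARKERS)]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rotations = [
        {"id": "r1", "status": "in_progress", "started_at": now - timedelta(minutes=10)},
        {"id": "r2", "status": "in_progress", "started_at": now - timedelta(minutes=1)},
        {"id": "r3", "status": "completed", "started_at": now - timedelta(hours=1)},
    ]
    print(find_incomplete(rotations, now))  # ['r1']
    print(find_orphans(["api", "api_dso_backup_1", "api_dso_new_1"]))
    # ['api_dso_backup_1', 'api_dso_new_1']
```

On a live host, the container names could come from `docker ps -a --format '{{.Names}}'`.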
Enhanced State Tracking
The state tracker now persists additional metadata:
- New statuses: recovered, critical_error
- 7-day retention: Completed rotations are cleaned up automatically to prevent state file bloat
- 24-hour stale detection: Rotations without progress for 24 hours are marked for operator review
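A minimal sketch of the retention and stale-detection rules, assuming each rotation record has a `status` and a `last_progress` timestamp (hypothetical field names). Treating both `completed` and `recovered` as finished states is also an assumption; the 7-day and 24-hour windows are the ones stated above.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)      # how long finished rotations are kept
STALE_AFTER = timedelta(hours=24)  # no progress for this long -> operator review
FINISHED = {"completed", "recovered"}  # assumption: terminal statuses


def apply_retention(records, now=None):
    """Prune old finished rotations and flag stale unfinished ones.

    Hypothetical record shape: {"id": str, "status": str, "last_progress": datetime}.
    Returns (records_to_keep, ids_needing_review).
    """
    now = now or datetime.now(timezone.utc)
    kept, review = [], []
    for rec in records:
        age = now - rec["last_progress"]
        if rec["status"] in FINISHED:
            if age <= RETENTION:
                kept.append(rec)  # still inside the 7-day window
        else:
            kept.append(rec)  # unfinished work is never pruned automatically
            if age > STALE_AFTER:
                review.append(rec["id"])
    return kept, review
```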
Observability Monitoring
v3.5 adds the following monitoring capabilities:
- Per-rotation tracing: Unique trace IDs for end-to-end correlation
- Provider latency monitoring: Tracks min/max/average response times
- Lock contention detection: Alerts on slow acquisitions (>1s)
- Health check diagnostics: Captures exit codes and output for debugging
- Circuit breaker status: Monitors provider failure isolation
Access via:
docker dso status --json | jq '.observability'
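To make the latency and lock-contention figures concrete, the Python sketch below accumulates min/max/average latencies and warns when a lock takes more than 1 second to acquire. `LatencyStats` and `timed_lock` are hypothetical names, and the sketch does not describe the fields of the `.observability` JSON.

```python
import threading
import time
from contextlib import contextmanager

LOCK_WARN_SECONDS = 1.0  # slow-acquisition threshold from the list above


class LatencyStats:
    """Running min/max/average of provider response times, in seconds."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = float("inf")
        self.max = 0.0

    def record(self, seconds):
        self.count += 1
        self.total += seconds
        self.min = min(self.min, seconds)
        self.max = max(self.max, seconds)

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


@contextmanager
def timed_lock(lock, on_slow):
    """Hold `lock` for the with-block; call on_slow(wait) if acquiring took > 1s."""
    start = time.monotonic()
    lock.acquire()
    waited = time.monotonic() - start
    try:
        if waited > LOCK_WARN_SECONDS:
            on_slow(waited)
        yield waited
    finally:
        lock.release()


if __name__ == "__main__":
    stats = LatencyStats()
    for latency in (0.12, 0.30, 0.09):
        stats.record(latency)
    print(stats.min, stats.max, round(stats.average, 2))  # 0.09 0.3 0.17
    with timed_lock(threading.Lock(), lambda w: print(f"slow lock: {w:.2f}s")):
        pass  # the rotation would run here while holding the lock
```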
Provider Failure Isolation (Circuit Breaker)
When a provider fails:
- Closed State: Normal operation; requests reach the provider
- Open State: After the failure threshold is reached, requests are rejected immediately
- Half-Open State: Periodically lets trial requests through to test recovery
- Auto-Recovery: Returns to Closed once the provider responds successfully
This prevents cascading failures when one provider is unavailable.
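The transitions above match the standard circuit-breaker pattern. The following Python sketch is one way to implement it; the default of 3 consecutive failures, the 30-second cooldown, and the class itself are illustrative assumptions rather than DSO's defaults.

```python
import time


class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures; Open -> Half-Open
    once `cooldown` seconds have passed; a success closes the circuit again,
    a failure while Half-Open re-opens it."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        """Return True if a provider call may proceed."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "half_open"  # allow trial calls to probe recovery
                return True
            return False  # fail fast while open
        return True

    def record_success(self):
        self.state = "closed"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Passing `clock` explicitly makes the cooldown testable without real waiting.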
Agent Lifecycle
Initialization Phase
1. Binary Installed
└─ /usr/local/bin/dso (or ~/.local/bin/dso)
2. Bootstrap Agent
└─ sudo docker dso bootstrap agent
├─ Create directories (/etc/dso, /var/lib/dso, /var/log/dso, /run/dso)
├─ Generate config (/etc/dso/dso.yaml)
├─ Create systemd service (/etc/systemd/system/dso-agent.service)
└─ Verify permissions
3. Enable Service
└─ sudo docker dso system enable
├─ systemctl enable dso-agent
└─ systemctl start dso-agent
4. Operational
└─ Agent running as systemd service
├─ Listening on /run/dso/agent.sock
├─ Logging to journald
└─ Processing events
Systemd Service Configuration
File: /etc/systemd/system/dso-agent.service
[Unit]
Description=DSO Secret Injection Runtime Agent
Documentation=https://github.com/docker-secret-operator/dso
After=docker.service
Requires=docker.service
[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/var/lib/dso
ExecStart=/usr/local/bin/dso agent --config /etc/dso/dso.yaml
Restart=on-failure
RestartSec=10
StartLimitInterval=60s
StartLimitBurst=3
StandardOutput=journal
StandardError=journal
SyslogIdentifier=dso-agent
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
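Key Features:
- Type=simple: systemd runs the ExecStart process directly as the service's main process
- Restart=on-failure: Automatic restart on unexpected exit, after RestartSec=10 seconds
- StartLimitBurst=3: At most 3 starts within StartLimitInterval=60s; further restart attempts are refused
- journald: All stdout/stderr output goes to the systemd journal
- LimitNOFILE=65535: High file descriptor limit for many concurrent connections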
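How `StartLimitBurst` and `StartLimitInterval` interact can be simulated in a few lines. This Python sketch approximates systemd's start rate limit as a window that opens at the first start; systemd's own accounting differs in detail, and `StartLimit` is a hypothetical name.

```python
class StartLimit:
    """Approximates systemd's start rate limit: at most `burst` starts in a
    window of `interval` seconds that opens at the first start after the
    previous window has expired."""

    def __init__(self, interval=60.0, burst=3):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.starts = 0

    def try_start(self, now):
        """Return True if a start at time `now` (seconds) is permitted."""
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now  # previous window expired: open a new one
            self.starts = 0
        if self.starts >= self.burst:
            return False  # limit hit: systemd leaves the unit failed
        self.starts += 1
        return True


if __name__ == "__main__":
    limit = StartLimit(interval=60, burst=3)
    # Initial start plus restarts spaced by RestartSec=10.
    print([limit.try_start(t) for t in (0, 10, 20, 30)])  # [True, True, True, False]
```

With RestartSec=10, a crash loop uses up the budget after the initial start and two restarts, so the attempt at about 30 seconds is refused. `systemctl reset-failed dso-agent` clears the counter.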
Service Management (Phase 4 Commands)
Check Service Status
docker dso system status
# Shows:
# - Service running/stopped
# - Enabled/disabled
# - Recent logs
# - Last activity
Enable Service
sudo docker dso system enable
# - systemctl enable dso-agent
# - systemctl start dso-agent
# - Verifies startup
Disable Service
sudo docker dso system disable
# - systemctl disable dso-agent
# - systemctl stop dso-agent
Restart Service
sudo docker dso system restart
# - systemctl restart dso-agent
# - Useful after config changes
View Service Logs
docker dso system logs # Last 20 lines
docker dso system logs -f # Follow in real-time
docker dso system logs -n 100 # Last 100 lines
docker dso system logs -p err # Errors only
docker dso system logs --since 1h # Last hour
Directory Structure
Local Mode
~/.dso/
├── config.yaml # Configuration file
├── vault.enc # Encrypted vault (AES-256)
├── state/ # State tracking
│ ├── rotations.json # Rotation history
│ ├── containers.json # Container mappings
│ └── runtime.json # Runtime metadata
├── cache/ # Secret cache
│ ├── .metadata # Cache metadata
│ └── secrets.db # Cached secrets
├── logs/ # Local logs
│ └── dso.log # Application logs
└── plugins/ # Provider plugins
├── vault # Vault provider binary
├── aws # AWS provider binary
└── azure # Azure provider binary
Agent Mode
/etc/dso/
├── dso.yaml # Production configuration
└── tls/ # Optional TLS certificates
├── cert.pem
└── key.pem
/var/lib/dso/
├── state/ # Persistent state
│ ├── rotations.json
│ ├── containers.json
│ └── runtime.json
├── cache/ # Secret cache
│ └── secrets.db
├── locks/ # Rotation locks
├── plugins/ # Provider plugins
└── snapshots/ # Rollback snapshots
/var/log/dso/
└── agent.log # Agent logs (if file logging enabled)
/run/dso/
└── agent.sock # Unix socket for communication
Configuration Loading
Priority Order
- CLI flag: dso agent -c /custom/path/config.yaml
- Agent config: /etc/dso/dso.yaml (requires root)
- Local config: ~/.dso/config.yaml
- Current directory: ./dso.yaml
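A sketch of the lookup order in Python. It assumes that "requires root" means the agent path is consulted only when running as root, and that the first existing file wins; `resolve_config` and its parameters are hypothetical.

```python
import os

AGENT_CONFIG = "/etc/dso/dso.yaml"


def resolve_config(cli_path=None, is_root=None, home=None, cwd=None,
                   exists=os.path.isfile):
    """Return the highest-priority config path that exists, or None."""
    if cli_path:
        return cli_path  # an explicit -c/--config path always wins
    if is_root is None:
        is_root = os.geteuid() == 0
    home = home or os.path.expanduser("~")
    cwd = cwd or os.getcwd()
    candidates = [AGENT_CONFIG] if is_root else []  # assumption: root only
    candidates += [
        os.path.join(home, ".dso", "config.yaml"),
        os.path.join(cwd, "dso.yaml"),
    ]
    for path in candidates:
        if exists(path):
            return path
    return None
```

The `exists` parameter lets the order be checked without touching the real filesystem.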
Validation
On startup, DSO validates:
- YAML syntax
- Required fields presence
- Version compatibility
- Provider configuration
- File permissions
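The file-permission check could look like the Python sketch below. The exact policy DSO enforces is not described here; rejecting a config that is writable by group or others is an assumption, chosen because the file controls a root-run agent. `check_config_permissions` is a hypothetical name.

```python
import os
import stat
import tempfile


def check_config_permissions(path):
    """Return a list of permission problems for the config file at `path`."""
    st = os.stat(path)
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    # Assumed policy: only the owner may modify a root agent's configuration.
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    return problems


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".yaml") as f:
        os.chmod(f.name, 0o600)
        print(check_config_permissions(f.name))  # []
        os.chmod(f.name, 0o666)
        print(check_config_permissions(f.name))  # ['writable by group or others']
```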
Hot Reload
Configuration changes take effect when the agent restarts:
# Edit configuration
sudo nano /etc/dso/dso.yaml
# Validate changes
docker dso config validate
# Apply changes
sudo docker dso system restart
Event-Driven Operation
Docker Events
DSO monitors for:
- Container start/stop
- Container health changes
- Network changes
- Volume changes
# View the container, network, and volume events DSO reacts to
docker events --filter 'type=container' --filter 'type=network' --filter 'type=volume'
Secret Backend Events
DSO monitors providers for:
- Secret creation
- Secret updates
- Secret deletion
- Provider availability changes
Rotation Workflow
Event Detected
↓
Queue operation (with debounce)
↓
Wait 5 seconds (debounce window)
↓
Fetch fresh secret from provider
↓
Create new container with updated secret
↓
Verify health (configurable timeout/retries)
├─ PASS: Atomic swap (rename containers)
│ Stop old container
│ Mark rotation complete
├─ FAIL: Rollback to previous container
│ Log failure reason
│ Alert (if configured)
└─ TIMEOUT: Rollback on timeout
Log timeout event
Alert (if configured)
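The debounce step can be modeled as a trailing debounce: every new event for a secret restarts its 5-second window, and the rotation is queued only once the secret has been quiet for the whole window. Whether DSO restarts the window on each event is an assumption, and `Debouncer` is a hypothetical name.

```python
import time


class Debouncer:
    """Coalesce change events per secret: a key is released only after
    `window` seconds pass with no newer event for that key."""

    def __init__(self, window=5.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.pending = {}  # key -> time of the most recent event

    def notify(self, key):
        self.pending[key] = self.clock()  # each new event restarts the window

    def due(self):
        """Return (sorted) and forget the keys whose quiet period has elapsed."""
        now = self.clock()
        ready = [k for k, t in self.pending.items() if now - t >= self.window]
        for k in ready:
            del self.pending[k]
        return sorted(ready)
```

A watcher loop would call `notify()` for each provider event and periodically hand `due()` keys to the rotation queue, so a burst of updates to one secret triggers a single rotation.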
State Persistence
Why Persistent State?
- Detect incomplete rotations on restart
- Resume interrupted operations
- Track rotation history
- Implement crash recovery
State Files
- rotations.json: Rotation audit log
- containers.json: Container-to-secret mappings
- runtime.json: Startup metadata
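Because crash recovery reads these files back, they must never be left half-written. A common technique, sketched below in Python, is to write to a temporary file in the same directory and atomically rename it over the target. Whether DSO itself writes state this way is not stated here; `write_state_atomically` is a hypothetical name.

```python
import json
import os
import tempfile


def write_state_atomically(path, data):
    """Write JSON so readers see either the old file or the new one, never a
    partial write, even if the process dies mid-way."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before renaming
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)  # do not leave temporary files behind on failure
        raise
```

For full durability across power loss, the containing directory can also be fsynced after the rename; that step is omitted for brevity.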
Crash Recovery
If agent crashes during rotation:
- On restart, load last state
- Detect incomplete rotation
- Either resume or rollback based on state
- Log recovery action
# Monitor recovery
docker dso system logs --since 5m | grep -i recovery
Health Checks
Container Health
DSO uses Docker's health check:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8471/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
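During rotation, the "Verify health" step can be approximated by polling Docker's health status until it settles. In the sketch below, `docker_health` uses the real `docker inspect --format '{{.State.Health.Status}}'` query (which requires the container to define a healthcheck such as the one above), while `wait_healthy`, its defaults, and the PASS/FAIL/TIMEOUT return values are illustrative assumptions mapped onto the rotation workflow.

```python
import subprocess
import time


def docker_health(container):
    """Return Docker's health status for `container` ("starting", "healthy", ...)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_healthy(probe, timeout=30.0, interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until the status settles or `timeout` seconds pass.

    Returns "PASS", "FAIL" or "TIMEOUT", the three outcomes of the workflow.
    """
    deadline = clock() + timeout
    while True:
        status = probe()
        if status == "healthy":
            return "PASS"
        if status == "unhealthy":
            return "FAIL"
        if clock() >= deadline:
            return "TIMEOUT"
        sleep(interval)
```

Against a real container this would be called as `wait_healthy(lambda: docker_health("myapp_dso_new_1"))`, where the container name is made up; the 30-second default mirrors the `timeout: 30s` rotation setting shown under Performance Tuning.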
Agent Health
Check agent availability:
# Via socket (internal)
curl --unix-socket /run/dso/agent.sock http://localhost/health
# Via status command
docker dso status
docker dso doctor
Performance Tuning
Cache Configuration
agent:
  cache:
    ttl: 1h # Time-to-live for cached secrets
    max_size: 500Mi # Max cache size
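To illustrate what `ttl` controls, here is a minimal TTL cache in Python. It bounds size by entry count for simplicity, whereas `max_size` above is a byte budget; the class and its parameters are hypothetical and do not describe how DSO's cache is built.

```python
import time


class TTLCache:
    """Cache whose entries expire `ttl` seconds after insertion.

    Size is bounded by entry count here; the oldest insertion is evicted first.
    """

    def __init__(self, ttl=3600.0, max_entries=1000, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._data = {}  # key -> (expires_at, value); dicts keep insertion order

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: caller must refetch from the provider
            return None
        return value

    def put(self, key, value):
        self._data.pop(key, None)  # re-inserting moves the key to the end
        if len(self._data) >= self.max_entries:
            oldest = next(iter(self._data))
            del self._data[oldest]
        self._data[key] = (self.clock() + self.ttl, value)
```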
Rotation Tuning
agent:
  rotation:
    strategy: restart # or: reload (if supported)
    timeout: 30s
    rollback_on_failure: true
  watch:
    polling_interval: 5m # Fallback poll if no webhooks
    debounce_window: 5s # Debounce rapid changes
Resource Limits
Memory: ~100-200MB baseline
CPU: <5% idle
Disk I/O: Minimal (state updates only)
Network: Per-provider (polling or webhooks)
Monitoring & Logging
Journald Integration
# View all agent logs
journalctl -u dso-agent
# Follow in real-time
journalctl -u dso-agent -f
# Filter by level
journalctl -u dso-agent -p err # Errors only
journalctl -u dso-agent -p warning # Warnings and errors
# Time-based filtering
journalctl -u dso-agent --since "1 hour ago"
journalctl -u dso-agent --until "5 minutes ago"
# JSON output
journalctl -u dso-agent -o json
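`journalctl -o json` emits one JSON object per line, with the syslog severity in the `PRIORITY` field (stored as a string). The Python sketch below counts entries per severity. It assumes the agent's log levels are recorded as journald priorities; counting a line without `PRIORITY` as `info` is also an assumption, and `summarize` is a hypothetical name.

```python
import json
from collections import Counter

# journald PRIORITY values follow the syslog severities (RFC 5424).
SEVERITY = {0: "emerg", 1: "alert", 2: "crit", 3: "err",
            4: "warning", 5: "notice", 6: "info", 7: "debug"}


def summarize(lines):
    """Count entries per severity in `journalctl -o json` output lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        prio = int(entry.get("PRIORITY", 6))  # stored as a string, e.g. "3"
        counts[SEVERITY.get(prio, "unknown")] += 1
    return counts
```

For example, `summarize(sys.stdin)` in a script fed by `journalctl -u dso-agent -o json`.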
Log Levels
- DEBUG: Verbose operation details
- INFO: Normal operation (default)
- WARNING: Recoverable issues
- ERROR: Failures requiring attention
Key Log Events
[INFO] Agent started, version v1.0.0
[INFO] Configuration loaded from /etc/dso/dso.yaml
[INFO] Docker socket connected
[INFO] Secret backend connected: vault
[INFO] Event watcher started
[DEBUG] Secret rotation triggered: app/db_password
[DEBUG] Rotation debounce: 5 second window
[INFO] Rotation complete: app/db_password
[ERROR] Rotation failed: health check timeout
[WARNING] Provider connection lost, retrying...
Troubleshooting
Service Won't Start
# Check service status
systemctl status dso-agent
# Check recent logs
journalctl -u dso-agent -n 50
# Verify configuration
docker dso config validate
# Check permissions
ls -la /etc/dso /var/lib/dso /run/dso
High Memory Usage
# Check cache size
docker dso status | grep -i cache
# Reduce cache if needed
sudo nano /etc/dso/dso.yaml
# Reduce max_size: 500Mi → 100Mi
sudo docker dso system restart
Rotation Failures
# View recent rotations (-i matches both "rotation" and "Rotation")
journalctl -u dso-agent | grep -i rotation
# Check container health
docker ps --filter health=unhealthy
# Manually check provider
docker dso doctor --level full
Socket Communication Issues
# Check socket exists and is accessible
ls -la /run/dso/agent.sock
# Verify permissions
stat /run/dso/agent.sock
# Test socket communication
echo '{"action":"status"}' | nc -U /run/dso/agent.sock
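The same checks can be scripted. This Python sketch reports whether the socket exists, is actually a socket, is readable and writable by the current user, and accepts a connection. `check_socket` is a hypothetical helper; it sends no request, since the agent's wire protocol is not specified here.

```python
import os
import socket
import stat


def check_socket(path):
    """Return a list of problems that would stop a client from using `path`."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return ["socket does not exist (is the agent running?)"]
    problems = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append("path exists but is not a socket")
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append("current user lacks read/write permission")
    if not problems:
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.settimeout(2)
        try:
            client.connect(path)  # succeeds only if something is listening
        except OSError as exc:
            problems.append(f"connect failed: {exc}")
        finally:
            client.close()
    return problems


if __name__ == "__main__":
    print(check_socket("/run/dso/agent.sock"))
```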
Operational Runbooks
Graceful Restart
# 1. Get current state
docker dso status --json > ~/dso-backup.json
# 2. Stop agent
sudo docker dso system disable
# 3. Make changes (e.g., update config)
sudo nano /etc/dso/dso.yaml
# 4. Validate changes
docker dso config validate
# 5. Restart agent
sudo docker dso system enable
# 6. Verify
docker dso doctor
docker dso status
Upgrade Agent
# 1. Download new binary
curl -Lo /tmp/dso-new https://...
# 2. Verify checksum
sha256sum -c /tmp/dso-new.sha256
# 3. Backup current
sudo cp /usr/local/bin/dso /usr/local/bin/dso.backup
# 4. Install new version
sudo install -m 755 /tmp/dso-new /usr/local/bin/dso
# 5. Restart service
sudo docker dso system restart
# 6. Verify upgrade
docker dso version
docker dso doctor
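If `sha256sum` is unavailable, or the check needs to run from a script, the comparison in step 2 can be done in Python. This sketch assumes the checksum file uses the `<hex digest>  <filename>` format that `sha256sum` writes and `sha256sum -c` reads; the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large binaries are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(binary_path, checksum_path):
    """Compare binary_path against the first digest in a sha256sum-style file."""
    with open(checksum_path) as f:
        expected = f.read().split()[0].lower()
    return sha256_of(binary_path) == expected
```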
For system architecture, see architecture.md.
For day-2 operations, see operational-guide.md.