Troubleshooting & Diagnostics
DSO is designed to fail securely. When a synchronization or rotation event fails, the system preserves the integrity of existing secrets rather than applying potentially corrupted or unauthorized updates.
1. Provider Authentication Failures
Symptom: ERROR: [Watcher] Failed to fetch secret 'myapp/db': AccessDenied
DSO relies on Machine Identity. Authentication failures typically indicate a mismatch between the host's IAM profile and the required vault permissions.
Diagnostic Steps:
- Verify IAM Policy: Ensure the host's IAM Role (AWS) or Managed Identity (Azure) has explicit `GetSecretValue` permissions for the specific Resource ARN.
- Context Check: If running in a containerized environment (e.g., Docker-in-Docker), ensure the cloud metadata service (IMDS) is reachable from within the network namespace.
- Manual Resolution Test: Use the DSO CLI to verify connectivity independently of the rotation engine: `docker dso fetch <secret-name>`
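The Context Check can be scripted from inside the affected container. The following is a minimal sketch, assuming an AWS host that exposes IMDSv2 at the standard link-local address `169.254.169.254` and a container image that ships `curl`. The helper name `check_imds` and the `IMDS_BASE` variable are illustrative and not part of DSO.

```shell
#!/bin/sh
# Sketch: probe the AWS instance metadata service (IMDSv2) from inside a container.
# Assumes curl is installed; check_imds is an illustrative helper, not a DSO command.

IMDS_BASE="${IMDS_BASE:-http://169.254.169.254}"

check_imds() {
    # Step 1: request a short-lived IMDSv2 session token (2 second timeout).
    token=$(curl -s -f -m 2 -X PUT \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
        "$IMDS_BASE/latest/api/token") || {
        echo "IMDS unreachable: token request failed"
        return 1
    }
    # Step 2: ask which IAM role the instance profile exposes.
    role=$(curl -s -f -m 2 \
        -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS_BASE/latest/meta-data/iam/security-credentials/") || {
        echo "IMDS reachable, but the role lookup failed"
        return 1
    }
    echo "IMDS reachable, role: $role"
}
```

Calling `check_imds` on the host and again inside the container narrows the problem down: if the token request succeeds on the host but times out in the container, the network namespace (or the instance's metadata hop limit) is a likely cause rather than the IAM policy.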
2. Secret Drift & Rotation Stalls
Symptom: Secret updated in Vault, but container remains on the legacy version.
Diagnostic Steps:
- Watcher Mode: Verify `agent.watch.polling_interval` in `dso.yaml`. If the interval is high (e.g., `10m`), the "Reconciliation Gap" may be expected.
- Debouncer Interaction: DSO ignores rapid successions of vault updates to prevent "flapping." Check logs for `[Debouncer] Update suppressed`.
- Label Verification: Ensure the target container has `dso.reloader=true`. DSO ignores all containers without this explicit opt-in label.
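The label and debouncer checks lend themselves to two small helpers. This is a sketch only: `has_reloader_label` and `count_suppressed` are hypothetical names, and the only real commands used are `docker inspect` and `grep`.

```shell
#!/bin/sh
# Sketch: two quick checks for a stalled rotation.
# Helper names are illustrative; they are not DSO commands.

# Succeeds (exit 0) if the container carries the opt-in label dso.reloader=true.
# Returns 2 if docker inspect itself fails (e.g., unknown container).
has_reloader_label() {
    value=$(docker inspect --format '{{ index .Config.Labels "dso.reloader" }}' "$1") || return 2
    [ "$value" = "true" ]
}

# Prints the number of debouncer suppressions found in log text read from stdin.
count_suppressed() {
    grep -cF '[Debouncer] Update suppressed'
}
```

For example, `has_reloader_label my-container || echo "container is not opted in"`. On a native install, and assuming the agent logs to the `dso-agent` journal unit, `journalctl -u dso-agent --since "1 hour ago" | count_suppressed` would show how often updates were recently suppressed.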
3. Atomic Rotation Failures (Rolling Strategy)
Symptom: ERROR: [Reloader] Rolling update failed for 'api': Healthcheck timeout
When using the rolling strategy, DSO starts a new container and waits for it to become healthy before removing the old one. If the new container fails its healthcheck, DSO aborts the rotation.
Diagnostic Steps:
- Healthcheck Definition: Ensure a valid `healthcheck` is defined in `docker-compose.yml`.
- Logs Inspection: Check the logs of the "Candidate" container (usually named `<service>_dso_new`) to see if the application failed to boot with the new secret.
- Timeout Adjustment: If the application has a long cold-start time, increase `agent.rotation.health_check_timeout`.
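Inspecting the candidate container can be wrapped in a short helper. The sketch below assumes the `<service>_dso_new` naming convention mentioned above and that the candidate still exists when the check runs; `candidate_name` and `inspect_candidate` are illustrative names.

```shell
#!/bin/sh
# Sketch: inspect the candidate container from a failed rolling update.
# Assumes the "<service>_dso_new" naming convention; helper names are illustrative.

# Derives the candidate container name from the service name.
candidate_name() {
    printf '%s_dso_new\n' "$1"
}

inspect_candidate() {
    name=$(candidate_name "$1")
    # Health status as recorded by Docker: starting, healthy, or unhealthy.
    docker inspect --format '{{ .State.Health.Status }}' "$name" || return 1
    # Last lines of application output, to spot boot failures with the new secret.
    docker logs --tail 50 "$name"
}
```

Running `inspect_candidate api` prints the health status followed by recent log output. If the `docker inspect` step fails with a template error, the container most likely has no healthcheck defined at all, which points back to the first diagnostic step.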
4. Signal-Based Reload Issues (SIGHUP)
Symptom: INFO: [Reloader] Signal SIGHUP sent to 'proxy', but configuration remains old.
Diagnostic Steps:
- PID 1 Requirement: DSO sends signals to the container's PID 1. If your entrypoint is a shell script (`sh -c ...`), the signal may not be propagated to the application. Use `exec` in your entrypoint scripts.
- App Support: Verify that the application (e.g., Nginx, a Go binary) actually implements a handler for `SIGHUP` to reload its configuration from the environment or filesystem.
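The effect of `exec` on the PID 1 requirement can be demonstrated outside a container. The sketch below is illustrative only: `entrypoint` shows the usual pattern of ending an entrypoint script with `exec "$@"`, and `demo_pid_preserved` shows that a process started via `exec` keeps the PID of the shell that launched it, which is why it receives signals directly.

```shell
#!/bin/sh
# Sketch: why entrypoint scripts should end with exec.
# Both function names are illustrative.

# Typical entrypoint pattern: run one-time setup, then replace the shell
# with the application so that the application becomes PID 1.
entrypoint() {
    # ... setup steps (templating, migrations) would go here ...
    exec "$@"
}

# Prints two PIDs: the shell's own, then that of the exec'd command.
# Because exec replaces the process instead of forking, both are identical.
demo_pid_preserved() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}
```

Without `exec`, the application runs as a child of the shell, and a shell at PID 1 does not forward `SIGHUP` to its children by default.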
5. Socket Connectivity
Symptom: failed to connect to /run/docker/plugins/dso.sock: connection refused
Diagnostic Steps:
- Plugin Status: Verify the DSO plugin is enabled: `docker plugin ls`.
- Daemon Logs: Inspect the plugin's internal logs via the host's journal. For native installs: `journalctl -u dso-agent -f`
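Before reading logs, it can help to confirm whether the socket file exists at all. This is a minimal sketch; `check_dso_socket` is a hypothetical helper, and its default path is taken from the error message in the symptom above.

```shell
#!/bin/sh
# Sketch: check that the plugin socket exists.
# check_dso_socket is an illustrative helper, not a DSO command.

check_dso_socket() {
    sock="${1:-/run/docker/plugins/dso.sock}"
    # -S is true only if the path exists and is a Unix domain socket.
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    else
        echo "socket missing: $sock"
        return 1
    fi
}
```

A "connection refused" error usually means the socket file exists but nothing is listening on it, so a present socket combined with that error suggests the agent process has stopped and left a stale socket behind. A missing socket suggests the plugin or agent never started.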
Diagnostic Reference
| Command | Purpose |
|---|---|
| `docker dso version` | Check binary version and build hash |
| `docker dso fetch <name>` | Test vault connectivity and resolution |
| `docker dso watch` | Foreground watcher logs (real-time diagnostics) |
| `docker dso inspect <id>` | View active secret mappings for a container |
Next Steps
- System Architecture: Understand the internal reconciliation loop.
- Security Model: Detailed boundaries and threat mitigations.
- Production Readiness: Best practices to avoid common issues.
