Vector
Vector sink resilience patterns for ClickHouse and disk-buffered pipelines.
ClickHouse Sink Failure Cascade
When ClickHouse becomes unavailable (restart, maintenance), Vector’s default configuration triggers three cascading failures:
- File descriptor exhaustion — Retry attempts accumulate open HTTP connections until Vector hits the OS limit (`EMFILE`, errno 24) and the `http_server` source crashes with "Too many open files".
- Event loss from backpressure — The default in-memory buffer fills, backpressure propagates to sources, and HTTP sources drop incoming events with "Source send cancelled". No replay is possible.
- Process crash — The `http_server` source task exits fatally, taking Vector down.
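During an outage, the symptom to watch is the process's open-descriptor count climbing toward its limit. A minimal sketch for checking this from `/proc` — the PID default is illustrative; pass Vector's PID in practice:

```shell
#!/bin/sh
# Compare a process's open file descriptor count against its soft limit.
# Defaults to the current shell's PID; pass Vector's PID in practice.
pid="${1:-$$}"
open=$(ls "/proc/$pid/fd" | wc -l)
limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
echo "pid=$pid open=$open limit=$limit"
```

If `open` approaches `limit` while ClickHouse is down, retries are leaking connections and a crash is imminent.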
Raise File Descriptor Limit
Vector MUST run with a raised file descriptor limit under systemd; the default is too low to absorb retry storms.

```ini
# /etc/systemd/system/vector.service.d/override.conf
[Service]
LimitNOFILE=262144
```

Apply the override with `systemctl daemon-reload && systemctl restart vector`, then verify on the running process:

```shell
cat /proc/$(pidof vector)/limits | grep "open files"
```
Disk Buffers on All Sinks
Every sink MUST use disk buffers. In-memory buffers lose events on restart and cause backpressure cascades.
```yaml
sinks:
  clickhouse_from_files:
    buffer:
      type: "disk"
      max_size: 5368709120   # 5 GiB
      when_full: "block"
  clickhouse_from_http:
    buffer:
      type: "disk"
      max_size: 5368709120   # 5 GiB
      when_full: "drop_newest"
```
Vector enforces a minimum disk buffer `max_size` of 268435488 bytes (~256 MB). Buffered data is synced to disk every 500 ms, so it survives forced restarts.
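To watch buffer growth during a ClickHouse outage, check the on-disk size of the buffer directory. The path below assumes Vector's default `data_dir` of `/var/lib/vector` (each sink gets its own subdirectory); adjust if your deployment overrides it:

```shell
#!/bin/sh
# Report on-disk size of Vector's buffer directory.
# /var/lib/vector is Vector's default data_dir; override via the first argument.
buf_dir="${1:-/var/lib/vector/buffer}"
du -sh "$buf_dir"
```

A steadily growing number here is expected while the sink is down; it should shrink back once ClickHouse recovers and the buffer drains.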
`block` vs `drop_newest`
The `when_full` strategy depends on whether the source supports replay.
- File sources (`type: file`) → `block`. Vector tracks read position via checkpoints, so when the sink recovers it resumes from where it stopped. No data loss.
- HTTP sources (`type: http_server`) → `drop_newest`. There is no replay mechanism; dropping new events is better than blocking the source and triggering "Source send cancelled" on every incoming request.
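Putting the two strategies together, a minimal end-to-end sketch — source, sink, endpoint, and table names are illustrative placeholders, not values from this deployment:

```yaml
sources:
  app_logs:                  # replayable: checkpointed file source
    type: "file"
    include: ["/var/log/app/*.log"]
  events_in:                 # not replayable: HTTP ingest
    type: "http_server"
    address: "0.0.0.0:8080"

sinks:
  clickhouse_from_files:
    type: "clickhouse"
    inputs: ["app_logs"]
    endpoint: "http://clickhouse:8123"   # placeholder
    table: "logs"                        # placeholder
    buffer:
      type: "disk"
      max_size: 5368709120
      when_full: "block"        # file source can replay, so blocking is safe
  clickhouse_from_http:
    type: "clickhouse"
    inputs: ["events_in"]
    endpoint: "http://clickhouse:8123"   # placeholder
    table: "events"                      # placeholder
    buffer:
      type: "disk"
      max_size: 5368709120
      when_full: "drop_newest"  # no replay mechanism; don't block the source
```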