alpha

Vector

Vector sink resilience patterns for ClickHouse and disk-buffered pipelines.

#vector #clickhouse #resilience #systemd

ClickHouse Sink Failure Cascade

When ClickHouse becomes unavailable (restart, maintenance), Vector’s default configuration triggers three cascading failures:

  1. File descriptor exhaustion — Retry attempts accumulate open HTTP connections. Vector hits the OS limit (EMFILE, errno 24) and the http_server source crashes with Too many open files.
  2. Event loss from backpressure — Default in-memory buffer fills, backpressure propagates to sources. HTTP sources drop incoming events with "Source send cancelled". No replay possible.
  3. Process crash — The http_server source task exits fatally, taking Vector down.
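The first failure mode is visible before the crash: the open-fd count climbs toward the soft limit during an outage. A minimal check, assuming a single `vector` process (the demo below uses the current shell, `$$`, so it runs anywhere on Linux):

```shell
# Compare a process's open-fd count to its "Max open files" soft limit.
pid=$$                     # in production: pid=$(pidof vector)
open=$(ls "/proc/$pid/fd" | wc -l)
limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
echo "open=$open limit=$limit"
```

If `open` is within a few percent of `limit` while ClickHouse is down, EMFILE is imminent.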

Raise File Descriptor Limit

Vector MUST run with a raised file descriptor limit under systemd; the default soft limit (typically 1024) is far too low for a retry storm.

# /etc/systemd/system/vector.service.d/override.conf
[Service]
LimitNOFILE=262144

Apply with systemctl daemon-reload && systemctl restart vector, then verify on the running process:

grep "open files" /proc/$(pidof vector)/limits

Disk Buffers on All Sinks

Every sink MUST use disk buffers. In-memory buffers lose events on restart and cause backpressure cascades.

sinks:
  clickhouse_from_files:
    buffer:
      type: "disk"
      max_size: 5368709120  # 5GB
      when_full: "block"

  clickhouse_from_http:
    buffer:
      type: "disk"
      max_size: 5368709120  # 5GB
      when_full: "drop_newest"

Minimum disk buffer size: 256MB (268435488 bytes). Buffered data is synced to disk every 500ms, so it survives forced restarts.
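The byte values are worth sanity-checking: the "5GB" figure above is 5 GiB exactly, and the 268435488-byte floor sits just above 256 MiB (268435456):

```shell
# GiB/MiB arithmetic behind the buffer sizes above.
echo $((5 * 1024 * 1024 * 1024))        # -> 5368709120 (the 5GB max_size)
echo $((268435488 - 256 * 1024 * 1024)) # -> 32 (bytes above 256 MiB)
```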

block vs drop_newest

The when_full strategy depends on whether the source supports replay.

  • File sources (type: file) → block. Vector tracks read position via checkpoints. When the sink recovers, it resumes from where it stopped. No data loss.
  • HTTP sources (type: http_server) → drop_newest. No replay mechanism. Dropping new events is better than blocking the source and triggering "Source send cancelled" on all incoming requests.
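Putting both rules together, a sketch of how the sources pair with the sinks above. Source and sink names are illustrative, and the clickhouse sink's required options (endpoint, table, database) are omitted:

```yaml
sources:
  app_logs:
    type: file
    include: ["/var/log/app/*.log"]   # checkpointed; safe to block

  app_http:
    type: http_server
    address: "0.0.0.0:8080"           # no replay; never block this path

sinks:
  clickhouse_from_files:
    type: clickhouse
    inputs: ["app_logs"]
    buffer:
      type: "disk"
      max_size: 5368709120
      when_full: "block"        # file checkpoints make blocking lossless

  clickhouse_from_http:
    type: clickhouse
    inputs: ["app_http"]
    buffer:
      type: "disk"
      max_size: 5368709120
      when_full: "drop_newest"  # shed load instead of stalling the listener
```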