Preface
Prometheus's strength in the container world is beyond doubt: more and more cloud-native components expose a Prometheus metrics endpoint directly, with no extra exporter required, so Prometheus is a natural choice as the monitoring solution for the whole cluster. For metrics storage, however, Prometheus ships with local storage only: its own TSDB time-series database. The advantage of local storage is operational simplicity. Starting Prometheus takes a single command, and the two flags below set the data path and the retention period.
* storage.tsdb.path: path of the TSDB database, default data/
* storage.tsdb.retention: how long data is retained, default 15 days
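For example, a local-storage launch might look like the following (flag syntax as in Prometheus 2.x; the retention value is just the default spelled out explicitly):

```shell
# Start Prometheus with local TSDB storage only:
# data lives under ./data and is kept for 15 days.
prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.path=data/ \
  --storage.tsdb.retention=15d
```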
The drawback is that large volumes of metrics cannot be persisted, although data compression has improved greatly since Prometheus 2.0.
To get around the limits of single-node storage, Prometheus does not implement clustered storage itself. Instead it exposes remote read and write interfaces, letting users choose a suitable time-series database of their own to make Prometheus scalable.
Prometheus integrates with remote storage systems in two ways:
* Prometheus writes metrics to the remote storage in a standard format.
* Prometheus reads metrics back from a remote URL in a standard format.
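Wired together in prometheus.yml, the two interfaces are just a pair of top-level sections; a minimal sketch (the adapter URL is a placeholder, assuming an adapter listening on port 9201):

```yaml
remote_write:
  - url: "http://localhost:9201/write"

remote_read:
  - url: "http://localhost:9201/read"
```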
Below I will look at the remote storage options in detail.
Remote storage options
Configuration
Remote write
```yaml
# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we start dropping them.
  [ capacity: <int> | default = 100000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100 ]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Maximum number of times to retry a batch on recoverable errors.
  [ max_retries: <int> | default = 10 ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]
```
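As a concrete illustration of the queue parameters above, a write configuration that raises the batch size to trade latency for throughput might look like this (the URL is a placeholder and the values are illustrative, not a tuning recommendation):

```yaml
remote_write:
  - url: "http://localhost:9201/write"
    remote_timeout: 30s
    queue_config:
      capacity: 100000
      max_samples_per_send: 1000
      batch_send_deadline: 10s
```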
Remote read
```yaml
# The URL of the endpoint to query from.
url: <string>

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
```
PS
* write_relabel_configs in the remote write configuration
This option takes full advantage of Prometheus's powerful relabeling: it lets you filter which metrics are written to remote storage.
For example, to keep only selected metrics:

```yaml
remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
      - action: keep
        source_labels: [__name__]
        regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
```
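One subtlety worth noting: a relabel regex is fully anchored, so it must match the entire metric name, not just a substring. A quick Python sketch of the keep semantics (a hypothetical helper for illustration, not Prometheus code):

```python
import re

# With `action: keep`, a series survives only if the regex matches the
# ENTIRE value of the concatenated source_labels (here just __name__).
pattern = re.compile(
    r"container_network_receive_bytes_total"
    r"|container_network_receive_packets_dropped_total"
)

def keep(metric_name: str) -> bool:
    # re.fullmatch mirrors Prometheus's implicit ^(?:...)$ anchoring.
    return pattern.fullmatch(metric_name) is not None

print(keep("container_network_receive_bytes_total"))  # True: written remotely
print(keep("container_cpu_usage_seconds_total"))      # False: filtered out
```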
* external_labels in the global configuration: consider setting this when using federation or remote read/write, so that series from different clusters can be told apart. For example:

```yaml
global:
  scrape_interval: 20s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    cid: '9'
```
Existing remote storage integrations
The community has already implemented the following remote storage integrations:
* AppOptics: write
* Chronix: write
* Cortex: read and write
* CrateDB: read and write
* Elasticsearch: write
* Gnocchi: write
* Graphite: write
* InfluxDB: read and write
* OpenTSDB: write
* PostgreSQL/TimescaleDB: read and write
* SignalFx: write
Some of the backends above are write-only. Reading the source code, whether a backend can support remote read really comes down to whether it supports regular-expression matching in queries. In the next installment I will walk through prometheus-postgresql-adapter
<https://yq.aliyun.com/go/articleRenderRedirect?url=https%3A%2F%2Fgithub.com%2Ftimescale%2Fprometheus-postgresql-adapter>
and how to implement an adapter of your own.
Backends supporting both remote read and write
* Cortex comes from Weaveworks. Its architecture wraps a layer on top of Prometheus and involves quite a few components, so it is somewhat complex.
* InfluxDB: the open-source edition does not support clustering. With a large metrics volume the write pressure is heavy, and the influxdb-relay approach is not truly highly available. Ele.me has open-sourced influxdb-proxy, which may be worth a try.
* CrateDB: based on Elasticsearch; I don't know it in much detail.
* TimescaleDB: personally my preferred option. Traditional ops teams know PostgreSQL well, so operations are dependable, and it currently supports high availability via streaming replication.
后記
其實如果收集的metrics用于數(shù)據(jù)分析,可以考慮clickhouse數(shù)據(jù)庫,集群方案和寫入性能以及支持遠程讀寫。這塊正在研究中。待有了一定成果以后再專門寫一篇文章解讀。目前我們的持久化方案準備用TimescaleDB。
This article is reproduced from the SegmentFault post "k8s and monitoring: Prometheus remote storage"
<https://yq.aliyun.com/go/articleRenderRedirect?url=https%3A%2F%2Fsegmentfault.com%2Fa%2F1190000015576540>