    k8s_安装12_operator_Prometheus+grafana

    Posted by C1G on 2024-05-17 09:50:16

    12. Installing Prometheus + Grafana with the operator

    Prometheus

    Prometheus itself only supports single-node deployment; it has no built-in clustering, no high availability, and no horizontal scaling, and its storage is limited by local disk capacity. As the scrape volume grows, the number of time series a single Prometheus instance can handle hits a ceiling, and CPU and memory usage climb. Memory is usually the first bottleneck, mainly because:

    • Prometheus's memory consumption comes largely from the fact that it flushes a block to disk every 2 hours; until then all of that data lives in memory, so usage scales with scrape volume.
    • Loading historical data pulls it from disk into memory, so the wider the query range, the more memory is used. There is some room for optimization here.
    • Unreasonable query conditions also inflate memory usage, e.g. large group-bys or rate() over a wide range.
      When this happens, either add memory or shard the cluster so that each instance has fewer metrics to scrape.
      Prometheus recommends splitting along functional or service boundaries: if there are many services to scrape, configure each Prometheus instance to scrape and store only one service (or a subset of services). Splitting Prometheus into multiple instances this way also achieves a degree of horizontal scaling (see the sketch below).
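    A minimal sketch of such functional sharding, with hypothetical job names and targets: each Prometheus instance gets its own config and only scrapes the services assigned to it.

    # Hypothetical example: shard Prometheus by service, one config per instance
    cat > prometheus-shard-a.yml << EOF
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: 'frontend'              # shard A scrapes only the frontend services
        static_configs:
          - targets: ['frontend-svc:8080']
    EOF

    cat > prometheus-shard-b.yml << EOF
    global:
      scrape_interval: 30s
    scrape_configs:
      - job_name: 'backend'               # shard B scrapes only the backend services
        static_configs:
          - targets: ['backend-svc:8080']
    EOF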

    Choosing an installation approach

    • Vanilla Prometheus
      Build everything yourself.
      If you have the Prometheus components and their prerequisites ready, you can manually deploy the YAML specs for every component — Prometheus, Alertmanager, Grafana, all of their Secrets, ConfigMaps, and so on — in the correct order by working out the dependencies between them. This approach is usually very time-consuming, takes a lot of effort to deploy and manage the Prometheus ecosystem, and requires solid documentation before it can be reproduced in other environments.

    • prometheus-operator
      The Prometheus Operator is not an official Prometheus component; it was originally developed by CoreOS.
      It uses Kubernetes Custom Resources to simplify deploying and configuring Prometheus, Alertmanager, and the related monitoring components.
      Official installation docs: https://prometheus-operator.dev/docs/user-guides/getting-started/
      The Prometheus Operator requires Kubernetes v1.16.x or later.
      Official GitHub repository: https://github.com/prometheus-operator/prometheus-operator

    • kube-prometheus
      kube-prometheus provides a complete example of cluster monitoring configuration based on Prometheus and the Prometheus Operator, including multi-instance Prometheus and Alertmanager deployment and configuration, node-exporter metrics collection, scrape configuration for various metrics endpoints, Grafana, and example alerting rules that fire on potential cluster problems.
      Official installation docs: https://prometheus-operator.dev/docs/prologue/quick-start/
      Requirements: https://github.com/prometheus-operator/kube-prometheus#compatibility
      Official GitHub repository: https://github.com/prometheus-operator/kube-prometheus

    • helm chart prometheus-community/kube-prometheus-stack
      Provides functionality similar to kube-prometheus, but the project is maintained by prometheus-community.
      For details see https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack

    Installing Prometheus + Grafana on k8s via the operator

    Deploying Prometheus + Grafana with the operator is a very simple and convenient approach.
    Open the kube-prometheus GitHub page at https://github.com/prometheus-operator/kube-prometheus and first confirm which operator release matches your Kubernetes version.
    My Kubernetes here is 1.28, so the matching operator branch is release-0.13:
    https://github.com/prometheus-operator/kube-prometheus/tree/release-0.13
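    If in doubt, check the API server version first and then pick the branch from the compatibility matrix linked above:

    # Check the cluster version, then match it against the kube-prometheus compatibility table
    kubectl version | grep -i server
    # e.g. "Server Version: v1.28.x"  ->  kube-prometheus release-0.13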

    Resource preparation

    Download the installation resources

    wget --no-check-certificate https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.zip -O prometheus-0.13.0.zip
    unzip prometheus-0.13.0.zip
    cd kube-prometheus-0.13.0

    Extract the image list

    cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'|sort|uniq
    This yields the following image addresses:

    grafana/grafana:9.5.3
    jimmidyson/configmap-reload:v0.5.0
    quay.io/brancz/kube-rbac-proxy:v0.14.2
    quay.io/prometheus/alertmanager:v0.26.0
    quay.io/prometheus/blackbox-exporter:v0.24.0
    quay.io/prometheus/node-exporter:v1.6.1
    quay.io/prometheus-operator/prometheus-operator:v0.67.1
    quay.io/prometheus/prometheus:v2.46.0
    registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
    registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1

    Push to the private registry

    Manually pull the images that are hard to reach over a poor network and push them to the private registry repo.k8s.local.
    Note: configure the private registry repo.k8s.local in advance and create the corresponding projects and permissions.

    #Images under registry.k8s.io
    registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
    registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
    
    docker pull k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2
    docker pull k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1
    
    docker tag k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2 repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
    docker tag k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1 repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
    
    docker push repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
    docker push repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
    #Re-tag the images under docker.io
    docker pull jimmidyson/configmap-reload:v0.5.0
    docker pull grafana/grafana:9.5.3
    
    docker tag jimmidyson/configmap-reload:v0.5.0 repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
    docker tag grafana/grafana:9.5.3 repo.k8s.local/docker.io/grafana/grafana:9.5.3
    docker push repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
    docker push repo.k8s.local/docker.io/grafana/grafana:9.5.3
    kube-prometheus-0.13.0/manifests/prometheusOperator-deployment.yaml
    #     - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
    #prometheus-config-reloader is a separate quay.io image, referenced only via the operator argument above
    docker pull  quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
    docker tag quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1 repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
    docker push repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
    #Use a script to batch pull and push the quay.io images
    vi images.txt
    quay.io/prometheus/alertmanager:v0.26.0
    quay.io/prometheus/blackbox-exporter:v0.24.0
    quay.io/brancz/kube-rbac-proxy:v0.14.2
    quay.io/prometheus/node-exporter:v1.6.1
    quay.io/prometheus-operator/prometheus-operator:v0.67.1
    quay.io/prometheus/prometheus:v2.46.0

    vim auto-pull-and-push-images.sh

    #!/bin/bash
    #New image tag: defaults to the current timestamp (currently unused below)
    imageNewTag=`date +%Y%m%d-%H%M%S`
    #Private registry address
    registryAddr="repo.k8s.local/"
    
    #Read images.txt line by line (skipping comment lines) into a list
    n=0
    
    for line in $(cat images.txt | grep ^[^#])
    do
        list[$n]=$line
        ((n+=1))
    done
    
    echo "Images to be pushed:"
    for variable in ${list[@]}
    do
        echo ${variable}
    done
    
    for variable in ${list[@]}
    do
        #Pull the image
        echo "Pulling image: $variable"
        docker pull $variable
    
        #Get the ID of the pulled image
        imageId=`docker images -q $variable`
        echo "[$variable] image ID after pull: $imageId"
    
        #Get the full image name (repository:tag:id)
        imageFormatName=`docker images --format "{{.Repository}}:{{.Tag}}:{{.ID}}" |grep $variable`
        echo "imageFormatName:$imageFormatName"
    
        #Leading registry host
        #e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1 -> quay.io
        repository=${imageFormatName}
        repositoryurl=${imageFormatName%%/*}
        echo "repositoryurl :$repositoryurl"
    
        #Strip the last ':' and everything after it (the image ID appended by --format)
        #e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1:b6ec194a1a0 -> quay.io/prometheus-operator/prometheus-operator:v0.67.1
        repository=${repository%:*}
    
        echo "New image address: $registryAddr$repository"
    
        #Re-tag the image
        docker tag $imageId $registryAddr$repository
    
        #Push the image
        docker push $registryAddr$repository
        echo -e "\n"
    done

    chmod 755 auto-pull-and-push-images.sh
    ./auto-pull-and-push-images.sh

    Replace the image addresses in the YAML with the private registry

    #Preview the substitutions first
    sed -n "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/p}" `grep 'image: jimmidyson' ./manifests/ -rl`
    sed -n "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/p}" `grep 'image: grafana' ./manifests/ -rl`
    sed -n "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/p}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
    sed -n "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/p}" `grep 'image: quay.io' ./manifests/ -rl`
    
    #Apply the substitutions
    sed -i "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/}" `grep 'image: jimmidyson' ./manifests/ -rl`
    sed -i "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/}" `grep 'image: grafana' ./manifests/ -rl`
    sed -i "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
    sed -i "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/}" `grep 'image: quay.io' ./manifests/ -rl`
    
    #Verify again
    cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'
    manifests/prometheusOperator-deployment.yaml
          containers:
          - args:
            - --kubelet-service=kube-system/kubelet
            - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
            image: repo.k8s.local/quay.io/prometheus-operator/prometheus-operator:v0.67.1
            name: prometheus-operator
    Also modify the prometheus-config-reloader argument:
           - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
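    An optional sanity check (a small sketch): after the substitutions, no image reference should still point at a public registry.

    # Any image: line not yet rewritten to repo.k8s.local is printed here; no output means everything was replaced
    grep -rn 'image:' manifests/ | grep -v 'repo.k8s.local' || echo "all images point to the private registry"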

    Install and start Prometheus + Grafana

    First, go back to the kube-prometheus-0.13.0 directory and run the following commands to start the installation.

    kubectl apply --server-side -f manifests/setup
    
    customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
    customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
    namespace/monitoring serverside-applied
    kubectl apply -f manifests/
    alertmanager.monitoring.coreos.com/main created
    networkpolicy.networking.k8s.io/alertmanager-main created
    poddisruptionbudget.policy/alertmanager-main created
    prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
    secret/alertmanager-main created
    service/alertmanager-main created
    serviceaccount/alertmanager-main created
    servicemonitor.monitoring.coreos.com/alertmanager-main created
    clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
    clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
    configmap/blackbox-exporter-configuration created
    deployment.apps/blackbox-exporter created
    networkpolicy.networking.k8s.io/blackbox-exporter created
    service/blackbox-exporter created
    serviceaccount/blackbox-exporter created
    servicemonitor.monitoring.coreos.com/blackbox-exporter created
    secret/grafana-config created
    secret/grafana-datasources created
    configmap/grafana-dashboard-alertmanager-overview created
    configmap/grafana-dashboard-apiserver created
    configmap/grafana-dashboard-cluster-total created
    configmap/grafana-dashboard-controller-manager created
    configmap/grafana-dashboard-grafana-overview created
    configmap/grafana-dashboard-k8s-resources-cluster created
    configmap/grafana-dashboard-k8s-resources-multicluster created
    configmap/grafana-dashboard-k8s-resources-namespace created
    configmap/grafana-dashboard-k8s-resources-node created
    configmap/grafana-dashboard-k8s-resources-pod created
    configmap/grafana-dashboard-k8s-resources-workload created
    configmap/grafana-dashboard-k8s-resources-workloads-namespace created
    configmap/grafana-dashboard-kubelet created
    configmap/grafana-dashboard-namespace-by-pod created
    configmap/grafana-dashboard-namespace-by-workload created
    configmap/grafana-dashboard-node-cluster-rsrc-use created
    configmap/grafana-dashboard-node-rsrc-use created
    configmap/grafana-dashboard-nodes-darwin created
    configmap/grafana-dashboard-nodes created
    configmap/grafana-dashboard-persistentvolumesusage created
    configmap/grafana-dashboard-pod-total created
    configmap/grafana-dashboard-prometheus-remote-write created
    configmap/grafana-dashboard-prometheus created
    configmap/grafana-dashboard-proxy created
    configmap/grafana-dashboard-scheduler created
    configmap/grafana-dashboard-workload-total created
    configmap/grafana-dashboards created
    deployment.apps/grafana created
    networkpolicy.networking.k8s.io/grafana created
    prometheusrule.monitoring.coreos.com/grafana-rules created
    service/grafana created
    serviceaccount/grafana created
    servicemonitor.monitoring.coreos.com/grafana created
    prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    deployment.apps/kube-state-metrics created
    networkpolicy.networking.k8s.io/kube-state-metrics created
    prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
    service/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    servicemonitor.monitoring.coreos.com/kube-state-metrics created
    prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
    servicemonitor.monitoring.coreos.com/kube-apiserver created
    servicemonitor.monitoring.coreos.com/coredns created
    servicemonitor.monitoring.coreos.com/kube-controller-manager created
    servicemonitor.monitoring.coreos.com/kube-scheduler created
    servicemonitor.monitoring.coreos.com/kubelet created
    clusterrole.rbac.authorization.k8s.io/node-exporter created
    clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
    daemonset.apps/node-exporter created
    networkpolicy.networking.k8s.io/node-exporter created
    prometheusrule.monitoring.coreos.com/node-exporter-rules created
    service/node-exporter created
    serviceaccount/node-exporter created
    servicemonitor.monitoring.coreos.com/node-exporter created
    clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    networkpolicy.networking.k8s.io/prometheus-k8s created
    poddisruptionbudget.policy/prometheus-k8s created
    prometheus.monitoring.coreos.com/k8s created
    prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s-config created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    role.rbac.authorization.k8s.io/prometheus-k8s created
    service/prometheus-k8s created
    serviceaccount/prometheus-k8s created
    servicemonitor.monitoring.coreos.com/prometheus-k8s created
    apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
    clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
    clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
    clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
    configmap/adapter-config created
    deployment.apps/prometheus-adapter created
    networkpolicy.networking.k8s.io/prometheus-adapter created
    poddisruptionbudget.policy/prometheus-adapter created
    rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
    service/prometheus-adapter created
    serviceaccount/prometheus-adapter created
    servicemonitor.monitoring.coreos.com/prometheus-adapter created
    clusterrole.rbac.authorization.k8s.io/prometheus-operator created
    clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
    deployment.apps/prometheus-operator created
    networkpolicy.networking.k8s.io/prometheus-operator created
    prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
    service/prometheus-operator created
    serviceaccount/prometheus-operator created
    servicemonitor.monitoring.coreos.com/prometheus-operator created
    kubectl get pods -o wide -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE    IP              NODE                 NOMINATED NODE   READINESS GATES
    alertmanager-main-0                    2/2     Running   0          82s    10.244.1.6      node01.k8s.local     <none>           <none>
    alertmanager-main-1                    2/2     Running   0          82s    10.244.1.7      node01.k8s.local     <none>           <none>
    alertmanager-main-2                    2/2     Running   0          82s    10.244.2.3      node02.k8s.local     <none>           <none>
    blackbox-exporter-76847bbff-wt77c      3/3     Running   0          104s   10.244.2.252    node02.k8s.local     <none>           <none>
    grafana-5955685bfd-shf4s               1/1     Running   0          103s   10.244.2.253    node02.k8s.local     <none>           <none>
    kube-state-metrics-7dddfffd96-2ktrs    3/3     Running   0          103s   10.244.1.4      node01.k8s.local     <none>           <none>
    node-exporter-g8d5k                    2/2     Running   0          102s   192.168.244.4   master01.k8s.local   <none>           <none>
    node-exporter-mqqkc                    2/2     Running   0          102s   192.168.244.7   node02.k8s.local     <none>           <none>
    node-exporter-zpfl2                    2/2     Running   0          102s   192.168.244.5   node01.k8s.local     <none>           <none>
    prometheus-adapter-6db6c659d4-25lgm    1/1     Running   0          100s   10.244.1.5      node01.k8s.local     <none>           <none>
    prometheus-adapter-6db6c659d4-ps5mz    1/1     Running   0          100s   10.244.2.254    node02.k8s.local     <none>           <none>
    prometheus-k8s-0                       2/2     Running   0          81s    10.244.1.8      node01.k8s.local     <none>           <none>
    prometheus-k8s-1                       2/2     Running   0          81s    10.244.2.4      node02.k8s.local     <none>           <none>
    prometheus-operator-797d795d64-4wnw2   2/2     Running   0          99s    10.244.2.2      node02.k8s.local     <none>           <none>
    kubectl get svc -n monitoring -o wide
    NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
    alertmanager-main       ClusterIP   10.96.71.121   <none>        9093/TCP,8080/TCP            2m10s   app.kubernetes.io/component=alert-router,app.kubernetes.io/instance=main,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=kube-prometheus
    alertmanager-operated   ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   108s    app.kubernetes.io/name=alertmanager
    blackbox-exporter       ClusterIP   10.96.33.150   <none>        9115/TCP,19115/TCP           2m10s   app.kubernetes.io/component=exporter,app.kubernetes.io/name=blackbox-exporter,app.kubernetes.io/part-of=kube-prometheus
    grafana                 ClusterIP   10.96.12.88    <none>        3000/TCP                     2m9s    app.kubernetes.io/component=grafana,app.kubernetes.io/name=grafana,app.kubernetes.io/part-of=kube-prometheus
    kube-state-metrics      ClusterIP   None           <none>        8443/TCP,9443/TCP            2m9s    app.kubernetes.io/component=exporter,app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/part-of=kube-prometheus
    node-exporter           ClusterIP   None           <none>        9100/TCP                     2m8s    app.kubernetes.io/component=exporter,app.kubernetes.io/name=node-exporter,app.kubernetes.io/part-of=kube-prometheus
    prometheus-adapter      ClusterIP   10.96.24.212   <none>        443/TCP                      2m7s    app.kubernetes.io/component=metrics-adapter,app.kubernetes.io/name=prometheus-adapter,app.kubernetes.io/part-of=kube-prometheus
    prometheus-k8s          ClusterIP   10.96.57.42    <none>        9090/TCP,8080/TCP            2m8s    app.kubernetes.io/component=prometheus,app.kubernetes.io/instance=k8s,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=kube-prometheus
    prometheus-operated     ClusterIP   None           <none>        9090/TCP                     107s    app.kubernetes.io/name=prometheus
    prometheus-operator     ClusterIP   None           <none>        8443/TCP                     2m6s    app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator,app.kubernetes.io/part-of=kube-prometheus
    kubectl  get svc  -n monitoring 
    NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
    alertmanager-main       ClusterIP   10.96.71.121   <none>        9093/TCP,8080/TCP            93m
    alertmanager-operated   ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   93m
    blackbox-exporter       ClusterIP   10.96.33.150   <none>        9115/TCP,19115/TCP           93m
    grafana                 ClusterIP   10.96.12.88    <none>        3000/TCP                     93m
    kube-state-metrics      ClusterIP   None           <none>        8443/TCP,9443/TCP            93m
    node-exporter           ClusterIP   None           <none>        9100/TCP                     93m
    prometheus-adapter      ClusterIP   10.96.24.212   <none>        443/TCP                      93m
    prometheus-k8s          ClusterIP   10.96.57.42    <none>        9090/TCP,8080/TCP            93m
    prometheus-operated     ClusterIP   None           <none>        9090/TCP                     93m
    prometheus-operator     ClusterIP   None           <none>        8443/TCP                     93m

    blackbox_exporter: an official Prometheus project for network probing — DNS, ping, and HTTP monitoring.
    node-exporter: a Prometheus exporter that collects node-level metrics such as CPU, memory, and disk.
    prometheus: the monitoring server; it scrapes data from node-exporter and the other exporters and stores it as time series.
    kube-state-metrics: exposes metadata of Kubernetes objects (pods, deployments, etc.) as metrics that can be queried with PromQL.
    prometheus-adapter: aggregated into the apiserver, i.e. a custom-metrics-apiserver implementation (see the check below).
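    A quick way to confirm that prometheus-adapter is serving the resource-metrics API (these should work once the adapter pods are Running):

    kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | head -c 300; echo
    kubectl top nodes
    kubectl top pods -n monitoring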

    Create the Ingresses

    This makes the UIs reachable by domain name; an ingress controller must already be installed (a quick check follows).
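    A minimal pre-check, assuming the ingress-nginx controller from the earlier chapters (the controller namespace is an assumption and may differ in your setup):

    kubectl get ingressclass                 # an IngressClass named "nginx" should exist
    kubectl get pods -n ingress-nginx        # namespace is an assumption; adjust to your install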

    cat > prometheus-ingress.yaml  << EOF
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-prometheus
      namespace: monitoring
      labels:
        app.kubernetes.io/name: nginx-ingress
        app.kubernetes.io/part-of: monitoring
      annotations:
        #kubernetes.io/ingress.class: "nginx"
        #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
    spec:
      ingressClassName: nginx
      rules:
      - host: prometheus.k8s.local
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-k8s
                port:
                  name: web
                  #number: 9090
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-grafana
      namespace: monitoring
      labels:
        app.kubernetes.io/name: nginx-ingress
        app.kubernetes.io/part-of: monitoring
      annotations:
        #kubernetes.io/ingress.class: "nginx"
        #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
    spec:
      ingressClassName: nginx
      rules:
      - host: grafana.k8s.local
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  name: http
                  #number: 3000
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ingress-alertmanager
      namespace: monitoring
      labels:
        app.kubernetes.io/name: nginx-ingress
        app.kubernetes.io/part-of: monitoring
      annotations:
        #kubernetes.io/ingress.class: "nginx"
        #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
    spec:
      ingressClassName: nginx
      rules:
      - host: alertmanager.k8s.local
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager-main
                port:
                  name: web
                  #number: 9093
    
    EOF
    kubectl delete -f  prometheus-ingress.yaml  
    kubectl apply -f  prometheus-ingress.yaml  
    kubectl get ingress -A

    Add the domain names to the hosts file

    127.0.0.1 prometheus.k8s.local
    127.0.0.1 grafana.k8s.local
    127.0.0.1 alertmanager.k8s.local
    #Test via the ClusterIPs
    curl -k  -H "Host:prometheus.k8s.local"  http://10.96.57.42:9090/graph
    curl -k  -H "Host:grafana.k8s.local"  http://10.96.12.88:3000/login
    curl -k  -H "Host:alertmanager.k8s.local"  http://10.96.71.121:9093/
    #Test cluster DNS
    curl -k  http://prometheus-k8s.monitoring.svc:9090
    #Test from inside a test pod
    kubectl exec -it pod/test-pod-1 -n test -- ping prometheus-k8s.monitoring

    Access in a browser:
    http://prometheus.k8s.local:30180/
    http://grafana.k8s.local:30180/
    admin/admin (default Grafana login)
    http://alertmanager.k8s.local:30180/#/alerts

    #Restart the pods if needed
    kubectl get pods -n monitoring
    
    kubectl rollout restart deployment/grafana -n monitoring
    kubectl rollout restart sts/prometheus-k8s -n monitoring

    Uninstall

    kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

    Changing the Prometheus display timezone

    To avoid timezone confusion, Prometheus deliberately uses Unix time and UTC for display in all of its components. It does not support setting a timezone in the configuration file, nor does it read the local /etc/timezone.

    In practice this limitation does not get in the way:

    For visualization, Grafana can do the timezone conversion.

    If you call the HTTP API, you get raw timestamps and can convert them however you like (see the example below).

    And if it bothers you that the built-in Prometheus UI does not show local time, the new web UI introduced a Local Timezone option in version 2.16.
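    For example, a Unix timestamp returned by the API can be converted locally with GNU date (the timestamp below is just a placeholder):

    date -d @1700000000                     # local time of the machine
    TZ=Asia/Shanghai date -d @1700000000    # a specific timezone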

    Changing the Grafana display timezone

    By default the dashboards display UTC, which is 8 hours behind Shanghai time.
    For dashboards that have already been imported, changing the timezone in the general settings or in your profile has no effect.

    For a helm install, modify values.yaml:

       ##defaultDashboardsTimezone: utc
       defaultDashboardsTimezone: "Asia/Shanghai"

    Option 1
    Change the timezone in the query view each time.

    Option 2
    Export a separate copy of the dashboard with the timezone changed.

    Option 3
    Modify the timezone in the dashboard definitions before importing:
    cat grafana-dashboardDefinitions.yaml|grep -C 2 timezone

                  ]
              },
              "timezone": "utc",
              "title": "Alertmanager / Overview",
              "uid": "alertmanager-overview",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / API server",
              "uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Networking / Cluster",
              "uid": "ff635a025bcfea7bc3dd4f508990a3e9",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Controller Manager",
              "uid": "72e0e05bef5099e5f049b05fdc429ed4",
    --
                  ]
              },
              "timezone": "",
              "title": "Grafana Overview",
              "uid": "6be0s85Mk",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Cluster",
              "uid": "efa86fd1d0c121a26444b636a3f509a8",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources /  Multi-Cluster",
              "uid": "b59e6c9f2fcbe2e16d77fc492374cc4f",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Namespace (Pods)",
              "uid": "85a562078cdf77779eaa1add43ccec1e",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Node (Pods)",
              "uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Pod",
              "uid": "6581e46e4e5c7ba40a07646395ef7b23",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Workload",
              "uid": "a164a7f0339f99e89cea5cb47e9be617",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Compute Resources / Namespace (Workloads)",
              "uid": "a87fb0d919ec0ea5f6543124e16c42a5",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Kubelet",
              "uid": "3138fa155d5915769fbded898ac09fd9",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Networking / Namespace (Pods)",
              "uid": "8b7a8b326d7a6f1f04244066368c67af",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Networking / Namespace (Workload)",
              "uid": "bbb2a765a623ae38130206c7d94a160f",
    --
                  ]
              },
              "timezone": "utc",
              "title": "Node Exporter / USE Method / Cluster",
              "version": 0
    --
                  ]
              },
              "timezone": "utc",
              "title": "Node Exporter / USE Method / Node",
              "version": 0
    --
                  ]
              },
              "timezone": "utc",
              "title": "Node Exporter / MacOS",
              "version": 0
    --
                  ]
              },
              "timezone": "utc",
              "title": "Node Exporter / Nodes",
              "version": 0
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Persistent Volumes",
              "uid": "919b92a8e8041bd567af9edab12c840c",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Networking / Pod",
              "uid": "7a18067ce943a40ae25454675c19ff5c",
    --
                  ]
              },
              "timezone": "browser",
              "title": "Prometheus / Remote Write",
              "version": 0
    --
                  ]
              },
              "timezone": "utc",
              "title": "Prometheus / Overview",
              "uid": "",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Proxy",
              "uid": "632e265de029684c40b21cb76bca4f94",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Scheduler",
              "uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
    --
                  ]
              },
              "timezone": "UTC",
              "title": "Kubernetes / Networking / Workload",
              "uid": "728bf77cc1166d2f3133bf25846876cc",

    Remove the UTC timezone (reset it to empty so Grafana falls back to the viewer's timezone):
    sed -rn '/"timezone":/{s/"timezone": ".*"/"timezone": ""/p}' grafana-dashboardDefinitions.yaml
    sed -i '/"timezone":/{s/"timezone": ".*"/"timezone": ""/}' grafana-dashboardDefinitions.yaml

    Data persistence

    There is no persistence by default; configuration and data are lost when the pods restart.

    Prepare the PVC

    Prepare a StorageClass in advance.
    Note that the namespace must match the one used by the Services (monitoring).
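    A quick check that the StorageClass referenced below actually exists:

    kubectl get storageclass managed-nfs-storage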

    cat > grafana-pvc.yaml  << EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: grafana-pvc
      namespace: monitoring
    spec:
      storageClassName: managed-nfs-storage  
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
    EOF

    Modify the YAML

    Grafana storage:
    grafana-deployment.yaml

          serviceAccountName: grafana
          volumes:
          - emptyDir: {}
            name: grafana-storage

    Change it to

          serviceAccountName: grafana
          volumes:
          - persistentVolumeClaim:
              claimName: grafana-pvc
            name: grafana-storage

    Add storage under spec:
    prometheus-prometheus.yaml

      namespace: monitoring
    spec:
      storage:
          volumeClaimTemplate:
            spec:
              storageClassName: managed-nfs-storage
              resources:
                requests:
                  storage: 10Gi

    Add some permissions to prometheus-clusterRole.yaml. The original rules and the complete modified rules are shown below; the additions are mainly in the resources and verbs sections.
    prometheus-clusterRole.yaml

    rules:
    - apiGroups:
      - ""
      resources:
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes/metrics
      - services
      - endpoints
      - pods
      verbs:
      - get
      - list
      - watch
    - nonResourceURLs:
      - /metrics
      verbs:
      - get

    Then run the following to grant admin rights to the kube-state-metrics service account (optional, use at your discretion):

    kubectl create clusterrolebinding kube-state-metrics-admin-binding \
    --clusterrole=cluster-admin  \
    --user=system:serviceaccount:monitoring:kube-state-metrics
    kubectl apply -f grafana-pvc.yaml
    kubectl apply -f prometheus-clusterRole.yaml
    
    kubectl apply -f grafana-deployment.yaml
    kubectl apply -f prometheus-prometheus.yaml
    
    kubectl get pv,pvc -o wide
    NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                           STORAGECLASS          REASON   AGE   VOLUMEMODE
    persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Delete           Bound      monitoring/grafana-pvc                          managed-nfs-storage            25h   Filesystem
    persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            25h   Filesystem
    persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            25h   Filesystem

    Change the reclaim policy of the dynamic PVs to Retain; otherwise the data is deleted when the pods/PVCs are recreated.

    kubectl edit pv -n default pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 
    persistentVolumeReclaimPolicy: Retain
    
    kubectl edit pv -n default pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
    kubectl edit pv -n default pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
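    The same change can be made non-interactively with kubectl patch (shown for one PV; repeat for the others):

    kubectl patch pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'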
    
    kubectl get pods -n monitoring

    Check on the NFS server whether data has been written

    ll /nfs/k8s/dpv/
    total 0
    drwxrwxrwx. 2 root root  6 Oct 24 18:19 default-test-pvc2-pvc-f9153444-5653-4684-a845-83bb313194d1
    drwxrwxrwx. 2 root root  6 Nov 22 15:45 monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
    drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
    drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e

    kubectl logs -f prometheus-k8s-0 prometheus -n monitoring

    Custom pod/service auto-discovery configuration

    Goal:
    Services or pods started by users should be discovered automatically by Prometheus once the following annotations are added:

    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9121"
    1. Store the auto-discovery config in a Secret
      For services with these annotations to be discovered, Prometheus needs the following additional scrape config:
      prometheus-additional.yaml
      cat > prometheus-additional.yaml << EOF
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: \$1:\$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
      EOF
      The heredoc contains escaped variables (\$1:\$2), so review the generated file:
      cat prometheus-additional.yaml

      The config above keeps only the endpoints whose Service is annotated with prometheus.io/scrape=true.

    In the services you want monitored, add (a fuller example Service follows below):

      annotations:
         prometheus.io/scrape: "true"
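    For illustration, a hypothetical Service (name, namespace, selector, and port are placeholders) that the job above would pick up:

    cat > redis-exporter-svc.yaml << EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: redis-exporter
      namespace: test
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
    spec:
      selector:
        app: redis-exporter
      ports:
      - name: metrics
        port: 9121
        targetPort: 9121
    EOF
    kubectl apply -f redis-exporter-svc.yaml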

    Save the config above as a Secret:

    kubectl delete secret additional-configs -n monitoring
    kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
    secret "additional-configs" created
    kubectl get secret additional-configs -n monitoring  -o yaml 
    2. Add the config to the Prometheus instance
      Edit the Prometheus CRD and reference the Secret created above:

    vi prometheus-prometheus.yaml

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      ......
      additionalScrapeConfigs:
        name: additional-configs
        key: prometheus-additional.yaml
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: 2.46.0

    kubectl apply -f prometheus-prometheus.yaml

    With the Prometheus CR modified, you can check on the Prometheus dashboard whether the config has been picked up:
    http://prometheus.k8s.local:30180/targets?search=#pool-kubernetes-service-endpoints

    kubectl get pods -n monitoring -o wide
    kubectl rollout restart sts/prometheus-k8s -n monitoring
    kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
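    Optionally verify that the extra job has landed in the rendered configuration (a sketch using a temporary port-forward):

    kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
    sleep 2
    curl -s http://127.0.0.1:9090/api/v1/status/config | grep kubernetes-service-endpoints
    kill %1    # stop the temporary port-forward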

    After an NFS restart, the services return 503 and pods cannot be terminated

    #df -h hangs, NFS is stuck, and the client host has to be rebooted
    kubectl get pods -n monitoring
    
    kubectl delete -f prometheus-prometheus.yaml
    kubectl delete pod prometheus-k8s-1  -n monitoring
    kubectl delete pod prometheus-k8s-1 --grace-period=0 --force --namespace monitoring
    
    kubectl delete -f grafana-deployment.yaml
    kubectl apply -f grafana-deployment.yaml
    
    kubectl apply -f prometheus-prometheus.yaml
    kubectl logs -n monitoring prometheus-k8s-0 prometheus
    kubectl describe -n monitoring pod prometheus-k8s-0 
    kubectl describe -n monitoring pod prometheus-k8s-1 
    kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m  
    
    kubectl get pv,pvc -o wide
    
    persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Delete           Bound      monitoring/grafana-pvc                          managed-nfs-storage            25h   Filesystem
    persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            25h   Filesystem
    persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            25h   Filesystem
    persistentvolume/pvc-f9153444-5653-4684-a845-83bb313194d1   300Mi      RWX            Retain           Released   default/test-pvc2                               managed-nfs-storage            29d   Filesystem
    
    #Completely remove and reinstall
    kubectl delete -f manifests/
    kubectl apply -f manifests/

    When NFS is down, processes reading the NFS-mounted directory block on timeouts, the threads fill up, and the pod can no longer answer the Kubernetes health checks. After a while Kubernetes restarts the pod, but because NFS is down the umount hangs during termination, leaving the pod stuck in Terminating.

    On the node where the pod was originally scheduled, unmount the NFS mount points:
    mount -l | grep nfs

    sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
    192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
    192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus-db on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volume-subpaths/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus/2 type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
    umount -l -f /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e

    Change the default mount option to soft

    vi /etc/nfsmount.conf
    Soft=True

    soft: when the NFS client mounts the server with soft and the network or server fails so that data cannot be transferred, the client keeps retrying until the timeout, then reports an error and stops. A soft mount can therefore lose data when the timeout hits, so it is generally not recommended.
    hard: the default. With a hard mount the client behaves the other way round: it keeps retrying the connection to the server; if the server responds it resumes the previous operation, and if not the client retries forever and can neither be unmounted nor killed, so it is usually combined with intr.
    intr: when a hard-mounted resource times out, intr allows the operation to be interrupted, which prevents the whole system from being locked up by NFS when something goes wrong; recommended (a per-mount example follows below).
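    The equivalent options can also be set per mount instead of globally in /etc/nfsmount.conf (the mount point below is just an example and must exist first):

    mount -t nfs -o soft,timeo=600,retrans=2 192.168.244.6:/nfs/k8s/dpv /mnt/nfs-test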

    After a PV used by a StatefulSet is deleted, the restarted pod still looks for the original PV, so deleting the PV is not recommended.

    kubectl get pv -o wide
    pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Retain           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            26h   Filesystem
    pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Retain           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            26h   Filesystem
    
    kubectl patch pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e -p '{"metadata":{"finalizers":null}}'
    kubectl patch pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca -p '{"metadata":{"finalizers":null}}'
    kubectl delete pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
    kubectl delete pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 
    
    kubectl describe pvc pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 | grep Mounted
    kubectl patch pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 -p '{"metadata":{"finalizers":null}}'
    kubectl delete pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19

    Restore the PVs

    Restore grafana-pvc: find the original mount point under the NFS dynamic PV directory, monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19.

    kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m
    default-scheduler 0/3 nodes are available: persistentvolumeclaim "grafana-pvc" bound to non-existent persistentvolume "pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19". preemption: 0/3 nodes are available:
    3 Preemption is not helpful for scheduling..

    cat > rebuid-grafana-pvc.yaml  << EOF
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
      labels:
        pv: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName:  managed-nfs-storage 
      nfs:
        path: /nfs/k8s/dpv/monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
        server: 192.168.244.6
    EOF
    kubectl apply -f ../k8s/rebuid-grafana-pvc.yaml 

    Restore prometheus-k8s-0

    kubectl describe -n monitoring pod prometheus-k8s-0
    Warning FailedScheduling 14m (x3 over 24m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-0" bound to non-existent persistentvolume "pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

    cat > rebuid-prometheus-k8s-0-pv.yaml  << EOF
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
      labels:
        pv: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName:  managed-nfs-storage 
      nfs:
        path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
        server: 192.168.244.6
    EOF

    kubectl describe -n monitoring pod prometheus-k8s-1
    Warning FailedScheduling 19m (x3 over 29m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-1" bound to non-existent persistentvolume "pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

    cat > rebuid-prometheus-k8s-1-pv.yaml  << EOF
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
      labels:
        pv: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName:  managed-nfs-storage 
      nfs:
        path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
        server: 192.168.244.6
    EOF
    kubectl apply -f rebuid-prometheus-k8s-0-pv.yaml 
    kubectl apply -f rebuid-prometheus-k8s-1-pv.yaml 
    
    kubectl get pv -o wide
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS          REASON   AGE     VOLUMEMODE
    pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Retain           Bound    monitoring/grafana-pvc                          managed-nfs-storage            9m17s   Filesystem
    pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWX            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            17s     Filesystem
    pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWX            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            2m37s   Filesystem
    kubectl get pods -n monitoring
    kubectl -n monitoring logs -f prometheus-k8s-1

    Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-1" is waiting to start: PodInitializing
    iowait is very high:
    iostat -kx 1
    and there are many mount processes:
    ps aux|grep mount

    mount -t nfs 192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e ./tmp
    showmount -e 192.168.244.6
    Export list for 192.168.244.6:
    /nfs/k8s/dpv     *
    /nfs/k8s/spv_003 *
    /nfs/k8s/spv_002 *
    /nfs/k8s/spv_001 *
    /nfs/k8s/web     *
    
    mount -v -t nfs 192.168.244.6:/nfs/k8s/web ./tmp
    mount.nfs: timeout set for Fri Nov 24 14:33:04 2023
    mount.nfs: trying text-based options 'soft,vers=4.1,addr=192.168.244.6,clientaddr=192.168.244.5'
    
    mount -v -t nfs -o vers=3  192.168.244.6:/nfs/k8s/web ./tmp
    #mounting works with NFS v3

    If the NFS service on the server suddenly stops while a client has it mounted, running df -h on the client will hang.
    You can kill the processes holding the mount point, restart the NFS services on both client and server and remount, or reboot the machine.
