Migrate from FluentD to Fluent Bit

Older versions of EKF used FluentD to send logs to Amazon CloudWatch Logs. This section describes how to migrate from FluentD to Fluent Bit on your EKS cluster.

Fast Forward

If you are not running on EKS, proceed to the What’s Next section.

Optional

This guide is optional. If you have not enabled logging to CloudWatch in your EKS cluster using FluentD, proceed to the What’s Next section.

AWS has announced that Container Insights Support for FluentD is now in maintenance mode. That is, AWS will not provide any further updates for FluentD and is planning to deprecate it in the near future.

For this reason, EKF now uses Fluent Bit to forward logs to Amazon CloudWatch Logs, taking advantage of its security updates and significant performance gains.

Note

FluentD and Fluent Bit operate on the same log groups on Amazon CloudWatch Logs. As such, there is no need to delete existing log groups or create new ones. Fluent Bit will automatically adopt and use existing ones, if present.
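
To see which log groups already exist before you migrate, you can list them together with their retention setting. The following is a minimal sketch, not part of the EKF tooling: the helper name `list_log_groups` is hypothetical, and it assumes the log groups live under the `/aws/containerinsights/<cluster-name>` prefix that the Verify section of this guide uses.

```shell
# list_log_groups CLUSTER_NAME  (hypothetical helper)
# Prints one line per log group: "<name><TAB><retention in days>".
# A retention of "None" means the log group never expires.
list_log_groups() {
    local cluster="${1:?usage: list_log_groups CLUSTER_NAME}"

    # Assumption: Container Insights log groups share this prefix.
    aws logs describe-log-groups \
        --log-group-name-prefix "/aws/containerinsights/${cluster}" \
        --query 'logGroups[].[logGroupName,retentionInDays]' \
        --output text
}
```

For example, `list_log_groups arrikto-cluster` would print the application, dataplane, and host log groups if FluentD has already created them.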

What You’ll Need

Check Your Environment

To migrate from FluentD to Fluent Bit, you are going to deploy a CloudFormation stack. When working with AWS CloudFormation stacks to manage resources, you need sufficient permissions both on AWS CloudFormation and on the underlying resources that are defined in the template.

To use AWS CloudFormation to create an IAM role for Fluent Bit on your EKS cluster, with the proper IAM policies attached to it, you need permissions for the following actions:

  • Deploy and delete AWS CloudFormation stacks.
  • Create IAM roles.
  • Create IAM policies.
  • Attach managed IAM policies to IAM roles.

Note

If you do not have the above permissions, contact your AWS administrator to grant sufficient permissions to your IAM user or deploy the below AWS CloudFormation stack for you.
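
One way to check these permissions up front is the IAM policy simulator. The sketch below is an illustration under stated assumptions, not part of the EKF tooling: the helper names `denied_actions` and `check_permissions` are hypothetical, the action list is only one plausible mapping of the bullets above onto IAM action names, and the simulator call itself requires the `iam:SimulatePrincipalPolicy` permission.

```shell
# denied_actions  (hypothetical helper)
# Reads "ACTION<TAB>DECISION" lines on stdin and prints every action
# whose decision is anything other than "allowed".
denied_actions() {
    awk '$2 != "allowed" { print $1 }'
}

# check_permissions PRINCIPAL_ARN  (hypothetical helper)
# Simulates the actions below for the given IAM user or role ARN and
# prints the ones that would be denied. Empty output means none were.
check_permissions() {
    local arn="${1:?usage: check_permissions PRINCIPAL_ARN}"

    # Assumption: these actions approximate what this guide needs.
    aws iam simulate-principal-policy \
        --policy-source-arn "${arn}" \
        --action-names \
            cloudformation:CreateChangeSet \
            cloudformation:ExecuteChangeSet \
            cloudformation:DescribeStacks \
            cloudformation:DeleteStack \
            iam:CreateRole \
            iam:CreatePolicy \
            iam:AttachRolePolicy \
        --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
        --output text | denied_actions
}
```

If you are an IAM user, you could call it as `check_permissions "$(aws sts get-caller-identity --query Arn --output text)"`. The simulator does not replace an actual deployment attempt, so treat a clean result as a hint rather than a guarantee.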

Procedure

Note

You will first deploy Fluent Bit alongside FluentD (to avoid losing any logs) and then delete FluentD from your EKS cluster.

  1. Go to your GitOps repository inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Restore the required context from previous sections:

    root@rok-tools:~/ops/deployments# source <(cat deploy/env.{envvars-aws,eks-cluster,\
    > eks-identity})
    root@rok-tools:~/ops/deployments# export AWS_ACCOUNT_ID AWS_DEFAULT_REGION \
    > EKS_CLUSTER_OIDC EKS_CLUSTER
  3. Create an IAM role for Fluent Bit:

    1. Set the name of the IAM role for Fluent Bit:

      root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_ROLE=rok-\
      > ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluent-bit
    2. Verify that the IAM role name you specified is not longer than 64 characters:

      root@rok-tools:~/ops/deployments# [[ ${#FLUENT_BIT_EKS_IAM_ROLE} -le 64 ]] \
      > && echo OK || echo FAIL
      OK

      Troubleshooting

      The output of the command is FAIL

      Go back to step 3a and specify a shorter name. Ensure the new name is not already in use.

    3. Set the name of the CloudFormation stack you will deploy:

      root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_CF=rok-\
      > ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluent-bit
    4. Verify that the CloudFormation stack name you specified is not longer than 128 characters:

      root@rok-tools:~/ops/deployments# [[ ${#FLUENT_BIT_EKS_IAM_CF} -le 128 ]] && echo OK || echo FAIL
      OK

      Troubleshooting

      The output of the command is FAIL

      Go back to step 3c and specify a shorter name. Ensure the new name is not already in use.

    5. Generate the AWS CloudFormation stack:

      root@rok-tools:~/ops/deployments# j2 rok/eks/fluent-bit-eks-iam-resources.yaml.j2 \
      > -o rok/eks/fluent-bit-eks-iam-resources.yaml

      Alternatively, download the fluent-bit-eks-iam-resources CloudFormation template provided below and use it locally.

      fluent-bit-eks-iam-resources.yaml
      Metadata:
        Rok::StackName: <FLUENT_BIT_EKS_IAM_CF>

      Resources:
        FluentBitRole:
          Type: AWS::IAM::Role
          Description: Fluent Bit Role
          Properties:
            RoleName: <FLUENT_BIT_EKS_IAM_ROLE>
            AssumeRolePolicyDocument:
              Version: '2012-10-17'
              Statement:
                - Effect: Allow
                  Action: sts:AssumeRoleWithWebIdentity
                  Principal:
                    Federated: arn:aws:iam::<AWS_ACCOUNT_ID>:oidc-provider/<EKS_CLUSTER_OIDC>
                  Condition:
                    StringEquals:
                      <EKS_CLUSTER_OIDC>:sub: system:serviceaccount:amazon-cloudwatch:fluent-bit
            ManagedPolicyArns:
              - !Sub "arn:${AWS::Partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
    6. Save your state:

      root@rok-tools:~/ops/deployments# j2 deploy/env.fluent-bit-eks-iam.j2 \
      > -o deploy/env.fluent-bit-eks-iam
    7. Commit your changes:

      root@rok-tools:~/ops/deployments# git commit \
      > -am "Create IAM Role for Fluent Bit"
    8. Deploy the CloudFormation stack:

      root@rok-tools:~/ops/deployments# aws cloudformation deploy \
      > --stack-name ${FLUENT_BIT_EKS_IAM_CF?} \
      > --template-file rok/eks/fluent-bit-eks-iam-resources.yaml \
      > --capabilities CAPABILITY_NAMED_IAM
      Waiting for changeset to be created..
      Waiting for stack create/update to complete
      Successfully created/updated stack - rok-us-east-1-arrikto-cluster-fluent-bit

      Troubleshooting

      AccessDenied

      If the above command fails with an error message similar to the following:

      An error occurred (AccessDenied) when calling the DescribeStacks operation: User: arn:aws:iam::123456789012:user/user is not authorized to perform: cloudformation:DescribeStacks on resource: arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/e84c63f0-3247-11ec-9c73-0a316e131472

      it means that your IAM user does not have sufficient permissions to perform an action necessary to deploy an AWS CloudFormation stack.

      To proceed, review the Check Your Environment section and contact your AWS administrator to grant sufficient permissions to your IAM user or to deploy the AWS CloudFormation stack for you.

      Failed to create/update the stack

      If the above command fails with an error message similar to the following:

      Failed to create/update the stack. Run the following command
      to fetch the list of events leading up to the failure
      aws cloudformation describe-stack-events --stack-name rok-us-east-1-arrikto-cluster-fluent-bit

      describe the events of the CloudFormation stack to identify the root cause of the failure:

      root@rok-tools:~/ops/deployments# aws cloudformation describe-stack-events \
      > --stack-name ${FLUENT_BIT_EKS_IAM_CF?}
      • A stack event like the following:

        {
            "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/599bc930-7b3f-11eb-ac1c-029efe3a90a0",
            "EventId": "rok-us-east-1-arrikto-cluster-fluent-bit-CREATE_FAILED-2021-03-02T10:09:27.457Z",
            "StackName": "rok-us-east-1-arrikto-cluster-fluent-bit",
            "LogicalResourceId": "FluentBitRole",
            "PhysicalResourceId": "",
            "ResourceType": "AWS::IAM::Role",
            "Timestamp": "2021-03-02T10:09:27.457000+00:00",
            "ResourceStatus": "CREATE_FAILED",
            "ResourceStatusReason": "rok-us-east-1-arrikto-cluster-fluent-bit already exists in stack arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-another-cluster-fluent-bit/e84c63f0-3247-11ec-9c73-0a316e131472",
            "ResourceProperties": "{\"ManagedPolicyArns\":[\"arn:aws:iam::123456789012:policy/rok-us-east-1-arrikto-cluster-fluent-bit\"],\"RoleName\":\"rok-us-east-1-arrikto-cluster-fluent-bit\",\"AssumeRolePolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Condition\":{\"StringEquals\":{\"oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW:sub\":\"system:serviceaccount:kube-system:fluent-bit\"}},\"Action\":\"sts:AssumeRoleWithWebIdentity\",\"Effect\":\"Allow\",\"Principal\":{\"Federated\":\"arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW\"}}]}}"
        }

        means that the IAM role or IAM policy that the AWS CloudFormation stack defines already exists, leading to a name conflict.

        To proceed, go back to step 3a, specify a different name for the resources that already exist and follow the rest of the guide.

      • A stack event like the following:

        {
            "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/415eef80-7b46-11eb-b047-06980f530fec",
            "EventId": "rok-us-east-1-arrikto-cluster-fluent-bit-CREATE_FAILED-2021-03-02T10:09:27.457Z",
            "StackName": "rok-us-east-1-arrikto-cluster-fluent-bit",
            "LogicalResourceId": "FluentBitRole",
            "PhysicalResourceId": "",
            "ResourceType": "AWS::IAM::Role",
            "Timestamp": "2021-03-02T10:58:54.216000+00:00",
            "ResourceStatus": "CREATE_FAILED",
            "ResourceStatusReason": "API: iam:CreateRole User: arn:aws:iam::123456789012:user/user is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::123456789012:role/rok-us-east-1-arrikto-cluster-fluent-bit",
            "ResourceProperties": "{\"ManagedPolicyArns\":[\"arn:aws:iam::123456789012:policy/rok-us-east-1-arrikto-cluster-fluent-bit\"],\"RoleName\":\"rok-us-east-1-arrikto-cluster-fluent-bit\",\"AssumeRolePolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Condition\":{\"StringEquals\":{\"oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW:sub\":\"system:serviceaccount:kube-system:fluent-bit\"}},\"Action\":\"sts:AssumeRoleWithWebIdentity\",\"Effect\":\"Allow\",\"Principal\":{\"Federated\":\"arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW\"}}]}}"
        }

        means that your IAM user does not have sufficient permissions to create the resources that the AWS CloudFormation stack defines.

        To proceed, review the Check Your Environment section and contact your AWS administrator to grant your IAM user sufficient permissions or to deploy the AWS CloudFormation stack for you.

      ValidationError

      If the above command fails with an error message similar to the following:

      An error occurred (ValidationError) when calling the CreateChangeSet operation: Stack:arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/671606f0-eb2b-11eb-8afb-0217413c9ed2 is in ROLLBACK_COMPLETE state and can not be updated.

      delete the stack and deploy it again.

  4. Deploy Fluent Bit:

    1. Render the ConfigMap patch template using the environment variables you specified:

      root@rok-tools:~/ops/deployments# j2 rok/amazon-cloudwatch/overlays/deploy/patches/configmap.yaml.j2 \
      > -o rok/amazon-cloudwatch/overlays/deploy/patches/configmap.yaml
    2. Render the ServiceAccount patch template with the variables you have specified:

      root@rok-tools:~/ops/deployments# j2 rok/amazon-cloudwatch/overlays/deploy/patches/sa.yaml.j2 \
      > -o rok/amazon-cloudwatch/overlays/deploy/patches/sa.yaml
    3. Commit your changes:

      root@rok-tools:~/ops/deployments# git commit -am "Deploy Fluent Bit"
    4. Apply the manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/amazon-cloudwatch/overlays/deploy/

      Note

      We use the default Fluent Bit optimized configuration that is aligned with Fluent Bit best practices.

      Note

      By default, the retention of log groups that Fluent Bit creates on Amazon CloudWatch Logs is set to Never expire.

  5. Delete FluentD:

    1. Delete the FluentD DaemonSet from your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl delete --ignore-not-found \
      > -n amazon-cloudwatch ds fluentd-cloudwatch
      daemonset.apps "fluentd-cloudwatch" deleted
    2. Delete the FluentD ConfigMap from your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl delete \
      > --ignore-not-found -n amazon-cloudwatch cm fluentd-config
      configmap "fluentd-config" deleted
    3. Delete the FluentD ClusterRoleBinding from your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl delete \
      > --ignore-not-found clusterrolebinding fluentd-role-binding
      clusterrolebinding.rbac.authorization.k8s.io "fluentd-role-binding" deleted
    4. Delete the FluentD ClusterRole from your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl delete --ignore-not-found clusterrole fluentd-role
      clusterrole.rbac.authorization.k8s.io "fluentd-role" deleted
    5. Delete the FluentD ServiceAccount from your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl delete \
      > --ignore-not-found -n amazon-cloudwatch sa fluentd
      serviceaccount "fluentd" deleted
    6. Restore the name of the CloudFormation stack you previously used to create the IAM role for FluentD:

      root@rok-tools:~/ops/deployments# export FLUENTD_EKS_IAM_CF=rok-\
      > ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluentd
    7. Delete the CloudFormation stack that created the IAM role for FluentD:

      root@rok-tools:~/ops/deployments# aws cloudformation delete-stack \
      > --stack-name ${FLUENTD_EKS_IAM_CF?}
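
Note that `aws cloudformation delete-stack` returns as soon as the deletion has been requested, while the deletion itself runs asynchronously. If you want to block until the stack is gone, for example before redeploying a stack that was stuck in ROLLBACK_COMPLETE, a small wrapper like the one below may help. This is a sketch rather than part of the EKF tooling, and the helper name `delete_stack_and_wait` is hypothetical.

```shell
# delete_stack_and_wait STACK_NAME  (hypothetical helper)
# Requests deletion of a CloudFormation stack, then blocks until
# CloudFormation reports that the deletion has completed.
delete_stack_and_wait() {
    local stack="${1:?usage: delete_stack_and_wait STACK_NAME}"

    # Stop at the first failing step so errors are not masked.
    aws cloudformation delete-stack --stack-name "${stack}" &&
        aws cloudformation wait stack-delete-complete \
            --stack-name "${stack}" &&
        echo "Deleted ${stack}"
}
```

For the FluentD stack this would be `delete_stack_and_wait "${FLUENTD_EKS_IAM_CF?}"`. The function returns a non-zero status if either the delete request or the wait fails.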

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Restore the required context from previous sections:

    root@rok-tools:~/ops/deployments# source <(cat deploy/env.fluent-bit-eks-iam)
    root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_ROLE
  3. Verify that you have successfully deployed the IAM role for Fluent Bit:

    1. Verify that the IAM role exists:

      root@rok-tools:~/ops/deployments# aws iam get-role \
      > --role-name ${FLUENT_BIT_EKS_IAM_ROLE?} \
      > --query Role.RoleName \
      > --output text && echo OK
      rok-us-east-1-arrikto-cluster-fluent-bit
      OK
    2. Verify that the CloudWatchAgentServerPolicy is attached to your IAM role for Fluent Bit:

      root@rok-tools:~/ops/deployments# POLICIES=$(aws iam list-attached-role-policies \
      > --role-name ${FLUENT_BIT_EKS_IAM_ROLE?} \
      > --query 'length(AttachedPolicies[?
      > PolicyArn==`arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy`])') && \
      > ((POLICIES==1)) && \
      > echo OK || \
      > echo FAIL
      OK
  4. Verify that you have successfully deployed Fluent Bit:

    1. Verify that the Fluent Bit DaemonSet is ready. Verify that fields READY and UP-TO-DATE are equal to field DESIRED:

      root@rok-tools:~/ops/deployments# kubectl get ds -n amazon-cloudwatch fluent-bit
      NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      fluent-bit   2         2         2       2            2           <none>          2m
    2. Verify that you have enabled logging for your containers and worker nodes. Ensure that the corresponding log groups have been created in Amazon CloudWatch Logs:

      root@rok-tools:~/ops/deployments# aws logs describe-log-groups \
      > --log-group-name-prefix /aws/containerinsights/${EKS_CLUSTER?} \
      > --query logGroups[].[logGroupName] --output text
      /aws/containerinsights/arrikto-cluster/application
      /aws/containerinsights/arrikto-cluster/dataplane
      /aws/containerinsights/arrikto-cluster/host
  5. Verify that you have successfully deleted FluentD:

    1. Verify that the FluentD DaemonSet does not exist in your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl get ds -n amazon-cloudwatch fluentd-cloudwatch
      Error from server (NotFound): daemonsets.apps "fluentd-cloudwatch" not found
    2. Verify that the FluentD ConfigMap does not exist in your EKS cluster:

      root@rok-tools:~/ops/deployments# kubectl get cm -n amazon-cloudwatch fluentd-config
      Error from server (NotFound): configmaps "fluentd-config" not found
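
The DaemonSet readiness check above compares columns by eye. If you prefer a scripted check, the comparison can be sketched as follows. This is an illustrative helper, not part of the EKF tooling: the name `daemonset_ready` is hypothetical, and it reads the DaemonSet status fields that back the DESIRED, READY, and UP-TO-DATE columns.

```shell
# daemonset_ready NAMESPACE NAME  (hypothetical helper)
# Succeeds when READY and UP-TO-DATE both equal DESIRED.
daemonset_ready() {
    local ns="${1:?namespace required}"
    local name="${2:?DaemonSet name required}"
    local status

    # Fetch DESIRED, READY and UP-TO-DATE from the DaemonSet status.
    status=$(kubectl get ds -n "${ns}" "${name}" -o \
        jsonpath='{.status.desiredNumberScheduled} {.status.numberReady} {.status.updatedNumberScheduled}') || return 1

    # Split the three counters into $1 $2 $3 and compare them.
    set -- ${status}
    [ "$#" -eq 3 ] && [ "$2" = "$1" ] && [ "$3" = "$1" ]
}
```

You could then run `daemonset_ready amazon-cloudwatch fluent-bit && echo OK || echo FAIL`, following the same OK/FAIL convention that the other checks in this guide use.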

Summary

You have successfully migrated from FluentD to Fluent Bit.

What’s Next

The next step is to test your upgraded deployment.