Migrate from FluentD to Fluent Bit
Older versions of EKF used FluentD to send logs to Amazon CloudWatch Logs. This section describes how to migrate from FluentD to Fluent Bit on your EKS cluster.
Fast Forward
If you are not running on EKS, proceed to the What’s Next section.
Optional
This guide is optional. If you have not enabled logging to CloudWatch in your EKS cluster using FluentD, proceed to the What’s Next section.
Fast Forward
If you are upgrading from EKF 2.0 or later, proceed to the Verify section.
AWS has announced that Container Insights Support for FluentD is now in maintenance mode. That is, AWS will not provide any further updates for FluentD and is planning to deprecate it in the near future.
For this reason EKF now uses Fluent Bit to forward logs to Amazon CloudWatch Logs in order to take advantage of security updates and significant performance gains.
Note
FluentD and Fluent Bit operate on the same log groups on Amazon CloudWatch Logs. As such, there is no need to delete existing log groups or create new ones. Fluent Bit will automatically adopt and use existing ones, if present.
What You’ll Need
- An upgraded management environment.
- An existing FluentD deployment.
- Your clone of the Arrikto GitOps repository.
- Arrikto manifests for EKF version 2.0.1.
Check Your Environment
To migrate from FluentD to Fluent Bit, you are going to deploy a CloudFormation stack. When working with AWS CloudFormation stacks to manage resources, you need sufficient permissions both on AWS CloudFormation and on the underlying resources that are defined in the template.
To use AWS CloudFormation to create an IAM role for Fluent Bit on your EKS cluster, with the proper IAM policies attached to it, you need permissions for the following actions:
- Deploy and delete AWS CloudFormation stacks.
- Create IAM roles.
- Create IAM policies.
- Attach managed IAM policies to IAM roles.
Note
If you do not have the above permissions, contact your AWS administrator to grant sufficient permissions to your IAM user or deploy the below AWS CloudFormation stack for you.
Procedure
Note
You will first deploy Fluent Bit alongside FluentD (to avoid losing any logs) and then delete FluentD from your EKS cluster.
Go to your GitOps repository inside your rok-tools management environment:
root@rok-tools:~# cd ~/ops/deployments
Restore the required context from previous sections:
root@rok-tools:~/ops/deployments# source <(cat deploy/env.{envvars-aws,eks-cluster,\
> eks-identity})
root@rok-tools:~/ops/deployments# export AWS_ACCOUNT_ID AWS_DEFAULT_REGION \
> EKS_CLUSTER_OIDC EKS_CLUSTER
Create an IAM role for Fluent Bit:
Set the name of the IAM role for Fluent Bit:
root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_ROLE=rok-\
> ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluent-bit
Verify that the IAM role name you specified is not longer than 64 characters:
root@rok-tools:~/ops/deployments# [[ ${#FLUENT_BIT_EKS_IAM_ROLE} -le 64 ]] \
> && echo OK || echo FAIL
OK
Troubleshooting
The output of the command is FAIL
Go back to step 3a, where you set the name of the IAM role, and specify a shorter name. Ensure the new name is not already in use.
Set the name of the CloudFormation stack you will deploy:
root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_CF=rok-\
> ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluent-bit
Verify that the CloudFormation stack name you specified is not longer than 128 characters:
root@rok-tools:~/ops/deployments# [[ ${#FLUENT_BIT_EKS_IAM_CF} -le 128 ]] && echo OK || echo FAIL
OK
Troubleshooting
The output of the command is FAIL
Go back to step 3c, where you set the name of the CloudFormation stack, and specify a shorter name. Ensure the new name is not already in use.
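The two length checks above follow the same pattern, so they can be sketched as one small helper. This is a minimal illustration, not part of the EKF tooling: `check_name_length` is a hypothetical function, and the region and cluster values are sample defaults that stand in for the ones restored from your `deploy/env.*` files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EKF): print OK if NAME fits within MAX
# characters, otherwise print a FAIL message and return non-zero.
check_name_length() {
    local name=$1 max=$2
    if (( ${#name} <= max )); then
        echo OK
    else
        echo "FAIL: '${name}' is ${#name} characters, limit is ${max}"
        return 1
    fi
}

# Sample values (assumptions); the guide restores the real ones from deploy/env.* files.
AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-us-east-1}
EKS_CLUSTER=${EKS_CLUSTER:-arrikto-cluster}
NAME="rok-${AWS_DEFAULT_REGION}-${EKS_CLUSTER}-fluent-bit"

check_name_length "$NAME" 64     # limit used in this guide for the IAM role name
check_name_length "$NAME" 128    # limit used in this guide for the stack name
```

Because the role and the stack share the same derived name in this guide, a name that passes the 64-character check also passes the 128-character one.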
Generate the AWS CloudFormation stack:
root@rok-tools:~/ops/deployments# j2 rok/eks/fluent-bit-eks-iam-resources.yaml.j2 \
> -o rok/eks/fluent-bit-eks-iam-resources.yaml
Alternatively, download the fluent-bit-eks-iam-resources CloudFormation template provided below and use it locally.
fluent-bit-eks-iam-resources.yaml
Metadata:
  Rok::StackName: <FLUENT_BIT_EKS_IAM_CF>
# 4-18 (lines collapsed in the source listing)
Resources:
  FluentBitRole:
    Type: AWS::IAM::Role
    Description: Fluent Bit Role
    Properties:
      RoleName: <FLUENT_BIT_EKS_IAM_ROLE>
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action: sts:AssumeRoleWithWebIdentity
            Principal:
              Federated: arn:aws:iam::<AWS_ACCOUNT_ID>:oidc-provider/<EKS_CLUSTER_OIDC>
            Condition:
              StringEquals:
                <EKS_CLUSTER_OIDC>:sub: system:serviceaccount:amazon-cloudwatch:fluent-bit
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
Save your state:
root@rok-tools:~/ops/deployments# j2 deploy/env.fluent-bit-eks-iam.j2 \
> -o deploy/env.fluent-bit-eks-iam
Commit your changes:
root@rok-tools:~/ops/deployments# git commit \
> -am "Create IAM Role for Fluent Bit"
Deploy the CloudFormation stack:
root@rok-tools:~/ops/deployments# aws cloudformation deploy \
> --stack-name ${FLUENT_BIT_EKS_IAM_CF?} \
> --template-file rok/eks/fluent-bit-eks-iam-resources.yaml \
> --capabilities CAPABILITY_NAMED_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - rok-us-east-1-arrikto-cluster-fluent-bit
Troubleshooting
AccessDenied
If the above command fails with an error message similar to the following:
An error occurred (AccessDenied) when calling the DescribeStacks operation: User: arn:aws:iam::123456789012:user/user is not authorized to perform: cloudformation:DescribeStacks on resource: arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/e84c63f0-3247-11ec-9c73-0a316e131472
it means that your IAM user does not have sufficient permissions to perform an action necessary to deploy an AWS CloudFormation stack.
To proceed, review the Check Your Environment section and contact your AWS administrator to grant sufficient permissions to your IAM user or to deploy the AWS CloudFormation stack for you.
Failed to create/update the stack
If the above command fails with an error message similar to the following:
Failed to create/update the stack. Run the following command to fetch the list of events leading up to the failure
aws cloudformation describe-stack-events --stack-name rok-us-east-1-arrikto-cluster-fluent-bit
describe the events of the CloudFormation stack to identify the root cause of the failure:
root@rok-tools:~/ops/deployments# aws cloudformation describe-stack-events \
> --stack-name ${FLUENT_BIT_EKS_IAM_CF?}
A stack event like the following:
{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/599bc930-7b3f-11eb-ac1c-029efe3a90a0",
    "EventId": "rok-us-east-1-arrikto-cluster-fluent-bit-CREATE_FAILED-2021-03-02T10:09:27.457Z",
    "StackName": "rok-us-east-1-arrikto-cluster-fluent-bit",
    "LogicalResourceId": "FluentBitRole",
    "PhysicalResourceId": "",
    "ResourceType": "AWS::IAM::Role",
    "Timestamp": "2021-03-02T10:09:27.457000+00:00",
    "ResourceStatus": "CREATE_FAILED",
    "ResourceStatusReason": "rok-us-east-1-arrikto-cluster-fluent-bit already exists in stack arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-another-cluster-fluent-bit/e84c63f0-3247-11ec-9c73-0a316e131472",
    "ResourceProperties": "{\"ManagedPolicyArns\":[\"arn:aws:iam::123456789012:policy/rok-us-east-1-arrikto-cluster-fluent-bit\"],\"RoleName\":\"rok-us-east-1-arrikto-cluster-fluent-bit\",\"AssumeRolePolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Condition\":{\"StringEquals\":{\"oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW:sub\":\"system:serviceaccount:kube-system:fluent-bit\"}},\"Action\":\"sts:AssumeRoleWithWebIdentity\",\"Effect\":\"Allow\",\"Principal\":{\"Federated\":\"arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW\"}}]}}"
}
means that the IAM role or IAM policy that the AWS CloudFormation stack defines already exists, leading to a name conflict.
To proceed, go back to step 3a, specify a different name for the resources that already exist, and follow the rest of the guide.
A stack event like the following:
{
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/415eef80-7b46-11eb-b047-06980f530fec",
    "EventId": "rok-us-east-1-arrikto-cluster-fluent-bit-CREATE_FAILED-2021-03-02T10:09:27.457Z",
    "StackName": "rok-us-east-1-arrikto-cluster-fluent-bit",
    "LogicalResourceId": "FluentBitRole",
    "PhysicalResourceId": "",
    "ResourceType": "AWS::IAM::Role",
    "Timestamp": "2021-03-02T10:58:54.216000+00:00",
    "ResourceStatus": "CREATE_FAILED",
    "ResourceStatusReason": "API: iam:CreateRole User: arn:aws:iam::123456789012:user/user is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::123456789012:role/rok-us-east-1-arrikto-cluster-fluent-bit",
    "ResourceProperties": "{\"ManagedPolicyArns\":[\"arn:aws:iam::123456789012:policy/rok-us-east-1-arrikto-cluster-fluent-bit\"],\"RoleName\":\"rok-us-east-1-arrikto-cluster-fluent-bit\",\"AssumeRolePolicyDocument\":{\"Version\":\"2012-10-17\",\"Statement\":[{\"Condition\":{\"StringEquals\":{\"oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW:sub\":\"system:serviceaccount:kube-system:fluent-bit\"}},\"Action\":\"sts:AssumeRoleWithWebIdentity\",\"Effect\":\"Allow\",\"Principal\":{\"Federated\":\"arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/123456789ABCDEFGHIJKLMNOPQRSTUVW\"}}]}}"
}
means that your IAM user does not have sufficient permissions to create the resources that the AWS CloudFormation stack defines.
To proceed, review the Check Your Environment section and contact your AWS administrator to grant your IAM user sufficient permissions or to deploy the AWS CloudFormation stack for you.
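The two failure cases above differ only in the text of `ResourceStatusReason`, so telling them apart can be sketched as a simple string match. The helper below is a hypothetical illustration (the function name and the labels it prints are assumptions, not EKF or AWS output), and the sample reasons are shortened versions of the ones shown in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a ResourceStatusReason string to one of the two
# failure cases described in this guide. The labels are illustrative only.
classify_reason() {
    local reason=$1
    case "$reason" in
        *"already exists"*)
            echo "name-conflict" ;;          # pick a different resource name
        *"is not authorized to perform"*)
            echo "missing-permissions" ;;    # ask your AWS administrator
        *)
            echo "unknown" ;;
    esac
}

# To list only the failure reasons of a stack, a query like this should work:
#   aws cloudformation describe-stack-events --stack-name "$FLUENT_BIT_EKS_IAM_CF" \
#       --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].ResourceStatusReason" \
#       --output text

classify_reason "rok-us-east-1-arrikto-cluster-fluent-bit already exists in stack arn:aws:cloudformation:us-east-1:123456789012:stack/other"
classify_reason "API: iam:CreateRole User: arn:aws:iam::123456789012:user/user is not authorized to perform: iam:CreateRole"
```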
ValidationError
If the above command fails with an error message similar to the following:
An error occurred (ValidationError) when calling the CreateChangeSet operation: Stack:arn:aws:cloudformation:us-east-1:123456789012:stack/rok-us-east-1-arrikto-cluster-fluent-bit/671606f0-eb2b-11eb-8afb-0217413c9ed2 is in ROLLBACK_COMPLETE state and can not be updated.
delete the stack and deploy it again.
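Deleting and redeploying a stack in `ROLLBACK_COMPLETE` state could look roughly like the sketch below. The `run` and `redeploy_stack` helpers are hypothetical, and the stack name default is a sample value. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Delete the failed stack, wait for the deletion to finish, then deploy again.
redeploy_stack() {
    local stack=$1 template=$2
    run aws cloudformation delete-stack --stack-name "$stack" &&
    run aws cloudformation wait stack-delete-complete --stack-name "$stack" &&
    run aws cloudformation deploy --stack-name "$stack" \
        --template-file "$template" --capabilities CAPABILITY_NAMED_IAM
}

# The default stack name is a sample value (assumption).
redeploy_stack "${FLUENT_BIT_EKS_IAM_CF:-rok-us-east-1-arrikto-cluster-fluent-bit}" \
    rok/eks/fluent-bit-eks-iam-resources.yaml
```

Waiting for `stack-delete-complete` before deploying avoids racing the new deployment against a deletion that is still in progress.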
Deploy Fluent Bit:
Render the ConfigMap patch template using the environment variables you specified:
root@rok-tools:~/ops/deployments# j2 rok/amazon-cloudwatch/overlays/deploy/patches/configmap.yaml.j2 \
> -o rok/amazon-cloudwatch/overlays/deploy/patches/configmap.yaml
Render the ServiceAccount patch template with the variables you have specified:
root@rok-tools:~/ops/deployments# j2 rok/amazon-cloudwatch/overlays/deploy/patches/sa.yaml.j2 \
> -o rok/amazon-cloudwatch/overlays/deploy/patches/sa.yaml
Commit your changes:
root@rok-tools:~/ops/deployments# git commit -am "Deploy Fluent Bit"
Apply the manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/amazon-cloudwatch/overlays/deploy/
Note
We use the default Fluent Bit optimized configuration that is aligned with Fluent Bit best practices.
Note
By default, the retention of log groups that Fluent Bit creates on Amazon CloudWatch Logs is set to Never expire.
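If you prefer a finite retention period, one option is to set it per log group with the AWS CLI. The sketch below only prints the commands. The 30-day value, the `retention_commands` helper, and the sample cluster name are assumptions; the log group names follow the `/aws/containerinsights/<cluster>/...` pattern that appears in the Verify section of this guide.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the commands that would set a retention period on
# the three Container Insights log groups used in this guide.
EKS_CLUSTER=${EKS_CLUSTER:-arrikto-cluster}   # sample value (assumption)
RETENTION_DAYS=30                             # assumption, choose your own

retention_commands() {
    local cluster=$1 days=$2 group
    for group in application dataplane host; do
        echo aws logs put-retention-policy \
            --log-group-name "/aws/containerinsights/${cluster}/${group}" \
            --retention-in-days "$days"
    done
}

retention_commands "$EKS_CLUSTER" "$RETENTION_DAYS"
```

After reviewing the printed commands, you could pipe them to `bash` to apply them.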
Delete FluentD:
Delete the FluentD DaemonSet from your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl delete --ignore-not-found \
> -n amazon-cloudwatch ds fluentd-cloudwatch
daemonset.apps "fluentd-cloudwatch" deleted
Delete the FluentD ConfigMap from your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl delete \
> --ignore-not-found -n amazon-cloudwatch cm fluentd-config
configmap "fluentd-config" deleted
Delete the FluentD ClusterRoleBinding from your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl delete \
> --ignore-not-found clusterrolebinding fluentd-role-binding
clusterrolebinding.rbac.authorization.k8s.io "fluentd-role-binding" deleted
Delete the FluentD ClusterRole from your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl delete --ignore-not-found clusterrole fluentd-role
clusterrole.rbac.authorization.k8s.io "fluentd-role" deleted
Delete the FluentD ServiceAccount from your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl delete \
> --ignore-not-found -n amazon-cloudwatch sa fluentd
serviceaccount "fluentd" deleted
Restore the name of the CloudFormation stack you previously used to create the IAM role for FluentD:
root@rok-tools:~/ops/deployments# export FLUENTD_EKS_IAM_CF=rok-\
> ${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-fluentd
Delete the CloudFormation stack that created the IAM role for FluentD:
root@rok-tools:~/ops/deployments# aws cloudformation delete-stack \
> --stack-name ${FLUENTD_EKS_IAM_CF?}
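The Kubernetes deletions in this step can also be generated from a single list, which makes the order and the namespaced versus cluster-scoped resources easier to see. This is a sketch only: `FLUENTD_RESOURCES` and `fluentd_delete_commands` are hypothetical names, the resource names are the ones used in this guide, and the script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Each entry is "<kind> <name> [namespace]". Cluster-scoped kinds omit the
# namespace. Resource names are the ones used in this guide.
FLUENTD_RESOURCES=(
    "ds fluentd-cloudwatch amazon-cloudwatch"
    "cm fluentd-config amazon-cloudwatch"
    "clusterrolebinding fluentd-role-binding"
    "clusterrole fluentd-role"
    "sa fluentd amazon-cloudwatch"
)

# Print one kubectl delete command per resource, in the order listed above.
fluentd_delete_commands() {
    local entry kind name ns
    for entry in "${FLUENTD_RESOURCES[@]}"; do
        read -r kind name ns <<<"$entry"
        # Add "-n <namespace>" only when a namespace is present.
        echo kubectl delete --ignore-not-found ${ns:+-n "$ns"} "$kind" "$name"
    done
}

fluentd_delete_commands
```

Because every command uses `--ignore-not-found`, running the generated commands again after a partial cleanup should be harmless.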
Verify
Go to your GitOps repository inside your rok-tools management environment:
root@rok-tools:~# cd ~/ops/deployments
Restore the required context from previous sections:
root@rok-tools:~/ops/deployments# source <(cat deploy/env.fluent-bit-eks-iam)
root@rok-tools:~/ops/deployments# export FLUENT_BIT_EKS_IAM_ROLE
Verify that you have successfully deployed the IAM role for Fluent Bit:
Verify that the IAM role exists:
root@rok-tools:~/ops/deployments# aws iam get-role \
> --role-name ${FLUENT_BIT_EKS_IAM_ROLE?} \
> --query Role.RoleName \
> --output text && echo OK
rok-us-east-1-arrikto-cluster-fluent-bit
OK
Verify that the CloudWatchAgentServerPolicy is attached to your IAM role for Fluent Bit:
root@rok-tools:~/ops/deployments# POLICIES=$(aws iam list-attached-role-policies \
> --role-name ${FLUENT_BIT_EKS_IAM_ROLE?} \
> --query 'length(AttachedPolicies[?
> PolicyArn==`arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy`])') && \
> ((POLICIES==1)) && \
> echo OK || \
> echo FAIL
OK
Verify that you have successfully deployed Fluent Bit:
Verify that the Fluent Bit DaemonSet is ready. Verify that fields READY and UP-TO-DATE are equal to field DESIRED:
root@rok-tools:~/ops/deployments# kubectl get ds -n amazon-cloudwatch fluent-bit
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluent-bit   2         2         2       2            2           <none>          2m
Verify that you have enabled logging for your containers and worker nodes. Ensure that the corresponding log groups have been created in Amazon CloudWatch Logs:
root@rok-tools:~/ops/deployments# aws logs describe-log-groups \
> --log-group-name-prefix /aws/containerinsights/${EKS_CLUSTER?} \
> --query logGroups[].[logGroupName] --output text
/aws/containerinsights/arrikto-cluster/application
/aws/containerinsights/arrikto-cluster/dataplane
/aws/containerinsights/arrikto-cluster/host
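The DaemonSet readiness check above, where READY and UP-TO-DATE must equal DESIRED, can be expressed as a small comparison instead of reading the table by eye. The `ds_is_ready` helper is hypothetical, and the sample row is the one shown in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when READY and UP-TO-DATE both equal DESIRED.
# Input is one data row of `kubectl get ds` output, with columns:
# NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE
ds_is_ready() {
    local name desired current ready uptodate rest
    read -r name desired current ready uptodate rest <<<"$1"
    [[ "$ready" == "$desired" && "$uptodate" == "$desired" ]]
}

# Sample row taken from this guide.
row="fluent-bit   2   2   2   2   2   <none>   2m"
if ds_is_ready "$row"; then echo OK; else echo FAIL; fi
```

Against a live cluster, the row could come from `kubectl get ds -n amazon-cloudwatch fluent-bit --no-headers`.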
Verify that you have successfully deleted FluentD:
Verify that the FluentD DaemonSet does not exist in your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl get ds -n amazon-cloudwatch fluentd-cloudwatch
Error from server (NotFound): daemonsets.apps "fluentd-cloudwatch" not found
Verify that the FluentD ConfigMap does not exist in your EKS cluster:
root@rok-tools:~/ops/deployments# kubectl get cm -n amazon-cloudwatch fluentd-config
Error from server (NotFound): configmaps "fluentd-config" not found
Summary
You have successfully migrated from FluentD to Fluent Bit.