Part 3: Installing and Configuring LitmusChaos
This is Part 3 of the series “Simplifying Chaos Engineering in Kubernetes: A Guide with LitmusChaos”.
In Part 1, we discussed why chaos engineering is vital for Kubernetes environments, helping to test and improve the resilience of your applications. In Part 2, we discussed LitmusChaos, an open-source chaos engineering tool, and explored its architecture and key components, including the Chaos Operator, ChaosEngine, and ChaosHub.
Now, let’s get hands-on! In this part, we will walk through the installation of LitmusChaos, configuring it through the UI, and laying the groundwork for running chaos experiments.
Installing LitmusChaos on Kubernetes
To start chaos testing, we first need to install LitmusChaos in a Kubernetes cluster. The easiest way to do this is with Helm. Follow these steps:
- Add the LitmusChaos Helm repository:
helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm/
helm repo update
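- Update values.yaml: Export the chart's default values (for example with `helm show values litmuschaos/litmus > values.yaml`) and customize them according to your setup. In my case, I used Ingress to access the LitmusChaos UI.
- Install LitmusChaos, passing your customized values file:
helm install litmus litmuschaos/litmus --namespace litmus --create-namespace -f values.yaml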
- Verify the installation: Ensure everything is up and running:
helm ls -n litmus
kubectl get deploy -n litmus
kubectl get sts -n litmus
- Access the Litmus UI: If you are using ingress, like I did in my lab, check the ingress resource to get the UI URL:
kubectl get ing -n litmus
litmus-ingress nginx litmus.example.com 192.168.248.101 80 7d4h
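As an illustration of the values.yaml step, an Ingress override might look like the sketch below. The key names (`ingress.enabled`, `ingress.ingressClassName`, `ingress.host.name`) are assumptions about the chart's layout and may differ between chart versions, so check them against the output of `helm show values litmuschaos/litmus` before using them. The host name and ingress class simply mirror the example output from my lab.

```yaml
# values.yaml (sketch): expose the Litmus UI through an Ingress.
# Key names are assumed; confirm them with `helm show values litmuschaos/litmus`.
ingress:
  enabled: true
  ingressClassName: nginx      # ingress class shown in the lab output
  host:
    name: litmus.example.com   # host shown in the lab output
```

Only the keys you want to change need to appear in the file; Helm merges them over the chart defaults.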
Configuring Chaos Experiments Using the LitmusChaos UI
While you can manually create and apply resources like the ChaosEngine through YAML files, the LitmusChaos UI offers several advantages that make it a more convenient and user-friendly option for chaos testing, especially for those who prefer a visual interface.
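For comparison, this is roughly what the manual route looks like. The sketch below is a minimal ChaosEngine for the pod-delete experiment. The engine name, target namespace, label, and service account name (`pod-delete-sa`) are placeholder assumptions, and the field names should be checked against the ChaosEngine CRD version installed in your cluster.

```yaml
# chaosengine.yaml (sketch): run pod-delete against a hypothetical nginx deployment.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos              # hypothetical engine name
  namespace: default
spec:
  engineState: active            # set to "stop" to abort the experiment
  appinfo:
    appns: default               # namespace of the target application (assumed)
    applabel: app=nginx          # label selecting the target pods (assumed)
    appkind: deployment
  chaosServiceAccount: pod-delete-sa   # assumed; must exist with suitable RBAC
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION   # total chaos duration in seconds
              value: "30"
            - name: CHAOS_INTERVAL         # seconds between successive pod deletions
              value: "10"
```

A file like this would be applied with `kubectl apply -f chaosengine.yaml`, and it also requires the matching experiment definition and service account to be present in the namespace. The UI handles this kind of bookkeeping for you.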
Core Features of LitmusChaos UI:
- Centralized Experiment Management: The Litmus UI acts as a one-stop shop for managing all chaos experiments. Whether you are configuring new experiments, viewing logs, or tracking the results of ongoing chaos tests, the UI provides a centralized dashboard that simplifies operations.
- Pre-built ChaosHub Templates: One of the most powerful features of the UI is the direct integration with ChaosHub, where you can easily access and use pre-built templates for common chaos experiments. This removes the need to create experiments from scratch, saving you time and reducing the likelihood of configuration errors.
- Graphical Workflow Editor: For complex chaos workflows, the Blank Canvas feature allows you to visually design your chaos tests. You can drag and drop experiments, create dependencies, and visualize the flow of tasks. This is especially useful for designing multistep scenarios where you want different experiments to run in sequence or in parallel.
- Real-Time Monitoring and Logs: While running an experiment, the Litmus UI provides real-time insights into what is happening in the cluster. You can see which components are affected, view logs directly from the dashboard, and track the progress of your chaos test.
- Easy Environment Targeting: In the UI, you can quickly add different environments (Kubernetes clusters) to target for chaos experiments. This is helpful if you manage multiple clusters and want to test resilience across different environments from a single interface.
- Integrated Result Visualization: After running chaos experiments, viewing the results is as important as the experiment itself. The LitmusChaos UI offers a detailed breakdown of the results, allowing you to understand how your system behaved during the chaos event. You can also view the ChaosResult object, which gives insights into whether the system passed or failed the resilience check.
- Enhanced Collaboration: With the UI, teams working together on chaos experiments can easily share configurations, results, and insights. Developers, operations teams, and SREs can collaborate more effectively by leveraging the UI’s transparency and ease of use, making chaos testing a more accessible process across teams.
Overview of Creating Chaos Experiments from the LitmusChaos UI
- Add Your Environment: From the UI, create an environment and define the Kubernetes cluster in which you want to run experiments.
- Pre-configure Resilience Probes: Before running any chaos experiments, you will need to configure Resilience Probes. But what exactly are they?
  - Resilience Probes are checks used to validate the health of your application during chaos experiments. They verify that the system remains functional before, during, and after the chaos event. Here are some types of probes:
    - HTTP Probes: Validate health by making HTTP requests to specific endpoints.
    - Command Probes: Run custom commands to check the system’s status.
    - Kubernetes Probes: Monitor Kubernetes resources, such as pods.
    - Prometheus Probes: Query Prometheus metrics to ensure system performance under stress.
  - For more details, you can check the official documentation on Resilience Probes.
- Creating Chaos Experiments: Once the probes are set up, you are ready to create a chaos experiment. The LitmusChaos UI offers three options:
  - Blank Canvas: Build your chaos workflow from scratch, adding experiments and steps as needed.
  - Templates from ChaosHub: Choose pre-built templates for common failure scenarios and quickly get started.
  - Upload from YAML: If you already have a predefined configuration, you can upload it as a YAML file.
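To make the probe types above more concrete, here is a sketch of how an HTTP probe can be declared when an experiment is written by hand as YAML, under `spec.experiments[].spec.probe` in a ChaosEngine. The probe name and URL are hypothetical, and the format of the `runProperties` values (plain integers or duration strings such as `5s`) depends on the Litmus version, so treat this as an illustration rather than a copy-paste configuration.

```yaml
# Fragment of a ChaosEngine experiment spec (sketch, not a complete manifest).
probe:
  - name: check-frontend-health                    # hypothetical probe name
    type: httpProbe
    mode: Continuous                               # evaluate repeatedly while chaos runs
    httpProbe/inputs:
      url: http://frontend.default.svc:80/health   # hypothetical endpoint
      method:
        get:
          criteria: "=="                           # compare the response code...
          responseCode: "200"                      # ...against this expected value
    runProperties:
      probeTimeout: 5s                             # assumed duration-string format
      interval: 2s
      attempt: 2
```

When you configure the same probe in the UI, you fill in equivalent fields (URL, method, expected response code, timeout, interval) in a form instead.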
Conclusion
With LitmusChaos installed and the UI configured, you are now ready to start experimenting with chaos in your Kubernetes environment. Using the LitmusChaos UI simplifies the process, allowing you to focus on testing resilience without the hassle of manually managing resources.
In the next blog, we will dive into real chaos test scenarios that I explored in my lab, including injecting latency, manually increasing the application load, and simulating pod failures. These practical examples will help you understand how to use LitmusChaos to improve the resilience of your applications.
Stay tuned for the next post in the series! And as always, if you have any questions or need clarification, feel free to leave a comment below. I will do my best to research and respond!