DNS Outage in the Kubernetes Cluster? Here’s the Quick and Easy Fix!

Introduction

Ever had a day where everything is running smoothly and then, boom! DNS stops working in your Kubernetes cluster? That’s exactly what happened to me in my lab setup. Everything was fine until, for some reason, the domain I had configured on my DNS server started acting up. This meant none of my FQDNs were resolvable. And, of course, if you can’t resolve DNS, you can’t access the Kubernetes cluster or any of the services exposed via Ingress.

Let me walk you through how I got things back on track quickly using dnsmasq as a workaround. Spoiler alert: It’s a super simple and fast solution, especially when you’re in a pinch!

What is dnsmasq?

Before we dive in, let’s quickly cover what dnsmasq is (in case you’re unfamiliar). In short, it’s a lightweight DNS forwarder that caches DNS responses and forwards queries to an upstream DNS server. It’s perfect for scenarios where you need to quickly stand up a DNS service but don’t need the full complexity of something like BIND.

Here’s the best part: dnsmasq also loads the contents of your /etc/hosts file, meaning you can resolve local names you’ve added by hand without setting up zone files or a full-blown DNS server.
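
For example, here is a minimal sketch (the hostname and IP are made up for illustration): if /etc/hosts on the dnsmasq machine contains

192.168.1.50   storage

then a query pointed at dnsmasq should simply return 192.168.1.50, with no zone files involved:

$ dig @127.0.0.1 storage +short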

Why dnsmasq?

In my case, deploying a full DNS server like BIND felt like overkill for my lab environment. I needed a quick fix to get things running again, and dnsmasq fit the bill perfectly.

The Situation: DNS Goes Down

Here is what happened when my DNS went down:

$ kubectl get nodes
Unable to connect to the server: dial tcp: lookup alpha.trinity.com on 192.168.100.2:53: server misbehaving

$ curl -k  https://grafana.trinity.com
curl: (6) Could not resolve host: grafana.trinity.com; Unknown error

$ nslookup alpha.trinity.com
Server:    192.168.100.2
Address:   192.168.100.2#53

** server can't find alpha.trinity.com: NXDOMAIN

At this point, it was clear the DNS server was reachable, but the domain was having issues. I needed to get things working without waiting for my DNS provider to fix it. That’s when I decided to implement dnsmasq as a temporary fix.

Steps to Configure dnsmasq

Here is a quick rundown of what I did to get dnsmasq running.

  • Create a Virtual Machine (VM) to Run dnsmasq
    • You will need a machine (physical or virtual) to host the dnsmasq service. In my case, I spun up a VM.
  • Install and Configure dnsmasq
$ sudo dnf install dnsmasq -y
$ sudo systemctl enable --now dnsmasq
  • Next, I configured /etc/dnsmasq.conf:
$ sudo cat /etc/dnsmasq.conf
[...]
domain-needed
bogus-priv
no-resolv
server=8.8.8.8
server=8.8.4.4
local=/trinity.com/
listen-address=::1,127.0.0.1,192.168.100.2
expand-hosts
domain=trinity.com
[...]

This setup ensures dnsmasq listens on the right IPs and handles the trinity.com domain locally: local=/trinity.com/ keeps trinity.com queries from being forwarded upstream, no-resolv plus the server= lines send everything else to Google’s public resolvers, and expand-hosts with domain=trinity.com appends the domain to the short hostnames in /etc/hosts, so an entry for alpha also answers for alpha.trinity.com.
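
Before restarting, it’s worth running dnsmasq’s built-in syntax check, which reads the configuration and reports any errors without touching the running service:

$ sudo dnsmasq --test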

  • Update /etc/hosts
    • Now, I mapped the hostnames of my Kubernetes nodes and the other endpoints I needed (alpha, grafana) to their IPs by updating the /etc/hosts file:
$ sudo cat /etc/hosts
127.0.0.1 localhost
192.168.100.2    dnsmasq
192.168.150.5    alpha
192.168.150.10   master01
192.168.150.11   master02
192.168.150.12   master03
192.168.150.13   worker01
192.168.150.14   worker02
192.168.150.50   grafana
  • Restart dnsmasq service
$ sudo systemctl restart dnsmasq
  • On the clients, update /etc/resolv.conf
$ sudo cat /etc/resolv.conf
nameserver 192.168.100.2
  • Test DNS Resolution
    • To verify that everything’s working, I used dig (though ping would also work):
$ dig alpha.trinity.com +short

Unfortunately, I got no response, so it was time to troubleshoot.
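
One cheap check worth doing from a client before touching the server: point dig directly at the dnsmasq address. If that also fails, the problem isn’t a stale or overwritten /etc/resolv.conf (NetworkManager and systemd-resolved both like to rewrite it):

$ dig @192.168.100.2 alpha.trinity.com +short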

Troubleshooting dnsmasq

Here is how I diagnosed and fixed the issue.

  • Check dnsmasq Logs
    • First, I checked the dnsmasq logs for any errors. None were found.
  • Is Port 53 Listening?
    • DNS operates on port 53, so I made sure dnsmasq was listening:
$ sudo ss -plan | egrep -i :53
udp   UNCONN 0      0   0.0.0.0:53   0.0.0.0:*  users:(("dnsmasq",pid=1477,fd=4))

Port 53 was open, but still, no response.
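
Another way to narrow it down is to query dnsmasq locally on the server itself. If this returns the record while remote clients get nothing, the daemon is fine and the problem sits on the network path (firewall or routing), which is exactly what it turned out to be here:

$ dig @127.0.0.1 alpha.trinity.com +short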

  • Capture Network Traffic with tcpdump
    • I ran tcpdump on the dnsmasq server and noticed the requests were reaching dnsmasq, but it wasn’t replying.
$ sudo tcpdump -i ens3 port 53

It turns out the problem was with iptables. There was no rule allowing DNS traffic on port 53.
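
You can confirm that before changing anything by listing the INPUT chain and checking for a rule matching UDP port 53 (the output depends entirely on your ruleset):

$ sudo iptables -L INPUT -n -v --line-numbers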

  • Fix iptables
    • Adding an iptables rule to allow inbound UDP traffic on port 53 solved the issue:
$ sudo iptables -A INPUT -p udp --dport 53 -j ACCEPT
  • With that, I could see dnsmasq responding to queries in the tcpdump output:
tcpdump: listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:22:12.783083 IP (tos 0x0, ttl 64, id 42198, offset 0, flags [none], proto UDP (17), length 81)
   192.168.150.10.45423 > 192.168.100.2.53: 24927+ [1au] A? alpha.trinity.com. (53)
12:22:12.783457 IP (tos 0x0, ttl 64, id 45033, offset 0, flags [DF], proto UDP (17), length 97)
    192.168.100.2.53 > 192.168.150.10.45423: 24927* 1/0/1 alpha.trinity.com. A 192.168.150.5 (69)

$ dig alpha.trinity.com +short
192.168.150.5
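
Two follow-ups worth keeping in mind: DNS falls back to TCP on port 53 for large responses, and a rule added with iptables -A does not survive a reboot, so you may want to allow TCP as well and persist the rules with whatever your distribution provides (iptables-services, firewalld, nftables, and so on). The TCP rule has the same shape:

$ sudo iptables -A INPUT -p tcp --dport 53 -j ACCEPT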

Kubernetes Cluster is Back! \o/

Finally, I was able to access the Kubernetes cluster again:

$ kubectl get node
NAME       STATUS   ROLES    AGE   VERSION
master01   Ready    master   21d   v1.27.10
master02   Ready    master   21d   v1.27.10
master03   Ready    master   21d   v1.27.10
worker01   Ready    worker   21d   v1.27.10
worker02   Ready    worker   21d   v1.27.10

$ curl -k  https://grafana.trinity.com
<a href="/login">Found</a>.

Handling Wildcard Domains

If you are managing multiple apps under subdomains, simply adding a wildcard entry in dnsmasq.conf can save you tons of time.

$ sudo cat /etc/dnsmasq.conf
[...]
address=/.apps.alpha.trinity.com/192.168.200.100
[...]

$ sudo systemctl restart dnsmasq

Now, any application hosted under .apps.alpha.trinity.com will resolve to the IP of your LoadBalancer (in this case, 192.168.200.100).

root@dnsmasq:~# nslookup  abc.apps.alpha.trinity.com
Server:		192.168.100.2
Address:	192.168.100.2#53

Name:	abc.apps.alpha.trinity.com
Address: 192.168.200.100

root@dnsmasq:~# nslookup  xyz.apps.alpha.trinity.com
Server:		192.168.100.2
Address:	192.168.100.2#53

Name:	xyz.apps.alpha.trinity.com
Address: 192.168.200.100

This ensures smooth routing for all your apps with minimal configuration!
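
For completeness, this is the part that pairs with Ingress: as long as an Ingress host falls under the wildcard, no per-app DNS record is needed. Here is a hypothetical example (the app, Service, and ingress class names are made up; adjust them to your cluster):

$ cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: abc
spec:
  ingressClassName: nginx
  rules:
  - host: abc.apps.alpha.trinity.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: abc
            port:
              number: 80
EOF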

Conclusion

Using dnsmasq was a quick solution to a bigger DNS issue in my Kubernetes lab environment. Sure, it’s not a long-term fix, but in a lab setup where you need things running ASAP, it works like a charm. Next time you face DNS resolution problems in your Kubernetes cluster, give dnsmasq a shot!
