Klustered: The second Rawkode vs Community edition
Klustered: Rawkode and The Null Channel vs The Communities #
For a list of steps to recreate the break, see the Rawkode Academy’s Klustered GitHub repo (I submitted a pull request which will hopefully be merged).
Backstory #
The opportunity to be a part of another Klustered episode came about via a conversation a few of us were having on Marek’s Discord about his appearance on Klustered that day. I said that if Rawkode ever does a community Klustered again, everyone should volunteer. I believe it is a great way to learn and also to support a great show that helps others learn not only Kubernetes concepts but also general Linux skills and various methods of diagnosing errors you may come across on Linux servers with slightly non-standard configuration. In pops Rawkode to say some teams had to pull out, and he offered up a “vs Community” episode if Marek joined him on team fix. Having just made the statement about people volunteering, I had to put myself forward. I volunteered to break one and was immediately told my break would be crappy and was then called out on Twitter (in case anyone thinks this is mean, it was friendly banter). And so our adventure begins…
Side note: Diagnosing clusters, especially live, is far harder than breaking them. That being said, making breaks is not some 30-minute endeavour; it too can be difficult. I abandoned 14 different ideas due to lack of time and/or knowledge on my part. Plus I had some ideas that I put into tweets that I knew would be too difficult or not complementary to the other breaks I was doing. I am interested in seeing how etcd and a shell would cope with a million rows being returned for nodes…
Strategy #
I approach Klustered as a game of deception. I don’t need to beat the cluster, as that is easy; I need to take the fixers on a journey that is hopefully new to them. Previously I have:
- altered their diagnostic tools to lie to them
- had them execute commands that break the cluster every time they issue any command
- stopped Systemd spinning up the Kubelet
- configured the Kubelet to not allow any more containers to run due to resource limits
- replaced a core component (Kubelet) which switched out the container images
- broken DNS (everyone needs to break DNS at some point on Klustered)
- broken the static manifest configuration
So as I was up against 2 experienced Klustered fixers in David (the Rawkode from Rawkode Academy) and Marek (from The Null Channel), and with the call out on Twitter to “bring it on”, I set about finding some new games and breaks I’ve not seen before on Klustered.
The Control Plane #
Rawkode is not an easy person to beat, especially not by altering Kubernetes/Systemd configuration or damaging networks. Marek is no pushover either; they both can fix clusters. I needed to slow them down and interrupt their A game; only then could I sneak things past them. So to begin with, there can be nothing better than Teleporting in to something that breaks their expectations. Once I’ve occupied their time with tricks, I can apply the break, which for the control plane was all aimed at stopping the apiserver from coming online.
Subterfuge 1/6 - Speak and Spell game #
As long as you have a program that can read from stdin and write to stdout, you can use it to provide something resembling a shell. For the Control Plane I created one that required team fix to answer 3 questions correctly, and for some reason I have linked the phrase “The cow goes moo” to the Speak & Spell toy that was around in the 1980s. So I decided to test Rawkode and Marek on their animal dialects.
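A minimal sketch of the idea, not the program used on the show: the function name, the questions and the prompts are all made up for illustration. It reads answers from stdin, writes prompts to stdout, and only lets the single command “bash” through once three answers are right.

```shell
# Hypothetical quiz "shell": animals, sounds and messages are illustrative only.
quiz_shell() {
    for pair in "cow:moo" "dog:woof" "cat:meow"; do
        animal=${pair%%:*}
        sound=${pair#*:}
        # Repeat the same question until the answer is right.
        while :; do
            printf 'The %s goes? ' "$animal"
            IFS= read -r reply || return 1   # EOF on stdin: give up
            if [ "$reply" = "$sound" ]; then
                echo "Correct!"
                break
            fi
            echo "Try again."
        done
    done
    # After three right answers, exactly one command is accepted.
    while :; do
        printf 'command> '
        IFS= read -r cmd || return 1
        if [ "$cmd" = "bash" ]; then
            echo "Starting restricted shell..."
            # A real login shell would hand over here, e.g. with: exec rbash
            return 0
        fi
        echo "Unknown command: $cmd"
    done
}

# Demo run with scripted answers, including one wrong answer and one wrong command.
printf 'moo\nquack\nwoof\nmeow\nls\nbash\n' | quiz_shell
```

Because it only needs stdin and stdout, a program like this could be set as a user’s login shell, which is what makes the trick work over SSH.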
Once they had 3 right answers, there was only 1 command that could be accepted which would take them into something they could actually start to use to inspect the system, “bash”. However, I had control of what “bash” actually ran, so I chose rbash, which is a restricted bash. Now it doesn’t work very well for root logins as you can just call bash again or sudo. I solved the bash side of things by inspecting the PATH env var and putting a custom “bash” script in “/usr/local/sbin”, which was declared before “/bin/” by default (i.e. I didn’t have to modify the PATH env var on the Ubuntu server).
Subterfuge 2/6 - vim trickery #
Rawkode is familiar with vim and makes fast changes to files and moves quickly, so the :wq command is his shortcut of choice when it comes to saving and exiting at peak vim speed. I’m not as familiar with the debugging style and text editor usage of Marek, but generally :wq should be quite common. All I had to do was break this shortcut. I used cabbrev in a .vimrc file to alter :wq (write changes to file and quit), :q (quit if no changes) and :w (write changes to file) to actually issue :q! (quit ignoring any changes). This worked spectacularly well. If team fix had just slowed a little and issued :write instead, then my break would have been rendered useless. I also removed the other text editors so their only hope was vim, or moving to something like sed or awk.
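A sketch of what such abbreviations could look like, assuming the plain cabbrev form described above. The exact lines used on the show are not given, so treat these as illustrative; the file is a throwaway temp file, not a real .vimrc.

```shell
# Write the abbreviations to a temp file rather than touching a real ~/.vimrc.
demo_vimrc=$(mktemp)
cat > "$demo_vimrc" <<'EOF'
" Command-line abbreviations: typing the left side expands to the right side.
cabbrev wq q!
cabbrev q q!
cabbrev w q!
EOF
cat "$demo_vimrc"
```

Longer forms such as :write are not covered by these three abbreviations, and starting vim with -u NONE skips the vimrc entirely, so either would sidestep the trick.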
Subterfuge 3/6 - removing the “K” key #
I had threatened on Twitter that I was going to mess with their keyboard, and had previously chatted to Marek about how altering the keyboard would really throw someone off; in that conversation we were talking about just switching the layout from Qwerty to Colemak. A little bit of research led me to the bind command, which customises readline. If I understand it correctly, readline parses user input, keeps track of it and echoes it to stdout so you can see what you’ve typed before hitting enter to submit it for execution. Anyhow, with the bind command I replaced “k” with the Chinese character for water and “K” with the uppercase Kappa symbol. This worked well enough: it didn’t show the actual symbols on screen but did have the desired effect of slowing down team fix. I had a smile on my face the second time Rawkode tried to type KUBECONFIG and looked puzzled as to what was going on.
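A rough sketch of the remap in readline’s macro syntax ("keyseq": "replacement"). The two replacement characters follow the description above; the file name is a stand-in, and whether the symbols render depends on the terminal and locale.

```shell
# Readline macros: when the key on the left is typed, the text on the right is inserted.
demo_inputrc=$(mktemp)
cat > "$demo_inputrc" <<'EOF'
"k": "水"
"K": "Κ"
EOF
# In an interactive bash session the same remaps could be applied directly:
#   bind '"k": "水"'
#   bind '"K": "Κ"'
# or loaded from the file with: bind -f "$demo_inputrc"
# Undoing one of them: bind '"k": self-insert'
cat "$demo_inputrc"
```

The bind calls are left as comments because line editing is only active in interactive shells.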
Subterfuge 4/6 - timeouts #
This one was a partial failure as I had set the timeout too harshly and so they were immediately kicked rather than it being something more casual that interrupted them later on. This was a case of not testing it in a real world usage scenario. The trick to it is an environment variable, TMOUT, which when set will wait for user input for TMOUT seconds and if no commands have been submitted, disconnect the session for inactivity.
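A small sketch of how the variable could be set from a profile-style snippet. The 600 second value and the temp file are assumptions for illustration; the value used on the show is not stated.

```shell
# Hypothetical profile snippet; 600 seconds is an example value only.
demo_profile=$(mktemp)
cat > "$demo_profile" <<'EOF'
# Log idle interactive bash sessions out after 10 minutes.
TMOUT=600
export TMOUT
EOF
# Source it the way a login shell would source a profile file.
. "$demo_profile"
echo "Idle timeout: ${TMOUT}s"
```

A fixer can normally clear it with unset TMOUT, unless the variable has been marked readonly.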
Subterfuge 5/6 - no external tools #
Next up in the series of deception was to stop team fix from using Sleuthkit’s fls, as I have seen this used previously to great effect. It’s a game changer if you know roughly when changes to a system have been made. To prevent it, I moved apt, curl and wget as well as blacklisting Sleuthkit using an apt configuration file. I also used my Twitter posts to indicate to them that I had run something to constantly make changes so that the noise of that script would prevent fls from being a quick affair. I learned from watching them try to download Rawkode’s diagnostic script (they didn’t seem to care for Sleuthkit) that apt and apt-get are 2 different tools; for some reason I thought they were the same thing.
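One way a package can be blocked with an apt configuration file is a preferences pin with a negative priority. This is an assumption about the mechanism, not necessarily the file used on the show, and the sketch writes to a temp file rather than the real apt directory.

```shell
# Hypothetical apt pin: a negative priority stops the package from being installed.
demo_pref=$(mktemp)
cat > "$demo_pref" <<'EOF'
Package: sleuthkit
Pin: release a=*
Pin-Priority: -1
EOF
# On a real host a file like this would live under /etc/apt/preferences.d/
# and its effect could be checked with: apt-cache policy sleuthkit
cat "$demo_pref"
```

Since apt and apt-get are separate binaries, moving only one of them leaves the other available, which is worth checking when trying this.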
Subterfuge 6/6 - unexpected tools #
If you have watched a couple of Klustered episodes, you will see Rawkode at some point suggesting systemctl cat kubelet and journalctl [flags] kubelet (where flags are the combination of flags that a fixer prefers to use among the many available to them from journalctl). The systemctl cat command is incredibly useful for showing up how people have messed with the Kubelet, which is a key target, maybe more so than DNS…
Anyway, I decided to make using these commands a little harder. My original intention was to use a modified version of Bad-BPF to call a different executable but I had too many issues making it stable and targeted to just the programs with arguments that I wanted to attack. So I shifted strategy and just decided to copy “funny” executables with the same names into a directory that was declared sooner in the list of directories in the PATH environment variable. So when systemctl was meant to be run, it ran nyancat and when journalctl was meant to be run, it ran cmatrix.
While not as technically complex and much easier to fix, the impact of seeing nyancat on screen when Rawkode was expecting a listing of configuration was exactly what I was aiming for.
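The PATH ordering trick can be sketched safely in a temp directory. The stand-in script below only pretends to be nyancat, and the directory is a placeholder for an early PATH entry such as /usr/local/sbin.

```shell
# Throwaway directory standing in for a directory that appears early in PATH.
shadow_dir=$(mktemp -d)
cat > "$shadow_dir/systemctl" <<'EOF'
#!/bin/sh
echo "nyan nyan nyan (pretend this is nyancat)"
EOF
chmod +x "$shadow_dir/systemctl"

# Prepend the directory: the first match in PATH wins.
old_path=$PATH
PATH="$shadow_dir:$PATH"
resolved=$(command -v systemctl)
PATH=$old_path   # put PATH back so the rest of the session is unaffected

echo "systemctl resolved to: $resolved"
```

From the fixer’s side, command -v shows which file a name resolves to, and calling the real binary by its absolute path bypasses the impostor.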
Apiserver break 1/4 - Kubelet Systemd #
Once all the noise is out of the way, the break is a multi-part attack on the API server to prevent kubectl from working. I was looking through some Systemd documentation talking about unit files and all of the various configuration options when I found 2 little gems, perfect for causing mischief. These were InaccessiblePaths and TemporaryFileSystem.
- InaccessiblePaths: this somehow (I’ve not gone into the implementation details of it, but I probably should) prevents whatever application Systemd is starting via the Exec instruction of the unit file from accessing the configured paths. As kubelet has to read the static manifests from /etc/kubernetes/manifests, I decided that would be a nice directory to block. The idea being that the files are there, kubelet is up and running, but no Kubernetes infra pods are starting.
- TemporaryFileSystem: another one where I don’t yet know how it does what it does; this mounts a filesystem at any location you want, just for the application Systemd is configured to run via the unit file.
I modified the unit file for Kubelet to add one (/lib/systemd/system/kubelet.service), and then used the override that kubeadm makes for the other (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf). The only reason for this was to try to make them think they fixed it if they only caught one, as when systemctl cat kubelet was run (without nyancat getting in the way) the 2 additions would be a reasonable distance from one another in the output. This worked out exactly as I had hoped.
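A sketch of what the two directives look like in a drop-in, plus one way a fixer might hunt for them. Both directives sit in a single temp file here for brevity, and the TemporaryFileSystem target is a made-up path, since the real one is not stated above.

```shell
# Sample drop-in in a temp directory (nothing under /etc/systemd is touched).
unit_dir=$(mktemp -d)
cat > "$unit_dir/10-kubeadm.conf" <<'EOF'
[Service]
# Hide the static manifests from the kubelet.
InaccessiblePaths=/etc/kubernetes/manifests
# Hypothetical target: an empty tmpfs mounted over a directory for this service only.
TemporaryFileSystem=/var/lib/example
EOF

# Fixer's view: list the sandboxing directives with file name and line number.
hits=$(grep -rnE '^(InaccessiblePaths|TemporaryFileSystem)=' "$unit_dir")
echo "$hits"
```

On a real node the same pattern could be applied to the output of systemctl cat kubelet, which concatenates the unit file and its drop-ins, so neither addition is missed when they are far apart.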
Apiserver break 2/4 - PID namespace #
I’m really happy with this one as I don’t think it has been used before. It felt nice to break something in an as yet unexplored area for Klustered (I’ve not actually watched every episode, there are some old ones I need to catch up on, so I might be incorrect about the originality of this break). My original target was runc, and whilst investigating how to break that I somehow discovered the OS limits for namespaces. A quick check on the working cluster of how many PID namespaces root used when Kubernetes was healthy led me to the number 11 (please note this is for the cluster Rawkode provided with all the components he loaded, this is not a magic number for all clusters in the world). So I needed to make the maximum number less than that. I opted to go lower by the number of usual Kubernetes static manifests, 4 (etcd, API server, controller manager, scheduler), therefore I needed to execute echo 7 > /proc/sys/user/max_pid_namespaces (it was initially 255194). I found out in the episode that I could have also used the sysctl command to set this. The last thing to do after setting the value was to kill the processes already running; after that they could not start back up as they had to abide by the limit.
I was really happy with this break. It worked as I wanted and took them a reasonable amount of time to find the issue. In the end Marek resorted to a web search, the smartest move to make considering it’s a very rare issue, to find the meaning behind the errors that were being reported from the kubelet.
Apiserver break 3/4 - break connection with etcd pt 1 #
I did not appreciate how scared they would be of a certificate break and how badly it would turn out for them once they started deleting things and then tried to use kubeadm. This was brutal. The break was relatively simple. I say that as I made 2 changes each for the apiserver etcd cert and key. The easy ones were just to add “Hello…” to the start of the certificate/key data, which stood out when viewing either the key or cert as the rest of the data was base64 encoded. The other changes were more subtle, but I’m not sure if they would even work as I didn’t get time to test them on their own. In the cert file I removed one “-” from the left of the header; if you paid close attention to the begin and end lines you would notice it, but it would be difficult to see in a quick check. In the key file I removed the word “RSA” from the header; again, a close eye would spot it but a rushed one would not.
As soon as they tried to smash their way through it, they ruined the cluster somehow.
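The two subtle header edits can be caught mechanically. This is a sketch with a hypothetical helper, check_pem, that only looks at the first and last lines of a file: the BEGIN marker must be well formed and its label must match the END marker. The sample files are stand-ins, not real certificates.

```shell
# check_pem FILE: succeed only if the BEGIN and END markers are well formed
# and carry the same label (e.g. CERTIFICATE, RSA PRIVATE KEY).
check_pem() {
    first=$(head -n 1 "$1")
    last=$(tail -n 1 "$1")
    begin_label=$(printf '%s\n' "$first" | sed -n 's/^-----BEGIN \([A-Z0-9 ]*\)-----$/\1/p')
    end_label=$(printf '%s\n' "$last" | sed -n 's/^-----END \([A-Z0-9 ]*\)-----$/\1/p')
    [ -n "$begin_label" ] && [ "$begin_label" = "$end_label" ]
}

# Stand-in files: one good, one missing a leading dash, one missing "RSA" in the header.
demo=$(mktemp -d)
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'QUJD' '-----END CERTIFICATE-----' > "$demo/good.crt"
printf '%s\n' '----BEGIN CERTIFICATE-----' 'QUJD' '-----END CERTIFICATE-----' > "$demo/dash.crt"
printf '%s\n' '-----BEGIN PRIVATE KEY-----' 'QUJD' '-----END RSA PRIVATE KEY-----' > "$demo/rsa.key"

for f in good.crt dash.crt rsa.key; do
    if check_pem "$demo/$f"; then
        echo "$f: markers look fine"
    else
        echo "$f: suspicious markers"
    fi
done
```

This does not inspect the encoded body, so text injected into the data itself would need a real parser, for example openssl x509 -noout -in FILE for a certificate.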
Apiserver break 4/4 - break connection with etcd pt 2 #
The final break was quite simple but again was designed to pass a quick eye check while being found under closer inspection. This was to remove the etcd ca key so that the chain of trust was broken. The thing to remember here is that Rawkode is great at spotting extra settings but not at recognising what is missing. Tailor your breaks to your opponents’ weaknesses, sit back and enjoy the ride.
Worker #
Once all the traps were triggered/avoided and the control plane came back online, team fix would be left with a worker node that was offline. The whole point of this was to get them to log on to the worker. On the control plane the altered shell was just part of the trickery; on the worker it was my main goal. The breaks were fairly simple after dealing with the control plane. I just wanted them to have to play Wordle before they could finish the cluster.
Wordle login shell #
My original intention was to use one of the freely available PAM Wordle modules, but I have never set up PAM before and the examples with the module I was planning on using were aimed at sudo usage, not login for all. I spent probably 30+ minutes trying different ways to configure it and test it but I never managed to force a game via SSH. So, rather than burn more time, I thought about how much easier it was to set up the root user’s shell, and so I quickly found a Rust CLI Wordle game and modified it to be a shell. There is nearly always more than one way to solve a problem.
Broken node pt 1 #
As I said before, the aim of these breaks was really just to get them to log in so they had to play Wordle. The fastest way to get a node to report unhealthy is to stop the kubelet, so that is what I did. Nice and easy to fix once you are on the node.
Broken node pt 2 #
I had time to think of many breaks. I have often watched episodes of Klustered where IPTables were used to break communication between nodes, but I’ve not seen a disabled route used, so I decided to have a go to see if they would catch it quickly or if they’d be too busy examining Kubernetes network policies, IPTables or Cilium breaks.
An extra break just for fun #
When on the control plane, I had to make a choice of which namespace to limit, PID or MNT, as they were the 2 that each Kubernetes control plane container used. I ended up using PID on the control plane, so I wondered if MNT on the worker would be as easy to find. Technically they’ve pretty much fixed this before, so surely it would be quick to do again…
Cleanup #
On the control plane, I used the find command to find all files in / and then the -exec option to perform a touch, which would reset the date of each file to be the same, just to stop them from using file dates to see what I had messed with. I also ran a history -c and, thanks to Kevin, my fellow community member, I cleared out the ~/.bash_history file.
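The timestamp reset can be tried safely in a temp directory. Note one swap for the sake of a repeatable demo: this sketch uses touch -r with a reference file so every file gets exactly the same time, whereas a plain touch as described above sets each file to the current time. File names are made up.

```shell
# Three files with different modification times in a throwaway directory.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/old.conf"
touch -t 202201010000 "$demo/edited.conf"
touch -t 202301010000 "$demo/ref"

# Before: anything modified after old.conf stands out.
find "$demo" -type f -newer "$demo/old.conf"

# Give every file the same timestamp as the reference file.
find "$demo" -type f -exec touch -r "$demo/ref" {} +

# After: nothing is newer than anything else, so the trail goes cold.
newer=$(find "$demo" -type f -newer "$demo/old.conf")
echo "Files newer than old.conf after reset: ${newer:-none}"
```

Running the same find against / on a live system would rewrite the modification time of every file, so this is very much a break-only technique.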
Reflections #
As I said early on in the article, I was suggesting people volunteer for these opportunities when content creators offer them. You can only get so much from watching; to push your skills you need to actively participate. Many people join communities stating they want to learn and that they are eager to join in, but when opportunities arise, they stay quiet. Don’t be afraid to take opportunities and ask for help if need be. Trust me, if you are in a good community people will rush to help you.
My knowledge of C and eBPF is so little I may as well state it as having none. I thought it would be quite simple to take something already available and tweak it to suit my needs. How wrong I was. I was experiencing intermittent failures at one point, then all of a sudden the code that was meant to intercept execve under certain conditions started to run when an SSH connection was made and broke SSH (thankfully I have learned before to keep your connection open and start a second rather than restarting your current connection). I think after my WASM project, I need to explore eBPF some more to get a better understanding of how to debug it.
I maybe overdid it with the hype on Twitter, to the point my notes were completely ignored just to spite me. I need to find a better balance, as I take responsibility for the cert issue that led to kubeadm putting the cluster into a seemingly unstable state.