
Initial Commit - Klustered #20


I guess I will start with what I’ve been learning/doing most recently, which was appearing on Klustered #20, up against the amazing Marek Counts (check out his YouTube here: TheNullChannel).

If you’ve not watched the episode, go do that, as I’m going to be writing about my breaks and my performance diagnosing Marek’s breaks.

The break #

Once Marek and I knew we were going up against one another, we started to raise the stakes: we not only agreed we could go a little wild, but also decided to add a Lord of the Rings theme to it somehow. I chose to tell a story from an outsider’s perspective of people trying to travel with Frodo. As usual, most of my hints were ignored 🙁

The break consisted of 4 parts:

  • kubectl fake permission issue
  • Broken CoreDNS
  • Repeatedly deleting the deployment
  • Custom Kubelet to alter the container image being requested

Everything was a bit rushed, as I realised when I came to load the breaks onto the cluster that I had compiled them on my Linux distribution, which had a different version of libc, so nothing worked, but in a bad way.

kubectl #

I’ve previously seen people use attributes and real permissions to stop the use of kubectl, so I thought a trick that merely looked like a permissions issue might slow them down. For this I created a super simple Go program. I just needed to print out the standard bash permission-denied message (which I got by setting a script to owner-only permissions and trying to execute it as another user). I padded the resulting binary using Go’s embed feature, cheekily embedding the actual kubectl so the fake kubectl would look a more realistic size.

package main

import (
	_ "embed"
	"fmt"
	"os"
)

// baggage is the real kubectl, embedded purely to pad the binary
// out to a realistic size.
//go:embed baggage
var b []byte

func main() {
	fmt.Println("bash: kubectl: Permission denied")
	// Hacky reference to the embedded bytes, so I could be sure the
	// compiler wouldn't strip them out.
	if len(b) > 200 {
		os.Exit(126)
	}
}

The keen-eyed may notice that the embedded file was called baggage; I just renamed the real kubectl to baggage. I’m not a Go dev, so I wasn’t sure whether the compiler would remove the byte array if I didn’t use it, hence the super hacky way of referencing it in the main function.

CoreDNS #

I wanted a DNS break, because it is always DNS, so I had a look at what plugins were available in a vanilla CoreDNS (i.e. ones that wouldn’t need me to load custom images) and spotted cancel. I intended to set it to a low value in the hope it would refuse to answer DNS requests, but this didn’t happen: CoreDNS answers so quickly that even setting it to 1 (which seems to default to milliseconds) still allowed it to operate. Then I noticed that cancel required a unit of time to be added, whereas the cache value and the ttl in the kubernetes plugin wouldn’t accept one.

Corefile: |
     .:53 {
         errors
         health {
            lameduck 5s
         }
         ready
         kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
         }
         prometheus :9153
         forward . /etc/resolv.conf {
            max_concurrent 1000
         }
         cache 30
         loop
         reload
         loadbalance
     }     

With this, I added a cancel 5 above the cache entry, which would crash CoreDNS as there was no unit of time specified. I then added units of time to the cache and the kubernetes ttl values, so at first glance it would look like I had simply removed the unit from cancel. This was only ever meant to slow them down and seed DNS tampering in their minds for the 4th break.
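To illustrate (reconstructed from the description above rather than copied from the actual cluster), the tampered parts of the Corefile looked roughly like this, with the unrelated lines omitted:

     kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30s
     }
     cancel 5
     cache 30s

The intent was that a quick glance suggests the unit was accidentally dropped from cancel, when in fact the units on ttl and cache are the additions.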

Deleting Deployment #

I only partially completed this as I ran out of time.

Looping delete #

The premise was to run a process that would continually delete the deployment. I did this using another quick-and-dirty Go program. I was aware of the Kubernetes Go modules, but I opted to just execute kubectl for the stream as it was really simple.

package main

import (
	"os/exec"
	"time"
)

func main() {
	// Every 5 seconds, delete the klustered deployment using a stashed
	// kubeconfig, ignoring any output or errors.
	for {
		delCmd := exec.Command("kubectl", "--kubeconfig=/var/opt/orc.config", "delete", "deployment", "klustered")
		delCmd.Output()
		time.Sleep(5 * time.Second)
	}
}

Hiding an orc #

This was going to be the hack on procps: in the procps/ps/select.c file, find any process that has orc in its cmdline and remove it from the list of processes by setting accepted_proc to 0. So the start of want_this_proc looked like this:


int want_this_proc(proc_t *buf){
	int accepted_proc = 1;
	/* elsewhere, convert T to list, U sets x implicitly */

	// THIS HIDES OUR PROCESS
	if (strstr(rSv(CMDLINE, str, buf), "orc")) {
		accepted_proc = 0;
		goto finish;
	}

	// unmodified code from here on down
	//...
}

I mistakenly said this would remove it from all the tools; however, on second look, I had only been working within the ps folder, and a quick test showed that a build of procps with the above modification did not remove orc from the output of top. I’m sure it wouldn’t be much more difficult to remove it from there as well.

I was hoping we might see a new technique to find a process that didn’t require ps or top: some nice chaining of simple commands iterating over the /proc directory, or some command I’ve never heard of before. But since I ran out of time to get this installed on the node, trusty ps was used to find the process. I did learn of pgrep whilst looking at procps, which will give you all the process IDs that match the query you specify, so pgrep orc would have shown the process ID, and my hints gave away the process name and which node it was on.

❯ pgrep orc        
49837
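The /proc-walking approach is also simple enough to sketch. Something like the following (illustrative Go, not anything used on the show) prints the PIDs of any processes whose cmdline contains orc:

package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

func main() {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		// Only the numeric directories under /proc are processes.
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue
		}
		// /proc/<pid>/cmdline is NUL-separated, but a substring match is enough here.
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "cmdline"))
		if err != nil {
			continue
		}
		if bytes.Contains(data, []byte("orc")) {
			fmt.Println(pid)
		}
	}
}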

Custom Kubelet #

Previously (which I will write another post about), I broke a cluster for a special edition of Klustered where Rawkode was flying solo for Cloud Native London’s monthly meetup. In that one I tampered with the image at runtime, and I wanted to do something that could affect the website again for this episode. For those not in the know, the cluster in Klustered hosts a website that pulls some data from Postgres and shows a video; there are 2 versions, and the cluster is classed as fixed when the second version of the website can be displayed.

Custom image #

As I have previously altered the video, I wanted to do something different this time out (although I plan to revisit showing a different video in a future episode, if I am ever invited back). Anyhow, for episode 20 I decided to seed doubt in their minds by having version 2 of the website show the DNS trouble screen. Rather than actually breaking DNS for this, I could just replace the website with one that pumps out the exact same HTML the real image shows when its connection to Postgres is broken, making it look like a DNS issue. I left a little clue that it wasn’t actually DNS in the header of the page, in a meta tag.
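The real change was made inside the actual workload’s code, but the general idea is simple enough to sketch in Go (illustrative only: the HTML and the meta-tag wording here are made up, not the markup used on the show):

package main

import (
	"fmt"
	"log"
	"net/http"
)

// A static page that mimics the "can't reach the database" output of the
// real workload, plus a meta tag hinting that DNS is not really the problem.
const page = `<!DOCTYPE html>
<html>
<head>
  <meta name="hint" content="it is not always DNS">
  <title>klustered</title>
</head>
<body>
  <h1>failed to resolve database host</h1>
</body>
</html>`

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, page)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}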

As the Klustered test site is public on GitHub, I decided to use that and just alter the internals. Once I had it showing a broken-looking page, I decided to tidy up a little too, as the image was rather large. According to dive, it was 9 layers at a rather hefty 1.3GB…

┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cmp   Size  Command                                                                        
    114 MB  FROM cdc08e3f3469b06                                                           
     16 MB  set -eux;     apt-get update;     apt-get install -y --no-install-recommends   
     18 MB  set -ex;     if ! command -v gpg > /dev/null; then         apt-get update;     
    146 MB  apt-get update && apt-get install -y --no-install-recommends         git       
    510 MB  set -ex;     apt-get update;     apt-get install -y --no-install-recommends    
    448 MB  set -eux;     dpkgArch="$(dpkg --print-architecture)";     case "${dpkgArch##* 
       0 B  WORKDIR /workload                                                              
    712 kB  COPY /assets/Two.webm /workload/assets/video.webm # buildkit                   
     11 MB  COPY /code/target/release/workload /workload/httpd # buildkit                  
                                                                                           
│ Layer Details ├───────────────────────────────────────────────────────────────────────── 
                                                                                           
Tags:   (unavailable)                                                                      
Id:     4ea4387946b3eb06673993d4ff7fa691730ba5c270ab0a3bf0ac6cf28cf928ca                   
Digest: sha256:cd82161b8f56274c32f86fc960a8b939780712fd424223be1afe901387b43c9d            
Command:                                                                                   
COPY /code/target/release/workload /workload/httpd # buildkit                              
                                                                                           
│ Image Details ├───────────────────────────────────────────────────────────────────────── 
                                                                                           
Image name: ghcr.io/rawkode/klustered:v2                                                   
Total Image size: 1.3 GB                                                                   
Potential wasted space: 9.7 MB                                                             
Image efficiency score: 99 %                                                               
                                                                                           

That is just the top-left panel from dive; I’ve removed the more detailed contents views. It’s a really nice tool: if you want to see what your container filesystem looks like, and why it looks like that, check out dive.

My version was much smaller and I was super pleased with myself right up until Rawkode said he wanted to SSH into the image to see if he could work out why the DNS was failing to resolve…

┃ ● Layers ┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
Cmp   Size  Command                                                                        
    9.1 MB  FROM b5ca3372308adc0                                                           

│ Layer Details ├─────────────────────────────────────────────────────────────────────────

Tags:   (unavailable)
Id:     b5ca3372308adc078546c25a3cef874965ac75fa3d7aa31df2c383b167ff4abd
Digest: sha256:8abe4f6a6113cde6f9f6c5ba4e6d943eaa4b7fea4ad01cc5e3a11f2955ed5be6
Command:
#(nop) COPY file:041c5327a2264aaf08c8d3a37d4bf7dcfd812a515ad7a44ac96c65dce3f40ca3 in /

│ Image Details ├─────────────────────────────────────────────────────────────────────────

Image name: russriguez/klustered:v20
Total Image size: 9.1 MB
Potential wasted space: 0 B
Image efficiency score: 100 %

Editing the kubelet #

This was a case of trial and error; I’m not a Go developer, so this might not have been the best way to approach it. Disclaimer out of the way, here’s how I attacked it…

I was under the belief that the Kubelet had to be told by the API server that a new pod was required and what should be in that pod. So I assumed there would be a createContainer or createPod function with an image name I could tamper with before the request was sent to containerd. I was close: a few searches later led me to the startContainer function in pkg/kubelet/kuberuntime/kuberuntime_container.go. Here is the code I added after the first line of the function:

func (m *kubeGenericRuntimeManager) startContainer(podSandboxID string, podSandboxConfig *runtimeapi.PodSandboxConfig, spec *startSpec, pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, podIP string, podIPs []string) (string, error) {
	container := spec.container

	// Swap any klustered image that isn't v1 for my doctored copy.
	if strings.Contains(container.Image, "klustered") && !strings.Contains(container.Image, "klustered:v1") {
		container.Image = "russriguez/klustered:v20"
	}

	// ... rest of function unchanged, not shown to save space
}

Now this did switch the image, but it was quite noisy in the events, showing my repo name, so I wanted to edit the messages that were giving me away too. Cue searching for part of those event messages and finding pkg/kubelet/images/image_manager.go. To save showing the same change multiple times across that file: wherever a log message interpolated container.Image into the string, I altered it to call a function (sanitizeImageName) on that image, which would swap my image name out for the real one so the events would look like the proper image was in use.

logPrefix := fmt.Sprintf("%s/%s/%s", pod.Namespace, pod.Name, sanitizeImageName(container.Image))

GoLand kept altering the Sprintf calls to use placeholders rather than interpolating directly into the string; not sure why, but it’s not a battle I’m interested in, so I let it do its thing.

Here is the sanitizeImageName implementation:

func sanitizeImageName(image string) string {
	if strings.Contains(image, "russriguez") {
		return "ghcr.io/rawkode/klustered:v2"
	}
	return image
}

The final hurdle was the pod never starting. Once the pod was created, it would be destroyed, as it differed from the intended spec. To stop it from continually being terminated, I searched for “definition changed” (which I had spotted in the events when I described the pod) and found the code responsible for syncing up containers in pkg/kubelet/kuberuntime/kuberuntime_manager.go.

Inside the fairly hefty computePodActions, I added a check for “klustered” and skipped to the next loop iteration before any actions were taken:

	if _, _, changed := containerChanged(&container, containerStatus); changed {
		// tried altering containerChanged but it wasn't working, let's try this ... quick hack
		if strings.Contains(container.Image, "klustered") && !strings.Contains(container.Image, "klustered:v1") {
			keepCount++
			continue
		}
		message = fmt.Sprintf("Container %s definition changed", container.Image)
		// Restart regardless of the restart policy because the container
		// spec changed.
		restart = true
	} else if liveness, found := m.livenessManager.Get(containerStatus.ID); found && liveness == proberesults.Failure {
		// ... rest of the loop unchanged

You may notice the comment I added above about containerChanged; I had edited that function to look like this:

func containerChanged(container *v1.Container, containerStatus *kubecontainer.Status) (uint64, uint64, bool) {

	expectedHash := kubecontainer.HashContainer(container)
	if strings.Contains(container.Image, "klustered:v2") { // this isn't working, not sure why...
		expectedHash = containerStatus.Hash
	}

	return expectedHash, containerStatus.Hash, containerStatus.Hash != expectedHash
}

However, when I tried running that on my kind cluster, it wasn’t working. I was making a stupid mistake over and over: I was using my shell history to copy the binary over to the cluster, but I was using the wrong command, so the cluster wasn’t running the binary with the latest modification in it, but one from an earlier trial. SIGDOH. I think the containerChanged change would actually have worked.

Removing the shell is what gave it away #

My decision to make the image super small was the key to Rawkode realising it wasn’t his image and that shenanigans were afoot. As he couldn’t SSH onto the klustered pod, because my minimal image had no shell to land in, my ruse was rumbled. I’ve watched a fair few episodes but don’t remember him ever SSHing onto the klustered pod, so I can’t be too down on myself for leaving such an obvious clue. Not altering the image ID also didn’t help…

My time diagnosing #

First off, excuses:

  • I am not currently employed in a tech role
  • my history is in .NET on Windows servers
  • my last job in tech was managerial
  • the sun was in my eyes…

That should be enough :-)

In the video you will see a definite difference between people who routinely look at issues with Linux machines and me; I am a bit of a fearless novice. I’m not going to be able to fix everything (hello, BGP issues), but that’s okay. I learned from the experience, and I now know what I need more practice looking at (cough, low-level network configuration).

I’m happy the vim :oldfiles trick worked to show me what had been modified on the control plane. For those that don’t know about it, vim (and neovim) maintain a history of the files you have edited, and you can view it with :oldfiles; you can run it as a command on its own, or via the browse command with oldfiles as an argument: :browse oldfiles. If you are just using :oldfiles on its own, you can open a file from the list in whatever way you want (tab, split, etc.) by using the #< syntax, so for instance to use plain edit on the 4th item in the list you would use :edit #<4, or even shorten it to :e #<4.

I’m happy we zeroed in on the IPTables break. This was good old-fashioned looking at logs and then theorising how that error could happen. I wasn’t able to fix it without Rawkode’s knowledge, as my IPTables experience is very limited, but at a push I could have searched the internet.

I’m intrigued as to what the Quality of Service break was Marek told us about. I believe he said he plans on showing it on one of his videos on his YouTube channel: TheNullChannel.

I wish I hadn’t been so flustered, as after the show, in Discord, I came up with a super hacky workaround to all the issues we had, planned and unplanned: schedule the workload to run on the control plane. Don’t judge me, you know it would have worked!

Conclusions #

Use terminal-based Teleport if you are sharing a session and you run at a different resolution to the other party.

If your break involves compiling, make sure to get it onto the server early and compile there. Or statically compile everything.

Give yourself a bit of time to practise diagnosing; I went straight into it, as I was still editing and compiling a custom Kubelet on the control plane about an hour before the start of the stream. I regret that.

Marek doesn’t follow the first rule of Klustered (don’t break Teleport) 😂

Rawkode won’t leave you to suffer, he even helped me and I’ve previously rick-rolled him… So if you think you maybe want to have a go but are not sure - do it.