
CFP: How to build a Kubernetes operator that doesn’t break production

September 21, 2020 8:46 am

Audience

This talk is targeted at software developers and SREs interested in development practices for Kubernetes operators. Are they interested in how developing an operator differs from other software projects? This talk will give an outline of the operator pattern and what development looks like, focusing on the importance of good engineering practices. Are they writing a Kubernetes operator just to automate a simple task? They should write tests for it, and this talk will tell them why. As Site Reliability Engineers in OpenShift Dedicated, we develop and maintain a number of operators to keep toil on all our operated clusters as low as possible.

Outline

In a recently published blog post I wrote about how to make sure a Kubernetes operator project is maintainable and follows software development best practices. As SREs we create and maintain a growing number of operators to keep toil away from us. But a poorly designed, implemented, or tested operator can create toil of its own by not functioning correctly. Adding new features gets hard for SREs when bugs can slip in undiscovered, and confidence in adding new code is low if the operator lacks an adequate test suite.

In this talk I will cover the important concepts to keep in mind when developing your own Kubernetes operator. Even if you start a new project just to automate the setup and configuration of a small application, give good software development practices all the attention they need – even if you feel this could slow down development and take more time than just performing the task by hand. Software grows, and in the long run it will pay off to craft a tested and readable operator from the beginning.

Key Takeaways

During this talk, attendees should learn the importance of (1) treating a Kubernetes operator as production code and (2) wrapping external dependencies, and that (3) tests help achieve both goals as well as improve the overall structure of the code.

CFP: 5 agile practices and why they are useful to SRE teams

April 21, 2020 2:28 pm

As SRE (Site Reliability Engineering) teams contain a fair portion of software development work and are staffed with software developers, it is a natural move to also adopt agile software development practices. The right agile model depends heavily on the ratio of development work to operations, which may be influenced by the team size. For example, in a small team where a high percentage of people is on call during the day, it might not make much sense to plan 2-week sprints if only a few backlog items are expected to get done in that timeframe.

Audience

This talk is targeted at everyone involved in Site Reliability Engineering who wonders how much agile to adopt – team leads, product owners, software developers, SREs. Maybe you’re planning to transform your ops team into an SRE team, your SRE team just got started, or you have been doing SRE for quite some time. As a software engineer who recently joined SRE, I will talk about which practices I found useful to take over from software engineering, which ones are better dropped, and which ones I still miss sorely.

Agile Practices

Retrospective

While often the first meeting teams drop, as its relation to actual work items is not easy to see, the retrospective is the tool for teams to iterate on how they work and improve – including deciding which of the agile practices make sense to adopt and which don’t.

Planning: Estimating Backlog Items

Planning meetings help the team understand the priorities of items and the overall direction a project is heading, and build a common understanding of how complex the work is (with estimation). However, an unknown number of people being on call or doing incident response at any given time makes it hard to set sprint goals or commit to a consistent number of stories.

Standups

Standup meetings are useful, especially in distributed teams, to talk about what you’re working on and where you need help. The meeting does not necessarily need to be daily – and that hit me, as a software engineer, unexpectedly hard.

Testing

If your SRE team is writing software, that software should be tested. No room for discussion.

That’s what the software engineer might think – but you do need to discuss it. You need to convince your team that testing is helpful. And that’s equally hard in an SRE team as in any software engineering team.

Pair programming

It’s hard to convince people that pair programming is helpful, and it isn’t helpful in every situation – but confidence in code changes, as well as in operational changes (during an outage, for example), is so much higher when working in a pair.

Key Takeaways

During this talk, attendees should learn (1) that SRE and software engineering alike benefit from agile development practices, (2) that some practices are worth adopting while others may not be too helpful for SRE, and (3) that which ones are and aren’t helpful is most easily spotted by iterating not only on the work itself but also on how we work (practice retrospectives).

Guest article: Build a Kubernetes Operator in 10 minutes with Operator SDK

April 20, 2020 1:53 pm

It’s been a while since I last submitted an article to opensource.com. This time it is about quickly kick-starting a Kubernetes Operator with Operator SDK.

When you start working on a new software project, often a good amount of code already exists. That’s no different when joining the development of a Kubernetes Operator. In the case of Operator SDK, a good part of the code is additionally generated, so you also want to know which code is hand-written and meant for changes, and which is generated by the SDK.

As I’ve been working on the GCP Project Operator with my team at Red Hat, I wanted to know what exactly the steps are to start an operator from scratch, to better understand what it is you get from the SDK. I thought it might also be useful for others hopping onto operator development, so I wrote those steps down in a blog article.

CFP: Things I love about SRE that I loathe about DevOps

January 28, 2020 8:32 pm

DevOps & SRE – what is it?

Let’s define what those terms mean.

DevOps means that the same team that builds the software is responsible for running it. This is easiest to imagine for software operated by the vendor itself, i.e. cloud services. The idea is that no one knows the software better than the developers, so no one could better operate it and fix in-flight issues. On the other hand, developers have a strong interest in building software that is easy to operate if they operate it themselves. Issues found during operations can be addressed immediately.

SRE is a concept where one team is responsible for running one or more services. Again, imagine a company building cloud services. They may all have similar operational requirements, so why not create a dedicated team to run them all? This team contains software developers who automate common tasks that emerge when running the services, to minimize manual effort. SRE teams monitor the software to spot issues as soon as they appear and, in the best case, fix them before a customer notices.

Which DevOps problems may SRE solve?

In DevOps, having a team that runs software and a separate team that builds it is a well-known anti-pattern.

“If you have a DevOps team, you’re doing it wrong”

This quote can be found in many tweets, blog posts, and conference talks. But in SRE, this is a common pattern: one team builds the software, another team runs it – and builds software to improve running it.

In DevOps, you usually find a specific support role being rotated through the team. That means everybody is in that role, for example, one week out of ten in a team of ten members.

Working in this support role usually has very different requirements from the usual development work.

The context can get completely lost during the transition into the support role (which often comes as a surprise on Monday mornings). And once context is built up and the developer feels comfortable in that role, their shift ends and the support context is lost again.

This results in people hating to take on the support role, which also makes it less attractive to build things that improve the supportability of the software.

In SRE teams, people are on call as well, every few weeks, often even 50% of their time. Why don’t they dislike it? The difference is the focus of the team. When not on call, developers in an SRE team work on improving support. They are less involved in the actual product code; instead, they build software to ease operations – for example, to automate updates and other maintenance tasks.

• DevOps: supportability vs. new features. SRE: supportability = new features.
• DevOps: the dev team learning ops all together. SRE: ops learning from devs and devs learning from ops.

Agile development in SRE teams

As SRE teams contain a fair portion of software development work and are staffed with software developers, it is a natural move to also adopt agile software development practices. Finding the right model to track and manage the project work depends heavily on the ratio of development work to operations, which may be influenced by the team size. For example, in a small team where a high percentage of people is on call during the day, it might not make much sense to plan 2-week sprints if only a few backlog items are expected to get done in that timeframe.

Key Takeaways of this talk

By the end of this talk about the differences between SRE and DevOps working styles, attendees should be aware (1) of the most significant differences between DevOps and SRE, (2) that a successful team running DevOps or SRE needs experienced ops as well as dev people, and (3) that if team members greatly dislike taking on the operating role, the team should work hard on improving the support experience.

Ruby best practice: Implementing operator == and ensuring it doesn’t break

March 8, 2019 12:42 pm

In Ruby, comparing hashes, strings, and objects is a complicated topic. Should you use equal?, eql?, or ==? There is plenty of help on this topic, but in this post we will focus on the interesting behavior of the == operator and how you can make it behave the way your use case needs.
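
For orientation, here is a minimal sketch of how the three methods behave in plain Ruby (a refresher of standard behavior, not specific to the examples below):

a = "foo"
b = "foo"

puts a == b       # true  – compares content
puts a.eql?(b)    # true  – compares content, but stricter about type
puts a.equal?(b)  # false – true only for the very same object
puts 1 == 1.0     # true  – == converts across numeric types
puts 1.eql?(1.0)  # false – eql? does not convert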

When comparing Hashes in Ruby, the == operator compares the content of a hash recursively.

my_hash = {
    :sub_hash => {
        :value => 42
    }
}

my_second_hash = {
    :sub_hash => {
        :value => 42
    }
}

my_third_hash = {
    :sub_hash => {
        :value => 21
    }
}

puts "my_hash == my_second_hash? #{my_hash == my_second_hash}"
puts "my_hash == my_third_hash? #{my_hash == my_third_hash}"

my_hash == my_second_hash? true
my_hash == my_third_hash? false

Unfortunately, when comparing objects of arbitrary classes, the default operator only compares the object identity.

class MyClass
  def initialize(value)
    @value = value
  end
end

my_object = MyClass.new(42)
my_second_object = MyClass.new(42)

puts "my_object == my_second_object? #{my_object == my_second_object}"

my_object == my_second_object? false

If you want to do a deep comparison of objects of your class, you need to implement your own operator == by overriding the existing operator.

class MyClass
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def ==(other)
    other.respond_to?("value") && value == other.value
  end
end

my_object = MyClass.new(42)
my_second_object = MyClass.new(42)

puts "my_object == my_second_object? #{my_object == my_second_object}"

my_object == my_second_object? true

That was easy. But imagine this was a bigger class, and someone else adds a property without being aware that this operator exists and that other code depends on it to detect whether any public member of the object changed. How can you ensure such a change doesn’t sneak in unnoticed?

I stumbled across the following solution when implementing an operator == for a class in the BOSH code together with my colleague Max.

As the BOSH code is written test-driven – and your code should be as well – writing a test that breaks with a change like the one described above ensures the operator keeps working. But what can such a test look like?

Consider the following change to our code above:

class MyClass
  attr_reader :value
  attr_reader :value_new

  def initialize(value)
    @value = value
    @value_new = value
  end

  def ==(other)
    other.respond_to?("value") && value == other.value
  end
end

my_object = MyClass.new(42)
my_second_object = MyClass.new(42)

puts "my_object == my_second_object? #{my_object == my_second_object}"

Detecting that the variable @value_new has been added can be done with an RSpec test like the following:

require './object_compare_op'

describe :MyClass do
  describe 'operator ==' do
    context 'when instance variables are modified' do
      let :obj do
        MyClass.new(42)
      end
      let :other_obj do
        MyClass.new(42)
      end

      all_members = MyClass.new(0).instance_variables.map { |var| var.to_s.tr('@', '') }
      all_members.each do |member|
        it "returns false when #{member} is modified" do
          eval <<-END_EVAL
            class MyClass
              def modify_#{member}
               @#{member} = 'foo'
              end
            end
          END_EVAL
          obj.send("modify_#{member}")
          expect(obj == other_obj).to(
            equal(false),
            "Modification of #{member} not detected by == operator.",
          )
        end
      end
    end
  end
end

The variable @value_new only has an attribute reader, so we cannot simply assign a new value. But that doesn’t stop you from changing the value – not in Ruby. Using eval in the test, we add a method for each existing instance variable of MyClass (one per iteration) that modifies that member.

Afterwards, the newly added method is called to change the value of the member, and the expect checks whether the operator detects the modification – which, for our code above, will fail. Hence, whenever someone adds a new member to MyClass, this test will remind them to also add it to the == operator. Even if the test code itself is not very expressive, the output of the failing test is:

 Modification of value_new not detected by == operator.
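
This failure nudges you to extend the operator to cover the new member. A possible fix could look like this (a sketch continuing the example above, not the BOSH code):

class MyClass
  def ==(other)
    # Compare all public members, including the newly added one.
    other.respond_to?("value") && value == other.value &&
      other.respond_to?("value_new") && value_new == other.value_new
  end
end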

In some situations you may want to exclude a member from this check because it is purely internal or not important for the equality of two objects. To enable this, we added an exclude list for private members to the test. This adds a bit of complexity when adding new members to the class, as the test will bother you and you also have to add the member to the exclude list, but it improves the safety of your == operator.

33
34
require './object_compare_op'

describe :MyClass do
  describe 'operator ==' do
    context 'when instance variables are modified' do
      let :obj do
        MyClass.new(42)
      end
      let :other_obj do
        MyClass.new(42)
      end

      all_members = MyClass.new(0).instance_variables.map { |var| var.to_s.tr('@', '') }
      private_members = %w[value_new]
      public_members = all_members - private_members
      public_members.each do |member|
        it "returns false when #{member} is modified" do
          eval <<-END_EVAL
            class MyClass
              def modify_#{member}
               @#{member} = 'foo'
              end
            end
          END_EVAL
          obj.send("modify_#{member}")
          expect(obj == other_obj).to(
            equal(false),
            "Modification of #{member} not detected by == operator.",
          )
        end
      end
    end
  end
end

With this kind of test, you can easily implement comparison operators for your classes that check for object equality rather than identity, and ensure you do not forget to add new members of the class to the comparison as well.
You can take a look at production code in the BOSH code base. As you will see, it’s not much different from what I presented here – it’s a universal approach to the problem.
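
As a closing aside (my own addition, not covered by the BOSH example): if instances of your class are used as Hash keys or stored in Sets, Ruby consults eql? and hash for lookups, so you may want to keep those methods consistent with ==. A minimal sketch:

class MyClass
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def ==(other)
    other.respond_to?("value") && value == other.value
  end

  # Hash and Set lookups use eql? and hash – keep them in sync with ==.
  alias eql? ==

  def hash
    value.hash
  end
end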

Raspberry Pi powered Wifi Pictureframe

March 6, 2019 7:09 am
Some days ago I wrote a post about building a picture frame using a Raspberry Pi. The article has been published on opensource.com. I wrote the slideshow software myself and published it on GitHub, as I wasn’t happy with the existing solutions: they either involve a GL-rendered xscreensaver, which was terribly slow on the Raspberry Pi, or installing Kodi, which I think is overkill just to get a slide show.
I also wanted the nice feature of a blurred, screen-filling version of the image displayed in the background, which is not possible with the xscreensaver slide show.
Take a look at it if you are also looking for lightweight slide show software, and let me know what you think!

Ansible: Passing arrays to BASH scripts

January 7, 2019 7:56 am

When using Ansible, it may become handy sooner or later to invoke a BASH script in one of your playbooks. Invoking a BASH script in Ansible can be done using a simple shell task:

---
- hosts: 127.0.0.1
  connection: local
  tasks:
      - name: ensure stuff is done
        shell: ./do_stuff.sh

This task executes the BASH script do_stuff.sh. Sometimes it is also necessary to configure the behaviour of the BASH script you are executing. The simplest way to do so is to pass environment variables to the script, as done in the following example.

---
- hosts: 127.0.0.1
  connection: local
  tasks:
      - name: ensure custom stuff is done
        shell: ./do_stuff.sh
        environment:
            STUFF: some_stuff

In the BASH script we can now work with the environment variable as usual:

#!/bin/bash

echo $STUFF > /tmp/stuff.txt

If we now want to pass multiple values in an environment variable as an array to the BASH script, we have two options: either we pass the array as a string, or we parse the array output of Ansible in the BASH script.

Option 1: Pass array as string

The first option is to pass the multiple values as a string instead of a YAML array. In the following example we separate them by spaces, which results in a single environment variable ARRAY being set when executing the BASH script.

---
- hosts: 127.0.0.1
  connection: local
  tasks:
      - name: pass array as string
        shell: ./do_stuff_string.sh
        environment:
            ARRAY: one two three

To handle those values as an array in our BASH script, we simply surround the value of the environment variable with parentheses. Afterwards, the ARRAY variable is a BASH array containing the values one, two, and three.

#!/bin/bash

ARRAY=($ARRAY)
:> /tmp/stuff_string.txt
for a in "${ARRAY[@]}"; do
    echo $a >> /tmp/stuff_string.txt
done

Option 2: Parse array

The second option is much more readable in the Ansible playbook. We define the values as an array in YAML:
---
- hosts: 127.0.0.1
  connection: local
  tasks:
      - name: pass array as array
        shell: ./do_stuff_array.sh
        environment:
            ARRAY:
            - one
            - two
            - three

Unfortunately, this environment variable is still set as a string, now containing a Python string representation of the array:

[u'one', u'two', u'three']

To parse the array in the BASH script, the simplest way is to use Python again, which can handle this value out of the box.

#!/bin/bash

ARRAY=($(python <<< "print(' '.join($ARRAY))"))
:> /tmp/stuff_array.txt
for a in "${ARRAY[@]}"; do
    echo $a >> /tmp/stuff_array.txt
done

With the Python one-liner in line 3 of the script, ARRAY is again a BASH array with the values one, two, and three, and can be processed further.

Please note that, to be able to receive array elements containing spaces from Ansible, we need to change the array separator (IFS), as the following version of the script shows. Otherwise, BASH would treat every single word as a separate element of the array:

#!/bin/bash
IFS=$'\n'
ARRAY=( $(python <<< "print('\n'.join($ARRAY))") )
echo "${ARRAY[@]}"
for a in "${ARRAY[@]}"; do
    echo "$a"
done

We have seen two different options to pass an array from Ansible to a BASH script. Option 1, passing it as a simple string variable, has some implications – for example, an array containing values with spaces may result in a less readable YAML line. Option 2, defining the array as a YAML array and parsing it in your BASH script with Python, makes the definition more readable and more robust, but adds a Python dependency to your BASH script.

If only simple one-word arguments need to be passed to your script, option 1 might still be a good choice, as it is easier to handle in the BASH script and doesn’t add a dependency on Python.
Nevertheless, for more complex playbooks or values in the array, I recommend option 2 as the cleaner solution.

Use Ansible to clone & update private git repositories via ssh

July 7, 2018 7:21 am

One of the first things I wanted to do when I started using Ansible was to clone a git repository on a remote machine, as I keep configuration, scripts, and source code in GitHub or GitLab repositories. Things that are not meant for the public I store in private repositories that I want to clone via ssh. Cloning and updating them is what I now want to automate with Ansible.

There are different ways to go about this task:

• Check out the repo locally and copy it to the server via an Ansible synchronize task
• Generate an ssh key on the server and manually allow cloning the repo with that key
• Copy a local ssh key to the server and allow cloning the repo with that key
• Use ssh-agent to load the local key and forward the agent to the server

While it might be tempting to just copy an ssh key via Ansible to the remote server, I find this quite risky, as it means copying a secret to persistent storage on a remote server. Also, if you version your Ansible playbooks in a git repository as well, to be able to execute the playbook from somewhere else, the private key has to be versioned along with it.

Using ssh-agent, you can easily load your ssh key prior to provisioning the git repo on the remote server – without copying the key over, and without allowing access to your repo for a different key than the one you already granted access for development.
Let’s go through this with a simple example. Say you want to run the following playbook, which ensures the git repository github.com/ntlx/my-private-repo is up to date.

---
- hosts: webserver
  tasks:
      - name: Ensure repo is up-to-date
        git:
            repo: git@github.com:ntlx/my-private-repo.git
            dest: repos/my-private-repo
I assume you have added your public ssh key to your github.com repository, so you are able to clone and work on the repository locally. To clone the repository on the remote machine, you need to load your ssh key into ssh-agent with the following command:

ssh-add ~/.ssh/id_rsa

Now we need to enable forwarding of the ssh agent to the remote machine, so we can access the loaded key remotely. There are different ways to do so, but I find it most useful to do it in your ansible.cfg like this:

[ssh_connection]
ssh_args=-o ForwardAgent=yes

That way, you allow agent forwarding for all your Ansible-managed hosts at once.

Now you can go on executing your playbook and should be able to clone the repository on the remote host.

To make it even easier, we can add a task to load the ssh-key before executing the other tasks in the playbook. For this, add the local host to your Ansible inventory:

[local]
local_machine ansible_connection=local ansible_host=localhost

Now we can add a small shell task to load the ssh-key:

---
- hosts: local
  tasks:
      - name: load ssh key
        shell: ssh-add ~/.ssh/id_rsa

- hosts: webserver
  tasks:
      - name: Ensure repo is up-to-date
        git:
            repo: git@github.com:ntlx/my-private-repo.git
            dest: repos/my-private-repo

When you now execute the playbook, you shouldn’t need to load the ssh-key beforehand.