If we have Infrastructure as Code (IaC), why can’t we have Network as Code (NaC)?
First of all, apologies to all of those people who feel hurt by decoupling the term “Network” from “Infrastructure”. I am one of you. But that’s not my fault. I am well aware that a network is part of an infrastructure (especially these days, when the boundaries between the two are shrinking day after day) but wherever you look for infrastructure, you will find Docker, APIs, VMs, Cloud, AWS, Kubernetes, etc. but not a single word about routers or switches (Docker containers, running in VMs, installed in a Cloud, communicate with each other by magic… don’t they?). In my opinion, when we talk about infrastructure we talk about the whole ISO/OSI stack, from Layer 1 to Layer 7. End of story. So when someone talks about IaC, I expect their crappy magic code to take care of every layer of the stack mentioned above. But this is not the case, so let’s follow the flow and think about the network as a separate entity that has nothing to do with the infrastructure (I feel like I need to wash my hands after writing this).
So, let’s start by grabbing a cup of coffee (this post is a bit long…) and asking ourselves: what does [Infrastructure|Network] as Code mean?
Well, I have never been a big fan of definitions or labels, so personally I believe that the best way to explain the meaning of [Infrastructure|Network] as Code is through an example (oh, by the way, [something|else] is a RegEx-style syntax that means “something or else”). Think about the classic (read: legacy) SysAdmin and Network Engineer kind of work and put it in a bowl. Now, think about today’s Software Developer kind of work and put it in the same bowl. Now, take a whisk (the electric one, the more powerful kind), start mixing them up at maximum speed and add a bit of DevOps at the end. The result is the answer to our question.
If we want to go through the recipe of this mess, we will see ingredients like source control (git), a source of truth (again, git), CI/CD (GitLab Runner, Jenkins, GoCD, …), config management tools (Ansible, SaltStack, …), templates (Jinja2), provisioning tools (Terraform), coding (Python, Go, …) and so on. So, as you can understand, [Infrastructure|Network] as Code is a mix of legacy and innovation. Or, to put it better, it is applying the tools and workflows of DevOps and Software Developers to legacy SysAdmin and Networking work (by now I expect you know how to read [Infrastructure|Network]. If you don’t, get out of my blog!).
Second question: why is everyone talking about IaC and nobody about NaC?
This is an easy one. The reason is that Networking is evolving at a slower pace than DevOps and Software. Try to think about the innovations of the last 10 years in Networking (VXLAN, as far as I remember/know) compared to Infrastructure’s (Docker, AWS, Kubernetes… you name them). A network is usually more static, whilst an infrastructure has more moving pieces. Because of this, engineers are more prone to develop new tools and apply new ideas to manage Infrastructure rather than the Network, and Network Engineers are happy to stick to their usual way of working (I know that because I have been one of them. Lazy folks!)
Another question: What are the benefits of a NaC?
What? What is that? A job interview? I hate this kind of recruiter question, so let’s move on and see how to build a Network as Code. I am sure you will pick up all the advantages of NaC compared to the traditional CLI approach along the way.
How to build a NaC. Let’s start: the code bit.
I believe that the best way to explain something is with examples. So, let’s use this demo as a reference: lvrfrc87/NaC. You can also find here some slides presented at CloudExpo 2020, including a live demo recording.
The first thing we need to do is honor the word Code. We need to find a way to represent our network configuration not as a bunch of txt files or spreadsheets, but rather as a piece of code. That is, find a way to abstract our configs from a show running-config to a data structure. One of the most common ways to represent a piece of network config is via YAML or JSON data serialization. Taking the YAML below (interface.yml) as an example, you can pretty much understand what the router or switch Ethernet1 port configuration will look like, once that YAML is “translated” (read: serialized) into a piece of config readable by the network device.
- name: Ethernet1
  ip: 10.76.64.0/31
  virt_router_ip: false
  virt_ip: false
  description: Link-to-p1-Leaf-1
  mtu: 1500
  enabled: true
  switchport: false
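Since either YAML or JSON works as a serialization format, here is a minimal sketch of my own (not part of the demo repo) showing what that same interface becomes once deserialized: a plain data structure you can reason about in code. JSON is used here only because it ships with Python’s standard library.

```python
import json

# The same Ethernet1 interface data, serialized as JSON instead of YAML.
raw = '''
[
    {
        "name": "Ethernet1",
        "ip": "10.76.64.0/31",
        "virt_router_ip": false,
        "virt_ip": false,
        "description": "Link-to-p1-Leaf-1",
        "mtu": 1500,
        "enabled": true,
        "switchport": false
    }
]
'''

interfaces = json.loads(raw)

# Once deserialized, the config is just data: easy to inspect, validate, template.
for iface in interfaces:
    print(f"{iface['name']}: mtu={iface['mtu']} enabled={iface['enabled']}")
```

From here on, “configuring” a device means editing data, not typing CLI commands.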
As we “serialized” a port configuration, we can do the same with BGP, for example…
<<: *default
neighbors:
  - '10.76.100.45'
remote_as: '64524'
graceful_restart: true
next_hop_self: true
name: 'iBGP_64524'
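A quick note on the `<<: *default` line above: it is a YAML merge key, which pulls in every key of a mapping anchored elsewhere as &default. A hypothetical sketch of how that looks (in the demo the real anchor lives in its own variables file, with its own keys):

```yaml
# Hypothetical anchor definition, for illustration only.
default: &default
  graceful_restart: true
  next_hop_self: true

peer_group:
  <<: *default        # merge key: inherit every key from the anchored mapping
  name: 'iBGP_64524'  # keys set here extend or override the inherited ones
```

This keeps shared settings in one place instead of repeating them per peer group.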
…or VXLAN if you like.
---
vxlans:
  - number: 1
    source_interface: Loopback250
    port: 4789
    mapping:
      - map: 'vlan 10 vni 10'
As you can see, the whole device configuration can be represented with YAML files. The level of abstraction is so good that even a non-Network Engineer can easily “configure” a device through a YAML file (obviously… you still need to know what you are doing).
The holy grail: the source of truth.
Our YAMLs must be our source of truth. That is, all the network config (all of it, not just a bit) must be represented by YAMLs. Whatever is in there is in the running-config. As simple as that.
Now, thinking about my personal experience (and for a second, think also about yours), there is always some lazy chap who does not get along very well with innovation and new ways of working. They just love to do the same things over and over again, from 9 to 5 (I respect you, but you need to understand that the world is evolving and you have to adapt). So, this kind of dinosaur engineer (I am sure everyone has met one at least once in their life) always tries to find a backdoor, a workaround or a weakness in the workflow, to stick to their paleolithic way of working. To stop them from doing it (yes, I take it personally) we need to enforce our code method. To do that, we can rely on some network OS functionality like config replace.
Let’s go with an example: I have my network switches up and running and their running config written in YAML. Someone raises a ticket and wants to change the NTP server on all the network devices in a DC (let’s say 100 of them). Our dinosaur colleague never bothered to learn how to write an NTP config in YAML, so he logs into every single switch and spends all day doing: enable, conf t, ntp server 1.2.3.4, wr, exit. Because of his laziness, ntp.yml is not updated with the new IP. This is what the file looks like:
---
ntp:
  servers:
    - 10.8.8.5
    - 10.8.4.5
  source: Loopback 0
Now let’s assume that we have a cronjob that takes all the YAML files, generates our candidate configuration and performs a config replace (that is, replaces the running config with the candidate config). Since 1.2.3.4 is not in ntp.yml, try to guess what happens to the NTP config done via CLI on those 100 devices?
…Puff! …gone.
There is no other way to provision a device apart from using our YAMLs. All the rest is wiped out. Thanks to this example you might have also realized that adding a line to a file and pushing a button takes a few seconds, while SSHing into 100 devices and doing CTRL-C/CTRL-V takes a bit longer (…doesn’t it, my dear dinosaur?)
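To make the replace-not-merge semantics concrete, here is a toy Python sketch of my own (purely illustrative; in the demo the real replace happens on the device, via NAPALM):

```python
# Toy sketch of config-replace semantics. The helper below is hypothetical:
# it only illustrates that the candidate config wins, wholesale.

# Source of truth: what the YAML files say.
source_of_truth = {"ntp": {"servers": ["10.8.8.5", "10.8.4.5"], "source": "Loopback 0"}}

# Running config after our dinosaur colleague's CLI session:
running = {"ntp": {"servers": ["10.8.8.5", "10.8.4.5", "1.2.3.4"], "source": "Loopback 0"}}

def config_replace(running_config, candidate):
    """Replace, not merge: the candidate config wins, full stop."""
    return candidate

running = config_replace(running, source_of_truth)

print("1.2.3.4" in running["ntp"]["servers"])  # prints False
```

Anything added by hand via CLI simply is not in the candidate, so it vanishes.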
Here is a practical example using NAPALM and Ansible to perform a config replace on an Arista device. Don’t worry if it does not make sense right now. It will by the end.
- name: //--LOAD REPLACE CONFIG--//
  napalm_install_config:
    config_file: './files/{{ inventory_hostname }}.cfg'
    commit_changes: True
    replace_config: True
    get_diffs: True
    diff_file: './diff/{{ inventory_hostname }}.diff'
    provider: "{{ eos_auth }}"
  register: result
  tags:
    - cfgrpl
The magic: config from template.
I am sure some of you at this point would ask: how can we get a full candidate config, readable by a network device, from a bunch of YAMLs?
Here is the answer: Jinja2 templates.
Jinja2 is an amazing and powerful templating language that lets you do very fancy stuff. From a bunch of variables (in YAML, JSON or whatever format you like) you can render a template and have a config file ready to go.
Let’s find out how it works: here is a piece of a Jinja2 template for interface configuration (read it slowly… I am sure you will pick up some stuff here and there).
{% for iface in interfaces -%}
interface {{ iface.name }}
{% if iface.ip != false -%}
ip address {{ iface.ip }}
{% endif -%}
{% if iface.virt_router_ip != false -%}
ip virtual-router address {{ iface.virt_router_ip }}
{% endif -%}
{% if iface.virt_ip != false -%}
ip address virtual {{ iface.virt_ip }}
{% endif -%}
{% if 'loopback' not in iface.name -%}
mtu {{ iface.mtu }}
{% endif -%}
{% if iface.description != false -%}
description {{ iface.description }}
{% endif -%}
{% if 'Ethernet' in iface.name -%}
{% if iface.switchport == false -%}
no switchport
{% elif iface.switchport == true -%}
switchport
{% endif -%}
{% endif -%}
{% if iface.enabled == true -%}
no shutdown
{% elif iface.enabled == false -%}
shutdown
{% endif -%}
!
{% endfor -%}
If we take interface.yml and use it to “feed” our template, the result will look like this:
interface Ethernet1
ip address 10.76.64.0/31
mtu 1500
description Link-to-p1-Leaf-1
no switchport
no shutdown
!
If you browse the repo, you will find a demo.j2 that is a perfect example of how you can template a full configuration for an Arista switch.
There is not much else to say. Variables → Template → Configuration. That’s it.
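The whole flow can also be driven straight from Python, assuming the jinja2 package is installed. The trimmed-down template below is my own simplification of the one above, just to show the mechanics:

```python
from jinja2 import Template  # assumes the jinja2 package is installed

# A cut-down version of the interface template shown above,
# demonstrating the Variables -> Template -> Configuration flow.
TEMPLATE = """{% for iface in interfaces -%}
interface {{ iface.name }}
{% if iface.ip -%}
ip address {{ iface.ip }}
{% endif -%}
{% if iface.enabled -%}
no shutdown
{% else -%}
shutdown
{% endif -%}
!
{% endfor -%}"""

# Variables, as they would come out of interface.yml.
interfaces = [{"name": "Ethernet1", "ip": "10.76.64.0/31", "enabled": True}]

# Render: variables in, device-readable configuration out.
config = Template(TEMPLATE).render(interfaces=interfaces)
print(config)
```

Ansible does exactly this rendering step for us, as the next section shows.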
The engine: Ansible (or whatever you like)
Introduction: in this demo I decided to use Ansible just because it is the most common provisioning automation tool. I personally prefer SaltStack for various reasons, and I am also dreaming about the day Terraform will support network vendors too (…working on turning that dream into reality!). I am not wasting time in political battles about which tool is better or which one is the coolest. Life is too short and full of adventures, in my opinion. If you get upset because your favorite tool has been criticized, or if you are wasting time criticizing someone else’s tool, I personally believe you have a problem and need to re-think the way you live your life.
Oh…I feel much better now.
Let’s carry on.
So far we have our variables in YAML files and our template that renders our configuration (based on those variables). What we are missing is an engine that takes those variables, feeds the template, generates the candidate config, pushes that config to the network devices (1 or 100… doesn’t matter) and runs a nice config replace. It is like having a nice, shiny new Vespa without an engine (how would that Vespa get to NordKapp then?).
Thanks to some extra plugins, Ansible can do all that work for us.
Rendering the template (Ansible will pick up the variables automagically, based on a specific folder structure)…
- local_action:
    module: template
    src: './templates/demo.j2'
    dest: './files/{{ inventory_hostname }}.cfg'
…pushing the config to the device and performing a config replace (NAPALM plugin)…
- name: //--LOAD REPLACE CONFIG--//
  napalm_install_config:
    config_file: './files/{{ inventory_hostname }}.cfg'
    commit_changes: True
    replace_config: True
    get_diffs: True
    diff_file: './diff/{{ inventory_hostname }}.diff'
    provider: "{{ eos_auth }}"
…and also run some operational tests if we like (pyATEOS plugin)…
- name: PYATEOS | run pre-check tests.
  eos_pyateos:
    before: true
    test: "{{ tests }}"
    hostname: "{{ inventory_hostname }}"
  register: result
  tags: tests
…and push all the changes to our git repo (git_acp module). More on this later.
- name: GIT | commit tests and configurations folders.
  git_acp:
    path: "{{ playbook_dir }}"
    comment: "CI/CD - {{ config_file_name }}"
    user: "{{ git.user }}"
    token: "{{ git.token }}"
    add: "{{ item }}"
    branch: "{{ branch }}"
    mode: https
    push_option: ci.skip
    url: https://github.com/lvrfrc87/NaC
  loop:
    - ./tests
    - ./configurations
  tags: git
Triggering Ansible with a simple command lets us push the latest config, run some operational tests and save all the work in a git repo. Now think for a minute: this is also particularly useful in all those cases where you want to enforce config hardening, or even security policies, across hundreds of devices. Everything has to go through Ansible and source control with git, so no more wild CLI changes.
The sheriff: source control with Git.
One of the most important tools borrowed from the Software Developer chaps is, without any sort of doubt, git. I am not going to bother you with the endless benefits and features of git, as I assume you have a basic understanding of what it is and how to use it. What I am going to show you is a comparison between the “legacy” way of managing network config files (read: spreadsheets) and the one introduced in this demo (read: source control).
- Collaboration
Hands up who has network configuration saved in a shared spreadsheet! I still remember shouting at a colleague of mine, sitting on the opposite side of the office, asking him to close that damn file as I had to make some changes on tab number 23… The idea of a shared spreadsheet was to have a sort of collaboration between team members. In reality it was more a race about who managed to open the file first, annoying the others by forgetting to close it once the work was done. With git you can have members of different teams working on the same project at the same time.
- Commit SHA
In networking we have a thing called a TACACS (or AAA) server that, among other things, takes care of accounting logs for each network device (that is, who is doing what, and where). In git, a commit tracks the work done by a specific user (who changed what, and when). Every commit is fingerprinted with a universally unique SHA value. There is no way to get away from it: once you have a SHA on your commit, you are doomed.
- Diff
At some point in our lives, we have all used diff in a notepad, hoping to figure out the mess we made with the config we were working on. Git has a built-in diff functionality that lets you see what has changed between your current commit (called HEAD) and the previous one (or even n commits ago, if you like).
- Enforce workflows
Git(Lab, in this case) lets you assign roles per user or group, and you can enforce restrictions on each [user|group]. Everyone can edit a variable in a YAML file and maybe push it to the dev branch, but only the maintainer can merge the changes into the production branch and trigger the CI/CD to apply the new config (and only after a good code review based on the diffs).
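The diff behaviour described above (comparing two revisions of ntp.yml, say) can be mimicked with Python’s built-in difflib, just to show the idea; the revision labels here are illustrative:

```python
import difflib

# Two revisions of ntp.yml: before and after swapping an NTP server.
before = """ntp:
  servers:
    - 10.8.8.5
    - 10.8.4.5
""".splitlines(keepends=True)

after = """ntp:
  servers:
    - 10.8.8.5
    - 1.2.3.4
""".splitlines(keepends=True)

# unified_diff produces the same +/- style output as git diff.
diff = "".join(difflib.unified_diff(
    before, after,
    fromfile="ntp.yml (HEAD~1)", tofile="ntp.yml (HEAD)"))
print(diff)
```

Git does this per commit, for every file, with the history stored forever.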
There are many more things we could discuss about git. I will leave you to explore them, because my fingers are bleeding as I have been typing for the last 3 hours…
The glue: CI/CD.
The last bit, guys! Don’t give up. This is the last piece of our work. The most important one. The one that lets you click and forget, knowing that the result will be what you expect! Catch your breath and let’s go!
Another tool (or better said, workflow) stolen from the Software chaps is the Continuous Integration/Continuous Delivery pipeline. That is, the ability to deploy changes at any rate, at any time, knowing that the result will be what we expect.
A pipeline takes all the pieces we have talked about so far and puts them together in a nice, automated workflow. It is the pipeline that triggers all the actions required, evaluates the results and, at every step, decides whether to carry on with the work or halt on an error. If you implement a smart pipeline, you might have a 99% chance of going to sleep without any kind of concern, knowing that nothing is going to break (…I wish that were true.)
In an ideal pipeline, you commit your changes to a dev branch and automatically trigger a new pipeline job. The pipeline checks the file syntax to make sure there are no errors, then it runs your config against some dev environment (a Vagrant box for Arista, as in the demo, for example). From there, if everything looks good, the pipeline moves to staging, where the config is pushed to some physical hardware. Before giving the green light, the pipeline runs some checks just to make sure your changes did not break something: a BGP session, for example, or a VLAN removed that was not intended to be removed. If everything looks good, the change can later be reviewed, approved and pushed to production (…good luck)
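An ideal pipeline like the one just described could be sketched in a .gitlab-ci.yml along these lines. Purely illustrative: the stage names, playbook names, inventory paths and the lint tool are my assumptions, not taken from the demo repo.

```yaml
stages:
  - lint
  - dev
  - staging
  - production

lint:
  stage: lint
  script:
    - yamllint .                      # catch YAML syntax errors early

deploy_dev:
  stage: dev
  script:
    # hypothetical playbook name; runs against a Vagrant/vEOS environment
    - ansible-playbook -i inventories/dev site.yml --tags cfgrpl

deploy_staging:
  stage: staging
  script:
    - ansible-playbook -i inventories/staging site.yml --tags cfgrpl
    - ansible-playbook -i inventories/staging site.yml --tags tests

deploy_production:
  stage: production
  when: manual                        # a human reviews the diffs and approves
  script:
    - ansible-playbook -i inventories/production site.yml --tags cfgrpl
```

The `when: manual` gate on production is what turns “push and pray” into “review, approve, deploy”.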
This is the end. Thanks for finding the time to go through this article. I hope you will enjoy building a NaC as much as I did.
See you on the next post.