VLANs for a flannel network integrating Docker, CNI, and Kubernetes across multiple hosts

  • Update 2016-08-05: added Kubernetes CNI + flannel integration to deploy an RC using the newly created VLAN; added more troubleshooting for VLAN IP management.

Multiple VLANs for a Docker cluster

Abstract

flannel and CNI make it possible to run multiple VXLANs over multiple interfaces, even sharing the same bridge. This greatly simplifies the VXLAN configuration of a Kubernetes cluster, or of any multi-host Docker cluster.

Set up multiple VLANs with flannel

1. Create the VLAN settings in etcd, one per flannel network

etcdctl set /coreos.com/network/vlan001/config '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan", "VNI": 1 } }'
etcdctl set /coreos.com/network/vlan002/config '{ "Network": "10.2.0.0/16", "Backend": { "Type": "vxlan", "VNI": 2 } }'
etcdctl set /coreos.com/network/vlan003/config '{ "Network": "10.3.0.0/16", "Backend": { "Type": "vxlan", "VNI": 3 } }'

2. Configure master node

/opt/bin/flanneld --etcd-endpoints=http://127.0.0.1:4001 --ip-masq --iface=10.160.61.145 --networks=vlan001,vlan002,vlan003

3. Configure minion node

/opt/bin/flanneld --etcd-endpoints=http://10.160.61.145:4001 --ip-masq --iface=10.122.158.23 --networks=vlan001,vlan002,vlan003

# /opt/bin/flanneld --etcd-endpoints=http://10.160.61.145:4001 --ip-masq --iface=10.122.158.23 --networks=vlan001,vlan002,vlan003
I0722 09:45:05.006103 07773 main.go:275] Installing signal handlers
I0722 09:45:05.006800 07773 main.go:188] Using 10.122.158.23 as external interface
I0722 09:45:05.006838 07773 main.go:189] Using 10.122.158.23 as external endpoint
I0722 09:45:05.015478 07773 etcd.go:204] Picking subnet in range 10.3.1.0 ... 10.3.255.0
I0722 09:45:05.016021 07773 etcd.go:204] Picking subnet in range 10.1.1.0 ... 10.1.255.0
I0722 09:45:05.016704 07773 etcd.go:204] Picking subnet in range 10.2.1.0 ... 10.2.255.0
I0722 09:45:05.017581 07773 etcd.go:84] Subnet lease acquired: 10.3.46.0/24
I0722 09:45:05.019527 07773 etcd.go:84] Subnet lease acquired: 10.1.71.0/24
I0722 09:45:05.021254 07773 etcd.go:84] Subnet lease acquired: 10.2.20.0/24
I0722 09:45:05.027355 07773 ipmasq.go:50] Adding iptables rule: FLANNEL -d 10.3.0.0/16 -j ACCEPT
I0722 09:45:05.035341 07773 ipmasq.go:50] Adding iptables rule: FLANNEL -d 10.2.0.0/16 -j ACCEPT
I0722 09:45:05.037472 07773 ipmasq.go:50] Adding iptables rule: FLANNEL -d 10.1.0.0/16 -j ACCEPT
I0722 09:45:05.046376 07773 ipmasq.go:50] Adding iptables rule: FLANNEL ! -d 224.0.0.0/4 -j MASQUERADE
I0722 09:45:05.050128 07773 ipmasq.go:50] Adding iptables rule: FLANNEL ! -d 224.0.0.0/4 -j MASQUERADE
I0722 09:45:05.053277 07773 ipmasq.go:50] Adding iptables rule: FLANNEL ! -d 224.0.0.0/4 -j MASQUERADE
I0722 09:45:05.064919 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING -s 10.3.0.0/16 -j FLANNEL
I0722 09:45:05.066320 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING -s 10.1.0.0/16 -j FLANNEL
I0722 09:45:05.067803 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING -s 10.2.0.0/16 -j FLANNEL
I0722 09:45:05.079573 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING ! -s 10.3.0.0/16 -d 10.3.0.0/16 -j MASQUERADE
I0722 09:45:05.082317 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING ! -s 10.1.0.0/16 -d 10.1.0.0/16 -j MASQUERADE
I0722 09:45:05.086633 07773 ipmasq.go:50] Adding iptables rule: POSTROUTING ! -s 10.2.0.0/16 -d 10.2.0.0/16 -j MASQUERADE
I0722 09:45:05.097434 07773 vxlan.go:153] Watching for L3 misses
I0722 09:45:05.097514 07773 vxlan.go:159] Watching for new subnet leases
I0722 09:45:05.099034 07773 vxlan.go:153] Watching for L3 misses
I0722 09:45:05.099082 07773 vxlan.go:159] Watching for new subnet leases
I0722 09:45:05.099479 07773 vxlan.go:153] Watching for L3 misses
I0722 09:45:05.099515 07773 vxlan.go:159] Watching for new subnet leases
I0722 09:45:05.100738 07773 vxlan.go:273] Handling initial subnet events
I0722 09:45:05.100774 07773 device.go:159] calling GetL2List() dev.link.Index: 6
I0722 09:45:05.100960 07773 device.go:164] calling NeighAdd: 10.160.61.145, 32:f7:77:37:0b:2c
I0722 09:45:05.103145 07773 vxlan.go:273] Handling initial subnet events
I0722 09:45:05.103174 07773 device.go:159] calling GetL2List() dev.link.Index: 5
I0722 09:45:05.103408 07773 vxlan.go:280] fdb already populated with: 10.160.61.232 56:74:a6:20:87:9e
I0722 09:45:05.103459 07773 vxlan.go:280] fdb already populated with: 10.160.61.34 ea:89:b7:02:57:2b
I0722 09:45:05.103480 07773 vxlan.go:280] fdb already populated with: 10.160.61.232 22:c2:c0:9b:c9:ac
I0722 09:45:05.103499 07773 vxlan.go:280] fdb already populated with: 10.160.61.82 02:50:4d:1b:e7:ae
I0722 09:45:05.103517 07773 vxlan.go:280] fdb already populated with: 10.70.189.198 7a:7c:52:20:19:ca
I0722 09:45:05.103558 07773 vxlan.go:280] fdb already populated with: 10.160.61.145 46:92:f9:29:2e:f3
I0722 09:45:05.103602 07773 vxlan.go:280] fdb already populated with: 10.160.61.34 4a:1f:ca:23:b7:43
I0722 09:45:05.103641 07773 vxlan.go:280] fdb already populated with: 10.160.61.58 7a:eb:8d:f0:bb:98
I0722 09:45:05.103686 07773 device.go:176] calling NeighDel: 10.160.61.232, 56:74:a6:20:87:9e
I0722 09:45:05.103805 07773 device.go:176] calling NeighDel: 10.160.61.34, ea:89:b7:02:57:2b
I0722 09:45:05.103878 07773 device.go:176] calling NeighDel: 10.160.61.232, 22:c2:c0:9b:c9:ac
I0722 09:45:05.103980 07773 device.go:176] calling NeighDel: 10.160.61.82, 02:50:4d:1b:e7:ae
I0722 09:45:05.104045 07773 device.go:176] calling NeighDel: 10.70.189.198, 7a:7c:52:20:19:ca
I0722 09:45:05.104119 07773 device.go:176] calling NeighDel: 10.160.61.145, 46:92:f9:29:2e:f3
I0722 09:45:05.104182 07773 device.go:176] calling NeighDel: 10.160.61.34, 4a:1f:ca:23:b7:43
I0722 09:45:05.104251 07773 device.go:176] calling NeighDel: 10.160.61.58, 7a:eb:8d:f0:bb:98
I0722 09:45:05.104313 07773 device.go:164] calling NeighAdd: 10.160.61.145, 02:5c:40:4b:dd:89
I0722 09:45:05.106499 07773 vxlan.go:273] Handling initial subnet events
I0722 09:45:05.106534 07773 device.go:159] calling GetL2List() dev.link.Index: 7
I0722 09:45:05.106643 07773 device.go:164] calling NeighAdd: 10.160.61.145, 5e:c2:7e:2b:70:d4
I0722 09:41:58.244501 08996 vxlan.go:280] fdb already populated with: 10.160.61.58 d6:d3:e2:cb:91:0e
I0722 09:41:58.244530 08996 device.go:176] calling NeighDel: 10.160.61.232, 56:74:a6:20:87:9e
I0722 09:41:58.244638 08996 device.go:176] calling NeighDel: 10.160.61.34, ea:89:b7:02:57:2b
I0722 09:41:58.244731 08996 device.go:176] calling NeighDel: 10.160.61.82, 96:c6:d3:f1:ff:f0
I0722 09:41:58.244812 08996 device.go:176] calling NeighDel: 10.160.61.232, 22:c2:c0:9b:c9:ac
I0722 09:41:58.244885 08996 device.go:176] calling NeighDel: 10.70.189.198, 7a:7c:52:20:19:ca
I0722 09:41:58.244978 08996 device.go:176] calling NeighDel: 10.122.158.23, e2:ef:2a:db:3a:04
I0722 09:41:58.245059 08996 device.go:176] calling NeighDel: 10.160.61.34, 4a:1f:ca:23:b7:43
I0722 09:41:58.245146 08996 device.go:176] calling NeighDel: 10.160.61.58, d6:d3:e2:cb:91:0e
I0722 09:41:58.247117 08996 vxlan.go:153] Watching for L3 misses
I0722 09:41:58.247163 08996 vxlan.go:159] Watching for new subnet leases
I0722 09:41:58.247952 08996 vxlan.go:273] Handling initial subnet events
I0722 09:41:58.247994 08996 device.go:159] calling GetL2List() dev.link.Index: 7
I0722 09:41:58.248601 08996 vxlan.go:273] Handling initial subnet events
I0722 09:41:58.248642 08996 device.go:159] calling GetL2List() dev.link.Index: 6

4. Choose a different VLAN for the Docker daemon

docker -d --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
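The FLANNEL_SUBNET and FLANNEL_MTU variables above come from the per-network subnet file that flanneld writes under /run/flannel/networks/. A minimal sketch, using illustrative file contents rather than values from a live host:

```shell
# Illustrative copy of the env file flanneld writes per network
# (on a real host: /run/flannel/networks/vlan001.env)
cat > /tmp/vlan001.env <<'EOF'
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.71.1/24
FLANNEL_MTU=1450
EOF

# Source it, then launch the daemon bound to that VLAN's subnet;
# echoed here as a dry run instead of actually starting Docker
. /tmp/vlan001.env
echo docker -d --bip="${FLANNEL_SUBNET}" --mtu="${FLANNEL_MTU}"
```

Each VLAN has its own env file (vlan002.env, vlan003.env, ...), so pointing a Docker daemon at a different file puts its containers on a different VLAN.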

5. Go further: share the same bridge across multiple interfaces

Add two networks to the same bridge by using the same VNI (both configs below omit the VNI, so they default to VNI 1 and share flannel.1).

etcdctl set /coreos.com/network/default/config '{ "Network": "172.31.0.0/16", "Backend": { "Type": "vxlan" } }'
etcdctl set /coreos.com/network/vlan001/config '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan" } }'
ip addr show flannel.1
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    inet 172.31.54.0/16 scope global flannel.1
    inet 10.1.78.0/16 scope global flannel.1

Conclusion

It is possible to use the same flannel bridge to enable multiple VLANs over multiple interfaces.

Configure CNI on minion node

Let's look at CNI in more detail.

Build the CNI plugins for flannel

git clone git@github.com:containernetworking/cni.git
cd cni
./build   # build the plugin binaries
./test    # run the test suite

Install CNI and test it

scp cni.tar 10.160.61.232:~/
ssh 10.160.61.232 tar -C /opt -xvf cni.tar
cat > /etc/cni/net.d/10-vlan001.conf <<EOF
{
    "name": "vlan0001",
    "type": "flannel",
    "delegate": {
        "bridge": "br-tun",
        "mtu": 1400
    },
    "subnetFile": "/run/flannel/networks/vlan001.env"
}
EOF
apt-get install jq
cd /opt/cni
CNI_PATH=`pwd`/bin
cd scripts
sudo CNI_PATH=$CNI_PATH ./priv-net-run.sh ifconfig
sudo CNI_PATH=$CNI_PATH ./docker-run.sh --rm busybox:latest ifconfig

Set up VLAN001 on all hosts

1. Set up the CNI config for flannel

cat > /etc/cni/net.d/10-vlan001.conf <<EOF
{
    "name": "vlan0001",
    "type": "flannel",
    "delegate": {
        "bridge": "br-tun",
        "mtu": 1400
    },
    "subnetFile": "/run/flannel/networks/vlan001.env"
}
EOF

2. Modify docker-run1.sh (a copy of docker-run.sh) to apply a different network configuration to newly created Docker containers.

NETCONFPATH=/etc/cni/net.d   # use the default config location for the CNI network
./exec-plugins.sh add $contid $netnspath
#trap cleanup EXIT           # comment out cleanup to keep the container running

3. Create a new container using VLAN001 on both Host1 and Host2

export CNI_PATH=/opt/cni/bin
./docker-run1.sh --rm busybox:latest ifconfig

4. Result

1. ContainerA on Host1 using vlan001
eth1      Link encap:Ethernet  HWaddr 32:5B:2A:AF:84:17
          inet addr:10.1.71.11  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::305b:2aff:feaf:8417/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1400  Metric:1
          RX packets:50 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4316 (4.2 KiB)  TX bytes:4316 (4.2 KiB)

10.1.71.0       *               255.255.255.0   U     0      0        0 br-tun
2. ContainerB on Host2 using vlan001
eth1      Link encap:Ethernet  HWaddr 82:FB:85:C3:60:A2
          inet addr:10.1.78.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::80fb:85ff:fec3:60a2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1400  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4832 (4.7 KiB)  TX bytes:4274 (4.1 KiB)

10.1.78.0       *               255.255.255.0   U     0      0        0 br-tun

Set up VLAN002 on all hosts

1. Set up the CNI config for flannel

mkdir -p /etc/cni/net2.d/
cat > /etc/cni/net2.d/10-vlan002.conf <<EOF
{
    "name": "vlan0002",
    "type": "flannel",
    "delegate": {
        "bridge": "br-tun1",
        "mtu": 1400
    },
    "subnetFile": "/run/flannel/networks/vlan002.env"
}
EOF

2. Modify docker-run2.sh (another copy of docker-run.sh) to apply a different network configuration to newly created Docker containers.

NETCONFPATH=/etc/cni/net2.d
./exec-plugins.sh add $contid $netnspath

3. Create a new container using VLAN002 on both Host1 and Host2

export CNI_PATH=/opt/cni/bin
./docker-run2.sh --rm busybox:latest ifconfig

4. Result: VLAN002 setup for ContainerC and ContainerD on different hosts

1. ContainerC on Host1 using vlan002

eth0      Link encap:Ethernet  HWaddr FE:07:71:04:74:CD
          inet addr:10.2.20.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::fc07:71ff:fe04:74cd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1400  Metric:1
          RX packets:11827 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11837 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1150970 (1.0 MiB)  TX bytes:1151210 (1.0 MiB)

10.2.20.0       *               255.255.255.0   U     0      0        0 br-tun1

2. ContainerD on Host2 using vlan002

eth0      Link encap:Ethernet  HWaddr 0E:10:EB:05:36:C4
          inet addr:10.2.88.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::c10:ebff:fe05:36c4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1400  Metric:1
          RX packets:9786 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9691 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:949748 (927.4 KiB)  TX bytes:933454 (911.5 KiB)

10.2.88.0       *               255.255.255.0   U     0      0        0 br-tun1

Connection isolation between containers

1. ContainerB can reach ContainerA, since they are both within VLAN001 (10.1.0.0/16)

#/opt/cni/scripts# docker exec dfc448f7294e ping 10.1.78.2
PING 10.1.78.2 (10.1.78.2): 56 data bytes
64 bytes from 10.1.78.2: seq=0 ttl=62 time=4.039 ms
64 bytes from 10.1.78.2: seq=1 ttl=62 time=1.449 ms
64 bytes from 10.1.78.2: seq=2 ttl=62 time=1.489 ms

2. ContainerD can reach ContainerC, since they are both located in VLAN002 (10.2.0.0/16)

#/opt/cni/scripts# docker exec b063653e3b9a ping 10.2.20.5
PING 10.2.20.5 (10.2.20.5): 56 data bytes
64 bytes from 10.2.20.5: seq=0 ttl=62 time=1.636 ms
64 bytes from 10.2.20.5: seq=1 ttl=62 time=1.555 ms
64 bytes from 10.2.20.5: seq=2 ttl=62 time=1.566 ms
64 bytes from 10.2.20.5: seq=3 ttl=62 time=1.540 ms

3. ContainerB does NOT reach ContainerD, even though they are placed on the same host, because they are in different VLANs

docker exec f59d50f5196d ping 10.1.71.11
PING 10.1.71.11 (10.1.71.11): 56 data bytes
ping: sendto: Network is unreachable

4. Likewise, ContainerD does NOT reach ContainerA, since they are not in the same flannel VLAN.

#/opt/cni/scripts# docker exec b063653e3b9a ping 10.1.71.11
ping: sendto: Network is unreachable
PING 10.1.71.11 (10.1.71.11): 56 data bytes

Configure kubelet to enable CNI+Flannel

The CNI plugin is selected by passing kubelet the --network-plugin=cni command-line option. Kubelet reads the first CNI configuration file from --network-plugin-dir and uses the CNI configuration from that file to set up each pod's network. The CNI configuration file must match the CNI specification, and any required CNI plugins referenced by the configuration must be present in /opt/cni/bin.

Configuration file of kubelet:

- ubuntu: /etc/default/kubelet
- centos: /etc/sysconfig/kubelet
  1. Append the following parameters to "KUBELET_OPTS="

    --network-plugin=cni --network-plugin-dir=/etc/cni/net2.d

  2. Restart kubelet

    service kubelet restart

  3. ADD/DEL an interface to a container

    ./exec-plugins.sh add <container_id> <netns_path, e.g. /proc/PID/ns/net>

    NETCONFPATH=/etc/cni/net3.d ./exec-plugins.sh add 35b6274bbaeb /proc/32118/ns/net
    NETCONFPATH=/etc/cni/net3.d ./exec-plugins.sh del 35b6274bbaeb /proc/32118/ns/net
  4. Run a deployment to test the CNI VXLAN on 10.2.0.0/16; all containers running inside the same VLAN can reach each other

    kubectl run nginx-eric --image=nginx --replicas=6

root@cdsdev-sjc03-hweicdl02-api-01:~# kubectl get pods -o wide|grep eric|sort -k 6
nginx-eric2-4157641887-bwja1 1/1 Running 0 1h 10.2.31.232 10.160.61.232
nginx-eric2-4157641887-lfpe3 1/1 Running 0 1h 10.2.31.233 10.160.61.232
nginx-eric-2045285435-jecfw 1/1 Running 0 4m 10.2.31.243 10.160.61.232
nginx-eric-2045285435-u9xv6 1/1 Running 0 4m 10.2.69.4 10.122.158.23
nginx-eric-2045285435-k96jh 1/1 Running 0 4m 10.2.69.5 10.122.158.23
nginx-eric-2045285435-9oyik 1/1 Running 0 4m 172.31.22.10 10.160.61.34
nginx-eric2-4157641887-9bgqi 1/1 Running 0 1h 172.31.22.11 10.160.61.34
nginx-eric-2045285435-k3cym 1/1 Running 0 4m 172.31.68.10 10.160.61.58
nginx-eric2-4157641887-fqfpb 1/1 Running 0 1h 172.31.68.11 10.160.61.58
nginx-eric2-4157641887-9m9ql 1/1 Running 0 26m 172.31.68.13 10.160.61.58
nginx-eric-2045285435-2c9u6 1/1 Running 0 4m 172.31.69.8 10.70.189.198
nginx-eric2-4157641887-uzm55 1/1 Running 0 1h 172.31.69.9 10.70.189.198

Known limitation

  • [x] CNI plugin argument handling was an issue found on kubelet 1.2.4; make sure the kubelet version is >= 1.3.
  • [x] There is no configurable way to enable multiple VLANs with kubelet's default CNI support (master as of 2016-07-27 does not support it). Solution: develop a CNI exec plugin to add and delete interfaces, and extend the YAML spec accordingly.
  • [x] nsenter is missing on Ubuntu 14.04, while it ships by default (in util-linux) on CentOS 7, so CentOS 7+ is strongly recommended over Ubuntu as the host OS.
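Until kubelet supports multiple networks natively, the workaround for the second limitation can be wrapped in a small helper around exec-plugins.sh. This is only a sketch: attach_vlan is a hypothetical name, and the stub exec-plugins.sh below merely stands in for the real script under /opt/cni/scripts so the example runs standalone.

```shell
# Stand-in for /opt/cni/scripts/exec-plugins.sh, so this sketch is runnable here;
# the real script invokes the CNI plugins listed in $NETCONFPATH.
cat > ./exec-plugins.sh <<'EOF'
#!/bin/sh
echo "exec-plugins: action=$1 container=$2 netns=$3 confdir=$NETCONFPATH"
EOF
chmod +x ./exec-plugins.sh

attach_vlan() {
    # attach_vlan <confdir> <add|del> <container_id> <pid>
    confdir=$1; action=$2; contid=$3; pid=$4
    NETCONFPATH="$confdir" ./exec-plugins.sh "$action" "$contid" "/proc/$pid/ns/net"
}

# Give a running container a second interface on VLAN002, then remove it
attach_vlan /etc/cni/net2.d add 35b6274bbaeb 32118
attach_vlan /etc/cni/net2.d del 35b6274bbaeb 32118
```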

Troubleshooting

  • [x] vlan0002 : error executing ADD: no IP addresses available in network: vlan0002

Solution: check last_reserved_ip, verify whether all IP addresses in the subnet are occupied, and clean up the stale IP reservations together with last_reserved_ip.

cat /var/lib/cni/networks/vlan0002/last_reserved_ip
rm -rf /var/lib/cni/networks/vlan0002
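To see why the pool ran dry before wiping it, the host-local IPAM reservation files can be counted directly; each file under the network directory is one allocated address. A sketch against fabricated state (on a real host the directory is /var/lib/cni/networks/vlan0002):

```shell
# Fabricated IPAM state for illustration only
mkdir -p /tmp/networks/vlan0002
for ip in 10.2.20.2 10.2.20.3 10.2.20.4; do
    touch /tmp/networks/vlan0002/$ip          # one file per reserved address
done
echo 10.2.20.4 > /tmp/networks/vlan0002/last_reserved_ip

# Count reserved addresses; stale files left behind by dead containers still
# count against the pool, which is what produces "no IP addresses available"
ls /tmp/networks/vlan0002 | grep -c '^10\.'   # prints 3

# Reclaim the whole pool (only safe once the affected containers are gone)
rm -rf /tmp/networks/vlan0002
```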