And welcome to my personal site.
This resource covers my professional experience and interests: open source software, software development with C++, Tungsten Fabric SDN, and computational analysis of fluid and gas motion.
You can find news, lists of resources and short notes about open source software in the blog below. If you have any ideas to share or would like to collaborate, please feel free to contact me.
Tungsten Fabric, as the predecessor of OpenSDN, was widely known for its support of a dual IP stack. Given the state of IP addressing in the networking world today, it is vital for such software to support both formats: on the one hand, the IPv4 address space was exhausted several years ago and IPv6 is the only solution for modern industrial applications; on the other hand, many companies still use IPv4 addresses in private and public networks and will continue to require them for some time to come.
The dual IP stack feature distinguishes OpenSDN/Tungsten Fabric/OpenContrail from other SDN platforms because it provides support for modern network applications while preserving backward compatibility with older IPv4 services.
The transition to IPv6 protocol support in Tungsten Fabric was never 100% complete, and some valuable network functions were limited to 4-byte IP addresses. Some of these restrictions were lifted in the latest R24.1 release of OpenSDN:
With this release OpenSDN starts its own story as a separate project. It is a successor of Tungsten Fabric, OpenContrail and Contrail, but it has its own long-awaited features, such as:
The official release page can be found here. Other links:
The Network Address Translation (NAT) technique is widely used in IPv4 networks. This approach allows hosts in private networks to be connected to the Internet, saves IPv4 addresses (which are steadily becoming a more and more expensive resource) and adds an additional layer of network protection and security for customers.
With the advent of IPv6, the main reason for NAT (exhaustion of network addresses) has become obsolete, and this feature is not supported by many forwarding and routing devices. However, NAT for IPv6 (NAT66) might still be used in networks for other reasons related to cloud operation. Today (October 2, 2023), Tungsten Fabric vRouter Forwarder is no exception: while it can perform all types of NAT (source/destination network address translation, source/destination port address translation (PAT)) for the IPv4 standard, there are no similar tools for the IPv6 standard.
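As a purely conceptual illustration (hypothetical types, not vRouter code), the following sketch shows the kind of mapping a source NAT/PAT device maintains: the private source endpoint is rewritten to a shared public address, and with PAT the source port is rewritten as well so that reverse traffic can be demultiplexed.

// Conceptual sketch only: a source NAT/PAT mapping table (hypothetical types).
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <tuple>
#include <utility>

struct FlowKey {                       // private source endpoint + destination
    std::string src_ip; uint16_t src_port;
    std::string dst_ip; uint16_t dst_port;
    bool operator<(const FlowKey &o) const {
        return std::tie(src_ip, src_port, dst_ip, dst_port) <
               std::tie(o.src_ip, o.src_port, o.dst_ip, o.dst_port);
    }
};

int main() {
    // SNAT: many private hosts share one public address; PAT additionally
    // rewrites the source port so the reverse mapping stays unambiguous.
    std::map<FlowKey, std::pair<std::string, uint16_t> > nat_table;
    nat_table[{"10.0.0.5", 40001, "203.0.113.7", 443}] = {"198.51.100.1", 50001};
    nat_table[{"10.0.0.6", 40001, "203.0.113.7", 443}] = {"198.51.100.1", 50002};
    for (const auto &e : nat_table)
        std::cout << e.first.src_ip << ":" << e.first.src_port << " -> "
                  << e.second.first << ":" << e.second.second << std::endl;
    return 0;
}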
The vRouter code modifications described here enable NAT66 functionality in Tungsten Fabric and also show how to introduce changes into the vrouter.ko module.
The implementation of NAT66 for Tungsten Fabric requires modification of its control plane and data plane components: vRouter Agent and vRouter Forwarder. Before we proceed to the description of the needed code changes, it is necessary to give a brief description of how to load the modified vRouter Forwarder source code. Afterwards, ...
The vRouter Forwarder kernel module is compiled and loaded into memory using a special Docker container; therefore, we should understand how to inject the modified source code into this container and how to run it. This procedure involves the following five steps.
Prepare the original sources as a tarball. During this step we should create a compressed tarball with the .tgz extension that contains the following directories (with corresponding files):
Prepare a Dockerfile with the necessary changes. The new Dockerfile should contain the list of all files that are to be added to the new Docker image. For instance, in our example we should load the previous default Docker image instructions and then update the files vr_flow.h, vr_flow.c and vr_proto_ip6.c:
FROM the_name_of_original_image
ADD ./the_name_of_tarball_with_sources.tgz /vrouter_src/
ADD ./vr_flow.h /vrouter_src/include/vr_flow.h
ADD ./vr_flow.c /vrouter_src/dp-core/vr_flow.c
ADD ./vr_proto_ip6.c /vrouter_src/dp-core/vr_proto_ip6.c
Stop the previous services. Before loading the modified image into memory, it is necessary to remove the old one: devices should be removed, containers and modules should be stopped and removed. A script to do this work might look like this:
#!/usr/bin/bash
ifconfig vhost0 down
I=0
N=3 # if pkt devices are numbered from 0 to 3
for I in `seq 0 $N`
do
ifconfig pkt$I down
done
docker stop vrouter_vrouter-agent_1 # stop vRouter Agent container
docker rm -f vrouter_vrouter-agent_1 # rm vRouter Agent container
docker stop vrouter_vrouter-kernel-init_1 # stop vRouter Forwarder Init
docker rm -f vrouter_vrouter-kernel-init_1 # rm vRouter Forwarder Init
rmmod vrouter
kernels=`ls -d /lib/modules/[0-9]*` # remove all vrouter.ko for all kernels
for k in $kernels
do
rm -rf $k/updates/dkms/vrouter.ko
done
Load/create the new Docker image using common Docker commands:
# remove previous version of the container
docker image rm --force kmod_init:R2011-latest
#
docker image build -t kmod_init:R2011-latest .
docker run --mount type=bind,src=/usr/src,dst=/usr/src --mount type=bind,src=/lib/modules,dst=/lib/modules kmod_init:R2011-latest > run_log 2>&1
Load the module into memory. For this purpose you can use, for example, docker-compose:
docker-compose -f ./docker-compose.yaml up -d
To enable NAT for IPv6 we will first modify the following PktFlowInfo class (pkt_flow_info.cc) methods:
The functions PktFlowInfo::FloatingIpDNat and PktFlowInfo::FloatingIpSNat are responsible for setting up Source NAT (SNAT) and Destination NAT (DNAT) forwarding on vRouter Forwarder. These functions are executed for each flow entry (a FlowEntry object) and program the vRouter Forwarder settings for that flow entry.
In the method PktFlowInfo::FloatingIpDNat we replace
if (pkt->ip_daddr.to_v4() == vm_port->primary_ip_addr()) {
return;
}
with
if (pkt->ip_daddr.is_v4()) {
if (pkt->ip_daddr.to_v4() == vm_port->primary_ip_addr()) {
return;
}
}
if (pkt->ip_daddr.is_v6()) {
if (pkt->ip_daddr.to_v6() == vm_port->primary_ip6_addr()) {
return;
}
}
which means that we check whether the packet's destination address coincides with the virtual machine's primary address in both the IPv4 and IPv6 formats.
Then we replace
if (pkt->ip_daddr.to_v4() != it->floating_ip_) {
continue;
}
with
if (pkt->ip_daddr != it->floating_ip_) {
continue;
}
And right after if (underlay_flow) {
we add
if (pkt->ip_daddr.is_v6()) {
return;
}
because Tungsten Fabric doesn't support IPv6 for the underlay for the time being.
In the function PktFlowInfo::FloatingIpSNat we remove the following code to enable SNAT:
if (pkt->family == Address::INET6) {
return;
//TODO: V6 FIP
}
Then we replace
if (it->fixed_ip_ != Ip4Address(0) && (pkt->ip_saddr != it->fixed_ip_)) {
with
if (it->fixed_ip_ != IpAddress() && (pkt->ip_saddr != it->fixed_ip_)) {
to make the comparison in the if condition agnostic of the IP version.
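To my knowledge, IpAddress and Ip4Address in the Agent code are typedefs for boost::asio::ip::address and boost::asio::ip::address_v4; under that assumption, this small standalone sketch shows why a default-constructed IpAddress works as a family-agnostic "not set" value in the comparison above:

// Minimal sketch (assumption: IpAddress/Ip4Address are the boost::asio::ip typedefs).
#include <boost/asio/ip/address.hpp>
#include <iostream>

typedef boost::asio::ip::address IpAddress;
typedef boost::asio::ip::address_v4 Ip4Address;

int main() {
    IpAddress unset;                                            // default: IPv4 0.0.0.0
    IpAddress fixed_v4 = IpAddress::from_string("10.1.1.21");   // an IPv4 fixed IP
    IpAddress fixed_v6 = IpAddress::from_string("aaaa::11");    // an IPv6 fixed IP

    std::cout << (unset == IpAddress(Ip4Address(0))) << std::endl; // 1: same sentinel
    std::cout << (fixed_v4 != unset) << std::endl;                 // 1: configured
    std::cout << (fixed_v6 != unset) << std::endl;                 // 1: configured, and
                                  // no to_v4()/to_v6() conversion is needed for the test
    return 0;
}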
There is an issue regarding NAT (NAT66) + VxLAN. Unfortunately, NAT and NAT66 don't work well with the VRF NextHop; therefore, we can go one of two ways:
Both solutions are described in the corresponding note (NAT + VxLAN in Tungsten Fabric). Solution number 2 (with the adjustment of the source code) seems simpler, because adjusting ACLs for cases with UDP/TCP might be cumbersome.
Finally, we should update the PktFlowInfo::UpdateFipStatsInfo function to enable statistics collection for IPv6 floating IPs. However, these changes don't affect the forwarding functionality of TF and are not considered here.
Next, having modified vRouter Agent to program vRouter Forwarder, we should teach the latter to work with IPv6 packets destined for NAT processing.
NAT processing of IP packets in vRouter Forwarder is attached to flow forwarding. Each time a flow is identified for a given combination of IP addresses, ports and protocol type, vRouter Forwarder applies the address and port translation according to the settings specified by vRouter Agent. A function that performs the translation for IPv4 is called from vr_flow_nat. The whole process of flow identification and configuration is triggered by vr_do_flow_lookup, which is invoked:
The full call stack from its start in vr_flow_forward to NAT processing is as follows:
vr_flow_forward
| +--> vr_inet6_flow_lookup --+
v | |
vr_do_flow_lookup --+ +->- vr_flow_lookup
| | |
+--> vr_inet_flow_lookup ---+ |
|
v
vr_flow_nat <-- vr_flow_action_default <-- vr_flow_action
|
v
vr_inet_flow_nat
To conform to the coding practices of the TF vRouter Forwarder, the new procedure for NAT processing of IPv6 packets is placed into the function vr_inet6_flow_nat, which is called from vr_flow_nat.
The new function has been created on the basis of the original IPv4 NAT processing function (vr_inet_flow_nat), taking into account the following peculiarities of the IPv6 protocol:
Finally, in case we have TCP or UDP packets as the payload of our IPv6 packet, we should update its checksum after the address translation. For this operation a function vr_ip6_update_csum (by analogy with the original vr_ip_update_csum) has been created. Listings of all code changes are shown below.
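Before the listings, here is a minimal standalone sketch (not vRouter code; the helper names are hypothetical) of the incremental one's-complement checksum update idea that the listings rely on via vr_incremental_diff and vr_ip6_update_csum: instead of recomputing the whole transport checksum, the difference between the old and new address/port words is accumulated and then folded into the existing checksum (cf. RFC 1624).

// Standalone illustration of incremental one's-complement checksum update.
#include <cstdint>
#include <cstdio>

// Accumulate the contribution of replacing 'oldw' with 'neww' (hypothetical helper,
// mirroring what a vr_incremental_diff-style routine is expected to do).
static void incremental_diff(uint32_t oldw, uint32_t neww, uint32_t *inc) {
    uint64_t sum = *inc;
    sum += (~oldw & 0xffffffffu);      // ~old + new, carries folded below
    sum += neww;
    while (sum >> 32)
        sum = (sum & 0xffffffffu) + (sum >> 32);
    *inc = static_cast<uint32_t>(sum);
}

// Fold the accumulated difference into an existing 16-bit checksum field.
static uint16_t apply_diff(uint16_t csum, uint32_t inc) {
    uint64_t sum = static_cast<uint32_t>(~csum & 0xffff);   // un-complement first
    sum += inc;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);                  // end-around carry
    return static_cast<uint16_t>(~sum & 0xffff);
}

int main() {
    // Toy example: one 32-bit word of the source address changes.
    uint32_t inc = 0;
    incremental_diff(0x0a010115u /* old word */, 0xc6336401u /* new word */, &inc);
    uint16_t old_csum = 0xb1e6;        // some pre-existing transport checksum
    std::printf("updated csum = 0x%04x\n", apply_diff(old_csum, inc));
    return 0;
}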
Changes to the vr_flow_nat function:
if (pkt->vp_type == VP_TYPE_IP6)
    return vr_inet6_flow_nat(fe, pkt, fmd);
The vr_inet6_flow_nat function code:
flow_result_t
vr_inet6_flow_nat(struct vr_flow_entry *fe, struct vr_packet *pkt,
        struct vr_forwarding_md *fmd)
{
uint32_t ip_inc = 0, port_inc = 0;
uint16_t *t_sport = NULL, *t_dport = NULL;
const uint8_t dw_p_ip6 = VR_IP6_ADDRESS_LEN / sizeof(uint32_t);
uint8_t i = 0;
struct vrouter *router = pkt->vp_if->vif_router;
struct vr_flow_entry *rfe = NULL;
struct vr_ip6 *ip6 = NULL, *icmp_pl_ip6 = NULL;
struct vr_icmp *icmph = NULL;
if (fe->fe_rflow < 0)
goto drop;
rfe = vr_flow_get_entry(router, fe->fe_rflow);
if (!rfe)
goto drop;
ip6 = (struct vr_ip6 *)pkt_network_header(pkt);
if (!ip6)
goto drop;
if (ip6->ip6_nxt == VR_IP_PROTO_ICMP6) {
icmph = (struct vr_icmp *)((char *)ip6 + sizeof(struct vr_ip6));
if (vr_icmp_error(icmph)) {
icmp_pl_ip6 = (struct vr_ip6 *)(icmph + 1);
if (fe->fe_flags & VR_FLOW_FLAG_SNAT) {
memcpy(icmp_pl_ip6->ip6_dst, rfe->fe_key.flow6_dip,
VR_IP6_ADDRESS_LEN);
}
if (fe->fe_flags & VR_FLOW_FLAG_DNAT) {
memcpy(icmp_pl_ip6->ip6_src, rfe->fe_key.flow6_sip,
VR_IP6_ADDRESS_LEN);
}
t_sport = (uint16_t *)((uint8_t *)icmp_pl_ip6 +
sizeof(struct vr_ip6));
t_dport = t_sport + 1;
if (fe->fe_flags & VR_FLOW_FLAG_SPAT) {
*t_dport = rfe->fe_key.flow6_dport;
}
if (fe->fe_flags & VR_FLOW_FLAG_DPAT) {
*t_sport = rfe->fe_key.flow6_sport;
}
}
}
if ((fe->fe_flags & VR_FLOW_FLAG_SNAT) &&
(memcmp(ip6->ip6_src, fe->fe_key.flow6_sip, VR_IP6_ADDRESS_LEN) == 0)) {
for (i = 0; i < VR_IP6_ADDRESS_LEN; i += dw_p_ip6) {
vr_incremental_diff( *((uint32_t*)(ip6->ip6_src + i)),
*((uint32_t*)(rfe->fe_key.flow6_dip + i)), &ip_inc);
}
memcpy(ip6->ip6_src, rfe->fe_key.flow6_dip,
VR_IP6_ADDRESS_LEN);
}
if (fe->fe_flags & VR_FLOW_FLAG_DNAT) {
for (i = 0; i < VR_IP6_ADDRESS_LEN; i += dw_p_ip6) {
vr_incremental_diff( *((uint32_t*)(ip6->ip6_dst + i)),
*((uint32_t*)(rfe->fe_key.flow6_sip + i)), &ip_inc);
}
memcpy(ip6->ip6_dst, rfe->fe_key.flow6_sip,
VR_IP6_ADDRESS_LEN);
}
if (vr_ip6_transport_header_valid(ip6)) {
t_sport = (uint16_t *)((uint8_t *)ip6 +
sizeof(struct vr_ip6));
t_dport = t_sport + 1;
if (fe->fe_flags & VR_FLOW_FLAG_SPAT) {
vr_incremental_diff(*t_sport,
rfe->fe_key.flow6_dport, &port_inc);
*t_sport = rfe->fe_key.flow6_dport;
}
if (fe->fe_flags & VR_FLOW_FLAG_DPAT) {
vr_incremental_diff(*t_dport,
rfe->fe_key.flow6_sport, &port_inc);
*t_dport = rfe->fe_key.flow6_sport;
}
}
if (!vr_pkt_is_diag(pkt))
vr_ip6_update_csum(pkt, ip_inc, port_inc);
if ((fe->fe_flags & VR_FLOW_FLAG_VRFT) && pkt->vp_nh &&
((pkt->vp_nh->nh_vrf != fmd->fmd_dvrf) ||
(pkt->vp_nh->nh_flags & NH_FLAG_ROUTE_LOOKUP))) {
/* only if pkt->vp_nh was set before... */
pkt->vp_nh = vr_inet6_ip_lookup(fmd->fmd_dvrf, ip6->ip6_dst);
}
return FLOW_FORWARD;
drop:
PKT_LOG(VP_DROP_FLOW_NAT_NO_RFLOW, pkt, 0, VR_PROTO_IP6_C, __LINE__);
vr_pfree(pkt, VP_DROP_FLOW_NAT_NO_RFLOW);
return FLOW_CONSUMED;
}
The vr_ip6_update_csum function code:
static void
vr_ip6_update_csum(struct vr_packet *pkt, uint32_t ip_inc, uint32_t port_inc)
{
struct vr_ip6 *ip6 = NULL;
struct vr_tcp *tcp = NULL;
struct vr_udp *udp = NULL;
uint32_t csum_inc = ip_inc;
uint32_t csum = 0;
uint16_t *csump = NULL;
ip6 = (struct vr_ip6 *)pkt_network_header(pkt);
if (ip6->ip6_nxt == VR_IP_PROTO_TCP) {
tcp = (struct vr_tcp *)((uint8_t *)ip6 + sizeof(struct vr_ip6));
csump = &tcp->tcp_csum;
} else if (ip6->ip6_nxt == VR_IP_PROTO_UDP) {
udp = (struct vr_udp *)((uint8_t *)ip6 + sizeof(struct vr_ip6));
csump = &udp->udp_csum;
if (*csump == 0) {
return;
}
} else {
return;
}
if (vr_ip6_transport_header_valid(ip6)) {
/*
* for partial checksums, the actual value is stored rather
* than the complement
*/
if (pkt->vp_flags & VP_FLAG_CSUM_PARTIAL) {
csum = (*csump) & 0xffff;
} else {
csum = ~(*csump) & 0xffff;
csum_inc += port_inc;
}
csum += csum_inc;
if (csum < csum_inc)
csum += 1;
csum = (csum & 0xffff) + (csum >> 16);
if (csum >> 16)
csum = (csum & 0xffff) + 1;
if (pkt->vp_flags & VP_FLAG_CSUM_PARTIAL) {
*csump = csum & 0xffff;
} else {
*csump = ~(csum) & 0xffff;
}
}
return;
}
The implementation of address translation for IPv6 is tested for the NAT and PAT cases separately. Since TF is able to use both MPLS and VxLAN technologies, all tests are performed for each overlay type. When the connected virtual machines reside on one hypervisor, the data transferred between them is not encapsulated. When the virtual machines belong to different computes, a tunnel is established between them. Therefore, two configurations should be considered for each overlay type: without a tunnel (Interface) and with a tunnel. For each configuration, three types of packets are considered: ICMP, TCP and UDP.
Inter-VRF packet forwarding is additionally tested for the VxLAN overlay.
Dropping of packets in cases where there is no reverse flow is verified separately.
Finally, we have the following map of trials:
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
1 | NAT | MPLS | Intf | ICMP |
2 | NAT | MPLS | Tunn | ICMP |
3 | NAT | MPLS | Intf | TCP |
4 | NAT | MPLS | Tunn | TCP |
5 | NAT | MPLS | Intf | UDP |
6 | NAT | MPLS | Tunn | UDP |
7 | PAT | MPLS | Intf | TCP |
8 | PAT | MPLS | Tunn | TCP |
9 | PAT | MPLS | Intf | UDP |
10 | PAT | MPLS | Tunn | UDP |
11 | NAT | VxLAN | Intf | ICMP |
12 | NAT | VxLAN | Tunn | ICMP |
13 | NAT | VxLAN | Intf | TCP |
14 | NAT | VxLAN | Tunn | TCP |
15 | NAT | VxLAN | Intf | UDP |
16 | NAT | VxLAN | Tunn | UDP |
17 | PAT | VxLAN | Intf | TCP |
18 | PAT | VxLAN | Tunn | TCP |
19 | PAT | VxLAN | Intf | UDP |
20 | PAT | VxLAN | Tunn | UDP |
The columns of the table have the following meaning:
We have two VMs created, for example, using OpenStack. These machines reside on compute2. Each machine has one public IPv4 interface for interaction with the external world and one private IPv6 interface for the inter-VM connection.
It is expected that both interfaces are configured properly (i.e., ifconfig shows they are present and ip -6 r gives correct routes).
The IPv6 interfaces are connected to a TF IPv6 virtual network (let's say, U6-net) with the subnet aaaa::/32. One interface has the address aaaa::11, the other has the address aaaa::21. The interface with the address aaaa::11 is linked to a Floating IP with the address cccc::11.
For ICMP tests we run the ping command from the aaaa::21 interface to cccc::11:
ping -6 cccc::11
For TCP tests we run on the machine with aaaa::11 (and cccc::11) interfaces:
nc -6 -nlv aaaa::11 8888
And on the machine with aaaa::21 interface:
nc -6 -nv cccc::11 8888
For UDP tests we run nc with the -u key.
When PAT capabilities are tested, the following settings apply:
For cases with tunnels between VMs, another VM with the interface address aaaa::41 that is located on a different compute node is used for outgoing ping and nc connections.
Security groups for all ports should allow egress and ingress connections for ICMP, TCP and UDP protocols.
For the TCP and UDP inter-VRF tests, a permissive policy (e.g., pass for all addresses, ports and protocols) should be enabled for both networks (VRF instances) and for the routing network (routing VRF instance) if we use ACLs.
Due to issues with checksum calculations in the TF vrouter.ko module, it is recommended to turn checksum offloading off on all computes:
ethtool -K ethX tx off rx off
where ethX is the name of the Ethernet interface that is connected to the vhost0 interface.
firewalld, iptables, ip6tables are disabled.
Packet mode is turned off for all ports, reverse flow forwarding is on for all networks.
For the VxLAN configuration, the settings are analogous to the MPLS configuration; however:
The networks U6-net and V6-net with the respective subnets aaaa::/32 and bbbb::/32 are connected to a single VxLAN logical router with some predefined VNI (let's say, 112233).
To test ICMP and TCP/UDP packets, the ping and nc commands are used.
An example of a packet path when travelling via VxLAN from one VRF instance (V6-net) to another (U6-net) between virtual machines on different computes (VxLAN + VRF-T + NAT66):
compute 0
tap
|
v
bbbb::41 aaaa::/32 aaaa::31
-------- ---> --------- ---> --------
V6-net VRF V6-net VRF Rt VRF
|
v
ethX compute0
- - - - - - - - - - - - - - - - - - - - -
ethX compute2
|
v
aaaa::11 NAT aaaa::31
tap <--- -------- <--- --------
Rt VRF Rt VRF
compute 2
The Tungsten Fabric vRouter Agent component belongs partially to both the data plane and the control plane, which fits well into the architecture proposed by the Open Networking Foundation (ONF) [ONF_SDN_ARCH]. Therefore, it has a dual role:
The process of packet forwarding in vRouter is performed in accordance with:
There are also other tables that carry auxiliary functions: EVPN tables, multicast tables, the nexthop table, the interface table, etc. Nonetheless, all tables have the same structure and derive from the same parent class. For example, figure 1 shows an inheritance diagram for the special branch of table classes (AgentRouteTable) that are used to represent forwarding information in vRouter Agent.
Figure 1.
Each table (the DBTable class) in vRouter Agent is represented as a partitioned tree. The partitioning is organized as an array (std::vector) of partitions (the DBTablePartition class), and each partition contains a red-black tree of the records that constitute the table. For the implementation of the red-black tree, the boost::intrusive::set template class of the Boost library is used.
Therefore, route tables in vRouter Agent (just like all other tables) are represented as partitioned sets of records, where each record represents a route.
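To make this structure more tangible, here is a small self-contained sketch (simplified, with hypothetical class names; not the actual DBTable code) of a partitioned table whose partitions each hold a boost::intrusive::set of records:

// Illustrative model only: a table = vector of partitions, each partition = a
// red-black tree of records (boost::intrusive::set), similar in spirit to
// DBTable/DBTablePartition.
#include <boost/intrusive/set.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace bi = boost::intrusive;

struct Record : public bi::set_base_hook<> {
    std::string prefix;                 // e.g. an IP prefix for an L3 route
    explicit Record(const std::string &p) : prefix(p) {}
    friend bool operator<(const Record &a, const Record &b) {
        return a.prefix < b.prefix;
    }
};

struct Partition {
    bi::set<Record> tree;               // red-black tree of records
};

struct Table {
    std::vector<Partition> partitions;  // partitioning is an array of partitions
    explicit Table(size_t n) : partitions(n) {}
};

int main() {
    Table table(1);                     // AgentRouteTable effectively uses one partition
    Record r1("10.1.1.0/24"), r2("10.2.2.0/24");
    table.partitions[0].tree.insert(r1);
    table.partitions[0].tree.insert(r2);
    for (const Record &r : table.partitions[0].tree)
        std::cout << r.prefix << std::endl;
    // intrusive containers do not own their elements: unlink them before they go away
    table.partitions[0].tree.clear();
    return 0;
}

Intrusive containers avoid a separate allocation per tree node, which matters for tables holding many routes.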
Each route in vRouter Agent is represented by the class AgentRoute, which holds information about a destination address (an IP prefix for L3 routes or a MAC address for L2 routes) and the list of paths that can be used to reach the destination.
A path to the destination is represented by:
When a packet travels from one virtual machine to another, it passes through a sequence of nexthops, each of which directs the packet to another destination (a router, a network, etc.) or processes it (encapsulates or decapsulates it, for example). Examples of nexthops are:
A peer marker is used to distinguish the origin of a path or its purpose. Examples of peers are:
Let's traverse an EVPN Type 2 table to illustrate this structure of a route table. If we want to traverse a route table in vRouter Agent to print its content, we should organize two loops (see Listing 1):
RTTI is used to cast the results of the base DBTable class functions to the needed final classes (e.g., EvpnRouteEntry as the type of a route table record). In the current implementation of vRouter Agent, partitioning for AgentRouteTable is turned off; hence, during the walk we use only partition number 0. However, in other tables several partitions might be used, and in this situation there are two choices:
As can be seen in figure 2, we have:
Some routes were created by the vRouter Agent of a hypervisor (the LocalVmExportPeer peer type), and some routes were imported from the SDN controller.
Figure 2.
Listing 1:
VrfEntry* routing_vrf = const_cast<VrfEntry*>(const_vrf);
EvpnAgentRouteTable *evpn_table = dynamic_cast<EvpnAgentRouteTable *>
(routing_vrf->GetEvpnRouteTable());
if (evpn_table == NULL) {
std::cout<< "VxlanRoutingManager::PrintEvpnTable"
<< ", NULL EVPN tbl ptr"
<< std::endl;
return;
}
EvpnRouteEntry *c_entry = dynamic_cast<EvpnRouteEntry *>
(evpn_table->GetTablePartition(0)->GetFirst());
if (c_entry) {
if (c_entry->IsType5())
std::cout << "Evpn Type 5 table:" << std::endl;
else
std::cout << "Evpn Type 2 table:" << std::endl;
}
while (c_entry) {
const Route::PathList & path_list = c_entry->GetPathList();
std::cout<< " IP:" << c_entry->ip_addr()
<< ", path count = " << path_list.size()
<< ", ethernet_tag = " << c_entry->ethernet_tag()
<< std::endl;
for (Route::PathList::const_iterator it = path_list.begin();
it != path_list.end(); ++it) {
const AgentPath* path =
dynamic_cast<const AgentPath*>(it.operator->());
if (!path)
continue;
std::cout<< " NH: "
<< (path->nexthop() ? path->nexthop()->ToString() :
"NULL")
<< ", " << "Peer:"
<< (path->peer() ? path->peer()->GetName() : "NULL")
<< std::endl;
if (path->nexthop()->GetType() == NextHop::COMPOSITE) {
CompositeNH *comp_nh = dynamic_cast<CompositeNH *>
(path->nexthop());
std::cout<< " n components="
<< comp_nh->ComponentNHCount()
<< std::endl;
}
}
if (evpn_table && evpn_table->GetTablePartition(0))
c_entry = dynamic_cast<EvpnRouteEntry *>
(evpn_table->GetTablePartition(0)->GetNext(c_entry));
else
break;
}
[ONF_SDN_ARCH] SDN architecture, Issue 1. Open Networking Foundation, 2014.
Suppose you want to implement in Tungsten Fabric a network configuration that allows interconnection between internal nodes (virtual machines) from different TF networks. This can be accomplished by employing the VxLAN feature of Tungsten Fabric. All networks connected to a Logical Router (LR) expose their interface routes to the exchange (or routing) Virtual Routing and Forwarding (VRF) instance, making them visible to all hosts that connect to this LR. Private IP addresses of virtual machines can be hidden using special TF tools like Allowed Address Pairs (AAP) or Floating IP (FIP), which is actually a synonym for NAT technology.
However, there are two issues that hinder the usage of all of TF's powerful features in this scenario:
The first issue can be solved with the new experimental VxLAN implementation, which is available in the master branch of the Tungsten Fabric controller: https://github.com/tungstenfabric/tf-controller. The new VxLAN implementation allows copying all kinds of interface (and composite-of-interfaces) routes from bridge VRF instances into the routing VRF instance, including:
The second problem can be resolved by means of another Tungsten Fabric tool: the Access Control List (ACL). The implementation of NAT in Tungsten Fabric uses information from the endpoint interface of a path (the interface nexthop) to set up the modification of a forwarded IP packet header. Specifically, NAT works in TF only when the first nexthop of a route is of the interface type. When the first nexthop in a path from a bridge VRF instance to the routing VRF instance is not an interface but a vrf nexthop, the usage of NAT is prohibited.
However, the solution with ACLs might be cumbersome when it is necessary to implement forwarding of packets for several different protocols and many FIPs. Therefore, an alternative solution, which changes the vRouter Agent code, is proposed at the end of this material.
In this case we can use an ACL to transfer the packet from one VRF instance (bridge) into another VRF instance (routing) where the needed interface nexthop is present and the condition for NAT is met.
Now that the general considerations about the configuration have been set out, let's consider a practical example. We're going to use the master branch of the TF controller with OpenStack for virtualization orchestration. The network connectivity sketch is presented here:
+--------------------------------------------------------+
| hypervisor |
| +---+ +---+ |
| |VM1| |VM2| |
| +---+ +---+ |
+-----|--------------------------------------------|-----+
| |
+---+ FIP pool +---+ VxLAN +----+ VxLAN +---+
| U | ---------- | W | ------ | LR | ------ | V |
+---+ +---+ +----+ +---+
bridge bridge routing bridge
network network network network
Let's say a virtual machine VM1 is connected to the virtual network U (subnet 10.1.1.0/24) via the IP address 10.1.1.21, and a virtual machine VM2 is connected to the virtual network V (subnet 10.2.2.0/24) via the IP address 10.2.2.21. Suppose these virtual machines belong to a hypervisor compute2 and we want to connect them using VxLAN. Moreover, let's also envision that we want to hide the 10.1.1.0/24 addresses from the V network. In this case we create a network W, which will keep the Floating IP addresses, and connect it to our LR. We also connect network V to the LR.
NB. To enable a Floating IP pool for the W network, we should mark it as external in its advanced properties. On the flip side, the TF GUI doesn't allow connecting external networks to an LR. To overcome this, the configuration steps should be executed in the following sequence:
After that, the TF GUI will hide this external network from the list of networks connected to the LR, but it will actually stay there.
To implement our configuration:
When we connect the newly created VMIs to their virtual machines (the virtual machines are assumed to be running), we will see the following configuration of virtual networks:
The VRF Inet tables for the V, W and routing networks are presented in figures 1a, 1b and 1c respectively.
Now the VMI from the W network (VRF instance) can talk to the VMI from the V network (VRF instance) via the LR: when a packet from the W VRF VMI is sent to the prefix 10.2.2.0/24, this packet hops into the routing VRF instance (according to the vrf nexthop of the 10.2.2.0/24 route). There the packet finds the interface nexthop of the 10.2.2.21/32 route and proceeds to the TAP of the destination guest OS.
The guest OSes should also be configured to know about the foreign network subnets:
We use the 10.9.9.0/24 network prefix because we don't want to expose IP addresses from virtual network U to network V.
It is also assumed that all virtual port security groups are open in the ingress and egress directions for all needed addresses, protocols and port ranges.
If we didn't use Floating IP (which is actually NAT) to hide the addresses of the U network, this configuration would be enough to connect virtual machines from different networks. To overcome the problem of the conflict between the TF VxLAN and NAT implementations (see the intro for more details), we are going to employ the ACL facility of Tungsten Fabric:
Rule No. 0 is needed to transfer packets from the VRF instance of network V into the routing VRF instance where the Floating IP (10.9.9.91) is present and where NAT will be performed.
Rule No. 1 is needed to unblock passage of all other packets coming from or to network V.
The ACL configurations for rules No. 0 and No. 1 are presented in JSON format in figures 2a and 2b respectively.
We can now try to ping VM1 from VM2 using FIP address 10.9.9.91 (figure 3).
And a final question: is it possible to connect internal networks with an external one using this approach? Yes, it is possible, but in this case an ACL is not needed, because a packet from an external network directly hits the routing VRF instance, where the FIP routes reside.
Setting up ACL rules for many VRF instances and several protocols (IPv4/IPv6, ICMP/TCP/UDP) might be a very time-consuming and error-prone process. In this case, it is better to change the vRouter Agent code. Inter-VRF packet forwarding is available through setting up a flow, and this job is performed in the class PktFlowInfo.
We will add a new method NatVxlanVrfTranslate to the class PktFlowInfo that performs all the operations necessary for inter-VRF packet forwarding. The declaration of the class is contained in the file pkt_flow_info.h, and hence we put our first modification (inside the PktFlowInfo scope) there:
void NatVxlanVrfTranslate(const PktInfo *pkt, PktControlInfo *in,
PktControlInfo *out);
Afterwards, we should put the body of the function into the file pkt_flow_info.cc. The function contains checks that test whether all the necessary objects are present and allocated. We also check that the output VRF instance has an interface for the given destination IP address and that this interface corresponds to some Floating IP entity. Finally, the parameters of the flow are updated via the functions ChangeVrf and UpdateRoute.
void PktFlowInfo::NatVxlanVrfTranslate(const PktInfo *pkt, PktControlInfo *in,
PktControlInfo *out) {
if (out == NULL ||
in == NULL ||
out->rt_ == NULL ||
in->vrf_ == NULL ||
in->vrf_->routing_vrf()) {
return;
}
const NextHop *nh = out->rt_->GetActiveNextHop();
if (nh == NULL || nh->GetType() != NextHop::VRF) {
return;
}
const VrfNH *vrf_nh = static_cast<const VrfNH *>(nh);
const VrfEntry *vrf = vrf_nh->GetVrf();
if (vrf == NULL || vrf->routing_vrf() == false) {
return;
}
InetUnicastRouteEntry *inet_rt = vrf->GetUcRoute(pkt->ip_daddr);
const NextHop *rt_nh = inet_rt ?
inet_rt->GetActiveNextHop() : NULL;
if (rt_nh == NULL || rt_nh->GetType() != NextHop::INTERFACE) {
return;
}
const Interface *intf = static_cast<const InterfaceNH*>
(rt_nh)->GetInterface();
if (intf == NULL || intf->type() != Interface::VM_INTERFACE ||
static_cast<const VmInterface*>(intf)->FloatingIpCount() == 0) {
return;
}
ChangeVrf(pkt, out, vrf);
UpdateRoute(&out->rt_, vrf, pkt->ip_daddr, pkt->dmac,
flow_dest_plen_map);
UpdateRoute(&in->rt_, vrf, pkt->ip_saddr, pkt->smac,
flow_source_plen_map);
}
Finally, we have to put a call to this method in the proper place. The correct place is the PktFlowInfo::IngressProcess method, just before the first call to the PktFlowInfo::VrfTranslate method:
NatVxlanVrfTranslate(pkt, in, out);
This example was tested for Tungsten Fabric R2011, but it should also work for later versions.
Finally, vRouter Agent should be recompiled and the new binary must be copied into the proper Docker container. After restarting the container, the ping and nc commands can be used to test inter-VRF forwarding of packets.
A non-exhaustive but still relevant list of terms and abbreviations used in networking.
Once I needed a translator from Markdown to HTML to display the contents of .md files on web pages. I intended to call that translator from C++ code; therefore, I set several constraints:
I tried several solutions from the web:
Finally, here are some libraries that I did not consider (but they could have their advantages):
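For illustration only, here is roughly what calling a C Markdown library from C++ looks like; the example uses cmark (the CommonMark reference library), which is not necessarily one of the libraries mentioned above (build with -lcmark):

// Minimal sketch: convert a Markdown string to HTML via cmark's C API.
#include <cmark.h>
#include <cstdlib>
#include <iostream>
#include <string>

std::string md_to_html(const std::string &md) {
    // cmark_markdown_to_html() returns a heap-allocated C string
    // that the caller must free().
    char *html = cmark_markdown_to_html(md.c_str(), md.size(), CMARK_OPT_DEFAULT);
    std::string result(html);
    std::free(html);
    return result;
}

int main() {
    std::cout << md_to_html("# Title\n\nSome *markdown* text.\n");
    return 0;
}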
The upcoming Linux Foundation Networking (LFN) Developer & Testing Forum (D&TF) will be held virtually from June 6th to June 8th, 2023.
This event provides an opportunity for various specialists involved in LFN software development (both testers and developers) to collaborate and innovate on open-source networking projects. The following projects are usually presented at the event:
The deadline for topic submissions is May 19th, 2023. Topics should be submitted via this link: https://wiki.lfnetworking.org/x/xgjxB.
Participants registration is available at: https://cvent.me/NKWY5M.
Questions regarding topics submission or participation should be directed to events@lfnetworking.org.
Tungsten Fabric uses a lot of servers to provide tools for information, analysis and configuration. Sometimes they are referred to as Northbound and Southbound bridges. It is possible to find the tool addresses on Juniper's site and in the source code, but it might also be valuable to keep them in one place.
Let's envision that the TF control node is located at the IP 172.16.0.26.
http://172.16.0.26:8081/documentation/index.html
http://172.16.0.26:8082/documentation/index.html
http://172.16.0.26
http://172.16.0.26:8143
In some circumstances it is necessary to disable the TF vrouter kernel module right at the boot stage: for example, when a version of the vrouter module with a serious bug was compiled and installed in the kernel. In this case a reboot will not help and the system continues to crash right after the boot stage. Or, for example, it is necessary to switch off traffic flow through the data plane provided by TF at boot.
In the aforementioned and other cases, the editor of the GNU GRUB loader allows disabling the vrouter kernel module just for one start of the OS. See, for example, here.
The procedure contains 6 steps.
Find the line starting with the keyword linux and browse to its end. Put a space and write a comma-separated list of disabled modules after the keyword module_blacklist, e.g.: module_blacklist=vrouter
Fig. 1
Fig. 2
Fig. 3
The compilation process of Tungsten Fabric (or TF) on CentOS 8 is slightly different from that on CentOS 7 and requires some modifications of the tf-dev-env scripts.
The following steps are needed to compile it on a fresh CentOS 8.
Please write me if I missed anything in these steps.
Tungsten Fabric (or TF) works with many Linux distributions, such as Ubuntu, CentOS and others. The process of compiling TF from sources is distribution-agnostic and is covered in the tf-dev-env repository. However, this process might vary slightly depending on the selected Linux distribution. While deviations from the common instructions are usually small, they might confuse a new TF user.
Here I summarize the nuances of the TF R2011 compilation on a fresh Ubuntu 20.04.
The following steps are needed to compile it on a fresh Ubuntu 20.04.
Please write me if I missed anything in these steps.