SONiC Study Notes (7): BGP Workflow (Part 2): Pushing Down BGP Route Updates

(The content below has been merged into the "Pushing Down BGP Route Updates" section of the SONiC Getting Started Guide, 《SONiC入门指南》.)

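In the previous post, we covered the part of the BGP route update workflow that runs from bgpd to fpmsyncd. In this post, we continue with the rest of the flow: how a BGP route update is pushed down to the ASIC.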

1. SONiC route update workflow

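After FRR changes the kernel routing configuration, SONiC receives notifications from both Netlink and FPM, and then performs a series of steps to push the change down to the ASIC. The main flow is as follows: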

sequenceDiagram
    autonumber
    participant K as Linux Kernel
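    box lightyellow bgp container
    participant Z as zebra
    participant FPM as fpmsyncd
    end
    box pink database container
    participant R as Redis
    end
    box lightblue swss container
    participant OA as orchagent
    end
    box lightgreen syncd container
    participant SD as syncd
    end
    participant A as ASIC

    K->>FPM: Sends a notification via Netlink<br/>when a kernel route changes
    Z->>FPM: Sends route update notifications<br/>over the FPM interface<br/>in Netlink message format

    FPM->>R: Writes the route update into<br/>APPL_DB via ProducerStateTable

    R->>OA: Receives the route update<br/>via ConsumerStateTable

    OA->>OA: Processes the route update<br/>and generates SAI route objects
    OA->>SD: Sends the SAI route objects to syncd<br/>via ProducerTable or ZMQ

    SD->>R: Receives the SAI route objects<br/>and writes them to ASIC_DB
    SD->>A: Configures the ASIC<br/>through the SAI interface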

1.1. fpmsyncd updates the route configuration in Redis

Let's start at the source. When fpmsyncd starts, it begins listening for FPM and Netlink events so that it can receive route update messages:

// File: src/sonic-swss/fpmsyncd/fpmsyncd.cpp
int main(int argc, char **argv)
{
    ...

    DBConnector db("APPL_DB", 0);
    RedisPipeline pipeline(&db);
    RouteSync sync(&pipeline);

    // Register netlink message handler
    NetLink netlink;
    netlink.registerGroup(RTNLGRP_LINK);

    NetDispatcher::getInstance().registerMessageHandler(RTM_NEWROUTE, &sync);
    NetDispatcher::getInstance().registerMessageHandler(RTM_DELROUTE, &sync);
    NetDispatcher::getInstance().registerMessageHandler(RTM_NEWLINK, &sync);
    NetDispatcher::getInstance().registerMessageHandler(RTM_DELLINK, &sync);

    rtnl_route_read_protocol_names(DefaultRtProtoPath);
    ...

    while (true) {
        try {
            // Launching FPM server and wait for zebra to connect.
            FpmLink fpm(&sync);
            ...

            fpm.accept();
            ...
        } catch (FpmLink::FpmConnectionClosedException &e) {
            // If connection is closed, keep retrying until it succeeds, before handling any other events.
            cout << "Connection lost, reconnecting..." << endl;
        }
        ...
    }
}

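This way, all route update messages are delivered to RouteSync in Netlink form. EVPN Type 5 messages must be handled in their raw form, so they are sent to onMsgRaw; all other messages go to the onMsg callback that handles Netlink. (For how Netlink messages are received and processed, see the earlier post on communication mechanisms.) The dispatch looks like this: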

// File: src/sonic-swss/fpmsyncd/fpmlink.cpp
// Called from: FpmLink::readData()
void FpmLink::processFpmMessage(fpm_msg_hdr_t* hdr)
{
    size_t msg_len = fpm_msg_len(hdr);
    nlmsghdr *nl_hdr = (nlmsghdr *)fpm_msg_data(hdr);
    ...

    /* Read all netlink messages inside FPM message */
    for (; NLMSG_OK (nl_hdr, msg_len); nl_hdr = NLMSG_NEXT(nl_hdr, msg_len))
    {
        /*
         * EVPN Type5 Add Routes need to be processed in raw mode as they contain
         * RMAC, VLAN and L3VNI information.
         * Whereas all other routes use the rtnl API to extract information
         * from the netlink msg.
         */
        bool isRaw = isRawProcessing(nl_hdr);

        nl_msg *msg = nlmsg_convert(nl_hdr);
        ...
        nlmsg_set_proto(msg, NETLINK_ROUTE);

        if (isRaw) {
            /* EVPN Type5 Add route processing */
            /* This will call into onMsgRaw() */
            processRawMsg(nl_hdr);
        } else {
            /* This will call into onMsg() */
            NetDispatcher::getInstance().onNetlinkMessage(msg);
        }

        nlmsg_free(msg);
    }
}

void FpmLink::processRawMsg(struct nlmsghdr *h)
{
    m_routesync->onMsgRaw(h);
}
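The loop above relies on the fact that one FPM message can carry several Netlink messages packed back to back, each prefixed with its own length. As a rough illustration of that walk, here is a minimal standalone sketch. It does not use the real nlmsghdr or the NLMSG_OK/NLMSG_NEXT macros: MsgHeader, appendMsg and collectTypes are hypothetical names invented for this example, and the 4-byte padding only mirrors the alignment Netlink uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified header: total length (header + payload) and a type.
struct MsgHeader {
    std::uint32_t len;
    std::uint16_t type;
    std::uint16_t flags;
};

// Round up to a 4-byte boundary, similar to Netlink message alignment.
constexpr std::size_t alignUp(std::size_t n) { return (n + 3u) & ~std::size_t{3}; }

// Append one message (header + payload) to the buffer, padded to alignment.
void appendMsg(std::vector<std::uint8_t>& buf, std::uint16_t type,
               const std::vector<std::uint8_t>& payload) {
    MsgHeader h{static_cast<std::uint32_t>(sizeof(MsgHeader) + payload.size()), type, 0};
    std::size_t start = buf.size();
    buf.resize(start + alignUp(h.len), 0);
    std::memcpy(buf.data() + start, &h, sizeof(h));
    if (!payload.empty()) {
        std::memcpy(buf.data() + start + sizeof(h), payload.data(), payload.size());
    }
}

// Walk every well-formed message in the buffer and return the types in order.
std::vector<std::uint16_t> collectTypes(const std::vector<std::uint8_t>& buf) {
    std::vector<std::uint16_t> types;
    std::size_t off = 0;
    while (buf.size() - off >= sizeof(MsgHeader)) {
        MsgHeader h;
        std::memcpy(&h, buf.data() + off, sizeof(h));
        // Stop on a truncated or corrupt length, like NLMSG_OK would.
        if (h.len < sizeof(MsgHeader) || h.len > buf.size() - off) break;
        types.push_back(h.type);
        std::size_t step = alignUp(h.len);
        if (step > buf.size() - off) break;  // last message without padding
        off += step;                         // like NLMSG_NEXT
    }
    return types;
}
```

For example, appending a message of type 24 (RTM_NEWROUTE) followed by one of type 25 (RTM_DELROUTE) and calling collectTypes on the buffer returns those two types in order.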

Next, once RouteSync receives a route update message, it checks and dispatches it in onMsg and onMsgRaw:

// File: src/sonic-swss/fpmsyncd/routesync.cpp
void RouteSync::onMsgRaw(struct nlmsghdr *h)
{
    if ((h->nlmsg_type != RTM_NEWROUTE) && (h->nlmsg_type != RTM_DELROUTE))
        return;
    ...
    onEvpnRouteMsg(h, len);
}

void RouteSync::onMsg(int nlmsg_type, struct nl_object *obj)
{
    // Refill Netlink cache here
    ...

    struct rtnl_route *route_obj = (struct rtnl_route *)obj;
    auto family = rtnl_route_get_family(route_obj);
    if (family == AF_MPLS) {
        onLabelRouteMsg(nlmsg_type, obj);
        return;
    }
    ...

    unsigned int master_index = rtnl_route_get_table(route_obj);
    char master_name[IFNAMSIZ] = {0};
    if (master_index) {
        /* If the master device name starts with VNET_PREFIX, it is a VNET route.
           The VNET name is exactly the name of the associated master device. */
        getIfName(master_index, master_name, IFNAMSIZ);
        if (string(master_name).find(VNET_PREFIX) == 0) {
            onVnetRouteMsg(nlmsg_type, obj, string(master_name));
        }

        /* Otherwise, it is a regular route (include VRF route). */
        else {
            onRouteMsg(nlmsg_type, obj, master_name);
        }
    } else {
        onRouteMsg(nlmsg_type, obj, NULL);
    }
}

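From the code above, we can see four different route-handling entry points. Each kind of route is eventually written, through its own ProducerStateTable, into a different table in APPL_DB: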

| Route type | Handler | Table |
| --- | --- | --- |
| MPLS | onLabelRouteMsg | LABEL_ROUTE_TABLE |
| Vnet VxLan tunnel route | onVnetRouteMsg | VNET_ROUTE_TUNNEL_TABLE |
| Other Vnet routes | onVnetRouteMsg | VNET_ROUTE_TABLE |
| EVPN Type 5 | onEvpnRouteMsg | ROUTE_TABLE |
| Regular route | onRouteMsg | ROUTE_TABLE |
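The mapping in the table can be condensed into a small decision function. The sketch below is only an illustration of that mapping, not the fpmsyncd code: RouteKind, classify and tableFor are hypothetical names, the "Vnet" prefix value is an assumption about VNET_PREFIX, and the viaVxlanTunnel flag stands in for whatever check onVnetRouteMsg performs internally, which is not shown in this post.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of an incoming route message.
enum class RouteKind { Mpls, VnetTunnel, Vnet, EvpnType5, Regular };

// Assumption: VNET master devices are named with this prefix.
const std::string kVnetPrefix = "Vnet";

RouteKind classify(bool isRawEvpn, bool isMplsFamily,
                   const std::string& masterName, bool viaVxlanTunnel) {
    if (isRawEvpn) return RouteKind::EvpnType5;    // raw path (onMsgRaw)
    if (isMplsFamily) return RouteKind::Mpls;      // AF_MPLS family
    if (masterName.rfind(kVnetPrefix, 0) == 0) {   // master name starts with prefix
        return viaVxlanTunnel ? RouteKind::VnetTunnel : RouteKind::Vnet;
    }
    return RouteKind::Regular;                     // includes VRF routes
}

// APPL_DB table that each kind of route ends up in, per the table above.
std::string tableFor(RouteKind kind) {
    switch (kind) {
        case RouteKind::Mpls:       return "LABEL_ROUTE_TABLE";
        case RouteKind::VnetTunnel: return "VNET_ROUTE_TUNNEL_TABLE";
        case RouteKind::Vnet:       return "VNET_ROUTE_TABLE";
        case RouteKind::EvpnType5:  return "ROUTE_TABLE";
        case RouteKind::Regular:    return "ROUTE_TABLE";
    }
    return "ROUTE_TABLE";
}
```

Note that EVPN Type 5 and regular routes share ROUTE_TABLE, so what differs between them is how the message is parsed, not where the result is stored.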

We'll use regular routes as the example here. The other handlers differ in their details, but the overall approach is the same:

// File: src/sonic-swss/fpmsyncd/routesync.cpp
void RouteSync::onRouteMsg(int nlmsg_type, struct nl_object *obj, char *vrf)
{
    // Parse route info from nl_object here.
    ...

    // Get nexthop lists
    string gw_list;
    string intf_list;
    string mpls_list;
    getNextHopList(route_obj, gw_list, mpls_list, intf_list);
    ...

    // Build route info here, including protocol, interface, next hops, MPLS, weights etc.
    vector<FieldValueTuple> fvVector;
    FieldValueTuple proto("protocol", proto_str);
    FieldValueTuple gw("nexthop", gw_list);
    ...

    fvVector.push_back(proto);
    fvVector.push_back(gw);
    ...

    // Push to ROUTE_TABLE via ProducerStateTable.
    m_routeTable.set(destipprefix, fvVector);
    SWSS_LOG_DEBUG("RouteTable set msg: %s %s %s %s", destipprefix, gw_list.c_str(), intf_list.c_str(), mpls_list.c_str());
    ...
}
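To make the shape of the resulting entry more concrete, here is a minimal sketch of how the field-value list for one route might be assembled. It uses std::pair as a stand-in for FieldValueTuple; joinWith and buildRouteFields are hypothetical helpers, and the "ifname" field and the comma separator are assumptions for illustration (the code above only shows the "protocol" and "nexthop" fields).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for FieldValueTuple: a (field, value) pair of strings.
using FieldValue = std::pair<std::string, std::string>;

// Join items with a separator, e.g. {"a", "b"} -> "a,b".
std::string joinWith(const std::vector<std::string>& items, char sep) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) out += sep;
        out += items[i];
    }
    return out;
}

// Build the field-value list that would be written for one route prefix.
std::vector<FieldValue> buildRouteFields(const std::string& protocol,
                                         const std::vector<std::string>& gateways,
                                         const std::vector<std::string>& interfaces) {
    std::vector<FieldValue> fields;
    fields.emplace_back("protocol", protocol);
    fields.emplace_back("nexthop", joinWith(gateways, ','));
    fields.emplace_back("ifname", joinWith(interfaces, ','));  // assumed field name
    return fields;
}
```

With two ECMP next hops, the "nexthop" and "ifname" values each become a comma-separated list whose positions line up with one another.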

1.2. orchagent handles route configuration changes

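Next, the route information arrives at orchagent. When orchagent starts, it creates the VNetRouteOrch and RouteOrch objects, which listen for and handle Vnet-related routes and EVPN/regular routes, respectively: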

// File: src/sonic-swss/orchagent/orchdaemon.cpp
bool OrchDaemon::init()
{
    ...

    vector<string> vnet_tables = { APP_VNET_RT_TABLE_NAME, APP_VNET_RT_TUNNEL_TABLE_NAME };
    VNetRouteOrch *vnet_rt_orch = new VNetRouteOrch(m_applDb, vnet_tables, vnet_orch);
    ...

    const int routeorch_pri = 5;
    vector<table_name_with_pri_t> route_tables = {
        { APP_ROUTE_TABLE_NAME, routeorch_pri },
        { APP_LABEL_ROUTE_TABLE_NAME, routeorch_pri }
    };
    gRouteOrch = new RouteOrch(m_applDb, route_tables, gSwitchOrch, gNeighOrch, gIntfsOrch, vrf_orch, gFgNhgOrch, gSrv6Orch);
    ...
}

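The message-handling entry point of every Orch object is doTask, and RouteOrch and VNetRouteOrch are no exception. We'll take RouteOrch as the example and look at how it handles route changes.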

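Note: `RouteOrch` gives a real sense of why these classes are named `Orch`. `RouteOrch` is more than 2,500 lines long, interacts with many other Orch objects, and is full of details. The code is relatively hard to read, so please be patient when going through it.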

There are a few things to keep in mind about how RouteOrch handles route messages:

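  • From the init function above, we can see that RouteOrch manages not only regular routes but also MPLS routes. The two are handled by different logic, so to keep things simple the code below only shows the handling of regular routes.
  • Because ProducerStateTable sends and receives messages in batches, RouteOrch also processes messages in batches. To support this, RouteOrch uses EntityBulker<sai_route_api_t> gRouteBulker to cache the SAI route objects that need to change, and then applies all of those changes to SAI in one go at the end of doTask().
  • Route operations need a lot of other information, such as the state of each port, each neighbor, and each VRF. To get this information, RouteOrch interacts with other Orch objects, such as PortOrch, NeighOrch, and VRFOrch.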
// File: src/sonic-swss/orchagent/routeorch.cpp
void RouteOrch::doTask(Consumer& consumer)
{
    // Calling PortOrch to make sure all ports are ready before processing route messages.
    if (!gPortsOrch->allPortsReady()) { return; }

    // Call doLabelTask() instead, if the incoming messages are from MPLS messages. Otherwise, move on as regular routes.
    ...

    /* Default handling is for ROUTE_TABLE (regular routes) */
    auto it = consumer.m_toSync.begin();
    while (it != consumer.m_toSync.end()) {
        // Add or remove routes with a route bulker
        while (it != consumer.m_toSync.end())
        {
            KeyOpFieldsValuesTuple t = it->second;

            // Parse route operation from the incoming message here.
            string key = kfvKey(t);
            string op = kfvOp(t);
            ...

            // resync application:
            // - When routeorch receives 'resync' message (key = "resync", op = "SET"), it marks all current routes as dirty
            //   and waits for 'resync complete' message. For all newly received routes, if they match current dirty routes,
            //   it unmarks them dirty.
            // - After receiving 'resync complete' (key = "resync", op != "SET") message, it creates all newly added routes
            //   and removes all dirty routes.
            ...

            // Parsing VRF and IP prefix from the incoming message here.
            ...

            // Process regular route operations.
            if (op == SET_COMMAND)
            {
                // Parse and validate route attributes from the incoming message here.
                string ips;
                string aliases;
                ...

                // If the nexthop_group is empty, create the next hop group key based on the IPs and aliases.
                // Otherwise, get the key from the NhgOrch. The result will be stored in the "nhg" variable below.
                NextHopGroupKey& nhg = ctx.nhg;
                ...
                if (nhg_index.empty())
                {
                    // Here the nexthop_group is empty, so we create the next hop group key based on the IPs and aliases.
                    ...

                    string nhg_str = "";
                    if (blackhole) {
                        nhg = NextHopGroupKey();
                    } else if (srv6_nh == true) {
                        ...
                        nhg = NextHopGroupKey(nhg_str, overlay_nh, srv6_nh);
                    } else if (overlay_nh == false) {
                        ...
                        nhg = NextHopGroupKey(nhg_str, weights);
                    } else {
                        ...
                        nhg = NextHopGroupKey(nhg_str, overlay_nh, srv6_nh);
                    }
                }
                else
                {
                    // Here we have a nexthop_group, so we get the key from the NhgOrch.
                    const NhgBase& nh_group = getNhg(nhg_index);
                    nhg = nh_group.getNhgKey();
                    ...
                }
                ...

                // Now we start to create the SAI route entry.
                if (nhg.getSize() == 1 && nhg.hasIntfNextHop())
                {
                    // Skip certain routes, such as not valid, directly routes to tun0, linklocal or multicast routes, etc.
                    ...

                    // Create SAI route entry in addRoute function.
                    if (addRoute(ctx, nhg)) it = consumer.m_toSync.erase(it);
                    else it++;
                }

                /*
                 * Check if the route does not exist or needs to be updated or
                 * if the route is using a temporary next hop group owned by
                 * NhgOrch.
                 */
                else if (m_syncdRoutes.find(vrf_id) == m_syncdRoutes.end() ||
                         m_syncdRoutes.at(vrf_id).find(ip_prefix) == m_syncdRoutes.at(vrf_id).end() ||
                         m_syncdRoutes.at(vrf_id).at(ip_prefix) != RouteNhg(nhg, ctx.nhg_index) ||
                         gRouteBulker.bulk_entry_pending_removal(route_entry) ||
                         ctx.using_temp_nhg)
                {
                    if (addRoute(ctx, nhg)) it = consumer.m_toSync.erase(it);
                    else it++;
                }
                ...
            }
            // Handle other ops, like DEL_COMMAND for route deletion, etc.
            ...
        }

        // Flush the route bulker, so routes will be written to syncd and ASIC
        gRouteBulker.flush();

        // Go through the bulker results.
        // Handle SAI failures, update neighbors, counters, send notifications in add/removeRoutePost functions.
        ...

        /* Remove next hop group if the reference count decreases to zero */
        ...
    }
}
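One pattern in doTask that is easy to miss is the retry behavior: an entry is erased from m_toSync only when its handler succeeds, and otherwise stays in place to be retried on a later pass. A minimal standalone sketch of that idea follows, assuming a plain std::map as a stand-in for m_toSync; PendingTasks and drain are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Stand-in for the pending-task container: key -> operation.
using PendingTasks = std::map<std::string, std::string>;

// Try to handle every pending task once. Tasks whose handler returns true are
// erased; the others stay in the container so a later call can retry them.
// Returns the number of tasks that were handled successfully.
std::size_t drain(PendingTasks& toSync,
                  const std::function<bool(const std::string&, const std::string&)>& handle) {
    std::size_t done = 0;
    for (auto it = toSync.begin(); it != toSync.end();) {
        if (handle(it->first, it->second)) {
            it = toSync.erase(it);  // erase returns the next valid iterator
            ++done;
        } else {
            ++it;                   // keep the entry for the next attempt
        }
    }
    return done;
}
```

For instance, a route whose next hop is not resolved yet could make the handler return false; the route then remains pending and is picked up again the next time the task runs.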

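After parsing the route operation, RouteOrch calls addRoute or removeRoute to create or delete the route. We'll continue with addRoute as the example. Its logic breaks down into a few main parts: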

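  1. Get the next hop information from NeighOrch and check whether the next hop is actually usable.
  2. If the route is new, or is being re-added while it is still pending removal, create a new SAI route object.
  3. If the route already exists, update the existing SAI route object.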
// File: src/sonic-swss/orchagent/routeorch.cpp
bool RouteOrch::addRoute(RouteBulkContext& ctx, const NextHopGroupKey &nextHops)
{
    // Get nexthop information from NeighOrch.
    // We also need to check PortOrch for inband port, IntfsOrch to ensure the related interface is created and etc.
    ...

    // Start to sync the SAI route entry.
    sai_route_entry_t route_entry;
    route_entry.vr_id = vrf_id;
    route_entry.switch_id = gSwitchId;
    copy(route_entry.destination, ipPrefix);

    sai_attribute_t route_attr;
    auto& object_statuses = ctx.object_statuses;

    // Create a new route entry in this case.
    //
    // In case the entry is already pending removal in the bulk, it would be removed from m_syncdRoutes during the bulk call.
    // Therefore, such entries need to be re-created rather than set attribute.
    if (it_route == m_syncdRoutes.at(vrf_id).end() || gRouteBulker.bulk_entry_pending_removal(route_entry)) {
        if (blackhole) {
            route_attr.id = SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION;
            route_attr.value.s32 = SAI_PACKET_ACTION_DROP;
        } else {
            route_attr.id = SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID;
            route_attr.value.oid = next_hop_id;
        }

        /* Default SAI_ROUTE_ATTR_PACKET_ACTION is SAI_PACKET_ACTION_FORWARD */
        object_statuses.emplace_back();
        sai_status_t status = gRouteBulker.create_entry(&object_statuses.back(), &route_entry, 1, &route_attr);
        if (status == SAI_STATUS_ITEM_ALREADY_EXISTS) {
            return false;
        }
    }

    // Update existing route entry in this case.
    else {
        // Set the packet action to forward when there was no next hop (dropped) and not pointing to blackhole.
        if (it_route->second.nhg_key.getSize() == 0 && !blackhole) {
            route_attr.id = SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION;
            route_attr.value.s32 = SAI_PACKET_ACTION_FORWARD;

            object_statuses.emplace_back();
            gRouteBulker.set_entry_attribute(&object_statuses.back(), &route_entry, &route_attr);
        }

        // Only 1 case is listed here as an example. Other cases are handled with similar logic by calling set_entry_attribute as well.
        ...
    }
    ...
}
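The key decision in addRoute, whether to create a new entry or set an attribute on an existing one, can be isolated into a few lines. The following is a simplified sketch of that decision only, under the assumption that route state can be reduced to two sets of prefixes; RouteState, BulkOp and chooseOp are hypothetical names and not part of RouteOrch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Which bulk operation a route update should be queued as.
enum class BulkOp { Create, SetAttribute };

// Hypothetical, reduced view of the state that the decision depends on.
struct RouteState {
    std::set<std::string> synced;          // prefixes already programmed
    std::set<std::string> pendingRemoval;  // prefixes queued for removal in the current bulk
};

// A route is created when it is unknown, or when it is known but about to be
// removed by the pending bulk (so it has to be re-created afterwards).
// Otherwise the existing entry is updated in place.
BulkOp chooseOp(const RouteState& state, const std::string& prefix) {
    const bool known = state.synced.count(prefix) > 0;
    const bool removing = state.pendingRemoval.count(prefix) > 0;
    return (!known || removing) ? BulkOp::Create : BulkOp::SetAttribute;
}
```

The second condition is the subtle one: a prefix that is deleted and re-added within the same batch still looks "known", yet it must go through create rather than set.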

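Once all routes have been created and set, RouteOrch calls gRouteBulker.flush() to write all of them into ASIC_DB. flush() is simple: it processes the requests in batches. By default each batch holds 1000 entries; this default is defined in OrchDaemon and passed in through the constructor: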

// File: src/sonic-swss/orchagent/orchdaemon.cpp
#define DEFAULT_MAX_BULK_SIZE 1000
size_t gMaxBulkSize = DEFAULT_MAX_BULK_SIZE;

// File: src/sonic-swss/orchagent/bulker.h
template <typename T>
class EntityBulker
{
public:
    using Ts = SaiBulkerTraits<T>;
    using Te = typename Ts::entry_t;
    ...

    void flush()
    {
        // Bulk remove entries
        if (!removing_entries.empty()) {
            // Split into batches of max_bulk_size, then call flush. Similar to creating_entries, so details are omitted.
            std::vector<Te> rs;
            ...
            flush_removing_entries(rs);
            removing_entries.clear();
        }

        // Bulk create entries
        if (!creating_entries.empty()) {
            // Split into batches of max_bulk_size, then call flush_creating_entries to call SAI batch create API to create
            // the objects in batch.
            std::vector<Te> rs;
            std::vector<sai_attribute_t const*> tss;
            std::vector<uint32_t> cs;

            for (auto const& i: creating_entries) {
                // creating_entries maps each entry to its (attributes, status pointer) pair.
                auto const& entry = i.first;
                auto const& attrs = i.second.first;
                if (*i.second.second == SAI_STATUS_NOT_EXECUTED) {
                    rs.push_back(entry);
                    tss.push_back(attrs.data());
                    cs.push_back((uint32_t)attrs.size());

                    // Batch create here.
                    if (rs.size() >= max_bulk_size) {
                        flush_creating_entries(rs, tss, cs);
                    }
                }
            }

            flush_creating_entries(rs, tss, cs);
            creating_entries.clear();
        }

        // Bulk update existing entries
        if (!setting_entries.empty()) {
            // Split into batches of max_bulk_size, then call flush. Similar to creating_entries, so details are omitted.
            std::vector<Te> rs;
            std::vector<sai_attribute_t> ts;
            std::vector<sai_status_t*> status_vector;
            ...
            flush_setting_entries(rs, ts, status_vector);
            setting_entries.clear();
        }
    }

    sai_status_t flush_creating_entries(
        _Inout_ std::vector<Te> &rs,
        _Inout_ std::vector<sai_attribute_t const*> &tss,
        _Inout_ std::vector<uint32_t> &cs)
    {
        ...

        // Call SAI bulk create API
        size_t count = rs.size();
        std::vector<sai_status_t> statuses(count);
        sai_status_t status = (*create_entries)((uint32_t)count, rs.data(), cs.data(), tss.data()
            , SAI_BULK_OP_ERROR_MODE_IGNORE_ERROR, statuses.data());

        // Set results back to input entries and clean up the batch below.
        for (size_t ir = 0; ir < count; ir++) {
            auto& entry = rs[ir];
            sai_status_t *object_status = creating_entries[entry].second;
            if (object_status) {
                *object_status = statuses[ir];
            }
        }

        rs.clear(); tss.clear(); cs.clear();
        return status;
    }

    // flush_removing_entries and flush_setting_entries are similar to flush_creating_entries, so we omit them here.
    ...
};
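The batching in flush() boils down to "accumulate until the batch is full, send, and send whatever is left at the end". Here is a minimal generic sketch of that idea, independent of SAI; flushInBatches and its parameters are hypothetical names, and the callback stands in for the bulk API call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Send `pending` to `sendBatch` in chunks of at most `maxBulkSize` entries.
// `sendBatch` is any callable taking `const std::vector<Entry>&`.
// Returns the number of batches that were sent.
template <typename Entry, typename Send>
std::size_t flushInBatches(const std::vector<Entry>& pending,
                           std::size_t maxBulkSize, Send sendBatch) {
    std::vector<Entry> batch;
    std::size_t calls = 0;
    for (const auto& entry : pending) {
        batch.push_back(entry);
        if (batch.size() >= maxBulkSize) {  // batch is full: send it now
            sendBatch(batch);
            batch.clear();
            ++calls;
        }
    }
    if (!batch.empty()) {                   // send the final, partial batch
        sendBatch(batch);
        ++calls;
    }
    return calls;
}
```

With the default batch size of 1000, 2500 pending routes would go out as three bulk calls of 1000, 1000 and 500 entries.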

1.3. SAI object forwarding in orchagent

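Careful readers will have noticed something odd: why does EntityBulker look like it is calling the SAI API directly? Shouldn't those calls happen in syncd? If we trace the SAI API object passed into EntityBulker, we even find that sai_route_api_t is the SAI interface itself, and that orchagent contains SAI initialization code, as shown below: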

// File: src/sonic-sairedis/debian/libsaivs-dev/usr/include/sai/sairoute.h
/**
 * @brief Router entry methods table retrieved with sai_api_query()
 */
typedef struct _sai_route_api_t
{
    sai_create_route_entry_fn                   create_route_entry;
    sai_remove_route_entry_fn                   remove_route_entry;
    sai_set_route_entry_attribute_fn            set_route_entry_attribute;
    sai_get_route_entry_attribute_fn            get_route_entry_attribute;

    sai_bulk_create_route_entry_fn              create_route_entries;
    sai_bulk_remove_route_entry_fn              remove_route_entries;
    sai_bulk_set_route_entry_attribute_fn       set_route_entries_attribute;
    sai_bulk_get_route_entry_attribute_fn       get_route_entries_attribute;
} sai_route_api_t;

// File: src/sonic-swss/orchagent/saihelper.cpp
void initSaiApi()
{
    SWSS_LOG_ENTER();

    if (ifstream(CONTEXT_CFG_FILE))
    {
        SWSS_LOG_NOTICE("Context config file %s exists", CONTEXT_CFG_FILE);
        gProfileMap[SAI_REDIS_KEY_CONTEXT_CONFIG] = CONTEXT_CFG_FILE;
    }

    sai_api_initialize(0, (const sai_service_method_table_t *)&test_services);
    sai_api_query(SAI_API_SWITCH, (void **)&sai_switch_api);
    ...
    sai_api_query(SAI_API_NEIGHBOR, (void **)&sai_neighbor_api);
    sai_api_query(SAI_API_NEXT_HOP, (void **)&sai_next_hop_api);
    sai_api_query(SAI_API_NEXT_HOP_GROUP, (void **)&sai_next_hop_group_api);
    sai_api_query(SAI_API_ROUTE, (void **)&sai_route_api);
    ...

    sai_log_set(SAI_API_SWITCH, SAI_LOG_LEVEL_NOTICE);
    ...
    sai_log_set(SAI_API_NEIGHBOR, SAI_LOG_LEVEL_NOTICE);
    sai_log_set(SAI_API_NEXT_HOP, SAI_LOG_LEVEL_NOTICE);
    sai_log_set(SAI_API_NEXT_HOP_GROUP, SAI_LOG_LEVEL_NOTICE);
    sai_log_set(SAI_API_ROUTE, SAI_LOG_LEVEL_NOTICE);
    ...
}

This code can be quite confusing at first sight. But don't worry: this is simply the SAI object forwarding mechanism in orchagent.

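Anyone familiar with RPC will recognize the proxy-stub pattern: a single interface defines the calls shared by both sides, the caller side implements serialization and sending, and the receiver side implements receiving, deserialization, and dispatch. SONiC does something similar: it uses the SAI API itself as the shared interface, provides an implementation that serializes and sends for orchagent to call, and implements receiving, deserialization, and dispatch in syncd.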
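For readers who have not met the pattern before, here is a toy sketch of proxy-stub in a single process. It is not how sairedis is implemented: RouteApi, RouteTable, Stub and Proxy are hypothetical names, the space-separated string is an invented wire format, and a direct function call stands in for the real transport between orchagent and syncd.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// The shared interface that both sides agree on.
struct RouteApi {
    virtual ~RouteApi() = default;
    virtual int createRoute(const std::string& prefix, const std::string& nextHop) = 0;
};

// Receiver side: the real implementation that owns the state.
struct RouteTable : RouteApi {
    std::map<std::string, std::string> routes;
    int createRoute(const std::string& prefix, const std::string& nextHop) override {
        if (routes.count(prefix) > 0) return -1;  // already exists
        routes[prefix] = nextHop;
        return 0;
    }
};

// Stub: receives a serialized call, deserializes it and dispatches it.
struct Stub {
    RouteApi& impl;
    explicit Stub(RouteApi& target) : impl(target) {}
    int handle(const std::string& wire) {
        std::istringstream in(wire);
        std::string op, prefix, nextHop;
        in >> op >> prefix >> nextHop;
        if (op == "create_route") return impl.createRoute(prefix, nextHop);
        return -2;  // unknown operation
    }
};

// Proxy: exposes the same interface to the caller, but only serializes and sends.
struct Proxy : RouteApi {
    Stub& remote;  // stands in for the communication channel
    explicit Proxy(Stub& channel) : remote(channel) {}
    int createRoute(const std::string& prefix, const std::string& nextHop) override {
        return remote.handle("create_route " + prefix + " " + nextHop);
    }
};
```

The caller only ever sees RouteApi, so code written against the interface cannot tell whether it is talking to the real implementation or to a proxy, which is what makes EntityBulker look as if it were calling SAI directly.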

Here, the sending side is called ClientSai and is implemented in src/sonic-sairedis/lib/ClientSai.*. Serialization and deserialization are implemented in the SAI metadata, in src/sonic-sairedis/meta/sai_serialize.h:

// File: src/sonic-sairedis/lib/ClientSai.h
namespace sairedis
{
    class ClientSai:
        public sairedis::SaiInterface
    {
        ...
    };
}

// File: src/sonic-sairedis/meta/sai_serialize.h
// Serialize
std::string sai_serialize_route_entry(_In_ const sai_route_entry_t &route_entry);
...

// Deserialize
void sai_deserialize_route_entry(_In_ const std::string& s, _In_ sai_route_entry_t &route_entry);
...

When orchagent is built, it links against libsairedis, so that calling a SAI API serializes the SAI object and sends it:

# File: src/sonic-swss/orchagent/Makefile.am
orchagent_LDADD = $(LDFLAGS_ASAN) -lnl-3 -lnl-route-3 -lpthread -lsairedis -lsaimeta -lsaimetadata -lswsscommon -lzmq

We'll use Bulk Create as the example to see how ClientSai implements serialization and sending:

// File: src/sonic-sairedis/lib/ClientSai.cpp
sai_status_t ClientSai::bulkCreate(
    _In_ sai_object_type_t object_type,
    _In_ sai_object_id_t switch_id,
    _In_ uint32_t object_count,
    _In_ const uint32_t *attr_count,
    _In_ const sai_attribute_t **attr_list,
    _In_ sai_bulk_op_error_mode_t mode,
    _Out_ sai_object_id_t *object_id,
    _Out_ sai_status_t *object_statuses)
{
    MUTEX();
    REDIS_CHECK_API_INITIALIZED();

    std::vector<std::string> serialized_object_ids;

    // Server is responsible for generate new OID but for that we need switch ID
    // to be sent to server as well, so instead of sending empty oids we will
    // send switch IDs
    for (uint32_t idx = 0; idx < object_count; idx++) {
        serialized_object_ids.emplace_back(sai_serialize_object_id(switch_id));
    }
    auto status = bulkCreate(object_type, serialized_object_ids, attr_count, attr_list, mode, object_statuses);

    // Since user requested create, OID value was created remotely and it was returned in m_lastCreateOids
    for (uint32_t idx = 0; idx < object_count; idx++) {
        if (object_statuses[idx] == SAI_STATUS_SUCCESS) {
            object_id[idx] = m_lastCreateOids.at(idx);
        } else {
            object_id[idx] = SAI_NULL_OBJECT_ID;
        }
    }

    return status;
}

sai_status_t ClientSai::bulkCreate(
    _In_ sai_object_type_t object_type,
    _In_ const std::vector<std::string> &serialized_object_ids,
    _In_ const uint32_t *attr_count,
    _In_ const sai_attribute_t **attr_list,
    _In_ sai_bulk_op_error_mode_t mode,
    _Inout_ sai_status_t *object_statuses)
{
    ...

    // Calling SAI serialize APIs to serialize all objects
    std::string str_object_type = sai_serialize_object_type(object_type);
    std::vector<swss::FieldValueTuple> entries;
    for (size_t idx = 0; idx < serialized_object_ids.size(); ++idx) {
        auto entry = SaiAttributeList::serialize_attr_list(object_type, attr_count[idx], attr_list[idx], false);
        if (entry.empty()) {
            swss::FieldValueTuple null("NULL", "NULL");
            entry.push_back(null);
        }

        std::string str_attr = Globals::joinFieldValues(entry);
        swss::FieldValueTuple fvtNoStatus(serialized_object_ids[idx], str_attr);
        entries.push_back(fvtNoStatus);
    }
    std::string key = str_object_type + ":" + std::to_string(entries.size());

    // Send to syncd via the communication channel.
    m_communicationChannel->set(key, entries, REDIS_ASIC_STATE_COMMAND_BULK_CREATE);

    // Wait for response from syncd.
    return waitForBulkResponse(SAI_COMMON_API_BULK_CREATE, (uint32_t)serialized_object_ids.size(), object_statuses);
}
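To visualize what ends up on the channel, the sketch below rebuilds the same key and entry layout with plain strings: one key of the form "object type:count", and one (object id, joined attributes) pair per object. It is only an approximation under stated assumptions: FieldValue, BulkRequest and makeBulkCreate are hypothetical names, and the "field=value" pairs joined by "|" are an assumed output format for the join step, not something confirmed by the code above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for swss::FieldValueTuple.
using FieldValue = std::pair<std::string, std::string>;

// Assumed join format: "field=value|field=value".
std::string joinFieldValues(const std::vector<FieldValue>& attrs) {
    std::string out;
    for (const auto& fv : attrs) {
        if (!out.empty()) out += '|';
        out += fv.first + "=" + fv.second;
    }
    return out;
}

// What would be handed to the channel for one bulk create request.
struct BulkRequest {
    std::string key;                  // "<object type>:<object count>"
    std::vector<FieldValue> entries;  // (serialized object id, joined attributes)
};

BulkRequest makeBulkCreate(
    const std::string& objectType,
    const std::vector<std::pair<std::string, std::vector<FieldValue>>>& objects) {
    BulkRequest req;
    for (const auto& obj : objects) {
        std::vector<FieldValue> attrs = obj.second;
        // Objects without attributes still need a placeholder, as in the code above.
        if (attrs.empty()) attrs.emplace_back("NULL", "NULL");
        req.entries.emplace_back(obj.first, joinFieldValues(attrs));
    }
    req.key = objectType + ":" + std::to_string(req.entries.size());
    return req;
}
```

Because all objects travel under a single key, the receiving side can learn both the object type and how many entries to expect before it starts deserializing them.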

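In the end, ClientSai calls m_communicationChannel->set() to send the serialized SAI objects to syncd. Up to the 202106 release, this channel was the Redis-based ProducerTable. Possibly for efficiency reasons, starting with the 202111 release the channel was changed to ZMQ.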

// File: https://github.com/sonic-net/sonic-sairedis/blob/202106/lib/inc/RedisChannel.h
class RedisChannel: public Channel
{
    ...

    /**
     * @brief Asic state channel.
     *
     * Used to send commands like create/remove/set/get to syncd.
     */
    std::shared_ptr<swss::ProducerTable> m_asicState;

    ...
};

// File: src/sonic-sairedis/lib/ClientSai.cpp
sai_status_t ClientSai::initialize(
    _In_ uint64_t flags,
    _In_ const sai_service_method_table_t *service_method_table)
{
    ...

    m_communicationChannel = std::make_shared<ZeroMQChannel>(
        cc->m_zmqEndpoint,
        cc->m_zmqNtfEndpoint,
        std::bind(&ClientSai::handleNotification, this, _1, _2, _3));

    m_apiInitialized = true;

    return SAI_STATUS_SUCCESS;
}

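We won't go over the inter-process communication methods again here; see the inter-process communication mechanisms described in Chapter 4.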

1.4. syncd updates the ASIC

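Finally, once the SAI objects have been generated and sent to syncd, syncd receives and processes them, updates ASIC_DB, and then updates the ASIC. We have already covered this part of the workflow in detail in the Syncd-SAI workflow post, so we won't repeat it here.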

2. References

  1. SONiC Architecture
  2. GitHub repo: sonic-swss
  3. GitHub repo: sonic-swss-common
  4. GitHub repo: sonic-sairedis

Original article. When reposting, please credit the source: Soul Orbit