* [PATCH net-next v2 0/7] Support for nexthop group statistics
@ 2024-02-29 18:16 Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
` (6 more replies)
0 siblings, 7 replies; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
ECMP is a fundamental component in L3 designs. However, it's fragile. Many
factors influence whether an ECMP group will operate as intended: hash
policy (i.e. the set of fields that contribute to ECMP hash calculation),
neighbor validity, hash seed (which might lead to polarization) or the type
of ECMP group used (hash-threshold or resilient).
At the same time, collection of statistics that would help an operator
determine that the group performs as desired, is difficult.
A solution that we present in this patchset is to add counters to next hop
group entries. For SW-datapath deployments, this will on its own allow
collection and evaluation of relevant statistics. For HW-datapath
deployments, we further add a way to request that HW counters be installed
for a given group, in-kernel interfaces to collect the HW statistics, and
netlink interfaces to query them.
For example:
# ip nexthop replace id 4000 group 4001/4002 hw_stats on
# ip -s -d nexthop show id 4000
id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
stats:
id 4001 packets 5002 packets_hw 5000
id 4002 packets 4999 packets_hw 4999
The point of the patchset is visibility of ECMP balance, and that is
influenced by packet headers, not their payload. Correspondingly, we only
include packet counters in the statistics, not byte counters.
We also decided to model HW statistics as a nexthop group attribute, not an
arbitrary nexthop one. The latter would count any traffic going through a
given nexthop, regardless of which ECMP group it is in, or any at all. The
reason is again hat the point of the patchset is ECMP balance visibility,
not arbitrary inspection of how busy a particular nexthop is.
Implementation of individual-nexthop statistics is certainly possible, and
could well follow the general approach we are taking in this patchset.
For resilient groups, per-bucket statistics could be done in a similar
manner as well.
This patchset contains the core code. mlxsw support will be sent in a
follow-up patch set.
This patchset progresses as follows:
- Patches #1 and #2 add support for a new next-hop object attribute,
NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in
particular whether SW- and HW-collected nexthop stats should be part of
the get or dump response. The idea is to avoid wasting message space, and
time for collection of HW statistics, when the values are not needed.
- Patches #3 and #4 add SW-datapath stats and corresponding UAPI.
- Patches #5, #6 and #7 add support fro HW-datapath stats and UAPI.
Individual drivers still need to contribute the appropriate HW-specific
support code.
v2:
- Patch #2:
- Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
- Patch #3:
- Set err on nexthop_create_group() error path
- Patch #4:
- Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
- Rename jump target in nla_put_nh_group_stats() to avoid
having to rename further in the patchset.
- Patch #7:
- Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
- Do not cancel outside of nesting in nla_put_nh_group_stats()
Ido Schimmel (5):
net: nexthop: Add nexthop group entry stats
net: nexthop: Expose nexthop group stats to user space
net: nexthop: Add hardware statistics notifications
net: nexthop: Add ability to enable / disable hardware statistics
net: nexthop: Expose nexthop group HW stats to user space
Petr Machata (2):
net: nexthop: Adjust netlink policy parsing for a new attribute
net: nexthop: Add NHA_OP_FLAGS
include/net/nexthop.h | 29 ++++
include/uapi/linux/nexthop.h | 45 +++++
net/ipv4/nexthop.c | 317 ++++++++++++++++++++++++++++++-----
3 files changed, 351 insertions(+), 40 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:31 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS Petr Machata
` (5 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
A following patch will introduce a new attribute, op-specific flags to
adjust the behavior of an operation. Different operations will recognize
different flags.
- To make the differentiation possible, stop sharing the policies for get
and del operations.
- To allow querying for presence of the attribute, have all the attribute
arrays sized to NHA_MAX, regardless of what is permitted by policy, and
pass the corresponding value to nlmsg_parse() as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
net/ipv4/nexthop.c | 58 ++++++++++++++++++++++------------------------
1 file changed, 28 insertions(+), 30 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 70509da4f080..bcd4df2f1cad 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -43,6 +43,10 @@ static const struct nla_policy rtm_nh_policy_get[] = {
[NHA_ID] = { .type = NLA_U32 },
};
+static const struct nla_policy rtm_nh_policy_del[] = {
+ [NHA_ID] = { .type = NLA_U32 },
+};
+
static const struct nla_policy rtm_nh_policy_dump[] = {
[NHA_OIF] = { .type = NLA_U32 },
[NHA_GROUPS] = { .type = NLA_FLAG },
@@ -2966,9 +2970,9 @@ static int rtm_new_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}
-static int __nh_valid_get_del_req(const struct nlmsghdr *nlh,
- struct nlattr **tb, u32 *id,
- struct netlink_ext_ack *extack)
+static int nh_valid_get_del_req(const struct nlmsghdr *nlh,
+ struct nlattr **tb, u32 *id,
+ struct netlink_ext_ack *extack)
{
struct nhmsg *nhm = nlmsg_data(nlh);
@@ -2991,26 +2995,12 @@ static int __nh_valid_get_del_req(const struct nlmsghdr *nlh,
return 0;
}
-static int nh_valid_get_del_req(const struct nlmsghdr *nlh, u32 *id,
- struct netlink_ext_ack *extack)
-{
- struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_get)];
- int err;
-
- err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
- ARRAY_SIZE(rtm_nh_policy_get) - 1,
- rtm_nh_policy_get, extack);
- if (err < 0)
- return err;
-
- return __nh_valid_get_del_req(nlh, tb, id, extack);
-}
-
/* rtnl */
static int rtm_del_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
+ struct nlattr *tb[NHA_MAX + 1];
struct nl_info nlinfo = {
.nlh = nlh,
.nl_net = net,
@@ -3020,7 +3010,12 @@ static int rtm_del_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
int err;
u32 id;
- err = nh_valid_get_del_req(nlh, &id, extack);
+ err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
+ rtm_nh_policy_del, extack);
+ if (err < 0)
+ return err;
+
+ err = nh_valid_get_del_req(nlh, tb, &id, extack);
if (err)
return err;
@@ -3038,12 +3033,18 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(in_skb->sk);
+ struct nlattr *tb[NHA_MAX + 1];
struct sk_buff *skb = NULL;
struct nexthop *nh;
int err;
u32 id;
- err = nh_valid_get_del_req(nlh, &id, extack);
+ err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
+ rtm_nh_policy_get, extack);
+ if (err < 0)
+ return err;
+
+ err = nh_valid_get_del_req(nlh, tb, &id, extack);
if (err)
return err;
@@ -3157,11 +3158,10 @@ static int nh_valid_dump_req(const struct nlmsghdr *nlh,
struct nh_dump_filter *filter,
struct netlink_callback *cb)
{
- struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_dump)];
+ struct nlattr *tb[NHA_MAX + 1];
int err;
- err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
- ARRAY_SIZE(rtm_nh_policy_dump) - 1,
+ err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
rtm_nh_policy_dump, cb->extack);
if (err < 0)
return err;
@@ -3300,11 +3300,10 @@ static int nh_valid_dump_bucket_req(const struct nlmsghdr *nlh,
struct netlink_callback *cb)
{
struct nlattr *res_tb[ARRAY_SIZE(rtm_nh_res_bucket_policy_dump)];
- struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_dump_bucket)];
+ struct nlattr *tb[NHA_MAX + 1];
int err;
- err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
- ARRAY_SIZE(rtm_nh_policy_dump_bucket) - 1,
+ err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
rtm_nh_policy_dump_bucket, NULL);
if (err < 0)
return err;
@@ -3474,16 +3473,15 @@ static int nh_valid_get_bucket_req(const struct nlmsghdr *nlh,
u32 *id, u16 *bucket_index,
struct netlink_ext_ack *extack)
{
- struct nlattr *tb[ARRAY_SIZE(rtm_nh_policy_get_bucket)];
+ struct nlattr *tb[NHA_MAX + 1];
int err;
- err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb,
- ARRAY_SIZE(rtm_nh_policy_get_bucket) - 1,
+ err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
rtm_nh_policy_get_bucket, extack);
if (err < 0)
return err;
- err = __nh_valid_get_del_req(nlh, tb, id, extack);
+ err = nh_valid_get_del_req(nlh, tb, id, extack);
if (err)
return err;
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:36 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats Petr Machata
` (4 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
In order to add per-nexthop statistics, but still not increase netlink
message size for consumers that do not care about them, there needs to be a
toggle through which the user indicates their desire to get the statistics.
To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to
use the attribute for carrying of arbitrary operation-specific flags, i.e.
not make it specific for get / dump.
Add the new attribute to get and dump policies, but do not actually allow
any flags yet -- those will come later as the flags themselves are defined.
Add the necessary parsing code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
Notes:
v2:
- Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
include/uapi/linux/nexthop.h | 3 +++
net/ipv4/nexthop.c | 24 ++++++++++++++++++++----
2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index d8ffa8c9ca78..086444e2946c 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -60,6 +60,9 @@ enum {
/* nested; nexthop bucket attributes */
NHA_RES_BUCKET,
+ /* u32; operation-specific flags */
+ NHA_OP_FLAGS,
+
__NHA_MAX,
};
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index bcd4df2f1cad..816ae8ee3e06 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -41,6 +41,7 @@ static const struct nla_policy rtm_nh_policy_new[] = {
static const struct nla_policy rtm_nh_policy_get[] = {
[NHA_ID] = { .type = NLA_U32 },
+ [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
};
static const struct nla_policy rtm_nh_policy_del[] = {
@@ -52,6 +53,7 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
[NHA_GROUPS] = { .type = NLA_FLAG },
[NHA_MASTER] = { .type = NLA_U32 },
[NHA_FDB] = { .type = NLA_FLAG },
+ [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
};
static const struct nla_policy rtm_nh_res_policy_new[] = {
@@ -2971,7 +2973,7 @@ static int rtm_new_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
}
static int nh_valid_get_del_req(const struct nlmsghdr *nlh,
- struct nlattr **tb, u32 *id,
+ struct nlattr **tb, u32 *id, u32 *op_flags,
struct netlink_ext_ack *extack)
{
struct nhmsg *nhm = nlmsg_data(nlh);
@@ -2992,6 +2994,11 @@ static int nh_valid_get_del_req(const struct nlmsghdr *nlh,
return -EINVAL;
}
+ if (tb[NHA_OP_FLAGS])
+ *op_flags = nla_get_u32(tb[NHA_OP_FLAGS]);
+ else
+ *op_flags = 0;
+
return 0;
}
@@ -3007,6 +3014,7 @@ static int rtm_del_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
.portid = NETLINK_CB(skb).portid,
};
struct nexthop *nh;
+ u32 op_flags;
int err;
u32 id;
@@ -3015,7 +3023,7 @@ static int rtm_del_nexthop(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
return err;
- err = nh_valid_get_del_req(nlh, tb, &id, extack);
+ err = nh_valid_get_del_req(nlh, tb, &id, &op_flags, extack);
if (err)
return err;
@@ -3036,6 +3044,7 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh,
struct nlattr *tb[NHA_MAX + 1];
struct sk_buff *skb = NULL;
struct nexthop *nh;
+ u32 op_flags;
int err;
u32 id;
@@ -3044,7 +3053,7 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh,
if (err < 0)
return err;
- err = nh_valid_get_del_req(nlh, tb, &id, extack);
+ err = nh_valid_get_del_req(nlh, tb, &id, &op_flags, extack);
if (err)
return err;
@@ -3080,6 +3089,7 @@ struct nh_dump_filter {
bool group_filter;
bool fdb_filter;
u32 res_bucket_nh_id;
+ u32 op_flags;
};
static bool nh_dump_filtered(struct nexthop *nh,
@@ -3151,6 +3161,11 @@ static int __nh_valid_dump_req(const struct nlmsghdr *nlh, struct nlattr **tb,
return -EINVAL;
}
+ if (tb[NHA_OP_FLAGS])
+ filter->op_flags = nla_get_u32(tb[NHA_OP_FLAGS]);
+ else
+ filter->op_flags = 0;
+
return 0;
}
@@ -3474,6 +3489,7 @@ static int nh_valid_get_bucket_req(const struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[NHA_MAX + 1];
+ u32 op_flags;
int err;
err = nlmsg_parse(nlh, sizeof(struct nhmsg), tb, NHA_MAX,
@@ -3481,7 +3497,7 @@ static int nh_valid_get_bucket_req(const struct nlmsghdr *nlh,
if (err < 0)
return err;
- err = nh_valid_get_del_req(nlh, tb, id, extack);
+ err = nh_valid_get_del_req(nlh, tb, id, &op_flags, extack);
if (err)
return err;
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:36 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
` (3 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
From: Ido Schimmel <idosch@nvidia.com>
Add nexthop group entry stats to count the number of packets forwarded
via each nexthop in the group. The stats will be exposed to user space
for better data path observability in the next patch.
The per-CPU stats pointer is placed at the beginning of 'struct
nh_grp_entry', so that all the fields accessed for the data path reside
on the same cache line:
struct nh_grp_entry {
struct nexthop * nh; /* 0 8 */
struct nh_grp_entry_stats * stats; /* 8 8 */
u8 weight; /* 16 1 */
/* XXX 7 bytes hole, try to pack */
union {
struct {
atomic_t upper_bound; /* 24 4 */
} hthr; /* 24 4 */
struct {
struct list_head uw_nh_entry; /* 24 16 */
u16 count_buckets; /* 40 2 */
u16 wants_buckets; /* 42 2 */
} res; /* 24 24 */
}; /* 24 24 */
struct list_head nh_list; /* 48 16 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct nexthop * nh_parent; /* 64 8 */
/* size: 72, cachelines: 2, members: 6 */
/* sum members: 65, holes: 1, sum holes: 7 */
/* last cacheline: 8 bytes */
};
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
Notes:
v2:
- Set err on nexthop_create_group() error path
include/net/nexthop.h | 6 ++++++
net/ipv4/nexthop.c | 25 +++++++++++++++++++++----
2 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 77e99cba60ad..4bf1875445d8 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -95,8 +95,14 @@ struct nh_res_table {
struct nh_res_bucket nh_buckets[] __counted_by(num_nh_buckets);
};
+struct nh_grp_entry_stats {
+ u64 packets;
+ struct u64_stats_sync syncp;
+};
+
struct nh_grp_entry {
struct nexthop *nh;
+ struct nh_grp_entry_stats __percpu *stats;
u8 weight;
union {
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 816ae8ee3e06..4be66622e24f 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -480,6 +480,7 @@ static void nexthop_free_group(struct nexthop *nh)
struct nh_grp_entry *nhge = &nhg->nh_entries[i];
WARN_ON(!list_empty(&nhge->nh_list));
+ free_percpu(nhge->stats);
nexthop_put(nhge->nh);
}
@@ -1182,6 +1183,7 @@ static struct nexthop *nexthop_select_path_fdb(struct nh_group *nhg, int hash)
if (hash > atomic_read(&nhge->hthr.upper_bound))
continue;
+ this_cpu_inc(nhge->stats->packets);
return nhge->nh;
}
@@ -1191,7 +1193,7 @@ static struct nexthop *nexthop_select_path_fdb(struct nh_group *nhg, int hash)
static struct nexthop *nexthop_select_path_hthr(struct nh_group *nhg, int hash)
{
- struct nexthop *rc = NULL;
+ struct nh_grp_entry *nhge0 = NULL;
int i;
if (nhg->fdb_nh)
@@ -1206,16 +1208,20 @@ static struct nexthop *nexthop_select_path_hthr(struct nh_group *nhg, int hash)
if (!nexthop_is_good_nh(nhge->nh))
continue;
- if (!rc)
- rc = nhge->nh;
+ if (!nhge0)
+ nhge0 = nhge;
if (hash > atomic_read(&nhge->hthr.upper_bound))
continue;
+ this_cpu_inc(nhge->stats->packets);
return nhge->nh;
}
- return rc ? : nhg->nh_entries[0].nh;
+ if (!nhge0)
+ nhge0 = &nhg->nh_entries[0];
+ this_cpu_inc(nhge0->stats->packets);
+ return nhge0->nh;
}
static struct nexthop *nexthop_select_path_res(struct nh_group *nhg, int hash)
@@ -1231,6 +1237,7 @@ static struct nexthop *nexthop_select_path_res(struct nh_group *nhg, int hash)
bucket = &res_table->nh_buckets[bucket_index];
nh_res_bucket_set_busy(bucket);
nhge = rcu_dereference(bucket->nh_entry);
+ this_cpu_inc(nhge->stats->packets);
return nhge->nh;
}
@@ -1804,6 +1811,7 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
newg->has_v4 = true;
list_del(&nhges[i].nh_list);
+ new_nhges[j].stats = nhges[i].stats;
new_nhges[j].nh_parent = nhges[i].nh_parent;
new_nhges[j].nh = nhges[i].nh;
new_nhges[j].weight = nhges[i].weight;
@@ -1819,6 +1827,7 @@ static void remove_nh_grp_entry(struct net *net, struct nh_grp_entry *nhge,
rcu_assign_pointer(nhp->nh_grp, newg);
list_del(&nhge->nh_list);
+ free_percpu(nhge->stats);
nexthop_put(nhge->nh);
/* Removal of a NH from a resilient group is notified through
@@ -2483,6 +2492,13 @@ static struct nexthop *nexthop_create_group(struct net *net,
if (nhi->family == AF_INET)
nhg->has_v4 = true;
+ nhg->nh_entries[i].stats =
+ netdev_alloc_pcpu_stats(struct nh_grp_entry_stats);
+ if (!nhg->nh_entries[i].stats) {
+ err = -ENOMEM;
+ nexthop_put(nhe);
+ goto out_no_nh;
+ }
nhg->nh_entries[i].nh = nhe;
nhg->nh_entries[i].weight = entry[i].weight + 1;
list_add(&nhg->nh_entries[i].nh_list, &nhe->grp_list);
@@ -2522,6 +2538,7 @@ static struct nexthop *nexthop_create_group(struct net *net,
out_no_nh:
for (i--; i >= 0; --i) {
list_del(&nhg->nh_entries[i].nh_list);
+ free_percpu(nhg->nh_entries[i].stats);
nexthop_put(nhg->nh_entries[i].nh);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
` (2 preceding siblings ...)
2024-02-29 18:16 ` [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:45 ` David Ahern
2024-03-01 17:32 ` Eric Dumazet
2024-02-29 18:16 ` [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications Petr Machata
` (2 subsequent siblings)
6 siblings, 2 replies; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
From: Ido Schimmel <idosch@nvidia.com>
Add netlink support for reading NH group stats.
This data is only for statistics of the traffic in the SW datapath. HW
nexthop group statistics will be added in the following patches.
Emission of the stats is keyed to a new op_stats flag to avoid cluttering
the netlink message with stats if the user doesn't need them:
NHA_OP_FLAG_DUMP_STATS.
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
Notes:
v2:
- Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
- Rename jump target in nla_put_nh_group_stats() to avoid
having to rename further in the patchset.
include/uapi/linux/nexthop.h | 30 ++++++++++++
net/ipv4/nexthop.c | 92 ++++++++++++++++++++++++++++++++----
2 files changed, 114 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index 086444e2946c..f4db63c17085 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -30,6 +30,8 @@ enum {
#define NEXTHOP_GRP_TYPE_MAX (__NEXTHOP_GRP_TYPE_MAX - 1)
+#define NHA_OP_FLAG_DUMP_STATS BIT(0)
+
enum {
NHA_UNSPEC,
NHA_ID, /* u32; id for nexthop. id == 0 means auto-assign */
@@ -63,6 +65,9 @@ enum {
/* u32; operation-specific flags */
NHA_OP_FLAGS,
+ /* nested; nexthop group stats */
+ NHA_GROUP_STATS,
+
__NHA_MAX,
};
@@ -104,4 +109,29 @@ enum {
#define NHA_RES_BUCKET_MAX (__NHA_RES_BUCKET_MAX - 1)
+enum {
+ NHA_GROUP_STATS_UNSPEC,
+
+ /* nested; nexthop group entry stats */
+ NHA_GROUP_STATS_ENTRY,
+
+ __NHA_GROUP_STATS_MAX,
+};
+
+#define NHA_GROUP_STATS_MAX (__NHA_GROUP_STATS_MAX - 1)
+
+enum {
+ NHA_GROUP_STATS_ENTRY_UNSPEC,
+
+ /* u32; nexthop id of the nexthop group entry */
+ NHA_GROUP_STATS_ENTRY_ID,
+
+ /* uint; number of packets forwarded via the nexthop group entry */
+ NHA_GROUP_STATS_ENTRY_PACKETS,
+
+ __NHA_GROUP_STATS_ENTRY_MAX,
+};
+
+#define NHA_GROUP_STATS_ENTRY_MAX (__NHA_GROUP_STATS_ENTRY_MAX - 1)
+
#endif
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 4be66622e24f..0ede8777bd66 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -41,7 +41,8 @@ static const struct nla_policy rtm_nh_policy_new[] = {
static const struct nla_policy rtm_nh_policy_get[] = {
[NHA_ID] = { .type = NLA_U32 },
- [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
+ [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
+ NHA_OP_FLAG_DUMP_STATS),
};
static const struct nla_policy rtm_nh_policy_del[] = {
@@ -53,7 +54,8 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
[NHA_GROUPS] = { .type = NLA_FLAG },
[NHA_MASTER] = { .type = NLA_U32 },
[NHA_FDB] = { .type = NLA_FLAG },
- [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
+ [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
+ NHA_OP_FLAG_DUMP_STATS),
};
static const struct nla_policy rtm_nh_res_policy_new[] = {
@@ -661,8 +663,77 @@ static int nla_put_nh_group_res(struct sk_buff *skb, struct nh_group *nhg)
return -EMSGSIZE;
}
-static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg)
+static void nh_grp_entry_stats_read(struct nh_grp_entry *nhge,
+ struct nh_grp_entry_stats *stats)
{
+ int i;
+
+ memset(stats, 0, sizeof(*stats));
+ for_each_possible_cpu(i) {
+ struct nh_grp_entry_stats *cpu_stats;
+ unsigned int start;
+ u64 packets;
+
+ cpu_stats = per_cpu_ptr(nhge->stats, i);
+ do {
+ start = u64_stats_fetch_begin(&cpu_stats->syncp);
+ packets = cpu_stats->packets;
+ } while (u64_stats_fetch_retry(&cpu_stats->syncp, start));
+
+ stats->packets += packets;
+ }
+}
+
+static int nla_put_nh_group_stats_entry(struct sk_buff *skb,
+ struct nh_grp_entry *nhge)
+{
+ struct nh_grp_entry_stats stats;
+ struct nlattr *nest;
+
+ nh_grp_entry_stats_read(nhge, &stats);
+
+ nest = nla_nest_start(skb, NHA_GROUP_STATS_ENTRY);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, NHA_GROUP_STATS_ENTRY_ID, nhge->nh->id) ||
+ nla_put_uint(skb, NHA_GROUP_STATS_ENTRY_PACKETS, stats.packets))
+ goto nla_put_failure;
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+static int nla_put_nh_group_stats(struct sk_buff *skb, struct nexthop *nh)
+{
+ struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
+ struct nlattr *nest;
+ int i;
+
+ nest = nla_nest_start(skb, NHA_GROUP_STATS);
+ if (!nest)
+ return -EMSGSIZE;
+
+ for (i = 0; i < nhg->num_nh; i++)
+ if (nla_put_nh_group_stats_entry(skb, &nhg->nh_entries[i]))
+ goto cancel_out;
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+cancel_out:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+static int nla_put_nh_group(struct sk_buff *skb, struct nexthop *nh,
+ u32 op_flags)
+{
+ struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
struct nexthop_grp *p;
size_t len = nhg->num_nh * sizeof(*p);
struct nlattr *nla;
@@ -691,6 +762,10 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg)
if (nhg->resilient && nla_put_nh_group_res(skb, nhg))
goto nla_put_failure;
+ if (op_flags & NHA_OP_FLAG_DUMP_STATS &&
+ nla_put_nh_group_stats(skb, nh))
+ goto nla_put_failure;
+
return 0;
nla_put_failure:
@@ -698,7 +773,8 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg)
}
static int nh_fill_node(struct sk_buff *skb, struct nexthop *nh,
- int event, u32 portid, u32 seq, unsigned int nlflags)
+ int event, u32 portid, u32 seq, unsigned int nlflags,
+ u32 op_flags)
{
struct fib6_nh *fib6_nh;
struct fib_nh *fib_nh;
@@ -725,7 +801,7 @@ static int nh_fill_node(struct sk_buff *skb, struct nexthop *nh,
if (nhg->fdb_nh && nla_put_flag(skb, NHA_FDB))
goto nla_put_failure;
- if (nla_put_nh_group(skb, nhg))
+ if (nla_put_nh_group(skb, nh, op_flags))
goto nla_put_failure;
goto out;
}
@@ -856,7 +932,7 @@ static void nexthop_notify(int event, struct nexthop *nh, struct nl_info *info)
if (!skb)
goto errout;
- err = nh_fill_node(skb, nh, event, info->portid, seq, nlflags);
+ err = nh_fill_node(skb, nh, event, info->portid, seq, nlflags, 0);
if (err < 0) {
/* -EMSGSIZE implies BUG in nh_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
@@ -3085,7 +3161,7 @@ static int rtm_get_nexthop(struct sk_buff *in_skb, struct nlmsghdr *nlh,
goto errout_free;
err = nh_fill_node(skb, nh, RTM_NEWNEXTHOP, NETLINK_CB(in_skb).portid,
- nlh->nlmsg_seq, 0);
+ nlh->nlmsg_seq, 0, op_flags);
if (err < 0) {
WARN_ON(err == -EMSGSIZE);
goto errout_free;
@@ -3255,7 +3331,7 @@ static int rtm_dump_nexthop_cb(struct sk_buff *skb, struct netlink_callback *cb,
return nh_fill_node(skb, nh, RTM_NEWNEXTHOP,
NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq, NLM_F_MULTI);
+ cb->nlh->nlmsg_seq, NLM_F_MULTI, filter->op_flags);
}
/* rtnl */
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
` (3 preceding siblings ...)
2024-02-29 18:16 ` [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:46 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space Petr Machata
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
From: Ido Schimmel <idosch@nvidia.com>
Add hw_stats field to several notifier structures to communicate to the
drivers that HW statistics should be configured for nexthops within a given
group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
include/net/nexthop.h | 3 +++
net/ipv4/nexthop.c | 2 ++
2 files changed, 5 insertions(+)
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 4bf1875445d8..a8dad8f48ca8 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -130,6 +130,7 @@ struct nh_group {
bool resilient;
bool fdb_nh;
bool has_v4;
+ bool hw_stats;
struct nh_res_table __rcu *res_table;
struct nh_grp_entry nh_entries[] __counted_by(num_nh);
@@ -193,6 +194,7 @@ struct nh_notifier_grp_entry_info {
struct nh_notifier_grp_info {
u16 num_nh;
bool is_fdb;
+ bool hw_stats;
struct nh_notifier_grp_entry_info nh_entries[] __counted_by(num_nh);
};
@@ -206,6 +208,7 @@ struct nh_notifier_res_bucket_info {
struct nh_notifier_res_table_info {
u16 num_nh_buckets;
+ bool hw_stats;
struct nh_notifier_single_info nhs[] __counted_by(num_nh_buckets);
};
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 0ede8777bd66..02629ba7a75d 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -139,6 +139,7 @@ static int nh_notifier_mpath_info_init(struct nh_notifier_info *info,
info->nh_grp->num_nh = num_nh;
info->nh_grp->is_fdb = nhg->fdb_nh;
+ info->nh_grp->hw_stats = nhg->hw_stats;
for (i = 0; i < num_nh; i++) {
struct nh_grp_entry *nhge = &nhg->nh_entries[i];
@@ -170,6 +171,7 @@ static int nh_notifier_res_table_info_init(struct nh_notifier_info *info,
return -ENOMEM;
info->nh_res_table->num_nh_buckets = num_nh_buckets;
+ info->nh_res_table->hw_stats = nhg->hw_stats;
for (i = 0; i < num_nh_buckets; i++) {
struct nh_res_bucket *bucket = &res_table->nh_buckets[i];
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
` (4 preceding siblings ...)
2024-02-29 18:16 ` [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:49 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space Petr Machata
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw
From: Ido Schimmel <idosch@nvidia.com>
Add netlink support for enabling collection of HW statistics on nexthop
groups.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
include/net/nexthop.h | 2 ++
include/uapi/linux/nexthop.h | 3 +++
net/ipv4/nexthop.c | 15 ++++++++++++++-
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index a8dad8f48ca8..20cd337b4a9c 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -47,6 +47,8 @@ struct nh_config {
bool nh_grp_res_has_idle_timer;
bool nh_grp_res_has_unbalanced_timer;
+ bool nh_hw_stats;
+
struct nlattr *nh_encap;
u16 nh_encap_type;
diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index f4db63c17085..b86af80d4e09 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -68,6 +68,9 @@ enum {
/* nested; nexthop group stats */
NHA_GROUP_STATS,
+ /* u32; nexthop hardware stats enable */
+ NHA_HW_STATS_ENABLE,
+
__NHA_MAX,
};
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 02629ba7a75d..15f108c440ae 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -37,6 +37,7 @@ static const struct nla_policy rtm_nh_policy_new[] = {
[NHA_ENCAP] = { .type = NLA_NESTED },
[NHA_FDB] = { .type = NLA_FLAG },
[NHA_RES_GROUP] = { .type = NLA_NESTED },
+ [NHA_HW_STATS_ENABLE] = NLA_POLICY_MAX(NLA_U32, 1),
};
static const struct nla_policy rtm_nh_policy_get[] = {
@@ -765,7 +766,8 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nexthop *nh,
goto nla_put_failure;
if (op_flags & NHA_OP_FLAG_DUMP_STATS &&
- nla_put_nh_group_stats(skb, nh))
+ (nla_put_u32(skb, NHA_HW_STATS_ENABLE, nhg->hw_stats) ||
+ nla_put_nh_group_stats(skb, nh)))
goto nla_put_failure;
return 0;
@@ -1189,6 +1191,7 @@ static int nh_check_attr_group(struct net *net,
if (!tb[i])
continue;
switch (i) {
+ case NHA_HW_STATS_ENABLE:
case NHA_FDB:
continue;
case NHA_RES_GROUP:
@@ -2609,6 +2612,9 @@ static struct nexthop *nexthop_create_group(struct net *net,
if (cfg->nh_fdb)
nhg->fdb_nh = 1;
+ if (cfg->nh_hw_stats)
+ nhg->hw_stats = true;
+
rcu_assign_pointer(nh->nh_grp, nhg);
return nh;
@@ -2951,6 +2957,9 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb,
err = rtm_to_nh_config_grp_res(tb[NHA_RES_GROUP],
cfg, extack);
+ if (tb[NHA_HW_STATS_ENABLE])
+ cfg->nh_hw_stats = nla_get_u32(tb[NHA_HW_STATS_ENABLE]);
+
/* no other attributes should be set */
goto out;
}
@@ -3042,6 +3051,10 @@ static int rtm_to_nh_config(struct net *net, struct sk_buff *skb,
goto out;
}
+ if (tb[NHA_HW_STATS_ENABLE]) {
+ NL_SET_ERR_MSG(extack, "Cannot enable nexthop hardware statistics for non-group nexthops");
+ goto out;
+ }
err = 0;
out:
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
` (5 preceding siblings ...)
2024-02-29 18:16 ` [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics Petr Machata
@ 2024-02-29 18:16 ` Petr Machata
2024-03-01 15:55 ` David Ahern
6 siblings, 1 reply; 23+ messages in thread
From: Petr Machata @ 2024-02-29 18:16 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Ido Schimmel, Petr Machata, David Ahern, Simon Horman, mlxsw,
Kees Cook
From: Ido Schimmel <idosch@nvidia.com>
Add netlink support for reading NH group hardware stats.
Stats collection is done through a new notifier,
NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for
a given NH group are thereby asked to collect the stats and report back to
core by calling nh_grp_hw_stats_report_delta(). This is similar to what
netdevice L3 stats do.
Besides exposing number of packets that passed in the HW datapath, also
include information on whether any driver actually realizes the counters.
The core can tell based on whether it got any _report_delta() reports from
the drivers. This allows enabling the statistics at the group at any time,
with drivers opting into supporting them. This is also in line with what
netdevice L3 stats are doing.
So as not to waste time and space, tie the collection and reporting of HW
stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS.
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits
---
Notes:
v2:
- Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
- Do not cancel outside of nesting in nla_put_nh_group_stats()
include/net/nexthop.h | 18 +++++
include/uapi/linux/nexthop.h | 9 +++
net/ipv4/nexthop.c | 133 ++++++++++++++++++++++++++++++++---
3 files changed, 151 insertions(+), 9 deletions(-)
diff --git a/include/net/nexthop.h b/include/net/nexthop.h
index 20cd337b4a9c..235f94ab16a8 100644
--- a/include/net/nexthop.h
+++ b/include/net/nexthop.h
@@ -122,6 +122,7 @@ struct nh_grp_entry {
struct list_head nh_list;
struct nexthop *nh_parent; /* nexthop of group with this entry */
+ u64 packets_hw;
};
struct nh_group {
@@ -166,6 +167,7 @@ enum nexthop_event_type {
NEXTHOP_EVENT_REPLACE,
NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE,
NEXTHOP_EVENT_BUCKET_REPLACE,
+ NEXTHOP_EVENT_HW_STATS_REPORT_DELTA,
};
enum nh_notifier_info_type {
@@ -173,6 +175,7 @@ enum nh_notifier_info_type {
NH_NOTIFIER_INFO_TYPE_GRP,
NH_NOTIFIER_INFO_TYPE_RES_TABLE,
NH_NOTIFIER_INFO_TYPE_RES_BUCKET,
+ NH_NOTIFIER_INFO_TYPE_GRP_HW_STATS,
};
struct nh_notifier_single_info {
@@ -214,6 +217,17 @@ struct nh_notifier_res_table_info {
struct nh_notifier_single_info nhs[] __counted_by(num_nh_buckets);
};
+struct nh_notifier_grp_hw_stats_entry_info {
+ u32 id;
+ u64 packets;
+};
+
+struct nh_notifier_grp_hw_stats_info {
+ u16 num_nh;
+ bool hw_stats_used;
+ struct nh_notifier_grp_hw_stats_entry_info stats[] __counted_by(num_nh);
+};
+
struct nh_notifier_info {
struct net *net;
struct netlink_ext_ack *extack;
@@ -224,6 +238,7 @@ struct nh_notifier_info {
struct nh_notifier_grp_info *nh_grp;
struct nh_notifier_res_table_info *nh_res_table;
struct nh_notifier_res_bucket_info *nh_res_bucket;
+ struct nh_notifier_grp_hw_stats_info *nh_grp_hw_stats;
};
};
@@ -236,6 +251,9 @@ void nexthop_bucket_set_hw_flags(struct net *net, u32 id, u16 bucket_index,
bool offload, bool trap);
void nexthop_res_grp_activity_update(struct net *net, u32 id, u16 num_buckets,
unsigned long *activity);
+void nh_grp_hw_stats_report_delta(struct nh_notifier_grp_hw_stats_info *info,
+ unsigned int nh_idx,
+ u64 delta_packets);
/* caller is holding rcu or rtnl; no reference taken to nexthop */
struct nexthop *nexthop_find_by_id(struct net *net, u32 id);
diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
index b86af80d4e09..dd8787f9cf39 100644
--- a/include/uapi/linux/nexthop.h
+++ b/include/uapi/linux/nexthop.h
@@ -31,6 +31,7 @@ enum {
#define NEXTHOP_GRP_TYPE_MAX (__NEXTHOP_GRP_TYPE_MAX - 1)
#define NHA_OP_FLAG_DUMP_STATS BIT(0)
+#define NHA_OP_FLAG_DUMP_HW_STATS BIT(1)
enum {
NHA_UNSPEC,
@@ -71,6 +72,9 @@ enum {
/* u32; nexthop hardware stats enable */
NHA_HW_STATS_ENABLE,
+ /* u32; read-only; whether any driver collects HW stats */
+ NHA_HW_STATS_USED,
+
__NHA_MAX,
};
@@ -132,6 +136,11 @@ enum {
/* uint; number of packets forwarded via the nexthop group entry */
NHA_GROUP_STATS_ENTRY_PACKETS,
+ /* uint; number of packets forwarded via the nexthop group entry in
+ * hardware
+ */
+ NHA_GROUP_STATS_ENTRY_PACKETS_HW,
+
__NHA_GROUP_STATS_ENTRY_MAX,
};
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 15f108c440ae..169a003cc855 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -43,7 +43,8 @@ static const struct nla_policy rtm_nh_policy_new[] = {
static const struct nla_policy rtm_nh_policy_get[] = {
[NHA_ID] = { .type = NLA_U32 },
[NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
- NHA_OP_FLAG_DUMP_STATS),
+ NHA_OP_FLAG_DUMP_STATS |
+ NHA_OP_FLAG_DUMP_HW_STATS),
};
static const struct nla_policy rtm_nh_policy_del[] = {
@@ -56,7 +57,8 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
[NHA_MASTER] = { .type = NLA_U32 },
[NHA_FDB] = { .type = NLA_FLAG },
[NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
- NHA_OP_FLAG_DUMP_STATS),
+ NHA_OP_FLAG_DUMP_STATS |
+ NHA_OP_FLAG_DUMP_HW_STATS),
};
static const struct nla_policy rtm_nh_res_policy_new[] = {
@@ -687,8 +689,95 @@ static void nh_grp_entry_stats_read(struct nh_grp_entry *nhge,
}
}
+static int nh_notifier_grp_hw_stats_init(struct nh_notifier_info *info,
+ const struct nexthop *nh)
+{
+ struct nh_group *nhg;
+ int i;
+
+ ASSERT_RTNL();
+ nhg = rtnl_dereference(nh->nh_grp);
+
+ info->id = nh->id;
+ info->type = NH_NOTIFIER_INFO_TYPE_GRP_HW_STATS;
+ info->nh_grp_hw_stats = kzalloc(struct_size(info->nh_grp_hw_stats,
+ stats, nhg->num_nh),
+ GFP_KERNEL);
+ if (!info->nh_grp_hw_stats)
+ return -ENOMEM;
+
+ info->nh_grp_hw_stats->num_nh = nhg->num_nh;
+ for (i = 0; i < nhg->num_nh; i++) {
+ struct nh_grp_entry *nhge = &nhg->nh_entries[i];
+
+ info->nh_grp_hw_stats->stats[i].id = nhge->nh->id;
+ }
+
+ return 0;
+}
+
+static void nh_notifier_grp_hw_stats_fini(struct nh_notifier_info *info)
+{
+ kfree(info->nh_grp_hw_stats);
+}
+
+void nh_grp_hw_stats_report_delta(struct nh_notifier_grp_hw_stats_info *info,
+ unsigned int nh_idx,
+ u64 delta_packets)
+{
+ info->hw_stats_used = true;
+ info->stats[nh_idx].packets += delta_packets;
+}
+EXPORT_SYMBOL(nh_grp_hw_stats_report_delta);
+
+static void nh_grp_hw_stats_apply_update(struct nexthop *nh,
+ struct nh_notifier_info *info)
+{
+ struct nh_group *nhg;
+ int i;
+
+ ASSERT_RTNL();
+ nhg = rtnl_dereference(nh->nh_grp);
+
+ for (i = 0; i < nhg->num_nh; i++) {
+ struct nh_grp_entry *nhge = &nhg->nh_entries[i];
+
+ nhge->packets_hw += info->nh_grp_hw_stats->stats[i].packets;
+ }
+}
+
+static int nh_grp_hw_stats_update(struct nexthop *nh, bool *hw_stats_used)
+{
+ struct nh_notifier_info info = {
+ .net = nh->net,
+ };
+ struct net *net = nh->net;
+ int err;
+
+ if (nexthop_notifiers_is_empty(net))
+ return 0;
+
+ err = nh_notifier_grp_hw_stats_init(&info, nh);
+ if (err)
+ return err;
+
+ err = blocking_notifier_call_chain(&net->nexthop.notifier_chain,
+ NEXTHOP_EVENT_HW_STATS_REPORT_DELTA,
+ &info);
+
+ /* Cache whatever we got, even if there was an error, otherwise the
+ * successful stats retrievals would get lost.
+ */
+ nh_grp_hw_stats_apply_update(nh, &info);
+ *hw_stats_used = info.nh_grp_hw_stats->hw_stats_used;
+
+ nh_notifier_grp_hw_stats_fini(&info);
+ return notifier_to_errno(err);
+}
+
static int nla_put_nh_group_stats_entry(struct sk_buff *skb,
- struct nh_grp_entry *nhge)
+ struct nh_grp_entry *nhge,
+ u32 op_flags)
{
struct nh_grp_entry_stats stats;
struct nlattr *nest;
@@ -700,7 +789,13 @@ static int nla_put_nh_group_stats_entry(struct sk_buff *skb,
return -EMSGSIZE;
if (nla_put_u32(skb, NHA_GROUP_STATS_ENTRY_ID, nhge->nh->id) ||
- nla_put_uint(skb, NHA_GROUP_STATS_ENTRY_PACKETS, stats.packets))
+ nla_put_uint(skb, NHA_GROUP_STATS_ENTRY_PACKETS,
+ stats.packets + nhge->packets_hw))
+ goto nla_put_failure;
+
+ if (op_flags & NHA_OP_FLAG_DUMP_HW_STATS &&
+ nla_put_uint(skb, NHA_GROUP_STATS_ENTRY_PACKETS_HW,
+ nhge->packets_hw))
goto nla_put_failure;
nla_nest_end(skb, nest);
@@ -711,18 +806,35 @@ static int nla_put_nh_group_stats_entry(struct sk_buff *skb,
return -EMSGSIZE;
}
-static int nla_put_nh_group_stats(struct sk_buff *skb, struct nexthop *nh)
+static int nla_put_nh_group_stats(struct sk_buff *skb, struct nexthop *nh,
+ u32 op_flags)
{
struct nh_group *nhg = rtnl_dereference(nh->nh_grp);
struct nlattr *nest;
+ bool hw_stats_used;
+ int err;
int i;
+ if (nla_put_u32(skb, NHA_HW_STATS_ENABLE, nhg->hw_stats))
+ goto err_out;
+
+ if (op_flags & NHA_OP_FLAG_DUMP_HW_STATS &&
+ nhg->hw_stats) {
+ err = nh_grp_hw_stats_update(nh, &hw_stats_used);
+ if (err)
+ goto out;
+
+ if (nla_put_u32(skb, NHA_HW_STATS_USED, hw_stats_used))
+ goto err_out;
+ }
+
nest = nla_nest_start(skb, NHA_GROUP_STATS);
if (!nest)
- return -EMSGSIZE;
+ goto err_out;
for (i = 0; i < nhg->num_nh; i++)
- if (nla_put_nh_group_stats_entry(skb, &nhg->nh_entries[i]))
+ if (nla_put_nh_group_stats_entry(skb, &nhg->nh_entries[i],
+ op_flags))
goto cancel_out;
nla_nest_end(skb, nest);
@@ -730,7 +842,10 @@ static int nla_put_nh_group_stats(struct sk_buff *skb, struct nexthop *nh)
cancel_out:
nla_nest_cancel(skb, nest);
- return -EMSGSIZE;
+err_out:
+ err = -EMSGSIZE;
+out:
+ return err;
}
static int nla_put_nh_group(struct sk_buff *skb, struct nexthop *nh,
@@ -767,7 +882,7 @@ static int nla_put_nh_group(struct sk_buff *skb, struct nexthop *nh,
if (op_flags & NHA_OP_FLAG_DUMP_STATS &&
(nla_put_u32(skb, NHA_HW_STATS_ENABLE, nhg->hw_stats) ||
- nla_put_nh_group_stats(skb, nh)))
+ nla_put_nh_group_stats(skb, nh, op_flags)))
goto nla_put_failure;
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute
2024-02-29 18:16 ` [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
@ 2024-03-01 15:31 ` David Ahern
0 siblings, 0 replies; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:31 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> A following patch will introduce a new attribute, op-specific flags to
> adjust the behavior of an operation. Different operations will recognize
> different flags.
>
> - To make the differentiation possible, stop sharing the policies for get
> and del operations.
>
> - To allow querying for presence of the attribute, have all the attribute
> arrays sized to NHA_MAX, regardless of what is permitted by policy, and
> pass the corresponding value to nlmsg_parse() as well.
>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
> net/ipv4/nexthop.c | 58 ++++++++++++++++++++++------------------------
> 1 file changed, 28 insertions(+), 30 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS
2024-02-29 18:16 ` [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS Petr Machata
@ 2024-03-01 15:36 ` David Ahern
0 siblings, 0 replies; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:36 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> In order to add per-nexthop statistics, but still not increase netlink
> message size for consumers that do not care about them, there needs to be a
> toggle through which the user indicates their desire to get the statistics.
> To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to
> use the attribute for carrying of arbitrary operation-specific flags, i.e.
> not make it specific for get / dump.
>
> Add the new attribute to get and dump policies, but do not actually allow
> any flags yet -- those will come later as the flags themselves are defined.
> Add the necessary parsing code.
>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>
> Notes:
> v2:
> - Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
>
> include/uapi/linux/nexthop.h | 3 +++
> net/ipv4/nexthop.c | 24 ++++++++++++++++++++----
> 2 files changed, 23 insertions(+), 4 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats
2024-02-29 18:16 ` [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats Petr Machata
@ 2024-03-01 15:36 ` David Ahern
0 siblings, 0 replies; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:36 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add nexthop group entry stats to count the number of packets forwarded
> via each nexthop in the group. The stats will be exposed to user space
> for better data path observability in the next patch.
>
> The per-CPU stats pointer is placed at the beginning of 'struct
> nh_grp_entry', so that all the fields accessed for the data path reside
> on the same cache line:
>
> struct nh_grp_entry {
> struct nexthop * nh; /* 0 8 */
> struct nh_grp_entry_stats * stats; /* 8 8 */
> u8 weight; /* 16 1 */
>
> /* XXX 7 bytes hole, try to pack */
>
> union {
> struct {
> atomic_t upper_bound; /* 24 4 */
> } hthr; /* 24 4 */
> struct {
> struct list_head uw_nh_entry; /* 24 16 */
> u16 count_buckets; /* 40 2 */
> u16 wants_buckets; /* 42 2 */
> } res; /* 24 24 */
> }; /* 24 24 */
> struct list_head nh_list; /* 48 16 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> struct nexthop * nh_parent; /* 64 8 */
>
> /* size: 72, cachelines: 2, members: 6 */
> /* sum members: 65, holes: 1, sum holes: 7 */
> /* last cacheline: 8 bytes */
> };
>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>
> Notes:
> v2:
> - Set err on nexthop_create_group() error path
>
> include/net/nexthop.h | 6 ++++++
> net/ipv4/nexthop.c | 25 +++++++++++++++++++++----
> 2 files changed, 27 insertions(+), 4 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-02-29 18:16 ` [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
@ 2024-03-01 15:45 ` David Ahern
2024-03-01 17:17 ` Petr Machata
2024-03-01 17:28 ` Jakub Kicinski
2024-03-01 17:32 ` Eric Dumazet
1 sibling, 2 replies; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:45 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add netlink support for reading NH group stats.
>
> This data is only for statistics of the traffic in the SW datapath. HW
> nexthop group statistics will be added in the following patches.
>
> Emission of the stats is keyed to a new op_stats flag to avoid cluttering
> the netlink message with stats if the user doesn't need them:
> NHA_OP_FLAG_DUMP_STATS.
>
> Co-developed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>
> Notes:
> v2:
> - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
> - Rename jump target in nla_put_nh_group_stats() to avoid
> having to rename further in the patchset.
>
> include/uapi/linux/nexthop.h | 30 ++++++++++++
> net/ipv4/nexthop.c | 92 ++++++++++++++++++++++++++++++++----
> 2 files changed, 114 insertions(+), 8 deletions(-)
>
> @@ -104,4 +109,29 @@ enum {
>
> #define NHA_RES_BUCKET_MAX (__NHA_RES_BUCKET_MAX - 1)
>
> +enum {
> + NHA_GROUP_STATS_UNSPEC,
> +
> + /* nested; nexthop group entry stats */
> + NHA_GROUP_STATS_ENTRY,
> +
> + __NHA_GROUP_STATS_MAX,
> +};
> +
> +#define NHA_GROUP_STATS_MAX (__NHA_GROUP_STATS_MAX - 1)
> +
> +enum {
> + NHA_GROUP_STATS_ENTRY_UNSPEC,
> +
> + /* u32; nexthop id of the nexthop group entry */
> + NHA_GROUP_STATS_ENTRY_ID,
> +
> + /* uint; number of packets forwarded via the nexthop group entry */
why not make it a u64?
> + NHA_GROUP_STATS_ENTRY_PACKETS,
> +
> + __NHA_GROUP_STATS_ENTRY_MAX,
> +};
> +
> +#define NHA_GROUP_STATS_ENTRY_MAX (__NHA_GROUP_STATS_ENTRY_MAX - 1)
> +
> #endif
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications
2024-02-29 18:16 ` [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications Petr Machata
@ 2024-03-01 15:46 ` David Ahern
0 siblings, 0 replies; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:46 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add hw_stats field to several notifier structures to communicate to the
> drivers that HW statistics should be configured for nexthops within a given
> group.
>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
> include/net/nexthop.h | 3 +++
> net/ipv4/nexthop.c | 2 ++
> 2 files changed, 5 insertions(+)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics
2024-02-29 18:16 ` [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics Petr Machata
@ 2024-03-01 15:49 ` David Ahern
2024-03-01 17:18 ` Petr Machata
0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:49 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw
On 2/29/24 11:16 AM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add netlink support for enabling collection of HW statistics on nexthop
> groups.
>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
> include/net/nexthop.h | 2 ++
> include/uapi/linux/nexthop.h | 3 +++
> net/ipv4/nexthop.c | 15 ++++++++++++++-
> 3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 02629ba7a75d..15f108c440ae 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -37,6 +37,7 @@ static const struct nla_policy rtm_nh_policy_new[] = {
> [NHA_ENCAP] = { .type = NLA_NESTED },
> [NHA_FDB] = { .type = NLA_FLAG },
> [NHA_RES_GROUP] = { .type = NLA_NESTED },
> + [NHA_HW_STATS_ENABLE] = NLA_POLICY_MAX(NLA_U32, 1),
numbers typically need a name or comment.
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space
2024-02-29 18:16 ` [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space Petr Machata
@ 2024-03-01 15:55 ` David Ahern
2024-03-01 17:22 ` Petr Machata
0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2024-03-01 15:55 UTC (permalink / raw)
To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev
Cc: Ido Schimmel, Simon Horman, mlxsw, Kees Cook
On 2/29/24 11:16 AM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add netlink support for reading NH group hardware stats.
>
> Stats collection is done through a new notifier,
> NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for
> a given NH group are thereby asked to collect the stats and report back to
> core by calling nh_grp_hw_stats_report_delta(). This is similar to what
> netdevice L3 stats do.
>
> Besides exposing number of packets that passed in the HW datapath, also
> include information on whether any driver actually realizes the counters.
> The core can tell based on whether it got any _report_delta() reports from
> the drivers. This allows enabling the statistics at the group at any time,
> with drivers opting into supporting them. This is also in line with what
> netdevice L3 stats are doing.
>
> So as not to waste time and space, tie the collection and reporting of HW
> stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS.
>
> Co-developed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits
> ---
>
> Notes:
> v2:
> - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
> - Do not cancel outside of nesting in nla_put_nh_group_stats()
>
> include/net/nexthop.h | 18 +++++
> include/uapi/linux/nexthop.h | 9 +++
> net/ipv4/nexthop.c | 133 ++++++++++++++++++++++++++++++++---
> 3 files changed, 151 insertions(+), 9 deletions(-)
>
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 15f108c440ae..169a003cc855 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -43,7 +43,8 @@ static const struct nla_policy rtm_nh_policy_new[] = {
> static const struct nla_policy rtm_nh_policy_get[] = {
> [NHA_ID] = { .type = NLA_U32 },
> [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
> - NHA_OP_FLAG_DUMP_STATS),
> + NHA_OP_FLAG_DUMP_STATS |
> + NHA_OP_FLAG_DUMP_HW_STATS),
> };
>
> static const struct nla_policy rtm_nh_policy_del[] = {
> @@ -56,7 +57,8 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
> [NHA_MASTER] = { .type = NLA_U32 },
> [NHA_FDB] = { .type = NLA_FLAG },
> [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
> - NHA_OP_FLAG_DUMP_STATS),
> + NHA_OP_FLAG_DUMP_STATS |
> + NHA_OP_FLAG_DUMP_HW_STATS),
2 instances of the same mask; worth giving it a name.
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-03-01 15:45 ` David Ahern
@ 2024-03-01 17:17 ` Petr Machata
2024-03-01 17:28 ` Jakub Kicinski
1 sibling, 0 replies; 23+ messages in thread
From: Petr Machata @ 2024-03-01 17:17 UTC (permalink / raw)
To: David Ahern
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, Simon Horman, mlxsw
David Ahern <dsahern@kernel.org> writes:
> On 2/29/24 11:16 AM, Petr Machata wrote:
>> +enum {
>> + NHA_GROUP_STATS_ENTRY_UNSPEC,
>> +
>> + /* u32; nexthop id of the nexthop group entry */
>> + NHA_GROUP_STATS_ENTRY_ID,
>> +
>> + /* uint; number of packets forwarded via the nexthop group entry */
>
> why not make it a u64?
uint will store it as 32-bit if possible and as 64-bit if necessary.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics
2024-03-01 15:49 ` David Ahern
@ 2024-03-01 17:18 ` Petr Machata
0 siblings, 0 replies; 23+ messages in thread
From: Petr Machata @ 2024-03-01 17:18 UTC (permalink / raw)
To: David Ahern
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, Simon Horman, mlxsw
David Ahern <dsahern@kernel.org> writes:
> On 2/29/24 11:16 AM, Petr Machata wrote:
>> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
>> index 02629ba7a75d..15f108c440ae 100644
>> --- a/net/ipv4/nexthop.c
>> +++ b/net/ipv4/nexthop.c
>> @@ -37,6 +37,7 @@ static const struct nla_policy rtm_nh_policy_new[] = {
>> [NHA_ENCAP] = { .type = NLA_NESTED },
>> [NHA_FDB] = { .type = NLA_FLAG },
>> [NHA_RES_GROUP] = { .type = NLA_NESTED },
>> + [NHA_HW_STATS_ENABLE] = NLA_POLICY_MAX(NLA_U32, 1),
>
> numbers typically need a name or comment.
How about this?
[NHA_HW_STATS_ENABLE] = NLA_POLICY_MAX(NLA_U32, true),
Too weird maybe?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space
2024-03-01 15:55 ` David Ahern
@ 2024-03-01 17:22 ` Petr Machata
0 siblings, 0 replies; 23+ messages in thread
From: Petr Machata @ 2024-03-01 17:22 UTC (permalink / raw)
To: David Ahern
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Ido Schimmel, Simon Horman, mlxsw, Kees Cook
David Ahern <dsahern@kernel.org> writes:
> On 2/29/24 11:16 AM, Petr Machata wrote:
>> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
>> index 15f108c440ae..169a003cc855 100644
>> --- a/net/ipv4/nexthop.c
>> +++ b/net/ipv4/nexthop.c
>> @@ -43,7 +43,8 @@ static const struct nla_policy rtm_nh_policy_new[] = {
>> static const struct nla_policy rtm_nh_policy_get[] = {
>> [NHA_ID] = { .type = NLA_U32 },
>> [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
>> - NHA_OP_FLAG_DUMP_STATS),
>> + NHA_OP_FLAG_DUMP_STATS |
>> + NHA_OP_FLAG_DUMP_HW_STATS),
>> };
>>
>> static const struct nla_policy rtm_nh_policy_del[] = {
>> @@ -56,7 +57,8 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
>> [NHA_MASTER] = { .type = NLA_U32 },
>> [NHA_FDB] = { .type = NLA_FLAG },
>> [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
>> - NHA_OP_FLAG_DUMP_STATS),
>> + NHA_OP_FLAG_DUMP_STATS |
>> + NHA_OP_FLAG_DUMP_HW_STATS),
>
> 2 instances of the same mask; worth giving it a name.
OK. I'll stick this somewhere above the policies:
#define NHA_OP_FLAGS_DUMP_STATS_ALL (NHA_OP_FLAG_DUMP_STATS | \
NHA_OP_FLAG_DUMP_HW_STATS)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-03-01 15:45 ` David Ahern
2024-03-01 17:17 ` Petr Machata
@ 2024-03-01 17:28 ` Jakub Kicinski
2024-03-02 2:45 ` David Ahern
1 sibling, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2024-03-01 17:28 UTC (permalink / raw)
To: David Ahern
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
Ido Schimmel, Simon Horman, mlxsw
On Fri, 1 Mar 2024 08:45:52 -0700 David Ahern wrote:
> > + /* uint; number of packets forwarded via the nexthop group entry */
>
> why not make it a u64?
I think it should be our default type. Explicit length should only
be picked if there's a clear reason. Too many review cycles wasted
steering people onto correct types...
Now, I'm not sure Pablo merged the sint/uint support into libmnl,
I can share the patches to handle them in iproute2, if that helps.
Or just copy/paste from YNL.
I need to find the courage to ask you how we can start using YNL
in iproute2, I really worry that we're not getting CLI tools written
for new families any more :(
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-02-29 18:16 ` [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
2024-03-01 15:45 ` David Ahern
@ 2024-03-01 17:32 ` Eric Dumazet
2024-03-04 11:09 ` Petr Machata
1 sibling, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2024-03-01 17:32 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
Ido Schimmel, David Ahern, Simon Horman, mlxsw
On Thu, Feb 29, 2024 at 7:20 PM Petr Machata <petrm@nvidia.com> wrote:
>
> From: Ido Schimmel <idosch@nvidia.com>
>
> Add netlink support for reading NH group stats.
>
> This data is only for statistics of the traffic in the SW datapath. HW
> nexthop group statistics will be added in the following patches.
>
> Emission of the stats is keyed to a new op_stats flag to avoid cluttering
> the netlink message with stats if the user doesn't need them:
> NHA_OP_FLAG_DUMP_STATS.
>
> Co-developed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>
> Notes:
> v2:
> - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
> - Rename jump target in nla_put_nh_group_stats() to avoid
> having to rename further in the patchset.
>
> include/uapi/linux/nexthop.h | 30 ++++++++++++
> net/ipv4/nexthop.c | 92 ++++++++++++++++++++++++++++++++----
> 2 files changed, 114 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
> index 086444e2946c..f4db63c17085 100644
> --- a/include/uapi/linux/nexthop.h
> +++ b/include/uapi/linux/nexthop.h
> @@ -30,6 +30,8 @@ enum {
>
> #define NEXTHOP_GRP_TYPE_MAX (__NEXTHOP_GRP_TYPE_MAX - 1)
>
> +#define NHA_OP_FLAG_DUMP_STATS BIT(0)
> +
> enum {
> NHA_UNSPEC,
> NHA_ID, /* u32; id for nexthop. id == 0 means auto-assign */
> @@ -63,6 +65,9 @@ enum {
> /* u32; operation-specific flags */
> NHA_OP_FLAGS,
>
> + /* nested; nexthop group stats */
> + NHA_GROUP_STATS,
> +
> __NHA_MAX,
> };
>
> @@ -104,4 +109,29 @@ enum {
>
> #define NHA_RES_BUCKET_MAX (__NHA_RES_BUCKET_MAX - 1)
>
> +enum {
> + NHA_GROUP_STATS_UNSPEC,
> +
> + /* nested; nexthop group entry stats */
> + NHA_GROUP_STATS_ENTRY,
> +
> + __NHA_GROUP_STATS_MAX,
> +};
> +
> +#define NHA_GROUP_STATS_MAX (__NHA_GROUP_STATS_MAX - 1)
> +
> +enum {
> + NHA_GROUP_STATS_ENTRY_UNSPEC,
> +
> + /* u32; nexthop id of the nexthop group entry */
> + NHA_GROUP_STATS_ENTRY_ID,
> +
> + /* uint; number of packets forwarded via the nexthop group entry */
> + NHA_GROUP_STATS_ENTRY_PACKETS,
> +
> + __NHA_GROUP_STATS_ENTRY_MAX,
> +};
> +
> +#define NHA_GROUP_STATS_ENTRY_MAX (__NHA_GROUP_STATS_ENTRY_MAX - 1)
> +
> #endif
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 4be66622e24f..0ede8777bd66 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -41,7 +41,8 @@ static const struct nla_policy rtm_nh_policy_new[] = {
>
> static const struct nla_policy rtm_nh_policy_get[] = {
> [NHA_ID] = { .type = NLA_U32 },
> - [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
> + [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
> + NHA_OP_FLAG_DUMP_STATS),
> };
>
> static const struct nla_policy rtm_nh_policy_del[] = {
> @@ -53,7 +54,8 @@ static const struct nla_policy rtm_nh_policy_dump[] = {
> [NHA_GROUPS] = { .type = NLA_FLAG },
> [NHA_MASTER] = { .type = NLA_U32 },
> [NHA_FDB] = { .type = NLA_FLAG },
> - [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32, 0),
> + [NHA_OP_FLAGS] = NLA_POLICY_MASK(NLA_U32,
> + NHA_OP_FLAG_DUMP_STATS),
> };
>
> static const struct nla_policy rtm_nh_res_policy_new[] = {
> @@ -661,8 +663,77 @@ static int nla_put_nh_group_res(struct sk_buff *skb, struct nh_group *nhg)
> return -EMSGSIZE;
> }
>
> -static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg)
> +static void nh_grp_entry_stats_read(struct nh_grp_entry *nhge,
> + struct nh_grp_entry_stats *stats)
> {
> + int i;
> +
> + memset(stats, 0, sizeof(*stats));
> + for_each_possible_cpu(i) {
> + struct nh_grp_entry_stats *cpu_stats;
> + unsigned int start;
> + u64 packets;
> +
> + cpu_stats = per_cpu_ptr(nhge->stats, i);
> + do {
> + start = u64_stats_fetch_begin(&cpu_stats->syncp);
> + packets = cpu_stats->packets;
This is not safe, even on 64bit arches.
You should use u64_stats_t, u64_stats_read(), u64_stats_add() ...
> + } while (u64_stats_fetch_retry(&cpu_stats->syncp, start));
> +
> + stats->packets += packets;
> + }
> +}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-03-01 17:28 ` Jakub Kicinski
@ 2024-03-02 2:45 ` David Ahern
2024-03-03 3:36 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2024-03-02 2:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
Ido Schimmel, Simon Horman, mlxsw
On 3/1/24 10:28 AM, Jakub Kicinski wrote:
> On Fri, 1 Mar 2024 08:45:52 -0700 David Ahern wrote:
>>> + /* uint; number of packets forwarded via the nexthop group entry */
>>
>> why not make it a u64?
>
> I think it should be our default type. Explicit length should only
> be picked if there's a clear reason. Too many review cycles wasted
> steering people onto correct types...
>
> Now, I'm not sure Pablo merged the sint/uint support into libmnl,
> I can share the patches to handle them in iproute2, if that helps.
> Or just copy/paste from YNL.
>
> I need to find the courage to ask you how we can start using YNL
> in iproute2, I really worry that we're not getting CLI tools written
> for new families any more :(
I need to find time to play around with the cli tool more; my attempts
last weekend were a bit rough hunting for the right syntax.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-03-02 2:45 ` David Ahern
@ 2024-03-03 3:36 ` Jakub Kicinski
0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2024-03-03 3:36 UTC (permalink / raw)
To: David Ahern
Cc: Petr Machata, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
Ido Schimmel, Simon Horman, mlxsw
On Fri, 1 Mar 2024 19:45:27 -0700 David Ahern wrote:
> > I need to find the courage to ask you how we can start using YNL
> > in iproute2, I really worry that we're not getting CLI tools written
> > for new families any more :(
>
> I need to find time to play around with the cli tool more; my attempts
> last weekend were a bit rough hunting for the right syntax.
Do you mean playing with the Python CLI? :(
The thought of packaging that did cross my mind, but I always viewed
it as a quick PoC / development tool. Maybe you're right tho, I should
stop fighting it.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space
2024-03-01 17:32 ` Eric Dumazet
@ 2024-03-04 11:09 ` Petr Machata
0 siblings, 0 replies; 23+ messages in thread
From: Petr Machata @ 2024-03-04 11:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: Petr Machata, David S. Miller, Jakub Kicinski, Paolo Abeni,
netdev, Ido Schimmel, David Ahern, Simon Horman, mlxsw
Eric Dumazet <edumazet@google.com> writes:
> On Thu, Feb 29, 2024 at 7:20 PM Petr Machata <petrm@nvidia.com> wrote:
>> @@ -661,8 +663,77 @@ static int nla_put_nh_group_res(struct sk_buff *skb, struct nh_group *nhg)
>> return -EMSGSIZE;
>> }
>>
>> -static int nla_put_nh_group(struct sk_buff *skb, struct nh_group *nhg)
>> +static void nh_grp_entry_stats_read(struct nh_grp_entry *nhge,
>> + struct nh_grp_entry_stats *stats)
>> {
>> + int i;
>> +
>> + memset(stats, 0, sizeof(*stats));
>> + for_each_possible_cpu(i) {
>> + struct nh_grp_entry_stats *cpu_stats;
>> + unsigned int start;
>> + u64 packets;
>> +
>> + cpu_stats = per_cpu_ptr(nhge->stats, i);
>> + do {
>> + start = u64_stats_fetch_begin(&cpu_stats->syncp);
>> + packets = cpu_stats->packets;
>
> This is not safe, even on 64bit arches.
>
> You should use u64_stats_t, u64_stats_read(), u64_stats_add() ...
OK.
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2024-03-04 12:45 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-29 18:16 [PATCH net-next v2 0/7] Support for nexthop group statistics Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 1/7] net: nexthop: Adjust netlink policy parsing for a new attribute Petr Machata
2024-03-01 15:31 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 2/7] net: nexthop: Add NHA_OP_FLAGS Petr Machata
2024-03-01 15:36 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 3/7] net: nexthop: Add nexthop group entry stats Petr Machata
2024-03-01 15:36 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 4/7] net: nexthop: Expose nexthop group stats to user space Petr Machata
2024-03-01 15:45 ` David Ahern
2024-03-01 17:17 ` Petr Machata
2024-03-01 17:28 ` Jakub Kicinski
2024-03-02 2:45 ` David Ahern
2024-03-03 3:36 ` Jakub Kicinski
2024-03-01 17:32 ` Eric Dumazet
2024-03-04 11:09 ` Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 5/7] net: nexthop: Add hardware statistics notifications Petr Machata
2024-03-01 15:46 ` David Ahern
2024-02-29 18:16 ` [PATCH net-next v2 6/7] net: nexthop: Add ability to enable / disable hardware statistics Petr Machata
2024-03-01 15:49 ` David Ahern
2024-03-01 17:18 ` Petr Machata
2024-02-29 18:16 ` [PATCH net-next v2 7/7] net: nexthop: Expose nexthop group HW stats to user space Petr Machata
2024-03-01 15:55 ` David Ahern
2024-03-01 17:22 ` Petr Machata
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).