* [PATCH net 0/3] nexthop: Nexthop dump fixes
@ 2023-08-08 7:52 Ido Schimmel
2023-08-08 7:52 ` [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID Ido Schimmel
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Ido Schimmel @ 2023-08-08 7:52 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, pabeni, edumazet, dsahern, petrm, Ido Schimmel
Patches #1 and #3 fix two problems related to nexthops and nexthop
buckets dump, respectively. Patch #2 is a preparation for the third
patch.
The pattern described in these patches of splitting the NLMSG_DONE to a
separate response is prevalent in other rtnetlink dump callbacks. I
don't know if it's because I'm missing something or if this was done
intentionally to ensure the message is delivered to user space. After
commit 0642840b8bb0 ("af_netlink: ensure that NLMSG_DONE never fails in
dumps") this is no longer necessary and I can improve these dump
callbacks assuming this analysis is correct.
No regressions in existing tests:
# ./fib_nexthops.sh
[...]
Tests passed: 230
Tests failed: 0
Ido Schimmel (3):
nexthop: Fix infinite nexthop dump when using maximum nexthop ID
nexthop: Make nexthop bucket dump more efficient
nexthop: Fix infinite nexthop bucket dump when using maximum nexthop
ID
net/ipv4/nexthop.c | 28 ++++++---------------
tools/testing/selftests/net/fib_nexthops.sh | 10 ++++++++
2 files changed, 17 insertions(+), 21 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID
2023-08-08 7:52 [PATCH net 0/3] nexthop: Nexthop dump fixes Ido Schimmel
@ 2023-08-08 7:52 ` Ido Schimmel
2023-08-08 14:53 ` David Ahern
2023-08-08 7:52 ` [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient Ido Schimmel
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2023-08-08 7:52 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, pabeni, edumazet, dsahern, petrm, Ido Schimmel
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop dump callback always returns a positive number if nexthops
were filled in the provided skb, even if the dump is complete. This
means that a dump will span at least two recvmsg() calls as long as
nexthops are present. In the last recvmsg() call the dump callback will
not fill in any nexthops because the previous call indicated that the
dump should restart from the last dumped nexthop ID plus one.
# ip nexthop add id 1 blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
id 1 blackhole
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip nexthop add id $((2**32-1)) blackhole
# ip nexthop
id 4294967295 blackhole
id 4294967295 blackhole
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
# ip nexthop add id $((2**32-1)) blackhole
# strace -e sendto,recvmsg -s 5 ip nexthop
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56
id 4294967295 blackhole
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: ab84be7e54fc ("net: Initial nexthop code")
Reported-by: Petr Machata <petrm@nvidia.com>
Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
---
net/ipv4/nexthop.c | 6 +-----
tools/testing/selftests/net/fib_nexthops.sh | 5 +++++
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index f95142e56da0..179e50d8fe07 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3221,13 +3221,9 @@ static int rtm_dump_nexthop(struct sk_buff *skb, struct netlink_callback *cb)
&rtm_dump_nexthop_cb, &filter);
if (err < 0) {
if (likely(skb->len))
- goto out;
- goto out_err;
+ err = skb->len;
}
-out:
- err = skb->len;
-out_err:
cb->seq = net->nexthop.seq;
nl_dump_check_consistent(cb, nlmsg_hdr(skb));
return err;
diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 0f5e88c8f4ff..10aa059b9f06 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -1981,6 +1981,11 @@ basic()
run_cmd "$IP link set dev lo up"
+ # Dump should not loop endlessly when maximum nexthop ID is configured.
+ run_cmd "$IP nexthop add id $((2**32-1)) blackhole"
+ run_cmd "timeout 5 $IP nexthop"
+ log_test $? 0 "Maximum nexthop ID dump"
+
#
# groups
#
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient
2023-08-08 7:52 [PATCH net 0/3] nexthop: Nexthop dump fixes Ido Schimmel
2023-08-08 7:52 ` [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID Ido Schimmel
@ 2023-08-08 7:52 ` Ido Schimmel
2023-08-08 14:53 ` David Ahern
2023-08-08 7:52 ` [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID Ido Schimmel
2023-08-09 21:00 ` [PATCH net 0/3] nexthop: Nexthop dump fixes patchwork-bot+netdevbpf
3 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2023-08-08 7:52 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, pabeni, edumazet, dsahern, petrm, Ido Schimmel
rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging
to a specific resilient nexthop group. The function returns a positive
return code (the skb length) upon both success and failure.
The above behavior is problematic. When a complete nexthop bucket dump
is requested, the function that walks the different nexthops treats the
non-zero return code as an error. This causes buckets belonging to
different resilient nexthop groups to be dumped using different buffers
even if they can all fit in the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 10 index 0 idle_time 10.27 nhid 1
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
id 20 index 0 idle_time 6.44 nhid 1
[...]
Fix by only returning a non-zero return code when an error occurred and
restarting the dump from the bucket index we failed to fill in. This
allows buckets belonging to different resilient nexthop groups to be
dumped using the same buffer:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 1
# ip nexthop add id 20 group 1 type resilient buckets 1
# strace -e recvmsg -s 0 ip nexthop bucket
[...]
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 30.21 nhid 1
id 20 index 0 idle_time 26.7 nhid 1
[...]
While this change is more of a performance improvement change than an
actual bug fix, it is a prerequisite for a subsequent patch that does
fix a bug.
Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
---
net/ipv4/nexthop.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 179e50d8fe07..f365a4f63899 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3363,25 +3363,19 @@ static int rtm_dump_nexthop_bucket_nh(struct sk_buff *skb,
dd->filter.res_bucket_nh_id != nhge->nh->id)
continue;
+ dd->ctx->bucket_index = bucket_index;
err = nh_fill_res_bucket(skb, nh, bucket, bucket_index,
RTM_NEWNEXTHOPBUCKET, portid,
cb->nlh->nlmsg_seq, NLM_F_MULTI,
cb->extack);
- if (err < 0) {
- if (likely(skb->len))
- goto out;
- goto out_err;
- }
+ if (err)
+ return err;
}
dd->ctx->done_nh_idx = dd->ctx->nh.idx + 1;
- bucket_index = 0;
+ dd->ctx->bucket_index = 0;
-out:
- err = skb->len;
-out_err:
- dd->ctx->bucket_index = bucket_index;
- return err;
+ return 0;
}
static int rtm_dump_nexthop_bucket_cb(struct sk_buff *skb,
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
2023-08-08 7:52 [PATCH net 0/3] nexthop: Nexthop dump fixes Ido Schimmel
2023-08-08 7:52 ` [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID Ido Schimmel
2023-08-08 7:52 ` [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient Ido Schimmel
@ 2023-08-08 7:52 ` Ido Schimmel
2023-08-08 14:54 ` David Ahern
2023-08-09 21:00 ` [PATCH net 0/3] nexthop: Nexthop dump fixes patchwork-bot+netdevbpf
3 siblings, 1 reply; 8+ messages in thread
From: Ido Schimmel @ 2023-08-08 7:52 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, pabeni, edumazet, dsahern, petrm, Ido Schimmel
A netlink dump callback can return a positive number to signal that more
information needs to be dumped or zero to signal that the dump is
complete. In the second case, the core netlink code will append the
NLMSG_DONE message to the skb in order to indicate to user space that
the dump is complete.
The nexthop bucket dump callback always returns a positive number if
nexthop buckets were filled in the provided skb, even if the dump is
complete. This means that a dump will span at least two recvmsg() calls
as long as nexthop buckets are present. In the last recvmsg() call the
dump callback will not fill in any nexthop buckets because the previous
call indicated that the dump should restart from the last dumped nexthop
ID plus one.
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id 10 group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
id 10 index 0 idle_time 6.66 nhid 1
id 10 index 1 idle_time 6.66 nhid 1
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
+++ exited with 0 +++
This behavior is both inefficient and buggy. If the last nexthop to be
dumped had the maximum ID of 0xffffffff, then the dump will restart from
0 (0xffffffff + 1) and never end:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# ip nexthop bucket
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
id 4294967295 index 0 idle_time 5.55 nhid 1
id 4294967295 index 1 idle_time 5.55 nhid 1
[...]
Fix by adjusting the dump callback to return zero when the dump is
complete. After the fix only one recvmsg() call is made and the
NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
# strace -e sendto,recvmsg -s 5 ip nexthop bucket
sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
id 4294967295 index 0 idle_time 6.61 nhid 1
id 4294967295 index 1 idle_time 6.61 nhid 1
+++ exited with 0 +++
Note that if the NLMSG_DONE message cannot be appended because of size
limitations, then another recvmsg() will be needed, but the core netlink
code will not invoke the dump callback and simply reply with a
NLMSG_DONE message since it knows that the callback previously returned
zero.
Add a test that fails before the fix:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [FAIL]
[...]
And passes after it:
# ./fib_nexthops.sh -t basic_res
[...]
TEST: Maximum nexthop ID dump [ OK ]
[...]
Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
---
net/ipv4/nexthop.c | 6 +-----
tools/testing/selftests/net/fib_nexthops.sh | 5 +++++
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index f365a4f63899..be5498f5dd31 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3424,13 +3424,9 @@ static int rtm_dump_nexthop_bucket(struct sk_buff *skb,
if (err < 0) {
if (likely(skb->len))
- goto out;
- goto out_err;
+ err = skb->len;
}
-out:
- err = skb->len;
-out_err:
cb->seq = net->nexthop.seq;
nl_dump_check_consistent(cb, nlmsg_hdr(skb));
return err;
diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 10aa059b9f06..df8d90b51867 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -2206,6 +2206,11 @@ basic_res()
run_cmd "$IP nexthop bucket list fdb"
log_test $? 255 "Dump all nexthop buckets with invalid 'fdb' keyword"
+ # Dump should not loop endlessly when maximum nexthop ID is configured.
+ run_cmd "$IP nexthop add id $((2**32-1)) group 1/2 type resilient buckets 4"
+ run_cmd "timeout 5 $IP nexthop bucket"
+ log_test $? 0 "Maximum nexthop ID dump"
+
#
# resilient nexthop buckets get requests
#
--
2.40.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID
2023-08-08 7:52 ` [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID Ido Schimmel
@ 2023-08-08 14:53 ` David Ahern
0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2023-08-08 14:53 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: davem, kuba, pabeni, edumazet, petrm
On 8/8/23 1:52 AM, Ido Schimmel wrote:
> A netlink dump callback can return a positive number to signal that more
> information needs to be dumped or zero to signal that the dump is
> complete. In the second case, the core netlink code will append the
> NLMSG_DONE message to the skb in order to indicate to user space that
> the dump is complete.
>
> The nexthop dump callback always returns a positive number if nexthops
> were filled in the provided skb, even if the dump is complete. This
> means that a dump will span at least two recvmsg() calls as long as
> nexthops are present. In the last recvmsg() call the dump callback will
> not fill in any nexthops because the previous call indicated that the
> dump should restart from the last dumped nexthop ID plus one.
>
> # ip nexthop add id 1 blackhole
> # strace -e sendto,recvmsg -s 5 ip nexthop
> sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
> id 1 blackhole
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
> +++ exited with 0 +++
>
> This behavior is both inefficient and buggy. If the last nexthop to be
> dumped had the maximum ID of 0xffffffff, then the dump will restart from
> 0 (0xffffffff + 1) and never end:
>
> # ip nexthop add id $((2**32-1)) blackhole
> # ip nexthop
> id 4294967295 blackhole
> id 4294967295 blackhole
> [...]
>
> Fix by adjusting the dump callback to return zero when the dump is
> complete. After the fix only one recvmsg() call is made and the
> NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
>
> # ip nexthop add id $((2**32-1)) blackhole
> # strace -e sendto,recvmsg -s 5 ip nexthop
> sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56
> id 4294967295 blackhole
> +++ exited with 0 +++
>
> Note that if the NLMSG_DONE message cannot be appended because of size
> limitations, then another recvmsg() will be needed, but the core netlink
> code will not invoke the dump callback and simply reply with a
> NLMSG_DONE message since it knows that the callback previously returned
> zero.
>
> Add a test that fails before the fix:
>
> # ./fib_nexthops.sh -t basic
> [...]
> TEST: Maximum nexthop ID dump [FAIL]
> [...]
>
> And passes after it:
>
> # ./fib_nexthops.sh -t basic
> [...]
> TEST: Maximum nexthop ID dump [ OK ]
> [...]
>
> Fixes: ab84be7e54fc ("net: Initial nexthop code")
> Reported-by: Petr Machata <petrm@nvidia.com>
> Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> ---
> net/ipv4/nexthop.c | 6 +-----
> tools/testing/selftests/net/fib_nexthops.sh | 5 +++++
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient
2023-08-08 7:52 ` [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient Ido Schimmel
@ 2023-08-08 14:53 ` David Ahern
0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2023-08-08 14:53 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: davem, kuba, pabeni, edumazet, petrm
On 8/8/23 1:52 AM, Ido Schimmel wrote:
> rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging
> to a specific resilient nexthop group. The function returns a positive
> return code (the skb length) upon both success and failure.
>
> The above behavior is problematic. When a complete nexthop bucket dump
> is requested, the function that walks the different nexthops treats the
> non-zero return code as an error. This causes buckets belonging to
> different resilient nexthop groups to be dumped using different buffers
> even if they can all fit in the same buffer:
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id 10 group 1 type resilient buckets 1
> # ip nexthop add id 20 group 1 type resilient buckets 1
> # strace -e recvmsg -s 0 ip nexthop bucket
> [...]
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
> id 10 index 0 idle_time 10.27 nhid 1
> [...]
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
> id 20 index 0 idle_time 6.44 nhid 1
> [...]
>
> Fix by only returning a non-zero return code when an error occurred and
> restarting the dump from the bucket index we failed to fill in. This
> allows buckets belonging to different resilient nexthop groups to be
> dumped using the same buffer:
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id 10 group 1 type resilient buckets 1
> # ip nexthop add id 20 group 1 type resilient buckets 1
> # strace -e recvmsg -s 0 ip nexthop bucket
> [...]
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
> id 10 index 0 idle_time 30.21 nhid 1
> id 20 index 0 idle_time 26.7 nhid 1
> [...]
>
> While this change is more of a performance improvement change than an
> actual bug fix, it is a prerequisite for a subsequent patch that does
> fix a bug.
>
> Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> ---
> net/ipv4/nexthop.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
2023-08-08 7:52 ` [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID Ido Schimmel
@ 2023-08-08 14:54 ` David Ahern
0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2023-08-08 14:54 UTC (permalink / raw)
To: Ido Schimmel, netdev; +Cc: davem, kuba, pabeni, edumazet, petrm
On 8/8/23 1:52 AM, Ido Schimmel wrote:
> A netlink dump callback can return a positive number to signal that more
> information needs to be dumped or zero to signal that the dump is
> complete. In the second case, the core netlink code will append the
> NLMSG_DONE message to the skb in order to indicate to user space that
> the dump is complete.
>
> The nexthop bucket dump callback always returns a positive number if
> nexthop buckets were filled in the provided skb, even if the dump is
> complete. This means that a dump will span at least two recvmsg() calls
> as long as nexthop buckets are present. In the last recvmsg() call the
> dump callback will not fill in any nexthop buckets because the previous
> call indicated that the dump should restart from the last dumped nexthop
> ID plus one.
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id 10 group 1 type resilient buckets 2
> # strace -e sendto,recvmsg -s 5 ip nexthop bucket
> sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
> id 10 index 0 idle_time 6.66 nhid 1
> id 10 index 1 idle_time 6.66 nhid 1
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
> +++ exited with 0 +++
>
> This behavior is both inefficient and buggy. If the last nexthop to be
> dumped had the maximum ID of 0xffffffff, then the dump will restart from
> 0 (0xffffffff + 1) and never end:
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
> # ip nexthop bucket
> id 4294967295 index 0 idle_time 5.55 nhid 1
> id 4294967295 index 1 idle_time 5.55 nhid 1
> id 4294967295 index 0 idle_time 5.55 nhid 1
> id 4294967295 index 1 idle_time 5.55 nhid 1
> [...]
>
> Fix by adjusting the dump callback to return zero when the dump is
> complete. After the fix only one recvmsg() call is made and the
> NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
>
> # ip link add name dummy1 up type dummy
> # ip nexthop add id 1 dev dummy1
> # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
> # strace -e sendto,recvmsg -s 5 ip nexthop bucket
> sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
> id 4294967295 index 0 idle_time 6.61 nhid 1
> id 4294967295 index 1 idle_time 6.61 nhid 1
> +++ exited with 0 +++
>
> Note that if the NLMSG_DONE message cannot be appended because of size
> limitations, then another recvmsg() will be needed, but the core netlink
> code will not invoke the dump callback and simply reply with a
> NLMSG_DONE message since it knows that the callback previously returned
> zero.
>
> Add a test that fails before the fix:
>
> # ./fib_nexthops.sh -t basic_res
> [...]
> TEST: Maximum nexthop ID dump [FAIL]
> [...]
>
> And passes after it:
>
> # ./fib_nexthops.sh -t basic_res
> [...]
> TEST: Maximum nexthop ID dump [ OK ]
> [...]
>
> Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> ---
> net/ipv4/nexthop.c | 6 +-----
> tools/testing/selftests/net/fib_nexthops.sh | 5 +++++
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net 0/3] nexthop: Nexthop dump fixes
2023-08-08 7:52 [PATCH net 0/3] nexthop: Nexthop dump fixes Ido Schimmel
` (2 preceding siblings ...)
2023-08-08 7:52 ` [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID Ido Schimmel
@ 2023-08-09 21:00 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-09 21:00 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, kuba, pabeni, edumazet, dsahern, petrm
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 8 Aug 2023 10:52:30 +0300 you wrote:
> Patches #1 and #3 fix two problems related to nexthops and nexthop
> buckets dump, respectively. Patch #2 is a preparation for the third
> patch.
>
> The pattern described in these patches of splitting the NLMSG_DONE to a
> separate response is prevalent in other rtnetlink dump callbacks. I
> don't know if it's because I'm missing something or if this was done
> intentionally to ensure the message is delivered to user space. After
> commit 0642840b8bb0 ("af_netlink: ensure that NLMSG_DONE never fails in
> dumps") this is no longer necessary and I can improve these dump
> callbacks assuming this analysis is correct.
>
> [...]
Here is the summary with links:
- [net,1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID
https://git.kernel.org/netdev/net/c/913f60cacda7
- [net,2/3] nexthop: Make nexthop bucket dump more efficient
https://git.kernel.org/netdev/net/c/f10d3d9df49d
- [net,3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
https://git.kernel.org/netdev/net/c/8743aeff5bc4
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-09 21:00 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-08 7:52 [PATCH net 0/3] nexthop: Nexthop dump fixes Ido Schimmel
2023-08-08 7:52 ` [PATCH net 1/3] nexthop: Fix infinite nexthop dump when using maximum nexthop ID Ido Schimmel
2023-08-08 14:53 ` David Ahern
2023-08-08 7:52 ` [PATCH net 2/3] nexthop: Make nexthop bucket dump more efficient Ido Schimmel
2023-08-08 14:53 ` David Ahern
2023-08-08 7:52 ` [PATCH net 3/3] nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID Ido Schimmel
2023-08-08 14:54 ` David Ahern
2023-08-09 21:00 ` [PATCH net 0/3] nexthop: Nexthop dump fixes patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).