[PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
@ 2026-01-22 14:56 Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

The duplicate discard algorithms for PRP and HSR do not work reliably
with certain link faults. Especially with packet loss on one link, the
duplicate discard algorithms drop valid packets. For a more thorough
description see patches 4 (for PRP) and 6 (for HSR).

This patchset replaces the current algorithms (based on a drop window
for PRP and highest seen sequence number for HSR) with a single new one
that tracks the received sequence numbers individually (descriptions
again in patches 4 and 6).

The changes will lead to higher memory usage and more work to do for
each packet. But I argue that this is an acceptable trade-off to make
for a more robust PRP and HSR behavior with faulty links. After all,
both protocols are to be used in environments where redundancy is needed
and people are willing to setup special network topologies to achieve
that.

Some more reasoning on the overhead and expected scale of the deployment
from the RFC discussion:

> As for the expected scale, there are two dimensions: the number of nodes
> in the network and the data rate with which they send.
>
> The number of nodes in the network affect the memory usage because each
> node now has the block buffer. For PRP that's 64 blocks * 32 byte =
> 2kbyte for each node in the node table. A PRP network doesn't have an
> explicit limit for the number of nodes. However, the whole network is a
> single layer-2 segment which shouldn't grow too large anyways. Even if
> one really tries to put 1000 nodes into the PRP network, the memory
> overhead (2Mbyte) is acceptable in my opinion.
>
> For HSR, the blocks would be larger because we need to track the
> sequence numbers per port. I expect 64 blocks * 80 byte = 5kbyte per
> node in the node table. There is no explicit limit for the size of an
> HSR ring either. But I expect them to be of limited size because the
> forwarding delays add up throughout the ring. I've seen vendors limiting
> the ring size to 50 nodes with 100Mbit/s links and 300 with 1Gbit/s
> links. In both cases I consider the memory overhead acceptable.
>
> The data rates are harder to reason about. In general, the data rates
> for HSR and PRP are limited because too high packet rates would lead to
> very fast re-use of the 16bit sequence numbers. The IEC 62439-3:2021
> mentions 100Mbit/s links and 1Gbit/s links. I don't expect HSR or PRP
> networks to scale out to, e.g., 10Gbit/s links with the current
> specification as this would mean that sequence numbers could repeat as
> often as every ~4ms. The default constants in the IEC standard, which we
> also use, are oriented at a 100Mbit/s network.
>
> In my tests with veth pairs, the CPU overhead didn't lead to
> significantly lower data rates. The main factor limiting the data rate
> at the moment, I assume, is the per-node spinlock that is taken for each
> received packet. IMHO, there is a lot more to gain in terms of CPU
> overhead from making this lock smaller or getting rid of it, than we
> loose with the more accurate duplicate discard algorithm in this patchset.
>
> The CPU overhead of the algorithm benefits from the fact that in high
> packet rate scenarios (where it really matters) many packets will have
> sequence numbers in already initialized blocks. These packets just have
> additionally: one xarray lookup, one comparison, and one bit setting. If
> a block needs to be initialized (once every 128 packets plus their 128
> duplicates if all sequence numbers are seen), we will have: one
> xa_erase, a bunch of memory writes, and one xa_store.
>
> In theory, all packets could end up in the slow path if a node sends
> every 128th packet to us. If this is sent from a well behaving node, the
> packet rate wouldn't be an issue anymore, though.

A note about

Thanks,
   Felix

Signed-off-by: Felix Maurer <fmaurer@redhat.com>

---

Changes since v1:
  - link: https://lore.kernel.org/netdev/cover.1769001553.git.fmaurer@redhat.com/
  - Disable shellcheck for unassigned variables on the first use of each
    namespace from setup_ns (I thought this would be necessary for every
    use and therefore didn't do it in v1)
  - Address the netdev/ai-review remarks, they were all valid

Changes since the RFC:
  - link: https://lore.kernel.org/netdev/cover.1766433800.git.fmaurer@redhat.com/
  - Extended the new algorithm to HSR
  - shellcheck'ing and checkpatch'ing
  - Updated the KUnit test

Felix Maurer (9):
  selftests: hsr: Add ping test for PRP
  selftests: hsr: Check duplicates on HSR with VLAN
  selftests: hsr: Add tests for faulty links
  hsr: Implement more robust duplicate discard for PRP
  selftests: hsr: Add tests for more link faults with PRP
  hsr: Implement more robust duplicate discard for HSR
  selftests: hsr: Add more link fault tests for HSR
  hsr: Update PRP duplicate discard KUnit test for new algorithm
  MAINTAINERS: Assign hsr selftests to HSR

 MAINTAINERS                                   |   1 +
 net/hsr/hsr_framereg.c                        | 362 ++++++++++-------
 net/hsr/hsr_framereg.h                        |  39 +-
 net/hsr/prp_dup_discard_test.c                | 154 ++++---
 tools/testing/selftests/net/hsr/Makefile      |   2 +
 tools/testing/selftests/net/hsr/hsr_ping.sh   | 207 +++-------
 .../testing/selftests/net/hsr/link_faults.sh  | 378 ++++++++++++++++++
 tools/testing/selftests/net/hsr/prp_ping.sh   | 147 +++++++
 tools/testing/selftests/net/hsr/settings      |   2 +-
 9 files changed, 910 insertions(+), 382 deletions(-)
 create mode 100755 tools/testing/selftests/net/hsr/link_faults.sh
 create mode 100755 tools/testing/selftests/net/hsr/prp_ping.sh


base-commit: b6d5a62231acd628c4a989cad562fe37958b1218
-- 
2.52.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
@ 2026-01-22 14:56 ` Felix Maurer
  2026-01-29 11:05   ` Sebastian Andrzej Siewior
  2026-01-22 14:56 ` [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN Felix Maurer
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Add a selftest for PRP that performs a basic ping test on IPv4 and IPv6,
over the plain PRP interface and a VLAN interface, similar to the existing
ping test for HSR. The test first checks reachability of the other node,
then checks for no loss and no duplicates.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 tools/testing/selftests/net/hsr/Makefile    |   1 +
 tools/testing/selftests/net/hsr/prp_ping.sh | 147 ++++++++++++++++++++
 2 files changed, 148 insertions(+)
 create mode 100755 tools/testing/selftests/net/hsr/prp_ping.sh

diff --git a/tools/testing/selftests/net/hsr/Makefile b/tools/testing/selftests/net/hsr/Makefile
index 4b6afc0fe9f8..1886f345897a 100644
--- a/tools/testing/selftests/net/hsr/Makefile
+++ b/tools/testing/selftests/net/hsr/Makefile
@@ -5,6 +5,7 @@ top_srcdir = ../../../../..
 TEST_PROGS := \
 	hsr_ping.sh \
 	hsr_redbox.sh \
+	prp_ping.sh \
 # end of TEST_PROGS
 
 TEST_FILES += hsr_common.sh
diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
new file mode 100755
index 000000000000..fd2ba9f05d4c
--- /dev/null
+++ b/tools/testing/selftests/net/hsr/prp_ping.sh
@@ -0,0 +1,147 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ipv6=true
+
+source ./hsr_common.sh
+
+optstring="h4"
+usage() {
+	echo "Usage: $0 [OPTION]"
+	echo -e "\t-4: IPv4 only: disable IPv6 tests (default: test both IPv4 and IPv6)"
+}
+
+while getopts "$optstring" option;do
+	case "$option" in
+	"h")
+		usage "$0"
+		exit 0
+		;;
+	"4")
+		ipv6=false
+		;;
+	"?")
+		usage "$0"
+		exit 1
+		;;
+esac
+done
+
+setup_prp_interfaces()
+{
+	echo "INFO: Preparing interfaces for PRP"
+# Two PRP nodes, connected by two links (treated as LAN A and LAN B).
+#
+#       vethA ----- vethA
+#     prp1             prp2
+#       vethB ----- vethB
+#
+#     node1           node2
+
+	# Interfaces
+	# shellcheck disable=SC2154 # variables assigned by setup_ns
+	ip link add vethA netns "$node1" type veth peer name vethA netns "$node2"
+	ip link add vethB netns "$node1" type veth peer name vethB netns "$node2"
+
+	# MAC addresses will be copied from LAN A interface
+	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
+	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
+
+	# PRP
+	ip -net "$node1" link add name prp1 type hsr \
+		slave1 vethA slave2 vethB supervision 45 proto 1
+	ip -net "$node2" link add name prp2 type hsr \
+		slave1 vethA slave2 vethB supervision 45 proto 1
+
+	# IP addresses
+	ip -net "$node1" addr add 100.64.0.1/24 dev prp1
+	ip -net "$node1" addr add dead:beef:0::1/64 dev prp1 nodad
+	ip -net "$node2" addr add 100.64.0.2/24 dev prp2
+	ip -net "$node2" addr add dead:beef:0::2/64 dev prp2 nodad
+
+	# All links up
+	ip -net "$node1" link set vethA up
+	ip -net "$node1" link set vethB up
+	ip -net "$node1" link set prp1 up
+
+	ip -net "$node2" link set vethA up
+	ip -net "$node2" link set vethB up
+	ip -net "$node2" link set prp2 up
+}
+
+setup_vlan_interfaces()
+{
+	# Interfaces
+	ip -net "$node1" link add link prp1 name prp1.2 type vlan id 2
+	ip -net "$node2" link add link prp2 name prp2.2 type vlan id 2
+
+	# IP addresses
+	ip -net "$node1" addr add 100.64.2.1/24 dev prp1.2
+	ip -net "$node1" addr add dead:beef:2::1/64 dev prp1.2 nodad
+
+	ip -net "$node2" addr add 100.64.2.2/24 dev prp2.2
+	ip -net "$node2" addr add dead:beef:2::2/64 dev prp2.2 nodad
+
+	# All links up
+	ip -net "$node1" link set prp1.2 up
+	ip -net "$node2" link set prp2.2 up
+}
+
+do_ping_tests()
+{
+	local netid="$1"
+
+	echo "INFO: Initial validation ping"
+
+	do_ping "$node1" "100.64.$netid.2"
+	do_ping "$node2" "100.64.$netid.1"
+	stop_if_error "Initial validation failed on IPv4"
+
+	do_ping "$node1" "dead:beef:$netid::2"
+	do_ping "$node2" "dead:beef:$netid::1"
+	stop_if_error "Initial validation failed on IPv6"
+
+	echo "INFO: Longer ping test."
+
+	do_ping_long "$node1" "100.64.$netid.2"
+	do_ping_long "$node2" "100.64.$netid.1"
+	stop_if_error "Longer ping test failed on IPv4."
+
+	do_ping_long "$node1" "dead:beef:$netid::2"
+	do_ping_long "$node2" "dead:beef:$netid::1"
+	stop_if_error "Longer ping test failed on IPv6."
+}
+
+run_ping_tests()
+{
+	echo "INFO: Running ping tests"
+	do_ping_tests 0
+}
+
+run_vlan_ping_tests()
+{
+	vlan_challenged_prp1=$(ip net exec "$node1" ethtool -k prp1 | \
+		grep "vlan-challenged" | awk '{print $2}')
+	vlan_challenged_prp2=$(ip net exec "$node2" ethtool -k prp2 | \
+		grep "vlan-challenged" | awk '{print $2}')
+
+	if [[ "$vlan_challenged_prp1" = "off" || \
+	      "$vlan_challenged_prp2" = "off" ]]; then
+		echo "INFO: Running VLAN ping tests"
+		setup_vlan_interfaces
+		do_ping_tests 2
+	else
+		echo "INFO: Not Running VLAN tests as the device does not support VLAN"
+	fi
+}
+
+check_prerequisites
+trap cleanup_all_ns EXIT
+
+setup_ns node1 node2
+setup_prp_interfaces
+
+run_ping_tests
+run_vlan_ping_tests
+
+exit $ret
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
@ 2026-01-22 14:56 ` Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links Felix Maurer
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Previously the hsr_ping test only checked that all nodes in a VLAN are
reachable (using do_ping). Update the test to also check that there is no
packet loss and no duplicate packets by running the same tests for VLANs as
without VLANs (including using do_ping_long). This also adds tests for IPv6
over VLAN. To unify the test code, the topology without VLANs now uses IP
addresses from dead:beef:0::/64 to align with the 100.64.0.0/24 range for
IPv4. Error messages are updated across the board to make it easier to find
what actually failed.

Also update the VLAN test to only run in VLAN 2, as there is no need to
check if ping really works with VLAN IDs 2, 3, 4, and 5. This lowers the
number of long ping tests on VLANs to keep the overall test runtime in
bounds.

It's still necessary to bump the test timeout a bit, though: a ping long
tests takes 1sec, do_ping_tests performs 12 of them, do_link_problem_tests
6, and the VLAN tests again 12. With some buffer for setup and waiting and
for two protocol versions, 90sec timeout seems reasonable.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 tools/testing/selftests/net/hsr/hsr_ping.sh | 188 +++++++-------------
 tools/testing/selftests/net/hsr/settings    |   2 +-
 2 files changed, 70 insertions(+), 120 deletions(-)

diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 5a65f4f836be..ebee4a18fc67 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -27,31 +27,34 @@ while getopts "$optstring" option;do
 esac
 done
 
-do_complete_ping_test()
+do_ping_tests()
 {
-	echo "INFO: Initial validation ping."
-	# Each node has to be able each one.
-	do_ping "$ns1" 100.64.0.2
-	do_ping "$ns2" 100.64.0.1
-	do_ping "$ns3" 100.64.0.1
-	stop_if_error "Initial validation failed."
-
-	do_ping "$ns1" 100.64.0.3
-	do_ping "$ns2" 100.64.0.3
-	do_ping "$ns3" 100.64.0.2
+	local netid="$1"
 
-	do_ping "$ns1" dead:beef:1::2
-	do_ping "$ns1" dead:beef:1::3
-	do_ping "$ns2" dead:beef:1::1
-	do_ping "$ns2" dead:beef:1::2
-	do_ping "$ns3" dead:beef:1::1
-	do_ping "$ns3" dead:beef:1::2
+	echo "INFO: Running ping tests."
 
-	stop_if_error "Initial validation failed."
+	echo "INFO: Initial validation ping."
+	# Each node has to be able to reach each one.
+	do_ping "$ns1" "100.64.$netid.2"
+	do_ping "$ns1" "100.64.$netid.3"
+	do_ping "$ns2" "100.64.$netid.1"
+	do_ping "$ns2" "100.64.$netid.3"
+	do_ping "$ns3" "100.64.$netid.1"
+	do_ping "$ns3" "100.64.$netid.2"
+	stop_if_error "Initial validation failed on IPv4."
+
+	do_ping "$ns1" "dead:beef:$netid::2"
+	do_ping "$ns1" "dead:beef:$netid::3"
+	do_ping "$ns2" "dead:beef:$netid::1"
+	do_ping "$ns2" "dead:beef:$netid::2"
+	do_ping "$ns3" "dead:beef:$netid::1"
+	do_ping "$ns3" "dead:beef:$netid::2"
+	stop_if_error "Initial validation failed on IPv6."
 
 # Wait until supervisor all supervision frames have been processed and the node
 # entries have been merged. Otherwise duplicate frames will be observed which is
 # valid at this stage.
+	echo "INFO: Wait for node table entries to be merged."
 	WAIT=5
 	while [ ${WAIT} -gt 0 ]
 	do
@@ -68,24 +71,28 @@ do_complete_ping_test()
 	sleep 1
 
 	echo "INFO: Longer ping test."
-	do_ping_long "$ns1" 100.64.0.2
-	do_ping_long "$ns1" dead:beef:1::2
-	do_ping_long "$ns1" 100.64.0.3
-	do_ping_long "$ns1" dead:beef:1::3
-
-	stop_if_error "Longer ping test failed."
-
-	do_ping_long "$ns2" 100.64.0.1
-	do_ping_long "$ns2" dead:beef:1::1
-	do_ping_long "$ns2" 100.64.0.3
-	do_ping_long "$ns2" dead:beef:1::2
-	stop_if_error "Longer ping test failed."
+	do_ping_long "$ns1" "100.64.$netid.2"
+	do_ping_long "$ns1" "dead:beef:$netid::2"
+	do_ping_long "$ns1" "100.64.$netid.3"
+	do_ping_long "$ns1" "dead:beef:$netid::3"
+	stop_if_error "Longer ping test failed (ns1)."
+
+	do_ping_long "$ns2" "100.64.$netid.1"
+	do_ping_long "$ns2" "dead:beef:$netid::1"
+	do_ping_long "$ns2" "100.64.$netid.3"
+	do_ping_long "$ns2" "dead:beef:$netid::3"
+	stop_if_error "Longer ping test failed (ns2)."
+
+	do_ping_long "$ns3" "100.64.$netid.1"
+	do_ping_long "$ns3" "dead:beef:$netid::1"
+	do_ping_long "$ns3" "100.64.$netid.2"
+	do_ping_long "$ns3" "dead:beef:$netid::2"
+	stop_if_error "Longer ping test failed (ns3)."
+}
 
-	do_ping_long "$ns3" 100.64.0.1
-	do_ping_long "$ns3" dead:beef:1::1
-	do_ping_long "$ns3" 100.64.0.2
-	do_ping_long "$ns3" dead:beef:1::2
-	stop_if_error "Longer ping test failed."
+do_link_problem_tests()
+{
+	echo "INFO: Running link problem tests."
 
 	echo "INFO: Cutting one link."
 	do_ping_long "$ns1" 100.64.0.3 &
@@ -104,26 +111,22 @@ do_complete_ping_test()
 
 	do_ping_long "$ns1" 100.64.0.2
 	do_ping_long "$ns1" 100.64.0.3
-
-	stop_if_error "Failed with delay and packetloss."
+	stop_if_error "Failed with delay and packetloss (ns1)."
 
 	do_ping_long "$ns2" 100.64.0.1
 	do_ping_long "$ns2" 100.64.0.3
-
-	stop_if_error "Failed with delay and packetloss."
+	stop_if_error "Failed with delay and packetloss (ns2)."
 
 	do_ping_long "$ns3" 100.64.0.1
 	do_ping_long "$ns3" 100.64.0.2
-	stop_if_error "Failed with delay and packetloss."
-
-	echo "INFO: All good."
+	stop_if_error "Failed with delay and packetloss (ns3)."
 }
 
 setup_hsr_interfaces()
 {
 	local HSRv="$1"
 
-	echo "INFO: preparing interfaces for HSRv${HSRv}."
+	echo "INFO: Preparing interfaces for HSRv${HSRv}."
 # Three HSR nodes. Each node has one link to each of its neighbour, two links in total.
 #
 #    ns1eth1 ----- ns2eth1
@@ -140,17 +143,20 @@ setup_hsr_interfaces()
 	ip link add ns3eth2 netns "$ns3" type veth peer name ns2eth2 netns "$ns2"
 
 	# HSRv0/1
-	ip -net "$ns1" link add name hsr1 type hsr slave1 ns1eth1 slave2 ns1eth2 supervision 45 version $HSRv proto 0
-	ip -net "$ns2" link add name hsr2 type hsr slave1 ns2eth1 slave2 ns2eth2 supervision 45 version $HSRv proto 0
-	ip -net "$ns3" link add name hsr3 type hsr slave1 ns3eth1 slave2 ns3eth2 supervision 45 version $HSRv proto 0
+	ip -net "$ns1" link add name hsr1 type hsr slave1 ns1eth1 \
+		slave2 ns1eth2 supervision 45 version "$HSRv" proto 0
+	ip -net "$ns2" link add name hsr2 type hsr slave1 ns2eth1 \
+		slave2 ns2eth2 supervision 45 version "$HSRv" proto 0
+	ip -net "$ns3" link add name hsr3 type hsr slave1 ns3eth1 \
+		slave2 ns3eth2 supervision 45 version "$HSRv" proto 0
 
 	# IP for HSR
 	ip -net "$ns1" addr add 100.64.0.1/24 dev hsr1
-	ip -net "$ns1" addr add dead:beef:1::1/64 dev hsr1 nodad
+	ip -net "$ns1" addr add dead:beef:0::1/64 dev hsr1 nodad
 	ip -net "$ns2" addr add 100.64.0.2/24 dev hsr2
-	ip -net "$ns2" addr add dead:beef:1::2/64 dev hsr2 nodad
+	ip -net "$ns2" addr add dead:beef:0::2/64 dev hsr2 nodad
 	ip -net "$ns3" addr add 100.64.0.3/24 dev hsr3
-	ip -net "$ns3" addr add dead:beef:1::3/64 dev hsr3 nodad
+	ip -net "$ns3" addr add dead:beef:0::3/64 dev hsr3 nodad
 
 	ip -net "$ns1" link set address 00:11:22:00:01:01 dev ns1eth1
 	ip -net "$ns1" link set address 00:11:22:00:01:02 dev ns1eth2
@@ -177,85 +183,33 @@ setup_hsr_interfaces()
 
 setup_vlan_interfaces() {
 	ip -net "$ns1" link add link hsr1 name hsr1.2 type vlan id 2
-	ip -net "$ns1" link add link hsr1 name hsr1.3 type vlan id 3
-	ip -net "$ns1" link add link hsr1 name hsr1.4 type vlan id 4
-	ip -net "$ns1" link add link hsr1 name hsr1.5 type vlan id 5
-
 	ip -net "$ns2" link add link hsr2 name hsr2.2 type vlan id 2
-	ip -net "$ns2" link add link hsr2 name hsr2.3 type vlan id 3
-	ip -net "$ns2" link add link hsr2 name hsr2.4 type vlan id 4
-	ip -net "$ns2" link add link hsr2 name hsr2.5 type vlan id 5
-
 	ip -net "$ns3" link add link hsr3 name hsr3.2 type vlan id 2
-	ip -net "$ns3" link add link hsr3 name hsr3.3 type vlan id 3
-	ip -net "$ns3" link add link hsr3 name hsr3.4 type vlan id 4
-	ip -net "$ns3" link add link hsr3 name hsr3.5 type vlan id 5
 
 	ip -net "$ns1" addr add 100.64.2.1/24 dev hsr1.2
-	ip -net "$ns1" addr add 100.64.3.1/24 dev hsr1.3
-	ip -net "$ns1" addr add 100.64.4.1/24 dev hsr1.4
-	ip -net "$ns1" addr add 100.64.5.1/24 dev hsr1.5
+	ip -net "$ns1" addr add dead:beef:2::1/64 dev hsr1.2 nodad
 
 	ip -net "$ns2" addr add 100.64.2.2/24 dev hsr2.2
-	ip -net "$ns2" addr add 100.64.3.2/24 dev hsr2.3
-	ip -net "$ns2" addr add 100.64.4.2/24 dev hsr2.4
-	ip -net "$ns2" addr add 100.64.5.2/24 dev hsr2.5
+	ip -net "$ns2" addr add dead:beef:2::2/64 dev hsr2.2 nodad
 
 	ip -net "$ns3" addr add 100.64.2.3/24 dev hsr3.2
-	ip -net "$ns3" addr add 100.64.3.3/24 dev hsr3.3
-	ip -net "$ns3" addr add 100.64.4.3/24 dev hsr3.4
-	ip -net "$ns3" addr add 100.64.5.3/24 dev hsr3.5
+	ip -net "$ns3" addr add dead:beef:2::3/64 dev hsr3.2 nodad
 
 	ip -net "$ns1" link set dev hsr1.2 up
-	ip -net "$ns1" link set dev hsr1.3 up
-	ip -net "$ns1" link set dev hsr1.4 up
-	ip -net "$ns1" link set dev hsr1.5 up
-
 	ip -net "$ns2" link set dev hsr2.2 up
-	ip -net "$ns2" link set dev hsr2.3 up
-	ip -net "$ns2" link set dev hsr2.4 up
-	ip -net "$ns2" link set dev hsr2.5 up
-
 	ip -net "$ns3" link set dev hsr3.2 up
-	ip -net "$ns3" link set dev hsr3.3 up
-	ip -net "$ns3" link set dev hsr3.4 up
-	ip -net "$ns3" link set dev hsr3.5 up
 
 }
 
-hsr_vlan_ping() {
-	do_ping "$ns1" 100.64.2.2
-	do_ping "$ns1" 100.64.3.2
-	do_ping "$ns1" 100.64.4.2
-	do_ping "$ns1" 100.64.5.2
-
-	do_ping "$ns1" 100.64.2.3
-	do_ping "$ns1" 100.64.3.3
-	do_ping "$ns1" 100.64.4.3
-	do_ping "$ns1" 100.64.5.3
-
-	do_ping "$ns2" 100.64.2.1
-	do_ping "$ns2" 100.64.3.1
-	do_ping "$ns2" 100.64.4.1
-	do_ping "$ns2" 100.64.5.1
-
-	do_ping "$ns2" 100.64.2.3
-	do_ping "$ns2" 100.64.3.3
-	do_ping "$ns2" 100.64.4.3
-	do_ping "$ns2" 100.64.5.3
-
-	do_ping "$ns3" 100.64.2.1
-	do_ping "$ns3" 100.64.3.1
-	do_ping "$ns3" 100.64.4.1
-	do_ping "$ns3" 100.64.5.1
-
-	do_ping "$ns3" 100.64.2.2
-	do_ping "$ns3" 100.64.3.2
-	do_ping "$ns3" 100.64.4.2
-	do_ping "$ns3" 100.64.5.2
+run_complete_ping_tests()
+{
+	echo "INFO: Running complete ping tests."
+	do_ping_tests 0
+	do_link_problem_tests
 }
 
-run_vlan_tests() {
+run_vlan_tests()
+{
 	vlan_challenged_hsr1=$(ip net exec "$ns1" ethtool -k hsr1 | grep "vlan-challenged" | awk '{print $2}')
 	vlan_challenged_hsr2=$(ip net exec "$ns2" ethtool -k hsr2 | grep "vlan-challenged" | awk '{print $2}')
 	vlan_challenged_hsr3=$(ip net exec "$ns3" ethtool -k hsr3 | grep "vlan-challenged" | awk '{print $2}')
@@ -263,27 +217,23 @@ run_vlan_tests() {
 	if [[ "$vlan_challenged_hsr1" = "off" || "$vlan_challenged_hsr2" = "off" || "$vlan_challenged_hsr3" = "off" ]]; then
 		echo "INFO: Running VLAN tests"
 		setup_vlan_interfaces
-		hsr_vlan_ping
+		do_ping_tests 2
 	else
 		echo "INFO: Not Running VLAN tests as the device does not support VLAN"
 	fi
 }
 
 check_prerequisites
-setup_ns ns1 ns2 ns3
-
 trap cleanup_all_ns EXIT
 
+setup_ns ns1 ns2 ns3
 setup_hsr_interfaces 0
-do_complete_ping_test
-
+run_complete_ping_tests
 run_vlan_tests
 
 setup_ns ns1 ns2 ns3
-
 setup_hsr_interfaces 1
-do_complete_ping_test
-
+run_complete_ping_tests
 run_vlan_tests
 
 exit $ret
diff --git a/tools/testing/selftests/net/hsr/settings b/tools/testing/selftests/net/hsr/settings
index 0fbc037f2aa8..ba4d85f74cd6 100644
--- a/tools/testing/selftests/net/hsr/settings
+++ b/tools/testing/selftests/net/hsr/settings
@@ -1 +1 @@
-timeout=50
+timeout=90
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN Felix Maurer
@ 2026-01-22 14:56 ` Felix Maurer
  2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Add a test case that can support different types of faulty links for all
protocol versions (HSRv0, HSRv1, PRPv1). It starts with a baseline with
fully functional links. The first faulty case is one link being cut during
the ping. This test uses a different function for ping that sends more
packets in shorter intervals to stress the duplicate detection algorithms a
bit more and allow for future tests with other link faults (packet loss,
reordering, etc.).

As the link fault tests now cover the cut link for HSR and PRP, it can be
removed from the hsr_ping test. Note that the removed cut link test did not
really test the fault because do_ping_long takes about 1sec while the link
is only cut after a 3sec sleep.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 tools/testing/selftests/net/hsr/Makefile      |   1 +
 tools/testing/selftests/net/hsr/hsr_ping.sh   |  11 -
 .../testing/selftests/net/hsr/link_faults.sh  | 271 ++++++++++++++++++
 3 files changed, 272 insertions(+), 11 deletions(-)
 create mode 100755 tools/testing/selftests/net/hsr/link_faults.sh

diff --git a/tools/testing/selftests/net/hsr/Makefile b/tools/testing/selftests/net/hsr/Makefile
index 1886f345897a..31fb9326cf53 100644
--- a/tools/testing/selftests/net/hsr/Makefile
+++ b/tools/testing/selftests/net/hsr/Makefile
@@ -5,6 +5,7 @@ top_srcdir = ../../../../..
 TEST_PROGS := \
 	hsr_ping.sh \
 	hsr_redbox.sh \
+	link_faults.sh \
 	prp_ping.sh \
 # end of TEST_PROGS
 
diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index ebee4a18fc67..0ec71b20ab75 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -94,17 +94,6 @@ do_link_problem_tests()
 {
 	echo "INFO: Running link problem tests."
 
-	echo "INFO: Cutting one link."
-	do_ping_long "$ns1" 100.64.0.3 &
-
-	sleep 3
-	ip -net "$ns3" link set ns3eth1 down
-	wait
-
-	ip -net "$ns3" link set ns3eth1 up
-
-	stop_if_error "Failed with one link down."
-
 	echo "INFO: Delay the link and drop a few packages."
 	tc -net "$ns3" qdisc add dev ns3eth1 root netem delay 50ms
 	tc -net "$ns2" qdisc add dev ns2eth1 root netem delay 5ms loss 25%
diff --git a/tools/testing/selftests/net/hsr/link_faults.sh b/tools/testing/selftests/net/hsr/link_faults.sh
new file mode 100755
index 000000000000..7ff14dcd32e7
--- /dev/null
+++ b/tools/testing/selftests/net/hsr/link_faults.sh
@@ -0,0 +1,271 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# shellcheck disable=SC2329
+
+source ../lib.sh
+
+ALL_TESTS="
+	test_clean_hsrv0
+	test_cut_link_hsrv0
+	test_clean_hsrv1
+	test_cut_link_hsrv1
+	test_clean_prp
+	test_cut_link_prp
+"
+
+# The tests are running ping for 5sec with a relatively short interval with a
+# cut link, which should be recoverable by HSR/PRP.
+
+setup_hsr_topo()
+{
+	# Three HSR nodes in a ring, every node has a LAN A interface connected
+	# to the LAN B interface of the next node.
+	#
+	#    node1            node2
+	#
+	#     vethA -------- vethB
+	#   hsr1                 hsr2
+	#     vethB          vethA
+	#         \          /
+	#         vethA  vethB
+	#             hsr3
+	#
+	#            node3
+
+	local ver="$1"
+
+	setup_ns node1 node2 node3
+
+	# veth links
+	# shellcheck disable=SC2154 # variables assigned by setup_ns
+	ip link add vethA netns "$node1" type veth peer name vethB netns "$node2"
+	# shellcheck disable=SC2154 # variables assigned by setup_ns
+	ip link add vethA netns "$node2" type veth peer name vethB netns "$node3"
+	ip link add vethA netns "$node3" type veth peer name vethB netns "$node1"
+
+	# MAC addresses (not needed for HSR operation, but helps with debugging)
+	ip -net "$node1" link set address 00:11:22:00:01:01 dev vethA
+	ip -net "$node1" link set address 00:11:22:00:01:02 dev vethB
+
+	ip -net "$node2" link set address 00:11:22:00:02:01 dev vethA
+	ip -net "$node2" link set address 00:11:22:00:02:02 dev vethB
+
+	ip -net "$node3" link set address 00:11:22:00:03:01 dev vethA
+	ip -net "$node3" link set address 00:11:22:00:03:02 dev vethB
+
+	# HSR interfaces
+	ip -net "$node1" link add name hsr1 type hsr proto 0 version "$ver" \
+		slave1 vethA slave2 vethB supervision 45
+	ip -net "$node2" link add name hsr2 type hsr proto 0 version "$ver" \
+		slave1 vethA slave2 vethB supervision 45
+	ip -net "$node3" link add name hsr3 type hsr proto 0 version "$ver" \
+		slave1 vethA slave2 vethB supervision 45
+
+	# IP addresses
+	ip -net "$node1" addr add 100.64.0.1/24 dev hsr1
+	ip -net "$node2" addr add 100.64.0.2/24 dev hsr2
+	ip -net "$node3" addr add 100.64.0.3/24 dev hsr3
+
+	# Set all links up
+	ip -net "$node1" link set vethA up
+	ip -net "$node1" link set vethB up
+	ip -net "$node1" link set hsr1 up
+
+	ip -net "$node2" link set vethA up
+	ip -net "$node2" link set vethB up
+	ip -net "$node2" link set hsr2 up
+
+	ip -net "$node3" link set vethA up
+	ip -net "$node3" link set vethB up
+	ip -net "$node3" link set hsr3 up
+}
+
+setup_prp_topo()
+{
+	# Two PRP nodes, connected by two links (treated as LAN A and LAN B).
+	#
+	#       vethA ----- vethA
+	#     prp1             prp2
+	#       vethB ----- vethB
+	#
+	#     node1           node2
+
+	setup_ns node1 node2
+
+	# veth links
+	ip link add vethA netns "$node1" type veth peer name vethA netns "$node2"
+	ip link add vethB netns "$node1" type veth peer name vethB netns "$node2"
+
+	# MAC addresses will be copied from LAN A interface
+	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
+	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
+
+	# PRP interfaces
+	ip -net "$node1" link add name prp1 type hsr \
+		slave1 vethA slave2 vethB supervision 45 proto 1
+	ip -net "$node2" link add name prp2 type hsr \
+		slave1 vethA slave2 vethB supervision 45 proto 1
+
+	# IP addresses
+	ip -net "$node1" addr add 100.64.0.1/24 dev prp1
+	ip -net "$node2" addr add 100.64.0.2/24 dev prp2
+
+	# All links up
+	ip -net "$node1" link set vethA up
+	ip -net "$node1" link set vethB up
+	ip -net "$node1" link set prp1 up
+
+	ip -net "$node2" link set vethA up
+	ip -net "$node2" link set vethB up
+	ip -net "$node2" link set prp2 up
+}
+
+wait_for_hsr_node_table()
+{
+	log_info "Wait for node table entries to be merged."
+	WAIT=5
+	while [ "${WAIT}" -gt 0 ]; do
+		nts=$(cat /sys/kernel/debug/hsr/hsr*/node_table)
+
+		# We need entries in the node tables, and they need to be merged
+		if (echo "$nts" | grep -qE "^([0-9a-f]{2}:){5}") && \
+		    ! (echo "$nts" | grep -q "00:00:00:00:00:00"); then
+			return
+		fi
+
+		sleep 1
+		((WAIT--))
+	done
+	check_err 1 "Failed to wait for merged node table entries"
+}
+
+setup_topo()
+{
+	local proto="$1"
+
+	if [ "$proto" = "HSRv0" ]; then
+		setup_hsr_topo 0
+		wait_for_hsr_node_table
+	elif [ "$proto" = "HSRv1" ]; then
+		setup_hsr_topo 1
+		wait_for_hsr_node_table
+	elif [ "$proto" = "PRP" ]; then
+		setup_prp_topo
+	else
+		check_err 1 "Unknown protocol (${proto})"
+	fi
+}
+
+check_ping()
+{
+	local node="$1"
+	local dst="$2"
+	local ping_args="-q -i 0.01 -c 400"
+
+	log_info "Running ping $node -> $dst"
+	# shellcheck disable=SC2086
+	output=$(ip netns exec "$node" ping $ping_args "$dst" | \
+		grep "packets transmitted")
+	log_info "$output"
+
+	dups=0
+	loss=0
+
+	if [[ "$output" =~ \+([0-9]+)" duplicates" ]]; then
+		dups="${BASH_REMATCH[1]}"
+	fi
+	if [[ "$output" =~ ([0-9\.]+\%)" packet loss" ]]; then
+		loss="${BASH_REMATCH[1]}"
+	fi
+
+	check_err "$dups" "Unexpected duplicate packets (${dups})"
+	if [ "$loss" != "0%" ]; then
+		check_err 1 "Unexpected packet loss (${loss})"
+	fi
+}
+
+test_clean()
+{
+	local proto="$1"
+
+	RET=0
+	tname="${FUNCNAME[0]} - ${proto}"
+
+	setup_topo "$proto"
+	if ((RET != ksft_pass)); then
+		log_test "${tname} setup"
+		return
+	fi
+
+	check_ping "$node1" "100.64.0.2"
+
+	log_test "${tname}"
+}
+
+test_clean_hsrv0()
+{
+	test_clean "HSRv0"
+}
+
+test_clean_hsrv1()
+{
+	test_clean "HSRv1"
+}
+
+test_clean_prp()
+{
+	test_clean "PRP"
+}
+
+test_cut_link()
+{
+	local proto="$1"
+
+	RET=0
+	tname="${FUNCNAME[0]} - ${proto}"
+
+	setup_topo "$proto"
+	if ((RET != ksft_pass)); then
+		log_test "${tname} setup"
+		return
+	fi
+
+	# Cutting link from subshell, so check_ping can run in the normal shell
+	# with access to global variables from the test harness.
+	(
+		sleep 2
+		log_info "Cutting link"
+		ip -net "$node1" link set vethB down
+	) &
+	check_ping "$node1" "100.64.0.2"
+
+	wait
+	log_test "${tname}"
+}
+
+
+test_cut_link_hsrv0()
+{
+	test_cut_link "HSRv0"
+}
+
+test_cut_link_hsrv1()
+{
+	test_cut_link "HSRv1"
+}
+
+test_cut_link_prp()
+{
+	test_cut_link "PRP"
+}
+
+cleanup()
+{
+	cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (2 preceding siblings ...)
  2026-01-22 14:56 ` [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links Felix Maurer
@ 2026-01-22 14:56 ` Felix Maurer
  2026-01-28 16:38   ` Simon Horman
                     ` (2 more replies)
  2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
                   ` (6 subsequent siblings)
  10 siblings, 3 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy, Steffen Lindner

The PRP duplicate discard algorithm does not work reliably with certain
link faults. Especially with packet loss on one link, the duplicate discard
algorithm drops valid packets which leads to packet loss on the PRP
interface where the link fault should in theory be perfectly recoverable by
PRP. This happens because the algorithm opens the drop window on the lossy
link, covering received and lost sequence numbers. If the other, non-lossy
link receives the duplicate for a lost frame, it is within the drop window
of the lossy link and therefore dropped.

Since IEC 62439-3:2012, a node has one sequence number counter for frames
it sends, instead of one sequence number counter for each destination.
Therefore, a node can not expect to receive contiguous sequence numbers
from a sender. A missing sequence number can be totally normal (if the
sender intermittently communicates with another node) or mean a frame was
lost.

The algorithm, as previously implemented in commit 05fd00e5e7b1 ("net: hsr:
Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0)
but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since that, no
algorithm is specified but up to implementers. It should be "designed such
that it never rejects a legitimate frame, while occasional acceptance of a
duplicate can be tolerated" (IEC 62439-3:2021).

For the duplicate discard algorithm, this means that 1) we need to track
the sequence numbers individually to account for non-contiguous sequence
numbers, and 2) we should always err on the side of accepting a duplicate
than dropping a valid frame.

The idea of the new algorithm is to store the seen sequence numbers in a
bitmap. To keep the size of the bitmap in control, we store it as a "sparse
bitmap" where the bitmap is split into blocks and not all blocks exist at
the same time. The sparse bitmap is implemented using an xarray that keeps
the references to the individual blocks and a backing ring buffer that
stores the actual blocks. New blocks are initialized in the buffer and
added to the xarray as needed when new frames arrive. Existing blocks are
removed in two conditions:
1. The block found for an arriving sequence number is old and therefore not
   relevant to the duplicate discard algorithm anymore, i.e., it has been
   added more than the entry forget time ago. In this case, the block is
   removed from the xarray and marked as forgotten (by setting its
   timestamp to 0).
2. Space is needed in the ring buffer for a new block. In this case, the
   block is removed from the xarray, if it hasn't already been forgotten
   (by 1.). Afterwards, the new block is initialized in its place.

This has the nice property that we can reliably track sequence numbers on
low traffic situations (where they expire based on their timestamp) and
more quickly forget sequence numbers in high traffic situations before they
potentially wrap over and repeat before they are expired.

When nodes are merged, the blocks are merged as well. The timestamp of a
merged block is set to the minimum of the two timestamps to never keep
around a seen sequence number for too long. The bitmaps are or'd to mark
all seen sequence numbers as seen.

All of this still happens under seq_out_lock, to prevent concurrent
access to the blocks.

Reported-by: Steffen Lindner <steffen.lindner@de.abb.com>
Fixes: 05fd00e5e7b1 ("net: hsr: Fix PRP duplicate detection")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---

Notes:
    The bug was reported by Steffen directly to us, I asked for permission
    to add his Reported-by tag.

 net/hsr/hsr_framereg.c | 203 +++++++++++++++++++++++++++--------------
 net/hsr/hsr_framereg.h |  21 ++++-
 2 files changed, 152 insertions(+), 72 deletions(-)

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 3a2a2fa7a0a3..70b3819e9298 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -35,7 +35,6 @@ static bool seq_nr_after(u16 a, u16 b)
 
 #define seq_nr_before(a, b)		seq_nr_after((b), (a))
 #define seq_nr_before_or_eq(a, b)	(!seq_nr_after((a), (b)))
-#define PRP_DROP_WINDOW_LEN 32768
 
 bool hsr_addr_is_redbox(struct hsr_priv *hsr, unsigned char *addr)
 {
@@ -126,13 +125,29 @@ void hsr_del_self_node(struct hsr_priv *hsr)
 		kfree_rcu(old, rcu_head);
 }
 
+static void hsr_free_node(struct hsr_node *node)
+{
+	xa_destroy(&node->seq_blocks);
+	kfree(node->block_buf);
+	kfree(node);
+}
+
+static void hsr_free_node_rcu(struct rcu_head *rn)
+{
+	struct hsr_node *node = container_of(rn, struct hsr_node, rcu_head);
+
+	hsr_free_node(node);
+}
+
 void hsr_del_nodes(struct list_head *node_db)
 {
 	struct hsr_node *node;
 	struct hsr_node *tmp;
 
-	list_for_each_entry_safe(node, tmp, node_db, mac_list)
-		kfree(node);
+	list_for_each_entry_safe(node, tmp, node_db, mac_list) {
+		list_del(&node->mac_list);
+		hsr_free_node(node);
+	}
 }
 
 void prp_handle_san_frame(bool san, enum hsr_port_type port,
@@ -158,7 +173,7 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 				     u16 seq_out, bool san,
 				     enum hsr_port_type rx_port)
 {
-	struct hsr_node *new_node, *node;
+	struct hsr_node *new_node, *node = NULL;
 	unsigned long now;
 	int i;
 
@@ -169,6 +184,14 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	ether_addr_copy(new_node->macaddress_A, addr);
 	spin_lock_init(&new_node->seq_out_lock);
 
+	new_node->block_buf = kcalloc(HSR_MAX_SEQ_BLOCKS,
+				      sizeof(struct hsr_seq_block),
+				      GFP_ATOMIC);
+	if (!new_node->block_buf)
+		goto free;
+
+	xa_init(&new_node->seq_blocks);
+
 	/* We are only interested in time diffs here, so use current jiffies
 	 * as initialization. (0 could trigger an spurious ring error warning).
 	 */
@@ -176,11 +199,7 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	for (i = 0; i < HSR_PT_PORTS; i++) {
 		new_node->time_in[i] = now;
 		new_node->time_out[i] = now;
-	}
-	for (i = 0; i < HSR_PT_PORTS; i++) {
 		new_node->seq_out[i] = seq_out;
-		new_node->seq_expected[i] = seq_out + 1;
-		new_node->seq_start[i] = seq_out + 1;
 	}
 
 	if (san && hsr->proto_ops->handle_san_frame)
@@ -199,6 +218,8 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	return new_node;
 out:
 	spin_unlock_bh(&hsr->list_lock);
+	kfree(new_node->block_buf);
+free:
 	kfree(new_node);
 	return node;
 }
@@ -280,6 +301,59 @@ struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
 			    san, rx_port);
 }
 
+static bool hsr_seq_block_is_old(struct hsr_seq_block *block)
+{
+	unsigned long expiry = msecs_to_jiffies(HSR_ENTRY_FORGET_TIME);
+
+	return time_is_before_jiffies(block->time + expiry);
+}
+
+static void hsr_forget_seq_block(struct hsr_node *node,
+				 struct hsr_seq_block *block)
+{
+	if (block->time)
+		xa_erase(&node->seq_blocks, block->block_idx);
+	block->time = 0;
+}
+
+/* Get the currently active sequence number block. If there is no block yet, or
+ * the existing one is expired, a new block is created. The idea is to maintain
+ * a "sparse bitmap" where a bitmap for the whole sequence number space is
+ * split into blocks and not all blocks exist all the time. The blocks can
+ * expire after time (in low traffic situations) or when they are replaced in
+ * the backing fixed size buffer (in high traffic situations).
+ */
+static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
+					       u16 block_idx)
+{
+	struct hsr_seq_block *block, *res;
+
+	block = xa_load(&node->seq_blocks, block_idx);
+
+	if (block && hsr_seq_block_is_old(block)) {
+		hsr_forget_seq_block(node, block);
+		block = NULL;
+	}
+
+	if (!block) {
+		block = &node->block_buf[node->next_block];
+		hsr_forget_seq_block(node, block);
+
+		memset(block, 0, sizeof(*block));
+		block->time = jiffies;
+		block->block_idx = block_idx;
+
+		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
+		if (xa_is_err(res))
+			return NULL;
+
+		node->next_block =
+			(node->next_block + 1) & (HSR_MAX_SEQ_BLOCKS - 1);
+	}
+
+	return block;
+}
+
 /* Use the Supervision frame's info about an eventual macaddress_B for merging
  * nodes that has previously had their macaddress_B registered as a separate
  * node.
@@ -288,16 +362,18 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 {
 	struct hsr_node *node_curr = frame->node_src;
 	struct hsr_port *port_rcv = frame->port_rcv;
+	struct hsr_seq_block *src_blk, *merge_blk;
 	struct hsr_priv *hsr = port_rcv->hsr;
-	struct hsr_sup_payload *hsr_sp;
 	struct hsr_sup_tlv *hsr_sup_tlv;
+	struct hsr_sup_payload *hsr_sp;
 	struct hsr_node *node_real;
 	struct sk_buff *skb = NULL;
 	struct list_head *node_db;
 	struct ethhdr *ethhdr;
-	int i;
-	unsigned int pull_size = 0;
 	unsigned int total_pull_size = 0;
+	unsigned int pull_size = 0;
+	unsigned long idx;
+	int i;
 
 	/* Here either frame->skb_hsr or frame->skb_prp should be
 	 * valid as supervision frame always will have protocol
@@ -391,6 +467,18 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 		if (seq_nr_after(node_curr->seq_out[i], node_real->seq_out[i]))
 			node_real->seq_out[i] = node_curr->seq_out[i];
 	}
+
+	xa_for_each(&node_curr->seq_blocks, idx, src_blk) {
+		if (hsr_seq_block_is_old(src_blk))
+			continue;
+
+		merge_blk = hsr_get_seq_block(node_real, src_blk->block_idx);
+		if (!merge_blk)
+			continue;
+		merge_blk->time = min(merge_blk->time, src_blk->time);
+		bitmap_or(merge_blk->seq_nrs, merge_blk->seq_nrs,
+			  src_blk->seq_nrs, HSR_SEQ_BLOCK_SIZE);
+	}
 	spin_unlock_bh(&node_real->seq_out_lock);
 	node_real->addr_B_port = port_rcv->type;
 
@@ -398,7 +486,7 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 	if (!node_curr->removed) {
 		list_del_rcu(&node_curr->mac_list);
 		node_curr->removed = true;
-		kfree_rcu(node_curr, rcu_head);
+		call_rcu(&node_curr->rcu_head, hsr_free_node_rcu);
 	}
 	spin_unlock_bh(&hsr->list_lock);
 
@@ -505,17 +593,12 @@ int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 	return 0;
 }
 
-/* Adaptation of the PRP duplicate discard algorithm described in wireshark
- * wiki (https://wiki.wireshark.org/PRP)
- *
- * A drop window is maintained for both LANs with start sequence set to the
- * first sequence accepted on the LAN that has not been seen on the other LAN,
- * and expected sequence set to the latest received sequence number plus one.
- *
- * When a frame is received on either LAN it is compared against the received
- * frames on the other LAN. If it is outside the drop window of the other LAN
- * the frame is accepted and the drop window is updated.
- * The drop window for the other LAN is reset.
+/* PRP duplicate discard algorithm: we maintain a bitmap where we set a bit for
+ * every seen sequence number. The bitmap is split into blocks and the block
+ * management is detailed in hsr_get_seq_block(). In any case, we err on the
+ * side of accepting a packet, as the specification requires the algorithm to
+ * be "designed such that it never rejects a legitimate frame, while occasional
+ * acceptance of a duplicate can be tolerated." (IEC 62439-3:2021, 4.1.10.3).
  *
  * 'port' is the outgoing interface
  * 'frame' is the frame to be sent
@@ -526,18 +609,21 @@ int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
  */
 int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 {
-	enum hsr_port_type other_port;
-	enum hsr_port_type rcv_port;
+	u16 sequence_nr, seq_bit, block_idx;
+	struct hsr_seq_block *block;
 	struct hsr_node *node;
-	u16 sequence_diff;
-	u16 sequence_exp;
-	u16 sequence_nr;
 
-	/* out-going frames are always in order
-	 * and can be checked the same way as for HSR
-	 */
-	if (frame->port_rcv->type == HSR_PT_MASTER)
-		return hsr_register_frame_out(port, frame);
+	node = frame->node_src;
+	sequence_nr = frame->sequence_nr;
+
+	// out-going frames are always in order
+	if (frame->port_rcv->type == HSR_PT_MASTER) {
+		spin_lock_bh(&node->seq_out_lock);
+		node->time_out[port->type] = jiffies;
+		node->seq_out[port->type] = sequence_nr;
+		spin_unlock_bh(&node->seq_out_lock);
+		return 0;
+	}
 
 	/* for PRP we should only forward frames from the slave ports
 	 * to the master port
@@ -545,47 +631,26 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 	if (port->type != HSR_PT_MASTER)
 		return 1;
 
-	node = frame->node_src;
-	sequence_nr = frame->sequence_nr;
-	sequence_exp = sequence_nr + 1;
-	rcv_port = frame->port_rcv->type;
-	other_port = rcv_port == HSR_PT_SLAVE_A ? HSR_PT_SLAVE_B :
-				 HSR_PT_SLAVE_A;
-
 	spin_lock_bh(&node->seq_out_lock);
-	if (time_is_before_jiffies(node->time_out[port->type] +
-	    msecs_to_jiffies(HSR_ENTRY_FORGET_TIME)) ||
-	    (node->seq_start[rcv_port] == node->seq_expected[rcv_port] &&
-	     node->seq_start[other_port] == node->seq_expected[other_port])) {
-		/* the node hasn't been sending for a while
-		 * or both drop windows are empty, forward the frame
-		 */
-		node->seq_start[rcv_port] = sequence_nr;
-	} else if (seq_nr_before(sequence_nr, node->seq_expected[other_port]) &&
-		   seq_nr_before_or_eq(node->seq_start[other_port], sequence_nr)) {
-		/* drop the frame, update the drop window for the other port
-		 * and reset our drop window
-		 */
-		node->seq_start[other_port] = sequence_exp;
-		node->seq_expected[rcv_port] = sequence_exp;
-		node->seq_start[rcv_port] = node->seq_expected[rcv_port];
-		spin_unlock_bh(&node->seq_out_lock);
-		return 1;
-	}
 
-	/* update the drop window for the port where this frame was received
-	 * and clear the drop window for the other port
-	 */
-	node->seq_start[other_port] = node->seq_expected[other_port];
-	node->seq_expected[rcv_port] = sequence_exp;
-	sequence_diff = sequence_exp - node->seq_start[rcv_port];
-	if (sequence_diff > PRP_DROP_WINDOW_LEN)
-		node->seq_start[rcv_port] = sequence_exp - PRP_DROP_WINDOW_LEN;
+	block_idx = hsr_seq_block_index(sequence_nr);
+	block = hsr_get_seq_block(node, block_idx);
+	if (!block)
+		goto out_new;
+
+	seq_bit = hsr_seq_block_bit(sequence_nr);
+	if (test_and_set_bit(seq_bit, block->seq_nrs))
+		goto out_seen;
 
 	node->time_out[port->type] = jiffies;
 	node->seq_out[port->type] = sequence_nr;
+out_new:
 	spin_unlock_bh(&node->seq_out_lock);
 	return 0;
+
+out_seen:
+	spin_unlock_bh(&node->seq_out_lock);
+	return 1;
 }
 
 #if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)
@@ -672,7 +737,7 @@ void hsr_prune_nodes(struct timer_list *t)
 				list_del_rcu(&node->mac_list);
 				node->removed = true;
 				/* Note that we need to free this entry later: */
-				kfree_rcu(node, rcu_head);
+				call_rcu(&node->rcu_head, hsr_free_node_rcu);
 			}
 		}
 	}
@@ -706,7 +771,7 @@ void hsr_prune_proxy_nodes(struct timer_list *t)
 				list_del_rcu(&node->mac_list);
 				node->removed = true;
 				/* Note that we need to free this entry later: */
-				kfree_rcu(node, rcu_head);
+				call_rcu(&node->rcu_head, hsr_free_node_rcu);
 			}
 		}
 	}
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index b04948659d84..d4d1fa70039f 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -74,9 +74,23 @@ bool hsr_is_node_in_db(struct list_head *node_db,
 
 int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
 
+#define HSR_SEQ_BLOCK_SHIFT 7 /* 128 bits */
+#define HSR_SEQ_BLOCK_SIZE (1 << HSR_SEQ_BLOCK_SHIFT)
+#define HSR_SEQ_BLOCK_MASK (HSR_SEQ_BLOCK_SIZE - 1)
+#define HSR_MAX_SEQ_BLOCKS 64
+
+#define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
+#define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
+
+struct hsr_seq_block {
+	unsigned long		time;
+	u16			block_idx;
+	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
+};
+
 struct hsr_node {
 	struct list_head	mac_list;
-	/* Protect R/W access to seq_out */
+	/* Protect R/W access to seq_out and seq_blocks */
 	spinlock_t		seq_out_lock;
 	unsigned char		macaddress_A[ETH_ALEN];
 	unsigned char		macaddress_B[ETH_ALEN];
@@ -91,8 +105,9 @@ struct hsr_node {
 	u16			seq_out[HSR_PT_PORTS];
 	bool			removed;
 	/* PRP specific duplicate handling */
-	u16			seq_expected[HSR_PT_PORTS];
-	u16			seq_start[HSR_PT_PORTS];
+	struct xarray		seq_blocks;
+	struct hsr_seq_block	*block_buf;
+	unsigned int		next_block;
 	struct rcu_head		rcu_head;
 };
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (3 preceding siblings ...)
  2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
@ 2026-01-22 14:57 ` Felix Maurer
  2026-01-29 13:32   ` Sebastian Andrzej Siewior
  2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Add tests where one link has different rates of packet loss or reorders
packets. PRP should still be able to recover from these link faults and
show no packet loss.  However, it is acceptable to receive some level of
duplicate packets. This matches the current specification (IEC
62439-3:2021) of the duplicate discard algorithm that requires it to be
"designed such that it never rejects a legitimate frame, while occasional
acceptance of a duplicate can be tolerated." The rate of acceptable
duplicates in this test is intentionally high (10%) to make the test
stable, the values I observed in the worst test cases (20% loss) are around
5% duplicates.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 .../testing/selftests/net/hsr/link_faults.sh  | 79 +++++++++++++++++--
 1 file changed, 74 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/hsr/link_faults.sh b/tools/testing/selftests/net/hsr/link_faults.sh
index 7ff14dcd32e7..1959bea17147 100755
--- a/tools/testing/selftests/net/hsr/link_faults.sh
+++ b/tools/testing/selftests/net/hsr/link_faults.sh
@@ -11,10 +11,16 @@ ALL_TESTS="
 	test_cut_link_hsrv1
 	test_clean_prp
 	test_cut_link_prp
+	test_packet_loss_prp
+	test_high_packet_loss_prp
+	test_reordering_prp
 "
 
-# The tests are running ping for 5sec with a relatively short interval with a
-# cut link, which should be recoverable by HSR/PRP.
+# The tests are running ping for 5sec with a relatively short interval in
+# different scenarios with faulty links (cut links, packet loss, delay,
+# reordering) that should be recoverable by HSR/PRP. The ping interval (10ms)
+# is short enough that the base delay (50ms) leads to a queue in the netem
+# qdiscs which is needed for reordering.
 
 setup_hsr_topo()
 {
@@ -160,6 +166,7 @@ check_ping()
 {
 	local node="$1"
 	local dst="$2"
+	local accepted_dups="$3"
 	local ping_args="-q -i 0.01 -c 400"
 
 	log_info "Running ping $node -> $dst"
@@ -178,7 +185,9 @@ check_ping()
 		loss="${BASH_REMATCH[1]}"
 	fi
 
-	check_err "$dups" "Unexpected duplicate packets (${dups})"
+	if [ "$dups" -gt "$accepted_dups" ]; then
+		check_err 1 "Unexpected duplicate packets (${dups})"
+	fi
 	if [ "$loss" != "0%" ]; then
 		check_err 1 "Unexpected packet loss (${loss})"
 	fi
@@ -197,7 +206,7 @@ test_clean()
 		return
 	fi
 
-	check_ping "$node1" "100.64.0.2"
+	check_ping "$node1" "100.64.0.2" 0
 
 	log_test "${tname}"
 }
@@ -237,7 +246,7 @@ test_cut_link()
 		log_info "Cutting link"
 		ip -net "$node1" link set vethB down
 	) &
-	check_ping "$node1" "100.64.0.2"
+	check_ping "$node1" "100.64.0.2" 0
 
 	wait
 	log_test "${tname}"
@@ -259,6 +268,66 @@ test_cut_link_prp()
 	test_cut_link "PRP"
 }
 
+test_packet_loss()
+{
+	local proto="$1"
+	local loss="$2"
+
+	RET=0
+	tname="${FUNCNAME[0]} - ${proto}, ${loss}"
+
+	setup_topo "$proto"
+	if ((RET != ksft_pass)); then
+		log_test "${tname} setup"
+		return
+	fi
+
+	# Packet loss with lower delay makes sure the packets on the lossy link
+	# arrive first.
+	tc -net "$node1" qdisc add dev vethA root netem delay 50ms
+	tc -net "$node1" qdisc add dev vethB root netem delay 20ms loss "$loss"
+
+	check_ping "$node1" "100.64.0.2" 40
+
+	log_test "${tname}"
+}
+
+test_packet_loss_prp()
+{
+	test_packet_loss "PRP" "20%"
+}
+
+test_high_packet_loss_prp()
+{
+	test_packet_loss "PRP" "80%"
+}
+
+test_reordering()
+{
+	local proto="$1"
+
+	RET=0
+	tname="${FUNCNAME[0]} - ${proto}"
+
+	setup_topo "$proto"
+	if ((RET != ksft_pass)); then
+		log_test "${tname} setup"
+		return
+	fi
+
+	tc -net "$node1" qdisc add dev vethA root netem delay 50ms
+	tc -net "$node1" qdisc add dev vethB root netem delay 50ms reorder 20%
+
+	check_ping "$node1" "100.64.0.2" 40
+
+	log_test "${tname}"
+}
+
+test_reordering_prp()
+{
+	test_reordering "PRP"
+}
+
 cleanup()
 {
 	cleanup_all_ns
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (4 preceding siblings ...)
  2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
@ 2026-01-22 14:57 ` Felix Maurer
  2026-01-29 14:43   ` Sebastian Andrzej Siewior
  2026-01-22 14:57 ` [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests " Felix Maurer
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy, Yoann Congal

The HSR duplicate discard algorithm had even more basic problems than the
described for PRP in the previous patch. It relied only on the last
received sequence number to decide if a new frame should be forwarded to
any port. This does not work correctly in any case where frames are
received out of order. The linked bug report claims that this can even
happen with perfectly fine links due to the order in which incoming frames
are processed (which can be unexpected on multi-core systems). The issue
also occasionally shows up in the HSR selftests. The main reason is that
the sequence number that was last forwarded to the master port may have
skipped a number which will in turn never be delivered to the host.

As the problem (we accidentally skip over a sequence number that has not
been received but will be received in the future) is similar to PRP, we can
apply a similar solution. The duplicate discard algorithm based on the
"sparse bitmap" works well for HSR if it is extended to track one bitmap
for each port (A, B, master, interlink). To do this, change the sequence
number blocks to contain a flexible array member as the last member that
can keep chunks for as many bitmaps as we need. This design makes it easy
to reuse the same algorithm in a potential PRP RedBox implementation.

The duplicate discard algorithm functions are modified to deal with
sequence number blocks of different sizes and to correctly use the array of
bitmap chunks. There is a notable speciality for HSR: the port type has a
special port type NONE with value 0. This leads to the number of port types
being 5 instead of actually 4. To save memory, remove the NONE port from
the bitmap (by subtracting 1) when setting up the block buffer and when
accessing the bitmap chunks in the array.

Removing the old algorithm allows us to get rid of a few fields that are
not needed any more: time_out and seq_out for each port. We can also remove
some functions that were only necessary for the previous duplicate discard
algorithm.

The removal of seq_out is possible despite its previous usage in
hsr_register_frame_in: it was used to prevent updates to time_in when
"invalid" sequence numbers were received. With the new duplicate discard
algorithm, time_in has no relevance for the expiry of sequence numbers
anymore. They will expire based on the timestamps in the sequence number
blocks after at most 400ms. There is no need that a node "re-registers" to
"resume communication": after 400ms, all sequence numbers are accepted
again. Also, according to the IEC 62439-3:2021, all nodes are supposed to
send no traffic for 500ms after boot to lead exactly to this expiry of seen
sequence numbers. time_in is still used for pruning nodes from the node
table after no traffic has been received for 60sec. Pruning is only needed
if the node is really gone and has not been sending any traffic for that
period.

seq_out was also used to report the last incoming sequence number from a
node through netlink. I am not sure how useful this value is to userspace
at all, but added getting it from the sequence number blocks. This number
can be outdated after node merging until a new block has been added.

Reported-by: Yoann Congal <yoann.congal@smile.fr>
Closes: https://lore.kernel.org/netdev/7d221a07-8358-4c0b-a09c-3b029c052245@smile.fr/
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 net/hsr/hsr_framereg.c | 216 +++++++++++++++++++++--------------------
 net/hsr/hsr_framereg.h |  20 ++--
 2 files changed, 124 insertions(+), 112 deletions(-)

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 70b3819e9298..c9bd25e6a7cc 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -19,23 +19,6 @@
 #include "hsr_framereg.h"
 #include "hsr_netlink.h"
 
-/* seq_nr_after(a, b) - return true if a is after (higher in sequence than) b,
- * false otherwise.
- */
-static bool seq_nr_after(u16 a, u16 b)
-{
-	/* Remove inconsistency where
-	 * seq_nr_after(a, b) == seq_nr_before(a, b)
-	 */
-	if ((int)b - a == 32768)
-		return false;
-
-	return (((s16)(b - a)) < 0);
-}
-
-#define seq_nr_before(a, b)		seq_nr_after((b), (a))
-#define seq_nr_before_or_eq(a, b)	(!seq_nr_after((a), (b)))
-
 bool hsr_addr_is_redbox(struct hsr_priv *hsr, unsigned char *addr)
 {
 	if (!hsr->redbox || !is_valid_ether_addr(hsr->macaddress_redbox))
@@ -163,18 +146,16 @@ void prp_handle_san_frame(bool san, enum hsr_port_type port,
 		node->san_b = true;
 }
 
-/* Allocate an hsr_node and add it to node_db. 'addr' is the node's address_A;
- * seq_out is used to initialize filtering of outgoing duplicate frames
- * originating from the newly added node.
+/* Allocate an hsr_node and add it to node_db. 'addr' is the node's address_A.
  */
 static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 				     struct list_head *node_db,
-				     unsigned char addr[],
-				     u16 seq_out, bool san,
+				     unsigned char addr[], bool san,
 				     enum hsr_port_type rx_port)
 {
 	struct hsr_node *new_node, *node = NULL;
 	unsigned long now;
+	size_t block_sz;
 	int i;
 
 	new_node = kzalloc(sizeof(*new_node), GFP_ATOMIC);
@@ -184,9 +165,13 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	ether_addr_copy(new_node->macaddress_A, addr);
 	spin_lock_init(&new_node->seq_out_lock);
 
-	new_node->block_buf = kcalloc(HSR_MAX_SEQ_BLOCKS,
-				      sizeof(struct hsr_seq_block),
-				      GFP_ATOMIC);
+	if (hsr->prot_version == PRP_V1)
+		new_node->seq_port_cnt = 1;
+	else
+		new_node->seq_port_cnt = HSR_PT_PORTS - 1;
+
+	block_sz = hsr_seq_block_size(new_node);
+	new_node->block_buf = kcalloc(HSR_MAX_SEQ_BLOCKS, block_sz, GFP_ATOMIC);
 	if (!new_node->block_buf)
 		goto free;
 
@@ -198,8 +183,6 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	now = jiffies;
 	for (i = 0; i < HSR_PT_PORTS; i++) {
 		new_node->time_in[i] = now;
-		new_node->time_out[i] = now;
-		new_node->seq_out[i] = seq_out;
 	}
 
 	if (san && hsr->proto_ops->handle_san_frame)
@@ -244,7 +227,6 @@ struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
 	struct ethhdr *ethhdr;
 	struct prp_rct *rct;
 	bool san = false;
-	u16 seq_out;
 
 	if (!skb_mac_header_was_set(skb))
 		return NULL;
@@ -281,24 +263,13 @@ struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
 		/* Check if skb contains hsr_ethhdr */
 		if (skb->mac_len < sizeof(struct hsr_ethhdr))
 			return NULL;
-
-		/* Use the existing sequence_nr from the tag as starting point
-		 * for filtering duplicate frames.
-		 */
-		seq_out = hsr_get_skb_sequence_nr(skb) - 1;
 	} else {
 		rct = skb_get_PRP_rct(skb);
-		if (rct && prp_check_lsdu_size(skb, rct, is_sup)) {
-			seq_out = prp_get_skb_sequence_nr(rct);
-		} else {
-			if (rx_port != HSR_PT_MASTER)
-				san = true;
-			seq_out = HSR_SEQNR_START;
-		}
+		if (!rct && rx_port != HSR_PT_MASTER)
+			san = true;
 	}
 
-	return hsr_add_node(hsr, node_db, ethhdr->h_source, seq_out,
-			    san, rx_port);
+	return hsr_add_node(hsr, node_db, ethhdr->h_source, san, rx_port);
 }
 
 static bool hsr_seq_block_is_old(struct hsr_seq_block *block)
@@ -327,6 +298,7 @@ static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
 					       u16 block_idx)
 {
 	struct hsr_seq_block *block, *res;
+	size_t block_sz;
 
 	block = xa_load(&node->seq_blocks, block_idx);
 
@@ -336,10 +308,11 @@ static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
 	}
 
 	if (!block) {
-		block = &node->block_buf[node->next_block];
+		block_sz = hsr_seq_block_size(node);
+		block = node->block_buf + node->next_block * block_sz;
 		hsr_forget_seq_block(node, block);
 
-		memset(block, 0, sizeof(*block));
+		memset(block, 0, block_sz);
 		block->time = jiffies;
 		block->block_idx = block_idx;
 
@@ -416,8 +389,7 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 	if (!node_real)
 		/* No frame received from AddrA of this node yet */
 		node_real = hsr_add_node(hsr, node_db, hsr_sp->macaddress_A,
-					 HSR_SEQNR_START - 1, true,
-					 port_rcv->type);
+					 true, port_rcv->type);
 	if (!node_real)
 		goto done; /* No mem */
 	if (node_real == node_curr)
@@ -464,8 +436,6 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 			node_real->time_in_stale[i] =
 						node_curr->time_in_stale[i];
 		}
-		if (seq_nr_after(node_curr->seq_out[i], node_real->seq_out[i]))
-			node_real->seq_out[i] = node_curr->seq_out[i];
 	}
 
 	xa_for_each(&node_curr->seq_blocks, idx, src_blk) {
@@ -476,8 +446,10 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame)
 		if (!merge_blk)
 			continue;
 		merge_blk->time = min(merge_blk->time, src_blk->time);
-		bitmap_or(merge_blk->seq_nrs, merge_blk->seq_nrs,
-			  src_blk->seq_nrs, HSR_SEQ_BLOCK_SIZE);
+		for (i = 0; i < node_real->seq_port_cnt; i++) {
+			bitmap_or(merge_blk->seq_nrs[i], merge_blk->seq_nrs[i],
+				  src_blk->seq_nrs[i], HSR_SEQ_BLOCK_SIZE);
+		}
 	}
 	spin_unlock_bh(&node_real->seq_out_lock);
 	node_real->addr_B_port = port_rcv->type;
@@ -554,60 +526,28 @@ void hsr_addr_subst_dest(struct hsr_node *node_src, struct sk_buff *skb,
 void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
 			   u16 sequence_nr)
 {
-	/* Don't register incoming frames without a valid sequence number. This
-	 * ensures entries of restarted nodes gets pruned so that they can
-	 * re-register and resume communications.
-	 */
-	if (!(port->dev->features & NETIF_F_HW_HSR_TAG_RM) &&
-	    seq_nr_before(sequence_nr, node->seq_out[port->type]))
-		return;
-
 	node->time_in[port->type] = jiffies;
 	node->time_in_stale[port->type] = false;
 }
 
-/* 'skb' is a HSR Ethernet frame (with a HSR tag inserted), with a valid
- * ethhdr->h_source address and skb->mac_header set.
- *
- * Return:
- *	 1 if frame can be shown to have been sent recently on this interface,
- *	 0 otherwise, or
- *	 negative error code on error
- */
-int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
-{
-	struct hsr_node *node = frame->node_src;
-	u16 sequence_nr = frame->sequence_nr;
-
-	spin_lock_bh(&node->seq_out_lock);
-	if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) &&
-	    time_is_after_jiffies(node->time_out[port->type] +
-	    msecs_to_jiffies(HSR_ENTRY_FORGET_TIME))) {
-		spin_unlock_bh(&node->seq_out_lock);
-		return 1;
-	}
-
-	node->time_out[port->type] = jiffies;
-	node->seq_out[port->type] = sequence_nr;
-	spin_unlock_bh(&node->seq_out_lock);
-	return 0;
-}
-
-/* PRP duplicate discard algorithm: we maintain a bitmap where we set a bit for
+/* Duplicate discard algorithm: we maintain a bitmap where we set a bit for
  * every seen sequence number. The bitmap is split into blocks and the block
  * management is detailed in hsr_get_seq_block(). In any case, we err on the
  * side of accepting a packet, as the specification requires the algorithm to
  * be "designed such that it never rejects a legitimate frame, while occasional
  * acceptance of a duplicate can be tolerated." (IEC 62439-3:2021, 4.1.10.3).
+ * While this requirement is explicit for PRP, applying it to HSR does no harm
+ * either.
  *
- * 'port' is the outgoing interface
  * 'frame' is the frame to be sent
+ * 'port_type' is the type of the outgoing interface
  *
  * Return:
  *	 1 if frame can be shown to have been sent recently on this interface,
  *	 0 otherwise
  */
-int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
+static int hsr_check_duplicate(struct hsr_frame_info *frame,
+			       unsigned int port_type)
 {
 	u16 sequence_nr, seq_bit, block_idx;
 	struct hsr_seq_block *block;
@@ -616,20 +556,8 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 	node = frame->node_src;
 	sequence_nr = frame->sequence_nr;
 
-	// out-going frames are always in order
-	if (frame->port_rcv->type == HSR_PT_MASTER) {
-		spin_lock_bh(&node->seq_out_lock);
-		node->time_out[port->type] = jiffies;
-		node->seq_out[port->type] = sequence_nr;
-		spin_unlock_bh(&node->seq_out_lock);
+	if (WARN_ON_ONCE(port_type >= node->seq_port_cnt))
 		return 0;
-	}
-
-	/* for PRP we should only forward frames from the slave ports
-	 * to the master port
-	 */
-	if (port->type != HSR_PT_MASTER)
-		return 1;
 
 	spin_lock_bh(&node->seq_out_lock);
 
@@ -639,11 +567,9 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 		goto out_new;
 
 	seq_bit = hsr_seq_block_bit(sequence_nr);
-	if (test_and_set_bit(seq_bit, block->seq_nrs))
+	if (test_and_set_bit(seq_bit, block->seq_nrs[port_type]))
 		goto out_seen;
 
-	node->time_out[port->type] = jiffies;
-	node->seq_out[port->type] = sequence_nr;
 out_new:
 	spin_unlock_bh(&node->seq_out_lock);
 	return 0;
@@ -653,6 +579,49 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 	return 1;
 }
 
+/* HSR duplicate discard: we check if the same frame has already been sent on
+ * this outgoing interface. The check follows the general duplicate discard
+ * algorithm.
+ *
+ * 'port' is the outgoing interface
+ * 'frame' is the frame to be sent
+ *
+ * Return:
+ *	 1 if frame can be shown to have been sent recently on this interface,
+ *	 0 otherwise
+ */
+int hsr_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
+{
+	return hsr_check_duplicate(frame, port->type - 1);
+}
+
+/* PRP duplicate discard: we only consider frames that are received on port A
+ * or port B and should go to the master port. For those, we check if they have
+ * already been received by the host, i.e., master port. The check uses the
+ * general duplicate discard algorithm, but without tracking multiple ports.
+ *
+ * 'port' is the outgoing interface
+ * 'frame' is the frame to be sent
+ *
+ * Return:
+ *	 1 if frame can be shown to have been sent recently on this interface,
+ *	 0 otherwise
+ */
+int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
+{
+	// out-going frames are always in order
+	if (frame->port_rcv->type == HSR_PT_MASTER)
+		return 0;
+
+	/* for PRP we should only forward frames from the slave ports
+	 * to the master port
+	 */
+	if (port->type != HSR_PT_MASTER)
+		return 1;
+
+	return hsr_check_duplicate(frame, 0);
+}
+
 #if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)
 EXPORT_SYMBOL(prp_register_frame_out);
 #endif
@@ -805,6 +774,39 @@ void *hsr_get_next_node(struct hsr_priv *hsr, void *_pos,
 	return NULL;
 }
 
+/* Fill the last sequence number that has been received from node on if1 by
+ * finding the last sequence number sent on port B; accordingly get the last
+ * received sequence number for if2 using sent sequence numbers on port A.
+ */
+static void fill_last_seq_nrs(struct hsr_node *node, u16 *if1_seq, u16 *if2_seq)
+{
+	struct hsr_seq_block *block;
+	unsigned int block_off;
+	size_t block_sz;
+	u16 seq_bit;
+
+	spin_lock_bh(&node->seq_out_lock);
+
+	// Get last inserted block
+	block_off = (node->next_block - 1) & (HSR_MAX_SEQ_BLOCKS - 1);
+	block_sz = hsr_seq_block_size(node);
+	block = node->block_buf + block_off * block_sz;
+
+	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_B - 1],
+			  HSR_SEQ_BLOCK_SIZE)) {
+		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_B - 1],
+					HSR_SEQ_BLOCK_SIZE);
+		*if1_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
+	}
+	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_A - 1],
+			  HSR_SEQ_BLOCK_SIZE)) {
+		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_A - 1],
+					HSR_SEQ_BLOCK_SIZE);
+		*if2_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
+	}
+	spin_unlock_bh(&node->seq_out_lock);
+}
+
 int hsr_get_node_data(struct hsr_priv *hsr,
 		      const unsigned char *addr,
 		      unsigned char addr_b[ETH_ALEN],
@@ -845,8 +847,10 @@ int hsr_get_node_data(struct hsr_priv *hsr,
 		*if2_age = jiffies_to_msecs(tdiff);
 
 	/* Present sequence numbers as if they were incoming on interface */
-	*if1_seq = node->seq_out[HSR_PT_SLAVE_B];
-	*if2_seq = node->seq_out[HSR_PT_SLAVE_A];
+	*if1_seq = 0;
+	*if2_seq = 0;
+	if (hsr->prot_version != PRP_V1)
+		fill_last_seq_nrs(node, if1_seq, if2_seq);
 
 	if (node->addr_B_port != HSR_PT_NONE) {
 		port = hsr_port_get_hsr(hsr, node->addr_B_port);
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index d4d1fa70039f..1e297268d1d4 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -82,15 +82,18 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
 #define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
 #define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
 
+#define DECLARE_BITMAP_FLEX_ARRAY(name, bits) \
+	unsigned long name[][BITS_TO_LONGS(bits)]
+
 struct hsr_seq_block {
 	unsigned long		time;
 	u16			block_idx;
-	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
+	DECLARE_BITMAP_FLEX_ARRAY(seq_nrs, HSR_SEQ_BLOCK_SIZE);
 };
 
 struct hsr_node {
 	struct list_head	mac_list;
-	/* Protect R/W access to seq_out and seq_blocks */
+	/* Protect R/W access seq_blocks */
 	spinlock_t		seq_out_lock;
 	unsigned char		macaddress_A[ETH_ALEN];
 	unsigned char		macaddress_B[ETH_ALEN];
@@ -98,17 +101,22 @@ struct hsr_node {
 	enum hsr_port_type	addr_B_port;
 	unsigned long		time_in[HSR_PT_PORTS];
 	bool			time_in_stale[HSR_PT_PORTS];
-	unsigned long		time_out[HSR_PT_PORTS];
 	/* if the node is a SAN */
 	bool			san_a;
 	bool			san_b;
-	u16			seq_out[HSR_PT_PORTS];
 	bool			removed;
-	/* PRP specific duplicate handling */
+	/* Duplicate detection */
 	struct xarray		seq_blocks;
-	struct hsr_seq_block	*block_buf;
+	void			*block_buf;
 	unsigned int		next_block;
+	unsigned int		seq_port_cnt;
 	struct rcu_head		rcu_head;
 };
 
+static inline size_t hsr_seq_block_size(struct hsr_node *node)
+{
+	WARN_ON_ONCE(node->seq_port_cnt == 0);
+	return struct_size_t(struct hsr_seq_block, seq_nrs, node->seq_port_cnt);
+}
+
 #endif /* __HSR_FRAMEREG_H */
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests for HSR
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (5 preceding siblings ...)
  2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
@ 2026-01-22 14:57 ` Felix Maurer
  2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Run the packet loss and reordering tests also for both HSR versions. Now
they can be removed from the hsr_ping tests completely. Note that the tests
currently fail because the HSR duplicate discard algorithm does not
sufficiently take faulty links into account.

The timeout needs to be increased because there are 15 link fault test
cases now, with each of them taking 5-6sec for the test and at most 5sec
for the HSR node tables to get merged and we also want some room to make
the test runs stable.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 tools/testing/selftests/net/hsr/hsr_ping.sh   | 32 +++-------------
 .../testing/selftests/net/hsr/link_faults.sh  | 38 +++++++++++++++++++
 tools/testing/selftests/net/hsr/settings      |  2 +-
 3 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 0ec71b20ab75..f4d685df4345 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -90,27 +90,6 @@ do_ping_tests()
 	stop_if_error "Longer ping test failed (ns3)."
 }
 
-do_link_problem_tests()
-{
-	echo "INFO: Running link problem tests."
-
-	echo "INFO: Delay the link and drop a few packages."
-	tc -net "$ns3" qdisc add dev ns3eth1 root netem delay 50ms
-	tc -net "$ns2" qdisc add dev ns2eth1 root netem delay 5ms loss 25%
-
-	do_ping_long "$ns1" 100.64.0.2
-	do_ping_long "$ns1" 100.64.0.3
-	stop_if_error "Failed with delay and packetloss (ns1)."
-
-	do_ping_long "$ns2" 100.64.0.1
-	do_ping_long "$ns2" 100.64.0.3
-	stop_if_error "Failed with delay and packetloss (ns2)."
-
-	do_ping_long "$ns3" 100.64.0.1
-	do_ping_long "$ns3" 100.64.0.2
-	stop_if_error "Failed with delay and packetloss (ns3)."
-}
-
 setup_hsr_interfaces()
 {
 	local HSRv="$1"
@@ -190,11 +169,10 @@ setup_vlan_interfaces() {
 
 }
 
-run_complete_ping_tests()
+run_ping_tests()
 {
-	echo "INFO: Running complete ping tests."
+	echo "INFO: Running ping tests."
 	do_ping_tests 0
-	do_link_problem_tests
 }
 
 run_vlan_tests()
@@ -204,7 +182,7 @@ run_vlan_tests()
 	vlan_challenged_hsr3=$(ip net exec "$ns3" ethtool -k hsr3 | grep "vlan-challenged" | awk '{print $2}')
 
 	if [[ "$vlan_challenged_hsr1" = "off" || "$vlan_challenged_hsr2" = "off" || "$vlan_challenged_hsr3" = "off" ]]; then
-		echo "INFO: Running VLAN tests"
+		echo "INFO: Running VLAN ping tests"
 		setup_vlan_interfaces
 		do_ping_tests 2
 	else
@@ -217,12 +195,12 @@ trap cleanup_all_ns EXIT
 
 setup_ns ns1 ns2 ns3
 setup_hsr_interfaces 0
-run_complete_ping_tests
+run_ping_tests
 run_vlan_tests
 
 setup_ns ns1 ns2 ns3
 setup_hsr_interfaces 1
-run_complete_ping_tests
+run_ping_tests
 run_vlan_tests
 
 exit $ret
diff --git a/tools/testing/selftests/net/hsr/link_faults.sh b/tools/testing/selftests/net/hsr/link_faults.sh
index 1959bea17147..be526281571c 100755
--- a/tools/testing/selftests/net/hsr/link_faults.sh
+++ b/tools/testing/selftests/net/hsr/link_faults.sh
@@ -7,8 +7,16 @@ source ../lib.sh
 ALL_TESTS="
 	test_clean_hsrv0
 	test_cut_link_hsrv0
+	test_packet_loss_hsrv0
+	test_high_packet_loss_hsrv0
+	test_reordering_hsrv0
+
 	test_clean_hsrv1
 	test_cut_link_hsrv1
+	test_packet_loss_hsrv1
+	test_high_packet_loss_hsrv1
+	test_reordering_hsrv1
+
 	test_clean_prp
 	test_cut_link_prp
 	test_packet_loss_prp
@@ -292,11 +300,31 @@ test_packet_loss()
 	log_test "${tname}"
 }
 
+test_packet_loss_hsrv0()
+{
+	test_packet_loss "HSRv0" "20%"
+}
+
+test_packet_loss_hsrv1()
+{
+	test_packet_loss "HSRv1" "20%"
+}
+
 test_packet_loss_prp()
 {
 	test_packet_loss "PRP" "20%"
 }
 
+test_high_packet_loss_hsrv0()
+{
+	test_packet_loss "HSRv0" "80%"
+}
+
+test_high_packet_loss_hsrv1()
+{
+	test_packet_loss "HSRv1" "80%"
+}
+
 test_high_packet_loss_prp()
 {
 	test_packet_loss "PRP" "80%"
@@ -323,6 +351,16 @@ test_reordering()
 	log_test "${tname}"
 }
 
+test_reordering_hsrv0()
+{
+	test_reordering "HSRv0"
+}
+
+test_reordering_hsrv1()
+{
+	test_reordering "HSRv1"
+}
+
 test_reordering_prp()
 {
 	test_reordering "PRP"
diff --git a/tools/testing/selftests/net/hsr/settings b/tools/testing/selftests/net/hsr/settings
index ba4d85f74cd6..a953c96aa16e 100644
--- a/tools/testing/selftests/net/hsr/settings
+++ b/tools/testing/selftests/net/hsr/settings
@@ -1 +1 @@
-timeout=90
+timeout=180
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (6 preceding siblings ...)
  2026-01-22 14:57 ` [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests " Felix Maurer
@ 2026-01-22 14:57 ` Felix Maurer
  2026-01-29 15:12   ` Sebastian Andrzej Siewior
  2026-01-22 14:57 ` [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR Felix Maurer
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

With the new duplicate discard algorithm from the earlier patches, the
KUnit test for the algorithm needs to be updated. The updates are done in a
way to match the original intends pretty closely. Currently, there is much
knowledge about the actual algorithm baked into the tests (especially the
expectations) which may need some redesign in the future.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 net/hsr/hsr_framereg.c         |  11 ++-
 net/hsr/hsr_framereg.h         |   4 +
 net/hsr/prp_dup_discard_test.c | 154 +++++++++++++++------------------
 3 files changed, 80 insertions(+), 89 deletions(-)

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index c9bd25e6a7cc..08f91f10a934 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -11,6 +11,7 @@
  * Same code handles filtering of duplicates for PRP as well.
  */
 
+#include <kunit/visibility.h>
 #include <linux/if_ether.h>
 #include <linux/etherdevice.h>
 #include <linux/slab.h>
@@ -294,8 +295,8 @@ static void hsr_forget_seq_block(struct hsr_node *node,
  * expire after time (in low traffic situations) or when they are replaced in
  * the backing fixed size buffer (in high traffic situations).
  */
-static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
-					       u16 block_idx)
+VISIBLE_IF_KUNIT struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
+							 u16 block_idx)
 {
 	struct hsr_seq_block *block, *res;
 	size_t block_sz;
@@ -326,6 +327,7 @@ static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
 
 	return block;
 }
+EXPORT_SYMBOL_IF_KUNIT(hsr_get_seq_block);
 
 /* Use the Supervision frame's info about an eventual macaddress_B for merging
  * nodes that has previously had their macaddress_B registered as a separate
@@ -621,10 +623,7 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
 
 	return hsr_check_duplicate(frame, 0);
 }
-
-#if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)
-EXPORT_SYMBOL(prp_register_frame_out);
-#endif
+EXPORT_SYMBOL_IF_KUNIT(prp_register_frame_out);
 
 static struct hsr_port *get_late_port(struct hsr_priv *hsr,
 				      struct hsr_node *node)
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index 1e297268d1d4..93fc7ae49306 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -74,6 +74,10 @@ bool hsr_is_node_in_db(struct list_head *node_db,
 
 int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
 
+#if IS_ENABLED(CONFIG_KUNIT)
+struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node, u16 block_idx);
+#endif
+
 #define HSR_SEQ_BLOCK_SHIFT 7 /* 128 bits */
 #define HSR_SEQ_BLOCK_SIZE (1 << HSR_SEQ_BLOCK_SHIFT)
 #define HSR_SEQ_BLOCK_MASK (HSR_SEQ_BLOCK_SIZE - 1)
diff --git a/net/hsr/prp_dup_discard_test.c b/net/hsr/prp_dup_discard_test.c
index e86b7b633ae8..a7a5beb52d62 100644
--- a/net/hsr/prp_dup_discard_test.c
+++ b/net/hsr/prp_dup_discard_test.c
@@ -4,6 +4,8 @@
 #include "hsr_main.h"
 #include "hsr_framereg.h"
 
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
+
 struct prp_test_data {
 	struct hsr_port port;
 	struct hsr_port port_rcv;
@@ -13,37 +15,53 @@ struct prp_test_data {
 
 static struct prp_test_data *build_prp_test_data(struct kunit *test)
 {
+	size_t block_sz;
+
 	struct prp_test_data *data = kunit_kzalloc(test,
 		sizeof(struct prp_test_data), GFP_USER);
 	KUNIT_EXPECT_NOT_ERR_OR_NULL(test, data);
 
+	data->node.seq_port_cnt = 1;
+	block_sz = hsr_seq_block_size(&data->node);
+	data->node.block_buf = kunit_kcalloc(test, HSR_MAX_SEQ_BLOCKS, block_sz,
+					     GFP_ATOMIC);
+	KUNIT_EXPECT_NOT_ERR_OR_NULL(test, data);
+	xa_init(&data->node.seq_blocks);
+
 	data->frame.node_src = &data->node;
 	data->frame.port_rcv = &data->port_rcv;
 	data->port_rcv.type = HSR_PT_SLAVE_A;
-	data->node.seq_start[HSR_PT_SLAVE_A] = 1;
-	data->node.seq_expected[HSR_PT_SLAVE_A] = 1;
-	data->node.seq_start[HSR_PT_SLAVE_B] = 1;
-	data->node.seq_expected[HSR_PT_SLAVE_B] = 1;
-	data->node.seq_out[HSR_PT_MASTER] = 0;
-	data->node.time_out[HSR_PT_MASTER] = jiffies;
 	data->port.type = HSR_PT_MASTER;
 
 	return data;
 }
 
-static void check_prp_counters(struct kunit *test,
-			       struct prp_test_data *data,
-			       u16 seq_start_a, u16 seq_expected_a,
-			       u16 seq_start_b, u16 seq_expected_b)
+static void check_prp_frame_seen(struct kunit *test, struct prp_test_data *data,
+				 u16 sequence_nr)
+{
+	u16 block_idx, seq_bit;
+	struct hsr_seq_block *block;
+
+	block_idx = sequence_nr >> HSR_SEQ_BLOCK_SHIFT;
+	block = xa_load(&data->node.seq_blocks, block_idx);
+	KUNIT_EXPECT_NOT_NULL(test, block);
+
+	seq_bit = sequence_nr & HSR_SEQ_BLOCK_MASK;
+	KUNIT_EXPECT_TRUE(test, test_bit(seq_bit, block->seq_nrs[0]));
+}
+
+static void check_prp_frame_unseen(struct kunit *test,
+				   struct prp_test_data *data, u16 sequence_nr)
 {
-	KUNIT_EXPECT_EQ(test, data->node.seq_start[HSR_PT_SLAVE_A],
-			seq_start_a);
-	KUNIT_EXPECT_EQ(test, data->node.seq_start[HSR_PT_SLAVE_B],
-			seq_start_b);
-	KUNIT_EXPECT_EQ(test, data->node.seq_expected[HSR_PT_SLAVE_A],
-			seq_expected_a);
-	KUNIT_EXPECT_EQ(test, data->node.seq_expected[HSR_PT_SLAVE_B],
-			seq_expected_b);
+	u16 block_idx, seq_bit;
+	struct hsr_seq_block *block;
+
+	block_idx = sequence_nr >> HSR_SEQ_BLOCK_SHIFT;
+	block = hsr_get_seq_block(&data->node, block_idx);
+	KUNIT_EXPECT_NOT_NULL(test, block);
+
+	seq_bit = sequence_nr & HSR_SEQ_BLOCK_MASK;
+	KUNIT_EXPECT_FALSE(test, test_bit(seq_bit, block->seq_nrs[0]));
 }
 
 static void prp_dup_discard_forward(struct kunit *test)
@@ -54,52 +72,48 @@ static void prp_dup_discard_forward(struct kunit *test)
 	data->frame.sequence_nr = 2;
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	KUNIT_EXPECT_EQ(test, jiffies, data->node.time_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, data->frame.sequence_nr,
-			   data->frame.sequence_nr + 1, 1, 1);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 }
 
-static void prp_dup_discard_inside_dropwindow(struct kunit *test)
+static void prp_dup_discard_drop_duplicate(struct kunit *test)
 {
-	/* Normal situation, other LAN ahead by one. Frame is dropped */
 	struct prp_test_data *data = build_prp_test_data(test);
-	unsigned long time = jiffies - 10;
 
-	data->frame.sequence_nr = 1;
-	data->node.seq_expected[HSR_PT_SLAVE_B] = 3;
-	data->node.seq_out[HSR_PT_MASTER] = 2;
-	data->node.time_out[HSR_PT_MASTER] = time;
+	data->frame.sequence_nr = 2;
+	KUNIT_EXPECT_EQ(test, 0,
+			prp_register_frame_out(&data->port, &data->frame));
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	KUNIT_EXPECT_EQ(test, 1,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, 2, data->node.seq_out[HSR_PT_MASTER]);
-	KUNIT_EXPECT_EQ(test, time, data->node.time_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, 2, 2, 2, 3);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 }
 
-static void prp_dup_discard_node_timeout(struct kunit *test)
+static void prp_dup_discard_entry_timeout(struct kunit *test)
 {
 	/* Timeout situation, node hasn't sent anything for a while */
 	struct prp_test_data *data = build_prp_test_data(test);
+	struct hsr_seq_block *block;
+	u16 block_idx;
 
 	data->frame.sequence_nr = 7;
-	data->node.seq_start[HSR_PT_SLAVE_A] = 1234;
-	data->node.seq_expected[HSR_PT_SLAVE_A] = 1235;
-	data->node.seq_start[HSR_PT_SLAVE_B] = 1234;
-	data->node.seq_expected[HSR_PT_SLAVE_B] = 1234;
-	data->node.seq_out[HSR_PT_MASTER] = 1234;
-	data->node.time_out[HSR_PT_MASTER] =
-		jiffies - msecs_to_jiffies(HSR_ENTRY_FORGET_TIME) - 1;
+	KUNIT_EXPECT_EQ(test, 0,
+			prp_register_frame_out(&data->port, &data->frame));
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
+
+	data->frame.sequence_nr = 11;
+	KUNIT_EXPECT_EQ(test, 0,
+			prp_register_frame_out(&data->port, &data->frame));
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
+
+	block_idx = data->frame.sequence_nr >> HSR_SEQ_BLOCK_SHIFT;
+	block = hsr_get_seq_block(&data->node, block_idx);
+	block->time = jiffies - msecs_to_jiffies(HSR_ENTRY_FORGET_TIME) - 1;
 
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	KUNIT_EXPECT_EQ(test, jiffies, data->node.time_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, data->frame.sequence_nr,
-			   data->frame.sequence_nr + 1, 1234, 1234);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
+	check_prp_frame_unseen(test, data, 7);
 }
 
 static void prp_dup_discard_out_of_sequence(struct kunit *test)
@@ -107,50 +121,36 @@ static void prp_dup_discard_out_of_sequence(struct kunit *test)
 	/* One frame is received out of sequence on both LANs */
 	struct prp_test_data *data = build_prp_test_data(test);
 
-	data->node.seq_start[HSR_PT_SLAVE_A] = 10;
-	data->node.seq_expected[HSR_PT_SLAVE_A] = 10;
-	data->node.seq_start[HSR_PT_SLAVE_B] = 10;
-	data->node.seq_expected[HSR_PT_SLAVE_B] = 10;
-	data->node.seq_out[HSR_PT_MASTER] = 9;
+	/* initial frame, should be accepted */
+	data->frame.sequence_nr = 9;
+	KUNIT_EXPECT_EQ(test, 0,
+			prp_register_frame_out(&data->port, &data->frame));
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	/* 1st old frame, should be accepted */
 	data->frame.sequence_nr = 8;
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, data->frame.sequence_nr,
-			   data->frame.sequence_nr + 1, 10, 10);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	/* 2nd frame should be dropped */
 	data->frame.sequence_nr = 8;
 	data->port_rcv.type = HSR_PT_SLAVE_B;
 	KUNIT_EXPECT_EQ(test, 1,
 			prp_register_frame_out(&data->port, &data->frame));
-	check_prp_counters(test, data, data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1);
 
 	/* Next frame, this is forwarded */
 	data->frame.sequence_nr = 10;
 	data->port_rcv.type = HSR_PT_SLAVE_A;
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, data->frame.sequence_nr,
-			   data->frame.sequence_nr + 1, 9, 9);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	/* and next one is dropped */
 	data->frame.sequence_nr = 10;
 	data->port_rcv.type = HSR_PT_SLAVE_B;
 	KUNIT_EXPECT_EQ(test, 1,
 			prp_register_frame_out(&data->port, &data->frame));
-	check_prp_counters(test, data, data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1,
-			   data->frame.sequence_nr + 1);
 }
 
 static void prp_dup_discard_lan_b_late(struct kunit *test)
@@ -158,43 +158,31 @@ static void prp_dup_discard_lan_b_late(struct kunit *test)
 	/* LAN B is behind */
 	struct prp_test_data *data = build_prp_test_data(test);
 
-	data->node.seq_start[HSR_PT_SLAVE_A] = 9;
-	data->node.seq_expected[HSR_PT_SLAVE_A] = 9;
-	data->node.seq_start[HSR_PT_SLAVE_B] = 9;
-	data->node.seq_expected[HSR_PT_SLAVE_B] = 9;
-	data->node.seq_out[HSR_PT_MASTER] = 8;
-
 	data->frame.sequence_nr = 9;
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, 9, 10, 9, 9);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	data->frame.sequence_nr = 10;
 	KUNIT_EXPECT_EQ(test, 0,
 			prp_register_frame_out(&data->port, &data->frame));
-	KUNIT_EXPECT_EQ(test, data->frame.sequence_nr,
-			data->node.seq_out[HSR_PT_MASTER]);
-	check_prp_counters(test, data, 9, 11, 9, 9);
+	check_prp_frame_seen(test, data, data->frame.sequence_nr);
 
 	data->frame.sequence_nr = 9;
 	data->port_rcv.type = HSR_PT_SLAVE_B;
 	KUNIT_EXPECT_EQ(test, 1,
 			prp_register_frame_out(&data->port, &data->frame));
-	check_prp_counters(test, data, 10, 11, 10, 10);
 
 	data->frame.sequence_nr = 10;
 	data->port_rcv.type = HSR_PT_SLAVE_B;
 	KUNIT_EXPECT_EQ(test, 1,
 			prp_register_frame_out(&data->port, &data->frame));
-	check_prp_counters(test, data,  11, 11, 11, 11);
 }
 
 static struct kunit_case prp_dup_discard_test_cases[] = {
 	KUNIT_CASE(prp_dup_discard_forward),
-	KUNIT_CASE(prp_dup_discard_inside_dropwindow),
-	KUNIT_CASE(prp_dup_discard_node_timeout),
+	KUNIT_CASE(prp_dup_discard_drop_duplicate),
+	KUNIT_CASE(prp_dup_discard_entry_timeout),
 	KUNIT_CASE(prp_dup_discard_out_of_sequence),
 	KUNIT_CASE(prp_dup_discard_lan_b_late),
 	{}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (7 preceding siblings ...)
  2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
@ 2026-01-22 14:57 ` Felix Maurer
  2026-01-22 17:24 ` [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Sebastian Andrzej Siewior
  2026-01-23  1:35 ` Jakub Kicinski
  10 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-22 14:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

Despite the HSR subsystem being orphaned at the moment due to the original
maintainer being unreachable for a while, assign the selftests to the
subsystem for the future.

Signed-off-by: Felix Maurer <fmaurer@redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 92768bceb929..6cb87b640ffc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11582,6 +11582,7 @@ HSR NETWORK PROTOCOL
 L:	netdev@vger.kernel.org
 S:	Orphan
 F:	net/hsr/
+F:	tools/testing/selftests/net/hsr/
 
 HT16K33 LED CONTROLLER DRIVER
 M:	Robin van der Gracht <robin@protonic.nl>
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (8 preceding siblings ...)
  2026-01-22 14:57 ` [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR Felix Maurer
@ 2026-01-22 17:24 ` Sebastian Andrzej Siewior
  2026-01-23  1:35 ` Jakub Kicinski
  10 siblings, 0 replies; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-22 17:24 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-01-22 15:56:55 [+0100], Felix Maurer wrote:

I try to get to this.

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
  2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
                   ` (9 preceding siblings ...)
  2026-01-22 17:24 ` [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Sebastian Andrzej Siewior
@ 2026-01-23  1:35 ` Jakub Kicinski
  2026-01-26  9:28   ` Felix Maurer
  10 siblings, 1 reply; 55+ messages in thread
From: Jakub Kicinski @ 2026-01-23  1:35 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy

On Thu, 22 Jan 2026 15:56:55 +0100 Felix Maurer wrote:
> The duplicate discard algorithms for PRP and HSR do not work reliably
> with certain link faults. Especially with packet loss on one link, the
> duplicate discard algorithms drop valid packets. For a more thorough
> description see patches 4 (for PRP) and 6 (for HSR).

Transient build breakage here - patches 4-7 don't build and will break
bisection. Perhaps wait a few days to give a chance for Sebastian to
TAL before reposting tho?
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
  2026-01-23  1:35 ` Jakub Kicinski
@ 2026-01-26  9:28   ` Felix Maurer
  2026-01-29 15:29     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-26  9:28 UTC (permalink / raw)
  To: Jakub Kicinski, bigeasy
  Cc: netdev, davem, edumazet, pabeni, horms, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio

On Thu, Jan 22, 2026 at 05:35:56PM -0800, Jakub Kicinski wrote:
> Transient build breakage here - patches 4-7 don't build and will break
> bisection. Perhaps wait a few days to give a chance for Sebastian to
> TAL before reposting tho?

The build breaks because I didn't update the KUnit test for the PRP
duplicate discard algorithm in patch 4 and 6 to keep the patches a bit
smaller. But the build succeeds without
CONFIG_PRP_DUP_DISCARD_KUNIT_TEST, which is probably not active in most
bisection runs.

I'll wait for feedback from Sebastian, but I'm also fine with merging
the fixes for the KUnit test into the other patches and reposting.

Thanks,
   Felix

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
@ 2026-01-28 16:38   ` Simon Horman
  2026-01-28 18:37     ` Felix Maurer
  2026-01-29 13:29   ` Sebastian Andrzej Siewior
  2026-02-02  8:47   ` Steffen Lindner
  2 siblings, 1 reply; 55+ messages in thread
From: Simon Horman @ 2026-01-28 16:38 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy, Steffen Lindner

On Thu, Jan 22, 2026 at 03:56:59PM +0100, Felix Maurer wrote:

...

> diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c

...

> +/* Get the currently active sequence number block. If there is no block yet, or
> + * the existing one is expired, a new block is created. The idea is to maintain
> + * a "sparse bitmap" where a bitmap for the whole sequence number space is
> + * split into blocks and not all blocks exist all the time. The blocks can
> + * expire after time (in low traffic situations) or when they are replaced in
> + * the backing fixed size buffer (in high traffic situations).
> + */
> +static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
> +					       u16 block_idx)
> +{
> +	struct hsr_seq_block *block, *res;
> +
> +	block = xa_load(&node->seq_blocks, block_idx);
> +
> +	if (block && hsr_seq_block_is_old(block)) {
> +		hsr_forget_seq_block(node, block);
> +		block = NULL;
> +	}
> +
> +	if (!block) {
> +		block = &node->block_buf[node->next_block];
> +		hsr_forget_seq_block(node, block);
> +
> +		memset(block, 0, sizeof(*block));
> +		block->time = jiffies;
> +		block->block_idx = block_idx;
> +
> +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> +		if (xa_is_err(res))

Hi Felix,

I ran Claude Code over this with review-prompts [1] and it flags
that in the error path above, the following is needed so that the
block can be re-used.

			block->time = 0;

[1] https://github.com/masoncl/review-prompts/

> +			return NULL;
> +
> +		node->next_block =
> +			(node->next_block + 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> +	}
> +
> +	return block;
> +}
> +

...

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-28 16:38   ` Simon Horman
@ 2026-01-28 18:37     ` Felix Maurer
  2026-02-02 16:57       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-28 18:37 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, davem, edumazet, kuba, pabeni, jkarrenpalo, tglx, mingo,
	allison.henderson, petrm, antonio, bigeasy, Steffen Lindner

On Wed, Jan 28, 2026 at 04:38:24PM +0000, Simon Horman wrote:
> On Thu, Jan 22, 2026 at 03:56:59PM +0100, Felix Maurer wrote:
>
> ...
>
> > diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
>
> ...
>
> > +/* Get the currently active sequence number block. If there is no block yet, or
> > + * the existing one is expired, a new block is created. The idea is to maintain
> > + * a "sparse bitmap" where a bitmap for the whole sequence number space is
> > + * split into blocks and not all blocks exist all the time. The blocks can
> > + * expire after time (in low traffic situations) or when they are replaced in
> > + * the backing fixed size buffer (in high traffic situations).
> > + */
> > +static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
> > +					       u16 block_idx)
> > +{
> > +	struct hsr_seq_block *block, *res;
> > +
> > +	block = xa_load(&node->seq_blocks, block_idx);
> > +
> > +	if (block && hsr_seq_block_is_old(block)) {
> > +		hsr_forget_seq_block(node, block);
> > +		block = NULL;
> > +	}
> > +
> > +	if (!block) {
> > +		block = &node->block_buf[node->next_block];
> > +		hsr_forget_seq_block(node, block);
> > +
> > +		memset(block, 0, sizeof(*block));
> > +		block->time = jiffies;
> > +		block->block_idx = block_idx;
> > +
> > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > +		if (xa_is_err(res))
>
> Hi Felix,
>
> I ran Claude Code over this with review-prompts [1] and it flags
> that in the error path above, the following is needed so that the
> block can be re-used.
>
> 			block->time = 0;

I agree. It would nonetheless be reused at some point, but not setting
time = 0 may lead to an unexpected block getting removed when it is
reused (or at least an attempt to do so). I'll fix this in v3.

Thanks,
   Felix


> [1] https://github.com/masoncl/review-prompts/
>
> > +			return NULL;
> > +
> > +		node->next_block =
> > +			(node->next_block + 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> > +	}
> > +
> > +	return block;
> > +}
> > +
>
> ...


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
@ 2026-01-29 11:05   ` Sebastian Andrzej Siewior
  2026-01-29 13:31     ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 11:05 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-01-22 15:56:56 [+0100], Felix Maurer wrote:
> diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
> new file mode 100755
> index 000000000000..fd2ba9f05d4c
> --- /dev/null
> +++ b/tools/testing/selftests/net/hsr/prp_ping.sh
…
> +	# MAC addresses will be copied from LAN A interface
> +	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
> +	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA

so I somehow started this (I think) but while browsing the spec it
somehow says that the same MAC address should be used on both ports.
Could it be?
It says that the two frames are identical except for the LAN field and
checksum. Also the duplication is defined on src-MAC + seq nr.
Having this requires to merge the two MACs for a node and we do this but
could this be a left over from an older version of the spec or a
behaviour that was not meant happen?

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
  2026-01-28 16:38   ` Simon Horman
@ 2026-01-29 13:29   ` Sebastian Andrzej Siewior
  2026-01-29 15:30     ` Felix Maurer
  2026-02-02  8:47   ` Steffen Lindner
  2 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 13:29 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Steffen Lindner

On 2026-01-22 15:56:59 [+0100], Felix Maurer wrote:
> The PRP duplicate discard algorithm does not work reliably with certain
…
> The idea of the new algorithm is to store the seen sequence numbers in a
> bitmap. To keep the size of the bitmap in control, we store it as a "sparse
> bitmap" where the bitmap is split into blocks and not all blocks exist at
> the same time. The sparse bitmap is implemented using an xarray that keeps
> the references to the individual blocks and a backing ring buffer that
…

My idea was to keep track of say the last 64 sequence numbers but I had
no idea how to efficiently implement it with all the "forget" parts and
so on. What you did here is quite nice.

> @@ -280,6 +301,59 @@ struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
…
> +/* Get the currently active sequence number block. If there is no block yet, or
> + * the existing one is expired, a new block is created. The idea is to maintain
> + * a "sparse bitmap" where a bitmap for the whole sequence number space is
> + * split into blocks and not all blocks exist all the time. The blocks can
> + * expire after time (in low traffic situations) or when they are replaced in
> + * the backing fixed size buffer (in high traffic situations).
                                    HSR_MAX_SEQ_BLOCKS, 
> + */
> +static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
> +					       u16 block_idx)
> +{
> +	struct hsr_seq_block *block, *res;
> +
> +	block = xa_load(&node->seq_blocks, block_idx);
> +
> +	if (block && hsr_seq_block_is_old(block)) {
> +		hsr_forget_seq_block(node, block);
> +		block = NULL;
> +	}
> +
> +	if (!block) {
> +		block = &node->block_buf[node->next_block];
> +		hsr_forget_seq_block(node, block);
> +
> +		memset(block, 0, sizeof(*block));
> +		block->time = jiffies;

So we assign ->time here while the block is created and it expires after
HSR_ENTRY_FORGET_TIME (400ms). *Maybe* it should be updated in
hsr_check_duplicate() if we set a bit. The difference would be that the
last sequence in the block would get it whole life time while now the
lifetime starts with the first entry in the block. The downside
of course would be that the first entry gets more than the initialy
expected lifetime.
Not sure if this matters in reality, just wanted to mention.

> +		block->block_idx = block_idx;
> +
> +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> +		if (xa_is_err(res))
> +			return NULL;
> +
> +		node->next_block =
> +			(node->next_block + 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> +	}
> +
> +	return block;
> +}
> +
>  /* Use the Supervision frame's info about an eventual macaddress_B for merging
>   * nodes that has previously had their macaddress_B registered as a separate
>   * node.
> @@ -545,47 +631,26 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
…
> +
> +	seq_bit = hsr_seq_block_bit(sequence_nr);
> +	if (test_and_set_bit(seq_bit, block->seq_nrs))

the access happens under ::seq_out_lock so you don't need to be atomic
here. You could safe a cycle and use __test_and_set_bit() instead.

> +		goto out_seen;
>  
>  	node->time_out[port->type] = jiffies;
>  	node->seq_out[port->type] = sequence_nr;
> +out_new:
>  	spin_unlock_bh(&node->seq_out_lock);
>  	return 0;
> +
> +out_seen:
> +	spin_unlock_bh(&node->seq_out_lock);
> +	return 1;
>  }
>  
>  #if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-29 11:05   ` Sebastian Andrzej Siewior
@ 2026-01-29 13:31     ` Felix Maurer
  2026-01-29 15:21       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 13:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 12:05:00PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-22 15:56:56 [+0100], Felix Maurer wrote:
> > diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
> > new file mode 100755
> > index 000000000000..fd2ba9f05d4c
> > --- /dev/null
> > +++ b/tools/testing/selftests/net/hsr/prp_ping.sh
> …
> > +	# MAC addresses will be copied from LAN A interface
> > +	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
> > +	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
>
> so I somehow started this (I think) but while browsing the spec it
> somehow says that the same MAC address should be used on both ports.
> Could it be?
> It says that the two frames are identical except for the LAN field and
> checksum. Also the duplication is defined on src-MAC + seq nr.
> Having this requires to merge the two MACs for a node and we do this but
> could this be a left over from an older version of the spec or a
> behaviour that was not meant happen?

Yes, for PRP it is required that both ports, A and B, of a node send
with the same MAC. For us that means that the two ports need to be
configured with the same MAC address. This used to be a common source of
configuration errors. Therefore, b65999e7238e ("net: hsr: sync hw addr
of slave2 according to slave1 hw addr on PRP") made it so that we are
now copying the MAC from port A to port B.

Therefore, I'm only setting the MAC of vethA on each node in the test.
Even this is not strictly necessary but it turns out that debugging is a
lot simpler, when it is obvious addresses belong to which node.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
@ 2026-01-29 13:32   ` Sebastian Andrzej Siewior
  2026-02-02 11:30     ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 13:32 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> Add tests where one link has different rates of packet loss or reorders
> packets. PRP should still be able to recover from these link faults and
> show no packet loss.  However, it is acceptable to receive some level of
> duplicate packets. This matches the current specification (IEC
> 62439-3:2021) of the duplicate discard algorithm that requires it to be
> "designed such that it never rejects a legitimate frame, while occasional
> acceptance of a duplicate can be tolerated." The rate of acceptable
> duplicates in this test is intentionally high (10%) to make the test
> stable, the values I observed in the worst test cases (20% loss) are around
> 5% duplicates.

Do you know why we have the duplicates? It is because of the high
sending rate at which point the blocks are dropped and we can't
recognise the duplicate?

> Signed-off-by: Felix Maurer <fmaurer@redhat.com>

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
@ 2026-01-29 14:43   ` Sebastian Andrzej Siewior
  2026-01-29 16:17     ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 14:43 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On 2026-01-22 15:57:01 [+0100], Felix Maurer wrote:
…
> As the problem (we accidentally skip over a sequence number that has not
> been received but will be received in the future) is similar to PRP, we can
> apply a similar solution. The duplicate discard algorithm based on the
> "sparse bitmap" works well for HSR if it is extended to track one bitmap
> for each port (A, B, master, interlink). To do this, change the sequence
> number blocks to contain a flexible array member as the last member that
> can keep chunks for as many bitmaps as we need. This design makes it easy
> to reuse the same algorithm in a potential PRP RedBox implementation.

I know you just "copy" the logic based on what we have now but…
Why do we have to track the sequence number for A, B and interlink? The
'master' port is what we feed into the stack so this needs to be
de-duplicated. I am not sure how 'interlink' works so I keep quiet here.
But A and B? There shouldn't be any duplicates on A and B unless the
destination node forwards the node. Or do I miss something?
I'm bringing this up because limiting to one (or two since I am unsure
about interlink) would save some memory and avoid needless updates. And
if you have HW-offloading enabled then you shouldn't see any packets
which are not directed to _this_ node.

…
> diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> index 70b3819e9298..c9bd25e6a7cc 100644
> --- a/net/hsr/hsr_framereg.c
> +++ b/net/hsr/hsr_framereg.c
> @@ -805,6 +774,39 @@ void *hsr_get_next_node(struct hsr_priv *hsr, void *_pos,
>  	return NULL;
>  }
>  
> +/* Fill the last sequence number that has been received from node on if1 by
> + * finding the last sequence number sent on port B; accordingly get the last
> + * received sequence number for if2 using sent sequence numbers on port A.
> + */
> +static void fill_last_seq_nrs(struct hsr_node *node, u16 *if1_seq, u16 *if2_seq)
> +{
> +	struct hsr_seq_block *block;
> +	unsigned int block_off;
> +	size_t block_sz;
> +	u16 seq_bit;
> +
> +	spin_lock_bh(&node->seq_out_lock);
> +
> +	// Get last inserted block
> +	block_off = (node->next_block - 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> +	block_sz = hsr_seq_block_size(node);
> +	block = node->block_buf + block_off * block_sz;
> +
> +	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_B - 1],
> +			  HSR_SEQ_BLOCK_SIZE)) {
> +		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_B - 1],
> +					HSR_SEQ_BLOCK_SIZE);
> +		*if1_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
> +	}
> +	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_A - 1],
> +			  HSR_SEQ_BLOCK_SIZE)) {
> +		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_A - 1],
> +					HSR_SEQ_BLOCK_SIZE);
> +		*if2_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
> +	}
> +	spin_unlock_bh(&node->seq_out_lock);

I know you preserve what is already here but what is this even used for?
| ip -d link show dev hsr0

does not show these numbers. It shows the sequence number of the hsr0
interface which I understand.
But then it is also possible to show the last received sequence number
of any node on either of the two interfaces?

> +}
> +
>  int hsr_get_node_data(struct hsr_priv *hsr,
>  		      const unsigned char *addr,
>  		      unsigned char addr_b[ETH_ALEN],
…
> --- a/net/hsr/hsr_framereg.h
> +++ b/net/hsr/hsr_framereg.h
> @@ -82,15 +82,18 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
>  #define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
>  #define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
>  
> +#define DECLARE_BITMAP_FLEX_ARRAY(name, bits) \
> +	unsigned long name[][BITS_TO_LONGS(bits)]
> +
>  struct hsr_seq_block {
>  	unsigned long		time;
>  	u16			block_idx;
> -	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> +	DECLARE_BITMAP_FLEX_ARRAY(seq_nrs, HSR_SEQ_BLOCK_SIZE);

is there a story behind DECLARE_BITMAP_FLEX_ARRAY()? We have just this
one user.

>  };

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm
  2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
@ 2026-01-29 15:12   ` Sebastian Andrzej Siewior
  2026-01-29 16:19     ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 15:12 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-01-22 15:57:03 [+0100], Felix Maurer wrote:
> With the new duplicate discard algorithm from the earlier patches, the
> KUnit test for the algorithm needs to be updated. The updates are done in a
> way to match the original intends pretty closely. Currently, there is much
> knowledge about the actual algorithm baked into the tests (especially the
> expectations) which may need some redesign in the future.

So the below isn't new but I wasn't aware we have this kunit thing
earlier. It triggers even before your series, just differently now. The
problem seems to be that node member isn't properly initialized. So
lockep complains:

| 1..1
|     KTAP version 1
|     # Subtest: prp_duplicate_discard
|     # module: prp_dup_discard_test
|     1..5
| INFO: trying to register non-static key.
| The code is fine but needs lockdep annotation, or maybe
| you didn't initialize this object before use?
| turning off the locking correctness validator.
…
|  lock_acquire+0xc8/0x210
|  rt_spin_lock+0x2e/0x1e0
|  hsr_check_duplicate+0x40/0xb0
|  prp_dup_discard_forward+0x30/0x180 [prp_dup_discard_test]

It needs:

diff --git a/net/hsr/prp_dup_discard_test.c b/net/hsr/prp_dup_discard_test.c
index a7a5beb52d62f..2ab346d920bbd 100644
--- a/net/hsr/prp_dup_discard_test.c
+++ b/net/hsr/prp_dup_discard_test.c
@@ -27,6 +27,7 @@ static struct prp_test_data *build_prp_test_data(struct kunit *test)
 					     GFP_ATOMIC);
 	KUNIT_EXPECT_NOT_ERR_OR_NULL(test, data);
 	xa_init(&data->node.seq_blocks);
+	spin_lock_init(&data->node.seq_out_lock);
 
 	data->frame.node_src = &data->node;
 	data->frame.port_rcv = &data->port_rcv;

Sebastian

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-29 13:31     ` Felix Maurer
@ 2026-01-29 15:21       ` Sebastian Andrzej Siewior
  2026-01-29 17:44         ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 15:21 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-01-29 14:31:30 [+0100], Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 12:05:00PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-22 15:56:56 [+0100], Felix Maurer wrote:
> > > diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
> > > new file mode 100755
> > > index 000000000000..fd2ba9f05d4c
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/net/hsr/prp_ping.sh
> > …
> > > +	# MAC addresses will be copied from LAN A interface
> > > +	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
> > > +	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
> >
> > so I somehow started this (I think) but while browsing the spec it
> > somehow says that the same MAC address should be used on both ports.
> > Could it be?
> > It says that the two frames are identical except for the LAN field and
> > checksum. Also the duplication is defined on src-MAC + seq nr.
> > Having this requires to merge the two MACs for a node and we do this but
> > could this be a left over from an older version of the spec or a
> > behaviour that was not meant happen?
> 
> Yes, for PRP it is required that both ports, A and B, of a node send
> with the same MAC. For us that means that the two ports need to be
> configured with the same MAC address. This used to be a common source of
> configuration errors. Therefore, b65999e7238e ("net: hsr: sync hw addr
> of slave2 according to slave1 hw addr on PRP") made it so that we are
> now copying the MAC from port A to port B.
> 
> Therefore, I'm only setting the MAC of vethA on each node in the test.
> Even this is not strictly necessary but it turns out that debugging is a
> lot simpler, when it is obvious addresses belong to which node.

Looking at the hsr tests, those have two different macs… It should be
the same. It works because it merges the nodes and lookup works for
both…

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
  2026-01-26  9:28   ` Felix Maurer
@ 2026-01-29 15:29     ` Sebastian Andrzej Siewior
  2026-01-29 16:29       ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-01-29 15:29 UTC (permalink / raw)
  To: Felix Maurer
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, horms,
	jkarrenpalo, tglx, mingo, allison.henderson, petrm, antonio

On 2026-01-26 10:28:18 [+0100], Felix Maurer wrote:
> 
> I'll wait for feedback from Sebastian, but I'm also fine with merging
> the fixes for the KUnit test into the other patches and reposting.

Overall the new algorithm is very nice. Feel free to add
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I made a few comments where I think it should be possible to avoid
tracking the sequence number for interface A and B but since you
preserve what is done now, I don't mind to keep it as-it and try to
remove it later on (unless I'm wrong and it is not doable).

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-29 13:29   ` Sebastian Andrzej Siewior
@ 2026-01-29 15:30     ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 15:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Steffen Lindner

On Thu, Jan 29, 2026 at 02:29:40PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-22 15:56:59 [+0100], Felix Maurer wrote:
> > The PRP duplicate discard algorithm does not work reliably with certain
> …
> > The idea of the new algorithm is to store the seen sequence numbers in a
> > bitmap. To keep the size of the bitmap in control, we store it as a "sparse
> > bitmap" where the bitmap is split into blocks and not all blocks exist at
> > the same time. The sparse bitmap is implemented using an xarray that keeps
> > the references to the individual blocks and a backing ring buffer that
> …
>
> My idea was to keep track of say the last 64 sequence numbers but I had
> no idea how to efficiently implement it with all the "forget" parts and
> so on. What you did here is quite nice.

Thank you!

> > @@ -280,6 +301,59 @@ struct hsr_node *hsr_get_node(struct hsr_port *port, struct list_head *node_db,
> …
> > +/* Get the currently active sequence number block. If there is no block yet, or
> > + * the existing one is expired, a new block is created. The idea is to maintain
> > + * a "sparse bitmap" where a bitmap for the whole sequence number space is
> > + * split into blocks and not all blocks exist all the time. The blocks can
> > + * expire after time (in low traffic situations) or when they are replaced in
> > + * the backing fixed size buffer (in high traffic situations).
>                                     HSR_MAX_SEQ_BLOCKS,
> > + */
> > +static struct hsr_seq_block *hsr_get_seq_block(struct hsr_node *node,
> > +					       u16 block_idx)
> > +{
> > +	struct hsr_seq_block *block, *res;
> > +
> > +	block = xa_load(&node->seq_blocks, block_idx);
> > +
> > +	if (block && hsr_seq_block_is_old(block)) {
> > +		hsr_forget_seq_block(node, block);
> > +		block = NULL;
> > +	}
> > +
> > +	if (!block) {
> > +		block = &node->block_buf[node->next_block];
> > +		hsr_forget_seq_block(node, block);
> > +
> > +		memset(block, 0, sizeof(*block));
> > +		block->time = jiffies;
>
> So we assign ->time here while the block is created and it expires after
> HSR_ENTRY_FORGET_TIME (400ms). *Maybe* it should be updated in
> hsr_check_duplicate() if we set a bit. The difference would be that the
> last sequence in the block would get it whole life time while now the
> lifetime starts with the first entry in the block. The downside
> of course would be that the first entry gets more than the initialy
> expected lifetime.
> Not sure if this matters in reality, just wanted to mention.

I considered that as well. But given that the standard states that
"since the 16-bit SeqNr wraps around after 65 536 frames [...], entries
older than a specified time EntryForgetTime are purged" (IEC
62439-3:2021, 4.1.10.3) and later more explicitly that EntryForgetTime
is the "maximum time an entry may reside in the duplicate table" (4.4),
I decided that we rather go the safe way here and use the lifetime of
the first entry.

One can definitely construct a scenario where using the lifetime of the
last entry is problematic: let's say we have a block with a single
entry. If the sequence number wraps around and we reach the same block
again after, let's say, 350ms but hit a different bit in the block, the
new lifetime would apply to both entries, which would almost double the
lifetime of the initial entry.

The large span of 128 entries per timestamp could be mitigated by
changing the size (and maybe number) of blocks. The 64 blocks with 128
entries each is a kind of arbitrary pick, but make a good tradeoff in my
opinion. I'm making that tradeoff between memory usage, timestamp
inaccuracy, and link delay we can deal with (with 64 * 128 = 2^13
entries out of the 2^16 sequence number space, i.e, 1/8th of the space,
that's 5-6ms on a 1Gbit/s network where sequence number could repeat
every 44ms according to the standard).

> > +		block->block_idx = block_idx;
> > +
> > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > +		if (xa_is_err(res))
> > +			return NULL;
> > +
> > +		node->next_block =
> > +			(node->next_block + 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> > +	}
> > +
> > +	return block;
> > +}
> > +
> >  /* Use the Supervision frame's info about an eventual macaddress_B for merging
> >   * nodes that has previously had their macaddress_B registered as a separate
> >   * node.
> > @@ -545,47 +631,26 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame)
> …
> > +
> > +	seq_bit = hsr_seq_block_bit(sequence_nr);
> > +	if (test_and_set_bit(seq_bit, block->seq_nrs))
>
> the access happens under ::seq_out_lock so you don't need to be atomic
> here. You could safe a cycle and use __test_and_set_bit() instead.

Good point, I tried to get rid of the lock at some point, but then
decided that it would be material for another patchset. I'll change it
to __test_and_set_bit().

Thanks,
   Felix


> > +		goto out_seen;
> >
> >  	node->time_out[port->type] = jiffies;
> >  	node->seq_out[port->type] = sequence_nr;
> > +out_new:
> >  	spin_unlock_bh(&node->seq_out_lock);
> >  	return 0;
> > +
> > +out_seen:
> > +	spin_unlock_bh(&node->seq_out_lock);
> > +	return 1;
> >  }
> >
> >  #if IS_MODULE(CONFIG_PRP_DUP_DISCARD_KUNIT_TEST)
>
> Sebastian


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-29 14:43   ` Sebastian Andrzej Siewior
@ 2026-01-29 16:17     ` Felix Maurer
  2026-01-29 18:01       ` Felix Maurer
  2026-02-02 17:11       ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 16:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On Thu, Jan 29, 2026 at 03:43:48PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-22 15:57:01 [+0100], Felix Maurer wrote:
> …
> > As the problem (we accidentally skip over a sequence number that has not
> > been received but will be received in the future) is similar to PRP, we can
> > apply a similar solution. The duplicate discard algorithm based on the
> > "sparse bitmap" works well for HSR if it is extended to track one bitmap
> > for each port (A, B, master, interlink). To do this, change the sequence
> > number blocks to contain a flexible array member as the last member that
> > can keep chunks for as many bitmaps as we need. This design makes it easy
> > to reuse the same algorithm in a potential PRP RedBox implementation.
>
> I know you just "copy" the logic based on what we have now but…
> Why do we have to track the sequence number for A, B and interlink? The
> 'master' port is what we feed into the stack so this needs to be
> de-duplicated. I am not sure how 'interlink' works so I keep quiet here.
> But A and B? There shouldn't be any duplicates on A and B unless the
> destination node forwards the node. Or do I miss something?
> I'm bringing this up because limiting to one (or two since I am unsure
> about interlink) would save some memory and avoid needless updates. And
> if you have HW-offloading enabled then you shouldn't see any packets
> which are not directed to _this_ node.

About the interlink: that's the interface where you attach devices that
know nothing about HSR, i.e., when we are a RedBox. I consider it very
similar to the master port, it's our responsibility to de-duplicate what
we send out there.

I was thinking about exactly this while working on the patch as well and
I came to each conclusion (A,B are needed vs. are not needed) at least
once. In the end, I think we will need it. It's right that a well
behaving node in the ring should not forward "frames for which the node
is the unique destination" (5.3.2.1). But there could be frames that
have no unique destination in the ring at all: multicast frames or
frames addressed to a non-existing MAC address. We should not forward
such frames either to prevent them from looping forever.

Now, such frames should probably only reach back to us, if we sent them
(either from our stack or from the interlink port). We could track
sequence numbers sent to master and interlink (for de-duplication) and
sent by master and interlink (for loop prevention) for more clarity, but
then we're back at four bitmaps again.

I agree that we can and should optimize the HW-offloaded case. I'd
suggest to do that in a separate patchset, though, partly because I
don't have access to a hardware with HSR offload at the moment.

> …
> > diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> > index 70b3819e9298..c9bd25e6a7cc 100644
> > --- a/net/hsr/hsr_framereg.c
> > +++ b/net/hsr/hsr_framereg.c
> > @@ -805,6 +774,39 @@ void *hsr_get_next_node(struct hsr_priv *hsr, void *_pos,
> >  	return NULL;
> >  }
> >
> > +/* Fill the last sequence number that has been received from node on if1 by
> > + * finding the last sequence number sent on port B; accordingly get the last
> > + * received sequence number for if2 using sent sequence numbers on port A.
> > + */
> > +static void fill_last_seq_nrs(struct hsr_node *node, u16 *if1_seq, u16 *if2_seq)
> > +{
> > +	struct hsr_seq_block *block;
> > +	unsigned int block_off;
> > +	size_t block_sz;
> > +	u16 seq_bit;
> > +
> > +	spin_lock_bh(&node->seq_out_lock);
> > +
> > +	// Get last inserted block
> > +	block_off = (node->next_block - 1) & (HSR_MAX_SEQ_BLOCKS - 1);
> > +	block_sz = hsr_seq_block_size(node);
> > +	block = node->block_buf + block_off * block_sz;
> > +
> > +	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_B - 1],
> > +			  HSR_SEQ_BLOCK_SIZE)) {
> > +		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_B - 1],
> > +					HSR_SEQ_BLOCK_SIZE);
> > +		*if1_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
> > +	}
> > +	if (!bitmap_empty(block->seq_nrs[HSR_PT_SLAVE_A - 1],
> > +			  HSR_SEQ_BLOCK_SIZE)) {
> > +		seq_bit = find_last_bit(block->seq_nrs[HSR_PT_SLAVE_A - 1],
> > +					HSR_SEQ_BLOCK_SIZE);
> > +		*if2_seq = (block->block_idx << HSR_SEQ_BLOCK_SHIFT) | seq_bit;
> > +	}
> > +	spin_unlock_bh(&node->seq_out_lock);
>
> I know you preserve what is already here but what is this even used for?
> | ip -d link show dev hsr0
>
> does not show these numbers. It shows the sequence number of the hsr0
> interface which I understand.
> But then it is also possible to show the last received sequence number
> of any node on either of the two interfaces?

This is only exposed through the HSR_C_GET_NODE_STATUS netlink command
for the "HSR" (sic!) netlink family. I'm not aware of a userspace tool
that actually shows this data. It's unclear if it is of any real use.

I assume that the two sequence numbers are reported because they would
in theory allow to detect a broken ring: if one of the numbers stops
increasing or the numbers generally diverge, some link in the ring is
broken. If we have traffic from multiple nodes, we could even detect
where the ring is broken. That's likely also the reason for the weird
cross-assinment of B->1, A->2. What we send out on B has to have arrived
on port A resp. 1 before.

We can discuss removing this stuff in the future, but I'd prefer not to
touch userspace at this time.

> > +}
> > +
> >  int hsr_get_node_data(struct hsr_priv *hsr,
> >  		      const unsigned char *addr,
> >  		      unsigned char addr_b[ETH_ALEN],
> …
> > --- a/net/hsr/hsr_framereg.h
> > +++ b/net/hsr/hsr_framereg.h
> > @@ -82,15 +82,18 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
> >  #define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
> >  #define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
> >
> > +#define DECLARE_BITMAP_FLEX_ARRAY(name, bits) \
> > +	unsigned long name[][BITS_TO_LONGS(bits)]
> > +
> >  struct hsr_seq_block {
> >  	unsigned long		time;
> >  	u16			block_idx;
> > -	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> > +	DECLARE_BITMAP_FLEX_ARRAY(seq_nrs, HSR_SEQ_BLOCK_SIZE);
>
> is there a story behind DECLARE_BITMAP_FLEX_ARRAY()? We have just this
> one user.

I've added it this way to make it obvious that almost all of it is the
same as DECLARE_BITMAP(). But probably it's better to get rid of the
macro, add the definition directly to the struct, and maybe add a
comment that it's supposed to be similar to what DECLARE_BITMAP()
produces. What do you think?

Thanks for your comments,
   Felix

> >  };
>
> Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm
  2026-01-29 15:12   ` Sebastian Andrzej Siewior
@ 2026-01-29 16:19     ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 16:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 04:12:41PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-22 15:57:03 [+0100], Felix Maurer wrote:
> > With the new duplicate discard algorithm from the earlier patches, the
> > KUnit test for the algorithm needs to be updated. The updates are done in a
> > way to match the original intends pretty closely. Currently, there is much
> > knowledge about the actual algorithm baked into the tests (especially the
> > expectations) which may need some redesign in the future.
>
> So the below isn't new but I wasn't aware we have this kunit thing
> earlier. It triggers even before your series, just differently now. The
> problem seems to be that node member isn't properly initialized. So
> lockep complains:
>
> | 1..1
> |     KTAP version 1
> |     # Subtest: prp_duplicate_discard
> |     # module: prp_dup_discard_test
> |     1..5
> | INFO: trying to register non-static key.
> | The code is fine but needs lockdep annotation, or maybe
> | you didn't initialize this object before use?
> | turning off the locking correctness validator.
> …
> |  lock_acquire+0xc8/0x210
> |  rt_spin_lock+0x2e/0x1e0
> |  hsr_check_duplicate+0x40/0xb0
> |  prp_dup_discard_forward+0x30/0x180 [prp_dup_discard_test]
>
> It needs:
>
> diff --git a/net/hsr/prp_dup_discard_test.c b/net/hsr/prp_dup_discard_test.c
> index a7a5beb52d62f..2ab346d920bbd 100644
> --- a/net/hsr/prp_dup_discard_test.c
> +++ b/net/hsr/prp_dup_discard_test.c
> @@ -27,6 +27,7 @@ static struct prp_test_data *build_prp_test_data(struct kunit *test)
>  					     GFP_ATOMIC);
>  	KUNIT_EXPECT_NOT_ERR_OR_NULL(test, data);
>  	xa_init(&data->node.seq_blocks);
> +	spin_lock_init(&data->node.seq_out_lock);
>
>  	data->frame.node_src = &data->node;
>  	data->frame.port_rcv = &data->port_rcv;

Thanks, I'll add it!

   Felix

> Sebastian


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm
  2026-01-29 15:29     ` Sebastian Andrzej Siewior
@ 2026-01-29 16:29       ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 16:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, horms,
	jkarrenpalo, tglx, mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 04:29:22PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-26 10:28:18 [+0100], Felix Maurer wrote:
> >
> > I'll wait for feedback from Sebastian, but I'm also fine with merging
> > the fixes for the KUnit test into the other patches and reposting.
>
> Overall the new algorithm is very nice. Feel free to add
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thank you very much for taking the time to review this series! I'll
address the parts as I wrote in the other replies.

As Jakub wrote, there is the transient build failure due to the KUnit
test needing changes. I discussed that with Simon as well (off-list). In
the v3, I'll add the respective fixes to the KUnit test to the patches
where I change the algorithm so the build passes all the time. As the
patches 4 and 6 will look quite different with that, I'll only add your
Reviewed by tag to the remaining ones and leave 4 and 6 subject to a
quick re-review by you.

> I made a few comments where I think it should be possible to avoid
> tracking the sequence number for interface A and B but since you
> preserve what is done now, I don't mind to keep it as-it and try to
> remove it later on (unless I'm wrong and it is not doable).

I replied to that directly, I don't think that's doable. But we can take
a look at the HW-offload case in the future.

Thanks,
   Felix

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-29 15:21       ` Sebastian Andrzej Siewior
@ 2026-01-29 17:44         ` Felix Maurer
  2026-02-02 11:51           ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 17:44 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 04:21:49PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-29 14:31:30 [+0100], Felix Maurer wrote:
> > On Thu, Jan 29, 2026 at 12:05:00PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-01-22 15:56:56 [+0100], Felix Maurer wrote:
> > > > diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
> > > > new file mode 100755
> > > > index 000000000000..fd2ba9f05d4c
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/net/hsr/prp_ping.sh
> > > …
> > > > +	# MAC addresses will be copied from LAN A interface
> > > > +	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
> > > > +	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
> > >
> > > so I somehow started this (I think) but while browsing the spec it
> > > somehow says that the same MAC address should be used on both ports.
> > > Could it be?
> > > It says that the two frames are identical except for the LAN field and
> > > checksum. Also the duplication is defined on src-MAC + seq nr.
> > > Having this requires to merge the two MACs for a node and we do this but
> > > could this be a left over from an older version of the spec or a
> > > behaviour that was not meant happen?
> >
> > Yes, for PRP it is required that both ports, A and B, of a node send
> > with the same MAC. For us that means that the two ports need to be
> > configured with the same MAC address. This used to be a common source of
> > configuration errors. Therefore, b65999e7238e ("net: hsr: sync hw addr
> > of slave2 according to slave1 hw addr on PRP") made it so that we are
> > now copying the MAC from port A to port B.
> >
> > Therefore, I'm only setting the MAC of vethA on each node in the test.
> > Even this is not strictly necessary but it turns out that debugging is a
> > lot simpler, when it is obvious addresses belong to which node.
>
> Looking at the hsr tests, those have two different macs… It should be
> the same. It works because it merges the nodes and lookup works for
> both…

Hm, I am not sure? For PRP, it's an explicit requirement to use the same
MAC addresses for both ports. For HSR, I think the standard is less
clear about the MAC addresses. And at least our code seems to assume
that there could be different MACs on the two interfaces of a node? But
yes, the node merging addresses this.

I'll take another look at this topic tomorrow.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-29 16:17     ` Felix Maurer
@ 2026-01-29 18:01       ` Felix Maurer
  2026-01-30 10:34         ` Felix Maurer
  2026-02-02 17:11       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-29 18:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On Thu, Jan 29, 2026 at 05:17:04PM +0100, Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 03:43:48PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-22 15:57:01 [+0100], Felix Maurer wrote:
> > …
> > > As the problem (we accidentally skip over a sequence number that has not
> > > been received but will be received in the future) is similar to PRP, we can
> > > apply a similar solution. The duplicate discard algorithm based on the
> > > "sparse bitmap" works well for HSR if it is extended to track one bitmap
> > > for each port (A, B, master, interlink). To do this, change the sequence
> > > number blocks to contain a flexible array member as the last member that
> > > can keep chunks for as many bitmaps as we need. This design makes it easy
> > > to reuse the same algorithm in a potential PRP RedBox implementation.
> >
> > I know you just "copy" the logic based on what we have now but…
> > Why do we have to track the sequence number for A, B and interlink? The
> > 'master' port is what we feed into the stack so this needs to be
> > de-duplicated. I am not sure how 'interlink' works so I keep quiet here.
> > But A and B? There shouldn't be any duplicates on A and B unless the
> > destination node forwards the node. Or do I miss something?
> > I'm bringing this up because limiting to one (or two since I am unsure
> > about interlink) would save some memory and avoid needless updates. And
> > if you have HW-offloading enabled then you shouldn't see any packets
> > which are not directed to _this_ node.
>
> About the interlink: that's the interface where you attach devices that
> know nothing about HSR, i.e., when we are a RedBox. I consider it very
> similar to the master port, it's our responsibility to de-duplicate what
> we send out there.
>
> I was thinking about exactly this while working on the patch as well and
> I came to each conclusion (A,B are needed vs. are not needed) at least
> once. In the end, I think we will need it. It's right that a well
> behaving node in the ring should not forward "frames for which the node
> is the unique destination" (5.3.2.1). But there could be frames that
> have no unique destination in the ring at all: multicast frames or
> frames addressed to a non-existing MAC address. We should not forward
> such frames either to prevent them from looping forever.
>
> Now, such frames should probably only reach back to us, if we sent them
> (either from our stack or from the interlink port). We could track
> sequence numbers sent to master and interlink (for de-duplication) and
> sent by master and interlink (for loop prevention) for more clarity, but
> then we're back at four bitmaps again.

Please disregard this. If we are the source, we can easily detect it
from the frame. I'll consider this again. Like I wrote, I'm jumping
between necessary and not.

Thanks,
   Felix


> I agree that we can and should optimize the HW-offloaded case. I'd
> suggest to do that in a separate patchset, though, partly because I
> don't have access to a hardware with HSR offload at the moment.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-29 18:01       ` Felix Maurer
@ 2026-01-30 10:34         ` Felix Maurer
  2026-02-02 17:53           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-01-30 10:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On Thu, Jan 29, 2026 at 07:01:02PM +0100, Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 05:17:04PM +0100, Felix Maurer wrote:
> > On Thu, Jan 29, 2026 at 03:43:48PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-01-22 15:57:01 [+0100], Felix Maurer wrote:
> > > …
> > > > As the problem (we accidentally skip over a sequence number that has not
> > > > been received but will be received in the future) is similar to PRP, we can
> > > > apply a similar solution. The duplicate discard algorithm based on the
> > > > "sparse bitmap" works well for HSR if it is extended to track one bitmap
> > > > for each port (A, B, master, interlink). To do this, change the sequence
> > > > number blocks to contain a flexible array member as the last member that
> > > > can keep chunks for as many bitmaps as we need. This design makes it easy
> > > > to reuse the same algorithm in a potential PRP RedBox implementation.
> > >
> > > I know you just "copy" the logic based on what we have now but…
> > > Why do we have to track the sequence number for A, B and interlink? The
> > > 'master' port is what we feed into the stack so this needs to be
> > > de-duplicated. I am not sure how 'interlink' works so I keep quiet here.
> > > But A and B? There shouldn't be any duplicates on A and B unless the
> > > destination node forwards the node. Or do I miss something?
> > > I'm bringing this up because limiting to one (or two since I am unsure
> > > about interlink) would save some memory and avoid needless updates. And
> > > if you have HW-offloading enabled then you shouldn't see any packets
> > > which are not directed to _this_ node.
> >
> > About the interlink: that's the interface where you attach devices that
> > know nothing about HSR, i.e., when we are a RedBox. I consider it very
> > similar to the master port, it's our responsibility to de-duplicate what
> > we send out there.
> >
> > I was thinking about exactly this while working on the patch as well and
> > I came to each conclusion (A,B are needed vs. are not needed) at least
> > once. In the end, I think we will need it. It's right that a well
> > behaving node in the ring should not forward "frames for which the node
> > is the unique destination" (5.3.2.1). But there could be frames that
> > have no unique destination in the ring at all: multicast frames or
> > frames addressed to a non-existing MAC address. We should not forward
> > such frames either to prevent them from looping forever.
> >
> > Now, such frames should probably only reach back to us, if we sent them
> > (either from our stack or from the interlink port). We could track
> > sequence numbers sent to master and interlink (for de-duplication) and
> > sent by master and interlink (for loop prevention) for more clarity, but
> > then we're back at four bitmaps again.
>
> Please disregard this. If we are the source, we can easily detect it
> from the frame. I'll consider this again. Like I wrote, I'm jumping
> between necessary and not.

I took another look at this. I think the IEC 62439-3:2021 is not super
clear on this (it often refers to "duplicate frames" without being exact
about in which domain they are duplicates). But the section on HSR Modes
(5.3.2.1) is quite telling when all modes are read in combination.

First, there is the mode H, which is the default and mandatory to
implement. In this mode, a node should forward all frames "except for
frames sent by the node itself, duplicate frames and frames for which
the node is the unique destination". Note that "duplicate frame" is not
further specified here. As we don't implement different modes, we should
follow mode H in our implementation.

In contrast, there is also mode X (sometimes referred to as "traffic
reduction").  It is supposed to work like mode H, but without sending
"counter-duplicates", i.e., frames that are "a duplicate of a frame that
it received [...] from the opposite direction."

To me, this means two things for mode H, i.e., what we should be doing:
- For a frame with one sequence number coming from one node, we should
  be forwarding the frame once in each direction.
- There could be duplicates of these frames in either direction that we
  should not forward. This is also hinted at in other parts of the
  standard, that there could be multiple duplicates, especially when HSR
  rings are coupled.

Therefore, I think it is correct to do the duplicate tracking once for
each port, especially separately for port A and port B.

Thanks,
   Felix


> > I agree that we can and should optimize the HW-offloaded case. I'd
> > suggest to do that in a separate patchset, though, partly because I
> > don't have access to a hardware with HSR offload at the moment.
>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
  2026-01-28 16:38   ` Simon Horman
  2026-01-29 13:29   ` Sebastian Andrzej Siewior
@ 2026-02-02  8:47   ` Steffen Lindner
  2 siblings, 0 replies; 55+ messages in thread
From: Steffen Lindner @ 2026-02-02  8:47 UTC (permalink / raw)
  To: Felix Maurer, netdev@vger.kernel.org
  Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, jkarrenpalo@gmail.com,
	tglx@linutronix.de, mingo@kernel.org,
	allison.henderson@oracle.com, petrm@nvidia.com,
	antonio@openvpn.net, bigeasy@linutronix.de

From: Felix Maurer <fmaurer@redhat.com>
Sent: Thursday, January 22, 2026 3:56 PM
To: netdev@vger.kernel.org
Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; horms@kernel.org; jkarrenpalo@gmail.com; tglx@linutronix.de; mingo@kernel.org; allison.henderson@oracle.com; petrm@nvidia.com; antonio@openvpn.net; bigeasy@linutronix.de; Steffen Lindner
Subject: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP

Hi Felix,

> The idea of the new algorithm is to store the seen sequence numbers in a
> bitmap. To keep the size of the bitmap in control, we store it as a "sparse
> bitmap" where the bitmap is split into blocks and not all blocks exist at
> the same time. The sparse bitmap is implemented using an xarray that keeps
> the references to the individual blocks and a backing ring buffer that
> stores the actual blocks. New blocks are initialized in the buffer and
> added to the xarray as needed when new frames arrive. Existing blocks are
> removed in two conditions:
> 1. The block found for an arriving sequence number is old and therefore not
>    relevant to the duplicate discard algorithm anymore, i.e., it has been
>    added more than the entry forget time ago. In this case, the block is
>    removed from the xarray and marked as forgotten (by setting its
>    timestamp to 0).
> 2. Space is needed in the ring buffer for a new block. In this case, the
>    block is removed from the xarray, if it hasn't already been forgotten
>    (by 1.). Afterwards, the new block is initialized in its place.
>
> This has the nice property that we can reliably track sequence numbers on
> low traffic situations (where they expire based on their timestamp) and
> more quickly forget sequence numbers in high traffic situations before they
> potentially wrap over and repeat before they are expired.

We tested the provided patches based on our bug-report. The patches
significantly reduce the end-to-end packet loss in different network
scenarios compared to the existing PRP implementation.

Tested-by: Steffen Lindner <steffen.lindner@de.abb.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-01-29 13:32   ` Sebastian Andrzej Siewior
@ 2026-02-02 11:30     ` Felix Maurer
  2026-02-02 16:45       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-02 11:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 02:32:17PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> > Add tests where one link has different rates of packet loss or reorders
> > packets. PRP should still be able to recover from these link faults and
> > show no packet loss.  However, it is acceptable to receive some level of
> > duplicate packets. This matches the current specification (IEC
> > 62439-3:2021) of the duplicate discard algorithm that requires it to be
> > "designed such that it never rejects a legitimate frame, while occasional
> > acceptance of a duplicate can be tolerated." The rate of acceptable
> > duplicates in this test is intentionally high (10%) to make the test
> > stable, the values I observed in the worst test cases (20% loss) are around
> > 5% duplicates.
>
> Do you know why we have the duplicates? It is because of the high
> sending rate at which point the blocks are dropped and we can't
> recognise the duplicate?

I could not give a definitive answer on this last week, I had to dig a
bit and think through this. The reason for the duplicates is not a high
sending rate leading to dropped blocks. Rather, it's the low sending
rate leading to expiring blocks: in the test, the ping interval is set
to 10ms. As the blocks expire based on the timestamp of the first
received sequence number, the 400ms entry expiry gets hit every ca. 40
frames, at which we basically clear all received sequence number in the
block by creating a new block. This happens on both nodes (for ping
requests and replies) ca. 10 times during the test, leading to about 20
duplicate packets, which is the ca. 5% duplicates I observed in multiple
runs of this test.

This means the duplicate rate is highly dependent on the traffic pattern
and the test manages to hit exactly this situation. I'll add the
explanation to the message in v3. If similar traffic patterns are hit in
the real world too often, we could test different block sizes. I just
did this and the results are as expected:
- 64 bit blocks: the duplicate rate is about the same because the block
  still expires once.
- 32 bit blocks: the duplicate rate drops to around 4% because,
  including the netem delay in the test, this is roughly the edge
  between expiring a block and already hitting a new one. But if we keep
  the bitmap as it is, we waste 50% of memory.
- 16 bit blocks: duplicate rate is zero because we use new blocks way
  before the expiry time. But this wastes about 75% memory for each
  block and the overhead of the timestamp in each block gets quite
  significant.

I'd suggest to keep the 128 bit blocks to have a chance to detect
duplicates at higher data rates for some delay difference (up to about
5.5ms with 1Gbit/s). 64 bit blocks would halve the maximum delay
difference, but probably still strike a good balance between memory
overhead and duplicate rate. Do you prefer one over the other?

Thanks,
   Felix

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-01-29 17:44         ` Felix Maurer
@ 2026-02-02 11:51           ` Felix Maurer
  2026-02-02 15:55             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-02 11:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Thu, Jan 29, 2026 at 06:44:47PM +0100, Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 04:21:49PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-29 14:31:30 [+0100], Felix Maurer wrote:
> > > On Thu, Jan 29, 2026 at 12:05:00PM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2026-01-22 15:56:56 [+0100], Felix Maurer wrote:
> > > > > diff --git a/tools/testing/selftests/net/hsr/prp_ping.sh b/tools/testing/selftests/net/hsr/prp_ping.sh
> > > > > new file mode 100755
> > > > > index 000000000000..fd2ba9f05d4c
> > > > > --- /dev/null
> > > > > +++ b/tools/testing/selftests/net/hsr/prp_ping.sh
> > > > …
> > > > > +	# MAC addresses will be copied from LAN A interface
> > > > > +	ip -net "$node1" link set address 00:11:22:00:00:01 dev vethA
> > > > > +	ip -net "$node2" link set address 00:11:22:00:00:02 dev vethA
> > > >
> > > > so I somehow started this (I think) but while browsing the spec it
> > > > somehow says that the same MAC address should be used on both ports.
> > > > Could it be?
> > > > It says that the two frames are identical except for the LAN field and
> > > > checksum. Also the duplication is defined on src-MAC + seq nr.
> > > > Having this requires to merge the two MACs for a node and we do this but
> > > > could this be a left over from an older version of the spec or a
> > > > behaviour that was not meant happen?
> > >
> > > Yes, for PRP it is required that both ports, A and B, of a node send
> > > with the same MAC. For us that means that the two ports need to be
> > > configured with the same MAC address. This used to be a common source of
> > > configuration errors. Therefore, b65999e7238e ("net: hsr: sync hw addr
> > > of slave2 according to slave1 hw addr on PRP") made it so that we are
> > > now copying the MAC from port A to port B.
> > >
> > > Therefore, I'm only setting the MAC of vethA on each node in the test.
> > > Even this is not strictly necessary but it turns out that debugging is a
> > > lot simpler, when it is obvious addresses belong to which node.
> >
> > Looking at the hsr tests, those have two different macs… It should be
> > the same. It works because it merges the nodes and lookup works for
> > both…
>
> Hm, I am not sure? For PRP, it's an explicit requirement to use the same
> MAC addresses for both ports. For HSR, I think the standard is less
> clear about the MAC addresses. And at least our code seems to assume
> that there could be different MACs on the two interfaces of a node? But
> yes, the node merging addresses this.

I'm still not 100% certain, but I agree that the standard reads more
like the MAC addresses should be the same for the two HSR ports. At the
moment, the kernel and the test assumes that they can/should be
different. Therefore, I think we should fix this across the board in
another patchset if we agree that's the right thing to do.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-02 11:51           ` Felix Maurer
@ 2026-02-02 15:55             ` Sebastian Andrzej Siewior
  2026-02-03 10:12               ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-02 15:55 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-02-02 12:51:12 [+0100], Felix Maurer wrote:
> > Hm, I am not sure? For PRP, it's an explicit requirement to use the same
> > MAC addresses for both ports. For HSR, I think the standard is less
> > clear about the MAC addresses. And at least our code seems to assume
> > that there could be different MACs on the two interfaces of a node? But
> > yes, the node merging addresses this.
> 
> I'm still not 100% certain, but I agree that the standard reads more
> like the MAC addresses should be the same for the two HSR ports. At the
> moment, the kernel and the test assumes that they can/should be
> different. Therefore, I think we should fix this across the board in
> another patchset if we agree that's the right thing to do.

I would suggest to do so. This could serve as bad example and my PTP
userland patches expect the same MAC on both ports. So ;)

> Thanks,
>    Felix
> 

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-02-02 11:30     ` Felix Maurer
@ 2026-02-02 16:45       ` Sebastian Andrzej Siewior
  2026-02-03 12:09         ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-02 16:45 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-02-02 12:30:44 [+0100], Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 02:32:17PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> > > Add tests where one link has different rates of packet loss or reorders
> > > packets. PRP should still be able to recover from these link faults and
> > > show no packet loss.  However, it is acceptable to receive some level of
> > > duplicate packets. This matches the current specification (IEC
> > > 62439-3:2021) of the duplicate discard algorithm that requires it to be
> > > "designed such that it never rejects a legitimate frame, while occasional
> > > acceptance of a duplicate can be tolerated." The rate of acceptable
> > > duplicates in this test is intentionally high (10%) to make the test
> > > stable, the values I observed in the worst test cases (20% loss) are around
> > > 5% duplicates.
> >
> > Do you know why we have the duplicates? It is because of the high
> > sending rate at which point the blocks are dropped and we can't
> > recognise the duplicate?
> 
> I could not give a definitive answer on this last week, I had to dig a
> bit and think through this. The reason for the duplicates is not a high
> sending rate leading to dropped blocks. Rather, it's the low sending
> rate leading to expiring blocks: in the test, the ping interval is set
> to 10ms. As the blocks expire based on the timestamp of the first
> received sequence number, the 400ms entry expiry gets hit every ca. 40
> frames, at which we basically clear all received sequence number in the
> block by creating a new block. This happens on both nodes (for ping
> requests and replies) ca. 10 times during the test, leading to about 20
> duplicate packets, which is the ca. 5% duplicates I observed in multiple
> runs of this test.
> 
> This means the duplicate rate is highly dependent on the traffic pattern
> and the test manages to hit exactly this situation. I'll add the
> explanation to the message in v3. If similar traffic patterns are hit in
> the real world too often, we could test different block sizes. I just
> did this and the results are as expected:
> - 64 bit blocks: the duplicate rate is about the same because the block
>   still expires once.
> - 32 bit blocks: the duplicate rate drops to around 4% because,
>   including the netem delay in the test, this is roughly the edge
>   between expiring a block and already hitting a new one. But if we keep
>   the bitmap as it is, we waste 50% of memory.
> - 16 bit blocks: duplicate rate is zero because we use new blocks way
>   before the expiry time. But this wastes about 75% memory for each
>   block and the overhead of the timestamp in each block gets quite
>   significant.
> 
> I'd suggest to keep the 128 bit blocks to have a chance to detect
> duplicates at higher data rates for some delay difference (up to about
> 5.5ms with 1Gbit/s). 64 bit blocks would halve the maximum delay
> difference, but probably still strike a good balance between memory
> overhead and duplicate rate. Do you prefer one over the other?

We had false positives with the old algorithm so having now some is not
the end of the world. 128bit blocks are fine I guess. The question is
how far do we want to tune it and everything comes with a price.

If I update bitmap every time there is a valid packet:

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 08f91f10a9347..bb9b13940c35b 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -572,6 +572,7 @@ static int hsr_check_duplicate(struct hsr_frame_info *frame,
 	if (test_and_set_bit(seq_bit, block->seq_nrs[port_type]))
 		goto out_seen;
 
+	block->time = jiffies;
 out_new:
 	spin_unlock_bh(&node->seq_out_lock);
 	return 0;

then I the test
| TEST: test_cut_link - HSRv1                                         [ OK ]
| INFO: Wait for node table entries to be merged.
| INFO: Running ping node1-NUb1aJ -> 100.64.0.2
| INFO: 400 packets transmitted, 400 received, +22 duplicates, 0% packet loss, time 6113ms

turns into

| TEST: test_cut_link - HSRv1                                         [ OK ]
| INFO: Wait for node table entries to be merged.
| INFO: Running ping node1-6i4xNe -> 100.64.0.2
| INFO: 400 packets transmitted, 400 received, 0% packet loss, time 6067ms

which could be considered as an improvement. On the other hand the live
time of the block gets pushed past the HSR_ENTRY_FORGET_TIME so
the first entry survives longer theoretically 127 * (400-1) = ~50 seconds.
In the reboot case the box should be quiet for 500ms at which point the
entry gets removed anyway so the longer "hold time" shouldn't matter.

It sounds like a good idea but I don't want to rush anything ;)

> Thanks,
>    Felix

Sebastian

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-01-28 18:37     ` Felix Maurer
@ 2026-02-02 16:57       ` Sebastian Andrzej Siewior
  2026-02-03 10:23         ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-02 16:57 UTC (permalink / raw)
  To: Felix Maurer
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On 2026-01-28 19:37:23 [+0100], Felix Maurer wrote:
> > > +	if (!block) {
> > > +		block = &node->block_buf[node->next_block];
> > > +		hsr_forget_seq_block(node, block);
> > > +
> > > +		memset(block, 0, sizeof(*block));
> > > +		block->time = jiffies;
> > > +		block->block_idx = block_idx;
> > > +
> > > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > > +		if (xa_is_err(res))
> >
> > Hi Felix,
> >
> > I ran Claude Code over this with review-prompts [1] and it flags
> > that in the error path above, the following is needed so that the
> > block can be re-used.
> >
> > 			block->time = 0;
> 
> I agree. It would nonetheless be reused at some point, but not setting
> time = 0 may lead to an unexpected block getting removed when it is
> reused (or at least an attempt to do so). I'll fix this in v3.

Not sure I follow. If xa_store() fails then the block is not added to
the "sequence-blocks" and it will be attempted once the next packet is
received, right?. Otherwise node->next_block is not updated at which
point this block will be added twice which sounds worse.

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-29 16:17     ` Felix Maurer
  2026-01-29 18:01       ` Felix Maurer
@ 2026-02-02 17:11       ` Sebastian Andrzej Siewior
  2026-02-03 11:08         ` Felix Maurer
  1 sibling, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-02 17:11 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On 2026-01-29 17:17:04 [+0100], Felix Maurer wrote:
> About the interlink: that's the interface where you attach devices that
> know nothing about HSR, i.e., when we are a RedBox. I consider it very
> similar to the master port, it's our responsibility to de-duplicate what
> we send out there.

Right. It should contain the same "seen" values as the master port.

> I was thinking about exactly this while working on the patch as well and
> I came to each conclusion (A,B are needed vs. are not needed) at least
> once. In the end, I think we will need it. It's right that a well
> behaving node in the ring should not forward "frames for which the node
> is the unique destination" (5.3.2.1). But there could be frames that
> have no unique destination in the ring at all: multicast frames or
> frames addressed to a non-existing MAC address. We should not forward
> such frames either to prevent them from looping forever.

But multicast and non-existing destination frames do one round in the
circle, reach the sender which will remove them from the ring and not
forward.

> Now, such frames should probably only reach back to us, if we sent them
> (either from our stack or from the interlink port). We could track
> sequence numbers sent to master and interlink (for de-duplication) and
> sent by master and interlink (for loop prevention) for more clarity, but
> then we're back at four bitmaps again.

I think we could avoid tracking A/ B. The packets that we receive from
A/ B and forward to the master after de-duplication should be the same
as those that are forwarded to the interlink port.

> I agree that we can and should optimize the HW-offloaded case. I'd
> suggest to do that in a separate patchset, though, partly because I
> don't have access to a hardware with HSR offload at the moment.

Sure.

> > I know you preserve what is already here but what is this even used for?
> > | ip -d link show dev hsr0
> >
> > does not show these numbers. It shows the sequence number of the hsr0
> > interface which I understand.
> > But then it is also possible to show the last received sequence number
> > of any node on either of the two interfaces?
> 
> This is only exposed through the HSR_C_GET_NODE_STATUS netlink command
> for the "HSR" (sic!) netlink family. I'm not aware of a userspace tool
> that actually shows this data. It's unclear if it is of any real use.

hehe. Great.

> I assume that the two sequence numbers are reported because they would
> in theory allow to detect a broken ring: if one of the numbers stops
> increasing or the numbers generally diverge, some link in the ring is
> broken. If we have traffic from multiple nodes, we could even detect
> where the ring is broken. That's likely also the reason for the weird
> cross-assinment of B->1, A->2. What we send out on B has to have arrived
> on port A resp. 1 before.
> 
> We can discuss removing this stuff in the future, but I'd prefer not to
> touch userspace at this time.

Sure. I was just looking who is using this and this netlink interface
was the only user. It looks like debug code.

> > > +}
> > > +
> > >  int hsr_get_node_data(struct hsr_priv *hsr,
> > >  		      const unsigned char *addr,
> > >  		      unsigned char addr_b[ETH_ALEN],
> > …
> > > --- a/net/hsr/hsr_framereg.h
> > > +++ b/net/hsr/hsr_framereg.h
> > > @@ -82,15 +82,18 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
> > >  #define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
> > >  #define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
> > >
> > > +#define DECLARE_BITMAP_FLEX_ARRAY(name, bits) \
> > > +	unsigned long name[][BITS_TO_LONGS(bits)]
> > > +
> > >  struct hsr_seq_block {
> > >  	unsigned long		time;
> > >  	u16			block_idx;
> > > -	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> > > +	DECLARE_BITMAP_FLEX_ARRAY(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> >
> > is there a story behind DECLARE_BITMAP_FLEX_ARRAY()? We have just this
> > one user.
> 
> I've added it this way to make it obvious that almost all of it is the
> same as DECLARE_BITMAP(). But probably it's better to get rid of the
> macro, add the definition directly to the struct, and maybe add a
> comment that it's supposed to be similar to what DECLARE_BITMAP()
> produces. What do you think?

I would get rid of it. There the "official" DECLARE_BITMAP(). If this
doesn't work just open code it avoiding the macro.

> Thanks for your comments,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-01-30 10:34         ` Felix Maurer
@ 2026-02-02 17:53           ` Sebastian Andrzej Siewior
  2026-02-03 11:49             ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-02 17:53 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On 2026-01-30 11:34:55 [+0100], Felix Maurer wrote:
> I took another look at this. I think the IEC 62439-3:2021 is not super
> clear on this (it often refers to "duplicate frames" without being exact
> about in which domain they are duplicates). But the section on HSR Modes
> (5.3.2.1) is quite telling when all modes are read in combination.
> 
> First, there is the mode H, which is the default and mandatory to
> implement. In this mode, a node should forward all frames "except for
> frames sent by the node itself, duplicate frames and frames for which
> the node is the unique destination". Note that "duplicate frame" is not
> further specified here. As we don't implement different modes, we should
> follow mode H in our implementation.
> 
> In contrast, there is also mode X (sometimes referred to as "traffic
> reduction").  It is supposed to work like mode H, but without sending
> "counter-duplicates", i.e., frames that are "a duplicate of a frame that
> it received [...] from the opposite direction."
> 
> To me, this means two things for mode H, i.e., what we should be doing:
> - For a frame with one sequence number coming from one node, we should
>   be forwarding the frame once in each direction.
> - There could be duplicates of these frames in either direction that we
>   should not forward. This is also hinted at in other parts of the
>   standard, that there could be multiple duplicates, especially when HSR
>   rings are coupled.
> 
> Therefore, I think it is correct to do the duplicate tracking once for
> each port, especially separately for port A and port B.

I will not argue with you here.
But. :)
If you track duplicates for A and B and see a duplicate on port A then
this indicates that the sender of this packet did not remove it from the
ring once it received it back. This looks like a failure.
If you track duplicates for A and B in a single "bitmap" then this would
mode X.

I nag here a bit because you allocate 16 + (128 / 8) * 4 * 64 = 4112
bytes for this bitmap per node. That is a bit over 4kib. Then adding and
removing sequences got a bit more expensive. Anyway. There is table
F.19+ specifying HSR tests and don't find "forwarding duplicate over
port A". So lets keep it.

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-02 15:55             ` Sebastian Andrzej Siewior
@ 2026-02-03 10:12               ` Felix Maurer
  2026-02-03 11:55                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 10:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Mon, Feb 02, 2026 at 04:55:10PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-02 12:51:12 [+0100], Felix Maurer wrote:
> > > Hm, I am not sure? For PRP, it's an explicit requirement to use the same
> > > MAC addresses for both ports. For HSR, I think the standard is less
> > > clear about the MAC addresses. And at least our code seems to assume
> > > that there could be different MACs on the two interfaces of a node? But
> > > yes, the node merging addresses this.
> >
> > I'm still not 100% certain, but I agree that the standard reads more
> > like the MAC addresses should be the same for the two HSR ports. At the
> > moment, the kernel and the test assumes that they can/should be
> > different. Therefore, I think we should fix this across the board in
> > another patchset if we agree that's the right thing to do.
>
> I would suggest to do so. This could serve as bad example and my PTP
> userland patches expect the same MAC on both ports. So ;)

I'll put it on my list for another patchset :)
But this of course brings up how the MAC addresses should be handled on
the HSR interfaces in general. At the moment, for HSR we copy the MAC of
portA to master. For PRP, the MAC of portA is copied to master and
portB.

If the MACs should be the same for both ports of an HSR interface,
should we set them when creating the HSR interface, similarly to PRP?
And does it make sense to copy from portA (feels somewhat arbitrary to
me) or would it be cleaner to copy from master to portA and portB?

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-02-02 16:57       ` Sebastian Andrzej Siewior
@ 2026-02-03 10:23         ` Felix Maurer
  2026-02-03 11:57           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 10:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On Mon, Feb 02, 2026 at 05:57:02PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-28 19:37:23 [+0100], Felix Maurer wrote:
> > > > +	if (!block) {
> > > > +		block = &node->block_buf[node->next_block];
> > > > +		hsr_forget_seq_block(node, block);
> > > > +
> > > > +		memset(block, 0, sizeof(*block));
> > > > +		block->time = jiffies;
> > > > +		block->block_idx = block_idx;
> > > > +
> > > > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > > > +		if (xa_is_err(res))
> > >
> > > Hi Felix,
> > >
> > > I ran Claude Code over this with review-prompts [1] and it flags
> > > that in the error path above, the following is needed so that the
> > > block can be re-used.
> > >
> > > 			block->time = 0;
> >
> > I agree. It would nonetheless be reused at some point, but not setting
> > time = 0 may lead to an unexpected block getting removed when it is
> > reused (or at least an attempt to do so). I'll fix this in v3.
>
> Not sure I follow. If xa_store() fails then the block is not added to
> the "sequence-blocks" and it will be attempted once the next packet is
> received, right?. Otherwise node->next_block is not updated at which
> point this block will be added twice which sounds worse.

Yes, it will be attempted again on the next frame that needs a new
block. But every attempt to recycle the next_block starts with calling
hsr_forget_seq_block(), which tries to xa_erase() the block if time!=0.
Generally, time==0 is the marker that a block is not stored in the
xarray. So for consistency it's correct to set time=0 if the block can't
be added to the xarray, even though I don't think it would cause any
trouble at the moment if the time is not reset.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-02-02 17:11       ` Sebastian Andrzej Siewior
@ 2026-02-03 11:08         ` Felix Maurer
  2026-02-03 12:09           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 11:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On Mon, Feb 02, 2026 at 06:11:23PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-29 17:17:04 [+0100], Felix Maurer wrote:
> > About the interlink: that's the interface where you attach devices that
> > know nothing about HSR, i.e., when we are a RedBox. I consider it very
> > similar to the master port, it's our responsibility to de-duplicate what
> > we send out there.
>
> Right. It should contain the same "seen" values as the master port.
>
> > I was thinking about exactly this while working on the patch as well and
> > I came to each conclusion (A,B are needed vs. are not needed) at least
> > once. In the end, I think we will need it. It's right that a well
> > behaving node in the ring should not forward "frames for which the node
> > is the unique destination" (5.3.2.1). But there could be frames that
> > have no unique destination in the ring at all: multicast frames or
> > frames addressed to a non-existing MAC address. We should not forward
> > such frames either to prevent them from looping forever.
>
> But multicast and non-existing destination frames do one round in the
> circle, reach the sender which will remove them from the ring and not
> forward.
>
> > Now, such frames should probably only reach back to us, if we sent them
> > (either from our stack or from the interlink port). We could track
> > sequence numbers sent to master and interlink (for de-duplication) and
> > sent by master and interlink (for loop prevention) for more clarity, but
> > then we're back at four bitmaps again.
>
> I think we could avoid tracking A/ B. The packets that we receive from
> A/ B and forward to the master after de-duplication should be the same
> as those that are forwarded to the interlink port.

I consider this an argument for removing the separate tracking of master
and interlink (and thereby saving another 1k), by changing the
forwarding code to not iterate over all ports independently.

I'll reply to the A/B tracking topic in the other mail :)


> > > I know you preserve what is already here but what is this even used for?
> > > | ip -d link show dev hsr0
> > >
> > > does not show these numbers. It shows the sequence number of the hsr0
> > > interface which I understand.
> > > But then it is also possible to show the last received sequence number
> > > of any node on either of the two interfaces?
> >
> > This is only exposed through the HSR_C_GET_NODE_STATUS netlink command
> > for the "HSR" (sic!) netlink family. I'm not aware of a userspace tool
> > that actually shows this data. It's unclear if it is of any real use.
>
> hehe. Great.
>
> > I assume that the two sequence numbers are reported because they would
> > in theory allow to detect a broken ring: if one of the numbers stops
> > increasing or the numbers generally diverge, some link in the ring is
> > broken. If we have traffic from multiple nodes, we could even detect
> > where the ring is broken. That's likely also the reason for the weird
> > cross-assinment of B->1, A->2. What we send out on B has to have arrived
> > on port A resp. 1 before.
> >
> > We can discuss removing this stuff in the future, but I'd prefer not to
> > touch userspace at this time.
>
> Sure. I was just looking who is using this and this netlink interface
> was the only user. It looks like debug code.

For which we would have the debugfs... The netlink interface is already
on my list to take a look at. I'll take that debug aspect into account.
I might start a different thread with the discussion topics that came up
with this patchset.

> > > > +}
> > > > +
> > > >  int hsr_get_node_data(struct hsr_priv *hsr,
> > > >  		      const unsigned char *addr,
> > > >  		      unsigned char addr_b[ETH_ALEN],
> > > …
> > > > --- a/net/hsr/hsr_framereg.h
> > > > +++ b/net/hsr/hsr_framereg.h
> > > > @@ -82,15 +82,18 @@ int prp_register_frame_out(struct hsr_port *port, struct hsr_frame_info *frame);
> > > >  #define hsr_seq_block_index(sequence_nr) ((sequence_nr) >> HSR_SEQ_BLOCK_SHIFT)
> > > >  #define hsr_seq_block_bit(sequence_nr) ((sequence_nr) & HSR_SEQ_BLOCK_MASK)
> > > >
> > > > +#define DECLARE_BITMAP_FLEX_ARRAY(name, bits) \
> > > > +	unsigned long name[][BITS_TO_LONGS(bits)]
> > > > +
> > > >  struct hsr_seq_block {
> > > >  	unsigned long		time;
> > > >  	u16			block_idx;
> > > > -	DECLARE_BITMAP(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> > > > +	DECLARE_BITMAP_FLEX_ARRAY(seq_nrs, HSR_SEQ_BLOCK_SIZE);
> > >
> > > is there a story behind DECLARE_BITMAP_FLEX_ARRAY()? We have just this
> > > one user.
> >
> > I've added it this way to make it obvious that almost all of it is the
> > same as DECLARE_BITMAP(). But probably it's better to get rid of the
> > macro, add the definition directly to the struct, and maybe add a
> > comment that it's supposed to be similar to what DECLARE_BITMAP()
> > produces. What do you think?
>
> I would get rid of it. There the "official" DECLARE_BITMAP(). If this
> doesn't work just open code it avoiding the macro.

Alright, I did that in the v3 [1] I sent yesterday. Sorry for not
waiting with that until the end of every discussion. I understood your
reply to 0/9 as a "go" for sending a new version addressing the small
issues (without touching the A/B tracking).

Thanks,
   Felix


[1]: https://lore.kernel.org/netdev/cover.1770041682.git.fmaurer@redhat.com/


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-02-02 17:53           ` Sebastian Andrzej Siewior
@ 2026-02-03 11:49             ` Felix Maurer
  2026-02-03 12:08               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 11:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On Mon, Feb 02, 2026 at 06:53:11PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-01-30 11:34:55 [+0100], Felix Maurer wrote:
> > I took another look at this. I think the IEC 62439-3:2021 is not super
> > clear on this (it often refers to "duplicate frames" without being exact
> > about in which domain they are duplicates). But the section on HSR Modes
> > (5.3.2.1) is quite telling when all modes are read in combination.
> >
> > First, there is the mode H, which is the default and mandatory to
> > implement. In this mode, a node should forward all frames "except for
> > frames sent by the node itself, duplicate frames and frames for which
> > the node is the unique destination". Note that "duplicate frame" is not
> > further specified here. As we don't implement different modes, we should
> > follow mode H in our implementation.
> >
> > In contrast, there is also mode X (sometimes referred to as "traffic
> > reduction").  It is supposed to work like mode H, but without sending
> > "counter-duplicates", i.e., frames that are "a duplicate of a frame that
> > it received [...] from the opposite direction."
> >
> > To me, this means two things for mode H, i.e., what we should be doing:
> > - For a frame with one sequence number coming from one node, we should
> >   be forwarding the frame once in each direction.
> > - There could be duplicates of these frames in either direction that we
> >   should not forward. This is also hinted at in other parts of the
> >   standard, that there could be multiple duplicates, especially when HSR
> >   rings are coupled.
> >
> > Therefore, I think it is correct to do the duplicate tracking once for
> > each port, especially separately for port A and port B.
>
> I will not argue with you here.
> But. :)
> If you track duplicates for A and B and see a duplicate on port A then
> this indicates that the sender of this packet did not remove it from the
> ring once it received it back. This looks like a failure.

I agree, it looks like a failure. I tried to find the reason why the
standard is so explicitly asking for discarding duplicates within the
ring, as that implies that sometimes such duplicates happen in normal
operation. In most cases, IMHO, there should not be any duplicates in
the same direction in the ring, if the nodes behave correctly.

I think the duplicates can occur on RedBoxes if they are used to couple
the HSR ring and another redundant network (either with a PRP network
through HSR-PRP mode or with an HSR network through HSR-HSR mode, which
we both don't implement at the moment). In both cases, two RedBoxes are
used to maintain redundancy and both independently inject a frame they
receive into the HSR ring (see Figure 12 and 16 in the IEC
62439-3:2021). But it's also the RedBoxes that would remove the
duplicates, so we would be aware that we should discard duplicates on
port A and B.

However, I'm not sure if these are the only cases where duplicates are
to be expected in the ring and I don't want to prematurely make the
decision to just forward everything within the ring.

> If you track duplicates for A and B in a single "bitmap" then this would
> mode X.

I think I didn't understand this clearly and I mixed this up in some of
my replies, so just to make sure; there are two ideas:
1) Track port A and B together, i.e., don't forward the
   counter-duplicate, i.e., Mode X (which is optional to implement).
2) Don't track duplicates for port A or B at all. We consider duplicates
   in the ring as a failure of another node.

We don't consider 1), i.e., Mode X, at the moment. We are only
discussing idea 2), right?

> I nag here a bit because you allocate 16 + (128 / 8) * 4 * 64 = 4112
> bytes for this bitmap per node. That is a bit over 4kib. Then adding and
> removing sequences got a bit more expensive. Anyway. There is table
> F.19+ specifying HSR tests and don't find "forwarding duplicate over
> port A". So lets keep it.

I totally understand that. It's quite some memory for each node and I'm
very much in for reducing that. But I'd suggest that, for now, we follow
what we had before and what the standard explicitly asks, which is to
track (and discard) duplicates on port A and B as well. I'm not sure if
we should diverge from the standard in one of the few points where it is
explicit. If I understand your last sentence correctly, you're okay with
that as well?

Thanks,
   Felix

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-03 10:12               ` Felix Maurer
@ 2026-02-03 11:55                 ` Sebastian Andrzej Siewior
  2026-02-03 12:23                   ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 11:55 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-02-03 11:12:36 [+0100], Felix Maurer wrote:
> On Mon, Feb 02, 2026 at 04:55:10PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-02-02 12:51:12 [+0100], Felix Maurer wrote:
> > > > Hm, I am not sure? For PRP, it's an explicit requirement to use the same
> > > > MAC addresses for both ports. For HSR, I think the standard is less
> > > > clear about the MAC addresses. And at least our code seems to assume
> > > > that there could be different MACs on the two interfaces of a node? But
> > > > yes, the node merging addresses this.
> > >
> > > I'm still not 100% certain, but I agree that the standard reads more
> > > like the MAC addresses should be the same for the two HSR ports. At the
> > > moment, the kernel and the test assumes that they can/should be
> > > different. Therefore, I think we should fix this across the board in
> > > another patchset if we agree that's the right thing to do.
> >
> > I would suggest to do so. This could serve as bad example and my PTP
> > userland patches expect the same MAC on both ports. So ;)
> 
> I'll put it on my list for another patchset :)
> But this of course brings up how the MAC addresses should be handled on
> the HSR interfaces in general. At the moment, for HSR we copy the MAC of
> portA to master. For PRP, the MAC of portA is copied to master and
> portB.
> 
> If the MACs should be the same for both ports of an HSR interface,
> should we set them when creating the HSR interface, similarly to PRP?
We do this?

> And does it make sense to copy from portA (feels somewhat arbitrary to
> me) or would it be cleaner to copy from master to portA and portB?

I do
| ip link set $portA down
| ip link set $portB down
| ip link set dev $portA address $mac
| ip link set dev $portB address $mac
| ip link add name $if type hsr slave1 $portA slave2 $portB supervision 45 version 1

and then up, up, up to get up the three interfaces functional. $mac is
the MAC $portA.

On the other hand it shouldn't matter because while adding the HSR tag,
the MAC of the hsr interface should be taken but somehow I observed that
packets, which originate on the $portB interface have the MAC address of
this interface. So I did this and forgot about it later…
But then hardware offloading might behave different if it considers the
address of the interface so it might required to set the MAC address.

I am uncertain what the best practice would be here.
Forcing the MAC of portA to portB could be done at setup phase by the
stack but would require that the interface is down…

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-02-03 10:23         ` Felix Maurer
@ 2026-02-03 11:57           ` Sebastian Andrzej Siewior
  2026-02-03 12:42             ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 11:57 UTC (permalink / raw)
  To: Felix Maurer
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On 2026-02-03 11:23:57 [+0100], Felix Maurer wrote:
> On Mon, Feb 02, 2026 at 05:57:02PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-28 19:37:23 [+0100], Felix Maurer wrote:
> > > > > +	if (!block) {
> > > > > +		block = &node->block_buf[node->next_block];
> > > > > +		hsr_forget_seq_block(node, block);
> > > > > +
> > > > > +		memset(block, 0, sizeof(*block));
> > > > > +		block->time = jiffies;
> > > > > +		block->block_idx = block_idx;
> > > > > +
> > > > > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > > > > +		if (xa_is_err(res))
> > > >
> > > > Hi Felix,
> > > >
> > > > I ran Claude Code over this with review-prompts [1] and it flags
> > > > that in the error path above, the following is needed so that the
> > > > block can be re-used.
> > > >
> > > > 			block->time = 0;
> > >
> > > I agree. It would nonetheless be reused at some point, but not setting
> > > time = 0 may lead to an unexpected block getting removed when it is
> > > reused (or at least an attempt to do so). I'll fix this in v3.
> >
> > Not sure I follow. If xa_store() fails then the block is not added to
> > the "sequence-blocks" and it will be attempted once the next packet is
> > received, right?. Otherwise node->next_block is not updated at which
> > point this block will be added twice which sounds worse.
> 
> Yes, it will be attempted again on the next frame that needs a new
> block. But every attempt to recycle the next_block starts with calling
> hsr_forget_seq_block(), which tries to xa_erase() the block if time!=0.
> Generally, time==0 is the marker that a block is not stored in the
> xarray. So for consistency it's correct to set time=0 if the block can't
> be added to the xarray, even though I don't think it would cause any
> trouble at the moment if the time is not reset.

My point is that the block will not be recycled if it failed to be added
here. It remains the "next" block to be used and added to the list if it
fails that xa_store().

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-02-03 11:49             ` Felix Maurer
@ 2026-02-03 12:08               ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 12:08 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On 2026-02-03 12:49:39 [+0100], Felix Maurer wrote:
> I think the duplicates can occur on RedBoxes if they are used to couple
> the HSR ring and another redundant network (either with a PRP network
> through HSR-PRP mode or with an HSR network through HSR-HSR mode, which
> we both don't implement at the moment). In both cases, two RedBoxes are
> used to maintain redundancy and both independently inject a frame they
> receive into the HSR ring (see Figure 12 and 16 in the IEC
> 62439-3:2021). But it's also the RedBoxes that would remove the
> duplicates, so we would be aware that we should discard duplicates on
> port A and B.
> 
> However, I'm not sure if these are the only cases where duplicates are
> to be expected in the ring and I don't want to prematurely make the
> decision to just forward everything within the ring.

I would have to study the spec first to understand how PRP, HSR mix
works with the redbox before I can give a statement. But yeah…

> > If you track duplicates for A and B in a single "bitmap" then this would
> > mode X.
> 
> I think I didn't understand this clearly and I mixed this up in some of
> my replies, so just to make sure; there are two ideas:
> 1) Track port A and B together, i.e., don't forward the
>    counter-duplicate, i.e., Mode X (which is optional to implement).
> 2) Don't track duplicates for port A or B at all. We consider duplicates
>    in the ring as a failure of another node.
> 
> We don't consider 1), i.e., Mode X, at the moment. We are only
> discussing idea 2), right?

correct.

> > I nag here a bit because you allocate 16 + (128 / 8) * 4 * 64 = 4112
> > bytes for this bitmap per node. That is a bit over 4kib. Then adding and
> > removing sequences got a bit more expensive. Anyway. There is table
> > F.19+ specifying HSR tests and don't find "forwarding duplicate over
> > port A". So lets keep it.
> 
> I totally understand that. It's quite some memory for each node and I'm
> very much in for reducing that. But I'd suggest that, for now, we follow
> what we had before and what the standard explicitly asks, which is to
> track (and discard) duplicates on port A and B as well. I'm not sure if
> we should diverge from the standard in one of the few points where it is
> explicit. If I understand your last sentence correctly, you're okay with
> that as well?

Yes. Because you reworked the duplicate-detect algorithm and I would
like to keep the "unrelated" details (tracking A, B, …) as it was so if
it serious regressions get reported we can easily pin point it. So we
replace just one thing and both multiples. At the expense that more
memory is now consumed but this is fine I hope.
So this might be for the future.

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR
  2026-02-03 11:08         ` Felix Maurer
@ 2026-02-03 12:09           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 12:09 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio, Yoann Congal

On 2026-02-03 12:08:39 [+0100], Felix Maurer wrote:
> > I would get rid of it. There the "official" DECLARE_BITMAP(). If this
> > doesn't work just open code it avoiding the macro.
> 
> Alright, I did that in the v3 [1] I sent yesterday. Sorry for not
> waiting with that until the end of every discussion. I understood your
> reply to 0/9 as a "go" for sending a new version addressing the small
> issues (without touching the A/B tracking).

Don't worry, I planned to reply earlier but didn't manage. Hope to
continue today with your v3.

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-02-02 16:45       ` Sebastian Andrzej Siewior
@ 2026-02-03 12:09         ` Felix Maurer
  2026-02-03 14:49           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 12:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Mon, Feb 02, 2026 at 05:45:50PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-02 12:30:44 [+0100], Felix Maurer wrote:
> > On Thu, Jan 29, 2026 at 02:32:17PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> > > > Add tests where one link has different rates of packet loss or reorders
> > > > packets. PRP should still be able to recover from these link faults and
> > > > show no packet loss.  However, it is acceptable to receive some level of
> > > > duplicate packets. This matches the current specification (IEC
> > > > 62439-3:2021) of the duplicate discard algorithm that requires it to be
> > > > "designed such that it never rejects a legitimate frame, while occasional
> > > > acceptance of a duplicate can be tolerated." The rate of acceptable
> > > > duplicates in this test is intentionally high (10%) to make the test
> > > > stable, the values I observed in the worst test cases (20% loss) are around
> > > > 5% duplicates.
> > >
> > > Do you know why we have the duplicates? It is because of the high
> > > sending rate at which point the blocks are dropped and we can't
> > > recognise the duplicate?
> >
> > I could not give a definitive answer on this last week, I had to dig a
> > bit and think through this. The reason for the duplicates is not a high
> > sending rate leading to dropped blocks. Rather, it's the low sending
> > rate leading to expiring blocks: in the test, the ping interval is set
> > to 10ms. As the blocks expire based on the timestamp of the first
> > received sequence number, the 400ms entry expiry gets hit every ca. 40
> > frames, at which we basically clear all received sequence number in the
> > block by creating a new block. This happens on both nodes (for ping
> > requests and replies) ca. 10 times during the test, leading to about 20
> > duplicate packets, which is the ca. 5% duplicates I observed in multiple
> > runs of this test.
> >
> > This means the duplicate rate is highly dependent on the traffic pattern
> > and the test manages to hit exactly this situation. I'll add the
> > explanation to the message in v3. If similar traffic patterns are hit in
> > the real world too often, we could test different block sizes. I just
> > did this and the results are as expected:
> > - 64 bit blocks: the duplicate rate is about the same because the block
> >   still expires once.
> > - 32 bit blocks: the duplicate rate drops to around 4% because,
> >   including the netem delay in the test, this is roughly the edge
> >   between expiring a block and already hitting a new one. But if we keep
> >   the bitmap as it is, we waste 50% of memory.
> > - 16 bit blocks: duplicate rate is zero because we use new blocks way
> >   before the expiry time. But this wastes about 75% memory for each
> >   block and the overhead of the timestamp in each block gets quite
> >   significant.
> >
> > I'd suggest to keep the 128 bit blocks to have a chance to detect
> > duplicates at higher data rates for some delay difference (up to about
> > 5.5ms with 1Gbit/s). 64 bit blocks would halve the maximum delay
> > difference, but probably still strike a good balance between memory
> > overhead and duplicate rate. Do you prefer one over the other?
>
> We had false positives with the old algorithm so having now some is not
> the end of the world. 128bit blocks are fine I guess. The question is
> how far do we want to tune it and everything comes with a price.
>
> If I update bitmap every time there is a valid packet:
>
> diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> index 08f91f10a9347..bb9b13940c35b 100644
> --- a/net/hsr/hsr_framereg.c
> +++ b/net/hsr/hsr_framereg.c
> @@ -572,6 +572,7 @@ static int hsr_check_duplicate(struct hsr_frame_info *frame,
>  	if (test_and_set_bit(seq_bit, block->seq_nrs[port_type]))
>  		goto out_seen;
>
> +	block->time = jiffies;
>  out_new:
>  	spin_unlock_bh(&node->seq_out_lock);
>  	return 0;
>
> then I the test
> | TEST: test_cut_link - HSRv1                                         [ OK ]
> | INFO: Wait for node table entries to be merged.
> | INFO: Running ping node1-NUb1aJ -> 100.64.0.2
> | INFO: 400 packets transmitted, 400 received, +22 duplicates, 0% packet loss, time 6113ms
>
> turns into
>
> | TEST: test_cut_link - HSRv1                                         [ OK ]
> | INFO: Wait for node table entries to be merged.
> | INFO: Running ping node1-6i4xNe -> 100.64.0.2
> | INFO: 400 packets transmitted, 400 received, 0% packet loss, time 6067ms
>
> which could be considered as an improvement.

Thank you for testing this! That's indeed an improvement.

> On the other hand the live
> time of the block gets pushed past the HSR_ENTRY_FORGET_TIME so
> the first entry survives longer theoretically 127 * (400-1) = ~50 seconds.
> In the reboot case the box should be quiet for 500ms at which point the
> entry gets removed anyway so the longer "hold time" shouldn't matter.
>
> It sounds like a good idea but I don't want to rush anything ;)

I agree, the reboot case is safe because of the 500ms quiet period. But
I kinda fear a situation where a node sends a lot of packets (making
it's sequence number wrap within the 400ms), but only a small fraction
of that to us, so that we don't clear blocks because of a full buffer.
That could make a block live pretty long, when we hit it multiple times,
which could in turn lead to valid, new frames being dropped. I'd say
it's a somewhat artificial scenario, but not impossible in reality
(especially considering that I'd expect a good chunk of the traffic in
HSR and PRP networks being periodic, which may lead to reliably hitting
the same blocks over and over).

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-03 11:55                 ` Sebastian Andrzej Siewior
@ 2026-02-03 12:23                   ` Felix Maurer
  2026-02-03 13:47                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 12:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Tue, Feb 03, 2026 at 12:55:26PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-03 11:12:36 [+0100], Felix Maurer wrote:
> > On Mon, Feb 02, 2026 at 04:55:10PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-02-02 12:51:12 [+0100], Felix Maurer wrote:
> > > > > Hm, I am not sure? For PRP, it's an explicit requirement to use the same
> > > > > MAC addresses for both ports. For HSR, I think the standard is less
> > > > > clear about the MAC addresses. And at least our code seems to assume
> > > > > that there could be different MACs on the two interfaces of a node? But
> > > > > yes, the node merging addresses this.
> > > >
> > > > I'm still not 100% certain, but I agree that the standard reads more
> > > > like the MAC addresses should be the same for the two HSR ports. At the
> > > > moment, the kernel and the test assumes that they can/should be
> > > > different. Therefore, I think we should fix this across the board in
> > > > another patchset if we agree that's the right thing to do.
> > >
> > > I would suggest to do so. This could serve as bad example and my PTP
> > > userland patches expect the same MAC on both ports. So ;)
> >
> > I'll put it on my list for another patchset :)
> > But this of course brings up how the MAC addresses should be handled on
> > the HSR interfaces in general. At the moment, for HSR we copy the MAC of
> > portA to master. For PRP, the MAC of portA is copied to master and
> > portB.
> >
> > If the MACs should be the same for both ports of an HSR interface,
> > should we set them when creating the HSR interface, similarly to PRP?
> We do this?

We didn't always do this, but since b65999e7238e ("net: hsr: sync hw
addr of slave2 according to slave1 hw addr on PRP"), we are doing this
for PRP.

> > And does it make sense to copy from portA (feels somewhat arbitrary to
> > me) or would it be cleaner to copy from master to portA and portB?
>
> I do
> | ip link set $portA down
> | ip link set $portB down
> | ip link set dev $portA address $mac
> | ip link set dev $portB address $mac
> | ip link add name $if type hsr slave1 $portA slave2 $portB supervision 45 version 1
>
> and then up, up, up to get up the three interfaces functional. $mac is
> the MAC $portA.
>
> On the other hand it shouldn't matter because while adding the HSR tag,
> the MAC of the hsr interface should be taken but somehow I observed that
> packets, which originate on the $portB interface have the MAC address of
> this interface. So I did this and forgot about it later…
> But then hardware offloading might behave different if it considers the
> address of the interface so it might required to set the MAC address.
>
> I am uncertain what the best practice would be here.
> Forcing the MAC of portA to portB could be done at setup phase by the
> stack but would require that the interface is down…

I think that's a pretty common series of setup steps, yes. But as even
we are not sure how the MAC addresses should be set, I think we should
not expect that users will configure this correctly, but do the right
thing(tm) automatically.

But like you, I'm uncertain what would be the right thing to do and
which port should control the MAC for all the others.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-02-03 11:57           ` Sebastian Andrzej Siewior
@ 2026-02-03 12:42             ` Felix Maurer
  2026-02-03 13:49               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 12:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On Tue, Feb 03, 2026 at 12:57:19PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-03 11:23:57 [+0100], Felix Maurer wrote:
> > On Mon, Feb 02, 2026 at 05:57:02PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-01-28 19:37:23 [+0100], Felix Maurer wrote:
> > > > > > +	if (!block) {
> > > > > > +		block = &node->block_buf[node->next_block];
> > > > > > +		hsr_forget_seq_block(node, block);
> > > > > > +
> > > > > > +		memset(block, 0, sizeof(*block));
> > > > > > +		block->time = jiffies;
> > > > > > +		block->block_idx = block_idx;
> > > > > > +
> > > > > > +		res = xa_store(&node->seq_blocks, block_idx, block, GFP_ATOMIC);
> > > > > > +		if (xa_is_err(res))
> > > > >
> > > > > Hi Felix,
> > > > >
> > > > > I ran Claude Code over this with review-prompts [1] and it flags
> > > > > that in the error path above, the following is needed so that the
> > > > > block can be re-used.
> > > > >
> > > > > 			block->time = 0;
> > > >
> > > > I agree. It would nonetheless be reused at some point, but not setting
> > > > time = 0 may lead to an unexpected block getting removed when it is
> > > > reused (or at least an attempt to do so). I'll fix this in v3.
> > >
> > > Not sure I follow. If xa_store() fails then the block is not added to
> > > the "sequence-blocks" and it will be attempted once the next packet is
> > > received, right?. Otherwise node->next_block is not updated at which
> > > point this block will be added twice which sounds worse.
> >
> > Yes, it will be attempted again on the next frame that needs a new
> > block. But every attempt to recycle the next_block starts with calling
> > hsr_forget_seq_block(), which tries to xa_erase() the block if time!=0.
> > Generally, time==0 is the marker that a block is not stored in the
> > xarray. So for consistency it's correct to set time=0 if the block can't
> > be added to the xarray, even though I don't think it would cause any
> > trouble at the moment if the time is not reset.
>
> My point is that the block will not be recycled if it failed to be added
> here. It remains the "next" block to be used and added to the list if it
> fails that xa_store().

Hm, I'm not sure I follow. Do you argue that the code already in the
patch is incorrect or that it would be incorrect with Simon's
suggestion?

Yes, in case the xa_store() fails, the current "next" block remains the
"next" block. But we also definitely erased it from the xarray already.
With the next frame, we attempt to recycle/insert the block again. I
don't think that can lead to the same block being used twice?

I understood Simon's suggestion this way: in the error path, *also*
reset block->time to 0. We'd still return NULL in that case and,
crucially, don't advance the next_block. IMHO, that only makes the data
more consistent (blocks in the buffer that are not in the xarray always
have time=0) and prevents that we try to remove a non-existing entry
from the xarray on the next frame when we try again to recycle the
current "next" block.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-03 12:23                   ` Felix Maurer
@ 2026-02-03 13:47                     ` Sebastian Andrzej Siewior
  2026-02-03 15:07                       ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 13:47 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-02-03 13:23:50 [+0100], Felix Maurer wrote:
> But like you, I'm uncertain what would be the right thing to do and
> which port should control the MAC for all the others.

At the very least I need the same MAC on both ends for PTP.
Now that check the spec, 5.2.1.1 "HSR handling of multicast frames",
bottom below Figure 10, says it sends the frame over frame A and frame B
oder the ports. The "src-MAC + sequence number" uniquely identify the
same frame.
Now I am certain that the two different MACs are wrong. Let deal with
this later and then decide what to do about the testsuite.

> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-02-03 12:42             ` Felix Maurer
@ 2026-02-03 13:49               ` Sebastian Andrzej Siewior
  2026-02-03 15:11                 ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 13:49 UTC (permalink / raw)
  To: Felix Maurer
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On 2026-02-03 13:42:48 [+0100], Felix Maurer wrote:
> Hm, I'm not sure I follow. Do you argue that the code already in the
> patch is incorrect or that it would be incorrect with Simon's
> suggestion?

The suggested change should have no impact.

…

> I understood Simon's suggestion this way: in the error path, *also*
> reset block->time to 0. We'd still return NULL in that case and,
> crucially, don't advance the next_block. IMHO, that only makes the data
> more consistent (blocks in the buffer that are not in the xarray always
> have time=0) and prevents that we try to remove a non-existing entry
> from the xarray on the next frame when we try again to recycle the
> current "next" block.

That is correct but should have no visible impact.

> Thanks,
>    Felix
> 

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-02-03 12:09         ` Felix Maurer
@ 2026-02-03 14:49           ` Sebastian Andrzej Siewior
  2026-02-03 15:32             ` Felix Maurer
  0 siblings, 1 reply; 55+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-03 14:49 UTC (permalink / raw)
  To: Felix Maurer
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On 2026-02-03 13:09:40 [+0100], Felix Maurer wrote:
> I agree, the reboot case is safe because of the 500ms quiet period. But
> I kinda fear a situation where a node sends a lot of packets (making
> it's sequence number wrap within the 400ms), but only a small fraction

wrap? A lot of packets means you force-recycle the oldest block. This
doesn't have an impact on timing.

> of that to us, so that we don't clear blocks because of a full buffer.

Why does this make a difference? Are you describing it from PRP
perspective? From HSR point of view, each node sees all frames. If you
have HW support for de-duplication then you don't see duplicates.

> That could make a block live pretty long, when we hit it multiple times,
> which could in turn lead to valid, new frames being dropped. I'd say

The block covers up to 128 frames and you have 64 blocks. This covers
8192 incremental seq-numbers and every block gets recycled 8 times until
the seq-number overflow.

I think you are afraid if a node sends 65536 packets in less than 400ms
and the receiving node observes only a fraction of it (less than 8192)
so it does not expire blocks by force. This may indeed lead to dropping
"new" packets.
But this should be only a PRP problem or also a HSR problem if the used
sequence number has "random" increments.

> it's a somewhat artificial scenario, but not impossible in reality
> (especially considering that I'd expect a good chunk of the traffic in
> HSR and PRP networks being periodic, which may lead to reliably hitting
> the same blocks over and over).
> 
> Thanks,
>    Felix

Sebastian

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP
  2026-02-03 13:47                     ` Sebastian Andrzej Siewior
@ 2026-02-03 15:07                       ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 15:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Tue, Feb 03, 2026 at 02:47:12PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-03 13:23:50 [+0100], Felix Maurer wrote:
> > But like you, I'm uncertain what would be the right thing to do and
> > which port should control the MAC for all the others.
>
> At the very least I need the same MAC on both ends for PTP.
> Now that check the spec, 5.2.1.1 "HSR handling of multicast frames",
> bottom below Figure 10, says it sends the frame over frame A and frame B
> oder the ports. The "src-MAC + sequence number" uniquely identify the
> same frame.
> Now I am certain that the two different MACs are wrong. Let deal with
> this later and then decide what to do about the testsuite.

Alright, I lean towards same MAC on both ports as well. But yeah, let's
fix this in the future, including the test suite.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP
  2026-02-03 13:49               ` Sebastian Andrzej Siewior
@ 2026-02-03 15:11                 ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 15:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, jkarrenpalo,
	tglx, mingo, allison.henderson, petrm, antonio, Steffen Lindner

On Tue, Feb 03, 2026 at 02:49:29PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-03 13:42:48 [+0100], Felix Maurer wrote:
> > Hm, I'm not sure I follow. Do you argue that the code already in the
> > patch is incorrect or that it would be incorrect with Simon's
> > suggestion?
>
> The suggested change should have no impact.
>
> …
>
> > I understood Simon's suggestion this way: in the error path, *also*
> > reset block->time to 0. We'd still return NULL in that case and,
> > crucially, don't advance the next_block. IMHO, that only makes the data
> > more consistent (blocks in the buffer that are not in the xarray always
> > have time=0) and prevents that we try to remove a non-existing entry
> > from the xarray on the next frame when we try again to recycle the
> > current "next" block.
>
> That is correct but should have no visible impact.

Alright, we're on the same track then. I agree that there should not be
any visible impact. But to avoid surprises down the road, I think it's
good to set time=0 for consistency.

Thanks,
   Felix


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
  2026-02-03 14:49           ` Sebastian Andrzej Siewior
@ 2026-02-03 15:32             ` Felix Maurer
  0 siblings, 0 replies; 55+ messages in thread
From: Felix Maurer @ 2026-02-03 15:32 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jkarrenpalo, tglx,
	mingo, allison.henderson, petrm, antonio

On Tue, Feb 03, 2026 at 03:49:03PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-03 13:09:40 [+0100], Felix Maurer wrote:
> > I agree, the reboot case is safe because of the 500ms quiet period. But
> > I kinda fear a situation where a node sends a lot of packets (making
> > it's sequence number wrap within the 400ms), but only a small fraction
>
> wrap? A lot of packets means you force-recycle the oldest block. This
> doesn't have an impact on timing.
>
> > of that to us, so that we don't clear blocks because of a full buffer.
>
> Why does this make a difference? Are you describing it from PRP
> perspective? From HSR point of view, each node sees all frames. If you
> have HW support for de-duplication then you don't see duplicates.

Good point, I didn't state that explicitly. All of my reasoning was
based on a PRP perspective, where we don't necessarily see all frames.

> > That could make a block live pretty long, when we hit it multiple times,
> > which could in turn lead to valid, new frames being dropped. I'd say
>
> The block covers up to 128 frames and you have 64 blocks. This covers
> 8192 incremental seq-numbers and every block gets recycled 8 times until
> the seq-number overflow.
>
> I think you are afraid if a node sends 65536 packets in less than 400ms
> and the receiving node observes only a fraction of it (less than 8192)
> so it does not expire blocks by force. This may indeed lead to dropping
> "new" packets.
> But this should be only a PRP problem or also a HSR problem if the used
> sequence number has "random" increments.

Yes, that's about the scenario I'm considering (I very much agree it's
an edge case, though). At it's face, this is only a PRP problem.

But I think this PRP problem can extend into an HSR ring. The standard
describes how HSR rings can be coupled to a PRP network using two
RedBoxes in HSR-PRP mode. Assuming a node in the PRP network sends a
frame, if the RedBox knows the receiver is a node in the PRP network, it
will not inject the frame into the HSR ring.  Therefore, the MACs in the
PRP network could be appearing as nodes sending with random increments
in our HSR ring. Notably, we can't know if such RedBoxes exist in our
ring. Yeah, it's another condition that needs to be met, and the edge
case get's thinner.

Thanks,
   Felix

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2026-02-03 15:32 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
2026-01-29 11:05   ` Sebastian Andrzej Siewior
2026-01-29 13:31     ` Felix Maurer
2026-01-29 15:21       ` Sebastian Andrzej Siewior
2026-01-29 17:44         ` Felix Maurer
2026-02-02 11:51           ` Felix Maurer
2026-02-02 15:55             ` Sebastian Andrzej Siewior
2026-02-03 10:12               ` Felix Maurer
2026-02-03 11:55                 ` Sebastian Andrzej Siewior
2026-02-03 12:23                   ` Felix Maurer
2026-02-03 13:47                     ` Sebastian Andrzej Siewior
2026-02-03 15:07                       ` Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
2026-01-28 16:38   ` Simon Horman
2026-01-28 18:37     ` Felix Maurer
2026-02-02 16:57       ` Sebastian Andrzej Siewior
2026-02-03 10:23         ` Felix Maurer
2026-02-03 11:57           ` Sebastian Andrzej Siewior
2026-02-03 12:42             ` Felix Maurer
2026-02-03 13:49               ` Sebastian Andrzej Siewior
2026-02-03 15:11                 ` Felix Maurer
2026-01-29 13:29   ` Sebastian Andrzej Siewior
2026-01-29 15:30     ` Felix Maurer
2026-02-02  8:47   ` Steffen Lindner
2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
2026-01-29 13:32   ` Sebastian Andrzej Siewior
2026-02-02 11:30     ` Felix Maurer
2026-02-02 16:45       ` Sebastian Andrzej Siewior
2026-02-03 12:09         ` Felix Maurer
2026-02-03 14:49           ` Sebastian Andrzej Siewior
2026-02-03 15:32             ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
2026-01-29 14:43   ` Sebastian Andrzej Siewior
2026-01-29 16:17     ` Felix Maurer
2026-01-29 18:01       ` Felix Maurer
2026-01-30 10:34         ` Felix Maurer
2026-02-02 17:53           ` Sebastian Andrzej Siewior
2026-02-03 11:49             ` Felix Maurer
2026-02-03 12:08               ` Sebastian Andrzej Siewior
2026-02-02 17:11       ` Sebastian Andrzej Siewior
2026-02-03 11:08         ` Felix Maurer
2026-02-03 12:09           ` Sebastian Andrzej Siewior
2026-01-22 14:57 ` [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests " Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
2026-01-29 15:12   ` Sebastian Andrzej Siewior
2026-01-29 16:19     ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR Felix Maurer
2026-01-22 17:24 ` [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Sebastian Andrzej Siewior
2026-01-23  1:35 ` Jakub Kicinski
2026-01-26  9:28   ` Felix Maurer
2026-01-29 15:29     ` Sebastian Andrzej Siewior
2026-01-29 16:29       ` Felix Maurer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox