public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Felix Maurer <fmaurer@redhat.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	jkarrenpalo@gmail.com, tglx@linutronix.de, mingo@kernel.org,
	allison.henderson@oracle.com, petrm@nvidia.com,
	antonio@openvpn.net
Subject: Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
Date: Mon, 2 Feb 2026 17:45:50 +0100	[thread overview]
Message-ID: <20260202164550.1TDNCEe_@linutronix.de> (raw)
In-Reply-To: <aYCK5Dczg_0tbuEA@thinkpad>

On 2026-02-02 12:30:44 [+0100], Felix Maurer wrote:
> On Thu, Jan 29, 2026 at 02:32:17PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> > > Add tests where one link has different rates of packet loss or reorders
> > > packets. PRP should still be able to recover from these link faults and
> > > show no packet loss.  However, it is acceptable to receive some level of
> > > duplicate packets. This matches the current specification (IEC
> > > 62439-3:2021) of the duplicate discard algorithm that requires it to be
> > > "designed such that it never rejects a legitimate frame, while occasional
> > > acceptance of a duplicate can be tolerated." The rate of acceptable
> > > duplicates in this test is intentionally high (10%) to make the test
> > > stable, the values I observed in the worst test cases (20% loss) are around
> > > 5% duplicates.
> >
> > Do you know why we have the duplicates? It is because of the high
> > sending rate at which point the blocks are dropped and we can't
> > recognise the duplicate?
> 
> I could not give a definitive answer on this last week, I had to dig a
> bit and think through this. The reason for the duplicates is not a high
> sending rate leading to dropped blocks. Rather, it's the low sending
> rate leading to expiring blocks: in the test, the ping interval is set
> to 10ms. As the blocks expire based on the timestamp of the first
> received sequence number, the 400ms entry expiry gets hit every ca. 40
> frames, at which we basically clear all received sequence number in the
> block by creating a new block. This happens on both nodes (for ping
> requests and replies) ca. 10 times during the test, leading to about 20
> duplicate packets, which is the ca. 5% duplicates I observed in multiple
> runs of this test.
> 
> This means the duplicate rate is highly dependent on the traffic pattern
> and the test manages to hit exactly this situation. I'll add the
> explanation to the message in v3. If similar traffic patterns are hit in
> the real world too often, we could test different block sizes. I just
> did this and the results are as expected:
> - 64 bit blocks: the duplicate rate is about the same because the block
>   still expires once.
> - 32 bit blocks: the duplicate rate drops to around 4% because,
>   including the netem delay in the test, this is roughly the edge
>   between expiring a block and already hitting a new one. But if we keep
>   the bitmap as it is, we waste 50% of memory.
> - 16 bit blocks: duplicate rate is zero because we use new blocks way
>   before the expiry time. But this wastes about 75% memory for each
>   block and the overhead of the timestamp in each block gets quite
>   significant.
> 
> I'd suggest to keep the 128 bit blocks to have a chance to detect
> duplicates at higher data rates for some delay difference (up to about
> 5.5ms with 1Gbit/s). 64 bit blocks would halve the maximum delay
> difference, but probably still strike a good balance between memory
> overhead and duplicate rate. Do you prefer one over the other?

We had false positives with the old algorithm so having now some is not
the end of the world. 128bit blocks are fine I guess. The question is
how far do we want to tune it and everything comes with a price.

If I update bitmap every time there is a valid packet:

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 08f91f10a9347..bb9b13940c35b 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -572,6 +572,7 @@ static int hsr_check_duplicate(struct hsr_frame_info *frame,
 	if (test_and_set_bit(seq_bit, block->seq_nrs[port_type]))
 		goto out_seen;
 
+	block->time = jiffies;
 out_new:
 	spin_unlock_bh(&node->seq_out_lock);
 	return 0;

then I the test
| TEST: test_cut_link - HSRv1                                         [ OK ]
| INFO: Wait for node table entries to be merged.
| INFO: Running ping node1-NUb1aJ -> 100.64.0.2
| INFO: 400 packets transmitted, 400 received, +22 duplicates, 0% packet loss, time 6113ms

turns into

| TEST: test_cut_link - HSRv1                                         [ OK ]
| INFO: Wait for node table entries to be merged.
| INFO: Running ping node1-6i4xNe -> 100.64.0.2
| INFO: 400 packets transmitted, 400 received, 0% packet loss, time 6067ms

which could be considered as an improvement. On the other hand the live
time of the block gets pushed past the HSR_ENTRY_FORGET_TIME so
the first entry survives longer theoretically 127 * (400-1) = ~50 seconds.
In the reboot case the box should be quiet for 500ms at which point the
entry gets removed anyway so the longer "hold time" shouldn't matter.

It sounds like a good idea but I don't want to rush anything ;)

> Thanks,
>    Felix

Sebastian

  reply	other threads:[~2026-02-02 16:45 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
2026-01-29 11:05   ` Sebastian Andrzej Siewior
2026-01-29 13:31     ` Felix Maurer
2026-01-29 15:21       ` Sebastian Andrzej Siewior
2026-01-29 17:44         ` Felix Maurer
2026-02-02 11:51           ` Felix Maurer
2026-02-02 15:55             ` Sebastian Andrzej Siewior
2026-02-03 10:12               ` Felix Maurer
2026-02-03 11:55                 ` Sebastian Andrzej Siewior
2026-02-03 12:23                   ` Felix Maurer
2026-02-03 13:47                     ` Sebastian Andrzej Siewior
2026-02-03 15:07                       ` Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
2026-01-28 16:38   ` Simon Horman
2026-01-28 18:37     ` Felix Maurer
2026-02-02 16:57       ` Sebastian Andrzej Siewior
2026-02-03 10:23         ` Felix Maurer
2026-02-03 11:57           ` Sebastian Andrzej Siewior
2026-02-03 12:42             ` Felix Maurer
2026-02-03 13:49               ` Sebastian Andrzej Siewior
2026-02-03 15:11                 ` Felix Maurer
2026-01-29 13:29   ` Sebastian Andrzej Siewior
2026-01-29 15:30     ` Felix Maurer
2026-02-02  8:47   ` Steffen Lindner
2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
2026-01-29 13:32   ` Sebastian Andrzej Siewior
2026-02-02 11:30     ` Felix Maurer
2026-02-02 16:45       ` Sebastian Andrzej Siewior [this message]
2026-02-03 12:09         ` Felix Maurer
2026-02-03 14:49           ` Sebastian Andrzej Siewior
2026-02-03 15:32             ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
2026-01-29 14:43   ` Sebastian Andrzej Siewior
2026-01-29 16:17     ` Felix Maurer
2026-01-29 18:01       ` Felix Maurer
2026-01-30 10:34         ` Felix Maurer
2026-02-02 17:53           ` Sebastian Andrzej Siewior
2026-02-03 11:49             ` Felix Maurer
2026-02-03 12:08               ` Sebastian Andrzej Siewior
2026-02-02 17:11       ` Sebastian Andrzej Siewior
2026-02-03 11:08         ` Felix Maurer
2026-02-03 12:09           ` Sebastian Andrzej Siewior
2026-01-22 14:57 ` [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests " Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
2026-01-29 15:12   ` Sebastian Andrzej Siewior
2026-01-29 16:19     ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR Felix Maurer
2026-01-22 17:24 ` [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Sebastian Andrzej Siewior
2026-01-23  1:35 ` Jakub Kicinski
2026-01-26  9:28   ` Felix Maurer
2026-01-29 15:29     ` Sebastian Andrzej Siewior
2026-01-29 16:29       ` Felix Maurer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260202164550.1TDNCEe_@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=allison.henderson@oracle.com \
    --cc=antonio@openvpn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fmaurer@redhat.com \
    --cc=horms@kernel.org \
    --cc=jkarrenpalo@gmail.com \
    --cc=kuba@kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox