public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Felix Maurer <fmaurer@redhat.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	jkarrenpalo@gmail.com, tglx@linutronix.de, mingo@kernel.org,
	allison.henderson@oracle.com, petrm@nvidia.com,
	antonio@openvpn.net
Subject: Re: [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP
Date: Tue, 3 Feb 2026 13:09:40 +0100	[thread overview]
Message-ID: <aYHlhKiSh8YrXjT1@thinkpad> (raw)
In-Reply-To: <20260202164550.1TDNCEe_@linutronix.de>

On Mon, Feb 02, 2026 at 05:45:50PM +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-02 12:30:44 [+0100], Felix Maurer wrote:
> > On Thu, Jan 29, 2026 at 02:32:17PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2026-01-22 15:57:00 [+0100], Felix Maurer wrote:
> > > > Add tests where one link has different rates of packet loss or reorders
> > > > packets. PRP should still be able to recover from these link faults and
> > > > show no packet loss.  However, it is acceptable to receive some level of
> > > > duplicate packets. This matches the current specification (IEC
> > > > 62439-3:2021) of the duplicate discard algorithm that requires it to be
> > > > "designed such that it never rejects a legitimate frame, while occasional
> > > > acceptance of a duplicate can be tolerated." The rate of acceptable
> > > > duplicates in this test is intentionally high (10%) to make the test
> > > > stable, the values I observed in the worst test cases (20% loss) are around
> > > > 5% duplicates.
> > >
> > > Do you know why we have the duplicates? It is because of the high
> > > sending rate at which point the blocks are dropped and we can't
> > > recognise the duplicate?
> >
> > I could not give a definitive answer on this last week, I had to dig a
> > bit and think through this. The reason for the duplicates is not a high
> > sending rate leading to dropped blocks. Rather, it's the low sending
> > rate leading to expiring blocks: in the test, the ping interval is set
> > to 10ms. As the blocks expire based on the timestamp of the first
> > received sequence number, the 400ms entry expiry gets hit every ca. 40
> > frames, at which we basically clear all received sequence number in the
> > block by creating a new block. This happens on both nodes (for ping
> > requests and replies) ca. 10 times during the test, leading to about 20
> > duplicate packets, which is the ca. 5% duplicates I observed in multiple
> > runs of this test.
> >
> > This means the duplicate rate is highly dependent on the traffic pattern
> > and the test manages to hit exactly this situation. I'll add the
> > explanation to the message in v3. If similar traffic patterns are hit in
> > the real world too often, we could test different block sizes. I just
> > did this and the results are as expected:
> > - 64 bit blocks: the duplicate rate is about the same because the block
> >   still expires once.
> > - 32 bit blocks: the duplicate rate drops to around 4% because,
> >   including the netem delay in the test, this is roughly the edge
> >   between expiring a block and already hitting a new one. But if we keep
> >   the bitmap as it is, we waste 50% of memory.
> > - 16 bit blocks: duplicate rate is zero because we use new blocks way
> >   before the expiry time. But this wastes about 75% memory for each
> >   block and the overhead of the timestamp in each block gets quite
> >   significant.
> >
> > I'd suggest to keep the 128 bit blocks to have a chance to detect
> > duplicates at higher data rates for some delay difference (up to about
> > 5.5ms with 1Gbit/s). 64 bit blocks would halve the maximum delay
> > difference, but probably still strike a good balance between memory
> > overhead and duplicate rate. Do you prefer one over the other?
>
> We had false positives with the old algorithm so having now some is not
> the end of the world. 128bit blocks are fine I guess. The question is
> how far do we want to tune it and everything comes with a price.
>
> If I update bitmap every time there is a valid packet:
>
> diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> index 08f91f10a9347..bb9b13940c35b 100644
> --- a/net/hsr/hsr_framereg.c
> +++ b/net/hsr/hsr_framereg.c
> @@ -572,6 +572,7 @@ static int hsr_check_duplicate(struct hsr_frame_info *frame,
>  	if (test_and_set_bit(seq_bit, block->seq_nrs[port_type]))
>  		goto out_seen;
>
> +	block->time = jiffies;
>  out_new:
>  	spin_unlock_bh(&node->seq_out_lock);
>  	return 0;
>
> then I the test
> | TEST: test_cut_link - HSRv1                                         [ OK ]
> | INFO: Wait for node table entries to be merged.
> | INFO: Running ping node1-NUb1aJ -> 100.64.0.2
> | INFO: 400 packets transmitted, 400 received, +22 duplicates, 0% packet loss, time 6113ms
>
> turns into
>
> | TEST: test_cut_link - HSRv1                                         [ OK ]
> | INFO: Wait for node table entries to be merged.
> | INFO: Running ping node1-6i4xNe -> 100.64.0.2
> | INFO: 400 packets transmitted, 400 received, 0% packet loss, time 6067ms
>
> which could be considered as an improvement.

Thank you for testing this! That's indeed an improvement.

> On the other hand the live
> time of the block gets pushed past the HSR_ENTRY_FORGET_TIME so
> the first entry survives longer theoretically 127 * (400-1) = ~50 seconds.
> In the reboot case the box should be quiet for 500ms at which point the
> entry gets removed anyway so the longer "hold time" shouldn't matter.
>
> It sounds like a good idea but I don't want to rush anything ;)

I agree, the reboot case is safe because of the 500ms quiet period. But
I kinda fear a situation where a node sends a lot of packets (making
it's sequence number wrap within the 400ms), but only a small fraction
of that to us, so that we don't clear blocks because of a full buffer.
That could make a block live pretty long, when we hit it multiple times,
which could in turn lead to valid, new frames being dropped. I'd say
it's a somewhat artificial scenario, but not impossible in reality
(especially considering that I'd expect a good chunk of the traffic in
HSR and PRP networks being periodic, which may lead to reliably hitting
the same blocks over and over).

Thanks,
   Felix


  reply	other threads:[~2026-02-03 12:09 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-22 14:56 [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 1/9] selftests: hsr: Add ping test for PRP Felix Maurer
2026-01-29 11:05   ` Sebastian Andrzej Siewior
2026-01-29 13:31     ` Felix Maurer
2026-01-29 15:21       ` Sebastian Andrzej Siewior
2026-01-29 17:44         ` Felix Maurer
2026-02-02 11:51           ` Felix Maurer
2026-02-02 15:55             ` Sebastian Andrzej Siewior
2026-02-03 10:12               ` Felix Maurer
2026-02-03 11:55                 ` Sebastian Andrzej Siewior
2026-02-03 12:23                   ` Felix Maurer
2026-02-03 13:47                     ` Sebastian Andrzej Siewior
2026-02-03 15:07                       ` Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 2/9] selftests: hsr: Check duplicates on HSR with VLAN Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 3/9] selftests: hsr: Add tests for faulty links Felix Maurer
2026-01-22 14:56 ` [PATCH net-next v2 4/9] hsr: Implement more robust duplicate discard for PRP Felix Maurer
2026-01-28 16:38   ` Simon Horman
2026-01-28 18:37     ` Felix Maurer
2026-02-02 16:57       ` Sebastian Andrzej Siewior
2026-02-03 10:23         ` Felix Maurer
2026-02-03 11:57           ` Sebastian Andrzej Siewior
2026-02-03 12:42             ` Felix Maurer
2026-02-03 13:49               ` Sebastian Andrzej Siewior
2026-02-03 15:11                 ` Felix Maurer
2026-01-29 13:29   ` Sebastian Andrzej Siewior
2026-01-29 15:30     ` Felix Maurer
2026-02-02  8:47   ` Steffen Lindner
2026-01-22 14:57 ` [PATCH net-next v2 5/9] selftests: hsr: Add tests for more link faults with PRP Felix Maurer
2026-01-29 13:32   ` Sebastian Andrzej Siewior
2026-02-02 11:30     ` Felix Maurer
2026-02-02 16:45       ` Sebastian Andrzej Siewior
2026-02-03 12:09         ` Felix Maurer [this message]
2026-02-03 14:49           ` Sebastian Andrzej Siewior
2026-02-03 15:32             ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 6/9] hsr: Implement more robust duplicate discard for HSR Felix Maurer
2026-01-29 14:43   ` Sebastian Andrzej Siewior
2026-01-29 16:17     ` Felix Maurer
2026-01-29 18:01       ` Felix Maurer
2026-01-30 10:34         ` Felix Maurer
2026-02-02 17:53           ` Sebastian Andrzej Siewior
2026-02-03 11:49             ` Felix Maurer
2026-02-03 12:08               ` Sebastian Andrzej Siewior
2026-02-02 17:11       ` Sebastian Andrzej Siewior
2026-02-03 11:08         ` Felix Maurer
2026-02-03 12:09           ` Sebastian Andrzej Siewior
2026-01-22 14:57 ` [PATCH net-next v2 7/9] selftests: hsr: Add more link fault tests " Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 8/9] hsr: Update PRP duplicate discard KUnit test for new algorithm Felix Maurer
2026-01-29 15:12   ` Sebastian Andrzej Siewior
2026-01-29 16:19     ` Felix Maurer
2026-01-22 14:57 ` [PATCH net-next v2 9/9] MAINTAINERS: Assign hsr selftests to HSR Felix Maurer
2026-01-22 17:24 ` [PATCH net-next v2 0/9] hsr: Implement more robust duplicate discard algorithm Sebastian Andrzej Siewior
2026-01-23  1:35 ` Jakub Kicinski
2026-01-26  9:28   ` Felix Maurer
2026-01-29 15:29     ` Sebastian Andrzej Siewior
2026-01-29 16:29       ` Felix Maurer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aYHlhKiSh8YrXjT1@thinkpad \
    --to=fmaurer@redhat.com \
    --cc=allison.henderson@oracle.com \
    --cc=antonio@openvpn.net \
    --cc=bigeasy@linutronix.de \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jkarrenpalo@gmail.com \
    --cc=kuba@kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox