* [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip()
@ 2025-07-13 15:22 Xin Guo
2025-07-17 8:24 ` Paolo Abeni
0 siblings, 1 reply; 5+ messages in thread
From: Xin Guo @ 2025-07-13 15:22 UTC (permalink / raw)
To: ncardwell, edumazet, davem, dsahern, kuba, pabeni; +Cc: netdev, Xin Guo
tcp_sacktag_skip() directly return the input skb only
if TCP_SKB_CB(skb)->seq>skip_to_seq,
this is not right, and the logic should be
TCP_SKB_CB(skb)->seq>=skip_to_seq, for example
if start_seq is equal to tcp_highest_sack_seq() ,
the start_seq is equal to seq of skb which is from
tcp_highest_sack().
and on the other side ,when
tcp_highest_sack_seq() < start_seq in
tcp_sacktag_write_queue(),
the skb is from tcp_highest_sack() will be ignored
in tcp_sacktag_skip(), so clean the logic also.
Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: Xin Guo <guoxin0309@gmail.com>
---
net/ipv4/tcp_input.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 79e3bfb0108f..bbefb866c649 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1809,7 +1809,7 @@ static struct sk_buff *tcp_sacktag_bsearch(struct sock *sk, u32 seq)
static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
u32 skip_to_seq)
{
- if (skb && after(TCP_SKB_CB(skb)->seq, skip_to_seq))
+ if (skb && !before(TCP_SKB_CB(skb)->seq, skip_to_seq))
return skb;
return tcp_sacktag_bsearch(sk, skip_to_seq);
@@ -1997,7 +1997,7 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
continue;
}
- if (!before(start_seq, tcp_highest_sack_seq(tp))) {
+ if (tcp_highest_sack_seq(tp) == start_seq) {
skb = tcp_highest_sack(sk);
if (!skb)
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip()
2025-07-13 15:22 [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip() Xin Guo
@ 2025-07-17 8:24 ` Paolo Abeni
2025-07-17 13:36 ` Xin Guo
0 siblings, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2025-07-17 8:24 UTC (permalink / raw)
To: Xin Guo, ncardwell, Kuniyuki Iwashima
Cc: netdev, edumazet, davem, dsahern, kuba
On 7/13/25 5:22 PM, Xin Guo wrote:
> tcp_sacktag_skip() directly return the input skb only
> if TCP_SKB_CB(skb)->seq>skip_to_seq,
> this is not right, and the logic should be
> TCP_SKB_CB(skb)->seq>=skip_to_seq,
Adding Kuniyuki
I'm not sure this statement is actually true. A more clear (and slightly
more descriptive) commit message could help better understanding the
issue. What is the bad behaviour you are observing?
Ideally a packetdrill test case to demonstrate it would help
> for example
> if start_seq is equal to tcp_highest_sack_seq() ,
> the start_seq is equal to seq of skb which is from
> tcp_highest_sack().
> and on the other side ,when
> tcp_highest_sack_seq() < start_seq in
> tcp_sacktag_write_queue(),
> the skb is from tcp_highest_sack() will be ignored
> in tcp_sacktag_skip(), so clean the logic also.
>
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
At very least the fixes tag looks wrong, because AFAICS such change did
not modify the behaviour tcp_sacktag_skip.
> Signed-off-by: Xin Guo <guoxin0309@gmail.com>
Thanks,
Paolo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip()
2025-07-17 8:24 ` Paolo Abeni
@ 2025-07-17 13:36 ` Xin Guo
2025-07-17 14:14 ` Neal Cardwell
0 siblings, 1 reply; 5+ messages in thread
From: Xin Guo @ 2025-07-17 13:36 UTC (permalink / raw)
To: Paolo Abeni
Cc: ncardwell, Kuniyuki Iwashima, netdev, edumazet, davem, dsahern,
kuba
Hi Paolo,
Thanks for your review, let me explain in the thread first.
1)let me start from tcp_sacktag_skip, the definition as below:
static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
u32 skip_to_seq)
{
if (skb && after(TCP_SKB_CB(skb)->seq, skip_to_seq))
return skb;
return tcp_sacktag_bsearch(sk, skip_to_seq);
}
the input skb is a hint to avoid search the RTX tree, and the condition is:
skb->seq > skip_to_seq(so skip_to_seq cannot be included in skb),
as below:
0----------------------------------------------|------------------>+ skb->seq
0--------------------|-------------------------------------------->+ skip_to_seq
2)let me check the code snippet in tcp_sacktag_write_queue()
the code try to speed up the search by using tcp_highest_sack(),
the code is from the rtx queue is a list, but now the rtx queue is a tree.
the mean is that if the start_seq >=tcp_highest_sack_seq(), the we use
skb=tcp_highest_sack() as a hint to speed up the lookup(avoid to
lookup the tree).
so we can see that the skb->seq <=start_seq.
then if we use the skb and start_seq to call tcp_sacktag_skip(),
the tcp_sacktag_skip will go for rtx tree lookup, so
code snippet does not take effect.
static int
tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
u32 prior_snd_una, struct tcp_sacktag_state *state)
{
...
while (i < used_sacks) {
if (!before(start_seq, tcp_highest_sack_seq(tp))) {
skb = tcp_highest_sack(sk);
if (!skb)
break;
}
skb = tcp_sacktag_skip(skb, sk, start_seq);
walk:
skb = tcp_sacktag_walk(skb, sk, next_dup, state,
start_seq, end_seq, dup_sack);
advance_sp:
i++;
}
3) on the other side, let me show the logic in tcp_sacktag_bsearch, the logic is
the skb->seq should be met:
seq=<skb->seq and seq<skb->end_seq
so the seq should be included in skb, the log is not consist with
tcp_sacktag_skip().
static struct sk_buff *tcp_sacktag_bsearch(struct sock *sk, u32 seq)
{
struct rb_node *parent, **p = &sk->tcp_rtx_queue.rb_node;
struct sk_buff *skb;
while (*p) {
parent = *p;
skb = rb_to_skb(parent);
if (before(seq, TCP_SKB_CB(skb)->seq)) {
p = &parent->rb_left;
continue;
}
//[Xin Guo] at here seq=<skb->seq
if (!before(seq, TCP_SKB_CB(skb)->end_seq)) {
p = &parent->rb_right;
continue;
}
//[Xin Guo]at here seq<skb->end_seq
return skb;
}
return NULL;
}
i hope that it is more clear now, thanks.
Regards
Guo Xin.
On Thu, Jul 17, 2025 at 4:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 7/13/25 5:22 PM, Xin Guo wrote:
> > tcp_sacktag_skip() directly return the input skb only
> > if TCP_SKB_CB(skb)->seq>skip_to_seq,
> > this is not right, and the logic should be
> > TCP_SKB_CB(skb)->seq>=skip_to_seq,
>
> Adding Kuniyuki
>
> I'm not sure this statement is actually true. A more clear (and slightly
> more descriptive) commit message could help better understanding the
> issue. What is the bad behaviour you are observing?
>
> Ideally a packetdrill test case to demonstrate it would help
sorry that a packetdrill script cannot show the wrong behavior.
>
> > for example
> > if start_seq is equal to tcp_highest_sack_seq() ,
> > the start_seq is equal to seq of skb which is from
> > tcp_highest_sack().
> > and on the other side ,when
> > tcp_highest_sack_seq() < start_seq in
> > tcp_sacktag_write_queue(),
> > the skb is from tcp_highest_sack() will be ignored
> > in tcp_sacktag_skip(), so clean the logic also.
> >
> > Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
>
> At very least the fixes tag looks wrong, because AFAICS such change did
> not modify the behaviour tcp_sacktag_skip.
the commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
change the tcp_sacktag_skip(), and not be changed till now.
so i think it is right.
>
> > Signed-off-by: Xin Guo <guoxin0309@gmail.com>
>
> Thanks,
>
> Paolo
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip()
2025-07-17 13:36 ` Xin Guo
@ 2025-07-17 14:14 ` Neal Cardwell
2025-07-17 14:30 ` Xin Guo
0 siblings, 1 reply; 5+ messages in thread
From: Neal Cardwell @ 2025-07-17 14:14 UTC (permalink / raw)
To: Xin Guo
Cc: Paolo Abeni, Kuniyuki Iwashima, netdev, edumazet, davem, dsahern,
kuba
On Thu, Jul 17, 2025 at 9:36 AM Xin Guo <guoxin0309@gmail.com> wrote:
>
> Hi Paolo,
> Thanks for your review, let me explain in the thread first.
> 1)let me start from tcp_sacktag_skip, the definition as below:
> static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
> u32 skip_to_seq)
> {
> if (skb && after(TCP_SKB_CB(skb)->seq, skip_to_seq))
> return skb;
>
> return tcp_sacktag_bsearch(sk, skip_to_seq);
> }
> the input skb is a hint to avoid search the RTX tree, and the condition is:
> skb->seq > skip_to_seq(so skip_to_seq cannot be included in skb),
> as below:
> 0----------------------------------------------|------------------>+ skb->seq
> 0--------------------|-------------------------------------------->+ skip_to_seq
>
> 2)let me check the code snippet in tcp_sacktag_write_queue()
> the code try to speed up the search by using tcp_highest_sack(),
> the code is from the rtx queue is a list, but now the rtx queue is a tree.
> the mean is that if the start_seq >=tcp_highest_sack_seq(), the we use
> skb=tcp_highest_sack() as a hint to speed up the lookup(avoid to
> lookup the tree).
> so we can see that the skb->seq <=start_seq.
> then if we use the skb and start_seq to call tcp_sacktag_skip(),
> the tcp_sacktag_skip will go for rtx tree lookup, so
> code snippet does not take effect.
>
> static int
> tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
> u32 prior_snd_una, struct tcp_sacktag_state *state)
> {
> ...
> while (i < used_sacks) {
>
> if (!before(start_seq, tcp_highest_sack_seq(tp))) {
> skb = tcp_highest_sack(sk);
> if (!skb)
> break;
> }
> skb = tcp_sacktag_skip(skb, sk, start_seq);
>
> walk:
> skb = tcp_sacktag_walk(skb, sk, next_dup, state,
> start_seq, end_seq, dup_sack);
>
> advance_sp:
> i++;
> }
>
> 3) on the other side, let me show the logic in tcp_sacktag_bsearch, the logic is
> the skb->seq should be met:
> seq=<skb->seq and seq<skb->end_seq
> so the seq should be included in skb, the log is not consist with
> tcp_sacktag_skip().
>
> static struct sk_buff *tcp_sacktag_bsearch(struct sock *sk, u32 seq)
> {
> struct rb_node *parent, **p = &sk->tcp_rtx_queue.rb_node;
> struct sk_buff *skb;
>
> while (*p) {
> parent = *p;
> skb = rb_to_skb(parent);
> if (before(seq, TCP_SKB_CB(skb)->seq)) {
> p = &parent->rb_left;
> continue;
> }
> //[Xin Guo] at here seq=<skb->seq
> if (!before(seq, TCP_SKB_CB(skb)->end_seq)) {
> p = &parent->rb_right;
> continue;
> }
> //[Xin Guo]at here seq<skb->end_seq
> return skb;
> }
> return NULL;
> }
>
> i hope that it is more clear now, thanks.
>
> Regards
> Guo Xin.
>
> On Thu, Jul 17, 2025 at 4:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 7/13/25 5:22 PM, Xin Guo wrote:
> > > tcp_sacktag_skip() directly return the input skb only
> > > if TCP_SKB_CB(skb)->seq>skip_to_seq,
> > > this is not right, and the logic should be
> > > TCP_SKB_CB(skb)->seq>=skip_to_seq,
> >
> > Adding Kuniyuki
> >
> > I'm not sure this statement is actually true. A more clear (and slightly
> > more descriptive) commit message could help better understanding the
> > issue. What is the bad behaviour you are observing?
> >
> > Ideally a packetdrill test case to demonstrate it would help
> sorry that a packetdrill script cannot show the wrong behavior.
I agree with Paolo that having a packetdrill test case for this kind
of issue would be best.
Can you please explain why you are saying that a packetdrill script
cannot show the incorrect behavior for this issue?
Usually packetdrill is a good fit for these kinds of protocol
processing issues that do not involve performance or race conditions.
Here are examples of test cases that stress SACK processing behavior,
if that helps:
https://github.com/google/packetdrill/tree/master/gtests/net/tcp/sack
Thanks,
neal
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip()
2025-07-17 14:14 ` Neal Cardwell
@ 2025-07-17 14:30 ` Xin Guo
0 siblings, 0 replies; 5+ messages in thread
From: Xin Guo @ 2025-07-17 14:30 UTC (permalink / raw)
To: Neal Cardwell
Cc: Paolo Abeni, Kuniyuki Iwashima, netdev, edumazet, davem, dsahern,
kuba
thanks neal,
the input parameter skb for tcp_sacktag_skip() is just a hint to avoid
rtx tree lookup,
if the hint does not take effect, the tcp_sacktag_bsearch() can also
find the right skb,
that is why it is hard to show the wrong behavior by packetdrill.
static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
u32 skip_to_seq)
{
if (skb && after(TCP_SKB_CB(skb)->seq, skip_to_seq))
return skb;
return tcp_sacktag_bsearch(sk, skip_to_seq);
}
Regards
Guo Xin.
On Thu, Jul 17, 2025 at 10:14 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Thu, Jul 17, 2025 at 9:36 AM Xin Guo <guoxin0309@gmail.com> wrote:
> >
> > Hi Paolo,
> > Thanks for your review, let me explain in the thread first.
> > 1)let me start from tcp_sacktag_skip, the definition as below:
> > static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
> > u32 skip_to_seq)
> > {
> > if (skb && after(TCP_SKB_CB(skb)->seq, skip_to_seq))
> > return skb;
> >
> > return tcp_sacktag_bsearch(sk, skip_to_seq);
> > }
> > the input skb is a hint to avoid search the RTX tree, and the condition is:
> > skb->seq > skip_to_seq(so skip_to_seq cannot be included in skb),
> > as below:
> > 0----------------------------------------------|------------------>+ skb->seq
> > 0--------------------|-------------------------------------------->+ skip_to_seq
> >
> > 2)let me check the code snippet in tcp_sacktag_write_queue()
> > the code try to speed up the search by using tcp_highest_sack(),
> > the code is from the rtx queue is a list, but now the rtx queue is a tree.
> > the mean is that if the start_seq >=tcp_highest_sack_seq(), the we use
> > skb=tcp_highest_sack() as a hint to speed up the lookup(avoid to
> > lookup the tree).
> > so we can see that the skb->seq <=start_seq.
> > then if we use the skb and start_seq to call tcp_sacktag_skip(),
> > the tcp_sacktag_skip will go for rtx tree lookup, so
> > code snippet does not take effect.
> >
> > static int
> > tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
> > u32 prior_snd_una, struct tcp_sacktag_state *state)
> > {
> > ...
> > while (i < used_sacks) {
> >
> > if (!before(start_seq, tcp_highest_sack_seq(tp))) {
> > skb = tcp_highest_sack(sk);
> > if (!skb)
> > break;
> > }
> > skb = tcp_sacktag_skip(skb, sk, start_seq);
> >
> > walk:
> > skb = tcp_sacktag_walk(skb, sk, next_dup, state,
> > start_seq, end_seq, dup_sack);
> >
> > advance_sp:
> > i++;
> > }
> >
> > 3) on the other side, let me show the logic in tcp_sacktag_bsearch, the logic is
> > the skb->seq should be met:
> > seq=<skb->seq and seq<skb->end_seq
> > so the seq should be included in skb, the log is not consist with
> > tcp_sacktag_skip().
> >
> > static struct sk_buff *tcp_sacktag_bsearch(struct sock *sk, u32 seq)
> > {
> > struct rb_node *parent, **p = &sk->tcp_rtx_queue.rb_node;
> > struct sk_buff *skb;
> >
> > while (*p) {
> > parent = *p;
> > skb = rb_to_skb(parent);
> > if (before(seq, TCP_SKB_CB(skb)->seq)) {
> > p = &parent->rb_left;
> > continue;
> > }
> > //[Xin Guo] at here seq=<skb->seq
> > if (!before(seq, TCP_SKB_CB(skb)->end_seq)) {
> > p = &parent->rb_right;
> > continue;
> > }
> > //[Xin Guo]at here seq<skb->end_seq
> > return skb;
> > }
> > return NULL;
> > }
> >
> > i hope that it is more clear now, thanks.
> >
> > Regards
> > Guo Xin.
> >
> > On Thu, Jul 17, 2025 at 4:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On 7/13/25 5:22 PM, Xin Guo wrote:
> > > > tcp_sacktag_skip() directly return the input skb only
> > > > if TCP_SKB_CB(skb)->seq>skip_to_seq,
> > > > this is not right, and the logic should be
> > > > TCP_SKB_CB(skb)->seq>=skip_to_seq,
> > >
> > > Adding Kuniyuki
> > >
> > > I'm not sure this statement is actually true. A more clear (and slightly
> > > more descriptive) commit message could help better understanding the
> > > issue. What is the bad behaviour you are observing?
> > >
> > > Ideally a packetdrill test case to demonstrate it would help
> > sorry that a packetdrill script cannot show the wrong behavior.
>
> I agree with Paolo that having a packetdrill test case for this kind
> of issue would be best.
>
> Can you please explain why you are saying that a packetdrill script
> cannot show the incorrect behavior for this issue?
>
> Usually packetdrill is a good fit for these kinds of protocol
> processing issues that do not involve performance or race conditions.
>
> Here are examples of test cases that stress SACK processing behavior,
> if that helps:
> https://github.com/google/packetdrill/tree/master/gtests/net/tcp/sack
>
> Thanks,
> neal
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-07-17 14:31 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-13 15:22 [PATCH net-next] tcp: correct the skip logic in tcp_sacktag_skip() Xin Guo
2025-07-17 8:24 ` Paolo Abeni
2025-07-17 13:36 ` Xin Guo
2025-07-17 14:14 ` Neal Cardwell
2025-07-17 14:30 ` Xin Guo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).