netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
@ 2012-03-03  7:36 Neal Cardwell
  2012-03-03 14:22 ` Ilpo Järvinen
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Neal Cardwell @ 2012-03-03  7:36 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ilpo.jarvinen, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert, Neal Cardwell

In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:

(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).

(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ee42d42..d9b83d1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2569,6 +2569,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
 
 		if (cnt > packets) {
 			if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
+			    (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
 			    (oldcnt >= packets))
 				break;
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  2012-03-03  7:36 [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost() Neal Cardwell
@ 2012-03-03 14:22 ` Ilpo Järvinen
  2012-03-03 15:19 ` Yuchung Cheng
  2012-03-03 18:03 ` Nandita Dukkipati
  2 siblings, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2012-03-03 14:22 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1220 bytes --]

On Sat, 3 Mar 2012, Neal Cardwell wrote:

> In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
> to mark the first portion as lost. This is for two primary reasons:
> 
> (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
> doing this, it preserves the sum of their packet counts in order to
> reflect the real-world dynamics on the wire. But given that skbs can
> have remainders that do not align to MSS boundaries, this packet count
> preservation means that for SACKed skbs there is not necessarily a
> direct linear relationship between tcp_skb_pcount(skb) and
> skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
> off and mark as lost a prefix of length (packets - oldcnt)*mss from
> SACKed skbs were leading to occasional failures of the WARN_ON(len >
> skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
> recent "crash in tcp_fragment" thread on netdev).
> 
> (2) there is no real point in fragmenting off part of a SACKed skb and
> calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
> for SACKed skbs.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

-- 
 i.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  2012-03-03  7:36 [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost() Neal Cardwell
  2012-03-03 14:22 ` Ilpo Järvinen
@ 2012-03-03 15:19 ` Yuchung Cheng
  2012-03-03 18:03 ` Nandita Dukkipati
  2 siblings, 0 replies; 5+ messages in thread
From: Yuchung Cheng @ 2012-03-03 15:19 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, ilpo.jarvinen, Nandita Dukkipati,
	Tom Herbert

On Fri, Mar 2, 2012 at 11:36 PM, Neal Cardwell <ncardwell@google.com> wrote:
> In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
> to mark the first portion as lost. This is for two primary reasons:
>
> (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
> doing this, it preserves the sum of their packet counts in order to
> reflect the real-world dynamics on the wire. But given that skbs can
> have remainders that do not align to MSS boundaries, this packet count
> preservation means that for SACKed skbs there is not necessarily a
> direct linear relationship between tcp_skb_pcount(skb) and
> skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
> off and mark as lost a prefix of length (packets - oldcnt)*mss from
> SACKed skbs were leading to occasional failures of the WARN_ON(len >
> skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
> recent "crash in tcp_fragment" thread on netdev).
>
> (2) there is no real point in fragmenting off part of a SACKed skb and
> calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
> for SACKed skbs.
>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>

Thanks for spending a lot of time on this obscure bug.

> ---
>  net/ipv4/tcp_input.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index ee42d42..d9b83d1 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2569,6 +2569,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
>
>                if (cnt > packets) {
>                        if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
> +                           (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
>                            (oldcnt >= packets))
>                                break;
>
> --
> 1.7.7.3
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  2012-03-03  7:36 [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost() Neal Cardwell
  2012-03-03 14:22 ` Ilpo Järvinen
  2012-03-03 15:19 ` Yuchung Cheng
@ 2012-03-03 18:03 ` Nandita Dukkipati
  2012-03-03 19:58   ` David Miller
  2 siblings, 1 reply; 5+ messages in thread
From: Nandita Dukkipati @ 2012-03-03 18:03 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, ilpo.jarvinen, Yuchung Cheng, Tom Herbert

On Fri, Mar 2, 2012 at 11:36 PM, Neal Cardwell <ncardwell@google.com> wrote:
>
> In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
> to mark the first portion as lost. This is for two primary reasons:
>
> (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
> doing this, it preserves the sum of their packet counts in order to
> reflect the real-world dynamics on the wire. But given that skbs can
> have remainders that do not align to MSS boundaries, this packet count
> preservation means that for SACKed skbs there is not necessarily a
> direct linear relationship between tcp_skb_pcount(skb) and
> skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
> off and mark as lost a prefix of length (packets - oldcnt)*mss from
> SACKed skbs were leading to occasional failures of the WARN_ON(len >
> skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
> recent "crash in tcp_fragment" thread on netdev).
>
> (2) there is no real point in fragmenting off part of a SACKed skb and
> calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
> for SACKed skbs.
>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Acked-by: Nandita Dukkipati <nanditad@google.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
  2012-03-03 18:03 ` Nandita Dukkipati
@ 2012-03-03 19:58   ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2012-03-03 19:58 UTC (permalink / raw)
  To: nanditad; +Cc: ncardwell, netdev, ilpo.jarvinen, ycheng, therbert

From: Nandita Dukkipati <nanditad@google.com>
Date: Sat, 3 Mar 2012 10:03:56 -0800

> On Fri, Mar 2, 2012 at 11:36 PM, Neal Cardwell <ncardwell@google.com> wrote:
>>
>> In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
>> to mark the first portion as lost. This is for two primary reasons:
>>
>> (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
>> doing this, it preserves the sum of their packet counts in order to
>> reflect the real-world dynamics on the wire. But given that skbs can
>> have remainders that do not align to MSS boundaries, this packet count
>> preservation means that for SACKed skbs there is not necessarily a
>> direct linear relationship between tcp_skb_pcount(skb) and
>> skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
>> off and mark as lost a prefix of length (packets - oldcnt)*mss from
>> SACKed skbs were leading to occasional failures of the WARN_ON(len >
>> skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
>> recent "crash in tcp_fragment" thread on netdev).
>>
>> (2) there is no real point in fragmenting off part of a SACKed skb and
>> calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
>> for SACKed skbs.
>>
>> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> 
> Acked-by: Nandita Dukkipati <nanditad@google.com>

Applied and queued up for -stable, thanks everyone.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-03-03 19:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-03  7:36 [PATCH] tcp: don't fragment SACKed skbs in tcp_mark_head_lost() Neal Cardwell
2012-03-03 14:22 ` Ilpo Järvinen
2012-03-03 15:19 ` Yuchung Cheng
2012-03-03 18:03 ` Nandita Dukkipati
2012-03-03 19:58   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).