netdev.vger.kernel.org archive mirror
* [PATCH] tcp: Reallocate headroom if it would overflow csum_start
@ 2013-04-11 11:19 Thomas Graf
  2013-04-11 12:41 ` Sergei Shtylyov
  2013-04-11 15:49 ` Eric Dumazet
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Graf @ 2013-04-11 11:19 UTC
  To: davem; +Cc: netdev, eric.dumazet

If a TCP retransmission gets partially ACKed and collapsed multiple
times, it is possible for the headroom to grow beyond 64K, which will
overflow the 16-bit skb->csum_start, which is based on the start of
the headroom. It has been observed rarely in the wild with IPoIB due
to the 64K MTU.

Check whether the ACKing and collapsing resulted in a headroom exceeding
what csum_start can cover, and reallocate the headroom if so.

LLNL has been running the patch for a while and has not seen the
problem occur since.

A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
LLNL for helping out with the investigation and testing.

Reported-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
v2: reallocate headroom instead of preventing further collapsing

 net/ipv4/tcp_output.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b44cf81..bf6ceb7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	TCP_SKB_CB(skb)->when = tcp_time_stamp;
 
-	/* make sure skb->data is aligned on arches that require it */
-	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
+	/* make sure skb->data is aligned on arches that require it
+	 * and check if ack-trimming & collapsing extended the headroom
+	 * beyond what csum_start can cover. */
+	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3) ||
+		     skb_headroom(skb) >= 0xFFFF)) {
 		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
 						   GFP_ATOMIC);
 		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-- 
1.7.11.7
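
The overflow described in the commit message can be illustrated outside
the kernel. The following is a minimal user-space sketch, not kernel
code: struct toy_skb and the toy_* helpers are hypothetical names made
up for this illustration, and the only property taken from the thread
is that csum_start is a 16-bit offset measured from the start of the
headroom. It shows the stored offset wrapping once the headroom exceeds
what 16 bits can hold, and a check with the same threshold as the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff properties that matter here. */
struct toy_skb {
	uint32_t headroom;   /* bytes between the buffer head and the data */
	uint16_t csum_start; /* offset from the head; 16 bits, as in the thread */
};

/* Store the checksum start offset in the 16-bit field. Anything above
 * 65535 is silently reduced modulo 65536 by the narrowing conversion. */
void toy_set_csum_start(struct toy_skb *skb, uint32_t transport_offset)
{
	skb->csum_start = (uint16_t)(skb->headroom + transport_offset);
}

/* Same threshold as the headroom test added by the patch. */
bool toy_needs_realloc(const struct toy_skb *skb)
{
	return skb->headroom >= 0xFFFF;
}

/* Walk through one small and one oversized headroom. */
void toy_demo(void)
{
	struct toy_skb small = { .headroom = 128 };
	struct toy_skb huge = { .headroom = 70000 };

	toy_set_csum_start(&small, 0);
	toy_set_csum_start(&huge, 0);

	assert(small.csum_start == 128);          /* offset fits */
	assert(huge.csum_start == 70000 - 65536); /* wrapped to 4464 */
	assert(!toy_needs_realloc(&small));
	assert(toy_needs_realloc(&huge));
}
```

In the sketch the wrapped offset points 4464 bytes into the buffer
instead of 70000, which is why the patch copies the skb with a fresh,
small headroom before retransmitting rather than trusting the offset.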


* Re: [PATCH] tcp: Reallocate headroom if it would overflow csum_start
  2013-04-11 11:19 [PATCH] tcp: Reallocate headroom if it would overflow csum_start Thomas Graf
@ 2013-04-11 12:41 ` Sergei Shtylyov
  2013-04-11 15:49 ` Eric Dumazet
  1 sibling, 0 replies; 5+ messages in thread
From: Sergei Shtylyov @ 2013-04-11 12:41 UTC
  To: Thomas Graf; +Cc: davem, netdev, eric.dumazet

Hello.

On 11-04-2013 15:19, Thomas Graf wrote:

> If a TCP retransmission gets partially ACKed and collapsed multiple
> times it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start which is based on the start of
> the headroom. It has been observed rarely in the wild with IPoIB due
> to the 64K MTU.

> Verify if the acking and collapsing resulted in a headroom exceeding
> what csum_start can cover and reallocate the headroom if so.

> LLNL has been running the patch for a while and has not seen the
> problem occur since.

> A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> LLNL for helping out with the investigation and testing.

> Reported-by: Jim Foraker <foraker1@llnl.gov>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
[...]

    Minor formatting nit.

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b44cf81..bf6ceb7 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
>   	 */
>   	TCP_SKB_CB(skb)->when = tcp_time_stamp;
>
> -	/* make sure skb->data is aligned on arches that require it */
> -	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
> +	/* make sure skb->data is aligned on arches that require it
> +	 * and check if ack-trimming & collapsing extended the headroom
> +	 * beyond what csum_start can cover. */

    The preferred multi-line comment style in the networking code is:

/* bla
 * bla
 */

WBR, Sergei


* Re: [PATCH] tcp: Reallocate headroom if it would overflow csum_start
  2013-04-11 11:19 [PATCH] tcp: Reallocate headroom if it would overflow csum_start Thomas Graf
  2013-04-11 12:41 ` Sergei Shtylyov
@ 2013-04-11 15:49 ` Eric Dumazet
  2013-04-11 17:52   ` Ben Hutchings
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2013-04-11 15:49 UTC
  To: Thomas Graf; +Cc: davem, netdev

On Thu, 2013-04-11 at 13:19 +0200, Thomas Graf wrote:
> If a TCP retransmission gets partially ACKed and collapsed multiple
> times it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start which is based on the start of
> the headroom. It has been observed rarely in the wild with IPoIB due
> to the 64K MTU.
> 
> Verify if the acking and collapsing resulted in a headroom exceeding
> what csum_start can cover and reallocate the headroom if so.
> 
> LLNL has been running the patch for a while and has not seen the
> problem occur since.
> 
> A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> LLNL for helping out with the investigation and testing.
> 
> Reported-by: Jim Foraker <foraker1@llnl.gov>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2: reallocate headroom instead of preventing further collapsing
> 
>  net/ipv4/tcp_output.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b44cf81..bf6ceb7 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
>  	 */
>  	TCP_SKB_CB(skb)->when = tcp_time_stamp;
>  
> -	/* make sure skb->data is aligned on arches that require it */
> -	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
> +	/* make sure skb->data is aligned on arches that require it
> +	 * and check if ack-trimming & collapsing extended the headroom
> +	 * beyond what csum_start can cover. */
> +	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3) ||
> +		     skb_headroom(skb) >= 0xFFFF)) {
>  		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
>  						   GFP_ATOMIC);
>  		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :

Strange... It was tested on an arch with NET_IP_ALIGN == 2, I presume?

This fix should also apply to other arches (x86, for example).

I would code the condition like this instead:

if ((NET_IP_ALIGN && ((unsigned long)skb->data & 3)) ||
    skb_headroom(skb) >= 0xFFFF)


* Re: [PATCH] tcp: Reallocate headroom if it would overflow csum_start
  2013-04-11 15:49 ` Eric Dumazet
@ 2013-04-11 17:52   ` Ben Hutchings
  2013-04-11 17:57     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Hutchings @ 2013-04-11 17:52 UTC
  To: Eric Dumazet; +Cc: Thomas Graf, davem, netdev

On Thu, 2013-04-11 at 08:49 -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 13:19 +0200, Thomas Graf wrote:
> > If a TCP retransmission gets partially ACKed and collapsed multiple
> > times it is possible for the headroom to grow beyond 64K which will
> > overflow the 16bit skb->csum_start which is based on the start of
> > the headroom. It has been observed rarely in the wild with IPoIB due
> > to the 64K MTU.
> > 
> > Verify if the acking and collapsing resulted in a headroom exceeding
> > what csum_start can cover and reallocate the headroom if so.
> > 
> > LLNL has been running the patch for a while and has not seen the
> > problem occur since.
> > 
> > A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> > LLNL for helping out with the investigation and testing.
> > 
> > Reported-by: Jim Foraker <foraker1@llnl.gov>
> > Signed-off-by: Thomas Graf <tgraf@suug.ch>
> > ---
> > v2: reallocate headroom instead of preventing further collapsing
> > 
> >  net/ipv4/tcp_output.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index b44cf81..bf6ceb7 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
> >  	 */
> >  	TCP_SKB_CB(skb)->when = tcp_time_stamp;
> >  
> > -	/* make sure skb->data is aligned on arches that require it */
> > -	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
> > +	/* make sure skb->data is aligned on arches that require it
> > +	 * and check if ack-trimming & collapsing extended the headroom
> > +	 * beyond what csum_start can cover. */
> > +	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3) ||
> > +		     skb_headroom(skb) >= 0xFFFF)) {
> >  		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
> >  						   GFP_ATOMIC);
> >  		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
> 
> Strange... It was tested on an arch with NET_IP_ALIGN == 2 I presume ?
> 
> This fix should also be done for other arches (x86 for example)
> 
> I would code the condition like that instead
> 
> if ((NET_IP_ALIGN && ((unsigned long)skb->data & 3)) ||
>     skb_headroom(skb) >= 0xFFFF)

You dropped the unlikely() and added redundant parentheses, which may be
clearer but is still equivalent.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] tcp: Reallocate headroom if it would overflow csum_start
  2013-04-11 17:52   ` Ben Hutchings
@ 2013-04-11 17:57     ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-04-11 17:57 UTC
  To: Ben Hutchings; +Cc: Thomas Graf, davem, netdev

On Thu, 2013-04-11 at 18:52 +0100, Ben Hutchings wrote:
> On Thu, 2013-04-11 at 08:49 -0700, Eric Dumazet wrote:
> > On Thu, 2013-04-11 at 13:19 +0200, Thomas Graf wrote:
> > > If a TCP retransmission gets partially ACKed and collapsed multiple
> > > times it is possible for the headroom to grow beyond 64K which will
> > > overflow the 16bit skb->csum_start which is based on the start of
> > > the headroom. It has been observed rarely in the wild with IPoIB due
> > > to the 64K MTU.
> > > 
> > > Verify if the acking and collapsing resulted in a headroom exceeding
> > > what csum_start can cover and reallocate the headroom if so.
> > > 
> > > LLNL has been running the patch for a while and has not seen the
> > > problem occur since.
> > > 
> > > A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
> > > LLNL for helping out with the investigation and testing.
> > > 
> > > Reported-by: Jim Foraker <foraker1@llnl.gov>
> > > Signed-off-by: Thomas Graf <tgraf@suug.ch>
> > > ---
> > > v2: reallocate headroom instead of preventing further collapsing
> > > 
> > >  net/ipv4/tcp_output.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index b44cf81..bf6ceb7 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2388,8 +2388,11 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
> > >  	 */
> > >  	TCP_SKB_CB(skb)->when = tcp_time_stamp;
> > >  
> > > -	/* make sure skb->data is aligned on arches that require it */
> > > -	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
> > > +	/* make sure skb->data is aligned on arches that require it
> > > +	 * and check if ack-trimming & collapsing extended the headroom
> > > +	 * beyond what csum_start can cover. */
> > > +	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3) ||
> > > +		     skb_headroom(skb) >= 0xFFFF)) {
> > >  		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
> > >  						   GFP_ATOMIC);
> > >  		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
> > 
> > Strange... It was tested on an arch with NET_IP_ALIGN == 2 I presume ?
> > 
> > This fix should also be done for other arches (x86 for example)
> > 
> > I would code the condition like that instead
> > 
> > if ((NET_IP_ALIGN && ((unsigned long)skb->data & 3)) ||
> >     skb_headroom(skb) >= 0xFFFF)
> 
> You dropped the unlikely() and added redundant parentheses, which may be
> clearer but is still equivalent.

I see what you mean...

I just don't like

if (A && B || C)

I prefer in this case

if ((A && B) || C)

Then add the unlikely() if we really care in this _ultra_ slow path

if (unlikely((A && B) || C)) 
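
The equivalence discussed in this subthread follows from C operator
precedence: && binds tighter than ||, so A && B || C already parses as
(A && B) || C. The sketch below is a user-space illustration with
made-up function names, where a stands in for NET_IP_ALIGN being
non-zero, b for the misaligned skb->data test, and c for the headroom
test. It compares the posted form, the explicitly parenthesized form,
and the misreading A && (B || C), under which an arch with
NET_IP_ALIGN == 0 would never reach the headroom check.

```c
#include <assert.h>
#include <stdbool.h>

/* Silence the compiler hint about mixing && and || without parentheses;
 * the unparenthesized form is exactly what is being examined here. */
#pragma GCC diagnostic ignored "-Wparentheses"

/* Shape of the condition as posted: A && B || C. */
bool cond_as_posted(bool a, bool b, bool c)
{
	return a && b || c;
}

/* Shape preferred in the reply: (A && B) || C. */
bool cond_explicit(bool a, bool b, bool c)
{
	return (a && b) || c;
}

/* The misreading: A && (B || C). With A false, C is never considered. */
bool cond_misread(bool a, bool b, bool c)
{
	return a && (b || c);
}

/* Count the inputs (out of 8) on which the posted and explicit forms differ. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool a = bits & 4, b = bits & 2, c = bits & 1;

		if (cond_as_posted(a, b, c) != cond_explicit(a, b, c))
			mismatches++;
	}
	return mismatches;
}

void self_check(void)
{
	/* The posted and parenthesized forms agree on every input. */
	assert(count_mismatches() == 0);

	/* a = false (no alignment requirement), c = true (headroom too big):
	 * the posted form still fires, the misreading would not. */
	assert(cond_as_posted(false, false, true));
	assert(!cond_misread(false, false, true));
}
```

So the extra parentheses change readability only, and they also avoid
the compiler warning that the pragma has to suppress in this sketch.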
