[PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN

netdev.vger.kernel.org archive mirror

* [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
@ 2013-02-13 23:40 Thomas Graf
  2013-02-13 23:48 ` Thomas Graf
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2013-02-13 23:40 UTC (permalink / raw)
  To: davem; +Cc: netdev

skb_checksum_help() verifies the integrity of skb->csum_start
and skb->csum_offset with BUG_ON()s.

They have been hit with IPoIB which uses a 64K MTU. If a TCP
retransmission gets partially ACKed and collapsed multiple times
it is possible for the headroom to grow beyond 64K which will
overflow the 16bit skb->csum_start.

This in turn will trigger the BUG_ON() in skb_checksum_help().
Convert these to WARN_ON() and drop the packet.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/core/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f64e439..629d22e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2047,11 +2047,14 @@ int skb_checksum_help(struct sk_buff *skb)
 	}
 
 	offset = skb_checksum_start_offset(skb);
-	BUG_ON(offset >= skb_headlen(skb));
+	if (WARN_ON(offset >= skb_headlen(skb)))
+		return -ERANGE;
+
 	csum = skb_checksum(skb, offset, skb->len - offset, 0);
 
 	offset += skb->csum_offset;
-	BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
+	if (WARN_ON(offset + sizeof(__sum16) > skb_headlen(skb)))
+		return -ERANGE;
 
 	if (skb_cloned(skb) &&
 	    !skb_clone_writable(skb, offset + sizeof(__sum16))) {
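
The two converted checks can be modeled outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct pkt_model` and `check_csum_bounds()` are hypothetical names, and the fields only stand in for `skb_headlen()`, `skb_checksum_start_offset()` and `skb->csum_offset`. It shows the "report and return -ERANGE" shape the patch proposes in place of a hard stop.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sk_buff fields involved. */
struct pkt_model {
	size_t headlen;        /* bytes in the linear area, like skb_headlen() */
	size_t csum_start_off; /* like skb_checksum_start_offset() */
	size_t csum_offset;    /* like skb->csum_offset */
};

/*
 * Returns 0 if the 16-bit checksum field lies inside the linear area,
 * -ERANGE otherwise (the error code chosen in the patch above).
 */
static int check_csum_bounds(const struct pkt_model *p)
{
	size_t offset = p->csum_start_off;

	/* First check: checksumming must start inside the linear area. */
	if (offset >= p->headlen)
		return -ERANGE;

	/* Second check: the 2-byte checksum field must fit as well. */
	offset += p->csum_offset;
	if (offset + sizeof(uint16_t) > p->headlen)
		return -ERANGE;

	return 0;
}
```

A caller would drop the packet when the function returns a negative value, instead of halting the machine.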

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-13 23:40 [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop Thomas Graf
@ 2013-02-13 23:48 ` Thomas Graf
  2013-02-14  0:37   ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2013-02-13 23:48 UTC (permalink / raw)
  To: davem; +Cc: netdev

On 02/13/13 at 11:40pm, Thomas Graf wrote:
> They have been hit with IPoIB which uses a 64K MTU. If a TCP
> retransmission gets partially ACKed and collapsed multiple times
> it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start.

On the subject of fixing this, I considered:

 a) Reallocate the headroom in tcp_trim_head() if it would
    overflow. I disregarded this idea because replacing the
    old skb with the new skb on the write queue for such a
    rare situation seems overkill.

 b) No longer collapse if the new skb would result in
    a headroom + data that exceeds 64K. This seems to be the
    most trivial fix.

 c) Increase size of csum_start or store checksum_start_offset
    differently.

Other ideas?
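
The overflow described in the quoted paragraph can be illustrated with a small userspace sketch. It assumes, as the thread implies, that csum_start is a 16-bit offset measured from the start of the buffer; the helper names below are hypothetical and do not exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* What a 16-bit field keeps of a byte offset: the value modulo 65536. */
static uint16_t store_offset_16(size_t offset_from_head)
{
	return (uint16_t)offset_from_head;
}

/* Nonzero if the offset survives a round trip through the 16-bit field. */
static int offset_fits_16(size_t offset_from_head)
{
	return store_offset_16(offset_from_head) == offset_from_head;
}

/*
 * Assumed model: the field stores headroom + offset within the data, and
 * the reader subtracts the headroom again. Once headroom exceeds 64K the
 * recovered offset is off by a multiple of 65536.
 */
static long recovered_start_offset(size_t headroom, size_t off_in_data)
{
	uint16_t stored = store_offset_16(headroom + off_in_data);

	return (long)stored - (long)headroom;
}
```

With a headroom of 65600 bytes and an offset of 0 within the data, the stored value is 64 and the recovered offset is -65536, which is the kind of value the integrity checks in skb_checksum_help() reject.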

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-13 23:48 ` Thomas Graf
@ 2013-02-14  0:37   ` David Miller
  2013-02-14 10:18     ` Thomas Graf
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2013-02-14  0:37 UTC (permalink / raw)
  To: tgraf; +Cc: netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 13 Feb 2013 23:48:43 +0000

> On 02/13/13 at 11:40pm, Thomas Graf wrote:
>> They have been hit with IPoIB which uses a 64K MTU. If a TCP
>> retransmission gets partially ACKed and collapsed multiple times
>> it is possible for the headroom to grow beyond 64K which will
>> overflow the 16bit skb->csum_start.
> 
> On the subject of fixing this, I considered:
> 
>  a) Reallocate the headroom in tcp_trim_head() if it would
>     overflow. I disregarded this idea because replacing the
>     old skb with the new skb on the write queue for such a
>     rare situation seems overkill.
> 
>  b) No longer collapse if the new skb would result in
>     a headroom + data that exceeds 64K. This seems to be the
>     most trivial fix.
> 
>  c) Increase size of csum_start or store checksum_start_offset
>     differently.
> 
> Other ideas?

"b" is a good idea.

Let's not paper over this, this BUG_ON() is really a BUG_ON()
meaning "FIX ME NOW" :-)

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-14  0:37   ` David Miller
@ 2013-02-14 10:18     ` Thomas Graf
  2013-02-14 16:22       ` Eric Dumazet
  2013-02-14 18:00       ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Graf @ 2013-02-14 10:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 02/13/13 at 07:37pm, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Wed, 13 Feb 2013 23:48:43 +0000
[...]
> >  b) No longer collapse if the new skb would result in
> >     a headroom + data that exceeds 64K. This seems to be the
> >     most trivial fix.
[...]
> > Other ideas?
> 
> "b" is a good idea.

OK, a patch to do so is being tested by the original reporter.

> Let's not paper over this, this BUG_ON() is really a BUG_ON()
> meaning "FIX ME NOW" :-)

Maybe it's my general dislike of BUG_ON() in the processing
path, especially if the bug condition can be influenced remotely.
It looks absolutely doable to trigger the previously mentioned
partial acking & collapsing on purpose by a malicious receiver
even with an MTU of 1500. I believe we should avoid total DoS
in future similar situations that we don't think of yet.

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-14 10:18     ` Thomas Graf
@ 2013-02-14 16:22       ` Eric Dumazet
  2013-02-14 18:50         ` Thomas Graf
  2013-02-14 18:00       ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-02-14 16:22 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, netdev

On Thu, 2013-02-14 at 10:18 +0000, Thomas Graf wrote:
> Maybe it's my general dislike of BUG_ON() in the processing
> path, especially if the bug condition can be influenced remotely.
> It looks absolutely doable to trigger the previously mentioned
> partial acking & collapsing on purpose by a malicious receiver
> even with an MTU of 1500. I believe we should avoid total DoS
> in future similar situations that we don't think of yet.

It does not seem possible to avoid bugs, be it a BUG_ON(), an out-of-bounds
memory access, or whatever. We must fix them eventually.

In this case, it seems we must limit payload to 

65535 - MAX_TCP_HEADER 

It would make tcp_xmit_size_goal() a bit shorter.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2c7e596..2f6c8e5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 	xmit_size_goal = mss_now;
 
 	if (large_allowed && sk_can_gso(sk)) {
-		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
-				  inet_csk(sk)->icsk_af_ops->net_header_len -
-				  inet_csk(sk)->icsk_ext_hdr_len -
-				  tp->tcp_header_len);
+		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
 
 		/* TSQ : try to have two TSO segments in flight */
 		xmit_size_goal = min_t(u32, xmit_size_goal,
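
The arithmetic behind the two variants of the size goal can be compared in a short sketch. The function names are hypothetical, and the concrete numbers used with them (a gso_max_size of 65536 and a header reserve of 320 bytes) are assumptions for illustration only; the real MAX_TCP_HEADER depends on kernel configuration.

```c
/* Goal as computed by the removed lines: subtract the actual header sizes. */
static unsigned int goal_per_header(unsigned int gso_max_size,
				    unsigned int net_header_len,
				    unsigned int ext_hdr_len,
				    unsigned int tcp_header_len)
{
	return (gso_max_size - 1) - net_header_len - ext_hdr_len -
	       tcp_header_len;
}

/* Goal as computed by the added line: subtract one fixed worst-case reserve. */
static unsigned int goal_fixed_reserve(unsigned int gso_max_size,
				       unsigned int max_tcp_header)
{
	return gso_max_size - 1 - max_tcp_header;
}
```

Under these assumptions, 20 bytes of IPv4 header plus 32 bytes of TCP header give a goal of 65483 bytes with the old formula, while the fixed reserve gives 65215, so that payload plus reserve adds up to exactly 65535.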

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-14 16:22       ` Eric Dumazet
@ 2013-02-14 18:50         ` Thomas Graf
  2013-02-14 19:21           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2013-02-14 18:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 02/14/13 at 08:22am, Eric Dumazet wrote:
> It does not seem possible to avoid bugs, be it a BUG_ON(), an out-of-bounds
> memory access, or whatever. We must fix them eventually.

Of course, I never intended to not fix this but I still think
leaving the BUG_ON() is wrong ;-)

> In this case, it seems we must limit payload to 
> 
> 65535 - MAX_TCP_HEADER 
> 
> It would make tcp_xmit_size_goal() a bit shorter.
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 2c7e596..2f6c8e5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>  	xmit_size_goal = mss_now;
>  
>  	if (large_allowed && sk_can_gso(sk)) {
> -		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
> -				  inet_csk(sk)->icsk_af_ops->net_header_len -
> -				  inet_csk(sk)->icsk_ext_hdr_len -
> -				  tp->tcp_header_len);
> +		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
>  
>  		/* TSQ : try to have two TSO segments in flight */
>  		xmit_size_goal = min_t(u32, xmit_size_goal,

I don't think this would help. The allocated skb data will still
exceed 64K, and thus, after trimming the acked data and collapsing,
the header might be stored after the 64K mark.

We would have to limit the skb tailroom to 64K upon allocation.
That would mean we would waste some of the additional space that
kmalloc() might have given us:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 32443eb..c8f9850 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -241,6 +241,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mas
         * to allow max possible filling before reallocation.
         */
        size = SKB_WITH_OVERHEAD(ksize(data));
+       /* ensure that all offsets based on skb->head fit into 16bits */
+       size = min_t(int, size, 65535);
        prefetchw(data + size);
 
        /*


Or if that is not ideal, avoiding the collapse that causes the overflow
would also help:


diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5d45159..e9111b4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2301,6 +2301,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
                if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
                        break;
 
+               /* Never collapse if the resulting headroom + data exceeds
+                * 64K as that is the maximum csum_start can cover.
+                */
+               if (skb_headroom(to) + to->len + skb->len > 65535)
+                       break;
+
                tcp_collapse_retrans(sk, to);
        }
 }
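
The interaction between trimming and collapsing, and the effect of the guard in the last hunk, can be modeled in userspace. This is only a sketch under the assumption stated in the thread that trimming acked data moves the data start forward and so grows the headroom; `struct seg_model` and the three functions are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical model of one queued segment. */
struct seg_model {
	size_t headroom; /* bytes between buffer start and first data byte */
	size_t len;      /* bytes of data still queued */
};

/* Partial ACK: acked bytes leave the front, so headroom grows by that much. */
static void trim_head(struct seg_model *s, size_t acked)
{
	if (acked > s->len)
		acked = s->len;
	s->headroom += acked;
	s->len -= acked;
}

/* Same condition as the proposed hunk, expressed as "is it safe". */
static int may_collapse(const struct seg_model *to,
			const struct seg_model *next)
{
	return to->headroom + to->len + next->len <= 65535;
}

/* Appends next's data to 'to' if the guard allows it; returns 1 on success. */
static int try_collapse(struct seg_model *to, const struct seg_model *next)
{
	if (!may_collapse(to, next))
		return 0;
	to->len += next->len;
	return 1;
}
```

Starting from a segment with 100 bytes of headroom and 60000 bytes of data, a partial ACK of 30000 bytes leaves 30100 bytes of headroom. Collapsing a 5000-byte segment is still allowed (65100 in total), while a further 1000-byte segment would reach 66100 and is refused.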

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-14 18:50         ` Thomas Graf
@ 2013-02-14 19:21           ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-02-14 19:21 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, netdev

On Thu, 2013-02-14 at 18:50 +0000, Thomas Graf wrote:
> On 02/14/13 at 08:22am, Eric Dumazet wrote:
> > It does not seem possible to avoid bugs, be it a BUG_ON(), an out-of-bounds
> > memory access, or whatever. We must fix them eventually.
> 
> Of course, I never intended to not fix this but I still think
> leaving the BUG_ON() is wrong ;-)
> 
> > In this case, it seems we must limit payload to 
> > 
> > 65535 - MAX_TCP_HEADER 
> > 
> > It would make tcp_xmit_size_goal() a bit shorter.
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 2c7e596..2f6c8e5 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
> >  	xmit_size_goal = mss_now;
> >  
> >  	if (large_allowed && sk_can_gso(sk)) {
> > -		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
> > -				  inet_csk(sk)->icsk_af_ops->net_header_len -
> > -				  inet_csk(sk)->icsk_ext_hdr_len -
> > -				  tp->tcp_header_len);
> > +		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
> >  
> >  		/* TSQ : try to have two TSO segments in flight */
> >  		xmit_size_goal = min_t(u32, xmit_size_goal,
> 
> I don't think this would help. The allocated skb data will still
> exceed 64K, and thus, after trimming the acked data and collapsing,
> the header might be stored after the 64K mark.
> 

OK, I now understand your issue.

> We would have to limit the skb tailroom to 64K upon allocation.
> That would mean we would waste some of the additional space that
> kmalloc() might have given us:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 32443eb..c8f9850 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -241,6 +241,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mas
>          * to allow max possible filling before reallocation.
>          */
>         size = SKB_WITH_OVERHEAD(ksize(data));
> +       /* ensure that all offsets based on skb->head fit into 16bits */
> +       size = min_t(int, size, 65535);
>         prefetchw(data + size);
>  

This part is certainly not good. 

alloc_skb() is generic and some users really want more than 64K

>         /*
> 
> 
> Or if that is not ideal, avoiding the collapse that causes the overflow
> would also help:
> 
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5d45159..e9111b4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2301,6 +2301,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>                 if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
>                         break;
>  
> +               /* Never collapse if the resulting headroom + data exceeds
> +                * 64K as that is the maximum csum_start can cover.
> +                */
> +               if (skb_headroom(to) + to->len + skb->len > 65535)
> +                       break;
> +
>                 tcp_collapse_retrans(sk, to);
>    

Definitely better.

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
  2013-02-14 10:18     ` Thomas Graf
  2013-02-14 16:22       ` Eric Dumazet
@ 2013-02-14 18:00       ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2013-02-14 18:00 UTC (permalink / raw)
  To: tgraf; +Cc: netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 14 Feb 2013 10:18:53 +0000

> On 02/13/13 at 07:37pm, David Miller wrote:
> Maybe it's my general dislike of BUG_ON() in the processing
> path, especially if the bug condition can be influenced remotely.
> It looks absolutely doable to trigger the previously mentioned
> partial acking & collapsing on purpose by a malicious receiver
> even with an MTU of 1500. I believe we should avoid total DoS
> in future similar situations that we don't think of yet.

I heard that people can very effectively protect themselves from DoS's
by not using Infiniband. :-)
