* [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Thomas Graf @ 2013-02-13 23:40 UTC
To: davem; +Cc: netdev

skb_checksum_help() verifies the integrity of skb->csum_start and
skb->csum_offset with BUG_ON()s.

They have been hit with IPoIB which uses a 64K MTU. If a TCP
retransmission gets partially ACKed and collapsed multiple times
it is possible for the headroom to grow beyond 64K which will
overflow the 16bit skb->csum_start. This in turn will trigger
the BUG_ON() in skb_checksum_help().

Convert these to WARN_ON() and drop the packet.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/core/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f64e439..629d22e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2047,11 +2047,14 @@ int skb_checksum_help(struct sk_buff *skb)
 	}
 
 	offset = skb_checksum_start_offset(skb);
-	BUG_ON(offset >= skb_headlen(skb));
+	if (WARN_ON(offset >= skb_headlen(skb)))
+		return -ERANGE;
+
 	csum = skb_checksum(skb, offset, skb->len - offset, 0);
 
 	offset += skb->csum_offset;
-	BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
+	if (WARN_ON(offset + sizeof(__sum16) > skb_headlen(skb)))
+		return -ERANGE;
 
 	if (skb_cloned(skb) &&
 	    !skb_clone_writable(skb, offset + sizeof(__sum16))) {

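[Editorial illustration, not part of the original posting.] The overflow described in the patch can be reproduced in plain userspace C. The sketch below is a simplified model, not kernel code: `struct csum_fields`, `set_csum_start()`, `csum_start_intact()`, `checksum_start_offset()` and `overflow_example()` are hypothetical names, and the assumption that the start offset is derived as "stored csum_start minus headroom" is a simplification of what skb_checksum_start_offset() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields discussed in the patch; the
 * real struct sk_buff has many more members. */
struct csum_fields {
	uint16_t csum_start;	/* offset from buffer head to checksum start */
	uint16_t csum_offset;	/* offset from csum_start to the checksum field */
};

/* Store the offset the way a 16-bit field does: values above 65535 wrap. */
static void set_csum_start(struct csum_fields *f, uint32_t headroom,
			   uint32_t transport_off)
{
	f->csum_start = (uint16_t)(headroom + transport_off);
}

/* Returns 1 if the true offset survived the store, 0 if it wrapped. */
static int csum_start_intact(const struct csum_fields *f, uint32_t headroom,
			     uint32_t transport_off)
{
	return f->csum_start == headroom + transport_off;
}

/* Assumed model of the derived start offset: stored value minus headroom.
 * After a wrap this goes negative, which is what the range check catches. */
static int checksum_start_offset(const struct csum_fields *f, uint32_t headroom)
{
	return (int)f->csum_start - (int)headroom;
}

/* Self-check: a small headroom is stored intact, one above 64K wraps. */
void overflow_example(void)
{
	struct csum_fields f = { 0, 0 };

	set_csum_start(&f, 100, 20);
	assert(csum_start_intact(&f, 100, 20));
	assert(checksum_start_offset(&f, 100) == 20);

	set_csum_start(&f, 65600, 20);		/* true offset: 65620 */
	assert(!csum_start_intact(&f, 65600, 20));
	assert(checksum_start_offset(&f, 65600) < 0);
}
```

With a headroom of 65600 and a transport offset of 20, the true offset 65620 no longer fits in 16 bits, so the derived start offset becomes negative and fails the range check against the linear data length.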
* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Thomas Graf @ 2013-02-13 23:48 UTC
To: davem; +Cc: netdev

On 02/13/13 at 11:40pm, Thomas Graf wrote:
> They have been hit with IPoIB which uses a 64K MTU. If a TCP
> retransmission gets partially ACKed and collapsed multiple times
> it is possible for the headroom to grow beyond 64K which will
> overflow the 16bit skb->csum_start.

On the subject of fixing this, I considered:

 a) Reallocate the headroom in tcp_trim_head() if it would
    overflow. I disregarded this idea because replacing the
    old skb with the new skb on the write queue for such a
    rare situation seems overkill.

 b) No longer collapse if the new skb would result in a
    headroom + data that exceeds 64K. This seems to be the
    most trivial fix.

 c) Increase size of csum_start or store checksum_start_offset
    differently.

Other ideas?

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: David Miller @ 2013-02-14 0:37 UTC
To: tgraf; +Cc: netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 13 Feb 2013 23:48:43 +0000

> On 02/13/13 at 11:40pm, Thomas Graf wrote:
>> They have been hit with IPoIB which uses a 64K MTU. If a TCP
>> retransmission gets partially ACKed and collapsed multiple times
>> it is possible for the headroom to grow beyond 64K which will
>> overflow the 16bit skb->csum_start.
>
> On the subject of fixing this, I considered:
>
>  a) Reallocate the headroom in tcp_trim_head() if it would
>     overflow. I disregarded this idea because replacing the
>     old skb with the new skb on the write queue for such a
>     rare situation seems overkill.
>
>  b) No longer collapse if the new skb would result in a
>     headroom + data that exceeds 64K. This seems to be the
>     most trivial fix.
>
>  c) Increase size of csum_start or store checksum_start_offset
>     differently.
>
> Other ideas?

"b" is a good idea.

Let's not paper over this, this BUG_ON() is really a BUG_ON()
meaning "FIX ME NOW" :-)

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Thomas Graf @ 2013-02-14 10:18 UTC
To: David Miller; +Cc: netdev

On 02/13/13 at 07:37pm, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Wed, 13 Feb 2013 23:48:43 +0000
[...]
> >  b) No longer collapse if the new skb would result in a
> >     headroom + data that exceeds 64K. This seems to be the
> >     most trivial fix.
[...]
> > Other ideas?
>
> "b" is a good idea.

OK, patch to do so being tested by original reporter.

> Let's not paper over this, this BUG_ON() is really a BUG_ON()
> meaning "FIX ME NOW" :-)

Maybe it's my general dislike of BUG_ON() in the processing
path, especially if the bug condition can be influenced remotely.
It looks absolutely doable to trigger the previously mentioned
partial acking & collapsing on purpose by a malicious receiver
even with an MTU of 1500. I believe we should avoid total DoS
in future similar situations that we don't think of yet.

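[Editorial illustration, not part of the original thread.] The warn-and-drop pattern argued for here can be sketched in userspace C. This is a rough model under stated assumptions: `MODEL_WARN_ON()`, `warn_on()`, `model_checksum_help()` and `warn_and_drop_example()` are hypothetical names, the warning only prints to stderr (the kernel's WARN_ON() also dumps a stack trace), and the checksum field is assumed to be two bytes wide.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for WARN_ON(): report the failed
 * condition and return its truth value so it can sit inside an if (). */
static int warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define MODEL_WARN_ON(cond) warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Simplified model of the two range checks: on a bad offset, warn and
 * return -ERANGE so the caller can drop the packet instead of halting. */
static int model_checksum_help(int offset, uint16_t csum_offset,
			       unsigned int headlen)
{
	if (MODEL_WARN_ON(offset < 0 || (unsigned int)offset >= headlen))
		return -ERANGE;

	offset += csum_offset;
	if (MODEL_WARN_ON((unsigned int)offset + sizeof(uint16_t) > headlen))
		return -ERANGE;

	return 0;	/* real code would compute and store the checksum here */
}

/* Self-check: valid offsets pass, corrupted ones are rejected, and the
 * program keeps running in every case. */
void warn_and_drop_example(void)
{
	assert(model_checksum_help(20, 16, 100) == 0);
	assert(model_checksum_help(-65516, 16, 100) == -ERANGE);
	assert(model_checksum_help(99, 16, 100) == -ERANGE);
}
```

The design point is that a corrupted offset costs one packet and a log entry rather than the whole machine. As the replies make clear, this complements fixing the root cause and does not replace it.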
* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Eric Dumazet @ 2013-02-14 16:22 UTC
To: Thomas Graf; +Cc: David Miller, netdev

On Thu, 2013-02-14 at 10:18 +0000, Thomas Graf wrote:

> Maybe it's my general dislike of BUG_ON() in the processing
> path, especially if the bug condition can be influenced remotely.
> It looks absolutely doable to trigger the previously mentioned
> partial acking & collapsing on purpose by a malicious receiver
> even with an MTU of 1500. I believe we should avoid total DoS
> in future similar situations that we don't think of yet.

It seems not possible to avoid bugs, being a BUG_ON() or an out of bound
memory access or whatever. We must fix them eventually.

In this case, it seems we must limit payload to

65535 - MAX_TCP_HEADER

It would make tcp_xmit_size_goal() a bit shorter.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2c7e596..2f6c8e5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 	xmit_size_goal = mss_now;
 
 	if (large_allowed && sk_can_gso(sk)) {
-		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
-				  inet_csk(sk)->icsk_af_ops->net_header_len -
-				  inet_csk(sk)->icsk_ext_hdr_len -
-				  tp->tcp_header_len);
+		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
 
 		/* TSQ : try to have two TSO segments in flight */
 		xmit_size_goal = min_t(u32, xmit_size_goal,

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Thomas Graf @ 2013-02-14 18:50 UTC
To: Eric Dumazet; +Cc: David Miller, netdev

On 02/14/13 at 08:22am, Eric Dumazet wrote:
> It seems not possible to avoid bugs, being a BUG_ON() or an out of bound
> memory access or whatever. We must fix them eventually.

Of course, I never intended to not fix this but I still think
leaving the BUG_ON() is wrong ;-)

> In this case, it seems we must limit payload to
>
> 65535 - MAX_TCP_HEADER
>
> It would make tcp_xmit_size_goal() a bit shorter.
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 2c7e596..2f6c8e5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
>  	xmit_size_goal = mss_now;
>
>  	if (large_allowed && sk_can_gso(sk)) {
> -		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
> -				  inet_csk(sk)->icsk_af_ops->net_header_len -
> -				  inet_csk(sk)->icsk_ext_hdr_len -
> -				  tp->tcp_header_len);
> +		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
>
>  		/* TSQ : try to have two TSO segments in flight */
>  		xmit_size_goal = min_t(u32, xmit_size_goal,

I don't think this would help. The allocated skb data will still
exceed 64K and thus after trimming the acked data and collapsing
the header might be stored after the 64K mark.

We would have to limit the skb tailroom to 64K upon allocation.
That would mean we would waste some of the additional space that
kmalloc() might have given us:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 32443eb..c8f9850 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -241,6 +241,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mas
 	 * to allow max possible filling before reallocation.
 	 */
 	size = SKB_WITH_OVERHEAD(ksize(data));
+	/* ensure that all offsets based on skb->head fit into 16bits */
+	size = min_t(int, size, 65535);
 	prefetchw(data + size);
 
 	/*


Or if that is not ideal, avoiding the collapse that causes the overflow
would also help:


diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5d45159..e9111b4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2301,6 +2301,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
 			break;
 
+		/* Never collapse if the resulting headroom + data exceeds
+		 * 64K as that is the maximum csum_start can cover.
+		 */
+		if (skb_headroom(to) + to->len + skb->len > 65535)
+			break;
+
 		tcp_collapse_retrans(sk, to);
 	}
 }

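[Editorial illustration, not part of the original thread.] The interaction between trimming, collapsing, and the proposed guard can be followed with a small userspace model. Everything here is an assumption made for illustration: `struct model_skb`, `model_trim_head()`, `collapse_would_overflow()`, `model_try_collapse()` and `collapse_guard_example()` are hypothetical names, the skb is treated as purely linear, and the 128K buffer size and 256-byte initial headroom are made-up numbers. Only the guard condition itself mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CSUM_START_MAX 65535u	/* largest value a 16-bit csum_start holds */

/* Hypothetical, heavily simplified linear skb: just the bookkeeping
 * needed to follow headroom and data length. */
struct model_skb {
	uint32_t headroom;	/* bytes between buffer head and start of data */
	uint32_t len;		/* bytes of data */
	uint32_t size;		/* total usable buffer size */
};

/* Partial ACK: acked bytes are pulled off the front, so headroom grows. */
static void model_trim_head(struct model_skb *skb, uint32_t acked)
{
	assert(acked <= skb->len);
	skb->headroom += acked;
	skb->len -= acked;
}

/* Same condition as the guard proposed in the patch. */
static int collapse_would_overflow(const struct model_skb *to,
				   const struct model_skb *next)
{
	return to->headroom + to->len + next->len > CSUM_START_MAX;
}

/* Append next's data to 'to' if it fits in the tailroom and the guard
 * allows it. Returns 1 if the collapse happened, 0 otherwise. */
static int model_try_collapse(struct model_skb *to,
			      const struct model_skb *next)
{
	uint32_t tailroom = to->size - to->headroom - to->len;

	if (next->len > tailroom || collapse_would_overflow(to, next))
		return 0;
	to->len += next->len;
	return 1;
}

/* Self-check: the first collapse is allowed, the second is refused even
 * though the buffer still has room for it. */
void collapse_guard_example(void)
{
	struct model_skb to = { 256, 30000, 131072 };
	struct model_skb next = { 256, 30000, 131072 };

	model_trim_head(&to, 20000);			/* headroom 20256 */
	assert(model_try_collapse(&to, &next) == 1);	/* len 40000 */
	model_trim_head(&to, 30000);			/* headroom 50256 */
	assert(model_try_collapse(&to, &next) == 0);	/* guard refuses */
	assert(to.headroom + to.len <= CSUM_START_MAX);
}
```

In this model the second collapse would fit in the remaining tailroom, so without the guard a later partial ACK could push the start of data past the 64K mark. With the guard, headroom plus data never exceeds 65535.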
* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: Eric Dumazet @ 2013-02-14 19:21 UTC
To: Thomas Graf; +Cc: David Miller, netdev

On Thu, 2013-02-14 at 18:50 +0000, Thomas Graf wrote:
> On 02/14/13 at 08:22am, Eric Dumazet wrote:
> > It seems not possible to avoid bugs, being a BUG_ON() or an out of bound
> > memory access or whatever. We must fix them eventually.
>
> Of course, I never intended to not fix this but I still think
> leaving the BUG_ON() is wrong ;-)
>
> > In this case, it seems we must limit payload to
> >
> > 65535 - MAX_TCP_HEADER
> >
> > It would make tcp_xmit_size_goal() a bit shorter.
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 2c7e596..2f6c8e5 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -793,10 +793,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
> >  	xmit_size_goal = mss_now;
> >
> >  	if (large_allowed && sk_can_gso(sk)) {
> > -		xmit_size_goal = ((sk->sk_gso_max_size - 1) -
> > -				  inet_csk(sk)->icsk_af_ops->net_header_len -
> > -				  inet_csk(sk)->icsk_ext_hdr_len -
> > -				  tp->tcp_header_len);
> > +		xmit_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
> >
> >  		/* TSQ : try to have two TSO segments in flight */
> >  		xmit_size_goal = min_t(u32, xmit_size_goal,
>
> I don't think this would help. The allocated skb data will still
> exceed 64K and thus after trimming the acked data and collapsing
> the header might be stored after the 64K mark.
>

OK, I now understand your issue.

> We would have to limit the skb tailroom to 64K upon allocation.
> That would mean we would waste some of the additional space that
> kmalloc() might have given us:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 32443eb..c8f9850 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -241,6 +241,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mas
>  	 * to allow max possible filling before reallocation.
>  	 */
>  	size = SKB_WITH_OVERHEAD(ksize(data));
> +	/* ensure that all offsets based on skb->head fit into 16bits */
> +	size = min_t(int, size, 65535);
>  	prefetchw(data + size);
>

This part is certainly not good.

alloc_skb() is generic and some users really want more than 64K

>  	/*
>
>
> Or if that is not ideal, avoiding the collapse that causes the overflow
> would also help:
>
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5d45159..e9111b4 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2301,6 +2301,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>  		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
>  			break;
>
> +		/* Never collapse if the resulting headroom + data exceeds
> +		 * 64K as that is the maximum csum_start can cover.
> +		 */
> +		if (skb_headroom(to) + to->len + skb->len > 65535)
> +			break;
> +
>  		tcp_collapse_retrans(sk, to);
>

Definitely better.

* Re: [PATCH] net: Convert skb->csum_(start|offset) integrity BUG_ON() to WARN_ON() & drop
From: David Miller @ 2013-02-14 18:00 UTC
To: tgraf; +Cc: netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 14 Feb 2013 10:18:53 +0000

> On 02/13/13 at 07:37pm, David Miller wrote:
> Maybe it's my general dislike of BUG_ON() in the processing
> path, especially if the bug condition can be influenced remotely.
> It looks absolutely doable to trigger the previously mentioned
> partial acking & collapsing on purpose by a malicious receiver
> even with an MTU of 1500. I believe we should avoid total DoS
> in future similar situations that we don't think of yet.

I heard that people can very effectively protect themselves from
DoS's by not using Infiniband. :-)