* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops [not found] <CA+0Zf5Be5JUSvssH9CrDQq=jtETZr=veAPC7jriM8B-FZKbbpg@mail.gmail.com> @ 2011-10-14 14:57 ` Eric Dumazet 2011-10-17 7:16 ` Elmar Vonlanthen 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-14 14:57 UTC (permalink / raw) To: Elmar Vonlanthen; +Cc: linux-kernel, netdev, Timo Teräs, Herbert Xu Le vendredi 14 octobre 2011 à 10:08 +0200, Elmar Vonlanthen a écrit : > Hello again > > Kernel 3.0.6 is affected as well. > > 2011-10-14 09:50:46 chgut11fw01 kernel: skb_over_panic: text:c13059c9 > len:64 put:64 head:c730c400 data:c730c450 tail:0xc730c490 > end:0xc730c480 dev:<NULL> > 2011-10-14 09:50:46 chgut11fw01 kernel: ------------[ cut here ]------------ > 2011-10-14 09:50:46 chgut11fw01 kernel: kernel BUG at net/core/skbuff.c:128! > 2011-10-14 09:50:46 chgut11fw01 kernel: invalid opcode: 0000 [#1] SMP > 2011-10-14 09:50:46 chgut11fw01 kernel: Modules linked in: ip_gre gre > authenc xfrm4_mode_transport deflate zlib_deflate ctr twofish_generic > twofish_common serpent cryptd aes_i586 aes_generic blowfish cast5 cbc > ecb rmd160 sha512_generic sha256_generic sha1_generic xfrm_user > xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key tun ipt_LOG > xt_limit ipt_REJECT xt_state ipt_REDIRECT ipt_MASQUERADE xt_policy > xt_TCPMSS xt_tcpmss xt_tcpudp xt_NOTRACK iptable_filter iptable_nat > xt_mark xt_connmark iptable_mangle iptable_raw ip_tables x_tables > nf_conntrack_tftp nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 > nf_conntrack_ftp nf_conntrack rtc ppdev parport_pc parport w83792d > i2c_dev i2c_i801 i2c_core pl2303 usbserial coretemp hwmon usbhid > ohci_hcd uhci_hcd ehci_hcd usbcore e1000 e1000e aufs ata_piix libata > 2011-10-14 09:50:46 chgut11fw01 kernel: > 2011-10-14 09:50:46 chgut11fw01 kernel: Pid: 6299, comm: ospfd Not > tainted 3.0.6-SMP #1 /LakePort > 2011-10-14 09:50:46 chgut11fw01 kernel: EIP: 0060:[<c12b6815>] EFLAGS: > 00010292 CPU: 0 > 2011-10-14 09:50:46 chgut11fw01 kernel: EIP is at skb_put+0x85/0x90 > 2011-10-14 09:50:46 chgut11fw01 kernel: EAX: 00000078 EBX: c13059c9 > ECX: 00000096 EDX: ffffff8b > 2011-10-14 09:50:46 chgut11fw01 kernel: ESI: deac4a80 EDI: 00000040 > EBP: d8979c80 ESP: d8979c58 > 2011-10-14 09:50:46 chgut11fw01 kernel: DS: 007b ES: 007b FS: 00d8 GS: > 0033 SS: 0068 > 2011-10-14 09:50:46 chgut11fw01 kernel: Process ospfd (pid: 6299, > ti=d8978000 task=de94ee80 task.ti=d8978000) > 2011-10-14 09:50:46 chgut11fw01 kernel: Stack: > 2011-10-14 09:50:46 chgut11fw01 kernel: c13fd194 c13059c9 00000040 > 00000040 c730c400 c730c450 c730c490 c730c480 > 2011-10-14 09:50:46 chgut11fw01 kernel: c13fb21c d897da40 d8979d38 > c13059c9 d8979d24 8db04700 00000001 00000001 > 2011-10-14 09:50:46 chgut11fw01 kernel: d8979ce4 d8979cc8 c121ce7d > 00000000 000012b5 d8979ecc dbff0c00 c730c450 > 2011-10-14 09:50:46 chgut11fw01 kernel: Call Trace: > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c13059c9>] ? raw_sendmsg+0x5a9/0x850 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c13059c9>] raw_sendmsg+0x5a9/0x850 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c121ce7d>] ? extract_entropy+0x5d/0x70 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c1027f73>] ? > try_to_wake_up+0x173/0x1f0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b2700>] ? release_sock+0xb0/0xd0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c130f222>] inet_sendmsg+0x42/0xa0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c1021407>] ? > __wake_up_sync_key+0x47/0x60 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12afc07>] sock_sendmsg+0xa7/0xd0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b29f3>] ? > sock_def_readable+0x33/0x60 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12afc07>] ? sock_sendmsg+0xa7/0xd0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b8e83>] ? verify_iovec+0x53/0xb0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b08e4>] __sys_sendmsg+0x2d4/0x2e0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12afc9e>] ? > sockfd_lookup_light+0x1e/0x70 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b02fa>] ? sys_sendto+0xaa/0xe0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c102f928>] ? nsecs_to_jiffies+0x8/0x10 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12eaf11>] ? ip_setsockopt+0x41/0xa0 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b0a36>] sys_sendmsg+0x36/0x60 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c12b1669>] sys_socketcall+0xe9/0x280 > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c133d8c5>] syscall_call+0x7/0xb > 2011-10-14 09:50:46 chgut11fw01 kernel: [<c1330000>] ? > packet_recvmsg+0x170/0x440 > 2011-10-14 09:50:46 chgut11fw01 kernel: Code: 00 00 89 4c 24 14 8b 88 > a4 00 00 00 89 54 24 0c 89 4c 24 10 8b 40 50 89 5c 24 04 c7 04 24 94 > d1 3f c1 89 44 24 08 e8 b4 4b 08 00 <0f> 0b eb fe b9 1c b2 3f c1 eb ae > 55 89 e5 57 56 89 d6 53 89 c3 > 2011-10-14 09:50:46 chgut11fw01 kernel: EIP: [<c12b6815>] > skb_put+0x85/0x90 SS:ESP 0068:d8979c58 > 2011-10-14 09:50:46 chgut11fw01 kernel: ---[ end trace ff341104610beeed ]--- > Could anyone give me a hint, how I can furher proceed to find the error? > Thanks. Please try following patch : [PATCH] ip_gre: dont increase dev->needed_headroom on a live device It seems ip_gre is able to change dev->needed_headroom on the fly. Its is not legal unfortunately and triggers a BUG in raw_sendmsg() skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) < another cpu change dev->needed_headromm (making it bigger) ... skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() -> we crash later because skb head is exhausted. Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route header_len in max_headroom calculation) Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Timo Teräs <timo.teras@iki.fi> CC: Herbert Xu <herbert@gondor.apana.org.au> --- diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 8871067..1505dcf 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); - if (max_headroom > dev->needed_headroom) - dev->needed_headroom = max_headroom; if (!new_skb) { ip_rt_put(rt); dev->stats.tx_dropped++; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-14 14:57 ` PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops Eric Dumazet @ 2011-10-17 7:16 ` Elmar Vonlanthen 2011-10-18 2:30 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Elmar Vonlanthen @ 2011-10-17 7:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: linux-kernel, netdev, Timo Teräs, Herbert Xu 2011/10/14 Eric Dumazet <eric.dumazet@gmail.com>: > Please try following patch : > > [PATCH] ip_gre: dont increase dev->needed_headroom on a live device > > It seems ip_gre is able to change dev->needed_headroom on the fly. > > Its is not legal unfortunately and triggers a BUG in raw_sendmsg() > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > < another cpu change dev->needed_headromm (making it bigger) > > ... > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); > > We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() > -> we crash later because skb head is exhausted. > > Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route > header_len in max_headroom calculation) > > Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Timo Teräs <timo.teras@iki.fi> > CC: Herbert Xu <herbert@gondor.apana.org.au> > --- > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 8871067..1505dcf 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev > if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| > (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { > struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); > - if (max_headroom > dev->needed_headroom) > - dev->needed_headroom = max_headroom; > if (!new_skb) { > ip_rt_put(rt); > dev->stats.tx_dropped++; Hello I tried this patch and I was not able anymore to reproduce the kernel oops. So the patch solved the bug. Thank you very much! Would it be possible to add the patch to the long term kernel 2.6.35 as well? Because this is the one I use at the moment in production. And sorry for posting to the wrong mailing list (linux-kernel). Best regards Elmar ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-17 7:16 ` Elmar Vonlanthen @ 2011-10-18 2:30 ` Eric Dumazet 2011-10-18 9:34 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-18 2:30 UTC (permalink / raw) To: Elmar Vonlanthen; +Cc: linux-kernel, netdev, Timo Teräs, Herbert Xu Le lundi 17 octobre 2011 à 09:16 +0200, Elmar Vonlanthen a écrit : > 2011/10/14 Eric Dumazet <eric.dumazet@gmail.com>: > > Please try following patch : > > > > [PATCH] ip_gre: dont increase dev->needed_headroom on a live device > > > > It seems ip_gre is able to change dev->needed_headroom on the fly. > > > > Its is not legal unfortunately and triggers a BUG in raw_sendmsg() > > > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > > > < another cpu change dev->needed_headromm (making it bigger) > > > > ... > > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); > > > > We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() > > -> we crash later because skb head is exhausted. > > > > Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route > > header_len in max_headroom calculation) > > > > Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: Timo Teräs <timo.teras@iki.fi> > > CC: Herbert Xu <herbert@gondor.apana.org.au> > > --- > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > index 8871067..1505dcf 100644 > > --- a/net/ipv4/ip_gre.c > > +++ b/net/ipv4/ip_gre.c > > @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev > > if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| > > (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { > > struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); > > - if (max_headroom > dev->needed_headroom) > > - dev->needed_headroom = max_headroom; > > if (!new_skb) { > > ip_rt_put(rt); > > dev->stats.tx_dropped++; > > Hello > > I tried this patch and I was not able anymore to reproduce the kernel > oops. So the patch solved the bug. > Thank you very much! > > Would it be possible to add the patch to the long term kernel 2.6.35 > as well? Because this is the one I use at the moment in production. > Thanks for testing. If David/Herbert/Timo agree, then patch should find its way into current kernel, then to stable trees as well. Thanks ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 2:30 ` Eric Dumazet @ 2011-10-18 9:34 ` Herbert Xu 2011-10-18 10:01 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-18 9:34 UTC (permalink / raw) To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs On Tue, Oct 18, 2011 at 04:30:32AM +0200, Eric Dumazet wrote: > > If David/Herbert/Timo agree, then patch should find its way into current > kernel, then to stable trees as well. Actually, I think we should instead fix the users of needed_headroom to not read it twice which is causing problems here. GRE tunnels by their nature do not have a fixed value for needed_headroom. As the underlying routes change the necessary headroom may need to be adjusted due to further encapsulation such as IPsec. Keeping it constant from tunnel creation may result in suboptimal performance due to unnecessary header reallocations. However, until we audit the stack to see if there are further instances of double-readings such as the one causing the crash here, I'm fine with your patch making it constant. Once we're sure that all of the double-readings are gone we can revert to a dynamic needed_headroom. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 9:34 ` Herbert Xu @ 2011-10-18 10:01 ` Eric Dumazet 2011-10-18 10:05 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-18 10:01 UTC (permalink / raw) To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs Le mardi 18 octobre 2011 à 11:34 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 04:30:32AM +0200, Eric Dumazet wrote: > > > > If David/Herbert/Timo agree, then patch should find its way into current > > kernel, then to stable trees as well. > > Actually, I think we should instead fix the users of needed_headroom > to not read it twice which is causing problems here. > > GRE tunnels by their nature do not have a fixed value for > needed_headroom. As the underlying routes change the necessary > headroom may need to be adjusted due to further encapsulation such > as IPsec. > > Keeping it constant from tunnel creation may result in suboptimal > performance due to unnecessary header reallocations. > > However, until we audit the stack to see if there are further > instances of double-readings such as the one causing the crash > here, I'm fine with your patch making it constant. > > Once we're sure that all of the double-readings are gone we > can revert to a dynamic needed_headroom. > Sure, we can work on this path for future kernels. Adding an RCU protected structure to hold hard_header_len / needed_headroom / needed_tailroom should be possible, but this adds yet another pointer dereference... Thanks ! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 10:01 ` Eric Dumazet @ 2011-10-18 10:05 ` Herbert Xu 2011-10-18 10:23 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-18 10:05 UTC (permalink / raw) To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs On Tue, Oct 18, 2011 at 12:01:33PM +0200, Eric Dumazet wrote: > > Adding an RCU protected structure to hold hard_header_len / > needed_headroom / needed_tailroom should be possible, but this adds yet > another pointer dereference... I don't think we need RCU here since the problem is simply that we're using two different values for skb allocations and skb_reserve. As long as we use one and the same value it should work. The value will rarely be incorrect and when it is, automatic reallocation will occur. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 10:05 ` Herbert Xu @ 2011-10-18 10:23 ` Eric Dumazet 2011-10-18 10:45 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-18 10:23 UTC (permalink / raw) To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs Le mardi 18 octobre 2011 à 12:05 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 12:01:33PM +0200, Eric Dumazet wrote: > > > > Adding an RCU protected structure to hold hard_header_len / > > needed_headroom / needed_tailroom should be possible, but this adds yet > > another pointer dereference... > > I don't think we need RCU here since the problem is simply that > we're using two different values for skb allocations and skb_reserve. > > As long as we use one and the same value it should work. The value > will rarely be incorrect and when it is, automatic reallocation will > occur. > You're right, if reallocations are OK in all paths. We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead of a [read once] pointer to values. Thats a bit complex change, but doable. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 10:23 ` Eric Dumazet @ 2011-10-18 10:45 ` Herbert Xu 2011-10-18 11:37 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-18 10:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote: > > You're right, if reallocations are OK in all paths. If it wasn't OK then making needed_headroom constant won't work anyway. > We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / > LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead > of a [read once] pointer to values. I'm not sure what you mean here. I don't see any need to change these macros. All we need is to save the value in a local variable: hh_len = LL_RESERVED_SPACE(dev); skb = alloc_skb(hh_len + len); skb_reserve(skb, hh_len); Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 10:45 ` Herbert Xu @ 2011-10-18 11:37 ` Eric Dumazet 2011-10-18 11:49 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-18 11:37 UTC (permalink / raw) To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs Le mardi 18 octobre 2011 à 12:45 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote: > > > > You're right, if reallocations are OK in all paths. > > If it wasn't OK then making needed_headroom constant won't work > anyway. > > > We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / > > LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead > > of a [read once] pointer to values. > > I'm not sure what you mean here. I don't see any need to change > these macros. All we need is to save the value in a local variable: > > hh_len = LL_RESERVED_SPACE(dev); > > skb = alloc_skb(hh_len + len); > skb_reserve(skb, hh_len); > Not really Herbert. Please read again my patch changelog. In the bug we try to fix, we have : skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) ... < increase of dev->needed_headroom by another cpu/task > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb_put() -> crash because we reserved too much space So we really want LL_ALLOCATED_SPACE() and LL_RESERVED_SPACE() use the same needed_headroom, or else you can have LL_RESERVED_SPACE() > LL_ALLOCATED_SPACE(). There are several way to fix this, but this kind of code assumed the dev->needed... values were consistent for the whole block. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 11:37 ` Eric Dumazet @ 2011-10-18 11:49 ` Herbert Xu 2011-10-18 12:56 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-18 11:49 UTC (permalink / raw) To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs On Tue, Oct 18, 2011 at 01:37:58PM +0200, Eric Dumazet wrote: > > In the bug we try to fix, we have : > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > ... < increase of dev->needed_headroom by another cpu/task > > > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); OK, in that case one fix would be to replace LL_ALLOCATED_SPACE with its two constiuents so that they may be stored in local variables for later use. hlen = LL_HEADROOM(skb); tlen = LL_TAILROOM(skb); skb_alloc_send_skb(sk, ... + LL_ALIGN(hlen + tlen)); skb_reserve(skb, LL_ALIGN(hlen)); Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 11:49 ` Herbert Xu @ 2011-10-18 12:56 ` Eric Dumazet 2011-10-18 13:45 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-18 12:56 UTC (permalink / raw) To: Herbert Xu; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs Le mardi 18 octobre 2011 à 13:49 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 01:37:58PM +0200, Eric Dumazet wrote: > > > > In the bug we try to fix, we have : > > > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > > > ... < increase of dev->needed_headroom by another cpu/task > > > > > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); > > OK, in that case one fix would be to replace LL_ALLOCATED_SPACE > with its two constiuents so that they may be stored in local > variables for later use. > > hlen = LL_HEADROOM(skb); > tlen = LL_TAILROOM(skb); > skb_alloc_send_skb(sk, ... + LL_ALIGN(hlen + tlen)); > > skb_reserve(skb, LL_ALIGN(hlen)); > > Cheers, I am ok by this way, but we might hit another similar problem elsewhere. (igmp.c ip6_output, ...) We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate code... [PATCH] raw: allow dev->needed_headroom dynamic change It seems ip_gre is able to change dev->needed_headroom on the fly. It triggers a BUG in raw_sendmsg() skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) < another cpu change dev->needed_headromm (making it bigger) ... skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() -> we crash later because skb head is exhausted. Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route header_len in max_headroom calculation) Reported-by: Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Timo Teräs <timo.teras@iki.fi> CC: Herbert Xu <herbert@gondor.apana.org.au> --- include/linux/netdevice.h | 10 +++++++--- net/ipv4/raw.c | 9 +++++++-- net/ipv6/raw.c | 9 +++++++-- 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ddee79b..dba2399 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -276,12 +276,16 @@ struct hh_cache { * LL_ALLOCATED_SPACE also takes into account the tailroom the device * may need. */ +#define LL_ALIGN(__len) (((__len)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + #define LL_RESERVED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom)) + #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (extra)) + #define LL_ALLOCATED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (dev)->needed_tailroom) struct header_ops { int (*create) (struct sk_buff *skb, struct net_device *dev, diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 61714bd..4ed4eda 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -326,6 +326,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, unsigned int iphlen; int err; struct rtable *rt = *rtp; + unsigned int hard_header_len = rt->dst.dev->hard_header_len; + unsigned int needed_headroom = rt->dst.dev->needed_headroom; + unsigned int needed_tailroom = rt->dst.dev->needed_tailroom; if (length > rt->dst.dev->mtu) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, @@ -336,11 +339,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, goto out; skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, + length + LL_ALIGN(hard_header_len + + needed_headroom + + needed_tailroom) + 15, flags & MSG_DONTWAIT, &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); + skb_reserve(skb, LL_ALIGN(hard_header_len + needed_headroom)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 343852e..eb0a797 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -610,6 +610,9 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, struct sk_buff *skb; int err; struct rt6_info *rt = (struct rt6_info *)*dstp; + unsigned int hard_header_len = rt->dst.dev->hard_header_len; + unsigned int needed_headroom = rt->dst.dev->needed_headroom; + unsigned int needed_tailroom = rt->dst.dev->needed_tailroom; if (length > rt->dst.dev->mtu) { ipv6_local_error(sk, EMSGSIZE, fl6, rt->dst.dev->mtu); @@ -619,11 +622,13 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, goto out; skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, + length + LL_ALIGN(hard_header_len + + needed_headroom + + needed_tailroom) + 15, flags & MSG_DONTWAIT, &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); + skb_reserve(skb, LL_ALIGN(hard_header_len + needed_headroom)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 12:56 ` Eric Dumazet @ 2011-10-18 13:45 ` Herbert Xu 2011-10-19 7:09 ` David Miller 0 siblings, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-18 13:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: Elmar Vonlanthen, linux-kernel, netdev, Timo Teräs On Tue, Oct 18, 2011 at 02:56:00PM +0200, Eric Dumazet wrote: > > I am ok by this way, but we might hit another similar problem elsewhere. > > (igmp.c ip6_output, ...) > > We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate > code... Here's another idea, provide a helper to do the skb allocation and the skb_reserve in one go. That way this ugliness would only need to be done once. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-18 13:45 ` Herbert Xu @ 2011-10-19 7:09 ` David Miller 2011-10-19 7:18 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: David Miller @ 2011-10-19 7:09 UTC (permalink / raw) To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Tue, 18 Oct 2011 15:45:37 +0200 > On Tue, Oct 18, 2011 at 02:56:00PM +0200, Eric Dumazet wrote: >> >> I am ok by this way, but we might hit another similar problem elsewhere. >> >> (igmp.c ip6_output, ...) >> >> We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate >> code... > > Here's another idea, provide a helper to do the skb allocation > and the skb_reserve in one go. That way this ugliness would only > need to be done once. Someone please test this: -------------------- net: Fix crashes on devices which dynamically change needed headroom. One such device is IP_GRE. The problem is that we evaluate the device characteristics twice, once to determine the allocation size, and once to do the skb_reserve(). Combine these into one operation using a helper function. With help from Eric Dumazet and Herbert Xu. Reported-by: Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ddee79b..d4abb400 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -276,12 +276,22 @@ struct hh_cache { * LL_ALLOCATED_SPACE also takes into account the tailroom the device * may need. */ +#define LL_ALIGN(__len) (((__len)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + +#define __LL_RESERVED_SPACE(__hard_hdr_len, __needed_headroom) \ + LL_ALIGN((__hard_hdr_len) + (__needed_headroom)) + #define LL_RESERVED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + __LL_RESERVED_SPACE((dev)->hard_header_len, (dev)->needed_headroom)) + #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (extra)) + +#define __LL_ALLOCATED_SPACE(__hard_hdr_len, __needed_headroom, __needed_tailroom) \ + LL_ALIGN((__hard_hdr_len) + (__needed_headroom) + (__needed_tailroom)) + #define LL_ALLOCATED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + __LL_ALLOCATED_SPACE((dev)->hard_header_len, (dev)->needed_headroom, (dev)->needed_tailroom) struct header_ops { int (*create) (struct sk_buff *skb, struct net_device *dev, diff --git a/include/net/sock.h b/include/net/sock.h index 8e4062f..9d96995 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1103,6 +1103,11 @@ extern struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, int noblock, int *errcode); +extern struct sk_buff *sock_alloc_send_skb_reserve(struct sock *sk, + struct net_device *dev, + unsigned long base_size, + int noblock, + int *errcode); extern struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, unsigned long data_len, diff --git a/net/core/sock.c b/net/core/sock.c index bc745d0..9cefb72 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1586,6 +1586,26 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, } EXPORT_SYMBOL(sock_alloc_send_skb); +struct sk_buff *sock_alloc_send_skb_reserve(struct sock *sk, struct net_device *dev, + unsigned long base_size, int noblock, + int *errcode) +{ + unsigned int hh_len = dev->hard_header_len; + unsigned int needed_hroom = dev->needed_headroom; + unsigned int needed_troom = dev->needed_tailroom; + unsigned int reserve, lls; + struct sk_buff *skb; + + lls = __LL_ALLOCATED_SPACE(hh_len, needed_hroom, needed_troom); + skb = sock_alloc_send_skb(sk, (base_size + lls + 15), noblock, errcode); + if (skb) { + reserve = __LL_RESERVED_SPACE(hh_len, needed_hroom); + skb_reserve(skb, reserve); + } + return skb; +} +EXPORT_SYMBOL(sock_alloc_send_skb_reserve); + static void __lock_sock(struct sock *sk) __releases(&sk->sk_lock.slock) __acquires(&sk->sk_lock.slock) diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 61714bd..dcecb97 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -335,12 +335,11 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, if (flags&MSG_PROBE) goto out; - skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, - flags & MSG_DONTWAIT, &err); + skb = sock_alloc_send_skb_reserve(sk, rt->dst.dev, length, + flags & MSG_DONTWAIT, + &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 343852e..5a8e141 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -618,12 +618,11 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, if (flags&MSG_PROBE) goto out; - skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, - flags & MSG_DONTWAIT, &err); + skb = sock_alloc_send_skb_reserve(sk, rt->dst.dev, length, + flags & MSG_DONTWAIT, + &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 7:09 ` David Miller @ 2011-10-19 7:18 ` Eric Dumazet 2011-10-19 7:30 ` David Miller 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-19 7:18 UTC (permalink / raw) To: David Miller; +Cc: herbert, evonlanthen, linux-kernel, netdev, timo.teras Le mercredi 19 octobre 2011 à 03:09 -0400, David Miller a écrit : > From: Herbert Xu <herbert@gondor.hengli.com.au> > Date: Tue, 18 Oct 2011 15:45:37 +0200 > > > On Tue, Oct 18, 2011 at 02:56:00PM +0200, Eric Dumazet wrote: > >> > >> I am ok by this way, but we might hit another similar problem elsewhere. > >> > >> (igmp.c ip6_output, ...) > >> > >> We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate > >> code... > > > > Here's another idea, provide a helper to do the skb allocation > > and the skb_reserve in one go. That way this ugliness would only > > need to be done once. > > Someone please test this: > > -------------------- > net: Fix crashes on devices which dynamically change needed headroom. > > One such device is IP_GRE. > > The problem is that we evaluate the device characteristics twice, once > to determine the allocation size, and once to do the skb_reserve(). > > Combine these into one operation using a helper function. > > With help from Eric Dumazet and Herbert Xu. > > Reported-by: Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > Seems fine (Maybe do the +15 in caller site ?), but we also have other problematic cases, using alloc_skb() only... ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 7:18 ` Eric Dumazet @ 2011-10-19 7:30 ` David Miller 2011-10-19 7:52 ` Eric Dumazet 2011-10-19 8:08 ` Herbert Xu 0 siblings, 2 replies; 31+ messages in thread From: David Miller @ 2011-10-19 7:30 UTC (permalink / raw) To: eric.dumazet; +Cc: herbert, evonlanthen, linux-kernel, netdev, timo.teras From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 19 Oct 2011 09:18:33 +0200 > Seems fine (Maybe do the +15 in caller site ?), but we also have other > problematic cases, using alloc_skb() only... Ok, which ones operate over these problematic GRE tunnels? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 7:30 ` David Miller @ 2011-10-19 7:52 ` Eric Dumazet 2011-10-19 8:02 ` David Miller 2011-10-19 8:08 ` Herbert Xu 1 sibling, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2011-10-19 7:52 UTC (permalink / raw) To: David Miller; +Cc: herbert, evonlanthen, linux-kernel, netdev, timo.teras Le mercredi 19 octobre 2011 à 03:30 -0400, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 19 Oct 2011 09:18:33 +0200 > > > Seems fine (Maybe do the +15 in caller site ?), but we also have other > > problematic cases, using alloc_skb() only... > > Ok, which ones operate over these problematic GRE tunnels? > I was more thinking of general idea to allow any device to change its needed headroom. For GRE tunnels, I dont think IPv6 fragmentation could be relevant, but maybe IGMP could trigger a problem ? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 7:52 ` Eric Dumazet @ 2011-10-19 8:02 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2011-10-19 8:02 UTC (permalink / raw) To: eric.dumazet; +Cc: herbert, evonlanthen, linux-kernel, netdev, timo.teras From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 19 Oct 2011 09:52:19 +0200 > For GRE tunnels, I dont think IPv6 fragmentation could be relevant, > but maybe IGMP could trigger a problem ? Funny... icmpv6 uses ip6_append_data() which uses hh_len in a local variable, which seems to suggest that it's immune to this problem. IPV4 side seems identical in this regard. So, as far as I can see, my patch is sufficient to cover IP_GRE reasonably in the 'net' tree. Agreed? If so, someone please test this thing :-) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 7:30 ` David Miller 2011-10-19 7:52 ` Eric Dumazet @ 2011-10-19 8:08 ` Herbert Xu 2011-10-20 9:30 ` David Miller 1 sibling, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-19 8:08 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras On Wed, Oct 19, 2011 at 03:30:52AM -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 19 Oct 2011 09:18:33 +0200 > > > Seems fine (Maybe do the +15 in caller site ?), but we also have other > > problematic cases, using alloc_skb() only... > > Ok, which ones operate over these problematic GRE tunnels? Potentially all of them since we now support Ethernet-over-GRE. I think Eric's initial patch is probably the safest bet for rc10. We can then work on the proper fix for the next release. As to the latter, I've just done a grep over net and it seems that all users of LL_ALLOCATED_SPACE fall into two cases, alloc_skb users or sock_alloc_send_skb (including pskb) users. For alloc_skb we could add a new helper. While for the other case we could either create a new helper or just add an extra dev argument that may be NULL for those that don't care about LL_ALLOCATED_SPACE. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-19 8:08 ` Herbert Xu @ 2011-10-20 9:30 ` David Miller 2011-10-20 9:35 ` Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: David Miller @ 2011-10-20 9:30 UTC (permalink / raw) To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Wed, 19 Oct 2011 10:08:07 +0200 > I think Eric's initial patch is probably the safest bet for rc10. > We can then work on the proper fix for the next release. There are two "initial patch", I wonder which one you mean. There's his really first patch, which remoevs the lines in IP_GRE which change dev->needed_headroom. I was under the impression we were against doing that. The other patch he posted duplicates the device attribute variable caching in two functions. My patch is just a tweak so that we only do this sequence in one place, the new sock_alloc_send_skb_reserve() helper, instead of in both the ipv4 and ipv6 RAW code. So I'm a little confused what your suggestion for rc10 really is :-) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-20 9:30 ` David Miller @ 2011-10-20 9:35 ` Herbert Xu 2011-10-20 20:21 ` David Miller 2011-10-25 11:54 ` Herbert Xu 0 siblings, 2 replies; 31+ messages in thread From: Herbert Xu @ 2011-10-20 9:35 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras On Thu, Oct 20, 2011 at 05:30:50AM -0400, David Miller wrote: > > So I'm a little confused what your suggestion for rc10 really > is :-) I meant his first initial patch :) While it is suboptimal in the sense that should the value of needed_headroom increase we'll end up constantly reallocating skbs, I believe that it is at least semantically correct. In the time being I'll look more closely at all the users of needed_headroom to see if there's anything we've missed. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-20 9:35 ` Herbert Xu @ 2011-10-20 20:21 ` David Miller 2011-10-25 11:54 ` Herbert Xu 1 sibling, 0 replies; 31+ messages in thread From: David Miller @ 2011-10-20 20:21 UTC (permalink / raw) To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Thu, 20 Oct 2011 11:35:41 +0200 > On Thu, Oct 20, 2011 at 05:30:50AM -0400, David Miller wrote: >> >> So I'm a little confused what your suggestion for rc10 really >> is :-) > > I meant his first initial patch :) > > While it is suboptimal in the sense that should the value of > needed_headroom increase we'll end up constantly reallocating > skbs, I believe that it is at least semantically correct. Ok, I applied Eric's patch which removes the dynamic changing of the needed_headroom in IP_GRE. Thanks everyone! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-20 9:35 ` Herbert Xu 2011-10-20 20:21 ` David Miller @ 2011-10-25 11:54 ` Herbert Xu 2011-10-26 3:12 ` David Miller 1 sibling, 1 reply; 31+ messages in thread From: Herbert Xu @ 2011-10-25 11:54 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras On Thu, Oct 20, 2011 at 11:35:41AM +0200, Herbert Xu wrote: > > In the time being I'll look more closely at all the users of > needed_headroom to see if there's anything we've missed. OK I've reviewed all the users of needed_headroom and I haven't found anything other than the cases that we have enumerated. One thing I noticed is that the macro LL_ALLOCATED_SPACE is completely bogus and buggy. It applies the alignment to the sum of headroom and tailroom. However, the alignment is then applied separately to the headroom when reserving, meaning that we may end up with insufficient tailroom. So I'm going to get rid of LL_ALLOCATED_SPACE completely and replace it with explicit references to the tailroom as it doesn't need the alignment anyway (The headroom needs alignment since we use it to ensure the head is aligned). Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops 2011-10-25 11:54 ` Herbert Xu @ 2011-10-26 3:12 ` David Miller 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu 0 siblings, 1 reply; 31+ messages in thread From: David Miller @ 2011-10-26 3:12 UTC (permalink / raw) To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Tue, 25 Oct 2011 13:54:25 +0200 > So I'm going to get rid of LL_ALLOCATED_SPACE completely and > replace it with explicit references to the tailroom as it doesn't > need the alignment anyway (The headroom needs alignment since > we use it to ensure the head is aligned). Ok. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment 2011-10-26 3:12 ` David Miller @ 2011-11-18 12:18 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 3/6] net: Remove all uses of LL_ALLOCATED_SPACE Herbert Xu ` (6 more replies) 0 siblings, 7 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:18 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras On Tue, Oct 25, 2011 at 11:12:04PM -0400, David Miller wrote: > From: Herbert Xu <herbert@gondor.hengli.com.au> > Date: Tue, 25 Oct 2011 13:54:25 +0200 > > > So I'm going to get rid of LL_ALLOCATED_SPACE completely and > > replace it with explicit references to the tailroom as it doesn't > > need the alignment anyway (The headroom needs alignment since > > we use it to ensure the head is aligned). > > Ok. Here are the patches that do this. I also picked up one spot that should have used LL_ALLOCATED_SPACE but did not (see 2nd last patch). Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 3/6] net: Remove all uses of LL_ALLOCATED_SPACE 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 1/6] ipv4: " Herbert Xu ` (5 subsequent siblings) 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras net: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/core/netpoll.c | 6 ++++-- net/econet/af_econet.c | 7 +++++-- net/ieee802154/dgram.c | 7 +++++-- net/ieee802154/raw.c | 7 +++++-- net/packet/af_packet.c | 18 +++++++++++------- 5 files changed, 30 insertions(+), 15 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index cf64c1f..1a7d8e2 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -422,6 +422,7 @@ static void arp_reply(struct sk_buff *skb) struct sk_buff *send_skb; struct netpoll *np, *tmp; unsigned long flags; + int hlen, tlen; int hits = 0; if (list_empty(&npinfo->rx_np)) @@ -479,8 +480,9 @@ static void arp_reply(struct sk_buff *skb) if (tip != np->local_ip) continue; - send_skb = find_skb(np, size + LL_ALLOCATED_SPACE(np->dev), - LL_RESERVED_SPACE(np->dev)); + hlen = LL_RESERVED_SPACE(np->dev); + tlen = np->dev->needed_tailroom; + send_skb = find_skb(np, size + hlen + tlen, hlen); if (!send_skb) continue; diff --git a/net/econet/af_econet.c b/net/econet/af_econet.c index 1c1f26c..7e717cb 100644 --- a/net/econet/af_econet.c +++ b/net/econet/af_econet.c @@ -322,6 +322,7 @@ static int econet_sendmsg(struct kiocb *iocb, struct socket *sock, /* Real hardware Econet. We're not worthy etc. */ #ifdef CONFIG_ECONET_NATIVE unsigned short proto = 0; + int hlen, tlen; int res; if (len + 15 > dev->mtu) { @@ -331,12 +332,14 @@ static int econet_sendmsg(struct kiocb *iocb, struct socket *sock, dev_hold(dev); - skb = sock_alloc_send_skb(sk, len + LL_ALLOCATED_SPACE(dev), + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; + skb = sock_alloc_send_skb(sk, len + hlen + tlen, msg->msg_flags & MSG_DONTWAIT, &err); if (skb == NULL) goto out_unlock; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); eb = (struct ec_cb *)&skb->cb; diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c index faecf64..1b09eaa 100644 --- a/net/ieee802154/dgram.c +++ b/net/ieee802154/dgram.c @@ -209,6 +209,7 @@ static int dgram_sendmsg(struct kiocb *iocb, struct sock *sk, unsigned mtu; struct sk_buff *skb; struct dgram_sock *ro = dgram_sk(sk); + int hlen, tlen; int err; if (msg->msg_flags & MSG_OOB) { @@ -229,13 +230,15 @@ static int dgram_sendmsg(struct kiocb *iocb, struct sock *sk, mtu = dev->mtu; pr_debug("name = %s, mtu = %u\n", dev->name, mtu); - skb = sock_alloc_send_skb(sk, LL_ALLOCATED_SPACE(dev) + size, + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; + skb = sock_alloc_send_skb(sk, hlen + tlen + size, msg->msg_flags & MSG_DONTWAIT, &err); if (!skb) goto out_dev; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c index 10970ca..f96bae8 100644 --- a/net/ieee802154/raw.c +++ b/net/ieee802154/raw.c @@ -108,6 +108,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, struct net_device *dev; unsigned mtu; struct sk_buff *skb; + int hlen, tlen; int err; if (msg->msg_flags & MSG_OOB) { @@ -137,12 +138,14 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, goto out_dev; } - skb = sock_alloc_send_skb(sk, LL_ALLOCATED_SPACE(dev) + size, + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; + skb = sock_alloc_send_skb(sk, hlen + tlen + size, msg->msg_flags & MSG_DONTWAIT, &err); if (!skb) goto out_dev; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_mac_header(skb); skb_reset_network_header(skb); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 82a6f34..71c1a75 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1944,7 +1944,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb) static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, void *frame, struct net_device *dev, int size_max, - __be16 proto, unsigned char *addr) + __be16 proto, unsigned char *addr, int hlen) { union { struct tpacket_hdr *h1; @@ -1978,7 +1978,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, return -EMSGSIZE; } - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll); @@ -2053,6 +2053,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) unsigned char *addr; int len_sum = 0; int status = 0; + int hlen, tlen; mutex_lock(&po->pg_vec_lock); @@ -2101,16 +2102,17 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) } status = TP_STATUS_SEND_REQUEST; + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; skb = sock_alloc_send_skb(&po->sk, - LL_ALLOCATED_SPACE(dev) - + sizeof(struct sockaddr_ll), + hlen + tlen + sizeof(struct sockaddr_ll), 0, &err); if (unlikely(skb == NULL)) goto out_status; tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto, - addr); + addr, hlen); if (unlikely(tp_len < 0)) { if (po->tp_loss) { @@ -2207,6 +2209,7 @@ static int packet_snd(struct socket *sock, int vnet_hdr_len; struct packet_sock *po = pkt_sk(sk); unsigned short gso_type = 0; + int hlen, tlen; /* * Get and verify the address. @@ -2291,8 +2294,9 @@ static int packet_snd(struct socket *sock, goto out_unlock; err = -ENOBUFS; - skb = packet_alloc_skb(sk, LL_ALLOCATED_SPACE(dev), - LL_RESERVED_SPACE(dev), len, vnet_hdr.hdr_len, + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; + skb = packet_alloc_skb(sk, hlen + tlen, hlen, len, vnet_hdr.hdr_len, msg->msg_flags & MSG_DONTWAIT, &err); if (skb == NULL) goto out_unlock; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 1/6] ipv4: Remove all uses of LL_ALLOCATED_SPACE 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu 2011-11-18 12:20 ` [PATCH 3/6] net: Remove all uses of LL_ALLOCATED_SPACE Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 2/6] ipv6: " Herbert Xu ` (4 subsequent siblings) 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras ipv4: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4 with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/ipv4/arp.c | 6 ++++-- net/ipv4/igmp.c | 13 +++++++++---- net/ipv4/ipconfig.c | 6 ++++-- net/ipv4/raw.c | 7 +++++-- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 96a164a..fb3fdbc 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -592,16 +592,18 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct sk_buff *skb; struct arphdr *arp; unsigned char *arp_ptr; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; /* * Allocate a buffer */ - skb = alloc_skb(arp_hdr_len(dev) + LL_ALLOCATED_SPACE(dev), GFP_ATOMIC); + skb = alloc_skb(arp_hdr_len(dev) + hlen + tlen, GFP_ATOMIC); if (skb == NULL) return NULL; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); arp = (struct arphdr *) skb_put(skb, arp_hdr_len(dev)); skb->dev = dev; diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index c7472ef..fbc5376 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -304,9 +304,11 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size) struct igmpv3_report *pig; struct net *net = dev_net(dev); struct flowi4 fl4; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; while (1) { - skb = alloc_skb(size + LL_ALLOCATED_SPACE(dev), + skb = alloc_skb(size + hlen + tlen, GFP_ATOMIC | __GFP_NOWARN); if (skb) break; @@ -327,7 +329,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size) skb_dst_set(skb, &rt->dst); skb->dev = dev; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); pip = ip_hdr(skb); @@ -647,6 +649,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc, __be32 group = pmc ? pmc->multiaddr : 0; struct flowi4 fl4; __be32 dst; + int hlen, tlen; if (type == IGMPV3_HOST_MEMBERSHIP_REPORT) return igmpv3_send_report(in_dev, pmc); @@ -661,7 +664,9 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc, if (IS_ERR(rt)) return -1; - skb = alloc_skb(IGMP_SIZE+LL_ALLOCATED_SPACE(dev), GFP_ATOMIC); + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; + skb = alloc_skb(IGMP_SIZE + hlen + tlen, GFP_ATOMIC); if (skb == NULL) { ip_rt_put(rt); return -1; @@ -669,7 +674,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc, skb_dst_set(skb, &rt->dst); - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); skb_reset_network_header(skb); iph = ip_hdr(skb); diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index 0da2afc..adcac64 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -763,13 +763,15 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d struct sk_buff *skb; struct bootp_pkt *b; struct iphdr *h; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; /* Allocate packet */ - skb = alloc_skb(sizeof(struct bootp_pkt) + LL_ALLOCATED_SPACE(dev) + 15, + skb = alloc_skb(sizeof(struct bootp_pkt) + hlen + tlen + 15, GFP_KERNEL); if (!skb) return; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); b = (struct bootp_pkt *) skb_put(skb, sizeof(struct bootp_pkt)); memset(b, 0, sizeof(struct bootp_pkt)); diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 007e2eb..e6c1f0e 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -327,6 +327,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, unsigned int iphlen; int err; struct rtable *rt = *rtp; + int hlen, tlen; if (length > rt->dst.dev->mtu) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, @@ -336,12 +337,14 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, if (flags&MSG_PROBE) goto out; + hlen = LL_RESERVED_SPACE(rt->dst.dev); + tlen = rt->dst.dev->needed_tailroom; skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, + length + hlen + tlen + 15, flags & MSG_DONTWAIT, &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); + skb_reserve(skb, hlen); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 2/6] ipv6: Remove all uses of LL_ALLOCATED_SPACE 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu 2011-11-18 12:20 ` [PATCH 3/6] net: Remove all uses of LL_ALLOCATED_SPACE Herbert Xu 2011-11-18 12:20 ` [PATCH 1/6] ipv4: " Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 5/6] packet: Add needed_tailroom to packet_sendmsg_spkt Herbert Xu ` (3 subsequent siblings) 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras ipv6: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6 with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/ipv6/ip6_output.c | 8 ++++++-- net/ipv6/mcast.c | 12 ++++++++---- net/ipv6/ndisc.c | 13 +++++++++---- net/ipv6/raw.c | 6 ++++-- 4 files changed, 27 insertions(+), 12 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 84d0bd5..68ef97f 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -631,6 +631,7 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)) struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; unsigned int mtu, hlen, left, len; + int hroom, troom; __be32 frag_id = 0; int ptr, offset = 0, err=0; u8 *prevhdr, nexthdr = 0; @@ -797,6 +798,8 @@ slow_path: */ *prevhdr = NEXTHDR_FRAGMENT; + hroom = LL_RESERVED_SPACE(rt->dst.dev); + troom = rt->dst.dev->needed_tailroom; /* * Keep copying data until we run out. @@ -815,7 +818,8 @@ slow_path: * Allocate buffer. */ - if ((frag = alloc_skb(len+hlen+sizeof(struct frag_hdr)+LL_ALLOCATED_SPACE(rt->dst.dev), GFP_ATOMIC)) == NULL) { + if ((frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) + + hroom + troom, GFP_ATOMIC)) == NULL) { NETDEBUG(KERN_INFO "IPv6: frag: no memory for new fragment!\n"); IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_FRAGFAILS); @@ -828,7 +832,7 @@ slow_path: */ ip6_copy_metadata(frag, skb); - skb_reserve(frag, LL_RESERVED_SPACE(rt->dst.dev)); + skb_reserve(frag, hroom); skb_put(frag, len + hlen + sizeof(struct frag_hdr)); skb_reset_network_header(frag); fh = (struct frag_hdr *)(skb_network_header(frag) + hlen); diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index ee7839f..7b94beb 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1343,13 +1343,15 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size) struct mld2_report *pmr; struct in6_addr addr_buf; const struct in6_addr *saddr; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; int err; u8 ra[8] = { IPPROTO_ICMPV6, 0, IPV6_TLV_ROUTERALERT, 2, 0, 0, IPV6_TLV_PADN, 0 }; /* we assume size > sizeof(ra) here */ - size += LL_ALLOCATED_SPACE(dev); + size += hlen + tlen; /* limit our allocations to order-0 page */ size = min_t(int, size, SKB_MAX_ORDER(0, 0)); skb = sock_alloc_send_skb(sk, size, 1, &err); @@ -1357,7 +1359,7 @@ static struct sk_buff *mld_newpack(struct net_device *dev, int size) if (!skb) return NULL; - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); if (ipv6_get_lladdr(dev, &addr_buf, IFA_F_TENTATIVE)) { /* <draft-ietf-magma-mld-source-05.txt>: @@ -1723,6 +1725,8 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) struct mld_msg *hdr; const struct in6_addr *snd_addr, *saddr; struct in6_addr addr_buf; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; int err, len, payload_len, full_len; u8 ra[8] = { IPPROTO_ICMPV6, 0, IPV6_TLV_ROUTERALERT, 2, 0, 0, @@ -1744,7 +1748,7 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) IPSTATS_MIB_OUT, full_len); rcu_read_unlock(); - skb = sock_alloc_send_skb(sk, LL_ALLOCATED_SPACE(dev) + full_len, 1, &err); + skb = sock_alloc_send_skb(sk, hlen + tlen + full_len, 1, &err); if (skb == NULL) { rcu_read_lock(); @@ -1754,7 +1758,7 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) return; } - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); if (ipv6_get_lladdr(dev, &addr_buf, IFA_F_TENTATIVE)) { /* <draft-ietf-magma-mld-source-05.txt>: diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 44e5b7f..5f191c4 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -446,6 +446,8 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev, struct sock *sk = net->ipv6.ndisc_sk; struct sk_buff *skb; struct icmp6hdr *hdr; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; int len; int err; u8 *opt; @@ -459,7 +461,7 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev, skb = sock_alloc_send_skb(sk, (MAX_HEADER + sizeof(struct ipv6hdr) + - len + LL_ALLOCATED_SPACE(dev)), + len + hlen + tlen), 1, &err); if (!skb) { ND_PRINTK0(KERN_ERR @@ -468,7 +470,7 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev, return NULL; } - skb_reserve(skb, LL_RESERVED_SPACE(dev)); + skb_reserve(skb, hlen); ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len); skb->transport_header = skb->tail; @@ -1533,6 +1535,7 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh, struct inet6_dev *idev; struct flowi6 fl6; u8 *opt; + int hlen, tlen; int rd_len; int err; u8 ha_buf[MAX_ADDR_LEN], *ha = NULL; @@ -1590,9 +1593,11 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh, rd_len &= ~0x7; len += rd_len; + hlen = LL_RESERVED_SPACE(dev); + tlen = dev->needed_tailroom; buff = sock_alloc_send_skb(sk, (MAX_HEADER + sizeof(struct ipv6hdr) + - len + LL_ALLOCATED_SPACE(dev)), + len + hlen + tlen), 1, &err); if (buff == NULL) { ND_PRINTK0(KERN_ERR @@ -1601,7 +1606,7 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh, goto release; } - skb_reserve(buff, LL_RESERVED_SPACE(dev)); + skb_reserve(buff, hlen); ip6_nd_hdr(sk, buff, dev, &saddr_buf, &ipv6_hdr(skb)->saddr, IPPROTO_ICMPV6, len); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 331af3b..638fb2a 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -610,6 +610,8 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, struct sk_buff *skb; int err; struct rt6_info *rt = (struct rt6_info *)*dstp; + int hlen = LL_RESERVED_SPACE(rt->dst.dev); + int tlen = rt->dst.dev->needed_tailroom; if (length > rt->dst.dev->mtu) { ipv6_local_error(sk, EMSGSIZE, fl6, rt->dst.dev->mtu); @@ -619,11 +621,11 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, goto out; skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, + length + hlen + tlen + 15, flags & MSG_DONTWAIT, &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); + skb_reserve(skb, hlen); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 5/6] packet: Add needed_tailroom to packet_sendmsg_spkt 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu ` (2 preceding siblings ...) 2011-11-18 12:20 ` [PATCH 2/6] ipv6: " Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 4/6] net: Remove LL_ALLOCATED_SPACE Herbert Xu ` (2 subsequent siblings) 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras packet: Add needed_tailroom to packet_sendmsg_spkt While auditing LL_ALLOCATED_SPACE I noticed that packet_sendmsg_spkt did not include needed_tailroom when allocating an skb. This isn't a fatal error as we should always tolerate inadequate tail room but it isn't optimal. This patch fixes that. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/packet/af_packet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 71c1a75..0da505c 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1499,10 +1499,11 @@ retry: if (!skb) { size_t reserved = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; unsigned int hhlen = dev->header_ops ? dev->hard_header_len : 0; rcu_read_unlock(); - skb = sock_wmalloc(sk, len + reserved, 0, GFP_KERNEL); + skb = sock_wmalloc(sk, len + reserved + tlen, 0, GFP_KERNEL); if (skb == NULL) return -ENOBUFS; /* FIXME: Save some space for broken drivers that write a hard ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 4/6] net: Remove LL_ALLOCATED_SPACE 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu ` (3 preceding siblings ...) 2011-11-18 12:20 ` [PATCH 5/6] packet: Add needed_tailroom to packet_sendmsg_spkt Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 12:20 ` [PATCH 6/6] ip_gre: Set needed_headroom dynamically again Herbert Xu 2011-11-18 20:01 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment David Miller 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras net: Remove LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. Now that all uses of LL_ALLOCATED_SPACE have been removed, this patch finally removes the macro itself. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- include/linux/netdevice.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cbeb586..1568c9d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -272,16 +272,11 @@ struct hh_cache { * * We could use other alignment values, but we must maintain the * relationship HH alignment <= LL alignment. - * - * LL_ALLOCATED_SPACE also takes into account the tailroom the device - * may need. */ #define LL_RESERVED_SPACE(dev) \ ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) -#define LL_ALLOCATED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) struct header_ops { int (*create) (struct sk_buff *skb, struct net_device *dev, ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 6/6] ip_gre: Set needed_headroom dynamically again 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu ` (4 preceding siblings ...) 2011-11-18 12:20 ` [PATCH 4/6] net: Remove LL_ALLOCATED_SPACE Herbert Xu @ 2011-11-18 12:20 ` Herbert Xu 2011-11-18 20:01 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment David Miller 6 siblings, 0 replies; 31+ messages in thread From: Herbert Xu @ 2011-11-18 12:20 UTC (permalink / raw) To: David Miller, eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras ip_gre: Set needed_headroom dynamically again Now that all needed_headroom users have been fixed up so that we can safely increase needed_headroom, this patch restore the dynamic update of needed_headroom. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> --- net/ipv4/ip_gre.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index d55110e..d7bb94c 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -835,6 +835,8 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); + if (max_headroom > dev->needed_headroom) + dev->needed_headroom = max_headroom; if (!new_skb) { ip_rt_put(rt); dev->stats.tx_dropped++; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment 2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu ` (5 preceding siblings ...) 2011-11-18 12:20 ` [PATCH 6/6] ip_gre: Set needed_headroom dynamically again Herbert Xu @ 2011-11-18 20:01 ` David Miller 6 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2011-11-18 20:01 UTC (permalink / raw) To: herbert; +Cc: eric.dumazet, evonlanthen, linux-kernel, netdev, timo.teras From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Fri, 18 Nov 2011 20:18:32 +0800 > On Tue, Oct 25, 2011 at 11:12:04PM -0400, David Miller wrote: >> From: Herbert Xu <herbert@gondor.hengli.com.au> >> Date: Tue, 25 Oct 2011 13:54:25 +0200 >> >> > So I'm going to get rid of LL_ALLOCATED_SPACE completely and >> > replace it with explicit references to the tailroom as it doesn't >> > need the alignment anyway (The headroom needs alignment since >> > we use it to ensure the head is aligned). >> >> Ok. > > Here are the patches that do this. I also picked up one spot > that should have used LL_ALLOCATED_SPACE but did not (see 2nd > last patch). This all looks good to me, applied to net-next, thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2011-11-18 20:01 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+0Zf5Be5JUSvssH9CrDQq=jtETZr=veAPC7jriM8B-FZKbbpg@mail.gmail.com>
2011-10-14 14:57 ` PROBLEM: System call 'sendmsg' of process ospfd (quagga) causes kernel oops Eric Dumazet
2011-10-17 7:16 ` Elmar Vonlanthen
2011-10-18 2:30 ` Eric Dumazet
2011-10-18 9:34 ` Herbert Xu
2011-10-18 10:01 ` Eric Dumazet
2011-10-18 10:05 ` Herbert Xu
2011-10-18 10:23 ` Eric Dumazet
2011-10-18 10:45 ` Herbert Xu
2011-10-18 11:37 ` Eric Dumazet
2011-10-18 11:49 ` Herbert Xu
2011-10-18 12:56 ` Eric Dumazet
2011-10-18 13:45 ` Herbert Xu
2011-10-19 7:09 ` David Miller
2011-10-19 7:18 ` Eric Dumazet
2011-10-19 7:30 ` David Miller
2011-10-19 7:52 ` Eric Dumazet
2011-10-19 8:02 ` David Miller
2011-10-19 8:08 ` Herbert Xu
2011-10-20 9:30 ` David Miller
2011-10-20 9:35 ` Herbert Xu
2011-10-20 20:21 ` David Miller
2011-10-25 11:54 ` Herbert Xu
2011-10-26 3:12 ` David Miller
2011-11-18 12:18 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment Herbert Xu
2011-11-18 12:20 ` [PATCH 3/6] net: Remove all uses of LL_ALLOCATED_SPACE Herbert Xu
2011-11-18 12:20 ` [PATCH 1/6] ipv4: " Herbert Xu
2011-11-18 12:20 ` [PATCH 2/6] ipv6: " Herbert Xu
2011-11-18 12:20 ` [PATCH 5/6] packet: Add needed_tailroom to packet_sendmsg_spkt Herbert Xu
2011-11-18 12:20 ` [PATCH 4/6] net: Remove LL_ALLOCATED_SPACE Herbert Xu
2011-11-18 12:20 ` [PATCH 6/6] ip_gre: Set needed_headroom dynamically again Herbert Xu
2011-11-18 20:01 ` [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).