* Re: unable to handle kernel NULL pointer dereference in skb_dequeue [not found] <0fe401cb92e7$85ba2260$912e6720$@si> @ 2010-12-03 13:09 ` Eric Dumazet 2010-12-03 14:37 ` Andrej Ota 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2010-12-03 13:09 UTC (permalink / raw) To: Toshio; +Cc: linux-kernel, gvs, Rami Rosen, netdev Le vendredi 03 décembre 2010 à 13:42 +0100, Toshio a écrit : > I have also hit on Bug 20292 (https://bugzilla.kernel.org/show_bug.cgi?id=20292) in final 2.6.36. After investigating changes made between 2.6.35.4, which worked, and 2.6.36 which started oopsing, I think the problem was in double freeing of skb caused by change of return value for __pppoe_xmit in case of errors. > > As it turned out this might be the cause of random BUG reports throught the kernel, whenever something touched skb. Most common BUG with my use case happened at skb_dequeue: > CC netdev and Rami Rosen > 00000060 <skb_dequeue>: > 60: 53 push %ebx > 61: 89 c2 mov %eax,%edx > 63: 9c pushf > 64: 59 pop %ecx > 65: fa cli > 66: 8b 00 mov (%eax),%eax > 68: 39 c2 cmp %eax,%edx > 6a: 74 24 je 90 <skb_dequeue+0x30> > 6c: 85 c0 test %eax,%eax > 6e: 74 1a je 8a <skb_dequeue+0x2a> > 70: ff 4a 08 decl 0x8(%edx) > 73: 8b 18 mov (%eax),%ebx > 75: c7 00 00 00 00 00 movl $0x0,(%eax) > 7b: 8b 50 04 mov 0x4(%eax),%edx > 7e: c7 40 04 00 00 00 00 movl $0x0,0x4(%eax) > 85: 89 53 04 mov %edx,0x4(%ebx) > 88:* 89 1a mov %ebx,(%edx) > 8a: 51 push %ecx > 8b: 9d popf > 8c: 5b pop %ebx > 8d: c3 ret > 8e: 66 90 xchg %ax,%ax > 90: b8 00 00 00 00 mov $0x0,%eax > 95: eb f3 jmp 8a <skb_dequeue+0x2a> > 97: 89 f6 mov %esi,%esi > 99: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi > > > This location corresponds to line "next = next->next" from inlined __skb_dequeue(). The only reason BUG could happen here is something overwriting or otherwise corrupting skb list. > > Patch that works for me is below. Now I only hope I haven't (re)introduced a memory leak... > > I am not subscribed to LKML, so please reply-to-all if you need to contact me. > > ----------------------------------------------------------------------------- > --- linux-2.6.36/drivers/net/pppoe.c 2010-10-20 22:30:22.000000000 +0200 > +++ linux-2.6.36.toshio/drivers/net/pppoe.c 2010-12-03 13:11:56.000000000 +0100 > @@ -924,8 +924,10 @@ > /* Copy the data if there is no space for the header or if it's > * read-only. > */ > - if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) > + if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) { > + kfree_skb(skb); > goto abort; > + } > > __skb_push(skb, sizeof(*ph)); > skb_reset_network_header(skb); > @@ -947,7 +949,6 @@ > return 1; > > abort: > - kfree_skb(skb); > return 0; > } > Problem comes from commit 55c95e738da85 (fix return value of __pppoe_xmit() method) I am not sure patch is OK ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-03 13:09 ` unable to handle kernel NULL pointer dereference in skb_dequeue Eric Dumazet @ 2010-12-03 14:37 ` Andrej Ota 2010-12-03 14:46 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Andrej Ota @ 2010-12-03 14:37 UTC (permalink / raw) To: Eric Dumazet; +Cc: linux-kernel, gvs, Rami Rosen, netdev >> Patch that works for me is below. Now I only hope I haven't >> (re)introduced a memory leak... > Problem comes from commit 55c95e738da85 (fix return value of > __pppoe_xmit() method) > > I am not sure patch is OK Me neither. That's why I wrote "works for me". All I dare say is that it works better than current code and is probably no worse than it was before above mentioned commit. Apart from that, there is no point in having return value for __pppoe_xmit if return value isn't needed. Easiest way of triggering this BUG is by terminating PPPoE on the server side, which then hits "if (!dev) { goto abort; }". This in turn calls "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug. I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have just caused another BUG. However, if I read file comments at the top, I see a comment from 19/07/01 stating that I have to delete original skb if code succeeds and never delete it on failure. About the skb copy mentioned in the same comment, I don't know. 2001 was many commits ago. Andrej Ota. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-03 14:37 ` Andrej Ota @ 2010-12-03 14:46 ` Eric Dumazet 2010-12-03 22:07 ` Jarek Poplawski ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Eric Dumazet @ 2010-12-03 14:46 UTC (permalink / raw) To: Andrej Ota; +Cc: linux-kernel, gvs, Rami Rosen, netdev Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit : > >> Patch that works for me is below. Now I only hope I haven't > >> (re)introduced a memory leak... > > > Problem comes from commit 55c95e738da85 (fix return value of > > __pppoe_xmit() method) > > > > I am not sure patch is OK > > > Me neither. That's why I wrote "works for me". All I dare say is that it > works better than current code and is probably no worse than it was before > above mentioned commit. Apart from that, there is no point in having return > value for __pppoe_xmit if return value isn't needed. > > Easiest way of triggering this BUG is by terminating PPPoE on the server > side, which then hits "if (!dev) { goto abort; }". This in turn calls > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug. > > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have > just caused another BUG. However, if I read file comments at the top, I see > a comment from 19/07/01 stating that I have to delete original skb if code > succeeds and never delete it on failure. About the skb copy mentioned in > the same comment, I don't know. 2001 was many commits ago. Well, all I wanted to say was that _I_ was not sure, but probably other network guys have a better diagnostic. Rami, could you re-explain the rationale of your patch ? Thanks ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-03 14:46 ` Eric Dumazet @ 2010-12-03 22:07 ` Jarek Poplawski 2010-12-03 22:16 ` Denys Fedoryshchenko 2010-12-10 19:51 ` Denys Fedoryshchenko 2 siblings, 0 replies; 8+ messages in thread From: Jarek Poplawski @ 2010-12-03 22:07 UTC (permalink / raw) To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev Eric Dumazet wrote: > Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit : >>>> Patch that works for me is below. Now I only hope I haven't >>>> (re)introduced a memory leak... >> >>> Problem comes from commit 55c95e738da85 (fix return value of >>> __pppoe_xmit() method) ... > Rami, could you re-explain the rationale of your patch ? I guess we could revert it in the meantime: it seems there might be at least three threads (including bugzilla) with this problem. Jarek P. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-03 14:46 ` Eric Dumazet 2010-12-03 22:07 ` Jarek Poplawski @ 2010-12-03 22:16 ` Denys Fedoryshchenko 2010-12-10 19:51 ` Denys Fedoryshchenko 2 siblings, 0 replies; 8+ messages in thread From: Denys Fedoryshchenko @ 2010-12-03 22:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev On Friday 03 December 2010 16:46:35 Eric Dumazet wrote: > Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit : > > >> Patch that works for me is below. Now I only hope I haven't > > >> (re)introduced a memory leak... > > > > > > Problem comes from commit 55c95e738da85 (fix return value of > > > __pppoe_xmit() method) > > > > > > I am not sure patch is OK > > > > Me neither. That's why I wrote "works for me". All I dare say is that it > > works better than current code and is probably no worse than it was > > before above mentioned commit. Apart from that, there is no point in > > having return value for __pppoe_xmit if return value isn't needed. > > > > Easiest way of triggering this BUG is by terminating PPPoE on the server > > side, which then hits "if (!dev) { goto abort; }". This in turn calls > > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then > > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug. > > > > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have > > just caused another BUG. However, if I read file comments at the top, I > > see a comment from 19/07/01 stating that I have to delete original skb > > if code succeeds and never delete it on failure. About the skb copy > > mentioned in the same comment, I don't know. 2001 was many commits ago. > > Well, all I wanted to say was that _I_ was not sure, but probably other > network guys have a better diagnostic. > > Rami, could you re-explain the rationale of your patch ? > > Thanks > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html This patch seems fixed my issue (Re: 2.6.35->2.6.36 regression, vanilla kernel panic, ppp or hrtimers crashing), tested now. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-03 14:46 ` Eric Dumazet 2010-12-03 22:07 ` Jarek Poplawski 2010-12-03 22:16 ` Denys Fedoryshchenko @ 2010-12-10 19:51 ` Denys Fedoryshchenko 2010-12-10 20:18 ` David Miller 2 siblings, 1 reply; 8+ messages in thread From: Denys Fedoryshchenko @ 2010-12-10 19:51 UTC (permalink / raw) To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev On Friday 03 December 2010 16:46:35 Eric Dumazet wrote: > Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit : > > >> Patch that works for me is below. Now I only hope I haven't > > >> (re)introduced a memory leak... > > > > > > Problem comes from commit 55c95e738da85 (fix return value of > > > __pppoe_xmit() method) > > > > > > I am not sure patch is OK > > > > Me neither. That's why I wrote "works for me". All I dare say is that it > > works better than current code and is probably no worse than it was > > before above mentioned commit. Apart from that, there is no point in > > having return value for __pppoe_xmit if return value isn't needed. > > > > Easiest way of triggering this BUG is by terminating PPPoE on the server > > side, which then hits "if (!dev) { goto abort; }". This in turn calls > > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then > > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug. > > > > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have > > just caused another BUG. However, if I read file comments at the top, I > > see a comment from 19/07/01 stating that I have to delete original skb > > if code succeeds and never delete it on failure. About the skb copy > > mentioned in the same comment, I don't know. 2001 was many commits ago. > > Well, all I wanted to say was that _I_ was not sure, but probably other > network guys have a better diagnostic. > > Rami, could you re-explain the rationale of your patch ? > > Thanks > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Is there any plans to queue any patch to stable? pppoe is almost dead in 2.6.36.* ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-10 19:51 ` Denys Fedoryshchenko @ 2010-12-10 20:18 ` David Miller 2010-12-10 21:30 ` Jarek Poplawski 0 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2010-12-10 20:18 UTC (permalink / raw) To: nuclearcat; +Cc: eric.dumazet, andrej, linux-kernel, gvs, ramirose, netdev From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Date: Fri, 10 Dec 2010 21:51:04 +0200 > On Friday 03 December 2010 16:46:35 Eric Dumazet wrote: >> Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit : >> > >> Patch that works for me is below. Now I only hope I haven't >> > >> (re)introduced a memory leak... >> > > >> > > Problem comes from commit 55c95e738da85 (fix return value of >> > > __pppoe_xmit() method) >> > > >> > > I am not sure patch is OK >> > >> > Me neither. That's why I wrote "works for me". All I dare say is that it >> > works better than current code and is probably no worse than it was >> > before above mentioned commit. Apart from that, there is no point in >> > having return value for __pppoe_xmit if return value isn't needed. >> > >> > Easiest way of triggering this BUG is by terminating PPPoE on the server >> > side, which then hits "if (!dev) { goto abort; }". This in turn calls >> > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then >> > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug. >> > >> > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have >> > just caused another BUG. However, if I read file comments at the top, I >> > see a comment from 19/07/01 stating that I have to delete original skb >> > if code succeeds and never delete it on failure. About the skb copy >> > mentioned in the same comment, I don't know. 2001 was many commits ago. >> >> Well, all I wanted to say was that _I_ was not sure, but probably other >> network guys have a better diagnostic. >> >> Rami, could you re-explain the rationale of your patch ? >> >> Thanks >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > Is there any plans to queue any patch to stable? > pppoe is almost dead in 2.6.36.* I'll deal with it for -stable once I evaluate this patch for upstream, which I haven't even gotten to yet. When people bark about -stable this and -stable that, it just takes more time away from me actually getting through all the patches. If it causes a crash, I know it should go to stable and I'll take care of it. So there is no need to make an explicit note or query about it. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: unable to handle kernel NULL pointer dereference in skb_dequeue 2010-12-10 20:18 ` David Miller @ 2010-12-10 21:30 ` Jarek Poplawski 0 siblings, 0 replies; 8+ messages in thread From: Jarek Poplawski @ 2010-12-10 21:30 UTC (permalink / raw) To: David Miller Cc: nuclearcat, eric.dumazet, andrej, linux-kernel, gvs, ramirose, netdev David Miller wrote: > From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> > Date: Fri, 10 Dec 2010 21:51:04 +0200 ... >> Is there any plans to queue any patch to stable? >> pppoe is almost dead in 2.6.36.* > > I'll deal with it for -stable once I evaluate this patch for upstream, > which I haven't even gotten to yet. This patch is here: http://patchwork.ozlabs.org/patch/75095/ But IMHO it's buggy for the reason I mentioned. Since Andrej didn't respond yet I'd suggest reverting of the offending commit 55c95e738da85. If you agree with that but e.g. too busy, let me know. Jarek P. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-12-10 21:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <0fe401cb92e7$85ba2260$912e6720$@si>
2010-12-03 13:09 ` unable to handle kernel NULL pointer dereference in skb_dequeue Eric Dumazet
2010-12-03 14:37 ` Andrej Ota
2010-12-03 14:46 ` Eric Dumazet
2010-12-03 22:07 ` Jarek Poplawski
2010-12-03 22:16 ` Denys Fedoryshchenko
2010-12-10 19:51 ` Denys Fedoryshchenko
2010-12-10 20:18 ` David Miller
2010-12-10 21:30 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).