netdev.vger.kernel.org archive mirror
* Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
@ 2017-10-01 17:21 Stephen Hemminger
  2017-10-02 13:32 ` James Chapman
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2017-10-01 17:21 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev



Begin forwarded message:

Date: Sun, 01 Oct 2017 16:22:33 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]


https://bugzilla.kernel.org/show_bug.cgi?id=197099

            Bug ID: 197099
           Summary: Kernel panic in interrupt [l2tp_ppp]
           Product: Networking
           Version: 2.5
    Kernel Version: 4.8.13-1.el6.elrepo.x86_64
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: svimik@gmail.com
        Regression: No

Created attachment 258685
  --> https://bugzilla.kernel.org/attachment.cgi?id=258685&action=edit  
stacktrace screenshot

Hello!

Getting kernel panics on multiple servers. Since it mentions l2tp_core,
l2tp_ppp and ppp_generic, I decided to report it to Networking (correct me if
I'm wrong).

Unfortunately I'm still struggling with making kdump work, so the trace
screenshot is all I have at this moment. The only hope is that this stacktrace
means something to the guys that wrote the code.

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-01 17:21 Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp] Stephen Hemminger
@ 2017-10-02 13:32 ` James Chapman
  2017-10-02 13:56   ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: James Chapman @ 2017-10-02 13:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

This seems to be a NULL pointer dereference caused by tunnel->sock being
NULL at the call to bh_lock_sock() in l2tp_xmit_skb() at
l2tp_core.c:1135.

tunnel->sock is set to NULL in l2tp_core's tunnel socket destructor.

At the moment, I don't understand how this happens because
pppol2tp_xmit() does a sock_hold() on the tunnel socket before
l2tp_xmit_skb() is called. I'm still looking at this.

Has this problem only recently started happening?






* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-02 13:32 ` James Chapman
@ 2017-10-02 13:56   ` Eric Dumazet
  2017-10-02 18:35     ` SviMik
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2017-10-02 13:56 UTC (permalink / raw)
  To: James Chapman, svimik; +Cc: Stephen Hemminger, netdev

CC svimik@gmail.com so that he is aware of this netdev thread.


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-02 13:56   ` Eric Dumazet
@ 2017-10-02 18:35     ` SviMik
  2017-10-03  7:27       ` James Chapman
  0 siblings, 1 reply; 12+ messages in thread
From: SviMik @ 2017-10-02 18:35 UTC (permalink / raw)
  To: netdev

Hi, James!

No, I'm suffering from kernel panics since I started using 4.x
kernels. See my current collection:
http://svimik.com/hdmmsk1kp1.png
http://svimik.com/hdmmsk2kp2.png
http://svimik.com/hdmmsk2kp3.png
http://svimik.com/hdmmsk2kp4.png
http://svimik.com/hdmmsk2kp5.png
http://svimik.com/hdmmsk7kp1.png

Screenshots are from three different machines, kernels from 4.8.13 to 4.13.4.


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-02 18:35     ` SviMik
@ 2017-10-03  7:27       ` James Chapman
  2017-10-04  7:49         ` James Chapman
  2017-10-09 17:15         ` Guillaume Nault
  0 siblings, 2 replies; 12+ messages in thread
From: James Chapman @ 2017-10-03  7:27 UTC (permalink / raw)
  To: SviMik; +Cc: netdev

On 2 October 2017 at 19:35, SviMik <svimik@gmail.com> wrote:
> Hi, James!
>
> No, I'm suffering from kernel panics since I started using 4.x
> kernels.
It's interesting that you are seeing l2tp issues since switching to
4.x kernels. Are you able to try earlier kernels to find the latest
version that works? I'm curious whether things broke at v3.15.

> See my current collection:
> http://svimik.com/hdmmsk1kp1.png
This one is another crash implicating l2tp socket shutdown, this time
when the session pppol2tp socket is closed. Unfortunately the
screenshot doesn't show the full oops text. I'll investigate this one
too if you can get a full oops capture.

> http://svimik.com/hdmmsk2kp2.png
> http://svimik.com/hdmmsk2kp3.png
These are both the same crash as the oops of this bug report.

> http://svimik.com/hdmmsk2kp4.png
> http://svimik.com/hdmmsk2kp5.png
These are truncated oopses.

> http://svimik.com/hdmmsk7kp1.png
Same crash as hdmmsk1kp1.png

> Screenshots are from three different machines, kernels from 4.8.13 to 4.13.4.

For capturing complete oops messages, have you tried setting up
netconsole? You might also find the full text in the syslog on reboot.
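
For reference, the receiving side of a netconsole setup can be sketched as a small UDP listener. This is a minimal sketch, not part of the original advice: it assumes the crashing hosts point netconsole at this machine on UDP port 6666 (netconsole's default remote port), and the function names and the per-sender file naming are made up for illustration.

```python
import socket


def log_name(sender_ip):
    """Return a per-sender log file name (hypothetical naming scheme)."""
    return "netconsole-" + sender_ip.replace(":", "_") + ".log"


def receive(sock, max_packets=None):
    """Yield (sender_ip, text) for each datagram read from a bound UDP socket."""
    seen = 0
    while max_packets is None or seen < max_packets:
        payload, (sender_ip, _port) = sock.recvfrom(65535)
        seen += 1
        yield sender_ip, payload.decode("utf-8", errors="replace")


def listen(port=6666, bind_addr="0.0.0.0"):
    """Append every received message to a per-sender file until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        for sender_ip, text in receive(sock):
            with open(log_name(sender_ip), "a", encoding="utf-8") as fh:
                fh.write(text)
```

Calling listen() blocks and writes one file per sending host in the current directory. A netcat listener or a syslog daemon would do the same job; the point is only that the oops text is captured on another machine, so it survives the crash.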


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-03  7:27       ` James Chapman
@ 2017-10-04  7:49         ` James Chapman
  2017-10-04 10:33           ` Guillaume Nault
  2017-10-06  4:45           ` SviMik
  2017-10-09 17:15         ` Guillaume Nault
  1 sibling, 2 replies; 12+ messages in thread
From: James Chapman @ 2017-10-04  7:49 UTC (permalink / raw)
  To: SviMik; +Cc: netdev, Guillaume Nault

On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
> On 2 October 2017 at 19:35, SviMik <svimik@gmail.com> wrote:
>> Hi, James!
>>
>> No, I'm suffering from kernel panics since I started using 4.x
>> kernels.
> It's interesting that you are seeing l2tp issues since switching to
> 4.x kernels. Are you able to try earlier kernels to find the latest
> version that works? I'm curious whether things broke at v3.15.

It's possible that this may be fixed by a patch that is already
upstream and merged for v4.14. The fix is from Guillaume Nault:

f3c66d4 l2tp: prevent creation of sessions on terminated tunnels

If it's possible that the L2TP server may try to create a session in a
tunnel that is being closed, this bug would be exposed.

Guillaume's fix isn't yet pushed to stable releases. Are you able to
try a v4.14-rc build?


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-04  7:49         ` James Chapman
@ 2017-10-04 10:33           ` Guillaume Nault
  2017-10-06  4:45           ` SviMik
  1 sibling, 0 replies; 12+ messages in thread
From: Guillaume Nault @ 2017-10-04 10:33 UTC (permalink / raw)
  To: James Chapman; +Cc: SviMik, netdev

On Wed, Oct 04, 2017 at 08:49:51AM +0100, James Chapman wrote:
> On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
> > On 2 October 2017 at 19:35, SviMik <svimik@gmail.com> wrote:
> >> Hi, James!
> >>
> >> No, I'm suffering from kernel panics since I started using 4.x
> >> kernels.
> > It's interesting that you are seeing l2tp issues since switching to
> > 4.x kernels. Are you able to try earlier kernels to find the latest
> > version that works? I'm curious whether things broke at v3.15.
> 
> It's possible that this may be fixed by a patch that is already
> upstream and merged for v4.14. The fix is from Guillaume Nault:
> 
> f3c66d4 l2tp: prevent creation of sessions on terminated tunnels
> 
> If it's possible that the L2TP server may try to create a session in a
> tunnel that is being closed, this bug would be exposed.
>
Yes, I think this patch is worth a try. In the case of sessions created
on a dead tunnel, I wouldn't have expected the xmit path to even reach
l2tp_xmit_skb() though (that's certainly possible, but the timing
constraints look a bit hard to reach).


BTW, I started working on this issue a few days ago and came to the
same conclusions as the ones you posted in your previous replies. Given
that we were in line with the analysis, I've switched to the PPP bug
reported by Beniamino (https://www.spinics.net/lists/netdev/msg458002.html).
I'll move back to L2TP as soon as possible.


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-04  7:49         ` James Chapman
  2017-10-04 10:33           ` Guillaume Nault
@ 2017-10-06  4:45           ` SviMik
  2017-10-06  9:52             ` James Chapman
  1 sibling, 1 reply; 12+ messages in thread
From: SviMik @ 2017-10-06  4:45 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev, Guillaume Nault

2017-10-04 10:49 GMT+03:00 James Chapman <jchapman@katalix.com>:
> On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
>> For capturing complete oops messages, have you tried setting up
>> netconsole? You might also find the full text in the syslog on reboot.

Why, thank you! You've just told me that Santa Claus exists :)
I've set up netconsole on 93 of my servers, and hope starting from
tomorrow I'll have more pretty kernel panic reports, and get them even
from servers where I had never had a chance to capture the console
before.

>> It's interesting that you are seeing l2tp issues since switching to
>> 4.x kernels. Are you able to try earlier kernels to find the latest
>> version that works? I'm curious whether things broke at v3.15.

I'll try, but it will take some time to grab enough statistics. The
bug is relatively rare, only a few panics per day on the whole bunch of
93 servers.

> It's possible that this may be fixed by a patch that is already
> upstream and merged for v4.14. The fix is from Guillaume Nault:
>
> f3c66d4 l2tp: prevent creation of sessions on terminated tunnels
>
> If it's possible that the L2TP server may try to create a session in a
> tunnel that is being closed, this bug would be exposed.
>
> Guillaume's fix isn't yet pushed to stable releases. Are you able to
> try a v4.14-rc build?

Sorry, I'm not skilled enough to build a kernel for CentOS on my own.
Will wait till it appears in elrepo. The latest version there is
currently 4.13.5. Meanwhile I'll try to switch to 3.10 and see how it
works.

I have also captured a few more kernel panics in the last few days.
Please see if they are related to this bug:
http://svimik.com/hdmmsk1kp2.png
http://svimik.com/hdmmsk1kp3.png
http://svimik.com/hdmmsk1kp4.png
http://svimik.com/hdmmsk2kp6.png


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-06  4:45           ` SviMik
@ 2017-10-06  9:52             ` James Chapman
  2017-10-07 12:09               ` SviMik
  0 siblings, 1 reply; 12+ messages in thread
From: James Chapman @ 2017-10-06  9:52 UTC (permalink / raw)
  To: SviMik; +Cc: netdev, Guillaume Nault

On 6 October 2017 at 05:45, SviMik <svimik@gmail.com> wrote:
> 2017-10-04 10:49 GMT+03:00 James Chapman <jchapman@katalix.com>:
>> On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
>>> For capturing complete oops messages, have you tried setting up
>>> netconsole? You might also find the full text in the syslog on reboot.
>
> Why, thank you! You've just told me that Santa Claus exists :)

You're welcome. Heh, my wife says I have a few more grey hairs and I
don't shave as often as I should. :)

> I've set up netconsole on 93 of my servers, and hope starting from
> tomorrow I'll have more pretty kernel panic reports, and get them even
> from servers where I had never had a chance to capture the console
> before.
>
>>> It's interesting that you are seeing l2tp issues since switching to
>>> 4.x kernels. Are you able to try earlier kernels to find the latest
>>> version that works? I'm curious whether things broke at v3.15.
>
> I'll try, but it will take some time to grab enough statistics. The
> bug is relatively rare, only a few panics per day on the whole bunch of
> 93 servers.
>
>> It's possible that this may be fixed by a patch that is already
>> upstream and merged for v4.14. The fix is from Guillaume Nault:
>>
>> f3c66d4 l2tp: prevent creation of sessions on terminated tunnels
>>
>> If it's possible that the L2TP server may try to create a session in a
>> tunnel that is being closed, this bug would be exposed.
>>
>> Guillaume's fix isn't yet pushed to stable releases. Are you able to
>> try a v4.14-rc build?
>
> Sorry, I'm not skilled enough to build a kernel for CentOS on my own.
> Will wait till it appears in elrepo. The latest version there is
> currently 4.13.5. Meanwhile I'll try to switch to 3.10 and see how it
> works.

No problem. Please keep us updated. If Guillaume's fix in v4.14
prevents the l2tp crashes in your systems, I'd like to push it out to
stable releases. I have been trying to reproduce the problem here but
have had no luck so far. My guess is that your l2tp servers have a
large ppp population and are handling a lot of traffic. Until we have
evidence that Guillaume's patch resolves this problem, it's harder to
justify pushing it out to stable.

> I have also captured a few more kernel panics in the last few days.
> Please see if they are related to this bug:
> http://svimik.com/hdmmsk1kp2.png
> http://svimik.com/hdmmsk1kp3.png
> http://svimik.com/hdmmsk1kp4.png
> http://svimik.com/hdmmsk2kp6.png

Thanks. None of these are related to this bug but it looks like p3, p4
and p6 are all in the networking code. It might be worth opening
separate threads for these. A full oops capture with netconsole would
likely get more attention though.

To check whether the oops is related to this bug yourself, please
check for text that contains "l2tp_xmit_skb" before posting it to this
thread.
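
The check described above can be automated with a few lines of Python. This is only a sketch: the function name is invented here, and the sample text is a placeholder, not a real oops.

```python
def lines_mentioning(oops_text, symbol="l2tp_xmit_skb"):
    """Return the lines of a captured oops that mention the given symbol."""
    return [line for line in oops_text.splitlines() if symbol in line]


# Placeholder text standing in for a captured oops (not real kernel output).
sample = "placeholder frame A\nplaceholder frame mentioning l2tp_xmit_skb\n"
related = bool(lines_mentioning(sample))
print("related to this bug" if related else "probably a different bug")
```

An empty result means the capture does not mention l2tp_xmit_skb, so by the rule above it most likely belongs in a separate thread.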


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-06  9:52             ` James Chapman
@ 2017-10-07 12:09               ` SviMik
  2017-10-07 16:38                 ` Denys Fedoryshchenko
  0 siblings, 1 reply; 12+ messages in thread
From: SviMik @ 2017-10-07 12:09 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev, Guillaume Nault

2017-10-06 12:52 GMT+03:00 James Chapman <jchapman@katalix.com>:
> On 6 October 2017 at 05:45, SviMik <svimik@gmail.com> wrote:
>> 2017-10-04 10:49 GMT+03:00 James Chapman <jchapman@katalix.com>:
>>> On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
>>>> For capturing complete oops messages, have you tried setting up
>>>> netconsole? You might also find the full text in the syslog on reboot.
>>
>> Why, thank you! You've just told me that Santa Claus exists :)
>
> You're welcome. Heh, my wife says I have a few more grey hairs and I
> don't shave as often as I should. :)
>
>> I've set up netconsole on 93 of my servers, and hope starting from
>> tomorrow I'll have more pretty kernel panic reports, and get them even
>> from servers where I had never had a chance to capture the console
>> before.

Unfortunately, netconsole has managed to send a kernel panic trace
only once, and it's not related to this bug. It looks like something
crashes hard enough to make netconsole unusable.

Just for the record, it seems to me that tun_do_read() has a bug too:
http://svimik.com/hdmmsk1kp5.txt
Shall I report it in a separate thread?

Meanwhile, I have found that kdump in CentOS just fails to work with
kernels >=4.9 while working fine with 4.8.
It says:
Rebuilding /boot/initrd-4.9.48-29.el6.x86_64kdump.img
No module ext4 found for kernel 4.9.48-29.el6.x86_64, aborting.
Failed to run mkdumprd

>>>> It's interesting that you are seeing l2tp issues since switching to
>>>> 4.x kernels. Are you able to try earlier kernels to find the latest
>>>> version that works? I'm curious whether things broke at v3.15.
>>
>> I'll try, but it will take some time to grab enough statistics. The
>> bug is relatively rare, only a few panics per day on the whole bunch of
>> 93 servers.

I have tested the kernel 3.10.107-1.el6.elrepo.x86_64 for 24 hours,
and have to say that no kernel panics occurred on any of the
servers during this period. That is pretty impressive compared with how
many different oopses I had with 4.x kernels. The oopses that were not
related to this bug are gone too.


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-07 12:09               ` SviMik
@ 2017-10-07 16:38                 ` Denys Fedoryshchenko
  0 siblings, 0 replies; 12+ messages in thread
From: Denys Fedoryshchenko @ 2017-10-07 16:38 UTC (permalink / raw)
  To: SviMik; +Cc: James Chapman, netdev, Guillaume Nault, netdev-owner

On 2017-10-07 15:09, SviMik wrote:
> 
> Unfortunately, netconsole has managed to send a kernel panic trace
> only once, and it's not related to this bug. It looks like something
> crashes hard enough to make netconsole unusable.
In some cases I had luck with pstore when netconsole failed me
(especially with networking bugs). It stores panic messages more reliably,
especially on recent platforms that have ERST and EFI.
https://www.kernel.org/doc/Documentation/ABI/testing/pstore
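
After a reboot, whatever pstore saved can be collected with a short script. This is a sketch under assumptions: it takes /sys/fs/pstore as the mount point (the usual location), the function name is invented here, and reading the records normally needs root.

```python
import os


def read_pstore(directory="/sys/fs/pstore"):
    """Return {file name: text} for every regular file in the pstore directory."""
    records = {}
    if not os.path.isdir(directory):
        return records  # pstore not mounted, or the path is wrong
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                records[name] = fh.read()
    return records


for name, text in read_pstore().items():
    print("==== " + name + " ====")
    print(text)
```

An empty result means either that nothing was saved during the last crash or that no pstore backend is active on that machine.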


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
  2017-10-03  7:27       ` James Chapman
  2017-10-04  7:49         ` James Chapman
@ 2017-10-09 17:15         ` Guillaume Nault
  1 sibling, 0 replies; 12+ messages in thread
From: Guillaume Nault @ 2017-10-09 17:15 UTC (permalink / raw)
  To: James Chapman; +Cc: SviMik, netdev

On Tue, Oct 03, 2017 at 08:27:32AM +0100, James Chapman wrote:
> On 2 October 2017 at 19:35, SviMik <svimik@gmail.com> wrote:
> > Hi, James!
> >
> > No, I'm suffering from kernel panics since I started using 4.x
> > kernels.
> It's interesting that you are seeing l2tp issues since switching to
> 4.x kernels. Are you able to try earlier kernels to find the latest
> version that works? I'm curious whether things broke at v3.15.
> 
> > See my current collection:
> > http://svimik.com/hdmmsk1kp1.png
> This one is another crash implicating l2tp socket shutdown, this time
> when the session pppol2tp socket is closed. Unfortunately the
> screenshot doesn't show the full oops text. I'll investigate this one
> too if you can get a full oops capture.
> 
For this one too, it looks like tunnel->sock is NULL. This time, it's
l2tp_session_free() that breaks, but the root issue is probably the
same: l2tp_tunnel_destruct() removes the tunnel concurrently, most
likely because the session was created on a closing tunnel (as was
suggested by James in another message). So here too, commit
f3c66d4e144a ("l2tp: prevent creation of sessions on terminated tunnels")
might be a good start.
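
The suspected sequence can be illustrated with a toy model. This is a plain-Python sketch of the idea in the commit title, not the kernel code: the class, its `accepting` flag, and the method names are invented for illustration, and the real locking and reference counting are left out.

```python
class Tunnel:
    """Toy stand-in for an L2TP tunnel; not the kernel's data structure."""

    def __init__(self):
        self.sock = object()      # stands in for tunnel->sock
        self.accepting = True     # hypothetical "still accepts sessions" flag
        self.sessions = []

    def close(self):
        """Tear the tunnel down: drop its sessions, then clear the socket."""
        self.accepting = False
        self.sessions.clear()
        self.sock = None          # models the destructor setting tunnel->sock to NULL

    def add_session(self, name, check_closing):
        """Register a session; with check_closing, refuse on a closed tunnel."""
        if check_closing and not self.accepting:
            raise RuntimeError("tunnel is closing, session refused")
        self.sessions.append(name)


def xmit(tunnel):
    """Model the transmit path, which needs a non-NULL tunnel socket."""
    if tunnel.sock is None:
        raise ValueError("tunnel.sock is None (the kernel would oops here)")
    return "sent"
```

Without the check, a session added after close() stays attached to a tunnel whose socket is already gone, and the first xmit() on it fails. With the check, the late session is refused up front, so no transmit can reach the cleared socket through it.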


Thread overview: 12+ messages
2017-10-01 17:21 Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp] Stephen Hemminger
2017-10-02 13:32 ` James Chapman
2017-10-02 13:56   ` Eric Dumazet
2017-10-02 18:35     ` SviMik
2017-10-03  7:27       ` James Chapman
2017-10-04  7:49         ` James Chapman
2017-10-04 10:33           ` Guillaume Nault
2017-10-06  4:45           ` SviMik
2017-10-06  9:52             ` James Chapman
2017-10-07 12:09               ` SviMik
2017-10-07 16:38                 ` Denys Fedoryshchenko
2017-10-09 17:15         ` Guillaume Nault
