netdev.vger.kernel.org archive mirror
* Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
@ 2013-11-18 17:14 Stephen Hemminger
  2013-11-18 17:46 ` Vlad Yasevich
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2013-11-18 17:14 UTC
  To: netdev



Begin forwarded message:

Date: Sun, 17 Nov 2013 19:38:56 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter


https://bugzilla.kernel.org/show_bug.cgi?id=65131

            Bug ID: 65131
           Summary: kernel panic (BUG_ON raised) in SCTP function
                    sctp_cmd_interpreter
           Product: Networking
           Version: 2.5
    Kernel Version: 3.11.8 custom build, repeated on 3.11.2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: IPV4
          Assignee: shemminger@linux-foundation.org
          Reporter: yuras@uch.net
        Regression: No

Created attachment 114991
  --> https://bugzilla.kernel.org/attachment.cgi?id=114991&action=edit
Screenshot of panic

Two-node cluster configured using the latest corosync (also DRBD 8.4.4, LVM2, and
GFS2, but these are not essential).
Steps to reproduce:
1. Start corosync on both nodes.
2. Start dlm_controld (version 4.0.2) on both nodes (SCTP is used because TCP
cannot be used on multi-homed hosts). This adds the following lines to kern.log:
    kernel: [  580.428664] sctp: Hash tables configured (established 65536 bind 65536)
    kernel: [  580.441779] DLM installed
3. Start clvmd on either node. This adds the following lines to kern.log:
    kernel: [ 1345.259502] dlm: Using SCTP for communications
    kernel: [ 1345.260699] dlm: clvmd: joining the lockspace group...
    kernel: [ 1345.262962] dlm: clvmd: dlm_recover 1
    kernel: [ 1345.262968] dlm: clvmd: group event done 0 0
    kernel: [ 1345.262992] dlm: clvmd: add member 1024
    kernel: [ 1345.262995] dlm: clvmd: dlm_recover_members 1 nodes
    kernel: [ 1345.262996] dlm: clvmd: join complete
    kernel: [ 1345.262998] dlm: clvmd: generation 1 slots 1 1:1024
    kernel: [ 1345.262999] dlm: clvmd: dlm_recover_directory
    kernel: [ 1345.263000] dlm: clvmd: dlm_recover_directory 0 in 0 new
    kernel: [ 1345.263002] dlm: clvmd: dlm_recover_directory 0 out 0 messages
    kernel: [ 1345.263019] dlm: clvmd: dlm_recover 1 generation 1 done: 0 ms
4. Start clvmd on the second node. With high probability, one or both nodes
panic in a similar way. A screenshot is attached.
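
To check the log output of steps 2 and 3 automatically, the kern.log lines can be split into a timestamp and a message. This is only a minimal C sketch, assuming the "kernel: [ seconds] message" layout shown above; the names parse_kern_line and is_dlm_message are illustrative and do not belong to any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Splits a line such as
 *   "kernel: [ 1345.262996] dlm: clvmd: join complete"
 * into its uptime timestamp (seconds) and message text.
 * Returns 1 on success, 0 if the line does not match the assumed layout. */
static int parse_kern_line(const char *line, double *uptime,
                           char *msg, size_t msg_len)
{
    char buf[256];

    /* Whitespace in the format matches any amount of input whitespace. */
    if (sscanf(line, " kernel: [ %lf] %255[^\n]", uptime, buf) != 2)
        return 0;
    snprintf(msg, msg_len, "%s", buf);
    return 1;
}

/* Returns 1 if the message carries the "dlm:" prefix seen in step 3. */
static int is_dlm_message(const char *msg)
{
    return strncmp(msg, "dlm:", 4) == 0;
}
```

Feeding each line of kern.log through parse_kern_line and filtering with is_dlm_message would show whether "dlm: clvmd: join complete" was logged before the panic.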

The stack trace can differ slightly above the EOI line, but RIP was always the
same. I suppose the CPU code bytes shown correspond to one of the BUG_ON macros
inside sctp_cmd_interpreter. So, this is a bug.
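
On x86, BUG_ON() traps by executing the ud2 instruction (opcode bytes 0f 0b), and the "Code:" line of an oops marks the byte at RIP with angle brackets. A rough C sketch of checking for that pattern, assuming this layout; faulting_insn_is_ud2 is a hypothetical helper, and the sample line in the note below is made up rather than copied from the attached screenshot.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the byte marked <..> in an oops "Code:" line, together with
 * the byte that follows it, forms the x86 ud2 opcode (0f 0b); 0 otherwise,
 * including when no marked byte is present. */
static int faulting_insn_is_ud2(const char *code_line)
{
    unsigned int first, second;
    const char *mark = strchr(code_line, '<');

    if (mark == NULL)
        return 0;
    /* Parse the marked byte and the one after it as two-digit hex values. */
    if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
        return 0;
    return first == 0x0f && second == 0x0b;
}
```

For a made-up line such as "Code: 48 89 df <0f> 0b eb fe" the helper returns 1, which would be consistent with a BUG_ON rather than, say, a NULL pointer dereference.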

This bug now completely prevents me from using my cluster, as DLM refuses to use
TCP on multi-homed hosts.

-- 
You are receiving this mail because:
You are the assignee for the bug.


* Re: Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
  2013-11-18 17:14 Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter Stephen Hemminger
@ 2013-11-18 17:46 ` Vlad Yasevich
  2013-11-18 19:35   ` Vlad Yasevich
  0 siblings, 1 reply; 4+ messages in thread
From: Vlad Yasevich @ 2013-11-18 17:46 UTC
  To: Stephen Hemminger, netdev

On 11/18/2013 12:14 PM, Stephen Hemminger wrote:
> [...]

Should be fixed by:
commit 7926c1d5be0b7cbe5b8d5c788d7d39237e7b212c
Author: Daniel Borkmann <dborkman@redhat.com>
Date:   Thu Oct 31 09:13:32 2013 +0100

    net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb

-vlad


* Re: Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
  2013-11-18 17:46 ` Vlad Yasevich
@ 2013-11-18 19:35   ` Vlad Yasevich
  2013-11-18 23:49     ` Daniel Borkmann
  0 siblings, 1 reply; 4+ messages in thread
From: Vlad Yasevich @ 2013-11-18 19:35 UTC
  To: Stephen Hemminger, netdev

On 11/18/2013 12:46 PM, Vlad Yasevich wrote:
> On 11/18/2013 12:14 PM, Stephen Hemminger wrote:
>> [...]
> 
> Should be fixed by:
> commit 7926c1d5be0b7cbe5b8d5c788d7d39237e7b212c
> Author: Daniel Borkmann <dborkman@redhat.com>
> Date:   Thu Oct 31 09:13:32 2013 +0100
> 
>     net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb
> 
> -vlad
> 

Just received confirmation that the above patch has been queued for 3.11.

-vlad


* Re: Fw: [Bug 65131] New: kernel panic (BUG_ON raised) in SCTP function sctp_cmd_interpreter
  2013-11-18 19:35   ` Vlad Yasevich
@ 2013-11-18 23:49     ` Daniel Borkmann
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Borkmann @ 2013-11-18 23:49 UTC
  To: Vlad Yasevich; +Cc: Stephen Hemminger, netdev

On 11/18/2013 08:35 PM, Vlad Yasevich wrote:
> On 11/18/2013 12:46 PM, Vlad Yasevich wrote:

>> Should be fixed by:
>> commit 7926c1d5be0b7cbe5b8d5c788d7d39237e7b212c
>> Author: Daniel Borkmann <dborkman@redhat.com>
>> Date:   Thu Oct 31 09:13:32 2013 +0100
>>
>>      net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcb
>
> Just received confirmation that the above patch has been queued for 3.11.

Indeed, thanks!
