netdev.vger.kernel.org archive mirror
* Oops in Unix sockets code
From: Blaschka @ 2009-11-19 13:20 UTC
  To: netdev, linux-s390


Hi,

Running disk tests on s390x (kernel 2.6.31), we get the following Oops in the
Unix domain socket code (hald process). Can somebody help? We hit this Oops from
time to time, so we are willing to test a patch or provide additional debug data
if required.

Thanks,
  Frank

    <1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
    <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    <4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
    <4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
    <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
    <4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
    <4>           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
    <4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
    <4>           00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
    <4>           0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
    <4>           000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
    <4>Krnl Code: 00000000003a15e8: c03000185811        larl    %r3,6ac60a
    <4>           00000000003a15ee: c0e5fffffdd9        brasl   %r14,3a11a0
    <4>           00000000003a15f4: a7f4ffce            brc     15,3a1590
    <4>          >00000000003a15f8: 58302000            l       %r3,0(%r2)
    <4>           00000000003a15fc: b9170033            llgtr   %r3,%r3
    <4>           00000000003a1600: 1853                lr      %r5,%r3
    <4>           00000000003a1602: 1813                lr      %r1,%r3
    <4>           00000000003a1604: a75a0001            ahi     %r5,1
    <4>Call Trace:
    <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
    <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
    <4> [<00000000004788a8>] sock_wfree+0x80/0x84
    <4> [<000000000047dc08>] skb_release_head_state+0x88/0x140
    <4> [<000000000047d7ec>] __kfree_skb+0x28/0x10c
    <4> [<0000000000481d7e>] skb_free_datagram+0x32/0x6c
    <4> [<0000000000517a46>] unix_dgram_recvmsg+0x246/0x38c
    <4> [<0000000000474036>] sock_recvmsg+0xe2/0x118
    <4> [<00000000004754f8>] SyS_recvmsg+0x134/0x310
    <4> [<0000000000472f14>] SyS_socketcall+0xfc/0x31c
    <4> [<0000000000117f9e>] sysc_noemu+0x10/0x16
    <4> [<0000004f131a95ae>] 0x4f131a95ae
    <4>INFO: lockdep is turned off.
    <4>Last Breaking-Event-Address:
    <4> [<00000000005853d6>] _read_lock+0x5e/0x98
    <4>
    <0>Kernel panic - not syncing: Fatal exception: panic_on_oops
    <4>CPU: 0 Tainted: G      D    2.6.31-39.x.20091102-s390xdefault #1
    <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
    <4>0000000000000000 000000007ab33588 0000000000000002 0000000000000000
    <4>       000000007ab33628 000000007ab335a0 000000007ab335a0 00000000005801b8
    <4>       0000000000000001 0000000000000000 000000007ab33c30 0000000000000000
    <4>       000000000000000d 0000000000000000 000000007ab335f8 000000000000000e
    <4>       000000000058fc18 0000000000105700 000000007ab33588 000000007ab335d0
    <4>Call Trace:
    <4>([<00000000001055fc>] show_trace+0xf0/0x148)
    <4> [<0000000000580022>] panic+0xa2/0x1e4
    <4> [<0000000000105bf8>] die+0x14c/0x168
    <4> [<00000000001012d8>] do_no_context+0xa8/0xe8
    <4> [<000000000058597c>] do_dat_exception+0x134/0x338
    <4> [<0000000000117fa4>] sysc_return+0x0/0x8
    <4> [<00000000003a15f8>] _raw_read_trylock+0x0/0x28
    <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
    <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
    <4> [<00000000004788a8>] sock_wfree+0x80/0x84
    <4> [<000000000047dc08>] skb_release_head_state+0x88/0x140
    <4> [<000000000047d7ec>] __kfree_skb+0x28/0x10c
    <4> [<0000000000481d7e>] skb_free_datagram+0x32/0x6c
    <4> [<0000000000517a46>] unix_dgram_recvmsg+0x246/0x38c
    <4> [<0000000000474036>] sock_recvmsg+0xe2/0x118
    <4> [<00000000004754f8>] SyS_recvmsg+0x134/0x310
    <4> [<0000000000472f14>] SyS_socketcall+0xfc/0x31c
    <4> [<0000000000117f9e>] sysc_noemu+0x10/0x16
    <4> [<0000004f131a95ae>] 0x4f131a95ae
    <4>INFO: lockdep is turned off.



* Re: Oops in Unix sockets code
From: Christian Borntraeger @ 2009-11-19 13:40 UTC
  To: Blaschka; +Cc: netdev, linux-s390

On Thursday 19 November 2009 14:20:28, Blaschka wrote:
>     <1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
>     <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
0011 (page translation exception) and DEBUG_PAGEALLOC might indicate a use after free.

>     <4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
>     <4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
>     <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
>     <4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
>     <4>           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
>     <4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
>     <4>           00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
>     <4>           0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
>     <4>           000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
>     <4>Krnl Code: 00000000003a15e8: c03000185811        larl    %r3,6ac60a
>     <4>           00000000003a15ee: c0e5fffffdd9        brasl   %r14,3a11a0
>     <4>           00000000003a15f4: a7f4ffce            brc     15,3a1590
>     <4>          >00000000003a15f8: 58302000            l       %r3,0(%r2)
>     <4>           00000000003a15fc: b9170033            llgtr   %r3,%r3
>     <4>           00000000003a1600: 1853                lr      %r5,%r3
>     <4>           00000000003a1602: 1813                lr      %r1,%r3
>     <4>           00000000003a1604: a75a0001            ahi     %r5,1
>     <4>Call Trace:
>     <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
>     <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
[...]

So it looks like struct sock *sk is already gone in unix_write_space.
Since I have no clue about the socket code, I can only guess that there is a
locking or refcount issue.

Christian
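
To make the link between the register dump and the fault address explicit: the faulting instruction `l %r3,0(%r2)` loads from the address held in %r2, which the dump shows as 000000007575ed18, while the reported fault address 000000007575e000 appears to be the start of the page containing it. The small C sketch below shows that arithmetic. It assumes 4 KiB pages, and the `sketch_*` names are hypothetical helpers for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB). */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round an address down to the start of the page that contains it. */
static uint64_t sketch_page_base(uint64_t addr)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Return 1 if addr lies inside the page that starts at fault_page. */
static int sketch_same_page(uint64_t addr, uint64_t fault_page)
{
    return sketch_page_base(addr) == fault_page;
}
```

Calling `sketch_page_base()` on the %r2 value from the dump yields the reported fault address, which is consistent with the lock word inside the socket being on a page that was already unmapped by DEBUG_PAGEALLOC.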



* Re: Oops in Unix sockets code
From: Eric Dumazet @ 2009-11-19 14:20 UTC
  To: Christian Borntraeger; +Cc: Blaschka, netdev, linux-s390

Christian Borntraeger wrote:
> On Thursday 19 November 2009 14:20:28, Blaschka wrote:
>>     <1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
>>     <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> 0011 (page translation exception) and DEBUG_PAGEALLOC might indicate a use after free.
> 
>>     <4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
>>     <4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
>>     <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
>>     <4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
>>     <4>           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
>>     <4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
>>     <4>           00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
>>     <4>           0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
>>     <4>           000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
>>     <4>Krnl Code: 00000000003a15e8: c03000185811        larl    %r3,6ac60a
>>     <4>           00000000003a15ee: c0e5fffffdd9        brasl   %r14,3a11a0
>>     <4>           00000000003a15f4: a7f4ffce            brc     15,3a1590
>>     <4>          >00000000003a15f8: 58302000            l       %r3,0(%r2)
>>     <4>           00000000003a15fc: b9170033            llgtr   %r3,%r3
>>     <4>           00000000003a1600: 1853                lr      %r5,%r3
>>     <4>           00000000003a1602: 1813                lr      %r1,%r3
>>     <4>           00000000003a1604: a75a0001            ahi     %r5,1
>>     <4>Call Trace:
>>     <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
>>     <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
> [...]
> 
> So it looks like struct sock *sk is already gone in unix_write_space.
> Since I have no clue about the socket code, I can only guess that there is a
> locking or refcount issue.

2.6.31 has a known bug

2.6.31.4 should correct it

commit 657453424a3c382035983f9a47306fafea730f6d
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Thu Sep 24 10:49:24 2009 +0000

    net: Fix sock_wfree() race
    
    [ Upstream commit d99927f4d93f36553699573b279e0ff98ad7dea6 ]
    
    Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
    (net: No more expensive sock_hold()/sock_put() on each tx)
    opens a window in sock_wfree() where another cpu
    might free the socket we are working on.
    
    A fix is to call sk->sk_write_space(sk) while still
    holding a reference on sk.
    
    Reported-by: Jike Song <albcamus@gmail.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


Please try 2.6.31.6 ;)
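
The ordering described in the commit message (run the write-space callback while a reference is still held, and only then drop it) can be modeled in userspace. The following is a minimal C11 sketch under stated assumptions, not the kernel code: `struct sketch_sock` and every `sketch_*` function are hypothetical stand-ins, the counter starts at 1 for the owner and grows by the byte length of each in-flight buffer, and "freeing" is reduced to setting a flag so the invariant can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct sketch_sock {
    atomic_uint wmem_alloc;         /* 1 for the owner + bytes still in flight */
    bool freed;                     /* set once the last unit is dropped */
    unsigned int write_space_calls; /* how often the callback has run */
};

static void sketch_sock_init(struct sketch_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1); /* the owner's reference */
    sk->freed = false;
    sk->write_space_calls = 0;
}

/* Drop n units; mark the socket freed if that brought the count to zero. */
static void sketch_put(struct sketch_sock *sk, unsigned int n)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, n) == n)
        sk->freed = true;
}

/* Account len bytes of a queued buffer against the socket. */
static void sketch_charge(struct sketch_sock *sk, unsigned int len)
{
    atomic_fetch_add(&sk->wmem_alloc, len);
}

/* Callback that touches the socket, so it must never run after the free. */
static void sketch_write_space(struct sketch_sock *sk)
{
    assert(!sk->freed);
    sk->write_space_calls++;
}

/* Buffer destructor with the safe ordering; len must be at least 1. */
static void sketch_wfree(struct sketch_sock *sk, unsigned int len)
{
    assert(len >= 1);
    /* Return all but one unit, so the count cannot reach zero here. */
    atomic_fetch_sub(&sk->wmem_alloc, len - 1);
    /* The retained unit keeps sk alive for the duration of the callback. */
    sketch_write_space(sk);
    /* Only now give up the last unit, freeing sk if nobody else holds it. */
    sketch_put(sk, 1);
}
```

In this model, an owner that calls `sketch_put(sk, 1)` while a buffer is still charged does not free the socket; the later `sketch_wfree()` runs the callback first and performs the free last. A single-threaded run cannot reproduce the race itself; the sketch only shows why the callback can no longer observe a freed socket once one unit is kept back until the callback has returned.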


* Re: Oops in Unix sockets code
From: Sebastian Ott @ 2009-11-19 15:46 UTC
  To: Eric Dumazet; +Cc: Christian Borntraeger, Blaschka, netdev, linux-s390



On Thu, 19 Nov 2009, Eric Dumazet wrote:
...
> 2.6.31 has a known bug
> 
> 2.6.31.4 should correct it
> 
> commit 657453424a3c382035983f9a47306fafea730f6d
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date:   Thu Sep 24 10:49:24 2009 +0000

Indeed, the problem didn't show up with d99927f applied. Thanks for
pointing that out.

sebastian
