* Oops in Unix sockets code
From: Blaschka @ 2009-11-19 13:20 UTC
To: netdev, linux-s390
Hi,
running disk tests on s390x (kernel 2.6.31), we get the following Oops in the Unix domain
socket code (hald process). Can somebody help? We hit this Oops from time to
time, so we are willing to test a patch or provide additional debug data if
required.
Thanks,
Frank
<1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
<4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
<4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
<4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
<4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
<4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
<4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
<4> 00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
<4> 0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
<4> 000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
<4>Krnl Code: 00000000003a15e8: c03000185811 larl %r3,6ac60a
<4> 00000000003a15ee: c0e5fffffdd9 brasl %r14,3a11a0
<4> 00000000003a15f4: a7f4ffce brc 15,3a1590
<4> >00000000003a15f8: 58302000 l %r3,0(%r2)
<4> 00000000003a15fc: b9170033 llgtr %r3,%r3
<4> 00000000003a1600: 1853 lr %r5,%r3
<4> 00000000003a1602: 1813 lr %r1,%r3
<4> 00000000003a1604: a75a0001 ahi %r5,1
<4>Call Trace:
<4>([<00000000005853d2>] _read_lock+0x5a/0x98)
<4> [<000000000051a16a>] unix_write_space+0x36/0xb0
<4> [<00000000004788a8>] sock_wfree+0x80/0x84
<4> [<000000000047dc08>] skb_release_head_state+0x88/0x140
<4> [<000000000047d7ec>] __kfree_skb+0x28/0x10c
<4> [<0000000000481d7e>] skb_free_datagram+0x32/0x6c
<4> [<0000000000517a46>] unix_dgram_recvmsg+0x246/0x38c
<4> [<0000000000474036>] sock_recvmsg+0xe2/0x118
<4> [<00000000004754f8>] SyS_recvmsg+0x134/0x310
<4> [<0000000000472f14>] SyS_socketcall+0xfc/0x31c
<4> [<0000000000117f9e>] sysc_noemu+0x10/0x16
<4> [<0000004f131a95ae>] 0x4f131a95ae
<4>INFO: lockdep is turned off.
<4>Last Breaking-Event-Address:
<4> [<00000000005853d6>] _read_lock+0x5e/0x98
<4>
<0>Kernel panic - not syncing: Fatal exception: panic_on_oops
<4>CPU: 0 Tainted: G D 2.6.31-39.x.20091102-s390xdefault #1
<4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
<4>0000000000000000 000000007ab33588 0000000000000002 0000000000000000
<4> 000000007ab33628 000000007ab335a0 000000007ab335a0 00000000005801b8
<4> 0000000000000001 0000000000000000 000000007ab33c30 0000000000000000
<4> 000000000000000d 0000000000000000 000000007ab335f8 000000000000000e
<4> 000000000058fc18 0000000000105700 000000007ab33588 000000007ab335d0
<4>Call Trace:
<4>([<00000000001055fc>] show_trace+0xf0/0x148)
<4> [<0000000000580022>] panic+0xa2/0x1e4
<4> [<0000000000105bf8>] die+0x14c/0x168
<4> [<00000000001012d8>] do_no_context+0xa8/0xe8
<4> [<000000000058597c>] do_dat_exception+0x134/0x338
<4> [<0000000000117fa4>] sysc_return+0x0/0x8
<4> [<00000000003a15f8>] _raw_read_trylock+0x0/0x28
<4>([<00000000005853d2>] _read_lock+0x5a/0x98)
<4> [<000000000051a16a>] unix_write_space+0x36/0xb0
<4> [<00000000004788a8>] sock_wfree+0x80/0x84
<4> [<000000000047dc08>] skb_release_head_state+0x88/0x140
<4> [<000000000047d7ec>] __kfree_skb+0x28/0x10c
<4> [<0000000000481d7e>] skb_free_datagram+0x32/0x6c
<4> [<0000000000517a46>] unix_dgram_recvmsg+0x246/0x38c
<4> [<0000000000474036>] sock_recvmsg+0xe2/0x118
<4> [<00000000004754f8>] SyS_recvmsg+0x134/0x310
<4> [<0000000000472f14>] SyS_socketcall+0xfc/0x31c
<4> [<0000000000117f9e>] sysc_noemu+0x10/0x16
<4> [<0000004f131a95ae>] 0x4f131a95ae
<4>INFO: lockdep is turned off.
* Re: Oops in Unix sockets code
From: Christian Borntraeger @ 2009-11-19 13:40 UTC
To: Blaschka; +Cc: netdev, linux-s390
On Thursday, 19 November 2009 14:20:28, Blaschka wrote:
> <1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
> <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
0011 (page translation exception) and DEBUG_PAGEALLOC might indicate a use-after-free.
> <4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
> <4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
> <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
> <4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
> <4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
> <4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
> <4> 00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
> <4> 0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
> <4> 000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
> <4>Krnl Code: 00000000003a15e8: c03000185811 larl %r3,6ac60a
> <4> 00000000003a15ee: c0e5fffffdd9 brasl %r14,3a11a0
> <4> 00000000003a15f4: a7f4ffce brc 15,3a1590
> <4> >00000000003a15f8: 58302000 l %r3,0(%r2)
> <4> 00000000003a15fc: b9170033 llgtr %r3,%r3
> <4> 00000000003a1600: 1853 lr %r5,%r3
> <4> 00000000003a1602: 1813 lr %r1,%r3
> <4> 00000000003a1604: a75a0001 ahi %r5,1
> <4>Call Trace:
> <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
> <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
[...]
So it looks like struct sock *sk is already gone in unix_write_space.
Since I have no clue about the socket code, I can only guess that there is a
locking or refcount issue.
Christian
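A quick cross-check of the register dump is consistent with this reading. The
faulting instruction, l %r3,0(%r2), loads from the address held in %r2
(000000007575ed18), and the reported fault address 000000007575e000 is the base
of the page containing that pointer. The sketch below redoes the arithmetic,
assuming 4 KiB pages; MODEL_PAGE_SIZE, page_base, page_offset and fault_covers
are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on s390x. */
#define MODEL_PAGE_SIZE 4096ULL

/* Start address of the page that contains addr. */
static uint64_t page_base(uint64_t addr)
{
	return addr & ~(MODEL_PAGE_SIZE - 1);
}

/* Byte offset of addr inside its page. */
static uint64_t page_offset(uint64_t addr)
{
	return addr & (MODEL_PAGE_SIZE - 1);
}

/* True when a page-aligned fault address covers the pointer held in a register. */
static int fault_covers(uint64_t fault_addr, uint64_t reg)
{
	/* The model only handles fault addresses reported at page granularity. */
	assert(page_offset(fault_addr) == 0);
	return page_base(reg) == fault_addr;
}
```

With the values from the Oops, fault_covers(0x7575e000, 0x7575ed18) holds and
the offset inside the page is 0xd18, so the lock that _read_lock tried to take
lies in a page that was not mapped at the time of the access.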
* Re: Oops in Unix sockets code
From: Eric Dumazet @ 2009-11-19 14:20 UTC
To: Christian Borntraeger; +Cc: Blaschka, netdev, linux-s390
Christian Borntraeger wrote:
> On Thursday, 19 November 2009 14:20:28, Blaschka wrote:
>> <1>Unable to handle kernel pointer dereference at virtual kernel address 000000007575e000
>> <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> 0011 (page translation exception) and DEBUG_PAGEALLOC might indicate a use-after-free.
>
>> <4>Modules linked in: sunrpc qeth_l3 dm_multipath dm_mod qeth ccwgroup chsc_sch
>> <4>CPU: 0 Not tainted 2.6.31-39.x.20091102-s390xdefault #1
>> <4>Process hald (pid: 2117, task: 000000007d200c40, ksp: 000000007ab33880)
>> <4>Krnl PSW : 0704100180000000 00000000003a15f8 (_raw_read_trylock+0x0/0x28)
>> <4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
>> <4>Krnl GPRS: 16c8a00000000000 000000007d200c40 000000007575ed18 0000000000000003
>> <4> 00000000005853d2 000000007d201470 0000000000000002 000000007ab33c30
>> <4> 0000000075746c78 000000007a74da48 000000000051a16a 000000007575ed18
>> <4> 000000007575ed30 00000000005da190 00000000005853dc 000000007ab338c8
>> <4>Krnl Code: 00000000003a15e8: c03000185811 larl %r3,6ac60a
>> <4> 00000000003a15ee: c0e5fffffdd9 brasl %r14,3a11a0
>> <4> 00000000003a15f4: a7f4ffce brc 15,3a1590
>> <4> >00000000003a15f8: 58302000 l %r3,0(%r2)
>> <4> 00000000003a15fc: b9170033 llgtr %r3,%r3
>> <4> 00000000003a1600: 1853 lr %r5,%r3
>> <4> 00000000003a1602: 1813 lr %r1,%r3
>> <4> 00000000003a1604: a75a0001 ahi %r5,1
>> <4>Call Trace:
>> <4>([<00000000005853d2>] _read_lock+0x5a/0x98)
>> <4> [<000000000051a16a>] unix_write_space+0x36/0xb0
> [...]
>
> So it looks like that struct sock *sk is already gone in unix_write_space.
> Since I have no clue about the socket code, I can only guess that there is a
> locking or refcount issue.
2.6.31 has a known bug; 2.6.31.4 should correct it:
commit 657453424a3c382035983f9a47306fafea730f6d
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu Sep 24 10:49:24 2009 +0000
net: Fix sock_wfree() race
[ Upstream commit d99927f4d93f36553699573b279e0ff98ad7dea6 ]
Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.
A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.
Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Please try 2.6.31.6 ;)
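The ordering described in the commit message can be modelled in user space.
This is a single-threaded sketch, not the kernel code: struct model_sock and
the model_* functions are hypothetical stand-ins, and it assumes that the
write-memory counter starts at 1 on behalf of the socket owner, so that it
doubles as a reference count. The point it illustrates is that the write-space
callback runs while one unit of the counter is still held, so nobody else can
drop the count to zero and free the socket underneath it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock; not the kernel structure. */
struct model_sock {
	atomic_uint wmem_alloc;        /* bytes in flight, plus 1 held by the owner */
	bool freed;                    /* set instead of freeing, so it can be inspected */
	unsigned int seen_in_callback; /* counter value observed by write_space */
	void (*write_space)(struct model_sock *sk);
};

/* Models sk->sk_write_space(): must never run on a freed socket. */
static void model_write_space(struct model_sock *sk)
{
	assert(!sk->freed);
	sk->seen_in_callback = atomic_load(&sk->wmem_alloc);
}

static void model_sock_init(struct model_sock *sk)
{
	atomic_init(&sk->wmem_alloc, 1); /* the owner's unit */
	sk->freed = false;
	sk->seen_in_callback = 0;
	sk->write_space = model_write_space;
}

/* A buffer of len bytes is queued for transmission. */
static void model_charge(struct model_sock *sk, unsigned int len)
{
	atomic_fetch_add(&sk->wmem_alloc, len);
}

/* The owner drops its unit; the socket goes away only if nothing is in flight. */
static void model_sk_free(struct model_sock *sk)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		sk->freed = true;
}

/* Buffer destructor with the fixed ordering: notify first, then drop the last unit. */
static void model_sock_wfree(struct model_sock *sk, unsigned int len)
{
	assert(len >= 1);
	/* Release all but one unit, so the counter cannot reach zero yet. */
	atomic_fetch_sub(&sk->wmem_alloc, len - 1);
	sk->write_space(sk);
	/* Now drop the unit we kept; whoever reaches zero frees the socket. */
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		sk->freed = true;
}
```

In the racy ordering, the whole length would be subtracted before the callback
runs, leaving a window in which another CPU could see the counter hit zero and
free the socket while the callback is still using it.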
* Re: Oops in Unix sockets code
From: Sebastian Ott @ 2009-11-19 15:46 UTC
To: Eric Dumazet; +Cc: Christian Borntraeger, Blaschka, netdev, linux-s390
On Thu, 19 Nov 2009, Eric Dumazet wrote:
...
> 2.6.31 has a known bug
>
> 2.6.31.4 should correct it
>
> commit 657453424a3c382035983f9a47306fafea730f6d
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu Sep 24 10:49:24 2009 +0000
Indeed, the problem didn't show up with d99927f applied. Thanks for
pointing that out.
sebastian