* 3.0.4: Soft lockup in netfront in SMP build
@ 2007-04-21 21:45 Graham, Simon
2007-04-21 22:14 ` Jeremy Fitzhardinge
2007-04-22 9:55 ` Keir Fraser
0 siblings, 2 replies; 3+ messages in thread
From: Graham, Simon @ 2007-04-21 21:45 UTC
To: xen-devel
Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:
. network_open acquires the rx spin lock with spin_lock() and then checks for
work on the queue and calls (I think) netif_rx_schedule with the lock still held which
can call into the hypervisor.
. An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll
which blocks attempting to acquire the rx spin lock.
Oops!
I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.
I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?
Simon
BUG: soft lockup detected on CPU#0!
Pid: 2973, comm: ifconfig
EIP: 0061:[<c035868a>] CPU: 0
EIP is at _spin_lock+0xa/0x20
EFLAGS: 00000286 Tainted: GF (2.6.16.38-xen #1)
EAX: c13b840c EBX: c13b8000 ECX: 00000040 EDX: cf783dc0
ESI: 00000000 EDI: c1233274 EBP: cf783d28 DS: 007b ES: 007b
CR0: 8005003b CR2: b7eea198 CR3: 103f2000 CR4: 00000660
[<c010595d>] show_trace+0xd/0x10
[<c010344f>] show_regs+0x18f/0x1c0
[<c01491f4>] softlockup_tick+0xe4/0x100
[<c012c713>] do_timer+0x43/0xf0
[<c010936a>] timer_interrupt+0x1fa/0x670
[<c01494da>] handle_IRQ_event+0x6a/0xb0
[<c01495a9>] __do_IRQ+0x89/0xf0
[<c0106ee8>] do_IRQ+0x38/0x70
[<c028dee0>] evtchn_do_upcall+0x90/0x110
[<c01056a1>] hypervisor_callback+0x3d/0x48
[<c02a5db2>] netif_poll+0x32/0x630
[<c02dace8>] net_rx_action+0x148/0x230
[<c01279d5>] __do_softirq+0x95/0x130
[<c0127af5>] do_softirq+0x85/0xa0
[<c0127bda>] irq_exit+0x3a/0x50
[<c0106eed>] do_IRQ+0x3d/0x70
[<c028dee0>] evtchn_do_upcall+0x90/0x110
[<c01056a1>] hypervisor_callback+0x3d/0x48
[<c02a461b>] network_open+0x13b/0x210
[<c02d9564>] dev_open+0x74/0x90
[<c02db482>] dev_change_flags+0x52/0x110
[<c0320187>] devinet_ioctl+0x4f7/0x5a0
[<c0322174>] inet_ioctl+0x94/0xc0
[<c02cf84d>] sock_ioctl+0xed/0x250
[<c01842d6>] do_ioctl+0x76/0x90
[<c0184479>] vfs_ioctl+0x59/0x1d0
[<c0184657>] sys_ioctl+0x67/0x80
[<c01054cd>] syscall_call+0x7/0xb
crash>
* Re: 3.0.4: Soft lockup in netfront in SMP build
2007-04-21 21:45 3.0.4: Soft lockup in netfront in SMP build Graham, Simon
@ 2007-04-21 22:14 ` Jeremy Fitzhardinge
2007-04-22 9:55 ` Keir Fraser
1 sibling, 0 replies; 3+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-21 22:14 UTC
To: Graham, Simon; +Cc: xen-devel
Graham, Simon wrote:
> Just run into a (real) soft lockup running 3.0.4 - stack is at the end of this, but basically:
>
> . network_open acquires the rx spin lock with spin_lock() and then checks for
> work on the queue and calls (I think) netif_rx_schedule with the lock still held which
> can call into the hypervisor.
> . An interrupt is delivered to the bottom half of netfront which ends up calling netif_poll
> which blocks attempting to acquire the rx spin lock.
>
> Oops!
>
> I see from the unstable tree that this code was recently modified to use spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code - clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a judicious change of spin_lock to spin_lock_bh in netfront would be the best approach?
>
I found a few locking problems when I ran netfront with lockdep
enabled. Fixes were committed to xen-unstable in 14844:abea8d171503 and
14851:22460cfaca71. I was wondering if there had been any real cases of
these deadlocking.
J
* Re: 3.0.4: Soft lockup in netfront in SMP build
2007-04-21 21:45 3.0.4: Soft lockup in netfront in SMP build Graham, Simon
2007-04-21 22:14 ` Jeremy Fitzhardinge
@ 2007-04-22 9:55 ` Keir Fraser
1 sibling, 0 replies; 3+ messages in thread
From: Keir Fraser @ 2007-04-22 9:55 UTC
To: Graham, Simon, xen-devel
On 21/4/07 22:45, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> Oops!
>
> I see from the unstable tree that this code was recently modified to use
> spin_lock_bh() instead of spin_lock() as part of a mega-merge of IA64 code -
> clearly we can't merge this changeset into 3.0.4.
>
> I haven't looked too closely at all of the code yet, but I'm wondering if a
> judicious change of spin_lock to spin_lock_bh in netfront would be the best
> approach?
Actually the fixes are localised in changesets 14844 and 14851. These should
be easily portable to the kernel(s) of your choice. Thanks are due again to
Jeremy for finding these nasty lurking deadlocks (albeit with automated
assistance :-).
-- Keir