public inbox for linux-kernel@vger.kernel.org
* Generic callfunction IPI problems
@ 2008-07-06 14:50 Jeremy Fitzhardinge
  2008-07-06 16:03 ` [PATCH] generic ipi function calls: wait on alloc failure fallback Jeremy Fitzhardinge
  0 siblings, 1 reply; 3+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-06 14:50 UTC
  To: Jens Axboe; +Cc: Ingo Molnar, Linux Kernel Mailing List

Hi Jens,

I'm seeing these oopses when running under Xen:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
PGD 0 
Oops: 0000 [1] SMP 
CPU 15 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-rc8-tip #306
RIP: e030:[<ffffffff8105de9a>]  [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
RSP: e02b:ffff88007f653e98  EFLAGS: 00010046
RAX: ffffffff815fe6e0 RBX: ffff88007e523cc8 RCX: 0000000000000001
RDX: ffffc10000200200 RSI: 0000000000000001 RDI: ffffffff81693240
RBP: ffff88007f653eb8 R08: ffff88007f653ec8 R09: 0002db11ddd83820
R10: ffff880000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 000000000000000f R15: 0000000000000040
FS:  00007f1dadd907a0(0000) GS:ffff88007ff30080(0000) knlGS:0000000000000000
CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffff88007ff90000, task ffff88007ff57100)
Stack:  ffff88007ff2f4c0 0000000000000000 0000000000000000 000000000000004d
 ffff88007f653ec8 ffffffff8100dea7 ffff88007f653ef8 ffffffff810747c5
 ffffffff816959c0 000000000000004d ffff88007ff2f4c0 ffffffff81695a10
Call Trace:
 <IRQ>  [<ffffffff8100dea7>] xen_call_function_interrupt+0xe/0x167
 [<ffffffff810747c5>] handle_IRQ_event+0x2e/0x65
 [<ffffffff81075e5b>] handle_level_irq+0xb5/0x116
 [<ffffffff81013f34>] do_IRQ+0xf7/0x177
 [<ffffffff811ba227>] xen_evtchn_do_upcall+0xb3/0x136
 [<ffffffff8141558e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>  [<ffffffff810093aa>] ? _stext+0x3aa/0x1000
 [<ffffffff810093aa>] ? _stext+0x3aa/0x1000
 [<ffffffff8100a42e>] ? xen_safe_halt+0x10/0x1a
 [<ffffffff8100ba26>] ? xen_idle+0x46/0x5c
 [<ffffffff8100eb60>] ? cpu_idle+0xca/0x101
 [<ffffffff8140bc7d>] ? cpu_bringup_and_idle+0x8a/0x8f


Code: e8 fc 96 fc ff 90 41 f6 44 24 20 01 74 08 41 83 64 24 20 fe eb 11 49 8d 7c 24 38 48 c7 c6 90 dd 05 81 e8 6f 92 01 00 4d 8b 24 24 <49> 8b 04 24 49 81 fc e0 e6 5f 81 0f 18 08 0f 85 11 ff ff ff 5b 
RIP  [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
 RSP <ffff88007f653e98>
CR2: 0000000000000000
Kernel panic - not syncing: Fatal exception in interrupt


They're pretty rare: this 16-vcpu system completed a kernbench run
with no problems, then oopsed this way when I left it idle overnight.

One interesting data point is that I've been experimenting with more 
virtualization-friendly spinlock algorithms.  If I replace ticket locks 
with the old lock-byte algorithm, I see this much more frequently (and a 
spin-and-block algorithm generally doesn't get through boot).  I wonder 
if there's a race which is masked by ticket locks' strict FIFO 
algorithm?  (But this particular oops was with completely standard 
ticket locks in place.)
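
For readers unfamiliar with the two lock styles mentioned above, here is
a minimal userspace sketch using C11 atomics. It is not the kernel's
implementation; the type and function names (byte_lock, ticket_lock,
ticket_take, and so on) are hypothetical. It only illustrates why a
ticket lock grants the lock in strict arrival order while a lock-byte
lock lets any spinner win, which is the ordering difference that could
plausibly mask or expose a race.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Lock-byte style: whoever wins the exchange owns the lock.
 * There is no ordering between waiters. */
struct byte_lock {
	atomic_uchar locked;
};

static bool byte_trylock(struct byte_lock *l)
{
	/* Returns true if the byte was 0 and we set it to 1. */
	return atomic_exchange_explicit(&l->locked, 1,
					memory_order_acquire) == 0;
}

static void byte_lock_acquire(struct byte_lock *l)
{
	while (!byte_trylock(l))
		;	/* spin */
}

static void byte_unlock(struct byte_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Ticket style: each waiter takes a number and is served in order. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static unsigned int ticket_take(struct ticket_lock *l)
{
	return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

static bool ticket_is_my_turn(struct ticket_lock *l, unsigned int ticket)
{
	return atomic_load_explicit(&l->owner, memory_order_acquire) == ticket;
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket = ticket_take(l);

	while (!ticket_is_my_turn(l, ticket))
		;	/* spin until our number comes up */
}

static void ticket_unlock(struct ticket_lock *l)
{
	/* Hand the lock to the holder of the next ticket. */
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

With the ticket lock, a CPU that arrived second cannot acquire the lock
before the CPU that arrived first has released it; the byte lock gives no
such guarantee.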

I've been running your old generic IPI patches for a while with no 
problems; this seems to be specific to the version in tip.git.  I 
haven't looked to see what differences there are yet.

I've also only observed problems under Xen, but I haven't done much 
testing on real hardware.

Thanks,
    J


* [PATCH] generic ipi function calls: wait on alloc failure fallback
  2008-07-06 14:50 Generic callfunction IPI problems Jeremy Fitzhardinge
@ 2008-07-06 16:03 ` Jeremy Fitzhardinge
  2008-07-06 17:21   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 3+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-06 16:03 UTC
  To: Jens Axboe; +Cc: Ingo Molnar, Linux Kernel Mailing List

When a GFP_ATOMIC allocation fails, smp_call_function_mask falls back
to allocating the data on the stack and converting it to a waiting call.

Make sure we actually wait in this case.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 kernel/smp.c |    1 +
 1 file changed, 1 insertion(+)

===================================================================
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -312,6 +312,7 @@ int smp_call_function_mask(cpumask_t mas
 	if (!data) {
 		data = &d;
 		data->csd.flags = CSD_FLAG_WAIT;
+		wait = 1;
 	}
 
 	spin_lock_init(&data->lock);
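
The control flow the patch fixes can be modelled in userspace roughly as
follows. This is a sketch, not kernel/smp.c: the flag values, the
try_alloc() hook, force_alloc_failure, and call_function_model() are all
hypothetical, and CSD_FLAG_ALLOC is assumed to be the flag set on the
heap-allocated path. The point it illustrates is that once the call data
lives in the caller's stack frame, the caller must wait, regardless of
the wait argument it was passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Flag values are placeholders chosen for this sketch. */
#define CSD_FLAG_WAIT	0x01
#define CSD_FLAG_ALLOC	0x02

struct call_data {
	unsigned int flags;
};

/* Hypothetical hook so the allocation-failure path can be forced. */
static bool force_alloc_failure;

static struct call_data *try_alloc(void)
{
	if (force_alloc_failure)
		return NULL;
	return malloc(sizeof(struct call_data));
}

/* Returns whether the caller ends up waiting for the remote CPUs. */
static bool call_function_model(bool wait)
{
	struct call_data d;
	struct call_data *data = try_alloc();
	bool on_stack;

	if (data) {
		data->flags = CSD_FLAG_ALLOC;
		if (wait)
			data->flags |= CSD_FLAG_WAIT;
	} else {
		/* Fallback: use the on-stack copy and force a waiting call. */
		data = &d;
		data->flags = CSD_FLAG_WAIT;
		wait = true;	/* the one-line fix from the patch */
	}

	on_stack = (data == &d);

	/* Invariant: on-stack data must never outlive this frame. */
	assert(!on_stack || wait);

	/* The IPI would be sent and, if wait is set, waited on here. */

	if (!on_stack)
		free(data);	/* model only; not how the kernel frees it */
	return wait;
}
```

Without the added assignment, the fallback path would set CSD_FLAG_WAIT in
the data but still return immediately when the caller passed wait == 0,
leaving other CPUs with a pointer into a stack frame that is no longer
valid.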





* Re: [PATCH] generic ipi function calls: wait on alloc failure fallback
  2008-07-06 16:03 ` [PATCH] generic ipi function calls: wait on alloc failure fallback Jeremy Fitzhardinge
@ 2008-07-06 17:21   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 3+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-06 17:21 UTC
  To: Jens Axboe; +Cc: Ingo Molnar, Linux Kernel Mailing List

Jeremy Fitzhardinge wrote:
> When a GFP_ATOMIC allocation fails, smp_call_function_mask falls back
> to allocating the data on the stack and converting it to a waiting call.
>
> Make sure we actually wait in this case. 

Unfortunately this doesn't solve my crashes, though it may account for 
some of them.

The oops I'm looking at right now is a NULL pointer dereference on ->next 
of the rcu list in generic_smp_call_function_interrupt()...

    J
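
As an illustration of the symptom, the sketch below models a circular
list walk in userspace. It is not the kernel's list or RCU code; struct
node, walk(), and simulate_stack_reuse() are hypothetical names. It shows
one way a NULL ->next could appear: an element that is still linked has
its memory overwritten (for example, an on-stack element whose frame was
reused because the caller did not wait). Whether that is the cause of the
oops in this thread is not established. Decoded by hand, the bytes around
the faulting instruction in the Code: line appear to be
"mov (%r12),%r12" followed by "mov (%r12),%rax", which would be
consistent with loading pos->next, getting NULL (R12 is 0 in the register
dump), and then faulting at address 0 when dereferencing it.

```c
#include <stddef.h>
#include <string.h>

struct node {
	struct node *next;
};

/*
 * Walk a circular list starting after head.
 * Returns the number of elements, or -1 if a NULL ->next is found
 * (the kernel would fault at this point instead of returning).
 */
static int walk(struct node *head)
{
	int count = 0;
	struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos == NULL)
			return -1;
		count++;
	}
	return count;
}

/* Model a still-linked element whose storage has been overwritten. */
static void simulate_stack_reuse(struct node *n)
{
	memset(n, 0, sizeof(*n));
}
```

A healthy one-element list walks cleanly; after simulate_stack_reuse() is
applied to the linked element, the same walk runs into a NULL next
pointer.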

