public inbox for linux-kernel@vger.kernel.org
* Generic callfunction IPI problems
@ 2008-07-06 14:50 Jeremy Fitzhardinge
  2008-07-06 16:03 ` [PATCH] generic ipi function calls: wait on alloc failure fallback Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-06 14:50 UTC
  To: Jens Axboe; +Cc: Ingo Molnar, Linux Kernel Mailing List

Hi Jens,

I'm seeing these oopses when running under Xen:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
PGD 0 
Oops: 0000 [1] SMP 
CPU 15 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-rc8-tip #306
RIP: e030:[<ffffffff8105de9a>]  [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
RSP: e02b:ffff88007f653e98  EFLAGS: 00010046
RAX: ffffffff815fe6e0 RBX: ffff88007e523cc8 RCX: 0000000000000001
RDX: ffffc10000200200 RSI: 0000000000000001 RDI: ffffffff81693240
RBP: ffff88007f653eb8 R08: ffff88007f653ec8 R09: 0002db11ddd83820
R10: ffff880000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 000000000000000f R15: 0000000000000040
FS:  00007f1dadd907a0(0000) GS:ffff88007ff30080(0000) knlGS:0000000000000000
CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffff88007ff90000, task ffff88007ff57100)
Stack:  ffff88007ff2f4c0 0000000000000000 0000000000000000 000000000000004d
 ffff88007f653ec8 ffffffff8100dea7 ffff88007f653ef8 ffffffff810747c5
 ffffffff816959c0 000000000000004d ffff88007ff2f4c0 ffffffff81695a10
Call Trace:
 <IRQ>  [<ffffffff8100dea7>] xen_call_function_interrupt+0xe/0x167
 [<ffffffff810747c5>] handle_IRQ_event+0x2e/0x65
 [<ffffffff81075e5b>] handle_level_irq+0xb5/0x116
 [<ffffffff81013f34>] do_IRQ+0xf7/0x177
 [<ffffffff811ba227>] xen_evtchn_do_upcall+0xb3/0x136
 [<ffffffff8141558e>] xen_do_hypervisor_callback+0x1e/0x30
 <EOI>  [<ffffffff810093aa>] ? _stext+0x3aa/0x1000
 [<ffffffff810093aa>] ? _stext+0x3aa/0x1000
 [<ffffffff8100a42e>] ? xen_safe_halt+0x10/0x1a
 [<ffffffff8100ba26>] ? xen_idle+0x46/0x5c
 [<ffffffff8100eb60>] ? cpu_idle+0xca/0x101
 [<ffffffff8140bc7d>] ? cpu_bringup_and_idle+0x8a/0x8f


Code: e8 fc 96 fc ff 90 41 f6 44 24 20 01 74 08 41 83 64 24 20 fe eb 11 49 8d 7c 24 38 48 c7 c6 90 dd 05 81 e8 6f 92 01 00 4d 8b 24 24 <49> 8b 04 24 49 81 fc e0 e6 5f 81 0f 18 08 0f 85 11 ff ff ff 5b 
RIP  [<ffffffff8105de9a>] generic_smp_call_function_interrupt+0xfb/0x118
 RSP <ffff88007f653e98>
CR2: 0000000000000000
Kernel panic - not syncing: Fatal exception in interrupt


They're pretty rare: this 16-vcpu system completed a kernbench run 
with no problems, then oopsed this way when I left it idle overnight.

One interesting data point is that I've been experimenting with more 
virtualization-friendly spinlock algorithms.  If I replace ticket locks 
with the old lock-byte algorithm, I see this much more frequently (and a 
spin-and-block algorithm generally doesn't get through boot).  I wonder 
if there's a race which is masked by ticket locks' strict FIFO 
ordering?  (But this particular oops was with completely standard 
ticket locks in place.)

I've been running your old generic IPI patches for a while with no 
problems; this seems to be specific to the version in tip.git.  I 
haven't looked to see what differences there are yet.

I've also only observed problems under Xen, but I haven't done much 
testing on real hardware.

Thanks,
    J

* [PATCH] generic ipi function calls: wait on alloc failure fallback
@ 2008-07-15 20:22 Jeremy Fitzhardinge
  2008-07-15 21:48 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-07-15 20:22 UTC
  To: Jens Axboe, Ingo Molnar; +Cc: Linux Kernel Mailing List, Linus Torvalds

When the GFP_ATOMIC allocation fails, smp_call_function_mask() falls
back to using call data on the stack and converts the request to a
waiting call.

Make sure we actually wait in this case.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 kernel/smp.c |    1 +
 1 file changed, 1 insertion(+)

===================================================================
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -312,6 +312,7 @@ int smp_call_function_mask(cpumask_t mas
 	if (!data) {
 		data = &d;
 		data->csd.flags = CSD_FLAG_WAIT;
+		wait = 1;
 	}
 
 	spin_lock_init(&data->lock);
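
To illustrate why the fallback has to force a wait, here is a minimal
user-space sketch of the same pattern. It is not the kernel code: the
names call_data, alloc_call_data(), run_pending() and
call_function_sketch() are hypothetical stand-ins, the "other CPU" is
simulated by an explicit run_pending() call, and only the flag names
CSD_FLAG_WAIT and CSD_FLAG_ALLOC are borrowed from kernel/smp.c. The
point it shows is that once the call data lives in the sender's stack
frame, the sender must not return until the handler has finished.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Flag names borrowed from kernel/smp.c; the values here are arbitrary. */
#define CSD_FLAG_WAIT  0x01  /* sender waits for the handler to complete */
#define CSD_FLAG_ALLOC 0x02  /* data is heap-allocated; the receiver frees it */

/* Hypothetical, simplified stand-in for the kernel's call data. */
struct call_data {
    unsigned int flags;
    void (*func)(void *info);
    void *info;
};

/* Test hook (assumption): lets a caller force the allocation to fail. */
static bool force_alloc_failure;

/* One-slot "queue" standing in for the per-CPU call list. */
static struct call_data *pending;

static struct call_data *alloc_call_data(void)
{
    return force_alloc_failure ? NULL : malloc(sizeof(struct call_data));
}

/* Stand-in for the receiving CPU's call-function interrupt handler. */
static void run_pending(void)
{
    struct call_data *data = pending;

    if (!data)
        return;
    pending = NULL;
    data->func(data->info);
    if (data->flags & CSD_FLAG_ALLOC)
        free(data);
}

/* Queues func(info); returns whether the sender ended up waiting. */
static bool call_function_sketch(void (*func)(void *), void *info, bool wait)
{
    struct call_data d;  /* on-stack fallback, valid only until we return */
    struct call_data *data = alloc_call_data();

    if (data) {
        data->flags = CSD_FLAG_ALLOC | (wait ? CSD_FLAG_WAIT : 0);
    } else {
        data = &d;
        data->flags = CSD_FLAG_WAIT;
        wait = true;  /* the fix: stack data forces a synchronous call */
    }
    data->func = func;
    data->info = info;
    pending = data;

    /* Stands in for spinning until the remote handler has completed. */
    if (wait)
        run_pending();
    return wait;
}

/* Example callback: increments the int that info points to. */
static void bump(void *info)
{
    ++*(int *)info;
}
```

In this sketch, dropping the "wait = true" line would leave pending
pointing into a stack frame that no longer exists once
call_function_sketch() returns, which is the kind of dangling reference
the one-line patch above is meant to prevent.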





end of thread, other threads:[~2008-07-18 22:42 UTC | newest]

Thread overview: 8+ messages
2008-07-06 14:50 Generic callfunction IPI problems Jeremy Fitzhardinge
2008-07-06 16:03 ` [PATCH] generic ipi function calls: wait on alloc failure fallback Jeremy Fitzhardinge
2008-07-06 17:21   ` Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2008-07-15 20:22 Jeremy Fitzhardinge
2008-07-15 21:48 ` Ingo Molnar
2008-07-15 22:01   ` Jeremy Fitzhardinge
2008-07-18 22:19     ` Ingo Molnar
2008-07-18 22:42       ` Jeremy Fitzhardinge
