From: Vivek Goyal <vgoyal@redhat.com>
To: Don Zickus <dzickus@redhat.com>
Cc: x86@kernel.org, kexec-list <kexec@lists.infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
ebiederm@xmission.com
Subject: Re: [PATCH] x86, kdump, ioapic: Fix kdump race with migrating irq
Date: Tue, 31 Jan 2012 16:37:13 -0500
Message-ID: <20120131213713.GC4378@redhat.com>
In-Reply-To: <1328045114-4489-1-git-send-email-dzickus@redhat.com>

On Tue, Jan 31, 2012 at 04:25:14PM -0500, Don Zickus wrote:
> A customer of ours noticed when their machine crashed, kdump did not
> work but hung instead. Using their firmware dumping solution they
> grabbed a vmcore and decoded the stacks on the cpus. What they
> noticed seemed to be a rare deadlock with the ioapic_lock.
>
> CPU4:
> machine_crash_shutdown
> -> machine_ops.crash_shutdown
>    -> native_machine_crash_shutdown
>       -> kdump_nmi_shootdown_cpus ------> Send NMI to other CPUs
>       -> disable_IO_APIC
>          -> clear_IO_APIC
>             -> clear_IO_APIC_pin
>                -> ioapic_read_entry
>                   -> spin_lock_irqsave(&ioapic_lock, flags)
>                   ---Infinite loop here---
>
> CPU0:
> do_IRQ
> -> handle_irq
>    -> handle_edge_irq
>       -> ack_apic_edge
>          -> move_native_irq
>             -> mask_IO_APIC_irq
>                -> mask_IO_APIC_irq_desc
>                   -> spin_lock_irqsave(&ioapic_lock, flags)
>                   ---Receive NMI here after getting spinlock---
>                      -> nmi
>                         -> do_nmi
>                            -> crash_nmi_callback
>                            ---Infinite loop here---
>
> The problem is that although kdump tries to shut down minimal hardware,
> it still needs to disable the IO APIC. This requires spinlocks which
> may be held by another cpu. This other cpu is being held indefinitely in
> an NMI context by kdump in order to serialize the crashing path. Instant
> deadlock.
>
> I attempted to resolve this by busting the spinlock in the kdump case only.
> My justification was that kdump has already stopped the other cpus and it
> is only clearing the io apic which shouldn't cause harm when overwriting
> what the other cpu was doing.
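
To make the lock-busting idea above concrete, here is a minimal user-space
sketch. It is not the patch under discussion and not kernel code: toy_lock,
toy_trylock and crash_path_take_lock are hypothetical names, and a C11
atomic_flag stands in for the real ioapic_lock. The assumption it models is
the one stated above: the lock owner has been stopped for good, so the crash
path may re-initialise the lock instead of spinning on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spinlock. */
struct toy_lock {
	atomic_flag held;
};

/* Force the lock back to the unlocked state, whoever "owns" it. */
static void toy_lock_init(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Try once to take the lock; returns true on success. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Crash-path acquire. Assumption: every other CPU is already parked
 * and will never run again, so a held lock can never be released by
 * its owner. Rather than spin forever, bust the lock and take it.
 */
static bool crash_path_take_lock(struct toy_lock *l)
{
	if (toy_trylock(l))
		return true;	/* nobody held it; normal case */

	toy_lock_init(l);	/* owner is stopped: bust the lock */
	return toy_trylock(l);
}

/* Replays the CPU0/CPU4 sequence from the traces in one thread. */
static bool model_crash_sequence(void)
{
	struct toy_lock lock = { ATOMIC_FLAG_INIT };
	bool got_it;

	assert(toy_trylock(&lock));	/* "CPU0" takes the lock, then gets the NMI */
	assert(!toy_trylock(&lock));	/* "CPU4" would spin here without busting */

	got_it = crash_path_take_lock(&lock);	/* "CPU4" with busting */
	if (got_it)
		toy_unlock(&lock);
	return got_it;
}
```

In this model, model_crash_sequence() returns true because the second acquire
succeeds only after the re-initialisation. The sketch is safe only under the
stated assumption that the previous owner never resumes; it says nothing
about the state the owner left the protected hardware in.
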
>
> I tested this by loading a dummy module that grabs the ioapic_lock and then,
> on another cpu, running 'echo c > /proc/sysrq-trigger'. The deadlock was
> detected and fixed with the patch below.
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>

Sounds reasonable to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Thanks
Vivek