From: Prarit Bhargava <prarit@redhat.com>
To: rui wang <ruiv.wang@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Michel Lespinasse <walken@google.com>,
Andi Kleen <ak@linux.intel.com>,
Seiji Aguchi <seiji.aguchi@hds.com>,
Yang Zhang <yang.z.zhang@intel.com>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
janet.morgan@intel.com, tony.luck@intel.com
Subject: Re: [PATCH] x86, Fix do_IRQ interrupt warning for cpu hotplug retriggered irqs
Date: Mon, 23 Dec 2013 10:29:23 -0500
Message-ID: <52B856D3.4030802@redhat.com>
In-Reply-To: <CANVTcTZ-ZkTvR0+=eFyNQ7E8R2UYo1qdA-RQ+9nzK2w=qCPkPQ@mail.gmail.com>
On 12/23/2013 04:41 AM, rui wang wrote:
> On 12/2/13, Prarit Bhargava <prarit@redhat.com> wrote:
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64831
>>
>> When downing a cpu it is possible that there are unhandled irqs left in
>> the APIC IRR register. fixup_irqs() goes through the IRR and retriggers
>> the IRQs left in the APIC IRR. After this, the vector for the irq is set
>> to -1. There is a possibility here, however, that the CPU does handle an
>> irq in the IRR and then calls the vector.
>>
>
> The patch does not seem to root-cause the problem. It seems to hide
> the real problem.
>
> It is not possible that a device-triggered irq can arrive to this cpu
> again after fixup_irqs() fills its vector_irq[vector] to -1, because
> we've done the following:
>
> 1. We disabled interrupt on this cpu in stop_machine().
> 2. We called irq_set_affinity() to exclude this cpu as a target for the irq.
> 3. We checked APIC_IRR and re-triggered any pending irqs to other cpus.
... and we set the IRQ handler to -1 for the down'd cpu.
Rui, I think you're right up to here, but I think this has nothing to do with
IPIs or locking.
I assumed that the issue I was trying to fix was long-standing and well known
within the kernel, given some of the comments I had read here and there on LKML
from people seeing the do_IRQ errors. There have long been reports of the
do_IRQ warning being printed during cpu down.
Here's what the issue is after step 3 above...
4. The vector's bit in APIC_IRR is still *set* on the down'd cpu, with IRQs
disabled.
5. We continue executing the stop_machine "down" portion of the code, then
continue executing the "die" code (i.e., __cpu_die()) in normal context.
IRQs are only disabled for the stop_machine "down" portion. So after we leave
that context, the pending IRR entry will still be serviced. While the kernel is
spinning in cpu_die(), the down'd cpu attempts to execute the handler for the
IRQ in the IRR ... and can't find one because we've set it to -1. So we see the
warning.
A few additional debug points:
1. I put a printk in fixup_irqs() at the point where we call irq_retrigger on
another cpu; it dumps the down'd CPU and IRQ #. I see that printk *EVERY TIME* I
see the do_IRQ warning.
2. The do_IRQ warning *always* appears before I see the offline message ...
[ 148.656016] Broke affinity for irq 634
[ 148.660493] Broke affinity for irq 698
[ 148.665739] kvm: disabling virtualization on CPU58
[ 148.666732] PRARIT: 58.208 IRR entry ... irq_retrigger call.
At this point we've left the stop_machine() code and are continuing to
execute ... then we hit cpu_die() ... which spins.
[ 148.671106] do_IRQ: 58.208 No irq handler for vector (irq -1)
[ 148.677544] smpboot: CPU 58 is now offline
I think I have root-caused this to the IRR being set on the down'd cpu. It is
admittedly a rare occurrence. I usually have to run about 1000 ups and downs
before hitting it; however, on my current test system it hits much more
frequently, almost 1 in 64 times.
P.