From: ebiederm@xmission.com (Eric W. Biederman)
To: Gary Hade <garyhade@us.ibm.com>
Cc: mingo@elte.hu, mingo@redhat.com, linux-kernel@vger.kernel.org,
tglx@linutronix.de, hpa@zytor.com, x86@kernel.org,
yinghai@kernel.org, lcm@us.ibm.com
Subject: Re: [RESEND] [PATCH v2] [BUGFIX] x86/x86_64: fix CPU offlining triggered "active" device IRQ interrruption
Date: Wed, 03 Jun 2009 05:27:13 -0700
Message-ID: <m1y6s9qz8e.fsf@fess.ebiederm.org>
In-Reply-To: <20090602193216.GC7282@us.ibm.com> (Gary Hade's message of "Tue, 2 Jun 2009 12:32:16 -0700")
Gary Hade <garyhade@us.ibm.com> writes:
> Impact: Eliminates an issue that can leave the system in an
> unusable state.
>
> This patch addresses an issue where device generated IRQs
> are no longer seen by the kernel following IRQ affinity
> migration while the device is generating IRQs at a high rate.
>
> I have been able to consistently reproduce the problem on
> some of our systems by running the following script (VICTIM_IRQ
> specifies the IRQ for the aic94xx device) while a single instance
> of the command
> # while true; do find / -exec file {} \;; done
> is keeping the filesystem activity and IRQ rate reasonably high.
To be 100% clear:
If masking and checking to see if the irq was already pending were
enough to safely migrate irqs in process context, then that is how
we would always do it. I have been down that road and done some
extensive testing in the past.
I found hardware bugs in both AMD and Intel IOAPICs that make your
code demonstrably unsafe.
I was challenged by some of the software guys from Intel, and eventually
they came back and told me they had talked with their hardware engineers
and I was correct.
So no. This code is totally and severely broken and we should not do
it.
You are introducing complexity and heuristics to avoid the fact that
fixup_irqs is fundamentally broken. Sure, you might tweak things
so they work a little more often.
> The root cause is a known issue already addressed for some
> code paths [e.g. ack_apic_level() and the now obsolete
> migrate_irq_remapped_level_desc()] where the ioapic can
> misbehave when the I/O redirection table register is written
> while the Remote IRR bit is set.
No, the reason we do this is not the IRR, although that certainly
does not help.
We do this because it is not in general safe to do complicated
reprogramming of the ioapic while the hardware may send an
irq. You can lock up the hardware state machine, among other things.
If the workaround were as simple as you propose, a variant using
delayed work or busy waiting until the irq handler completed would
have been written and used long ago.
So my reaction to this horrible afterthought is
NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO NO
PLEASE NO.
Eric
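Illustration (not part of the original message, and not kernel code):
the distinction drawn above, between migrating an irq from process
context and migrating it while that irq is being handled, can be
sketched with a toy user-space model. All names here (irq_model,
request_migration, handle_irq) are hypothetical, and the model says
nothing about real IOAPIC behavior; it only shows the control flow of
deferring the routing change.

```c
#include <stdbool.h>

/* Hypothetical toy model of one irq's routing state; not kernel code. */
struct irq_model {
    int  routed_cpu;    /* CPU the modeled routing entry points at */
    int  pending_cpu;   /* requested destination, valid if move_pending */
    bool move_pending;  /* a migration was requested but not yet applied */
};

/* "Process context": only record the request; never touch the routing. */
static void request_migration(struct irq_model *irq, int cpu)
{
    irq->pending_cpu = cpu;
    irq->move_pending = true;
}

/* "Interrupt context": apply a pending move from within the handling
 * of that same irq, instead of from an arbitrary point in time. */
static void handle_irq(struct irq_model *irq)
{
    if (irq->move_pending) {
        irq->routed_cpu = irq->pending_cpu;
        irq->move_pending = false;
    }
}
```

In this sketch, calling request_migration() leaves routed_cpu unchanged;
the change only takes effect on the next handle_irq() call.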