public inbox for linux-kernel@vger.kernel.org
From: Gary Hade <garyhade@us.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gary Hade <garyhade@us.ibm.com>,
	mingo@elte.hu, mingo@redhat.com, tglx@linutronix.de,
	hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	lcm@us.ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption
Date: Mon, 13 Apr 2009 14:09:13 -0700
Message-ID: <20090413210913.GC8393@us.ibm.com>
In-Reply-To: <m1ljq5r7lw.fsf@fess.ebiederm.org>

On Sun, Apr 12, 2009 at 12:32:11PM -0700, Eric W. Biederman wrote:
> Gary Hade <garyhade@us.ibm.com> writes:
> 
> > Impact: Eliminates a race that can leave the system in an
> >         unusable state
> >
> > During rapid offlining of multiple CPUs there is a chance
> > that an IRQ affinity move destination CPU will be offlined
> > before the IRQ affinity move initiated during the offlining
> > of a previous CPU completes.  This can happen when the device
> > is not very active and thus fails to generate the IRQ that is
> > needed to complete the IRQ affinity move before the move
> > destination CPU is offlined.  When this happens there is an
> > -EBUSY return from __assign_irq_vector() during the offlining
> > of the IRQ move destination CPU which prevents initiation of
> > a new IRQ affinity move operation to an online CPU.  This
> > leaves the IRQ affinity set to an offlined CPU.
> >
> > I have been able to reproduce the problem on some of our
> > systems using the following script.  When the system is idle
> > the problem often reproduces during the first CPU offlining
> > sequence.
> 
> Ok.  I have had a chance to think through what your patches
> are doing and it is assuming the broken logic in cpu_down is correct
> and patching over some but not all of the problems.
> 
> First the problem is not migrating irqs when IRR is set.

When the device is very active, a printk in __target_IO_APIC_irq()
immediately prior to
    io_apic_modify(apic, 0x10 + pin*2, reg);
intermittently displays 'reg' values indicating that the
Remote IRR bit is set.

With PATCH 3/3 the same printk displays no 'reg' values
indicating that the Remote IRR bit is set _and_ the IRQ
interruption problem disappears.

This is what led me to very strongly believe that the
problem was caused by writing the I/O redirection table
register while the Remote IRR bit was set.
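
For reference, the check behind that printk can be sketched as below.
This is a minimal standalone illustration, not the kernel code.  It
assumes the 82093AA-style redirection entry layout: the low 32-bit word
of the entry for a pin sits at register index 0x10 + pin*2, bit 14 of
that word is Remote IRR, and bit 16 is the mask bit.  The macro and
helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 82093AA-style bits in the low word of a redirection entry. */
#define REDIR_LO_REMOTE_IRR (1u << 14)  /* level IRQ accepted, EOI pending */
#define REDIR_LO_MASKED     (1u << 16)  /* interrupt masked at the I/O APIC */

/* Register index of the low word of the redirection entry for 'pin'. */
static unsigned int redir_lo_index(unsigned int pin)
{
    return 0x10 + pin * 2;
}

/* True if 'reg' (low word of an entry) has Remote IRR set. */
static bool remote_irr_set(uint32_t reg)
{
    return (reg & REDIR_LO_REMOTE_IRR) != 0;
}

/* True if 'reg' (low word of an entry) has the mask bit set. */
static bool entry_masked(uint32_t reg)
{
    return (reg & REDIR_LO_MASKED) != 0;
}
```

A 'reg' value such as 0x0000c031 would be reported as having Remote IRR
set, while 0x00008031 would not.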

> The general
> problem is that the state machines in most ioapics are fragile and
> can get confused if you reprogram them at any point when an irq can
> come in.

IRQs are masked [from fixup_irqs() when offlining a CPU, from
ack_apic_level() when not offlining a CPU] during the reprogramming.
Does this not help avoid the issue?  Sorry if this is a naive
question.
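
A toy model of the sequence in question, for illustration only.  It
operates on a simulated entry in plain memory, not on real hardware,
and it is not the code from fixup_irqs(), ack_apic_level(), or any of
the patches in this series.  The struct, the macros, and the
retarget_entry() helper are hypothetical; the bit positions assume the
82093AA-style layout (vector in bits 7:0 and Remote IRR in bit 14 of
the low word, mask in bit 16, destination in bits 31:24 of the high
word).  The sketch masks the entry and then retargets it only if
Remote IRR is clear.

```c
#include <stdint.h>

#define REDIR_VECTOR_MASK 0xffu
#define REDIR_REMOTE_IRR  (1u << 14)
#define REDIR_MASKED      (1u << 16)

/* Simulated redirection table entry: plain memory, not an I/O APIC. */
struct redir_entry {
    uint32_t lo;  /* vector, flags, Remote IRR, mask bit */
    uint32_t hi;  /* destination in bits 31:24 */
};

/*
 * Hypothetical helper: mask the entry, then rewrite destination and
 * vector only if Remote IRR is clear.  Returns 0 if the entry was
 * retargeted, -1 if the write was skipped because Remote IRR was set.
 * The mask bit is restored to its previous state in both cases.
 */
static int retarget_entry(struct redir_entry *e, uint8_t dest, uint8_t vector)
{
    uint32_t was_masked = e->lo & REDIR_MASKED;
    int ret = -1;

    e->lo |= REDIR_MASKED;
    if (!(e->lo & REDIR_REMOTE_IRR)) {
        e->hi = (e->hi & 0x00ffffffu) | ((uint32_t)dest << 24);
        e->lo = (e->lo & ~REDIR_VECTOR_MASK) | vector;
        ret = 0;
    }
    if (!was_masked)
        e->lo &= ~REDIR_MASKED;
    return ret;
}
```

Note that the model only captures the software-visible bits.  It says
nothing about what the I/O APIC state machine does internally when an
entry is rewritten, which is the concern raised in the quoted text.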

>  In the middle of an interrupt handler is the one time we
> know interrupts can not come in.
> 
> To really fix this problem we need to do two things.
> 1) Track when irqs that can not be migrated from process context are
>    on a cpu, and deny cpu hot-unplug.
> 2) Modify every interrupt that can be safely migrated in interrupt context
>    to migrate irqs in interrupt context so no one encounters this problem
>    in practice.
> 
> We can update MSIs and do a pci read to know when the update has made it
> to a device.  Multi MSI is a disaster but I won't go there.
> 
> In lowest priority delivery mode when the irq is not changing domain but
> just changing the set of possible cpus the interrupt can be delivered to.
> 
> And then of course all of the fun iommus that remap irqs.

Sounds non-trivial.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc

