From: Gary Hade <garyhade@us.ibm.com>
To: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Gary Hade <garyhade@us.ibm.com>,
mingo@elte.hu, mingo@redhat.com, tglx@linutronix.de,
hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org,
lcm@us.ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption
Date: Thu, 9 Apr 2009 12:17:08 -0700 [thread overview]
Message-ID: <20090409191707.GA7247@us.ibm.com> (raw)
In-Reply-To: <86802c440904081659l1ec30838l99fcb9c693363d00@mail.gmail.com>
On Wed, Apr 08, 2009 at 04:59:35PM -0700, Yinghai Lu wrote:
> On Wed, Apr 8, 2009 at 4:58 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> > On Wed, Apr 8, 2009 at 4:37 PM, Gary Hade <garyhade@us.ibm.com> wrote:
> >> On Wed, Apr 08, 2009 at 03:30:15PM -0700, Yinghai Lu wrote:
> >>> On Wed, Apr 8, 2009 at 2:07 PM, Gary Hade <garyhade@us.ibm.com> wrote:
> >>> > Impact: Eliminates a race that can leave the system in an
> >>> > unusable state
> >>> >
> >>> > During rapid offlining of multiple CPUs there is a chance
> >>> > that an IRQ affinity move destination CPU will be offlined
> >>> > before the IRQ affinity move initiated during the offlining
> >>> > of a previous CPU completes. This can happen when the device
> >>> > is not very active and thus fails to generate the IRQ that is
> >>> > needed to complete the IRQ affinity move before the move
> >>> > destination CPU is offlined. When this happens there is an
> >>> > -EBUSY return from __assign_irq_vector() during the offlining
> >>> > of the IRQ move destination CPU which prevents initiation of
> >>> > a new IRQ affinity move operation to an online CPU. This
> >>> > leaves the IRQ affinity set to an offlined CPU.
> >>> >
> >>> > I have been able to reproduce the problem on some of our
> >>> > systems using the following script. When the system is idle
> >>> > the problem often reproduces during the first CPU offlining
> >>> > sequence.
> >>> >
> >>> > #!/bin/sh
> >>> >
> >>> > SYS_CPU_DIR=/sys/devices/system/cpu
> >>> > VICTIM_IRQ=25
> >>> > IRQ_MASK=f0
> >>> >
> >>> > iteration=0
> >>> > while true; do
> >>> > echo $iteration
> >>> > echo $IRQ_MASK > /proc/irq/$VICTIM_IRQ/smp_affinity
> >>> > for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
> >>> > echo 0 > $cpudir/online
> >>> > done
> >>> > for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
> >>> > echo 1 > $cpudir/online
> >>> > done
> >>> > iteration=`expr $iteration + 1`
> >>> > done
> >>> >
> >>> > The proposed fix takes advantage of the fact that when all
> >>> > CPUs in the old domain are offline there is nothing to be done
> >>> > by send_cleanup_vector() during the affinity move completion.
> >>> > So, we simply avoid setting cfg->move_in_progress preventing
> >>> > the above mentioned -EBUSY return from __assign_irq_vector().
> >>> > This allows initiation of a new IRQ affinity move to a CPU
> >>> > that is not going offline.
> >>> >
> >>> > Signed-off-by: Gary Hade <garyhade@us.ibm.com>
> >>> >
> >>> > ---
> >>> > arch/x86/kernel/apic/io_apic.c | 11 ++++++++---
> >>> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >>> >
> >>> > Index: linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c
> >>> > ===================================================================
> >>> > --- linux-2.6.30-rc1.orig/arch/x86/kernel/apic/io_apic.c 2009-04-08 09:23:00.000000000 -0700
> >>> > +++ linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c 2009-04-08 09:23:16.000000000 -0700
> >>> > @@ -363,7 +363,8 @@ set_extra_move_desc(struct irq_desc *des
> >>> > struct irq_cfg *cfg = desc->chip_data;
> >>> >
> >>> > if (!cfg->move_in_progress) {
> >>> > - /* it means that domain is not changed */
> >>> > + /* it means that domain has not changed or all CPUs
> >>> > + * in old domain are offline */
> >>> > if (!cpumask_intersects(desc->affinity, mask))
> >>> > cfg->move_desc_pending = 1;
> >>> > }
> >>> > @@ -1262,8 +1263,11 @@ next:
> >>> > current_vector = vector;
> >>> > current_offset = offset;
> >>> > if (old_vector) {
> >>> > - cfg->move_in_progress = 1;
> >>> > cpumask_copy(cfg->old_domain, cfg->domain);
> >>> > + if (cpumask_intersects(cfg->old_domain,
> >>> > + cpu_online_mask)) {
> >>> > + cfg->move_in_progress = 1;
> >>> > + }
> >>> > }
> >>> > for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
> >>> > per_cpu(vector_irq, new_cpu)[vector] = irq;
> >>> > @@ -2492,7 +2496,8 @@ static void irq_complete_move(struct irq
> >>> > if (likely(!cfg->move_desc_pending))
> >>> > return;
> >>> >
> >>> > - /* domain has not changed, but affinity did */
> >>> > + /* domain has not changed or all CPUs in old domain
> >>> > + * are offline, but affinity changed */
> >>> > me = smp_processor_id();
> >>> > if (cpumask_test_cpu(me, desc->affinity)) {
> >>> > *descp = desc = move_irq_desc(desc, me);
> >>> > --
> >>>
> >>> so you mean during __assign_irq_vector(), cpu_online_mask get updated?
> >>
> >> No, the CPU being offlined is removed from cpu_online_mask
> >> earlier via a call to remove_cpu_from_maps() from
> >> cpu_disable_common(). This happens just before fixup_irqs()
> >> is called.
> >>
> >>> with your patch, how about that it just happen right after you check
> >>> that second time.
> >>>
> >>> it seems we are missing some lock_vector_lock() on the remove cpu from
> >>> online mask.
> >>
> >> The remove_cpu_from_maps() call in cpu_disable_common() is vector
> >> lock protected:
> >> void cpu_disable_common(void)
> >> {
> >> < snip >
> >> /* It's now safe to remove this processor from the online map */
> >> lock_vector_lock();
> >> remove_cpu_from_maps(cpu);
> >> unlock_vector_lock();
> >> fixup_irqs();
> >> }
> >
> >
> > __assign_irq_vector always has vector_lock locked...
OK, I see the 'vector_lock' spin_lock_irqsave/spin_unlock_irqrestore
surrounding the __assign_irq_vector call in assign_irq_vector.
> > so cpu_online_mask will not changed during,
I understand that this 'vector_lock' acquisition prevents
multiple simultaneous executions of __assign_irq_vector but
does that really prevent another thread executing outside
__assign_irq_vector (or outside other 'vector_lock' serialized
code) from modifying cpu_online_mask?
Isn't it really 'cpu_add_remove_lock' (also held when
__assign_irq_vector() is called in the context of a CPU add
or remove) that is used for this purpose?
> > why do you need to check that again in __assign_irq_vector ?
Because that is where the cfg->move_in_progress flag was
being set.
Is there some reason that the content of cpu_online_mask
cannot be trusted at this location?
If all the CPUs in the old domain are offline doesn't
that imply that we got to that location in response to
a CPU offline request?
> >
> looks like you need to clear move_in_progress in fixup_irqs()
This would be a difficult since I believe the code is
currently partitioned in a manner that prevents access to
irq_cfg records from functions defined in arch/x86/kernel/irq_32.c
and arch/x86/kernel/irq_64.c. It also doesn't feel right to
allow cfg->move_in_progress to be set in __assign_irq_vector
and then clear it in fixup_irqs().
Gary
--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc
next prev parent reply other threads:[~2009-04-09 19:17 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-08 21:07 [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption Gary Hade
2009-04-08 22:30 ` Yinghai Lu
2009-04-08 23:37 ` Gary Hade
2009-04-08 23:58 ` Yinghai Lu
2009-04-08 23:59 ` Yinghai Lu
2009-04-09 19:17 ` Gary Hade [this message]
2009-04-09 22:38 ` Yinghai Lu
2009-04-10 0:53 ` Gary Hade
2009-04-10 1:29 ` Eric W. Biederman
2009-04-10 20:09 ` Gary Hade
2009-04-10 22:02 ` Eric W. Biederman
2009-04-11 7:44 ` Yinghai Lu
2009-04-11 7:51 ` Yinghai Lu
2009-04-11 11:01 ` Eric W. Biederman
2009-04-13 17:41 ` Pallipadi, Venkatesh
2009-04-13 18:50 ` Eric W. Biederman
2009-04-13 22:20 ` [PATCH] irq, x86: Remove IRQ_DISABLED check in process context IRQ move Pallipadi, Venkatesh
2009-04-14 1:40 ` Eric W. Biederman
2009-04-14 14:06 ` [tip:irq/urgent] x86, irq: " tip-bot for Pallipadi, Venkatesh
2009-04-12 19:32 ` [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption Eric W. Biederman
2009-04-13 21:09 ` Gary Hade
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090409191707.GA7247@us.ibm.com \
--to=garyhade@us.ibm.com \
--cc=hpa@zytor.com \
--cc=lcm@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mingo@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=yhlu.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox