From: Ingo Molnar <mingo@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Harry Junior <harryjr@outlook.fr>,
Tony Luck <tony.luck@intel.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Joe Lawrence <joe.lawrence@stratus.com>,
Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH] x86/irq: Cure live lock in irq_force_complete_move()
Date: Sat, 12 Mar 2016 16:23:51 +0100 [thread overview]
Message-ID: <20160312152351.GC7015@gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.11.1603112048030.3657@nanos>
* Thomas Gleixner <tglx@linutronix.de> wrote:
> Harry reported, that he's able to trigger a system freeze with cpu hot
> unplug. The freeze turned out to be a live lock caused by recent changes in
> irq_force_complete_move().
>
> When fixup_irqs() and from there irq_force_complete_move() is called on the
> dying cpu, then all other cpus are in stop machine and wait for the dying cpu
> to complete the teardown. If there is a move of an interrupt pending then
> irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
> mask and waits for them to clear the mask. That's obviously impossible as
> those cpus are firmly stuck in stop machine with interrupts disabled.
>
> I should have known that, but I completely overlooked it being concentrated on
> the locking issues around the vectors. And the existance of the call to
s/existence
> __irq_complete_move() in the code, which actually sends the cleanup IPI made
> it look completely logical that waiting for that cleanup to complete is the
> right thing to do. That call was bogus even before the recent changes, but it
> was the pointless distraction which tricked me not to see the real issue. :(
>
> So looking deeper into the issue I discovered that the cleanup of the vectors
> is actually pretty simple. We have to look at three cases:
>
> 1) The move_in_progress flag of the interrupt is set
>
> A) The interrupt must be moved in interrupt context, i.e. the affinity
> change takes place when the next interrupt happens.
>
> In that case the io_apic is not yet updated to the new vector, so we can
> simply restore the target domain mask to the previous state,
> i.e. old_domain, and restore the old vector in the configuration data.
>
> Further we need to check whether the affinity update actually changed
> the vector or merily reduced the target mask. If it's a new vector, then
> we need to clear the vector entries of the new vector.
>
> This undoes the pending affinity change to the old target, but with the
> outgoing cpu cleared in the target domain mask.
>
> B) The interrupt can be moved in any context, i.e. the io_apic has been
> updated with the new vector already, but no interrupt was delivered
> after that update, so we know for sure, that the next interrupt will be
> delivered to the new vector.
>
> So it's the same as case #2 where the cleanup IPI has been issued
> already and the domain cpu mask is not yet empty. See below.
>
> 2) The move_in_progress flag is not set and the old_domain cpu mask is not
> empty.
>
> That means, that an interrupt was delivered after the change and the
> cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
> responded to it yet.
>
> It does not matter in which context the io_apic update happened, the
> io_apic contains the new vector already. See also case 1B)
>
> So we know at this point that the next interrupt will arrive on the new
> vector, so we can safely cleanup the old vectors on the cpus in the
> old_domain cpu mask.
>
> Fixes: 98229aa36caa "x86/irq: Plug vector cleanup race"
> Reported-by: Harry Junior <harryjr@outlook.fr>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> arch/x86/include/asm/hw_irq.h | 1
> arch/x86/kernel/apic/vector.c | 94 +++++++++++++++++++++++++++++++++---------
> 2 files changed, 77 insertions(+), 18 deletions(-)
Cool fix!! :-)
How much time did it take for you to figure out this one?? ...
Some minor spelchecking nits:
> + * All CPUs are stuck in stop machine with interrupts disabled so
> + * calling __irq_complete_move() would be completely pointless.
> */
> raw_spin_lock(&vector_lock);
> +
> + /*
> + * Clean out all offline cpus (including the outgoing one) from the
> + * old_domain mask.
s/cpus/CPUs
> + */
> cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
> - while (!cpumask_empty(data->old_domain)) {
> +
> + /*
> + * If move_in_progress is cleared and the old_domain mask is empty,
> + * then there is nothing to cleanup. fixup_irqs() will take care of
s/cleanup/clean up
> + * the stale vectors on the outgoing cpu.
s/cpu/CPU
> + */
> + if (!data->move_in_progress && cpumask_empty(data->old_domain)) {
> raw_spin_unlock(&vector_lock);
> - raw_spin_unlock(&desc->lock);
> - cpu_relax();
> - raw_spin_lock(&desc->lock);
> + return;
> + }
> +
> + /*
> + * We have to distinguish three cases:
> + *
> + * 1) The interrupt is in move_in_progress state and the interrupt is
> + * not marked with IRQ_MOVE_PCNTXT. That means the io_apic still
> + * points to the old vector.
> + *
> + * 2) The interrupt is in move_in_progress state and the interrupt is
> + * marked with IRQ_MOVE_PCNTXT. That means the io_apic already has
> + * the new vector.
> + *
> + * 3) The interrupt has been moved, the io_apic has already the new
> + * vector, but the cleanup IPIs have not been processed yet.
> + *
> + * #2 and #3 can be handled in the same way as the old vector is not
> + * longer in use and the vector entries of the cpus in old_domain mask
s/not longer in use/no longer in use
s/cpus/CPUs
> + * can be cleaned up safely now.
> + */
> + if (!irqd_can_move_in_process_context(irqdata) &&
> + data->move_in_progress) {
> /*
> + * We restore old_domain (the offline cpus have been masked
s/cpus/CPUs
> + /*
> + * If old_domain is not empty, then other cpus still have the
s/CPUs
> + * irq descriptor set in their vector array. Clean it up, it's
> + * not longer possible that the interrupt happens on that
> + * vector.
s/it's not longer possible/it's no longer possible
> + */
> + v = cfg->old_vector;
> + for_each_cpu(cpu, data->old_domain)
> + per_cpu(vector_irq, cpu)[v] = VECTOR_UNUSED;
> }
> + /* Cleanup the left overs of the (half finished) move */
s/Cleanup/clean up
Thanks,
Ingo
next prev parent reply other threads:[~2016-03-12 15:23 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-11 20:19 [PATCH] x86/irq: Cure live lock in irq_force_complete_move() Thomas Gleixner
2016-03-11 22:33 ` Luck, Tony
2016-03-12 8:27 ` Thomas Gleixner
2016-03-12 10:58 ` Thomas Gleixner
2016-03-12 15:23 ` Ingo Molnar [this message]
2016-03-12 19:56 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160312152351.GC7015@gmail.com \
--to=mingo@kernel.org \
--cc=bp@alien8.de \
--cc=harryjr@outlook.fr \
--cc=joe.lawrence@stratus.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.