From: Ingo Molnar <mingo@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Harry Junior <harryjr@outlook.fr>,
Tony Luck <tony.luck@intel.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Joe Lawrence <joe.lawrence@stratus.com>,
Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH] x86/irq: Cure live lock in irq_force_complete_move()
Date: Sat, 12 Mar 2016 16:23:51 +0100
Message-ID: <20160312152351.GC7015@gmail.com>
In-Reply-To: <alpine.DEB.2.11.1603112048030.3657@nanos>
* Thomas Gleixner <tglx@linutronix.de> wrote:
> Harry reported that he's able to trigger a system freeze with cpu hot
> unplug. The freeze turned out to be a live lock caused by recent changes in
> irq_force_complete_move().
>
> When fixup_irqs() and from there irq_force_complete_move() is called on the
> dying cpu, then all other cpus are in stop machine and wait for the dying cpu
> to complete the teardown. If there is a move of an interrupt pending then
> irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
> mask and waits for them to clear the mask. That's obviously impossible as
> those cpus are firmly stuck in stop machine with interrupts disabled.
>
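( Side note, to spell out the live lock: simplified from the old code being
  removed below, the dying CPU was essentially doing:

	/* Wait for the cpus in old_domain to ack the cleanup IPI: */
	cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
	while (!cpumask_empty(data->old_domain)) {
		raw_spin_unlock(&vector_lock);
		raw_spin_unlock(&desc->lock);
		cpu_relax();
		raw_spin_lock(&desc->lock);
		raw_spin_lock(&vector_lock);
	}

  That mask can never become empty here: the CPUs that would clear it are
  parked in stop machine with interrupts disabled, so the cleanup IPI is
  never serviced and the loop spins forever. )
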
> I should have known that, but I completely overlooked it, concentrating on
> the locking issues around the vectors. And the existance of the call to
s/existance/existence
> __irq_complete_move() in the code, which actually sends the cleanup IPI, made
> it look completely logical that waiting for that cleanup to complete is the
> right thing to do. That call was bogus even before the recent changes, but it
> was the pointless distraction which tricked me into not seeing the real issue. :(
>
> So looking deeper into the issue I discovered that the cleanup of the vectors
> is actually pretty simple. We have to look at three cases:
>
> 1) The move_in_progress flag of the interrupt is set
>
> A) The interrupt must be moved in interrupt context, i.e. the affinity
> change takes place when the next interrupt happens.
>
> In that case the io_apic is not yet updated to the new vector, so we can
> simply restore the target domain mask to the previous state,
> i.e. old_domain, and restore the old vector in the configuration data.
>
> Further we need to check whether the affinity update actually changed
> the vector or merely reduced the target mask. If it's a new vector, then
> we need to clear the vector entries of the new vector.
>
> This undoes the pending affinity change to the old target, but with the
> outgoing cpu cleared in the target domain mask.
>
> B) The interrupt can be moved in any context, i.e. the io_apic has been
> updated with the new vector already, but no interrupt was delivered
> after that update, so we know for sure, that the next interrupt will be
> delivered to the new vector.
>
> So it's the same as case #2 where the cleanup IPI has been issued
> already and the domain cpu mask is not yet empty. See below.
>
> 2) The move_in_progress flag is not set and the old_domain cpu mask is not
> empty.
>
> That means that an interrupt was delivered after the change and the
> cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
> responded to it yet.
>
> It does not matter in which context the io_apic update happened; the
> io_apic contains the new vector already. See also case 1B)
>
> So we know at this point that the next interrupt will arrive on the new
> vector, so we can safely cleanup the old vectors on the cpus in the
> old_domain cpu mask.
>
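( To summarize the above in code: stitched together from the hunks quoted
  below, the new cleanup logic boils down to roughly this sketch, with the
  case 1A details omitted:

	raw_spin_lock(&vector_lock);

	/* Offline CPUs, including the outgoing one, are of no interest: */
	cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);

	if (!data->move_in_progress && cpumask_empty(data->old_domain)) {
		/* Nothing pending, fixup_irqs() handles the stale vectors: */
		raw_spin_unlock(&vector_lock);
		return;
	}

	if (!irqd_can_move_in_process_context(irqdata) &&
	    data->move_in_progress) {
		/* Case 1A: the io_apic still points to the old vector,
		 * so restore old_domain and the old vector. */
	} else {
		/* Cases 1B and 2: the old vector cannot fire anymore,
		 * so clear its per-cpu vector entries: */
		v = cfg->old_vector;
		for_each_cpu(cpu, data->old_domain)
			per_cpu(vector_irq, cpu)[v] = VECTOR_UNUSED;
	}

	raw_spin_unlock(&vector_lock);

  Note the key property: no IPIs are sent and nothing is waited for, which
  is the whole point on a CPU that every other CPU is waiting for. )
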
> Fixes: 98229aa36caa ("x86/irq: Plug vector cleanup race")
> Reported-by: Harry Junior <harryjr@outlook.fr>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> arch/x86/include/asm/hw_irq.h | 1
> arch/x86/kernel/apic/vector.c | 94 +++++++++++++++++++++++++++++++++---------
> 2 files changed, 77 insertions(+), 18 deletions(-)
Cool fix!! :-)
How much time did it take for you to figure out this one?? ...
Some minor spell-checking nits:
> + * All CPUs are stuck in stop machine with interrupts disabled so
> + * calling __irq_complete_move() would be completely pointless.
> */
> raw_spin_lock(&vector_lock);
> +
> + /*
> + * Clean out all offline cpus (including the outgoing one) from the
> + * old_domain mask.
s/cpus/CPUs
> + */
> cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
> - while (!cpumask_empty(data->old_domain)) {
> +
> + /*
> + * If move_in_progress is cleared and the old_domain mask is empty,
> + * then there is nothing to cleanup. fixup_irqs() will take care of
s/cleanup/clean up
> + * the stale vectors on the outgoing cpu.
s/cpu/CPU
> + */
> + if (!data->move_in_progress && cpumask_empty(data->old_domain)) {
> raw_spin_unlock(&vector_lock);
> - raw_spin_unlock(&desc->lock);
> - cpu_relax();
> - raw_spin_lock(&desc->lock);
> + return;
> + }
> +
> + /*
> + * We have to distinguish three cases:
> + *
> + * 1) The interrupt is in move_in_progress state and the interrupt is
> + * not marked with IRQ_MOVE_PCNTXT. That means the io_apic still
> + * points to the old vector.
> + *
> + * 2) The interrupt is in move_in_progress state and the interrupt is
> + * marked with IRQ_MOVE_PCNTXT. That means the io_apic already has
> + * the new vector.
> + *
> + * 3) The interrupt has been moved, the io_apic has already the new
> + * vector, but the cleanup IPIs have not been processed yet.
> + *
> + * #2 and #3 can be handled in the same way as the old vector is not
> + * longer in use and the vector entries of the cpus in old_domain mask
s/not longer in use/no longer in use
s/cpus/CPUs
> + * can be cleaned up safely now.
> + */
> + if (!irqd_can_move_in_process_context(irqdata) &&
> + data->move_in_progress) {
> /*
> + * We restore old_domain (the offline cpus have been masked
s/cpus/CPUs
> + /*
> + * If old_domain is not empty, then other cpus still have the
s/cpus/CPUs
> + * irq descriptor set in their vector array. Clean it up, it's
> + * not longer possible that the interrupt happens on that
> + * vector.
s/it's not longer possible/it's no longer possible
> + */
> + v = cfg->old_vector;
> + for_each_cpu(cpu, data->old_domain)
> + per_cpu(vector_irq, cpu)[v] = VECTOR_UNUSED;
> }
> + /* Cleanup the left overs of the (half finished) move */
s/Cleanup/Clean up
Thanks,
Ingo