Date: Sat, 12 Mar 2016 16:23:51 +0100
From: Ingo Molnar
To: Thomas Gleixner
Cc: LKML, Harry Junior, Tony Luck, x86@kernel.org, Peter Zijlstra, Joe Lawrence, Borislav Petkov
Subject: Re: [PATCH] x86/irq: Cure live lock in irq_force_complete_move()
Message-ID: <20160312152351.GC7015@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org


* Thomas Gleixner wrote:

> Harry reported, that he's able to trigger a system freeze with cpu hot
> unplug. The freeze turned out to be a live lock caused by recent changes in
> irq_force_complete_move().
>
> When fixup_irqs() and from there irq_force_complete_move() is called on the
> dying cpu, then all other cpus are in stop machine and wait for the dying cpu
> to complete the teardown. If there is a move of an interrupt pending then
> irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
> mask and waits for them to clear the mask. That's obviously impossible as
> those cpus are firmly stuck in stop machine with interrupts disabled.
>
> I should have known that, but I completely overlooked it being concentrated on
> the locking issues around the vectors. And the existance of the call to

s/existence

> __irq_complete_move() in the code, which actually sends the cleanup IPI made
> it look completely logical that waiting for that cleanup to complete is the
> right thing to do.
> That call was bogus even before the recent changes, but it
> was the pointless distraction which tricked me not to see the real issue. :(
>
> So looking deeper into the issue I discovered that the cleanup of the vectors
> is actually pretty simple. We have to look at three cases:
>
>  1) The move_in_progress flag of the interrupt is set
>
>     A) The interrupt must be moved in interrupt context, i.e. the affinity
>        change takes place when the next interrupt happens.
>
>        In that case the io_apic is not yet updated to the new vector, so we can
>        simply restore the target domain mask to the previous state,
>        i.e. old_domain, and restore the old vector in the configuration data.
>
>        Further we need to check whether the affinity update actually changed
>        the vector or merily reduced the target mask. If it's a new vector, then
>        we need to clear the vector entries of the new vector.
>
>        This undoes the pending affinity change to the old target, but with the
>        outgoing cpu cleared in the target domain mask.
>
>     B) The interrupt can be moved in any context, i.e. the io_apic has been
>        updated with the new vector already, but no interrupt was delivered
>        after that update, so we know for sure, that the next interrupt will be
>        delivered to the new vector.
>
>        So it's the same as case #2 where the cleanup IPI has been issued
>        already and the domain cpu mask is not yet empty. See below.
>
>  2) The move_in_progress flag is not set and the old_domain cpu mask is not
>     empty.
>
>     That means, that an interrupt was delivered after the change and the
>     cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
>     responded to it yet.
>
>     It does not matter in which context the io_apic update happened, the
>     io_apic contains the new vector already. See also case 1B)
>
>     So we know at this point that the next interrupt will arrive on the new
>     vector, so we can safely cleanup the old vectors on the cpus in the
>     old_domain cpu mask.
>
> Fixes: 98229aa36caa "x86/irq: Plug vector cleanup race"
> Reported-by: Harry Junior
> Signed-off-by: Thomas Gleixner
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/include/asm/hw_irq.h |    1
>  arch/x86/kernel/apic/vector.c |   94 +++++++++++++++++++++++++++++++++---------
>  2 files changed, 77 insertions(+), 18 deletions(-)

Cool fix!! :-) How much time did it take for you to figure out this one?? ...

Some minor spellchecking nits:

> +        * All CPUs are stuck in stop machine with interrupts disabled so
> +        * calling __irq_complete_move() would be completely pointless.
>          */
>         raw_spin_lock(&vector_lock);
> +
> +       /*
> +        * Clean out all offline cpus (including the outgoing one) from the
> +        * old_domain mask.

s/cpus/CPUs

> +        */
>         cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
> -       while (!cpumask_empty(data->old_domain)) {
> +
> +       /*
> +        * If move_in_progress is cleared and the old_domain mask is empty,
> +        * then there is nothing to cleanup. fixup_irqs() will take care of

s/cleanup/clean up

> +        * the stale vectors on the outgoing cpu.

s/cpu/CPU

> +        */
> +       if (!data->move_in_progress && cpumask_empty(data->old_domain)) {
>                 raw_spin_unlock(&vector_lock);
> -               raw_spin_unlock(&desc->lock);
> -               cpu_relax();
> -               raw_spin_lock(&desc->lock);
> +               return;
> +       }
> +
> +       /*
> +        * We have to distinguish three cases:
> +        *
> +        * 1) The interrupt is in move_in_progress state and the interrupt is
> +        *    not marked with IRQ_MOVE_PCNTXT. That means the io_apic still
> +        *    points to the old vector.
> +        *
> +        * 2) The interrupt is in move_in_progress state and the interrupt is
> +        *    marked with IRQ_MOVE_PCNTXT. That means the io_apic already has
> +        *    the new vector.
> +        *
> +        * 3) The interrupt has been moved, the io_apic has already the new
> +        *    vector, but the cleanup IPIs have not been processed yet.
> +        *
> +        * #2 and #3 can be handled in the same way as the old vector is not
> +        * longer in use and the vector entries of the cpus in old_domain mask

s/not longer in use/no longer in use
s/cpus/CPUs

> +        * can be cleaned up safely now.
> +        */
> +       if (!irqd_can_move_in_process_context(irqdata) &&
> +           data->move_in_progress) {
>                 /*
> +                * We restore old_domain (the offline cpus have been masked

s/cpus/CPUs

> +               /*
> +                * If old_domain is not empty, then other cpus still have the

s/CPUs

> +                * irq descriptor set in their vector array. Clean it up, it's
> +                * not longer possible that the interrupt happens on that
> +                * vector.

s/it's not longer possible/it's no longer possible

> +                */
> +               v = cfg->old_vector;
> +               for_each_cpu(cpu, data->old_domain)
> +                       per_cpu(vector_irq, cpu)[v] = VECTOR_UNUSED;
>         }
> +       /* Cleanup the left overs of the (half finished) move */

s/Cleanup/clean up

Thanks,

        Ingo