From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:54892 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965548AbcA0TGv (ORCPT ); Wed, 27 Jan 2016 14:06:51 -0500 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Alistair Popple , Andrew Donnellan , Michael Ellerman Subject: [PATCH 4.3 126/157] powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion" Date: Wed, 27 Jan 2016 10:13:13 -0800 Message-Id: <20160127180939.453739059@linuxfoundation.org> In-Reply-To: <20160127180932.533735338@linuxfoundation.org> References: <20160127180932.533735338@linuxfoundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15 Sender: stable-owner@vger.kernel.org List-ID: 4.3-stable review patch. If anyone has any objections, please let me know. ------------------ From: Alistair Popple commit 036592fbbe753d236402a0ae68148e7c143a0f0e upstream. Commit 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion") fixed an endian bug by calling opal_handle_events() in opal_event_unmask(). However this introduced a deadlock if we find an event is active during unmasking and call opal_handle_events() again. The bad call sequence is: opal_interrupt() -> opal_handle_events() -> generic_handle_irq() -> handle_level_irq() -> raw_spin_lock(&desc->lock) handle_irq_event(desc) unmask_irq(desc) -> opal_event_unmask() -> opal_handle_events() -> generic_handle_irq() -> handle_level_irq() -> raw_spin_lock(&desc->lock) (BOOM) When generating multiple opal events in quick succession this would lead to the following stall warnings: EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32 INFO: rcu_sched detected stalls on CPUs/tasks: 12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065 15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065 (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602) NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696] INFO: rcu_sched detected stalls on CPUs/tasks: 12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371 15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371 (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290) This patch corrects the problem by queuing the work if an event is active during unmasking, which is similar to the pre-endian fix behaviour. Fixes: 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion") Signed-off-by: Alistair Popple Reported-by: Andrew Donnellan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/opal-irqchip.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) --- a/arch/powerpc/platforms/powernv/opal-irqchip.c +++ b/arch/powerpc/platforms/powernv/opal-irqchip.c @@ -83,7 +83,19 @@ static void opal_event_unmask(struct irq set_bit(d->hwirq, &opal_event_irqchip.mask); opal_poll_events(&events); - opal_handle_events(be64_to_cpu(events)); + last_outstanding_events = be64_to_cpu(events); + + /* + * We can't just handle the events now with opal_handle_events(). + * If we did we would deadlock when opal_event_unmask() is called from + * handle_level_irq() with the irq descriptor lock held, because + * calling opal_handle_events() would call generic_handle_irq() and + * then handle_level_irq() which would try to take the descriptor lock + * again. Instead queue the events for later. + */ + if (last_outstanding_events & opal_event_irqchip.mask) + /* Need to retrigger the interrupt */ + irq_work_queue(&opal_event_irq_work); } static int opal_event_set_type(struct irq_data *d, unsigned int flow_type)