From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id CC79AC43217
 for ; Wed, 4 May 2022 17:02:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1355142AbiEDRGG (ORCPT ); Wed, 4 May 2022 13:06:06 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55550 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1355708AbiEDREj (ORCPT );
 Wed, 4 May 2022 13:04:39 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1B8AD4F452
 for ; Wed, 4 May 2022 09:53:15 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C9F661042;
 Wed, 4 May 2022 09:53:14 -0700 (PDT)
Received: from lpieralisi (unknown [10.57.1.196])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EE8E43FA27;
 Wed, 4 May 2022 09:53:12 -0700 (PDT)
Date: Wed, 4 May 2022 17:53:07 +0100
From: Lorenzo Pieralisi
To: Conor Dooley
Cc: Bjorn Helgaas , Marc Zyngier ,
 Conor.Dooley@microchip.com, Daire.McNamara@microchip.com,
 bhelgaas@google.com, Cyril.Jean@microchip.com,
 david.abdurachmanov@gmail.com, linux-pci@vger.kernel.org,
 robh@kernel.org
Subject: Re: [RESEND PATCH v1 1/1] PCI: microchip: Fix potential race in
 interrupt handling
Message-ID: <20220504165307.GA19115@lpieralisi>
References: <20220502192223.GA319570@bhelgaas>
 <199f5479-b212-e1ac-f9e4-d5d13708cb0c@conchuod.ie>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <199f5479-b212-e1ac-f9e4-d5d13708cb0c@conchuod.ie>
User-Agent: Mutt/1.9.4 (2018-02-28)
Precedence: bulk
List-ID:
X-Mailing-List: linux-pci@vger.kernel.org

On Wed, May 04, 2022 at 04:12:39PM
+0100, Conor Dooley wrote:
> On 02/05/2022 20:22, Bjorn Helgaas wrote:
> > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote:
> >> On Fri, 29 Apr 2022 22:57:33 +0100,
> >> Bjorn Helgaas wrote:
> >>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@microchip.com wrote:
> >>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote:
> >>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@microchip.com wrote:
> >>>>>> From: Daire McNamara
> >>>>>>
> >>>>>> Clear MSI bit in ISTATUS register after reading it before
> >>>>>> handling individual MSI bits
> >
> >>>> Clear the MSI bit in ISTATUS register after reading it, but before
> >>>> reading and handling individual MSI bits from the IMSI register.
> >>>> This avoids a potential race where new MSI bits may be set on the
> >>>> IMSI register after it was read and be missed when the MSI bit in
> >>>> the ISTATUS register is cleared.
> >
> >>> Honestly, I don't understand enough about IRQs to determine whether
> >>> this is a correct fix. Hopefully Marc will chime in. All I really
> >>> know how to do is compare all the drivers and see which ones don't fit
> >>> the typical patterns.
> >>
> >> This seems sensible. In general, edge interrupts need an early Ack
> >> *before* the handler can be run. If it happens after, you're pretty
> >> much guaranteed to lose edges that would be generated between the
> >> handler and the late Ack.
> >>
> >> This can be implemented in HW in a variety of ways (read a register,
> >> write a register, or even both).
> >
> > Is this something that is or could be documented somewhere under
> > Documentation, e.g., "here are the common canonical patterns to use"?
> > I feel like an idiot because I have this kind of question all the time
> > and I never know how to confidently analyze it.
>
> Daire is still having the IT issues, so before I resend the patch with
> a new commit message, how is the following:
>
> Clear the MSI bit in ISTATUS_LOCAL register after reading it, but
> before reading and handling individual MSI bits from the ISTATUS_MSI
> register. This avoids a potential race where new MSI bits may be set
> on the ISTATUS_MSI register after it was read and be missed when the
> MSI bit in the ISTATUS_LOCAL register is cleared.

It is still unclear. You should translate what Marc said above into
how ISTATUS_MSI and ISTATUS_LOCAL work (ie describe how HW works).

Please describe what the registers do and use that to describe the fix.

Thanks,
Lorenzo

> Reported by: Bjorn Helgaas
> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/
> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
> Signed-off-by: Daire McNamara
>
> >
> >>> And speaking of that, I looked at all the users of
> >>> irq_set_chained_handler_and_data() in drivers/pci. All the handlers
> >>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter()
> >>> and chained_irq_exit().
> >>>
> >>> Are mc_handle_intx() and mc_handle_msi() just really special, or is
> >>> this a mistake?
> >>
> >> That's just a bug. On the right HW, this would just result in lost
> >> interrupts.
>
> Separate issue, separate patch. Do you want them in a series or as
> another standalone patch?
>
> Thanks,
> Conor.
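
For illustration only, the early-versus-late Ack ordering Marc describes
can be sketched as a small user-space model. Everything in it is an
assumption made for the sketch: struct fake_hw, hw_raise() and both
handlers are hypothetical names, the "summary" flag merely stands in
for the MSI bit in ISTATUS_LOCAL, and "vectors" for ISTATUS_MSI. It is
not the driver code and not a description of the real register
semantics, which still need to come from the hardware documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, NOT the Microchip register layout. */
struct fake_hw {
	bool summary;     /* latched summary bit (stand-in for the MSI bit in ISTATUS_LOCAL) */
	uint32_t vectors; /* per-vector pending bits (stand-in for ISTATUS_MSI) */
};

/* A device signals vector 'vec': set its pending bit and latch the summary bit. */
static void hw_raise(struct fake_hw *hw, unsigned int vec)
{
	hw->vectors |= UINT32_C(1) << vec;
	hw->summary = true;
}

/* True when a vector is pending but nothing will trigger the handler again. */
static bool is_stranded(const struct fake_hw *hw)
{
	return hw->vectors != 0 && !hw->summary;
}

/*
 * Early Ack: clear the summary bit first, then read and clear the
 * per-vector bits. 'racing_vec' (>= 0) models a new MSI arriving after
 * the per-vector read; pass -1 for no racing MSI.
 * Returns the mask of vectors this invocation handled.
 */
static uint32_t handle_early_ack(struct fake_hw *hw, int racing_vec)
{
	hw->summary = false;             /* Ack before looking at the vectors */
	uint32_t snap = hw->vectors;     /* snapshot of pending vectors */
	hw->vectors &= ~snap;            /* clear only what we will handle */
	if (racing_vec >= 0)
		hw_raise(hw, (unsigned int)racing_vec); /* re-latches summary */
	return snap;
}

/* Late Ack: same steps, but the summary bit is cleared last. */
static uint32_t handle_late_ack(struct fake_hw *hw, int racing_vec)
{
	uint32_t snap = hw->vectors;
	hw->vectors &= ~snap;
	if (racing_vec >= 0)
		hw_raise(hw, (unsigned int)racing_vec);
	hw->summary = false;             /* wipes the edge latched by the racing MSI */
	return snap;
}
```

In this model, if one vector is pending and a second one is raised
inside the window, handle_early_ack() leaves the summary flag set, so
the handler would run again for the new vector. handle_late_ack()
leaves the new vector pending with the summary flag clear, which is
the lost edge.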