All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <marc.zyngier@arm.com>
To: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Trent Piepho <tpiepho@impinj.com>,
	"lorenzo.pieralisi@arm.com" <lorenzo.pieralisi@arm.com>,
	"jpinto@synopsys.com" <joao.pinto@synopsys.com>,
	"jingoohan1@gmail.com" <jingoohan1@gmail.com>,
	"faiz_abbas@ti.com" <faiz_abbas@ti.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"vigneshr@ti.com" <vigneshr@ti.com>
Subject: Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI
Date: Tue, 13 Nov 2018 14:40:39 +0000	[thread overview]
Message-ID: <86h8gldorc.wl-marc.zyngier@arm.com> (raw)
In-Reply-To: <f474f03d-4b3f-9175-f91f-327f67060936@synopsys.com>

On Tue, 13 Nov 2018 00:41:24 +0000,
Gustavo Pimentel <gustavo.pimentel@synopsys.com> wrote:
> 
> On 09/11/2018 11:34, Marc Zyngier wrote:
> > On 08/11/18 19:49, Trent Piepho wrote:
> >> On Thu, 2018-11-08 at 09:49 +0000, Marc Zyngier wrote:
> >>> On 07/11/18 20:17, Trent Piepho wrote:
> >>>> On Wed, 2018-11-07 at 18:41 +0000, Marc Zyngier wrote:
> >>>>> On 06/11/18 19:40, Trent Piepho wrote:
> >>>>>>
> >>>>>> What about stable kernels that don't have the hierarchical API?
> >>>>>
> >>>>> My goal is to fix mainline first. Once we have something that works on
> >>>>> mainline, we can look at propagating the fix to other versions. But
> >>>>> mainline always comes first.
> >>>>
> >>>> This is a regression that went into 4.14.  Wouldn't the appropriate
> >>>> action for the stable series be to undo the regression?
> >>>
> >>> This is not how stable works. Stable kernels *only* contain patches that
> >>> are backported from mainline, and do not take standalone patch.
> >>>
> >>> Furthermore, your fix is to actually undo someone else's fix. Who is
> >>> right? In the absence of any documentation, the answer is "nobody".
> >>
> >> Little more history to this bug.  The code was originally the way it is
> >> now, but this same bug was fixed in 2013 in https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.or&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=Tk2dtPxJ5WvHjQbWWx35od-JZmur8mRuDrSSVhf8nYs&s=3nFZMUc2qvOXJht1WuNCDorudHeFiWyeDaO1UbhTXmw&e=
> >> g/patch/3333681/
> >>
> >> Then that lasted four years until it was changed Aug 2017 in https://urldefense.proofpoint.com/v2/url?u=https-3A__pa&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=Tk2dtPxJ5WvHjQbWWx35od-JZmur8mRuDrSSVhf8nYs&s=yfWJAoFIGjLbTnuE_KLzhW7J_pyMD3X1dXm4eHoy7_o&e=
> >> tchwork.kernel.org/patch/9893303/
> >>
> >> That lasted just six months until someone tried to revert it, https://urldefense.proofpoint.com/v2/url?u=https-3A__p&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=Tk2dtPxJ5WvHjQbWWx35od-JZmur8mRuDrSSVhf8nYs&s=q1LGEb9w_CjcwmWVFOM5VWJ17oS9XHd8SsWTsBF8zfU&e=
> >> atchwork.kernel.org/patch/9893303/
> >>
> >> Seems pretty clear the way it is now is much worse than the way it was
> >> before, even if the previous design may have had another flaw.  Though
> >> I've yet to see anyone point out something makes the previous design
> >> broken.  Sub-optimal yes, but not broken.
> > 
> > This is not what was reported by the previous submitter. I guess they
> > changed this for a reason, no? I'm prepared to admit this is a end-point
> > driver bug, but we need to understand what is happening and stop
> > changing this driver randomly.
> > 
> >>> Anything can be backported to stable once we understand the issue. At
> >>> the moment, we're just playing games moving stuff around and hope
> >>> nothing else will break. That's not a sustainable way of maintaining
> >>> this driver. At the moment, the only patch I'm inclined to propose until
> >>> we get an actual interrupt handling flow from Synopsys is to mark this
> >>> driver as "BROKEN".
> >>
> >> It feels like you're using this bug to hold designware hostage in a
> >> broken kernel, and me along with them.  I don't have the documentation,
> >> no one does, there's no way for me to give you want you want.  But I've
> >> got hardware that doesn't work in the mainline kernel.
> > 
> > I take it as a personal offence that I'd be holding anything or anyone
> > hostage. I think I have a long enough track record working with the
> > Linux kernel not to take any of this nonsense. What's my interest in
> > keeping anything in this sorry state? Think about it for a minute.
> > 
> > When I'm asking for documentation, I'm not asking you. I perfectly
> > understood that you don't have any. We need Synopsys to step up and give
> > us a simple description of what the MSI workflow is. Because no matter
> > how you randomly mutate this code, it will still be broken until we get
> > a clear answer from *Synopsys*.
> > 
> > Gustavo, here's one simple ask. Please reply to this email with a step
> > by step, register by register description of how an MSI must be handled
> > on this HW. We do need it, and we need it now.
> 
> Hi Marc, I thought that I did respond to your question on Nov 8. However I will
> repeat it and add some extra information to it now.
> 
> About the registers:
> 
> PCIE_MSI_INTR0_MASK
> When an MSI is received for a masked interrupt, the corresponding status bit
> gets set in the interrupt status register but the controller will not signal it.
> As soon as the masked interrupt is unmasked and assuming the status bit is still
> set the controller will signal it.

So this is *really* a mask.

> PCIE_MSI_INTR0_ENABLE
> Enables/disables every interrupt. When an MSI is received from a disabled
> interrupt, no status bit gets set in MSI controller interrupt status register.

And that's not a mask at all. This allows the interrupt to be simply
lost. It is just a bit worrying that this is exactly what the DWC
driver is using, and faces all kind of interrupt loss when
enabling/disabling interrupts (which do have a mask/unmask semantic --
yes, the language is confusing).


> 
> About this new callbacks:
> 
> The dw_pci_bottom_mask() and dw_pci_bottom_unmask() callbacks were added on
> Linux kernel v4.17 on PCI: dwc: Move MSI IRQs allocation to IRQ domains
> hierarchical API patch [1] and based on what I've seen so far, before this patch
> there were no interrupt masking been performed.
> 
> Based on the information provided, its very likely that I should use the
> PCIE_MSI_INTR0_MASK register instead of PCIE_MSI_INTR0_ENABLE on the
> dw_pci_bottom_mask() and dw_pci_bottom_unmask() callbacks. As soon as I return
> from Linux plumbers conference, I will test the system using the
> PCIE_MSI_INTR0_MASK register.

I'm at Plumbers as well, so feel free to grab me and we'll sort this
out. We've wasted enough time on this broken driver.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

  parent reply	other threads:[~2018-11-13 14:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-27  0:00 [PATCH] PCI: dwc: Fix interrupt race in when handling MSI Trent Piepho
2018-11-05 10:28 ` Vignesh R
2018-11-05 12:08 ` Gustavo Pimentel
2018-11-06 14:53 ` Lorenzo Pieralisi
2018-11-06 16:00   ` Marc Zyngier
2018-11-06 19:40     ` Trent Piepho
2018-11-07 11:07       ` Lorenzo Pieralisi
2018-11-07 12:58         ` Gustavo Pimentel
2018-11-07 18:41       ` Marc Zyngier
2018-11-07 20:17         ` Trent Piepho
2018-11-08  9:49           ` Marc Zyngier
2018-11-08 19:49             ` Trent Piepho
2018-11-09 10:13               ` Lorenzo Pieralisi
2018-11-09 17:17                 ` Vignesh R
2018-11-09 11:34               ` Marc Zyngier
2018-11-09 18:53                 ` Trent Piepho
2018-11-13  0:41                 ` Gustavo Pimentel
2018-11-13  1:18                   ` Trent Piepho
2018-11-13 10:36                     ` Lorenzo Pieralisi
2018-11-13 18:55                       ` Trent Piepho
2018-11-13 14:40                   ` Marc Zyngier [this message]
2018-11-07 12:57     ` Gustavo Pimentel
2018-11-07 18:32       ` Trent Piepho
2018-11-08 11:46         ` Gustavo Pimentel
2018-11-08 20:51           ` Trent Piepho
2018-11-12 16:01             ` Lorenzo Pieralisi
2018-11-13  1:03               ` Gustavo Pimentel
2018-11-14 21:29               ` Trent Piepho
2018-11-12 23:45             ` Gustavo Pimentel
2018-11-07 18:46       ` Marc Zyngier
2018-11-08 11:24         ` Gustavo Pimentel
2018-11-06 18:59   ` Trent Piepho

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86h8gldorc.wl-marc.zyngier@arm.com \
    --to=marc.zyngier@arm.com \
    --cc=bhelgaas@google.com \
    --cc=faiz_abbas@ti.com \
    --cc=gustavo.pimentel@synopsys.com \
    --cc=jingoohan1@gmail.com \
    --cc=joao.pinto@synopsys.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=stable@vger.kernel.org \
    --cc=tpiepho@impinj.com \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.