Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
	linux-pci@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH] PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity
Date: Tue, 15 Jan 2019 17:49:04 -0600	[thread overview]
Message-ID: <20190115234904.GB158366@google.com> (raw)
In-Reply-To: <20190115224631.GA22558@ming.t460p>

On Wed, Jan 16, 2019 at 06:46:32AM +0800, Ming Lei wrote:
> Hi Bjorn,
> 
> I think Christoph and Jens are correct, we should make this patch into
> 5.0 because the issue is triggered since 3b6592f70ad7b4c2 ("nvme: utilize
> two queue maps, one for reads and one for writes"), which is merged to
> 5.0-rc.
> 
> For example, before 3b6592f70ad7b4c2, one nvme controller may be
> allocated 64 irq vectors; but after that commit, only 1 irq vector
> is assigned to this controller.
> 
> On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > > >> Applied to pci/msi for v5.1, thanks!
> > > >>
> > > >> If this is something that should be in v5.0, let me know and include the
> > > >> justification, e.g., something we already merged for v5.0 or regression
> > > >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > > > 
> > > > I'd be tempted to queues this up for 5.0.  Ming, what is your position?
> > > 
> > > I think we should - the API was introduced in this series, I think there's
> > > little (to no) reason NOT to fix it for 5.0.
> > 
> > I'm guessing the justification goes something like this (I haven't
> > done all the research, so I'll leave it to Ming to fill in the details):
> > 
> >   pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
> 
> dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")
> 
> >   It had this return value defect then, but its min_vecs/max_vecs
> >   parameters removed the need for callers to interatively reduce the
> >   number of IRQs requested and retry the allocation, so they didn't
> >   need to distinguish -ENOSPC from -EINVAL.
> > 
> >   In v5.0, XXX ("...") added IRQ sets to the interface, which
> 
> 3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one for writes")
> 
> >   reintroduced the need to check for -ENOSPC and possibly reduce the
> >   number of IRQs requested and retry the allocation.

We're fixing a PCI core defect, so we should mention the relevant PCI
core commits, not the nvme-specific ones.  I looked them up for you
and moved this to for-linus for v5.0.


commit 77f88abd4a6f73a1a68dbdc0e3f21575fd508fc3
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Jan 15 17:31:29 2019 -0600

    PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
    
    The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC if
    fewer than @min_vecs interrupt vectors are available for @dev.
    
    However, if a device supports MSI-X but not MSI and a caller requests
    @min_vecs that can't be satisfied by MSI-X, we previously returned -EINVAL
    (from the failed attempt to enable MSI), not -ENOSPC.
    
    When -ENOSPC is returned, callers may reduce the number IRQs they request
    and try again.  Most callers can use the @min_vecs and @max_vecs
    parameters to avoid this retry loop, but that doesn't work when using IRQ
    affinity "nr_sets" because rebalancing the sets is driver-specific.
    
    This return value bug has been present since pci_alloc_irq_vectors() was
    added in v4.10 by aff171641d18 ("PCI: Provide sensible IRQ vector
    alloc/free routines"), but it wasn't an issue because @min_vecs/@max_vecs
    removed the need for callers to iteratively reduce the number of IRQs
    requested and retry the allocation, so they didn't need to distinguish
    -ENOSPC from -EINVAL.
    
    In v5.0, 6da4b3ab9a6e ("genirq/affinity: Add support for allocating
    interrupt sets") added IRQ sets to the interface, which reintroduced the
    need to check for -ENOSPC and possibly reduce the number of IRQs requested
    and retry the allocation.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Jens Axboe <axboe@fb.com>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Christoph Hellwig <hch@lst.de>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   const struct irq_affinity *affd)
 {
 	static const struct irq_affinity msi_default_affd;
-	int vecs = -ENOSPC;
+	int msix_vecs = -ENOSPC;
+	int msi_vecs = -ENOSPC;
 
 	if (flags & PCI_IRQ_AFFINITY) {
 		if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSIX) {
-		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-				affd);
-		if (vecs > 0)
-			return vecs;
+		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+						    max_vecs, affd);
+		if (msix_vecs > 0)
+			return msix_vecs;
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-		if (vecs > 0)
-			return vecs;
+		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+						  affd);
+		if (msi_vecs > 0)
+			return msi_vecs;
 	}
 
 	/* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 		}
 	}
 
-	return vecs;
+	if (msix_vecs == -ENOSPC)
+		return -ENOSPC;
+	return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 

      reply	other threads:[~2019-01-15 23:49 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-03  1:31 [PATCH] PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity Ming Lei
2019-01-14 23:23 ` Bjorn Helgaas
2019-01-15 13:11   ` Christoph Hellwig
2019-01-15 16:22     ` Jens Axboe
2019-01-15 19:31       ` Bjorn Helgaas
2019-01-15 22:46         ` Ming Lei
2019-01-15 23:49           ` Bjorn Helgaas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190115234904.GB158366@google.com \
    --to=helgaas@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=keith.busch@intel.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox