From: Christoph Hellwig <hch@lst.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Ming Lei <ming.lei@redhat.com>, Christoph Hellwig <hch@lst.de>,
Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org, linux-pci@vger.kernel.org,
Keith Busch <keith.busch@intel.com>,
Marc Zyngier <marc.zyngier@arm.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Shivasharan Srikanteshwara
<shivasharan.srikanteshwara@broadcom.com>
Subject: Re: [patch v6 3/7] genirq/affinity: Add new callback for (re)calculating interrupt sets
Date: Tue, 15 Jun 2021 22:04:43 +0200
Message-ID: <20210615200443.GA6557@lst.de>
In-Reply-To: <20210615195707.GA2909907@bjorn-Precision-5520>
On Tue, Jun 15, 2021 at 02:57:07PM -0500, Bjorn Helgaas wrote:
> On Sat, Feb 16, 2019 at 06:13:09PM +0100, Thomas Gleixner wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> >
> > The interrupt affinity spreading mechanism supports spreading out
> > affinities for one or more interrupt sets. An interrupt set contains one or
> > more interrupts. Each set is mapped to a specific functionality of a
> > device, e.g. general I/O queues and read I/O queues of multiqueue block
> > devices.
> >
> > The number of interrupts per set is defined by the driver. It depends on
> > the total number of available interrupts for the device, which is
> > determined by the PCI capabilities and the availability of underlying CPU
> > resources, and the number of queues which the device provides and the
> > driver wants to instantiate.
> >
> > The driver passes initial configuration for the interrupt allocation via a
> > pointer to struct irq_affinity.
> >
> > Right now the allocation mechanism is complex as it requires a loop
> > in the driver to determine the maximum number of interrupts which are
> > provided by the PCI capabilities and the underlying CPU resources. This
> > loop would have to be replicated in every driver which wants to utilize
> > this mechanism. That's unwanted code duplication and error prone.
> >
> > In order to move this into generic facilities it is required to have a
> > mechanism, which allows the recalculation of the interrupt sets and their
> > size, in the core code. As the core code does not have any knowledge about the
> > underlying device, a driver specific callback is required in struct
> > irq_affinity, which can be invoked by the core code. The callback gets the
> > number of available interrupts as an argument, so the driver can calculate the
> > corresponding number and size of interrupt sets.
> >
> > At the moment the struct irq_affinity pointer which is handed in from the
> > driver and passed through to several core functions is marked 'const', but for
> > the callback to be able to modify the data in the struct it's required to
> > remove the 'const' qualifier.
> >
> > Add the optional callback to struct irq_affinity, which allows drivers to
> > recalculate the number and size of interrupt sets and remove the 'const'
> > qualifier.
> >
> > For simple invocations, which do not supply a callback, a default callback
> > is installed, which just sets nr_sets to 1 and transfers the number of
> > spreadable vectors to the set_size array at index 0.
> >
> > This is for now guarded by a check for nr_sets != 0 to keep the NVME driver
> > working until it is converted to the callback mechanism.
> >
> > To make sure that the driver configuration is correct under all circumstances
> > the callback is invoked even when there are no interrupts for queues left,
> > i.e. the pre/post requirements already exhaust the number of available
> > interrupts.
> >
> > At the PCI layer irq_create_affinity_masks() has to be invoked even for the
> > case where the legacy interrupt is used. That ensures that the callback is
> > invoked and the device driver can adjust to that situation.
> >
> > [ tglx: Fixed the simple case (no sets required). Moved the sanity check
> > for nr_sets after the invocation of the callback so it catches
> > broken drivers. Fixed the kernel doc comments for struct
> > irq_affinity and de-'This patch'-ed the changelog ]
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> > @@ -1196,6 +1196,13 @@ int pci_alloc_irq_vectors_affinity(struc
> >  	/* use legacy irq if allowed */
> >  	if (flags & PCI_IRQ_LEGACY) {
> >  		if (min_vecs == 1 && dev->irq) {
> > +			/*
> > +			 * Invoke the affinity spreading logic to ensure that
> > +			 * the device driver can adjust queue configuration
> > +			 * for the single interrupt case.
> > +			 */
> > +			if (affd)
> > +				irq_create_affinity_masks(1, affd);
>
> This looks like a leak because irq_create_affinity_masks() returns a
> pointer to kcalloc()ed space, but we throw away the pointer.
>
> Or is there something very subtle going on here, like this special
> case doesn't allocate anything? I do see the "Nothing to assign?"
> case that returns NULL with no alloc, but it's not completely trivial
> to verify that we take that case here.
>
> >  			pci_intx(dev, 1);
> >  			return 1;
> >  		}
---end quoted text---