From: Marcelo Tosatti <mtosatti@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
linux-nvme@lists.infradead.org, Sagi Grimberg <sagi@grimberg.me>,
Lawrence Troup <ltroup@cisco.com>
Subject: Re: [PATCH V3] nvme-pci: allow unmanaged interrupts
Date: Mon, 15 Jul 2024 13:03:02 -0300
Message-ID: <ZpVINh3BA88U6KEc@tpad>
In-Reply-To: <20240702104112.4123810-1-ming.lei@redhat.com>

On Tue, Jul 02, 2024 at 06:41:12PM +0800, Ming Lei wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> People _really_ want to control their interrupt affinity in some
> cases, such as Openshift with Performance profile, in which each
> irq's affinity is completely specified from userspace. Turns out
> that 'isolcpus=managed_irqs' isn't enough.
>
> Add module parameter to allow unmanaged interrupts, just as some
> SCSI drivers are doing.
>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> v2->v3:
> - rebase on for-next
> - add openshift use case
>
> v1->v2: skip the AFFINITY vector allocation if the parameter is
> provided instead of trying to make the vector code handle all post_vectors.
>
> drivers/nvme/host/pci.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 5d8035218de9..a39c99c9b64d 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -63,6 +63,11 @@ MODULE_PARM_DESC(sgl_threshold,
>  		"Use SGLs when average request segment size is larger or equal to "
>  		"this size. Use 0 to disable SGLs.");
>
> +static bool managed_irqs = true;
> +module_param(managed_irqs, bool, 0444);
> +MODULE_PARM_DESC(managed_irqs,
> +		 "set to false for user controlled irq affinity");
> +

What if you set "static bool managed_irqs" to false when isolcpus is being used?

For example:

	if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
		managed_irqs = false;

Then there is no additional parameter to tune (which addresses
Christoph's concern).

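A minimal userspace sketch of the idea above, for illustration only. In the
kernel, housekeeping_enabled() and HK_TYPE_MANAGED_IRQ come from
<linux/sched/isolation.h>; here both are mocked so the snippet builds on its
own. The flag mock_isolation_active and the helper nvme_pick_irq_mode() are
hypothetical names, and where such a check would actually be called in the
driver is an assumption, not something the patch specifies.

```c
#include <stdbool.h>

/*
 * Mock of the kernel's housekeeping API so this builds in userspace.
 * The real definitions live in <linux/sched/isolation.h>.
 */
enum hk_type { HK_TYPE_MANAGED_IRQ };

/* Hypothetical switch: pretend managed-irq isolation was requested at boot. */
static bool mock_isolation_active;

static bool housekeeping_enabled(enum hk_type type)
{
	return type == HK_TYPE_MANAGED_IRQ && mock_isolation_active;
}

/* Same default as the module parameter in the patch: managed interrupts. */
static bool managed_irqs = true;

/*
 * Hypothetical helper: derive the IRQ mode from the isolation setup
 * instead of asking the user for a separate module parameter.
 */
static void nvme_pick_irq_mode(void)
{
	if (housekeeping_enabled(HK_TYPE_MANAGED_IRQ))
		managed_irqs = false;
}
```

With the mock flag cleared, managed_irqs keeps its default of true; with it
set, the helper switches the driver to unmanaged interrupts without any extra
tunable.
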
> #define NVME_PCI_MIN_QUEUE_SIZE 2
> #define NVME_PCI_MAX_QUEUE_SIZE 4095
> static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
> @@ -456,7 +461,7 @@ static void nvme_pci_map_queues(struct blk_mq_tag_set *set)
>  		 * affinity), so use the regular blk-mq cpu mapping
>  		 */
>  		map->queue_offset = qoff;
> -		if (i != HCTX_TYPE_POLL && offset)
> +		if (managed_irqs && i != HCTX_TYPE_POLL && offset)
>  			blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
>  		else
>  			blk_mq_map_queues(map);
> @@ -2226,6 +2231,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
>  	};
>  	unsigned int irq_queues, poll_queues;
>  	unsigned int flags = PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY;
> +	int ret;
>
>  	/*
>  	 * Poll queues don't need interrupts, but we need at least one I/O queue
> @@ -2251,8 +2257,16 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
>  		irq_queues += (nr_io_queues - poll_queues);
>  	if (dev->ctrl.quirks & NVME_QUIRK_BROKEN_MSI)
>  		flags &= ~PCI_IRQ_MSI;
> -	return pci_alloc_irq_vectors_affinity(pdev, 1, irq_queues, flags,
> +
> +	if (managed_irqs)
> +		return pci_alloc_irq_vectors_affinity(pdev, 1, irq_queues, flags,
>  					      &affd);
> +
> +	flags &= ~PCI_IRQ_AFFINITY;
> +	ret = pci_alloc_irq_vectors(pdev, 1, irq_queues, flags);
> +	if (ret > 0)
> +		nvme_calc_irq_sets(&affd, ret - 1);
> +	return ret;
>  }
>
> static unsigned int nvme_max_io_queues(struct nvme_dev *dev)
> --
> 2.44.0
>
>