From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: "sagi@grimberg.me" <sagi@grimberg.me>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH v2] nvme-tcp: Fix I/O queue cpu spreading for multiple controllers
Date: Tue, 7 Jan 2025 03:54:36 +0000
Message-ID: <ad29db09-5643-4399-ba79-c2a2eb21e620@nvidia.com>
In-Reply-To: <20250104212711.37779-1-sagi@grimberg.me>
On 1/4/25 13:27, Sagi Grimberg wrote:
> Since day one we have been assigning the queue io_cpu very naively: we
> always take the queue id (controller scope) and assign the queue its
> matching cpu from the online mask. This works fine when the number of
> queues matches the number of cpu cores.
>
> The problem starts when we have fewer queues than cpu cores. First, we
> should take the mq_map into account and select a cpu from the cpus that
> the mq_map assigns to this queue, in order to minimize cross-numa cpu
> bouncing.
>
> Second, and even worse, we don't take into account that multiple
> controllers may have assigned queues to a given cpu. As a result we may
> simply compound more and more queues on the same set of cpus, which is
> suboptimal.
>
> We fix this by introducing global per-cpu counters that track the
> number of queues assigned to each cpu. We then select the least-used
> cpu, based on the mq_map and the per-cpu counters, and assign it as the
> queue io_cpu.
>
> For a single controller, the behavior is slightly improved by
> consulting the mq_map to select better cpu candidates; with multiple
> controllers, queues are spread among cpu cores much better, resulting
> in lower average cpu load and a lower likelihood of hitting hotspots.
>
> Note that the accounting is not 100% perfect, but it doesn't need to
> be; we're simply making a best effort to select the best candidate cpu
> core we can find at any given point.
>
> Another byproduct is that every controller reset/reconnect may change
> the queues' io_cpu mapping, based on the current LRU accounting scheme.
>
> Here is the baseline queue io_cpu assignment for 4 controllers, 2 queues
> per controller, and 4 cpus on the host:
> nvme1: queue 0: using cpu 0
> nvme1: queue 1: using cpu 1
> nvme2: queue 0: using cpu 0
> nvme2: queue 1: using cpu 1
> nvme3: queue 0: using cpu 0
> nvme3: queue 1: using cpu 1
> nvme4: queue 0: using cpu 0
> nvme4: queue 1: using cpu 1
>
> And this is the fixed io_cpu assignment:
> nvme1: queue 0: using cpu 0
> nvme1: queue 1: using cpu 2
> nvme2: queue 0: using cpu 1
> nvme2: queue 1: using cpu 3
> nvme3: queue 0: using cpu 0
> nvme3: queue 1: using cpu 2
> nvme4: queue 0: using cpu 1
> nvme4: queue 1: using cpu 3
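
The selection scheme is easy to convince oneself of. Below is a minimal
userspace C sketch of the logic described above, purely for
illustration; cpu_queue_count, pick_least_used_cpu, and the assumed
mq_map grouping (queue 0 -> cpus {0,1}, queue 1 -> cpus {2,3}) are all
made up here, while the actual patch works on per-cpu counters and the
blk-mq cpu mapping:

#include <stdio.h>

#define NR_CPUS 4

/* global counters: queues (across all controllers) assigned per cpu */
static int cpu_queue_count[NR_CPUS];

/*
 * Pick the least-used cpu among the candidates the mq_map assigns to
 * this queue, and account for the new assignment.
 */
static int pick_least_used_cpu(const int *candidates, int nr_candidates)
{
	int best = candidates[0];
	int i;

	for (i = 1; i < nr_candidates; i++)
		if (cpu_queue_count[candidates[i]] < cpu_queue_count[best])
			best = candidates[i];

	cpu_queue_count[best]++;	/* queue teardown would decrement */
	return best;
}

int main(void)
{
	/* assumed mq_map: queue 0 -> cpus {0,1}, queue 1 -> cpus {2,3} */
	int qmap[2][2] = { { 0, 1 }, { 2, 3 } };
	int ctrl, q;

	for (ctrl = 1; ctrl <= 4; ctrl++)
		for (q = 0; q < 2; q++)
			printf("nvme%d: queue %d: using cpu %d\n",
			       ctrl, q, pick_least_used_cpu(qmap[q], 2));
	return 0;
}

With ties broken toward the first candidate, this reproduces exactly
the fixed io_cpu assignment listed above.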
>
> Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
> Suggested-by: Hannes Reinecke <hare@kernel.org>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck