Linux-NVME Archive on lore.kernel.org
From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH 4/4] nvme-tcp: switch to 'cpu' affinity scope for unbound workqueues
Date: Wed, 3 Jul 2024 17:50:41 +0200	[thread overview]
Message-ID: <56ebc95e-8c9c-43cb-a849-a1bbdd4a98e6@suse.de> (raw)
In-Reply-To: <2b0dbd25-bfa5-4a0c-842b-50c27b0843a7@grimberg.me>

On 7/3/24 17:09, Sagi Grimberg wrote:
> 
> 
> On 03/07/2024 18:01, Hannes Reinecke wrote:
>> On 7/3/24 16:22, Sagi Grimberg wrote:
>>>
>>>
>>> On 03/07/2024 16:50, Hannes Reinecke wrote:
>>>> We should switch to the 'cpu' affinity scope when using the 
>>>> 'wq_unbound'
>>>> parameter as this allows us to keep I/O locality and improve 
>>>> performance.
>>>
>>> Can you please describe more why this is better? locality between what?
>>>
>> Well; the default unbound scope is 'cache', which groups the cpu 
>> according to the cache hierarchy. I want the cpu locality of the 
>> workqueue items to be preserved as much as possible, so I switched
>> to 'cpu' here.
>>
>> I'll get some performance numbers.
>>
>>> While you mention in your cover letter "comments and reviews are 
>>> welcome"
>>> The change logs in your patches are not designed to assist your 
>>> reviewer.
>>
>> I spent the last few weeks trying to come up with a solution based on my
>> original submission, but in the end I gave up as I hadn't been able to
>> fix the original issue.
> 
> Well, the last submission was a discombobulated set of mostly unrelated 
> patches...
> What was it that did not work?
> 
>> This here is a different approach by massaging the 'wq_unbound' 
>> mechanism, which is not only easier but also has the big advantage that
>> it actually works :-)
>> So I did not include a changelog relative to the previous patchset as 
>> this is a pretty different approach.
>> Sorry if this is confusing.
> 
> It's just difficult to try and understand what each patch contributes, 
> and most of the time the patches
> are under-documented. I want to see the improvements added, but I also 
> want them to be properly reviewed.

Sure. So here are some performance numbers:
(One subsystem, two paths, 96 queues)
default:
4k seq read: bw=365MiB/s (383MB/s), 11.4MiB/s-20.5MiB/s 
(11.0MB/s-21.5MB/s), io=16.0GiB (17.2GB), run=24950-44907msec
4k rand read: bw=307MiB/s (322MB/s), 9830KiB/s-13.8MiB/s 
(10.1MB/s-14.5MB/s), io=16.0GiB (17.2GB), run=37081-53333msec
4k seq write: bw=550MiB/s (577MB/s), 17.2MiB/s-28.7MiB/s 
(18.0MB/s-30.1MB/s), io=16.0GiB (17.2GB), run=17859-29786msec
4k rand write: bw=453MiB/s (475MB/s), 14.2MiB/s-21.3MiB/s 
(14.8MB/s-22.3MB/s), io=16.0GiB (17.2GB), run=24066-36161msec

unbound:
4k seq read: bw=232MiB/s (243MB/s), 6145KiB/s-9249KiB/s 
(6293kB/s-9471kB/s), io=13.6GiB (14.6GB), run=56685-60074msec
4k rand read: bw=249MiB/s (261MB/s), 6335KiB/s-9713KiB/s 
(6487kB/s-9946kB/s), io=14.6GiB (15.7GB), run=53976-60019msec
4k seq write: bw=358MiB/s (375MB/s), 11.2MiB/s-13.5MiB/s 
(11.7MB/s-14.2MB/s), io=16.0GiB (17.2GB), run=37918-45779msec
4k rand write: bw=335MiB/s (351MB/s), 10.5MiB/s-14.7MiB/s 
(10.0MB/s-15.4MB/s), io=16.0GiB (17.2GB), run=34929-48971msec

unbound + 'cpu' affinity:
4k seq read: bw=249MiB/s (261MB/s), 6003KiB/s-13.6MiB/s 
(6147kB/s-14.3MB/s), io=14.6GiB (15.7GB), run=37636-60065msec
4k rand read: bw=305MiB/s (320MB/s), 9773KiB/s-13.9MiB/s 
(10.0MB/s-14.6MB/s), io=16.0GiB (17.2GB), run=36791-53644msec
4k seq write: bw=499MiB/s (523MB/s), 15.6MiB/s-18.0MiB/s 
(16.3MB/s-19.9MB/s), io=16.0GiB (17.2GB), run=27018-32860msec
4k rand write: bw=536MiB/s (562MB/s), 16.7MiB/s-21.1MiB/s 
(17.6MB/s-22.1MB/s), io=16.0GiB (17.2GB), run=24305-30588msec

As you can see, with unbound and 'cpu' affinity we are basically on par
with the default implementation (note that all tests were run with
per-controller workqueues).
Running the same workload with 4 subsystems and 8 paths runs into
I/O timeouts with the default implementation, but completes without
issue with unbound and 'cpu' affinity.
So that's definitely an improvement.
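A quick back-of-the-envelope on the aggregate bandwidth figures above
(MiB/s values taken straight from the fio output; this is just arithmetic,
not new measurements):

```shell
# Relative aggregate 4k bandwidth from the runs above:
# seq read:   default=365, unbound=232, unbound+'cpu'=249
# rand write: default=453, unbound+'cpu'=536
awk 'BEGIN {
    def = 365; unb = 232; cpu = 249
    printf "seq read,  unbound vs default:      %+.0f%%\n", (unb - def) / def * 100
    printf "seq read,  unbound+cpu vs default:  %+.0f%%\n", (cpu - def) / def * 100
    printf "rand write, unbound+cpu vs default: %+.0f%%\n", (536 - 453) / 453 * 100
}'
```

So plain unbound loses about a third of the sequential read bandwidth,
'cpu' affinity claws some of it back, and random writes actually come
out ahead of the default.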

I'll try to dig out performance numbers for the current implementation.
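For anyone who wants to experiment without patching: on kernels that
already have unbound workqueue affinity scopes (v6.5+), the scope can be
inspected and changed at runtime via sysfs. The per-workqueue knob only
exists for workqueues created with WQ_SYSFS, and 'nvme_tcp_wq' below is a
made-up name for illustration, not the actual nvme-tcp workqueue:

```shell
# System-wide default scope for unbound workqueues
# (also settable via the workqueue.default_affinity_scope boot parameter)
cat /sys/module/workqueue/parameters/default_affinity_scope

# Per-workqueue override, only present for WQ_SYSFS workqueues;
# 'nvme_tcp_wq' is a placeholder name used for illustration
cat /sys/devices/virtual/workqueue/nvme_tcp_wq/affinity_scope
echo cpu > /sys/devices/virtual/workqueue/nvme_tcp_wq/affinity_scope
```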

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich




Thread overview: 37+ messages
2024-07-03 13:50 [PATCH 0/4] nvme-tcp: improve scalability Hannes Reinecke
2024-07-03 13:50 ` [PATCH 1/4] nvme-tcp: per-controller I/O workqueues Hannes Reinecke
2024-07-03 14:11   ` Sagi Grimberg
2024-07-03 14:46     ` Hannes Reinecke
2024-07-03 15:16       ` Sagi Grimberg
2024-07-03 17:07         ` Tejun Heo
2024-07-03 19:14           ` Sagi Grimberg
2024-07-03 19:17             ` Tejun Heo
2024-07-03 19:41               ` Sagi Grimberg
2024-07-04  7:36               ` Hannes Reinecke
2024-07-05  7:10                 ` Christoph Hellwig
2024-07-05  8:11                   ` Hannes Reinecke
2024-07-05  8:16                     ` Jens Axboe
2024-07-04  5:36   ` Christoph Hellwig
2024-07-03 13:50 ` [PATCH 2/4] nvme-tcp: align I/O cpu with blk-mq mapping Hannes Reinecke
2024-07-03 14:19   ` Sagi Grimberg
2024-07-03 14:53     ` Hannes Reinecke
2024-07-03 15:03       ` Sagi Grimberg
2024-07-03 15:40         ` Hannes Reinecke
2024-07-03 19:38           ` Sagi Grimberg
2024-07-03 19:47             ` Sagi Grimberg
2024-07-04  6:43             ` Hannes Reinecke
2024-07-04  9:07               ` Sagi Grimberg
2024-07-04 14:03                 ` Hannes Reinecke
2024-07-04  5:37     ` Christoph Hellwig
2024-07-04  9:13       ` Sagi Grimberg
2024-07-03 13:50 ` [PATCH 3/4] workqueue: introduce helper workqueue_unbound_affinity_scope() Hannes Reinecke
2024-07-03 17:31   ` Tejun Heo
2024-07-04  6:04     ` Hannes Reinecke
2024-07-03 13:50 ` [PATCH 4/4] nvme-tcp: switch to 'cpu' affinity scope for unbound workqueues Hannes Reinecke
2024-07-03 14:22   ` Sagi Grimberg
2024-07-03 15:01     ` Hannes Reinecke
2024-07-03 15:09       ` Sagi Grimberg
2024-07-03 15:50         ` Hannes Reinecke [this message]
2024-07-04  9:11           ` Sagi Grimberg
2024-07-04 15:54             ` Hannes Reinecke
2024-07-05 11:48               ` Sagi Grimberg
