From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH 2/4] nvme-tcp: align I/O cpu with blk-mq mapping
Date: Wed, 3 Jul 2024 17:40:55 +0200
Message-ID: <dc849cb6-8bf3-4363-9bd1-9b13138c0fb7@suse.de>
In-Reply-To: <9bc9d94f-a129-4c6e-ac9e-d0eb8db341b0@grimberg.me>

On 7/3/24 17:03, Sagi Grimberg wrote:
>
>
> On 03/07/2024 17:53, Hannes Reinecke wrote:
>> On 7/3/24 16:19, Sagi Grimberg wrote:
>>>
>>>
>>> On 03/07/2024 16:50, Hannes Reinecke wrote:
>>>> When 'wq_unbound' is selected we should select the
>>>> first CPU from a given blk-mq hctx mapping to queue
>>>> the tcp workqueue item. With this we can instruct the
>>>> workqueue code to keep the I/O affinity and avoid
>>>> a performance penalty.
>>>
>>> wq_unbound is designed to keep io_cpu UNBOUND. My recollection
>>> was that the person introducing it was trying to make the io_cpu
>>> always be on a specific NUMA node, or a subset of cpus within a NUMA
>>> node. So he uses that and tinkers with the wq cpumask via sysfs.
>>>
>>> I don't see why you are tying this to wq_unbound in the first place.
>>>
>> Because in the default case the workqueue is nailed to a cpu, and will
>> not move from it. IE if you call 'queue_work_on()' it _will_ run on
>> that cpu.
>> But if something else is running on that CPU (printk logging, say),
>> you will have to stand in the queue until the scheduler gives you some
>> time.
>>
>> If the workqueue is unbound the workqueue code is able to switch away
>> from the cpu if it finds it busy or otherwise unsuitable, leading to a
>> better utilization and avoiding a workqueue stall.
>> And in the 'unbound' case the 'cpu' argument merely serves as a hint
>> where to place the workqueue item.
>> At least, that's how I understood the code.
>
> We should make the io_cpu come from blk-mq hctx mapping by default, and
> for every controller it should use a different cpu from the hctx
> mapping. That is the default behavior. In the wq_unbound case, we skip
> all of that and make io_cpu = WORK_CPU_UNBOUND, as it was before.
>
> I'm not sure I follow your logic.
>
Hehe. That's quite simple: there is none :-)
I have been tinkering with that approach in the last weeks, but got
consistently _worse_ results than with the original implementation.
So I gave up on trying to make that the default.
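
For anyone following along, here is a small userspace model of the
selection policy Sagi describes above: take io_cpu from the blk-mq hctx
mapping, spread controllers over the CPUs of that hctx, and fall back to
WORK_CPU_UNBOUND when wq_unbound is set. This is only a sketch and not
the driver code. pick_io_cpu(), the flat mq_map array, ctrl_idx and the
-1 placeholder for WORK_CPU_UNBOUND are all assumptions made for
illustration.

```c
/* Placeholder only; the kernel defines WORK_CPU_UNBOUND itself. */
#define WORK_CPU_UNBOUND (-1)

/*
 * Hypothetical helper, not part of nvme-tcp.
 *
 * mq_map[cpu] holds the hardware queue index that 'cpu' is mapped to,
 * loosely mimicking a blk-mq CPU-to-queue map. For hctx 'qid' the
 * function returns the (ctrl_idx modulo number-of-mapped-CPUs)-th CPU
 * of that hctx, so that different controllers land on different CPUs.
 * It returns WORK_CPU_UNBOUND if wq_unbound is set or if no CPU maps
 * to 'qid'.
 */
static int pick_io_cpu(const unsigned int *mq_map, int nr_cpus,
                       unsigned int qid, int ctrl_idx, int wq_unbound)
{
    int cpu, nr_mapped = 0, target;

    if (wq_unbound)
        return WORK_CPU_UNBOUND;

    /* Count the CPUs that blk-mq mapped to this hardware queue. */
    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (mq_map[cpu] == qid)
            nr_mapped++;
    if (!nr_mapped)
        return WORK_CPU_UNBOUND;

    /* Walk the mapped CPUs again and stop at the chosen position. */
    target = ctrl_idx % nr_mapped;
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (mq_map[cpu] != qid)
            continue;
        if (target-- == 0)
            return cpu;
    }
    return WORK_CPU_UNBOUND;
}
```

With a map of { 0, 0, 1, 1, 2, 2, 3, 3 } over 8 CPUs, hctx 1 owns CPUs 2
and 3. Controller 0 then gets CPU 2, controller 1 gets CPU 3, and
controller 2 wraps around to CPU 2 again. Passing ctrl_idx = 0 for every
controller gives the "first CPU from the hctx mapping" variant from the
patch description.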
>>
>> And it makes the 'CPU hogged' messages go away, which is a bonus in
>> itself...
>
> Which messages? Aren't these messages saying that the work spent too
> much time? Why are you describing the case where the work does not get
> cpu quota to run?
I mean these messages:

workqueue: nvme_tcp_io_work [nvme_tcp] hogged CPU for >10000us 32771 times, consider switching to WQ_UNBOUND

which I get consistently during testing with the default implementation.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich