From: Ming Lei <ming.lei@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: Jens Axboe <axboe@kernel.dk>, Mike Snitzer <snitzer@redhat.com>,
Long Li <longli@microsoft.com>,
"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
"Michael Kelley \(LINUX\)" <mikelley@microsoft.com>,
"'linux-block@vger.kernel.org'" <linux-block@vger.kernel.org>,
dm-devel@redhat.com, 'Christoph Hellwig' <hch@lst.de>
Subject: Re: [dm-devel] Random high CPU utilization in blk-mq with the none scheduler
Date: Tue, 14 Dec 2021 08:53:26 +0800
Message-ID: <YbfrBpcV4hasdqQB@T590>
In-Reply-To: <BYAPR21MB12706DCD5ED9FC7AB3EE2EEABF759@BYAPR21MB1270.namprd21.prod.outlook.com>
On Tue, Dec 14, 2021 at 12:31:23AM +0000, Dexuan Cui wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> > Sent: Sunday, December 12, 2021 11:38 PM
>
> Ming, thanks so much for the detailed analysis!
>
> > From the log:
> >
> > 1) dm-mpath:
> > - queue depth: 2048
> > - busy: 848, and 62 of them are in sw queue, so run queue is often
> > caused
> > - nr_hw_queues: 1
> > - dm-2 is in use, and dm-1/dm-3 is idle
> > - dm-2's dispatch busy is 8, that should be the reason why excessive CPU
> > usage is observed when flushing plug list without commit dc5fc361d891 in
> > which hctx->dispatch_busy is just bypassed
> >
> > 2) iscsi
> > - dispatch_busy is 0
> > - nr_hw_queues: 1
> > - queue depth: 113
> > - busy=~33, active_queues is 3, so each LUN/iscsi host is saturated
> > - 23 active LUNs, 23 * 33 = 759 in-flight commands
> >
> > The high CPU utilization may be caused by:
> >
> > 1) big queue depth of dm mpath, the situation may be improved much if it
> > is reduced to 1024 or 800. The max allowed inflight commands from iscsi
> > hosts can be figured out, if dm's queue depth is much more than this number,
> > the extra commands need to dispatch, and run queue can be scheduled
> > immediately, so high CPU utilization is caused.
>
> I think you're correct:
> with dm_mod.dm_mq_queue_depth=256, the max CPU utilization is 8%.
> with dm_mod.dm_mq_queue_depth=400, the max CPU utilization is 12%.
> with dm_mod.dm_mq_queue_depth=800, the max CPU utilization is 88%.
>
> The performance with queue_depth=800 is poor.
> The performance with queue_depth=400 is good.
> The performance with queue_depth=256 is also good, and there is only a
> small drop compared with the 400 case.

That should be why the issue isn't triggered with a real io scheduler.

So far blk-mq doesn't provide a way to adjust the tags queue depth
dynamically.

But I don't understand the reason for the default dm_mq_queue_depth (2048).
In this situation, each LUN can queue at most 113/3 requests, since 3 LUNs
are attached to a single iscsi host.

Mike, can you share why the default dm_mq_queue_depth is so big? It seems
it doesn't consider the underlying queues' queue depth. What is the biggest
dm rq queue depth that is needed to saturate all underlying paths?
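
As a rough illustration of the arithmetic above, here is a minimal sketch
(not kernel code) that compares a few dm queue depths with what the iscsi
paths can hold in flight. The figures 113, 3, 23 and 33 come from the log.
The helper names per_lun_cap() and excess_requests() are hypothetical, and
treating a LUN's share of the host tags as a plain integer division is an
assumption made only for this example.

```python
# Rough model of the numbers quoted in this thread; not kernel code.
ISCSI_HOST_DEPTH = 113   # per-host queue depth, from the log
LUNS_PER_HOST = 3        # active_queues, from the log
ACTIVE_LUNS = 23         # active LUNs, from the log
OBSERVED_PER_LUN = 33    # busy=~33 per LUN, from the log


def per_lun_cap(host_depth: int, luns_per_host: int) -> int:
    """Assumed share of one host's tags per LUN (plain integer division)."""
    return host_depth // luns_per_host


def excess_requests(dm_depth: int, inflight_limit: int) -> int:
    """Requests dm could hold beyond what the paths accept at once."""
    return max(0, dm_depth - inflight_limit)


# Estimated ceiling and observed number of in-flight commands on all paths.
cap = ACTIVE_LUNS * per_lun_cap(ISCSI_HOST_DEPTH, LUNS_PER_HOST)
observed = ACTIVE_LUNS * OBSERVED_PER_LUN

for depth in (256, 400, 800, 2048):
    print(depth, excess_requests(depth, observed), excess_requests(depth, cap))
```

In this model, depths of 256 and 400 leave no excess, 800 already exceeds
the observed in-flight count, and 2048 exceeds both limits by more than a
thousand requests. It is only a back-of-the-envelope check and does not
model how blk-mq actually reruns the queue.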
>
> > 2) single hw queue, so contention should be big, which should be avoided
> > in big machine, nvme-tcp might be better than iscsi here
> >
> > 3) iscsi io latency is a bit big
> >
> > Even CPU utilization is reduced by commit dc5fc361d891, io performance
> > can't be good too with v5.16-rc, I guess.
> >
> > Thanks,
> > Ming
>
> Actually the I/O performance of v5.16-rc4 (commit dc5fc361d891 is included)
> is good -- it's about the same as the case where v5.16-rc4 + reverting
> dc5fc361d891 + dm_mod.dm_mq_queue_depth=400 (or 256).

The single hw queue may be the root cause of your issue: there is only a
single run_work, which can be touched by almost all CPUs (~200), so cache
ping-pong could be very serious.

Jens's patch may improve it more or less, please test it.

Thanks,
Ming
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel