public inbox for linux-kernel@vger.kernel.org
From: Kashyap Desai <kashyap.desai@broadcom.com>
To: John Garry <john.garry@huawei.com>, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ming.lei@redhat.com,
	Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Subject: RE: [PATCH RFT] blk-mq: optimize queue tag busy iter for shared_tags
Date: Tue, 21 Dec 2021 19:23:53 +0530	[thread overview]
Message-ID: <7028630054e9cd0e8c84670a27c2b164@mail.gmail.com> (raw)
In-Reply-To: <e9174a89-b3a4-d737-c5a9-ff3969053479@huawei.com>


>
> On 21/12/2021 12:31, Kashyap Desai wrote:
>
> Hi Kashyap,
>
> What kernel is this for? 5.17 or 5.16 + stable? Your intention is not
> clear to
> me.


Hi John

This is for current/5.17. This patch is meaningful only on top of [1].

[1] "blk-mq: Use shared tags for shared sbitmap support", commit
e155b0c238b20f0a866f4334d292656665836c8a

While doing additional testing for [1], I noticed a performance issue and,
along with it, a CPU lockup. Lockup trace:

_raw_spin_lock_irqsave+0x42/0x50
 blk_mq_find_and_get_req+0x20/0xa0
 bt_iter+0x2d/0x80
 blk_mq_queue_tag_busy_iter+0x1aa/0x2f0
 ? blk_mq_complete_request+0x30/0x30
 ? blk_mq_complete_request+0x30/0x30
 ? __schedule+0x360/0x850
 blk_mq_timeout_work+0x5e/0x120
 process_one_work+0x1a8/0x380
 worker_thread+0x30/0x380
 ? wq_calc_node_cpumask.isra.30+0x100/0x100
 kthread+0x167/0x190
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30

It is a generic performance issue whenever a driver uses "shost->host_tagset
= 1". In fact, I found that [1] is useful for fixing the performance issue,
and I provided this additional patch on top of it.

I changed my setup to have 64 scsi_devices (earlier I had only 16 or 24
drives, so I did not notice this issue). The performance/CPU lockup issue is
not due to [1].
A larger number of SCSI devices, more hardware contexts per host, and a high
queue depth all increase the chances of a lockup and of a performance drop.

Do you think it is good to have these changes in 5.16 + stable?
I don't know if this patch will create any side effects. Can you review it
and let me know your feedback?

Kashyap

>
>
> > In [0], CPU usage of blk_mq_queue_tag_busy_iter() was optimized, but
> > blk_mq_queue_tag_busy_iter() is still called periodically from the
> > context below. That context is the block layer timer, which looks for
> > potentially expired commands (per request queue) and requires a tag
> > iteration roughly every 5 seconds (BLK_MAX_TIMEOUT) for each request
> > queue.
> >
> > kthread
> >          worker_thread
> >          process_one_work
> >          blk_mq_timeout_work
> >          blk_mq_queue_tag_busy_iter
> >          bt_iter
> >          blk_mq_find_and_get_req
> >          _raw_spin_lock_irqsave
> >          native_queued_spin_lock_slowpath
> >
> > The changes in this patch optimize away the extra tag iterations in the
> > shared_tags case: a single iteration of the shared tags gives the
> > iterator function the expected results.
> >
> > Setup - AMD64 Gen-4.0 server.
> > 64 virtual drives created from 16 NVMe drives + mpi3mr driver (in
> > shared_tags mode)
> >
> > Test command -
> > fio 64.fio --rw=randread --bs=4K --iodepth=32 --numjobs=2 --ioscheduler=mq-deadline --disk_util=0
> >
> > Without this patch on 5.16.0-rc5, the mpi3mr driver in shared_tags mode
> > gives 4.0M IOPS, vs. an expected ~6.0M.
> > Snippet of perf top
> >
> >    25.42%  [kernel]  [k] native_queued_spin_lock_slowpath
> >     3.95%  [kernel]  [k] cpupri_set
> >     2.05%  [kernel]  [k] __blk_mq_get_driver_tag
> >     1.67%  [kernel]  [k] __rcu_read_unlock
> >     1.63%  [kernel]  [k] check_preemption_disabled
> >
> > After applying this patch on 5.16.0-rc5, the mpi3mr driver in
> > shared_tags mode reaches up to 5.8M IOPS.
> >
> > Snippet of perf top
> >
> >     7.95%  [kernel]  [k] native_queued_spin_lock_slowpath
> >     5.61%  [kernel]  [k] cpupri_set
> >     2.98%  [kernel]  [k] acpi_processor_ffh_cstate_enter
> >     2.49%  [kernel]  [k] read_tsc
> >     2.15%  [kernel]  [k] check_preemption_disabled
> >
> >
> > [0] https://lore.kernel.org/all/9b092ca49e9b5415772cd950a3c12584@mail.gmail.com/
> >
> >
> > Cc: linux-block@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: john.garry@huawei.com
> > Cc: ming.lei@redhat.com
> > Cc: sathya.prakash@broadcom.com
> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > ---
> >   block/blk-mq-tag.c | 11 ++++++++++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 995336abee33..3e0a8e79f966 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -253,7 +253,8 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >   	if (!rq)
> >   		return true;
> >
> > -	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > +	if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
> > +				blk_mq_is_shared_tags(hctx->flags)))
> >   		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> >   	blk_mq_put_rq_ref(rq);
> >   	return ret;
> > @@ -484,6 +485,14 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
> >   		if (tags->nr_reserved_tags)
> >   			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> >   		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> > +
> > +		/* In case of shared bitmap if shared_tags is allocated, it is not required
> > +		 * to iterate all the hctx. Looping one hctx is good enough.
> > +		 */
> > +		if (blk_mq_is_shared_tags(hctx->flags)) {
> > +			blk_queue_exit(q);
> > +			return;
>
> this looks like v5.16-rc6 code
>
> > +		}
> >   	}
> >   	blk_queue_exit(q);
> >   }
> >
>
>
>
> Thanks,
> John


  reply	other threads:[~2021-12-21 13:53 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-21 12:31 [PATCH RFT] blk-mq: optimize queue tag busy iter for shared_tags Kashyap Desai
2021-12-21 12:55 ` John Garry
2021-12-21 13:53   ` Kashyap Desai [this message]
2021-12-21 15:19     ` John Garry
2021-12-22 11:20       ` Kashyap Desai
2021-12-22 11:35         ` John Garry
2021-12-22 12:06           ` Kashyap Desai
2021-12-22 12:34             ` John Garry
