From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Logan Gunthorpe <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
Stephen Bates <sbates@raithlin.com>,
Martin Oliveira <Martin.Oliveira@eideticom.com>,
David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH v2 12/12] md/raid5: Pivot raid5_make_request()
Date: Wed, 27 Apr 2022 10:06:58 +0800 [thread overview]
Message-ID: <61411981-6401-aaa7-9d3d-6a9ac1fec4f2@linux.dev> (raw)
In-Reply-To: <20220420195425.34911-13-logang@deltatee.com>
On 4/21/22 3:54 AM, Logan Gunthorpe wrote:
> The number of times the lock is taken can be reduced by pivoting
> raid5_make_request() so that it loops through every stripe and then
> loops through every disk in that stripe to see if the bio must be
> added. This reduces the number of times the lock must be taken by
> a factor equal to the number of data disks.
>
> To accomplish this, store the minimum and maxmimum disk sector that
> has already been finished and continue to the next logical sector if
> it is found that the disk sector has already been done. Then add a
> add_all_stripe_bios() to check all the bios for overlap and add them
> all if none of them overlap.
>
> Signed-off-by: Logan Gunthorpe<logang@deltatee.com>
> ---
> drivers/md/raid5.c | 92 +++++++++++++++++++++++++++++++++++++++++++---
> drivers/md/raid5.h | 1 +
> 2 files changed, 88 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 40a25c4b80bd..f86866cb15be 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3571,6 +3571,48 @@ static bool add_stripe_bio(struct stripe_head *sh, struct bio *bi,
> return true;
> }
>
> +static int add_all_stripe_bios(struct stripe_head *sh, struct bio *bi,
> + sector_t first_logical_sector, sector_t last_sector,
> + int forwrite, int previous)
> +{
> + int dd_idx;
> + int ret = 1;
> +
> + spin_lock_irq(&sh->stripe_lock);
> +
> + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
> + struct r5dev *dev = &sh->dev[dd_idx];
> +
> + clear_bit(R5_BioReady, &dev->flags);
> +
> + if (dd_idx == sh->pd_idx)
> + continue;
> +
> + if (dev->sector < first_logical_sector ||
> + dev->sector >= last_sector)
> + continue;
> +
> + if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) {
> + set_bit(R5_Overlap, &dev->flags);
> + ret = 0;
> + continue;
> + }
> +
> + set_bit(R5_BioReady, &dev->flags);
Is it possible to just call __add_stripe_bio here? And change above
"continue"
to "return",
> + }
> +
> + if (!ret)
> + goto out;
> +
> + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++)
> + if (test_bit(R5_BioReady, &sh->dev[dd_idx].flags))
> + __add_stripe_bio(sh, bi, dd_idx, forwrite, previous);
then we don't need another loop here, also no need to introduce another
flag.
But I am not sure it is feasible, so just FYI.
> +
> +out:
> + spin_unlock_irq(&sh->stripe_lock);
> + return ret;
> +}
> +
> static void end_reshape(struct r5conf *conf);
>
> static void stripe_set_idx(sector_t stripe, struct r5conf *conf, int previous,
> @@ -5869,6 +5911,10 @@ enum stripe_result {
> struct stripe_request_ctx {
> bool do_flush;
> struct stripe_head *batch_last;
> + sector_t disk_sector_done;
> + sector_t start_disk_sector;
> + bool first_wrap;
> + sector_t last_sector;
> };
Could you add some comments to above members if possible?
> static enum stripe_result make_stripe_request(struct mddev *mddev,
> @@ -5908,6 +5954,36 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
>
> new_sector = raid5_compute_sector(conf, logical_sector, previous,
> &dd_idx, NULL);
> +
> + /*
> + * This is a tricky algorithm to figure out which stripe_heads that
> + * have already been visited and exit early if the stripe_head has
> + * already been done. (Seeing all disks are added to a stripe_head
> + * once in add_all_stripe_bios().
> + *
> + * To start with, the disk sector of the last stripe that has been
> + * completed is stored in ctx->disk_sector_done. If the new_sector is
> + * less than this value, the stripe_head has already been done.
> + *
> + * There's one issue with this: if the request starts in the middle of
> + * a chunk, all the stripe heads before the starting offset will be
> + * missed. To account for this, set the first_wrap boolean to true
> + * if new_sector is less than the starting sector. Clear the
> + * boolean once the start sector is hit for the second time.
> + * When first_wrap is set, ignore the disk_sector_done.
> + */
> + if (ctx->start_disk_sector == MaxSector) {
> + ctx->start_disk_sector = new_sector;
> + } else if (new_sector < ctx->start_disk_sector) {
> + ctx->first_wrap = true;
> + } else if (new_sector == ctx->start_disk_sector) {
> + ctx->first_wrap = false;
> + ctx->start_disk_sector = 0;
> + return STRIPE_SUCCESS;
> + } else if (!ctx->first_wrap && new_sector <= ctx->disk_sector_done) {
> + return STRIPE_SUCCESS;
> + }
> +
Hmm, with above tricky algorithm, I guess the point is that we can avoid
to call below
stripe_add_to_batch_list where has hash_lock contention. If so, maybe we
can change
stripe_can_batch for the purpose.
> if (stripe_can_batch(sh)) {
> stripe_add_to_batch_list(conf, sh, ctx->batch_last);
> if (ctx->batch_last)
> @@ -5977,8 +6057,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
> static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> {
> struct r5conf *conf = mddev->private;
> - sector_t logical_sector, last_sector;
> - struct stripe_request_ctx ctx = {};
> + sector_t logical_sector;
> + struct stripe_request_ctx ctx = {
> + .start_disk_sector = MaxSector,
> + };
> const int rw = bio_data_dir(bi);
> enum stripe_result res;
> DEFINE_WAIT(w);
> @@ -6021,7 +6103,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> }
>
> logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
> - last_sector = bio_end_sector(bi);
> + ctx.last_sector = bio_end_sector(bi);
> bi->bi_next = NULL;
>
> /* Bail out if conflicts with reshape and REQ_NOWAIT is set */
> @@ -6036,7 +6118,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> }
> md_account_bio(mddev, &bi);
> prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> - while (logical_sector < last_sector) {
> + while (logical_sector < ctx.last_sector) {
> res = make_stripe_request(mddev, conf, &ctx, logical_sector,
> bi);
> if (res == STRIPE_FAIL) {
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index 638d29863503..e73b58844f83 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -308,6 +308,7 @@ enum r5dev_flags {
> R5_Wantwrite,
> R5_Overlap, /* There is a pending overlapping request
> * on this block */
> + R5_BioReady, /* The current bio can be added to this disk */
This doesn't seem right to me, since the comment describes bio status
while others
are probably for r5dev.
Thanks,
Guoqing
next prev parent reply other threads:[~2022-04-27 2:07 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-20 19:54 [PATCH v2 00/12] Improve Raid5 Lock Contention Logan Gunthorpe
2022-04-20 19:54 ` [PATCH v2 01/12] md/raid5: Factor out ahead_of_reshape() function Logan Gunthorpe
2022-04-21 6:07 ` Christoph Hellwig
2022-04-21 9:17 ` Paul Menzel
2022-04-21 16:05 ` Logan Gunthorpe
2022-04-21 23:33 ` Wol
2022-04-27 1:28 ` Guoqing Jiang
2022-04-27 16:07 ` Logan Gunthorpe
2022-04-28 1:49 ` Guoqing Jiang
2022-04-28 15:44 ` Logan Gunthorpe
2022-04-29 0:24 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 02/12] md/raid5: Refactor raid5_make_request loop Logan Gunthorpe
2022-04-21 6:08 ` Christoph Hellwig
2022-04-27 1:32 ` Guoqing Jiang
2022-04-27 16:08 ` Logan Gunthorpe
2022-04-28 1:16 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 03/12] md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio() Logan Gunthorpe
2022-04-27 1:33 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 04/12] md/raid5: Move common stripe count increment code into __find_stripe() Logan Gunthorpe
2022-04-21 6:10 ` Christoph Hellwig
2022-04-27 1:33 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 05/12] md/raid5: Factor out helper from raid5_make_request() loop Logan Gunthorpe
2022-04-21 6:14 ` Christoph Hellwig
2022-04-20 19:54 ` [PATCH v2 06/12] md/raid5: Drop the do_prepare flag in raid5_make_request() Logan Gunthorpe
2022-04-21 6:15 ` Christoph Hellwig
2022-04-27 2:11 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 07/12] md/raid5: Move read_seqcount_begin() into make_stripe_request() Logan Gunthorpe
2022-04-21 6:15 ` Christoph Hellwig
2022-04-27 2:13 ` Guoqing Jiang
2022-04-20 19:54 ` [PATCH v2 08/12] md/raid5: Refactor for loop in raid5_make_request() into while loop Logan Gunthorpe
2022-04-21 6:16 ` Christoph Hellwig
2022-04-20 19:54 ` [PATCH v2 09/12] md/raid5: Keep a reference to last stripe_head for batch Logan Gunthorpe
2022-04-21 6:17 ` Christoph Hellwig
2022-04-27 1:36 ` Guoqing Jiang
2022-04-27 23:27 ` Logan Gunthorpe
2022-04-20 19:54 ` [PATCH v2 10/12] md/raid5: Refactor add_stripe_bio() Logan Gunthorpe
2022-04-21 6:18 ` Christoph Hellwig
2022-04-20 19:54 ` [PATCH v2 11/12] md/raid5: Check all disks in a stripe_head for reshape progress Logan Gunthorpe
2022-04-21 6:18 ` Christoph Hellwig
2022-04-27 1:53 ` Guoqing Jiang
2022-04-27 16:11 ` Logan Gunthorpe
2022-04-20 19:54 ` [PATCH v2 12/12] md/raid5: Pivot raid5_make_request() Logan Gunthorpe
2022-04-21 6:43 ` Christoph Hellwig
2022-04-21 15:54 ` Logan Gunthorpe
2022-04-27 2:06 ` Guoqing Jiang [this message]
2022-04-27 16:18 ` Logan Gunthorpe
2022-04-28 1:32 ` Guoqing Jiang
2022-04-21 8:45 ` [PATCH v2 00/12] Improve Raid5 Lock Contention Xiao Ni
2022-04-21 16:02 ` Logan Gunthorpe
2022-04-24 8:00 ` Guoqing Jiang
2022-04-25 15:39 ` Logan Gunthorpe
2022-04-25 16:12 ` Xiao Ni
2022-04-28 21:22 ` Logan Gunthorpe
2022-04-29 0:49 ` Guoqing Jiang
2022-04-29 16:01 ` Logan Gunthorpe
2022-04-30 1:44 ` Guoqing Jiang
2022-04-24 7:53 ` Guoqing Jiang
2022-04-25 15:37 ` Logan Gunthorpe
2022-04-25 23:07 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=61411981-6401-aaa7-9d3d-6a9ac1fec4f2@linux.dev \
--to=guoqing.jiang@linux.dev \
--cc=David.Sloan@eideticom.com \
--cc=Martin.Oliveira@eideticom.com \
--cc=hch@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=sbates@raithlin.com \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.