From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: Shaohua Li <shli@fusionio.com>,
linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 6/7 v2] MD: raid5 trim support
Date: Tue, 18 Sep 2012 14:52:11 +1000 [thread overview]
Message-ID: <20120918145211.0cbfbfa1@notabene.brown> (raw)
In-Reply-To: <20120912040953.GA10047@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 4973 bytes --]
On Wed, 12 Sep 2012 12:09:53 +0800 Shaohua Li <shli@kernel.org> wrote:
> On Tue, Sep 11, 2012 at 02:10:08PM +1000, NeilBrown wrote:
> > > + bi = orig_bi;
> > > + if (bi->bi_rw & REQ_DISCARD) {
> > > + dd_idx++;
> > > + while (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> > > + dd_idx++;
> > > + if (dd_idx < sh->disks)
> > > + goto again;
> > > + }
> >
> > I'm afraid there there is something else here that I can't make my self happy
> > with.
> >
> > You added a new "goto again" loop inside add_stripe_bio, and to compensate
> > the increase logical_sector in make_request so that it doesn't call
> > add_stripe_bio so many times.
> > This means that to control flow between make_request and add_stripe_bio is
> > very different depending on whether it is a discard or not. That make the
> > code harder to understand and easier to break later.
> >
> > I think it would to create a completely separate "make_trim_request()" which
> > handles the trim/discard case, and leave the current code as it is.
> >
> > If/when you do send a patch to do that, please also resend the other raid5
> > patch which comes after this one. I tend not to keep patches once I've
> > devices not to apply them immediately. It also reduces the chance of
> > confusion if you just send whatever you want me to apply.
>
> I'm afraid there will be a lot of duplicate code doing this way. How about
> below patch? I made it clearer for discard. If you like it, I'll send other
> patches.
I don't think there will be that much duplicate code. And even if there is
some, it is worth it to get a clearer control flow.
I tried writing something and realised that your code is (I think) racy.
You really need the whole stripe to be either all-discard, or no-discards.
So you need to hold the stripe_lock while adding the discard bios to every
device. Your code doesn't do that.
Here is what I came up with. It only addresses the 'make_request' part of
the patch and isn't even compile tested but it should show the general shape
of the solution, and show how little code is duplicated.
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7031b86..ca80290 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4071,6 +4071,73 @@ static void release_stripe_plug(struct mddev *mddev,
release_stripe(sh);
}
+static void make_discard_request(struct mddev *mddev, struct bio *bi)
+{
+ sector_t logical_sector, last_sector;
+ int stripe_sectors;
+ struct r5conf *conf = mddev->private;
+
+ if (mddev->reshape_position != MaxSector)
+ /* Skip discard while reshape is happening */
+ return;
+
+ logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ last_sector = bi->bi_sector + (bi->bi_size>>9);
+
+ bi->bi_next = NULL;
+ bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
+
+ stripe_sectors = conf->chunk_sectors *
+ (conf->raid_disks - conf->max_degraded);
+ logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector,
+ stripe_sectors);
+ sector_div(last_sector, stripe_sectors);
+
+ for (;logical_sector < last_sector;
+ logical_sector += STRIPE_SECTORS) {
+ sh = get_active_stripe(conf, logical_sector, previous,
+ 0, 0);
+ again:
+ prepare_to_wait(&conf->wait_for_overlap, &w,
+ TASK_UNINTERRUPTIBLE);
+ spin_lock_irq(&sh->stripe_lock);
+ for (d = 0; d < conf->raid_disks; d++) {
+ if (sh->dev[d].towrite ||
+ sh->dev[d].toread) {
+ set_bit(R5_Overwrite, &sh->dev[d].flags);
+ spin_unlock_irq(&sh->stripe_lock);
+ schedule();
+ goto again;
+ }
+ }
+ finish_wait(&conf->wait_for_overlap, &w);
+ for (d = 0; d < conf->raid_disks; d++)
+ if (d != conf->pd_idx &&
+ d != conf->qd_idx) {
+ sh->dev[d].towrite = bi;
+ set_bit(R5_OVERWRITE, &sh->dev[d].flags);
+ }
+ spin_unlock_irq(&sh->stripe_lock);
+ if (conf->mddev->bitmap) {
+ for (d = 0; d < conf->raid_disks - conf->max_degraded;
+ d++)
+ bitmap_startwrite(mddev->bitmap,
+ sh->sector,
+ STRIPE_SECTORS,
+ 0);
+ sh->bm_seq = conf->seq_flush + 1;
+ set_bit(STRIPE_BIT_DELAY, &sh->state);
+ }
+
+ set_bit(STRIPE_HANDLE, &sh->state);
+ clear_bit(STRIPE_DELAYED, &sh->state);
+ if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE,
+ &sh->state))
+ atomic_inc(&conf->preread_active_stripes);
+ release_stripe_plug(mddev, sh);
+ }
+}
+
static void make_request(struct mddev *mddev, struct bio * bi)
{
struct r5conf *conf = mddev->private;
@@ -4093,6 +4160,11 @@ static void make_request(struct mddev *mddev, struct bio * bi)
chunk_aligned_read(mddev,bi))
return;
+ if (unlikely(bi->bi_rw & REQ_DISCARD)) {
+ make_discard_request(mddev, bi);
+ return;
+ }
+
logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
last_sector = bi->bi_sector + (bi->bi_size>>9);
bi->bi_next = NULL;
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2012-09-18 4:52 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-10 2:51 [patch 0/7 v2] MD linear/0/1/10/5 TRIM support Shaohua Li
2012-08-10 2:51 ` [patch 1/7 v2] block: makes bio_split support bio without data Shaohua Li
2012-08-10 2:51 ` [patch 2/7 v2] md: linear supports TRIM Shaohua Li
2012-08-10 2:51 ` [patch 3/7 v2] md: raid 0 " Shaohua Li
2012-08-10 2:51 ` [patch 4/7 v2] md: raid 1 " Shaohua Li
2012-08-10 2:51 ` [patch 5/7 v2] md: raid 10 " Shaohua Li
2012-08-10 2:51 ` [patch 6/7 v2] MD: raid5 trim support Shaohua Li
2012-08-13 1:50 ` NeilBrown
2012-08-13 2:04 ` Shaohua Li
2012-08-13 3:58 ` NeilBrown
2012-08-13 5:43 ` Shaohua Li
2012-09-11 4:10 ` NeilBrown
2012-09-12 4:09 ` Shaohua Li
2012-09-18 4:52 ` NeilBrown [this message]
2012-08-10 2:51 ` [patch 7/7 v2] MD: raid5 avoid unnecessary zero page for trim Shaohua Li
2012-08-13 1:51 ` [patch 0/7 v2] MD linear/0/1/10/5 TRIM support NeilBrown
2012-08-29 18:58 ` Holger Kiehl
2012-08-29 20:19 ` Martin K. Petersen
2012-08-30 0:45 ` Shaohua Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120918145211.0cbfbfa1@notabene.brown \
--to=neilb@suse.de \
--cc=axboe@kernel.dk \
--cc=linux-raid@vger.kernel.org \
--cc=shli@fusionio.com \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.