From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: Shaohua Li <shli@fusionio.com>,
linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 6/7 v2] MD: raid5 trim support
Date: Tue, 18 Sep 2012 14:52:11 +1000 [thread overview]
Message-ID: <20120918145211.0cbfbfa1@notabene.brown> (raw)
In-Reply-To: <20120912040953.GA10047@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 4973 bytes --]
On Wed, 12 Sep 2012 12:09:53 +0800 Shaohua Li <shli@kernel.org> wrote:
> On Tue, Sep 11, 2012 at 02:10:08PM +1000, NeilBrown wrote:
> > > + bi = orig_bi;
> > > + if (bi->bi_rw & REQ_DISCARD) {
> > > + dd_idx++;
> > > + while (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> > > + dd_idx++;
> > > + if (dd_idx < sh->disks)
> > > + goto again;
> > > + }
> >
> > I'm afraid there there is something else here that I can't make my self happy
> > with.
> >
> > You added a new "goto again" loop inside add_stripe_bio, and to compensate
> > the increase logical_sector in make_request so that it doesn't call
> > add_stripe_bio so many times.
> > This means that to control flow between make_request and add_stripe_bio is
> > very different depending on whether it is a discard or not. That make the
> > code harder to understand and easier to break later.
> >
> > I think it would to create a completely separate "make_trim_request()" which
> > handles the trim/discard case, and leave the current code as it is.
> >
> > If/when you do send a patch to do that, please also resend the other raid5
> > patch which comes after this one. I tend not to keep patches once I've
> > devices not to apply them immediately. It also reduces the chance of
> > confusion if you just send whatever you want me to apply.
>
> I'm afraid there will be a lot of duplicate code doing this way. How about
> below patch? I made it clearer for discard. If you like it, I'll send other
> patches.
I don't think there will be that much duplicate code. And even if there is
some, it is worth it to get a clearer control flow.
I tried writing something and realised that your code is (I think) racy.
You really need the whole stripe to be either all-discard, or no-discards.
So you need to hold the stripe_lock while adding the discard bios to every
device. Your code doesn't do that.
Here is what I came up with. It only addresses the 'make_request' part of
the patch and isn't even compile tested but it should show the general shape
of the solution, and show how little code is duplicated.
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7031b86..ca80290 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4071,6 +4071,73 @@ static void release_stripe_plug(struct mddev *mddev,
release_stripe(sh);
}
+static void make_discard_request(struct mddev *mddev, struct bio *bi)
+{
+ sector_t logical_sector, last_sector;
+ int stripe_sectors;
+ struct r5conf *conf = mddev->private;
+
+ if (mddev->reshape_position != MaxSector)
+ /* Skip discard while reshape is happening */
+ return;
+
+ logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ last_sector = bi->bi_sector + (bi->bi_size>>9);
+
+ bi->bi_next = NULL;
+ bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
+
+ stripe_sectors = conf->chunk_sectors *
+ (conf->raid_disks - conf->max_degraded);
+ logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector,
+ stripe_sectors);
+ sector_div(last_sector, stripe_sectors);
+
+ for (;logical_sector < last_sector;
+ logical_sector += STRIPE_SECTORS) {
+ sh = get_active_stripe(conf, logical_sector, previous,
+ 0, 0);
+ again:
+ prepare_to_wait(&conf->wait_for_overlap, &w,
+ TASK_UNINTERRUPTIBLE);
+ spin_lock_irq(&sh->stripe_lock);
+ for (d = 0; d < conf->raid_disks; d++) {
+ if (sh->dev[d].towrite ||
+ sh->dev[d].toread) {
+ set_bit(R5_Overwrite, &sh->dev[d].flags);
+ spin_unlock_irq(&sh->stripe_lock);
+ schedule();
+ goto again;
+ }
+ }
+ finish_wait(&conf->wait_for_overlap, &w);
+ for (d = 0; d < conf->raid_disks; d++)
+ if (d != conf->pd_idx &&
+ d != conf->qd_idx) {
+ sh->dev[d].towrite = bi;
+ set_bit(R5_OVERWRITE, &sh->dev[d].flags);
+ }
+ spin_unlock_irq(&sh->stripe_lock);
+ if (conf->mddev->bitmap) {
+ for (d = 0; d < conf->raid_disks - conf->max_degraded;
+ d++)
+ bitmap_startwrite(mddev->bitmap,
+ sh->sector,
+ STRIPE_SECTORS,
+ 0);
+ sh->bm_seq = conf->seq_flush + 1;
+ set_bit(STRIPE_BIT_DELAY, &sh->state);
+ }
+
+ set_bit(STRIPE_HANDLE, &sh->state);
+ clear_bit(STRIPE_DELAYED, &sh->state);
+ if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE,
+ &sh->state))
+ atomic_inc(&conf->preread_active_stripes);
+ release_stripe_plug(mddev, sh);
+ }
+}
+
static void make_request(struct mddev *mddev, struct bio * bi)
{
struct r5conf *conf = mddev->private;
@@ -4093,6 +4160,11 @@ static void make_request(struct mddev *mddev, struct bio * bi)
chunk_aligned_read(mddev,bi))
return;
+ if (unlikely(bi->bi_rw & REQ_DISCARD)) {
+ make_discard_request(mddev, bi);
+ return;
+ }
+
logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
last_sector = bi->bi_sector + (bi->bi_size>>9);
bi->bi_next = NULL;
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2012-09-18 4:52 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-10 2:51 [patch 0/7 v2] MD linear/0/1/10/5 TRIM support Shaohua Li
2012-08-10 2:51 ` [patch 1/7 v2] block: makes bio_split support bio without data Shaohua Li
2012-08-10 2:51 ` [patch 2/7 v2] md: linear supports TRIM Shaohua Li
2012-08-10 2:51 ` [patch 3/7 v2] md: raid 0 " Shaohua Li
2012-08-10 2:51 ` [patch 4/7 v2] md: raid 1 " Shaohua Li
2012-08-10 2:51 ` [patch 5/7 v2] md: raid 10 " Shaohua Li
2012-08-10 2:51 ` [patch 6/7 v2] MD: raid5 trim support Shaohua Li
2012-08-13 1:50 ` NeilBrown
2012-08-13 2:04 ` Shaohua Li
2012-08-13 3:58 ` NeilBrown
2012-08-13 5:43 ` Shaohua Li
2012-09-11 4:10 ` NeilBrown
2012-09-12 4:09 ` Shaohua Li
2012-09-18 4:52 ` NeilBrown [this message]
2012-08-10 2:51 ` [patch 7/7 v2] MD: raid5 avoid unnecessary zero page for trim Shaohua Li
2012-08-13 1:51 ` [patch 0/7 v2] MD linear/0/1/10/5 TRIM support NeilBrown
2012-08-29 18:58 ` Holger Kiehl
2012-08-29 20:19 ` Martin K. Petersen
2012-08-30 0:45 ` Shaohua Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120918145211.0cbfbfa1@notabene.brown \
--to=neilb@suse.de \
--cc=axboe@kernel.dk \
--cc=linux-raid@vger.kernel.org \
--cc=shli@fusionio.com \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).