linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: Shaohua Li <shli@fusionio.com>,
	linux-raid@vger.kernel.org, axboe@kernel.dk
Subject: Re: [patch 6/7 v2] MD: raid5 trim support
Date: Tue, 18 Sep 2012 14:52:11 +1000	[thread overview]
Message-ID: <20120918145211.0cbfbfa1@notabene.brown> (raw)
In-Reply-To: <20120912040953.GA10047@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 4973 bytes --]

On Wed, 12 Sep 2012 12:09:53 +0800 Shaohua Li <shli@kernel.org> wrote:

> On Tue, Sep 11, 2012 at 02:10:08PM +1000, NeilBrown wrote:
> > > +	bi = orig_bi;
> > > +	if (bi->bi_rw & REQ_DISCARD) {
> > > +		dd_idx++;
> > > +		while (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> > > +			dd_idx++;
> > > +		if (dd_idx < sh->disks)
> > > +			goto again;
> > > +	}
> > 
> > I'm afraid there there is something else here that I can't make my self happy
> > with.
> > 
> > You added a new "goto again" loop inside add_stripe_bio, and to compensate
> > the increase logical_sector in make_request so that it doesn't call
> > add_stripe_bio so many times.
> > This means that to control flow between make_request and add_stripe_bio is
> > very different depending on whether it is a discard or not.  That make the
> > code harder to understand and easier to break later.
> > 
> > I think it would to create a completely separate "make_trim_request()" which
> > handles the trim/discard case, and leave the current code as it is.
> > 
> > If/when you do send a patch to do that, please also resend the other raid5
> > patch which comes after this one.  I tend not to keep patches once I've
> > devices not to apply them immediately.  It also reduces the chance of
> > confusion if you just send whatever you want me to apply.
> 
> I'm afraid there will be a lot of duplicate code doing this way. How about
> below patch? I made it clearer for discard. If you like it, I'll send other
> patches.

I don't think there will be that much duplicate code.  And even if there is
some, it is worth it to get a clearer control flow.

I tried writing something and realised that your code is (I think) racy.
You really need the whole stripe to be either all-discard, or no-discards.
So you need to hold the stripe_lock while adding the discard bios to every
device.  Your code doesn't do that.

Here is what I came up with.  It only addresses the 'make_request' part of
the patch and isn't even compile tested but it should show the general shape
of the solution, and show how little code is duplicated.

NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7031b86..ca80290 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4071,6 +4071,73 @@ static void release_stripe_plug(struct mddev *mddev,
 		release_stripe(sh);
 }
 
+static void make_discard_request(struct mddev *mddev, struct bio *bi)
+{
+	sector_t logical_sector, last_sector;
+	int stripe_sectors;
+	struct r5conf *conf = mddev->private;
+
+	if (mddev->reshape_position != MaxSector)
+		/* Skip discard while reshape is happening */
+		return;
+
+	logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+	last_sector = bi->bi_sector + (bi->bi_size>>9);
+
+	bi->bi_next = NULL;
+	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
+
+	stripe_sectors = conf->chunk_sectors *
+		(conf->raid_disks - conf->max_degraded);
+	logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector,
+					       stripe_sectors);
+	sector_div(last_sector, stripe_sectors);
+
+	for (;logical_sector < last_sector;
+	     logical_sector += STRIPE_SECTORS) {
+		sh = get_active_stripe(conf, logical_sector, previous,
+				       0, 0);
+	again:
+		prepare_to_wait(&conf->wait_for_overlap, &w,
+				TASK_UNINTERRUPTIBLE);
+		spin_lock_irq(&sh->stripe_lock);
+		for (d = 0; d < conf->raid_disks; d++) {
+			if (sh->dev[d].towrite ||
+			    sh->dev[d].toread) {
+				set_bit(R5_Overwrite, &sh->dev[d].flags);
+				spin_unlock_irq(&sh->stripe_lock);
+				schedule();
+				goto again;
+			}
+		}
+		finish_wait(&conf->wait_for_overlap, &w);
+		for (d = 0; d < conf->raid_disks; d++)
+			if (d != conf->pd_idx &&
+			    d != conf->qd_idx) {
+				sh->dev[d].towrite = bi;
+				set_bit(R5_OVERWRITE, &sh->dev[d].flags);
+			}
+		spin_unlock_irq(&sh->stripe_lock);
+		if (conf->mddev->bitmap) {
+			for (d = 0; d < conf->raid_disks - conf->max_degraded;
+			     d++)
+				bitmap_startwrite(mddev->bitmap,
+						  sh->sector,
+						  STRIPE_SECTORS,
+						  0);
+			sh->bm_seq = conf->seq_flush + 1;
+			set_bit(STRIPE_BIT_DELAY, &sh->state);
+		}
+
+		set_bit(STRIPE_HANDLE, &sh->state);
+		clear_bit(STRIPE_DELAYED, &sh->state);
+		if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE,
+				      &sh->state))
+			atomic_inc(&conf->preread_active_stripes);
+		release_stripe_plug(mddev, sh);
+	}
+}
+
 static void make_request(struct mddev *mddev, struct bio * bi)
 {
 	struct r5conf *conf = mddev->private;
@@ -4093,6 +4160,11 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 	     chunk_aligned_read(mddev,bi))
 		return;
 
+	if (unlikely(bi->bi_rw & REQ_DISCARD)) {
+		make_discard_request(mddev, bi);
+		return;
+	}
+
 	logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
 	last_sector = bi->bi_sector + (bi->bi_size>>9);
 	bi->bi_next = NULL;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2012-09-18  4:52 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-10  2:51 [patch 0/7 v2] MD linear/0/1/10/5 TRIM support Shaohua Li
2012-08-10  2:51 ` [patch 1/7 v2] block: makes bio_split support bio without data Shaohua Li
2012-08-10  2:51 ` [patch 2/7 v2] md: linear supports TRIM Shaohua Li
2012-08-10  2:51 ` [patch 3/7 v2] md: raid 0 " Shaohua Li
2012-08-10  2:51 ` [patch 4/7 v2] md: raid 1 " Shaohua Li
2012-08-10  2:51 ` [patch 5/7 v2] md: raid 10 " Shaohua Li
2012-08-10  2:51 ` [patch 6/7 v2] MD: raid5 trim support Shaohua Li
2012-08-13  1:50   ` NeilBrown
2012-08-13  2:04     ` Shaohua Li
2012-08-13  3:58       ` NeilBrown
2012-08-13  5:43         ` Shaohua Li
2012-09-11  4:10           ` NeilBrown
2012-09-12  4:09             ` Shaohua Li
2012-09-18  4:52               ` NeilBrown [this message]
2012-08-10  2:51 ` [patch 7/7 v2] MD: raid5 avoid unnecessary zero page for trim Shaohua Li
2012-08-13  1:51 ` [patch 0/7 v2] MD linear/0/1/10/5 TRIM support NeilBrown
2012-08-29 18:58 ` Holger Kiehl
2012-08-29 20:19   ` Martin K. Petersen
2012-08-30  0:45   ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120918145211.0cbfbfa1@notabene.brown \
    --to=neilb@suse.de \
    --cc=axboe@kernel.dk \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fusionio.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).