From: NeilBrown <neilb@suse.de>
To: majianpeng <majianpeng@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.
Date: Thu, 24 Oct 2013 12:50:59 +1100 [thread overview]
Message-ID: <20131024125059.1a440152@notabene.brown> (raw)
In-Reply-To: <201308281940511854138@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 15935 bytes --]
Hi,
sorry for the excessive delay in reviewing these.
I've applied the first three, though I fixed some careless spelling errors in
the patch descriptions and fixed up some of the grammar.
This one is fairly good - just a couple of fixes needed.
It is good that you have a nice details patch description at the top, but it
is a bit hard to read. Fixing the spelling and improving the formatting
would help a lot....
See below for comments in the code....
On Wed, 28 Aug 2013 19:40:54 +0800 majianpeng <majianpeng@gmail.com> wrote:
> There is a iobarrier in raid1 because the contend between normalIO and
> resyncIO.It suspend all normal IO when reysync.
> But if normalIO is outrange the resync windwos,there is no contend.
> So I rewrite the iobarrier.
>
> I patition the whole space into five parts.
> |---------|-----------|------------|----------------|-------|
> Pa next_resync start_next_window Pb
>
> Firstly i introduce some concepts:
> 1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
> request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
> RESYNC_BLOCK_SIZE, that is 2MB.
> 2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
> and start_next_window.It also indicate the distance between
> start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
> 3:next_resync mean the next start sector which will being do sync.
> 4:Pa means a position which before RESYNC_WINDOW distance from
> next_resync.
> 5:start_next_window means a position which after NEXT_NORMALIO_DISTANCE distance from
> next_resync.For normalio after this position,it can't wait resyncio
> 6:Pb means a position which after 2 * NEXT_NORMALIO_DISTANCE distance from
> next_resync.
> 7:current_window_requests means the count of normalIO between Pb and
> start_next_window.
> 8:next_window_requests means the count of nonmalIO after Pb.
>
> NormalIO will be partitioned into four types:
> NoIO1: it means the end sector of bio is smaller or equal the Pa.
> NoIO2: it means the start sector of bio larger or equal Pb
> NoIO3: it means the start sector of bio larger or equal start_next_window.
> NoIO4: it means the location between Pa and start_next_window
>
> For NoIO1,it don't use iobarrier.
> For NoIO4,it used the original iobarrier mechanism.The nornalIO and
> syncIO must be contend.
> For NoIO2/3, i add two filed in struct r1conf "current_window_requests,
> next_window_requests".They indicate the count of two types.
> For those, they don't wait.They only add the relevant parameter.
>
> For resync action, if there are NoIO4s, it must be wait.If not,it
> forward.But if current_window_requests > 0 and sync action reached to the
> start_next_window,it must wait until the current_window_requests became
> zero.If current_window_requests became zero,the start_next_window also
> forward. The current_window_requests will replaced by
> next_window_requests.
>
> There is a problem which when and how to change from NoIO2 to NoIO3.Only
> so, the sync action can forward.
>
> I add a filed in struct r1conf "start_next_window"..What condition will change?
> A: if start_next_window == MaxSector, it means there are no NoIO2/3.
> So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> B: if current_window_requests == 0 && next_window_requests != 0, it
> means start_next_window move to Pb
>
> There is anthor problem which how to differentiate when NoIO2 became
> NoIO3.
> For example, there are many bios which are NoIO2 and a bio which is
> NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
> At generil,we should use flags and list_head. The codes should:
> >list_splice_init(NoIO2, NoIO3);
> >current_window_requests = next_window_requests;
> >next_window_requests = 0;
> >list_for_each_entry(NoIO3){
> > clear_bit(NoIO2_FLAG),
> > set_bit(NoIO3_FLAG,
> >}
> If there are many NoIO2, it will take too long in list_for_each_entry.
> Avoid this, i add a filed in struct r1bio "start_next_window".
> I used this to record the position conf->start_next_window when call wait_barrier in func
> make_request.
> For NoIO1/4, the value is zero.
> In func allow_barrier, it check the conf->start_next_window
> If r1bio->stat_next_window == conf->start_next_window,it mean
> there is no transition which NoIO2 to NoIO3.In this condtion
> if bio->bi_sector > conf->start_next_window + NEXT_NORMALIO_DISTANCE
> it means the bio is NoIO2;
> else
> it means the bio si NoIO3.
>
> If r1bio->start_next_window != conf->start_next_window,it mean
> there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
> transtions.So it only means the bio is NoIO2.
>
> For one bio,there are many r1bio.So we make sure the
> r1bio->start_next_window is the same value.
> If we met blocked_dev, it must call allow_barrier and wait_barrier.
> So the first and the second value of conf->start_next_window will
> change.If there are many r1bio's with differnet start_next_window.
> For the relevant bio, it depent on the last value of r1bio.
> It will cause error.To avoid this, we must wait previous r1bios completes.
>
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> ---
> drivers/md/raid1.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++------
> drivers/md/raid1.h | 14 +++++++
> 2 files changed, 121 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 08f076a..4499870 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -66,7 +66,8 @@
> */
> static int max_queued_requests = 1024;
>
> -static void allow_barrier(struct r1conf *conf);
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> + sector_t bi_sector);
> static void lower_barrier(struct r1conf *conf);
>
> static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> struct bio *bio = r1_bio->master_bio;
> int done;
> struct r1conf *conf = r1_bio->mddev->private;
> + sector_t start_next_window = r1_bio->start_next_window;
> + sector_t bi_sector = bio->bi_sector;
>
> if (bio->bi_phys_segments) {
> unsigned long flags;
> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
> bio->bi_phys_segments--;
> done = (bio->bi_phys_segments == 0);
> spin_unlock_irqrestore(&conf->device_lock, flags);
> + /*
> + * make_request might be waiting for
> + * bi_phys_sectors to decrease
> + */
> + wake_up(&conf->wait_barrier);
bi_phys_sectors??? I think you mean bi_phys_segments. This sort of
attention to detail is really important.
> } else
> done = 1;
>
> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
> * Wake up any possible resync thread that waits for the device
> * to go idle.
> */
> - allow_barrier(conf);
> + allow_barrier(conf, start_next_window, bi_sector);
> }
> }
>
> @@ -827,10 +835,19 @@ static void raise_barrier(struct r1conf *conf)
> /* block any new IO from starting */
> conf->barrier++;
>
> - /* Now wait for all pending IO to complete */
> + /* For those conditions, we must wait
> + * A:array is in frozen state.
> + * B:conf->nr_pending > 0 mean in sync window there are normal bios.
> + * C:total barrier are more than RESYNC_DEPTH
> + * D:conf->next_resync + RESYNC_SECTORS > conf->start_next_window
> + * mean the next nornalIO window not clear and next syncIO will be into
> + * this window.
> + */
This comment is indented badly.
/* For these conditions we must wait:
* A: while the array is in the frozen state
* B: while nr_pending > 0, meaning that there are normal bios in the
* resync window
...
> wait_event_lock_irq(conf->wait_barrier,
> !conf->array_frozen &&
> - !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> + !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
> + (conf->next_resync + RESYNC_SECTORS
> + <= conf->start_next_window),
I think conditions are much more readable if the thing that you expect to
change comes first.
In this case we might need to wait until start_next_window increases, so
conf->start_next_window >= conf->next_resync + RESYNC_SECTORS
as it is, it looks like you are waiting for next_resync to decrease.
> conf->resync_lock);
>
> spin_unlock_irq(&conf->resync_lock);
> @@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf)
> wake_up(&conf->wait_barrier);
> }
>
> -static void wait_barrier(struct r1conf *conf)
> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
> {
> + bool wait = false;
> +
> + if (conf->array_frozen || !bio)
> + wait = true;
> + else if (conf->barrier && bio_data_dir(bio) == WRITE) {
> + if (conf->next_resync < RESYNC_WINDOW_SECTORS)
> + wait = true;
> + else if (conf->next_resync - RESYNC_WINDOW_SECTORS >=
> + bio_end_sector(bio))
> + wait = false;
> + else if (conf->next_resync + NEXT_NORMALIO_DISTANCE
> + <= bio->bi_sector) {
> + if (conf->start_next_window == MaxSector)
> + conf->start_next_window =
> + conf->next_resync +
> + NEXT_NORMALIO_DISTANCE;
> +
> + if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
> + <= bio->bi_sector)
> + conf->next_window_requests++;
> + else
> + conf->current_window_requests++;
> + } else
> + wait = true;
> + }
> +
> + return wait;
> +}
I really don't like it that this function changes start_next_window and
*_window_requests().
It looks like it is a simple function that tests a condition. But it
actually has a side-effect as well. And it has the side-effect when it
returns false which is extra confusing.
Also, you are incrementing *_window_requests for READ requests, which you
don't need to do.
I would change it so the below looks something like:
if (need_to_wait_for_sync(conf, bio)) {
....
} else if (bio_data_dir(bio) == WRITE) {
possibly update start_next_window and *_window_requests here
}
> +
> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
> +{
> + sector_t sector = 0;
> +
> spin_lock_irq(&conf->resync_lock);
> - if (conf->barrier) {
> + if (need_to_wait_for_sync(conf, bio)) {
> conf->nr_waiting++;
> /* Wait for the barrier to drop.
> * However if there are already pending
> @@ -868,16 +918,42 @@ static void wait_barrier(struct r1conf *conf)
> !bio_list_empty(current->bio_list))),
> conf->resync_lock);
> conf->nr_waiting--;
> - }
> + } else if (bio_data_dir(bio) == WRITE &&
> + bio->bi_sector >= conf->start_next_window)
> + sector = conf->start_next_window;
> conf->nr_pending++;
> spin_unlock_irq(&conf->resync_lock);
> + return sector;
> }
>
> -static void allow_barrier(struct r1conf *conf)
> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
> + sector_t bi_sector)
> {
> unsigned long flags;
> +
> spin_lock_irqsave(&conf->resync_lock, flags);
> conf->nr_pending--;
> + if (start_next_window) {
> + if (start_next_window == conf->start_next_window) {
> + if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
> + <= bi_sector)
> + conf->next_window_requests--;
> + else
> + conf->current_window_requests--;
> + } else
> + conf->current_window_requests--;
> +
> + if (!conf->current_window_requests) {
> + if (conf->next_window_requests) {
> + conf->current_window_requests =
> + conf->next_window_requests;
> + conf->next_window_requests = 0;
> + conf->start_next_window +=
> + NEXT_NORMALIO_DISTANCE;
> + } else
> + conf->start_next_window = MaxSector;
> + }
> + }
> spin_unlock_irqrestore(&conf->resync_lock, flags);
> wake_up(&conf->wait_barrier);
> }
> @@ -1012,6 +1088,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> int first_clone;
> int sectors_handled;
> int max_sectors;
> + sector_t start_next_window;
>
> /*
> * Register the new request and wait if the reconstruction
> @@ -1041,7 +1118,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> finish_wait(&conf->wait_barrier, &w);
> }
>
> - wait_barrier(conf);
> + start_next_window = wait_barrier(conf, bio);
>
> bitmap = mddev->bitmap;
>
> @@ -1162,6 +1239,7 @@ read_again:
>
> disks = conf->raid_disks * 2;
> retry_write:
> + r1_bio->start_next_window = start_next_window;
> blocked_rdev = NULL;
> rcu_read_lock();
> max_sectors = r1_bio->sectors;
> @@ -1230,14 +1308,24 @@ read_again:
> if (unlikely(blocked_rdev)) {
> /* Wait for this device to become unblocked */
> int j;
> + sector_t old = start_next_window;
>
> for (j = 0; j < i; j++)
> if (r1_bio->bios[j])
> rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> r1_bio->state = 0;
> - allow_barrier(conf);
> + allow_barrier(conf, start_next_window, bio->bi_sector);
> md_wait_for_blocked_rdev(blocked_rdev, mddev);
> - wait_barrier(conf);
> + start_next_window = wait_barrier(conf, bio);
> + /*
> + * We must make sure the multi r1bios of bio have
> + * the same value of after_threetimes_resync_sector
> + */
after_threetimes_resync_sector???? Attention to detail please!
If you resubmit with these things fixed up I'll try to get it reviewed quite
a bit quicker than this time :-)
Thanks,
NeilBrown
> + if (bio->bi_phys_segments && old &&
> + old != start_next_window)
> + /*wait the former r1bio(s) completed*/
> + wait_event(conf->wait_barrier,
> + bio->bi_phys_segments == 1);
> goto retry_write;
> }
>
> @@ -1437,11 +1525,14 @@ static void print_conf(struct r1conf *conf)
>
> static void close_sync(struct r1conf *conf)
> {
> - wait_barrier(conf);
> - allow_barrier(conf);
> + wait_barrier(conf, NULL);
> + allow_barrier(conf, 0, 0);
>
> mempool_destroy(conf->r1buf_pool);
> conf->r1buf_pool = NULL;
> +
> + conf->next_resync = 0;
> + conf->start_next_window = MaxSector;
> }
>
> static int raid1_spare_active(struct mddev *mddev)
> @@ -2712,6 +2803,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
> conf->pending_count = 0;
> conf->recovery_disabled = mddev->recovery_disabled - 1;
>
> + conf->start_next_window = MaxSector;
> + conf->current_window_requests = conf->next_window_requests = 0;
> +
> err = -EIO;
> for (i = 0; i < conf->raid_disks * 2; i++) {
>
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 331a98a..07425a1 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -41,6 +41,19 @@ struct r1conf {
> */
> sector_t next_resync;
>
> + /*when raid1 start resync,we divide raid into four partitions
> + * |---------|--------------|---------------------|-------------|
> + * next_resync start_next_window Pc
> + * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
> + * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
> + * current_window_requests means the count of normalIO between
> + * start_next_window and Pc.
> + * next_window_requests means the count of nornalIO after Pc.
> + * */
> + sector_t start_next_window;
> + int current_window_requests;
> + int next_window_requests;
> +
> spinlock_t device_lock;
>
> /* list of 'struct r1bio' that need to be processed by raid1d,
> @@ -112,6 +125,7 @@ struct r1bio {
> * in this BehindIO request
> */
> sector_t sector;
> + sector_t start_next_window;
> int sectors;
> unsigned long state;
> struct mddev *mddev;
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-10-24 1:50 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-28 11:40 [PATCH 4/4] raid1: Rewrite the implementation of iobarrier majianpeng
2013-10-24 1:50 ` NeilBrown [this message]
2013-10-29 1:30 ` majianpeng
2013-10-31 2:33 ` NeilBrown
2013-10-31 3:20 ` majianpeng
2013-11-14 6:44 ` NeilBrown
2013-11-15 2:29 ` majianpeng
2013-11-15 3:42 ` NeilBrown
2013-11-15 6:55 ` majianpeng
2013-11-19 4:25 ` NeilBrown
2013-11-19 7:53 ` majianpeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131024125059.1a440152@notabene.brown \
--to=neilb@suse.de \
--cc=linux-raid@vger.kernel.org \
--cc=majianpeng@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).