* About RAID1/10 io barrier between normal io and resync/recovery io.
@ 2012-11-01 7:58 majianpeng
2012-11-01 8:24 ` NeilBrown
0 siblings, 1 reply; 2+ messages in thread
From: majianpeng @ 2012-11-01 7:58 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Hi Neil:
At present, there are barriers in raid1/10.About barrier,you described: "Sometimes we need to suspend IO while we do something else, either some resync/recovery, or reconfigure the array"
So when do normal request in func make_request, it call wait_barrier.And in sync_request it call raise_barrier.
Of course,there are some place to call raise_barrier to do other things.
But there, i only wanted to talk about normal io and resync/recovery.
1:I think you add io barrier is because raid1/10 didn't stripe like raid456.So it only for the situation which normal io and sync io had same content. Is that right?
2: If normal ios outside the resync/recovery window, i think there is no necessary to use io barrier.
I wanted to find the reason by reviewing the git log.But at present, git log of kernel is only after kernel 2.6.12.
So i looked into the code about kernel 2.4.And found something like i thought. There were resync window.
But why did you remove or modify ?
3: I think using resync window to control normal io is perfect but complicated.So i only used one positon which before mddev->curr_resync_completed.
The code is:
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8034fbd..c45a769 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -234,6 +234,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
{
struct bio *bio = r1_bio->master_bio;
int done;
+ int skip_barrier = test_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
struct r1conf *conf = r1_bio->mddev->private;
if (bio->bi_phys_segments) {
@@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
* Wake up any possible resync thread that waits for the device
* to go idle.
*/
- allow_barrier(conf);
+ if (!skip_barrier)
+ allow_barrier(conf);
}
}
@@ -1007,6 +1009,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
int first_clone;
int sectors_handled;
int max_sectors;
+ int skip_barrier = 0;
/*
* Register the new request and wait if the reconstruction
@@ -1036,7 +1039,13 @@ static void make_request(struct mddev *mddev, struct bio * bio)
finish_wait(&conf->wait_barrier, &w);
}
- wait_barrier(conf);
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
+ (bio->bi_sector + bio->bi_size/512 <=
+ mddev->curr_resync_completed))
+ skip_barrier = 1;
+
+ if (!skip_barrier)
+ wait_barrier(conf);
bitmap = mddev->bitmap;
@@ -1053,6 +1062,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
r1_bio->mddev = mddev;
r1_bio->sector = bio->bi_sector;
+ if (skip_barrier)
+ set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
/* We might need to issue multiple reads to different
* devices if there are bad blocks around, so we keep
* track of the number of reads in bio->bi_phys_segments.
@@ -1229,10 +1240,15 @@ read_again:
for (j = 0; j < i; j++)
if (r1_bio->bios[j])
rdev_dec_pending(conf->mirrors[j].rdev, mddev);
- r1_bio->state = 0;
- allow_barrier(conf);
+ if (!skip_barrier)
+ r1_bio->state = 0;
+ else
+ set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
+ if (!skip_barrier)
+ allow_barrier(conf);
md_wait_for_blocked_rdev(blocked_rdev, mddev);
- wait_barrier(conf);
+ if (!skip_barrier)
+ wait_barrier(conf);
goto retry_write;
}
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 0ff3715..fab9fda 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -158,6 +158,8 @@ struct r1bio {
#define R1BIO_MadeGood 7
#define R1BIO_WriteError 8
+#define R1BIO_SKIP_BARRIER 9
+
extern int md_raid1_congested(struct mddev *mddev, int bits);
#endif
At present, the code only for normal io and sync io.But etc stop/remove disk /add disk/fix_read_error also used io barrier.
I used fio to test.
The fio script is:
[global]
filename=/dev/md0
direct=1
bs=512K
[write]
rw=write
runtime=600
[read]
stonewall
rw=read
runtime=600
I firstly started resync/recovery for some times in order to normal io position is before curr_resync_completed.
And i also keep the resync/recovery speed is about 30MB/s throught sys_speed_min.
The result is:
w/ above code:
write recovery/resync 30MB/S; fio 70MB/s
read recovery/resync 30MB/s; fio 90MB/s
w/o above code:
write recovery/resync 30MB/s; fio 29MB/s
read recovery/resync 30MB/s; fio 31MB/s
The result is remarkable.
Jianpeng
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: About RAID1/10 io barrier between normal io and resync/recovery io.
2012-11-01 7:58 About RAID1/10 io barrier between normal io and resync/recovery io majianpeng
@ 2012-11-01 8:24 ` NeilBrown
0 siblings, 0 replies; 2+ messages in thread
From: NeilBrown @ 2012-11-01 8:24 UTC (permalink / raw)
To: majianpeng; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5952 bytes --]
On Thu, 1 Nov 2012 15:58:04 +0800 majianpeng <majianpeng@gmail.com> wrote:
> Hi Neil:
> At present, there are barriers in raid1/10.About barrier,you described: "Sometimes we need to suspend IO while we do something else, either some resync/recovery, or reconfigure the array"
> So when do normal request in func make_request, it call wait_barrier.And in sync_request it call raise_barrier.
> Of course,there are some place to call raise_barrier to do other things.
> But there, i only wanted to talk about normal io and resync/recovery.
>
> 1:I think you add io barrier is because raid1/10 didn't stripe like raid456.So it only for the situation which normal io and sync io had same content. Is that right?
Same sector, yes.
> 2: If normal ios outside the resync/recovery window, i think there is no necessary to use io barrier.
correct.
> I wanted to find the reason by reviewing the git log.But at present, git log of kernel is only after kernel 2.6.12.
> So i looked into the code about kernel 2.4.And found something like i thought. There were resync window.
> But why did you remove or modify ?
I didn't. Ingo Molnar did.
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=0925bad356699684440720f2a908030f98bc3284
It wouldn't have generalised to raid10 very well (because resync and recovery
are handled very differently in raid10).
> 3: I think using resync window to control normal io is perfect but complicated.So i only used one positon which before mddev->curr_resync_completed.
Looks good, though I'll have a proper look next week when I have a bit more
time.
You only need the barrier for write operations, but I think we raise it for
read operations too. We can probably skip that.
Thanks,
NeilBrown
> The code is:
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 8034fbd..c45a769 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -234,6 +234,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
> {
> struct bio *bio = r1_bio->master_bio;
> int done;
> + int skip_barrier = test_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
> struct r1conf *conf = r1_bio->mddev->private;
>
> if (bio->bi_phys_segments) {
> @@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> * Wake up any possible resync thread that waits for the device
> * to go idle.
> */
> - allow_barrier(conf);
> + if (!skip_barrier)
> + allow_barrier(conf);
> }
> }
>
> @@ -1007,6 +1009,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> int first_clone;
> int sectors_handled;
> int max_sectors;
> + int skip_barrier = 0;
>
> /*
> * Register the new request and wait if the reconstruction
> @@ -1036,7 +1039,13 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> finish_wait(&conf->wait_barrier, &w);
> }
>
> - wait_barrier(conf);
> + if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> + (bio->bi_sector + bio->bi_size/512 <=
> + mddev->curr_resync_completed))
> + skip_barrier = 1;
> +
> + if (!skip_barrier)
> + wait_barrier(conf);
>
> bitmap = mddev->bitmap;
>
> @@ -1053,6 +1062,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> r1_bio->mddev = mddev;
> r1_bio->sector = bio->bi_sector;
>
> + if (skip_barrier)
> + set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
> /* We might need to issue multiple reads to different
> * devices if there are bad blocks around, so we keep
> * track of the number of reads in bio->bi_phys_segments.
> @@ -1229,10 +1240,15 @@ read_again:
> for (j = 0; j < i; j++)
> if (r1_bio->bios[j])
> rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> - r1_bio->state = 0;
> - allow_barrier(conf);
> + if (!skip_barrier)
> + r1_bio->state = 0;
> + else
> + set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
> + if (!skip_barrier)
> + allow_barrier(conf);
> md_wait_for_blocked_rdev(blocked_rdev, mddev);
> - wait_barrier(conf);
> + if (!skip_barrier)
> + wait_barrier(conf);
> goto retry_write;
> }
>
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 0ff3715..fab9fda 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -158,6 +158,8 @@ struct r1bio {
> #define R1BIO_MadeGood 7
> #define R1BIO_WriteError 8
>
> +#define R1BIO_SKIP_BARRIER 9
> +
> extern int md_raid1_congested(struct mddev *mddev, int bits);
>
> #endif
>
> At present, the code only for normal io and sync io.But etc stop/remove disk /add disk/fix_read_error also used io barrier.
> I used fio to test.
> The fio script is:
> [global]
> filename=/dev/md0
> direct=1
> bs=512K
>
> [write]
> rw=write
> runtime=600
>
> [read]
> stonewall
> rw=read
> runtime=600
>
> I firstly started resync/recovery for some times in order to normal io position is before curr_resync_completed.
> And i also keep the resync/recovery speed is about 30MB/s throught sys_speed_min.
> The result is:
> w/ above code:
> write recovery/resync 30MB/S; fio 70MB/s
> read recovery/resync 30MB/s; fio 90MB/s
>
> w/o above code:
> write recovery/resync 30MB/s; fio 29MB/s
> read recovery/resync 30MB/s; fio 31MB/s
>
> The result is remarkable.
>
>
> Jianpeng
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2012-11-01 8:24 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-01 7:58 About RAID1/10 io barrier between normal io and resync/recovery io majianpeng
2012-11-01 8:24 ` NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).