* [patch] latency problem in md driver
@ 2006-12-22 14:32 Lars Ellenberg
2006-12-22 15:19 ` Jeff Garzik
2006-12-22 20:40 ` Neil Brown
0 siblings, 2 replies; 7+ messages in thread
From: Lars Ellenberg @ 2006-12-22 14:32 UTC (permalink / raw)
To: Ingo Molnar, Neil Brown, linux-raid
md raidX make_request functions strip off the BIO_RW_SYNC flag,
this introducing additional latency.
below is a suggested patch for the raid1.c .
other suggested solutions would be to let the bio_clone do its work,
and not reassign thereby stripping off all flags.
at most strip off known unwanted flags (the BARRIER flag).
similar pattern in the other raid versions.
cheers,
Lars
--- /mnt/kernel-src/linux-2.6.19/drivers/md/raid1.c.orig 2006-12-11 10:06:17.661776243 +0100
+++ /mnt/kernel-src/linux-2.6.19/drivers/md/raid1.c 2006-12-12 11:09:55.975762364 +0100
@@ -776,6 +776,7 @@ static int make_request(request_queue_t
struct page **behind_pages = NULL;
const int rw = bio_data_dir(bio);
int do_barriers;
+ int do_sync = bio_sync(bio);
/*
* Register the new request and wait if the reconstruction
@@ -835,7 +836,7 @@ static int make_request(request_queue_t
read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
read_bio->bi_bdev = mirror->rdev->bdev;
read_bio->bi_end_io = raid1_end_read_request;
- read_bio->bi_rw = READ;
+ read_bio->bi_rw = READ | do_sync;
read_bio->bi_private = r1_bio;
generic_make_request(read_bio);
@@ -906,7 +907,7 @@ static int make_request(request_queue_t
mbio->bi_sector = r1_bio->sector + conf->mirrors[i].rdev->data_offset;
mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
mbio->bi_end_io = raid1_end_write_request;
- mbio->bi_rw = WRITE | do_barriers;
+ mbio->bi_rw = WRITE | do_barriers | do_sync;
mbio->bi_private = r1_bio;
if (behind_pages) {
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 14:32 [patch] latency problem in md driver Lars Ellenberg
@ 2006-12-22 15:19 ` Jeff Garzik
2006-12-22 15:37 ` Ric Wheeler
2006-12-22 15:40 ` Lars Ellenberg
2006-12-22 20:40 ` Neil Brown
1 sibling, 2 replies; 7+ messages in thread
From: Jeff Garzik @ 2006-12-22 15:19 UTC (permalink / raw)
To: Lars Ellenberg
Cc: Ingo Molnar, Neil Brown, linux-raid, Andrew Morton, Jens Axboe
Lars Ellenberg wrote:
> md raidX make_request functions strip off the BIO_RW_SYNC flag,
> this introducing additional latency.
>
> below is a suggested patch for the raid1.c .
> other suggested solutions would be to let the bio_clone do its work,
> and not reassign thereby stripping off all flags.
> at most strip off known unwanted flags (the BARRIER flag).
It sounds like a major bug to strip the barrier flag. I quite
understand that a barrier to a RAID device as a whole behaves
differently from a barrier to an ATA or SCSI device, but that's no
excuse to avoid the problem.
If MD does not pass barriers, it is unilaterally dropping the "data made
it to the media" guarantee.
Jeff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 15:19 ` Jeff Garzik
@ 2006-12-22 15:37 ` Ric Wheeler
2006-12-22 15:40 ` Lars Ellenberg
1 sibling, 0 replies; 7+ messages in thread
From: Ric Wheeler @ 2006-12-22 15:37 UTC (permalink / raw)
To: Jeff Garzik
Cc: Lars Ellenberg, Ingo Molnar, Neil Brown, linux-raid,
Andrew Morton, Jens Axboe
Jeff Garzik wrote:
> Lars Ellenberg wrote:
>> md raidX make_request functions strip off the BIO_RW_SYNC flag,
>> this introducing additional latency.
>>
>> below is a suggested patch for the raid1.c .
>> other suggested solutions would be to let the bio_clone do its work,
>> and not reassign thereby stripping off all flags.
>> at most strip off known unwanted flags (the BARRIER flag).
>
> It sounds like a major bug to strip the barrier flag. I quite
> understand that a barrier to a RAID device as a whole behaves
> differently from a barrier to an ATA or SCSI device, but that's no
> excuse to avoid the problem.
>
> If MD does not pass barriers, it is unilaterally dropping the "data made
> it to the media" guarantee.
>
> Jeff
Exactly right - if we do not pass the barrier request down to the
members of the RAID group, then we lose the data integrity needed.
Of course, in a RAID group, this will be introduce latency, but that is
the correct behavior.
ric
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 15:19 ` Jeff Garzik
2006-12-22 15:37 ` Ric Wheeler
@ 2006-12-22 15:40 ` Lars Ellenberg
2006-12-22 17:33 ` Lars Ellenberg
1 sibling, 1 reply; 7+ messages in thread
From: Lars Ellenberg @ 2006-12-22 15:40 UTC (permalink / raw)
To: Jeff Garzik
Cc: Ingo Molnar, Neil Brown, linux-raid, Andrew Morton, Jens Axboe
/ 2006-12-22 10:19:53 -0500
\ Jeff Garzik:
> Lars Ellenberg wrote:
> >md raidX make_request functions strip off the BIO_RW_SYNC flag,
> >this introducing additional latency.
> >below is a suggested patch for the raid1.c .
> >other suggested solutions would be to let the bio_clone do its work,
> >and not reassign thereby stripping off all flags.
> >at most strip off known unwanted flags (the BARRIER flag).
>
> It sounds like a major bug to strip the barrier flag. I quite understand that a barrier to a RAID device as a
> whole behaves differently from a barrier to an ATA or SCSI device, but that's no excuse to avoid the problem.
sorry for being ambiguous.
this is not about the BARRIER flag.
this is about the BIO_RW_SYNC flag.
> If MD does not pass barriers, it is unilaterally dropping the "data made it to the media" guarantee.
MD _does_ pass barriers, unless one of the managed devices reported
EOPNOTSUPP, in which case this is solved differently (reporting EOPNOTSUPP
itself).
the issue at hand is:
because md does special case the barrier bit, it strips off everything
else by re-assigning only the io direction and the barrier bit.
==> I'd like it to also pass at least the sync bit.
I think the better solution would be to _not_ re-assign bi_rw after
bio_clone, and then special case every other bit, but to only strip
off unwanted bits, if at all.
barrier bit was a bad example, because it either is not set,
or gets passed, or is errored as EOPNOTSUPP first thing.
Lars
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 15:40 ` Lars Ellenberg
@ 2006-12-22 17:33 ` Lars Ellenberg
0 siblings, 0 replies; 7+ messages in thread
From: Lars Ellenberg @ 2006-12-22 17:33 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Ingo Molnar, Neil Brown, linux-raid, Andrew Morton
/ 2006-12-22 16:40:17 +0100
\ Lars Ellenberg:
> the issue at hand is:
> because md does special case the barrier bit, it strips off everything
> else by re-assigning only the io direction and the barrier bit.
>
> ==> I'd like it to also pass at least the sync bit.
some additional info maybe regarding the motivation for this patch.
in drbd 0.7, we have some bitmap on some block device.
this can grow pretty large (128MB), and the code reading/writing it
does synchronous sector size io (which is stupid, and has been changed
already in current drbd svn). --- ok, too much detail ---
anyways, initializing 128MB using 512 Byte bios at a time
with BIO_RW_SYNC using current md raid1 code takes about 5 _minutes_.
with the patch posted earlier, it takes about 5 _seconds_.
this same performance loss will affect every users that are bound
by many small synchronous updates, unless they explicitly unplug
somewhere else.
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 14:32 [patch] latency problem in md driver Lars Ellenberg
2006-12-22 15:19 ` Jeff Garzik
@ 2006-12-22 20:40 ` Neil Brown
2006-12-26 13:09 ` Lars Ellenberg
1 sibling, 1 reply; 7+ messages in thread
From: Neil Brown @ 2006-12-22 20:40 UTC (permalink / raw)
To: Lars Ellenberg; +Cc: Ingo Molnar, linux-raid
On Friday December 22, Lars.Ellenberg@linbit.com wrote:
>
> md raidX make_request functions strip off the BIO_RW_SYNC flag,
> this introducing additional latency.
>
> below is a suggested patch for the raid1.c .
> other suggested solutions would be to let the bio_clone do its work,
> and not reassign thereby stripping off all flags.
> at most strip off known unwanted flags (the BARRIER flag).
>
> similar pattern in the other raid versions.
Thanks. I think your patch is appropriate.
I don't like the idea of passing down any flags by default. If a flag
makes failure more likely (like BIO_RW_AHEAD or BIO_RW_FAILFAST) then
I *don't* want it passed down without making a conscious decision that
it is a good idea.
So yes, _SYNC should be passed down in raid1/raid10. More interesting
stuff would be needed in raid456.
_FAILFAST should probably never be passed down (well, maybe in
multipath, but who uses md/multipath??)
_META ... what is that ?? I'm not passing it down until I know!!
I'll look into this after the holidays. Meanwhile if you want to be
certain that this is in 2.6.20,
- Fix the retry-read case as well (where ->bi_rw is assigned about
10 lines from the end of raid1d()
- Create a 'perfect patch',
- add Acked-by: NeilBrown <neilb@suse.de>
- and post it to akpm@osdl.org (and appropriate lists).
Thanks,
NeilBrown
>
> cheers,
>
> Lars
>
>
> --- /mnt/kernel-src/linux-2.6.19/drivers/md/raid1.c.orig 2006-12-11 10:06:17.661776243 +0100
> +++ /mnt/kernel-src/linux-2.6.19/drivers/md/raid1.c 2006-12-12 11:09:55.975762364 +0100
> @@ -776,6 +776,7 @@ static int make_request(request_queue_t
> struct page **behind_pages = NULL;
> const int rw = bio_data_dir(bio);
> int do_barriers;
> + int do_sync = bio_sync(bio);
>
> /*
> * Register the new request and wait if the reconstruction
> @@ -835,7 +836,7 @@ static int make_request(request_queue_t
> read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
> read_bio->bi_bdev = mirror->rdev->bdev;
> read_bio->bi_end_io = raid1_end_read_request;
> - read_bio->bi_rw = READ;
> + read_bio->bi_rw = READ | do_sync;
> read_bio->bi_private = r1_bio;
>
> generic_make_request(read_bio);
> @@ -906,7 +907,7 @@ static int make_request(request_queue_t
> mbio->bi_sector = r1_bio->sector + conf->mirrors[i].rdev->data_offset;
> mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
> mbio->bi_end_io = raid1_end_write_request;
> - mbio->bi_rw = WRITE | do_barriers;
> + mbio->bi_rw = WRITE | do_barriers | do_sync;
> mbio->bi_private = r1_bio;
>
> if (behind_pages) {
>
>
> --
> : Lars Ellenberg Tel +43-1-8178292-55 :
> : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
> : Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] latency problem in md driver
2006-12-22 20:40 ` Neil Brown
@ 2006-12-26 13:09 ` Lars Ellenberg
0 siblings, 0 replies; 7+ messages in thread
From: Lars Ellenberg @ 2006-12-26 13:09 UTC (permalink / raw)
To: Neil Brown; +Cc: Ingo Molnar, linux-raid
/ 2006-12-23 07:40:18 +1100
\ Neil Brown:
> On Friday December 22, Lars.Ellenberg@linbit.com wrote:
> >
> > md raidX make_request functions strip off the BIO_RW_SYNC flag,
> > this introducing additional latency.
> >
> > below is a suggested patch for the raid1.c .
> > other suggested solutions would be to let the bio_clone do its work,
> > and not reassign thereby stripping off all flags.
> > at most strip off known unwanted flags (the BARRIER flag).
> >
> > similar pattern in the other raid versions.
>
> Thanks. I think your patch is appropriate.
> I don't like the idea of passing down any flags by default. If a flag
> makes failure more likely (like BIO_RW_AHEAD or BIO_RW_FAILFAST) then
> I *don't* want it passed down without making a conscious decision that
> it is a good idea.
>
> So yes, _SYNC should be passed down in raid1/raid10. More interesting
> stuff would be needed in raid456.
> _FAILFAST should probably never be passed down (well, maybe in
> multipath, but who uses md/multipath??)
> _META ... what is that ?? I'm not passing it down until I know!!
>
> I'll look into this after the holidays. Meanwhile if you want to be
> certain that this is in 2.6.20,
> - Fix the retry-read case as well (where ->bi_rw is assigned about
> 10 lines from the end of raid1d()
I don't think thats necessary,
since an explicit unplug is in place there.
but ok.
> - Create a 'perfect patch',
> - add Acked-by: NeilBrown <neilb@suse.de>
> - and post it to akpm@osdl.org (and appropriate lists).
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-12-26 13:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-22 14:32 [patch] latency problem in md driver Lars Ellenberg
2006-12-22 15:19 ` Jeff Garzik
2006-12-22 15:37 ` Ric Wheeler
2006-12-22 15:40 ` Lars Ellenberg
2006-12-22 17:33 ` Lars Ellenberg
2006-12-22 20:40 ` Neil Brown
2006-12-26 13:09 ` Lars Ellenberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).