Re: raid5: switching cache buffer size log spam
From: dean gaudet @ 2002-04-29 1:32 UTC
To: Neil Brown; +Cc: linux-raid
On Sat, 30 Mar 2002, Neil Brown wrote:
> On Friday March 29, dean-list-linux-kernel@arctic.org wrote:
> > 2.4.19-pre3-ac1
> >
> > /dev/md1 is a raid5 of 4 disks
> > [creating snapshot volumes] results in lots of log spam of the form:
> >
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
>
> The current raid5 requires all requests that it receives to be the
> same size. If it gets requests of different sizes, it does cope but
> it needs to flush its stripe cache and start again. This effectively
> serialises all requests around a request-size change.
> So it is probably fair to say that what you are trying to do is
> currently not a supported option. It may be in 2.6 if we ever get
> raid5 working again in 2.5 :-)
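to make sure i understand the behaviour described above, here's a rough
user-space model of it (not the real raid5.c code, just its shape): the
stripe cache services one buffer size at a time, so any request with a
different size forces a drain-and-switch, and every switch prints one of
those log lines.

#include <stdio.h>

static unsigned int cache_buffer_size;	/* 0 = cache currently empty */

static void handle_request(unsigned int b_size)
{
	if (b_size != cache_buffer_size) {
		/* the real code first waits for every active stripe to
		 * complete, which is what serialises i/o around the change */
		printf("raid5: switching cache buffer size, %u --> %u\n",
		       cache_buffer_size, b_size);
		cache_buffer_size = b_size;
	}
	/* ...then service the request from same-sized stripe buffers... */
}

int main(void)
{
	/* interleaved 512-byte LVM i/o and 4096-byte filesystem i/o */
	handle_request(512);
	handle_request(4096);
	handle_request(512);
	return 0;
}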
this is an operational concern to me now -- because even the act of
creating an LVM snapshot results in the log spam... and this makes my
machine grind to a halt if i attempt to do backups with snapshots.
i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte buffer_heads
if possible. but i thought i'd also look into fixing md itself... is
anyone working on this already?
i've thought of two solutions, let me know what you think.
(1) in raid5_make_request, do something like:
    if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
            do what's in the code currently;
    }
    else {
            break up the request into
            (bh->b_size / conf->buffer_size) pieces;
    }
this would mean i'd eventually end up with conf->buffer_size == 512
and my regular 4096 byte fs operations would end up being broken
into 8 pieces... as long as LVM is doing the smaller block size
requests.
and if i see things right, eventually a sync would cause the
cache to drain to 0, and at that point it would be possible to get back to
a 4096-byte cache.
how hard is it to break up bh things like this? i'm a newbie in
this area of the kernel :)
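to check the arithmetic in (1), here's a toy user-space sketch.
toy_request and submit_piece are invented names; this is not the real
raid5_make_request, and it assumes b_size is a multiple of buffer_size:

#include <stdio.h>

struct toy_request {
	unsigned long sector;	/* start, in 512-byte sectors */
	unsigned int  b_size;	/* request size in bytes */
};

static void submit_piece(unsigned long sector, unsigned int size)
{
	printf("  piece: sector %lu, %u bytes\n", sector, size);
}

static void make_request(const struct toy_request *req, unsigned int buffer_size)
{
	if (buffer_size == 0 || req->b_size <= buffer_size) {
		/* "do what's in the code currently" */
		submit_piece(req->sector, req->b_size);
		return;
	}
	/* break the request into (b_size / buffer_size) pieces */
	unsigned int pieces = req->b_size / buffer_size;
	for (unsigned int i = 0; i < pieces; i++)
		submit_piece(req->sector + i * (buffer_size >> 9), buffer_size);
}

int main(void)
{
	struct toy_request req = { .sector = 1024, .b_size = 4096 };

	/* once LVM has pushed buffer_size down to 512, every regular
	 * 4096-byte fs request is broken into 8 pieces */
	make_request(&req, 512);
	return 0;
}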
(2) the differing size accesses are typically in disjoint regions of
the array. for example, when creating an LVM snapshot there is LVM
metadata, there is a live filesystem, and there is the LVM snapshot
volume itself. all of these are disjoint.
instead of draining the entire cache in get_active_stripe(), we only
need to drain entries which overlap the sector we're interested in.
i'm still puzzling out an efficient change to the hash which
accomplishes this...
one idea is to change conf->buffer_size to conf->max_buffer_size.
in this case we drain the entire cache any time max_buffer_size
changes, and allow entries in the cache which are any size
<= conf->max_buffer_size.
if max_buffer_size is 4096 (as i expect it to be) then 512-byte
accesses will potentially require hash chains 8x longer than
they would be today... but this could be a total non-issue.
i suspect it works for my situation, but may not work in
general when folks want to mix and match filesystem or
database block sizes on an array.
the cache could use a tree instead of a hash ... but that's not
so nice for n-way scaling :)
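and here's a toy user-space model of the overlap test in (2). the hash
and all the names are invented (no relation to the real stripe hash in
raid5.c); it only shows that a request in a disjoint region would have
nothing to drain:

#include <stdio.h>
#include <stdlib.h>

#define NR_HASH 64

struct toy_stripe {
	unsigned long sector;		/* start sector of this cache entry */
	unsigned int  size;		/* entry size in bytes (512..4096) */
	struct toy_stripe *next;	/* hash chain */
};

static struct toy_stripe *hash_table[NR_HASH];

static unsigned int hashfn(unsigned long sector)
{
	return sector % NR_HASH;
}

/* does cached entry s overlap the sector range [sector, sector + size) ? */
static int overlaps(const struct toy_stripe *s,
		    unsigned long sector, unsigned int size)
{
	unsigned long s_end = s->sector + (s->size >> 9);
	unsigned long end = sector + (size >> 9);

	return s->sector < end && sector < s_end;
}

/* how many existing entries would a new request have to wait for? */
static int entries_to_drain(unsigned long sector, unsigned int size)
{
	int n = 0;

	for (int i = 0; i < NR_HASH; i++)
		for (struct toy_stripe *s = hash_table[i]; s; s = s->next)
			if (overlaps(s, sector, size))
				n++;
	return n;
}

static void add_entry(unsigned long sector, unsigned int size)
{
	struct toy_stripe *s = malloc(sizeof(*s));

	s->sector = sector;
	s->size = size;
	s->next = hash_table[hashfn(sector)];
	hash_table[hashfn(sector)] = s;
}

int main(void)
{
	/* a live filesystem doing 4K i/o around sector 8000 */
	add_entry(8000, 4096);
	add_entry(8008, 4096);

	/* LVM metadata doing 512-byte i/o in a disjoint region: drains 0 */
	printf("sector 16:   drain %d entries\n", entries_to_drain(16, 512));
	/* a 512-byte request inside the fs region: drains only the 1 overlap */
	printf("sector 8002: drain %d entries\n", entries_to_drain(8002, 512));
	return 0;
}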
thanks
-dean
Re: raid5: switching cache buffer size log spam
From: Neil Brown @ 2002-05-01 5:33 UTC
To: dean gaudet; +Cc: linux-raid
On Sunday April 28, dean@arctic.org wrote:
> > >
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
> >
>
> this is an operational concern to me now -- because even the act of
> creating an LVM snapshot results in the log spam... and this makes my
> machine grind to a halt if i attempt to do backups with snapshots.
>
> i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte buffer_heads
> if possible. but i thought i'd also look into fixing md itself... is
> anyone working on this already?
>
> i've thought of two solutions, let me know what you think.
>
> (1) in raid5_make_request, do something like:
>
>     if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
>             do what's in the code currently;
>     }
>     else {
>             break up the request into
>             (bh->b_size / conf->buffer_size) pieces;
>     }
>
> this would mean i'd eventually end up with conf->buffer_size == 512
> and my regular 4096 byte fs operations would end up being broken
> into 8 pieces... as long as LVM is doing the smaller block size
> requests.
>
> and if i see things right, eventually a sync would cause the
> cache to drain to 0, and at that point it would be possible to get back to
> a 4096-byte cache.
>
> how hard is it to break up bh things like this? i'm a newbie in
> this area of the kernel :)
Look at
http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.19-pre3/patch-j-RaidSplit
It allows each raid module to say "split this for me" and md.c will
split the request and re-submit.
It should then be fairly easy to get raid5 to say 'split' if the
request is bigger than the current size.
However, doing raid5 with 512-byte blocks seems to be a lot slower than
with 4K blocks... Try it and see.
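For illustration, here is a toy model of that division of labour (made-up
names, not the code in the patch above): the personality only reports that
a request should be split, and the core layer does the splitting and
re-submits the pieces. The contrast with the sketch under (1) is who
performs the split.

#include <stdio.h>

/* toy personality verdict: 0 = handle as-is, N > 1 = please split into N */
static int raid5_wants_split(unsigned int b_size, unsigned int cache_size)
{
	if (cache_size == 0 || b_size <= cache_size)
		return 0;
	return b_size / cache_size;
}

/* toy core layer: on a split verdict, re-submit the smaller pieces itself */
static void core_make_request(unsigned long sector, unsigned int b_size,
			      unsigned int cache_size)
{
	int pieces = raid5_wants_split(b_size, cache_size);

	if (pieces == 0) {
		printf("handled %u bytes at sector %lu in one piece\n",
		       b_size, sector);
		return;
	}

	unsigned int piece = b_size / pieces;
	for (int i = 0; i < pieces; i++)
		printf("re-submitted %u bytes at sector %lu\n",
		       piece, sector + (unsigned long)i * (piece >> 9));
}

int main(void)
{
	core_make_request(2048, 4096, 512);	/* split into 8 x 512 bytes */
	core_make_request(2048, 4096, 4096);	/* handled whole */
	return 0;
}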
>
> (2) the differing size accesses are typically in disjoint regions of
> the array. for example, when creating an LVM snapshot there is LVM
> metadata, there is a live filesystem, and there is the LVM snapshot
> volume itself. all of these are disjoint.
>
> instead of draining the entire cache in get_active_stripe(), we only
> need to drain entries which overlap the sector we're interested
> in.
and you need to make sure that new requests don't then conflict with
other new requests... I suspect this approach would be quite hard.
I think the "right" way is to fix the cache size at 4K and handle
reads and writes from partial buffers. It'll probably end up like this
in 2.5... eventually.
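A rough sketch of what I mean by partial buffers (invented names, not from
any actual 2.5 code): the stripe buffer stays fixed at 4096 bytes and a
smaller request only touches a byte sub-range of it, at the cost of a
read-modify-write when the rest of the buffer is not up to date.

#include <stdio.h>
#include <string.h>

#define STRIPE_SIZE 4096

struct toy_stripe_buf {
	unsigned long sector;			/* 4K-aligned start sector */
	unsigned char data[STRIPE_SIZE];
	int uptodate;				/* whole 4K read from disk? */
};

/* write `len` bytes at byte `offset` inside one fixed-size stripe buffer */
static void partial_write(struct toy_stripe_buf *sb, unsigned int offset,
			  const void *src, unsigned int len)
{
	if (!sb->uptodate) {
		/* read-modify-write: the rest of the 4K has to come from
		 * disk before parity can be recomputed for the stripe */
		/* read_whole_buffer_from_disk(sb);  (hypothetical helper) */
		sb->uptodate = 1;
	}
	memcpy(sb->data + offset, src, len);
	/* recompute parity over the full 4K and queue the write-out */
}

int main(void)
{
	struct toy_stripe_buf sb = { .sector = 8000 };
	unsigned char lvm_metadata[512] = { 0 };

	/* a 512-byte request lands at byte offset 1024 of the 4K buffer */
	partial_write(&sb, 1024, lvm_metadata, sizeof(lvm_metadata));
	printf("updated bytes 1024..1535 of the stripe at sector %lu\n",
	       sb.sector);
	return 0;
}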
NeilBrown