* Re: raid5: switching cache buffer size log spam
From: dean gaudet @ 2002-04-29  1:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sat, 30 Mar 2002, Neil Brown wrote:

> On Friday March 29, dean-list-linux-kernel@arctic.org wrote:
> > 2.4.19-pre3-ac1
> >
> > /dev/md1 is a raid5 of 4 disks
> > [creating snapshot volumes] results in lots of log spam of the form:
> >
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
>
> The current raid5 requires all requests that it receives to be the
> same size.  If it gets requests of different sizes, it does cope, but
> it needs to flush its stripe cache and start again.  This effectively
> serialises all requests around a request-size change.
> So it is probably fair to say that what you are trying to do is
> currently not a supported option.  It may be in 2.6 if we ever get
> raid5 working again in 2.5 :-)

this is an operational concern to me now -- because even the act of
creating an LVM snapshot results in the log spam... and this makes my
machine grind to a halt if i attempt to do backups with snapshots.

i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte buffer_heads
if possible.  but i thought i'd also look into fixing md itself... is
anyone working on this already?

i've thought of two solutions, let me know what you think.

(1) in raid5_make_request, do something like:

	if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
		do what's in the code currently;
	}
	else {
		break up the request into
			(bh->b_size / conf->buffer_size) pieces;
	}

    this would mean i'd eventually end up with conf->buffer_size == 512
    and my regular 4096 byte fs operations would end up being broken
    into 8 pieces... as long as LVM is doing the smaller block size
    requests.

    and if i see things right, eventually a sync would cause the
    cache to drain to 0, and at that point it would be possible to get
    back to a 4096-byte cache.

    how hard is it to break up bh things like this?  i'm a newbie in
    this area of the kernel :)
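
to make (1) concrete, here's a tiny user-space model of the splitting
arithmetic -- this is not kernel code, and struct fake_bh, split_request()
and the numbers in main() are just made up for illustration:

/* user-space model of the splitting idea in (1) -- not kernel code;
 * the struct and function names are invented for illustration.
 */
#include <stdio.h>

struct fake_bh {
	unsigned long b_rsector;	/* start sector, in 512-byte units */
	unsigned int  b_size;		/* request size in bytes */
};

/* carve one request into pieces of chunk_size bytes each */
static void split_request(const struct fake_bh *bh, unsigned int chunk_size)
{
	unsigned int pieces = bh->b_size / chunk_size;
	unsigned int i;

	for (i = 0; i < pieces; i++)
		printf("piece %u: sector %lu, %u bytes\n", i,
		       bh->b_rsector + (unsigned long)i * (chunk_size >> 9),
		       chunk_size);
}

int main(void)
{
	/* conf->buffer_size has dropped to 512, so a regular 4096-byte
	 * fs request gets carved into 8 sub-requests */
	struct fake_bh bh = { .b_rsector = 1024, .b_size = 4096 };

	split_request(&bh, 512);
	return 0;
}

so a 4096-byte request at sector 1024 becomes 8 consecutive 512-byte
pieces -- the kernel-side question is just how to build and submit the
extra buffer_heads for those pieces.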

(2) the differing size accesses are typically in disjoint regions of
    the array.  for example, when creating an LVM snapshot there is LVM
    metadata, there is a live filesystem, and there is the LVM snapshot
    volume itself.  all of these are disjoint.

    instead of draining the entire cache in get_active_stripe(), we only
    need to drain entries which overlap the sector we're interested in.

    i'm still puzzling out an efficient change to the hash which
    accomplishes this...

    one idea is to change conf->buffer_size to conf->max_buffer_size.
    in this case we drain the entire cache any time max_buffer_size
    changes, and allow entries in the cache which are any size
    <= conf->max_buffer_size.

    if max_buffer_size is 4096 (as i expect it to be) then 512-byte
    accesses will potentially require hash chains 8x longer than
    they would be today... but this could be a total non-issue.
    i suspect it works for my situation, but may not work in
    general when folks want to mix and match filesystem or
    database block sizes on an array.

    the cache could use a tree instead of a hash ... but that's not
    so nice for n-way scaling :)
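
and here's the same sort of user-space toy for the overlap test in (2) --
struct cache_entry and the sector numbers are invented, it's only meant
to show which entries would need draining:

/* user-space model of "drain only what overlaps" from (2) -- not kernel
 * code; the structure and the example ranges are invented.
 */
#include <stdio.h>

struct cache_entry {
	unsigned long sector;	/* start sector, in 512-byte units */
	unsigned int  size;	/* buffer size in bytes */
};

/* does the request [sector, sector + size) overlap this cache entry? */
static int overlaps(const struct cache_entry *e,
		    unsigned long sector, unsigned int size)
{
	unsigned long e_end = e->sector + (e->size >> 9);
	unsigned long end   = sector + (size >> 9);

	return e->sector < end && sector < e_end;
}

int main(void)
{
	/* a mixed cache: one 4K filesystem stripe and one 512-byte LVM
	 * metadata stripe, in disjoint regions of the array */
	struct cache_entry cache[] = {
		{ .sector = 0,     .size = 4096 },
		{ .sector = 80000, .size = 512  },
	};
	unsigned long new_sector = 80000;
	unsigned int  new_size = 512;
	unsigned int  i;

	/* only entries overlapping the incoming request would be drained;
	 * the 4K entry at sector 0 is left alone */
	for (i = 0; i < sizeof(cache) / sizeof(cache[0]); i++)
		if (overlaps(&cache[i], new_sector, new_size))
			printf("would drain entry at sector %lu\n",
			       cache[i].sector);
	return 0;
}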

thanks
-dean



* Re: raid5: switching cache buffer size log spam
From: Neil Brown @ 2002-05-01  5:33 UTC (permalink / raw)
  To: dean gaudet; +Cc: linux-raid

On Sunday April 28, dean@arctic.org wrote:
> > >
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 512
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 0 --> 4096
> > > Mar 29 10:47:45 debian kernel: raid5: switching cache buffer size, 4096 --> 512
> >
> 
> this is an operational concern to me now -- because even the act of
> creating an LVM snapshot results in the log spam... and this makes my
> machine grind to a halt if i attempt to do backups with snapshots.
> 
> i could pursue "fixing" LVM, mke2fs/etc. to all use 4096-byte buffer_heads
> if possible.  but i thought i'd also look into fixing md itself... is
> anyone working on this already?
> 
> i've thought of two solutions, let me know what you think.
> 
> (1) in raid5_make_request, do something like:
> 
> 	if (bh->b_size <= conf->buffer_size || conf->buffer_size == 0) {
> 		do what's in the code currently;
> 	}
> 	else {
> 		break up the request into
> 			(bh->b_size / conf->buffer_size) pieces;
> 	}
> 
>     this would mean i'd eventually end up with conf->buffer_size == 512
>     and my regular 4096 byte fs operations would end up being broken
>     into 8 pieces... as long as LVM is doing the smaller block size
>     requests.
> 
>     and if i see things right, eventually a sync would cause the
>     cache to drain to 0, and at that point it would be possible to get
>     back to a 4096-byte cache.
> 
>     how hard is it to break up bh things like this?  i'm a newbie in
>     this area of the kernel :)

Look at
 http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.19-pre3/patch-j-RaidSplit

It allows each raid module to say "split this for me" and md.c will
split the request and re-submit.
It should then be fairly easy to get raid5 to say 'split' if the
request is bigger than the current size.
However, doing raid5 with 512-byte blocks seems to be a lot slower than
with 4K blocks...  Try it and see.
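
Roughly the shape of it, as a user-space toy rather than the real patch
-- the SUBMIT_SPLIT return convention and every name below are invented
for illustration only:

/* toy model of "the personality says 'split me' and md.c re-submits".
 * Not the actual RaidSplit patch; names and return values are invented.
 */
#include <stdio.h>

#define SUBMIT_OK    0
#define SUBMIT_SPLIT 1	/* caller should split and re-submit */

static unsigned int cache_buffer_size = 512;	/* current stripe cache size */

/* stand-in for raid5's make_request: handle the request if it fits the
 * current cache size, otherwise ask the caller to split it */
static int raid5_make_request_model(unsigned long sector, unsigned int size)
{
	if (size > cache_buffer_size)
		return SUBMIT_SPLIT;
	printf("handled %u bytes at sector %lu\n", size, sector);
	return SUBMIT_OK;
}

/* stand-in for md.c: split into cache-sized pieces and re-submit */
static void md_submit(unsigned long sector, unsigned int size)
{
	if (raid5_make_request_model(sector, size) == SUBMIT_SPLIT) {
		unsigned int off;

		for (off = 0; off < size; off += cache_buffer_size)
			raid5_make_request_model(sector + (off >> 9),
						 cache_buffer_size);
	}
}

int main(void)
{
	md_submit(2048, 4096);	/* ends up as eight 512-byte submissions */
	return 0;
}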


> 
> (2) the differing size accesses are typically in disjoint regions of
>     the array.  for example, when creating an LVM snapshot there is LVM
>     metadata, there is a live filesystem, and there is the LVM snapshot
>     volume itself.  all of these are disjoint.
> 
>     instead of draining the entire cache in get_active_stripe(), we only
>     need to drain entries which overlap the sector we're interested
>     in.

and you need to make sure that new requests don't then conflict with
other new requests...  I suspect this approach would be quite hard.


I think the "right" way is to fix the cache size at 4K and handle
reads and writes from partial buffers.  It's probably end up like this
in 2.5.. eventually.
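
The partial-buffer side is mostly a matter of placing the smaller request
at the right offset within a fixed 4K stripe buffer (parity handling
aside).  A toy illustration -- nothing below is real md code, just the
arithmetic:

/* toy illustration of a sub-4K write landing in a fixed 4K stripe
 * buffer -- only the offset arithmetic, no parity or I/O.
 */
#include <stdio.h>
#include <string.h>

#define STRIPE_SIZE 4096

static unsigned char stripe[STRIPE_SIZE];	/* one cached 4K stripe */

/* copy a partial write into the right slot of the 4K stripe buffer */
static void partial_write(unsigned long sector, const void *data,
			  unsigned int size)
{
	unsigned int offset = (sector & 7) * 512;   /* byte offset within the stripe */

	memcpy(stripe + offset, data, size);
	printf("wrote %u bytes at offset %u of the stripe\n", size, offset);
}

int main(void)
{
	unsigned char block[512];

	memset(block, 0xab, sizeof(block));
	partial_write(1027, block, sizeof(block));	/* lands 1536 bytes in */
	return 0;
}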

NeilBrown

