2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector

* 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
@ 2003-03-16 15:01 Dr. David Alan Gilbert
  2003-03-18  1:01 ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2003-03-16 15:01 UTC
  To: linux-kernel

Hi,
  I've just built an 800GB RAID5 array and built an ext3 file system
on it; on trying to copy data off the 200GB RAID it is replacing I'm
starting to see errors of the form:

kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
system zone - block = 140509185

and

kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
rec_len=21587, name_len=76

and

kernel: raid5: multiple 1 requests for sector 281018464

This is on an x86 which has been running fine on the smaller RAID for
years (albeit with Reiser); the array is built from five 200GB Western
Digital IDE drives on a mix of Promise and HPT controllers (there are no
IDE errors visible). This is a straight 2.4.20 kernel.

The previous messages to the list with this form of error have suggested
the problem is related to >2TB arrays; but this one is a relatively
tiny one.

Help greatly appreciated,

Dave
 ---------------- Have a happy GNU millennium! ----------------------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
  2003-03-16 15:01 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector Dr. David Alan Gilbert
@ 2003-03-18  1:01 ` Neil Brown
  2003-03-18  3:27   ` Andrew Morton
  2003-03-18 14:04   ` Dave Gilbert (Home)
  0 siblings, 2 replies; 5+ messages in thread
From: Neil Brown @ 2003-03-18  1:01 UTC
  To: Dr. David Alan Gilbert; +Cc: linux-kernel, ext3-users

On Sunday March 16, gilbertd@treblig.org wrote:
> Hi,
>   I've just built an 800GB RAID5 array and built an ext3 file system
> on it; on trying to copy data off the 200GB RAID it is replacing I'm
> starting to see errors of the form:
> 
> kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
> system zone - block = 140509185
> 
> and
> 
> kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
> directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
> rec_len=21587, name_len=76
> 
> and
> 
> kernel: raid5: multiple 1 requests for sector 281018464

I had exactly these symptoms about a year ago in 2.4.18.  I found and
fixed the problem and have just checked and the fix is definitely in
2.4.20.
So if you really are running 2.4.20 then it looks like a similar bug
has appeared.

These two symptoms strongly suggest a buffer aliasing problem.
i.e. you have two buffers (one for data and one for metadata)
that refer to the same location on disc.
One is part of a file that was recently deleted, but the buffer hasn't
been flushed yet.  The other is part of a new directory.
The old buffer and the new buffer both get written to disc at much the
same time (hence the "multiple 1 requests"), but the old buffer hits
the disc second and so corrupts the filesystem.
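
As a rough illustration of the aliasing described above, here is a toy
Python model. It is not kernel code: the Buffer class, the flush()
helper and the block number are all made up for this sketch. It only
shows how two buffers that map to the same block can be flushed in an
order that leaves the stale data on disc.

```python
# Toy model of buffer aliasing: two in-memory buffers map to one disc block.
# All names and numbers here are hypothetical; this is not kernel code.

class Buffer:
    def __init__(self, block, data, kind):
        self.block = block   # on-disc block number this buffer maps to
        self.data = data     # contents waiting to be written
        self.kind = kind     # "data" or "metadata"


def flush(disk, buffers):
    """Write buffers to the disk dict in order.

    Returns the block numbers that were written more than once,
    i.e. the blocks that two different buffers aliased.
    """
    seen = set()
    aliased = []
    for buf in buffers:
        if buf.block in seen:
            aliased.append(buf.block)
        seen.add(buf.block)
        disk[buf.block] = buf.data   # last writer wins
    return aliased


disk = {}
stale = Buffer(42, b"old file contents", "data")          # deleted file, never flushed
fresh = Buffer(42, b"new directory entries", "metadata")  # block reused for a directory

# The fresh directory block is written first and the stale file data second,
# so the stale contents are what end up on disc.
aliased = flush(disk, [fresh, stale])
```

In this model the directory block ends up holding old file contents,
which is the kind of damage a "bad entry in directory" check would then
trip over.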

The bug I found was specific to data=journal mode, and this certainly
has more options for buffer aliasing.  Were you using data=journal?

NeilBrown

* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
  2003-03-18  1:01 ` Neil Brown
@ 2003-03-18  3:27   ` Andrew Morton
  2003-03-18  5:59     ` Neil Brown
  2003-03-18 14:04   ` Dave Gilbert (Home)
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2003-03-18  3:27 UTC
  To: Neil Brown; +Cc: gilbertd, linux-kernel, ext3-users

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> These two symptoms strongly suggest a buffer aliasing problem.
> i.e. you have two buffers (one for data and one for metadata)
> that refer to the same location on disc.
> One is part of a file that was recently deleted, but the buffer hasn't
> been flushed yet.  The other is part of a new directory.
> The old buffer and the new buffer both get written to disc at much the
> same time (hence the "multiple 1 requests"), but the old buffer hits
> the disc second and so corrupts the filesystem.

This aliasing can happen very easily with direct-io, and it is something
which drivers should be able to cope with.

I hope RAID is not still assuming that all requests are unique in this way?


* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
  2003-03-18  3:27   ` Andrew Morton
@ 2003-03-18  5:59     ` Neil Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Neil Brown @ 2003-03-18  5:59 UTC
  To: Andrew Morton; +Cc: gilbertd, linux-kernel, ext3-users

On Monday March 17, akpm@digeo.com wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> >
> > These two symptoms strongly suggest a buffer aliasing problem.
> > i.e. you have two buffers (one for data and one for metadata)
> > that refer to the same location on disc.
> > One is part of a file that was recently deleted, but the buffer hasn't
> > been flushed yet.  The other is part of a new directory.
> > The old buffer and the new buffer both get written to disc at much the
> > same time (hence the "multiple 1 requests"), but the old buffer hits
> > the disc second and so corrupts the filesystem.
> 
> This aliasing can happen very easily with direct-io, and it is something
> which drivers should be able to cope with.
> 
> I hope RAID is not still assuming that all requests are unique in this way?

No.  RAID copes.  If raid5 sees a write request for a block that it
already has a pending write request for, it will print a warning and
delay the second until the first completes.
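
A minimal sketch of that behaviour, assuming a simple per-sector table
of pending writes. The ToyRaid5 class and its methods are hypothetical
and do not mirror the actual raid5 code; the warning text is copied from
the log above only to show where such a message would be emitted.

```python
from collections import deque

# Toy model: one write may be in flight per sector; a second write to the
# same sector triggers a warning and waits until the first completes.
# Hypothetical structure, not the real raid5 implementation.

class ToyRaid5:
    def __init__(self):
        self.pending = {}    # sector -> data of the write currently in flight
        self.delayed = {}    # sector -> deque of writes waiting their turn
        self.disk = {}       # sector -> data that has reached the disc
        self.warnings = []   # messages that would go to the kernel log

    def submit_write(self, sector, data):
        if sector in self.pending:
            # A write for this sector is already pending: warn and delay.
            self.warnings.append(f"raid5: multiple 1 requests for sector {sector}")
            self.delayed.setdefault(sector, deque()).append(data)
            return
        self.pending[sector] = data

    def complete(self, sector):
        # The in-flight write reaches the disc.
        self.disk[sector] = self.pending.pop(sector)
        waiting = self.delayed.get(sector)
        if waiting:
            # Start the next delayed write for this sector.
            self.pending[sector] = waiting.popleft()
            if not waiting:
                del self.delayed[sector]


md = ToyRaid5()
md.submit_write(100, "first")
md.submit_write(100, "second")   # warns and is delayed
md.complete(100)                 # "first" lands, "second" becomes pending
after_first = md.disk[100]
md.complete(100)                 # "second" lands
after_second = md.disk[100]
```

The writes are serialised in submission order, so in this model the RAID
layer preserves whatever order the upper layers chose, right or wrong.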

In the case in question I don't think raid5 is contributing to the
problem.  It is just providing extra information which might help point
towards the problem - i.e. it is confirming that some sort of aliasing
is happening.

NeilBrown

* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
  2003-03-18  1:01 ` Neil Brown
  2003-03-18  3:27   ` Andrew Morton
@ 2003-03-18 14:04   ` Dave Gilbert (Home)
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Gilbert (Home) @ 2003-03-18 14:04 UTC
  To: Neil Brown; +Cc: linux-kernel, ext3-users

Neil Brown wrote:

> The bug I found was specific to data=journal mode, and this certainly
> has more options for buffer aliasing.  Were you using data=journal?

No.

Dave


