* 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
From: Dr. David Alan Gilbert @ 2003-03-16 15:01 UTC (permalink / raw)
To: linux-kernel
Hi,
I've just built an 800GB RAID5 array and built an ext3 file system
on it; on trying to copy data off the 200GB RAID it is replacing I'm
starting to see errors of the form:
kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
system zone - block = 140509185
and
kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
rec_len=21587, name_len=76
and
kernel: raid5: multiple 1 requests for sector 281018464
This is on an x86 which has been running fine on the smaller raid for
years (albeit Reiser); the array is built from 5 200GB Western Digital
IDE drives on a mix of Promise and HPT controllers (there are no IDE
errors visible). This is a straight 2.4.20 kernel.
The previous messages to the list with this form of error have suggested
the problem is related to >2TB arrays, but this one is a relatively
tiny one.
Help greatly appreciated,
Dave
---------------- Have a happy GNU millennium! ----------------------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
From: Neil Brown @ 2003-03-18 1:01 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: linux-kernel, ext3-users
On Sunday March 16, gilbertd@treblig.org wrote:
> Hi,
> I've just built an 800GB RAID5 array and built an ext3 file system
> on it; on trying to copy data off the 200GB RAID it is replacing I'm
> starting to see errors of the form:
>
> kernel: EXT3-fs error (device md(9,2)): ext3_new_block: Allocating block in
> system zone - block = 140509185
>
> and
>
> kernel: EXT3-fs error (device md(9,2)): ext3_add_entry: bad entry in
> directory #70254593: rec_len %% 4 != 0 - offset=28, inode=23880564,
> rec_len=21587, name_len=76
>
> and
>
> kernel: raid5: multiple 1 requests for sector 281018464
I had exactly these symptoms about a year ago in 2.4.18. I found and
fixed the problem, and have just checked that the fix is definitely in
2.4.20.
So if you really are running 2.4.20 then it looks like a similar bug
has appeared.
These two symptoms strongly suggest a buffer aliasing problem.
i.e. you have two buffers (one for data and one for metadata)
that refer to the same location on disc.
One is part of a file that was recently deleted, but the buffer hasn't
been flushed yet. The other is part of a new directory.
The old buffer and the new buffer both get written to disc at much the
same time (hence the "multiple 1 requests"), but the old buffer hits
the disc second and so corrupts the filesystem.
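The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: `Buffer`, `flush`, the block number 100 and the payload strings are all hypothetical names and values chosen for the illustration. It shows two dirty buffers that alias the same on-disc block, and how the one flushed second decides what ends up on disc.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical stand-in for a buffer: a block number plus its contents."""
    block: int          # on-disc block this buffer maps to
    payload: str        # what the buffer currently holds
    dirty: bool = True  # still needs writing out


def flush(disc: dict, buffers: list) -> None:
    """Write dirty buffers to the toy disc in list order.

    There is no aliasing check, so the last write to a block wins.
    """
    for buf in buffers:
        if buf.dirty:
            disc[buf.block] = buf.payload
            buf.dirty = False


disc = {}
# Two buffers that refer to the same location (block 100 is arbitrary).
stale = Buffer(block=100, payload="data of deleted file")
fresh = Buffer(block=100, payload="new directory entry")

# The new metadata reaches the disc first, the stale data buffer second.
flush(disc, [fresh, stale])
print(disc[100])  # data of deleted file
```

With the flush order reversed (`[stale, fresh]`) the directory entry would survive, which is why the corruption depends on timing.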
The bug I found was specific to data=journal mode, and this certainly
has more options for buffer aliasing. Were you using data=journal?
NeilBrown
* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
From: Andrew Morton @ 2003-03-18 3:27 UTC (permalink / raw)
To: Neil Brown; +Cc: gilbertd, linux-kernel, ext3-users
Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> These two symptoms strongly suggest a buffer aliasing problem.
> i.e. you have two buffers (one for data and one for metadata)
> that refer to the same location on disc.
> One is part of a file that was recently deleted, but the buffer hasn't
> been flushed yet. The other is part of a new directory.
> The old buffer and the new buffer both get written to disc at much the
> same time (hence the "multiple 1 requests"), but the old buffer hits
> the disc second and so corrupts the filesystem.
This aliasing can happen very easily with direct-io, and it is something
which drivers should be able to cope with.
I hope RAID is not still assuming that all requests are unique in this way?
* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
From: Neil Brown @ 2003-03-18 5:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: gilbertd, linux-kernel, ext3-users
On Monday March 17, akpm@digeo.com wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> >
> > These two symptoms strongly suggest a buffer aliasing problem.
> > i.e. you have two buffers (one for data and one for metadata)
> > that refer to the same location on disc.
> > One is part of a file that was recently deleted, but the buffer hasn't
> > been flushed yet. The other is part of a new directory.
> > The old buffer and the new buffer both get written to disc at much the
> > same time (hence the "multiple 1 requests"), but the old buffer hits
> > the disc second and so corrupts the filesystem.
>
> This aliasing can happen very easily with direct-io, and it is something
> which drivers should be able to cope with.
>
> I hope RAID is not still assuming that all requests are unique in this way?
No. RAID copes. If raid5 sees a write request for a block that it
already has a pending write request for, it will print a warning and
delay the second until the first completes.
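As a rough illustration of that behaviour, and not the actual raid5 code: the sketch below assumes a hypothetical `ToyWriteTracker` class that records one in-flight write per sector, logs a warning when a second write arrives for the same sector, and holds the second write back until the first completes. The warning text is only loosely modelled on the log line quoted in this thread.

```python
from collections import deque


class ToyWriteTracker:
    """Hypothetical model: one in-flight write per sector, later ones wait."""

    def __init__(self):
        self.pending = {}  # sector -> payload currently in flight
        self.delayed = {}  # sector -> deque of payloads waiting their turn
        self.disc = {}     # sector -> payload that has reached the toy disc
        self.log = []      # warnings emitted so far

    def submit_write(self, sector, payload):
        if sector in self.pending:
            # A write for this sector is already pending: warn and delay.
            self.log.append(f"multiple write requests for sector {sector}")
            self.delayed.setdefault(sector, deque()).append(payload)
        else:
            self.pending[sector] = payload

    def complete_write(self, sector):
        # The in-flight write lands on disc.
        self.disc[sector] = self.pending.pop(sector)
        # Promote the oldest delayed write for this sector, if any.
        waiting = self.delayed.get(sector)
        if waiting:
            self.pending[sector] = waiting.popleft()
            if not waiting:
                del self.delayed[sector]


tracker = ToyWriteTracker()
tracker.submit_write(281018464, "first")
tracker.submit_write(281018464, "second")  # logged and delayed
tracker.complete_write(281018464)          # "first" lands; "second" goes in flight
tracker.complete_write(281018464)          # "second" lands
```

The two writes are serialised in submission order, so the tracker itself loses nothing; if the second payload is the stale one, the damage was already decided by whoever submitted it.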
In the case in question I don't think raid5 is contributing to the
problem. It is just providing extra information which might help point
towards the problem - i.e. it is confirming that some sort of aliasing
is happening.
NeilBrown
* Re: 2.4.20: ext3/raid5 - allocating block in system zone/multiple 1 requests for sector
From: Dave Gilbert (Home) @ 2003-03-18 14:04 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel, ext3-users
Neil Brown wrote:
> The bug I found was specific to data=journal mode, and this certainly
> has more options for buffer aliasing. Were you using data=journal?
No.
Dave