* 2.6.0: ext3 journal aborted with ext3/lvm/raid5
@ 2004-01-07 20:53 Steven Ihde
2004-01-07 21:08 ` Mike Fedyk
2004-02-18 14:55 ` Leandro Guimarães Faria Corsetti Dutra
0 siblings, 2 replies; 9+ messages in thread
From: Steven Ihde @ 2004-01-07 20:53 UTC (permalink / raw)
To: linux-raid
I get "Aborting journal on device dm-x" and associated errors about
every week or two. This is on a 2.6.0 kernel, ext3 filesystem running
on lvm over md raid5. This has been happening since at least some
point in the 2.6.0-test series.
The "Aborting journal" is always preceded by some type of
"ext3_readdir: bad entry" error. At the end of this mail I have
listed a selection of entries from my syslog showing the relevant
occurrences. This normally happens around 6:25am when the locate
database is being updated, though it has occasionally happened at
other times when the partition was being accessed.
The filesystem in question is my /usr partition. When I take the
system down to run fsck, it has always been able to recover (sometimes
with modifications, sometimes without). I have rebuilt the filesystem
at least twice (by running mke2fs to create a fresh fs and copying in
everything from an archive) and the problem has persisted. I have
also tried running my /usr partition on a plain disk partition (no lvm
or raid5), and that went two weeks with no errors. I have several
other partitions on the same raid5 array and I only get problems with
one of them.
So, it seems the problem could be:
1. a hardware problem in a certain area of one of my disks (but there
has never been a hardware error in the logs)
2. a problem with lvm
3. a problem with raid5
4. a problem with ext3
Any suggestions as to whether raid5 might be the source of the
problem, or how to rule it out? I can provide more information or try
out patches if necessary.
Thanks for any suggestions,
Steve
More info about my system:
Debian unstable, 2.6.0 kernel
Abit IS-7 (with Intel i865PE chipset) motherboard
Two 80G PATA Seagate drives hooked to separate channels on the
on-board IDE controller
One 120G SATA Seagate drive hooked to the onboard SATA controller
One three-disk raid0 array on the first 1G of each disk
One three-disk raid5 array on the remaining 79G of each disk
(all the errors have been on the /usr partition, which is on the
raid5 array)
Here are some syslog entries (dm-1 and dm-2 are referring to the same
logical volume; I changed my configuration slightly at some point
which caused the numbering to change):
Nov 26 06:25:31 hamachi kernel: EXT3-fs error (device dm-1): ext3_readdir: bad entry in directory #268586: rec_len %% 4 != 0 - offset=0, inode=298189867, rec_len=63517, name_len=32
Nov 26 06:25:31 hamachi kernel: Aborting journal on device dm-1.
Nov 26 06:25:32 hamachi kernel: ext3_abort called.
Nov 26 06:25:32 hamachi kernel: EXT3-fs abort (device dm-1): ext3_journal_start: Detected aborted journal
Nov 26 06:25:32 hamachi kernel: Remounting filesystem read-only
Dec 5 06:25:53 hamachi kernel: EXT3-fs error (device dm-1): ext3_readdir: bad entry in directory #72235: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Dec 5 06:25:53 hamachi kernel: Aborting journal on device dm-1.
Dec 5 06:25:54 hamachi kernel: ext3_abort called.
Dec 5 06:25:54 hamachi kernel: EXT3-fs abort (device dm-1): ext3_journal_start: Detected aborted journal
Dec 5 06:25:54 hamachi kernel: Remounting filesystem read-only
Dec 14 20:00:12 hamachi kernel: EXT3-fs error (device dm-1): ext3_readdir: bad entry in directory #229414: directory entry across blocks - offset=0, inode=0, rec_len=61476, name_len=157
Dec 14 20:00:12 hamachi kernel: Aborting journal on device dm-1.
Dec 14 20:00:12 hamachi kernel: ext3_abort called.
Dec 14 20:00:12 hamachi kernel: EXT3-fs abort (device dm-1): ext3_journal_start: Detected aborted journal
Dec 14 20:00:12 hamachi kernel: Remounting filesystem read-only
Dec 14 20:00:14 hamachi kernel: EXT3-fs error (device dm-1): ext3_readdir: bad entry in directory #229414: directory entry across blocks - offset=0, inode=0, rec_len=61476, name_len=157
Jan 2 22:54:12 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #182492: inode out of bounds - offset=0, inode=50462976, rec_len=1284, name_len=6
Jan 2 22:54:12 hamachi kernel: Aborting journal on device dm-2.
Jan 2 22:54:12 hamachi kernel: ext3_abort called.
Jan 2 22:54:12 hamachi kernel: EXT3-fs abort (device dm-2): ext3_journal_start: Detected aborted journal
Jan 2 22:54:12 hamachi kernel: Remounting filesystem read-only
Jan 2 22:54:13 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #184711: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 2 22:54:13 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #197622: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 2 22:54:13 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #295097: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 2 22:54:13 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #196845: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 2 22:54:13 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #197149: inode out of bounds - offset=0, inode=50462976, rec_len=1284, name_len=6
Jan 2 22:54:14 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #184553: inode out of bounds - offset=0, inode=50462976, rec_len=1284, name_len=6
Jan 2 22:54:14 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #197435: inode out of bounds - offset=0, inode=50462976, rec_len=1284, name_len=6
Jan 2 22:54:24 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #249397: directory entry across blocks - offset=0, inode=1886220131, rec_len=25964, name_len=116
Jan 2 22:54:26 hamachi last message repeated 11 times
Jan 7 06:25:19 hamachi kernel: EXT3-fs error (device dm-2): ext3_readdir: bad entry in directory #280327: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jan 7 06:25:19 hamachi kernel: Aborting journal on device dm-2.
Jan 7 06:25:19 hamachi kernel: ext3_abort called.
Jan 7 06:25:19 hamachi kernel: EXT3-fs abort (device dm-2): ext3_journal_start: Detected aborted journal
Jan 7 06:25:19 hamachi kernel: Remounting filesystem read-only
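One way to read the numbers in these messages, offered as a sketch rather than a diagnosis: ext3 stores each directory entry on disk as a little-endian 32-bit inode, a 16-bit rec_len and an 8-bit name_len (followed by a file-type byte and the name). Packing the logged values back into that layout recovers the first seven bytes of the block that ext3_readdir saw at offset 0. The helper name first_bytes is hypothetical; the input values are copied from the log above.

```python
import struct

def first_bytes(inode, rec_len, name_len):
    """Rebuild the first 7 on-disk bytes of a directory block from logged fields.

    Assumes the ext3 on-disk entry layout: le32 inode, le16 rec_len, u8 name_len.
    """
    return struct.pack("<IHB", inode, rec_len, name_len)

# (inode, rec_len, name_len) triples copied from the syslog entries above.
logged = [
    (298189867, 63517, 32),
    (0, 0, 0),
    (0, 61476, 157),
    (50462976, 1284, 6),
    (1886220131, 25964, 116),
]

for fields in logged:
    raw = first_bytes(*fields)
    print(fields, raw.hex(), repr(raw))
```

Under that assumption, inode=50462976, rec_len=1284, name_len=6 corresponds to the byte run 00 01 02 03 04 05 06, and inode=1886220131, rec_len=25964, name_len=116 corresponds to the ASCII text "complet". That would suggest the blocks held data that was never a directory entry, rather than a directory entry with a few flipped bits, though it does not by itself say which layer returned the wrong data.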
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-01-07 20:53 2.6.0: ext3 journal aborted with ext3/lvm/raid5 Steven Ihde
@ 2004-01-07 21:08 ` Mike Fedyk
2004-01-07 21:31 ` Steven Ihde
2004-02-18 14:55 ` Leandro Guimarães Faria Corsetti Dutra
1 sibling, 1 reply; 9+ messages in thread
From: Mike Fedyk @ 2004-01-07 21:08 UTC (permalink / raw)
To: Steven Ihde; +Cc: linux-raid
On Wed, Jan 07, 2004 at 12:53:26PM -0800, Steven Ihde wrote:
>
> I get "Aborting journal on device dm-x" and associated errors about
> every week or two. This is on a 2.6.0 kernel, ext3 filesystem running
> on lvm over md raid5. This has been happening since at least some
> point in the 2.6.0-test series.
Have you run fsck on this filesystem?
Can you find the version of 2.6.0-test where this started?
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-01-07 21:08 ` Mike Fedyk
@ 2004-01-07 21:31 ` Steven Ihde
2004-01-07 22:42 ` Mike Fedyk
0 siblings, 1 reply; 9+ messages in thread
From: Steven Ihde @ 2004-01-07 21:31 UTC (permalink / raw)
To: linux-raid
On Wed, Jan 07, 2004 at 01:08:16PM -0800, Mike Fedyk wrote:
> On Wed, Jan 07, 2004 at 12:53:26PM -0800, Steven Ihde wrote:
> >
> > I get "Aborting journal on device dm-x" and associated errors about
> > every week or two. This is on a 2.6.0 kernel, ext3 filesystem running
> > on lvm over md raid5. This has been happening since at least some
> > point in the 2.6.0-test series.
>
> Have you run fsck on this filesystem?
I've fsck'ed it a number of times, because after the journal aborts
the fs is remounted read-only but unclean. So I take the system down,
run fsck, fsck works (and a subsequent e2fsck -nf is clean), and then
I bring the system back up. On the chance that there is a lingering
inconsistency in the filesystem I have rebuilt it from scratch twice
(by making a new, empty ext3 fs on the device, and copying in all the
files from a backup on another fs) and the problem persists. So I
don't think the fs data itself is the problem.
>
> Can you find the version of 2.6.0-test where this started?
This is a relatively new raid5 array -- I've been having this problem
ever since I created the array, which was sometime after I switched to
the 2.6.0-test series. So I'm not claiming there is (or isn't) a
version of 2.6.0-test that works -- only that 2.6.0-test* and 2.6.0
are the only versions I've tried, and that's where I've seen the
problem.
-Steve
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-01-07 21:31 ` Steven Ihde
@ 2004-01-07 22:42 ` Mike Fedyk
2004-01-07 23:14 ` Steven Ihde
0 siblings, 1 reply; 9+ messages in thread
From: Mike Fedyk @ 2004-01-07 22:42 UTC (permalink / raw)
To: Steven Ihde; +Cc: linux-raid
On Wed, Jan 07, 2004 at 01:31:11PM -0800, Steven Ihde wrote:
> On Wed, Jan 07, 2004 at 01:08:16PM -0800, Mike Fedyk wrote:
> > Can you find the version of 2.6.0-test where this started?
>
> This is a relatively new raid5 array -- I've been having this problem
> ever since I created the array, which was sometime after I switched to
> the 2.6.0-test series. So I'm not claiming there is (or isn't) a
> version of 2.6.0-test that works -- only that 2.6.0-test* and 2.6.0
> are the only versions I've tried, and that's where I've seen the
> problem.
Then please verify it on a 2.4.24 kernel, or run some hardware tests.
o Memtest86 (running all tests with at least one pass; this can take 24
hours for one pass depending on how much memory you have...)
o burn{BX,MMX,etc}
Mike
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-01-07 22:42 ` Mike Fedyk
@ 2004-01-07 23:14 ` Steven Ihde
0 siblings, 0 replies; 9+ messages in thread
From: Steven Ihde @ 2004-01-07 23:14 UTC (permalink / raw)
To: linux-raid
On Wed, Jan 07, 2004 at 02:42:49PM -0800, Mike Fedyk wrote:
> On Wed, Jan 07, 2004 at 01:31:11PM -0800, Steven Ihde wrote:
> > On Wed, Jan 07, 2004 at 01:08:16PM -0800, Mike Fedyk wrote:
> > > Can you find the version of 2.6.0-test where this started?
> >
> > This is a relatively new raid5 array -- I've been having this problem
> > ever since I created the array, which was sometime after I switched to
> > the 2.6.0-test series. So I'm not claiming there is (or isn't) a
> > version of 2.6.0-test that works -- only that 2.6.0-test* and 2.6.0
> > are the only versions I've tried, and that's where I've seen the
> > problem.
>
> Then please verify it on a 2.4.24 kernel, or run some hardware tests.
>
> o Memtest86 (running all tests with at least one pass; this can take 24
> hours for one pass depending on how much memory you have...)
> o burn{BX,MMX,etc}
I've run the full memtest86 within the past couple months and it came
out OK (I bought a bad stick of RAM at some point and wanted to be
sure the replacement was good).
If no one has any suggestions other than to try 2.4.24, I'll give it a
try and report back in a couple weeks (it'll take that long to be sure
since this problem sometimes doesn't occur for 10-15 days at a time).
Thanks,
Steve
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-01-07 20:53 2.6.0: ext3 journal aborted with ext3/lvm/raid5 Steven Ihde
2004-01-07 21:08 ` Mike Fedyk
@ 2004-02-18 14:55 ` Leandro Guimarães Faria Corsetti Dutra
2004-02-19 21:08 ` Steven Ihde
1 sibling, 1 reply; 9+ messages in thread
From: Leandro Guimarães Faria Corsetti Dutra @ 2004-02-18 14:55 UTC (permalink / raw)
To: linux-raid
On Wed, 07 Jan 2004 12:53:26 -0800, Steven Ihde wrote:
> I get "Aborting journal on device dm-x" and associated errors about every
> week or two. This is on a 2.6.0 kernel, ext3 filesystem running on lvm
> over md raid5.
I've found a similar situation; has any fix come forth?
--
Leandro Guimarães Faria Corsetti Dutra +55 (11) 5685 2219
Av Sgto Geraldo Santana, 1100 6/71 +55 (11) 5686 9607
04.674-000 São Paulo, SP BRASIL
http://br.geocities.com./lgcdutra/
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-02-18 14:55 ` Leandro Guimarães Faria Corsetti Dutra
@ 2004-02-19 21:08 ` Steven Ihde
2004-02-19 22:01 ` Leandro Guimarães Faria Corcete Dutra
2004-02-21 11:45 ` Luca Berra
0 siblings, 2 replies; 9+ messages in thread
From: Steven Ihde @ 2004-02-19 21:08 UTC (permalink / raw)
To: Leandro Guimarães Faria Corsetti Dutra; +Cc: linux-raid
On Wed, Feb 18, 2004 at 11:55:50AM -0300, Leandro Guimarães Faria Corsetti Dutra wrote:
> On Wed, 07 Jan 2004 12:53:26 -0800, Steven Ihde wrote:
>
> > I get "Aborting journal on device dm-x" and associated errors about every
> > week or two. This is on a 2.6.0 kernel, ext3 filesystem running on lvm
> > over md raid5.
>
> I've found a similar situation; has any fix come forth?
Funny you should bring it up now. When I posted back in January it
was suggested that I verify my hardware/memory was OK and also try to
reproduce the bug on 2.4.24. So, I ran burnBX/burnMMX for a couple
days, turned up no problems, did memtest86 overnight, also no
problems, and then I switched to 2.4.24.
I ran 2.4.24 (plus dm-1.00.07 for lvm2) for about 25 days and had no
problems. This seemed like reasonable evidence the problem doesn't
exist (or takes longer to show up) in 2.4.24. So then I switched to
2.6.2 (that being the latest 2.6 kernel). After about six days uptime
on 2.6.2, I started seeing the problem again.
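As a rough check on how much a clean run tells you: if the aborts are assumed to arrive at random with a fixed average interval (a Poisson model, which is an assumption made here, not something measured), the chance of seeing no abort in a window of t days is exp(-t / mean_interval). The function name and the 10- and 14-day mean intervals below are illustrative, taken loosely from "every week or two".

```python
import math

def p_no_abort(days, mean_interval_days):
    """Probability of zero events in `days` under an assumed Poisson model."""
    return math.exp(-days / mean_interval_days)

# Windows: 6 days (2.6.2 before failure), 14 days (plain partition), 25 days (2.4.24).
for mean in (10, 14):
    for window in (6, 14, 25):
        print(f"mean interval {mean}d, clean for {window}d: "
              f"{p_no_abort(window, mean):.2f}")
```

With a 10-day mean interval, 25 clean days would happen by chance roughly 8% of the time, and two clean weeks roughly 25% of the time. On those assumptions the 2.4.24 run is suggestive rather than conclusive, and the earlier two-week test on a plain partition is weaker still.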
Can anyone suggest what the next step is?
Here are the latest entries from my syslog (this is 2.6.2). As it
usually does, this occurred early in the morning while the locatedb
was being updated.
Feb 14 06:12:44 hamachi kernel: EXT3-fs error (device dm-4): ext3_readdir: bad entry in directory #5160995: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Feb 14 06:12:44 hamachi kernel: Aborting journal on device dm-4.
Feb 14 06:12:45 hamachi kernel: ext3_abort called.
Feb 14 06:12:45 hamachi kernel: EXT3-fs abort (device dm-4): ext3_journal_start: Detected aborted journal
Feb 14 06:12:45 hamachi kernel: Remounting filesystem read-only
Feb 14 06:13:56 hamachi kernel: EXT3-fs error (device dm-4) in start_transaction: Journal has aborted
Feb 14 06:13:57 hamachi last message repeated 741 times
Thanks,
Steve
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-02-19 21:08 ` Steven Ihde
@ 2004-02-19 22:01 ` Leandro Guimarães Faria Corcete Dutra
2004-02-21 11:45 ` Luca Berra
1 sibling, 0 replies; 9+ messages in thread
From: Leandro Guimarães Faria Corcete Dutra @ 2004-02-19 22:01 UTC (permalink / raw)
To: Steven Ihde; +Cc: linux-raid
On Thu, 2004-02-19 at 18:08, Steven Ihde wrote:
> On Wed, Feb 18, 2004 at 11:55:50AM -0300, Leandro Guimarães Faria Corsetti Dutra wrote:
> > On Wed, 07 Jan 2004 12:53:26 -0800, Steven Ihde wrote:
> > > I get "Aborting journal on device dm-x" and associated errors about every
> > > week or two. This is on a 2.6.0 kernel, ext3 filesystem running on lvm
> > > over md raid5.
> >
> > I've found a similar situation; has any fix come forth?
>
> Can anyone suggest what the next step is?
I'd suggest a bug report... I was thinking about upgrading to 2.6.3 and
following the procedures to generate a PROBLEM-format bug report. Only
I'm lazy and usually wait for the Debian packages to upgrade...
But I guess we already have enough information and thus this would be
unnecessary, and all we need now is a sufficiently knowledgeable hacker to
take an interest in our plight.
> Here are the latest entries from my syslog (this is 2.6.2). As it
> usually does, this occurred early in the morning while the locatedb
> was being updated.
I've Googled around, and on other lists (like ext3-users and
linux-kernel) people usually see this at the same hour each time,
usually linked to some scheduled (and presumably I/O-heavy) event.
Mine occurred around 18h, same as one of the reporters I found.
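The time-of-day pattern can be checked against the log excerpts posted in this thread. The sketch below tallies the "Aborting journal" lines by hour; the regular expression and the function name aborts_by_hour are assumptions about the classic syslog line format shown in this thread, not part of any existing tool.

```python
import re
from collections import Counter

# Matches e.g. "Nov 26 06:25:31 hamachi kernel: Aborting journal on device dm-1."
ABORT_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"Aborting journal on device (\S+?)\.?$"
)

def aborts_by_hour(lines):
    """Count journal aborts per hour of day ("00".."23") in syslog lines."""
    hours = Counter()
    for line in lines:
        m = ABORT_RE.match(line.strip())
        if m:
            hours[m.group(3)] += 1
    return hours

# "Aborting journal" lines copied from the messages in this thread.
sample = [
    "Nov 26 06:25:31 hamachi kernel: Aborting journal on device dm-1.",
    "Dec 5 06:25:53 hamachi kernel: Aborting journal on device dm-1.",
    "Dec 14 20:00:12 hamachi kernel: Aborting journal on device dm-1.",
    "Jan 2 22:54:12 hamachi kernel: Aborting journal on device dm-2.",
    "Jan 7 06:25:19 hamachi kernel: Aborting journal on device dm-2.",
    "Feb 14 06:12:44 hamachi kernel: Aborting journal on device dm-4.",
]

for hour, count in sorted(aborts_by_hour(sample).items()):
    print(f"{hour}:00  {count}")
```

Four of the six aborts posted here fall in the 06:00 hour, which matches the locate database update mentioned earlier in the thread.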
--
Leandro Guimarães Faria Corcete Dutra
Maringá, PR, BRASIL +55 (11) 5685 2219
http://br.geocities.com./lgcdutra/ +55 (44) 3028 7467
Soli Deo Gloria! +55 (44) 3028 8335
* Re: 2.6.0: ext3 journal aborted with ext3/lvm/raid5
2004-02-19 21:08 ` Steven Ihde
2004-02-19 22:01 ` Leandro Guimarães Faria Corcete Dutra
@ 2004-02-21 11:45 ` Luca Berra
1 sibling, 0 replies; 9+ messages in thread
From: Luca Berra @ 2004-02-21 11:45 UTC (permalink / raw)
To: linux-raid; +Cc: linux-lvm
On Thu, Feb 19, 2004 at 01:08:05PM -0800, Steven Ihde wrote:
>On Wed, Feb 18, 2004 at 11:55:50AM -0300, Leandro Guimarães Faria Corsetti Dutra wrote:
>> On Wed, 07 Jan 2004 12:53:26 -0800, Steven Ihde wrote:
>>
>> > I get "Aborting journal on device dm-x" and associated errors about every
>> > week or two. This is on a 2.6.0 kernel, ext3 filesystem running on lvm
>> > over md raid5.
>>
>> I've found a similar situation; has any fix come forth?
>
>Funny you should bring it up now. When I posted back in January it
>was suggested that I verify my hardware/memory was OK and also try to
>reproduce the bug on 2.4.24. So, I ran burnBX/burnMMX for a couple
>days, turned up no problems, did memtest86 overnight, also no
>problems, and then I switched to 2.4.24.
I have seen this as well: no problems on 2.4 + md (raid5) + lvm2, but
it happens about every night on 2.6.2.
>I ran 2.4.24 (plus dm-1.00.07 for lvm2) for about 25 days and had no
>problems. This seemed like reasonable evidence the problem doesn't
>exist (or takes longer to show up) in 2.4.24. So then I switched to
>2.6.2 (that being the latest 2.6 kernel). After about six days uptime
>on 2.6.2, I started seeing the problem again.
>
>Can anyone suggest what the next step is?
I believe it is another stacking problem, since there is no problem when
using only one of md and lvm2, but I have no idea how to get
more details on this. I am also an ext3-only user, so I have no idea if
other filesystems are affected.
>Here are the latest entries from my syslog (this is 2.6.2). As it
>usually does, this occurred early in the morning while the locatedb
>was being updated.
>
>Feb 14 06:12:44 hamachi kernel: EXT3-fs error (device dm-4): ext3_readdir: bad entry in directory #5160995: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
>Feb 14 06:12:44 hamachi kernel: Aborting journal on device dm-4.
>Feb 14 06:12:45 hamachi kernel: ext3_abort called.
>Feb 14 06:12:45 hamachi kernel: EXT3-fs abort (device dm-4): ext3_journal_start: Detected aborted journal
>Feb 14 06:12:45 hamachi kernel: Remounting filesystem read-only
>Feb 14 06:13:56 hamachi kernel: EXT3-fs error (device dm-4) in start_transaction: Journal has aborted
>Feb 14 06:13:57 hamachi last message repeated 741 times
Regards,
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/