* 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack?
@ 2004-09-02 18:15 Valdis.Kletnieks
2004-09-03 23:48 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Valdis.Kletnieks @ 2004-09-02 18:15 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]
I built 2.6.9-rc1-mm2 last night, and I've had this happen on 2 separate file systems today:
Sep 2 12:42:50 turing-police kernel: journal_bmap: journal block not found at offset 2316 on dm-6
Sep 2 12:42:52 turing-police kernel: Aborting journal on device dm-6.
Sep 2 12:42:52 turing-police kernel: ext3_abort called.
Sep 2 12:42:53 turing-police kernel: EXT3-fs error (device dm-6): ext3_journal_start: Detected aborted journal
Sep 2 12:42:53 turing-police kernel: Remounting filesystem read-only
Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
Sep 2 12:42:54 turing-police kernel: journal commit I/O error
Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
Sep 2 12:42:57 turing-police last message repeated 15 times
(This one was /home - I'd paste the other one, but it happened on /var so
it didn't get logged because /var/adm/messages went R/O..)
Of interest:
1) Didn't see this under 2.6.8-mm4.
2) Neither time had any actual disk I/O error messages...
3) I'm using ext3-on-LVM, if that matters...
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack?
2004-09-02 18:15 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack? Valdis.Kletnieks
@ 2004-09-03 23:48 ` Andrew Morton
2004-09-05 12:00 ` karl.vogel
2004-09-13 8:25 ` Valdis.Kletnieks
0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2004-09-03 23:48 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
Valdis.Kletnieks@vt.edu wrote:
>
> I built 2.6.9-rc1-mm2 last night, and I've had this happen on 2 separate file systems today:
>
> Sep 2 12:42:50 turing-police kernel: journal_bmap: journal block not found at offset 2316 on dm-6
> Sep 2 12:42:52 turing-police kernel: Aborting journal on device dm-6.
> Sep 2 12:42:52 turing-police kernel: ext3_abort called.
> Sep 2 12:42:53 turing-police kernel: EXT3-fs error (device dm-6): ext3_journal_start: Detected aborted journal
> Sep 2 12:42:53 turing-police kernel: Remounting filesystem read-only
> Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
> Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
> Sep 2 12:42:54 turing-police kernel: journal commit I/O error
> Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
> Sep 2 12:42:57 turing-police last message repeated 15 times
>
> (This one was /home - I'd paste the other one, but it happened on /var so
> it didn't get logged because /var/adm/messages went R/O..)
Probably caused by an I/O error (or data loss) performing metadata reads.
> Of interest:
>
> 1) Didn't see this under 2.6.8-mm4.
> 2) Neither time had any actual disk I/O error messages...
> 3) I'm using ext3-on-LVM, if that matters...
Let me guess: raid5?
We've had a steady stream of heisenbugreports for ext3-on-raid5
for several years.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack?
2004-09-03 23:48 ` Andrew Morton
@ 2004-09-05 12:00 ` karl.vogel
2004-09-13 8:25 ` Valdis.Kletnieks
1 sibling, 0 replies; 4+ messages in thread
From: karl.vogel @ 2004-09-05 12:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton <akpm@osdl.org> writes:
> Valdis.Kletnieks@vt.edu wrote:
>>
>> I built 2.6.9-rc1-mm2 last night, and I've had this happen on 2 separate file systems today:
>>
>> Sep 2 12:42:54 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
>
> Probably caused by an I/O error (or data loss) performing metadata reads.
>
>> 3) I'm using ext3-on-LVM, if that matters...
>
> Let me guess: raid5?
>
> We've had a steady stream of heisenbugreports for ext3-on-raid5
> for several years.
For the record, I just got a journal aborted on 2.6.9-rc1-mm3.
I'm also using ext3-on-LVM (no RAID stuff involved).
I'm afraid that I don't have more info.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack?
2004-09-03 23:48 ` Andrew Morton
2004-09-05 12:00 ` karl.vogel
@ 2004-09-13 8:25 ` Valdis.Kletnieks
1 sibling, 0 replies; 4+ messages in thread
From: Valdis.Kletnieks @ 2004-09-13 8:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]
On Fri, 03 Sep 2004 16:48:24 PDT, Andrew Morton said:
(Sorry for the delay in following up...)
> > Sep 2 12:42:50 turing-police kernel: journal_bmap: journal block not found
at offset 2316 on dm-6
> > Sep 2 12:42:52 turing-police kernel: Aborting journal on device dm-6.
> Probably caused by an I/O error (or data loss) performing metadata reads.
I'd be willing to buy that, except that it's been hitting me off-and-on and
there's NEVER a dmesg entry indicating an actual I/O problem. Also, I never
hit it against 2.6.8.1-mm4 - I've only seen it against -rc1-mm2 and -rc1-mm4
(didn't try mm1 or mm3). I'm always leery of silent "must be I/O error"
problems that show up when you upgrade the software.. ;)
Here's one against -mm4 last night:
Sep 12 03:00:03 turing-police kernel: audit(1094972403.380:0): avc: denied { search } for pid=14808 ex
e=/usr/bin/whereis name=man dev=dm-1 ino=18869 scontext=user_u:user_r:user_crond_t tcontext=system_u:obje
ct_r:man_t tclass=dir
Sep 12 03:37:04 turing-police kernel: journal_bmap: journal block not found at offset 12 on dm-6
Sep 12 03:37:04 turing-police kernel: Aborting journal on device dm-6.
Sep 12 03:37:04 turing-police kernel: __journal_remove_journal_head: freeing b_committed_data
Sep 12 03:37:04 turing-police last message repeated 4 times
Sep 12 03:37:07 turing-police kernel: ext3_abort called.
Sep 12 03:37:07 turing-police kernel: EXT3-fs error (device dm-6): ext3_journal_start: Detected aborted journal
Sep 12 03:37:07 turing-police kernel: Remounting filesystem read-only
Sep 12 03:37:07 turing-police kernel: EXT3-fs error (device dm-6) in start_transaction: Journal has aborted
Absolutely nothing out of the kernel for 37 minutes straight, till suddenly it
decides that /home has a problem (Yes, there *was* plenty activity on the disk
at the time - the usual 3AM cron-based disk thrashers, and I was trying to
catch up on lkml, hit 'commit' for a bunch of pending 'delete messages', and
things just sort of sat there because the file system just evaporated out from
under it).
> > Of interest:
> >
> > 1) Didn't see this under 2.6.8-mm4.
> > 2) Neither time had any actual disk I/O error messages...
> > 3) I'm using ext3-on-LVM, if that matters...
>
> Let me guess: raid5?
Nope. ext3, lvm, and a single laptop IDE disk under it all....
journal_bmap() appears to eventually end up down in ext3_get_branch(). Is it
possible that there's some weird locking/race issue where we end up calling
__bread() for the block, but get a bh that has wrong/bogus uptodate/dirty
flags? (this only seems to bite under high (for a laptop) I/O - while updatedb
or tripwire are running in background, or reading down my mail in the morning,
or doing a large compile/make, etc) Is it possible for the journal to wrap
around and we come back to an "old" page that *still* hasn't made it out to
disk due to LVM/elevator/etc issues (even though enough other I/O requests have
been retired to keep things happy?)
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-09-14 18:28 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-02 18:15 2.6.9-rc1-mm2 - 'journal block not found' - ext3 on crack? Valdis.Kletnieks
2004-09-03 23:48 ` Andrew Morton
2004-09-05 12:00 ` karl.vogel
2004-09-13 8:25 ` Valdis.Kletnieks
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox