* Kernel panic from corrupt journal
@ 2012-08-30 9:22 Brian Candler
2012-08-31 5:53 ` Theodore Ts'o
0 siblings, 1 reply; 3+ messages in thread
From: Brian Candler @ 2012-08-30 9:22 UTC (permalink / raw)
To: linux-ext4
I have a system where just replaying the journal causes a kernel panic. If I
boot into recovery mode and then type
# fsck -y /dev/sda8
it says it's recovering the journal, then a second or two later I get a
panic traceback. Unfortunately there are only 24 lines displayed on the
screen; my scribbled notes give the top and bottom ones as
req_bios_endio.isra.45+0xa3/0xe0
...
start_secondary+0xd8/0xdb
I can get a screenshot of this if it's useful to anyone.
With "debugfs /dev/sda8", then:
logdump /tmp/sda8.dmp -> this works OK, writes out a list of blocks
logdump -ac /tmp/sda8.dmp -> this also causes a kernel panic!
So:
(1) the fact that I can cause a kernel panic is a bug, and if I can help fix
it I will; however I'm not sure how I can pass on any useful information
given that even dumping the journal causes a kernel panic. Can I get the
journal by dd'ing at a specific offset?
(2) I'd also like to be able to recover this filesystem, e.g. by clearing
the journal, but I haven't been able to find out how to do this.
The best I can find by googling is to try mounting with ro,noload. I'll give
this a go to see if I can backup the filesystem, but otherwise it looks like
I may have to reformat the partition and restore.
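
[Editorial sketch, not part of the original message.] On the question of dd'ing the journal from a fixed offset: an internal ext3/ext4 journal normally lives in a hidden inode (usually inode 8) whose number is stored in the superblock, so its blocks are not guaranteed to sit at one well-known offset. The sketch below only reads the two superblock fields needed to start looking. It assumes the standard on-disk layout (superblock at byte 1024, s_log_block_size at 0x18, s_magic at 0x38, s_journal_inum at 0xE0), and the function name read_journal_info is hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the partition
EXT_MAGIC = 0xEF53        # s_magic value shared by ext2/ext3/ext4


def read_journal_info(image):
    """Return (block_size, journal_inode) from a seekable binary file object.

    `image` is assumed to be a partition or an image of one, opened in
    binary mode. Only the primary superblock is consulted.
    """
    image.seek(SUPERBLOCK_OFFSET)
    sb = image.read(1024)
    if len(sb) < 1024:
        raise ValueError("short read: no complete superblock")

    # All superblock fields are little-endian.
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")

    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    (journal_inum,) = struct.unpack_from("<I", sb, 0xE0)

    # Block size is stored as a shift relative to 1024 bytes.
    return 1024 << log_block_size, journal_inum
```

Mapping that inode to actual block numbers still means walking the inode table and the journal inode's extent or block map, which this sketch deliberately leaves out.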
Background info: this system is a Dell Zino HD running Ubuntu 12.04 (fully
patched as of 29 Aug 2012, standard 3.2.0-xx kernel). My wife accidentally
chose "suspend" rather than "shutdown" to turn it off yesterday, and it
failed to boot this morning.
Regards,
Brian.
* Re: Kernel panic from corrupt journal
2012-08-30 9:22 Kernel panic from corrupt journal Brian Candler
@ 2012-08-31 5:53 ` Theodore Ts'o
[not found] ` <20120831085307.GC17438@nsrc.org>
0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2012-08-31 5:53 UTC (permalink / raw)
To: Brian Candler; +Cc: linux-ext4
On Thu, Aug 30, 2012 at 10:22:12AM +0100, Brian Candler wrote:
> I have a system where just replaying the journal causes a kernel panic. If I
> boot into recovery mode and then type
>
> With "debugfs /dev/sda8", then:
>
> logdump /tmp/sda8.dmp -> this works OK, writes out a list of blocks
> logdump -ac /tmp/sda8.dmp -> this also causes a kernel panic!
I'm going to guess that there is some kind of hardware failure which
is returning an I/O error that the device driver can't handle, and so
you're getting a kernel panic. That's because if you get it while
accessing the data blocks on the disk which contains the journal from
a userspace program such as debugfs, there is no ext4 kernel code
which is involved.
Taking a screen shot with the scratch information is useful, but it
will almost certainly (especially when you are running the journal via
e2fsck or debugfs) show a kernel stack trace that doesn't involve the
ext4 code paths at all.
> (1) the fact that I can cause a kernel panic is a bug, and if I can help fix
> it I will; however I'm not sure how I can pass on any useful information
> given that even dumping the journal causes a kernel panic. Can I get the
> journal by dd'ing at a specific offset?
>
> (2) I'd also like to be able to recover this filesystem, e.g. by clearing
> the journal, but I haven't been able to find out how to do this.
What kind of device is /dev/sda? You might want to try removing the
disk and connecting it to another computer, and then copying the
entire disk image to a new disk using dd_rescue (which will create a
block-by-block copy of the disk image). You could try copying the
disk using dd_rescue on your current computer, but it will almost
certainly crash as well. It seems likely that the mere act of trying
to read the data block in question is causing an I/O error which is
causing the device driver to crash.
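
[Editorial sketch, not part of the original message.] To illustrate what a block-by-block rescue copy means, here is a minimal Python sketch of the idea: read the source one block at a time, and when a read fails, record the offset, write zeros in its place, and carry on. This is not how dd_rescue itself is implemented, and the name rescue_copy and its parameters are hypothetical. It only helps when the failure reaches userspace as an ordinary I/O error; it cannot do anything if the driver panics the kernel, as described above.

```python
def rescue_copy(src, dst, total_size, block_size=4096):
    """Copy total_size bytes from src to dst, one block at a time.

    src and dst are seekable binary file objects. Returns the list of
    byte offsets whose blocks could not be read.
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Unreadable block: remember it and fall through to zero-fill.
            bad_offsets.append(offset)
            data = b""
        # Pad failed or short reads with zeros so later blocks keep
        # the same offsets in the copy as on the original.
        data = data.ljust(want, b"\0")
        dst.seek(offset)
        dst.write(data)
    return bad_offsets
```

Keeping the offsets aligned matters because filesystem metadata refers to blocks by position, so the copy stays usable by e2fsck even with holes in it.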
Regards,
- Ted
* Re: Kernel panic from corrupt journal
[not found] ` <20120831085307.GC17438@nsrc.org>
@ 2012-08-31 17:46 ` Theodore Ts'o
0 siblings, 0 replies; 3+ messages in thread
From: Theodore Ts'o @ 2012-08-31 17:46 UTC (permalink / raw)
To: Brian Candler; +Cc: linux-ext4
On Fri, Aug 31, 2012 at 09:53:07AM +0100, Brian Candler wrote:
>
> You're quite right: yesterday I did see some I/O errors after I had mounted
> the filesystem using -o ro,noload.
>
> So this morning I ran
>
> dd if=/dev/sda8 of=/dev/null bs=1024k
>
> and it completed without a problem. And then I found I was able to mount the
> filesystem just fine!
>
> So this is definitely a hardware problem; it's just I didn't realise I/O
> errors could cause kernel panics as well as EIO.
Well, it's not *supposed* to cause kernel panics. If you can get a
stack trace in the future under similar circumstances, definitely
capture it (using a digital camera if you don't have a better way,
such as a network console or a serial console). Even if it's not an
ext4 bug, I'm happy to try to route the bug report to the
appropriate kernel developer or mailing list.
> I am currently refreshing my most recent backup of this drive, and I'll
> replace it ASAP.
The drive *might* be OK at this point. If you are willing to run a
full read/write test on the drive, and it shows no problem, it might
be worth trying to put it back in production (especially if you are
keeping regular backups); if it fails a second time, then it's
definitely time to replace it. It's really a question of how much the
cost of a new drive is worth compared to your time and the value of
your data in case of a second failure.
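
[Editorial sketch, not part of the original message.] A full read/write test boils down to writing a known pattern over the whole device, reading it back, and comparing. The Python sketch below shows that loop under simplifying assumptions: the name write_verify is hypothetical, it destroys whatever is stored at the given path, and it does not bypass the page cache, so on a real disk the read-back could be served from memory rather than from the platters. A dedicated tool such as badblocks is the better choice for a real drive.

```python
import os


def write_verify(path, size, block_size=1 << 20, pattern=0xAA):
    """DESTRUCTIVE: fill the first `size` bytes of `path` with a pattern,
    read them back, and return the offsets of blocks that do not match.
    """
    block = bytes([pattern]) * block_size
    bad_offsets = []
    with open(path, "r+b") as f:
        # Pass 1: write the pattern everywhere.
        for offset in range(0, size, block_size):
            want = min(block_size, size - offset)
            f.seek(offset)
            f.write(block[:want])
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and the OS buffers

        # Pass 2: read back and compare block by block.
        for offset in range(0, size, block_size):
            want = min(block_size, size - offset)
            f.seek(offset)
            if f.read(want) != block[:want]:
                bad_offsets.append(offset)
    return bad_offsets
```

An empty result means every block read back as written; any offsets returned point at regions worth distrusting.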
Or maybe you could buy a second 500G drive, and set up software RAID
1 using the md device. This will give you protection if either of the
two drives fails, as well as giving you a speed boost for reads.
Cheers!
- Ted