Recovering from a damaged root inode

public inbox for linux-ext4@vger.kernel.org

* Recovering from a damaged root inode
@ 2014-08-26 11:32 Liwei
  2014-08-28 12:16 ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Liwei @ 2014-08-26 11:32 UTC
  To: linux-ext4

Hi list,
    I have an ext4 volume that went through a power failure,
corrupting both the first superblock and apparently at least the root
inode.
    fsck managed to restore a backup superblock, so I believe that is
fine, but I am unable to see any files when the volume is mounted.
    dmesg gives me the following:

        EXT4-fs error (device dm-2): htree_dirblock_to_tree:919: inode
#2: block 9249: comm ls: bad entry in directory: rec_len % 4 != 0 -
offset=0(0), inode=1848761160, rec_len=32290, name_len=62
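
The kernel message above comes from the sanity checks ext4 applies to each directory entry before using it. A minimal sketch of those checks follows, as an illustration rather than the kernel's actual code: the function names, the 4096-byte block size, the zero file_type byte and the `inodes_count` value are all assumptions, and unlike the kernel (which reports only the first failure) it collects every failed check.

```python
import struct

def min_rec_len(name_len):
    # 8-byte fixed header plus the name, rounded up to a multiple of 4.
    return (8 + name_len + 3) & ~3

def check_dir_entry(raw, offset, block_size, inodes_count):
    """Return the list of sanity checks that the entry at `offset` fails."""
    # On-disk layout: inode (le32), rec_len (le16), name_len (u8), file_type (u8).
    inode, rec_len, name_len, _file_type = struct.unpack_from("<IHBB", raw, offset)
    problems = []
    if rec_len < min_rec_len(1):
        problems.append("rec_len is smaller than minimal")
    if rec_len % 4 != 0:
        problems.append("rec_len % 4 != 0")
    if rec_len < min_rec_len(name_len):
        problems.append("rec_len is too small for name_len")
    if offset + rec_len > block_size:
        problems.append("directory entry overruns block")
    if inode > inodes_count:
        problems.append("inode out of bounds")
    return problems

# Header values taken from the dmesg line; everything else is hypothetical.
header = struct.pack("<IHBB", 1848761160, 32290, 62, 0)
block = header + bytes(4096 - len(header))
problems = check_dir_entry(block, 0, 4096, inodes_count=200_000_000)
print(problems)

# A well-formed entry (inode 2, rec_len 12, one-character name) for comparison.
good = struct.pack("<IHBB", 2, 12, 1, 2) + b".\0\0\0"
good_problems = check_dir_entry(good + bytes(4084), 0, 4096, inodes_count=200_000_000)
print(good_problems)
```

An entry whose inode number, rec_len and name_len all look arbitrary, as here, suggests the block does not hold directory data at all.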

    I thought a second fsck run would help, but running it with -n
gave me the following:

Resize inode not valid.  Recreate? no
Pass 1: Checking inodes, blocks, and sizes
Inode 1049 has imagic flag set.  Clear? no
Inode 1049 has a extra size (21138) which is invalid
Fix? no
Inode 1049 has a bad extended attribute block 470477004.  Clear? no
Extended attribute block 470477004 has h_blocks > 1.  Clear? no
Extended attribute block 470477004 is corrupt (invalid value).  Clear? no
Extended attribute block 470477004 is corrupt (invalid value).  Clear? no
Extended attribute block 470477004 is corrupt (allocation collision).  Clear? no
Error while reading over extent tree in inode 1049: Corrupt extent header
Clear inode? no
Inode 1049, i_size is 11664618411086678673, should be 0.  Fix? no
Inode 1049, i_blocks is 263842572654276, should be 1.  Fix? no
Inode 1050 is in use, but has dtime set.  Fix? no
Inode 1050 has a extra size (22079) which is invalid
Fix? no

--------snip--------

Block #1032 (1256803293) causes symlink to be too big.  IGNORED.
Block #1033 (18311953) causes symlink to be too big.  IGNORED.
Block #1034 (2250429016) causes symlink to be too big.  IGNORED.
Block #1035 (1392819776) causes symlink to be too big.  IGNORED.
Illegal indirect block (4041828270) in inode 1065.  IGNORED.
Illegal triple indirect block (3637063325) in inode 1065.  IGNORED.
Error while iterating over blocks in inode 1065: Illegal triply
indirect block found

Archive: ********** WARNING: Filesystem still has errors **********
e2fsck: aborted
Archive: ********** WARNING: Filesystem still has errors **********

    Which is a bad sign that something is majorly messed up.
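
One quick way to judge the block numbers in the output above is to compare them with the number of blocks the volume can hold. The sketch below assumes a 4096-byte block size and reads "12TB" as 12 TiB; neither is stated in the thread, and the real figures are in the "Block count" and "Block size" lines of `dumpe2fs -h`. As far as I understand, e2fsck calls a block "illegal" when its number falls outside the filesystem.

```python
BLOCK_SIZE = 4096        # assumption: the common default, not stated in the thread
FS_BYTES = 12 * 2**40    # assumption: "12TB" read as 12 TiB

blocks_count = FS_BYTES // BLOCK_SIZE

def in_range(blk, blocks_count=blocks_count, first_data_block=0):
    """True if `blk` could be a valid block number on this filesystem."""
    return first_data_block <= blk < blocks_count

# Block numbers copied from the fsck -n output above.
reported = {
    "xattr block, inode 1049": 470477004,
    "symlink block #1032": 1256803293,
    "symlink block #1034": 2250429016,
    "indirect block, inode 1065": 4041828270,
    "triple indirect block, inode 1065": 3637063325,
}

for label, blk in reported.items():
    verdict = "in range" if in_range(blk) else "beyond end of filesystem"
    print(f"{label}: {blk} ({verdict})")
```

Under these assumptions the two blocks reported as illegal lie past the end of the volume, while the others are within range but still implausible for the inodes that reference them.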

    It might also be relevant that a few days prior to this, I ran an
online resize from 10TB to 12TB. The volume had not been unmounted for
almost a year prior to that, nor in the days leading up to the power
failure.

    # uname -a
        Linux Archiver 3.9.0+ #2 SMP PREEMPT Mon Jun 17 21:25:29 SGT
2013 x86_64 GNU/Linux
    fsck from util-linux 2.20.1

    What would be the best way to proceed? The volume is sitting on
top of LVM, and I have already taken a snapshot of it.

Liwei

* Re: Recovering from a damaged root inode
  2014-08-26 11:32 Recovering from a damaged root inode Liwei
@ 2014-08-28 12:16 ` Theodore Ts'o
  2014-08-28 17:19   ` Liwei
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2014-08-28 12:16 UTC
  To: Liwei; +Cc: linux-ext4

On Tue, Aug 26, 2014 at 07:32:59PM +0800, Liwei wrote:
> 
>     I thought a second fsck run would help, but running it with -n
> gave me the following:

I take it you don't have the transcript from the first fsck run?

Also, you didn't tell us what version of e2fsprogs you are using.

Finally, this error is one that was caused by your using fsck -n:

> Illegal triple indirect block (3637063325) in inode 1065.  IGNORED.
> Error while iterating over blocks in inode 1065: Illegal triply
> indirect block found

There was an illegal indirect block in inode 1065, which wasn't fixed
because of e2fsck -n.  Unfortunately, this caused the scan to get
aborted, because the unfixed error caused the inode iterator to fail.
We could try to fix things up to make e2fsck -n recover more cleanly
in the face of errors caused by not fixing previously found errors,
but that hasn't been something that's been high priority.  (If someone
would like to improve e2fsck in this regard, please send patches.)

More generally, it looks like part of your inode table got smashed.
How, it's hard to say.  There have historically been some bugs with
resizing, but online resizing has been much safer than off-line
resizing with big file systems, and the problems tend to be with file
systems larger than 16TB.  (Although for file systems larger than 8TB,
I do strongly recommend that people update to the latest kernel and
e2fsprogs; and there have been a lot of bug fixes to e2fsprogs in the
past year and a half.  If you are using an enterprise distribution,
hopefully you're using one which has been good about backporting fixes
--- but 3.9.x hasn't been used by a distro kernel as far as I know,
and 3.9.x isn't even a long-term stable maintenance kernel.  So I'm
guessing this is a roll-your-own sort of system?)

	      	   		      	 - Ted
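
The 16TB figure mentioned above is not explained in the message. One plausible reading, offered here as an assumption, is that it marks the point where 32-bit block numbers run out with 4096-byte blocks, beyond which the filesystem needs 64-bit block numbers. The arithmetic, with a hypothetical helper name:

```python
TIB = 2**40

def max_fs_bytes(block_size, block_number_bits=32):
    """Largest filesystem addressable with the given block-number width."""
    return (2 ** block_number_bits) * block_size

limit_tib = max_fs_bytes(4096) // TIB
print(limit_tib)  # 16
```

A 12TB volume sits below that boundary but above the 8TB mark for which updating the kernel and e2fsprogs is recommended.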

* Re: Recovering from a damaged root inode
  2014-08-28 12:16 ` Theodore Ts'o
@ 2014-08-28 17:19   ` Liwei
  2014-08-28 22:53     ` Liwei
  0 siblings, 1 reply; 4+ messages in thread
From: Liwei @ 2014-08-28 17:19 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

Hi Ted,
    Thanks for the response! Responses in-line.

On 28 August 2014 20:16, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Aug 26, 2014 at 07:32:59PM +0800, Liwei wrote:
>>
>>     I thought a second fsck run would help, but running it with -n
>> gave me the following:
>
> I take it you don't have the transcript from the first fsck run?

Yes, I initially thought it was a simple problem, so I did not keep a
log of the first run. From memory, all it did was replace the
superblock from a backup.

>
> Also, you didn't tell us what version of e2fsprogs you are using.

Not sure why I left out the obvious: 1.42.5-1.1

>
> Finally, this error is one that was caused by your using fsck -n:
>
>> Illegal triple indirect block (3637063325) in inode 1065.  IGNORED.
>> Error while iterating over blocks in inode 1065: Illegal triply
>> indirect block found
>
> There was an illegal indirect block in inode 1065, which wasn't fixed
> because of e2fsck -n.  Unfortunately, this caused the scan to get
> aborted, because the unfixed error caused the inode iterator to fail.
> We could try to fix things up to make e2fsck -n recover more cleanly
> in the face of errors caused by not fixing previously found errors,
> but that hasn't been something that's been high priority.  (If someone
> would like to improve e2fsck in this regard, please send patches.)

Personally I think the way e2fsck handled it is fine. Maybe a simple
message stating that "errors that occur when using -n may be the
result of the decision to ignore all fixes" would work.

>
> More generally, it looks like part of your inode table got smashed.

That sounds bad. From my limited understanding of ext4's structure,
each block group has its own inode table, right? Or is the inode table
global? What are my chances of recovering from this?
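
For reference, ext4 keeps a separate inode table in each block group, and an inode number maps to a group and a slot within that group's table by simple arithmetic. The sketch below shows the mapping; `inodes_per_group=8192` and `inode_size=256` are placeholder assumptions, and the real values appear as "Inodes per group" and "Inode size" in `dumpe2fs -h`.

```python
def locate_inode(ino, inodes_per_group, inode_size=256, block_size=4096):
    """Map an inode number to (group, index, table block, offset in block).

    Inode numbers start at 1, so inode N occupies slot N - 1 overall.
    The returned block is relative to the start of that group's inode table.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    slot = ino - 1
    group = slot // inodes_per_group
    index = slot % inodes_per_group
    byte_offset = index * inode_size
    return group, index, byte_offset // block_size, byte_offset % block_size

# Inode numbers from this thread, located under the assumed geometry.
for ino in (2, 1049, 1065, 117911):
    print(ino, locate_inode(ino, inodes_per_group=8192))
```

With these assumed values, the root inode (#2) and inodes 1049 through 1065 all live in the inode table of block group 0, so damage confined to that one table could account for the errors seen so far.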

> How, it's hard to say.  There have historically been some bugs with
> resizing, but online resizing has been much safer than off-line
> resizing with big file systems, and the problems tend to be with file
> systems larger than 16TB.  (Although for file systems larger than 8TB,

I believe the problem came as a result of the power failure. Or are
you suggesting that the resize could have been instrumental in causing
this?

> I do strongly recommend that people update to the latest kernel and
> e2fsprogs; and there have been a lot of bug fixes to e2fsprogs in the
> past year and a half.  If you are using an enterprise distribution,
> hopefully you're using one which has been good about backporting fixes
> --- but 3.9.x hasn't been used by a distro kernel as far as I know,
> and 3.9.x isn't even a long-term stable maintenance kernel.  So I'm
> guessing this is a roll-your-own sort of system?)
>

Very good deduction. The machine is mainly a virtual machine host for
my own work, and I had to use the mainline kernel (about a year and
a half ago, when I built the machine) in order to get some xen features
working. Since xen was very fidgety between kernel versions, I decided
to "not update when it ain't broke". I'll definitely update everything
after this, but my main concern now is the possibility of recovery.

>                                          - Ted

* Re: Recovering from a damaged root inode
  2014-08-28 17:19   ` Liwei
@ 2014-08-28 22:53     ` Liwei
  0 siblings, 0 replies; 4+ messages in thread
From: Liwei @ 2014-08-28 22:53 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 29 August 2014 01:19, Liwei <xieliwei@gmail.com> wrote:
> Hi Ted,
>     Thanks for the response! Responses in-line.
>
====SNIP====
>> Finally, this error is one that was caused by your using fsck -n:
>>
>>> Illegal triple indirect block (3637063325) in inode 1065.  IGNORED.
>>> Error while iterating over blocks in inode 1065: Illegal triply
>>> indirect block found
>>
>> There was an illegal indirect block in inode 1065, which wasn't fixed
>> because of e2fsck -n.  Unfortunately, this caused the scan to get
>> aborted, because the unfixed error caused the inode iterator to fail.
>> We could try to fix things up to make e2fsck -n recover more cleanly
>> in the face of errors caused by not fixing previously found errors,
>> but that hasn't been something that's been high priority.  (If someone
>> would like to improve e2fsck in this regard, please send patches.)
>
====SNIP====
>> More generally, it looks like part of your inode table got smashed.
>
> That sounds bad. From my limited understanding of ext4's structure,
> each block group has its own inode table, right? Or is the inode table
> global? What are my chances of recovering from this?
>

I decided to make a snapshot and tried running fsck.ext4 -y.

Currently, this just flew by my screen:
Inode 117911, i_size is 10327091671466843186, should be 0.  Fix? yes

Inode 117911, i_blocks is 123836699223379, should be 0.  Fix? yes

Inode 117841 has a bad extended attribute block 3239803245.  Clear? yes

Inode 117841 has illegal block(s).  Clear? yes

Every single inode before that was erroneous. That means they're data
somehow mistaken as inodes, aren't they?
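
A rough plausibility test on the reported fields supports that suspicion without proving it. The sketch below assumes 4096-byte blocks (which caps an extent-mapped file at 16 TiB), a 12 TiB volume, and i_blocks counted in 512-byte units; the function name and the bounds are illustrative only.

```python
TIB = 2**40
MAX_FILE_BYTES = 16 * TIB   # assumption: ext4 file size limit with 4 KiB blocks
VOLUME_BYTES = 12 * TIB     # assumption: "12TB" read as 12 TiB

def implausible(i_size, i_blocks):
    """Return the reasons an inode's size fields cannot be genuine."""
    reasons = []
    if i_size > MAX_FILE_BYTES:
        reasons.append("i_size exceeds the maximum file size")
    if i_blocks * 512 > VOLUME_BYTES:
        reasons.append("i_blocks claims more space than the volume has")
    return reasons

# (i_size, i_blocks) pairs copied from the fsck output in this thread.
samples = {
    1049: (11664618411086678673, 263842572654276),
    117911: (10327091671466843186, 123836699223379),
}

for ino, (i_size, i_blocks) in samples.items():
    print(ino, implausible(i_size, i_blocks))
```

Both samples fail both tests by many orders of magnitude, which is what one would expect if arbitrary bytes were being interpreted as inode structures.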

Liwei
