* Recover from a "deleted inode referenced" situation
@ 2017-10-05 21:31 Kilian Cavalotti
2017-10-10 20:36 ` Andreas Dilger
0 siblings, 1 reply; 12+ messages in thread
From: Kilian Cavalotti @ 2017-10-05 21:31 UTC (permalink / raw)
To: linux-ext4
Dear ext4 experts,
TL;DR: I messed up a large filesystem, which now references deleted
inodes. What's the best way to recover from this and hopefully
reconstruct at least part of the directory hierarchy?
Full version:
I'm writing as a last recourse before committing data seppuku. I
failed to observe rule #1 of disaster recovery (sit on your hands) and
made a bad situation significantly worse. So I'm trying to figure out
how badly I'm screwed, and if there's any hope of salvation.
To set the stage, I have (sniff, *had*) an ext4 filesystem sitting on
a LVM logical volume, on top of a RAID5 dmraid volume. The dmraid
volume was expanded, then the LVM logical volume, and the ext4
filesystem was resize2fs'ed. Except somewhere in the process,
something failed and the ext4 filesystem was damaged. I unfortunately
don't really know much more about the failure.
At that point, the filesystem could be mounted read-only by using a
backup superblock (mount -o ro,sb=131072), and a quick glance at it
showed a decent directory structure, with at least top-level
directories intact.
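For reference, the backup superblock locations (and hence the sb= value) can be worked out without writing to the device, e.g. with a mke2fs dry run; here is a sketch on a throwaway image (paths are illustrative):

```shell
# Dry-run mke2fs (-n prints what would be done, writes nothing) on a
# scratch image to see where backup superblocks sit for 4 KiB blocks.
truncate -s 256M /tmp/scratch.img
mke2fs -n -F -t ext4 -b 4096 /tmp/scratch.img | grep -A1 'Superblock backups'
# First backup is at filesystem block 32768; mount's sb= option counts
# in 1 KiB units, hence 32768 * 4 = 131072:
#   mount -o ro,sb=131072 /dev/mapper/vol /mnt
```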
So I jumped on it and started exfiltrating data from the damaged
filesystem to an external system. Now, and that's what will cause me
sorrow forever, I inadvertently remounted that filesystem read-write
while the transfer was running...
Of course, it soon started to throw errors about deleted inodes, like this:
EXT4-fs error (device dm-0): ext4_lookup:1644: inode #2: comm rsync:
deleted inode referenced: 1517
At that point, listing the root of the filesystem generated I/O errors
and dreadful question marks, where it displayed a valid directory
before the r/w remount:
$ ls /vol
ls: cannot access backup: Input/output error
drwxr-xr-x 2 root root 4096 Sep 28 11:10 .
drwxr-xr-x 4 root root 4096 Sep 14 2013 ..
-????????? ? ? ? ? ? backup
[...]
I re-remounted read-only as soon as I realized my mistake, but the
filesystem stayed mounted r/w for a few minutes.
That's where I'm at right now. I'm dd'ing the LVM device to another
system before doing anything else, and while this is running (it will
take a few days, as the filesystem size is close to 20TB), I'm
pondering options.
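The imaging step itself can be as simple as the sketch below (device and destination paths are placeholders); conv=sparse avoids materializing runs of zero blocks in the copy:

```shell
# Image the read-only device before attempting any repair.
# /dev/mapper/vol and /backup/vol.img are placeholder paths.
# conv=sparse seeks over all-zero blocks so the copy can stay sparse.
dd if=/dev/mapper/vol of=/backup/vol.img bs=64M conv=sparse status=progress
```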
I guess the next logical step would be to run fsck, but I'm very
worried that I will end up with a mess of detached inodes in /lost+found
without any way to figure out their original location in the
filesystem...
I read about ways to run fsck without touching the underlying
filesystem (or image) with an LVM snapshot, or getting a copy of the
metadata information with e2image, but I'm not really sure how to
proceed.
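For the e2image route, something along these lines should work (the device path is a placeholder); e2image -r only reads the source, and e2fsck -n only reports, so neither modifies the damaged filesystem:

```shell
# Capture a raw metadata-only image of the damaged filesystem, then
# dry-run fsck against the image. /dev/mapper/vol is a placeholder.
e2image -r /dev/mapper/vol /backup/vol-metadata.e2i
e2fsck -fn /backup/vol-metadata.e2i   # -n: report problems, change nothing
```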
Could anybody provide pointers or advice on what to do next? Is there
a way to undo the latest modifications done while the filesystem was
mounted r/w? Do I have any chance to recover the initial structure and
contents of my filesystem?
I can obviously provide all the required information, just didn't want
to make an already long email even longer.
Thanks!
--
Kilian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Recover from a "deleted inode referenced" situation
From: Andreas Dilger @ 2017-10-10 20:36 UTC (permalink / raw)
To: Kilian Cavalotti; +Cc: linux-ext4
On Oct 5, 2017, at 3:31 PM, Kilian Cavalotti <kilian.cavalotti.work@gmail.com> wrote:
>
> Dear ext4 experts,
>
> TL;DR: I messed up a large filesystem, which now references deleted
> inodes. What's the best way to recover from this and hopefully
> reconstruct at least part of the directory hierarchy?
If the problem is only one of the partition being misaligned compared
to the logical volume, you can run the "findsuper" utility which is
part of e2fsprogs *sources* (it isn't built and packaged by default).
It will scan your block device and print out the ext2/3/4 superblocks
that it finds, along with the *byte* offset of each one found. You
can use this to determine where the start of the filesystem should be.
This is made *much* more complex if you have other LVs on the same
storage, and the LV was increased in size over multiple iterations,
resulting in a fragmented allocation of PEs.
> Full version:
>
> I'm writing as a last recourse before committing data seppuku. I
> failed to observe rule #1 of disaster recovery (sit on your hands) and
> made a bad situation significantly worse. So I'm trying to figure out
> how badly I'm screwed, and if there's any hope of salvation.
>
> To set the stage, I have (sniff, *had*) an ext4 filesystem sitting on
> a LVM logical volume, on top of a RAID5 dmraid volume. The dmraid
> volume was expanded, then the LVM logical volume, and the ext4
> filesystem was resize2fs'ed. Except somewhere in the process,
> something failed and the ext4 filesystem was damaged. I unfortunately
> don't really know much more about the failure.
>
> At that point, the filesystem could be mounted read-only by using a
> backup superblock (mount -o ro,sb=131072), and a quick glance at it
> showed a decent directory structure, with at least top-level
> directories intact.
>
> So I jumped on it and started exfiltrating data from the damaged
> filesystem to an external system. Now, and that's what will cause me
> sorrow forever, I inadvertently remounted that filesystem read-write
> while the transfer was running...
>
> Of course, it soon started to throw errors about deleted inodes, like this:
>
> EXT4-fs error (device dm-0): ext4_lookup:1644: inode #2: comm rsync:
> deleted inode referenced: 1517
>
> At that point, listing the root of the filesystem generated I/O errors
> and dreadful question marks, where it displayed a valid directory
> before the r/w remount:
>
> $ ls /vol
> ls: cannot access backup: Input/output error
> drwxr-xr-x 2 root root 4096 Sep 28 11:10 .
> drwxr-xr-x 4 root root 4096 Sep 14 2013 ..
> -????????? ? ? ? ? ? backup
> [...]
>
> I re-remounted read-only as soon as I realized my mistake, but the
> filesystem stayed mounted r/w for a few minutes.
It sounds like this replayed a corrupted journal over the rest of your
filesystem, leading to further corruption.
> That's where I'm at right now. I'm dd'ing the LVM device to another
> system before doing anything else, and while this is running (it will
> take a few days, as the filesystem size is close to 20TB), I'm
> pondering options.
>
> I guess the next logical step would be to run fsck, but I'm very
> worried that I will end up with a mess of detached inodes in /lost+found
> without any way to figure out their original location in the
> filesystem...
>
> I read about ways to run fsck without touching the underlying
> filesystem (or image) with an LVM snapshot, or getting a copy of the
> metadata information with e2image, but I'm not really sure how to
> proceed.
>
> Could anybody provide pointers or advice on what to do next?
My only recommendation would be to update to the latest e2fsprogs,
since it usually fixes important issues found in older versions.
> Is there a way to undo the latest modifications done while the
> filesystem was mounted r/w?
Seems unlikely, unless you have an LVM snapshot.
> Do I have any chance to recover the initial structure and
> contents of my filesystem?
e2fsck is good at recovering what files are available, much better
than other filesystem recovery tools, but it can only work with the
data it has.
>
> I can obviously provide all the required information, just didn't want
> to make an already long email even longer.
>
>
> Thanks!
> --
> Kilian
Cheers, Andreas
* Re: Recover from a "deleted inode referenced" situation
From: Kilian Cavalotti @ 2017-10-11 14:36 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4
Hi Andreas,
Thanks a lot for taking the time to answer my plea for help. :)
On Tue, Oct 10, 2017 at 1:36 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> If the problem is only one of the partition being misaligned compared
> to the logical volume, you can run the "findsuper" utility which is
> part of e2fsprogs *sources* (it isn't built and packaged by default).
> It will scan your block device and print out the ext2/3/4 superblocks
> that it finds, along with the *byte* offset of each one found. You
> can use this to determine where the start of the filesystem should be.
I will give this a try, thanks. Although I didn't really have any issue
mounting the filesystem r/o, which seems to indicate that there is no
misalignment issue, right?
> This is made *much* more complex if you have other LVs on the same
> storage, and the LV was increased in size over multiple iterations,
> resulting in a fragmented allocation of PEs.
Just one LV there.
>> $ ls /vol
>> ls: cannot access backup: Input/output error
>> drwxr-xr-x 2 root root 4096 Sep 28 11:10 .
>> drwxr-xr-x 4 root root 4096 Sep 14 2013 ..
>> -????????? ? ? ? ? ? backup
>> [...]
>>
>> I re-remounted read-only as soon as I realized my mistake, but the
>> filesystem stayed mounted r/w for a few minutes.
>
> It sounds like this replayed a corrupted journal over the rest of your
> filesystem, leading to further corruption.
Ah I see. So even if no process was actively writing to the
filesystem, simply remounting it read-write made it replay an old
journal? I know I shouldn't have done that, but I really didn't expect
so much impact: there's probably only around 15-20% of the original
data left... :(
> My only recommendation would be to update to the latest e2fsprogs,
> since it usually fixes important issues found in older versions.
Will make sure to use the latest one.
> Seems unlikely, unless you have an LVM snapshot.
I so wish, but I don't. :(
> e2fsck is good at recovering what files are available, much better
> than other filesystem recovery tools, but it can only work with the
> data it has.
So, another question: given e2fsck doesn't complain about a missing or
damaged superblock, is there any reason why running it with an
alternative superblock (with -b) would yield different results?
Thanks!
--
Kilian
* Re: Recover from a "deleted inode referenced" situation
From: Kilian Cavalotti @ 2017-10-12 22:02 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4
On Wed, Oct 11, 2017 at 7:36 AM, Kilian Cavalotti
<kilian.cavalotti.work@gmail.com> wrote:
> On Tue, Oct 10, 2017 at 1:36 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>> If the problem is only one of the partition being misaligned compared
>> to the logical volume, you can run the "findsuper" utility which is
>> part of e2fsprogs *sources* (it isn't built and packaged by default).
>> It will scan your block device and print out the ext2/3/4 superblocks
>> that it finds, along with the *byte* offset of each one found. You
>> can use this to determine where the start of the filesystem should be.
>
> I will give this a try, thanks. Although I didn't really have any issue
> mounting the filesystem r/o, which seems to indicate that there is no
> misalignment issue, right?
So... I tried findsuper, and it started listing things (including, for
some, non-printable characters in the "label" field, but I guess it's
just false-positives).
It's still running right now, but it already listed superblocks which
are obviously not related to the actual filesystem, but to raw disk
images (created via dd) that I had on that filesystem. For instance:
-- 8< ---------------------------------------------------------
# /usr/local/sbin/findsuper /dev/mapper/overlay
starting at 0, with 512 byte increments
byte_offset byte_start byte_end fs_blocks blksz grp
mkfs/mount_time sb_uuid label
1024 0 17983724322816 95590400 4096 0 Thu Nov
26 23:22:42 2015 12c2c019 1.42.6-5644
149595921408 18323935265112545280 13818231386087778304 2356078660
1024 28990 Sun Sep 7 13:34:55 2098 dc41997c 8�b���a`/����03��x^��|
��Fq��<<��_�sߠ蒿���Cx�X��Z��8���MxR��(R���/
�@�-+�>�F�������ۂ����@�
���$�}��|���)����
��K����y��l��0�ހc.3`B�~g݄`A�~��{�12��!^p���������k�Dn
0������oD�����I��;&ڢ�B�����/�Y� �_Ba�o���?$}�b'�B���%A�!�#�$@u�8~
293534171136 293534170112 18277258492928 95590400 4096 0 Thu
Nov 26 23:22:42 2015 12c2c019 1.42.6-5644
716505512960 716505511936 716628391936 120000 1024 0 Thu Aug
20 23:31:41 2009 d33ff464
716539067392 716530677760 716653557760 120000 1024 1 Thu Aug
20 23:31:41 2009 d33ff464
[...]
2556972232704 2556972231680 2560139979776 773376 4096 0 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
2557257444352 2557123226624 2560290974720 773376 4096 1 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
2557962087424 2557559434240 2560727182336 773376 4096 3 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
2558532512768 2557861424128 2561029172224 773376 4096 5 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
2559086160896 2558146636800 2561314384896 773376 4096 7 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
2559656586240 2558448626688 2561616374784 773376 4096 9 Tue
Jan 29 20:40:53 2013 da57c5f6 casper-rw
-- 8< ---------------------------------------------------------
That makes me think I should be able to recover those dd images with
the correct offsets, right? So for instance, looking at the
"casper-rw" label, I tried this:
-- 8< ---------------------------------------------------------
# dd if=/dev/ro_device of=/tmp/test.dd bs=4096 skip=2556972231680
count=773376 iflag=skip_bytes
-- 8< ---------------------------------------------------------
The resulting /tmp/test.dd file looks like a correct filesystem:
-- 8< ---------------------------------------------------------
# file /tmp/test.dd
/tmp/test.dd: Linux rev 1.0 ext2 filesystem data (mounted or unclean),
UUID=da57c5f6-1018-8a45-83b9-e12a39be7ce2, volume name "casper-rw"
(large files)
-- 8< ---------------------------------------------------------
But I can't seem to be able to mount it:
-- 8< ---------------------------------------------------------
[507159.565593] EXT2-fs (loop3): warning: mounting unchecked fs,
running e2fsck is recommended
[507165.114603] EXT2-fs (loop3): error: ext2_check_page: bad entry in
directory #2: : unaligned directory entry - offset=0,
inode=2263518911, rec_len=23425, name_len=177
[507165.129657] EXT2-fs (loop3): error: ext2_readdir: bad page in #2
-- 8< ---------------------------------------------------------
Did I misinterpret the output of findsuper?
Thanks!
--
Kilian
* Re: Recover from a "deleted inode referenced" situation
From: Andreas Dilger @ 2017-10-13 18:40 UTC (permalink / raw)
To: Kilian Cavalotti; +Cc: linux-ext4
On Oct 12, 2017, at 4:02 PM, Kilian Cavalotti <kilian.cavalotti.work@gmail.com> wrote:
>
> On Wed, Oct 11, 2017 at 7:36 AM, Kilian Cavalotti
> <kilian.cavalotti.work@gmail.com> wrote:
>> On Tue, Oct 10, 2017 at 1:36 PM, Andreas Dilger <adilger@dilger.ca> wrote:
>>> If the problem is only one of the partition being misaligned compared
>>> to the logical volume, you can run the "findsuper" utility which is
>>> part of e2fsprogs *sources* (it isn't built and packaged by default).
>>> It will scan your block device and print out the ext2/3/4 superblocks
>>> that it finds, along with the *byte* offset of each one found. You
>>> can use this to determine where the start of the filesystem should be.
>>
>> I will give this a try, thanks. Although I didn't really have any issue
>> mounting the filesystem r/o, which seems to indicate that there is no
>> misalignment issue, right?
>
> So... I tried findsuper, and it started listing things (including, for
> some, non-printable characters in the "label" field, but I guess it's
> just false-positives).
The findsuper comment was potentially misleading, as I was mixing up your
problem with one in another thread where the partition table was clobbered.
> It's still running right now, but it already listed superblocks which
> are obviously not related to the actual filesystem, but to raw disk
> images (created via dd) that I had on that filesystem. For instance:
>
> -- 8< ---------------------------------------------------------
> # /usr/local/sbin/findsuper /dev/mapper/overlay
> starting at 0, with 512 byte increments
> byte_offset byte_start byte_end fs_blocks blksz grp
> mkfs/mount_time sb_uuid label
> 1024 0 17983724322816 95590400 4096 0 Thu Nov
> 26 23:22:42 2015 12c2c019 1.42.6-5644
This is likely to be the proper superblock, but it has a bit of a
strange label. Looks like a really old e2fsprogs build version?
> 2556972232704 2556972231680 2560139979776 773376 4096 0 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> 2557257444352 2557123226624 2560290974720 773376 4096 1 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> 2557962087424 2557559434240 2560727182336 773376 4096 3 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> 2558532512768 2557861424128 2561029172224 773376 4096 5 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> 2559086160896 2558146636800 2561314384896 773376 4096 7 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> 2559656586240 2558448626688 2561616374784 773376 4096 9 Tue
> Jan 29 20:40:53 2013 da57c5f6 casper-rw
> -- 8< ---------------------------------------------------------
>
> That makes me think I should be able to recover those dd images with
> the correct offsets, right? So for instance, looking at the
> "casper-rw" label, I tried this:
>
> -- 8< ---------------------------------------------------------
> # dd if=/dev/ro_device of=/tmp/test.dd bs=4096 skip=2556972231680
> count=773376 iflag=skip_bytes
> -- 8< ---------------------------------------------------------
>
> The resulting /tmp/test.dd file looks like a correct filesystem:
> -- 8< ---------------------------------------------------------
> # file /tmp/test.dd
> /tmp/test.dd: Linux rev 1.0 ext2 filesystem data (mounted or unclean),
> UUID=da57c5f6-1018-8a45-83b9-e12a39be7ce2, volume name "casper-rw"
> (large files)
> -- 8< ---------------------------------------------------------
>
> But I can't seem to be able to mount it:
> -- 8< ---------------------------------------------------------
> [507159.565593] EXT2-fs (loop3): warning: mounting unchecked fs,
> running e2fsck is recommended
> [507165.114603] EXT2-fs (loop3): error: ext2_check_page: bad entry in
> directory #2: : unaligned directory entry - offset=0,
> inode=2263518911, rec_len=23425, name_len=177
> [507165.129657] EXT2-fs (loop3): error: ext2_readdir: bad page in #2
> -- 8< ---------------------------------------------------------
>
> Did I misinterpret the output of findsuper?
No, this looks like the _start_ of a filesystem image, but there is
no real guarantee that the blocks in the file are allocated contiguously
in the actual filesystem, so your "dd" is unlikely to work properly.
The filesystem itself is 773376 * 4KB ~= 3GB in size, and if it was
originally created as a sparse file there is little chance those blocks
were allocated contiguously. The findsuper utility is meant to locate
superblocks in a block device to help recover from partition table woes.
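The contiguity question can be checked without mounting anything: debugfs prints the parent filesystem's block map for a file, and gaps in that list mean a single offset+length dd cannot reassemble it. A sketch on a scratch image (names are illustrative):

```shell
# Build a scratch fs, copy a file in with debugfs -w, then dump the
# list of parent-fs blocks backing it. Non-consecutive numbers in the
# output mean the file is fragmented and plain dd carving won't work.
truncate -s 16M /tmp/parent.img
mke2fs -q -F -t ext4 -b 4096 /tmp/parent.img
echo "payload" > /tmp/payload
debugfs -w -R "write /tmp/payload diskimage" /tmp/parent.img
debugfs -R "blocks diskimage" /tmp/parent.img
```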
If you still have access to some of the files, you should consider
copying them out of the filesystem. Next, I would recommend making an
LVM snapshot and running e2fsck on that to see what else you get out of it.
Depending on the amount and type of corruption, that may take a very
long time on a 20TB filesystem, and not be worthwhile to wait for.
Cheers, Andreas
* Re: Recover from a "deleted inode referenced" situation
From: Kilian Cavalotti @ 2017-10-15 1:16 UTC (permalink / raw)
To: Andreas Dilger; +Cc: linux-ext4
Hi Andreas,
On Fri, Oct 13, 2017 at 11:40 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> The findsuper comment was potentially misleading, as I was mixing up your
> problem with one in another thread where the partition table was clobbered.
Gotcha. Learned about it though, so it was still useful. ;)
>> 1024 0 17983724322816 95590400 4096 0 Thu Nov
>> 26 23:22:42 2015 12c2c019 1.42.6-5644
>
> This is likely to be the proper superblock, but it has a bit of a
> strange label. Looks like a really old e2fsprogs build version?
Yes, the filesystem label name is for some reason the version of mkfs
that created it. The volume is from a Synology NAS, I assume that's
how they do things.
> No, this looks like the _start_ of a filesystem image, but there is
> no real guarantee that the blocks in the file are allocated contiguously
> in the actual filesystem, so your "dd" is unlikely to work properly.
> The filesystem itself is 773376 * 4KB ~= 3GB in size, and if it was
> originally created as a sparse file there is little chance those blocks
> were allocated contiguously. The findsuper utility is meant to locate
> superblocks in a block device to help recover from partition table woes.
Aaah, right, got it.
> If you still have access to some of the files, you should consider
> copying them out of the filesystem. Next, I would recommend making an
> LVM snapshot and running e2fsck on that to see what else you get out of it.
> Depending on the amount and type of corruption, that may take a very
> long time on a 20TB filesystem, and not be worthwhile to wait for.
I did that actually: I was able to salvage about 2.3 TB of data that
was still accessible from a read-only mount. Then, I created a
snapshot (although with mdraid, not LVM, but same idea), and fsck'ed
the filesystem: fsck found about 800GB more, but not many intact
directories. I was hoping that some of the top directories that got
lost could be found relatively intact in lost+found/ but I ended up
with a pretty flat hierarchy of things there, with directories at most
maybe 3-4 levels deep. I'll need to spend some time trying to put
pieces together from what fsck recovered.
But unfortunately there's another ~17TB of data that fsck didn't
find. That seems like a lot of data lost from just replaying a
corrupted journal... :(
Cheers,
--
Kilian
* Re: Recover from a "deleted inode referenced" situation
From: Theodore Ts'o @ 2017-10-15 12:48 UTC (permalink / raw)
To: Kilian Cavalotti; +Cc: Andreas Dilger, linux-ext4
On Sat, Oct 14, 2017 at 06:16:14PM -0700, Kilian Cavalotti wrote:
> But unfortunately there's another ~17TB of data that fsck didn't
> find. That seems like a lot of data lost from just replaying a
> corrupted journal... :(
It wasn't from replaying a journal, corrupted or not. Andreas was
mistaken there; remounting the file system read/write would not have
triggered a journal replay; if the journal needed replaying it would
have been replayed on the read-only mount.
There are two possibilities about what could have happened; one is
that the file system was already badly corrupted, but your copy
command hadn't started hitting the corrupted portion of the file
system, and so it was coincidence that the r/w remount happened right
before the errors started getting flagged.
The second possibility is that the allocation bitmaps were
corrupted, and shortly after you remounted read/write something started
to write into your file system, and since part of the inode table
area was marked as "available" the write into the file system ended
up smashing the inode table. (More modern kernels enable the
block_validity option by default, which would have prevented this; but
if you were using an older kernel, it would not have enabled this
feature by default.)
Since the problem started with the resize, I'm actually guessing the
first is more likely. Especially if you were using an older version
of e2fsprogs/resize2fs, and if you were doing an off-line resize
(i.e., the file system was unmounted at the time). There were a
number of bugs with older versions of e2fsprogs with file systems
larger than 16TB (hence, the 64-bit file system feature was enabled)
associated with off-line resize, and the manifestation of these bugs
includes portions of the inode table getting smashed.
Unfortunately, there may not be a lot we can do, if that's the case. :-(
This is probably not a great time to remind people about the value of
backups, especially off-site backups (even if software was 100%
bug-free, what if there was a fire at your home/work)?
Sorry,
- Ted
* Re: Recover from a "deleted inode referenced" situation
From: Kilian Cavalotti @ 2017-10-15 23:37 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4
Hi Ted,
I very much appreciate you taking the time to answer here. Some
comments inline below.
On Sun, Oct 15, 2017 at 5:48 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> It wasn't from replaying a journal, corrupted or not. Andreas was
> mistaken there; remounting the file system read/write would not have
> triggered a journal replay; if the journal needed replaying it would
> have been replayed on the read-only mount.
>
> There are two possibilities about what could have happened; one is
> that the file system was already badly corrupted, but your copy
> command hadn't started hitting the corrupted portion of the file
> system, and so it was coincidence that the r/w remount happened right
> before the errors started getting flagged.
That's something I considered indeed, but (and I have recorded session
logs to prove I didn't dream it after the fact) right after the
initial r/o mount, I ran a "du -hs" on some deeper-level directories:
there wasn't any error and "du" returned reasonable values.
The timeline was:
1. mounted r/o
2. "du /mnt/path/to/deep/dir" returned decent value
3. started rsyncing data out
4. mounted r/w
5. errors started to appear in the rsync process
6. "ls /mnt/path" returned I/O error and "deleted inode referenced"
So I _think_ that the filesystem was mostly ok before the r/w remount,
because the first-level directory of my initial "du", which worked,
ended up disappearing.
> The second possibility is that the allocation bitmaps were
> corrupted, and shortly after you remounted read/write something started
> to write into your file system, and since part of the inode table
> area was marked as "available" the write into the file system ended
> up smashing the inode table. (More modern kernels enable the
> block_validity option by default, which would have prevented this; but
> if you were using an older kernel, it would not have enabled this
> feature by default.)
Yeah, I think it may have been this: although I didn't explicitly
write to the filesystem, I suspect some system daemon may have...
It's a 3.10 kernel, by the way, with likely some back-ported patches,
but the vendor doesn't provide many details.
> Since the problem started with the resize, I'm actually guessing the
> first is more likely. Especially if you were using an older version
> of e2fsprogs/resize2fs,
1.42.6, most likely.
> and if you were doing an off-line resize
> (i.e., the file system was unmounted at the time).
I think it started online, but I'm not even sure it actually completed. I
don't have enough logs from that part to be sure what happened. I
believe resize2fs may actually have refused to operate, because of
pre-existing ext4 errors, but in the end, the filesystem appears to
have been resized anyway... So maybe the online attempt didn't
work, and the vendor's automated process tried an offline resize instead?
Is there any possibility that the filesystem could appear to be
resized (extended) with the actual inode table still referencing the
pre-resize one?
> There were a
> number of bugs with older versions of e2fsprogs with file systems
> larger than 16TB (hence, the 64-bit file system feature was enabled)
> associated with off-line resize, and the manifestation of these bugs
> includes portions of the inode table getting smashed.
>
> Unfortunately, there may not be a lot we can do, if that's the case. :-(
The upside is that I'm now learning a lot about file carving tools. :)
> This is probably not a great time to remind people about the value of
> backups, especially off-site backups (even if software was 100%
> bug-free, what if there was a fire at your home/work)?
It's always a good time to remind about backups, and although the bulk
of the most precious data I had was replicated elsewhere, verifying
integrity and consistency of such replications is a whole venture in
itself.
So yeah, from now on, logical replication on different (file)systems,
metadata backup (I had no idea about e2image before) and snapshots
will be non-negotiable requirements.
> Sorry,
I appreciate this, truly, and I'm very grateful for the very existence
of ext4. As with many catastrophes, this was likely an accumulation of
little things that would have been benign taken independently, but
that each contributed to my data going poof. :)
Cheers,
--
Kilian
* Re: Recover from a "deleted inode referenced" situation
From: Theodore Ts'o @ 2017-10-16 1:28 UTC (permalink / raw)
To: Kilian Cavalotti; +Cc: Andreas Dilger, linux-ext4
On Sun, Oct 15, 2017 at 04:37:03PM -0700, Kilian Cavalotti wrote:
> The timeline was:
> 1. mounted r/o
> 2. "du /mnt/path/to/deep/dir" returned decent value
> 3. started rsyncing data out
> 4. mounted r/w
> 5. errors started to appear in the rsync process
> 6. "ls /mnt/path" returned I/O error and "deleted inode referenced"
>
> So I _think_ that the filesystem was mostly ok before the r/w remount,
> because the first-level directory of my initial "du", which worked,
> ended up disappearing.
I agree with you; if the initial du worked, but then a du after the
r/w remount failed, then the second possibility is more likely what
happened.
> I think it started online, but I'm not even sure it actually completed. I
> don't have enough logs from that part to be sure what happened. I
> believe resize2fs may actually have refused to operate, because of
> pre-existing ext4 errors, but in the end, the filesystem appears to
> have been resized anyway... So maybe the online attempt didn't
> work, and the vendor's automated process tried an offline resize instead?
> Is there any possibility that the filesystem could appear to be
> resized (extended) with the actual inode table still referencing the
> pre-resize one?
It's possible, I suppose. If the vendor script unmounted the file
system and then attempted to run e2fsck -fy to fix the file system,
perhaps. In which case the damage could also have been done by the
e2fsck -fy run, depending on how badly the file system was corrupted
before this whole procedure was started.
But that would imply that the NAS box would have to stop serving the
file system, and it would have been pretty obviously an off-line
procedure.
Good luck,
- Ted
* Re: Recover from a "deleted inode referenced" situation
2017-10-16 1:28 ` Theodore Ts'o
@ 2017-10-17 15:32 ` Kilian Cavalotti
2017-10-17 16:48 ` Theodore Ts'o
0 siblings, 1 reply; 12+ messages in thread
From: Kilian Cavalotti @ 2017-10-17 15:32 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4
On Sun, Oct 15, 2017 at 6:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> Is there any possibility that the filesystem could appear to be
>> resized (extended) with the actual inode table still referencing the
>> pre-resize one?
>
> It's possible, I suppose. If the vendor script unmounted the file
> system and then attempted to run e2fsck -fy to fix the file system,
> perhaps. In which case the damage could also have been done by the
> e2fsck -fy run, depending on how badly the file system was corrupted
> before this whole procedure was started.
In that case, is there any way to re-shrink the filesystem to its
pre-expansion size, and try to read the pre-expansion inode table from
another superblock? Or did the r/w remount over-write all the existing
superblocks with the same new, corrupted information?
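(For reference on reading from another superblock: the backup superblock locations printed by dumpe2fs are in filesystem blocks, while mount's sb= option takes units of 1 KiB, so a conversion is needed. A small sketch of the arithmetic; the 32768 figure assumes a 4 KiB block size, which matches the sb=131072 used earlier in this thread:

```shell
# Convert a backup superblock's filesystem-block number (as printed by
# "dumpe2fs /dev/dm-0 | grep -i backup") into the 1 KiB units that
# mount's sb= option expects.
backup_block=32768   # first backup superblock, assuming 4 KiB blocks
block_size=4096      # filesystem block size in bytes
sb_option=$(( backup_block * block_size / 1024 ))
echo "mount -o ro,sb=${sb_option} /dev/dm-0 /mnt"
```

This prints a mount command using sb=131072, i.e. the same value as the original read-only mount.)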
> But that would imply that the NAS box would have to stop serving the
> file system, and it would have been pretty obviously an off-line
> procedure.
Well, I have these elements pointing in that direction:
1. the online resize2fs attempt aborted (I have that logged),
2. the filesystem has been expanded (I can see it now),
3. it definitely stopped serving the filesystem at some point.
Cheers,
--
Kilian
* Re: Recover from a "deleted inode referenced" situation
2017-10-17 15:32 ` Kilian Cavalotti
@ 2017-10-17 16:48 ` Theodore Ts'o
2017-10-18 18:39 ` Kilian Cavalotti
0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2017-10-17 16:48 UTC (permalink / raw)
To: Kilian Cavalotti; +Cc: Andreas Dilger, linux-ext4
On Tue, Oct 17, 2017 at 08:32:52AM -0700, Kilian Cavalotti wrote:
> On Sun, Oct 15, 2017 at 6:28 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> >> Is there any possibility that the filesystem could appear to be
> >> resized (extended) with the actual inode table still referencing the
> >> pre-resize one?
> >
> > It's possible, I suppose. If the vendor script unmounted the file
> > system and then attempted to run e2fsck -fy to fix the file system,
> > perhaps. In which case the damage could also have been done by the
> > e2fsck -fy run, depending on how badly the file system was corrupted
> > before this whole procedure was started.
>
> In that case, is there any way to re-shrink the filesystem to its
> pre-expansion size, and try to read the pre-expansion inode table from
> another superblock? Or did the r/w remount over-write all the existing
> superblocks with the same new, corrupted information?
Unfortunately, if this is what happened, then the true damage was done
when the file system was remounted read/write, and because the
allocation bitmaps were incorrect, portions of the inode table were
overwritten with file data. (As I mentioned, with modern file systems
there is a safety check which is now enabled by default which will
notice when there is an attempt to allocate blocks that are part of
the inode table, and stop this Very Bad Thing from happening. It does
take a tiny bit of extra CPU overhead, but we ultimately decided It
Was Worth It. Unfortunately, it was not enabled by default in the
3.10 kernel.)
There's not really any recovery possible in that case, unfortunately. :-(
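(The safety check Ted describes is the ext4 `block_validity` mount option, which rejects allocations that would land on filesystem metadata such as the inode table. On older kernels where it is not the default, it can be enabled per mount; a sketch, with the mount point as a placeholder:

```shell
# Check whether block_validity is active on a mounted ext4 filesystem.
# On kernels where it is the default, it may not appear here explicitly.
grep block_validity /proc/mounts
# Enable it explicitly on older kernels where it is not the default:
mount -o remount,block_validity /vol
```

There is a small per-allocation CPU cost, as noted above, but it turns this class of silent metadata overwrite into an immediate, recoverable error.)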
- Ted
* Re: Recover from a "deleted inode referenced" situation
2017-10-17 16:48 ` Theodore Ts'o
@ 2017-10-18 18:39 ` Kilian Cavalotti
0 siblings, 0 replies; 12+ messages in thread
From: Kilian Cavalotti @ 2017-10-18 18:39 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Andreas Dilger, linux-ext4
On Tue, Oct 17, 2017 at 9:48 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> Unfortunately, if this is what happened, then the true damage was done
> when the file system was remounted read/write, and because the
> allocation bitmaps were incorrect, portions of the inode table were
> overwritten with file data. (As I mentioned, with modern file systems
> there is a safety check which is now enabled by default which will
> notice when there is an attempt to allocate blocks that are part of
> the inode table, and stop this Very Bad Thing from happening. It does
> take a tiny bit of extra CPU overhead, but we ultimately decided It
> Was Worth It. Unfortunately, it was not enabled by default in the
> 3.10 kernel.)
>
> There's not really any recovery possible in that case, unfortunately. :-(
Argh.
Well, thanks for the explanation; at least I think I have a better
understanding of how things unfolded now. I'm glad that this safety
check is enabled by default now, and I'm going to keep repeating to
myself "never ever mount read-write when in doubt" while I'm waiting
for the carving tools to scan my blocks.
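(For anyone in a similar spot, a sketch of that carving step with a common open-source tool, photorec from the TestDisk suite; the device and paths are placeholders, and the carving should always run against an image, never the live device:

```shell
# Take an image of the damaged volume first, skipping unreadable sectors;
# device and destination paths are placeholders.
dd if=/dev/dm-0 of=/backup/vol.img bs=4M conv=noerror,sync
# photorec carves files by content signature, so it works even when the
# directory structure is gone; /d names the recovery destination.
photorec /log /d /backup/carved/ /backup/vol.img
```

Carving recovers file contents but not names or the directory hierarchy, so it is very much a last resort.)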
Thanks again!
--
Kilian
Thread overview: 12+ messages
2017-10-05 21:31 Recover from a "deleted inode referenced" situation Kilian Cavalotti
2017-10-10 20:36 ` Andreas Dilger
2017-10-11 14:36 ` Kilian Cavalotti
2017-10-12 22:02 ` Kilian Cavalotti
2017-10-13 18:40 ` Andreas Dilger
2017-10-15 1:16 ` Kilian Cavalotti
2017-10-15 12:48 ` Theodore Ts'o
2017-10-15 23:37 ` Kilian Cavalotti
2017-10-16 1:28 ` Theodore Ts'o
2017-10-17 15:32 ` Kilian Cavalotti
2017-10-17 16:48 ` Theodore Ts'o
2017-10-18 18:39 ` Kilian Cavalotti