* Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2014-06-15  8:12 UTC
  To: linux-ext4

Excuse me for requesting this information,
but I could not find any good leads on how to deal with this issue.
Any help would be appreciated.

This is what happened:
I accidentally exported the wrong block device to a virtual machine,
and ran mkfs.ext4 on this already-mounted ext4 block device.
As soon as I noticed the error, I stopped it.
It got as far as "Writing superblocks and filesystem accounting information"
(after writing the inode tables and creating the journal).

I remounted the filesystem read-only on the original mount as soon as possible
and set vfs_cache_pressure to 0 to prevent any cached data from being evicted.
> Jun 11 17:42:15 [kernel] EXT4-fs (xvda8): re-mounted. Opts: errors=remount-ro,barrier=0
> Jun 11 18:06:30 [kernel] EXT4-fs error (device xvda8): htree_dirblock_to_tree:892: inode #2: block 1313: comm ls: bad entry in directory: inode out of bounds - offset=0(0), inode=2751463424, rec_len=16, name_len=0
> Jun 11 18:06:44 [kernel] EXT4-fs error (device xvda8): htree_dirblock_to_tree:892: inode #2: block 1313: comm ls: bad entry in directory: inode out of bounds - offset=0(0), inode=2751463424, rec_len=16, name_len=0
> Jun 11 18:07:58 [kernel] EXT4-fs error (device xvda8): ext4_lookup:1416: inode #6492161: comm ls: deleted inode referenced: 11
This is the kernel log; I don't know whether the original mount had been writing out new inode tables or not.
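
(For the record, that knob is the standard sysctl; I set it with something like:

sysctl -w vm.vfs_cache_pressure=0

so the kernel keeps dentry and inode cache entries pinned in memory.)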

I was able to copy a good portion of the data out;
however, due to another issue (a bug in bcache) the server locked up,
and I had to reset it.
(I did not, however, do a dumpe2fs from the VM; I only later learned I could have.)

Now I'm performing an fsck -nf using scratch files (392k for dirinfo and 20k for icount so far).
However, the memory usage is 19.3 GiB (I have 16 GiB), so the process is painstakingly slow.
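
(In case someone finds this thread later: the scratch files are enabled via /etc/e2fsck.conf, roughly like this:

[scratch_files]
	directory = /var/cache/e2fsck

With that directory present and writable, e2fsck keeps its dirinfo and icount tables on disk instead of in memory.)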

Regardless, it got through the first part of pass 1.
Now it's spitting out a lot of the following:
> Block #XXXX (XXXX) causes file to be too big.  IGNORED.
> Too many illegal blocks in inode 426.
> Clear inode? no
Debugfs reports the following:
- The volume name is still the original.
- I don't know whether the UUID changed or not.
- Some block groups have their checksum set to 0x0000 (unused inodes 0),
  others have a checksum (unused inodes 2048).
- None of the groups have a valid checksum.

A few questions:
Would it be better if I ran mkfs.ext4 -S?
Can e2fsck recover the directory structure and/or files in this scenario?
Can I use debugfs to start at a directory inode and then use rdump?
Should I just resort to file recovery tools like photorec?
Is there a way to reduce the memory usage during e2fsck in this scenario?
Other suggestions?
(P.S. I don't have room for a full dd image, but I could use LVM snapshots; at the moment the block device is also marked read-only.)

I can live without this data, but it would be better if I could get it back.
(No backup is available; I had warned several times that total data loss was possible at any point.)

Kind regards,
Killian De Volder
(I should also put the replies to this email on the wiki.)


* Re: Recovery after mkfs.ext4 on an ext4
From: Theodore Ts'o @ 2014-06-15 13:20 UTC
  To: Killian De Volder; +Cc: linux-ext4

On Sun, Jun 15, 2014 at 10:12:14AM +0200, Killian De Volder wrote:
> Excuse me for requesting this information,
> but I could not find any good leads on how to deal with this issue.
> Any help would be appreciated.
> 
> This is what happened:
> I accidentally exported the wrong block device to a virtual machine,
> and ran mkfs.ext4 on this already-mounted ext4 block device.
> As soon as I noticed the error, I stopped it.
> It got as far as "Writing superblocks and filesystem accounting information"
> (after writing the inode tables and creating the journal).

Ouch.  There are safety measures to prevent mke2fs from running on a
mounted file system.  However, these don't apply when you've exported
the block device via KVM.  It could perhaps be argued that qemu should
add this safety check, and at least warn before you export a block
device already in use as a file system.  It's probably worth taking
that up with the qemu folks.

If it's any consolation the very latest version of e2fsprogs has a
safety check that might have caught the problem:

# mke2fs -t ext4 /dev/heap/scratch
mke2fs 1.42.10 (18-May-2014)
/dev/heap/scratch contains a ext4 file system labelled 'Important Stuff'
		  last mounted on Tue Jun  3 16:12:01 2014
Proceed anyway? (y,n) n

> Would it be better if I ran mkfs.ext4 -S?

Probably not.  This is useful when the superblock and block group
descriptors have been destroyed, and that's not the case here.  The
fact that the volume name is the original means that you have at least
the original superblock, and the real problem here is the damage that
was caused by the portions of the inode table that were wiped out.

> Can e2fsck recover the directory structure and/or files in this scenario?

Well, maybe.  The problem is what got destroyed....  Given some of the
errors you have described, it looks like more than the inode table got
wiped.  It's quite possible that the version of mke2fs used to create
the original file system is older than the one used in the guest OS.
For example, we changed where we placed the journal at one point.  That
would explain some of the file system errors.

> Can I use debugfs to start at a directory inode and then use rdump?

Again, maybe.  The problem is that if a particular subdirectory is
destroyed, then you won't find it via rdump.  E2fsck can relocate
files and subdirectories contained in damaged directories to
lost+found, which rdump obviously can't cope with.
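
For what it's worth, a session would look roughly like this; the inode
number is just an example taken from your kernel log:

# debugfs /dev/xvda8
debugfs:  rdump <6492161> /mnt/recovered
debugfs:  quit

But again, that only recovers subtrees whose own directory inodes
survived.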

> Should I just resort to file recovery tools like photorec?

Sorry, I keep saying maybe.  Photorec will definitely recover more
files; however, you won't have the filename data, which may be quite
problematic.  If the files are self-identifying via things like EXIF
tags, or MP3 tags, or other in-file metadata, then photorec works
really well.

But if you are stitching together lots of small source files, or
component .tex files from several dozen different directories,
photorec may not be that much more useful.
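
If you do end up trying it, the invocation is roughly this (photorec
then drops you into its menus to pick the partition and file types;
/d names the output directory):

# photorec /log /d /mnt/recovered /dev/xvda8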

> Is there a way to reduce the memory usage during e2fsck in this scenario?

Sorry, not really.

Good luck,

						- Ted


* Re: Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2014-06-15 20:27 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 15-06-14 15:20, Theodore Ts'o wrote:
> On Sun, Jun 15, 2014 at 10:12:14AM +0200, Killian De Volder wrote:
>
> It could perhaps be argued that qemu should
> add this safety check, and at least warn before you export a block
> device already in use as a file system.  It's probably worth taking
> that up with the qemu folks.
It was Xen, but it's the same issue.
> If it's any consolation the very latest version of e2fsprogs has a
> safety check that might have caught the problem:
>
> # mke2fs -t ext4 /dev/heap/scratch
> mke2fs 1.42.10 (18-May-2014)
> /dev/heap/scratch contains a ext4 file system labelled 'Important Stuff'
> 		  last mounted on Tue Jun  3 16:12:01 2014
> Proceed anyway? (y,n) n
Very good!
>> Can e2fsck recover the directory structure and/or files in this scenario?
> Well, maybe.  The problem is what got destroyed....  Given some of the
> errors you have described, it looks like more than the inode table got
> wiped.  It's quite possible that the version of mke2fs used to create
> the original file system is older than the one used in the guest OS.
> For example, we changed where we placed the journal at one point.  That
> would explain some of the file system errors.
I assume this includes changing the journal size manually? (For the wiki.)
>> Can I use debugfs to start at a directory inode and then use rdump?
> Again, maybe.  The problem is that if a particular subdirectory is
> destroyed, then you won't find it via rdump.  E2fsck can relocate
> files and subdirectories contained in damaged directories to
> lost+found, which rdump obviously can't cope with.
If you mount it read-only, all I see is ./, ../ and ./lost+found/, so rdump is out of the question.
I'm hoping e2fsck can do better.
>> Is there a way to reduce the memory usage during e2fsck in this scenario?
> Sorry, not really.
Sometimes I think it's certain inodes causing the excessive memory usage.
20 GiB sounds like a lot when the normal -f fsck took less than 3 GiB. (It's a 16 TiB file system.)
But I suppose it needs more bitmaps when the filesystem is this corrupt?

Actually, there might be one thing I could do: I should have a look at zram and zswap.
(Well, after this e2fsck -nf check is done, which could take a day or two...)

I've also been pondering taking an LVM snapshot and running an actual repair
(instead of a test run). But I have no idea how big the snapshot should be. Any indicators?
> Good luck,
Thank you for the info and luck, I'll need it :)
> 						- Ted



* Re: Recovery after mkfs.ext4 on an ext4
From: Theodore Ts'o @ 2014-06-15 21:44 UTC
  To: Killian De Volder; +Cc: linux-ext4

On Sun, Jun 15, 2014 at 10:27:08PM +0200, Killian De Volder wrote:
> >> Can e2fsck recover the directory structure and/or files in this scenario?
> > Well, maybe.  The problem is what got destroyed....  Given some of the
> > errors you have described, it looks like more than the inode table got
> > wiped.  It's quite possible that the version of mke2fs used to create
> > the original file system is older than the one used in the guest OS.
> > For example, we changed where we placed the journal at one point.  That
> > would explain some of the file system errors.
> I assume this includes changing the journal size manually? (For the wiki.)

Yes, indeed.  Or any of the other mke2fs parameters.  This includes
the case where the file system was originally formatted as ext3, and
then was upgraded to ext4, for example.  The size of the inode table,
certain file system features, etc., can all change the layout of where
the file system metadata gets placed.
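
One way to get a feel for how much of the layout overlapped is to
compare a dry run of mke2fs, which prints what it would do without
writing anything, against what the surviving superblock says:

# mke2fs -n -t ext4 /dev/xvda8
# dumpe2fs -h /dev/xvda8

If the guest's mke2fs used different defaults than the original one,
the block group and inode table numbers won't line up.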

All of these caveats apply to mke2fs -S as well, by the way.  The
mke2fs -S command basically reconstructs the superblock and block
group descriptors, and then exits before clearing the inode table.
It's unknown exactly how many of the disk blocks written by the
mke2fs running in the guest OS actually made it to the disk, but
mercifully, the block group descriptors generally don't get written
out until after the inode table is cleared.  That's one of the
reasons why I suggested that running mke2fs -S probably isn't going to
help.

> Sometimes I think it's certain inodes causing the excessive memory usage.
> 20 GiB sounds like a lot when the normal -f fsck took less than 3 GiB. (It's a 16 TiB file system.)
> But I suppose it needs more bitmaps when the filesystem is this corrupt?

E2fsck needs a lot more memory when dealing with a file system where
some blocks are claimed by multiple inodes.  This is when passes
1b/1c/1d are invoked.  The e2fsck program also caches where the
directory blocks are located, but I doubt that's a particular concern
here.

> I've also been pondering taking an LVM snapshot and running an actual repair
> (instead of a test run). But I have no idea how big the snapshot should be.
> Any indicators?

Sorry, no clue.  It really depends on how badly damaged the file
system might be.
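
If you do try it, keep in mind that the snapshot only has to hold the
blocks e2fsck rewrites, not the whole 16TiB.  A sketch, with the
volume group and LV names made up:

# lvcreate --snapshot --name data-snap --size 200G /dev/vg0/data
# e2fsck -fy /dev/vg0/data-snap

Keep an eye on the snapshot's Data% column in "lvs"; if the snapshot
fills up it becomes invalid, but the original volume is untouched, so
the worst case is an lvremove and another try with a bigger snapshot.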

Something that could in theory be done is a modified version of e2fsck
that does not try to repair all file system corruption, but does just
enough so that orphaned directories are mapped into lost+found, and
inodes that are so badly corrupted that they would cause the kernel to
really complain are made inaccessible.  This would be just enough that
the file system could be mounted read-only, or accessed via debugfs's
rdump, for data recovery purposes.

Unfortunately, I don't have time to implement such a beast, but maybe
it's a mode that could be added at some point in the future.

Regards,

						- Ted


* Re: Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2014-06-23  6:09 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 15-06-14 23:44, Theodore Ts'o wrote:
>
>> Sometimes I think it's certain inodes causing the excessive memory usage.
>> 20 GiB sounds like a lot when the normal -f fsck took less than 3 GiB. (It's a 16 TiB file system.)
>> But I suppose it needs more bitmaps when the filesystem is this corrupt?
> E2fsck needs a lot more memory when dealing with a file system where
> some blocks are claimed by multiple inodes.  This is when passes
> 1b/1c/1d are invoked.  The e2fsck program also caches where the
> directory blocks are located, but I doubt that's a particular concern
> here.
>
> Regards,
> 						- Ted
It's still checking, judging by the high amount of RAM it's using.
However, if I start a parallel check with -nf, it finds other errors that the one with the high memory usage hasn't found yet?

Should I start a new one, or is this not advised?
Sometimes I think it's bad inodes causing the inflated memory usage.

Kind regards,
Killian De Volder


* Re: Recovery after mkfs.ext4 on an ext4
From: Theodore Ts'o @ 2014-06-23 12:37 UTC
  To: Killian De Volder; +Cc: linux-ext4

On Mon, Jun 23, 2014 at 08:09:37AM +0200, Killian De Volder wrote:
> It's still checking, judging by the high amount of RAM it's using.
> However, if I start a parallel check with -nf, it finds other errors that the one with the high memory usage hasn't found yet?

No, definitely not that!  Running two e2fsck's in parallel will do far
more harm than good.

> Should I start a new one, or is this not advised?
> Sometimes I think it's bad inodes causing the inflated memory usage.

What part of the e2fsck run are you in?  If you are in passes
1b/1c/1d, then one of the things you can do is to analyze the log
output to date, and individually investigate the inodes that were
reported as bad using debugfs.  You could then back up what was worth
backing up out of those inodes, and then use the debugfs "clri"
command to zap the bad inode.  I have done that to reduce the number
of bad inodes to make e2fsck passes 1b, 1c, and 1d run faster.  But
I've never done it on a really huge file system, and it may not be
worth the effort.
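
Concretely, the workflow for each bad inode would be roughly this,
using inode 488 from your log as the example:

# debugfs /dev/xvda8
debugfs:  stat <488>
debugfs:  dump <488> /tmp/inode488.out
debugfs:  quit
# debugfs -w /dev/xvda8
debugfs:  clri <488>
debugfs:  quit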

What I'd probably do instead is to edit e2fsck to skip passes 1b, 1c,
and 1d, and then hope for the best.  The file system will still be
corrupted, and there is the chance that you will do some damage in the
later passes because you skipped passes 1b/c/d, but if the goal is to
get the file system into a state where you can safely mount it
read-only, that would probably be your best bet.

						- Ted


* Re: Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2014-06-23 16:37 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 23-06-14 14:37, Theodore Ts'o wrote:
> On Mon, Jun 23, 2014 at 08:09:37AM +0200, Killian De Volder wrote:
>> It's still checking, judging by the high amount of RAM it's using.
>> However, if I start a parallel check with -nf, it finds other errors that the one with the high memory usage hasn't found yet?
> No, definitely not that!  Running two e2fsck's in parallel will do far
> more harm than good.
"In parallel" is a big word: the repair check is SO slow that it might as well have been killed by the time the second (read-only) test is done.
I once had an OOM because too much ZRAM was allocated; after I restarted e2fsck, it found more errors before going into massive RAM usage.
So I was wondering what would happen if I restarted it.
>
>> Should I start a new one, or is this not advised?
>> Sometimes I think it's bad inodes causing the inflated memory usage.
> What part of the e2fsck run are you in?  If you are in passes
> 1b/1c/1d, then one of the things you can do is to analyze the log
Pass 1: Checking inodes, blocks, and sizes
Nothing else below this except things like:

Too many illegal blocks in inode 488.
Clear inode<y>? yes

But no mention of any next pass.

This is the stack it's "stuck" on (I should compile a build with debugging data):
#4  0x00007f1b0f1a0edb in block_iterate_dind ()
   from /lib64/libext2fs.so.2
#5  0x00007f1b0f1a1950 in ext2fs_block_iterate3 ()
   from /lib64/libext2fs.so.2
#6  0x00000000004118c3 in check_blocks ()
#7  0x0000000000412921 in process_inodes.part.6 ()
#8  0x0000000000413923 in e2fsck_pass1 ()
#9  0x000000000040e2cf in e2fsck_run ()
#10 0x000000000040a8e5 in main ()
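
(I grabbed that by attaching gdb to the running process, with
something like: gdb -batch -p "$(pidof e2fsck)" -ex bt)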

So this is pass 1, correct?

> output to date, and individually investigate the inodes that were
> reported as bad using debugfs.  You could then back up what was worth
> backing up out of those inodes, and then use the debugfs "clri"
> command to zap the bad inode.  I have done that to reduce the number
> of bad inodes to make e2fsck passes 1b, 1c, and 1d run faster.  But
> I've never done it on a really huge file system, and it may not be
> worth the effort.
>
> What I'd probably do instead is to edit e2fsck to skip passes 1b, 1c,
> and 1d, and then hope for the best.  The file system will still be
> corrupted, and there is the chance that you will do some damage in the
> later passes because you skipped passes 1b/c/d, but if the goal is to
> get the file system into a state where you can safely mount it
> read-only, that would probably be your best bet.
>
> 						- Ted
>
Regards,
Killian


* Re: Recovery after mkfs.ext4 on an ext4
From: Theodore Ts'o @ 2014-06-23 17:31 UTC
  To: Killian De Volder; +Cc: linux-ext4

On Mon, Jun 23, 2014 at 06:37:20PM +0200, Killian De Volder wrote:
> On 23-06-14 14:37, Theodore Ts'o wrote:
> > On Mon, Jun 23, 2014 at 08:09:37AM +0200, Killian De Volder wrote:
> >> It's still checking, judging by the high amount of RAM it's using.
> >> However, if I start a parallel check with -nf, it finds other errors that the one with the high memory usage hasn't found yet?
> > No, definitely not that!  Running two e2fsck's in parallel will do far
> > more harm than good.
> "In parallel" is a big word: the repair check is SO slow that it might as well have been killed by the time the second (read-only) test is done.
> I once had an OOM because too much ZRAM was allocated; after I restarted e2fsck, it found more errors before going into massive RAM usage.
> So I was wondering what would happen if I restarted it.
> >
> >> Should I start a new one, or is this not advised?
> >> Sometimes I think it's bad inodes causing the inflated memory usage.
> > What part of the e2fsck run are you in?  If you are in passes
> > 1b/1c/1d, then one of the things you can do is to analyze the log
> Pass 1: Checking inodes, blocks, and sizes
> Nothing else below this except things like:
> 
> Too many illegal blocks in inode 488.
> Clear inode<y>? yes

Does it stop after one of these messages without displaying anything
else?  Or does it just continue emitting a large number of these
messages?  And is the time between each one getting longer and longer?

We do actually keep a linked list of these inode numbers so we can try
to report a directory name so you know which file has been trashed.
This happens in pass #2, so the inodes which are invalid are stored in
pass #1 and only removed in pass #2.  

So if you are seeing gazillions of bad inodes, that could very easily
be what's going on.  If so, I can imagine having some mode that we
enter after a hundred inodes where we just ask permission to blow away
all of the corrupted inodes in pass #1, without waiting until we can
give you a proper pathname.

The other possibility is that a particular inode is so badly
corrupted that we're looping trying to evaluate it.  That's why I'm
asking whether e2fsck has just stopped and isn't printing any more
messages, in what might be an apparent infinite loop.

						 - Ted


* Re: Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2014-06-23 18:34 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 23-06-14 19:31, Theodore Ts'o wrote:
> On Mon, Jun 23, 2014 at 06:37:20PM +0200, Killian De Volder wrote:
>> On 23-06-14 14:37, Theodore Ts'o wrote:
>>> On Mon, Jun 23, 2014 at 08:09:37AM +0200, Killian De Volder wrote:
>>>> It's still checking, judging by the high amount of RAM it's using.
>>>> However, if I start a parallel check with -nf, it finds other errors that the one with the high memory usage hasn't found yet?
>>> No, definitely not that!  Running two e2fsck's in parallel will do far
>>> more harm than good.
>> "In parallel" is a big word: the repair check is SO slow that it might as well have been killed by the time the second (read-only) test is done.
>> I once had an OOM because too much ZRAM was allocated; after I restarted e2fsck, it found more errors before going into massive RAM usage.
>> So I was wondering what would happen if I restarted it.
>>>> Should I start a new one, or is this not advised?
>>>> Sometimes I think it's bad inodes causing the inflated memory usage.
>>> What part of the e2fsck run are you in?  If you are in passes
>>> 1b/1c/1d, then one of the things you can do is to analyze the log
>> Pass 1: Checking inodes, blocks, and sizes
>> Nothing else below this except things like:
>>
>> Too many illegal blocks in inode 488.
>> Clear inode<y>? yes
> Does it stop after one of these messages without displaying anything
> else?  Or does it just continue emitting a large number of these
> messages?  And is the time between each one getting longer and longer?
>
> We do actually keep a linked list of these inode numbers so we can try
> to report a directory name so you know which file has been trashed.
> This happens in pass #2, so the inodes which are invalid are stored in
> pass #1 and only removed in pass #2.  
>
> So if you are seeing gazillions of bad inodes, that could very easily
> be what's going on.  If so, I can imagine having some mode that we
> enter after a hundred inodes where we just ask permission to blow away
> all of the corrupted inodes in pass #1, without waiting until we can
> give you a proper pathname.
>
> The other possibility is that a particular inode is so badly
> corrupted that we're looping trying to evaluate it.  That's why I'm
> asking whether e2fsck has just stopped and isn't printing any more
> messages, in what might be an apparent infinite loop.
>
> 						 - Ted
>
Yes, this is the output so far of this fsck attempt:

Pass 1: Checking inodes, blocks, and sizes

Inode 488 is too big.  Truncate<y>? yes
Block #563048161 (3717262637) causes file to be too big.  CLEARED.
Block #563048162 (3068047020) causes file to be too big.  CLEARED.
Block #563048163 (3476424287) causes file to be too big.  CLEARED.
Block #563048164 (301063316) causes file to be too big.  CLEARED.
Block #563048165 (12584754) causes file to be too big.  CLEARED.
Block #563048166 (528287744) causes file to be too big.  CLEARED.
Block #563048167 (2728512811) causes file to be too big.  CLEARED.
Block #563048168 (1152011501) causes file to be too big.  CLEARED.
Block #563048169 (692919630) causes file to be too big.  CLEARED.
Block #563048170 (3050472104) causes file to be too big.  CLEARED.
Block #563048171 (2888907055) causes file to be too big.  CLEARED.
Too many illegal blocks in inode 488.
Clear inode<y>? yes
Inode 435, i_size is 5006055699917260305, should be 0.  Fix<y>? yes
Inode 435, i_blocks is 190421251318606, should be 0.  Fix<y>? yes
Inode 407 has compression flag set on filesystem without compression support.  Clear<y>? yes

The first few times I ran fsck, it found quite a few of these (after which the runs crashed due to OOM and other issues not related to fsck).
The following times it only found 1 to 3 of these before starting to eat memory.
- Killian


* Re: Recovery after mkfs.ext4 on an ext4
From: Killian De Volder @ 2015-03-22  8:19 UTC
  To: Theodore Ts'o; +Cc: linux-ext4

On 23-06-14 19:31, Theodore Ts'o wrote:
>
> ...
> We do actually keep a linked list of these inode numbers so we can try
> to report a directory name so you know which file has been trashed.
> This happens in pass #2, so the inodes which are invalid are stored in
> pass #1 and only removed in pass #2.  
>
> So if you are seeing gazillions of bad inodes, that could very easily
> be what's going on.  If so, I can imagine having some mode that we
> enter after a hundred inodes where we just ask permission to blow away
> all of the corrupted inodes in pass #1, without waiting until we can
> give you a proper pathname.
> ...
>
>
> 						 - Ted
>
I've been thinking: maybe I should rewrite this code to use linked arrays?
Linked lists are painfully slow when swapping (and on the CPU too, because of all the cache misses).
Or are we doing a lot of inserts?

Do you think it might be worth it? It would take me a week to learn the e2fsck code, though...
My biggest fear is making a mistake in the code that causes silent bugs.

Kind regards, Killian


* Re: Recovery after mkfs.ext4 on an ext4
From: Theodore Ts'o @ 2015-03-22 20:19 UTC
  To: Killian De Volder; +Cc: linux-ext4

On Sun, Mar 22, 2015 at 09:19:17AM +0100, Killian De Volder wrote:
> On 23-06-14 19:31, Theodore Ts'o wrote:
> >
> > ...
> > We do actually keep a linked list of these inode numbers so we can try
> > to report a directory name so you know which file has been trashed.
> > This happens in pass #2, so the inodes which are invalid are stored in
> > pass #1 and only removed in pass #2.  
> >
> > So if you are seeing gazillions of bad inodes, that could very easily
> > be what's going on.  If so, I can imagine having some mode that we
> > enter after a hundred inodes where we just ask permission to blow away
> > all of the corrupted inodes in pass #1, without waiting until we can
> > give you a proper pathname.
> > ...
> >
> >
> > 						 - Ted
> >
> I've been thinking: maybe I should rewrite this code to use linked arrays?
> Linked lists are painfully slow when swapping (and on the CPU too, because of all the cache misses).
> Or are we doing a lot of inserts?

I'm not sure why I said linked lists last June, but that's actually
not correct.  In old versions of e2fsck, we used a bitmap to mark
which inodes were bad.  In newer versions of e2fsck, we have an
alternative representation for bitmaps where we can use red/black
trees to store extents (contiguous regions of blocks or inodes that
are "set" in the bitmap).

In any case, I don't think trying to further optimize how we store bad
inodes is really worth it.  It happens extremely rarely, and it's
probably more useful to consider how we can keep these sorts of
things from happening in the future.

For example, e2fsck and mke2fs will try to open the file system using
the O_EXCL flag.  The kernel will not allow the block device to be
opened if either (a) the file system is mounted, or (b) some other
process has the block device opened using O_EXCL.  From looking at the
e-mail thread history, the problem was that you accidentally ran
mke2fs on a block device that was exported to a guest while still
mounted.  So if qemu (or the Xen toolstack, in your case) opened its
block devices with O_EXCL, that would avoid a lot of problems.
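
To illustrate what that check looks like from the host's side (output
approximate, device name made up):

# mount /dev/vg0/data /mnt
# mke2fs -t ext4 /dev/vg0/data
mke2fs 1.42.12 (29-Aug-2014)
/dev/vg0/data is mounted; will not make a filesystem here!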

In addition, newer versions of mke2fs will now warn you when you try
running mke2fs on an existing file system:

mke2fs 1.42.12 (29-Aug-2014)
/dev/sda3 contains a ext4 file system labelled 'test-filesystem'
	created on Sun Mar 22 16:18:03 2015
Proceed anyway? (y,n)

Cheers,

					- Ted

