* tune2fs -I seems dangerous @ 2008-12-04 22:26 Eric Sandeen 2008-12-04 23:14 ` Jonathan Bastien-Filiatrault 2008-12-05 1:24 ` Theodore Tso 0 siblings, 2 replies; 5+ messages in thread From: Eric Sandeen @ 2008-12-04 22:26 UTC (permalink / raw) To: ext4 development As a small experiment... dd if=/dev/zero of=fsfile bs=1M count=16 mkfs.ext4 -F -I 128 fsfile mkdir -p mnt mount -o loop fsfile mnt for I in `seq 1 4096`; do echo $I > mnt/file.$I; done umount mnt tune2fs -I 256 fsfile e2fsck -fy fsfile ... this yields 10031 lines of fsck output, and results in about 38% of the files that were on the filesystem going missing. I don't have the strong sense that tune2fs -I has been shaken out at all; should it be shipping as a useable option? Thanks, -Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: tune2fs -I seems dangerous 2008-12-04 22:26 tune2fs -I seems dangerous Eric Sandeen @ 2008-12-04 23:14 ` Jonathan Bastien-Filiatrault 2008-12-05 1:25 ` Theodore Tso 2008-12-05 1:24 ` Theodore Tso 1 sibling, 1 reply; 5+ messages in thread From: Jonathan Bastien-Filiatrault @ 2008-12-04 23:14 UTC (permalink / raw) To: Eric Sandeen; +Cc: ext4 development Eric Sandeen wrote: > As a small experiment... > [snip] > ... this yields 10031 lines of fsck output, and results in about 38% of > the files that were on the filesystem going missing. I have had a similar experience converting from 128 to 256 bytes inodes. After a while, tune2fs -I would simply stop doing IO and using a lot of CPU with a few IO bursts every half-hour or so. This is on a recent/fast x86-64 computer. I had to cancel the thing after leaving it running for over 24 hours. > > I don't have the strong sense that tune2fs -I has been shaken out at > all; should it be shipping as a useable option? Maybe add a --accept-consequences-of-shooting-myself-in-the-foot flag ? Cheers, Jonathan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: tune2fs -I seems dangerous 2008-12-04 23:14 ` Jonathan Bastien-Filiatrault @ 2008-12-05 1:25 ` Theodore Tso 0 siblings, 0 replies; 5+ messages in thread From: Theodore Tso @ 2008-12-05 1:25 UTC (permalink / raw) To: Jonathan Bastien-Filiatrault; +Cc: Eric Sandeen, ext4 development On Thu, Dec 04, 2008 at 06:14:36PM -0500, Jonathan Bastien-Filiatrault wrote: > I have had a similar experience converting from 128 to 256 bytes inodes. > After a while, tune2fs -I would simply stop doing IO and using a lot of > CPU with a few IO bursts every half-hour or so. This is on a recent/fast > x86-64 computer. I had to cancel the thing after leaving it running for > over 24 hours. The speed problems have been reported and mostly addressed in the e2fsprogs git repository. There's one more patch that should completely solve the tune2fs -I performance problem, but I haven't had time to audit the patch just yet. It is on my to do list. - Ted ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: tune2fs -I seems dangerous 2008-12-04 22:26 tune2fs -I seems dangerous Eric Sandeen 2008-12-04 23:14 ` Jonathan Bastien-Filiatrault @ 2008-12-05 1:24 ` Theodore Tso 2008-12-05 3:08 ` Eric Sandeen 1 sibling, 1 reply; 5+ messages in thread From: Theodore Tso @ 2008-12-05 1:24 UTC (permalink / raw) To: Eric Sandeen; +Cc: ext4 development On Thu, Dec 04, 2008 at 04:26:47PM -0600, Eric Sandeen wrote: > As a small experiment... > > dd if=/dev/zero of=fsfile bs=1M count=16 > mkfs.ext4 -F -I 128 fsfile > mkdir -p mnt > mount -o loop fsfile mnt > for I in `seq 1 4096`; do echo $I > mnt/file.$I; done > umount mnt > tune2fs -I 256 fsfile > e2fsck -fy fsfile > > ... this yields 10031 lines of fsck output, and results in about 38% of > the files that were on the filesystem going missing. Looks like the problem is that tune2fs -I was only tested on ext3 filesystem. It blows up rather spectacularly on filesystems with the flex_bg option, and it's apparently not updating the checksums if the uninit_bg option is specified. > I don't have the strong sense that tune2fs -I has been shaken out at > all; should it be shipping as a useable option? It needs some TLC, that's for certain. Move of the code was copied from resize2fs, so it's pretty paranoid about error checking and so on. The major problems from when the code was adapted for use in expanding the inode table, and the algorithm that tries to do that work. The major problem is seems to be that it's not double checking to make sure that all of the blocks that it needs to move in order to expand the inode table are in fact moveable. Specifically, the code is not checking and will blindly assume success when in fact things are *not* successful under the following conditions: 1) Flex_bg is enabled, and there is an inode table for a subsequent block group immediately following the inode table. 2) There is a block from the bad block inode immediately following the inode table (which is really bad). Tune2fs -I will not notice, relocate the block in the bad block, and then write the inode table onto the bad block, possibly causing the loss of up to 16 inodes per bad block immediately following the inode table. 3) The filesystem is formatted for RAID so there is stride setting which causes the block or inode bitmap to be located immediately following the inode. This will be caught be e2fsck, if the user is paranoid enough to run e2fsck immediately after tune2fs -I. I think is fair, though, to say that tune2fs -I code was written by someone who wasn't sufficiently paranoid to think through all of the failure cases. There is in fact a FIXME!! comment for case #2, but at the very least what should have happend is that the move_block should keep track of how many blocks were moved, and if it wasn't equal to needed blocks, it should have signalled an error because it would have indicated either a programming bug or a hardware bug or a filesystem corruption bug. Either way, it shouldn't move forward because there is the risk that users' files might get destroyed. - Ted ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: tune2fs -I seems dangerous 2008-12-05 1:24 ` Theodore Tso @ 2008-12-05 3:08 ` Eric Sandeen 0 siblings, 0 replies; 5+ messages in thread From: Eric Sandeen @ 2008-12-05 3:08 UTC (permalink / raw) To: Theodore Tso; +Cc: ext4 development Theodore Tso wrote: > I think is fair, though, to say that tune2fs -I code was written by > someone who wasn't sufficiently paranoid to think through all of the > failure cases. There is in fact a FIXME!! comment for case #2, but at > the very least what should have happend is that the move_block should > keep track of how many blocks were moved, and if it wasn't equal to > needed blocks, it should have signalled an error because it would have > indicated either a programming bug or a hardware bug or a filesystem > corruption bug. Either way, it shouldn't move forward because there > is the risk that users' files might get destroyed. So for kicks, when I try ext3: mkfs.ext3 -F -I 128 -N 16384 fsfile mount -o loop fsfile mnt for I in `seq 1 16384`; do echo $I > mnt/file.$I; done umount mnt tune2fs -I 256 fsfile tune2fs 1.41.3 (12-Oct-2008) Error in resizing the inode size. Run e2undo to undo the file system changes. e2undo Usage: e2undo <transaction file> <filesystem> it seems to me that a) it'd be nice to know what the error was (likely no room for the larger inodes?) and maybe not even begin if you know you'll hit an error due to free space, and b) maybe e2undo (or the previous command) should give you a hint about where the e2undo transaction file is? It's no my intent to pick on anyone who wrote the code, but it seems like maybe this should be disabled in the next release unless these issues get ironed out... Thanks, -Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-12-05 3:08 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-12-04 22:26 tune2fs -I seems dangerous Eric Sandeen 2008-12-04 23:14 ` Jonathan Bastien-Filiatrault 2008-12-05 1:25 ` Theodore Tso 2008-12-05 1:24 ` Theodore Tso 2008-12-05 3:08 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).