* "tune2fs -I 256" runtime---can it be interrupted safely? @ 2008-11-01 19:10 Michael B. Trausch 2008-11-01 23:46 ` Michael B. Trausch 0 siblings, 1 reply; 9+ messages in thread From: Michael B. Trausch @ 2008-11-01 19:10 UTC (permalink / raw) To: linux-ext4 [-- Attachment #1: Type: text/plain, Size: 1069 bytes --] Hello, I have a drive (a 250 GB external USB hard drive) which I wanted to convert to 256 byte inodes and start using as an ext4 drive (currently I use it as ext3). However, I started the tune2fs operation on 01 Nov 2008 02:27:34 -0400, and it is currently nearly 13 hours later, and the operation is _still_ going at 100% of one core and operating on the drive. Three questions: * Is this normal? * How long can I expect this process to continue? * Can it be (safely) interrupted if it will continue for a long time? I converted my home directory to ext4 without issue and so I (stupidly, I admit) did not take a backup of this 250 GB drive's contents before I started the conversion process. Oops. Lesson learned. My thinking at this point is that it would take _far_ less time to interrupt, backup, and just use mkfs.ext4 on the drive and then restore (about 2 hours instead of an unknown quantity which is already nearing 13 hours). Thanks, Mike Trausch -- My sigfile ran away and is on hiatus. http://www.trausch.us/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-01 19:10 "tune2fs -I 256" runtime---can it be interrupted safely? Michael B. Trausch @ 2008-11-01 23:46 ` Michael B. Trausch 2008-11-07 21:16 ` Andreas Dilger 0 siblings, 1 reply; 9+ messages in thread From: Michael B. Trausch @ 2008-11-01 23:46 UTC (permalink / raw) To: linux-ext4 [-- Attachment #1: Type: text/plain, Size: 3395 bytes --] On Sat, 1 Nov 2008 15:10:02 -0400 "Michael B. Trausch" <mike@trausch.us> wrote: > > I converted my home directory to ext4 without issue and so I > (stupidly, I admit) did not take a backup of this 250 GB drive's > contents before I started the conversion process. Oops. Lesson > learned. My thinking at this point is that it would take _far_ less > time to interrupt, backup, and just use mkfs.ext4 on the drive and > then restore (about 2 hours instead of an unknown quantity which is > already nearing 13 hours). > At 18:01:34 (after 15 hours and change) I decided to give it up and sent tune2fs a SIGINT. It happily aborted, and fsck appears to have fixed whatever inconsistencies were introduced during the tune2fs run and the files on the volume appear to be perfectly intact. Am going to go the route of simply pulling the data off and using mkfs.ext4 to create a new filesystem on the device. But, I am still curious as to why this would run for so long. Some volume statistics, in case they may be of help in trying to determine why this process was running for so long; I don't even know how much longer it would have run for. Stats from fsck: 651384 inodes used (4.27%) 788 non-contiguous inodes (0.1%) # of inodes with ind/dind/tind blocks: 27974/1745/8 28464107 blocks used (46.62%) 0 bad blocks 14 large files 553474 regular files 90764 directories 0 character device files 0 block device files 0 fifos 3 links 7136 symbolic links (7093 fast symbolic links) 1 socket -------- 651378 files Stats from dumpe2fs: Filesystem volume name: <none> Last mounted on: <not available> Filesystem UUID: 95904edb-b501-4c3c-9bec-b9f046b04e3b Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 15269888 Block count: 61049596 Reserved block count: 3052479 Free blocks: 32585489 Free inodes: 14618504 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 1009 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 256 Filesystem created: Mon Jul 7 23:09:06 2008 Last mount time: Sat Nov 1 18:20:45 2008 Last write time: Sat Nov 1 18:20:45 2008 Mount count: 1 Maximum mount count: 27 Last checked: Sat Nov 1 18:18:24 2008 Check interval: 15552000 (6 months) Next check after: Thu Apr 30 18:18:24 2009 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 Default directory hash: tea Directory Hash Seed: 52aba0f3-6da7-4e1a-b4b3-e8c51e8168fc Journal backup: inode blocks Journal size: 128M --- Mike -- My sigfile ran away and is on hiatus. http://www.trausch.us/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-01 23:46 ` Michael B. Trausch @ 2008-11-07 21:16 ` Andreas Dilger 2008-11-11 16:09 ` Michael B. Trausch 0 siblings, 1 reply; 9+ messages in thread From: Andreas Dilger @ 2008-11-07 21:16 UTC (permalink / raw) To: Michael B. Trausch; +Cc: linux-ext4 On Nov 01, 2008 19:46 -0400, Michael B. Trausch wrote: > On Sat, 1 Nov 2008 15:10:02 -0400 > "Michael B. Trausch" <mike@trausch.us> wrote: > > I converted my home directory to ext4 without issue and so I > > (stupidly, I admit) did not take a backup of this 250 GB drive's > > contents before I started the conversion process. Oops. Lesson > > learned. My thinking at this point is that it would take _far_ less > > time to interrupt, backup, and just use mkfs.ext4 on the drive and > > then restore (about 2 hours instead of an unknown quantity which is > > already nearing 13 hours). > > At 18:01:34 (after 15 hours and change) I decided to give it up and > sent tune2fs a SIGINT. It happily aborted, and fsck appears to have > fixed whatever inconsistencies were introduced during the tune2fs run > and the files on the volume appear to be perfectly intact. That is good news. I would have worried that interrupting this operation isn't very safe, because it is moving some core filesystem data around. > Am going to go the route of simply pulling the data off and using > mkfs.ext4 to create a new filesystem on the device. But, I am still > curious as to why this would run for so long. Some volume statistics, > in case they may be of help in trying to determine why this process was > running for so long; I don't even know how much longer it would have > run for. > Inode size: 128 It looks like it did not finish the inode resize operation, for whatever reason. If you have time, after the backup is done could you please re-try the "tune2fs -I 256" operation and let it go for a while, then "ltrace -p {pid}" on the process from another window to see what it is doing (if anything). You may need to have the "e2fsprogs-debuginfo" RPM installed to get useful data there. After grabbing some of the data with ltrace, also run "strace -p {pid}" to get disk IO information, and finally "gdb -p {pid}" and grab a stack trace with "bt", let it continue for a few seconds with "c", and then CTRL-C again and grab another stack trace with "bt". I'm assuming the code shouldn't actually have taken 13h to complete, but is instead confused about something in the filesystem and is looping. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-07 21:16 ` Andreas Dilger @ 2008-11-11 16:09 ` Michael B. Trausch 2008-11-14 8:30 ` Andreas Dilger 0 siblings, 1 reply; 9+ messages in thread From: Michael B. Trausch @ 2008-11-11 16:09 UTC (permalink / raw) To: Andreas Dilger; +Cc: linux-ext4 [-- Attachment #1: Type: text/plain, Size: 1521 bytes --] On Fri, 07 Nov 2008 14:16:43 -0700 Andreas Dilger <adilger@sun.com> wrote: > If you have time, after the backup is done could you please re-try the > "tune2fs -I 256" operation and let it go for a while, then "ltrace -p > {pid}" on the process from another window to see what it is doing (if > anything). You may need to have the "e2fsprogs-debuginfo" RPM > installed to get useful data there. > > After grabbing some of the data with ltrace, also run "strace -p > {pid}" to get disk IO information, and finally "gdb -p {pid}" and > grab a stack trace with "bt", let it continue for a few seconds with > "c", and then CTRL-C again and grab another stack trace with "bt". Ugh, I forgot to do the gdb backtraces. *headdesk* I will do another run with them later. I do have an ltrace up and online at: http://fd0man.theunixplace.com/tune2fs.ltrace.bz2 The bzip2 file is 44 MB, and the uncompressed text is _huge_ (about 3 GB). This is a two hour run of "tune2fs -I 256 /dev/sdb" on an external 250 GB (235 GiB) drive formatted with the ext3 filesystem. I aborted the tune2fs run after 2 hours, though I am sure it would have run for far longer. Hopefully, this will give some insight as to what the problem might be. I _was_ hoping that it would make a difference in runtime that the filesystem was empty (I did this right after "mkfs.ext3 -I 128 /dev/sdb"), but it was still going very slowly... --- Mike -- My sigfile ran away and is on hiatus. http://www.trausch.us/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-11 16:09 ` Michael B. Trausch @ 2008-11-14 8:30 ` Andreas Dilger 0 siblings, 0 replies; 9+ messages in thread From: Andreas Dilger @ 2008-11-14 8:30 UTC (permalink / raw) To: Michael B. Trausch; +Cc: linux-ext4 On Nov 11, 2008 11:09 -0500, Michael B. Trausch wrote: > I do have an ltrace up and online at: > > http://fd0man.theunixplace.com/tune2fs.ltrace.bz2 > > The bzip2 file is 44 MB, and the uncompressed text is _huge_ (about 3 > GB). This is a two hour run of "tune2fs -I 256 /dev/sdb" on an > external 250 GB (235 GiB) drive formatted with the ext3 filesystem. > I aborted the tune2fs run after 2 hours, though I am sure it would have > run for far longer. Hopefully, this will give some insight as to what > the problem might be. It looks like the code is just spending hours looking at the bitmaps, but I have no idea why: ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e1, 1, 0, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e2, 2, 1, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e3, 4, 2, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e4, 8, 3, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e5, 16, 4, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e6, 32, 5, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e7, 64, 6, 0x1f418b0) = 0 ext2fs_test_generic_bitmap(0x1f3d020, 0xfa31e8, 128, 7, 0x1f418b0) = 0 Possibly the code is very inefficient and is scanning every bit in the bitmap individually to see if an inode is in use, instead of doing something fast like just checking an entire bitmap word == 0 and skipping it. Unfortunately, ltrace doesn't show what is calling this code, so a stack trace is needed from GDB. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely?
@ 2008-11-14 12:49 Andreas Schultz
2008-11-14 21:06 ` Andreas Dilger
0 siblings, 1 reply; 9+ messages in thread
From: Andreas Schultz @ 2008-11-14 12:49 UTC (permalink / raw)
To: linux-ext4; +Cc: Andreas Dilger
[-- Attachment #1: Type: text/plain, Size: 676 bytes --]
Hi,
[...]
> Unfortunately, ltrace doesn't show what is calling this code, so
> a stack trace is needed from GDB.
Got the same problem, here is a GDB backtrace done on Ubuntu Intrepid, e2fsprogs 1.41.3:
(gdb) bt
#0 0x00007fa619228bcc in ext2fs_new_block (fs=0x1fea860, goal=263012, map=0x1feaa30, ret=0x7fff21668cac) at /build/buildd/e2fsprogs-1.41.3/lib/ext2fs/alloc.c:166
#1 0x000000000040408d in resize_inode (fs=0x1fea860, new_size=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1020
#2 0x0000000000404d83 in main (argc=<value optimized out>, argv=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1638
Cheers
Andreas
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-14 12:49 Andreas Schultz @ 2008-11-14 21:06 ` Andreas Dilger 2008-11-15 5:50 ` Theodore Tso 0 siblings, 1 reply; 9+ messages in thread From: Andreas Dilger @ 2008-11-14 21:06 UTC (permalink / raw) To: Andreas Schultz; +Cc: linux-ext4, Aneesh Kumar On Nov 14, 2008 13:49 +0100, Andreas Schultz wrote: > > Unfortunately, ltrace doesn't show what is calling this code, so > > a stack trace is needed from GDB. > > Got the same problem, here is a GDB backtrace done on Ubuntu Intrepid, > e2fsprogs 1.41.3: > > (gdb) bt > #0 0x00007fa619228bcc in ext2fs_new_block (fs=0x1fea860, goal=263012, map=0x1feaa30, ret=0x7fff21668cac) at /build/buildd/e2fsprogs-1.41.3/lib/ext2fs/alloc.c:166 > #1 0x000000000040408d in resize_inode (fs=0x1fea860, new_size=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1020 > #2 0x0000000000404d83 in main (argc=<value optimized out>, argv=<value optimized out>) at /build/buildd/e2fsprogs-1.41.3/misc/tune2fs.c:1638 Ah, it is trying to find free blocks, but the e2fsprogs allocator is very inefficient, IIRC, doing a linear scan of the filesystem. We probably would be far better off to generate an RB tree of the block maps so that it is easier to work with lists of blocks that need to be moved or marked in use for the resized inode table. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-14 21:06 ` Andreas Dilger @ 2008-11-15 5:50 ` Theodore Tso 2008-11-16 12:11 ` Andreas Schultz 0 siblings, 1 reply; 9+ messages in thread From: Theodore Tso @ 2008-11-15 5:50 UTC (permalink / raw) To: Andreas Dilger; +Cc: Andreas Schultz, linux-ext4, Aneesh Kumar On Fri, Nov 14, 2008 at 02:06:12PM -0700, Andreas Dilger wrote: > > Ah, it is trying to find free blocks, but the e2fsprogs allocator is very > inefficient, IIRC, doing a linear scan of the filesystem. We probably > would be far better off to generate an RB tree of the block maps so that > it is easier to work with lists of blocks that need to be moved or marked > in use for the resized inode table. We eventually could use a more sophisticated block allocator in the ext2fs userspace library, but there's a quick fix should do the job without having to do something more complicated. The problem is that the goal block is always reset to blk, which makes this an O(n**2) operation in the number of in-use blocks in the filesystem. This following patch makes it be O(n), which should make it be a marked improvement. Running a profile analysis on tune2fs, there's another O(n*m) operation in transalate_block() (yes, the function name is mispelled), where n is the number of blocks that needs to be moved, and m is the number of in-use blocks in the filesystems. Fixing both of these should remove most of the user-space time that's being burned by tune2fs -I 256. There is still some CPU burned by the undo file, and indeed we can probably speed this up some more by omitting creating an undo entry when we're copying a data block to a previously unused block. But hopefully this should be enough to speed up "tune2fs -I 256" for most people. - Ted commit 27c6de45a4187a348ec0960472d4a113ee6ea425 Author: Theodore Ts'o <tytso@mit.edu> Date: Sat Nov 15 00:32:39 2008 -0500 tune2fs: Fix inefficient O(n**2) algorithms when expanding the inode size When running "tune2fs -I 256" on moderate to large filesystems, the time required to run tune2fs can take many hours (20+ before some users gave up in disgust). This was due to some O(n**2) and O(n*m) algorithms in move_block() and inode_scan_and_fix(), respectively. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/misc/tune2fs.c b/misc/tune2fs.c index b29b344..e72518a 100644 --- a/misc/tune2fs.c +++ b/misc/tune2fs.c @@ -1011,13 +1011,13 @@ static int move_block(ext2_filsys fs, ext2fs_block_bitmap bmap) if (retval) return retval; - for (blk = fs->super->s_first_data_block; - blk < fs->super->s_blocks_count; blk++) { + for (new_blk = blk = fs->super->s_first_data_block; + blk < fs->super->s_blocks_count; blk++) { if (!ext2fs_test_block_bitmap(bmap, blk)) continue; - retval = ext2fs_new_block(fs, blk, NULL, &new_blk); + retval = ext2fs_new_block(fs, new_blk, NULL, &new_blk); if (retval) goto err_out; @@ -1068,12 +1068,14 @@ static int process_block(ext2_filsys fs EXT2FS_ATTR((unused)), e2_blkcnt_t blockcnt EXT2FS_ATTR((unused)), blk_t ref_block EXT2FS_ATTR((unused)), int ref_offset EXT2FS_ATTR((unused)), - void *priv_data EXT2FS_ATTR((unused))) + void *priv_data) { int ret = 0; blk_t new_blk; + ext2fs_block_bitmap bmap = (ext2fs_block_bitmap) priv_data; - + if (!ext2fs_test_block_bitmap(bmap, *block_nr)) + return 0; new_blk = transalate_block(*block_nr); if (new_blk) { *block_nr = new_blk; @@ -1086,7 +1088,7 @@ static int process_block(ext2_filsys fs EXT2FS_ATTR((unused)), return ret; } -static int inode_scan_and_fix(ext2_filsys fs) +static int inode_scan_and_fix(ext2_filsys fs, ext2fs_block_bitmap bmap) { errcode_t retval = 0; ext2_ino_t ino; @@ -1122,8 +1124,8 @@ static int inode_scan_and_fix(ext2_filsys fs) * Do we need to fix this ?? */ - if (inode.i_file_acl) { - + if (inode.i_file_acl && + ext2fs_test_block_bitmap(bmap, inode.i_file_acl)) { blk = transalate_block(inode.i_file_acl); if (!blk) continue; @@ -1142,9 +1144,8 @@ static int inode_scan_and_fix(ext2_filsys fs) if (!ext2fs_inode_has_valid_blocks(&inode)) continue; - retval = ext2fs_block_iterate2(fs, ino, 0, - block_buf, process_block, - 0); + retval = ext2fs_block_iterate2(fs, ino, 0, block_buf, + process_block, bmap); if (retval) goto err_out; @@ -1344,7 +1345,7 @@ static int resize_inode(ext2_filsys fs, unsigned long new_size) if (retval) goto err_out; - retval = inode_scan_and_fix(fs); + retval = inode_scan_and_fix(fs, bmap); if (retval) goto err_out; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: "tune2fs -I 256" runtime---can it be interrupted safely? 2008-11-15 5:50 ` Theodore Tso @ 2008-11-16 12:11 ` Andreas Schultz 0 siblings, 0 replies; 9+ messages in thread From: Andreas Schultz @ 2008-11-16 12:11 UTC (permalink / raw) To: Theodore Tso; +Cc: Andreas Dilger, linux-ext4, Aneesh Kumar [-- Attachment #1: Type: text/plain, Size: 1436 bytes --] Hi, On Saturday 15 November 2008 06:50:17 Theodore Tso wrote: > Fixing both of these should remove most of the user-space time that's > being burned by tune2fs -I 256. There is still some CPU burned by the > undo file, and indeed we can probably speed this up some more by > omitting creating an undo entry when we're copying a data block to a > previously unused block. But hopefully this should be enough to speed > up "tune2fs -I 256" for most people. it does speed up things, but it is still quite slow. I tried it on a 250GB partition which is about 86% filled: root@ws-aschultz:~# df -i /usr/src2/ Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sdb1 30523392 1401574 29121818 5% /usr/src2 root@ws-aschultz:~# df /usr/src2/ Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 240308484 196091320 32010176 86% /usr/src2 root@ws-aschultz:~# umount /usr/src2 root@ws-aschultz:~# time tune2fs -I 256 /dev/sdb1 tune2fs 1.41.3 (12-Oct-2008) Setting inode size 256 real 64m31.189s user 43m46.708s sys 1m50.627s 64min for 250GB seems to be to much for the average user. From watching the disc activity and the output of 'top', it seems that the operation periodically consumes 100%CPU for extended periods without any disc activity, followed by a brief period of about 20%CPU load and light disc activity. Andreas [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-11-16 12:11 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-11-01 19:10 "tune2fs -I 256" runtime---can it be interrupted safely? Michael B. Trausch 2008-11-01 23:46 ` Michael B. Trausch 2008-11-07 21:16 ` Andreas Dilger 2008-11-11 16:09 ` Michael B. Trausch 2008-11-14 8:30 ` Andreas Dilger -- strict thread matches above, loose matches on Subject: below -- 2008-11-14 12:49 Andreas Schultz 2008-11-14 21:06 ` Andreas Dilger 2008-11-15 5:50 ` Theodore Tso 2008-11-16 12:11 ` Andreas Schultz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).