* resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
@ 2015-09-22 19:12 Pocas, Jamie
  2015-09-22 19:33 ` Eric Sandeen
  2015-09-22 20:20 ` Theodore Ts'o
  0 siblings, 2 replies; 13+ messages in thread

From: Pocas, Jamie @ 2015-09-22 19:12 UTC (permalink / raw)
To: linux-ext4@vger.kernel.org

Hi,

I apologize in advance if this is a well-known issue, but I don't see it as an open bug on sourceforge.net. I'm not able to open a bug there without permission, so I am writing you here.

I have a very reproducible spin in resize2fs (x86_64) on both CentOS 6 (latest rpms) and CentOS 7. It pegs one core at 100%. This happens both with e2fsprogs 1.41.12 on CentOS 6 with the latest 2.6.32 kernel rpm installed, and with e2fsprogs 1.42.9 on CentOS 7 with the latest 3.10 kernel rpm installed. The key to reproducing this seems to be creating small filesystems. For example, if I create an ext4 filesystem on a 100MiB disk (or file) and then increase the size of the underlying disk (or file) to, say, 1GiB, resize2fs spins at 100% CPU and does not finish even after hours (it should take a few seconds).

Here are the flags used when creating the fs:

    mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F 0 /dev/sdz

Some of these may not be necessary anymore but were very experimental when I first started testing on CentOS 5 way back. I think all of these options except "nodiscard" are the defaults now anyway. I only use that option because, in the application I am using this for, it doesn't make sense to discard the existing devices, which are initially zeroed anyway. I suppose with volumes this small it doesn't take much extra time anyway, but I don't want to go down that rat hole. I am not doing anything custom with the number of inodes, a smaller blocksize (1k), etc., just what you see above. So it's taking the default settings for those, which maybe are bogus and broken for small volumes nowadays. I don't know.

Here is the stack...
[root@localhost ~]# cat /proc/8403/stack
[<ffffffff8106ee1a>] __cond_resched+0x2a/0x40
[<ffffffff8112860b>] find_lock_page+0x3b/0x80
[<ffffffff8112874f>] find_or_create_page+0x3f/0xb0
[<ffffffff811c8540>] __getblk+0xf0/0x2a0
[<ffffffff811c9ad3>] __bread+0x13/0xb0
[<ffffffffa056098c>] ext4_group_extend+0xfc/0x410 [ext4]
[<ffffffffa05498a0>] ext4_ioctl+0x660/0x920 [ext4]
[<ffffffff811a7372>] vfs_ioctl+0x22/0xa0
[<ffffffff811a7514>] do_vfs_ioctl+0x84/0x580
[<ffffffff811a7a91>] sys_ioctl+0x81/0xa0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

It seems to be sleeping, waiting for a free page, and then sleeping again in the kernel. I don't get ANY output after the version heading prints, even with the -d debug flags turned up all the way. It's really getting stuck very early on, with no I/O going to the disk during this CPU spinning. I don't see anything in dmesg related to this activity either.

I haven't finished binary searching for the specific boundary where the problem occurs, but I initially noticed that 1GiB and larger always worked and took only a few seconds. Then I stepped down to 500MiB and it hung in the same way. Then I stepped up to 750MiB and it worked normally. So there is some kind of boundary between 500MiB and 750MiB that I haven't found yet.

I understand that these are really small filesystems nowadays, other than something that might fit on a CD, but I'm hoping it's something simple that could be fixed easily. I suspect that due to the disk size there are bad or unusual defaults being selected, or a structure is being undersized, or the filesystem dimensions are unexpected, such that a condition the code is waiting for will never be satisfied.
On that note, I am wondering whether, with disks this small, anything relies on antiquated geometry reporting from the device. With small virtual disks like these there can be problems trying to accurately emulate a fake C/H/S geometry, and rounding down is sometimes necessary. I wonder if a mismatch could cause this, though I don't want to steer anyone off into the weeds.

I haven't dug into the code much yet, but I was wondering if anyone had any ideas about what could be going on. I think at the very least this is a bug in the resize code in the ext4 kernel code itself, because even if resize2fs is passing bad parameters, I would not expect this type of hang to be triggerable from user space.

Regards,
Jamie

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 19:12 resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes Pocas, Jamie
@ 2015-09-22 19:33 ` Eric Sandeen
  2015-09-22 20:28   ` Pocas, Jamie
  2015-09-22 20:20 ` Theodore Ts'o
  1 sibling, 1 reply; 13+ messages in thread

From: Eric Sandeen @ 2015-09-22 19:33 UTC (permalink / raw)
To: Pocas, Jamie, linux-ext4@vger.kernel.org

On 9/22/15 2:12 PM, Pocas, Jamie wrote:
> Hi,
>
> I apologize in advance if this is a well-known issue but I don't see
> it as an open bug in sourceforge.net. I'm not able to open a bug
> there without permission, so I am writing you here.

the centos bug tracker may be the right place for your distro...

> I have a very reproducible spin in resize2fs (x86_64) on both CentOS
> 6 latest rpms and CentOS 7. [...]
>
> Here are the flags used when creating the fs.
>
> mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F 0 /dev/sdz

AFAIK -F doesn't take an argument, is that 0 supposed to be there?

but if I test this:

# truncate --size=100m testfile
# mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F testfile
# truncate --size=1g testfile
# mount -o loop testfile mnt
# resize2fs /dev/loop0

that works fine on my rhel7 box, with kernel-3.10.0-229.el7 and e2fsprogs-1.42.9-7.el7.

Do those same steps fail for you?
-Eric

[snip]
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 19:33 ` Eric Sandeen
@ 2015-09-22 20:28   ` Pocas, Jamie
  2015-09-22 23:02     ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread

From: Pocas, Jamie @ 2015-09-22 20:28 UTC (permalink / raw)
To: Eric Sandeen, linux-ext4@vger.kernel.org

Thanks for the prompt reply. Yes, the "0" in the mkfs command was an accidental copy and paste; it's not supposed to be there.

Your sequence works, but it's a bit more synthetic than what's really happening in my case. In your example, the backing store (testfile) is resized with truncate before the contained filesystem is mounted. In my case, the underlying device is grown while the filesystem is mounted. If I do the following instead, which is more analogous to the way the underlying device is resized at runtime, it reproduces the 100% CPU consumption:

$ truncate --size=100M testfile
# mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F testfile
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
25688 inodes, 102400 blocks
5120 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=33685504
13 block groups
8192 blocks per group, 8192 fragments per group
1976 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

# mount -o loop testfile mnt
# truncate --size=1G testfile
# losetup -c /dev/loop0   ## Cause loop device to reread size of backing file while still online
# resize2fs /dev/loop0
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/loop0 is mounted on /home/jpocas/source/hulk.1/mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
##...
## it's hung here spinning at 100%; at least I got SOME output this time.

From another shell I can see the following:

# top | head
top - 16:22:53 up  6:02,  6 users,  load average: 1.05, 0.80, 0.40
Tasks: 518 total,   2 running, 516 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.6 us,  0.7 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  5933160 total,  1864476 free,  1196476 used,  2872208 buff/cache
KiB Swap:  3670012 total,  3670012 free,        0 used.  4403764 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13664 root      20   0  116548   1032    864 R 100.0  0.0   5:54.61 resize2fs
 2214 root      20   0  264300  72876   8756 S   6.2  1.2   2:19.58 Xorg
 3892 jpocas    20   0  432920   7884   6052 S   6.2  0.1   0:56.68 ibus-x11

BTW, I am not sure why the heading only shows 1.42.9 on CentOS, but I definitely have the 1.42.9-7 rpm installed:

# rpm -q e2fsprogs
e2fsprogs-1.42.9-7.el7.x86_64

[snip]
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 20:28 ` Pocas, Jamie
@ 2015-09-22 23:02   ` Theodore Ts'o
  2015-09-23  4:20     ` Pocas, Jamie
  0 siblings, 1 reply; 13+ messages in thread

From: Theodore Ts'o @ 2015-09-22 23:02 UTC (permalink / raw)
To: Pocas, Jamie; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

On Tue, Sep 22, 2015 at 04:28:39PM -0400, Pocas, Jamie wrote:
> # mount -o loop testfile mnt
> # truncate --size=1G testfile
> # losetup -c /dev/loop0   ## Cause loop device to reread size of backing file while still online
> # resize2fs /dev/loop0

It looks like the problem is with the loopback driver, and I can reproduce it using 4.3-rc2. If you skip *both* the truncate and the resize2fs command in the above sequence, and then do a "touch mnt/foo ; sync", the sync command will hang.

The problem is the losetup -c command, which calls the LOOP_SET_CAPACITY ioctl. This causes bd_set_size() to be called, which has the side effect of forcing the block size of /dev/loop0 to 4096. That is a problem if the file system is using a 1k block size, in which case the block size had properly been set to 1024. This is what subsequently causes the buffer cache operations to hang.

So this will cause a hang:

cp /dev/null /tmp/foo.img
mke2fs -t ext4 /tmp/foo.img 100M
mount -o loop /tmp/foo.img /mnt
losetup -c /dev/loop0
touch /mnt/foo
sync

This will not hang:

cp /dev/null /tmp/foo.img
mke2fs -t ext4 -b 4096 /tmp/foo.img 100M
mount -o loop /tmp/foo.img /mnt
losetup -c /dev/loop0
touch /mnt/foo
sync

And this also explains why you weren't seeing the problem with larger file systems. By default mke2fs uses a block size of 1k for file systems smaller than 512 MB. This is largely for historical reasons, from a time when we worried about optimizing the storage of every single byte of your 80MB disk (which was all you had on your 40 MHz 80386 :-).

With larger file systems the block size defaults to 4096, so we don't run into problems when losetup -c attempts to set the block size, which is something that is *not* supposed to change while the block device is mounted. For example, if you try to run "blockdev --setbsz", it will fail with EBUSY if the block device is currently mounted.

So the workaround is to just create the file system with "-b 4096" when you call mkfs.ext4. This is a good idea anyway if you intend to grow the file system, since a 4k block size is far more efficient.

The proper fix in the kernel is to have the loop device check whether the block device is currently mounted. If it is, then it needs to avoid changing the block size (which probably means it will need to call a modified version of bd_set_size()), and the capacity of the block device needs to be rounded down to the current block size. (Currently, if you set the capacity of the block device to, say, 1MB plus 2k, and the current block size is 4k, it will change the block size of the device to 2k so that the entire block device is addressable. If the block device is mounted and the block size is fixed at 4k, then it must not change the block size, either up or down. Instead, it must keep the block size at 4k and only allow the capacity to be set to 1MB.)

Cheers,

- Ted
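[Editor's note: the size arithmetic Ted describes can be sketched in a few lines of Python. This is a hedged illustration, not kernel code: soft_blocksize() is a simplified approximation of the bd_set_size() heuristic (keep growing the soft block size while the capacity remains a multiple of it), and clamped_capacity() models the proposed fix of keeping a mounted device's block size fixed and rounding the new capacity down.]

```python
PAGE_SIZE = 4096

def soft_blocksize(size_bytes):
    # Simplified model of bd_set_size(): starting from 512 bytes, keep
    # doubling the soft block size (up to PAGE_SIZE) while the capacity
    # is still a multiple of the next size, so the whole device stays
    # addressable.
    bsize = 512
    while bsize < PAGE_SIZE and size_bytes % (bsize * 2) == 0:
        bsize *= 2
    return bsize

def clamped_capacity(size_bytes, mounted_blocksize):
    # Proposed fix: a mounted device keeps its block size; the new
    # capacity is instead rounded *down* to a multiple of that size.
    return size_bytes - (size_bytes % mounted_blocksize)

MiB, KiB = 1 << 20, 1 << 10

# Current behaviour: a capacity of 1MB plus 2k forces a 2k soft block
# size, even if a mounted 4k filesystem is using the device.
assert soft_blocksize(1 * MiB + 2 * KiB) == 2048

# Proposed behaviour: keep the 4k block size, round capacity down to 1MB.
assert clamped_capacity(1 * MiB + 2 * KiB, 4096) == 1 * MiB
```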
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 23:02 ` Theodore Ts'o
@ 2015-09-23  4:20   ` Pocas, Jamie
  2015-09-23 15:14     ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread

From: Pocas, Jamie @ 2015-09-23 4:20 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

Ted, just to add another data point: with some minor adjustments to the script to use xfs instead, such as using "mkfs.xfs -b size=1024" to force 1k blocks, I cannot reproduce the issue, and the data block size doesn't change from 1k. This still uses loopback, so I am a bit skeptical that the blame lies with the loopback device or with filesystems that have an initial 1k block size. I can see this on other virtualized disks that can be resized online, such as VMware virtual disks and remote iSCSI targets. I haven't tried LVM, but I suspect that would be another good test.

Suffer this small analogy for me and let me know where I am wrong: say, hypothetically, I expand a small partition (or LVM volume for that matter), and then use resize2fs to grow the ext filesystem on it. I expect that this should *not* change the block size of the underlying device (of course not!) nor the filesystem's block size. Is that a correct assumption? I can see that it doesn't change the block size with xfs, nor the underlying device queue parameters for /dev/loop0 (under /sys/block/loop0/queue).

This use of a relatively tiny volume is not a normal use case for my application, so this is not a super urgent issue for me to resolve right away. For my purposes I can just disallow devices that small; they are really impractical anyway, and this only came up in testing. I just wanted to do my duty and report what I think is a legitimate issue, and maybe validate someone else's frustration if they are hitting it, however small an edge case it might turn out to be :).
I also wasn't sure whether this was indicative of a bug on a boundary condition that might happen with other potentially incompatible combinations of mkfs/mount parameters, or with sizes of volumes that are not validated before use. That would be more serious. I deal more with the block storage itself, so I admit I am not an ext4 expert, hence the possibly bad analogy earlier :). I am willing to take a deeper look into the code and see if I can figure out a patch when I get some more time, but I was just picking your brain in case it was something really obvious.

-Jamie

[snip]
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-23 4:20 ` Pocas, Jamie
@ 2015-09-23 15:14   ` Theodore Ts'o
  2015-09-23 16:04     ` Pocas, Jamie
  0 siblings, 1 reply; 13+ messages in thread

From: Theodore Ts'o @ 2015-09-23 15:14 UTC (permalink / raw)
To: Pocas, Jamie; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

On Wed, Sep 23, 2015 at 12:20:17AM -0400, Pocas, Jamie wrote:
> Ted, just to add another data point, with some minor adjustments to
> the script to use xfs instead, such as using "mkfs.xfs -b size=1024"
> to force 1k blocks, I cannot reproduce the issue and the data block
> size doesn't change from 1k.

Yes, that's not surprising, because XFS doesn't use the buffer cache layer. Ext4 does, because that's the basis of how the jbd2 layer works.

The losetup -c does change the block size as reported by the block device and as used by the buffer cache layer, though. (Internally, this is known as the "soft" block size; it's basically the unit in which data is cached in the buffer cache layer.)

root@kvm-xfstests:~# truncate -s 100M /tmp/foo.img
root@kvm-xfstests:~# mkfs.xfs -b size=1024 /tmp/foo.img
meta-data=/tmp/foo.img           isize=512    agcount=4, agsize=25600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=1024   blocks=102400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=2573, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@kvm-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# losetup -c /dev/loop0
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
4096      <--------- BUG, note the change in the block size
root@kvm-xfstests:~# touch /mnt/foo
root@kvm-xfstests:~# sync
          <------ The reason why we don't hang is that XFS doesn't use the buffer cache
root@kvm-xfstests:~# umount /mnt

Also feel free to try my repro,
but run "blockdev --getbsz /dev/loop0" before and after the losetup -c command, and note that it hangs even though there is no resize2fs in the command sequence at all:

root@kvm-xfstests:~# cp /dev/null /tmp/foo.img
root@kvm-xfstests:~# truncate -s 100M /tmp/foo.img
root@kvm-xfstests:~# mke2fs -t ext4 /tmp/foo.img
mke2fs 1.43-WIP (18-May-2015)
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: 27dfdbbe-f3a9-48a7-abe8-5a52798a9849
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

root@kvm-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# losetup -c /dev/loop0
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
4096      <------------ BUG
root@kvm-xfstests:~# touch /mnt/foo
          <------- Should hang here, even though there is no resize2fs command
          <------- If it doesn't hang right away, try typing the "sync" command

> Suffer this small analogy for me and let me know where I am wrong:
> say hypothetically I expand a small partition (or LVM for that
> matter). Then I try to use resize2fs to grow the ext filesystem on
> it. I expect that this should *not* change the block size of the
> underlying device (of course not!) nor the filesystem's block size.

The cause of your misunderstanding is that there are actually four different concepts of block/sector size:

* The logical block/sector size of the underlying storage device
  - Retrieved via "blockdev --getss /dev/sdXX"
  - This is the smallest unit that can be sent to the disk from the host OS. If the logical sector size is different from the physical sector size (see below), and a write is smaller than the physical sector size, then the disk will do a read-modify-write.
  - The file system block size MUST be greater than or equal to the logical sector size.

* The physical block/sector size of the underlying storage device
  - Retrieved via "blockdev --getpbsz /dev/sdXX"
  - This is the smallest unit that can be physically written to the storage media.
  - The file system block size SHOULD be greater than or equal to the physical sector size, to avoid read-modify-write operations by the drive, which would be bad for performance.

* The "soft" block size of the block device
  - Retrieved via "blockdev --getbsz /dev/sdXX"
  - This is the unit of storage in which data is cached in the buffer cache. It only matters if you are using the buffer cache: for example, if you are doing buffered I/O to a block device, or if you are using a file system such as ext4 which uses the buffer cache. Since data is indexed in the buffer cache by the 3-tuple (block device, block number, block size), Bad Things happen if you try to change the block size while the file system is mounted. Normally the kernel will prevent you from changing the block size under these circumstances.

* The file system block size
  - Retrieved by a file-system-dependent command; for ext4, "dumpe2fs -h".
  - Set at format time. For file systems that use the buffer cache, the file system driver automatically sets the "soft" block size of the block device when the file system is mounted.
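[Editor's note: the 3-tuple indexing Ted mentions explains the hang directly. Here is a toy model in Python, an analogy only rather than kernel code: if the soft block size changes under a mounted filesystem, lookups for the same on-disk data stop matching the entries the filesystem already holds.]

```python
cache = {}

def getblk(dev, block_nr, block_size):
    # Toy buffer cache: buffers are keyed by (device, block number,
    # block size), mirroring the 3-tuple described above.
    key = (dev, block_nr, block_size)
    if key not in cache:
        cache[key] = bytearray(block_size)  # stand-in for a buffer head
    return cache[key]

buf_1k = getblk("loop0", 8, 1024)  # ext4 mounted with a 1k block size
buf_4k = getblk("loop0", 8, 4096)  # after LOOP_SET_CAPACITY forces bsz to 4096
assert buf_1k is not buf_4k  # same block number, but disjoint cache entries
assert len(cache) == 2
```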
Speaking of LVM, I can't reproduce the problem using LVM, at least not with a 4.3-rc2 kernel:

root@kvm-xfstests:~# pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created
root@kvm-xfstests:~# vgcreate test /dev/vdc
  Volume group "test" successfully created
root@kvm-xfstests:~# lvcreate -L 100M -n small /dev/test
  Logical volume "small" created
root@kvm-xfstests:~# mkfs.ext4 -Fq /dev/test/small
root@kvm-xfstests:~# mount -o loop /dev/test/small /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# lvresize -L 1G /dev/test/small
  Size of logical volume test/small changed from 100.00 MiB (25 extents) to 1.00 GiB (256 extents).
  Logical volume small successfully resized
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024	<------ NO BUG, see the block size has not changed
root@kvm-xfstests:~# lvcreate -L 100M -n small /dev/test^C
root@kvm-xfstests:~# touch /mnt/foo ; sync
root@kvm-xfstests:~# resize2fs /dev/test/small
resize2fs 1.43-WIP (18-May-2015)
Filesystem at /dev/test/small is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
The filesystem on /dev/test/small is now 1048576 (1k) blocks long.
	<------ Note that resize2fs works just fine!
root@kvm-xfstests:~# touch /mnt/bar ; sync
root@kvm-xfstests:~# umount /mnt
root@kvm-xfstests:~#

You might see if this works on CentOS; but if it doesn't, I'm pretty convinced this is a bug outside of ext4, and I've already given you a workaround --- using "-b 4096" on the command line to mkfs.ext4 or mke2fs. Alternatively, here's another workaround; you can modify your /etc/mke2fs.conf so the "small" and "floppy" stanzas read:

[fs_types]
	small = {
		blocksize = 4096
		inode_size = 128
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 4096
		inode_size = 128
		inode_ratio = 8192
	}

I'm pretty certain your failures won't reproduce if you either change how you call mke2fs for small file systems, or change your /etc/mke2fs.conf file as shown above.
Cheers, - Ted ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-23 15:14 ` Theodore Ts'o
@ 2015-09-23 16:04 ` Pocas, Jamie
  2015-09-23 16:59 ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Pocas, Jamie @ 2015-09-23 16:04 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

Interesting. Thanks for the detailed break-down! I don't mind the workaround of using a 4k "soft" block size on the filesystem, even for smaller filesystems. Now that I understand better, I think you were on target with your earlier explanation of bd_set_size(). So this means it's not an ext4 bug. I think the online resize of the loopback device (or any other block device driver) should use something like the code in check_disk_size_change() instead of bd_set_size(). I will have to test this out. Thanks again.

Regards,
- Jamie

-----Original Message-----
From: Theodore Ts'o [mailto:tytso@mit.edu]
Sent: Wednesday, September 23, 2015 11:14 AM
To: Pocas, Jamie
Cc: Eric Sandeen; linux-ext4@vger.kernel.org
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes

On Wed, Sep 23, 2015 at 12:20:17AM -0400, Pocas, Jamie wrote:
> Ted, just to add another data point, with some minor adjustments to
> the script to use xfs instead, such as using "mkfs.xfs -b size=1024"
> to force 1k blocks, I cannot reproduce the issue and the data block
> size doesn't change from 1k.

Yes, that's not surprising, because XFS doesn't use the buffer cache layer. Ext4 does, because that's the basis of how the jbd2 layer works. It does change the block size as reported by the block device, and as used by the buffer cache layer, though. (Internally, this is known as the "soft" block size; it's basically the unit in which data is cached in the buffer cache layer):

root@kvm-xfstests:~# truncate -s 100M /tmp/foo.img
root@kvm-xfstests:~# mkfs.xfs -b size=1024 /tmp/foo.img
meta-data=/tmp/foo.img           isize=512    agcount=4, agsize=25600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=1024   blocks=102400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=2573, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@kvm-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@kvm-xfstests:~# losetup -c /dev/loop0
root@kvm-xfstests:~# blockdev --getbsz /dev/loop0
4096	<--------- BUG, note the change in the block size
root@kvm-xfstests:~# touch /mnt/foo
root@kvm-xfstests:~# sync
	<------ The reason why we don't hang is that XFS doesn't use the
	<------ buffer cache
root@kvm-xfstests:~# umount /mnt

[...]

Cheers,
- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-23 16:04 ` Pocas, Jamie
@ 2015-09-23 16:59 ` Theodore Ts'o
  2015-09-23 18:20 ` Pocas, Jamie
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2015-09-23 16:59 UTC (permalink / raw)
To: Pocas, Jamie; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

On Wed, Sep 23, 2015 at 12:04:49PM -0400, Pocas, Jamie wrote:
> Interesting. Thanks for the detailed break-down! I don't mind the
> workaround of using 4k "soft" block size on the filesystem, even for
> smaller filesystems. Now that I understand better, I think you were
> on target with your earlier explanation of bd_set_size(). So this
> means it's not an ext4 bug. I think the online resize of loopback
> device (or any other block device driver) should use something like
> the code in check_disk_size_change() instead of bd_set_size(). I
> will have to test this out. Thanks again.

To be clear, the 4k file system block size is an on-disk format thing, and it will give you better performance (at the cost of increased internal fragmentation overhead, which can consume more space). It will cause the soft block size to be set to 4k when the file system is mounted, but that's a different thing.

Note that for larger ext4 file systems, or if you are using XFS, the file system block size will be 4k, so explicitly configuring the block size to 4k isn't anything particularly unusual. It's a change in the defaults, but I showed you how you can change the defaults by editing /etc/mke2fs.conf.

Cheers,
- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
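[Editor's illustration: the internal-fragmentation cost Ted mentions is plain arithmetic --- each file's last block is only partly used, so small files waste more space on 4k blocks than on 1k blocks. The file sizes below are made up for illustration; this is not a benchmark.]

```python
# Rough internal-fragmentation comparison for 1k vs 4k file system blocks.
# Illustrative arithmetic only; a real allocator also has metadata overhead.

def allocated(size, block_size):
    # Bytes consumed on disk: file size rounded up to a whole block.
    return -(-size // block_size) * block_size

files = [100, 500, 1500, 3000, 5000]  # hypothetical small-file sizes (bytes)
for bs in (1024, 4096):
    used = sum(allocated(s, bs) for s in files)
    data = sum(files)
    print(f"{bs}-byte blocks: {used} bytes allocated "
          f"for {data} bytes of data ({used - data} wasted)")
```

For these made-up sizes, the 1k layout allocates 12288 bytes for 10100 bytes of data (2188 wasted), while the 4k layout allocates 24576 bytes (14476 wasted) --- the space cost Ted is trading against performance.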
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-23 16:59 ` Theodore Ts'o
@ 2015-09-23 18:20 ` Pocas, Jamie
  0 siblings, 0 replies; 13+ messages in thread
From: Pocas, Jamie @ 2015-09-23 18:20 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org

I understand the tradeoff. Thanks. I tested a change in the driver from calling bd_set_size() to calling check_disk_size_change() and it is in fact working as expected! I can see that the capacity is increased correctly, and the block size reported by 'blockdev --getbsz' is correctly retained as 1024. Obviously this needs more review and testing, but I think this is a bug in loopback and in any other driver that calls bd_set_size() to do an online resize. As far as ext is concerned, consider it case closed. Thanks again for the tips.

^ permalink raw reply	[flat|nested] 13+ messages in thread
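[Editor's illustration: the distinction Jamie tested --- resizing the device without resetting the soft block size --- can be caricatured in a few lines. The function names are borrowed from the kernel (the real ones live in fs/block_dev.c and operate on struct block_device); this is a hypothetical userspace toy of the *behavior*, not the driver code. In particular, the real bd_set_size() derives the new soft block size from the device geometry, which this sketch simplifies to a reset to 4096.]

```python
# Toy model of the two resize paths discussed in the thread.
# Hypothetical sketch; names borrowed from the kernel for readability.

class BlockDevice:
    def __init__(self, size, soft_block_size):
        self.size = size                        # device capacity in bytes
        self.soft_block_size = soft_block_size  # buffer-cache unit

def bd_set_size(bdev, new_size):
    # Sets the capacity AND recomputes the soft block size from scratch,
    # which is only safe when no file system is mounted on the device.
    bdev.size = new_size
    bdev.soft_block_size = 4096  # simplified stand-in for the recomputation

def check_disk_size_change(bdev, new_size):
    # Updates the capacity but leaves the soft block size alone, so a
    # mounted file system keeps a consistent buffer-cache view.
    bdev.size = new_size

dev = BlockDevice(size=100 << 20, soft_block_size=1024)  # 100 MiB, 1k blocks

check_disk_size_change(dev, 1 << 30)
assert dev.soft_block_size == 1024   # online resize keeps working

bd_set_size(dev, 1 << 30)
assert dev.soft_block_size == 4096   # the losetup -c symptom
```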
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 19:12 resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes Pocas, Jamie
  2015-09-22 19:33 ` Eric Sandeen
@ 2015-09-22 20:20 ` Theodore Ts'o
  2015-09-22 21:26 ` Pocas, Jamie
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2015-09-22 20:20 UTC (permalink / raw)
To: Pocas, Jamie; +Cc: linux-ext4@vger.kernel.org

On Tue, Sep 22, 2015 at 03:12:53PM -0400, Pocas, Jamie wrote:
>
> I have a very reproducible spin in resize2fs (x86_64) on both CentOS
> 6 latest rpms and CentOS 7. It will peg one core at 100%. This
> happens with both e2fsprogs version 1.41.12 on CentOS 6 w/ latest
> 2.6.32 kernel rpm installed and e2fsprogs version 1.42.9 on CentOS 7
> with latest 3.10 kernel rpm installed. The key to reproducing this
> seems to be when creating small filesystems. For example if I create
> an ext4 filesystem on a 100MiB disk (or file), and then increase the
> size of the underlying disk (or file) to say 1GiB, it will spin and
> consume 100% CPU and not finish even after hours (it should take a
> few seconds).

I can't reproduce the problem using a 3.10.88 kernel with e2fsprogs 1.42.12-1.1 as shipped in the Debian x86_64 jessie 8.2 release image. (As found on Google Compute Engine, but it should be the same no matter what you're using.) I've attached the repro script I'm using. The kernel config I'm using is here:

https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kernel-configs/ext4-x86_64-config-3.10

I also tried reproducing it on CentOS 6.7 as shipped by Google Compute Engine:

[root@centos-test tytso]# cat /etc/centos-release
CentOS release 6.7 (Final)
[root@centos-test tytso]# uname -a
Linux centos-test 2.6.32-573.3.1.el6.x86_64 #1 SMP Thu Aug 13 22:55:16 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@centos-test tytso]# rpm -q e2fsprogs
e2fsprogs-1.41.12-22.el6.x86_64

And I can't reproduce it there either.
Can you take a look at my repro script and see if it fails for you? And if it doesn't, can you adjust it until it does reproduce for you?

Thanks,
- Ted

#!/bin/bash

FS=/tmp/foo.img

cp /dev/null $FS
mke2fs -t ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -Fq $FS 100M
truncate -s 1G $FS
DEV=$(losetup -j $FS | awk -F: '{print $1}')
if test -z "$DEV"
then
	losetup -f $FS
	DEV=$(losetup -j $FS | awk -F: '{print $1}')
fi
if test -z "$DEV"
then
	echo "Can't create loop device for $FS"
else
	echo "Using loop device $DEV"
	CLEANUP_LOOP=yes
fi
e2fsck -p $DEV
mkdir /tmp/mnt$$
mount $DEV /tmp/mnt$$
resize2fs -p $DEV 1G
umount /tmp/mnt$$
e2fsck -fy $DEV
if test "$CLEANUP_LOOP" = "yes"
then
	losetup -d $DEV
fi
rmdir /tmp/mnt$$

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 20:20 ` Theodore Ts'o
@ 2015-09-22 21:26 ` Pocas, Jamie
  2015-09-22 23:41 ` Eric Sandeen
  0 siblings, 1 reply; 13+ messages in thread
From: Pocas, Jamie @ 2015-09-22 21:26 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

Hi Theodore,

I am not sure if you had a chance to see my reply to Eric yet. I can see you are using the same general approach that Eric was using. The key difference from what I am doing again seems to be that I am resizing the underlying disk *while the filesystem is mounted*. Instead, you both are using truncate to grow the disk while the filesystem is not mounted, and then mounting it. So maybe there is some fundamental cleanup or fixup that happens during the subsequent mount that doesn't happen if you grow the disk while the filesystem is already online. With the test example, you can do this using 'losetup -c' to force a re-read of the size of the underlying file. I can understand why a disk should not shrink while the filesystem is mounted, but in my case I am growing it, so the existing FS structure should be unharmed.

Your script works -- with the caveat that I had to fix some line-wrap issues, probably due to my email client, but it was pretty clear what your intention was. Here's my modification to your script that reproduces the issue.

#!/bin/bash

FS=/tmp/foo.img

cp /dev/null $FS
mke2fs -t ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -Fq $FS 100M
DEV=$(losetup -j $FS | awk -F: '{print $1}')
if test -z "$DEV"
then
	losetup -f $FS
	DEV=$(losetup -j $FS | awk -F: '{print $1}')
fi
if test -z "$DEV"
then
	echo "Can't create loop device for $FS"
else
	echo "Using loop device $DEV"
	CLEANUP_LOOP=yes
fi
#e2fsck -p $DEV # Not sure if this needs to be commented out. I will have to reboot to find out though.
mkdir /tmp/mnt$$
mount $DEV /tmp/mnt$$
# Grow the backing file *AFTER* we are mounted
truncate -s 1G $FS
# Tell loopback device to rescan the size
losetup -c $DEV
resize2fs -p $DEV 1G
umount /tmp/mnt$$
e2fsck -fy $DEV
if test "$CLEANUP_LOOP" = "yes"
then
	losetup -d $DEV
fi
rmdir /tmp/mnt$$
## END OF SCRIPT

Execution looks like this:

$ sudo ./repro.sh
[sudo] password for jpocas:
Using loop device /dev/loop0
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/loop0 is mounted on /tmp/mnt5715; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
## SPINNING 100% CPU!

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 21:26 ` Pocas, Jamie
@ 2015-09-22 23:41 ` Eric Sandeen
  2015-09-23  3:40 ` Pocas, Jamie
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2015-09-22 23:41 UTC (permalink / raw)
To: Pocas, Jamie, Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

On 9/22/15 4:26 PM, Pocas, Jamie wrote:
> Hi Theodore,
>
> I am not sure if you had a chance to see my reply to Eric yet. I can
> see you are using the same general approach that Eric was using. The
> key difference from what I am doing again seems to be that I am
> resizing the underlying disk *while the filesystem is mounted*.

Do you see the same problem if you resize a physical disk, not just with loopback? Sounds like it... In theory it should be reproducible w/ lvm too, then, I think, unless there's some issue specific to your block device similar to what's happening on the loop device.

> Instead you both are using truncate to grow the disk while the
> filesystem is not currently mounted, and then mounting it.

Always worth communicating a testcase in the first email, if you have one, so we don't have to guess. ;)

thanks,
-Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
  2015-09-22 23:41 ` Eric Sandeen
@ 2015-09-23  3:40 ` Pocas, Jamie
  0 siblings, 0 replies; 13+ messages in thread
From: Pocas, Jamie @ 2015-09-23 3:40 UTC (permalink / raw)
To: Eric Sandeen, Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

Yes, I am seeing the same problem with physical disks and other types of virtualized disks (e.g. VMware can resize vmdk virtual disks online). Sorry if the initial ambiguity wasted some time. I was trying to come up with the smallest, most isolated example that reproduced the issue, so I went with the loopback approach, since it doesn't have a lot of moving parts or external dependencies and it's easy to make arbitrarily sized devices, including these small ones. I couldn't care less whether loopback works for my intended use, but I am happy it is useful in reproducing the issue. Honestly, for my application it's easy to work around by just not allowing devices this small, which we will never encounter anyway, but I thought I would do my due diligence and report what I think is a bug :)

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-09-23 18:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-22 19:12 resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes Pocas, Jamie
2015-09-22 19:33 ` Eric Sandeen
2015-09-22 20:28   ` Pocas, Jamie
2015-09-22 23:02     ` Theodore Ts'o
2015-09-23  4:20       ` Pocas, Jamie
2015-09-23 15:14         ` Theodore Ts'o
2015-09-23 16:04           ` Pocas, Jamie
2015-09-23 16:59             ` Theodore Ts'o
2015-09-23 18:20               ` Pocas, Jamie
2015-09-22 20:20 ` Theodore Ts'o
2015-09-22 21:26   ` Pocas, Jamie
2015-09-22 23:41     ` Eric Sandeen
2015-09-23  3:40       ` Pocas, Jamie