From: Eric Sandeen <sandeen@redhat.com>
To: "Pocas, Jamie" <Jamie.Pocas@emc.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
Date: Tue, 22 Sep 2015 14:33:18 -0500
Message-ID: <5601ACFE.5080904@redhat.com>
In-Reply-To: <06724CF51D6BC94E9BEE7A8A8CB82A6740FE22BCBA@MX01A.corp.emc.com>
On 9/22/15 2:12 PM, Pocas, Jamie wrote:
> Hi,
>
> I apologize in advance if this is a well-known issue but I don't see
> it as an open bug in sourceforge.net. I'm not able to open a bug
> there without permission, so I am writing you here.
The CentOS bug tracker may be the right place for your distro...
> I have a very reproducible spin in resize2fs (x86_64) on both CentOS
> 6 latest rpms and CentOS 7. It will peg one core at 100%. This
> happens with both e2fsprogs version 1.41.12 on CentOS 6 w/ latest
> 2.6.32 kernel rpm installed and e2fsprogs version 1.42.9 on CentOS 7
> with latest 3.10 kernel rpm installed. The key to reproducing this
> seems to be when creating small filesystems. For example if I create
> an ext4 filesystem on a 100MiB disk (or file), and then increase the
> size of the underlying disk (or file) to say 1GiB, it will spin and
> consume 100% CPU and not finish even after hours (it should take a
> few seconds).
>
> Here are the flags used when creating the fs.
>
> mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F 0 /dev/sdz
AFAIK -F doesn't take an argument, is that 0 supposed to be there?
but if I test this:
# truncate --size=100m testfile
# mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F testfile
# truncate --size=1g testfile
# mount -o loop testfile mnt
# resize2fs /dev/loop0
that works fine on my rhel7 box, with kernel-3.10.0-229.el7 and
e2fsprogs-1.42.9-7.el7
Do those same steps fail for you?
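
If they do hang, it may help to run the resize in the background so the
shell stays usable and the kernel stack can be captured. The following is
only a sketch under assumptions: the helper names finished_within and
repro_resize, the 60 second limit, and the LOOPDEV variable are illustrative
and not part of e2fsprogs. /dev/loop0 is taken from the steps above and may
differ on your machine. The repro function needs root and is not called
automatically.

```shell
#!/bin/sh
# finished_within LIMIT CMD [ARGS...]
# Start CMD in the background and poll once per second for up to LIMIT
# seconds. Returns 0 if CMD exited in time, 1 if it is still running.
# The caller is never blocked by a process that spins inside the kernel.
# The background PID is left in the global variable "pid".
finished_within() {
    limit=$1; shift
    "$@" &
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid still running after ${limit}s" >&2
        return 1
    fi
    return 0
}

# The reproduction steps from this mail, with the resize watched by the
# helper above. LOOPDEV defaults to /dev/loop0 as in the original steps
# (an assumption; check which loop device mount actually picked).
repro_resize() {
    LOOPDEV=${LOOPDEV:-/dev/loop0}
    truncate --size=100m testfile &&
    mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F testfile &&
    truncate --size=1g testfile &&
    mkdir -p mnt &&
    mount -o loop testfile mnt || return 1
    if ! finished_within 60 resize2fs "$LOOPDEV"; then
        # Same kind of stack dump as quoted further down in this thread.
        cat "/proc/$pid/stack"
    fi
}
```

Polling is used here instead of wrapping the command in a foreground
timeout because a process looping inside an ioctl may not react to signals
until it returns to user space.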
-Eric
> Some of these may not be necessary anymore but were very experimental
> when I first started testing on CentOS 5 way back. I think all of
> these options except "nodiscard" are the defaults now anyway. I only
> use the option because in the application I am using this for, it
> doesn't make sense to discard the existing devices which are
> initially zeroed anyway. I suppose with volumes this small it doesn't
> take much extra time anyway, but I don't want to go down that rat
> hole. I am not doing anything custom with the number of inodes,
> smaller blocksize (1k), etc... just what you see above. So it's
> taking the default settings for those, which maybe are bogus and
> broken for small volumes nowadays. I don't know.
>
> Here is the stack...
>
> [root@localhost ~]# cat /proc/8403/stack
> [<ffffffff8106ee1a>] __cond_resched+0x2a/0x40
> [<ffffffff8112860b>] find_lock_page+0x3b/0x80
> [<ffffffff8112874f>] find_or_create_page+0x3f/0xb0
> [<ffffffff811c8540>] __getblk+0xf0/0x2a0
> [<ffffffff811c9ad3>] __bread+0x13/0xb0
> [<ffffffffa056098c>] ext4_group_extend+0xfc/0x410 [ext4]
> [<ffffffffa05498a0>] ext4_ioctl+0x660/0x920 [ext4]
> [<ffffffff811a7372>] vfs_ioctl+0x22/0xa0
> [<ffffffff811a7514>] do_vfs_ioctl+0x84/0x580
> [<ffffffff811a7a91>] sys_ioctl+0x81/0xa0
> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> It seems to be sleeping, waiting for a free page, and then sleeping
> again in the kernel. I don't get ANY output after the version heading
> prints out, even with the -d debug flags turned up all the way. It's
> really getting stuck very early on with no I/O going to the disk
> during this CPU spinning. I don't see anything in the dmesg related
> to this activity either.
>
> I haven't finished binary searching for the specific boundary where
> the problem occurs, but I initially noticed that 1GiB and larger
> always worked and took only a few seconds. Then I stepped down to
> 500MiB and it hung in the same way. Then stepped up to 750MiB and it
> works normally. So there is some kind of boundary between 500-750MiB
> that I haven't found yet.
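
The binary search described above could be scripted roughly as follows.
This is a sketch, assuming there is a single boundary between one size that
hangs and a larger size that works, which the thread has not established.
The names bisect_boundary and fake_works_at are hypothetical, and the
threshold inside fake_works_at is a made-up number used only to exercise
the search. A real predicate would create a filesystem, resize it, and
report whether the resize finished.

```shell
#!/bin/sh
# bisect_boundary BAD GOOD CMD [ARGS...]
# BAD:  a size in MiB known to hang.
# GOOD: a larger size in MiB known to work.
# CMD is invoked as "CMD [ARGS...] SIZE" and must return 0 when the resize
# completes for that size. Prints the smallest size that works.
bisect_boundary() {
    bad=$1; good=$2; shift 2
    while [ $((good - bad)) -gt 1 ]; do
        mid=$(( (bad + good) / 2 ))
        if "$@" "$mid"; then
            good=$mid
        else
            bad=$mid
        fi
    done
    echo "$good"
}

# Stand-in predicate: pretends every size of 612 MiB or more works.
# 612 is invented for illustration; it is NOT a measured boundary.
fake_works_at() {
    [ "$1" -ge 612 ]
}

bisect_boundary 500 750 fake_works_at   # prints 612 for the fake predicate
```

With 500 and 750 as the starting points this needs eight test runs, so even
a slow per-size check stays manageable.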
>
> I understand that these are really small filesystems nowadays other
> than something that might fit on a CD, but I'm hoping that it's
> something simple that could probably be fixed easily. I suspect that
> due to the disk size, there are probably bad or unusual defaults
> being selected, or there is a structure that is being undersized, or
> with unexpected filesystem dimensions such that the conditions it's
> expecting are invalid and will never be satisfied. On that note I am
> wondering with disks this small if it is relying on the antiquated
> geometry reporting from the device because I know that sometimes with
> small virtual disks like these, there can sometimes be problems
> trying to accurately emulate a fake C/H/S geometry with disks this
> small and sometimes rounding down is necessary. I wonder if a
> mismatch could cause this. I don't want to steer anyone off into the
> weeds though.
>
> I haven't dug into the code much yet, but I was wondering if anyone
> had any ideas what could be going on. I think at the very least this
> is a bug in the resize code in the ext4 code in the kernel itself
> because even if the resize2fs program is giving bad parameters, I
> would not expect this type of hang to be able to be initiated from
> user space.
>
> Regards,
> Jamie
>