Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Sandeen <sandeen@redhat.com>
To: "Pocas, Jamie" <Jamie.Pocas@emc.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes
Date: Tue, 22 Sep 2015 14:33:18 -0500	[thread overview]
Message-ID: <5601ACFE.5080904@redhat.com> (raw)
In-Reply-To: <06724CF51D6BC94E9BEE7A8A8CB82A6740FE22BCBA@MX01A.corp.emc.com>

On 9/22/15 2:12 PM, Pocas, Jamie wrote:
> Hi,
> 
> I apologize in advance if this is a well-known issue but I don't see
> it as an open bug in sourceforge.net. I'm not able to open a bug
> there without permission, so I am writing you here.

the centos bug tracker may be the right place for your distro...

> I have a very reproducible spin in resize2fs (x86_64) on both CentOS
> 6 latest rpms and CentOS 7. It will peg one core at 100%. This
> happens with both e2fsprogs version 1.41.12 on CentOS 6 w/ latest
> 2.6.32 kernel rpm installed and e2fsprogs version 1.42.9 on CentOS 7
> with latest 3.10 kernel rpm installed. The key to reproducing this
> seems to be when creating small filesystems. For example if I create
> an ext4 filesystem on a 100MiB disk (or file), and then increase the
> size of the underlying disk (or file) to say 1GiB, it will spin and
> consume 100% CPU and not finish even after hours (it should take a
> few seconds).
> 
> Here are the flags used when creating the fs.
> 
> mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F 0 /dev/sdz

AFAIK -F doesn't take an argument, is that 0 supposed to be there?

but if I test this:

# truncate --size=100m testfile
# mkfs.ext4 -O uninit_bg -E nodiscard,lazy_itable_init=1 -F testfile
# truncate --size=1g testfile
# mount -o loop testfile mnt
# resize2fs /dev/loop0

that works fine on my rhel7 box, with kernel-3.10.0-229.el7 and
e2fsprogs-1.42.9-7.el7

Do those same steps fail for you?

-Eric

> Some of these may not be necessary anymore but were very experimental
> when I first started testing on CentOS 5 way back. I think all of
> these options except "nodiscard" are the defaults now anyway. I only
> use the option because in the application I am using this for, it
> doesn't make sense to discard the existing devices which are
> initially zeroed anyway. I suppose with volumes this small it doesn't
> take much extra time anyway, but I don't want to go down that rat
> hole. I am not doing anything custom with the number of inodes,
> smaller blocksize (1k), etc... just what you see above. So it's
> taking the default settings for those, which maybe are bogus and
> broken for small volumes nowadays. I don't know.
> 
> Here is the stack...
> 
> [root@localhost ~]# cat /proc/8403/stack
> [<ffffffff8106ee1a>] __cond_resched+0x2a/0x40
> [<ffffffff8112860b>] find_lock_page+0x3b/0x80
> [<ffffffff8112874f>] find_or_create_page+0x3f/0xb0
> [<ffffffff811c8540>] __getblk+0xf0/0x2a0
> [<ffffffff811c9ad3>] __bread+0x13/0xb0
> [<ffffffffa056098c>] ext4_group_extend+0xfc/0x410 [ext4]
> [<ffffffffa05498a0>] ext4_ioctl+0x660/0x920 [ext4]
> [<ffffffff811a7372>] vfs_ioctl+0x22/0xa0
> [<ffffffff811a7514>] do_vfs_ioctl+0x84/0x580
> [<ffffffff811a7a91>] sys_ioctl+0x81/0xa0
> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> It seems to be sleeping, waiting for a free page, and then sleeping
> again in the kernel. I don't get ANY output after the version heading
> prints out, even with the -d debug flags turned up all the way. It's
> really getting stuck very early on with no I/O going to the disk
> during this CPU spinning. I don't see anything in the dmesg related
> to this activity either.
> 
> I haven't finished binary searching for the specific boundary where
> the problem occurs, but I initially noticed that 1GiB and larger
> always worked and took only a few seconds. Then I stepped down to
> 500MiB and it hung in the same way. Then stepped up to 750MiB and it
> works normally. So there is some kind of boundary between 500-750MiB
> that I haven't found yet.
> 
> I understand that these are really small filesystems nowadays other
> than something that might fit on a CD, but I'm hoping that it's
> something simple that could probably be fixed easily. I suspect that
> due to the disk size, there are probably bad or unusual defaults
> being selected, or there is a structure that is being undersized, or
> with unexpected filesystem dimensions such that the conditions it's
> expecting are invalid and will never be satisfied. On that note I am
> wondering with disks this small if it is relying on the antiquated
> geometry reporting from the device because I know that sometimes with
> small virtual disks like there, there can sometimes be problems
> trying to accurately emulate a fake C/H/S geometry with disks this
> small and sometimes rounding down is necessary. I wonder if a
> mismatch could cause this. I don't want to steer anyone off into the
> weeds though.
> 
> I haven't dug into the code much yet, but I was wondering if anyone
> had any ideas what could be going on. I think at the very least this
> is a bug in the resize code in the ext4 code in the kernel itself
> because even if the resize2fs program is giving bad parameters, I
> would not expect this type of hang to be able to be initiated from
> user space.> 
> Regards,
> Jamie
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

next prev parent reply	other threads:[~2015-09-22 19:33 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-22 19:12 resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes Pocas, Jamie
2015-09-22 19:33 ` Eric Sandeen [this message]
2015-09-22 20:28   ` Pocas, Jamie
2015-09-22 23:02     ` Theodore Ts'o
2015-09-23  4:20       ` Pocas, Jamie
2015-09-23 15:14         ` Theodore Ts'o
2015-09-23 16:04           ` Pocas, Jamie
2015-09-23 16:59             ` Theodore Ts'o
2015-09-23 18:20               ` Pocas, Jamie
2015-09-22 20:20 ` Theodore Ts'o
2015-09-22 21:26   ` Pocas, Jamie
2015-09-22 23:41     ` Eric Sandeen
2015-09-23  3:40       ` Pocas, Jamie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5601ACFE.5080904@redhat.com \
    --to=sandeen@redhat.com \
    --cc=Jamie.Pocas@emc.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.