From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jon Bernard <jbernard@tuxion.com>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	linux-ext4@vger.kernel.org
Subject: Re: kernel bug at fs/ext4/resize.c:409
Date: Fri, 14 Feb 2014 19:16:24 -0800
Message-ID: <20140215031624.GI9176@birch.djwong.org>
In-Reply-To: <20140214234631.GC1748@thunk.org>

Per Ted's request, I've started editing a document on the ext4 wiki:

https://ext4.wiki.kernel.org/index.php/Ext4_VM_Images 

[comments below too]

On Fri, Feb 14, 2014 at 06:46:31PM -0500, Theodore Ts'o wrote:
> On Fri, Feb 14, 2014 at 03:19:05PM -0500, Jon Bernard wrote:
> > Ahh, I see.  Here's where this comes from: the particular use case is
> > provisioning of new cloud instances whose root volume is of unknown
> > size.  The filesystem and its contents are created and bundled
> > beforehand into the smallest filesystem possible.  The instance is PXE
> > booted for provisioning and the root filesystem is then copied onto the
> > disk - and then resized to take advantage of the total amount of space.
> > 
> > In order to support very large partitions, the filesystem is created
> > with an abnormally large inode table so that large resizes would be
> > possible.  I traced it to this commit as best I can tell:
> > 
> >     https://github.com/openstack/diskimage-builder/commit/fb246a02eb2ed330d3cc37f5795b3ed026aabe07
> > 
> > I assumed that additional inodes would be allocated along with block
> > groups during an online resize, but that commit contradicts my current
> > understanding. 
> 
> Additional inodes *are* allocated as the file system is grown.
> Whoever thought otherwise was wrong.  What happens is that there is a
> fixed number of inodes per block group.  When the file system is
> resized, either by growing or shrinking the file system, block groups
> are added to or removed from the file system, and the inodes in those
> block groups are added or removed along with them.
> 
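The arithmetic behind the paragraph above can be sketched as follows.  This
is only an illustration, not code from e2fsprogs or the kernel.  The function
name and the default values (32768 blocks per group, 8192 inodes per group)
are assumptions; the real values depend on the block size and on the options
passed to mke2fs, and the sketch ignores corner cases such as a very small
trailing group.

```python
def inode_count(fs_blocks: int,
                blocks_per_group: int = 32768,
                inodes_per_group: int = 8192) -> int:
    """Estimate the total inode count for a file system of fs_blocks blocks.

    Hypothetical helper: the defaults are illustrative values for a
    4 KiB-block file system, not something read from a real superblock.
    """
    # Round up: a partial final group still carries a full set of inodes.
    groups = (fs_blocks + blocks_per_group - 1) // blocks_per_group
    return groups * inodes_per_group


# 1 GiB of 4 KiB blocks, then the same file system grown to 1 TiB.
print(inode_count(262144))      # 65536
print(inode_count(268435456))   # 67108864
```

Under these assumptions the inode count grows in proportion to the number
of block groups, so there is no need to oversize the inode table up front
just to leave room for a later resize.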
> > I suggested that the filesystem be created during the time of
> > provisioning to allow a more optimal on-disk layout, and I believe this
> > is being considered now.
> 
> What causes the most damage in terms of a non-optimal data block
> layout is installing the files on a large file system, and then
> shrinking the file system to its minimum size using resize2fs -M.
> There is also some non-optimality that occurs as the file system gets
> filled beyond about 90% full, but it's not nearly so bad as shrinking
> the file system --- which you should avoid at all costs.
> 
> From a performance point of view, the only time you should try to do
> an off-line resize2fs shrink is if you are shrinking the file system
> by a handful of blocks as part of converting a file system in place to
> use LVM or LUKS encryption, and you need to make room for some
> metadata blocks at the end of the partition.
> 
> The other thing to note is that if you are using a format such
> as qcow2, or something like the device-mapper's thin-provisioning
> (thinp) scheme, or if you are willing to deal with sparse files, one
> approach is to not resize the file system at all.  You could just use
> a tool like zerofree[1] to zero out all of the unused blocks in the
> file system, and then use "/bin/cp --sparse=always" to cause all zero
> blocks to be treated as sparse blocks on the destination file.
> 
> [1] http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/util/zerofree.c

I have a zerofree variant that knows how to punch/discard blocks that I'll
throw into contrib/ the next time I send out one of my megapatch sets.
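
For readers unfamiliar with the zero-then-sparse-copy idea quoted above,
here is a minimal sketch of what a sparse copy does conceptually.  It is
not how cp or zerofree are implemented; the function name, the 4096-byte
block size and the return value are assumptions made for illustration.

```python
import os


def sparse_copy(src_path: str, dst_path: str, block_size: int = 4096) -> int:
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the number of blocks that were skipped.  Whether the skipped
    ranges actually become holes depends on the destination file system.
    """
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if not any(block):
                # All zeros: advance the write offset without writing data.
                dst.seek(len(block), os.SEEK_CUR)
                skipped += 1
            else:
                dst.write(block)
        # If the file ends in skipped blocks, extend it to the full length.
        dst.truncate(src.tell())
    return skipped
```

The copy reads back byte-for-byte identical to the source; only the
on-disk allocation differs, which is why zeroing the free blocks first
makes the copied image smaller.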

> This is part of how I maintain my root filesystem that I use in a VM
> for testing ext4 changes upstream.  After I update to the latest
> Debian unstable package updates, install the latest updates from the
> xfstests and e2fsprogs git repositories, I then run the following
> script which uses the zerofree.c program to compress the qcow2 root
> file system image that I use with kvm:
> 
> http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/compress-rootfs
> 
> 
> Also, starting with e2fsprogs 1.42.10, there's another way you can

These three options (-rap) are available in 1.42.9.  Is there a particular
reason not to use them before 1.42.10?

> efficiently deploy a large file system image by only copying the
> blocks which are in use, by using a command like this:
> 
>        e2image -rap src_fs dest_fs
> 
> (See also the -c flag as described in e2image's man page if you want
> to use this technique to do incremental image-based backups onto a
> flash-based backup medium; I was using this for a while to keep two
> laptop SSD's root filesystem in sync with one another.)
> 
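A toy model of the "copy only the blocks in use" idea quoted above, with
an optional compare-before-write step in the spirit of the -c flag.  This
is a sketch under stated assumptions, not e2image's implementation: the
function name is hypothetical, the in_use list stands in for the block
bitmaps a real tool would read from the file system, and the compare
behaviour reflects my reading of the man page.

```python
from typing import Sequence


def copy_in_use_blocks(src: bytes, dst: bytearray, in_use: Sequence[bool],
                       block_size: int = 4096, compare: bool = True) -> int:
    """Copy only the blocks flagged in in_use from src into dst.

    With compare=True, a block that already matches in dst is not
    rewritten.  Returns the number of blocks actually written.
    """
    written = 0
    for blk, used in enumerate(in_use):
        if not used:
            continue  # free block: nothing worth copying
        lo, hi = blk * block_size, (blk + 1) * block_size
        if compare and dst[lo:hi] == src[lo:hi]:
            continue  # destination already up to date, skip the write
        dst[lo:hi] = src[lo:hi]
        written += 1
    return written
```

Skipping writes for blocks that already match is what makes repeated runs
cheap on flash: a second pass over an unchanged source writes nothing.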
> So there are lots of ways that you can do what you need, all without
> playing games with resize2fs.  Perhaps some of them would actually be
> better for your use case.

Calvin Watson noted on Ted's G+ repost that one can use fstrim in newer
versions of QEMU (1.5+?) to punch out unused blocks if the virtual disk is
emulated via virtio-scsi.

--D
> 
> 
> > If it turns out to be not terribly complicated and there is not an
> > immediate time constraint, I would love to try to help with this or at
> > least test patches.
> 
> I will hopefully have a bug fix in the next week or two.  
> 
> Cheers,
> 
> 						- Ted
