From: Ted Ts'o <tytso@mit.edu>
To: Robin Dong <hao.bigrat@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: Question about BIGALLOC
Date: Wed, 10 Aug 2011 22:59:44 -0400
Message-ID: <20110811025944.GE3625@thunk.org>
In-Reply-To: <CAFZ0FUUH85yWoVZ3XpMMYjhiaDrV9QQRSrLsAomXhH8uG_duBg@mail.gmail.com>

On Fri, Aug 05, 2011 at 03:59:01PM +0800, Robin Dong wrote:
> Hi, Ted
>
> I am doing some test of BIGALLOC using "next" branch of e2fsprogs and
> kernel (3.0) with 23 bigalloc-patches.
>
> Everything seems to work, but I have a question:
> the "ee_len" of "struct ext4_extent" is used to indicate block numbers
> not cluster,
> an ext4_extent in 4K-block-size filesystem can only hold 128MB space at most
> even with BIGALLOC feature enabled, so we don't have any benefit from
> this for a file with large number of blocks.
>
> Is this the design behavior or you will change it in the next version?

It's not something I can change, because the VM subsystem
fundamentally assumes that file system block size is less than or
equal to the page size. If I changed the granularity in the extent
length, then in the case of a sparse file, blocks would have to be
allocated and zeroed in units of a cluster. The VM doesn't support
this well.

For example, suppose you fallocate a file to be 1 megabyte. That
means that you have a 1MB extent which is marked as uninitialized.
Now suppose you mmap() this file, and then you write a single byte at
offset 20480. This dirties a single 4k page, and when we write out
that page, we end up converting the 1MB uninitialized extent
into 3 extents: a 20k uninitialized extent, a 4k initialized extent,
and a 1000k uninitialized extent. Now suppose this was done on a
bigalloc file system with a 64k cluster size. If ee_len was
denominated in 64k cluster chunks, we couldn't express the concept of
a 20k or 4k extent.
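
The split described above can be checked with a few lines of Python.
This is only a sketch of the arithmetic, assuming 4k pages and 4k
blocks; split_on_write() and expressible() are hypothetical helpers
made up for illustration and are not the kernel's extent code.

```python
PAGE_SIZE = 4096  # assumption: 4k pages and 4k file system blocks


def split_on_write(ext_start, ext_len, write_off, unit=PAGE_SIZE):
    """Split an uninitialized extent around the unit that a write dirties.

    All values are in bytes. Returns a list of (start, length, state)
    tuples. Hypothetical helper, not the kernel's code path.
    """
    ext_end = ext_start + ext_len
    if ext_start % unit or ext_len % unit:
        raise ValueError("extent must be aligned to the unit")
    if not ext_start <= write_off < ext_end:
        raise ValueError("write falls outside the extent")
    init_start = write_off - (write_off % unit)  # round down to the unit
    init_end = init_start + unit
    pieces = []
    if init_start > ext_start:
        pieces.append((ext_start, init_start - ext_start, "uninit"))
    pieces.append((init_start, unit, "init"))
    if init_end < ext_end:
        pieces.append((init_end, ext_end - init_end, "uninit"))
    return pieces


def expressible(pieces, granule):
    """True if every extent start and length is a multiple of granule."""
    return all(s % granule == 0 and n % granule == 0 for s, n, _ in pieces)


# 1MB uninitialized extent, one byte written at offset 20480
pieces = split_on_write(0, 1 << 20, 20480)
for start, length, state in pieces:
    print(f"{start // 1024}k +{length // 1024}k {state}")
print(expressible(pieces, 4096), expressible(pieces, 64 * 1024))
```

With a 4k granule all three extents are expressible; with a 64k
granule they are not, since neither 20k nor 4k is a multiple of 64k.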

This is the same reason why we can't support a 64k file system block
size. If the user dirties a single 4k page in an otherwise sparse
file, the VM would have to instantiate the other 60k worth of pages
and zero them (atomically!) --- and the VM doesn't know how to do this.
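
As a rough illustration of the cost, the number of extra pages that
would need to be instantiated and zeroed per dirtied page can be
computed as below. This is a sketch assuming 4k pages;
other_pages_to_zero() is a hypothetical name, not a kernel function.

```python
def other_pages_to_zero(block_size, page_size=4096):
    """Pages in the same block, besides the dirtied one, that need zeroing.

    Hypothetical helper; assumes block_size is a multiple of page_size.
    """
    if block_size < page_size or block_size % page_size:
        raise ValueError("block size must be a multiple of the page size")
    return block_size // page_size - 1


n = other_pages_to_zero(64 * 1024)
print(n, "pages,", n * 4096 // 1024, "k")  # 64k block, 4k page
print(other_pages_to_zero(4096), "pages")  # block size == page size
```

When the block size equals the page size the count is zero, which is
the case the VM is built around.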

- Ted