From: Ted Ts'o <tytso@mit.edu>
To: Robin Dong <hao.bigrat@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: Question about BIGALLOC
Date: Wed, 10 Aug 2011 22:59:44 -0400
Message-ID: <20110811025944.GE3625@thunk.org>
In-Reply-To: <CAFZ0FUUH85yWoVZ3XpMMYjhiaDrV9QQRSrLsAomXhH8uG_duBg@mail.gmail.com>

On Fri, Aug 05, 2011 at 03:59:01PM +0800, Robin Dong wrote:
> Hi, Ted
>
> I am doing some tests of BIGALLOC using the "next" branch of e2fsprogs
> and a 3.0 kernel with 23 bigalloc patches.
>
> Everything seems to work, but I have a question: the "ee_len" field
> of "struct ext4_extent" counts blocks, not clusters. So an ext4_extent
> in a 4K-block-size filesystem can map at most 128MB, even with the
> BIGALLOC feature enabled, and we get no benefit from it for a file
> with a large number of blocks.
>
> Is this the intended design, or will you change it in the next version?

It's not something I can change, because the VM subsystem
fundamentally assumes that the file system block size is less than or
equal to the page size. If I changed the extent length to be counted
in clusters, then in the case of a sparse file, blocks would have to be
allocated and zeroed in units of a cluster. The VM doesn't support
this well.
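To put numbers on the 128MB figure from the question: ee_len is a 16-bit field, and in the kernel's ext4_extents.h lengths up to EXT_INIT_MAX_LEN (1 << 15, i.e. 32768) denote an initialized extent. The short Python sketch below is illustrative only, not ext4 code; `max_extent_bytes` is a hypothetical helper that multiplies that limit by whatever unit ee_len counts.

```python
# Illustrative sketch (not ext4 code): how much space one extent can map,
# depending on the unit that the 16-bit ee_len field counts.
EXT_INIT_MAX_LEN = 1 << 15  # 32768; max length of an initialized extent

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def max_extent_bytes(unit_bytes: int) -> int:
    """Hypothetical helper: largest initialized extent if ee_len counted
    units of unit_bytes."""
    return EXT_INIT_MAX_LEN * unit_bytes


# ee_len counts 4k blocks (the bigalloc design as described): 128MB.
print(max_extent_bytes(4 * KiB) // MiB)   # 128
# Hypothetical alternative: ee_len counting 64k clusters would reach 2GB.
print(max_extent_bytes(64 * KiB) // GiB)  # 2
```

The sixteenfold gain in the second line is exactly what cluster-denominated lengths would buy, and what the reasoning in this reply rules out.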

For example, suppose you fallocate a file to be 1 megabyte. That
means that you have a 1MB extent which is marked as uninitialized.
Now suppose you mmap() this file, and then you write a single byte at
offset 20480. This dirties a single 4k page, and when we write out
that page, we end up converting the 1MB uninitialized extent into 3
extents: a 20k uninitialized extent, a 4k initialized extent, and a
1000k uninitialized extent. Now suppose this was done on a bigalloc
file system with a 64k cluster size. If ee_len were denominated in
64k cluster chunks, we couldn't express the concept of a 20k or 4k
extent.
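The split described above can be modelled in a few lines. This is a simplified Python sketch, not the ext4 implementation: `Extent` and `write_page` are hypothetical names, offsets and lengths are in bytes, and merging with neighbouring extents is ignored. The last column shows whether each resulting length could be expressed in whole 64k clusters.

```python
from dataclasses import dataclass

PAGE = 4 * 1024       # page size in the example
CLUSTER = 64 * 1024   # bigalloc cluster size in the example


@dataclass(frozen=True)
class Extent:
    start: int         # logical byte offset within the file
    length: int        # length in bytes
    initialized: bool


def write_page(ext: Extent, offset: int) -> list[Extent]:
    """Hypothetical model: split an uninitialized extent when the page
    containing `offset` is written back."""
    page_start = offset - offset % PAGE
    page_end = page_start + PAGE
    if ext.initialized:
        raise ValueError("extent is already initialized")
    if not (ext.start <= page_start and page_end <= ext.start + ext.length):
        raise ValueError("page lies outside the extent")
    pieces = [
        Extent(ext.start, page_start - ext.start, False),          # before
        Extent(page_start, PAGE, True),                            # written page
        Extent(page_end, ext.start + ext.length - page_end, False),  # after
    ]
    return [e for e in pieces if e.length > 0]  # drop empty pieces


extents = write_page(Extent(0, 1024 * 1024, False), 20480)
for e in extents:
    # start (k), length (k), initialized?, whole number of 64k clusters?
    print(e.start // 1024, e.length // 1024, e.initialized,
          e.length % CLUSTER == 0)
```

Running it prints a 20k uninitialized piece, a 4k initialized piece and a 1000k uninitialized piece, and none of the three lengths is a multiple of 64k.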

This is the same reason why we can't support a 64k file system block
size. If the user dirties a single 4k page in an otherwise sparse
file, the VM would have to instantiate the other fifteen 4k pages
(60k) and zero them (atomically!), and the VM doesn't know how to do
this.
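The arithmetic behind that paragraph, as a hypothetical Python helper (not a kernel function): a 64k block spans sixteen 4k pages, so dirtying one of them leaves fifteen others (60k) that would have to be created and zeroed together.

```python
PAGE_SIZE = 4 * 1024  # assumed page size, as in the example


def pages_to_instantiate(block_size: int, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: how many *other* pages must be created and
    zeroed when one page of a block_size block is dirtied in a sparse file."""
    if block_size <= page_size:
        # The block fits within a single page; no extra pages are needed.
        return 0
    return block_size // page_size - 1


extra = pages_to_instantiate(64 * 1024)
print(extra, extra * PAGE_SIZE // 1024)  # 15 60
```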

- Ted