From: Christoph Hellwig <hch@infradead.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org,
Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH RFC] fs: add FIEMAP_FLAG_DISCARD support
Date: Mon, 4 Nov 2013 17:17:31 -0800
Message-ID: <20131105011731.GA4586@infradead.org>
In-Reply-To: <20131105005146.GA26249@thunk.org>
On Mon, Nov 04, 2013 at 07:51:46PM -0500, Theodore Ts'o wrote:
> The application in question wants to treat a large file as if it
> were a block device --- that's hardly unprecedented; enterprise
> databases tend to prefer using raw block devices (at least for
> benchmarking purposes), but system administrators like the
> administrative convenience of using a file system.
Totally reasonable use case.
>
> The goal here is to get the performance as close to a raw block device as
> possible. Especially if you are using fast flash, the overhead of
> deallocating blocks using punch, only to reallocate the blocks when we
> later write into them, is just unnecessary overhead. Also, if you
> deallocate the blocks, they could end up getting grabbed by some other
> block allocation, which means the file can end up getting very
> fragmented --- which doesn't matter that much for flash, I suppose,
> but it means the extent tree could end up growing and getting nasty
> over time. The bottom line is why bother doing extra work when it's
> not necessary?
Now we're getting into trouble. I'm all for optimizing for a use case
someone cares about. But exposing intimate implementation details of
that use case is almost always a bad idea.

So having a new fallocate mode that zeroes out parts of a file, without
requiring an allocation to back the file, is fine. If the file is on a
filesystem whose device supports discards and sets the "discard zeroes
blocks" flag, we can use the implementation from your patch. If the
device doesn't support discards, or discarded blocks don't read back as
zeroes, we'd need to implement it like the XFS_IOC_ZERO_RANGE ioctl.
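
From userspace, the interface described above could look roughly like
the sketch below. It is only an illustration, not the patch under
discussion: it assumes 64-bit Linux, and it uses the FALLOC_FL_ZERO_RANGE
fallocate mode that was merged into mainline after this thread (Linux
3.15). The helper name zero_range() and its string return values are
made up for the example. The caller only asks for "this range reads
back as zeroes"; whether the filesystem satisfies that with a discard,
an unwritten extent, or real writes stays hidden.

```python
import ctypes
import errno
import os

# Value from <linux/falloc.h>. The mode was merged after this thread (Linux 3.15).
FALLOC_FL_ZERO_RANGE = 0x10


def zero_range(fd, offset, length):
    """Make bytes [offset, offset + length) of fd read back as zeroes.

    Hypothetical helper. Returns "fallocate" if the filesystem handled the
    request, or "write" if we fell back to writing zeroes ourselves.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        fallocate = None  # libc without fallocate(2), e.g. not Linux

    if fallocate is not None:
        # Assumes off_t is 64 bits wide (true on 64-bit Linux).
        fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                              ctypes.c_int64, ctypes.c_int64]
        fallocate.restype = ctypes.c_int
        if fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0:
            return "fallocate"
        err = ctypes.get_errno()
        # Only "mode not supported here" errors trigger the fallback.
        if err not in (errno.EOPNOTSUPP, errno.ENOSYS, errno.EINVAL):
            raise OSError(err, os.strerror(err))

    # Fallback: overwrite the range with zeroes, at most 1 MiB per write.
    chunk = bytes(min(length, 1 << 20))
    done = 0
    while done < length:
        done += os.pwrite(fd, chunk[:length - done], offset + done)
    return "write"
```

Either path gives the caller the same result, which is the point: the
application never learns how the zeroes were produced.
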
Note that exposing stale blocks is a problem at the block device level,
too. If you look at the OpenStack volume service, for example, they have
to explicitly zero out volumes during volume creation or deletion to
make sure no data is exposed to another tenant. The only way to
avoid that is to have some auto-zeroing extent state, either in software
or hardware.
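
As a toy illustration of what an auto-zeroing extent state means, here
is a minimal in-memory model. The class AutoZeroVolume and its methods
are hypothetical and exist only for this example; a real filesystem
tracks this per extent (for example as unwritten extents), not per
block in a Python list. Stale data stays on the media, but a block
that has not been written since allocation always reads back as zeroes.

```python
class AutoZeroVolume:
    """Toy block store that tracks a per-block 'written' state."""

    def __init__(self, stale_contents, block_size):
        if len(stale_contents) % block_size:
            raise ValueError("contents must be a whole number of blocks")
        self.block_size = block_size
        # Whatever the previous owner left behind; never scrubbed.
        self._media = bytearray(stale_contents)
        # False = unwritten: reads return zeroes instead of the media.
        self._written = [False] * (len(stale_contents) // block_size)

    def read(self, block):
        if not self._written[block]:
            return bytes(self.block_size)
        start = block * self.block_size
        return bytes(self._media[start:start + self.block_size])

    def write(self, block, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        start = block * self.block_size
        self._media[start:start + self.block_size] = data
        self._written[block] = True
```

A new tenant handed such a volume sees only zeroes until it writes,
without anyone having to scrub the whole volume up front.
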