From: Dave Chinner <david@fromorbit.com>
To: Ted Ts'o <tytso@mit.edu>, Lawrence Greenfield <leg@google.com>,
Josef Bacik <josef@redhat.com>,
linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
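linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
xfs@oss.sgi.com, joel.becker@oracle.com, cmm@us.ibm.com,
cluster-devel@redhat.com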
Subject: Re: [PATCH 1/6] fs: add hole punching to fallocate
Date: Wed, 12 Jan 2011 22:48:23 +1100
Message-ID: <20110112114823.GO28803@dastard>
In-Reply-To: <20110111213007.GF2917@thunk.org>

On Tue, Jan 11, 2011 at 04:30:07PM -0500, Ted Ts'o wrote:
> On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote:
> > > IOWs, all they want to do is avoid the unwritten extent conversion
> > > overhead. Time has shown that a bad security/performance tradeoff
> > > decision was made 13 years ago in XFS, so I see little reason to
> > > repeat it for ext4 today....
>
> I suspect things may have changed somewhat; both in terms of
> requirements and nature of cluster file systems, and the performance of
> various storage systems (including PCIe-attached flash devices).

We can throw 1000x more CPU power and memory at the problem than
we could 13 years ago. IOW the system balance hasn't changed (even
considering PCIe SSDs) compared to 13 years ago. Hence if it was a bad
tradeoff 13 years ago, it's still a bad tradeoff today.

> > I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead
> > of extent conversion. It's that extent conversion causes more metadata
> > operations than what you'd have otherwise, which means systems that
> > want to use O_DIRECT and make sure the data doesn't go away either
> > have to write O_DIRECT|O_DSYNC or need to call fdatasync().
> > cluster file system implementor,
>
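To make the durability point in the quoted paragraph concrete, here is a
minimal sketch of the two options it names: writing through a descriptor
opened with O_DSYNC, or doing a plain write followed by fdatasync(). The
helper names (preallocate, write_durably) are made up for illustration and
come from neither the patch series nor any filesystem. O_DIRECT is left out
on purpose, since it needs aligned buffers and is not accepted by every
filesystem; the sync behaviour shown is the same with or without it.

```python
import os
import tempfile


def preallocate(path, size):
    """Create `path` and reserve `size` bytes for it (hypothetical helper)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        # posix_fallocate() reserves space; on filesystems with unwritten
        # extent support this typically leaves the range marked unwritten.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def write_durably(path, data, offset=0, use_dsync=False):
    """Write `data` at `offset` and return only once it is on stable storage.

    use_dsync=True  : open with O_DSYNC, so the write itself waits for the
                      data and the metadata needed to read it back.
    use_dsync=False : plain write, then an explicit fdatasync().
    """
    flags = os.O_WRONLY | (os.O_DSYNC if use_dsync else 0)
    fd = os.open(path, flags)
    try:
        written = os.pwrite(fd, data, offset)
        if not use_dsync:
            os.fdatasync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        blob = os.path.join(tmp, "blob")
        preallocate(blob, 1 << 20)
        write_durably(blob, b"a" * 4096)                       # write + fdatasync
        write_durably(blob, b"b" * 4096, 4096, use_dsync=True)  # O_DSYNC write
```

Either way the application pays for a synchronous metadata update when the
write lands in an unwritten extent, which is the cost being argued about.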
> One possibility might be to make it an optional feature which is only
> enabled via a mount option. That way someone would have to explicitly
> ask for this feature two ways (via a new flag to fallocate and a
> mount option).

Proliferation of mount options just to enable feature X of API Y for
filesystem Z is not a good idea. Either you enable it via the
fallocate API or you don't allow it at all.

> It might not make sense for XFS, but for people who are using ext4
> as the local storage file system back-end,

How does this differ from a local filesystem? Are you talking about
storage nodes for clustered/cloudy storage?

If so, I know of quite a few places that use XFS for this purpose
and they all seem to measure storage in petabytes made up of small
boxes containing anywhere between 30-100TB each. The only request
for additional preallocation functionality I've got from people
running such applications recently is for XFS_IOC_ZERO_RANGE. This
is quite relevant, because that specifically converts allocated
extents to unwritten extents. i.e. they like to be able to
efficiently re-initialise allocated space to zeros rather than
have it contain stale data.
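
As an illustration of that "re-initialise to zeros" operation, here is a
rough sketch. It does not use XFS_IOC_ZERO_RANGE itself; it uses the
generic FALLOC_FL_ZERO_RANGE flag to fallocate(2), which was only added to
Linux later (3.15) and so was not available when this thread was written.
The zero_range() helper is hypothetical, the ctypes binding assumes a
64-bit Linux system where off_t is 64 bits wide, and whether the
filesystem implements the call by converting extents to unwritten is up to
the filesystem. Where the flag is not supported, the sketch falls back to
writing zeros by hand.

```python
import ctypes
import errno
import os
import tempfile

FALLOC_FL_ZERO_RANGE = 0x10  # value from <linux/falloc.h>

# Bind fallocate(2) from the C library already loaded into the process.
# Assumption: 64-bit Linux, so off_t maps to c_int64.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def zero_range(fd, offset, length):
    """Make [offset, offset + length) read back as zeros.

    Returns True if the filesystem handled FALLOC_FL_ZERO_RANGE, False if
    this helper had to fall back to writing zeros itself.
    """
    if _libc.fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err not in (errno.EOPNOTSUPP, errno.ENOSYS):
        raise OSError(err, os.strerror(err))
    # Fallback: overwrite the range with zeros in chunks of at most 1 MiB.
    zeros = bytes(min(length, 1 << 20))
    done = 0
    while done < length:
        done += os.pwrite(fd, zeros[: length - done], offset + done)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        os.pwrite(f.fileno(), b"\xff" * 8192, 0)
        used_fallocate = zero_range(f.fileno(), 4096, 4096)
        print("zeroed by filesystem:", used_fallocate)
```

The fallback path is the expensive way to get the same visible result,
which is why an efficient in-filesystem zeroing call is attractive.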

> and are doing all sorts of things to get the best performance,
> including disabling the journal, I suspect it really would make
> sense.

That's not really a convincing argument for a new interface that
needs to be maintained forever.

> So it could always be an
> optional-to-implement flag, that not all file systems should feel
> obliged to implement.

It could, but it still needs better justification.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com