From: Ric Wheeler <rwheeler@redhat.com>
To: Valerie Aurora Henson <vaurora@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>,
	Theodore Tso <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>
Subject: Re: [RFC PATCH] fpathconf() for fsync() behavior
Date: Thu, 23 Apr 2009 12:10:13 -0400
Message-ID: <49F092E5.9020601@redhat.com>
In-Reply-To: <20090423160426.GF8476@shell>

Valerie Aurora Henson wrote:
> On Wed, Apr 22, 2009 at 10:17:48PM -0700, Andrew Morton wrote:
>> On Wed, 22 Apr 2009 20:12:57 -0400 Valerie Aurora Henson <vaurora@redhat.com> wrote:
>>
>>> In the default mode for ext3 and btrfs, fsync() is both slow and
>>> unnecessary for some important application use cases - at the same
>>> time that it is absolutely required for correctness for other modes of
>>> ext3, ext4, XFS, etc.  If applications could easily distinguish
>>> between the two cases, they would be more likely to be correct and
>>> fast.
>>>
>>> How about an fpathconf() variable, something like _PC_ORDERED?  E.g.:
>>>
>>> 	/* Unoptimized example optional fsync() demo */
>>> 	write(fd, buf, len);
>>> 	/* Only fsync() if we need it */
>>> 	if (fpathconf(fd, _PC_ORDERED) != 1)
>>> 		fsync(fd);
>>> 	rename(tmp_path, new_path);
>>>
>>> I know of two specific real-world cases in which this would
>>> significantly improve performance: (a) fsync() before rename(), (b)
>>> fsync() of the parent directory of a newly created file.  Case (b) is
>>> particularly nasty when you have multiple threads creating files in
>>> the same directory because the dir's i_mutex is held across fsync() -
>>> file creates become limited to the speed of sequential fsync()s.
>>>
>>> Conceptual libc patch below.
>> Would it be better to implement new syscall(s) with finer-grained control
>> and better semantics?  Then userspace would just need to do:
>>
>> 	fsync_on_steroids(fd, FSYNC_BEFORE_RENAME);
>>
>> and that all gets down into the filesystem which can then work out what
>> it needs to do to implement the command.
> 
> You and Jamie have a good point: fsync() is a very big hammer used for
> many different purposes, and it would be nice to have finer-grained
> tools.  There are distinct limits to what you can do to optimize a
> full fsync(); we should be thrilled to get fewer of them from userspace.
> 
> Like others, I am concerned about the complexity for the programmer.
> Perhaps in addition to the various fine-grained options, there is a:
> 
> 	fsync_on_steroids(fd, FSYNC_DO_WHAT_ORDERED_WOULD_DO);
> 
> The idea is that we've currently got a lot of code that assumes ext3
> data=ordered semantics (btrfs will fulfill these assumptions too).  It
> would be nice if we had one simple drop-in test to distinguish between
> ext3-ordered/btrfs/reiserfs and all other fs's; I think we'd get a lot
> more adoption that way.
> 
> All that being said, I'd be thrilled to have fine-grained fsync().
> 
> -VAL

I like the fine-grained fsync variation as well. We could reimplement the
standard fsync to be safe, boring and relatively slow while allowing the few
really sophisticated users the extra options.

It would also make it easier to ensure that the traditional fsync() semantics
are not weakened in unexpected ways for apps that care.

ric

