From: Dave Chinner <david@fromorbit.com>
To: Chad Talbott <ctalbott@google.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@google.com>,
Michael Rubin <mrubin@google.com>
Subject: Re: Metadata in sys_sync_file_range and fadvise(DONTNEED)
Date: Thu, 6 Nov 2008 15:20:54 +1100
Message-ID: <20081106042054.GB2373@disturbed>
In-Reply-To: <1786ab030811051719w63ada9c8ldf75f15367adbb8b@mail.gmail.com>
On Wed, Nov 05, 2008 at 05:19:20PM -0800, Chad Talbott wrote:
> On Sun, Nov 2, 2008 at 2:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Oct 31, 2008 at 01:54:14PM -0700, Chad Talbott wrote:
> >> Andrew suggests a new SYNC_FILE_RANGE_METADATA flag for
> >> sys_sync_file_range(), and leaving posix_fadvise() alone.
> >
> > What is the interface that a filesystem will see? No filesystem has
> > a "metadata sync" method - is this going to fall through to some new
> > convoluted combination of writeback flags to an inode/mapping
> > that more filesystems than not can get wrong?
>
> Good point, coupled with metadata/data ordering and your argument
> below, a decent argument against exposing this interface.
>
> > FWIW, sys_sync_file_range() is fundamentally broken for data
> > integrity writeback - at no time does it call a filesystem method
> > that can result in a barrier I/O being issued to disk after
> > writeback is complete. So, unlike fsync() or fdatasync(), the data
> > can still be lost after completion due to power failure on drives
> > with volatile write caches....
>
> Seems to be true. I'm not currently concerned with sync_file_range
> for data integrity, so I'm going to punt on this issue.
;)
> If the consensus is against exposing a "sync metadata" interface, I'm
> fine with ext2 silently updating metadata alongside neighboring data
> in *either* posix_fadvise() or sync_file_range.
I think that sync_file_range is the better choice for "correct"
behaviour. The assumption when syncing data explicitly is that the
metadata needed to reference that data is written to disk as well.
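
To make that application-side expectation concrete, here is a minimal
sketch using the existing fsync()/fdatasync() calls rather than
sync_file_range(). The helper name write_durably and its parameters are
hypothetical. It assumes a Linux system, where os.fdatasync() flushes
the data plus the metadata needed to read it back (such as the file
size), and os.fsync() additionally flushes the rest of the inode
metadata.

```python
import os


def write_durably(path, payload, sync_all_metadata=False):
    """Write payload to path, then ask the kernel to flush it to disk.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        # os.write() may write fewer bytes than requested, so loop.
        while offset < len(payload):
            offset += os.write(fd, payload[offset:])
        if sync_all_metadata:
            # Data plus all inode metadata (timestamps and so on).
            os.fsync(fd)
        else:
            # Data plus only the metadata needed to retrieve it.
            os.fdatasync(fd)
    finally:
        os.close(fd)
    return offset
```

A caller that only needs the data to be retrievable after a crash would
normally leave sync_all_metadata at its default.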
> Either way, does it
> seem reasonable for posix_fadvise(DONTNEED) to call
> __filemap_fdatawrite_range to do its work?
From a kernel perspective, I don't think it really matters. To an
application, it could. e.g. If you're calling posix_fadvise on a
large range, then the I/O patterns will be the same either way. If
you're calling posix_fadvise() on small, sparse ranges of the file,
then you'll turn one large, fast writeout into lots of small random
writes. i.e. upgrade the kernel and the application goes much
slower....
I guess this all depends on whether this would be considered a
regression or a stupid application ;)
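
The two application patterns contrasted above can be sketched roughly
as follows. The function names and the chunk and stride values are
hypothetical, picked only for illustration. The sketch shows just the
shape of the calls an application makes; whether each small call causes
its own writeback depends on how the kernel implements DONTNEED, which
is the question under discussion.

```python
import os


def drop_whole_range(fd, length):
    """One POSIX_FADV_DONTNEED call covering bytes [0, length)."""
    os.posix_fadvise(fd, 0, length, os.POSIX_FADV_DONTNEED)
    return 1  # number of fadvise calls issued


def drop_sparse_ranges(fd, length, chunk=4096, stride=1 << 20):
    """One small DONTNEED call at the start of every stride bytes."""
    calls = 0
    for offset in range(0, length, stride):
        # Clamp the last range so it does not run past length.
        size = min(chunk, length - offset)
        os.posix_fadvise(fd, offset, size, os.POSIX_FADV_DONTNEED)
        calls += 1
    return calls
```

On a 4 MiB file with the defaults, the first function issues a single
call while the second issues four small ones spread across the file.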
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com