From: Jamie Lokier <jamie@shareable.org>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
Badari Pulavarty <pbadari@us.ibm.com>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
jack@suse.cz
Subject: Re: ext3_ordered_writepage() questions
Date: Fri, 17 Mar 2006 22:23:05 +0000
Message-ID: <20060317222305.GA14552@mail.shareable.org>
In-Reply-To: <1142632221.3641.33.camel@orbit.scot.redhat.com>
Stephen C. Tweedie wrote:
> > That's the wrong way around for uses which check mtimes to revalidate
> > information about a file's contents.
>
> It's actually the right way for newly-allocated data: the blocks being
> written early are invisible until the mtime update, because the mtime
> update is an atomic part of the transaction which links the blocks into
> the inode.
Yes, I agree. It's right for that.
> > Local search engines like Beagle, and also anything where "make" is
> > involved, and "rsync" come to mind.
>
> Make and rsync (when writing, that is) are not usually updating in
> place, so they do in fact want the current ordered mode.
I'm referring to make and rsync _after_ a recovery, when _reading_ to
decide whether file data is up to date. The writing in that scenario
is by other programs.
Those are the times when the current ordering gives surprising
results to anyone who hasn't thought about it, such as rsync not
synchronising a directory properly because it assumes (incorrectly)
that a file's mtime indicates the last time data was written to the
file.
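
To make the failure mode concrete, here is a minimal Python sketch of a
size-plus-mtime quick check, similar in spirit to what rsync does by
default. The names looks_unchanged and demo are hypothetical, and using
os.utime to mimic the post-recovery state (new data on disk, old mtime)
is an assumption for illustration only; it is not how ext3 or rsync are
implemented.

```python
import os
import tempfile


def looks_unchanged(src, dst):
    """Quick check: treat two files as identical if size and mtime match."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and a.st_mtime_ns == b.st_mtime_ns


def demo():
    """Return (quick_check_says_unchanged, contents_actually_differ)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        dst = os.path.join(d, "dst")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"old data")
        # Give both copies the same mtime, as after a successful sync.
        os.utime(src, ns=(1_000_000_000, 1_000_000_000))
        os.utime(dst, ns=(1_000_000_000, 1_000_000_000))
        before = os.stat(src)

        # In-place overwrite with data of the same length.
        with open(src, "r+b") as f:
            f.write(b"new data")

        # Mimic recovery after a crash: the data reached disk, but the
        # mtime update was never committed, so the old mtime is back.
        os.utime(src, ns=(before.st_atime_ns, before.st_mtime_ns))

        with open(src, "rb") as f1, open(dst, "rb") as f2:
            differs = f1.read() != f2.read()
        return looks_unchanged(src, dst), differs
```

In this sketch demo() returns (True, True): the quick check reports the
file as unchanged even though its contents differ from the copy.
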
I agree that when writing data to the end of a new file, the data must
be committed before the metadata.
The weird distinction arises because, if they can't all be atomic, the
order ought to be: commit mtime, then data, then size. But we always
commit size and mtime together.
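
The ordering argument can be checked with a toy model. The sketch below
is only an illustration under simplifying assumptions: stale_states,
ORDERED and PROPOSED are hypothetical names, a "crash" is modelled as
stopping after any atomic group of updates, and appended data is treated
as visible only once the size update has been committed. It does not
model the real journalling code.

```python
def stale_states(order, in_place):
    """Return indices of crash points where content changed but mtime did not.

    order is a list of atomic groups of updates, committed in sequence,
    e.g. [{"data"}, {"mtime", "size"}]. A crash after group i leaves
    exactly the updates in groups 0..i on disk.
    """
    done = set()
    bad = []
    for i, group in enumerate(order):
        done |= group
        # Overwritten blocks are visible as soon as the data is on disk;
        # appended blocks only become visible once the size covers them.
        visible_change = "data" in done and (in_place or "size" in done)
        if visible_change and "mtime" not in done:
            bad.append(i)
    return bad


# Data first, then size and mtime together in one transaction.
ORDERED = [{"data"}, {"mtime", "size"}]
# mtime first, then data, then size.
PROPOSED = [{"mtime"}, {"data"}, {"size"}]
```

In this model stale_states(ORDERED, in_place=False) is empty, which
matches the point that the current order is right for new data, while
stale_states(ORDERED, in_place=True) returns [0]: a crash after the data
write leaves changed content under the old mtime. PROPOSED yields no
stale states in either case.
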
> It's *only* for updating existing data blocks that there's any
> justification for writing mtime first. That's the question here.
>
> There's a significant cost in forcing the mtime to go first: it means
> that the VM cannot perform any data writeback for data written by a
> transaction until the transaction has first been committed. That's the
> last thing you want to be happening under VM pressure, as you may not in
> fact be able to close the transaction without first allocating more
> memory.
While I agree that it's not good for VM pressure, fooling programs
that rely on mtimes to decide if a file's content has changed is a
correctness issue for some applications.
I picked the example of copying a directory using rsync (or any other
program which compares mtimes) and not getting the expected results
because it's easily understood, it's something people actually do, and
they may already be getting surprises that go unnoticed at first.
Maybe the answer is to make the writeback order for in-place writes a
mount option and/or a file attribute?
-- Jamie