From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
linux-kernel@vger.kernel.org,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>
Subject: Re: [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes
Date: Fri, 21 Dec 2012 01:58:25 +0100 [thread overview]
Message-ID: <20121221005825.GC13474@quack.suse.cz> (raw)
In-Reply-To: <20121221001457.GY15182@dastard>
On Fri 21-12-12 11:14:57, Dave Chinner wrote:
> On Thu, Dec 20, 2012 at 03:10:10PM -0800, Andy Lutomirski wrote:
> > The onus is currently on filesystems to call file_update_time
> > somewhere in the page_mkwrite path. This is unfortunate for three
> > reasons:
> >
> > 1. page_mkwrite on a locked page should be fast. ext4, for example,
> > often sleeps while dirtying inodes.
>
> That's an ext4 problem, not a page fault or timestamp update
> problem. Fix ext4.
Well, XFS doesn't journal the timestamp update which is why it gets away
without blocking on journal. Other filesystems (and I don't think it's just
ext4) are so benevolent with timestamps so their updates are more costly...
> > 2. The current behavior is surprising -- the timestamp resulting from
> > an mmaped write will be before the write, not after. This contradicts
> > the mmap(2) manpage, which says:
> >
> > The st_ctime and st_mtime field for a file mapped with PROT_WRITE and
> > MAP_SHARED will be updated after a write to the mapped region, and
> > before a subsequent msync(2) with the MS_SYNC or MS_ASYNC flag, if one
> > occurs.
>
> What you propose (time updates in do_writepages()) violates this.
> msync(MS_ASYNC) doesn't actually start any IO, therefore the
> timestamp wil not be updated.
>
> Besides, what POSIX actually says is:
>
> | The st_ctime and st_mtime fields of a file that is mapped with
> | MAP_SHARED and PROT_WRITE shall be marked for update at some point
> | in the interval between a write reference to the mapped region and
> | the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> | of the file by any process.
>
> Which means updating the timestamp during the first write is
> perfectly acceptible. Indeed, by definition, we are compliant with
> the man page because the update is after the write has occurred.
> That is, the write triggered the page fault, so the page fault
> processing where we update the timestamps is definitely after the
> write occurred. :)
Well, but there can be more writes to the already write faulted page.
They can come seconds after we called ->page_mkwrite() and thus updated
time stamps. So Andy is correct we violate the spec AFAICT.
> > 3. (An ulterior motive) I'd like to use hardware dirty tracking for
> > shared, locked, writable mappings (instead of page faults). Moving
> > important work out of the page_mkwrite path is an important first step.
>
> I don't think you're going to get very far doing this. page_mkwrite
> needs to do:
>
> a) block allocation in page_mkwrite() for faults over holes
> to detect ENOSPC conditions immediately rather than in
> writeback when such an error results in data loss.
> b) detect writes over unwritten extents so that the pages in
> the page cache can be set up correctly for conversion to
> occur during writeback.
>
> Correcting these two problems was the reason for introducing
> page_mkwrite in the first place - we need to do this stuff before
> the page fault is completed, and that means, by definition,
> page_mkwrite needs to be able to block. Moving c/mtime updates out
> of the way does not, in any way, change these requirements.
Here I completely agree. I wanted to comment on it in my post as well but
then forgot about it.
> Perhaps you should implement everything you want to do inside ext4
> first, so we can get an idea of exactly what you want page_mkwrite()
> to do, how you want it to behave, and how you expect filesystems to
> handle the above situations correctly....
Ah, now I noticed we don't call file_update_time() from
__block_page_mkwrite() so yes, just changing ext4 without touching generic
code is easily possible.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2012-12-21 0:58 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-18 1:10 Are there u32 atomic bitops? (or dealing w/ i_flags) Andy Lutomirski
2012-12-18 1:34 ` Ming Lei
2012-12-18 1:57 ` Al Viro
2012-12-18 2:42 ` Andy Lutomirski
2012-12-18 21:30 ` Dave Chinner
2012-12-18 22:20 ` Andy Lutomirski
2012-12-20 7:03 ` Dave Chinner
2012-12-20 20:05 ` Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 0/4] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 1/4] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-21 0:14 ` Dave Chinner
2012-12-21 0:58 ` Jan Kara [this message]
2012-12-21 1:12 ` Dave Chinner
2012-12-21 1:36 ` Jan Kara
2012-12-21 5:36 ` Andy Lutomirski
2012-12-21 10:51 ` Jan Kara
2012-12-21 18:26 ` Andy Lutomirski
2012-12-21 0:34 ` Jan Kara
2012-12-21 5:42 ` Andy Lutomirski
2012-12-21 11:03 ` Jan Kara
2012-12-20 23:10 ` [RFC PATCH 3/4] Remove file_update_time from all mkwrite paths Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 4/4] ext4: Fix an incorrect comment about i_mutex Andy Lutomirski
2012-12-20 23:42 ` Jan Kara
2012-12-20 23:36 ` Are there u32 atomic bitops? (or dealing w/ i_flags) Dave Chinner
2012-12-20 23:42 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121221005825.GC13474@quack.suse.cz \
--to=jack@suse.cz \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).