From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
linux-kernel@vger.kernel.org,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>
Subject: Re: [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes
Date: Fri, 21 Dec 2012 01:58:25 +0100 [thread overview]
Message-ID: <20121221005825.GC13474@quack.suse.cz> (raw)
In-Reply-To: <20121221001457.GY15182@dastard>
On Fri 21-12-12 11:14:57, Dave Chinner wrote:
> On Thu, Dec 20, 2012 at 03:10:10PM -0800, Andy Lutomirski wrote:
> > The onus is currently on filesystems to call file_update_time
> > somewhere in the page_mkwrite path. This is unfortunate for three
> > reasons:
> >
> > 1. page_mkwrite on a locked page should be fast. ext4, for example,
> > often sleeps while dirtying inodes.
>
> That's an ext4 problem, not a page fault or timestamp update
> problem. Fix ext4.
Well, XFS doesn't journal the timestamp update which is why it gets away
without blocking on journal. Other filesystems (and I don't think it's just
ext4) are so benevolent with timestamps so their updates are more costly...
> > 2. The current behavior is surprising -- the timestamp resulting from
> > an mmaped write will be before the write, not after. This contradicts
> > the mmap(2) manpage, which says:
> >
> > The st_ctime and st_mtime field for a file mapped with PROT_WRITE and
> > MAP_SHARED will be updated after a write to the mapped region, and
> > before a subsequent msync(2) with the MS_SYNC or MS_ASYNC flag, if one
> > occurs.
>
> What you propose (time updates in do_writepages()) violates this.
> msync(MS_ASYNC) doesn't actually start any IO, therefore the
> timestamp wil not be updated.
>
> Besides, what POSIX actually says is:
>
> | The st_ctime and st_mtime fields of a file that is mapped with
> | MAP_SHARED and PROT_WRITE shall be marked for update at some point
> | in the interval between a write reference to the mapped region and
> | the next call to msync() with MS_ASYNC or MS_SYNC for that portion
> | of the file by any process.
>
> Which means updating the timestamp during the first write is
> perfectly acceptible. Indeed, by definition, we are compliant with
> the man page because the update is after the write has occurred.
> That is, the write triggered the page fault, so the page fault
> processing where we update the timestamps is definitely after the
> write occurred. :)
Well, but there can be more writes to the already write faulted page.
They can come seconds after we called ->page_mkwrite() and thus updated
time stamps. So Andy is correct we violate the spec AFAICT.
> > 3. (An ulterior motive) I'd like to use hardware dirty tracking for
> > shared, locked, writable mappings (instead of page faults). Moving
> > important work out of the page_mkwrite path is an important first step.
>
> I don't think you're going to get very far doing this. page_mkwrite
> needs to do:
>
> a) block allocation in page_mkwrite() for faults over holes
> to detect ENOSPC conditions immediately rather than in
> writeback when such an error results in data loss.
> b) detect writes over unwritten extents so that the pages in
> the page cache can be set up correctly for conversion to
> occur during writeback.
>
> Correcting these two problems was the reason for introducing
> page_mkwrite in the first place - we need to do this stuff before
> the page fault is completed, and that means, by definition,
> page_mkwrite needs to be able to block. Moving c/mtime updates out
> of the way does not, in any way, change these requirements.
Here I completely agree. I wanted to comment on it in my post as well but
then forgot about it.
> Perhaps you should implement everything you want to do inside ext4
> first, so we can get an idea of exactly what you want page_mkwrite()
> to do, how you want it to behave, and how you expect filesystems to
> handle the above situations correctly....
Ah, now I noticed we don't call file_update_time() from
__block_page_mkwrite() so yes, just changing ext4 without touching generic
code is easily possible.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2012-12-21 0:58 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-18 1:10 Are there u32 atomic bitops? (or dealing w/ i_flags) Andy Lutomirski
2012-12-18 1:34 ` Ming Lei
2012-12-18 1:57 ` Al Viro
2012-12-18 2:42 ` Andy Lutomirski
2012-12-18 21:30 ` Dave Chinner
2012-12-18 22:20 ` Andy Lutomirski
2012-12-20 7:03 ` Dave Chinner
2012-12-20 20:05 ` Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 0/4] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 1/4] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-21 0:14 ` Dave Chinner
2012-12-21 0:58 ` Jan Kara [this message]
2012-12-21 1:12 ` Dave Chinner
2012-12-21 1:36 ` Jan Kara
2012-12-21 5:36 ` Andy Lutomirski
2012-12-21 10:51 ` Jan Kara
2012-12-21 18:26 ` Andy Lutomirski
2012-12-21 0:34 ` Jan Kara
2012-12-21 5:42 ` Andy Lutomirski
2012-12-21 11:03 ` Jan Kara
2012-12-20 23:10 ` [RFC PATCH 3/4] Remove file_update_time from all mkwrite paths Andy Lutomirski
2012-12-20 23:10 ` [RFC PATCH 4/4] ext4: Fix an incorrect comment about i_mutex Andy Lutomirski
2012-12-20 23:42 ` Jan Kara
2012-12-20 23:36 ` Are there u32 atomic bitops? (or dealing w/ i_flags) Dave Chinner
2012-12-20 23:42 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121221005825.GC13474@quack.suse.cz \
--to=jack@suse.cz \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.