linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	xfs@oss.sgi.com, Jan Kara <jack@suse.cz>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH v3 3/5] mm: Notify filesystems when it's time to apply a deferred cmtime update
Date: Mon, 19 Aug 2013 20:28:20 -0700
Message-ID: <CALCETrV-Toj-NGpmWnmoUbCwrMUXOSbjQdYsSVuTiH+2dEgPTQ@mail.gmail.com>
In-Reply-To: <20130820023615.GE6023@dastard>

On Mon, Aug 19, 2013 at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Aug 16, 2013 at 04:22:10PM -0700, Andy Lutomirski wrote:
>> Filesystems that defer cmtime updates should update cmtime when any
>> of these events happen after a write via a mapping:
>>
>>  - The mapping is written back to disk.  This happens from all kinds
>>    of places, all of which eventually call ->writepages.
>>
>>  - munmap is called or the mapping is removed when the process exits
>>
>>  - msync(MS_ASYNC) is called.  Linux currently does nothing for
>>    msync(MS_ASYNC), but POSIX says that cmtime should be updated some
>>    time between an mmaped write and the subsequent msync call.
>>    MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.
>>
>> Filesystems that defer cmtime updates should flush them on munmap or
>> exit.  Finding out that this happened through vm_ops is messy, so
>> add a new address space op for this.
>>
>> It's not strictly necessary to call ->flush_cmtime after ->writepages,
>> but it simplifies the fs code.  As an optional optimization,
>> filesystems can call mapping_test_clear_cmtime themselves in
>> ->writepages (as long as they're careful to scan all the pages first
>> -- the cmtime bit may not be set when ->writepages is entered).
>
> .flush_cmtime is effectively a duplicate method.  We already have
> .update_time to notify filesystems that they need to update the
> timestamp in the inode transactionally.

.update_time is used for the atime update as well, and it relies on
the core code to update the in-memory timestamp first.  I used that
approach in v2, but it was (correctly, I think) pointed out that this
was a layering violation and that core code shouldn't be mucking with
the timestamps directly during writeback.

There was a recent effort to move most of the file_update_time calls
from the core into .page_mkwrite, and I don't think anyone wants to
undo that.

>
> Indeed:
>
>> +     /*
>> +      * Userspace expects certain system calls to update cmtime if
>> +      * a file has been recently written using a shared vma.  In
>> +      * cases where cmtime may need to be updated but writepages is
>> +      * not called, this is called instead.  (Implementations
>> +      * should call mapping_test_clear_cmtime.)
>> +      */
>> +     void (*flush_cmtime)(struct address_space *);
>
> You say it can be implemented in the ->writepage(s) method, and all
> filesystems provide ->writepage(s) in some form. Therefore I would
> have thought it be best to simply require filesystems to check that
> mapping flag during those methods and update the inode directly when
> that is set?

The problem with only doing it in ->writepages is that calling
writepages from munmap and exit would probably hurt performance for no
particular gain.  So I need some kind of callback to say "update the
time, but don't write data."  The AS_CMTIME bit will still be set when
the ptes are removed.

I could require ->writepages *and* ->flush_cmtime to handle the time
update, but that would complicate non-transactional filesystems.
Those filesystems should just flush cmtime at the end of writepages.
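
To make the split concrete, here is a rough userspace model of the two
paths described above. It is only a sketch: the struct and function
names (inode_model, mapping_model, model_*) are invented stand-ins and
not the kernel definitions, and the cmtime_pending field merely plays
the role of the AS_CMTIME bit.

```c
#include <stdbool.h>
#include <time.h>

/* Userspace stand-ins for the kernel structures; not the real definitions. */
struct inode_model {
	time_t mtime;
	time_t ctime;
};

struct mapping_model {
	struct inode_model *host;
	bool cmtime_pending;	/* plays the role of the AS_CMTIME bit */
	int pages_written;	/* counts simulated data writeback */
};

/* Models mapping_test_clear_cmtime(): return the old flag and clear it. */
static bool model_test_clear_cmtime(struct mapping_model *m)
{
	bool was_set = m->cmtime_pending;

	m->cmtime_pending = false;
	return was_set;
}

/* Models ->flush_cmtime: apply the deferred timestamps, write no data. */
static void model_flush_cmtime(struct mapping_model *m, time_t now)
{
	if (model_test_clear_cmtime(m)) {
		m->host->mtime = now;
		m->host->ctime = now;
	}
}

/*
 * Models ->writepages for a non-transactional filesystem: write the
 * data, then flush cmtime at the end.
 */
static void model_writepages(struct mapping_model *m, int dirty_pages,
			     time_t now)
{
	m->pages_written += dirty_pages;
	model_flush_cmtime(m, now);
}
```

In this model the munmap and exit paths would call model_flush_cmtime
alone, so the timestamps move without any data being written, while
writeback goes through model_writepages and gets both.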

>
> Indeed, the way you've set up the infrastructure, we'll have to
> rewrite the cmtime update code to enable writepages to update this
> within some other transaction. Perhaps you should just implement it
> that way first?

This is already possible although not IMO necessary for correctness.
All that ext4 would need to do is to add something like:

if (mapping_test_clear_cmtime(mapping)) {
	/* update times within the current transaction */
}

somewhere inside the transaction in writepages.  There would probably
be room for some kind of generic helper to do everything in
inode_update_time_writable except for the actual mark_inode_dirty
part, but this still seems nasty from a locking perspective, and I'd
rather leave that optimization to an ext4 developer who wants to do
it.

I could simplify this a bit by moving the mapping_test_clear_cmtime
part from .flush_cmtime to its callers.
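
As a rough illustration of that simplification, the caller could do
the test-and-clear and invoke the hook only when the flag was set.
This is a userspace sketch, not kernel code: every cs_* name below is
hypothetical, and cmtime_pending again stands in for the AS_CMTIME
bit.

```c
#include <stdbool.h>

struct cs_mapping;

struct cs_aops {
	/* In this variant the hook only applies the update; it never touches the flag. */
	void (*flush_cmtime)(struct cs_mapping *);
};

struct cs_mapping {
	const struct cs_aops *a_ops;
	bool cmtime_pending;	/* stands in for the AS_CMTIME bit */
	int updates_applied;	/* counts how often the hook ran */
};

/* Return the old flag value and clear it. */
static bool cs_test_clear_cmtime(struct cs_mapping *m)
{
	bool was_set = m->cmtime_pending;

	m->cmtime_pending = false;
	return was_set;
}

/*
 * Hypothetical caller, such as an unmap or exit path.  The flag is
 * tested and cleared here, so the hook runs at most once per pending
 * update.  If no hook is installed, the flag is left alone.
 */
static void cs_flush_on_unmap(struct cs_mapping *m)
{
	if (m->a_ops && m->a_ops->flush_cmtime && cs_test_clear_cmtime(m))
		m->a_ops->flush_cmtime(m);
}

/* Example hook: a real filesystem would update the inode times here. */
static void cs_example_flush(struct cs_mapping *m)
{
	m->updates_applied++;
}

static const struct cs_aops cs_example_aops = {
	.flush_cmtime = cs_example_flush,
};
```

With this shape the filesystem's hook gets smaller, at the cost of
every caller having to remember the test-and-clear.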

--Andy

