linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes
Date: Thu, 20 Dec 2012 21:42:46 -0800	[thread overview]
Message-ID: <CALCETrVCGeL6-Jji05CLrFUSyMOE6b5W2dqGNQoyUqLYXw0LLg@mail.gmail.com> (raw)
In-Reply-To: <20121221003401.GB13474@quack.suse.cz>

On Thu, Dec 20, 2012 at 4:34 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 20-12-12 15:10:10, Andy Lutomirski wrote:
>> The onus is currently on filesystems to call file_update_time
>> somewhere in the page_mkwrite path.  This is unfortunate for three
>> reasons:
>>
>> 1. page_mkwrite on a locked page should be fast.  ext4, for example,
>>    often sleeps while dirtying inodes.
>>
>> 2. The current behavior is surprising -- the timestamp resulting from
>>    an mmaped write will be before the write, not after.  This contradicts
>>    the mmap(2) manpage, which says:
>>
>>      The st_ctime and st_mtime field for a file mapped with  PROT_WRITE  and
>>      MAP_SHARED  will  be  updated  after  a write to the mapped region, and
>>      before a subsequent msync(2) with the MS_SYNC or MS_ASYNC flag, if  one
>>      occurs.
>   I agree your behavior is more correct wrt to the manpage / spec. OTOH I
> could dig out several emails where users complain time stamps magically
> change some time after the file was written via mmap (because writeback
> happened at that time and it did some allocation to the inode). People hit
> this e.g. when compiling something, ld(1) writes final binary through mmap,
> the package / archive the final binary and later some sanity check finds
> the time stamp on the binary is newer than the package / archive.
>
> Looking more into the patch you end up updating timestamps on munmap(2)
> (thus on file close in particular). That should avoid the most surprising
> cases and users hopefully won't notice the difference. Good. But please
> mention this explicitely in the changelog.

I was careful to get that case right.  I'll update the changelog.

In particular, I've so far tested munmap, msync(MS_SYNC), fsync,
waiting 30 seconds, and dying by fatal signal.  All of those paths
work right.

>> +/**
>> + *   inode_update_time_writable      -       update mtime and ctime time
>> + *   @inode: inode accessed
>> + *
>> + *   This is like file_update_time, but it assumes the mnt is writable
>> + *   and takes an inode parameter instead.
>> + */
>> +
>> +int inode_update_time_writable(struct inode *inode)
>> +{
>> +     struct timespec now;
>> +     int sync_it = 0;
>> +     int ret;
>> +
>> +     /* First try to exhaust all avenues to not sync */
>> +     if (IS_NOCMTIME(inode))
>> +             return 0;
>> +
>> +     now = current_fs_time(inode->i_sb);
>> +     if (!timespec_equal(&inode->i_mtime, &now))
>> +             sync_it = S_MTIME;
>> +
>> +     if (!timespec_equal(&inode->i_ctime, &now))
>> +             sync_it |= S_CTIME;
>> +
>> +     if (IS_I_VERSION(inode))
>> +             sync_it |= S_VERSION;
>> +
>> +     if (!sync_it)
>> +             return 0;
>> +
>> +     ret = update_time(inode, &now, sync_it);
>> +
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(inode_update_time_writable);
>> +
>   So this differs from file_update_time() only by not calling
> __mnt_want_write(). Why this special function? It is actually unsafe wrt
> remounts read-only or filesystem freezing... For that you need to call
> sb_start_write() / sb_end_write() around the timestamp update. Umm, or
> better sb_start_pagefault() / sb_end_pagefault() because the call in
> remove_vma() gets called under mmap_sem so we are in a rather similar
> situation to ->page_mkwrite.

The important difference is that it takes an inode* as a parameter
instead of a file*.  I don't think that inodes have a struct vfsmount,
so I can't call __mnt_want_write.  I'll take a look at
sb_start_pagefault.  I'll also refactor this a bit to minimize code
duplication.  The current approach was for the v1 rfc version. :)

>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 3913262..60301dc 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -223,6 +223,10 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
>>       struct vm_area_struct *next = vma->vm_next;
>>
>>       might_sleep();
>> +
>> +     if (vma->vm_file)
>> +             mapping_flush_cmtime(vma->vm_file->f_mapping);
>> +
>>       if (vma->vm_ops && vma->vm_ops->close)
>>               vma->vm_ops->close(vma);
>>       if (vma->vm_file)
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index cdea11a..8cbb7fb 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -1910,6 +1910,13 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
>>               ret = mapping->a_ops->writepages(mapping, wbc);
>>       else
>>               ret = generic_writepages(mapping, wbc);
>> +
>> +     /*
>> +      * This is after writepages because the AS_CMTIME bit won't
>> +      * bet set until writepages is called.
>> +      */
>> +     mapping_flush_cmtime(mapping);
>> +
>>       return ret;
>>  }
>>
>> @@ -2117,8 +2124,17 @@ EXPORT_SYMBOL(set_page_dirty);
>>   */
>>  int set_page_dirty_from_pte(struct page *page)
>>  {
>> -     /* Doesn't do anything interesting yet. */
>> -     return set_page_dirty(page);
>> +     int ret = set_page_dirty(page);
>> +
>> +     /*
>> +      * We may be out of memory and/or have various locks held, so
>> +      * there isn't much we can do in here.
>> +      */
>> +     struct address_space *mapping = page_mapping(page);
>   Declarations should go together please. So something like:
>         int ret = set_page_dirty(page);
>         struct address_space *mapping = page_mapping(page);
>
>         /* comment... */

Will do.  Some day I'll learn how to act less like a C99/C++
programmer when writing kernel code.

Am I correct in interpreting this as "these patches may be
sufficiently non-insane that I should keep working on them"?  I admit
I'm pretty far out of my depth working on vm/vfs stuff.

Thanks,
Andy

  reply	other threads:[~2012-12-21  5:43 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-18  1:10 Are there u32 atomic bitops? (or dealing w/ i_flags) Andy Lutomirski
2012-12-18  1:34 ` Ming Lei
2012-12-18  1:57 ` Al Viro
2012-12-18  2:42   ` Andy Lutomirski
2012-12-18 21:30     ` Dave Chinner
2012-12-18 22:20       ` Andy Lutomirski
2012-12-20  7:03         ` Dave Chinner
2012-12-20 20:05           ` Andy Lutomirski
2012-12-20 23:10             ` [RFC PATCH 0/4] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-20 23:10               ` [RFC PATCH 1/4] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
2012-12-20 23:10               ` [RFC PATCH 2/4] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-21  0:14                 ` Dave Chinner
2012-12-21  0:58                   ` Jan Kara
2012-12-21  1:12                     ` Dave Chinner
2012-12-21  1:36                       ` Jan Kara
2012-12-21  5:36                   ` Andy Lutomirski
2012-12-21 10:51                     ` Jan Kara
2012-12-21 18:26                       ` Andy Lutomirski
2012-12-21  0:34                 ` Jan Kara
2012-12-21  5:42                   ` Andy Lutomirski [this message]
2012-12-21 11:03                     ` Jan Kara
2012-12-20 23:10               ` [RFC PATCH 3/4] Remove file_update_time from all mkwrite paths Andy Lutomirski
2012-12-20 23:10               ` [RFC PATCH 4/4] ext4: Fix an incorrect comment about i_mutex Andy Lutomirski
2012-12-20 23:42                 ` Jan Kara
2012-12-20 23:36             ` Are there u32 atomic bitops? (or dealing w/ i_flags) Dave Chinner
2012-12-20 23:42               ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrVCGeL6-Jji05CLrFUSyMOE6b5W2dqGNQoyUqLYXw0LLg@mail.gmail.com \
    --to=luto@amacapital.net \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).