linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, Christoph Hellwig <hch@infradead.org>,
	Jan Kara <jack@suse.cz>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>
Subject: Re: [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates
Date: Wed, 4 Sep 2013 21:20:48 +0200	[thread overview]
Message-ID: <20130904192048.GM3996@quack.suse.cz> (raw)
In-Reply-To: <CALCETrUA653bw3C=XP=Es9UApqsKsDG-UAzuZ=qS1RA1LeoKYw@mail.gmail.com>

On Wed 04-09-13 10:54:50, Andy Lutomirski wrote:
> >> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
> >>  }
> >>  EXPORT_SYMBOL(write_one_page);
> >>
> >> +void mapping_flush_cmtime(struct address_space *mapping)
> >> +{
> >> +     if (mapping_test_clear_cmtime(mapping) &&
> >> +         mapping->a_ops->update_cmtime_deferred)
> >> +             mapping->a_ops->update_cmtime_deferred(mapping);
> >> +}
> >> +EXPORT_SYMBOL(mapping_flush_cmtime);
> >   Hum, is there a reason for update_cmtime_deferred() operation? I can
> > hardly imagine anyone will want to do anything else than what
> > inode_update_time_writable() does so why bother? You mention tmpfs & co.
> > don't fit into your scheme well with which I agree so let's just keep
> > file_update_time() in their page_mkwrite() operation. But I don't see a
> > real need for avoiding the deferred cmtime logic...
> 
> I think there might be odd corner cases.  For example, mmap a tmpfs
> file, write it, and unmap it.  Then, an hour later, maybe the system
  If you unmap it then that will handle the update. But if you won't unmap,
you'd get spurious updates of timestamps which would be strange.

> will be under memory pressure and page out the file.  This could
> trigger a surprising time update.  (I'm not sure this can actually
> happen on tmpfs, but maybe it would on some other filesystem.)
> 
> Does this actually matter?  A flag to turn the feature on or off would
> do the trick, but I don't think there's precedent for sticking a flag
> in a_ops.
  Flag in a_ops is ugly. But you can have a flag in 'struct
filesystem_type' which would be reasonable. 

> >> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
> >> +{
> >> +     /*
> >> +      * We get called from munmap and msync.  Both calls can race
> >> +      * with fs freezing.  If the fs is frozen after
> >> +      * mapping_test_clear_cmtime but before the time update, then
> >> +      * sync_filesystem will miss the cmtime update (because we
> >> +      * just cleared it) and we don't be able to write (because the
> >> +      * fs is frozen).  On the other hand, we can't just return if
> >> +      * we're in the SB_FREEZE_PAGEFAULT state because our caller
> >> +      * expects the timestamp to be synchronously updated.  So we
> >> +      * get write access without blocking, at the SB_FREEZE_FS
> >> +      * level.  If the fs is already fully frozen, then we already
> >> +      * know we have nothing to do.
> >> +      */
> >> +
> >> +     if (!mapping_test_cmtime(mapping))
> >> +             return;  /* Optimization: nothing to do. */
> >> +
> >> +     if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
> >> +             mapping_flush_cmtime(mapping);
> >> +             __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> >> +     }
> >> +}
> >   This is wrong because SB_FREEZE_FS level is targetted for filesystem
> > internal use. Also it is racy. mapping_flush_cmtime() ends up calling
> > mark_inode_dirty() and filesystems such as ext4 or xfs will start a
> > transaction to store inode in the journal. This gets freeze protection at
> > SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
> > SB_FREEZE_FS before this second protection, things will deadlock.
> 
> Whoops -- I assumed that it was safe to recursively take freeze
> protection at the same level.
> 
> I'm worried about the following race:
> 
> Thread 1 (in munmap):
> Check AS_CMTIME set
> sb_start_pagefault
> 
> Thread 2 (freezing the fs):
> frozen = SB_FREEZE_PAGEFAULT;
> sync_filesystem()
> 
> Thread 1 is now stuck.  It doesn't need to be, because sync_filesystem
> will flush out the cmtime write.  But there doesn't seem to be a clean
> mechanism to wait for the freeze to finish.
  OK, I see. Frankly, I'd rather live with msync() and munmap() blocking
while filesystem is frozen than trying to outsmart the freezing logic...
If someone comes up with a usecase where it causes trouble, we can always
improve the logic with some clever tricks.

> Is there a clean way to avoid this?  I don't want to return
> immediately if a freeze is in progress, because userspace expects that
> munmap will update cmtime synchronously.
> 
> And ugly but simple solution is:
> 
> if (!mapping_test_cmtime(mapping))
>     return;  /* Optimization: nothing to do. */
> 
> if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
>     mapping_flush_cmtime(mapping);
>     __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> } else {
>     /* Freeze is or was in progress.  The part of freezing from
> SB_FREEZE_PAGEFAULT through sync_filesystem holds s_umount for write,
> so we can wait for it to finish by taking s_umount for read. */
>     down_read(&sb->s_umount);
>     up_read(&sb->s_umount);
> }
  Yes, this would probably work but as I said above, I'd prefer the to keep
it simple unless we have a good reason for the complex solution.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2013-09-04 19:20 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-23  0:03 [PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 1/7] mm: Track mappings that have been written via ptes Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 2/7] fs: Add inode_update_time_writable Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates Andy Lutomirski
2013-09-04 14:57   ` Jan Kara
2013-09-04 17:54     ` Andy Lutomirski
2013-09-04 19:20       ` Jan Kara [this message]
2013-09-04 20:05         ` Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 4/7] mm: Scan for dirty ptes and update cmtime on MS_ASYNC Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 5/7] ext4: Defer mmap cmtime updates Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 6/7] btrfs: " Andy Lutomirski
2013-08-23  0:03 ` [PATCH v4 7/7] xfs: " Andy Lutomirski
2013-08-23  1:12 ` [PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2013-09-04 14:06 ` Jan Kara
2013-09-04 15:08 ` Jan Kara
2013-09-04 17:33   ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130904192048.GM3996@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=dave.hansen@linux.intel.com \
    --cc=hch@infradead.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).