From: Jan Kara <jack@suse.cz>
To: Andy Lutomirski <luto@amacapital.net>
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
Dave Hansen <dave.hansen@linux.intel.com>,
xfs@oss.sgi.com, Jan Kara <jack@suse.cz>,
Tim Chen <tim.c.chen@linux.intel.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates
Date: Wed, 4 Sep 2013 16:57:40 +0200 [thread overview]
Message-ID: <20130904145740.GE3996@quack.suse.cz> (raw)
In-Reply-To: <e1620c8e4909a65e270c8e9590e307c22fd96a44.1377193658.git.luto@amacapital.net>
On Thu 22-08-13 17:03:19, Andy Lutomirski wrote:
> Filesystems that defer cmtime updates should update cmtime when any
> of these events happen after a write via a mapping:
>
> - The mapping is written back to disk. This happens from all kinds
> of places, most of which eventually call ->writepages. (The
> exceptions are vmscan and migration.)
>
> - munmap is called or the mapping is removed when the process exits
>
> - msync(MS_ASYNC) is called. Linux currently does nothing for
> msync(MS_ASYNC), but POSIX says that cmtime should be updated some
> time between an mmaped write and the subsequent msync call.
> MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.
>
> Filesystems are responsible for checking for pending deferred cmtime
> updates in .writepages (a helper is provided for this purpose) and
> for doing the actual update in .update_cmtime_deferred.
>
> These changes have no effect by themselves; filesystems must opt in
> by implementing .update_cmtime_deferred and removing any
> file_update_time call in .page_mkwrite.
>
> This patch does not implement the MS_ASYNC case; that's in the next
> patch.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
...
> +/**
> + * generic_update_cmtime_deferred - update cmtime after an mmapped write
> + * @mapping: The mapping
> + *
> + * This library function implements .update_cmtime_deferred. It is unlikely
> + * that any filesystem will want to do anything here except update the time
> + * (using this helper) or nothing at all (by leaving .update_cmtime_deferred
> + * NULL).
> + */
> +void generic_update_cmtime_deferred(struct address_space *mapping)
> +{
> + struct blk_plug plug;
> + blk_start_plug(&plug);
> + inode_update_time_writable(mapping->host);
> + blk_finish_plug(&plug);
> +}
> +EXPORT_SYMBOL(generic_update_cmtime_deferred);
> +
You can remove the pluggin here. Inode update will likely result in a
single write so there's no point.
> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
> }
> EXPORT_SYMBOL(write_one_page);
>
> +void mapping_flush_cmtime(struct address_space *mapping)
> +{
> + if (mapping_test_clear_cmtime(mapping) &&
> + mapping->a_ops->update_cmtime_deferred)
> + mapping->a_ops->update_cmtime_deferred(mapping);
> +}
> +EXPORT_SYMBOL(mapping_flush_cmtime);
Hum, is there a reason for update_cmtime_deferred() operation? I can
hardly imagine anyone will want to do anything else than what
inode_update_time_writable() does so why bother? You mention tmpfs & co.
don't fit into your scheme well with which I agree so let's just keep
file_update_time() in their page_mkwrite() operation. But I don't see a
real need for avoiding the deferred cmtime logic...
> +
> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
> +{
> + /*
> + * We get called from munmap and msync. Both calls can race
> + * with fs freezing. If the fs is frozen after
> + * mapping_test_clear_cmtime but before the time update, then
> + * sync_filesystem will miss the cmtime update (because we
> + * just cleared it) and we don't be able to write (because the
> + * fs is frozen). On the other hand, we can't just return if
> + * we're in the SB_FREEZE_PAGEFAULT state because our caller
> + * expects the timestamp to be synchronously updated. So we
> + * get write access without blocking, at the SB_FREEZE_FS
> + * level. If the fs is already fully frozen, then we already
> + * know we have nothing to do.
> + */
> +
> + if (!mapping_test_cmtime(mapping))
> + return; /* Optimization: nothing to do. */
> +
> + if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
> + mapping_flush_cmtime(mapping);
> + __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> + }
> +}
This is wrong because SB_FREEZE_FS level is targetted for filesystem
internal use. Also it is racy. mapping_flush_cmtime() ends up calling
mark_inode_dirty() and filesystems such as ext4 or xfs will start a
transaction to store inode in the journal. This gets freeze protection at
SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
SB_FREEZE_FS before this second protection, things will deadlock.
Since the callers of this function hold mmap_sem, using SB_FREEZE_PAGEFAULT
protection would be appropriate. Also since there are just two places that
need the freeze protection I'd be inclined to open code the protection in
the two places rather than hiding it in a special function.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2013-09-04 14:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-23 0:03 [PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 1/7] mm: Track mappings that have been written via ptes Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 2/7] fs: Add inode_update_time_writable Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates Andy Lutomirski
2013-09-04 14:57 ` Jan Kara [this message]
2013-09-04 17:54 ` Andy Lutomirski
2013-09-04 19:20 ` Jan Kara
2013-09-04 20:05 ` Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 4/7] mm: Scan for dirty ptes and update cmtime on MS_ASYNC Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 5/7] ext4: Defer mmap cmtime updates Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 6/7] btrfs: " Andy Lutomirski
2013-08-23 0:03 ` [PATCH v4 7/7] xfs: " Andy Lutomirski
2013-08-23 1:12 ` [PATCH v3 0/5] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2013-09-04 14:06 ` Jan Kara
2013-09-04 15:08 ` Jan Kara
2013-09-04 17:33 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130904145740.GE3996@quack.suse.cz \
--to=jack@suse.cz \
--cc=dave.hansen@linux.intel.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=tim.c.chen@linux.intel.com \
--cc=tytso@mit.edu \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).