From: Andy Lutomirski <luto@amacapital.net>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Al Viro <viro@ZenIV.linux.org.uk>,
linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
Jaya Kumar <jayalk@intworks.biz>, Sage Weil <sage@newdream.net>,
ceph-devel@vger.kernel.org, Steve French <sfrench@samba.org>,
linux-cifs@vger.kernel.org,
Eric Van Hensbergen <ericvh@gmail.com>,
Ron Minnich <rminnich@sandia.gov>,
Latchesar Ionkov <lucho@ionkov.net>,
v9fs-developer@lists.sourceforge.net,
Miklos Szeredi <miklos@szeredi.hu>,
fuse-devel@lists.sourceforge.net,
Steven Whitehouse <swhiteho@redhat.com>,
cluster-devel@redhat.com,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH 00/11 v2] Push file_update_time() into .page_mkwrite
Date: Thu, 08 Mar 2012 15:12:56 -0800
Message-ID: <4F593CF8.2000105@amacapital.net>
In-Reply-To: <1330602103-8851-1-git-send-email-jack@suse.cz>
On 03/01/2012 03:41 AM, Jan Kara wrote:
> Hello,
>
> to provide reliable support for filesystem freezing, filesystems need to have
> complete control over when metadata is changed. In particular,
> file_update_time() calls from page fault code make it impossible for
> filesystems to prevent inodes from being dirtied while the filesystem is
> frozen.
>
> To fix the issue, this patch set changes page fault code to call
> file_update_time() only when ->page_mkwrite() callback is not provided. If the
> callback is provided, it is the responsibility of the filesystem to perform
> update of i_mtime / i_ctime if needed. We also push file_update_time() call
> to all existing ->page_mkwrite() implementations if the time update does not
> obviously happen by other means. If you know your filesystem does not need
> update of modification times in ->page_mkwrite() handler, please speak up and
> I'll drop the patch for your filesystem.
>
> As a side note, an alternative would be to remove call of file_update_time()
> from page fault code altogether and require all filesystems needing it to do
> that in their ->page_mkwrite() implementation. That is certainly possible
> although maybe slightly inefficient and would require auditing 100+
> vm_operations_structs *shiver*.
IMO updating file times should happen when changes get written out, not
when a page is made writable, for two reasons:
1. Correctness. With the current approach, it's very easy for files to
be changed after the last mtime update -- any changes between mkwrite
and actual writeback won't affect mtime.
2. Performance. I have an application (presumably guessable from my
email address) for which blocking in page_mkwrite is an absolute
show-stopper. (In fact it's so bad that we went back to running on
Windows until I hacked up a kernel to not do this.) I have an incorrect
patch [1] to fix it, but I haven't gotten around to a real fix. (I also
have stable pages reverted in my kernel. Some day I'll submit a patch
to make it a filesystem option. Or maybe it should even be a block
device / queue property like the alignment offset and optimal I/O size --
there are plenty of block device and file combinations which don't
benefit at all from stable pages.)
I'd prefer that file_update_time() calls in page_mkwrite not proliferate. A
better fix is probably to introduce a new inode flag, set it when a
page is undirtied, and then dirty and write the inode from the writeback
path. (Kind of like my patch, but with an inode flag instead of a page
flag, and with the file_update_time done from the fs.)
[1] http://patchwork.ozlabs.org/patch/122516/
--Andy