* [PATCH v2 0/3] Rework mtime and ctime updates on mmaped writes
@ 2012-12-21 21:28 Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 1/3] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Andy Lutomirski @ 2012-12-21 21:28 UTC (permalink / raw)
To: linux-kernel, linux-mm, Linux FS Devel
Cc: Dave Chinner, Jan Kara, Al Viro, Andy Lutomirski
Writes via mmap currently update mtime and ctime in ->page_mkwrite.
This is unfortunate from a performance and a correctness point of view.
The file times should be updated after writes, not before (so that every
write eventually results in a fresh timestamp). This is needed for
POSIX compliance. More importantly (for me), ->page_mkwrite is called
periodically even on mlocked pages, and some filesystems can sleep in
mark_inode_dirty.
This patchset attempts to fix both issues at once. It adds a new
address_space flag AS_CMTIME that is set atomically whenever the system
transfers a pte dirty bit to a struct page backed by the address_space.
This can happen with various locks held and when low on memory.
Later on, whenever syncing an inode (which happens indirectly in msync)
or whenever a vma is torn down, if AS_CMTIME is set, then the file times
are updated. This happens in a context from which (I think) it's safe
to dirty inodes.
One nice property of this approach is that it requires no fs-specific
work. It's actually quite a bit simpler than I expected.
I've tested this, and mtime and ctime are updated on munmap, exit, MS_SYNC,
and fsync after writing via mmap. The times are also updated 30 seconds
after writing, all by themselves :) xfstest #215 also passes.Lockdep has
no complaints.
Changes from v1:
- inode_update_time_writable now locks against the fs freezer
- Minor cleanups
- Major changelog improvements
Andy Lutomirski (3):
mm: Explicitly track when the page dirty bit is transferred from a
pte
mm: Update file times when inodes are written after mmaped writes
Remove file_update_time from all mkwrite paths
fs/9p/vfs_file.c | 3 ---
fs/btrfs/inode.c | 4 +--
fs/buffer.c | 6 -----
fs/ceph/addr.c | 3 ---
fs/ext4/inode.c | 1 -
fs/gfs2/file.c | 3 ---
fs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++-----------
fs/nilfs2/file.c | 1 -
fs/sysfs/bin.c | 2 --
include/linux/fs.h | 1 +
include/linux/mm.h | 1 +
include/linux/pagemap.h | 3 +++
mm/filemap.c | 1 -
mm/memory-failure.c | 4 +--
mm/memory.c | 5 +---
mm/mmap.c | 4 +++
mm/page-writeback.c | 42 +++++++++++++++++++++++++++--
mm/rmap.c | 9 ++++---
18 files changed, 115 insertions(+), 50 deletions(-)
--
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] mm: Explicitly track when the page dirty bit is transferred from a pte
2012-12-21 21:28 [PATCH v2 0/3] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
@ 2012-12-21 21:28 ` Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 3/3] Remove file_update_time from all mkwrite paths Andy Lutomirski
2 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2012-12-21 21:28 UTC (permalink / raw)
To: linux-kernel, linux-mm, Linux FS Devel
Cc: Dave Chinner, Jan Kara, Al Viro, Andy Lutomirski
This is a slight cleanup, but, more importantly, it will let us
easily detect writes via mmap when the process is done writing
(e.g. munmaps, msyncs, fsyncs, or dies).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
include/linux/mm.h | 1 +
mm/memory-failure.c | 4 +---
mm/page-writeback.c | 16 ++++++++++++++--
mm/rmap.c | 9 ++++++---
4 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a44aa00..4c7e49b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1037,6 +1037,7 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
void account_page_dirtied(struct page *page, struct address_space *mapping);
void account_page_writeback(struct page *page);
int set_page_dirty(struct page *page);
+int set_page_dirty_from_pte(struct page *page);
int set_page_dirty_lock(struct page *page);
int clear_page_dirty_for_io(struct page *page);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278..ab5c3f4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -892,9 +892,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
mapping = page_mapping(hpage);
if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping &&
mapping_cap_writeback_dirty(mapping)) {
- if (page_mkclean(hpage)) {
- SetPageDirty(hpage);
- } else {
+ if (!page_mkclean(hpage)) {
kill = 0;
ttu |= TTU_IGNORE_HWPOISON;
printk(KERN_INFO
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 830893b..cdea11a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2109,6 +2109,19 @@ int set_page_dirty(struct page *page)
EXPORT_SYMBOL(set_page_dirty);
/*
+ * Dirty a page due to the page table dirty bit.
+ *
+ * For pages with a mapping this should be done under the page lock.
+ * This does set_page_dirty plus extra work to account for the fact that
+ * the kernel was not notified when the actual write was done.
+ */
+int set_page_dirty_from_pte(struct page *page)
+{
+ /* Doesn't do anything interesting yet. */
+ return set_page_dirty(page);
+}
+
+/*
* set_page_dirty() is racy if the caller has no reference against
* page->mapping->host, and if the page is unlocked. This is because another
* CPU could truncate the page off the mapping and then free the mapping.
@@ -2175,8 +2188,7 @@ int clear_page_dirty_for_io(struct page *page)
* as a serialization point for all the different
* threads doing their things.
*/
- if (page_mkclean(page))
- set_page_dirty(page);
+ page_mkclean(page);
/*
* We carefully synchronise fault handlers against
* installing a dirty pte and marking the page dirty
diff --git a/mm/rmap.c b/mm/rmap.c
index 2ee1ef0..b8fe00e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -931,6 +931,9 @@ int page_mkclean(struct page *page)
ret = page_mkclean_file(mapping, page);
}
+ if (ret)
+ set_page_dirty_from_pte(page);
+
return ret;
}
EXPORT_SYMBOL_GPL(page_mkclean);
@@ -1151,7 +1154,7 @@ void page_remove_rmap(struct page *page)
*/
if (mapping && !mapping_cap_account_dirty(mapping) &&
page_test_and_clear_dirty(page_to_pfn(page), 1))
- set_page_dirty(page);
+ set_page_dirty_from_pte(page);
/*
* Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
* and not charged by memcg for now.
@@ -1229,7 +1232,7 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_from_pte(page);
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
@@ -1423,7 +1426,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
- set_page_dirty(page);
+ set_page_dirty_from_pte(page);
page_remove_rmap(page);
page_cache_release(page);
--
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-21 21:28 [PATCH v2 0/3] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 1/3] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
@ 2012-12-21 21:28 ` Andy Lutomirski
2012-12-22 8:29 ` Christoph Hellwig
2012-12-24 8:36 ` Zheng Liu
2012-12-21 21:28 ` [PATCH v2 3/3] Remove file_update_time from all mkwrite paths Andy Lutomirski
2 siblings, 2 replies; 10+ messages in thread
From: Andy Lutomirski @ 2012-12-21 21:28 UTC (permalink / raw)
To: linux-kernel, linux-mm, Linux FS Devel
Cc: Dave Chinner, Jan Kara, Al Viro, Andy Lutomirski
The onus is currently on filesystems to call file_update_time
somewhere in the page_mkwrite path. This is unfortunate for three
reasons:
1. page_mkwrite on a locked page should be fast. ext4, for example,
often sleeps while dirtying inodes. (This could be considered a
fixable problem with ext4, but this approach makes it
irrelevant.)
2. The current behavior is surprising -- the timestamp resulting
from an mmaped write will be before the write, not after. This
contradicts POSIX, which says:
The st_ctime and st_mtime fields of a file that is mapped with
MAP_SHARED and PROT_WRITE, will be marked for update at some
point in the interval between a write reference to the mapped
region and the next call to msync() with MS_ASYNC or MS_SYNC
for that portion of the file by any process. If there is no
such call, these fields may be marked for update at any time
after a write reference if the underlying file is modified as
a result.
We currently get this wrong:
addr = mmap(..., PROT_WRITE, MAP_SHARED, fd, 0);
*addr = 0; <-- mtime is updated here
sleep(5);
*addr = 1; <-- this write will never trigger an mtime update
msync(fd, MS_SYNC);
For now, MS_ASYNC still doesn't trigger an mtime update because
it doesn't do anything at all. This would be easy enough to fix.
POSIX (oddly IMO) does not require that munmap(2), fsync(2), or
_exit(2) cause mtime writes, but this patch is careful to do that
as well. According to Jan Kara, people have complained about
ld(1) writing its output via mmap and the timestamp magically
changing well after ld is done.
3. (An ulterior motive) I'd like to use hardware dirty tracking for
shared, locked, writable mappings (instead of page faults).
Moving important work out of the page_mkwrite path is an
important first step.
This patch moves the time update into the core pagecache code. When
a pte dirty bit is transferred to struct page, a new address_space
flag AS_CMTIME is atomically set on the mapping. (This happens
during writeback and when ptes are unmapped.) Subsequently (after
an inode is written back or when a vma is removed), the AS_CMTIME
bit is checked, and, if set, the inode's time is updated.
The next patch will remove the now-unnecessary file_update_time
calls in ->page_mkwrite.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
fs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++-----------
include/linux/fs.h | 1 +
include/linux/pagemap.h | 3 +++
mm/memory.c | 2 +-
mm/mmap.c | 4 +++
mm/page-writeback.c | 30 +++++++++++++++++++--
6 files changed, 94 insertions(+), 18 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 64999f1..1d2e303 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1640,6 +1640,34 @@ int file_remove_suid(struct file *file)
}
EXPORT_SYMBOL(file_remove_suid);
+/*
+ * This does the work that's common to file_update_time and
+ * inode_update_time.
+ */
+static int prepare_update_cmtime(struct inode *inode, struct timespec *now)
+{
+ int sync_it;
+
+ /* First try to exhaust all avenues to not sync */
+ if (IS_NOCMTIME(inode))
+ return 0;
+
+ *now = current_fs_time(inode->i_sb);
+ if (!timespec_equal(&inode->i_mtime, now))
+ sync_it = S_MTIME;
+
+ if (!timespec_equal(&inode->i_ctime, now))
+ sync_it |= S_CTIME;
+
+ if (IS_I_VERSION(inode))
+ sync_it |= S_VERSION;
+
+ if (!sync_it)
+ return 0;
+
+ return sync_it;
+}
+
/**
* file_update_time - update mtime and ctime time
* @file: file accessed
@@ -1657,23 +1685,9 @@ int file_update_time(struct file *file)
{
struct inode *inode = file->f_path.dentry->d_inode;
struct timespec now;
- int sync_it = 0;
+ int sync_it = prepare_update_cmtime(inode, &now);
int ret;
- /* First try to exhaust all avenues to not sync */
- if (IS_NOCMTIME(inode))
- return 0;
-
- now = current_fs_time(inode->i_sb);
- if (!timespec_equal(&inode->i_mtime, &now))
- sync_it = S_MTIME;
-
- if (!timespec_equal(&inode->i_ctime, &now))
- sync_it |= S_CTIME;
-
- if (IS_I_VERSION(inode))
- sync_it |= S_VERSION;
-
if (!sync_it)
return 0;
@@ -1688,6 +1702,34 @@ int file_update_time(struct file *file)
}
EXPORT_SYMBOL(file_update_time);
+/**
+ * inode_update_time_writable - update mtime and ctime time
+ * @inode: inode accessed
+ *
+ * This is like file_update_time, but it assumes the mnt is writable
+ * and takes an inode parameter instead. (We need to assume the mnt
+ * was writable because inodes aren't associated with any particular
+ * mnt.
+ */
+
+int inode_update_time_writable(struct inode *inode)
+{
+ struct timespec now;
+ int sync_it = prepare_update_cmtime(inode, &now);
+ int ret;
+
+ if (!sync_it)
+ return 0;
+
+ /* sb_start_pagefault and update_time can both sleep. */
+ sb_start_pagefault(inode->i_sb);
+ ret = update_time(inode, &now, sync_it);
+ sb_end_pagefault(inode->i_sb);
+
+ return ret;
+}
+EXPORT_SYMBOL(inode_update_time_writable);
+
int inode_needs_sync(struct inode *inode)
{
if (IS_SYNC(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 75fe9a1..c95f9fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2554,6 +2554,7 @@ extern int inode_newsize_ok(const struct inode *, loff_t offset);
extern void setattr_copy(struct inode *inode, const struct iattr *attr);
extern int file_update_time(struct file *file);
+extern int inode_update_time_writable(struct inode *inode);
extern int generic_show_options(struct seq_file *m, struct dentry *root);
extern void save_mount_options(struct super_block *sb, char *options);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e42c762..a038ed9 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
+ AS_CMTIME = __GFP_BITS_SHIFT + 4, /* written via pte */
};
static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -68,6 +69,8 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
(__force unsigned long)mask;
}
+extern void mapping_flush_cmtime(struct address_space *mapping);
+
/*
* The page cache can done in larger chunks than
* one page, because it allows for more efficient
diff --git a/mm/memory.c b/mm/memory.c
index 221fc9f..086b901 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1160,7 +1160,7 @@ again:
rss[MM_ANONPAGES]--;
else {
if (pte_dirty(ptent))
- set_page_dirty(page);
+ set_page_dirty_from_pte(page);
if (pte_young(ptent) &&
likely(!VM_SequentialReadHint(vma)))
mark_page_accessed(page);
diff --git a/mm/mmap.c b/mm/mmap.c
index 3913262..60301dc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -223,6 +223,10 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
struct vm_area_struct *next = vma->vm_next;
might_sleep();
+
+ if (vma->vm_file)
+ mapping_flush_cmtime(vma->vm_file->f_mapping);
+
if (vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
if (vma->vm_file)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index cdea11a..9f5d50f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1910,6 +1910,13 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
ret = mapping->a_ops->writepages(mapping, wbc);
else
ret = generic_writepages(mapping, wbc);
+
+ /*
+ * This is after writepages because the AS_CMTIME bit won't
+ * bet set until writepages is called.
+ */
+ mapping_flush_cmtime(mapping);
+
return ret;
}
@@ -2117,8 +2124,17 @@ EXPORT_SYMBOL(set_page_dirty);
*/
int set_page_dirty_from_pte(struct page *page)
{
- /* Doesn't do anything interesting yet. */
- return set_page_dirty(page);
+ int ret = set_page_dirty(page);
+ struct address_space *mapping = page_mapping(page);
+
+ /*
+ * We may be out of memory and/or have various locks held, so
+ * there isn't much we can do in here.
+ */
+ if (mapping)
+ set_bit(AS_CMTIME, &mapping->flags);
+
+ return ret;
}
/*
@@ -2287,3 +2303,13 @@ int mapping_tagged(struct address_space *mapping, int tag)
return radix_tree_tagged(&mapping->page_tree, tag);
}
EXPORT_SYMBOL(mapping_tagged);
+
+/*
+ * Call from any context from which inode_update_time_writable would be okay
+ * to flush deferred cmtime changes.
+ */
+void mapping_flush_cmtime(struct address_space *mapping)
+{
+ if (test_and_clear_bit(AS_CMTIME, &mapping->flags))
+ inode_update_time_writable(mapping->host);
+}
--
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/3] Remove file_update_time from all mkwrite paths
2012-12-21 21:28 [PATCH v2 0/3] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 1/3] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
@ 2012-12-21 21:28 ` Andy Lutomirski
2 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2012-12-21 21:28 UTC (permalink / raw)
To: linux-kernel, linux-mm, Linux FS Devel
Cc: Dave Chinner, Jan Kara, Al Viro, Andy Lutomirski
The times are now updated at sync time.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
fs/9p/vfs_file.c | 3 ---
fs/btrfs/inode.c | 4 +---
fs/buffer.c | 6 ------
fs/ceph/addr.c | 3 ---
fs/ext4/inode.c | 1 -
fs/gfs2/file.c | 3 ---
fs/nilfs2/file.c | 1 -
fs/sysfs/bin.c | 2 --
mm/filemap.c | 1 -
mm/memory.c | 3 ---
10 files changed, 1 insertion(+), 26 deletions(-)
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index c2483e9..34b84f0 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -610,9 +610,6 @@ v9fs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
p9_debug(P9_DEBUG_VFS, "page %p fid %lx\n",
page, (unsigned long)filp->private_data);
- /* Update file times before taking page lock */
- file_update_time(filp);
-
v9inode = V9FS_I(inode);
/* make sure the cache has finished storing the page */
v9fs_fscache_wait_on_page_write(inode, page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95542a1..6fb8558 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6747,10 +6747,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
sb_start_pagefault(inode->i_sb);
ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
if (!ret) {
- ret = file_update_time(vma->vm_file);
reserved = 1;
- }
- if (ret) {
+ } else if (ret) {
if (ret == -ENOMEM)
ret = VM_FAULT_OOM;
else /* -ENOSPC, -EIO, etc */
diff --git a/fs/buffer.c b/fs/buffer.c
index ec0aca8..a9a8f2a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2379,12 +2379,6 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
sb_start_pagefault(sb);
- /*
- * Update file times before taking page lock. We may end up failing the
- * fault so this update may be superfluous but who really cares...
- */
- file_update_time(vma->vm_file);
-
ret = __block_page_mkwrite(vma, vmf, get_block);
sb_end_pagefault(sb);
return block_page_mkwrite_return(ret);
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6690269..19af339 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1183,9 +1183,6 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size, len;
int ret;
- /* Update time before taking page lock */
- file_update_time(vma->vm_file);
-
size = i_size_read(inode);
if (off + PAGE_CACHE_SIZE <= size)
len = PAGE_CACHE_SIZE;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b3c243b..5ddbc75 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4780,7 +4780,6 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
int retries = 0;
sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index e056b4c..999b1ed 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -398,9 +398,6 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
sb_start_pagefault(inode->i_sb);
- /* Update file times before taking page lock */
- file_update_time(vma->vm_file);
-
ret = gfs2_rs_alloc(ip);
if (ret)
return ret;
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index 16f35f7..185b8e6 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -116,7 +116,6 @@ static int nilfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
if (unlikely(ret))
goto out;
- file_update_time(vma->vm_file);
ret = __block_page_mkwrite(vma, vmf, nilfs_get_block);
if (ret) {
nilfs_transaction_abort(inode->i_sb);
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 614b2b5..a475983 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -228,8 +228,6 @@ static int bin_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
ret = 0;
if (bb->vm_ops->page_mkwrite)
ret = bb->vm_ops->page_mkwrite(vma, vmf);
- else
- file_update_time(file);
sysfs_put_active(attr_sd);
return ret;
diff --git a/mm/filemap.c b/mm/filemap.c
index 83efee7..feb0540 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1715,7 +1715,6 @@ int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
int ret = VM_FAULT_LOCKED;
sb_start_pagefault(inode->i_sb);
- file_update_time(vma->vm_file);
lock_page(page);
if (page->mapping != inode->i_mapping) {
unlock_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index 086b901..07a0b0d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2658,9 +2658,6 @@ reuse:
if (!page_mkwrite) {
wait_on_page_locked(dirty_page);
set_page_dirty_balance(dirty_page, page_mkwrite);
- /* file_update_time outside page_lock */
- if (vma->vm_file)
- file_update_time(vma->vm_file);
}
put_page(dirty_page);
if (page_mkwrite) {
--
1.7.11.7
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-21 21:28 ` [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
@ 2012-12-22 8:29 ` Christoph Hellwig
2012-12-22 8:43 ` Andy Lutomirski
2012-12-24 8:36 ` Zheng Liu
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2012-12-22 8:29 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, linux-mm, Linux FS Devel, Dave Chinner, Jan Kara,
Al Viro
NAK, we went through great trouble to get rid of the nasty layering
violation where the VM called file_update_time directly just a short
while ago, reintroducing that is a massive step back.
Make sure whatever "solution" for your problem you come up with keeps
the file update in the filesystem or generic helpers.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-22 8:29 ` Christoph Hellwig
@ 2012-12-22 8:43 ` Andy Lutomirski
2012-12-31 16:11 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2012-12-22 8:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-kernel, linux-mm, Linux FS Devel, Dave Chinner, Jan Kara,
Al Viro
On Sat, Dec 22, 2012 at 12:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
> NAK, we went through great trouble to get rid of the nasty layering
> violation where the VM called file_update_time directly just a short
> while ago, reintroducing that is a massive step back.
>
> Make sure whatever "solution" for your problem you come up with keeps
> the file update in the filesystem or generic helpers.
>
There's an inode operation ->update_time that is called (if it exists)
in these patches to update the time. Is that insufficient? I could
add a new inode operation ->modified_by_mmap that would be called in
mapping_flush_cmtime if that would be better.
The original version of this patch did the update in ->writepage and
->writepages, but that may have had lock ordering issues. (I wasn't
able to confirm that there was any actual problem.)
--Andy
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-21 21:28 ` [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-22 8:29 ` Christoph Hellwig
@ 2012-12-24 8:36 ` Zheng Liu
1 sibling, 0 replies; 10+ messages in thread
From: Zheng Liu @ 2012-12-24 8:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, linux-mm, Linux FS Devel, Dave Chinner, Jan Kara,
Al Viro
On Fri, Dec 21, 2012 at 01:28:27PM -0800, Andy Lutomirski wrote:
> The onus is currently on filesystems to call file_update_time
> somewhere in the page_mkwrite path. This is unfortunate for three
> reasons:
>
> 1. page_mkwrite on a locked page should be fast. ext4, for example,
> often sleeps while dirtying inodes. (This could be considered a
> fixable problem with ext4, but this approach makes it
> irrelevant.)
Hi Andy,
Out of curiosity, could you please share more detailed information about
how to reproduce and measure this problem in ext4?
Thanks
- Zheng
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-22 8:43 ` Andy Lutomirski
@ 2012-12-31 16:11 ` Jan Kara
2013-01-03 17:49 ` Andy Lutomirski
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2012-12-31 16:11 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Christoph Hellwig, linux-kernel, linux-mm, Linux FS Devel,
Dave Chinner, Jan Kara, Al Viro
On Sat 22-12-12 00:43:30, Andy Lutomirski wrote:
> On Sat, Dec 22, 2012 at 12:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > NAK, we went through great trouble to get rid of the nasty layering
> > violation where the VM called file_update_time directly just a short
> > while ago, reintroducing that is a massive step back.
> >
> > Make sure whatever "solution" for your problem you come up with keeps
> > the file update in the filesystem or generic helpers.
>
>
> There's an inode operation ->update_time that is called (if it exists)
> in these patches to update the time. Is that insufficient?
Filesystems need more choice in when they modify time stamps because that
can mean complex work like starting a transaction which has various locking
implications. Just making a separate callback for this doesn't really solve
the problem. Although I don't see particular problem with the call sites you
chose Christoph is right we probably don't want to reintroduce updates of
time stamps from generic code because eventually someone will have problem
with that.
> I could add a new inode operation ->modified_by_mmap that would be
> called in mapping_flush_cmtime if that would be better.
Not really. That's only uglier ;)
> The original version of this patch did the update in ->writepage and
> ->writepages, but that may have had lock ordering issues. (I wasn't
> able to confirm that there was any actual problem.)
Well, your call of mapping_flush_cmtime() from do_writepages() is easy to
move to generic_writepages(). Thus filesystem can easily implement it's own
->writepages() callback if time update doesn't suit it. With the call from
remove_vma() it is more problematic (and the calling context there is
harder as well because we hold mmap_sem). We could maybe leave the call
upto filesystem's ->release callback (and provide generic ->release handler
which just calls mapping_flush_cmtime()). It won't be perfect because that
gets called only after the last file descriptor for that struct file is
closed (i.e., if a process forks and child inherits mappings, ->release gets
called only after both parent and the child unmap the file) but it should
catch 99% of the real world cases. Christoph, would the be OK with
you?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2012-12-31 16:11 ` Jan Kara
@ 2013-01-03 17:49 ` Andy Lutomirski
2013-01-03 18:56 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2013-01-03 17:49 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, linux-kernel, linux-mm, Linux FS Devel,
Dave Chinner, Al Viro
On Mon, Dec 31, 2012 at 8:11 AM, Jan Kara <jack@suse.cz> wrote:
> On Sat 22-12-12 00:43:30, Andy Lutomirski wrote:
>> On Sat, Dec 22, 2012 at 12:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> > NAK, we went through great trouble to get rid of the nasty layering
>> > violation where the VM called file_update_time directly just a short
>> > while ago, reintroducing that is a massive step back.
>> >
[...]
>
>> The original version of this patch did the update in ->writepage and
>> ->writepages, but that may have had lock ordering issues. (I wasn't
>> able to confirm that there was any actual problem.)
> Well, your call of mapping_flush_cmtime() from do_writepages() is easy to
> move to generic_writepages(). Thus filesystem can easily implement it's own
> ->writepages() callback if time update doesn't suit it.
That sounds fine to me. Updating the handful of filesystems in there
isn't a big deal.
>With the call from
> remove_vma() it is more problematic (and the calling context there is
> harder as well because we hold mmap_sem). We could maybe leave the call
> upto filesystem's ->release callback (and provide generic ->release handler
> which just calls mapping_flush_cmtime()). It won't be perfect because that
> gets called only after the last file descriptor for that struct file is
> closed (i.e., if a process forks and child inherits mappings, ->release gets
> called only after both parent and the child unmap the file) but it should
> catch 99% of the real world cases. Christoph, would the be OK with
> you?
I'm not sure that 99% is good enough -- I'd be nervous about breaking
some build or versioning system.
vm_ops->close is almost a good place for this, except that it's called
on some failure paths and it will mess up is_mergeable_vma if lots of
filesystems suddenly have a ->close operation. What about adding
vm_ops->flush, which would be called in remove_vma and possibly
msync(MS_ASYNC)? I think that all real filesystems (i.e. things that
care about cmtime updates) have vm_operations.
--Andy
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes
2013-01-03 17:49 ` Andy Lutomirski
@ 2013-01-03 18:56 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2013-01-03 18:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Jan Kara, Christoph Hellwig, linux-kernel, linux-mm,
Linux FS Devel, Dave Chinner, Al Viro
On Thu 03-01-13 09:49:37, Andy Lutomirski wrote:
> On Mon, Dec 31, 2012 at 8:11 AM, Jan Kara <jack@suse.cz> wrote:
> > On Sat 22-12-12 00:43:30, Andy Lutomirski wrote:
> >> On Sat, Dec 22, 2012 at 12:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
> >> > NAK, we went through great trouble to get rid of the nasty layering
> >> > violation where the VM called file_update_time directly just a short
> >> > while ago, reintroducing that is a massive step back.
> >> >
[...]
> >With the call from
> > remove_vma() it is more problematic (and the calling context there is
> > harder as well because we hold mmap_sem). We could maybe leave the call
> > upto filesystem's ->release callback (and provide generic ->release handler
> > which just calls mapping_flush_cmtime()). It won't be perfect because that
> > gets called only after the last file descriptor for that struct file is
> > closed (i.e., if a process forks and child inherits mappings, ->release gets
> > called only after both parent and the child unmap the file) but it should
> > catch 99% of the real world cases. Christoph, would the be OK with
> > you?
>
> I'm not sure that 99% is good enough -- I'd be nervous about breaking
> some build or versioning system.
>
> vm_ops->close is almost a good place for this, except that it's called
> on some failure paths and it will mess up is_mergeable_vma if lots of
> filesystems suddenly have a ->close operation. What about adding
> vm_ops->flush, which would be called in remove_vma and possibly
> msync(MS_ASYNC)? I think that all real filesystems (i.e. things that
> care about cmtime updates) have vm_operations.
Yeah, that could work. I'm still somewhat nervous about updating the time
stamp under mmap_sem but in ->page_mkwrite we were in the same situation so
I guess it's fine.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-01-03 18:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-21 21:28 [PATCH v2 0/3] Rework mtime and ctime updates on mmaped writes Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 1/3] mm: Explicitly track when the page dirty bit is transferred from a pte Andy Lutomirski
2012-12-21 21:28 ` [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes Andy Lutomirski
2012-12-22 8:29 ` Christoph Hellwig
2012-12-22 8:43 ` Andy Lutomirski
2012-12-31 16:11 ` Jan Kara
2013-01-03 17:49 ` Andy Lutomirski
2013-01-03 18:56 ` Jan Kara
2012-12-24 8:36 ` Zheng Liu
2012-12-21 21:28 ` [PATCH v2 3/3] Remove file_update_time from all mkwrite paths Andy Lutomirski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).