* [PATCH -v5 0/2] Updating ctime and mtime for memory-mapped files
@ 2008-01-17 0:57 Anton Salikhmetov
2008-01-17 0:57 ` [PATCH -v5 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
2008-01-17 0:57 ` [PATCH -v5 2/2] Updating ctime and mtime at syncing Anton Salikhmetov
0 siblings, 2 replies; 18+ messages in thread
From: Anton Salikhmetov @ 2008-01-17 0:57 UTC (permalink / raw)
To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb,
miklos, r.e.wolff, hidave.darkstar, hch
This is the fifth version of my solution for the bug #2645:
http://bugzilla.kernel.org/show_bug.cgi?id=2645
New since the previous version:
1) the case of retouching an already-dirty page pointed out
by Miklos Szeredi has been correctly addressed;
2) a few cosmetic changes according to the latest feedback;
3) fixed the error of calling a possibly sleeping function
from an atomic context.
The design for the first item above was suggested by Peter Zijlstra:
> It would require scanning the PTEs and marking them read-only again on
> MS_ASYNC, and some more logic in set_page_dirty() because that currently
> bails out early if the page in question is already dirty.
Miklos' test program now produces the following output for
the repeated calls to msync() with the MS_ASYNC flag:
debian:~/miklos# ./miklos_test file
begin 1200529196 1200529196 1200528798
write 1200529197 1200529197 1200528798
mmap 1200529197 1200529197 1200529198
b 1200529197 1200529197 1200529198
msync b 1200529199 1200529199 1200529198
c 1200529199 1200529199 1200529198
msync c 1200529201 1200529201 1200529198
d 1200529201 1200529201 1200529198
munmap 1200529201 1200529201 1200529198
close 1200529201 1200529201 1200529198
sync 1200529204 1200529204 1200529198
debian:~/miklos#
Miklos' test program can be found using the following link:
http://lkml.org/lkml/2008/1/14/104
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH -v5 1/2] Massive code cleanup of sys_msync() 2008-01-17 0:57 [PATCH -v5 0/2] Updating ctime and mtime for memory-mapped files Anton Salikhmetov @ 2008-01-17 0:57 ` Anton Salikhmetov 2008-01-17 11:01 ` Miklos Szeredi 2008-01-17 0:57 ` [PATCH -v5 2/2] Updating ctime and mtime at syncing Anton Salikhmetov 1 sibling, 1 reply; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 0:57 UTC (permalink / raw) To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, miklos, r.e.wolff, hidave.darkstar, hch Substantial code cleanup of the sys_msync() function: 1) using the PAGE_ALIGN() macro instead of "manual" alignment; 2) improved readability of the loop traversing the process memory regions. Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> --- mm/msync.c | 74 +++++++++++++++++++++++++++++------------------------------ 1 files changed, 36 insertions(+), 38 deletions(-) diff --git a/mm/msync.c b/mm/msync.c index 144a757..44997bf 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -1,24 +1,22 @@ /* - * linux/mm/msync.c + * The msync() system call. * - * Copyright (C) 1994-1999 Linus Torvalds + * Copyright (C) 1994-1999 Linus Torvalds + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> */ -/* - * The msync() system call. - */ +#include <linux/file.h> #include <linux/fs.h> #include <linux/mm.h> #include <linux/mman.h> -#include <linux/file.h> -#include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/syscalls.h> /* * MS_SYNC syncs the entire file - including mappings. * * MS_ASYNC does not start I/O (it used to, up to 2.5.67). - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * * The application may now run fsync() to @@ -33,8 +31,7 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - int unmapped_error = 0; - int error = -EINVAL; + int error = -EINVAL, unmapped_error = 0; if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; @@ -42,62 +39,63 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) goto out; if ((flags & MS_ASYNC) && (flags & MS_SYNC)) goto out; - error = -ENOMEM; - len = (len + ~PAGE_MASK) & PAGE_MASK; + + len = PAGE_ALIGN(len); end = start + len; - if (end < start) + if (end < start) { + error = -ENOMEM; goto out; + } + error = 0; + if (end == start) goto out; + /* * If the interval [start,end) covers some unmapped address ranges, * just ignore them, but return -ENOMEM at the end. */ down_read(&mm->mmap_sem); vma = find_vma(mm, start); - for (;;) { + do { struct file *file; - /* Still start < end. */ - error = -ENOMEM; - if (!vma) - goto out_unlock; - /* Here start < vma->vm_end. */ + if (!vma) { + error = -ENOMEM; + break; + } if (start < vma->vm_start) { start = vma->vm_start; - if (start >= end) - goto out_unlock; + if (start >= end) { + error = -ENOMEM; + break; + } unmapped_error = -ENOMEM; } - /* Here vma->vm_start <= start < vma->vm_end. */ - if ((flags & MS_INVALIDATE) && - (vma->vm_flags & VM_LOCKED)) { + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { error = -EBUSY; - goto out_unlock; + break; } - file = vma->vm_file; start = vma->vm_end; - if ((flags & MS_SYNC) && file && - (vma->vm_flags & VM_SHARED)) { + + file = vma->vm_file; + if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { get_file(file); up_read(&mm->mmap_sem); error = do_fsync(file, 0); fput(file); - if (error || start >= end) + if (error) goto out; down_read(&mm->mmap_sem); vma = find_vma(mm, start); - } else { - if (start >= end) { - error = 0; - goto out_unlock; - } - vma = vma->vm_next; + continue; } - } -out_unlock: + + vma = vma->vm_next; + } while (start < end); up_read(&mm->mmap_sem); + out: - return error ? : unmapped_error; + return error ? error : unmapped_error; } -- 1.4.4.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 1/2] Massive code cleanup of sys_msync() 2008-01-17 0:57 ` [PATCH -v5 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov @ 2008-01-17 11:01 ` Miklos Szeredi 2008-01-17 11:47 ` Anton Salikhmetov 0 siblings, 1 reply; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 11:01 UTC (permalink / raw) To: salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, miklos, r.e.wolff, hidave.darkstar, hch > Substantial code cleanup of the sys_msync() function: > > 1) using the PAGE_ALIGN() macro instead of "manual" alignment; > 2) improved readability of the loop traversing the process memory regions. > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > --- > mm/msync.c | 74 +++++++++++++++++++++++++++++------------------------------ > 1 files changed, 36 insertions(+), 38 deletions(-) > > diff --git a/mm/msync.c b/mm/msync.c > index 144a757..44997bf 100644 > --- a/mm/msync.c > +++ b/mm/msync.c > @@ -1,24 +1,22 @@ > /* > - * linux/mm/msync.c > + * The msync() system call. > * > - * Copyright (C) 1994-1999 Linus Torvalds > + * Copyright (C) 1994-1999 Linus Torvalds > + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > */ > > -/* > - * The msync() system call. > - */ > +#include <linux/file.h> > #include <linux/fs.h> > #include <linux/mm.h> > #include <linux/mman.h> > -#include <linux/file.h> > -#include <linux/syscalls.h> > #include <linux/sched.h> > +#include <linux/syscalls.h> > > /* > * MS_SYNC syncs the entire file - including mappings. > * > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). > + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > * Now it doesn't do anything, since dirty pages are properly tracked. > * > * The application may now run fsync() to > @@ -33,8 +31,7 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > unsigned long end; > struct mm_struct *mm = current->mm; > struct vm_area_struct *vma; > - int unmapped_error = 0; > - int error = -EINVAL; > + int error = -EINVAL, unmapped_error = 0; I prefer multi-line variable declarations, especially for ones with an initializer. > > if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) > goto out; > @@ -42,62 +39,63 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > goto out; > if ((flags & MS_ASYNC) && (flags & MS_SYNC)) > goto out; > - error = -ENOMEM; > - len = (len + ~PAGE_MASK) & PAGE_MASK; > + > + len = PAGE_ALIGN(len); > end = start + len; > - if (end < start) > + if (end < start) { > + error = -ENOMEM; The usual style is to have the error assignment outside the conditional. That way is shorter, clearer, as well as possibly generating better code. > goto out; > + } > + > error = 0; > + Unnecessary empty line here, these two statements actually belong together. > if (end == start) > goto out; > + > /* > * If the interval [start,end) covers some unmapped address ranges, > * just ignore them, but return -ENOMEM at the end. > */ > down_read(&mm->mmap_sem); > vma = find_vma(mm, start); > - for (;;) { > + do { > struct file *file; > > - /* Still start < end. */ > - error = -ENOMEM; > - if (!vma) > - goto out_unlock; > - /* Here start < vma->vm_end. */ > + if (!vma) { > + error = -ENOMEM; > + break; > + } Again, error asignment should be outside the conditional. This of course means, you'll have to set the error back to zero at the end of the loop, but that's fine. > if (start < vma->vm_start) { > start = vma->vm_start; > - if (start >= end) > - goto out_unlock; > + if (start >= end) { > + error = -ENOMEM; > + break; > + } Ditto. > unmapped_error = -ENOMEM; > } > - /* Here vma->vm_start <= start < vma->vm_end. */ > - if ((flags & MS_INVALIDATE) && > - (vma->vm_flags & VM_LOCKED)) { > + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { > error = -EBUSY; Ditto2. > - goto out_unlock; > + break; > } > - file = vma->vm_file; > start = vma->vm_end; > - if ((flags & MS_SYNC) && file && > - (vma->vm_flags & VM_SHARED)) { > + > + file = vma->vm_file; > + if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { > get_file(file); > up_read(&mm->mmap_sem); > error = do_fsync(file, 0); > fput(file); > - if (error || start >= end) > + if (error) This simplifies, but also does unnecessary down/find_vma/up. > goto out; > down_read(&mm->mmap_sem); > vma = find_vma(mm, start); > - } else { > - if (start >= end) { > - error = 0; > - goto out_unlock; > - } > - vma = vma->vm_next; > + continue; > } > - } > -out_unlock: > + > + vma = vma->vm_next; > + } while (start < end); > up_read(&mm->mmap_sem); > + > out: > - return error ? : unmapped_error; > + return error ? error : unmapped_error; > } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 1/2] Massive code cleanup of sys_msync() 2008-01-17 11:01 ` Miklos Szeredi @ 2008-01-17 11:47 ` Anton Salikhmetov 0 siblings, 0 replies; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 11:47 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > Substantial code cleanup of the sys_msync() function: > > > > 1) using the PAGE_ALIGN() macro instead of "manual" alignment; > > 2) improved readability of the loop traversing the process memory regions. > > > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > > --- > > mm/msync.c | 74 +++++++++++++++++++++++++++++------------------------------ > > 1 files changed, 36 insertions(+), 38 deletions(-) > > > > diff --git a/mm/msync.c b/mm/msync.c > > index 144a757..44997bf 100644 > > --- a/mm/msync.c > > +++ b/mm/msync.c > > @@ -1,24 +1,22 @@ > > /* > > - * linux/mm/msync.c > > + * The msync() system call. > > * > > - * Copyright (C) 1994-1999 Linus Torvalds > > + * Copyright (C) 1994-1999 Linus Torvalds > > + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > > */ > > > > -/* > > - * The msync() system call. > > - */ > > +#include <linux/file.h> > > #include <linux/fs.h> > > #include <linux/mm.h> > > #include <linux/mman.h> > > -#include <linux/file.h> > > -#include <linux/syscalls.h> > > #include <linux/sched.h> > > +#include <linux/syscalls.h> > > > > /* > > * MS_SYNC syncs the entire file - including mappings. > > * > > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > > - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). > > + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > > * Now it doesn't do anything, since dirty pages are properly tracked. > > * > > * The application may now run fsync() to > > @@ -33,8 +31,7 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > > unsigned long end; > > struct mm_struct *mm = current->mm; > > struct vm_area_struct *vma; > > - int unmapped_error = 0; > > - int error = -EINVAL; > > + int error = -EINVAL, unmapped_error = 0; > > I prefer multi-line variable declarations, especially for ones with an > initializer. > > > > > if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) > > goto out; > > @@ -42,62 +39,63 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > > goto out; > > if ((flags & MS_ASYNC) && (flags & MS_SYNC)) > > goto out; > > - error = -ENOMEM; > > - len = (len + ~PAGE_MASK) & PAGE_MASK; > > + > > + len = PAGE_ALIGN(len); > > end = start + len; > > - if (end < start) > > + if (end < start) { > > + error = -ENOMEM; > > The usual style is to have the error assignment outside the > conditional. That way is shorter, clearer, as well as possibly > generating better code. > > > goto out; > > + } > > + > > error = 0; > > + > > Unnecessary empty line here, these two statements actually belong > together. > > > if (end == start) > > goto out; > > + > > /* > > * If the interval [start,end) covers some unmapped address ranges, > > * just ignore them, but return -ENOMEM at the end. > > */ > > down_read(&mm->mmap_sem); > > vma = find_vma(mm, start); > > - for (;;) { > > + do { > > struct file *file; > > > > - /* Still start < end. */ > > - error = -ENOMEM; > > - if (!vma) > > - goto out_unlock; > > - /* Here start < vma->vm_end. */ > > + if (!vma) { > > + error = -ENOMEM; > > + break; > > + } > > Again, error asignment should be outside the conditional. This of > course means, you'll have to set the error back to zero at the end of > the loop, but that's fine. > > > if (start < vma->vm_start) { > > start = vma->vm_start; > > - if (start >= end) > > - goto out_unlock; > > + if (start >= end) { > > + error = -ENOMEM; > > + break; > > + } > > Ditto. > > > unmapped_error = -ENOMEM; > > } > > - /* Here vma->vm_start <= start < vma->vm_end. */ > > - if ((flags & MS_INVALIDATE) && > > - (vma->vm_flags & VM_LOCKED)) { > > + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { > > error = -EBUSY; > > Ditto2. > > > - goto out_unlock; > > + break; > > } > > - file = vma->vm_file; > > start = vma->vm_end; > > - if ((flags & MS_SYNC) && file && > > - (vma->vm_flags & VM_SHARED)) { > > + > > + file = vma->vm_file; > > + if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { > > get_file(file); > > up_read(&mm->mmap_sem); > > error = do_fsync(file, 0); > > fput(file); > > - if (error || start >= end) > > + if (error) > > This simplifies, but also does unnecessary down/find_vma/up. > > > goto out; > > down_read(&mm->mmap_sem); > > vma = find_vma(mm, start); > > - } else { > > - if (start >= end) { > > - error = 0; > > - goto out_unlock; > > - } > > - vma = vma->vm_next; > > + continue; > > } > > - } > > -out_unlock: > > + > > + vma = vma->vm_next; > > + } while (start < end); > > up_read(&mm->mmap_sem); > > + > > out: > > - return error ? : unmapped_error; > > + return error ? error : unmapped_error; > > } > Thanks for your recommendations! I'll take them into account for the next version. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 0:57 [PATCH -v5 0/2] Updating ctime and mtime for memory-mapped files Anton Salikhmetov 2008-01-17 0:57 ` [PATCH -v5 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov @ 2008-01-17 0:57 ` Anton Salikhmetov 2008-01-17 11:13 ` Miklos Szeredi 1 sibling, 1 reply; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 0:57 UTC (permalink / raw) To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, miklos, r.e.wolff, hidave.darkstar, hch Changes for updating the ctime and mtime fields for memory-mapped files: 1) a new flag triggering update of the inode data; 2) a new field in the address_space structure for saving modification time; 3) a new helper function to update ctime and mtime when needed; 4) updating time stamps for mapped files in sys_msync() and do_fsync(); 5) implementing lazy ctime and mtime update. Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> --- fs/buffer.c | 3 ++ fs/fs-writeback.c | 2 + fs/inode.c | 43 +++++++++++++++++++++++---------- fs/sync.c | 2 + include/linux/fs.h | 13 +++++++++- include/linux/pagemap.h | 3 +- mm/msync.c | 61 +++++++++++++++++++++++++++++++++------------- mm/page-writeback.c | 54 ++++++++++++++++++++++------------------- 8 files changed, 124 insertions(+), 57 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 7249e01..3967aa7 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -701,6 +701,9 @@ static int __set_page_dirty(struct page *page, if (unlikely(!mapping)) return !TestSetPageDirty(page); + mapping->mtime = CURRENT_TIME; + set_bit(AS_MCTIME, &mapping->flags); + if (TestSetPageDirty(page)) return 0; diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 300324b..affd291 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) spin_unlock(&inode_lock); + mapping_update_time(mapping); + ret = do_writepages(mapping, wbc); /* Don't write the inode if only I_DIRTY_PAGES was set */ diff --git a/fs/inode.c b/fs/inode.c index ed35383..edd5bf4 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1243,8 +1243,10 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) EXPORT_SYMBOL(touch_atime); /** - * file_update_time - update mtime and ctime time - * @file: file accessed + * inode_update_time - update mtime and ctime time + * @inode: inode accessed + * @ts: time when inode was accessed + * @sync: whether to do synchronous update * * Update the mtime and ctime members of an inode and mark the inode * for writeback. Note that this function is meant exclusively for @@ -1253,11 +1255,8 @@ EXPORT_SYMBOL(touch_atime); * S_NOCTIME inode flag, e.g. for network filesystem where these * timestamps are handled by the server. */ - -void file_update_time(struct file *file) +void inode_update_time(struct inode *inode, struct timespec *ts) { - struct inode *inode = file->f_path.dentry->d_inode; - struct timespec now; int sync_it = 0; if (IS_NOCMTIME(inode)) @@ -1265,22 +1264,41 @@ void file_update_time(struct file *file) if (IS_RDONLY(inode)) return; - now = current_fs_time(inode->i_sb); - if (!timespec_equal(&inode->i_mtime, &now)) { - inode->i_mtime = now; + if (timespec_compare(&inode->i_mtime, ts) < 0) { + inode->i_mtime = *ts; sync_it = 1; } - if (!timespec_equal(&inode->i_ctime, &now)) { - inode->i_ctime = now; + if (timespec_compare(&inode->i_ctime, ts) < 0) { + inode->i_ctime = *ts; sync_it = 1; } if (sync_it) mark_inode_dirty_sync(inode); } +EXPORT_SYMBOL(inode_update_time); -EXPORT_SYMBOL(file_update_time); +/* + * Update the ctime and mtime stamps after checking if they are to be updated. + */ +void mapping_update_time(struct address_space *mapping) +{ + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { + struct inode *inode = mapping->host; + struct timespec *ts = &mapping->mtime; + + if (S_ISBLK(inode->i_mode)) { + struct block_device *bdev = inode->i_bdev; + + mutex_lock(&bdev->bd_mutex); + list_for_each_entry(inode, &bdev->bd_inodes, i_devices) + inode_update_time(inode, ts); + mutex_unlock(&bdev->bd_mutex); + } else + inode_update_time(inode, ts); + } +} int inode_needs_sync(struct inode *inode) { @@ -1290,7 +1308,6 @@ int inode_needs_sync(struct inode *inode) return 1; return 0; } - EXPORT_SYMBOL(inode_needs_sync); int inode_wait(void *word) diff --git a/fs/sync.c b/fs/sync.c index 7cd005e..5561464 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) goto out; } + mapping_update_time(mapping); + ret = filemap_fdatawrite(mapping); /* diff --git a/include/linux/fs.h b/include/linux/fs.h index b3ec4a4..f0d3ced 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -511,6 +511,7 @@ struct address_space { spinlock_t private_lock; /* for use by the address_space */ struct list_head private_list; /* ditto */ struct address_space *assoc_mapping; /* ditto */ + struct timespec mtime; /* modification time */ } __attribute__((aligned(sizeof(long)))); /* * On most architectures that alignment is already the case; but @@ -1977,7 +1978,17 @@ extern int buffer_migrate_page(struct address_space *, extern int inode_change_ok(struct inode *, struct iattr *); extern int __must_check inode_setattr(struct inode *, struct iattr *); -extern void file_update_time(struct file *file); +extern void inode_update_time(struct inode *, struct timespec *); + +static inline void file_update_time(struct file *file) +{ + struct inode *inode = file->f_dentry->d_inode; + struct timespec ts = current_fs_time(inode->i_sb); + + inode_update_time(inode, &ts); +} + +extern void mapping_update_time(struct address_space *); static inline ino_t parent_ino(struct dentry *dentry) { diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index db8a410..bf0f9e7 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -17,8 +17,9 @@ * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page * allocation mode flags. */ -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ static inline void mapping_set_error(struct address_space *mapping, int error) { diff --git a/mm/msync.c b/mm/msync.c index 44997bf..7657776 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -13,16 +13,37 @@ #include <linux/syscalls.h> /* + * Scan the PTEs for pages belonging to the VMA and mark them read-only. + * It will force a pagefault on the next write access. + */ +static void vma_wrprotect(struct vm_area_struct *vma) +{ + unsigned long addr; + + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { + spinlock_t *ptl; + pgd_t *pgd = pgd_offset(vma->vm_mm, addr); + pud_t *pud = pud_offset(pgd, addr); + pmd_t *pmd = pmd_offset(pud, addr); + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + + if (pte_dirty(*pte) && pte_write(*pte)) + *pte = pte_wrprotect(*pte); + pte_unmap_unlock(pte, ptl); + } +} + +/* * MS_SYNC syncs the entire file - including mappings. * - * MS_ASYNC does not start I/O (it used to, up to 2.5.67). - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). - * Now it doesn't do anything, since dirty pages are properly tracked. + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages + * read-only by calling vma_wrprotect(). This is needed to catch the next + * write reference to the mapped region and update the file times + * accordingly. * - * The application may now run fsync() to - * write out the dirty pages and wait on the writeout and check the result. - * Or the application may run fadvise(FADV_DONTNEED) against the fd to start - * async writeout immediately. + * The application may now run fsync() to write out the dirty pages and + * wait on the writeout and check the result. Or the application may run + * fadvise(FADV_DONTNEED) against the fd to start async writeout immediately. * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to * applications. */ @@ -80,16 +101,22 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) start = vma->vm_end; file = vma->vm_file; - if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { - get_file(file); - up_read(&mm->mmap_sem); - error = do_fsync(file, 0); - fput(file); - if (error) - goto out; - down_read(&mm->mmap_sem); - vma = find_vma(mm, start); - continue; + if (file && (vma->vm_flags & VM_SHARED)) { + if (flags & MS_ASYNC) { + vma_wrprotect(vma); + mapping_update_time(file->f_mapping); + } + if (flags & MS_SYNC) { + get_file(file); + up_read(&mm->mmap_sem); + error = do_fsync(file, 0); + fput(file); + if (error) + goto out; + down_read(&mm->mmap_sem); + vma = find_vma(mm, start); + continue; + } } vma = vma->vm_next; diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 3d3848f..53d0e34 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page) */ int __set_page_dirty_nobuffers(struct page *page) { - if (!TestSetPageDirty(page)) { - struct address_space *mapping = page_mapping(page); - struct address_space *mapping2; + struct address_space *mapping = page_mapping(page); + struct address_space *mapping2; - if (!mapping) - return 1; + if (!mapping) + return 1; - write_lock_irq(&mapping->tree_lock); - mapping2 = page_mapping(page); - if (mapping2) { /* Race with truncate? */ - BUG_ON(mapping2 != mapping); - WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); - if (mapping_cap_account_dirty(mapping)) { - __inc_zone_page_state(page, NR_FILE_DIRTY); - __inc_bdi_stat(mapping->backing_dev_info, - BDI_RECLAIMABLE); - task_io_account_write(PAGE_CACHE_SIZE); - } - radix_tree_tag_set(&mapping->page_tree, - page_index(page), PAGECACHE_TAG_DIRTY); - } - write_unlock_irq(&mapping->tree_lock); - if (mapping->host) { - /* !PageAnon && !swapper_space */ - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + mapping->mtime = CURRENT_TIME; + set_bit(AS_MCTIME, &mapping->flags); + + if (TestSetPageDirty(page)) + return 0; + + write_lock_irq(&mapping->tree_lock); + mapping2 = page_mapping(page); + if (mapping2) { + /* Race with truncate? */ + BUG_ON(mapping2 != mapping); + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); + if (mapping_cap_account_dirty(mapping)) { + __inc_zone_page_state(page, NR_FILE_DIRTY); + __inc_bdi_stat(mapping->backing_dev_info, + BDI_RECLAIMABLE); + task_io_account_write(PAGE_CACHE_SIZE); } - return 1; + radix_tree_tag_set(&mapping->page_tree, + page_index(page), PAGECACHE_TAG_DIRTY); } - return 0; + write_unlock_irq(&mapping->tree_lock); + + if (mapping->host) + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + + return 1; } EXPORT_SYMBOL(__set_page_dirty_nobuffers); -- 1.4.4.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 0:57 ` [PATCH -v5 2/2] Updating ctime and mtime at syncing Anton Salikhmetov @ 2008-01-17 11:13 ` Miklos Szeredi 2008-01-17 12:16 ` Anton Salikhmetov 0 siblings, 1 reply; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 11:13 UTC (permalink / raw) To: salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, miklos, r.e.wolff, hidave.darkstar, hch > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > Changes for updating the ctime and mtime fields for memory-mapped files: > > 1) a new flag triggering update of the inode data; > 2) a new field in the address_space structure for saving modification time; > 3) a new helper function to update ctime and mtime when needed; > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > 5) implementing lazy ctime and mtime update. OK, the functionality seems to be there now. As a next step, I think you should try to simplify the patch, removing everything, that is not strictly necessary. > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > --- > fs/buffer.c | 3 ++ > fs/fs-writeback.c | 2 + > fs/inode.c | 43 +++++++++++++++++++++++---------- > fs/sync.c | 2 + > include/linux/fs.h | 13 +++++++++- > include/linux/pagemap.h | 3 +- > mm/msync.c | 61 +++++++++++++++++++++++++++++++++------------- > mm/page-writeback.c | 54 ++++++++++++++++++++++------------------- > 8 files changed, 124 insertions(+), 57 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 7249e01..3967aa7 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -701,6 +701,9 @@ static int __set_page_dirty(struct page *page, > if (unlikely(!mapping)) > return !TestSetPageDirty(page); > > + mapping->mtime = CURRENT_TIME; Why is this needed? POSIX explicitly states, that the modification time can be set to anywhere between the first write and the msync. > + set_bit(AS_MCTIME, &mapping->flags); A bigger problem is that doing this in __set_page_dirty() and friends will mean, that the flag will be set for non-mapped writes as well, which we definitely don't want. A better place to put it is do_wp_page and __do_fault, where set_page_dirty_balance() is called. > + > if (TestSetPageDirty(page)) > return 0; > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 300324b..affd291 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) > > spin_unlock(&inode_lock); > > + mapping_update_time(mapping); > + I think this is unnecessary. Rather put the update into remove_vma(). > ret = do_writepages(mapping, wbc); > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > diff --git a/fs/inode.c b/fs/inode.c > index ed35383..edd5bf4 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -1243,8 +1243,10 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) > EXPORT_SYMBOL(touch_atime); > > /** > - * file_update_time - update mtime and ctime time > - * @file: file accessed > + * inode_update_time - update mtime and ctime time > + * @inode: inode accessed > + * @ts: time when inode was accessed > + * @sync: whether to do synchronous update > * > * Update the mtime and ctime members of an inode and mark the inode > * for writeback. Note that this function is meant exclusively for > @@ -1253,11 +1255,8 @@ EXPORT_SYMBOL(touch_atime); > * S_NOCTIME inode flag, e.g. for network filesystem where these > * timestamps are handled by the server. > */ > - > -void file_update_time(struct file *file) > +void inode_update_time(struct inode *inode, struct timespec *ts) > { > - struct inode *inode = file->f_path.dentry->d_inode; > - struct timespec now; > int sync_it = 0; > > if (IS_NOCMTIME(inode)) > @@ -1265,22 +1264,41 @@ void file_update_time(struct file *file) > if (IS_RDONLY(inode)) > return; > > - now = current_fs_time(inode->i_sb); > - if (!timespec_equal(&inode->i_mtime, &now)) { > - inode->i_mtime = now; > + if (timespec_compare(&inode->i_mtime, ts) < 0) { > + inode->i_mtime = *ts; > sync_it = 1; > } > > - if (!timespec_equal(&inode->i_ctime, &now)) { > - inode->i_ctime = now; > + if (timespec_compare(&inode->i_ctime, ts) < 0) { > + inode->i_ctime = *ts; > sync_it = 1; > } > > if (sync_it) > mark_inode_dirty_sync(inode); > } > +EXPORT_SYMBOL(inode_update_time); > > -EXPORT_SYMBOL(file_update_time); > +/* > + * Update the ctime and mtime stamps after checking if they are to be updated. > + */ > +void mapping_update_time(struct address_space *mapping) > +{ > + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { > + struct inode *inode = mapping->host; > + struct timespec *ts = &mapping->mtime; > + > + if (S_ISBLK(inode->i_mode)) { > + struct block_device *bdev = inode->i_bdev; > + > + mutex_lock(&bdev->bd_mutex); > + list_for_each_entry(inode, &bdev->bd_inodes, i_devices) > + inode_update_time(inode, ts); > + mutex_unlock(&bdev->bd_mutex); > + } else > + inode_update_time(inode, ts); > + } > +} All these changes to inode.c are unnecessary, I think. > > int inode_needs_sync(struct inode *inode) > { > @@ -1290,7 +1308,6 @@ int inode_needs_sync(struct inode *inode) > return 1; > return 0; > } > - > EXPORT_SYMBOL(inode_needs_sync); > > int inode_wait(void *word) > diff --git a/fs/sync.c b/fs/sync.c > index 7cd005e..5561464 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) > goto out; > } > > + mapping_update_time(mapping); > + > ret = filemap_fdatawrite(mapping); > > /* > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b3ec4a4..f0d3ced 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -511,6 +511,7 @@ struct address_space { > spinlock_t private_lock; /* for use by the address_space */ > struct list_head private_list; /* ditto */ > struct address_space *assoc_mapping; /* ditto */ > + struct timespec mtime; /* modification time */ > } __attribute__((aligned(sizeof(long)))); > /* > * On most architectures that alignment is already the case; but > @@ -1977,7 +1978,17 @@ extern int buffer_migrate_page(struct address_space *, > extern int inode_change_ok(struct inode *, struct iattr *); > extern int __must_check inode_setattr(struct inode *, struct iattr *); > > -extern void file_update_time(struct file *file); > +extern void inode_update_time(struct inode *, struct timespec *); > + > +static inline void file_update_time(struct file *file) > +{ > + struct inode *inode = file->f_dentry->d_inode; > + struct timespec ts = current_fs_time(inode->i_sb); > + > + inode_update_time(inode, &ts); > +} > + > +extern void mapping_update_time(struct address_space *); As is this. > > static inline ino_t parent_ino(struct dentry *dentry) > { > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index db8a410..bf0f9e7 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -17,8 +17,9 @@ > * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page > * allocation mode flags. > */ > -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ > +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ > > static inline void mapping_set_error(struct address_space *mapping, int error) > { > diff --git a/mm/msync.c b/mm/msync.c > index 44997bf..7657776 100644 > --- a/mm/msync.c > +++ b/mm/msync.c > @@ -13,16 +13,37 @@ > #include <linux/syscalls.h> > > /* > + * Scan the PTEs for pages belonging to the VMA and mark them read-only. > + * It will force a pagefault on the next write access. > + */ > +static void vma_wrprotect(struct vm_area_struct *vma) > +{ > + unsigned long addr; > + > + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { > + spinlock_t *ptl; > + pgd_t *pgd = pgd_offset(vma->vm_mm, addr); > + pud_t *pud = pud_offset(pgd, addr); > + pmd_t *pmd = pmd_offset(pud, addr); > + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > + > + if (pte_dirty(*pte) && pte_write(*pte)) > + *pte = pte_wrprotect(*pte); > + pte_unmap_unlock(pte, ptl); > + } > +} > + There might be a more efficient way to do this, than getting and releasing the lock for each pte. > +/* > * MS_SYNC syncs the entire file - including mappings. > * > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > - * Now it doesn't do anything, since dirty pages are properly tracked. > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages > + * read-only by calling vma_wrprotect(). This is needed to catch the next > + * write reference to the mapped region and update the file times > + * accordingly. > * > - * The application may now run fsync() to > - * write out the dirty pages and wait on the writeout and check the result. > - * Or the application may run fadvise(FADV_DONTNEED) against the fd to start > - * async writeout immediately. > + * The application may now run fsync() to write out the dirty pages and > + * wait on the writeout and check the result. Or the application may run > + * fadvise(FADV_DONTNEED) against the fd to start async writeout immediately. > * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to > * applications. > */ > @@ -80,16 +101,22 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > start = vma->vm_end; > > file = vma->vm_file; > - if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { > - get_file(file); > - up_read(&mm->mmap_sem); > - error = do_fsync(file, 0); > - fput(file); > - if (error) > - goto out; > - down_read(&mm->mmap_sem); > - vma = find_vma(mm, start); > - continue; > + if (file && (vma->vm_flags & VM_SHARED)) { > + if (flags & MS_ASYNC) { > + vma_wrprotect(vma); > + mapping_update_time(file->f_mapping); > + } > + if (flags & MS_SYNC) { > + get_file(file); > + up_read(&mm->mmap_sem); > + error = do_fsync(file, 0); > + fput(file); > + if (error) > + goto out; > + down_read(&mm->mmap_sem); > + vma = find_vma(mm, start); > + continue; > + } > } > > vma = vma->vm_next; > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 3d3848f..53d0e34 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page) > */ > int __set_page_dirty_nobuffers(struct page *page) > { > - if (!TestSetPageDirty(page)) { > - struct address_space *mapping = page_mapping(page); > - struct address_space *mapping2; > + struct address_space *mapping = page_mapping(page); > + struct address_space *mapping2; > > - if (!mapping) > - return 1; > + if (!mapping) > + return 1; > > - write_lock_irq(&mapping->tree_lock); > - mapping2 = page_mapping(page); > - if (mapping2) { /* Race with truncate? */ > - BUG_ON(mapping2 != mapping); > - WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > - if (mapping_cap_account_dirty(mapping)) { > - __inc_zone_page_state(page, NR_FILE_DIRTY); > - __inc_bdi_stat(mapping->backing_dev_info, > - BDI_RECLAIMABLE); > - task_io_account_write(PAGE_CACHE_SIZE); > - } > - radix_tree_tag_set(&mapping->page_tree, > - page_index(page), PAGECACHE_TAG_DIRTY); > - } > - write_unlock_irq(&mapping->tree_lock); > - if (mapping->host) { > - /* !PageAnon && !swapper_space */ > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + mapping->mtime = CURRENT_TIME; > + set_bit(AS_MCTIME, &mapping->flags); > + > + if (TestSetPageDirty(page)) > + return 0; > + > + write_lock_irq(&mapping->tree_lock); > + mapping2 = page_mapping(page); > + if (mapping2) { > + /* Race with truncate? */ > + BUG_ON(mapping2 != mapping); > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > + if (mapping_cap_account_dirty(mapping)) { > + __inc_zone_page_state(page, NR_FILE_DIRTY); > + __inc_bdi_stat(mapping->backing_dev_info, > + BDI_RECLAIMABLE); > + task_io_account_write(PAGE_CACHE_SIZE); > } > - return 1; > + radix_tree_tag_set(&mapping->page_tree, > + page_index(page), PAGECACHE_TAG_DIRTY); > } > - return 0; > + write_unlock_irq(&mapping->tree_lock); > + > + if (mapping->host) > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + > + return 1; > } > EXPORT_SYMBOL(__set_page_dirty_nobuffers); > > -- > 1.4.4.4 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 11:13 ` Miklos Szeredi @ 2008-01-17 12:16 ` Anton Salikhmetov 2008-01-17 12:45 ` Miklos Szeredi 0 siblings, 1 reply; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 12:16 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > 1) a new flag triggering update of the inode data; > > 2) a new field in the address_space structure for saving modification time; > > 3) a new helper function to update ctime and mtime when needed; > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > 5) implementing lazy ctime and mtime update. > > OK, the functionality seems to be there now. As a next step, I think > you should try to simplify the patch, removing everything, that is not > strictly necessary. > > > > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > > --- > > fs/buffer.c | 3 ++ > > fs/fs-writeback.c | 2 + > > fs/inode.c | 43 +++++++++++++++++++++++---------- > > fs/sync.c | 2 + > > include/linux/fs.h | 13 +++++++++- > > include/linux/pagemap.h | 3 +- > > mm/msync.c | 61 +++++++++++++++++++++++++++++++++------------- > > mm/page-writeback.c | 54 ++++++++++++++++++++++------------------- > > 8 files changed, 124 insertions(+), 57 deletions(-) > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > index 7249e01..3967aa7 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -701,6 +701,9 @@ static int __set_page_dirty(struct page *page, > > if (unlikely(!mapping)) > > return !TestSetPageDirty(page); > > > > + mapping->mtime = CURRENT_TIME; > > Why is this needed? POSIX explicitly states, that the modification > time can be set to anywhere between the first write and the msync. I've already described this change in one of my previous emails: > 4. Recording the time was the file data changed > > Finally, I noticed yet another issue with the previous version of my patch. > Specifically, the time stamps were set to the current time of the moment > when syncing but not the write reference was being done. This led to the > following adverse effect on my development system: > > 1) a text file A was updated by process B; > 2) process B exits without calling any of the *sync() functions; > 3) vi editor opens the file A; > 4) file data synced, file times updated; > 5) vi is confused by "thinking" that the file was changed after 3). > > This version overcomes this problem by introducing another field into the > address_space structure. This field is used to "remember" the time of > dirtying, and then this time value is propagated to the file metadata. > > This approach is based upon the following suggestion given by Peter > Staubach during one of our previous discussions: > > http://lkml.org/lkml/2008/1/9/267 > > > A better architecture would be to arrange for the file times > > to be updated when the page makes the transition from being > > unmodified to modified. This is not straightforward due to > > the current locking, but should be doable, I think. Perhaps > > recording the current time and then using it to update the > > file times at a more suitable time (no pun intended) might > > work. > > The solution I propose now proves the viability of the latter > approach. See: http://lkml.org/lkml/2008/1/15/202 > > > + set_bit(AS_MCTIME, &mapping->flags); > > A bigger problem is that doing this in __set_page_dirty() and friends > will mean, that the flag will be set for non-mapped writes as well, > which we definitely don't want. > > A better place to put it is do_wp_page and __do_fault, where > set_page_dirty_balance() is called. Thanks for your suggestion, I'll consider how I can apply it. > > > + > > if (TestSetPageDirty(page)) > > return 0; > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > > index 300324b..affd291 100644 > > --- a/fs/fs-writeback.c > > +++ b/fs/fs-writeback.c > > @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) > > > > spin_unlock(&inode_lock); > > > > + mapping_update_time(mapping); > > + > > I think this is unnecessary. Rather put the update into remove_vma(). Thanks for your suggestion, I'll consider how I can apply it. However, I suspect that this is a bit problematic. Let me check it out. > > > ret = do_writepages(mapping, wbc); > > > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > diff --git a/fs/inode.c b/fs/inode.c > > index ed35383..edd5bf4 100644 > > --- a/fs/inode.c > > +++ b/fs/inode.c > > @@ -1243,8 +1243,10 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) > > EXPORT_SYMBOL(touch_atime); > > > > /** > > - * file_update_time - update mtime and ctime time > > - * @file: file accessed > > + * inode_update_time - update mtime and ctime time > > + * @inode: inode accessed > > + * @ts: time when inode was accessed > > + * @sync: whether to do synchronous update > > * > > * Update the mtime and ctime members of an inode and mark the inode > > * for writeback. Note that this function is meant exclusively for > > @@ -1253,11 +1255,8 @@ EXPORT_SYMBOL(touch_atime); > > * S_NOCTIME inode flag, e.g. for network filesystem where these > > * timestamps are handled by the server. > > */ > > - > > -void file_update_time(struct file *file) > > +void inode_update_time(struct inode *inode, struct timespec *ts) > > { > > - struct inode *inode = file->f_path.dentry->d_inode; > > - struct timespec now; > > int sync_it = 0; > > > > if (IS_NOCMTIME(inode)) > > @@ -1265,22 +1264,41 @@ void file_update_time(struct file *file) > > if (IS_RDONLY(inode)) > > return; > > > > - now = current_fs_time(inode->i_sb); > > - if (!timespec_equal(&inode->i_mtime, &now)) { > > - inode->i_mtime = now; > > + if (timespec_compare(&inode->i_mtime, ts) < 0) { > > + inode->i_mtime = *ts; > > sync_it = 1; > > } > > > > - if (!timespec_equal(&inode->i_ctime, &now)) { > > - inode->i_ctime = now; > > + if (timespec_compare(&inode->i_ctime, ts) < 0) { > > + inode->i_ctime = *ts; > > sync_it = 1; > > } > > > > if (sync_it) > > mark_inode_dirty_sync(inode); > > } > > +EXPORT_SYMBOL(inode_update_time); > > > > -EXPORT_SYMBOL(file_update_time); > > +/* > > + * Update the ctime and mtime stamps after checking if they are to be updated. > > + */ > > +void mapping_update_time(struct address_space *mapping) > > +{ > > + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { > > + struct inode *inode = mapping->host; > > + struct timespec *ts = &mapping->mtime; > > + > > + if (S_ISBLK(inode->i_mode)) { > > + struct block_device *bdev = inode->i_bdev; > > + > > + mutex_lock(&bdev->bd_mutex); > > + list_for_each_entry(inode, &bdev->bd_inodes, i_devices) > > + inode_update_time(inode, ts); > > + mutex_unlock(&bdev->bd_mutex); > > + } else > > + inode_update_time(inode, ts); > > + } > > +} > > All these changes to inode.c are unnecessary, I think. The first part is necessary to account for "remembering" the modification time. The second part is for handling block device files. I cannot see any other sane way to update file times for them. > > > > > int inode_needs_sync(struct inode *inode) > > { > > @@ -1290,7 +1308,6 @@ int inode_needs_sync(struct inode *inode) > > return 1; > > return 0; > > } > > - > > EXPORT_SYMBOL(inode_needs_sync); > > > > int inode_wait(void *word) > > diff --git a/fs/sync.c b/fs/sync.c > > index 7cd005e..5561464 100644 > > --- a/fs/sync.c > > +++ b/fs/sync.c > > @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) > > goto out; > > } > > > > + mapping_update_time(mapping); > > + > > ret = filemap_fdatawrite(mapping); > > > > /* > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index b3ec4a4..f0d3ced 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -511,6 +511,7 @@ struct address_space { > > spinlock_t private_lock; /* for use by the address_space */ > > struct list_head private_list; /* ditto */ > > struct address_space *assoc_mapping; /* ditto */ > > + struct timespec mtime; /* modification time */ > > } __attribute__((aligned(sizeof(long)))); > > /* > > * On most architectures that alignment is already the case; but > > @@ -1977,7 +1978,17 @@ extern int buffer_migrate_page(struct address_space *, > > extern int inode_change_ok(struct inode *, struct iattr *); > > extern int __must_check inode_setattr(struct inode *, struct iattr *); > > > > -extern void file_update_time(struct file *file); > > +extern void inode_update_time(struct inode *, struct timespec *); > > + > > +static inline void file_update_time(struct file *file) > > +{ > > + struct inode *inode = file->f_dentry->d_inode; > > + struct timespec ts = current_fs_time(inode->i_sb); > > + > > + inode_update_time(inode, &ts); > > +} > > + > > +extern void mapping_update_time(struct address_space *); > > As is this. This change is a logical consequence of introducing the new inode_update_time() routine, which was explained above. > > > > > static inline ino_t parent_ino(struct dentry *dentry) > > { > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > > index db8a410..bf0f9e7 100644 > > --- a/include/linux/pagemap.h > > +++ b/include/linux/pagemap.h > > @@ -17,8 +17,9 @@ > > * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page > > * allocation mode flags. > > */ > > -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > > +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > > #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ > > +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ > > > > static inline void mapping_set_error(struct address_space *mapping, int error) > > { > > diff --git a/mm/msync.c b/mm/msync.c > > index 44997bf..7657776 100644 > > --- a/mm/msync.c > > +++ b/mm/msync.c > > @@ -13,16 +13,37 @@ > > #include <linux/syscalls.h> > > > > /* > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only. > > + * It will force a pagefault on the next write access. > > + */ > > +static void vma_wrprotect(struct vm_area_struct *vma) > > +{ > > + unsigned long addr; > > + > > + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) { > > + spinlock_t *ptl; > > + pgd_t *pgd = pgd_offset(vma->vm_mm, addr); > > + pud_t *pud = pud_offset(pgd, addr); > > + pmd_t *pmd = pmd_offset(pud, addr); > > + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); > > + > > + if (pte_dirty(*pte) && pte_write(*pte)) > > + *pte = pte_wrprotect(*pte); > > + pte_unmap_unlock(pte, ptl); > > + } > > +} > > + > > There might be a more efficient way to do this, than getting and > releasing the lock for each pte. I haven't found any locks with bigger granularity here. Frankly, I do not think that this should be a big performance hit if we acquire and release the lock for each PTE. However, I'll be grateful for any further suggestions. Rik, can you please comment on this? > > > +/* > > * MS_SYNC syncs the entire file - including mappings. > > * > > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > > - * Now it doesn't do anything, since dirty pages are properly tracked. > > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages > > + * read-only by calling vma_wrprotect(). This is needed to catch the next > > + * write reference to the mapped region and update the file times > > + * accordingly. > > * > > - * The application may now run fsync() to > > - * write out the dirty pages and wait on the writeout and check the result. > > - * Or the application may run fadvise(FADV_DONTNEED) against the fd to start > > - * async writeout immediately. > > + * The application may now run fsync() to write out the dirty pages and > > + * wait on the writeout and check the result. Or the application may run > > + * fadvise(FADV_DONTNEED) against the fd to start async writeout immediately. > > * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to > > * applications. > > */ > > @@ -80,16 +101,22 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > > start = vma->vm_end; > > > > file = vma->vm_file; > > - if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) { > > - get_file(file); > > - up_read(&mm->mmap_sem); > > - error = do_fsync(file, 0); > > - fput(file); > > - if (error) > > - goto out; > > - down_read(&mm->mmap_sem); > > - vma = find_vma(mm, start); > > - continue; > > + if (file && (vma->vm_flags & VM_SHARED)) { > > + if (flags & MS_ASYNC) { > > + vma_wrprotect(vma); > > + mapping_update_time(file->f_mapping); > > + } > > + if (flags & MS_SYNC) { > > + get_file(file); > > + up_read(&mm->mmap_sem); > > + error = do_fsync(file, 0); > > + fput(file); > > + if (error) > > + goto out; > > + down_read(&mm->mmap_sem); > > + vma = find_vma(mm, start); > > + continue; > > + } > > } > > > > vma = vma->vm_next; > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 3d3848f..53d0e34 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -997,35 +997,39 @@ int __set_page_dirty_no_writeback(struct page *page) > > */ > > int __set_page_dirty_nobuffers(struct page *page) > > { > > - if (!TestSetPageDirty(page)) { > > - struct address_space *mapping = page_mapping(page); > > - struct address_space *mapping2; > > + struct address_space *mapping = page_mapping(page); > > + struct address_space *mapping2; > > > > - if (!mapping) > > - return 1; > > + if (!mapping) > > + return 1; > > > > - write_lock_irq(&mapping->tree_lock); > > - mapping2 = page_mapping(page); > > - if (mapping2) { /* Race with truncate? */ > > - BUG_ON(mapping2 != mapping); > > - WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > > - if (mapping_cap_account_dirty(mapping)) { > > - __inc_zone_page_state(page, NR_FILE_DIRTY); > > - __inc_bdi_stat(mapping->backing_dev_info, > > - BDI_RECLAIMABLE); > > - task_io_account_write(PAGE_CACHE_SIZE); > > - } > > - radix_tree_tag_set(&mapping->page_tree, > > - page_index(page), PAGECACHE_TAG_DIRTY); > > - } > > - write_unlock_irq(&mapping->tree_lock); > > - if (mapping->host) { > > - /* !PageAnon && !swapper_space */ > > - __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > + mapping->mtime = CURRENT_TIME; > > + set_bit(AS_MCTIME, &mapping->flags); > > + > > + if (TestSetPageDirty(page)) > > + return 0; > > + > > + write_lock_irq(&mapping->tree_lock); > > + mapping2 = page_mapping(page); > > + if (mapping2) { > > + /* Race with truncate? */ > > + BUG_ON(mapping2 != mapping); > > + WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > > + if (mapping_cap_account_dirty(mapping)) { > > + __inc_zone_page_state(page, NR_FILE_DIRTY); > > + __inc_bdi_stat(mapping->backing_dev_info, > > + BDI_RECLAIMABLE); > > + task_io_account_write(PAGE_CACHE_SIZE); > > } > > - return 1; > > + radix_tree_tag_set(&mapping->page_tree, > > + page_index(page), PAGECACHE_TAG_DIRTY); > > } > > - return 0; > > + write_unlock_irq(&mapping->tree_lock); > > + > > + if (mapping->host) > > + __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > + > > + return 1; > > } > > EXPORT_SYMBOL(__set_page_dirty_nobuffers); > > > > -- > > 1.4.4.4 > > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 12:16 ` Anton Salikhmetov @ 2008-01-17 12:45 ` Miklos Szeredi 2008-01-17 12:51 ` Rogier Wolff 2008-01-17 13:16 ` Anton Salikhmetov 0 siblings, 2 replies; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 12:45 UTC (permalink / raw) To: salikhmetov Cc: miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch > > 4. Recording the time was the file data changed > > > > Finally, I noticed yet another issue with the previous version of my patch. > > Specifically, the time stamps were set to the current time of the moment > > when syncing but not the write reference was being done. This led to the > > following adverse effect on my development system: > > > > 1) a text file A was updated by process B; > > 2) process B exits without calling any of the *sync() functions; > > 3) vi editor opens the file A; > > 4) file data synced, file times updated; > > 5) vi is confused by "thinking" that the file was changed after 3). Updating the time in remove_vma() would fix this, no? > > All these changes to inode.c are unnecessary, I think. > > The first part is necessary to account for "remembering" the modification time. > > The second part is for handling block device files. I cannot see any other > sane way to update file times for them. Use file_update_time(), which will do the right thing. It will in fact do the same thing as write(2) on the device, which is really what we want. Block devices being mapped for write through different device nodes..., well, I don't think we really need to handle such weird corner cases 100% acurately. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 12:45 ` Miklos Szeredi @ 2008-01-17 12:51 ` Rogier Wolff 2008-01-17 13:16 ` Anton Salikhmetov 1 sibling, 0 replies; 18+ messages in thread From: Rogier Wolff @ 2008-01-17 12:51 UTC (permalink / raw) To: Miklos Szeredi Cc: salikhmetov, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch On Thu, Jan 17, 2008 at 01:45:43PM +0100, Miklos Szeredi wrote: > > > 4. Recording the time was the file data changed > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > Specifically, the time stamps were set to the current time of the moment > > > when syncing but not the write reference was being done. This led to the > > > following adverse effect on my development system: > > > > > > 1) a text file A was updated by process B; > > > 2) process B exits without calling any of the *sync() functions; > > > 3) vi editor opens the file A; > > > 4) file data synced, file times updated; > > > 5) vi is confused by "thinking" that the file was changed after 3). > > Updating the time in remove_vma() would fix this, no? That sounds to me as the right thing to do. Although not explcitly mentioned in the standard, it is the logical (latest allowable) timestamp to put on the modifications by process B. Roger. -- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 ** ** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 ** *-- BitWizard writes Linux device drivers for any device you may have! --* Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. Does it sit on the couch all day? Is it unemployed? Please be specific! Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 12:45 ` Miklos Szeredi 2008-01-17 12:51 ` Rogier Wolff @ 2008-01-17 13:16 ` Anton Salikhmetov 2008-01-17 13:24 ` Rogier Wolff 2008-01-17 13:33 ` Miklos Szeredi 1 sibling, 2 replies; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 13:16 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > 4. Recording the time was the file data changed > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > Specifically, the time stamps were set to the current time of the moment > > > when syncing but not the write reference was being done. This led to the > > > following adverse effect on my development system: > > > > > > 1) a text file A was updated by process B; > > > 2) process B exits without calling any of the *sync() functions; > > > 3) vi editor opens the file A; > > > 4) file data synced, file times updated; > > > 5) vi is confused by "thinking" that the file was changed after 3). > > Updating the time in remove_vma() would fix this, no? We need to save modification time. Otherwise, updating time stamps will be confusing the vi editor. > > > > All these changes to inode.c are unnecessary, I think. > > > > The first part is necessary to account for "remembering" the modification time. > > > > The second part is for handling block device files. I cannot see any other > > sane way to update file times for them. > > Use file_update_time(), which will do the right thing. It will in > fact do the same thing as write(2) on the device, which is really what > we want. > > Block devices being mapped for write through different device > nodes..., well, I don't think we really need to handle such weird > corner cases 100% acurately. The file_update_time() cannot be used for implementing the "auto-update" feature, because the sync() system call doesn't "know" about the file which was memory-mapped. > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 13:16 ` Anton Salikhmetov @ 2008-01-17 13:24 ` Rogier Wolff 2008-01-17 13:34 ` Anton Salikhmetov 2008-01-17 13:33 ` Miklos Szeredi 1 sibling, 1 reply; 18+ messages in thread From: Rogier Wolff @ 2008-01-17 13:24 UTC (permalink / raw) To: Anton Salikhmetov Cc: Miklos Szeredi, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch On Thu, Jan 17, 2008 at 04:16:47PM +0300, Anton Salikhmetov wrote: > 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > > 4. Recording the time was the file data changed > > > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > > Specifically, the time stamps were set to the current time of the moment > > > > when syncing but not the write reference was being done. This led to the > > > > following adverse effect on my development system: > > > > > > > > 1) a text file A was updated by process B; > > > > 2) process B exits without calling any of the *sync() functions; > > > > 3) vi editor opens the file A; > > > > 4) file data synced, file times updated; > > > > 5) vi is confused by "thinking" that the file was changed after 3). > > > > Updating the time in remove_vma() would fix this, no? > > We need to save modification time. Otherwise, updating time stamps > will be confusing the vi editor. If process B exits before vi opens the file, the timestamp should at the latest be the time that process B exits. There is no excuse for setting the timestamp later than the time that B exits. If process B no longer modifies the file, but still keeps it mapped until after vi starts, then the system can't help the situation. Wether or not B acesses those pages is unknown to the system. So you get what you deserve. Roger. -- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 ** ** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 ** *-- BitWizard writes Linux device drivers for any device you may have! --* Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. Does it sit on the couch all day? Is it unemployed? Please be specific! Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 13:24 ` Rogier Wolff @ 2008-01-17 13:34 ` Anton Salikhmetov 0 siblings, 0 replies; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 13:34 UTC (permalink / raw) To: Rogier Wolff Cc: Miklos Szeredi, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, hidave.darkstar, hch 2008/1/17, Rogier Wolff <R.E.Wolff@bitwizard.nl>: > On Thu, Jan 17, 2008 at 04:16:47PM +0300, Anton Salikhmetov wrote: > > 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > > > 4. Recording the time was the file data changed > > > > > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > > > Specifically, the time stamps were set to the current time of the moment > > > > > when syncing but not the write reference was being done. This led to the > > > > > following adverse effect on my development system: > > > > > > > > > > 1) a text file A was updated by process B; > > > > > 2) process B exits without calling any of the *sync() functions; > > > > > 3) vi editor opens the file A; > > > > > 4) file data synced, file times updated; > > > > > 5) vi is confused by "thinking" that the file was changed after 3). > > > > > > Updating the time in remove_vma() would fix this, no? > > > > We need to save modification time. Otherwise, updating time stamps > > will be confusing the vi editor. > > If process B exits before vi opens the file, the timestamp should at > the latest be the time that process B exits. There is no excuse for > setting the timestamp later than the time that B exits. > > If process B no longer modifies the file, but still keeps it mapped > until after vi starts, then the system can't help the > situation. Wether or not B acesses those pages is unknown to the > system. So you get what you deserve. The exit() system call closes the file memory-mapped file. Therefore, it's better to catch the close() system call. However, the close() system call does not trigger unmapping the file. The mapped area for the file may be used after closing the file. So, we should catch only the unmap() call, not close() or exit(). > > Roger. > > -- > ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 ** > ** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 ** > *-- BitWizard writes Linux device drivers for any device you may have! --* > Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. > Does it sit on the couch all day? Is it unemployed? Please be specific! > Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 13:16 ` Anton Salikhmetov 2008-01-17 13:24 ` Rogier Wolff @ 2008-01-17 13:33 ` Miklos Szeredi 2008-01-17 13:40 ` Anton Salikhmetov 1 sibling, 1 reply; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 13:33 UTC (permalink / raw) To: salikhmetov Cc: miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch > 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > > 4. Recording the time was the file data changed > > > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > > Specifically, the time stamps were set to the current time of the moment > > > > when syncing but not the write reference was being done. This led to the > > > > following adverse effect on my development system: > > > > > > > > 1) a text file A was updated by process B; > > > > 2) process B exits without calling any of the *sync() functions; > > > > 3) vi editor opens the file A; > > > > 4) file data synced, file times updated; > > > > 5) vi is confused by "thinking" that the file was changed after 3). > > > > Updating the time in remove_vma() would fix this, no? > > We need to save modification time. Otherwise, updating time stamps > will be confusing the vi editor. remove_vma() will be called when process B exits, so if the times are updated there, and the flag is cleared, the times won't be updated later. > > > > > > All these changes to inode.c are unnecessary, I think. > > > > > > The first part is necessary to account for "remembering" the modification time. > > > > > > The second part is for handling block device files. I cannot see any other > > > sane way to update file times for them. > > > > Use file_update_time(), which will do the right thing. It will in > > fact do the same thing as write(2) on the device, which is really what > > we want. > > > > Block devices being mapped for write through different device > > nodes..., well, I don't think we really need to handle such weird > > corner cases 100% acurately. > > The file_update_time() cannot be used for implementing > the "auto-update" feature, because the sync() system call > doesn't "know" about the file which was memory-mapped. I'm not sure this auto-updating is really needed (POSIX doesn't mandate it). At least split it out into a separate patch, so it can be considered separately on it's own merit. I think doing the same with the page-table reprotecting in MS_ASYNC is also a good idea. That will leave us with 1) a base patch: update time just from fsync() and remove_vma() 2) update time on sync(2) as well 3) update time on MS_ASYNC as well I'd happily ack the first one, which would solve the most serious issues, but have some reservations about the other two. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 13:33 ` Miklos Szeredi @ 2008-01-17 13:40 ` Anton Salikhmetov 2008-01-17 15:45 ` Miklos Szeredi 0 siblings, 1 reply; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 13:40 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > > > 4. Recording the time was the file data changed > > > > > > > > > > Finally, I noticed yet another issue with the previous version of my patch. > > > > > Specifically, the time stamps were set to the current time of the moment > > > > > when syncing but not the write reference was being done. This led to the > > > > > following adverse effect on my development system: > > > > > > > > > > 1) a text file A was updated by process B; > > > > > 2) process B exits without calling any of the *sync() functions; > > > > > 3) vi editor opens the file A; > > > > > 4) file data synced, file times updated; > > > > > 5) vi is confused by "thinking" that the file was changed after 3). > > > > > > Updating the time in remove_vma() would fix this, no? > > > > We need to save modification time. Otherwise, updating time stamps > > will be confusing the vi editor. > > remove_vma() will be called when process B exits, so if the times are > updated there, and the flag is cleared, the times won't be updated > later. > > > > > > > > > All these changes to inode.c are unnecessary, I think. > > > > > > > > The first part is necessary to account for "remembering" the modification time. > > > > > > > > The second part is for handling block device files. I cannot see any other > > > > sane way to update file times for them. > > > > > > Use file_update_time(), which will do the right thing. It will in > > > fact do the same thing as write(2) on the device, which is really what > > > we want. > > > > > > Block devices being mapped for write through different device > > > nodes..., well, I don't think we really need to handle such weird > > > corner cases 100% acurately. > > > > The file_update_time() cannot be used for implementing > > the "auto-update" feature, because the sync() system call > > doesn't "know" about the file which was memory-mapped. > > I'm not sure this auto-updating is really needed (POSIX doesn't > mandate it). Peter Shtaubach, author of the first solution for this bug, and Jacob Ostergaard, the reporter of this bug, insist the "auto-update" feature to be implemented. > > At least split it out into a separate patch, so it can be considered > separately on it's own merit. > > I think doing the same with the page-table reprotecting in MS_ASYNC is > also a good idea. That will leave us with > > 1) a base patch: update time just from fsync() and remove_vma() > 2) update time on sync(2) as well > 3) update time on MS_ASYNC as well > > I'd happily ack the first one, which would solve the most serious > issues, but have some reservations about the other two. > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 13:40 ` Anton Salikhmetov @ 2008-01-17 15:45 ` Miklos Szeredi 2008-01-17 16:20 ` Anton Salikhmetov 0 siblings, 1 reply; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 15:45 UTC (permalink / raw) To: salikhmetov Cc: miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch > > I'm not sure this auto-updating is really needed (POSIX doesn't > > mandate it). > > Peter Shtaubach, author of the first solution for this bug, > and Jacob Ostergaard, the reporter of this bug, insist the "auto-update" > feature to be implemented. Can they state their reasons for the insistence? > 1) a base patch: update time just from fsync() and remove_vma() > 2) update time on sync(2) as well > 3) update time on MS_ASYNC as well Oh, and the four-liner I posted the other day will give you 1) + 2) + even more at a small fraction of the complexity. And tacking on the reprotect code will solve the MS_ASYNC issue just the same. I agree, that having the timestamp updated on sync() is nice, and that trivial patch will give you that, and will also update the timestamp at least each 30 seconds if the file is being constantly modified, even if no explicit syncing is done. So maybe it's worth a little effort benchmarking how much that patch affects the cost of writing to a page. You could write a little test program like this (if somebody hasn't yet done so): - do some preparation: echo 80 > dirty_ratio echo 80 > dirty_background_ratio echo 30000 > dirty_expire_centisecs sync - map a large file, one that fits comfortably into free memory - bring the whole file in, by reading a byte from each page - start the timer - write a byte to each page - stop the timer It would be most interesting to try this on a filesystem supporting nanosecond timestamps. Anyone know which these are? Miklos ---- Index: linux/mm/memory.c =================================================================== --- linux.orig/mm/memory.c 2008-01-09 21:16:30.000000000 +0100 +++ linux/mm/memory.c 2008-01-15 21:16:14.000000000 +0100 @@ -1680,6 +1680,8 @@ gotten: unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { + if (vma->vm_file) + file_update_time(vma->vm_file); /* * Yes, Virginia, this is actually required to prevent a race * with clear_page_dirty_for_io() from clearing the page dirty @@ -2313,6 +2315,8 @@ out_unlocked: if (anon) page_cache_release(vmf.page); else if (dirty_page) { + if (vma->vm_file) + file_update_time(vma->vm_file); set_page_dirty_balance(dirty_page, page_mkwrite); put_page(dirty_page); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 15:45 ` Miklos Szeredi @ 2008-01-17 16:20 ` Anton Salikhmetov 2008-01-17 16:26 ` Miklos Szeredi 0 siblings, 1 reply; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 16:20 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > > I'm not sure this auto-updating is really needed (POSIX doesn't > > > mandate it). > > > > Peter Shtaubach, author of the first solution for this bug, > > and Jacob Ostergaard, the reporter of this bug, insist the "auto-update" > > feature to be implemented. > > Can they state their reasons for the insistence? > > > 1) a base patch: update time just from fsync() and remove_vma() > > 2) update time on sync(2) as well > > 3) update time on MS_ASYNC as well > > Oh, and the four-liner I posted the other day will give you 1) + 2) + > even more at a small fraction of the complexity. And tacking on the > reprotect code will solve the MS_ASYNC issue just the same. > > I agree, that having the timestamp updated on sync() is nice, and that > trivial patch will give you that, and will also update the timestamp > at least each 30 seconds if the file is being constantly modified, > even if no explicit syncing is done. > > So maybe it's worth a little effort benchmarking how much that patch > affects the cost of writing to a page. > > You could write a little test program like this (if somebody hasn't > yet done so): > > - do some preparation: > > echo 80 > dirty_ratio > echo 80 > dirty_background_ratio > echo 30000 > dirty_expire_centisecs > sync > > - map a large file, one that fits comfortably into free memory > - bring the whole file in, by reading a byte from each page > - start the timer > - write a byte to each page > - stop the timer > > It would be most interesting to try this on a filesystem supporting > nanosecond timestamps. Anyone know which these are? The do_wp_page() function is called in mm/memory.c after locking PTE. And the file_update_time() routine calls the filesystem operation that can sleep. It's not accepted, I guess. > > Miklos > ---- > > Index: linux/mm/memory.c > =================================================================== > --- linux.orig/mm/memory.c 2008-01-09 21:16:30.000000000 +0100 > +++ linux/mm/memory.c 2008-01-15 21:16:14.000000000 +0100 > @@ -1680,6 +1680,8 @@ gotten: > unlock: > pte_unmap_unlock(page_table, ptl); > if (dirty_page) { > + if (vma->vm_file) > + file_update_time(vma->vm_file); > /* > * Yes, Virginia, this is actually required to prevent a race > * with clear_page_dirty_for_io() from clearing the page dirty > @@ -2313,6 +2315,8 @@ out_unlocked: > if (anon) > page_cache_release(vmf.page); > else if (dirty_page) { > + if (vma->vm_file) > + file_update_time(vma->vm_file); > set_page_dirty_balance(dirty_page, page_mkwrite); > put_page(dirty_page); > } > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 16:20 ` Anton Salikhmetov @ 2008-01-17 16:26 ` Miklos Szeredi 2008-01-17 16:33 ` Anton Salikhmetov 0 siblings, 1 reply; 18+ messages in thread From: Miklos Szeredi @ 2008-01-17 16:26 UTC (permalink / raw) To: salikhmetov Cc: miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch > The do_wp_page() function is called in mm/memory.c after locking PTE. > And the file_update_time() routine calls the filesystem operation that can > sleep. It's not accepted, I guess. do_wp_page() is called with the pte lock but drops it, so that's fine. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH -v5 2/2] Updating ctime and mtime at syncing 2008-01-17 16:26 ` Miklos Szeredi @ 2008-01-17 16:33 ` Anton Salikhmetov 0 siblings, 0 replies; 18+ messages in thread From: Anton Salikhmetov @ 2008-01-17 16:33 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb, r.e.wolff, hidave.darkstar, hch 2008/1/17, Miklos Szeredi <miklos@szeredi.hu>: > > The do_wp_page() function is called in mm/memory.c after locking PTE. > > And the file_update_time() routine calls the filesystem operation that can > > sleep. It's not accepted, I guess. > > do_wp_page() is called with the pte lock but drops it, so that's fine. OK, I agree. I'll take into account your suggestion to move updating time stamps from the __set_page_dirty() and __set_page_dirty_nobuffers() routines to do_wp_page(). Thank you! > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-01-17 16:33 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-01-17 0:57 [PATCH -v5 0/2] Updating ctime and mtime for memory-mapped files Anton Salikhmetov 2008-01-17 0:57 ` [PATCH -v5 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov 2008-01-17 11:01 ` Miklos Szeredi 2008-01-17 11:47 ` Anton Salikhmetov 2008-01-17 0:57 ` [PATCH -v5 2/2] Updating ctime and mtime at syncing Anton Salikhmetov 2008-01-17 11:13 ` Miklos Szeredi 2008-01-17 12:16 ` Anton Salikhmetov 2008-01-17 12:45 ` Miklos Szeredi 2008-01-17 12:51 ` Rogier Wolff 2008-01-17 13:16 ` Anton Salikhmetov 2008-01-17 13:24 ` Rogier Wolff 2008-01-17 13:34 ` Anton Salikhmetov 2008-01-17 13:33 ` Miklos Szeredi 2008-01-17 13:40 ` Anton Salikhmetov 2008-01-17 15:45 ` Miklos Szeredi 2008-01-17 16:20 ` Anton Salikhmetov 2008-01-17 16:26 ` Miklos Szeredi 2008-01-17 16:33 ` Anton Salikhmetov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).