linux-mm.kvack.org archive mirror
From: Anton Salikhmetov <salikhmetov@gmail.com>
To: linux-mm@kvack.org, jakob@unthought.net,
	linux-kernel@vger.kernel.org, valdis.kletnieks@vt.edu,
	riel@redhat.com, ksm@42.dk, staubach@redhat.com,
	jesper.juhl@gmail.com, torvalds@linux-foundation.org,
	a.p.zijlstra@chello.nl, akpm@linux-foundation.org,
	protasnb@gmail.com, miklos@szeredi.hu, r.e.wolff@bitwizard.nl,
	hidave.darkstar@gmail.com, hch@infradead.org
Subject: [PATCH -v8 4/4] The design document for memory-mapped file times update
Date: Wed, 23 Jan 2008 02:21:20 +0300
Message-ID: <1201044083554-git-send-email-salikhmetov@gmail.com>
In-Reply-To: <12010440803930-git-send-email-salikhmetov@gmail.com>

Add a document that describes how the POSIX requirements on updating
memory-mapped file times are addressed in Linux.

Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
---
 Documentation/vm/00-INDEX  |    2 +
 Documentation/vm/msync.txt |  117 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+), 0 deletions(-)

diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 2131b00..2726c8d 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -6,6 +6,8 @@ hugetlbpage.txt
 	- a brief summary of hugetlbpage support in the Linux kernel.
 locking
 	- info on how locking and synchronization is done in the Linux vm code.
+msync.txt
+	- the design document for memory-mapped file times update
 numa
 	- information about NUMA specific code in the Linux vm.
 numa_memory_policy.txt
diff --git a/Documentation/vm/msync.txt b/Documentation/vm/msync.txt
new file mode 100644
index 0000000..571a766
--- /dev/null
+++ b/Documentation/vm/msync.txt
@@ -0,0 +1,117 @@
+
+	The msync() system call and memory-mapped file times
+
+	Copyright (C) 2008 Anton Salikhmetov
+
+The POSIX standard requires that any write reference to memory-mapped file
+data should result in updating the ctime and mtime for that file. Moreover,
+the standard mandates that the updated file times become visible to the
+world no later than the next call to msync() covering the modified data.
+
+Failing to meet this requirement causes problems for certain important
+classes of applications. For instance, database backup systems fail to
+pick up files modified via the mmap() interface. It is also a security
+hole: file data can be forged in such a way that it is impossible to
+prove, from the file times, that the data was modified.
+
+Briefly put, this requirement can be stated as follows:
+
+	once the file data has changed, the operating system
+	should acknowledge this fact by updating file metadata.
+
+This document describes how this POSIX requirement is addressed in Linux.
+
+1. Requirements
+
+1.1) the POSIX standard requires updating ctime and mtime no later than
+the call to msync() with the MS_SYNC or MS_ASYNC flag;
+
+1.2) in existing POSIX implementations, ctime and mtime
+get updated no later than the call to fsync();
+
+1.3) in existing POSIX implementations, ctime and mtime get updated
+no later than the call to sync() (the "auto-update" feature);
+
+1.4) customers require, and common sense suggests, that ctime and mtime
+be updated no later than the call to munmap() or exit(); the latter
+implies an implicit call to munmap();
+
+1.5) requirement (1.1) should also be satisfied when the file is a block
+device special file;
+
+1.6) requirement (1.1) should also be satisfied for files residing on
+memory-backed filesystems such as tmpfs.
+
+The following operating systems were used as the reference platforms
+and are referred to as the "existing implementations" above:
+HP-UX B.11.31 and FreeBSD 6.2-RELEASE.
+
+2. Lazy update
+
+Earlier versions of this patch series implemented the "lazy update" approach
+to satisfying the requirements given above. In this approach, ctime and
+mtime are updated at the last allowable moment.
+
+Since the file times are not updated immediately, some flag has to record
+the pending update. When set, this flag means that the file data was
+modified and that the file times still need to be updated.
+
+None of the existing "dirty" flags, which mark a page as written to, is
+suitable for this purpose. msync() called with MS_ASYNC would have to clear
+such a flag after updating ctime and mtime, yet sys_msync() starts no I/O
+in the MS_ASYNC case; it is basically a no-op. The writeback routines that
+rely on the "dirty" flag would then skip the page, and its data would be
+lost. Therefore, a new flag has to be introduced.
+
+Requirement (1.5) combined with requirement (1.3) complicates the handling
+of block device inodes. During writeback it is impossible to tell which
+block device special file was originally mapped, so the list of "active"
+devices associated with the block device inode would have to be traversed.
+As a result, file times would also be updated for block device files that
+took no part in the data transfer.
+
+Also, all versions prior to version 6 failed to handle ctime and mtime
+correctly for files on memory-backed filesystems such as tmpfs, so
+requirement (1.6) was not satisfied.
+
+If a write reference occurs between two consecutive calls to msync() with
+MS_ASYNC, the second call must take that write reference into account.
+However, a write to a page that is already mapped writable causes no page
+fault, so the kernel cannot catch it; a page fault has to be forced.
+There are two ways to do this. The first is to write the data back even
+when msync() is called with MS_ASYNC, since writeback write-protects the
+page. This is not acceptable, because the current design of sys_msync()
+forbids starting I/O in the MS_ASYNC case. The second is to write-protect
+the page so that the next write reference triggers a page fault. Note that
+the dirty flag of the page must not be cleared in the process.
+
+In the "lazy update" approach, requirements (1.1), (1.2), (1.3), and (1.4)
+taken together require adding code to at least the following kernel
+routines: sys_msync(), do_fsync(), some routine in the munmap() call path,
+and some routine in the sync() call path.
+
+Finally, a file_update_time()-like function would have to be written that
+operates on inode objects rather than file objects, because during the
+sync() operation the file object may no longer exist; only the inode is
+known.
+
+To sum up: this "lazy" approach leads to massive changes, incurs overhead in
+the block device case, and requires complicated design decisions.
+
+3. Immediate update
+
+OK, still reading? There's a better way.
+
+In a fashion analogous to write(2), react to a page being dirtied by
+updating the file times immediately, when the write fault is handled.
+Any writeback of the page therefore happens after the write reference
+has already been accounted for in the file times.
+
+The only remaining problem is to force the file times to be refreshed at
+the first write reference following a call to msync() with MS_ASYNC. As
+explained above, all that is needed here is to force a page fault.
+
+The vma_wrprotect() routine introduced in this patch series is called
+from sys_msync() in the MS_ASYNC case. It is essentially a variant of
+the existing page_mkclean_one() function from mm/rmap.c; unlike
+page_mkclean_one(), however, vma_wrprotect() does not touch the dirty bit.
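The difference between the two designs can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code. The names struct model_page, write_ref(), msync_async(), msync_async_lazy_dirty() and writeback() are invented for illustration. A single counter stands in for both the clock and the ctime/mtime pair, and one page stands in for a whole mapping. Under those assumptions, the model shows that write-protecting the page at msync(MS_ASYNC) time, without clearing the dirty bit, forces a fresh time update on the next write and still leaves the data for writeback. For contrast, it also shows how a lazy design that reused the dirty bit as its flag would lose data.

```c
#include <stdbool.h>

/* Hypothetical model of one shared, file-backed page (not kernel code). */
struct model_page {
	bool writable;     /* the PTE allows writes without a fault */
	bool dirty;        /* the data still has to be written back */
	bool written_back; /* writeback has stored the data */
};

/* Hypothetical inode: one counter stands in for both ctime and mtime. */
struct model_inode {
	long mtime;
};

static long model_clock; /* monotonically increasing fake clock */

/* Immediate update: a write to a write-protected page faults, and the
 * fault handler updates the file times (assumed to mirror what
 * file_update_time() does) before making the page writable. A write to
 * an already writable page causes no fault and updates nothing. */
static void write_ref(struct model_page *pg, struct model_inode *ino)
{
	model_clock++;
	if (!pg->writable) {
		ino->mtime = model_clock;
		pg->writable = true;
	}
	pg->dirty = true;
}

/* msync(MS_ASYNC) in the immediate-update design: no I/O is started;
 * the page is only write-protected (in the spirit of vma_wrprotect())
 * and the dirty bit is left alone. */
static void msync_async(struct model_page *pg)
{
	pg->writable = false;
}

/* Rejected lazy variant: the dirty bit doubles as the "times pending"
 * flag, so MS_ASYNC updates the times and clears it, but starts no I/O. */
static void msync_async_lazy_dirty(struct model_page *pg,
				   struct model_inode *ino)
{
	if (pg->dirty) {
		ino->mtime = ++model_clock;
		pg->dirty = false; /* writeback will no longer see this page */
	}
}

/* Writeback stores only dirty pages; like page_mkclean_one(), it then
 * cleans the page and write-protects it. */
static void writeback(struct model_page *pg)
{
	if (pg->dirty) {
		pg->written_back = true;
		pg->dirty = false;
		pg->writable = false;
	}
}
```

In this model, take a page that starts write-protected and apply the sequence write_ref(), write_ref(), msync_async(), write_ref(). The model's mtime is updated on the first and last writes only, because the second write causes no fault. A later writeback() still finds the page dirty. In the lazy variant, msync_async_lazy_dirty() clears the dirty bit without starting I/O, so writeback() never stores the data.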
-- 
1.4.4.4