From: Nick Piggin <npiggin@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>,
Mark Fasheh <mark.fasheh@oracle.com>,
Linux Memory Management <linux-mm@kvack.org>,
Andrew Morton <akpm@osdl.org>
Subject: [patch 03/44] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
Date: Tue, 24 Apr 2007 11:23:49 +1000
Message-ID: <20070424013432.285287000@suse.de>
In-Reply-To: 20070424012346.696840000@suse.de
[-- Attachment #1: mm-revert-buffered-write-deadlock-fix.patch --]
[-- Type: text/plain, Size: 2922 bytes --]
From: Andrew Morton <akpm@osdl.org>
The patch being reverted fixed the following bug:
When prefaulting the pages in generic_file_buffered_write(), we only
faulted in the pages for the first segment of the iovec. If the second or a
subsequent segment described an mmapping of the page into which we're
write()ing, and that page is not up-to-date, the fault handler tries to lock
the already-locked page (to bring it up to date) and deadlocks.
An exploit for this bug is in writev-deadlock-demo.c, in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
(These demos assume blocksize < PAGE_CACHE_SIZE).
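The self-deadlock can be pictured with a user-space analogy. This is an illustration only, not kernel code: a pthread error-checking mutex stands in for the page lock, and a second lock attempt by the same thread stands in for the fault handler that wants to bring the page up to date. simulate_write_fault_on_locked_page() is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Analogy only: page_lock plays the role of the pagecache page lock. */
int simulate_write_fault_on_locked_page(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t page_lock;
    int rc;

    pthread_mutexattr_init(&attr);
    /* Error-checking: a relock by the owner fails instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&page_lock, &attr);

    /* write(): lock the destination pagecache page ... */
    rc = pthread_mutex_lock(&page_lock);
    assert(rc == 0);

    /* ... then copy from a later iovec segment that mmaps this same,
     * not-up-to-date page: the copy faults and the "fault handler"
     * asks for the lock this thread already holds. */
    rc = pthread_mutex_lock(&page_lock);

    pthread_mutex_unlock(&page_lock);
    pthread_mutex_destroy(&page_lock);
    pthread_mutexattr_destroy(&attr);
    return rc; /* EDEADLK: the self-deadlock, reported rather than hung */
}
```

The function returns EDEADLK. With a PTHREAD_MUTEX_NORMAL mutex the second pthread_mutex_lock() would simply hang, which is the behaviour the changelog describes for the real page lock.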
The problem with this fix is that it takes the kernel back to doing a single
prepare_write()/commit_write() per iovec segment. So in the worst case we'll
run prepare_write+commit_write 1024 times where we previously would have run
it once. The other problem with the fix is that it doesn't fix all the locking problems.
<insert numbers obtained via ext3-tools's writev-speed.c here>
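The worst-case figure can be checked with a small model. It is a sketch under assumptions the changelog does not state: 4 KiB pages, one prepare_write()/commit_write() pair per chunk, and 1024 taken to be writev()'s segment limit (UIO_MAXIOV on Linux). cycles_per_segment() and cycles_per_page() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Bytes left before the next page boundary at file position pos. */
static size_t room_in_page(unsigned long pos)
{
    return SKETCH_PAGE_SIZE - (pos & (SKETCH_PAGE_SIZE - 1));
}

/*
 * Reverted fix: each chunk is clamped to the current iovec segment as
 * well as to the page, so every segment costs at least one cycle.
 */
unsigned long cycles_per_segment(const struct iovec *iov, size_t nr_segs,
                                 unsigned long pos)
{
    unsigned long cycles = 0;

    assert(iov != NULL || nr_segs == 0);
    for (size_t i = 0; i < nr_segs; i++) {
        size_t left = iov[i].iov_len;

        while (left > 0) {
            size_t bytes = room_in_page(pos);

            if (bytes > left)
                bytes = left;
            pos += bytes;
            left -= bytes;
            cycles++;
        }
    }
    return cycles;
}

/*
 * Behaviour restored by the revert: chunks are clamped only to the page
 * and to the remaining count, so one chunk may span many segments.
 */
unsigned long cycles_per_page(size_t count, unsigned long pos)
{
    unsigned long cycles = 0;

    while (count > 0) {
        size_t bytes = room_in_page(pos);

        if (bytes > count)
            bytes = count;
        pos += bytes;
        count -= bytes;
        cycles++;
    }
    return cycles;
}
```

For 1024 four-byte segments that together fill one page starting at file offset 0, cycles_per_segment() returns 1024 while cycles_per_page() returns 1, matching the 1024-to-1 ratio above.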
And apparently this change killed NFS overwrite performance, because, I
suppose, it talks to the server for each prepare_write+commit_write.
So just back that patch out - we'll be fixing the deadlock by other means.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Nick says: also it only ever actually papered over the bug, because after
faulting in the pages, they might be unmapped or reclaimed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
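Nick's point, that prefaulted pages need not still be present when the copy happens, can be demonstrated from user space on Linux. This is an assumption-laden sketch: madvise(MADV_DONTNEED) on a private anonymous mapping stands in for reclaim or a concurrent unmap, mincore() is used to observe residency, and prefault_then_drop() is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* glibc: exposes mincore() and MADV_DONTNEED */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* 1 if the page at addr is resident in this process, 0 if not, -1 on error. */
static int page_resident(void *addr, size_t page)
{
    unsigned char vec = 0;

    if (mincore(addr, page, &vec) != 0)
        return -1;
    return vec & 1;
}

/*
 * Touch a page (the "prefault"), then drop it before the "copy" would
 * run. Returns 1 if it was resident after the touch and not afterwards,
 * 0 if residency did not change that way, -1 on error.
 */
int prefault_then_drop(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page;
    char *p;
    int before, after;

    assert(ps > 0);
    page = (size_t)ps;
    p = mmap(NULL, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 1;                        /* "prefault": the page is now mapped */
    before = page_resident(p, page);

    /* Stand-in for reclaim or another thread unmapping the buffer. */
    if (madvise(p, page, MADV_DONTNEED) != 0) {
        munmap(p, page);
        return -1;
    }
    after = page_resident(p, page);

    munmap(p, page);
    return before == 1 && after == 0;
}
```

On Linux the function is expected to return 1: the page is resident right after the touch and gone again after MADV_DONTNEED, so a later copy from it would fault again despite the earlier prefault.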
mm/filemap.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1971,21 +1971,14 @@ generic_file_buffered_write(struct kiocb
do {
unsigned long index;
unsigned long offset;
+ unsigned long maxlen;
size_t copied;
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
-
- /* Limit the size of the copy to the caller's write size */
- bytes = min(bytes, count);
-
- /*
- * Limit the size of the copy to that of the current segment,
- * because fault_in_pages_readable() doesn't know how to walk
- * segments.
- */
- bytes = min(bytes, cur_iov->iov_len - iov_base);
+ if (bytes > count)
+ bytes = count;
/*
* Bring in the user page that we will copy from _first_.
@@ -1993,7 +1986,10 @@ generic_file_buffered_write(struct kiocb
* same page as we're writing to, without it being marked
* up-to-date.
*/
- fault_in_pages_readable(buf, bytes);
+ maxlen = cur_iov->iov_len - iov_base;
+ if (maxlen > bytes)
+ maxlen = bytes;
+ fault_in_pages_readable(buf, maxlen);
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
if (!page) {
--
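The clamping in the "+" lines of the diff can be modelled on its own to show the window the revert brings back: the copy length may run past the current iovec segment, but the prefault length never does. This is a sketch assuming 4 KiB pages; plan_chunk() and struct chunk_plan are hypothetical names, and only the arithmetic is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

struct chunk_plan {
    size_t bytes;      /* bytes copied into the pagecache page this pass */
    size_t prefaulted; /* bytes handed to fault_in_pages_readable() */
};

/*
 * Mirrors the "+" lines: bytes is clamped to the page and to count only,
 * while the prefault length (maxlen) is further clamped to what is left
 * of the current segment. As in the kernel code, iov_base is the offset
 * already consumed within cur_iov, not a pointer.
 */
struct chunk_plan plan_chunk(unsigned long pos, size_t count,
                             const struct iovec *cur_iov, size_t iov_base)
{
    struct chunk_plan p;
    unsigned long offset = pos & (SKETCH_PAGE_SIZE - 1); /* within page */
    size_t maxlen;

    assert(iov_base <= cur_iov->iov_len);

    p.bytes = SKETCH_PAGE_SIZE - offset;
    if (p.bytes > count)
        p.bytes = count;

    maxlen = cur_iov->iov_len - iov_base;
    if (maxlen > p.bytes)
        maxlen = p.bytes;
    p.prefaulted = maxlen;
    return p;
}
```

With a 100-byte first segment, a 4096-byte second segment and a write starting at file offset 0, plan_chunk() reports 4096 bytes to copy but only 100 prefaulted; the other 3996 bytes come from the second segment and are never faulted in. The reverted fix closed that gap by clamping bytes itself to cur_iov->iov_len - iov_base (the removed min() line), which is what forced one prepare_write()/commit_write() per segment.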