From: Nick Piggin <npiggin@suse.de>
To: Linux Memory Management <linux-mm@kvack.org>
Cc: Neil Brown <neilb@suse.de>, Andrew Morton <akpm@osdl.org>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	Chris Mason <chris.mason@oracle.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Nick Piggin <npiggin@suse.de>
Subject: [patch 2/6] mm: revert "generic_file_buffered_write(): deadlock on vectored write"
Date: Fri, 13 Oct 2006 18:44:12 +0200 (CEST)
Message-ID: <20061013143536.15438.66118.sendpatchset@linux.site>
In-Reply-To: <20061013143516.15438.8802.sendpatchset@linux.site>

From: Andrew Morton <akpm@osdl.org>

Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

This patch fixed the following bug:

  When prefaulting in the pages in generic_file_buffered_write(), we only
  faulted in the pages for the first segment of the iovec.  If the second or
  a successive segment described a mmapping of the page into which we're
  write()ing, and that page is not up-to-date, the fault handler tries to lock
  the already-locked page (to bring it up to date) and deadlocks.

  An exploit for this bug is in writev-deadlock-demo.c, in
  http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

  (These demos assume blocksize < PAGE_CACHE_SIZE).
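
The shape of such a write can be sketched in userspace C.  This is not
writev-deadlock-demo.c from ext3-tools, only an illustration under stated
assumptions: the function name, the temporary file path and the segment
sizes are all made up here, and on a kernel that handles the case correctly
the call simply returns the number of bytes written.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical illustration: issue a writev() whose second segment points
 * into a shared mapping of the very page being written.
 * Returns the writev() result, or -1 if the setup fails.
 */
ssize_t writev_from_own_mapping(void)
{
	char path[] = "/tmp/writev-sketch-XXXXXX";	/* assumed scratch location */
	char head[] = "header";				/* first segment: ordinary memory */
	long page = sysconf(_SC_PAGESIZE);
	struct iovec iov[2];
	char *map;
	ssize_t n;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	/* One page of file, not yet read in by anyone. */
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	iov[0].iov_base = head;
	iov[0].iov_len = sizeof(head) - 1;	/* 6 bytes */
	iov[1].iov_base = map + 512;		/* source lies in the target page */
	iov[1].iov_len = 64;

	/* The file offset is 0, so both segments land in page 0 of the file. */
	n = writev(fd, iov, 2);

	munmap(map, page);
	close(fd);
	return n;
}
```

Only the first segment is ordinary memory; copying the second one has to
fault on the mapping of the page that the write path is already holding.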

The problem with this fix is that it takes the kernel back to doing a single
prepare_write()/commit_write() per iovec segment.  So in the worst case we'll
run prepare_write+commit_write 1024 times where we previously would have run
it once.  The other problem with the fix is that it didn't fix all the locking
problems.
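
The 1024-versus-1 figure can be reproduced with a small userspace model of
the copy loop.  This is a sketch, not kernel code: the function name, the
4096-byte page size and the use of a bare array of segment lengths are
assumptions made for illustration.  Each loop iteration stands for one
prepare_write()/commit_write() pair.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Count copy-loop iterations for a write that starts at file offset pos
 * and covers nr_segs segments of the given lengths.
 *
 * clamp_to_segment != 0 models the behaviour being reverted, where each
 * copy is also limited to what is left of the current segment.
 */
size_t count_write_iterations(const size_t *seg_len, size_t nr_segs,
			      unsigned long pos, int clamp_to_segment)
{
	size_t count = 0, iterations = 0, seg = 0, seg_off = 0;
	size_t i;

	for (i = 0; i < nr_segs; i++)
		count += seg_len[i];

	while (count > 0) {
		size_t offset = pos & (SKETCH_PAGE_SIZE - 1);	/* within page */
		size_t bytes = min_sz(SKETCH_PAGE_SIZE - offset, count);

		if (clamp_to_segment) {
			/* Step over exhausted or zero-length segments. */
			while (seg < nr_segs && seg_off == seg_len[seg]) {
				seg++;
				seg_off = 0;
			}
			bytes = min_sz(bytes, seg_len[seg] - seg_off);
			seg_off += bytes;
		}

		pos += bytes;
		count -= bytes;
		iterations++;
	}
	return iterations;
}
```

With 1024 segments of 4 bytes each written at offset 0, the whole write fits
in one page: the per-segment clamp gives 1024 iterations, while clamping only
to the page and the total count gives one.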


<insert numbers obtained via ext3-tools's writev-speed.c here>

And apparently this change killed NFS overwrite performance, because, I
suppose, it talks to the server for each prepare_write+commit_write.

So just back that patch out - we'll be fixing the deadlock by other means.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1882,21 +1882,14 @@ generic_file_buffered_write(struct kiocb
 	do {
 		unsigned long index;
 		unsigned long offset;
+		unsigned long maxlen;
 		size_t copied;
 
 		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
 		index = pos >> PAGE_CACHE_SHIFT;
 		bytes = PAGE_CACHE_SIZE - offset;
-
-		/* Limit the size of the copy to the caller's write size */
-		bytes = min(bytes, count);
-
-		/*
-		 * Limit the size of the copy to that of the current segment,
-		 * because fault_in_pages_readable() doesn't know how to walk
-		 * segments.
-		 */
-		bytes = min(bytes, cur_iov->iov_len - iov_base);
+		if (bytes > count)
+			bytes = count;
 
 		/*
 		 * Bring in the user page that we will copy from _first_.
@@ -1904,7 +1897,10 @@ generic_file_buffered_write(struct kiocb
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		fault_in_pages_readable(buf, bytes);
+		maxlen = cur_iov->iov_len - iov_base;
+		if (maxlen > bytes)
+			maxlen = bytes;
+		fault_in_pages_readable(buf, maxlen);
 
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {
