public inbox for linux-kernel@vger.kernel.org
* [PATCH] nfs: clear_commit_release incorrectly handle truncated page
@ 2010-02-02 10:36 Dmitry Monakhov
  2010-02-02 15:04 ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2010-02-02 10:36 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1744 bytes --]


After a page has been truncated it loses its mapping, which results in a NULL
pointer dereference on the bdi_stat update. In fact we have to decrement
bdi_stat even for truncated pages, so let's pass the correct mapping in the
function arguments. Patch against linux-2.6.
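As an illustration only, here is a minimal userspace model of the change. It is not kernel code: the structures are cut down to the fields involved, the `reclaimable` counter stands in for the per-BDI BDI_RECLAIMABLE statistic, and the names `clear_commit_old()`/`clear_commit_new()` are hypothetical. It shows why passing the mapping in from the caller still lets the counter be decremented after truncation has cleared page->mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct backing_dev_info {
	long reclaimable;		/* models the BDI_RECLAIMABLE counter */
};
struct address_space {
	struct backing_dev_info *backing_dev_info;
};
struct page {
	struct address_space *mapping;	/* NULL once the page is truncated */
};
struct nfs_page {
	struct page *wb_page;
	int commit_pending;		/* models the PG_CLEAN bit in wb_flags */
};

/* Pre-patch shape: reaches the BDI through page->mapping. The NULL check
 * exists only so this sketch can run; the kernel code had none and oopsed. */
int clear_commit_old(struct nfs_page *req)
{
	if (!req->commit_pending)
		return 0;
	req->commit_pending = 0;
	if (req->wb_page->mapping == NULL)
		return -1;	/* kernel: NULL dereference, counter never decremented */
	req->wb_page->mapping->backing_dev_info->reclaimable--;
	return 1;
}

/* Patched shape: the caller passes the mapping (inode->i_mapping in the
 * patch), which does not depend on the page still being in the page cache. */
int clear_commit_new(struct nfs_page *req, struct address_space *mapping)
{
	if (!req->commit_pending)
		return 0;
	req->commit_pending = 0;
	mapping->backing_dev_info->reclaimable--;
	return 1;
}
```

With a page whose mapping is NULL, the old shape can only skip the decrement (or, in the kernel, crash), while the new shape decrements the counter through the mapping supplied by the caller.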
##TEST_CASE
/*
Test case for bug in nfs_clear_request_commit()
caused by null pointer dereference in case of truncated page.
It takes less than 10 minutes to reproduce the bug.
### start script
#! /bin/bash -x
# mount my-host:/my-share /mnt
mkdir -p /mnt/T
cd /mnt/T || exit 1
while true; do
	for ((i = 0; i < 3; i++)); do
		/tmp/mmap file3 200000000 $i &
	done
	sleep 2
	killall -9 mmap
done
### end script
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	char *addr;
	size_t size;
	off_t off;
	unsigned char fill = 0;
	int fd;

	if (argc != 4) {
		fprintf(stderr, "Wrong args\nUsage: %s [file] [size] [offset]\n",
			argv[0]);
		exit(EXIT_FAILURE);
	}
	/* Round the window size down to a multiple of the 4 KiB page size. */
	size = strtoul(argv[2], NULL, 10);
	size &= ~((size_t)4096 - 1);
	/* The third argument selects which size-sized window of the file to map. */
	off = (off_t)size * atol(argv[3]);
	/* O_TRUNC: each new instance truncates the file the others have mapped. */
	fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
	if (fd == -1) {
		perror("Error opening file for writing");
		exit(EXIT_FAILURE);
	}
	/* Stretch the file so that the whole mapped window lies inside it. */
	if (lseek(fd, off + (off_t)size, SEEK_SET) == (off_t)-1) {
		perror("Error calling lseek() to 'stretch' the file");
		close(fd);
		exit(EXIT_FAILURE);
	}
	if (write(fd, "a", 1) != 1) {
		perror("Error writing the last byte of the file");
		close(fd);
		exit(EXIT_FAILURE);
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
	if (addr == MAP_FAILED) {
		perror("Error mmapping the file");
		close(fd);
		exit(EXIT_FAILURE);
	}
	/* Keep dirtying every page of the shared mapping until killed. */
	while (1)
		memset(addr, fill++, size);
}
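To see which parts of the file the three instances touch, the test program's offset arithmetic can be reproduced on its own. This sketch assumes 4 KiB pages, as the program's hard-coded `4096` does; the function names are hypothetical.

```c
#include <assert.h>

#define PAGE_SZ 4096ULL	/* assumption: 4 KiB pages, as hard-coded in the test */

/* size &= ~(4096-1): round the requested window down to whole pages. */
unsigned long long window_size(unsigned long long requested)
{
	return requested & ~(PAGE_SZ - 1);
}

/* off = size * i: instance i maps the i-th window of the file. */
unsigned long long window_offset(unsigned long long requested, unsigned int i)
{
	return window_size(requested) * i;
}

/* Page-cache index of the first page in instance i's window. */
unsigned long long first_page_index(unsigned long long requested, unsigned int i)
{
	return window_offset(requested, i) / PAGE_SZ;
}
```

With the script's `200000000` argument, the size rounds down to 199,999,488 bytes, so the windows start at byte 0, 199,999,488 and 399,998,976. The last of these is page index 97,656, the same index printed in the debug output later in this thread.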


[-- Attachment #2: 0001-nfs-clear_commit_release-incorrectly-handle-truncate.patch --]
[-- Type: text/plain, Size: 3113 bytes --]

From bdce13e6947ad738b68bcb9fd507885d14dca9f0 Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Tue, 2 Feb 2010 13:24:58 +0300
Subject: [PATCH] nfs: clear_commit_release incorrectly handle truncated page

After a page has been truncated it loses its mapping, which results in a NULL
pointer dereference on the bdi_stat update. In fact we have to decrement
bdi_stat even for truncated pages, so let's pass the correct mapping in the
function arguments.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/nfs/write.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d171696..bfcf92a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -445,13 +445,13 @@ nfs_mark_request_commit(struct nfs_page *req)
 }
 
 static int
-nfs_clear_request_commit(struct nfs_page *req)
+nfs_clear_request_commit(struct nfs_page *req, struct address_space *mapping)
 {
 	struct page *page = req->wb_page;
-
+	/* page->mapping may be NULL if page was truncated */
 	if (test_and_clear_bit(PG_CLEAN, &(req)->wb_flags)) {
 		dec_zone_page_state(page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(page->mapping->backing_dev_info, BDI_RECLAIMABLE);
+		dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
 		return 1;
 	}
 	return 0;
@@ -483,7 +483,7 @@ nfs_mark_request_commit(struct nfs_page *req)
 }
 
 static inline int
-nfs_clear_request_commit(struct nfs_page *req)
+nfs_clear_request_commit(struct nfs_page *req, struct address_space *mapping)
 {
 	return 0;
 }
@@ -539,14 +539,15 @@ static int nfs_wait_on_requests_locked(struct inode *inode, pgoff_t idx_start, u
 	return res;
 }
 
-static void nfs_cancel_commit_list(struct list_head *head)
+static void nfs_cancel_commit_list(struct list_head *head,
+				struct address_space *mapping)
 {
 	struct nfs_page *req;
 
 	while(!list_empty(head)) {
 		req = nfs_list_entry(head->next);
 		nfs_list_remove_request(req);
-		nfs_clear_request_commit(req);
+		nfs_clear_request_commit(req, mapping);
 		nfs_inode_remove_request(req);
 		nfs_unlock_request(req);
 	}
@@ -642,7 +643,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		spin_lock(&inode->i_lock);
 	}
 
-	if (nfs_clear_request_commit(req))
+	if (nfs_clear_request_commit(req, inode->i_mapping))
 		radix_tree_tag_clear(&NFS_I(inode)->nfs_page_tree,
 				req->wb_index, NFS_PAGE_TAG_COMMIT);
 
@@ -1352,7 +1353,7 @@ static void nfs_commit_release(void *calldata)
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
 		nfs_list_remove_request(req);
-		nfs_clear_request_commit(req);
+		nfs_clear_request_commit(req, data->inode->i_mapping);
 
 		dprintk("NFS:       commit (%s/%lld %d@%lld)",
 			req->wb_context->path.dentry->d_inode->i_sb->s_id,
@@ -1449,7 +1450,7 @@ long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_contr
 			break;
 		if (how & FLUSH_INVALIDATE) {
 			spin_unlock(&inode->i_lock);
-			nfs_cancel_commit_list(&head);
+			nfs_cancel_commit_list(&head, mapping);
 			ret = pages;
 			spin_lock(&inode->i_lock);
 			continue;
-- 
1.6.3.3



* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 10:36 [PATCH] nfs: clear_commit_release incorrectly handle truncated page Dmitry Monakhov
@ 2010-02-02 15:04 ` Trond Myklebust
  2010-02-02 15:17   ` Dmitry Monakhov
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 15:04 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote: 
> After a page has been truncated it loses its mapping, which results in a NULL
> pointer dereference on the bdi_stat update. In fact we have to decrement
> bdi_stat even for truncated pages, so let's pass the correct mapping in the
> function arguments. Patch against linux-2.6.
> ##TEST_CASE
> /*
> Test case for bug in nfs_clear_request_commit()
> caused by null pointer dereference in case of truncated page.
> It takes less than 10 minutes to reproduce the bug.

Something is wrong here. nfs_release_page() returns '0' if the 
page has an associated write request (i.e. PagePrivate is set), and so
both invalidate_complete_page() and invalidate_complete_page2() will
fail.
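A rough userspace model of the argument above, assuming simplified structures (a `has_private` flag for PagePrivate and hypothetical `model_*` function names): the non-truncating invalidation path only removes a page from the page cache if the filesystem agrees to release its private data, so a page with a pending NFS write request keeps its mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page: has_private models PagePrivate. */
struct page {
	bool has_private;
	void *mapping;		/* NULL once out of the page cache */
};

/* Models the rule for nfs_release_page() described above:
 * refuse (return 0) while a write request is attached. */
int model_release_page(struct page *page)
{
	return page->has_private ? 0 : 1;
}

/* Models invalidate_complete_page(): drop the page only if the
 * filesystem agreed to release its private data. */
int model_invalidate_complete_page(struct page *page)
{
	if (page->has_private && !model_release_page(page))
		return 0;	/* page stays cached, mapping intact */
	page->mapping = NULL;	/* removed from the page cache */
	return 1;
}
```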

So what is truncating the page?

Trond



* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 15:04 ` Trond Myklebust
@ 2010-02-02 15:17   ` Dmitry Monakhov
  2010-02-02 15:36     ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2010-02-02 15:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel Mailing List, linux-nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote: 
>> [patch description snipped]
>
> Something is wrong here. nfs_release_page() returns '0' if the 
> page has an associated write request (i.e. PagePrivate is set), and so
> both invalidate_complete_page() and invalidate_complete_page2() will
> fail.
>
> So what is truncating the page?
truncate_inode_page()
  truncate_complete_page()
    if (page_has_private(page))
       do_invalidatepage()
         ->nfs_invalidate_page()
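The ordering in that call chain can be modelled in userspace as follows. This is a hypothetical sketch, not the mm/ code: `model_truncate_complete_page()` and `record_mapping()` are invented names, and removal from the page cache is reduced to clearing `mapping`. Unlike the invalidate_complete_page() path, truncation does not ask the filesystem for permission: it calls the invalidate hook while `mapping` is still set and then drops the page regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page;
typedef void (*invalidatepage_fn)(struct page *page);

/* Hypothetical, simplified page. */
struct page {
	bool has_private;			/* models PagePrivate */
	void *mapping;				/* NULL once out of the page cache */
	void *mapping_seen;			/* what the invalidate hook observed */
	invalidatepage_fn invalidatepage;	/* models a_ops->invalidatepage */
};

/* Stand-in for nfs_invalidate_page(): just records page->mapping. */
void record_mapping(struct page *page)
{
	page->mapping_seen = page->mapping;
}

/* Models truncate_complete_page(): invalidate private data first,
 * then remove the page from the page cache with no veto possible. */
void model_truncate_complete_page(struct page *page)
{
	if (page->has_private && page->invalidatepage)
		page->invalidatepage(page);	/* mapping is still set here */
	page->mapping = NULL;			/* models remove_from_page_cache() */
}
```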
>
> Trond


* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 15:17   ` Dmitry Monakhov
@ 2010-02-02 15:36     ` Trond Myklebust
  2010-02-02 15:56       ` Dmitry Monakhov
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 15:36 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 18:17 +0300, Dmitry Monakhov wrote: 
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> 
> > On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote: 
> >> [patch description snipped]
> >
> > Something is wrong here. nfs_release_page() returns '0' if the 
> > page has an associated write request (i.e. PagePrivate is set), and so
> > both invalidate_complete_page() and invalidate_complete_page2() will
> > fail.
> >
> > So what is truncating the page?
> truncate_inode_page()
>   truncate_complete_page()
>     if (page_has_private(page))
>        do_invalidatepage()
>          ->nfs_invalidate_page()

do_invalidate_page() is called before remove_from_page_cache(), so
page->mapping should still be set.

Trond



* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 15:36     ` Trond Myklebust
@ 2010-02-02 15:56       ` Dmitry Monakhov
  2010-02-02 16:17         ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2010-02-02 15:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel Mailing List, linux-nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> On Tue, 2010-02-02 at 18:17 +0300, Dmitry Monakhov wrote: 
>> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> 
>> > On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote: 
>> >> [patch description snipped]
>> >
>> > Something is wrong here. nfs_release_page() returns '0' if the 
>> > page has an associated write request (i.e. PagePrivate is set), and so
>> > both invalidate_complete_page() and invalidate_complete_page2() will
>> > fail.
>> >
>> > So what is truncating the page?
>> truncate_inode_page()
>>   truncate_complete_page()
>>     if (page_has_private(page))
>>        do_invalidatepage()
>>          ->nfs_invalidate_page()
>
> do_invalidate_page() is called before remove_from_page_cache(), so
> page->mapping should still be set.
Yes, nfs_invalidate_page() happens before, but nfs_clear_request_commit() is
called from the RPC task (via nfs_commit_release() in nfsiod) after the page
has been removed from the page cache. I've added the following debug code to
nfs_clear_request_commit():
+	printk("page:%p private:%lx  index:%lu fl:%lx\n", page,
+	       page_private(page), page->index, page->flags);
+	BUG_ON(!page->mapping);
and got the following output:

 page:c5c790e0 private:f109b700  index:97656 fl:8000082c
 ------------[ cut here ]------------
 kernel BUG at fs/nfs/write.c:456!
 invalid opcode: 0000 [#1] SMP 
 last sysfs file: /sys/devices/pci0000:00/0000:00:1b.0/sound/card0/controlC0/uevent
 Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc binfmt_misc kvm_intel kvm radeon ttm drm_kms_helper drm i2c_algo_bit quota_v2 quota_tree snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy thinkpad_acpi snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event arc4 snd_seq iwl3945 snd_timer iwlcore snd_seq_device iptable_filter tpm_tis snd pcmcia mac80211 yenta_socket soundcore ip_tables tpm led_class psmouse rsrc_nonstatic snd_page_alloc tpm_bios x_tables nvram serio_raw sierra cfg80211 pcmcia_core raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear intel_agp video output e1000e agpgart [last unloaded: nfs]
 
 Pid: 3646, comm: nfsiod Not tainted 2.6.33-rc4 #47 2623DDU/2623DDU
 EIP: 0060:[<fc87ce57>] EFLAGS: 00010282 CPU: 0
 EIP is at nfs_clear_request_commit+0xf7/0x100 [nfs]
 EAX: 00000049 EBX: c5c790e0 ECX: c05a9a8f EDX: 05764000
 ESI: f10d9c00 EDI: fc8968f8 EBP: c49fbebc ESP: c49fbea0
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process nfsiod (pid: 3646, ti=c49fa000 task=f5e3d580 task.ti=c49fa000)
 Stack:
  fc89b544 c5c790e0 f109b700 00017d78 8000082c f109b700 f10d9c00 c49fbefc
 <0> fc87cee8 00000002 00000001 00000000 c01bd123 00000046 c49fbf00 f5e3d580
 <0> f10d9d28 f10d9d30 00000000 f10d9c00 f10d9c04 f10d9c00 fc8968f8 c49fbf04
 Call Trace:
  [<fc87cee8>] ? nfs_commit_release+0x88/0x1a0 [nfs]
  [<c01bd123>] ? probe_workqueue_execution+0x33/0xa0
  [<f83ebc43>] ? rpc_release_calldata+0x13/0x20 [sunrpc]
  [<f83ebdc1>] ? rpc_free_task+0x41/0x70 [sunrpc]
  [<c015c2c6>] ? worker_thread+0x136/0x300
  [<f83ebea0>] ? rpc_async_release+0x10/0x20 [sunrpc]
  [<c015c327>] ? worker_thread+0x197/0x300
  [<c015c2c6>] ? worker_thread+0x136/0x300
  [<f83ebe90>] ? rpc_async_release+0x0/0x20 [sunrpc]
  [<c015ffb0>] ? autoremove_wake_function+0x0/0x40
  [<c015c190>] ? worker_thread+0x0/0x300
  [<c015fbd4>] ? kthread+0x74/0x80
  [<c015fb60>] ? kthread+0x0/0x80
  [<c010353a>] ? kernel_thread_helper+0x6/0x10
 Code: 0b eb fe 0f 0b eb fe 8b 03 89 44 24 10 8b 43 14 89 44 24 0c 8b 43 0c 89 5c 24 04 89 44 24 08 c7 04 24 44 b5 89 fc e8 2b 95 d2 c3 <0f> 0b eb fe 0f 0b eb fe 90 55 89 e5 57 56 53 83 ec 2c 0f 1f 44 
 EIP: [<fc87ce57>] nfs_clear_request_commit+0xf7/0x100 [nfs] SS:ESP 0068:c49fbea0
 ---[ end trace a852f1835725d3b2 ]---
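The timeline described above can be sketched with a one-slot stand-in for the nfsiod workqueue. Everything here is hypothetical and simplified (`commit_data`, `queue_commit_release()`, `run_commit_release()`, and a counter folded into `address_space`); the point is only that the release callback runs after truncation has cleared page->mapping, while a mapping reached through the inode, as the patch does with `data->inode->i_mapping`, is still usable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the BDI counter is folded into the mapping. */
struct address_space {
	long reclaimable;
};
struct inode {
	struct address_space i_data;
	struct address_space *i_mapping;
};
struct page {
	struct address_space *mapping;	/* cleared when leaving the page cache */
};
struct commit_data {
	struct inode *inode;		/* assumed to outlive the RPC */
	struct page *page;
};

/* One-slot stand-in for the nfsiod workqueue. */
static struct commit_data *pending;

void queue_commit_release(struct commit_data *data)
{
	pending = data;
}

/* Runs later, like nfs_commit_release() via rpc_async_release().
 * Returns the mapping that was used for the counter update. */
struct address_space *run_commit_release(void)
{
	struct commit_data *data = pending;
	struct address_space *mapping;

	pending = NULL;
	/* data->page->mapping may already be NULL here; the inode's is not. */
	mapping = data->inode->i_mapping;
	mapping->reclaimable--;
	return mapping;
}
```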


* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 15:56       ` Dmitry Monakhov
@ 2010-02-02 16:17         ` Trond Myklebust
  2010-02-02 16:47           ` Dmitry Monakhov
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 16:17 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 18:56 +0300, Dmitry Monakhov wrote: 
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> 
> > On Tue, 2010-02-02 at 18:17 +0300, Dmitry Monakhov wrote: 
> >> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> >> 
> >> > On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote: 
> >> >> [patch description snipped]
> >> >
> >> > Something is wrong here. nfs_release_page() returns '0' if the 
> >> > page has an associated write request (i.e. PagePrivate is set), and so
> >> > both invalidate_complete_page() and invalidate_complete_page2() will
> >> > fail.
> >> >
> >> > So what is truncating the page?
> >> truncate_inode_page()
> >>   truncate_complete_page()
> >>     if (page_has_private(page))
> >>        do_invalidatepage()
> >>          ->nfs_invalidate_page()
> >
> > do_invalidate_page() is called before remove_from_page_cache(), so
> > page->mapping should still be set.
> [debug code and oops trace snipped]

Hmm.... There is a known problem with a reference leak in
nfs_wb_page_cancel() (I've queued up a fix for 2.6.33 in the 'bugfixes'
branch of my git tree already). What happens when you apply the
following patch?

Cheers
   Trond
------------------------------------------------------------------------------------- 
NFS: Fix a reference leak in nfs_wb_cancel_page()

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfs/write.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d171696..dac8d76 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1541,6 +1541,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 			break;
 		}
 		ret = nfs_wait_on_request(req);
+		nfs_release_request(req);
 		if (ret < 0)
 			goto out;
 	}
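For readers unfamiliar with this kind of leak, here is a minimal userspace model of the retry loop. The names are hypothetical, a plain integer replaces the kref in struct nfs_page, and the real loop's locking and PG_CLEAN handling are omitted. It assumes, as the one-line fix implies, that the lookup at the top of each pass takes a reference; when the request is busy the loop waits and retries, and without the added release every retry leaves one reference behind.

```c
#include <assert.h>

/* Hypothetical refcounted request standing in for struct nfs_page. */
struct req {
	int refcount;
	int busy_rounds;	/* how many passes find the request busy */
};

static void get_req(struct req *r) { r->refcount++; }	/* lookup takes a ref */
static void put_req(struct req *r) { r->refcount--; }	/* nfs_release_request() */

/* Loop shape loosely modelled on nfs_wb_page_cancel(). */
int cancel_loop(struct req *r, int release_after_wait)
{
	for (;;) {
		get_req(r);
		if (r->busy_rounds == 0) {
			put_req(r);	/* success path drops its reference */
			return 0;
		}
		r->busy_rounds--;	/* models nfs_wait_on_request() */
		if (release_after_wait)
			put_req(r);	/* the line the patch adds */
	}
}
```

Starting from one reference and three busy passes, the unpatched loop ends with four references (three leaked); the patched loop ends with one.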




* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 16:17         ` Trond Myklebust
@ 2010-02-02 16:47           ` Dmitry Monakhov
  2010-02-02 17:00             ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2010-02-02 16:47 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel Mailing List, linux-nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> On Tue, 2010-02-02 at 18:56 +0300, Dmitry Monakhov wrote: 
>> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> 
>> > On Tue, 2010-02-02 at 18:17 +0300, Dmitry Monakhov wrote: 
>> >> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> >> 
>> >> > On Tue, 2010-02-02 at 13:36 +0300, Dmitry Monakhov wrote:
>> >> >
>> >> > Something is wrong here. nfs_release_page() returns '0' if the 
>> >> > page has an associated write request (i.e. PagePrivate is set), and so
>> >> > both invalidate_complete_page() and invalidate_complete_page2() will
>> >> > fail.
>> >> >
>> >> > So what is truncating the page?
>> >> truncate_inode_page()
>> >>   truncate_complete_page()
>> >>     if (page_has_private(page))
>> >>        do_invalidatepage()
>> >>          ->nfs_invalidate_page()
>> >
>> > do_invalidate_page() is called before remove_from_page_cache(), so
>> > page->mapping should still be set.
>
> Hmm.... There is a known problem with a reference leak in
> nfs_wb_page_cancel() (I've queued up a fix for 2.6.33 in the 'bugfixes'
> branch of my git tree already). What happens when you apply the
> following patch?
That does not help; I still get the same oops (log follows).
Have you tried my test case?

 BUG: unable to handle kernel NULL pointer dereference at 00000040
 IP: [<f80d415f>] nfs_clear_request_commit+0x3f/0xb0 [nfs]
 *pde = 00000000 
 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
 last sysfs file: /sys/devices/platform/thinkpad_acpi/leds/tpacpi::thinkvantage/uevent
 Modules linked in: binfmt_misc quota_v2 quota_tree nfsd exportfs nfs lockd sunrpc iwl3945 thinkpad_acpi psmouse led_class serio_raw iwlcore nvram raid1 raid0 linear e1000e
 
 Pid: 1035, comm: nfsiod Not tainted 2.6.33-rc6 #60 2623DDU/2623DDU
 EIP: 0060:[<f80d415f>] EFLAGS: 00010296 CPU: 0
 EIP is at nfs_clear_request_commit+0x3f/0xb0 [nfs]
 EAX: 00000000 EBX: c2561d80 ECX: c06d3700 EDX: 00000014
 ESI: f69916c0 EDI: f80dab58 EBP: f6724ef4 ESP: f6724ee8
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process nfsiod (pid: 1035, ti=f6724000 task=f69dda90 task.ti=f6724000)
 Stack:
  c04df67d f69d7440 f69916c0 f6724f34 f80d4258 f6724f44 00001b0e 00000000
 <0> ffff799c 00000400 f505f000 94c0042a f69917e0 f69917e8 00000000 f69916c0
 <0> f69916c4 f69916c0 f80dab58 f6724f3c f8075703 f6724f58 f8075871 f6724f60
 Call Trace:
  [<c04df67d>] ? schedule+0x3ad/0xa30
  [<f80d4258>] ? nfs_commit_release+0x88/0x1a0 [nfs]
  [<f8075703>] ? rpc_release_calldata+0x13/0x20 [sunrpc]
  [<f8075871>] ? rpc_free_task+0x41/0x70 [sunrpc]
  [<c01acc4c>] ? probe_workqueue_execution+0x8c/0xd0
  [<f8075940>] ? rpc_async_release+0x10/0x20 [sunrpc]
  [<c015834d>] ? worker_thread+0x10d/0x210
  [<f8075930>] ? rpc_async_release+0x0/0x20 [sunrpc]
  [<c015bb10>] ? autoremove_wake_function+0x0/0x50
  [<c0158240>] ? worker_thread+0x0/0x210
  [<c015b724>] ? kthread+0x74/0x80
  [<c015b6b0>] ? kthread+0x0/0x80
  [<c0103546>] ? kernel_thread_helper+0x6/0x10
 Code: f0 0f ba 70 28 01 19 d2 31 c0 85 d2 75 0e 8b 5d f8 8b 75 fc 89 ec 5d c3 8d 74 26 00 89 d8 ba 10 00 00 00 e8 74 0e 10 c8 8b 43 10 <8b> 70 40 9c 5b fa e8 26 67 0d c8 8d 46 30 b9 ff ff ff ff 0f bd 
 EIP: [<f80d415f>] nfs_clear_request_commit+0x3f/0xb0 [nfs] SS:ESP 0068:f6724ee8
 CR2: 0000000000000040
 ---[ end trace 4bf8ee9d233ce744 ]---


* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 16:47           ` Dmitry Monakhov
@ 2010-02-02 17:00             ` Trond Myklebust
  2010-02-02 17:09               ` Dmitry Monakhov
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 17:00 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 19:47 +0300, Dmitry Monakhov wrote: 
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> > Hmm.... There is a known problem with a reference leak in
> > nfs_wb_page_cancel() (I've queued up a fix for 2.6.33 in the 'bugfixes'
> > branch of my git tree already). What happens when you apply the
> > following patch?
> That does not help; I still get the same oops (log follows).
> Have you tried my test case?
> 
> [oops trace snipped]

Yep. Looking more carefully at your test case, I don't see how
truncate_inode_page() can be involved at all. You are extending the file
using lseek(), not truncate(). So something else must be at work here.
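The lseek() part of the test program can be checked directly in userspace: seeking past EOF and writing a single byte only ever grows the file. The helper name `stretch_file()` is made up for this sketch; it uses only standard POSIX calls, in the same order as the test program.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Stretch fd the way the test program does: seek to 'end' and write
 * one byte. Returns the resulting st_size, or -1 on error. */
long long stretch_file(int fd, off_t end)
{
	struct stat st;

	if (lseek(fd, end, SEEK_SET) == (off_t)-1)
		return -1;
	if (write(fd, "a", 1) != 1)
		return -1;
	if (fstat(fd, &st) == -1)
		return -1;
	return (long long)st.st_size;
}
```

Writing at an offset inside the file leaves its size unchanged, so nothing in this step can shrink the file.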

I'll see if I can reproduce it.

Trond



* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 17:00             ` Trond Myklebust
@ 2010-02-02 17:09               ` Dmitry Monakhov
  2010-02-02 19:54                 ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Monakhov @ 2010-02-02 17:09 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Kernel Mailing List, linux-nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> On Tue, 2010-02-02 at 19:47 +0300, Dmitry Monakhov wrote: 
>> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>> > Hmm.... There is a known problem with a reference leak in
>> > nfs_wb_page_cancel() (I've queued up a fix for 2.6.33 in the 'bugfixes'
>> > branch of my git tree already). What happens when you apply the
>> > following patch?
>> That does not help; I still get the same oops (log follows).
>> Have you tried my test case?
>> 
>> [oops trace snipped]
>
> Yep. Looking more carefully at your test case, I don't see how
> truncate_inode_page() can be involved at all. You are extending the file
> using lseek(), not truncate(). So something else must be at work here.
The truncation comes from open() with O_TRUNC:
open(..., O_TRUNC, ...)
 do_filp_open()
  handle_truncate()
   do_truncate()
Yes, it is crazy to run concurrent tasks which all do
open(..., O_TRUNC, ...); mmap();
but I initially did this by accident, and it resulted in an oops :)
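The missing piece is that each new instance opens the same file with O_TRUNC while earlier instances still have it mapped. A small POSIX check of the O_TRUNC behaviour, using a hypothetical helper `size_after_otrunc_open()` on a local temporary file rather than NFS:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-open 'path' with the test program's flags and return its size
 * afterwards, or -1 on error. */
long long size_after_otrunc_open(const char *path)
{
	struct stat st;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);

	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)st.st_size;
}
```

An existing file comes back with size 0. According to the call chain above, on the NFS client this truncation goes through do_truncate(), so pages that the other instances are still dirtying can be dropped from the page cache.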
>
> I'll see if I can reproduce it.
>
> Trond


* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 17:09               ` Dmitry Monakhov
@ 2010-02-02 19:54                 ` Trond Myklebust
  2010-02-02 20:19                   ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 19:54 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 20:09 +0300, Dmitry Monakhov wrote: 
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> 
> > On Tue, 2010-02-02 at 19:47 +0300, Dmitry Monakhov wrote: 
> >> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> >> > Hmm.... There is a known problem with a reference leak in
> >> > nfs_wb_page_cancel() (I've queued up a fix for 2.6.33 in the 'bugfixes'
> >> > branch of my git tree already). What happens when you apply the
> >> > following patch?
> >> That does not help; I still get the same oops (log follows).
> >> Have you tried my test case?
> >> 
> >> [oops trace snipped]
> >
> > Yep. Looking more carefully at your test case, I don't see how
> > truncate_inode_page() can be involved at all. You are extending the file
> > using lseek(), not truncate(). So something else must be at work here.
> open(,O_TRUNC,)
>  do_filp_open()
>   handle_truncate()
>    do_truncate()
> Yes, it is crazy to run concurrent tasks which do:
> open(,O_TRUNC,); mmap();
> but I originally did it by accident, and it resulted in an Oops :)
> >
> > I'll see if I can reproduce it.
> >

OK. I haven't been able to reproduce your bug yet, but I think I see
what is happening.

Your 'kill -9' will occasionally hit nfs_wb_page_cancel() and cause it
to fail. When _that_ happens, then all hell breaks loose, because
mapping->a_ops->invalidatepage() is not allowed to fail.
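A toy userspace model of that failure mode follows. Every type and function in it is hypothetical and heavily simplified; it is not the real NFS or VM code. It only assumes what the Oops in nfs_clear_request_commit() suggests: a request that is still in flight later reads page->mapping, which the invalidation has already cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a request points at a page, a page at its mapping. */
struct mapping { int dummy; };		/* stands in for struct address_space */
struct page    { struct mapping *mapping; };
struct request { struct page *page; bool busy; };

/* Killable wait: abandons the wait with -EINTR if a fatal signal is pending. */
static int wait_killable(struct request *req, bool fatal_signal)
{
	if (req->busy && fatal_signal)
		return -EINTR;
	req->busy = false;	/* model: the outstanding I/O completed */
	return 0;
}

/*
 * invalidatepage()-like path: it has no way to report failure, so the page
 * is detached from its mapping whether or not the wait succeeded.
 */
static void invalidate_page(struct request *req, bool fatal_signal)
{
	assert(req != NULL && req->page != NULL);
	(void)wait_killable(req, fatal_signal);	/* the error is lost here */
	req->page->mapping = NULL;
}

/*
 * Later completion of a still-busy request (compare the commit-release path
 * in the trace) expects page->mapping to be valid. Returns true if that
 * completion would dereference a NULL mapping.
 */
static bool completion_would_oops(const struct request *req)
{
	return req->busy && req->page->mapping == NULL;
}
```

Without a signal the wait finishes before the mapping is cleared and nothing is left to complete; with a fatal signal the request stays busy while its page has already lost its mapping.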

Ugh... I don't think there is much of an alternative to making
nfs_wait_on_request() uninterruptible. On the plus side, that does make
the behaviour of the NFS writeback code consistent with that of the VFS
layer (i.e. wait_on_page_writeback()).
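To make the trade-off concrete, here is a toy model of the two waiting policies. The enum, the function and the tick-based timing are hypothetical. In the real code, the difference is whether wait_on_bit() is called with TASK_KILLABLE (old code) or TASK_UNINTERRUPTIBLE (new code), as in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum wait_mode { WAIT_KILLABLE, WAIT_UNINTERRUPTIBLE };

/*
 * Toy model (hypothetical) of waiting for a PG_BUSY-style bit to clear.
 * The request stays busy for 'busy_ticks' ticks; a fatal signal becomes
 * pending at 'signal_tick' (negative means "never"). Returns 0 once the
 * bit clears, or -EINTR if the wait is abandoned; *waited gets the ticks
 * spent waiting.
 */
static int wait_for_request(enum wait_mode mode, int busy_ticks,
			    int signal_tick, int *waited)
{
	int tick;

	assert(waited != NULL);
	for (tick = 0; tick < busy_ticks; tick++) {
		/* Only the killable variant checks for a pending fatal signal. */
		if (mode == WAIT_KILLABLE && signal_tick >= 0 &&
		    tick >= signal_tick) {
			*waited = tick;
			return -EINTR;
		}
		/* The kernel would sleep here (e.g. io_schedule()). */
	}
	*waited = busy_ticks;
	return 0;
}
```

With WAIT_UNINTERRUPTIBLE the caller never observes -EINTR, which is what invalidatepage() needs. The price is that if the bit never clears (for example, because the server is unreachable), the waiter stays blocked.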

So here goes...

Trond
---------------------------------------------------------------------------------------------- 
NFS: Fix an Oops when truncating a file

From: Trond Myklebust <Trond.Myklebust@netapp.com>

The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
Since the NFS code assumes that the page stays mapped for as long as the
writeback is active, we can end up Oopsing (among other things).

The only safe fix here is to convert nfs_wait_on_request(), so as to make
it uninterruptible (as is already the case with wait_on_page_writeback()).


Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/pagelist.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)


diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e297593..a12c45b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -176,6 +176,12 @@ void nfs_release_request(struct nfs_page *req)
 	kref_put(&req->wb_kref, nfs_free_request);
 }
 
+static int nfs_wait_bit_uninterruptible(void *word)
+{
+	io_schedule();
+	return 0;
+}
+
 /**
  * nfs_wait_on_request - Wait for a request to complete.
  * @req: request to wait upon.
@@ -186,14 +192,9 @@ void nfs_release_request(struct nfs_page *req)
 int
 nfs_wait_on_request(struct nfs_page *req)
 {
-	int ret = 0;
-
-	if (!test_bit(PG_BUSY, &req->wb_flags))
-		goto out;
-	ret = out_of_line_wait_on_bit(&req->wb_flags, PG_BUSY,
-			nfs_wait_bit_killable, TASK_KILLABLE);
-out:
-	return ret;
+	return wait_on_bit(&req->wb_flags, PG_BUSY,
+			nfs_wait_bit_uninterruptible,
+			TASK_UNINTERRUPTIBLE);
 }
 
 /**




* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 19:54                 ` Trond Myklebust
@ 2010-02-02 20:19                   ` Chuck Lever
  2010-02-02 20:26                     ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2010-02-02 20:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Dmitry Monakhov, Linux Kernel Mailing List, linux-nfs

On Feb 2, 2010, at 2:54 PM, Trond Myklebust wrote:
> OK. I haven't been able to reproduce your bug yet, but I think I see
> what is happening.
>
> Your 'kill -9' will occasionally hit nfs_wb_page_cancel() and cause it
> to fail. When _that_ happens, then all hell breaks loose, because
> mapping->a_ops->invalidatepage() is not allowed to fail.
>
> Ugh... I don't think there is much of an alternative to making
> nfs_wait_on_request() uninterruptible. On the plus side, that does make
> the behaviour of the NFS writeback code consistent with that of the VFS
> layer (i.e. wait_on_page_writeback()).
>
> So here goes...
>
> Trond
> ----------------------------------------------------------------------------------------------
> NFS: Fix an Oops when truncating a file
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> The VM/VFS does not allow mapping->a_ops->invalidatepage() to fail.
> Unfortunately, nfs_wb_page_cancel() may fail if a fatal signal occurs.
> Since the NFS code assumes that the page stays mapped for as long as the
> writeback is active, we can end up Oopsing (among other things).
>
> The only safe fix here is to convert nfs_wait_on_request(), so as to make
> it uninterruptible (as is already the case with wait_on_page_writeback()).

What happens when the server is unreachable while we're in  
nfs_wait_on_request?


-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com






* Re: [PATCH] nfs: clear_commit_release incorrectly handle truncated page
  2010-02-02 20:19                   ` Chuck Lever
@ 2010-02-02 20:26                     ` Trond Myklebust
  0 siblings, 0 replies; 12+ messages in thread
From: Trond Myklebust @ 2010-02-02 20:26 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Dmitry Monakhov, Linux Kernel Mailing List, linux-nfs

On Tue, 2010-02-02 at 15:19 -0500, Chuck Lever wrote: 
> What happens when the server is unreachable while we're in  
> nfs_wait_on_request?

The same thing that happens if it is unreachable while we're in
wait_on_page_writeback(), i.e. we hang until someone kills the RPC call
for us.

Trond



end of thread (latest message: 2010-02-02 20:26 UTC)

Thread overview: 12+ messages
2010-02-02 10:36 [PATCH] nfs: clear_commit_release incorrectly handle truncated page Dmitry Monakhov
2010-02-02 15:04 ` Trond Myklebust
2010-02-02 15:17   ` Dmitry Monakhov
2010-02-02 15:36     ` Trond Myklebust
2010-02-02 15:56       ` Dmitry Monakhov
2010-02-02 16:17         ` Trond Myklebust
2010-02-02 16:47           ` Dmitry Monakhov
2010-02-02 17:00             ` Trond Myklebust
2010-02-02 17:09               ` Dmitry Monakhov
2010-02-02 19:54                 ` Trond Myklebust
2010-02-02 20:19                   ` Chuck Lever
2010-02-02 20:26                     ` Trond Myklebust
