Odd NFS related SIGBUS (& possible fix)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Odd NFS related SIGBUS (& possible fix)
@ 2010-09-29  4:33 Benjamin Herrenschmidt
  2010-09-29  7:44 ` Benjamin Herrenschmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-29  4:33 UTC (permalink / raw)
  To: Nick Piggin, Trond Myklebust
  Cc: linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

Hi Nick, Trond !

I've been tracking a problem on a heavily SMP machine here where running
LTP "mmapstress01" spawning 64 CPUs with /tmp over NFS causes some of
the tests to sigbus.

The test itself is a relatively boring mmap+fork hammering test.

What I've tracked down so far is that it seems to SIGBUS due to the
statement in nfs_vm_page_mkwrite()

	mapping = page->mapping;
	if (mapping != dentry->d_inode->i_mapping)
		goto out_unlock;

Which will then hit

	return VM_FAULT_SIGBUS;

Now, while I understand the validity of that test if the mapping indeed
-changed-, in the case I'm hitting it's been merely invalidated.

IE. page->mapping is NULL, as a result of something in NFS deciding to
go through one of the gazillion code path that invalidate mappings (in
this case, an mtime change on the server.

Now, I think -this- root cause is bogus and will need some separate
debugging, but regardless, I don't see why at this stage, page_mkwrite()
should cause a SIGBUS if the file has changed on the server, since we
have pushed our our dirty mappings afaik, and so all that tells is is
that we raced with the cache invalidation while the struct page wasn't
locked.

So I'm wondering if the right solution shouldn't be to replay the fault
in that case instead.

Now, I initially thought about returning 0; and hitting the following
code path in __do_fault() but...

				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
					lock_page(page);
					if (!page->mapping) {
						ret = 0; /* retry the fault */
						unlock_page(page);
						goto unwritable_page;
					}

 ... I'm not too happy about it and I'll need Nick insight here. The thing
is that to hit there, I need to unlock the page first. That means page->mapping
can change, and thus no longer be NULL by the time we get there, in which case
it doesn't sound right at all to move on and make the page writable, which
the code would do. Or am I missing something ?

So my preferred fix, if I'm indeed right and this is a real bug, would be
to do something in nfs_vm_page_mkwrite() along the lines of:

 	lock_page(page);
 	mapping = page->mapping;
-	if (mapping != dentry->d_inode->i_mapping)
+ 	if (mapping != dentry->d_inode->i_mapping) {
+		if (!mapping)
+			ret = 0;
 		goto out_unlock;
+	}

Or am I missing something ?

Now regarding the other bug, unless Trond has an idea already, I think I'll start
a separate email thread once I've collected more data. I -think- it invalidates it
because it sees a the server mtime that is more recent than the inode, but the
server shouldn't be touching at files, so I suspect we get confused somewhere in
the kernel and I don't know why yet (the code path inside NFS aren't obvious to
me at this stage).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-09-29  4:33 Odd NFS related SIGBUS (& possible fix) Benjamin Herrenschmidt
@ 2010-09-29  7:44 ` Benjamin Herrenschmidt
  2010-10-01  5:57 ` Benjamin Herrenschmidt
  2010-10-01 18:12 ` Trond Myklebust
  2 siblings, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-09-29  7:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Trond Myklebust, linux-kernel@vger.kernel.org, Al Viro,
	linux-fsdevel

On Wed, 2010-09-29 at 14:33 +1000, Benjamin Herrenschmidt wrote:

> Now regarding the other bug, unless Trond has an idea already, I think I'll start
> a separate email thread once I've collected more data. I -think- it invalidates it
> because it sees a the server mtime that is more recent than the inode, but the
> server shouldn't be touching at files, so I suspect we get confused somewhere in
> the kernel and I don't know why yet (the code path inside NFS aren't obvious to
> me at this stage).

Ok, so that doesn't happen with current upstream (we were on some
2.6.32.16 based kernel here). I'll figure that out from there.

However, I still think there's a potential erroneous sigbus in the NFS
page_mkwrite() code path unless I'm mistaken.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-09-29  4:33 Odd NFS related SIGBUS (& possible fix) Benjamin Herrenschmidt
  2010-09-29  7:44 ` Benjamin Herrenschmidt
@ 2010-10-01  5:57 ` Benjamin Herrenschmidt
  2010-10-01  6:01   ` Benjamin Herrenschmidt
  2010-10-01 17:18   ` J. Bruce Fields
  2010-10-01 18:12 ` Trond Myklebust
  2 siblings, 2 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-01  5:57 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Trond Myklebust,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2010-09-29 at 14:33 +1000, Benjamin Herrenschmidt wrote:
> Hi Nick, Trond !

(adding linux-nfs list and some more data)

> Now regarding the other bug, unless Trond has an idea already, I think I'll start
> a separate email thread once I've collected more data. I -think- it invalidates it
> because it sees a the server mtime that is more recent than the inode, but the
> server shouldn't be touching at files, so I suspect we get confused somewhere in
> the kernel and I don't know why yet (the code path inside NFS aren't obvious to
> me at this stage).

Now that one is hell...

So it seems to be that depending on the kernel and the machine, it
varies between

 - Fails often
 - Fails sometimes
 - Doesn't fail
 - Test crashes the kernel

Basically, "fails" here means I see the sigbus when running the test
case (I'll write details down below for those who didn't follow), so far
on Power7.

So from what I can tell, Trond's patch acdc53b2... (Replace
__nfs_write_mapping with sync_inode()) causes the problem to go from
"fails often" to "fails sometimes".

So for some reason, we seem to be taking a lot less of invalidations
from the server, and thus hitting that race I think I found (see
previous email) a lot less often. Whether the remaining invalidations
are "legit" or themselves the sign of something wrong, I don't know for
sure.

(Reminder: On 2.6.32.16, where we initially experienced the problem, I
found out that we get a -lot- of invalidations coming due to server
mtime being more recent than the local mtime, which is very very odd
considering that the local machine is the only one to ever modify the
files)

Now, I haven't managed to reproduce the failure on 2.6.34 with an x86_64
24 way machine.

Note: On that machine, kernel is whatever's in debian, unfortunately
this isn't a crashbox, I'm looking at getting an x86_64 one to test
other kernels as we speak.

However, that machine exhibit a different problem which is while running
the test case, ctrl-C doesn't work ... ctrl-Z does tho and you can then
kill it, very odd.

If I try current Linus upstream as of today on the Power7 box, I
experience something slightly similar... some echo happens when I
ctrl-C ... and the entire machine locks up. If I use an nmi to get a
stack dump of all processors, I observe 63 of them idle and one there:

[c0000000114cb5b0] c0000000001548c0 .page_mkclean+0x238/0x2c4
[c0000000114cb6f0] c00000000012e468 .clear_page_dirty_for_io+0xa0/0x1a8
[c0000000114cb780] c0000000002f30f4 .nfs_wb_page+0x90/0x100
[c0000000114cb860] c0000000002f3890 .nfs_flush_incompatible+0xc0/0xf0
[c0000000114cb900] c0000000002dff00 .nfs_vm_page_mkwrite+0x170/0x1a4
[c0000000114cb9a0] c000000000146e0c .do_wp_page+0x294/0x9a0
[c0000000114cbaa0] c000000000148468 .handle_mm_fault+0x9b8/0xa78
[c0000000114cbb90] c0000000006b1aac .do_page_fault+0x428/0x6ac
[c0000000114cbe30] c0000000000051e0 handle_page_fault+0x20/0x74

(roughly, I don't think it stays in page_mkclean, I -think- I've seen it
up one level but I'm not 100% sure, most of the time it's somewhere in
there tho).

The machine stops making fwd progress and doesn't echo on the console
anymore.

I actually just reproduced the lockup on x86_64 with 4 CPUs using
mmapstress01 -p 5 -t 1.3 -f 4096.

So the trick is to mount /tmp over nfs (so that the mmap'ed file ends up
on nfs) and run mmapstress01 as above. Then try ctrl-C it.

I'll do a bz for that one.

I'm now going to try going back kernel versions to see if I can also
reproduce the SIGBUS or other timing issues.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01  5:57 ` Benjamin Herrenschmidt
@ 2010-10-01  6:01   ` Benjamin Herrenschmidt
  2010-10-01 17:18   ` J. Bruce Fields
  1 sibling, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-01  6:01 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Trond Myklebust,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 2010-10-01 at 15:57 +1000, Benjamin Herrenschmidt wrote:

> I actually just reproduced the lockup on x86_64 with 4 CPUs using
> mmapstress01 -p 5 -t 1.3 -f 4096.
> 
> So the trick is to mount /tmp over nfs (so that the mmap'ed file ends up
> on nfs) and run mmapstress01 as above. Then try ctrl-C it.
> 
> I'll do a bz for that one.
> 
> I'm now going to try going back kernel versions to see if I can also
> reproduce the SIGBUS or other timing issues.

So on x86 I didn't get a backtrace, but the VGA console was still
responsive somewhat, and I saw mmapstress01 there using 100% CPU and
unresponsive to kill -9 (stays in state R).

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01  5:57 ` Benjamin Herrenschmidt
  2010-10-01  6:01   ` Benjamin Herrenschmidt
@ 2010-10-01 17:18   ` J. Bruce Fields
  1 sibling, 0 replies; 10+ messages in thread
From: J. Bruce Fields @ 2010-10-01 17:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nick Piggin, Trond Myklebust, linux-kernel@vger.kernel.org,
	Al Viro, linux-fsdevel, linux-nfs

On Fri, Oct 01, 2010 at 03:57:47PM +1000, Benjamin Herrenschmidt wrote:
> (Reminder: On 2.6.32.16, where we initially experienced the problem, I
> found out that we get a -lot- of invalidations coming due to server
> mtime being more recent than the local mtime, which is very very odd
> considering that the local machine is the only one to ever modify the
> files)

Possibly also related?:

	https://bugzilla.redhat.com/show_bug.cgi?id=636590

We see what seem to be cache invalidations when all we're doing is
something like

	open
	write
	close
	open
	read

from a single thread on a single client.  This happens over a variety of
protocol and server versions.

--b.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-09-29  4:33 Odd NFS related SIGBUS (& possible fix) Benjamin Herrenschmidt
  2010-09-29  7:44 ` Benjamin Herrenschmidt
  2010-10-01  5:57 ` Benjamin Herrenschmidt
@ 2010-10-01 18:12 ` Trond Myklebust
  2010-10-01 18:35   ` Trond Myklebust
  2010-10-01 20:53   ` Benjamin Herrenschmidt
  2 siblings, 2 replies; 10+ messages in thread
From: Trond Myklebust @ 2010-10-01 18:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nick Piggin, linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

On Wed, 2010-09-29 at 14:33 +1000, Benjamin Herrenschmidt wrote:
> Hi Nick, Trond !
> 
> I've been tracking a problem on a heavily SMP machine here where running
> LTP "mmapstress01" spawning 64 CPUs with /tmp over NFS causes some of
> the tests to sigbus.
> 
> The test itself is a relatively boring mmap+fork hammering test.
> 
> What I've tracked down so far is that it seems to SIGBUS due to the
> statement in nfs_vm_page_mkwrite()
> 
> 	mapping = page->mapping;
> 	if (mapping != dentry->d_inode->i_mapping)
> 		goto out_unlock;
> 
> Which will then hit
> 
> 	return VM_FAULT_SIGBUS;
> 
> Now, while I understand the validity of that test if the mapping indeed
> -changed-, in the case I'm hitting it's been merely invalidated.
> 
> IE. page->mapping is NULL, as a result of something in NFS deciding to
> go through one of the gazillion code path that invalidate mappings (in
> this case, an mtime change on the server.
> 
> Now, I think -this- root cause is bogus and will need some separate
> debugging, but regardless, I don't see why at this stage, page_mkwrite()
> should cause a SIGBUS if the file has changed on the server, since we
> have pushed our our dirty mappings afaik, and so all that tells is is
> that we raced with the cache invalidation while the struct page wasn't
> locked.
> 
> So I'm wondering if the right solution shouldn't be to replay the fault
> in that case instead.
> 
> Now, I initially thought about returning 0; and hitting the following
> code path in __do_fault() but...
> 
> 				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
> 					lock_page(page);
> 					if (!page->mapping) {
> 						ret = 0; /* retry the fault */
> 						unlock_page(page);
> 						goto unwritable_page;
> 					}
> 
>  ... I'm not too happy about it and I'll need Nick insight here. The thing
> is that to hit there, I need to unlock the page first. That means page->mapping
> can change, and thus no longer be NULL by the time we get there, in which case
> it doesn't sound right at all to move on and make the page writable, which
> the code would do. Or am I missing something ?
> 
> So my preferred fix, if I'm indeed right and this is a real bug, would be
> to do something in nfs_vm_page_mkwrite() along the lines of:
> 
>  	lock_page(page);
>  	mapping = page->mapping;
> -	if (mapping != dentry->d_inode->i_mapping)
> + 	if (mapping != dentry->d_inode->i_mapping) {
> +		if (!mapping)
> +			ret = 0;
>  		goto out_unlock;
> +	}
> 
> Or am I missing something ?
> 
> Now regarding the other bug, unless Trond has an idea already, I think I'll start
> a separate email thread once I've collected more data. I -think- it invalidates it
> because it sees a the server mtime that is more recent than the inode, but the
> server shouldn't be touching at files, so I suspect we get confused somewhere in
> the kernel and I don't know why yet (the code path inside NFS aren't obvious to
> me at this stage).

Hi Ben,

We do want the code to be robust w.r.t. cache invalidations. In NFS,
those can happen all the time due to the unordered nature of RPC calls
and the piss-poor mtime/ctime resolution on the exported filesystems on
most Linux servers in particular.

e.g. against most Linux servers, if I'm sending out 2 WRITE requests in
parallel that both append data to the file, I can have the 2 replies
come back with the same ctime/mtime, but with different file lengths. If
those replies are processed in a different order on the client than on
the server (not necessarily due to any client error - the server may
simply have sent the replies in the wrong order), then we may end up
with an incorrect estimate of the file length, and so may be forced to
assume that something weird has happened to the file on the server.

However, it looks to me as if the right thing to do when the
page->mapping has changed would be to do the same thing as
block_page_mkwrite(), and just return VM_FAULT_NOPAGE so that the VM can
retry the fault.
IMO: We should only SIGBUS if the calls to nfs_flush_incompatible()
and/or nfs_updatepage() fail.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01 18:12 ` Trond Myklebust
@ 2010-10-01 18:35   ` Trond Myklebust
  2010-10-01 20:57     ` Benjamin Herrenschmidt
  2010-10-01 20:53   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2010-10-01 18:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nick Piggin, linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

On Fri, 2010-10-01 at 14:12 -0400, Trond Myklebust wrote:
> However, it looks to me as if the right thing to do when the
> page->mapping has changed would be to do the same thing as
> block_page_mkwrite(), and just return VM_FAULT_NOPAGE so that the VM can
> retry the fault.
> IMO: We should only SIGBUS if the calls to nfs_flush_incompatible()
> and/or nfs_updatepage() fail.
> 
> Cheers
>   Trond

IOW: Something like the following patch.

Cheers
  Trond

---------------------------------------------------------------------------
NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation

From: Trond Myklebust <Trond.Myklebust@netapp.com>

In the case where we lock the page, and then find out that the page has
been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
This is what block_page_mkwrite() does in these situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/file.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)


diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 05bf3c0..6d95e24 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -551,7 +551,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct file *filp = vma->vm_file;
 	struct dentry *dentry = filp->f_path.dentry;
 	unsigned pagelen;
-	int ret = -EINVAL;
+	int ret = VM_FAULT_NOPAGE;
 	struct address_space *mapping;
 
 	dfprintk(PAGECACHE, "NFS: vm_page_mkwrite(%s/%s(%ld), offset %lld)\n",
@@ -567,21 +567,20 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (mapping != dentry->d_inode->i_mapping)
 		goto out_unlock;
 
-	ret = 0;
 	pagelen = nfs_page_length(page);
 	if (pagelen == 0)
 		goto out_unlock;
 
-	ret = nfs_flush_incompatible(filp, page);
-	if (ret != 0)
-		goto out_unlock;
+	ret = VM_FAULT_LOCKED;
+	if (nfs_flush_incompatible(filp, page) == 0 &&
+	    nfs_updatepage(filp, page, 0, pagelen) == 0)
+		goto out;
 
-	ret = nfs_updatepage(filp, page, 0, pagelen);
+	ret = VM_FAULT_SIGBUS;
 out_unlock:
-	if (!ret)
-		return VM_FAULT_LOCKED;
 	unlock_page(page);
-	return VM_FAULT_SIGBUS;
+out:
+	return ret;
 }
 
 static const struct vm_operations_struct nfs_file_vm_ops = {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01 18:35   ` Trond Myklebust
@ 2010-10-01 20:57     ` Benjamin Herrenschmidt
  2010-10-18 23:43       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-01 20:57 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Nick Piggin, linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

On Fri, 2010-10-01 at 14:35 -0400, Trond Myklebust wrote:
> On Fri, 2010-10-01 at 14:12 -0400, Trond Myklebust wrote:
> > However, it looks to me as if the right thing to do when the
> > page->mapping has changed would be to do the same thing as
> > block_page_mkwrite(), and just return VM_FAULT_NOPAGE so that the VM can
> > retry the fault.
> > IMO: We should only SIGBUS if the calls to nfs_flush_incompatible()
> > and/or nfs_updatepage() fail.
> > 
> > Cheers
> >   Trond
> 
> IOW: Something like the following patch.

Thanks, I'll try it but it will have to wait about 10 days as I'm off on
vacation.

I'll see if somebody here can do some tests while I'm away

Cheers,
Ben.

> Cheers
>   Trond
> 
> ---------------------------------------------------------------------------
> NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> In the case where we lock the page, and then find out that the page has
> been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
> This is what block_page_mkwrite() does in these situations.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  fs/nfs/file.c |   17 ++++++++---------
>  1 files changed, 8 insertions(+), 9 deletions(-)
> 
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 05bf3c0..6d95e24 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -551,7 +551,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	struct file *filp = vma->vm_file;
>  	struct dentry *dentry = filp->f_path.dentry;
>  	unsigned pagelen;
> -	int ret = -EINVAL;
> +	int ret = VM_FAULT_NOPAGE;
>  	struct address_space *mapping;
>  
>  	dfprintk(PAGECACHE, "NFS: vm_page_mkwrite(%s/%s(%ld), offset %lld)\n",
> @@ -567,21 +567,20 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	if (mapping != dentry->d_inode->i_mapping)
>  		goto out_unlock;
>  
> -	ret = 0;
>  	pagelen = nfs_page_length(page);
>  	if (pagelen == 0)
>  		goto out_unlock;
>  
> -	ret = nfs_flush_incompatible(filp, page);
> -	if (ret != 0)
> -		goto out_unlock;
> +	ret = VM_FAULT_LOCKED;
> +	if (nfs_flush_incompatible(filp, page) == 0 &&
> +	    nfs_updatepage(filp, page, 0, pagelen) == 0)
> +		goto out;
>  
> -	ret = nfs_updatepage(filp, page, 0, pagelen);
> +	ret = VM_FAULT_SIGBUS;
>  out_unlock:
> -	if (!ret)
> -		return VM_FAULT_LOCKED;
>  	unlock_page(page);
> -	return VM_FAULT_SIGBUS;
> +out:
> +	return ret;
>  }
>  
>  static const struct vm_operations_struct nfs_file_vm_ops = {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01 20:57     ` Benjamin Herrenschmidt
@ 2010-10-18 23:43       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-18 23:43 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Nick Piggin, linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

On Sat, 2010-10-02 at 06:57 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-10-01 at 14:35 -0400, Trond Myklebust wrote:
> > On Fri, 2010-10-01 at 14:12 -0400, Trond Myklebust wrote:
> > > However, it looks to me as if the right thing to do when the
> > > page->mapping has changed would be to do the same thing as
> > > block_page_mkwrite(), and just return VM_FAULT_NOPAGE so that the VM can
> > > retry the fault.
> > > IMO: We should only SIGBUS if the calls to nfs_flush_incompatible()
> > > and/or nfs_updatepage() fail.
> > > 
> > > Cheers
> > >   Trond
> > 
> > IOW: Something like the following patch.
> 
> Thanks, I'll try it but it will have to wait about 10 days as I'm off on
> vacation.
> 
> I'll see if somebody here can do some tests while I'm away

BTW. I got some reports from testers that the patch fixes the issue.

Thanks !

Cheers,
Ben.

> Cheers,
> Ben.
> 
> > Cheers
> >   Trond
> > 
> > ---------------------------------------------------------------------------
> > NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation
> > 
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > In the case where we lock the page, and then find out that the page has
> > been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
> > This is what block_page_mkwrite() does in these situations.
> > 
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > ---
> > 
> >  fs/nfs/file.c |   17 ++++++++---------
> >  1 files changed, 8 insertions(+), 9 deletions(-)
> > 
> > 
> > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > index 05bf3c0..6d95e24 100644
> > --- a/fs/nfs/file.c
> > +++ b/fs/nfs/file.c
> > @@ -551,7 +551,7 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	struct file *filp = vma->vm_file;
> >  	struct dentry *dentry = filp->f_path.dentry;
> >  	unsigned pagelen;
> > -	int ret = -EINVAL;
> > +	int ret = VM_FAULT_NOPAGE;
> >  	struct address_space *mapping;
> >  
> >  	dfprintk(PAGECACHE, "NFS: vm_page_mkwrite(%s/%s(%ld), offset %lld)\n",
> > @@ -567,21 +567,20 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	if (mapping != dentry->d_inode->i_mapping)
> >  		goto out_unlock;
> >  
> > -	ret = 0;
> >  	pagelen = nfs_page_length(page);
> >  	if (pagelen == 0)
> >  		goto out_unlock;
> >  
> > -	ret = nfs_flush_incompatible(filp, page);
> > -	if (ret != 0)
> > -		goto out_unlock;
> > +	ret = VM_FAULT_LOCKED;
> > +	if (nfs_flush_incompatible(filp, page) == 0 &&
> > +	    nfs_updatepage(filp, page, 0, pagelen) == 0)
> > +		goto out;
> >  
> > -	ret = nfs_updatepage(filp, page, 0, pagelen);
> > +	ret = VM_FAULT_SIGBUS;
> >  out_unlock:
> > -	if (!ret)
> > -		return VM_FAULT_LOCKED;
> >  	unlock_page(page);
> > -	return VM_FAULT_SIGBUS;
> > +out:
> > +	return ret;
> >  }
> >  
> >  static const struct vm_operations_struct nfs_file_vm_ops = {
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Odd NFS related SIGBUS (& possible fix)
  2010-10-01 18:12 ` Trond Myklebust
  2010-10-01 18:35   ` Trond Myklebust
@ 2010-10-01 20:53   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-01 20:53 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Nick Piggin, linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

On Fri, 2010-10-01 at 14:12 -0400, Trond Myklebust wrote:
> 
> However, it looks to me as if the right thing to do when the
> page->mapping has changed would be to do the same thing as
> block_page_mkwrite(), and just return VM_FAULT_NOPAGE so that the VM
> can
> retry the fault.
> IMO: We should only SIGBUS if the calls to nfs_flush_incompatible()
> and/or nfs_updatepage() fail.

Well, other filesystems seem to think that if the mapping -changed-,
SIGBUS is a good idea... But they don't have to deal with invalidations
wiping mappings in the background.

So that's why I was thinking about singling out the "mapping became
NULL" case and keep the SIGBUS for when the mapping became something
else... but that's your call really :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-10-18 23:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-29  4:33 Odd NFS related SIGBUS (& possible fix) Benjamin Herrenschmidt
2010-09-29  7:44 ` Benjamin Herrenschmidt
2010-10-01  5:57 ` Benjamin Herrenschmidt
2010-10-01  6:01   ` Benjamin Herrenschmidt
2010-10-01 17:18   ` J. Bruce Fields
2010-10-01 18:12 ` Trond Myklebust
2010-10-01 18:35   ` Trond Myklebust
2010-10-01 20:57     ` Benjamin Herrenschmidt
2010-10-18 23:43       ` Benjamin Herrenschmidt
2010-10-01 20:53   ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).