linux-fsdevel.vger.kernel.org archive mirror
* Odd NFS related SIGBUS (& possible fix)
From: Benjamin Herrenschmidt @ 2010-09-29  4:33 UTC (permalink / raw)
  To: Nick Piggin, Trond Myklebust
  Cc: linux-kernel@vger.kernel.org, Al Viro, linux-fsdevel

Hi Nick, Trond !

I've been tracking a problem on a heavily SMP machine here, where running
LTP "mmapstress01" on 64 CPUs with /tmp over NFS causes some of the tests
to die with SIGBUS.

The test itself is a relatively boring mmap+fork hammering test.

What I've tracked down so far is that the SIGBUS seems to come from this
check in nfs_vm_page_mkwrite():

	mapping = page->mapping;
	if (mapping != dentry->d_inode->i_mapping)
		goto out_unlock;

Which will then hit

	return VM_FAULT_SIGBUS;

Now, while I understand the validity of that test if the mapping indeed
-changed-, in the case I'm hitting it's been merely invalidated.

I.e. page->mapping is NULL, as a result of something in NFS deciding to
go through one of the gazillion code paths that invalidate mappings (in
this case, an mtime change on the server).

Now, I think -this- root cause is bogus and will need some separate
debugging, but regardless, I don't see why at this stage page_mkwrite()
should cause a SIGBUS if the file has changed on the server, since we
have pushed out our dirty mappings afaik, and so all that tells us is
that we raced with the cache invalidation while the struct page wasn't
locked.

So I'm wondering if the right solution shouldn't be to replay the fault
in that case instead.

Now, I initially thought about returning 0; and hitting the following
code path in __do_fault() but...

				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
					lock_page(page);
					if (!page->mapping) {
						ret = 0; /* retry the fault */
						unlock_page(page);
						goto unwritable_page;
					}

 ... I'm not too happy about it and I'll need Nick's insight here. The thing
is that to get there, I need to unlock the page first. That means page->mapping
can change, and thus no longer be NULL by the time we get there, in which case
it doesn't sound right at all to move on and make the page writable, which
the code would do. Or am I missing something ?
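
To spell out the window described above, here is a small single-threaded,
user-space model. It is only a sketch: struct page and struct address_space
are hypothetical stand-ins with a single field, recheck_would_retry() and
simulate_window() are made-up names, and whether page->mapping can really go
from NULL to non-NULL in that window is exactly the open question, not
something this code demonstrates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct address_space { int unused; };
struct page { struct address_space *mapping; };

/* Models only the re-check done after re-locking: retry iff mapping is NULL. */
static bool recheck_would_retry(const struct page *page)
{
	return page->mapping == NULL;
}

/*
 * Replays the interleaving in a single thread.
 * Returns true if the fault would be retried, false if the model would
 * carry on and make the page writable.
 */
static bool simulate_window(struct page *page, struct address_space *later)
{
	/* 1. page_mkwrite saw an invalidated page and returned 0, unlocked. */
	page->mapping = NULL;

	/* 2. Assumed for illustration: the mapping changes while unlocked. */
	page->mapping = later;

	/* 3. The caller re-locks the page and re-checks page->mapping. */
	return recheck_would_retry(page);
}
```

With later == NULL the model retries the fault; with any non-NULL value the
re-check passes and the page would be made writable, which is the case that
bothers me.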

So my preferred fix, if I'm indeed right and this is a real bug, would be
to do something in nfs_vm_page_mkwrite() along the lines of:

 	lock_page(page);
 	mapping = page->mapping;
-	if (mapping != dentry->d_inode->i_mapping)
+	if (mapping != dentry->d_inode->i_mapping) {
+		if (!mapping)
+			ret = 0;
 		goto out_unlock;
+	}

Or am I missing something ?
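
For reference, the three outcomes the change above is meant to distinguish
can be modelled in user space as follows. This is a minimal sketch, not
kernel code: the struct definitions are hypothetical stand-ins, and the enum
and classify_page() are names invented for this illustration (the real
function returns VM_FAULT_* values).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct address_space { int unused; };
struct page { struct address_space *mapping; };

/* Invented result codes for this model only. */
enum mkwrite_result {
	MKWRITE_OK,	/* page still belongs to the file: proceed */
	MKWRITE_RETRY,	/* page was invalidated (NULL mapping): replay the fault */
	MKWRITE_SIGBUS	/* page belongs to some other mapping: fail */
};

/* Classify a locked page against the mapping of the file being faulted. */
static enum mkwrite_result classify_page(const struct page *page,
					 const struct address_space *file_mapping)
{
	if (page->mapping == file_mapping)
		return MKWRITE_OK;
	if (page->mapping == NULL)
		return MKWRITE_RETRY;
	return MKWRITE_SIGBUS;
}
```

The point of keeping the NULL case separate is that an invalidated page only
tells us we lost a race with the invalidation, whereas a page attached to a
different mapping is a genuine mismatch.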

Now regarding the other bug, unless Trond has an idea already, I think I'll start
a separate email thread once I've collected more data. I -think- it invalidates the
mapping because it sees a server mtime that is more recent than the inode's, but the
server shouldn't be touching the files, so I suspect we get confused somewhere in
the kernel and I don't know why yet (the code paths inside NFS aren't obvious to
me at this stage).

Cheers,
Ben.




