linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Nick Piggin <npiggin@kernel.dk>,
	Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org
Subject: Odd NFS related SIGBUS (& possible fix)
Date: Wed, 29 Sep 2010 14:33:45 +1000	[thread overview]
Message-ID: <1285734825.14081.87.camel@pasglop> (raw)

Hi Nick, Trond !

I've been tracking a problem on a heavily SMP machine here where running
LTP "mmapstress01" spawning 64 CPUs with /tmp over NFS causes some of
the tests to sigbus.

The test itself is a relatively boring mmap+fork hammering test.

What I've tracked down so far is that it seems to SIGBUS due to the
statement in nfs_vm_page_mkwrite()

	mapping = page->mapping;
	if (mapping != dentry->d_inode->i_mapping)
		goto out_unlock;

Which will then hit

	return VM_FAULT_SIGBUS;

Now, while I understand the validity of that test if the mapping indeed
-changed-, in the case I'm hitting it's been merely invalidated.

IE. page->mapping is NULL, as a result of something in NFS deciding to
go through one of the gazillion code path that invalidate mappings (in
this case, an mtime change on the server.

Now, I think -this- root cause is bogus and will need some separate
debugging, but regardless, I don't see why at this stage, page_mkwrite()
should cause a SIGBUS if the file has changed on the server, since we
have pushed our our dirty mappings afaik, and so all that tells is is
that we raced with the cache invalidation while the struct page wasn't
locked.

So I'm wondering if the right solution shouldn't be to replay the fault
in that case instead.

Now, I initially thought about returning 0; and hitting the following
code path in __do_fault() but...

				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
					lock_page(page);
					if (!page->mapping) {
						ret = 0; /* retry the fault */
						unlock_page(page);
						goto unwritable_page;
					}

 ... I'm not too happy about it and I'll need Nick insight here. The thing
is that to hit there, I need to unlock the page first. That means page->mapping
can change, and thus no longer be NULL by the time we get there, in which case
it doesn't sound right at all to move on and make the page writable, which
the code would do. Or am I missing something ?

So my preferred fix, if I'm indeed right and this is a real bug, would be
to do something in nfs_vm_page_mkwrite() along the lines of:

 	lock_page(page);
 	mapping = page->mapping;
-	if (mapping != dentry->d_inode->i_mapping)
+ 	if (mapping != dentry->d_inode->i_mapping) {
+		if (!mapping)
+			ret = 0;
 		goto out_unlock;
+	}

Or am I missing something ?

Now regarding the other bug, unless Trond has an idea already, I think I'll start
a separate email thread once I've collected more data. I -think- it invalidates it
because it sees a the server mtime that is more recent than the inode, but the
server shouldn't be touching at files, so I suspect we get confused somewhere in
the kernel and I don't know why yet (the code path inside NFS aren't obvious to
me at this stage).

Cheers,
Ben.



             reply	other threads:[~2010-09-29  4:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-29  4:33 Benjamin Herrenschmidt [this message]
2010-09-29  7:44 ` Odd NFS related SIGBUS (& possible fix) Benjamin Herrenschmidt
2010-10-01  5:57 ` Benjamin Herrenschmidt
2010-10-01  6:01   ` Benjamin Herrenschmidt
2010-10-01 17:18   ` J. Bruce Fields
2010-10-01 18:12 ` Trond Myklebust
2010-10-01 18:35   ` Trond Myklebust
2010-10-01 20:57     ` Benjamin Herrenschmidt
2010-10-18 23:43       ` Benjamin Herrenschmidt
2010-10-01 20:53   ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1285734825.14081.87.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=trond.myklebust@fys.uio.no \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).