From: Bruce Fields <bfields@fieldses.org>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Larry McVoy <lm@bitmover.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Chuck Lever <chuck.lever@oracle.com>
Subject: Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
Date: Sun, 14 Oct 2012 15:39:05 -0400
Message-ID: <20121014193905.GC32420@fieldses.org>
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA9091FDED5@SACEXCMBX04-PRD.hq.netapp.com>
On Sat, Oct 13, 2012 at 02:28:39AM +0000, Myklebust, Trond wrote:
> On Sat, 2012-10-13 at 10:02 +0900, Linus Torvalds wrote:
> > On Sat, Oct 13, 2012 at 9:21 AM, Larry McVoy <lm@bitmover.com> wrote:
> > >
> > > Ahh, I've been away from the kernel too long. I miss that delicate
> > > management touch.
> >
> > "Delicate Management Touch" is my middle name.
> >
> > > pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash
> >
> > Ok, that's just the normal kind of random left-over oopses due to
> > subsequent problems of a BUG_ON(). Looks like the watchdog timer ends
> > up being unhappy, almost certainly simply because some core filesystem
> > spinlock is not being released.
> >
> > It used to be (a long long time ago) that we'd recover fairly
> > gracefully from BUG_ON()'s - back when the main shared lock we had was
> > the kernel lock, and we had a single per-process kernel lock counter.
> > So when we killed the process, we could clean that single lock up.
> >
> > These days, if some process dies in random kernel code due to a
> > BUG_ON() or a wild pointer or similar, and we kill it, we are seldom
> > able to do so cleanly. So the best we can hope for is that it happened
> > in some context where it held no (important) locks. Which is rare. So
> > BUG_ON()'s are often fatal, and there are these kinds of downstream
> > problems where they get flushed off the screen by subsequent issues...
>
> If that code is being called under a lock, then we have other problems.
> It is standard XDR code: it should always be called from an ordinary
> process context with no special locks being held by the callers.
>
> > Ho humm. Google doesn't seem to be finding any similar bug-reports, so
> > unless Bruce or Trond go "Ahh, I know what it's about", I do think we
> > would want to get as much more info as possible.
>
> Never seen it before, and I see no reason why it should drag the entire
> box down with it. It is part of the NLM server's callback code, so there
> is no chance of it being called as part of a memory reclaim or anything
> similarly sensitive to the rest of the box.
>
> Are we sure that this BUG_ON() really is top of the chain of Oopses
> here? All I can see it doing is crashing the lockd server process,

Can't it be called from the rpciod workqueue? I'm not sure what happens
when we hit a BUG there.

It looks like a bunch of BUG_ON's got added with an xdr rewrite in
2b061f9ef216b6d229b06267f188167fd6ab3d9b. Maybe Chuck or someone should
do a 'git grep BUG fs/lockd' and figure out what those should be
instead?

And I need to do the same for nfsd; I've been sloppy about using them as
asserts.
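
As a rough illustration of what "instead" might look like, here is a
userspace sketch, not the lockd code. The names encode_blob, MAX_BLOB,
and warn_once are all made up for the example; warn_once is only a
stand-in for something like the kernel's WARN_ON_ONCE(). The idea is
that an assert-style length check becomes a logged warning plus an
error return, so one bad request fails without taking the thread down.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_BLOB 8	/* hypothetical limit standing in for a protocol maximum */

/* Userspace stand-in for a warn-once check: print on the first failure
 * only, and always return the truth value of the condition. */
static int warn_once_hit;

static int warn_once(int cond, const char *what)
{
	if (cond && !warn_once_hit) {
		warn_once_hit = 1;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/* Copy len bytes into buf, refusing oversized input instead of aborting. */
static int encode_blob(unsigned char *buf, const unsigned char *data,
		       size_t len)
{
	if (warn_once(len > MAX_BLOB, "blob too long"))
		return -EINVAL;	/* caller can fail just this request */
	memcpy(buf, data, len);
	return 0;
}
```

Whether a given check should warn and return an error, or simply go
away, would have to be decided per call site; the sketch only shows the
first option.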
--b.
> which
> will seriously inconvenience all the NFS clients trying to do locking,
> but it shouldn't be affecting the swapper process as we're seeing in the
> Oops screenshots.
> If it really is the first thing to Oops, then the only thing I can think
> of there that would trigger other Oopses would be a memory corruption
> (use after free or some such thing?). Perhaps Larry could try turning on
> some of the less intrusive slab debugging options?
>
> > Doing a kernel compile really isn't that bad. The only nasty piece is
> > getting the kernel configuration right, but you can just use the
> > distro config. It's much too big and contains everything, but it will
> > work, and gets you as similar a kernel as possible. Of course, Ubuntu
> > has made installing your own kernel stupidly complicated (you have to
> > build a package and install it using the package manager), but while
> > it's an annoying extra step or two (compared to just doing a "make
> > modules_install install"), it's not rocket surgery. There's a few help
> > pages for it:
> >
> > https://help.ubuntu.com/community/Kernel/Compile
> >
> > being the first one.
> >
> > Linus
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com