All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
       [not found] <20121012211701.GA8301@bitmover.com>
@ 2012-10-12 23:52 ` Linus Torvalds
  2012-10-13  0:21   ` Larry McVoy
  2012-10-13  2:27   ` Boaz Harrosh
  0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-10-12 23:52 UTC (permalink / raw)
  To: Bruce Fields, Trond Myklebust, Linux NFS Mailing List; +Cc: Larry McVoy

Guys, check this report from Larry out.

Also, why the *HELL* is that a BUG_ON() in the first place? Who was
the less-than-gifted person who decided "if this thing can happen,
let's just kill the whole machine"?

BUG_ON()'s are for machine-killing problems, not for some random
assert() in the code. If the thing can never happen, then the BUG_ON()
shouldn't have existed in the first place. And if you are worried
about it happening (like it clearly did for Larry), ou should have
handled it, since clearly this is *not* a machine-killing worthy
problem.

IOW, something like

        WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);

would have been way more appropriate. Let people know that there is a
problem, but don't kill the machine.

We have way too many people who seem to think that "I don't know what
I should do, so I'll just kill the machine" is a sane option. It's
not. Sure, often a BUG_ON() is survivable (just killing the process),
but in filesystem code there's usually a lock (or many) that tends to
make it problematic even when we can just kill the process, and often
causes these things to not even be logged very well.

Larry, the stack trace and registers would be useful. Picture or a
full dump of the BUG_ON() if it got logged? If it gets eaten by the
machine being unresponsive after the event and since you can reproduce
it, you could just try to change it to the WARN_ON_ONCE() above, and
then it should be easier to just get out of the dmesg, since hopefully
the machine stays up despite the odd status value..

                  Linus

On Sat, Oct 13, 2012 at 6:17 AM, Larry McVoy <lm@bitmover.com> wrote:
> I've got a reproduce-at-will crash, it's starting skype w/ /home nfs mounted
> on an x86 mac.  I can text you the stack trace if you like.
> --
> ---
> Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-12 23:52 ` kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226! Linus Torvalds
@ 2012-10-13  0:21   ` Larry McVoy
  2012-10-13  1:02     ` Linus Torvalds
  2012-10-13  2:27   ` Boaz Harrosh
  1 sibling, 1 reply; 43+ messages in thread
From: Larry McVoy @ 2012-10-13  0:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List,
	Larry McVoy

On Sat, Oct 13, 2012 at 08:52:44AM +0900, Linus Torvalds wrote:
> Also, why the *HELL* is that a BUG_ON() in the first place? Who was
> the less-than-gifted person who decided "if this thing can happen,
> let's just kill the whole machine"?

Ahh, I've been away from the kernel too long.  I miss that delicate
management touch.

> Larry, the stack trace and registers would be useful. Picture or a
> full dump of the BUG_ON() if it got logged? If it gets eaten by the
> machine being unresponsive after the event and since you can reproduce
> it, you could just try to change it to the WARN_ON_ONCE() above, and
> then it should be easier to just get out of the dmesg, since hopefully
> the machine stays up despite the odd status value..

Been a while since I've built a kernel and this is our production file
server, it goes down and our whole company stops.  As surprising as
it might sound, given git's success, we're still busy so crashing the
server isn't fun :)

pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash

-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  0:21   ` Larry McVoy
@ 2012-10-13  1:02     ` Linus Torvalds
  2012-10-13  1:36       ` Jim Rees
  2012-10-13  2:28       ` Myklebust, Trond
  0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-10-13  1:02 UTC (permalink / raw)
  To: Larry McVoy, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

On Sat, Oct 13, 2012 at 9:21 AM, Larry McVoy <lm@bitmover.com> wrote:
>
> Ahh, I've been away from the kernel too long.  I miss that delicate
> management touch.

"Delicate Management Touch" is my middle name.

> pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash

Ok, that's just the normal kind of random left-over oopses due to
subsequent problems of a BUG_ON(). Looks like the watchdog timer ends
up being unhappy, almost certainly simply because some core filesystem
spinlock not being released.

It used to be (a long long time ago) that we'd recover fairly
gracefully from BUG_ON()'s - back when the main shared lock we had was
the kernel lock, and we had a single per-process kernel lock counter.
So when we killed the process, we could clean that single lock up.

These days, if some process dies in random kernel code due to a
BUG_ON() or a wild pointer or similar, and we kill it, we are seldom
able to do so cleanly. So the best we can hope for is that it happened
in some context where it held no (important) locks. Which is rare. So
BUG_ON()'s are often fatal, and there are these kinds of downstream
problems where they get flushed off the screen by subsequent issues...

Ho humm. Google doesn't seem to be finding any similar bug-reports, so
unless Bruce or Trond go "Ahh, I know what it's about", I do think we
would want to get as much more info as possible.

Doing a kernel compile really isn't that bad. The only nasty piece is
getting the kernel configuration right, but you can just use the
distro config. It's much too big and contains everything, but it will
work, and gets you as similar a kernel as possible. Of course, Ubuntu
has made installing your own kernel stupidly complicated (you have to
build a package and install it using the package manager), but while
it's an annoying extra step or two (compared to just doing a "make
modules_install install"), it's not rocket surgery. There's a few help
pages for it:

    https://help.ubuntu.com/community/Kernel/Compile

being the first one.

                Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  1:02     ` Linus Torvalds
@ 2012-10-13  1:36       ` Jim Rees
  2012-10-13  1:45         ` Linus Torvalds
  2012-10-13  2:08         ` Boaz Harrosh
  2012-10-13  2:28       ` Myklebust, Trond
  1 sibling, 2 replies; 43+ messages in thread
From: Jim Rees @ 2012-10-13  1:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Larry McVoy, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

Linus Torvalds wrote:

  Doing a kernel compile really isn't that bad. The only nasty piece is
  getting the kernel configuration right, but you can just use the
  distro config. It's much too big and contains everything, but it will
  work, and gets you as similar a kernel as possible. Of course, Ubuntu
  has made installing your own kernel stupidly complicated (you have to
  build a package and install it using the package manager), but while
  it's an annoying extra step or two (compared to just doing a "make
  modules_install install"), it's not rocket surgery.

I install kernels on Ubuntu every day of the week and twice on Sundays, and
never jump through their silly hoops.  The only magic is making the
initramfs, if you use one.

make install modules_install
update-initramfs -c -k 3.6.1 (or whatever)
update-grub

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  1:36       ` Jim Rees
@ 2012-10-13  1:45         ` Linus Torvalds
  2012-10-13 21:31           ` Daniel Kahn Gillmor
  2012-10-13  2:08         ` Boaz Harrosh
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-10-13  1:45 UTC (permalink / raw)
  To: Jim Rees; +Cc: Larry McVoy, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

On Sat, Oct 13, 2012 at 10:36 AM, Jim Rees <rees@umich.edu> wrote:
>
> I install kernels on Ubuntu every day of the week and twice on Sundays, and
> never jump through their silly hoops.  The only magic is making the
> initramfs, if you use one.
>
> make install modules_install
> update-initramfs -c -k 3.6.1 (or whatever)
> update-grub

This magic is what the distro "/usr/sbin/installkernel" script is
supposed to do, which the kernel Makefile does for the "install"
target.

So the "make install" *should* have done the grub and initramfs
update. Ubuntu screwed up (probably because Debian did).

I wish they'd just fix it.  The whole reason the kernel does that
/usr/sbin/installkernel thing is so that distributions can do whatever
magic it is they want to do for themselves.

              Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  1:36       ` Jim Rees
  2012-10-13  1:45         ` Linus Torvalds
@ 2012-10-13  2:08         ` Boaz Harrosh
  1 sibling, 0 replies; 43+ messages in thread
From: Boaz Harrosh @ 2012-10-13  2:08 UTC (permalink / raw)
  To: Jim Rees
  Cc: Linus Torvalds, Larry McVoy, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

On 10/12/2012 06:36 PM, Jim Rees wrote:
> Linus Torvalds wrote:
> 
>   Doing a kernel compile really isn't that bad. The only nasty piece is
>   getting the kernel configuration right, but you can just use the
>   distro config. It's much too big and contains everything, but it will
>   work, and gets you as similar a kernel as possible. Of course, Ubuntu
>   has made installing your own kernel stupidly complicated (you have to
>   build a package and install it using the package manager), but while
>   it's an annoying extra step or two (compared to just doing a "make
>   modules_install install"), it's not rocket surgery.
> 
> I install kernels on Ubuntu every day of the week and twice on Sundays, and
> never jump through their silly hoops.  The only magic is making the
> initramfs, if you use one.
> 
> make install modules_install
> update-initramfs -c -k 3.6.1 (or whatever)

I thought "make install" is suppose to call a distro specific
driver script to do exactly that initramfs thing.

Some good sole should just fix it up on Broken ubuntu. I would
but I don't use ubuntu I use Fedora.

Just "which installkernel" usually /sbin/installkernel and fix
in there, I think.

> update-grub

And that too, it's a new grub2 thing, right

Cheers
Boaz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-12 23:52 ` kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226! Linus Torvalds
  2012-10-13  0:21   ` Larry McVoy
@ 2012-10-13  2:27   ` Boaz Harrosh
  2012-10-13  2:30     ` Larry McVoy
                       ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Boaz Harrosh @ 2012-10-13  2:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List,
	Larry McVoy

On 10/12/2012 04:52 PM, Linus Torvalds wrote:
> Guys, check this report from Larry out.
> 
> Also, why the *HELL* is that a BUG_ON() in the first place? Who was
> the less-than-gifted person who decided "if this thing can happen,
> let's just kill the whole machine"?
> 

Something is trivially weird

fs/lockd/clntxdr.c:226 is in this static function:
	encode_nlm_stat()

encode_nlm_stat() is called in two places
	static void nlm_xdr_enc_res(...)
  and
	static void nlm_xdr_enc_testres(...)

But these two are not called anywhere. In-fact a Kernel wide grep
returns a single occurrence of both

I suspect the BUG_ON() belongs else where.
Perhaps we should add __func__ to the BUG_ON macro exactly for times
that source and Kernel get out of sync.

Puzzled
Boaz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  1:02     ` Linus Torvalds
  2012-10-13  1:36       ` Jim Rees
@ 2012-10-13  2:28       ` Myklebust, Trond
  2012-10-13  2:31         ` Larry McVoy
  2012-10-14 19:39         ` Bruce Fields
  1 sibling, 2 replies; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-13  2:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Larry McVoy, Bruce Fields, Linux NFS Mailing List

T24gU2F0LCAyMDEyLTEwLTEzIGF0IDEwOjAyICswOTAwLCBMaW51cyBUb3J2YWxkcyB3cm90ZToN
Cj4gT24gU2F0LCBPY3QgMTMsIDIwMTIgYXQgOToyMSBBTSwgTGFycnkgTWNWb3kgPGxtQGJpdG1v
dmVyLmNvbT4gd3JvdGU6DQo+ID4NCj4gPiBBaGgsIEkndmUgYmVlbiBhd2F5IGZyb20gdGhlIGtl
cm5lbCB0b28gbG9uZy4gIEkgbWlzcyB0aGF0IGRlbGljYXRlDQo+ID4gbWFuYWdlbWVudCB0b3Vj
aC4NCj4gDQo+ICJEZWxpY2F0ZSBNYW5hZ2VtZW50IFRvdWNoIiBpcyBteSBtaWRkbGUgbmFtZS4N
Cj4gDQo+ID4gcGljcyBvZiB0aGUgc3RhY2sgdHJhY2UgYXQgaHR0cDovL3d3dy5tY3ZveS5jb20v
bG0vbmZzLWxvY2stY3Jhc2gNCj4gDQo+IE9rLCB0aGF0J3MganVzdCB0aGUgbm9ybWFsIGtpbmQg
b2YgcmFuZG9tIGxlZnQtb3ZlciBvb3BzZXMgZHVlIHRvDQo+IHN1YnNlcXVlbnQgcHJvYmxlbXMg
b2YgYSBCVUdfT04oKS4gTG9va3MgbGlrZSB0aGUgd2F0Y2hkb2cgdGltZXIgZW5kcw0KPiB1cCBi
ZWluZyB1bmhhcHB5LCBhbG1vc3QgY2VydGFpbmx5IHNpbXBseSBiZWNhdXNlIHNvbWUgY29yZSBm
aWxlc3lzdGVtDQo+IHNwaW5sb2NrIG5vdCBiZWluZyByZWxlYXNlZC4NCj4gDQo+IEl0IHVzZWQg
dG8gYmUgKGEgbG9uZyBsb25nIHRpbWUgYWdvKSB0aGF0IHdlJ2QgcmVjb3ZlciBmYWlybHkNCj4g
Z3JhY2VmdWxseSBmcm9tIEJVR19PTigpJ3MgLSBiYWNrIHdoZW4gdGhlIG1haW4gc2hhcmVkIGxv
Y2sgd2UgaGFkIHdhcw0KPiB0aGUga2VybmVsIGxvY2ssIGFuZCB3ZSBoYWQgYSBzaW5nbGUgcGVy
LXByb2Nlc3Mga2VybmVsIGxvY2sgY291bnRlci4NCj4gU28gd2hlbiB3ZSBraWxsZWQgdGhlIHBy
b2Nlc3MsIHdlIGNvdWxkIGNsZWFuIHRoYXQgc2luZ2xlIGxvY2sgdXAuDQo+IA0KPiBUaGVzZSBk
YXlzLCBpZiBzb21lIHByb2Nlc3MgZGllcyBpbiByYW5kb20ga2VybmVsIGNvZGUgZHVlIHRvIGEN
Cj4gQlVHX09OKCkgb3IgYSB3aWxkIHBvaW50ZXIgb3Igc2ltaWxhciwgYW5kIHdlIGtpbGwgaXQs
IHdlIGFyZSBzZWxkb20NCj4gYWJsZSB0byBkbyBzbyBjbGVhbmx5LiBTbyB0aGUgYmVzdCB3ZSBj
YW4gaG9wZSBmb3IgaXMgdGhhdCBpdCBoYXBwZW5lZA0KPiBpbiBzb21lIGNvbnRleHQgd2hlcmUg
aXQgaGVsZCBubyAoaW1wb3J0YW50KSBsb2Nrcy4gV2hpY2ggaXMgcmFyZS4gU28NCj4gQlVHX09O
KCkncyBhcmUgb2Z0ZW4gZmF0YWwsIGFuZCB0aGVyZSBhcmUgdGhlc2Uga2luZHMgb2YgZG93bnN0
cmVhbQ0KPiBwcm9ibGVtcyB3aGVyZSB0aGV5IGdldCBmbHVzaGVkIG9mZiB0aGUgc2NyZWVuIGJ5
IHN1YnNlcXVlbnQgaXNzdWVzLi4uDQoNCklmIHRoYXQgY29kZSBpcyBiZWluZyBjYWxsZWQgdW5k
ZXIgYSBsb2NrLCB0aGVuIHdlIGhhdmUgb3RoZXIgcHJvYmxlbXMuDQpJdCBpcyBzdGFuZGFyZCBY
RFIgY29kZTogaXQgc2hvdWxkIGFsd2F5cyBiZSBjYWxsZWQgZnJvbSBhbiBvcmRpbmFyeQ0KcHJv
Y2VzcyBjb250ZXh0IHdpdGggbm8gc3BlY2lhbCBsb2NrcyBiZWluZyBoZWxkIGJ5IHRoZSBjYWxs
ZXJzLg0KDQo+IEhvIGh1bW0uIEdvb2dsZSBkb2Vzbid0IHNlZW0gdG8gYmUgZmluZGluZyBhbnkg
c2ltaWxhciBidWctcmVwb3J0cywgc28NCj4gdW5sZXNzIEJydWNlIG9yIFRyb25kIGdvICJBaGgs
IEkga25vdyB3aGF0IGl0J3MgYWJvdXQiLCBJIGRvIHRoaW5rIHdlDQo+IHdvdWxkIHdhbnQgdG8g
Z2V0IGFzIG11Y2ggbW9yZSBpbmZvIGFzIHBvc3NpYmxlLg0KDQpOZXZlciBzZWVuIGl0IGJlZm9y
ZSwgYW5kIEkgc2VlIG5vIHJlYXNvbiB3aHkgaXQgc2hvdWxkIGRyYWcgdGhlIGVudGlyZQ0KYm94
IGRvd24gd2l0aCBpdC4gSXQgaXMgcGFydCBvZiB0aGUgTkxNIHNlcnZlcidzIGNhbGxiYWNrIGNv
ZGUsIHNvIHRoZXJlDQppcyBubyBjaGFuY2Ugb2YgaXQgYmVpbmcgY2FsbGVkIGFzIHBhcnQgb2Yg
YSBtZW1vcnkgcmVjbGFpbSBvciBhbnl0aGluZw0Kc2ltaWxhcmx5IHNlbnNpdGl2ZSB0byB0aGUg
cmVzdCBvZiB0aGUgYm94Lg0KDQpBcmUgd2Ugc3VyZSB0aGF0IHRoaXMgQlVHX09OKCkgcmVhbGx5
IGlzIHRvcCBvZiB0aGUgY2hhaW4gb2YgT29wc2VzDQpoZXJlPyBBbGwgSSBjYW4gc2VlIGl0IGRv
aW5nIGlzIGNyYXNoaW5nIHRoZSBsb2NrZCBzZXJ2ZXIgcHJvY2Vzcywgd2hpY2gNCndpbGwgc2Vy
aW91c2x5IGluY29udmVuaWVuY2UgYWxsIHRoZSBORlMgY2xpZW50cyB0cnlpbmcgdG8gZG8gbG9j
a2luZywNCmJ1dCBpdCBzaG91bGRuJ3QgYmUgYWZmZWN0aW5nIHRoZSBzd2FwcGVyIHByb2Nlc3Mg
YXMgd2UncmUgc2VlaW5nIGluIHRoZQ0KT29wcyBzY3JlZW5zaG90cy4NCklmIGl0IHJlYWxseSBp
cyB0aGUgZmlyc3QgdGhpbmcgdG8gT29wcywgdGhlbiB0aGUgb25seSB0aGluZyBJIGNhbiB0aGlu
aw0Kb2YgdGhlcmUgdGhhdCB3b3VsZCB0cmlnZ2VyIG90aGVyIE9vcHNlcyB3b3VsZCBiZSBhIG1l
bW9yeSBjb3JydXB0aW9uDQoodXNlIGFmdGVyIGZyZWUgb3Igc29tZSBzdWNoIHRoaW5nPykuIFBl
cmhhcHMgTGFycnkgY291bGQgdHJ5IHR1cm5pbmcgb24NCnNvbWUgb2YgdGhlIGxlc3MgaW50cnVz
aXZlIHNsYWIgZGVidWdnaW5nIG9wdGlvbnM/DQoNCj4gRG9pbmcgYSBrZXJuZWwgY29tcGlsZSBy
ZWFsbHkgaXNuJ3QgdGhhdCBiYWQuIFRoZSBvbmx5IG5hc3R5IHBpZWNlIGlzDQo+IGdldHRpbmcg
dGhlIGtlcm5lbCBjb25maWd1cmF0aW9uIHJpZ2h0LCBidXQgeW91IGNhbiBqdXN0IHVzZSB0aGUN
Cj4gZGlzdHJvIGNvbmZpZy4gSXQncyBtdWNoIHRvbyBiaWcgYW5kIGNvbnRhaW5zIGV2ZXJ5dGhp
bmcsIGJ1dCBpdCB3aWxsDQo+IHdvcmssIGFuZCBnZXRzIHlvdSBhcyBzaW1pbGFyIGEga2VybmVs
IGFzIHBvc3NpYmxlLiBPZiBjb3Vyc2UsIFVidW50dQ0KPiBoYXMgbWFkZSBpbnN0YWxsaW5nIHlv
dXIgb3duIGtlcm5lbCBzdHVwaWRseSBjb21wbGljYXRlZCAoeW91IGhhdmUgdG8NCj4gYnVpbGQg
YSBwYWNrYWdlIGFuZCBpbnN0YWxsIGl0IHVzaW5nIHRoZSBwYWNrYWdlIG1hbmFnZXIpLCBidXQg
d2hpbGUNCj4gaXQncyBhbiBhbm5veWluZyBleHRyYSBzdGVwIG9yIHR3byAoY29tcGFyZWQgdG8g
anVzdCBkb2luZyBhICJtYWtlDQo+IG1vZHVsZXNfaW5zdGFsbCBpbnN0YWxsIiksIGl0J3Mgbm90
IHJvY2tldCBzdXJnZXJ5LiBUaGVyZSdzIGEgZmV3IGhlbHANCj4gcGFnZXMgZm9yIGl0Og0KPiAN
Cj4gICAgIGh0dHBzOi8vaGVscC51YnVudHUuY29tL2NvbW11bml0eS9LZXJuZWwvQ29tcGlsZQ0K
PiANCj4gYmVpbmcgdGhlIGZpcnN0IG9uZS4NCj4gDQo+ICAgICAgICAgICAgICAgICBMaW51cw0K
DQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5l
dEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQo=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:27   ` Boaz Harrosh
@ 2012-10-13  2:30     ` Larry McVoy
  2012-10-13  2:32     ` Myklebust, Trond
  2012-10-13  2:37     ` Linus Torvalds
  2 siblings, 0 replies; 43+ messages in thread
From: Larry McVoy @ 2012-10-13  2:30 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Linus Torvalds, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List, Larry McVoy

In BitKeeper, we wacked perror() to print out file, line.  Saved us
a ton of debugging.  Oh, and bk-version as well.  That helps.

On Fri, Oct 12, 2012 at 07:27:08PM -0700, Boaz Harrosh wrote:
> On 10/12/2012 04:52 PM, Linus Torvalds wrote:
> > Guys, check this report from Larry out.
> > 
> > Also, why the *HELL* is that a BUG_ON() in the first place? Who was
> > the less-than-gifted person who decided "if this thing can happen,
> > let's just kill the whole machine"?
> > 
> 
> Something is trivially weird
> 
> fs/lockd/clntxdr.c:226 is in this static function:
> 	encode_nlm_stat()
> 
> encode_nlm_stat() is called in two places
> 	static void nlm_xdr_enc_res(...)
>   and
> 	static void nlm_xdr_enc_testres(...)
> 
> But these two are not called anywhere. In-fact a Kernel wide grep
> returns a single occurrence of both
> 
> I suspect the BUG_ON() belongs else where.
> Perhaps we should add __func__ to the BUG_ON macro exactly for times
> that source and Kernel get out of sync.
> 
> Puzzled
> Boaz

-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:28       ` Myklebust, Trond
@ 2012-10-13  2:31         ` Larry McVoy
  2012-10-13  2:52           ` Myklebust, Trond
  2012-10-14 19:39         ` Bruce Fields
  1 sibling, 1 reply; 43+ messages in thread
From: Larry McVoy @ 2012-10-13  2:31 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Linus Torvalds, Larry McVoy, Bruce Fields, Linux NFS Mailing List

As I've said, debugging this is going to be hard, it means stopping my
company.  We can do one more crash, you guys can tell me what to do,
but that's about it.

On Sat, Oct 13, 2012 at 02:28:39AM +0000, Myklebust, Trond wrote:
> On Sat, 2012-10-13 at 10:02 +0900, Linus Torvalds wrote:
> > On Sat, Oct 13, 2012 at 9:21 AM, Larry McVoy <lm@bitmover.com> wrote:
> > >
> > > Ahh, I've been away from the kernel too long.  I miss that delicate
> > > management touch.
> > 
> > "Delicate Management Touch" is my middle name.
> > 
> > > pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash
> > 
> > Ok, that's just the normal kind of random left-over oopses due to
> > subsequent problems of a BUG_ON(). Looks like the watchdog timer ends
> > up being unhappy, almost certainly simply because some core filesystem
> > spinlock not being released.
> > 
> > It used to be (a long long time ago) that we'd recover fairly
> > gracefully from BUG_ON()'s - back when the main shared lock we had was
> > the kernel lock, and we had a single per-process kernel lock counter.
> > So when we killed the process, we could clean that single lock up.
> > 
> > These days, if some process dies in random kernel code due to a
> > BUG_ON() or a wild pointer or similar, and we kill it, we are seldom
> > able to do so cleanly. So the best we can hope for is that it happened
> > in some context where it held no (important) locks. Which is rare. So
> > BUG_ON()'s are often fatal, and there are these kinds of downstream
> > problems where they get flushed off the screen by subsequent issues...
> 
> If that code is being called under a lock, then we have other problems.
> It is standard XDR code: it should always be called from an ordinary
> process context with no special locks being held by the callers.
> 
> > Ho humm. Google doesn't seem to be finding any similar bug-reports, so
> > unless Bruce or Trond go "Ahh, I know what it's about", I do think we
> > would want to get as much more info as possible.
> 
> Never seen it before, and I see no reason why it should drag the entire
> box down with it. It is part of the NLM server's callback code, so there
> is no chance of it being called as part of a memory reclaim or anything
> similarly sensitive to the rest of the box.
> 
> Are we sure that this BUG_ON() really is top of the chain of Oopses
> here? All I can see it doing is crashing the lockd server process, which
> will seriously inconvenience all the NFS clients trying to do locking,
> but it shouldn't be affecting the swapper process as we're seeing in the
> Oops screenshots.
> If it really is the first thing to Oops, then the only thing I can think
> of there that would trigger other Oopses would be a memory corruption
> (use after free or some such thing?). Perhaps Larry could try turning on
> some of the less intrusive slab debugging options?
> 
> > Doing a kernel compile really isn't that bad. The only nasty piece is
> > getting the kernel configuration right, but you can just use the
> > distro config. It's much too big and contains everything, but it will
> > work, and gets you as similar a kernel as possible. Of course, Ubuntu
> > has made installing your own kernel stupidly complicated (you have to
> > build a package and install it using the package manager), but while
> > it's an annoying extra step or two (compared to just doing a "make
> > modules_install install"), it's not rocket surgery. There's a few help
> > pages for it:
> > 
> >     https://help.ubuntu.com/community/Kernel/Compile
> > 
> > being the first one.
> > 
> >                 Linus
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com

-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:27   ` Boaz Harrosh
  2012-10-13  2:30     ` Larry McVoy
@ 2012-10-13  2:32     ` Myklebust, Trond
  2012-10-13  2:39       ` Boaz Harrosh
  2012-10-13  2:37     ` Linus Torvalds
  2 siblings, 1 reply; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-13  2:32 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Linus Torvalds, Bruce Fields, Linux NFS Mailing List, Larry McVoy

T24gRnJpLCAyMDEyLTEwLTEyIGF0IDE5OjI3IC0wNzAwLCBCb2F6IEhhcnJvc2ggd3JvdGU6DQo+
IE9uIDEwLzEyLzIwMTIgMDQ6NTIgUE0sIExpbnVzIFRvcnZhbGRzIHdyb3RlOg0KPiA+IEd1eXMs
IGNoZWNrIHRoaXMgcmVwb3J0IGZyb20gTGFycnkgb3V0Lg0KPiA+IA0KPiA+IEFsc28sIHdoeSB0
aGUgKkhFTEwqIGlzIHRoYXQgYSBCVUdfT04oKSBpbiB0aGUgZmlyc3QgcGxhY2U/IFdobyB3YXMN
Cj4gPiB0aGUgbGVzcy10aGFuLWdpZnRlZCBwZXJzb24gd2hvIGRlY2lkZWQgImlmIHRoaXMgdGhp
bmcgY2FuIGhhcHBlbiwNCj4gPiBsZXQncyBqdXN0IGtpbGwgdGhlIHdob2xlIG1hY2hpbmUiPw0K
PiA+IA0KPiANCj4gU29tZXRoaW5nIGlzIHRyaXZpYWxseSB3ZWlyZA0KPiANCj4gZnMvbG9ja2Qv
Y2xudHhkci5jOjIyNiBpcyBpbiB0aGlzIHN0YXRpYyBmdW5jdGlvbjoNCj4gCWVuY29kZV9ubG1f
c3RhdCgpDQo+IA0KPiBlbmNvZGVfbmxtX3N0YXQoKSBpcyBjYWxsZWQgaW4gdHdvIHBsYWNlcw0K
PiAJc3RhdGljIHZvaWQgbmxtX3hkcl9lbmNfcmVzKC4uLikNCj4gICBhbmQNCj4gCXN0YXRpYyB2
b2lkIG5sbV94ZHJfZW5jX3Rlc3RyZXMoLi4uKQ0KPiANCj4gQnV0IHRoZXNlIHR3byBhcmUgbm90
IGNhbGxlZCBhbnl3aGVyZS4gSW4tZmFjdCBhIEtlcm5lbCB3aWRlIGdyZXANCj4gcmV0dXJucyBh
IHNpbmdsZSBvY2N1cnJlbmNlIG9mIGJvdGgNCg0KVGhleSBhcmUgdXNlZCBpbiB0aGUgbmxtX3By
b2NlZHVyZXNbXSBhcnJheSBhcyBURVNUX1JFUywgTE9DS19SRVMsDQpDQU5DRUxfUkVTLCBVTkxP
Q0tfUkVTIGFuZCBHUkFOVEVEX1JFUyB4ZHIgZW5jb2RpbmcgcHJvY2VkdXJlcy4NCg0KDQoNCi0t
IA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBw
DQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg==

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:27   ` Boaz Harrosh
  2012-10-13  2:30     ` Larry McVoy
  2012-10-13  2:32     ` Myklebust, Trond
@ 2012-10-13  2:37     ` Linus Torvalds
  2 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-10-13  2:37 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Bruce Fields, Trond Myklebust, Linux NFS Mailing List,
	Larry McVoy

On Sat, Oct 13, 2012 at 11:27 AM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>
> I suspect the BUG_ON() belongs else where.
> Perhaps we should add __func__ to the BUG_ON macro exactly for times
> that source and Kernel get out of sync.

BUG_ON already does the whole "line and file" thing, so the exact
BUG_ON() should be very well pinpointed.

Sometimes various distro changes do make the lines move around, but
clntxdr.c has been very stable, and line 226 has the same BUG_ON in it
in both 3.2 and 3.6, so I seriously doubt that's the case.

And that function is definitely called, it's just that you missed the
magic cpp stuff. See the preprocessor using it like this:

  .p_encode    = (kxdreproc_t)nlm_xdr_enc_##argtype,

so the nlm_xdr_enc_res() function ends up being used as the .p_encode
function for the 'res' argtype.. (and same for testres)

             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:32     ` Myklebust, Trond
@ 2012-10-13  2:39       ` Boaz Harrosh
  2012-10-13  2:43         ` Larry McVoy
  2012-10-14 19:43         ` Bruce Fields
  0 siblings, 2 replies; 43+ messages in thread
From: Boaz Harrosh @ 2012-10-13  2:39 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Linus Torvalds, Bruce Fields, Linux NFS Mailing List, Larry McVoy

On 10/12/2012 07:32 PM, Myklebust, Trond wrote:
> On Fri, 2012-10-12 at 19:27 -0700, Boaz Harrosh wrote:
>> On 10/12/2012 04:52 PM, Linus Torvalds wrote:
>>> Guys, check this report from Larry out.
>>>
>>> Also, why the *HELL* is that a BUG_ON() in the first place? Who was
>>> the less-than-gifted person who decided "if this thing can happen,
>>> let's just kill the whole machine"?
>>>
>>
>> Something is trivially weird
>>
>> fs/lockd/clntxdr.c:226 is in this static function:
>> 	encode_nlm_stat()
>>
>> encode_nlm_stat() is called in two places
>> 	static void nlm_xdr_enc_res(...)
>>   and
>> 	static void nlm_xdr_enc_testres(...)
>>
>> But these two are not called anywhere. In-fact a Kernel wide grep
>> returns a single occurrence of both
> 
> They are used in the nlm_procedures[] array as TEST_RES, LOCK_RES,
> CANCEL_RES, UNLOCK_RES and GRANTED_RES xdr encoding procedures.
> 

Ha, sorry that was impossible to find. OK I'll have a look

Thanks
Boaz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:39       ` Boaz Harrosh
@ 2012-10-13  2:43         ` Larry McVoy
  2012-10-14 19:43         ` Bruce Fields
  1 sibling, 0 replies; 43+ messages in thread
From: Larry McVoy @ 2012-10-13  2:43 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Myklebust, Trond, Linus Torvalds, Bruce Fields,
	Linux NFS Mailing List, Larry McVoy

I could go in on a weekend and bang on things if that helps.  Here's some
other info:  we use old systems so we are forward compat.  The mac was
/ is running MacOS 10.4:

macos-x86:~ uname -a
Darwin macos-x86.bitmover.com 8.11.1 Darwin Kernel Version 8.11.1: Wed Oct 10
18:23:28 PDT 2007; root:xnu-792.25.20~1/RELEASE_I386 i386 i386

No idea if they fucked up NFS, certainly possible.

The way to make it happen is to have your home directory mounted on that
machine and fire up skype.  It tries to do some locking thing and barfs.

How do I figure out which NFS version macos is using?

macos-x86:~ mount
/dev/disk0s2 on / (local, journaled)
devfs on /dev (local)
fdesc on /dev (union)
<volfs> on /.vol
automount -nsl [144] on /Network (automounted)
automount -fstab [148] on /automount/Servers (automounted)
automount -static [148] on /automount/static (automounted)
10.3.9.1:/home on /private/var/automount/home (automounted)

On Fri, Oct 12, 2012 at 07:39:22PM -0700, Boaz Harrosh wrote:
> On 10/12/2012 07:32 PM, Myklebust, Trond wrote:
> > On Fri, 2012-10-12 at 19:27 -0700, Boaz Harrosh wrote:
> >> On 10/12/2012 04:52 PM, Linus Torvalds wrote:
> >>> Guys, check this report from Larry out.
> >>>
> >>> Also, why the *HELL* is that a BUG_ON() in the first place? Who was
> >>> the less-than-gifted person who decided "if this thing can happen,
> >>> let's just kill the whole machine"?
> >>>
> >>
> >> Something is trivially weird
> >>
> >> fs/lockd/clntxdr.c:226 is in this static function:
> >> 	encode_nlm_stat()
> >>
> >> encode_nlm_stat() is called in two places
> >> 	static void nlm_xdr_enc_res(...)
> >>   and
> >> 	static void nlm_xdr_enc_testres(...)
> >>
> >> But these two are not called anywhere. In-fact a Kernel wide grep
> >> returns a single occurrence of both
> > 
> > They are used in the nlm_procedures[] array as TEST_RES, LOCK_RES,
> > CANCEL_RES, UNLOCK_RES and GRANTED_RES xdr encoding procedures.
> > 
> 
> Ha, sorry that was impossible to find. OK I'll have a look
> 
> Thanks
> Boaz

-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:31         ` Larry McVoy
@ 2012-10-13  2:52           ` Myklebust, Trond
  2012-10-13  2:56             ` Larry McVoy
  0 siblings, 1 reply; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-13  2:52 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linus Torvalds, Bruce Fields, Linux NFS Mailing List

T24gRnJpLCAyMDEyLTEwLTEyIGF0IDE5OjMxIC0wNzAwLCBMYXJyeSBNY1ZveSB3cm90ZToNCj4g
QXMgSSd2ZSBzYWlkLCBkZWJ1Z2dpbmcgdGhpcyBpcyBnb2luZyB0byBiZSBoYXJkLCBpdCBtZWFu
cyBzdG9wcGluZyBteQ0KPiBjb21wYW55LiAgV2UgY2FuIGRvIG9uZSBtb3JlIGNyYXNoLCB5b3Ug
Z3V5cyBjYW4gdGVsbCBtZSB3aGF0IHRvIGRvLA0KPiBidXQgdGhhdCdzIGFib3V0IGl0Lg0KDQpE
byB5b3UgaGF2ZSBhIHJlcHJvZHVjZXI/IElmIHlvdSBkbywgdGhlbiBwZXJoYXBzIHdlIGNhbiB0
cnkgdG8NCmR1cGxpY2F0ZSB0aGUgaXNzdWUgb24gb3VyIG93biBzZXR1cHMgd2l0aG91dCBuZWVk
aW5nIHRvIGRlbGliZXJhdGVseQ0KY2F1c2UgeW91IG1vcmUgZG93bnRpbWUuDQoNCklmIG5vdCwg
dGhlbiBJIGd1ZXNzIHdlJ2xsIGhhdmUgdG8gdHJ5IGxvb2tpbmcgYXQgdGhvc2Ugc2xhYiBtZW1v
cnkNCmRlYnVnZ2luZyBvcHRpb25zLiBJJ20gbm8gZXhwZXJ0IG9uIHNsYWIgZGVidWdnaW5nLCBh
bmQgaXQncyB3YXkgdG9vDQpsYXRlIGluIHRoaXMgY29ybmVyIG9mIHRoZSBVUyBmb3IgbWUgdG8g
dHJ5IHRvIGZpZ3VyZSBvdXQgZXhhY3RseSB3aGF0DQpvcHRpb25zIGFyZSBhcHByb3ByaWF0ZSwg
YnV0IEkgY2FuIGxvb2sgaW50byBpdCB0b21vcnJvdyAodW5sZXNzIExpbnVzDQpoYXMgYSBzdWdn
ZXN0aW9uKS4NCg0KPiBPbiBTYXQsIE9jdCAxMywgMjAxMiBhdCAwMjoyODozOUFNICswMDAwLCBN
eWtsZWJ1c3QsIFRyb25kIHdyb3RlOg0KPiA+IE9uIFNhdCwgMjAxMi0xMC0xMyBhdCAxMDowMiAr
MDkwMCwgTGludXMgVG9ydmFsZHMgd3JvdGU6DQo+ID4gPiBPbiBTYXQsIE9jdCAxMywgMjAxMiBh
dCA5OjIxIEFNLCBMYXJyeSBNY1ZveSA8bG1AYml0bW92ZXIuY29tPiB3cm90ZToNCj4gPiA+ID4N
Cj4gPiA+ID4gQWhoLCBJJ3ZlIGJlZW4gYXdheSBmcm9tIHRoZSBrZXJuZWwgdG9vIGxvbmcuICBJ
IG1pc3MgdGhhdCBkZWxpY2F0ZQ0KPiA+ID4gPiBtYW5hZ2VtZW50IHRvdWNoLg0KPiA+ID4gDQo+
ID4gPiAiRGVsaWNhdGUgTWFuYWdlbWVudCBUb3VjaCIgaXMgbXkgbWlkZGxlIG5hbWUuDQo+ID4g
PiANCj4gPiA+ID4gcGljcyBvZiB0aGUgc3RhY2sgdHJhY2UgYXQgaHR0cDovL3d3dy5tY3ZveS5j
b20vbG0vbmZzLWxvY2stY3Jhc2gNCj4gPiA+IA0KPiA+ID4gT2ssIHRoYXQncyBqdXN0IHRoZSBu
b3JtYWwga2luZCBvZiByYW5kb20gbGVmdC1vdmVyIG9vcHNlcyBkdWUgdG8NCj4gPiA+IHN1YnNl
cXVlbnQgcHJvYmxlbXMgb2YgYSBCVUdfT04oKS4gTG9va3MgbGlrZSB0aGUgd2F0Y2hkb2cgdGlt
ZXIgZW5kcw0KPiA+ID4gdXAgYmVpbmcgdW5oYXBweSwgYWxtb3N0IGNlcnRhaW5seSBzaW1wbHkg
YmVjYXVzZSBzb21lIGNvcmUgZmlsZXN5c3RlbQ0KPiA+ID4gc3BpbmxvY2sgbm90IGJlaW5nIHJl
bGVhc2VkLg0KPiA+ID4gDQo+ID4gPiBJdCB1c2VkIHRvIGJlIChhIGxvbmcgbG9uZyB0aW1lIGFn
bykgdGhhdCB3ZSdkIHJlY292ZXIgZmFpcmx5DQo+ID4gPiBncmFjZWZ1bGx5IGZyb20gQlVHX09O
KCkncyAtIGJhY2sgd2hlbiB0aGUgbWFpbiBzaGFyZWQgbG9jayB3ZSBoYWQgd2FzDQo+ID4gPiB0
aGUga2VybmVsIGxvY2ssIGFuZCB3ZSBoYWQgYSBzaW5nbGUgcGVyLXByb2Nlc3Mga2VybmVsIGxv
Y2sgY291bnRlci4NCj4gPiA+IFNvIHdoZW4gd2Uga2lsbGVkIHRoZSBwcm9jZXNzLCB3ZSBjb3Vs
ZCBjbGVhbiB0aGF0IHNpbmdsZSBsb2NrIHVwLg0KPiA+ID4gDQo+ID4gPiBUaGVzZSBkYXlzLCBp
ZiBzb21lIHByb2Nlc3MgZGllcyBpbiByYW5kb20ga2VybmVsIGNvZGUgZHVlIHRvIGENCj4gPiA+
IEJVR19PTigpIG9yIGEgd2lsZCBwb2ludGVyIG9yIHNpbWlsYXIsIGFuZCB3ZSBraWxsIGl0LCB3
ZSBhcmUgc2VsZG9tDQo+ID4gPiBhYmxlIHRvIGRvIHNvIGNsZWFubHkuIFNvIHRoZSBiZXN0IHdl
IGNhbiBob3BlIGZvciBpcyB0aGF0IGl0IGhhcHBlbmVkDQo+ID4gPiBpbiBzb21lIGNvbnRleHQg
d2hlcmUgaXQgaGVsZCBubyAoaW1wb3J0YW50KSBsb2Nrcy4gV2hpY2ggaXMgcmFyZS4gU28NCj4g
PiA+IEJVR19PTigpJ3MgYXJlIG9mdGVuIGZhdGFsLCBhbmQgdGhlcmUgYXJlIHRoZXNlIGtpbmRz
IG9mIGRvd25zdHJlYW0NCj4gPiA+IHByb2JsZW1zIHdoZXJlIHRoZXkgZ2V0IGZsdXNoZWQgb2Zm
IHRoZSBzY3JlZW4gYnkgc3Vic2VxdWVudCBpc3N1ZXMuLi4NCj4gPiANCj4gPiBJZiB0aGF0IGNv
ZGUgaXMgYmVpbmcgY2FsbGVkIHVuZGVyIGEgbG9jaywgdGhlbiB3ZSBoYXZlIG90aGVyIHByb2Js
ZW1zLg0KPiA+IEl0IGlzIHN0YW5kYXJkIFhEUiBjb2RlOiBpdCBzaG91bGQgYWx3YXlzIGJlIGNh
bGxlZCBmcm9tIGFuIG9yZGluYXJ5DQo+ID4gcHJvY2VzcyBjb250ZXh0IHdpdGggbm8gc3BlY2lh
bCBsb2NrcyBiZWluZyBoZWxkIGJ5IHRoZSBjYWxsZXJzLg0KPiA+IA0KPiA+ID4gSG8gaHVtbS4g
R29vZ2xlIGRvZXNuJ3Qgc2VlbSB0byBiZSBmaW5kaW5nIGFueSBzaW1pbGFyIGJ1Zy1yZXBvcnRz
LCBzbw0KPiA+ID4gdW5sZXNzIEJydWNlIG9yIFRyb25kIGdvICJBaGgsIEkga25vdyB3aGF0IGl0
J3MgYWJvdXQiLCBJIGRvIHRoaW5rIHdlDQo+ID4gPiB3b3VsZCB3YW50IHRvIGdldCBhcyBtdWNo
IG1vcmUgaW5mbyBhcyBwb3NzaWJsZS4NCj4gPiANCj4gPiBOZXZlciBzZWVuIGl0IGJlZm9yZSwg
YW5kIEkgc2VlIG5vIHJlYXNvbiB3aHkgaXQgc2hvdWxkIGRyYWcgdGhlIGVudGlyZQ0KPiA+IGJv
eCBkb3duIHdpdGggaXQuIEl0IGlzIHBhcnQgb2YgdGhlIE5MTSBzZXJ2ZXIncyBjYWxsYmFjayBj
b2RlLCBzbyB0aGVyZQ0KPiA+IGlzIG5vIGNoYW5jZSBvZiBpdCBiZWluZyBjYWxsZWQgYXMgcGFy
dCBvZiBhIG1lbW9yeSByZWNsYWltIG9yIGFueXRoaW5nDQo+ID4gc2ltaWxhcmx5IHNlbnNpdGl2
ZSB0byB0aGUgcmVzdCBvZiB0aGUgYm94Lg0KPiA+IA0KPiA+IEFyZSB3ZSBzdXJlIHRoYXQgdGhp
cyBCVUdfT04oKSByZWFsbHkgaXMgdG9wIG9mIHRoZSBjaGFpbiBvZiBPb3BzZXMNCj4gPiBoZXJl
PyBBbGwgSSBjYW4gc2VlIGl0IGRvaW5nIGlzIGNyYXNoaW5nIHRoZSBsb2NrZCBzZXJ2ZXIgcHJv
Y2Vzcywgd2hpY2gNCj4gPiB3aWxsIHNlcmlvdXNseSBpbmNvbnZlbmllbmNlIGFsbCB0aGUgTkZT
IGNsaWVudHMgdHJ5aW5nIHRvIGRvIGxvY2tpbmcsDQo+ID4gYnV0IGl0IHNob3VsZG4ndCBiZSBh
ZmZlY3RpbmcgdGhlIHN3YXBwZXIgcHJvY2VzcyBhcyB3ZSdyZSBzZWVpbmcgaW4gdGhlDQo+ID4g
T29wcyBzY3JlZW5zaG90cy4NCj4gPiBJZiBpdCByZWFsbHkgaXMgdGhlIGZpcnN0IHRoaW5nIHRv
IE9vcHMsIHRoZW4gdGhlIG9ubHkgdGhpbmcgSSBjYW4gdGhpbmsNCj4gPiBvZiB0aGVyZSB0aGF0
IHdvdWxkIHRyaWdnZXIgb3RoZXIgT29wc2VzIHdvdWxkIGJlIGEgbWVtb3J5IGNvcnJ1cHRpb24N
Cj4gPiAodXNlIGFmdGVyIGZyZWUgb3Igc29tZSBzdWNoIHRoaW5nPykuIFBlcmhhcHMgTGFycnkg
Y291bGQgdHJ5IHR1cm5pbmcgb24NCj4gPiBzb21lIG9mIHRoZSBsZXNzIGludHJ1c2l2ZSBzbGFi
IGRlYnVnZ2luZyBvcHRpb25zPw0KPiA+IA0KPiA+ID4gRG9pbmcgYSBrZXJuZWwgY29tcGlsZSBy
ZWFsbHkgaXNuJ3QgdGhhdCBiYWQuIFRoZSBvbmx5IG5hc3R5IHBpZWNlIGlzDQo+ID4gPiBnZXR0
aW5nIHRoZSBrZXJuZWwgY29uZmlndXJhdGlvbiByaWdodCwgYnV0IHlvdSBjYW4ganVzdCB1c2Ug
dGhlDQo+ID4gPiBkaXN0cm8gY29uZmlnLiBJdCdzIG11Y2ggdG9vIGJpZyBhbmQgY29udGFpbnMg
ZXZlcnl0aGluZywgYnV0IGl0IHdpbGwNCj4gPiA+IHdvcmssIGFuZCBnZXRzIHlvdSBhcyBzaW1p
bGFyIGEga2VybmVsIGFzIHBvc3NpYmxlLiBPZiBjb3Vyc2UsIFVidW50dQ0KPiA+ID4gaGFzIG1h
ZGUgaW5zdGFsbGluZyB5b3VyIG93biBrZXJuZWwgc3R1cGlkbHkgY29tcGxpY2F0ZWQgKHlvdSBo
YXZlIHRvDQo+ID4gPiBidWlsZCBhIHBhY2thZ2UgYW5kIGluc3RhbGwgaXQgdXNpbmcgdGhlIHBh
Y2thZ2UgbWFuYWdlciksIGJ1dCB3aGlsZQ0KPiA+ID4gaXQncyBhbiBhbm5veWluZyBleHRyYSBz
dGVwIG9yIHR3byAoY29tcGFyZWQgdG8ganVzdCBkb2luZyBhICJtYWtlDQo+ID4gPiBtb2R1bGVz
X2luc3RhbGwgaW5zdGFsbCIpLCBpdCdzIG5vdCByb2NrZXQgc3VyZ2VyeS4gVGhlcmUncyBhIGZl
dyBoZWxwDQo+ID4gPiBwYWdlcyBmb3IgaXQ6DQo+ID4gPiANCj4gPiA+ICAgICBodHRwczovL2hl
bHAudWJ1bnR1LmNvbS9jb21tdW5pdHkvS2VybmVsL0NvbXBpbGUNCj4gPiA+IA0KPiA+ID4gYmVp
bmcgdGhlIGZpcnN0IG9uZS4NCj4gPiA+IA0KPiA+ID4gICAgICAgICAgICAgICAgIExpbnVzDQo+
ID4gDQo+ID4gLS0gDQo+ID4gVHJvbmQgTXlrbGVidXN0DQo+ID4gTGludXggTkZTIGNsaWVudCBt
YWludGFpbmVyDQo+ID4gDQo+ID4gTmV0QXBwDQo+ID4gVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5j
b20NCj4gPiB3d3cubmV0YXBwLmNvbQ0KPiANCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4
IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAu
Y29tDQp3d3cubmV0YXBwLmNvbQ0K

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:52           ` Myklebust, Trond
@ 2012-10-13  2:56             ` Larry McVoy
  2012-10-13  3:05               ` Myklebust, Trond
  2012-10-13  4:42               ` Myklebust, Trond
  0 siblings, 2 replies; 43+ messages in thread
From: Larry McVoy @ 2012-10-13  2:56 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Larry McVoy, Linus Torvalds, Bruce Fields, Linux NFS Mailing List

On Sat, Oct 13, 2012 at 02:52:51AM +0000, Myklebust, Trond wrote:
> On Fri, 2012-10-12 at 19:31 -0700, Larry McVoy wrote:
> > As I've said, debugging this is going to be hard, it means stopping my
> > company.  We can do one more crash, you guys can tell me what to do,
> > but that's about it.
> 
> Do you have a reproducer? 

Oh yeah, it happens every time.  Run skype on a mac w/ your home dir NFS
mounted.  Skype tries to do some lock operation and the machine panics.

We've done it 4 times in a row to be sure we weren't smoking crack.  No
crack.  Just panics.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:56             ` Larry McVoy
@ 2012-10-13  3:05               ` Myklebust, Trond
  2012-10-13  4:42               ` Myklebust, Trond
  1 sibling, 0 replies; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-13  3:05 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linus Torvalds, Bruce Fields, Linux NFS Mailing List

T24gRnJpLCAyMDEyLTEwLTEyIGF0IDE5OjU2IC0wNzAwLCBMYXJyeSBNY1ZveSB3cm90ZToNCj4g
T24gU2F0LCBPY3QgMTMsIDIwMTIgYXQgMDI6NTI6NTFBTSArMDAwMCwgTXlrbGVidXN0LCBUcm9u
ZCB3cm90ZToNCj4gPiBPbiBGcmksIDIwMTItMTAtMTIgYXQgMTk6MzEgLTA3MDAsIExhcnJ5IE1j
Vm95IHdyb3RlOg0KPiA+ID4gQXMgSSd2ZSBzYWlkLCBkZWJ1Z2dpbmcgdGhpcyBpcyBnb2luZyB0
byBiZSBoYXJkLCBpdCBtZWFucyBzdG9wcGluZyBteQ0KPiA+ID4gY29tcGFueS4gIFdlIGNhbiBk
byBvbmUgbW9yZSBjcmFzaCwgeW91IGd1eXMgY2FuIHRlbGwgbWUgd2hhdCB0byBkbywNCj4gPiA+
IGJ1dCB0aGF0J3MgYWJvdXQgaXQuDQo+ID4gDQo+ID4gRG8geW91IGhhdmUgYSByZXByb2R1Y2Vy
PyANCj4gDQo+IE9oIHllYWgsIGl0IGhhcHBlbnMgZXZlcnkgdGltZS4gIFJ1biBza3lwZSBvbiBh
IG1hYyB3LyB5b3VyIGhvbWUgZGlyIE5GUw0KPiBtb3VudGVkLiAgU2t5cGUgdHJpZXMgdG8gZG8g
c29tZSBsb2NrIG9wZXJhdGlvbiBhbmQgdGhlIG1hY2hpbmUgcGFuaWNzLg0KPiANCj4gV2UndmUg
ZG9uZSBpdCA0IHRpbWVzIGluIGEgcm93IHRvIGJlIHN1cmUgd2Ugd2VyZW4ndCBzbW9raW5nIGNy
YWNrLiAgTm8NCj4gY3JhY2suICBKdXN0IHBhbmljcy4NCg0KVGhhbmtzISBUaGF0IHNob3VsZCBi
ZSBlYXN5IHRvIGR1cGxpY2F0ZS4NCg0KSSdsbCBnaXZlIGl0IGEgc2hvdCB0b21vcnJvdywgYW5k
IHNlZSBpZiBJIGNhbiBnZXQgYSB3aXJlc2hhcmsgdHJhY2UuLi4NCg0KLS0gDQpUcm9uZCBNeWts
ZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xl
YnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:56             ` Larry McVoy
  2012-10-13  3:05               ` Myklebust, Trond
@ 2012-10-13  4:42               ` Myklebust, Trond
  2012-10-14  1:42                 ` Larry McVoy
  2012-10-17 14:00                 ` Bruce Fields
  1 sibling, 2 replies; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-13  4:42 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linus Torvalds, Bruce Fields, Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

On Fri, 2012-10-12 at 19:56 -0700, Larry McVoy wrote:
> On Sat, Oct 13, 2012 at 02:52:51AM +0000, Myklebust, Trond wrote:
> > On Fri, 2012-10-12 at 19:31 -0700, Larry McVoy wrote:
> > > As I've said, debugging this is going to be hard, it means stopping my
> > > company.  We can do one more crash, you guys can tell me what to do,
> > > but that's about it.
> > 
> > Do you have a reproducer? 
> 
> Oh yeah, it happens every time.  Run skype on a mac w/ your home dir NFS
> mounted.  Skype tries to do some lock operation and the machine panics.
> 
> We've done it 4 times in a row to be sure we weren't smoking crack.  No
> crack.  Just panics.

OK. I think I can see what is happening. We need to filter the return
values from nlm_lookup_file() when dealing with NLMv1 and NLMv3.

Can you please try the attached patch?

Cheers
  Trond

PS: you are presumably running NFSv2 on your Macs. Odd that they should
default to that...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-NLM-nlm_lookup_file-may-return-NLMv4-specific-error-.patch --]
[-- Type: text/x-patch; name="0001-NLM-nlm_lookup_file-may-return-NLMv4-specific-error-.patch", Size: 1753 bytes --]

From 1f0d4d1bb77fa0216dd2fb2041cc9b44db4697e0 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Sat, 13 Oct 2012 00:30:28 -0400
Subject: [PATCH] NLM: nlm_lookup_file() may return NLMv4-specific error codes

If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.

Failure to do so, will trigger the BUG_ON() in encode_nlm_stat...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/lockd/clntxdr.c | 2 +-
 fs/lockd/svcproc.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/clntxdr.c b/fs/lockd/clntxdr.c
index 180ac34..cf04cda 100644
--- a/fs/lockd/clntxdr.c
+++ b/fs/lockd/clntxdr.c
@@ -223,7 +223,7 @@ static void encode_nlm_stat(struct xdr_stream *xdr,
 {
 	__be32 *p;
 
-	BUG_ON(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
+	WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD);
 	p = xdr_reserve_space(xdr, 4);
 	*p = stat;
 }
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index d27aab1..d413af3 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -67,7 +67,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
 
 	/* Obtain file pointer. Not used by FREE_ALL call. */
 	if (filp != NULL) {
-		if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0)
+		error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh));
+		if (error != 0)
 			goto no_locks;
 		*filp = file;
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  1:45         ` Linus Torvalds
@ 2012-10-13 21:31           ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Kahn Gillmor @ 2012-10-13 21:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jim Rees, Larry McVoy, Bruce Fields, Trond Myklebust,
	Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On 10/12/2012 09:45 PM, Linus Torvalds wrote:
> On Sat, Oct 13, 2012 at 10:36 AM, Jim Rees <rees@umich.edu> wrote:
>>
>> I install kernels on Ubuntu every day of the week and twice on Sundays, and
>> never jump through their silly hoops.  The only magic is making the
>> initramfs, if you use one.
>>
>> make install modules_install
>> update-initramfs -c -k 3.6.1 (or whatever)
>> update-grub
> 
> This magic is what the distro "/usr/sbin/installkernel" script is
> supposed to do, which the kernel Makefile does for the "install"
> target.
> 
> So the "make install" *should* have done the grub and initramfs
> update. Ubuntu screwed up (probably because Debian did).
> 
> I wish they'd just fix it. 

fwiw, this was fixed in debian as of debianutils package 3.4.3.  So it
should be functional in testing (wheezy) but, alas, not in the current
stable release (squeeze).

For ubuntu, i think the fixed packages should be available from "nattie
narwhal" forward, but i haven't tested them directly.

For reference, the bug report in debian is at:

   http://bugs.debian.org/607411

Regards,

	--dkg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 1030 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  4:42               ` Myklebust, Trond
@ 2012-10-14  1:42                 ` Larry McVoy
  2012-10-15  0:43                   ` Bruce Fields
  2012-10-17 14:00                 ` Bruce Fields
  1 sibling, 1 reply; 43+ messages in thread
From: Larry McVoy @ 2012-10-14  1:42 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Larry McVoy, Linus Torvalds, Bruce Fields, Linux NFS Mailing List

> PS: you are presumably running NFSv2 on your Macs. Odd that they should
> default to that...

I dunno if they default to that or we force that.  We have had lots of
problems with Linux NFS, we export our home directories but they are
more or less read only.

Maybe things have gotten better but back in the day I could crash the
kernel with a bk clone to a NFS directory.

We've also had problems with NFS versions, 2 has been stable for us so
we tend to use that when we can.

If you want me to get a rant about how retarded the various NFS versions
are I can go there.  The Sun NFS people absolutely positively did not
understand what RPC version numbers are for.  And they invented them.

Sigh.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:28       ` Myklebust, Trond
  2012-10-13  2:31         ` Larry McVoy
@ 2012-10-14 19:39         ` Bruce Fields
  2012-10-14 19:44           ` Linus Torvalds
  2012-10-14 20:55           ` Chuck Lever
  1 sibling, 2 replies; 43+ messages in thread
From: Bruce Fields @ 2012-10-14 19:39 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Linus Torvalds, Larry McVoy, Linux NFS Mailing List, Chuck Lever

On Sat, Oct 13, 2012 at 02:28:39AM +0000, Myklebust, Trond wrote:
> On Sat, 2012-10-13 at 10:02 +0900, Linus Torvalds wrote:
> > On Sat, Oct 13, 2012 at 9:21 AM, Larry McVoy <lm@bitmover.com> wrote:
> > >
> > > Ahh, I've been away from the kernel too long.  I miss that delicate
> > > management touch.
> > 
> > "Delicate Management Touch" is my middle name.
> > 
> > > pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash
> > 
> > Ok, that's just the normal kind of random left-over oopses due to
> > subsequent problems of a BUG_ON(). Looks like the watchdog timer ends
> > up being unhappy, almost certainly simply because some core filesystem
> > spinlock not being released.
> > 
> > It used to be (a long long time ago) that we'd recover fairly
> > gracefully from BUG_ON()'s - back when the main shared lock we had was
> > the kernel lock, and we had a single per-process kernel lock counter.
> > So when we killed the process, we could clean that single lock up.
> > 
> > These days, if some process dies in random kernel code due to a
> > BUG_ON() or a wild pointer or similar, and we kill it, we are seldom
> > able to do so cleanly. So the best we can hope for is that it happened
> > in some context where it held no (important) locks. Which is rare. So
> > BUG_ON()'s are often fatal, and there are these kinds of downstream
> > problems where they get flushed off the screen by subsequent issues...
> 
> If that code is being called under a lock, then we have other problems.
> It is standard XDR code: it should always be called from an ordinary
> process context with no special locks being held by the callers.
> 
> > Ho humm. Google doesn't seem to be finding any similar bug-reports, so
> > unless Bruce or Trond go "Ahh, I know what it's about", I do think we
> > would want to get as much more info as possible.
> 
> Never seen it before, and I see no reason why it should drag the entire
> box down with it. It is part of the NLM server's callback code, so there
> is no chance of it being called as part of a memory reclaim or anything
> similarly sensitive to the rest of the box.
> 
> Are we sure that this BUG_ON() really is top of the chain of Oopses
> here? All I can see it doing is crashing the lockd server process,

Can't it be called from the rpciod workqueue?  I'm not sure what happens
when we hit a BUG there.

It looks like a bunch of BUG_ON's got added with an xdr rewrite in
2b061f9ef216b6d229b06267f188167fd6ab3d9b.  Maybe Chuck or someone should
do a 'git grep BUG fs/lockd' and figure out what those should be
instead?

And I need to do the same for nfsd; I've been sloppy about using them as
asserts.

--b.

> which
> will seriously inconvenience all the NFS clients trying to do locking,
> but it shouldn't be affecting the swapper process as we're seeing in the
> Oops screenshots.
> If it really is the first thing to Oops, then the only thing I can think
> of there that would trigger other Oopses would be a memory corruption
> (use after free or some such thing?). Perhaps Larry could try turning on
> some of the less intrusive slab debugging options?
> 
> > Doing a kernel compile really isn't that bad. The only nasty piece is
> > getting the kernel configuration right, but you can just use the
> > distro config. It's much too big and contains everything, but it will
> > work, and gets you as similar a kernel as possible. Of course, Ubuntu
> > has made installing your own kernel stupidly complicated (you have to
> > build a package and install it using the package manager), but while
> > it's an annoying extra step or two (compared to just doing a "make
> > modules_install install"), it's not rocket surgery. There's a few help
> > pages for it:
> > 
> >     https://help.ubuntu.com/community/Kernel/Compile
> > 
> > being the first one.
> > 
> >                 Linus
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  2:39       ` Boaz Harrosh
  2012-10-13  2:43         ` Larry McVoy
@ 2012-10-14 19:43         ` Bruce Fields
  2012-10-15 18:20           ` Boaz Harrosh
  1 sibling, 1 reply; 43+ messages in thread
From: Bruce Fields @ 2012-10-14 19:43 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Myklebust, Trond, Linus Torvalds, Linux NFS Mailing List,
	Larry McVoy

On Fri, Oct 12, 2012 at 07:39:22PM -0700, Boaz Harrosh wrote:
> On 10/12/2012 07:32 PM, Myklebust, Trond wrote:
> > On Fri, 2012-10-12 at 19:27 -0700, Boaz Harrosh wrote:
> >> On 10/12/2012 04:52 PM, Linus Torvalds wrote:
> >>> Guys, check this report from Larry out.
> >>>
> >>> Also, why the *HELL* is that a BUG_ON() in the first place? Who was
> >>> the less-than-gifted person who decided "if this thing can happen,
> >>> let's just kill the whole machine"?
> >>>
> >>
> >> Something is trivially weird
> >>
> >> fs/lockd/clntxdr.c:226 is in this static function:
> >> 	encode_nlm_stat()
> >>
> >> encode_nlm_stat() is called in two places
> >> 	static void nlm_xdr_enc_res(...)
> >>   and
> >> 	static void nlm_xdr_enc_testres(...)
> >>
> >> But these two are not called anywhere. In-fact a Kernel wide grep
> >> returns a single occurrence of both
> > 
> > They are used in the nlm_procedures[] array as TEST_RES, LOCK_RES,
> > CANCEL_RES, UNLOCK_RES and GRANTED_RES xdr encoding procedures.
> 
> Ha, sorry that was impossible to find. OK I'll have a look

Everybody falls into that trap at least once.  Any objections to killing
those macros and expanding that stuff out?

--b.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 19:39         ` Bruce Fields
@ 2012-10-14 19:44           ` Linus Torvalds
  2012-10-14 20:55           ` Chuck Lever
  1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-10-14 19:44 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Myklebust, Trond, Larry McVoy, Linux NFS Mailing List,
	Chuck Lever

On Sun, Oct 14, 2012 at 12:39 PM, Bruce Fields <bfields@fieldses.org> wrote:
>
> Can't it be called from the rpciod workqueue?  I'm not sure what happens
> when we hit a BUG there.

>From personal experience, I can say that killing a workqueue will
cause tons of *very* non-obvious downstream errors, so yeah, if it's
called that way, I could easily see watchdogs firing etc.

> It looks like a bunch of BUG_ON's got added with an xdr rewrite in
> 2b061f9ef216b6d229b06267f188167fd6ab3d9b.  Maybe Chuck or someone should
> do a 'git grep BUG fs/lockd' and figure out what those should be
> instead?

Even just replacing them with a WARN_ON_ONCE() would probably help
things. Because BUG_ON() really *really* should be a "last possible
case" thing.

               Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 19:39         ` Bruce Fields
  2012-10-14 19:44           ` Linus Torvalds
@ 2012-10-14 20:55           ` Chuck Lever
  2012-10-14 21:05             ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: Chuck Lever @ 2012-10-14 20:55 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Myklebust, Trond, Linus Torvalds, Larry McVoy,
	Linux NFS Mailing List


On Oct 14, 2012, at 3:39 PM, Bruce Fields <bfields@fieldses.org> wrote:

> On Sat, Oct 13, 2012 at 02:28:39AM +0000, Myklebust, Trond wrote:
>> On Sat, 2012-10-13 at 10:02 +0900, Linus Torvalds wrote:
>>> On Sat, Oct 13, 2012 at 9:21 AM, Larry McVoy <lm@bitmover.com> wrote:
>>>> 
>>>> Ahh, I've been away from the kernel too long.  I miss that delicate
>>>> management touch.
>>> 
>>> "Delicate Management Touch" is my middle name.
>>> 
>>>> pics of the stack trace at http://www.mcvoy.com/lm/nfs-lock-crash
>>> 
>>> Ok, that's just the normal kind of random left-over oopses due to
>>> subsequent problems of a BUG_ON(). Looks like the watchdog timer ends
>>> up being unhappy, almost certainly simply because some core filesystem
>>> spinlock not being released.
>>> 
>>> It used to be (a long long time ago) that we'd recover fairly
>>> gracefully from BUG_ON()'s - back when the main shared lock we had was
>>> the kernel lock, and we had a single per-process kernel lock counter.
>>> So when we killed the process, we could clean that single lock up.
>>> 
>>> These days, if some process dies in random kernel code due to a
>>> BUG_ON() or a wild pointer or similar, and we kill it, we are seldom
>>> able to do so cleanly. So the best we can hope for is that it happened
>>> in some context where it held no (important) locks. Which is rare. So
>>> BUG_ON()'s are often fatal, and there are these kinds of downstream
>>> problems where they get flushed off the screen by subsequent issues...
>> 
>> If that code is being called under a lock, then we have other problems.
>> It is standard XDR code: it should always be called from an ordinary
>> process context with no special locks being held by the callers.
>> 
>>> Ho humm. Google doesn't seem to be finding any similar bug-reports, so
>>> unless Bruce or Trond go "Ahh, I know what it's about", I do think we
>>> would want to get as much more info as possible.
>> 
>> Never seen it before, and I see no reason why it should drag the entire
>> box down with it. It is part of the NLM server's callback code, so there
>> is no chance of it being called as part of a memory reclaim or anything
>> similarly sensitive to the rest of the box.
>> 
>> Are we sure that this BUG_ON() really is top of the chain of Oopses
>> here? All I can see it doing is crashing the lockd server process,
> 
> Can't it be called from the rpciod workqueue?  I'm not sure what happens
> when we hit a BUG there.
> 
> It looks like a bunch of BUG_ON's got added with an xdr rewrite in
> 2b061f9ef216b6d229b06267f188167fd6ab3d9b.  Maybe Chuck or someone should
> do a 'git grep BUG fs/lockd' and figure out what those should be
> instead?

In my own defense, I also removed a lot of BUG_ON's in that series.

This particular one was added because "getting the endianness of the reply status code wrong" turns out to be a fairly easy problem to introduce without noticing it when making changes to NFSD or lockd server.  There is a set of status codes that are already XDR-encoded, and a set that are not.  A person can easily choose the wrong one to use.

I expected that such problems would be caught quickly during development if we actually checked for it in the XDR layer and barked if it is incorrect.  I assumed therefore that these assertions would not be encountered by the time code gets in front of users.

I think range-check assertions in the XDR code are valuable.  Whether they are done via BUG_ON or WARN_ON_ONCE is a matter of priority: BUG_ON forces you to notice the problem and address it, while WARN_ON allows the system to continue operating with the bug, but the bug can be ignored (or the WARN_ON simply removed because it is annoying).


> And I need to do the same for nfsd; I've been sloppy about using them as
> asserts.
> 
> --b.
> 
>> which
>> will seriously inconvenience all the NFS clients trying to do locking,
>> but it shouldn't be affecting the swapper process as we're seeing in the
>> Oops screenshots.
>> If it really is the first thing to Oops, then the only thing I can think
>> of there that would trigger other Oopses would be a memory corruption
>> (use after free or some such thing?). Perhaps Larry could try turning on
>> some of the less intrusive slab debugging options?
>> 
>>> Doing a kernel compile really isn't that bad. The only nasty piece is
>>> getting the kernel configuration right, but you can just use the
>>> distro config. It's much too big and contains everything, but it will
>>> work, and gets you as similar a kernel as possible. Of course, Ubuntu
>>> has made installing your own kernel stupidly complicated (you have to
>>> build a package and install it using the package manager), but while
>>> it's an annoying extra step or two (compared to just doing a "make
>>> modules_install install"), it's not rocket surgery. There's a few help
>>> pages for it:
>>> 
>>>    https://help.ubuntu.com/community/Kernel/Compile
>>> 
>>> being the first one.
>>> 
>>>                Linus
>> 
>> -- 
>> Trond Myklebust
>> Linux NFS client maintainer
>> 
>> NetApp
>> Trond.Myklebust@netapp.com
>> www.netapp.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 20:55           ` Chuck Lever
@ 2012-10-14 21:05             ` Linus Torvalds
  2012-10-14 22:32               ` Chuck Lever
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2012-10-14 21:05 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Myklebust, Trond, Larry McVoy,
	Linux NFS Mailing List

On Sun, Oct 14, 2012 at 1:55 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> I think range-check assertions in the XDR code are valuable.  Whether they are done via BUG_ON or WARN_ON_ONCE is a matter of priority:

Bullshit.

> BUG_ON forces you to notice the problem and address it, while WARN_ON allows the system to continue operating with the bug, but the bug can be ignored (or the WARN_ON simply removed because it is annoying).

Bullshit again.

I don't think you've ever even *seen* a BUG_ON() fire in critical
code, have you? It's not pretty.

It doesn't "force you to notice the problem". Rather the reverse. It
often causes a totally silent death of the machine, and/or
scrolled-off messages that you never see, exactly because the thing
you *do* see is the fallout from the developer having done something
incredibly stupid.

Really.

There's a reason I say that BUG_ON() is only for those cases where the
machine is already totally unrecoverable. It *will* screw the machine
up, usually in ways that makes it totally undebuggable.

So please stop making excuses for it. It's bad. It isn't some kind of
"this forces you to notice". Really. The fact that you think it is
just shows that you probably never saw them actually trigger, or you
only saw them trigger in code that happened to not hold any locks etc.

So shove those BUG_ON()'s where the sun don't shine. They make things
*worse*, not better. They are valid for some very core code where a
major internal data structure is totally scrogged, but they are
*never* valid for some kind of random assert(). Exactly because
BUG_ON() is such a big hammer that using it is more likely to
*destroy* the data you want to assert and show the user, rather than
actually showing the user anything at all.

             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 21:05             ` Linus Torvalds
@ 2012-10-14 22:32               ` Chuck Lever
  2012-10-14 22:54                 ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Chuck Lever @ 2012-10-14 22:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bruce Fields, Myklebust, Trond, Larry McVoy,
	Linux NFS Mailing List


On Oct 14, 2012, at 5:05 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Oct 14, 2012 at 1:55 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> I think range-check assertions in the XDR code are valuable.  Whether they are done via BUG_ON or WARN_ON_ONCE is a matter of priority:
> 
> Bullshit.
> 
>> BUG_ON forces you to notice the problem and address it, while WARN_ON allows the system to continue operating with the bug, but the bug can be ignored (or the WARN_ON simply removed because it is annoying).
> 
> Bullshit again.

> I don't think you've ever even *seen* a BUG_ON() fire in critical
> code, have you? It's not pretty.

I've been working on the Linux kernel NFS client for 12 years.  I've seen plenty of BUG_ONs fire at inopportune moments, and plenty fire right when it is useful.

The client-side XDR encoder functions don't return a return value.  Upper layers can not find out there was a problem if marshalling fails.  That's why we use BUG_ON here: the code really has to stop executing, especially if pointers are involved, to avoid crapping on other parts of memory or posting data to a server that could corrupt file data.

Remember, in a file system, you need to be concerned about kernel data structures _and_ what is being written to disk.

Even if we change BUG_ON to WARN_ON everywhere, there are still plenty of ways to oops in that code, and we have the same result: the machine may lock up or not write the back trace to the system log.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 22:32               ` Chuck Lever
@ 2012-10-14 22:54                 ` Linus Torvalds
  0 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2012-10-14 22:54 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Bruce Fields, Myklebust, Trond, Larry McVoy,
	Linux NFS Mailing List

On Sun, Oct 14, 2012 at 3:32 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> The client-side XDR encoder functions don't return a return value.  Upper layers can not find out there was a problem if marshalling fails.  That's why we use BUG_ON here

The above is STILL pure bull.

Use a WARN_ON_ONCE() so that people can actually *report* the bugs
sanely (see this very thread about why BUG_ON causes problems for
reporters).

If you are so damned certain that you mustn't continue, make the damn
function return the error code, and stop marshalling. If you don't do
error handling, that's no excuse for BUG_ON.

And if you think the code gets too complicated from that, then that
*still* isn't a reason to do BUG_ON(). See above.

What part of this thread do you have problems acknowledging? The
undeniable *facts* (that you seem to be in total denial about) from
this very thread are:

 - the BUG_ON() caused an otherwise *benign* bug to result in a unusable system.

   Your "to avoid crapping on other parts of memory or posting data"
argument is pretty much f*cked up, when you instead cause a fatal
error!

 - the BUG_ON() caused the debug data to be almost totally useless.

   Look at the screen shots. Look at the lack of debug info. Just look at it.

Why are you denying reality, and bringing up arguments that are purely
theoretical and shown to be wrong in reality?

BUG_ON() in a filesystem or random service is pure and utter garbage.
There is no way it is ever the right thing to do. Get rid of them, and
stop posting excuses for them.

Christ, we had THIS VERY SAME issue not more than a month or two ago,
when the locking code had a BUG() in it that turned what should have
been a simple error return into a DoS attack by normal users. See
commit 8d657eb3b438.

The whole "it's better to BUG_ON() than do something unexpected" is a
disease. It may be well-intentioned, but the road to hell is paved
with good intentions, and saying "I don't want to do odd things" is
stupid to do, when the BUG_ON() itself just causes *different*
catastrophic odd things to happen.

Really: killing the machine IS NOT ANY BETTER than sending out odd packets.

             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14  1:42                 ` Larry McVoy
@ 2012-10-15  0:43                   ` Bruce Fields
  2012-10-15  4:38                     ` Myklebust, Trond
  2012-10-15  4:41                     ` Myklebust, Trond
  0 siblings, 2 replies; 43+ messages in thread
From: Bruce Fields @ 2012-10-15  0:43 UTC (permalink / raw)
  To: Larry McVoy, Myklebust, Trond, Linus Torvalds,
	Linux NFS Mailing List

On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > default to that...
> 
> I dunno if they default to that or we force that.  We have had lots of
> problems with Linux NFS, we export our home directories but they are
> more or less read only.
> 
> Maybe things have gotten better but back in the day I could crash the
> kernel with a bk clone to a NFS directory.

Bug reports welcomed if any of those bugs are still around.

--b.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15  0:43                   ` Bruce Fields
@ 2012-10-15  4:38                     ` Myklebust, Trond
  2012-10-15 14:34                       ` Larry McVoy
  2012-10-15  4:41                     ` Myklebust, Trond
  1 sibling, 1 reply; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-15  4:38 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Larry McVoy, Linus Torvalds, Linux NFS Mailing List

T24gU3VuLCAyMDEyLTEwLTE0IGF0IDIwOjQzIC0wNDAwLCBCcnVjZSBGaWVsZHMgd3JvdGU6DQo+
IE9uIFNhdCwgT2N0IDEzLCAyMDEyIGF0IDA2OjQyOjE1UE0gLTA3MDAsIExhcnJ5IE1jVm95IHdy
b3RlOg0KPiA+ID4gUFM6IHlvdSBhcmUgcHJlc3VtYWJseSBydW5uaW5nIE5GU3YyIG9uIHlvdXIg
TWFjcy4gT2RkIHRoYXQgdGhleSBzaG91bGQNCj4gPiA+IGRlZmF1bHQgdG8gdGhhdC4uLg0KPiA+
IA0KPiA+IEkgZHVubm8gaWYgdGhleSBkZWZhdWx0IHRvIHRoYXQgb3Igd2UgZm9yY2UgdGhhdC4g
IFdlIGhhdmUgaGFkIGxvdHMgb2YNCj4gPiBwcm9ibGVtcyB3aXRoIExpbnV4IE5GUywgd2UgZXhw
b3J0IG91ciBob21lIGRpcmVjdG9yaWVzIGJ1dCB0aGV5IGFyZQ0KPiA+IG1vcmUgb3IgbGVzcyBy
ZWFkIG9ubHkuDQo+ID4gDQo+ID4gTWF5YmUgdGhpbmdzIGhhdmUgZ290dGVuIGJldHRlciBidXQg
YmFjayBpbiB0aGUgZGF5IEkgY291bGQgY3Jhc2ggdGhlDQo+ID4ga2VybmVsIHdpdGggYSBiayBj
bG9uZSB0byBhIE5GUyBkaXJlY3RvcnkuDQo+IA0KPiBCdWcgcmVwb3J0cyB3ZWxjb21lZCBpZiBh
bnkgb2YgdGhvc2UgYnVncyBhcmUgc3RpbGwgYXJvdW5kLg0KDQpUaGUgb3RoZXIgdGhpbmcgdG8g
bm90ZSBpcyB0aGF0IGF0IHRoaXMgcG9pbnQsIE5GU3YyIGhhcyBiZWVuIGxlZ2FjeQ0KY29kZSBm
b3IgbW9yZSB0aGFuIDEwIHllYXJzIG9uIExpbnV4IGFuZCBpcyBzdGFydGluZyB0byBzdWZmZXIg
YmlnIHRpbWUNCmZyb20gYml0IHJvdC4gV2Ugc2hvdWxkIHByb2JhYmx5IGFpbSB0byByZW1vdmUg
aXQgZW50aXJlbHkgaW4gdGhlIG5leHQNCjEtMiB5ZWFycy4gVGhlcmUgaXMgbm8gcGxhY2UgaW4g
dG9kYXkncyB3b3JsZCBmb3IgYSBwcm90b2NvbCB0aGF0IGNhbg0Kb25seSBkZWFsIHdpdGggYSAy
R0IgbWF4aW11bSBmaWxlIHNpemUuLi4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G
UyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29t
DQp3d3cubmV0YXBwLmNvbQ0K

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15  0:43                   ` Bruce Fields
  2012-10-15  4:38                     ` Myklebust, Trond
@ 2012-10-15  4:41                     ` Myklebust, Trond
  2012-10-15 12:11                       ` Bruce Fields
  1 sibling, 1 reply; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-15  4:41 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Larry McVoy, Linus Torvalds, Linux NFS Mailing List

T24gU3VuLCAyMDEyLTEwLTE0IGF0IDIwOjQzIC0wNDAwLCBCcnVjZSBGaWVsZHMgd3JvdGU6DQo+
IEJ1ZyByZXBvcnRzIHdlbGNvbWVkIGlmIGFueSBvZiB0aG9zZSBidWdzIGFyZSBzdGlsbCBhcm91
bmQuDQoNCllvdSBkaWQgc2VlIHRoZSBwYXRjaCB0aGF0IEkgc3VibWl0dGVkLCByaWdodD8gU2lu
Y2UgaXQgaXMgYSBzZXJ2ZXIgYnVnLA0KSSdtIGFzc3VtaW5nIHRoYXQgeW91IHdpbGwgd2FudCB0
byBwdXNoIGl0IHVwc3RyZWFtLg0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNs
aWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3
dy5uZXRhcHAuY29tDQo=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
@ 2012-10-15  8:05 George Spelvin
  2012-10-15 12:19 ` Myklebust, Trond
  0 siblings, 1 reply; 43+ messages in thread
From: George Spelvin @ 2012-10-15  8:05 UTC (permalink / raw)
  To: linux-nfs, Trond.Myklebust; +Cc: linux

Trond.Myklebust@netapp.com wrote:
> We should probably aim to remove it entirely in the next 1-2 years.
> There is no place in today's world for a protocol that can only deal
> with a 2GB maximum file size...

Today's world includes a lot of yesterday's world.

I'm still running a SunOS 4.1.4 machine.  Yes, it does real (dusty
deck) work; I just had to resurrect it when its HDD died.

I got to rediscover its 2 GB maximum FILESYSTEM size.

(When the hardware finally craps out for real, I'll probably
punt to QEMU.  But I either have to add SunOS system call
emulation to QEMU, or run the whole OS under hardware emulation.)

Anyway, most of its files are actually on a Linux NFS server
with a proper RAID and backup system.

I'd kind of like to keep NFSv2 working, if you don't mind.

("I do mind; install unfsd" is perhaps a legitimate response.)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15  4:41                     ` Myklebust, Trond
@ 2012-10-15 12:11                       ` Bruce Fields
  0 siblings, 0 replies; 43+ messages in thread
From: Bruce Fields @ 2012-10-15 12:11 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Larry McVoy, Linus Torvalds, Linux NFS Mailing List

On Mon, Oct 15, 2012 at 04:41:56AM +0000, Myklebust, Trond wrote:
> On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > Bug reports welcomed if any of those bugs are still around.
> 
> You did see the patch that I submitted, right?

Yes.

> Since it is a server bug,
> I'm assuming that you will want to push it upstream.

Sure, I can take it.

--b.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15  8:05 George Spelvin
@ 2012-10-15 12:19 ` Myklebust, Trond
  2012-10-15 16:02   ` VDR User
  2012-10-16  1:48   ` George Spelvin
  0 siblings, 2 replies; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-15 12:19 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-nfs@vger.kernel.org

T24gTW9uLCAyMDEyLTEwLTE1IGF0IDA0OjA1IC0wNDAwLCBHZW9yZ2UgU3BlbHZpbiB3cm90ZToN
Cj4gVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20gd3JvdGU6DQo+ID4gV2Ugc2hvdWxkIHByb2Jh
Ymx5IGFpbSB0byByZW1vdmUgaXQgZW50aXJlbHkgaW4gdGhlIG5leHQgMS0yIHllYXJzLg0KPiA+
IFRoZXJlIGlzIG5vIHBsYWNlIGluIHRvZGF5J3Mgd29ybGQgZm9yIGEgcHJvdG9jb2wgdGhhdCBj
YW4gb25seSBkZWFsDQo+ID4gd2l0aCBhIDJHQiBtYXhpbXVtIGZpbGUgc2l6ZS4uLg0KPiANCj4g
VG9kYXkncyB3b3JsZCBpbmNsdWRlcyBhIGxvdCBvZiB5ZXN0ZXJkYXkncyB3b3JsZC4NCj4gDQo+
IEknbSBzdGlsbCBydW5uaW5nIGEgU3VuT1MgNC4xLjQgbWFjaGluZS4gIFllcywgaXQgZG9lcyBy
ZWFsIChkdXN0eQ0KPiBkZWNrKSB3b3JrOyBJIGp1c3QgaGFkIHRvIHJlc3VycmVjdCBpdCB3aGVu
IGl0cyBIREQgZGllZC4NCj4gDQo+IEkgZ290IHRvIHJlZGlzY292ZXIgaXRzIDIgR0IgbWF4aW11
bSBGSUxFU1lTVEVNIHNpemUuDQo+IA0KPiAoV2hlbiB0aGUgaGFyZHdhcmUgZmluYWxseSBjcmFw
cyBvdXQgZm9yIHJlYWwsIEknbGwgcHJvYmFibHkNCj4gcHVudCB0byBRRU1VLiAgQnV0IEkgZWl0
aGVyIGhhdmUgdG8gYWRkIFN1bk9TIHN5c3RlbSBjYWxsDQo+IGVtdWxhdGlvbiB0byBRRU1VLCBv
ciBydW4gdGhlIHdob2xlIE9TIHVuZGVyIGhhcmR3YXJlIGVtdWxhdGlvbi4pDQo+IA0KPiBBbnl3
YXksIG1vc3Qgb2YgaXRzIGZpbGVzIGFyZSBhY3R1YWxseSBvbiBhIExpbnV4IE5GUyBzZXJ2ZXIN
Cj4gd2l0aCBhIHByb3BlciBSQUlEIGFuZCBiYWNrdXAgc3lzdGVtLg0KPiANCj4gSSdkIGtpbmQg
b2YgbGlrZSB0byBrZWVwIE5GU3YyIHdvcmtpbmcsIGlmIHlvdSBkb24ndCBtaW5kLg0KPiANCj4g
KCJJIGRvIG1pbmQ7IGluc3RhbGwgdW5mc2QiIGlzIHBlcmhhcHMgYSBsZWdpdGltYXRlIHJlc3Bv
bnNlLikNCg0KSSBkbyBtaW5kOyBpdCBpcyBjbGVhcmx5IHN0YXJ0aW5nIHRvIGJpdHJvdCBkdWUg
dG8gYW4gYWJzZW5jZSBvZiB1c2Vycy4NCk1haW50ZW5hbmNlIG9mIHVudXNlZCBjb2RlIGlzIGFj
dHVhbGx5IF9tb3JlXyBvZiBhIHBhaW4sIG5vdCBsZXNzLg0KDQpTbyB1bmZzZCBpcyBvbmUgc29s
dXRpb24uIEtlZXBpbmcgYSBWTSB3aXRoIGFuIG9sZGVyIHZlcnNpb24gb2YgdGhlDQpMaW51eCBr
ZXJuZWwgdGhhdCBzdGlsbCBzdXBwb3J0cyBORlN2MiBpcyBhbm90aGVyLiBWb2x1bnRlZXJpbmcg
dG8NCm1haW50YWluIHRoZSBjb2RlIGlzIGEgdGhpcmQuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0
DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RA
bmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg==

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15  4:38                     ` Myklebust, Trond
@ 2012-10-15 14:34                       ` Larry McVoy
  2012-10-15 18:02                         ` Bruce Fields
  0 siblings, 1 reply; 43+ messages in thread
From: Larry McVoy @ 2012-10-15 14:34 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: Bruce Fields, Larry McVoy, Linus Torvalds, Linux NFS Mailing List

On Mon, Oct 15, 2012 at 04:38:02AM +0000, Myklebust, Trond wrote:
> On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > > > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > > > default to that...
> > > 
> > > I dunno if they default to that or we force that.  We have had lots of
> > > problems with Linux NFS, we export our home directories but they are
> > > more or less read only.
> > > 
> > > Maybe things have gotten better but back in the day I could crash the
> > > kernel with a bk clone to a NFS directory.
> > 
> > Bug reports welcomed if any of those bugs are still around.
> 
> The other thing to note is that at this point, NFSv2 has been legacy
> code for more than 10 years on Linux and is starting to suffer big time
> from bit rot. We should probably aim to remove it entirely in the next
> 1-2 years. There is no place in today's world for a protocol that can
> only deal with a 2GB maximum file size...

Well, we support ancient machines (why?  damn good question) that only
speak NFSv2 so it would sorta suck for us if you pulled it.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15 12:19 ` Myklebust, Trond
@ 2012-10-15 16:02   ` VDR User
  2012-10-16  1:48   ` George Spelvin
  1 sibling, 0 replies; 43+ messages in thread
From: VDR User @ 2012-10-15 16:02 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: George Spelvin, linux-nfs@vger.kernel.org

On Mon, Oct 15, 2012 at 5:19 AM, Myklebust, Trond
<Trond.Myklebust@netapp.com> wrote:
> I do mind; it is clearly starting to bitrot due to an absence of users.
> Maintenance of unused code is actually _more_ of a pain, not less.
>
> So unfsd is one solution. Keeping a VM with an older version of the
> Linux kernel that still supports NFSv2 is another. Volunteering to
> maintain the code is a third.

Growing pains can be hard for those who can't/won't update but when
it's time to move on, it's time to move on. Maintaining code _is_ in
fact a chore that eventually becomes counter-productive once the user
base has diminished enough and/or it's become so outdated. Your
suggestions might not thrill other people but then who likes taking
their medicine?

Cheers

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15 14:34                       ` Larry McVoy
@ 2012-10-15 18:02                         ` Bruce Fields
  0 siblings, 0 replies; 43+ messages in thread
From: Bruce Fields @ 2012-10-15 18:02 UTC (permalink / raw)
  To: Larry McVoy, Myklebust, Trond, Linus Torvalds,
	Linux NFS Mailing List

On Mon, Oct 15, 2012 at 07:34:56AM -0700, Larry McVoy wrote:
> On Mon, Oct 15, 2012 at 04:38:02AM +0000, Myklebust, Trond wrote:
> > On Sun, 2012-10-14 at 20:43 -0400, Bruce Fields wrote:
> > > On Sat, Oct 13, 2012 at 06:42:15PM -0700, Larry McVoy wrote:
> > > > > PS: you are presumably running NFSv2 on your Macs. Odd that they should
> > > > > default to that...
> > > > 
> > > > I dunno if they default to that or we force that.  We have had lots of
> > > > problems with Linux NFS, we export our home directories but they are
> > > > more or less read only.
> > > > 
> > > > Maybe things have gotten better but back in the day I could crash the
> > > > kernel with a bk clone to a NFS directory.
> > > 
> > > Bug reports welcomed if any of those bugs are still around.
> > 
> > The other thing to note is that at this point, NFSv2 has been legacy
> > code for more than 10 years on Linux and is starting to suffer big time
> > from bit rot. We should probably aim to remove it entirely in the next
> > 1-2 years. There is no place in today's world for a protocol that can
> > only deal with a 2GB maximum file size...
> 
> Well, we support ancient machines (why?  damn good question) that only
> speak NFSv2 so it would sorta suck for us if you pulled it.

At least on the server side, I'd prefer to be more conservative about
removing NFSv2 support.

--b.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-14 19:43         ` Bruce Fields
@ 2012-10-15 18:20           ` Boaz Harrosh
  0 siblings, 0 replies; 43+ messages in thread
From: Boaz Harrosh @ 2012-10-15 18:20 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Myklebust, Trond, Linus Torvalds, Linux NFS Mailing List,
	Larry McVoy

On 10/14/2012 12:43 PM, Bruce Fields wrote:
> On Fri, Oct 12, 2012 at 07:39:22PM -0700, Boaz Harrosh wrote:
>> On 10/12/2012 07:32 PM, Myklebust, Trond wrote:
>>> They are used in the nlm_procedures[] array as TEST_RES, LOCK_RES,
>>> CANCEL_RES, UNLOCK_RES and GRANTED_RES xdr encoding procedures.
>>
>> Ha, sorry that was impossible to find. OK I'll have a look
> 
> Everybody falls into that trap at least once.  Any objections to killing
> those macros and expanding that stuff out?
> 

No objections from me. I did the one file just to understand and
actually I think the code even looks better without them.

But then I realized they are all over, I'm not sure I'll have
time to do it

> --b.
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-15 12:19 ` Myklebust, Trond
  2012-10-15 16:02   ` VDR User
@ 2012-10-16  1:48   ` George Spelvin
  2012-10-16  1:52     ` Larry McVoy
  2012-10-16  3:46     ` Myklebust, Trond
  1 sibling, 2 replies; 43+ messages in thread
From: George Spelvin @ 2012-10-16  1:48 UTC (permalink / raw)
  To: linux, lm, Trond.Myklebust; +Cc: linux-nfs

Trond.Myklebust@netapp.com wrote:
> SSBkbyBtaW5kOyBpdCBpcyBjbGVhcmx5IHN0YXJ0aW5nIHRvIGJpdHJvdCBkdWUg
> dG8gYW4gYWJzZW5jZSBvZiB1c2Vycy4NCk1haW50ZW5hbmNlIG9mIHVudXNlZCBjb2RlIGlzIGFj
> dHVhbGx5IF9tb3JlXyBvZiBhIHBhaW4sIG5vdCBsZXNzLg0KDQpTbyB1bmZzZCBpcyBvbmUgc29s
> dXRpb24uIEtlZXBpbmcgYSBWTSB3aXRoIGFuIG9sZGVyIHZlcnNpb24gb2YgdGhlDQpMaW51eCBr
> ZXJuZWwgdGhhdCBzdGlsbCBzdXBwb3J0cyBORlN2MiBpcyBhbm90aGVyLiBWb2x1bnRlZXJpbmcg
> dG8NCm1haW50YWluIHRoZSBjb2RlIGlzIGEgdGhpcmQuDQoN

Which can be base64 decoded (why was it ever ENcoded?) to

> I do mind; it is clearly starting to bitrot due to an absence of users.
> Maintenance of unused code is actually _more_ of a pain, not less.
>
> So unfsd is one solution. Keeping a VM with an older version of the
> Linux kernel that still supports NFSv2 is another. Volunteering to
> maintain the code is a third.

If I might ask, though, is the pain concentrated more on the client or
the server side?

NFSv2 server support seems a fairly simple matter of some old
compatibility RPC calls.  The main pain is the limited size of the file
handle and (especially) readdir cookies.

Client support is probably more complicated, as NFS's "stateless server"
model puts the bulk of the complexity on the client, and you need a
thicker layer of logic to translate the operations into a different
vocabulary pf RPC calls.

I don't think NFSv2 client support would be mourned much; trying to to
use such an ancient limited machine as a file server seems stupid.

It's NFSv2 server support that I, and I believe Larry, are interested
in preserving, in order to provide services to ancient clients.

The only real use of v2 client code is te test the server.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-16  1:48   ` George Spelvin
@ 2012-10-16  1:52     ` Larry McVoy
  2012-10-16  3:46     ` Myklebust, Trond
  1 sibling, 0 replies; 43+ messages in thread
From: Larry McVoy @ 2012-10-16  1:52 UTC (permalink / raw)
  To: George Spelvin; +Cc: lm, Trond.Myklebust, linux-nfs

On Mon, Oct 15, 2012 at 09:48:24PM -0400, George Spelvin wrote:
> It's NFSv2 server support that I, and I believe Larry, are interested
> in preserving, in order to provide services to ancient clients.

Yup.  Exactly right.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-16  1:48   ` George Spelvin
  2012-10-16  1:52     ` Larry McVoy
@ 2012-10-16  3:46     ` Myklebust, Trond
  2012-10-16  4:39       ` George Spelvin
  2012-10-16 11:17       ` Jim Rees
  1 sibling, 2 replies; 43+ messages in thread
From: Myklebust, Trond @ 2012-10-16  3:46 UTC (permalink / raw)
  To: George Spelvin; +Cc: lm@bitmover.com, linux-nfs@vger.kernel.org

T24gTW9uLCAyMDEyLTEwLTE1IGF0IDIxOjQ4IC0wNDAwLCBHZW9yZ2UgU3BlbHZpbiB3cm90ZToN
Cj4gVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20gd3JvdGU6DQo+ID4gU1NCa2J5QnRhVzVrT3lC
cGRDQnBjeUJqYkdWaGNteDVJSE4wWVhKMGFXNW5JSFJ2SUdKcGRISnZkQ0JrZFdVZw0KPiA+IGRH
OGdZVzRnWVdKelpXNWpaU0J2WmlCMWMyVnljeTROQ2sxaGFXNTBaVzVoYm1ObElHOW1JSFZ1ZFhO
bFpDQmpiMlJsSUdseklHRmoNCj4gPiBkSFZoYkd4NUlGOXRiM0psWHlCdlppQmhJSEJoYVc0c0lH
NXZkQ0JzWlhOekxnMEtEUXBUYnlCMWJtWnpaQ0JwY3lCdmJtVWdjMjlzDQo+ID4gZFhScGIyNHVJ
RXRsWlhCcGJtY2dZU0JXVFNCM2FYUm9JR0Z1SUc5c1pHVnlJSFpsY25OcGIyNGdiMllnZEdobERR
cE1hVzUxZUNCcg0KPiA+IFpYSnVaV3dnZEdoaGRDQnpkR2xzYkNCemRYQndiM0owY3lCT1JsTjJN
aUJwY3lCaGJtOTBhR1Z5TGlCV2IyeDFiblJsWlhKcGJtY2cNCj4gPiBkRzhOQ20xaGFXNTBZV2x1
SUhSb1pTQmpiMlJsSUdseklHRWdkR2hwY21RdURRb04NCj4gDQo+IFdoaWNoIGNhbiBiZSBiYXNl
NjQgZGVjb2RlZCAod2h5IHdhcyBpdCBldmVyIEVOY29kZWQ/KSB0bw0KPiANCj4gPiBJIGRvIG1p
bmQ7IGl0IGlzIGNsZWFybHkgc3RhcnRpbmcgdG8gYml0cm90IGR1ZSB0byBhbiBhYnNlbmNlIG9m
IHVzZXJzLg0KPiA+IE1haW50ZW5hbmNlIG9mIHVudXNlZCBjb2RlIGlzIGFjdHVhbGx5IF9tb3Jl
XyBvZiBhIHBhaW4sIG5vdCBsZXNzLg0KPiA+DQo+ID4gU28gdW5mc2QgaXMgb25lIHNvbHV0aW9u
LiBLZWVwaW5nIGEgVk0gd2l0aCBhbiBvbGRlciB2ZXJzaW9uIG9mIHRoZQ0KPiA+IExpbnV4IGtl
cm5lbCB0aGF0IHN0aWxsIHN1cHBvcnRzIE5GU3YyIGlzIGFub3RoZXIuIFZvbHVudGVlcmluZyB0
bw0KPiA+IG1haW50YWluIHRoZSBjb2RlIGlzIGEgdGhpcmQuDQo+IA0KPiBJZiBJIG1pZ2h0IGFz
aywgdGhvdWdoLCBpcyB0aGUgcGFpbiBjb25jZW50cmF0ZWQgbW9yZSBvbiB0aGUgY2xpZW50IG9y
DQo+IHRoZSBzZXJ2ZXIgc2lkZT8NCj4gDQo+IE5GU3YyIHNlcnZlciBzdXBwb3J0IHNlZW1zIGEg
ZmFpcmx5IHNpbXBsZSBtYXR0ZXIgb2Ygc29tZSBvbGQNCj4gY29tcGF0aWJpbGl0eSBSUEMgY2Fs
bHMuICBUaGUgbWFpbiBwYWluIGlzIHRoZSBsaW1pdGVkIHNpemUgb2YgdGhlIGZpbGUNCj4gaGFu
ZGxlIGFuZCAoZXNwZWNpYWxseSkgcmVhZGRpciBjb29raWVzLg0KDQpXZWxsLCBJJ20gcmVhbGx5
IGdsYWQgdG8gaGVhciB0aGF0IGFmdGVyIHNldmVyYWwgcGVvcGxlIHNwZW50IDMtNCBob3Vycw0K
ZGVidWdnaW5nIGFuIE5GU3YyLW9ubHkgc2VydmVyIHNpZGUgcHJvYmxlbSBsYXN0IEZyaWRheS4g
UGxlYXNlIGxldCBtZQ0Ka25vdyB0aGUgbmV4dCB0aW1lIEkgY2FuIGhlbHAgZGVhbCB3aXRoIGFu
b3RoZXIgZmFpcmx5IHNpbXBsZSBtYXR0ZXIgb2YNCm9sZCBjb21wYXRpYmlsaXR5IGNhbGxzLi4u
DQoNCk9LLCBJJ2xsIGJpdGUuIFdoYXQgaXMgdGhpcyBidXNpbmVzcy1jcml0aWNhbCBhcHBsaWNh
dGlvbiB0aGF0IHlvdSBhcmUNCnJ1bm5pbmcgYW5kIHRoYXQgd2lsbCBvbmx5IHJ1biBvbiBhIG1h
Y2hpbmUgdGhhdCBpcyBpbmNhcGFibGUgb2YgcnVubmluZw0KYSBtZXJlIDIwLXllYXIgb2xkIHBy
b3RvY29sIGFuZCB0aGF0IG11c3QgaGF2ZSBhIDMwIHllYXIgb2xkIHByb3RvY29sPw0KVGhlIGZh
Y3QgdGhhdCB5b3UgYXJlIGFsbCBpbiBhIGh1ZmYgYWJvdXQgYmFzZTY0IGVuY29kZWQgZW1haWxz
DQppbmRpY2F0ZXMgdGhhdCB0aGlzIGlzIG5vdCBzb21ldGhpbmcgeW91IGFyZSBydW5uaW5nIG9u
IGFueXRoaW5nIGFzDQpzb3BoaXN0aWNhdGVkIGFzIGEgY2VsbCBwaG9uZS4NCg0KPiBDbGllbnQg
c3VwcG9ydCBpcyBwcm9iYWJseSBtb3JlIGNvbXBsaWNhdGVkLCBhcyBORlMncyAic3RhdGVsZXNz
IHNlcnZlciINCj4gbW9kZWwgcHV0cyB0aGUgYnVsayBvZiB0aGUgY29tcGxleGl0eSBvbiB0aGUg
Y2xpZW50LCBhbmQgeW91IG5lZWQgYQ0KPiB0aGlja2VyIGxheWVyIG9mIGxvZ2ljIHRvIHRyYW5z
bGF0ZSB0aGUgb3BlcmF0aW9ucyBpbnRvIGEgZGlmZmVyZW50DQo+IHZvY2FidWxhcnkgcGYgUlBD
IGNhbGxzLg0KPiANCj4gSSBkb24ndCB0aGluayBORlN2MiBjbGllbnQgc3VwcG9ydCB3b3VsZCBi
ZSBtb3VybmVkIG11Y2g7IHRyeWluZyB0byB0bw0KPiB1c2Ugc3VjaCBhbiBhbmNpZW50IGxpbWl0
ZWQgbWFjaGluZSBhcyBhIGZpbGUgc2VydmVyIHNlZW1zIHN0dXBpZC4NCj4gDQo+IEl0J3MgTkZT
djIgc2VydmVyIHN1cHBvcnQgdGhhdCBJLCBhbmQgSSBiZWxpZXZlIExhcnJ5LCBhcmUgaW50ZXJl
c3RlZA0KPiBpbiBwcmVzZXJ2aW5nLCBpbiBvcmRlciB0byBwcm92aWRlIHNlcnZpY2VzIHRvIGFu
Y2llbnQgY2xpZW50cy4NCj4gDQo+IFRoZSBvbmx5IHJlYWwgdXNlIG9mIHYyIGNsaWVudCBjb2Rl
IGlzIHRlIHRlc3QgdGhlIHNlcnZlci4NCg0KR29vZC4gVGhlbiBpdCB3b24ndCBiZSBtaXNzZWQu
DQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXINCg0K
TmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg==

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-16  3:46     ` Myklebust, Trond
@ 2012-10-16  4:39       ` George Spelvin
  2012-10-16 11:17       ` Jim Rees
  1 sibling, 0 replies; 43+ messages in thread
From: George Spelvin @ 2012-10-16  4:39 UTC (permalink / raw)
  To: linux, Trond.Myklebust; +Cc: linux-nfs, lm

> Well, I'm really glad to hear that after several people spent 3-4 hours
> debugging an NFSv2-only server side problem last Friday. Please let me
> know the next time I can help deal with another fairly simple matter of
> old compatibility calls...

I didn't mean to rouse the sarcasm monster; I was, at your prompting,
investigating the effort of maintaining it myself.

Your .sig says "NFS client", I didn't see your name a lot in the fs/nfsd
directory changelogs, and I wondered, so I *asked*.

> OK, I'll bite. What is this business-critical application that you are
> running and that will only run on a machine that is incapable of running
> a mere 20-year old protocol and that must have a 30 year old protocol?

It's the firmware for an old embedded product, which I still do occasional
fixes for.  The compiler it's been developed with is a binary from
a company that's long out of business, so I'm stuck running the old
binary on the old OS.  (The product itself doesn't date back to 1990,
but the source code archives do, since the same platform was used for
earlier products.)

Yes, I *could* just port the whole thing over to GCC, but like most
embedded code, it's very sensitive to the environment, probably makes
some non-ANSI assumptions that will anger GCC's optimizer, *does* use
a different calling convention that will require fixing all the
assembly routines, and it would require a LOT of re-testing.

Since what we have now has decades of testing, and I'd rather spend my
time developing new products, keeping everything limping along has a
lot of advantages.

My most recent changes were in February 2011: an unusual combination
of commands could cause a lockup.  Before that was adding a new baud
rate a customer needed in July 2010.  There is very little effort,
but maintaining the capability is important.

I actually tried to compile git on SunOS, but things Didn't Work Well,
so what I do is do source control on the Linux NFS server, and just
use the old machine for the actual compiles.

> The fact that you are all in a huff about base64 encoded emails
> indicates that this is not something you are running on anything as
> sophisticated as a cell phone.

I didn't mean to come across as *that* peeved about the base64; it's
mostly annoying because I can't grep my mailbox, and my preferred
mailing list archive web interface (marc.info) doesn't decode them.
And it seems perverse for an MUA to encode something as base-64 that is
100% 7-bit ASCII.  If nothing else, it bloats the message size 35%.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-16  3:46     ` Myklebust, Trond
  2012-10-16  4:39       ` George Spelvin
@ 2012-10-16 11:17       ` Jim Rees
  1 sibling, 0 replies; 43+ messages in thread
From: Jim Rees @ 2012-10-16 11:17 UTC (permalink / raw)
  To: Myklebust, Trond
  Cc: George Spelvin, lm@bitmover.com, linux-nfs@vger.kernel.org

Myklebust, Trond wrote:

  The fact that you are all in a huff about base64 encoded emails
  indicates that this is not something you are running on anything as
  sophisticated as a cell phone.

The problem with your base64 email is that the vger server doesn't handle
them correctly.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226!
  2012-10-13  4:42               ` Myklebust, Trond
  2012-10-14  1:42                 ` Larry McVoy
@ 2012-10-17 14:00                 ` Bruce Fields
  1 sibling, 0 replies; 43+ messages in thread
From: Bruce Fields @ 2012-10-17 14:00 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Larry McVoy, Linus Torvalds, Linux NFS Mailing List

On Sat, Oct 13, 2012 at 04:42:05AM +0000, Myklebust, Trond wrote:
> On Fri, 2012-10-12 at 19:56 -0700, Larry McVoy wrote:
> > On Sat, Oct 13, 2012 at 02:52:51AM +0000, Myklebust, Trond wrote:
> > > On Fri, 2012-10-12 at 19:31 -0700, Larry McVoy wrote:
> > > > As I've said, debugging this is going to be hard, it means stopping my
> > > > company.  We can do one more crash, you guys can tell me what to do,
> > > > but that's about it.
> > > 
> > > Do you have a reproducer? 
> > 
> > Oh yeah, it happens every time.  Run skype on a mac w/ your home dir NFS
> > mounted.  Skype tries to do some lock operation and the machine panics.
> > 
> > We've done it 4 times in a row to be sure we weren't smoking crack.  No
> > crack.  Just panics.
> 
> OK. I think I can see what is happening. We need to filter the return
> values from nlm_lookup_file() when dealing with NLMv1 and NLMv3.
> 
> Can you please try the attached patch?

Larry, did you get a chance to test this patch?

It seems obviously right anyway, so I'll go ahead and pass it along
regardless, but it would be nice to have the confirmation.

--b.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2012-10-17 14:00 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20121012211701.GA8301@bitmover.com>
2012-10-12 23:52 ` kernel BUG at /build/buildd/linux-3.2.0/fs/lockd/clntxdr.c:226! Linus Torvalds
2012-10-13  0:21   ` Larry McVoy
2012-10-13  1:02     ` Linus Torvalds
2012-10-13  1:36       ` Jim Rees
2012-10-13  1:45         ` Linus Torvalds
2012-10-13 21:31           ` Daniel Kahn Gillmor
2012-10-13  2:08         ` Boaz Harrosh
2012-10-13  2:28       ` Myklebust, Trond
2012-10-13  2:31         ` Larry McVoy
2012-10-13  2:52           ` Myklebust, Trond
2012-10-13  2:56             ` Larry McVoy
2012-10-13  3:05               ` Myklebust, Trond
2012-10-13  4:42               ` Myklebust, Trond
2012-10-14  1:42                 ` Larry McVoy
2012-10-15  0:43                   ` Bruce Fields
2012-10-15  4:38                     ` Myklebust, Trond
2012-10-15 14:34                       ` Larry McVoy
2012-10-15 18:02                         ` Bruce Fields
2012-10-15  4:41                     ` Myklebust, Trond
2012-10-15 12:11                       ` Bruce Fields
2012-10-17 14:00                 ` Bruce Fields
2012-10-14 19:39         ` Bruce Fields
2012-10-14 19:44           ` Linus Torvalds
2012-10-14 20:55           ` Chuck Lever
2012-10-14 21:05             ` Linus Torvalds
2012-10-14 22:32               ` Chuck Lever
2012-10-14 22:54                 ` Linus Torvalds
2012-10-13  2:27   ` Boaz Harrosh
2012-10-13  2:30     ` Larry McVoy
2012-10-13  2:32     ` Myklebust, Trond
2012-10-13  2:39       ` Boaz Harrosh
2012-10-13  2:43         ` Larry McVoy
2012-10-14 19:43         ` Bruce Fields
2012-10-15 18:20           ` Boaz Harrosh
2012-10-13  2:37     ` Linus Torvalds
2012-10-15  8:05 George Spelvin
2012-10-15 12:19 ` Myklebust, Trond
2012-10-15 16:02   ` VDR User
2012-10-16  1:48   ` George Spelvin
2012-10-16  1:52     ` Larry McVoy
2012-10-16  3:46     ` Myklebust, Trond
2012-10-16  4:39       ` George Spelvin
2012-10-16 11:17       ` Jim Rees

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.