* Confused about BUG_ON in rpcb_getport_async
@ 2008-08-12 2:27 Neil Brown
[not found] ` <18592.62730.840231.108375-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2008-08-12 2:27 UTC (permalink / raw)
To: linux-nfs
Hi
I have a report of a the BUG_ON in rpcb_getport_clnt being
triggered. This is:
/* Autobind on cloned rpc clients is discouraged */
BUG_ON(clnt->cl_parent != clnt);
It looks to me that while they might be discouraged, they are not
prevented and so having the BUG_ON is wrong.
When rpc_clone_client creates a clone, it sets cl_autobind to 0,
and gives the new client a reference to the same cl_xprt as the
original client.
The only effect of cl_autobind is to prevent rpc_force_rebind from
clearing the BOUND flag on ->cl_xprt.
So while a call to rpc_force_rebind on the clone will not clear that
flag, a call on the original client will clear that flag.
So a cloned client can still end up with a ->cl_xprt with the BOUND
flag clear.
So call_bind (which is present in the call trace under the oops) can
find that !xprt_bound, even when the client is a cloned client.
When this happens, ->rpcbind, which is rpcb_getport_clnt, goes BOOM.
What should happen when a clone client finds that its transport is no
longer bound? Should rpc_getport_async just do
clnt = task->tk_client->cl_parent;
??
Perplexed,
NeilBrown
^ permalink raw reply [flat|nested] 5+ messages in thread[parent not found: <18592.62730.840231.108375-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: Confused about BUG_ON in rpcb_getport_async [not found] ` <18592.62730.840231.108375-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2008-08-12 20:32 ` Trond Myklebust 2008-08-13 23:45 ` Neil Brown 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2008-08-12 20:32 UTC (permalink / raw) To: Neil Brown; +Cc: linux-nfs On Tue, 2008-08-12 at 12:27 +1000, Neil Brown wrote: > Hi > I have a report of a the BUG_ON in rpcb_getport_clnt being > triggered. This is: > /* Autobind on cloned rpc clients is discouraged */ > BUG_ON(clnt->cl_parent != clnt); > > It looks to me that while they might be discouraged, they are not > prevented and so having the BUG_ON is wrong. > > When rpc_clone_client creates a clone, it sets cl_autobind to 0, > and gives the new client a reference to the same cl_xprt as the > original client. > > The only effect of cl_autobind is to prevent rpc_force_rebind from > clearing the BOUND flag on ->cl_xprt. > So while a call to rpc_force_rebind on the clone will not clear that > flag, a call on the original client will clear that flag. > > So a cloned client can still end up with a ->cl_xprt with the BOUND > flag clear. > > So call_bind (which is present in the call trace under the oops) can > find that !xprt_bound, even when the client is a cloned client. > > When this happens, ->rpcbind, which is rpcb_getport_clnt, goes BOOM. > > What should happen when a clone client finds that its transport is no > longer bound? Should rpc_getport_async just do > clnt = task->tk_client->cl_parent; > ?? This shouldn't be a particularly urgent problem, since lockd should normally be the only thing that needs to use the autobind functionality, and it does not use cloned clients. That said, I think that the answer is that cloned clients may indeed use rpcbind as long as they share the same program number and version as the parent. IOW: we should probably rather BUG_ON() calls to rpc_bind_new_program() if the parent has xprt->cl_autobind set. Cheers Trond ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Confused about BUG_ON in rpcb_getport_async 2008-08-12 20:32 ` Trond Myklebust @ 2008-08-13 23:45 ` Neil Brown [not found] ` <18595.29231.61262.220929-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Neil Brown @ 2008-08-13 23:45 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Tuesday August 12, trond.myklebust@fys.uio.no wrote: > On Tue, 2008-08-12 at 12:27 +1000, Neil Brown wrote: > > Hi > > I have a report of a the BUG_ON in rpcb_getport_clnt being > > triggered. This is: > > /* Autobind on cloned rpc clients is discouraged */ > > BUG_ON(clnt->cl_parent != clnt); > > > > It looks to me that while they might be discouraged, they are not > > prevented and so having the BUG_ON is wrong. > > > > When rpc_clone_client creates a clone, it sets cl_autobind to 0, > > and gives the new client a reference to the same cl_xprt as the > > original client. > > > > The only effect of cl_autobind is to prevent rpc_force_rebind from > > clearing the BOUND flag on ->cl_xprt. > > So while a call to rpc_force_rebind on the clone will not clear that > > flag, a call on the original client will clear that flag. > > > > So a cloned client can still end up with a ->cl_xprt with the BOUND > > flag clear. > > > > So call_bind (which is present in the call trace under the oops) can > > find that !xprt_bound, even when the client is a cloned client. > > > > When this happens, ->rpcbind, which is rpcb_getport_clnt, goes BOOM. > > > > What should happen when a clone client finds that its transport is no > > longer bound? Should rpc_getport_async just do > > clnt = task->tk_client->cl_parent; > > ?? > > This shouldn't be a particularly urgent problem, since lockd should > normally be the only thing that needs to use the autobind functionality, > and it does not use cloned clients. > > That said, I think that the answer is that cloned clients may indeed use > rpcbind as long as they share the same program number and version as the > parent. > IOW: we should probably rather BUG_ON() calls to rpc_bind_new_program() > if the parent has xprt->cl_autobind set. Just to confirm, you are saying that the current BUG_ON is indeed wrong, and the fact that it fires does not imply any real bug. It should be removed. Possibly a BUG_ON could be put somewhere else to catch incorrect behaviour. Is that correct? I now have multiple people for whom this BUG_ON has fired. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <18595.29231.61262.220929-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: Confused about BUG_ON in rpcb_getport_async [not found] ` <18595.29231.61262.220929-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2008-08-14 0:10 ` Trond Myklebust 2008-08-14 2:26 ` Neil Brown 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2008-08-14 0:10 UTC (permalink / raw) To: Neil Brown; +Cc: linux-nfs On Thu, 2008-08-14 at 09:45 +1000, Neil Brown wrote: > Just to confirm, you are saying that the current BUG_ON is indeed > wrong, and the fact that it fires does not imply any real bug. > It should be removed. Possibly a BUG_ON could be put somewhere else > to catch incorrect behaviour. > > Is that correct? Yes. > I now have multiple people for whom this BUG_ON has fired. Could you send us a trace? AFAIK it shouldn't really be happening... Trond ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Confused about BUG_ON in rpcb_getport_async 2008-08-14 0:10 ` Trond Myklebust @ 2008-08-14 2:26 ` Neil Brown 0 siblings, 0 replies; 5+ messages in thread From: Neil Brown @ 2008-08-14 2:26 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Wednesday August 13, trond.myklebust@fys.uio.no wrote: > On Thu, 2008-08-14 at 09:45 +1000, Neil Brown wrote: > > Just to confirm, you are saying that the current BUG_ON is indeed > > wrong, and the fact that it fires does not imply any real bug. > > It should be removed. Possibly a BUG_ON could be put somewhere else > > to catch incorrect behaviour. > > > > Is that correct? > > Yes. Thanks. > > > I now have multiple people for whom this BUG_ON has fired. > > Could you send us a trace? AFAIK it shouldn't really be happening... https://bugzilla.novell.com/show_bug.cgi?id=415607 contains Aug 7 09:55:42 copperhead klogd: ------------[ cut here ]------------ Aug 7 09:55:42 copperhead klogd: kernel BUG at net/sunrpc/rpcb_clnt.c:290! Aug 7 09:55:42 copperhead klogd: invalid opcode: 0000 [1] SMP Aug 7 09:55:42 copperhead klogd: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Aug 7 09:55:42 copperhead klogd: CPU 3 Aug 7 09:55:42 copperhead klogd: Modules linked in: lp parport_pc ppdev parport nls_utf8 nfs lockd nfs_acl sunrpc autofs4 binfmt_misc iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 bonding microcode firmware_class fuse loop dm_mod rtc_cmos iTCO_wdt sg shpchp rtc_core i2c_i801 iTCO_vendor_support rtc_lib floppy container i2c_core i5000_edac usb_storage sr_mod pci_hotplug button cdrom joydev e1000e edac_core usbhid hid ff_memless ehci_hcd uhci_hcd sd_mod usbcore edd ext3 mbcache jbd fan ata_piix ahci libata dock 3w_9xxx scsi_mod thermal processor [last unloaded: parport_pc] Aug 7 09:55:42 copperhead klogd: Pid: 14342, comm: xauth Tainted: G N 2.6.25.11-0.1-default #1 Aug 7 09:55:42 copperhead klogd: RIP: 0010:[<ffffffff883ad55a>] [<ffffffff883ad55a>] :sunrpc:rpcb_getport_async+0x66/0x460 Aug 7 09:55:42 copperhead klogd: RSP: 0018:ffff810110cff938 EFLAGS: 00010287 Aug 7 09:55:42 copperhead klogd: RAX: ffffffff883d11d0 RBX: ffff81022c8d5c00 RCX: 0000000000000001 Aug 7 09:55:42 copperhead klogd: RDX: 0000000000000000 RSI: ffff81010a543e40 RDI: ffff81010a543e40 Aug 7 09:55:42 copperhead klogd: RBP: ffff810110cffa58 R08: ffff81010a543f28 R09: 0000000000000001 Aug 7 09:55:42 copperhead klogd: R10: ffff810001031280 R11: ffff810110cffc18 R12: ffff81018b047800 Aug 7 09:55:42 copperhead klogd: R13: ffff81018b047800 R14: ffff81010a543e40 R15: ffff810229d40800 Aug 7 09:55:42 copperhead klogd: FS: 00007fb4bd7ed6f0(0000) GS:ffff81022f1037c0(0000) knlGS:0000000000000000 Aug 7 09:55:42 copperhead klogd: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Aug 7 09:55:42 copperhead klogd: CR2: 00007f3992dfcf00 CR3: 00000001d4601000 CR4: 00000000000006e0 Aug 7 09:55:42 copperhead klogd: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Aug 7 09:55:42 copperhead klogd: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Aug 7 09:55:42 copperhead klogd: Process xauth (pid: 14342, threadinfo ffff810110cfe000, task ffff810116954080) Aug 7 09:55:42 copperhead klogd: Stack: 0000000000000246 ffff81022ad67370 ffff810110cff978 0000000000000246 Aug 7 09:55:42 copperhead klogd: 0000000000011200 ffff81022ad67340 ffff810110cff9a8 ffff810110cff9c0 Aug 7 09:55:42 copperhead klogd: ffff810110cff988 ffffffff80276757 ffff810110cffa08 ffffffff8027685a Aug 7 09:55:42 copperhead klogd: Call Trace: Aug 7 09:55:42 copperhead klogd: [<ffffffff883a018b>] :sunrpc:call_bind+0x6f/0x74 Aug 7 09:55:42 copperhead klogd: [<ffffffff883a72d9>] :sunrpc:__rpc_execute+0x86/0x246 Aug 7 09:55:42 copperhead klogd: [<ffffffff883a751e>] :sunrpc:rpc_execute+0x85/0x8e Aug 7 09:55:42 copperhead klogd: [<ffffffff883a14a5>] :sunrpc:rpc_run_task+0x52/0x5a Aug 7 09:55:42 copperhead klogd: [<ffffffff883a1598>] :sunrpc:rpc_call_sync+0x3f/0x5d Aug 7 09:55:42 copperhead klogd: [<ffffffff8840c0bb>] :nfs:nfs3_rpc_wrapper+0x22/0x5c Aug 7 09:55:42 copperhead klogd: [<ffffffff8840c5b5>] :nfs:nfs3_proc_access+0xf3/0x15e Aug 7 09:55:42 copperhead klogd: [<ffffffff883ffde0>] :nfs:nfs_do_access+0x173/0x341 Aug 7 09:55:42 copperhead klogd: [<ffffffff884000a3>] :nfs:nfs_permission+0xf5/0x166 Aug 7 09:55:42 copperhead klogd: [<ffffffff802a7497>] permission+0xc7/0x118 Aug 7 09:55:42 copperhead klogd: [<ffffffff802a7bbc>] lookup_hash+0x21/0x43 Aug 7 09:55:42 copperhead klogd: [<ffffffff802ab6d8>] open_namei+0xe8/0x6a0 Aug 7 09:55:42 copperhead klogd: [<ffffffff8029ec5b>] do_filp_open+0x28/0x4b Aug 7 09:55:42 copperhead klogd: [<ffffffff8029eccf>] do_sys_open+0x51/0xd2 Aug 7 09:55:42 copperhead klogd: [<ffffffff8029ed79>] sys_open+0x1b/0x1d Aug 7 09:55:42 copperhead klogd: [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f Aug 7 09:55:42 copperhead klogd: DWARF2 unwinder stuck at system_call_after_swapgs+0x8a/0x8f Aug 7 09:55:42 copperhead klogd: Aug 7 09:55:42 copperhead klogd: Leftover inexact backtrace: Aug 7 09:55:42 copperhead klogd: Aug 7 09:55:42 copperhead syslog-ng[2774]: last message repeated 2 times Aug 7 09:55:42 copperhead klogd: Code: 3b 88 44 8b 4b 44 44 8b 43 40 48 c7 c7 57 74 3b 88 48 8b 4b 50 89 04 24 31 c0 e8 ed 24 0a f8 4c 8b bb e8 00 00 00 49 39 df 74 04 <0f> 0b eb fe 49 8d 85 78 02 00 00 48 89 85 f8 fe ff ff f0 0f ba Aug 7 09:55:42 copperhead klogd: RIP [<ffffffff883ad55a>] :sunrpc:rpcb_getport_async+0x66/0x460 Aug 7 09:55:42 copperhead klogd: RSP <ffff810110cff938> Aug 7 09:55:42 copperhead klogd: ---[ end trace 4b516acb183486f8 ]--- Kernel was Linux copperhead 2.6.25.11-0.1-default #1 SMP 2008-07-13 20:48:28 +0200 x86_64 x86_64 x86_64 GNU/Linux NeilBrown ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-08-14 2:27 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-12 2:27 Confused about BUG_ON in rpcb_getport_async Neil Brown
[not found] ` <18592.62730.840231.108375-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-08-12 20:32 ` Trond Myklebust
2008-08-13 23:45 ` Neil Brown
[not found] ` <18595.29231.61262.220929-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-08-14 0:10 ` Trond Myklebust
2008-08-14 2:26 ` Neil Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox