Linux NFS development
* oops in rpc_pipe_release
@ 2005-11-08  0:04 Vince Busam
  2005-11-08  0:10 ` J. Bruce Fields
  2005-11-08 14:23 ` Steve Dickson
  0 siblings, 2 replies; 7+ messages in thread
From: Vince Busam @ 2005-11-08  0:04 UTC
  To: nfs

I'm using NFS3 with kerberos authentication, and 25 hour tickets that refresh when
unlocking the screensaver.  Over the weekend, it'll hang with one of the following stack
traces.  Any ideas what could cause this?

Thanks,
Vince

Unable to handle kernel NULL pointer dereference at virtual address 00000004
  printing eip:
f890c6ca
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: des binfmt_misc cpufreq_userspace cpufreq_ondemand cpufreq_powersave
autofs4 video sony_acpi pcc_acpi dev_acpi i2c_acpi_ec i2c_core button battery container ac
capability commoncap nfs lockd af_packet tg3 piix snd_intel8x0 snd_usb_audio
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_usb_lib snd_rawmidi snd_seq_device
snd_hwdep snd_timer snd soundcore snd_page_alloc pwc videodev v4l2_common uhci_hcd
pci_hotplug floppy pcspkr rtc md dm_mod nvidia agpgart psmouse tsdev evdev mousedev usbhid
parport_pc lp parport ide_generic ide_disk ide_cd cdrom rpcsec_gss_krb5 auth_rpcgss sunrpc
ehci_hcd ext3 jbd mbcache ahci sd_mod ata_piix libata usb_storage usbcore scsi_mod
ide_core unix thermal processor fan
CPU:    1
EIP:    0060:[<f890c6ca>]    Tainted: P      VLI
EFLAGS: 00010203   (2.6.12-gg3)
EIP is at __gss_unhash_msg+0x1a/0x60 [auth_rpcgss]
eax: 00000000   ebx: ef789200   ecx: ef789220   edx: 00000000
esi: efe1dbe4   edi: ffffffe0   ebp: f0368500   esp: efbb5f00
ds: 007b   es: 007b   ss: 0068
Process rpc.gssd (pid: 7760, threadinfo=efbb4000 task=efb56a60)
Stack: c02c3726 ef789200 ef789200 f890c734 ef789200 ef789208 ef789200 f890ccca
        ef789200 f0368684 f0368500 f0368684 f8acbfae ef789208 f0368500 ef3a6a80
        f0368500 f8acc33b f0368500 ffffffe0 00000008 ef3a6a80 f0369380 c01667fa
Call Trace:
  [<c02c3726>] _spin_lock+0x16/0x90
  [<f890c734>] gss_unhash_msg+0x24/0x40 [auth_rpcgss]
  [<f890ccca>] gss_pipe_destroy_msg+0x3a/0xa0 [auth_rpcgss]
  [<f8acbfae>] __rpc_purge_upcall+0x3e/0xb0 [sunrpc]
  [<f8acc33b>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
  [<c01667fa>] __fput+0x18a/0x1d0
  [<c0164b72>] filp_close+0x52/0xa0
  [<c0164c2a>] sys_close+0x6a/0xa0
  [<c010343b>] sysenter_past_esp+0x54/0x75
Code: e8 ac 2d 81 c7 eb bb 8d 76 00 8d bc 27 00 00 00 00 53 83 ec 08 8b 5c 24 10 8b 53 20
8d 4b 20 39 ca 75 05 83 c4 08 5b c3 8b 41 04 <89> 42 04 89 10 89 49 04 8b 43 1c 89 4b 20
89 44 24 04 8d 43 2c


------------[ cut here ]------------
kernel BUG at <bad filename>:53227!
invalid operand: 0000 [#1]
PREEMPT SMP
Modules linked in: des binfmt_misc cpufreq_userspace cpufreq_ondemand cpufreq_powersave
autofs4 video button battery container ac nfs lockd af_packet tg3 generic snd_intel8x0
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
hw_random uhci_hcd pci_hotplug floppy pcspkr rtc evdev md_mod dm_mod nvidia agpgart
psmouse mousedev parport_pc lp parport ide_generic ide_disk ide_cd ide_core
rpcsec_gss_krb5 auth_rpcgss sunrpc ehci_hcd usbcore ext3 jbd mbcache ahci sr_mod cdrom
sd_mod sg ata_piix libata scsi_mod unix thermal processor fan
CPU:    0
EIP:    0060:[<f88d0637>]    Tainted: P      VLI
EFLAGS: 00010203   (2.6.13.4-gg4)
EIP is at gss_release_msg+0x47/0x50 [auth_rpcgss]
eax: f72a6420   ebx: f72a6400   ecx: f72a6420   edx: 00000000
esi: f6c5ed04   edi: ffffffe0   ebp: f6c5eb80   esp: f71cdf28
ds: 007b   es: 007b   ss: 0068
Process rpc.gssd (pid: 5183, threadinfo=f71cc000 task=f7af0020)
Stack: f6c5eb80 f6c5eb80 f8a93dae f72a6400 f6c5eb80 d5ce9180 f6c5eb80 f8a9413b
       f6c5eb80 ffffffe0 00000008 d5ce9180 f74b7700 c0166caa f6c5eb80 d5ce9180
       00000000 00000000 dff47290 d5ce9180 c2b3f180 00000000 d5ce9180 c0164fb6
Call Trace:
[<f8a93dae>] __rpc_purge_upcall+0x3e/0xb0 [sunrpc]
[<f8a9413b>] rpc_pipe_release+0xcb/0xf0 [sunrpc]
[<c0166caa>] __fput+0x18a/0x1d0
[<c0164fb6>] filp_close+0x46/0x90
[<c016506a>] sys_close+0x6a/0xa0
[<c010316b>] sysenter_past_esp+0x54/0x75
Code: 68 85 d2 75 12 89 5c 24 0c 58 5b e9 84 df 87 c7 8d 74 26 00 58 5b c3 f0 ff 0a 0f 94
c0 84 c0 74 e4 89 14 24 e8 fb 0a 00 00 eb da <0f> 0b eb cf 90 8d 74 26 00 56 53 83 ec 08
8b 4c 24 14 8b 74 24
-------------------------------------------------------------------
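One possible reading of the first trace, not something stated in the thread: the faulting bytes `<89> 42 04` decode to a store of %eax to 4(%edx), and the register dump shows edx = 00000000, which matches the reported address 00000004. That is the shape of the `next->prev = prev` store in a list deletion whose entry has a NULL next pointer. The sketch below is a user-space model with hypothetical names (list_head32, first_store_address, looks_empty) and assumes 4-byte pointers, as on this i386 kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of a doubly linked list node: two 4-byte links. */
struct list_head32 {
	uint32_t next;	/* offset 0 */
	uint32_t prev;	/* offset 4 */
};

/*
 * Address written by the first store of a list deletion,
 * "next->prev = prev", for a given value of entry->next.
 */
uint32_t first_store_address(uint32_t entry_next)
{
	assert(sizeof(struct list_head32) == 8);
	return entry_next + (uint32_t)offsetof(struct list_head32, prev);
}

/* An emptiness check only compares entry->next with the entry's own address. */
int looks_empty(uint32_t entry_addr, uint32_t entry_next)
{
	return entry_next == entry_addr;
}
```

With entry_next = 0 the first store targets address 4, and an entry at ecx = ef789220 whose next pointer is 0 does not look empty, so an emptiness check would not have skipped the deletion. Under these assumptions the list node had been zeroed or was otherwise no longer valid when rpc.gssd closed the pipe.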




-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: oops in rpc_pipe_release
  2005-11-08  0:04 oops in rpc_pipe_release Vince Busam
@ 2005-11-08  0:10 ` J. Bruce Fields
  2005-11-08 14:23 ` Steve Dickson
  1 sibling, 0 replies; 7+ messages in thread
From: J. Bruce Fields @ 2005-11-08  0:10 UTC
  To: Vince Busam; +Cc: nfs

On Mon, Nov 07, 2005 at 04:04:43PM -0800, Vince Busam wrote:
> I'm using NFS3 with kerberos authentication, and 25 hour tickets that
> refresh when unlocking the screensaver.  Over the weekend, it'll hang
> with one of the following stack traces.  Any ideas what could cause
> this?

Could you retry with Trond's latest NFS_ALL?

http://linux-nfs.org/Linux-2.6.x/2.6.14/linux-2.6.14-NFS_ALL.dif

--b.



* Re: oops in rpc_pipe_release
  2005-11-08  0:04 oops in rpc_pipe_release Vince Busam
  2005-11-08  0:10 ` J. Bruce Fields
@ 2005-11-08 14:23 ` Steve Dickson
  2005-11-08 18:37   ` Vince Busam
  1 sibling, 1 reply; 7+ messages in thread
From: Steve Dickson @ 2005-11-08 14:23 UTC
  To: Vince Busam; +Cc: nfs



Vince Busam wrote:
> I'm using NFS3 with kerberos authentication, and 25 hour tickets that 
> refresh when
> unlocking the screensaver.  Over the weekend, it'll hang with one of the 
> following stack
> traces.  Any ideas what could cause this?
I believe this is caused by gss_pipe_release()
(i.e. rpci->ops->release_pipe(inode)) being called
with a freed clnt->cl_auth pointer. I proposed the
following patch a while back that I thought fixed the
problem, but Trond said the patch "prevents anyone from
reopening the pipe after the first close(), so if gssd
needs to be restarted, then all pipes will forever block."
So the patch got reverted...


--- linux-2.6.13/net/sunrpc/rpc_pipe.c.orig	2005-08-28 19:41:01.000000000 -0400
+++ linux-2.6.13/net/sunrpc/rpc_pipe.c	2005-09-16 11:18:53.598157000 -0400
@@ -177,6 +177,8 @@ rpc_pipe_release(struct inode *inode, st
 		__rpc_purge_upcall(inode, -EPIPE);
 	if (rpci->ops->release_pipe)
 		rpci->ops->release_pipe(inode);
+	if (!rpci->nreaders && !rpci->nwriters)
+		rpci->ops = NULL;
 out:
 	up(&inode->i_sem);
 	return 0;

I think the main problem here is that there is no way of telling
whether an rpc_inode is valid (or active), so there
is no way of knowing whether a release call is needed...

steved.
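Trond's objection to the patch above can be illustrated with a small user-space model. This is only a sketch: the struct and function names are hypothetical stand-ins, not the kernel definitions, and it assumes that the open path refuses a pipe whose ops pointer is NULL, which is what the quoted remark implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct pipe_ops_model {
	void (*release_pipe)(void *owner);
};

struct rpc_inode_model {
	const struct pipe_ops_model *ops;
	int nreaders;
	int nwriters;
};

/* Assumed open rule: a pipe with no ops is treated as dead. */
int model_open(struct rpc_inode_model *rpci, int for_read, int for_write)
{
	assert(rpci != NULL);
	if (rpci->ops == NULL)
		return -ENXIO;
	if (for_read)
		rpci->nreaders++;
	if (for_write)
		rpci->nwriters++;
	return 0;
}

/* Release path including the two lines the patch adds. */
void model_release(struct rpc_inode_model *rpci, int for_read, int for_write)
{
	assert(rpci != NULL);
	if (rpci->ops == NULL)
		return;
	if (for_read)
		rpci->nreaders--;
	if (for_write)
		rpci->nwriters--;
	if (rpci->ops->release_pipe)
		rpci->ops->release_pipe(rpci);
	if (!rpci->nreaders && !rpci->nwriters)
		rpci->ops = NULL;	/* the patched-in lines */
}

/* Result of reopening the pipe after its only opener has closed it. */
int reopen_after_last_close(void)
{
	static const struct pipe_ops_model ops = { NULL };
	struct rpc_inode_model rpci = { &ops, 0, 0 };

	if (model_open(&rpci, 1, 1) != 0)
		return 1;
	model_release(&rpci, 1, 1);
	return model_open(&rpci, 1, 1);
}
```

In this model reopen_after_last_close() returns -ENXIO: once the last reader and writer are gone the ops pointer is cleared and nothing restores it, so a restarted gssd could never reattach to the pipe.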



* Re: oops in rpc_pipe_release
  2005-11-08 14:23 ` Steve Dickson
@ 2005-11-08 18:37   ` Vince Busam
  2005-11-08 18:58     ` Steve Dickson
  0 siblings, 1 reply; 7+ messages in thread
From: Vince Busam @ 2005-11-08 18:37 UTC
  To: Steve Dickson; +Cc: nfs

I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif patch, and either way 
it ends up causing another NULL pointer dereference in __rpc_purge_upcall after an hour or 
two.

Vince




* Re: oops in rpc_pipe_release
  2005-11-08 18:37   ` Vince Busam
@ 2005-11-08 18:58     ` Steve Dickson
  2005-11-08 19:58       ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Steve Dickson @ 2005-11-08 18:58 UTC
  To: Vince Busam; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]



Vince Busam wrote:
> I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif 
> patch, and either way it ends up causing another NULL pointer 
> dereference in __rpc_purge_upcall after an hour or two.
See if the attached patch helps... It makes gss_pipe_release()
handle the fact that the pointers passed in
could be NULL. This seems to fix the problem I was seeing...

Trond, is this something that's a bit more palatable? :)

steved.

[-- Attachment #2: linux-2.6.14-rpc-gss-oops.patch --]
[-- Type: text/x-patch, Size: 553 bytes --]

--- linux-2.6.9/net/sunrpc/auth_gss/auth_gss.c.orig	2005-11-07 11:05:52.800401000 -0500
+++ linux-2.6.9/net/sunrpc/auth_gss/auth_gss.c	2005-11-08 13:16:05.576222000 -0500
@@ -513,7 +513,10 @@ gss_pipe_release(struct inode *inode)
 	struct rpc_auth *auth;
 	struct gss_auth *gss_auth;
 
-	clnt = rpci->private;
+	clnt = ((rpci != NULL) ? ((struct rpc_clnt *)rpci->private) : NULL);
+	if (clnt == NULL || clnt->cl_auth == NULL)
+		return;
+
 	auth = clnt->cl_auth;
 	gss_auth = container_of(auth, struct gss_auth, rpc_auth);
 	spin_lock(&gss_auth->lock);
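The guard added by this patch can be modelled in user space as follows. This is a sketch only: the *_model types are hypothetical stand-ins for rpc_inode, rpc_clnt and rpc_auth, and the counter stands in for the cleanup that the real function performs under gss_auth->lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct auth_model { int cleanup_runs; };
struct clnt_model { struct auth_model *cl_auth; };
struct rpci_model { void *private; };

/* Returns 1 if the release work ran, 0 if a NULL guard skipped it. */
int guarded_pipe_release(struct rpci_model *rpci)
{
	struct clnt_model *clnt;

	clnt = (rpci != NULL) ? (struct clnt_model *)rpci->private : NULL;
	if (clnt == NULL || clnt->cl_auth == NULL)
		return 0;
	clnt->cl_auth->cleanup_runs++;	/* stands in for the locked cleanup */
	return 1;
}
```

The guard only recognises pointers that were explicitly set to NULL. A pointer that was freed but not cleared still passes both checks, so the guard by itself does not cover the freed clnt->cl_auth case described earlier in the thread.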


* Re: oops in rpc_pipe_release
  2005-11-08 18:58     ` Steve Dickson
@ 2005-11-08 19:58       ` Trond Myklebust
  2005-11-08 20:51         ` Steve Dickson
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2005-11-08 19:58 UTC
  To: Steve Dickson; +Cc: Vince Busam, nfs

On Tue, 2005-11-08 at 13:58 -0500, Steve Dickson wrote:
> 
> Vince Busam wrote:
> > I tried that with and without the linux-2.6.13-CITI_NFS4_ALL-1.dif 
> > patch, and either way it ends up causing another NULL pointer 
> > dereference in __rpc_purge_upcall after an hour or two.
> See if the attached patch helps... It makes gss_pipe_release()
> handle the fact that the pointers passed in
> could be NULL. This seems to fix the problem I was seeing...
> 
> Trond, is this something that's a bit more palatable? :)

I'd rather like to find out how this is happening, and fix the root
cause. Your patch seems like a bit too much of a band-aid.

My point is that we should never want to find ourselves in the situation
that the directory is being cleared without the auth code having first
cleaned up and deleted its pipes.

Cheers,
  Trond
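The ordering rule described here can be written down as an invariant. The model below is purely illustrative: dir_model and its functions are hypothetical, and the -EBUSY result is a device of the sketch rather than anything the kernel's rpc_pipefs code is known to return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a directory that counts the pipes still registered in it. */
struct dir_model {
	int live_pipes;
};

/* The auth layer creates a pipe in the client's directory. */
void auth_create_pipe(struct dir_model *dir)
{
	assert(dir != NULL);
	dir->live_pipes++;
}

/* The auth layer cleans up and deletes every pipe it created. */
void auth_destroy_pipes(struct dir_model *dir)
{
	assert(dir != NULL);
	dir->live_pipes = 0;
}

/* Clearing the directory is only legal once no pipes remain. */
int clear_directory(const struct dir_model *dir)
{
	assert(dir != NULL);
	return dir->live_pipes ? -EBUSY : 0;
}
```

In this model, calling clear_directory() before auth_destroy_pipes() reports -EBUSY, and calling it afterwards succeeds. The oops in this thread corresponds to the first ordering happening with nothing in place to reject it.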




* Re: oops in rpc_pipe_release
  2005-11-08 19:58       ` Trond Myklebust
@ 2005-11-08 20:51         ` Steve Dickson
  0 siblings, 0 replies; 7+ messages in thread
From: Steve Dickson @ 2005-11-08 20:51 UTC
  To: Trond Myklebust; +Cc: nfs

Trond Myklebust wrote:
> 
> I'd rather like to find out how this is happening, and fix the root
> cause. Your patch seems like a bit too much of a band-aid.
Yeah, I know it's a hack... I just wanted to make sure it addresses
the root cause of Vince's problem... Unfortunately I'm a bit
under the gun w.r.t. deadlines... and an oops is an oops...
so I might have to go with it...

> 
> My point is that we should never want to find ourselves in the situation
> that the directory is being cleared without the auth code having first
> cleaned up and deleted its pipes.
Well, here is how you should be able to reproduce it,
as spelled out in bz 171112:
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171112)

root# mount -o krb5 server:/export /mnt
user$ cd /mnt
root# /bin/service nets stop
root# /bin/service rpcidmapd stop
root# kill -9 $(pgrep -u <user>)


steved.



