* Re: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers. [not found] <20130124130243.449d5d92@notabene.brown> @ 2013-01-24 16:13 ` Chuck Lever 2013-01-30 22:29 ` J.Bruce Fields 2013-01-30 23:19 ` Myklebust, Trond 0 siblings, 2 replies; 4+ messages in thread From: Chuck Lever @ 2013-01-24 16:13 UTC (permalink / raw) To: NeilBrown; +Cc: Kevin Coffman, J.Bruce Fields, Steve Dickson, NFS On Jan 23, 2013, at 9:02 PM, NeilBrown <neilb@suse.de> wrote: > > Hi peoples, > > this issue has appeared on the mailing list before (particularly around July > 2011) but hasn't been resolved yet and it just bit me again so I figure it > is time it got fixed. > > > If you tcpdump the network connection while mounting an NFS filesystem > using kerberos - or while the client is establishing a new context because > e.g. the server rebooted - you will see a NULL RPC with an > RPC_GSS_PROC_DESTROY credential but no verifier. The lack of a verifier > makes the packet corrupt so the server ignores it, but people see it and > think something is wrong. > > It is good that the server ignores it as it really shouldn't be there. > What happens is that the NFS client calls up to rpc.gssd to request a > credential. rpc.gssd then establishes a connection directly with the > server, including the establishment of the security context. Then it > gathers the context details and passed them down to the kernel. > Then it closes the connection part of which involves calling > AUTH_DESTROY(auth) - necessary to free up data structures and not leak > memory. > This AUTH_DESTROY tries to destroy the context completely, including telling > the server that it has been destroyed! But it hasn't, it has just been > passed down to the kernel for use on a different connection. > > So there are two issues here: > - why is the GSS_PROC_DESTROY packet missing a verifier > - how can we get AUTH_DESTROY to *not* try to destroy the context on the > server - as that would be a bad thing. > > The first I cannot completely answer. I do know that in libtirpc, in > auth_gss.c, in authgss_marshal(), gss_get_mic is failing because it doesn't > think it has a valid context. I don't know why it thinks that, and I don't > really care. > > > The second question is more interesting and I see two possible options. > > 1/ If we knew why gss_get_mic failed and had good reason to believe it would > keep on failing, we could consider changing clnt_vc_call to respond to an > error from AUTH_MARSHALL not by sending a truncated packet, but by purging > the current message and not sending it at all. This should be possible but > might be messy. > > 2/ Make libtirpc behave more like librpcsecgss. > In libtirpc, the authgss_get_private_data() function just hands over a > pointer to the private data, but keeps its own pointer so it can free it > when the client is finally destroyed. > > In librpcsecgss, since commit 07fce317cac267509b944a8191cafa8e49b5e328 > (thanks Kevin), authgss_get_private_data() hands the data over to the > caller and doesn't keep it's own reference to it. So the caller has to call > authgss_free_private_data() when it has finished with the data. > As the library no longer has the credential, it doesn't even bother trying > to send a GSS_PROC_DESTROY request. > > When Chuck noticed this difference between the two libraries, he resolved > it - in commit 336f8bca825416082d62ef38314f3e0b7e8f5cc2 as follow: > > if (token.value) > free(token.value); > +#ifndef HAVE_LIBTIRPC > if (pd.pd_ctx_hndl.length != 0) > authgss_free_private_data(&pd); > +#endif > > Clearly to significance of this difference was not obvious, and this was > the easiest fix. > > If we were to "fix" this properly, we would need to add a commit like the > one from Kevin to libtirpc, and remove that #ifndef from nfs-utils. > co-ordinating this might be tricking. nfs-utils could presumably test if > libtirpc provided the function (at configure time) and call it if it does, This seems to me like the best approach for 2. > However is someone updates libtirpc without updating or recompiling > nfs-utils they would get a memory leak. May it would be slow enough not to > be serious, and if anyone noticed that could just upgrade and get a fix. Telling people to upgrade for a fix is what we do for a living. In all seriousness, though, in the common case, people will be using nfs-utils and libtirpc built by distributions, and we expect the distros will get the fix dependency right over time. > Does this seem reasonable? How is maintaining libtirpc these days? > Could we get the fix into 0.2.3, or would we need a minor version bump to > 0.3.0?? A minor version bump shouldn't be necessary if we're not changing the synopsis of a published API, nor are we removing a published API. > 3/ there is actually a third option. We could change > authgss_get_private_data() to set gc.gc_ctx.length to 0, but not free the > buffer. Then aithgss_destroy_context() could notice that the length is zero > and the buffer is not NULL, and could free the buffer but not try to send > the context_destroy request. It's an ugly hack though and I think I'd > rather not. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers. 2013-01-24 16:13 ` Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers Chuck Lever @ 2013-01-30 22:29 ` J.Bruce Fields 2013-01-30 23:19 ` Myklebust, Trond 1 sibling, 0 replies; 4+ messages in thread From: J.Bruce Fields @ 2013-01-30 22:29 UTC (permalink / raw) To: Chuck Lever; +Cc: NeilBrown, Kevin Coffman, Steve Dickson, NFS On Thu, Jan 24, 2013 at 11:13:51AM -0500, Chuck Lever wrote: > > On Jan 23, 2013, at 9:02 PM, NeilBrown <neilb@suse.de> wrote: > > > > > Hi peoples, > > > > this issue has appeared on the mailing list before (particularly around July > > 2011) but hasn't been resolved yet and it just bit me again so I figure it > > is time it got fixed. > > > > > > If you tcpdump the network connection while mounting an NFS filesystem > > using kerberos - or while the client is establishing a new context because > > e.g. the server rebooted - you will see a NULL RPC with an > > RPC_GSS_PROC_DESTROY credential but no verifier. The lack of a verifier > > makes the packet corrupt so the server ignores it, but people see it and > > think something is wrong. > > > > It is good that the server ignores it as it really shouldn't be there. > > What happens is that the NFS client calls up to rpc.gssd to request a > > credential. rpc.gssd then establishes a connection directly with the > > server, including the establishment of the security context. Then it > > gathers the context details and passed them down to the kernel. > > Then it closes the connection part of which involves calling > > AUTH_DESTROY(auth) - necessary to free up data structures and not leak > > memory. > > This AUTH_DESTROY tries to destroy the context completely, including telling > > the server that it has been destroyed! But it hasn't, it has just been > > passed down to the kernel for use on a different connection. > > > > So there are two issues here: > > - why is the GSS_PROC_DESTROY packet missing a verifier > > - how can we get AUTH_DESTROY to *not* try to destroy the context on the > > server - as that would be a bad thing. > > > > The first I cannot completely answer. I do know that in libtirpc, in > > auth_gss.c, in authgss_marshal(), gss_get_mic is failing because it doesn't > > think it has a valid context. I don't know why it thinks that, and I don't > > really care. > > > > > > The second question is more interesting and I see two possible options. > > > > 1/ If we knew why gss_get_mic failed and had good reason to believe it would > > keep on failing, we could consider changing clnt_vc_call to respond to an > > error from AUTH_MARSHALL not by sending a truncated packet, but by purging > > the current message and not sending it at all. This should be possible but > > might be messy. > > > > 2/ Make libtirpc behave more like librpcsecgss. > > In libtirpc, the authgss_get_private_data() function just hands over a > > pointer to the private data, but keeps its own pointer so it can free it > > when the client is finally destroyed. > > > > In librpcsecgss, since commit 07fce317cac267509b944a8191cafa8e49b5e328 > > (thanks Kevin), authgss_get_private_data() hands the data over to the > > caller and doesn't keep it's own reference to it. So the caller has to call > > authgss_free_private_data() when it has finished with the data. > > As the library no longer has the credential, it doesn't even bother trying > > to send a GSS_PROC_DESTROY request. > > > > When Chuck noticed this difference between the two libraries, he resolved > > it - in commit 336f8bca825416082d62ef38314f3e0b7e8f5cc2 as follow: > > > > if (token.value) > > free(token.value); > > +#ifndef HAVE_LIBTIRPC > > if (pd.pd_ctx_hndl.length != 0) > > authgss_free_private_data(&pd); > > +#endif > > > > Clearly to significance of this difference was not obvious, and this was > > the easiest fix. > > > > If we were to "fix" this properly, we would need to add a commit like the > > one from Kevin to libtirpc, and remove that #ifndef from nfs-utils. > > co-ordinating this might be tricking. nfs-utils could presumably test if > > libtirpc provided the function (at configure time) and call it if it does, > > This seems to me like the best approach for 2. > > > However is someone updates libtirpc without updating or recompiling > > nfs-utils they would get a memory leak. May it would be slow enough not to > > be serious, and if anyone noticed that could just upgrade and get a fix. > > Telling people to upgrade for a fix is what we do for a living. In all seriousness, though, in the common case, people will be using nfs-utils and libtirpc built by distributions, and we expect the distros will get the fix dependency right over time. Yes, I hate to be lax about library/application compatibility, but looks like the only consequence of the incompatibility here is a small memory leak, and nfs-utils and libtirpc are probably normally upgraded at the same time, so I think we could live with that. --b. > > > Does this seem reasonable? How is maintaining libtirpc these days? > > Could we get the fix into 0.2.3, or would we need a minor version bump to > > 0.3.0?? > > A minor version bump shouldn't be necessary if we're not changing the synopsis of a published API, nor are we removing a published API. > > > 3/ there is actually a third option. We could change > > authgss_get_private_data() to set gc.gc_ctx.length to 0, but not free the > > buffer. Then aithgss_destroy_context() could notice that the length is zero > > and the buffer is not NULL, and could free the buffer but not try to send > > the context_destroy request. It's an ugly hack though and I think I'd > > rather not. > > -- > Chuck Lever > chuck[dot]lever[at]oracle[dot]com > > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers. 2013-01-24 16:13 ` Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers Chuck Lever 2013-01-30 22:29 ` J.Bruce Fields @ 2013-01-30 23:19 ` Myklebust, Trond 2013-02-04 23:42 ` NeilBrown 1 sibling, 1 reply; 4+ messages in thread From: Myklebust, Trond @ 2013-01-30 23:19 UTC (permalink / raw) To: Chuck Lever, NeilBrown; +Cc: Kevin Coffman, J.Bruce Fields, Steve Dickson, NFS > -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > owner@vger.kernel.org] On Behalf Of Chuck Lever > Sent: Thursday, January 24, 2013 11:14 AM > To: NeilBrown > Cc: Kevin Coffman; J.Bruce Fields; Steve Dickson; NFS > Subject: Re: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux > servers. > > > On Jan 23, 2013, at 9:02 PM, NeilBrown <neilb@suse.de> wrote: > > > > > Hi peoples, > > > > this issue has appeared on the mailing list before (particularly > > around July > > 2011) but hasn't been resolved yet and it just bit me again so I > > figure it is time it got fixed. > > > > > > If you tcpdump the network connection while mounting an NFS filesystem > > using kerberos - or while the client is establishing a new context > > because e.g. the server rebooted - you will see a NULL RPC with an > > RPC_GSS_PROC_DESTROY credential but no verifier. The lack of a > > verifier makes the packet corrupt so the server ignores it, but people > > see it and think something is wrong. > > > > It is good that the server ignores it as it really shouldn't be there. > > What happens is that the NFS client calls up to rpc.gssd to request a > > credential. rpc.gssd then establishes a connection directly with the > > server, including the establishment of the security context. Then it > > gathers the context details and passed them down to the kernel. > > Then it closes the connection part of which involves calling > > AUTH_DESTROY(auth) - necessary to free up data structures and not leak > > memory. > > This AUTH_DESTROY tries to destroy the context completely, including > > telling the server that it has been destroyed! But it hasn't, it has > > just been passed down to the kernel for use on a different connection. > > > > So there are two issues here: > > - why is the GSS_PROC_DESTROY packet missing a verifier > > - how can we get AUTH_DESTROY to *not* try to destroy the context on > the > > server - as that would be a bad thing. > > > > The first I cannot completely answer. I do know that in libtirpc, in > > auth_gss.c, in authgss_marshal(), gss_get_mic is failing because it doesn't > > think it has a valid context. I don't know why it thinks that, and I don't > > really care. > > > > > > The second question is more interesting and I see two possible options. > > > > 1/ If we knew why gss_get_mic failed and had good reason to believe it > > would keep on failing, we could consider changing clnt_vc_call to > > respond to an error from AUTH_MARSHALL not by sending a truncated > > packet, but by purging the current message and not sending it at all. > > This should be possible but might be messy. > > > > 2/ Make libtirpc behave more like librpcsecgss. > > In libtirpc, the authgss_get_private_data() function just hands over > > a pointer to the private data, but keeps its own pointer so it can > > free it when the client is finally destroyed. > > > > In librpcsecgss, since commit > > 07fce317cac267509b944a8191cafa8e49b5e328 > > (thanks Kevin), authgss_get_private_data() hands the data over to the > > caller and doesn't keep it's own reference to it. So the caller has > > to call > > authgss_free_private_data() when it has finished with the data. > > As the library no longer has the credential, it doesn't even bother > > trying to send a GSS_PROC_DESTROY request. > > > > When Chuck noticed this difference between the two libraries, he > > resolved it - in commit 336f8bca825416082d62ef38314f3e0b7e8f5cc2 as > follow: > > > > if (token.value) > > free(token.value); > > +#ifndef HAVE_LIBTIRPC > > if (pd.pd_ctx_hndl.length != 0) > > authgss_free_private_data(&pd); > > +#endif > > > > Clearly to significance of this difference was not obvious, and this > > was the easiest fix. > > > > If we were to "fix" this properly, we would need to add a commit like > > the one from Kevin to libtirpc, and remove that #ifndef from nfs-utils. > > co-ordinating this might be tricking. nfs-utils could presumably > > test if libtirpc provided the function (at configure time) and call > > it if it does, > > This seems to me like the best approach for 2. > > > However is someone updates libtirpc without updating or recompiling > > nfs-utils they would get a memory leak. May it would be slow enough > > not to be serious, and if anyone noticed that could just upgrade and get a > fix. > > Telling people to upgrade for a fix is what we do for a living. In all > seriousness, though, in the common case, people will be using nfs-utils and > libtirpc built by distributions, and we expect the distros will get the fix > dependency right over time. > > > Does this seem reasonable? How is maintaining libtirpc these days? > > Could we get the fix into 0.2.3, or would we need a minor version > > bump to 0.3.0?? > > A minor version bump shouldn't be necessary if we're not changing the > synopsis of a published API, nor are we removing a published API. > > > 3/ there is actually a third option. We could change > > authgss_get_private_data() to set gc.gc_ctx.length to 0, but not free > > the buffer. Then aithgss_destroy_context() could notice that the > > length is zero and the buffer is not NULL, and could free the buffer but not > try to send > > the context_destroy request. It's an ugly hack though and I think I'd > > rather not. 4/ Have authgss_get_private_data() consume the 'auth' argument. Reusing the auth in an RPC call after we've transferred the context to the kernel is in any case a bug, so why allow it at all? Cheers, Trond ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers. 2013-01-30 23:19 ` Myklebust, Trond @ 2013-02-04 23:42 ` NeilBrown 0 siblings, 0 replies; 4+ messages in thread From: NeilBrown @ 2013-02-04 23:42 UTC (permalink / raw) To: Myklebust, Trond Cc: Chuck Lever, Kevin Coffman, J.Bruce Fields, Steve Dickson, NFS [-- Attachment #1: Type: text/plain, Size: 6249 bytes --] On Wed, 30 Jan 2013 23:19:20 +0000 "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > > -----Original Message----- > > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > > owner@vger.kernel.org] On Behalf Of Chuck Lever > > Sent: Thursday, January 24, 2013 11:14 AM > > To: NeilBrown > > Cc: Kevin Coffman; J.Bruce Fields; Steve Dickson; NFS > > Subject: Re: Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux > > servers. > > > > > > On Jan 23, 2013, at 9:02 PM, NeilBrown <neilb@suse.de> wrote: > > > > > > > > Hi peoples, > > > > > > this issue has appeared on the mailing list before (particularly > > > around July > > > 2011) but hasn't been resolved yet and it just bit me again so I > > > figure it is time it got fixed. > > > > > > > > > If you tcpdump the network connection while mounting an NFS filesystem > > > using kerberos - or while the client is establishing a new context > > > because e.g. the server rebooted - you will see a NULL RPC with an > > > RPC_GSS_PROC_DESTROY credential but no verifier. The lack of a > > > verifier makes the packet corrupt so the server ignores it, but people > > > see it and think something is wrong. > > > > > > It is good that the server ignores it as it really shouldn't be there. > > > What happens is that the NFS client calls up to rpc.gssd to request a > > > credential. rpc.gssd then establishes a connection directly with the > > > server, including the establishment of the security context. Then it > > > gathers the context details and passed them down to the kernel. > > > Then it closes the connection part of which involves calling > > > AUTH_DESTROY(auth) - necessary to free up data structures and not leak > > > memory. > > > This AUTH_DESTROY tries to destroy the context completely, including > > > telling the server that it has been destroyed! But it hasn't, it has > > > just been passed down to the kernel for use on a different connection. > > > > > > So there are two issues here: > > > - why is the GSS_PROC_DESTROY packet missing a verifier > > > - how can we get AUTH_DESTROY to *not* try to destroy the context on > > the > > > server - as that would be a bad thing. > > > > > > The first I cannot completely answer. I do know that in libtirpc, in > > > auth_gss.c, in authgss_marshal(), gss_get_mic is failing because it doesn't > > > think it has a valid context. I don't know why it thinks that, and I don't > > > really care. > > > > > > > > > The second question is more interesting and I see two possible options. > > > > > > 1/ If we knew why gss_get_mic failed and had good reason to believe it > > > would keep on failing, we could consider changing clnt_vc_call to > > > respond to an error from AUTH_MARSHALL not by sending a truncated > > > packet, but by purging the current message and not sending it at all. > > > This should be possible but might be messy. > > > > > > 2/ Make libtirpc behave more like librpcsecgss. > > > In libtirpc, the authgss_get_private_data() function just hands over > > > a pointer to the private data, but keeps its own pointer so it can > > > free it when the client is finally destroyed. > > > > > > In librpcsecgss, since commit > > > 07fce317cac267509b944a8191cafa8e49b5e328 > > > (thanks Kevin), authgss_get_private_data() hands the data over to the > > > caller and doesn't keep it's own reference to it. So the caller has > > > to call > > > authgss_free_private_data() when it has finished with the data. > > > As the library no longer has the credential, it doesn't even bother > > > trying to send a GSS_PROC_DESTROY request. > > > > > > When Chuck noticed this difference between the two libraries, he > > > resolved it - in commit 336f8bca825416082d62ef38314f3e0b7e8f5cc2 as > > follow: > > > > > > if (token.value) > > > free(token.value); > > > +#ifndef HAVE_LIBTIRPC > > > if (pd.pd_ctx_hndl.length != 0) > > > authgss_free_private_data(&pd); > > > +#endif > > > > > > Clearly to significance of this difference was not obvious, and this > > > was the easiest fix. > > > > > > If we were to "fix" this properly, we would need to add a commit like > > > the one from Kevin to libtirpc, and remove that #ifndef from nfs-utils. > > > co-ordinating this might be tricking. nfs-utils could presumably > > > test if libtirpc provided the function (at configure time) and call > > > it if it does, > > > > This seems to me like the best approach for 2. > > > > > However is someone updates libtirpc without updating or recompiling > > > nfs-utils they would get a memory leak. May it would be slow enough > > > not to be serious, and if anyone noticed that could just upgrade and get a > > fix. > > > > Telling people to upgrade for a fix is what we do for a living. In all > > seriousness, though, in the common case, people will be using nfs-utils and > > libtirpc built by distributions, and we expect the distros will get the fix > > dependency right over time. > > > > > Does this seem reasonable? How is maintaining libtirpc these days? > > > Could we get the fix into 0.2.3, or would we need a minor version > > > bump to 0.3.0?? > > > > A minor version bump shouldn't be necessary if we're not changing the > > synopsis of a published API, nor are we removing a published API. > > > > > 3/ there is actually a third option. We could change > > > authgss_get_private_data() to set gc.gc_ctx.length to 0, but not free > > > the buffer. Then aithgss_destroy_context() could notice that the > > > length is zero and the buffer is not NULL, and could free the buffer but not > > try to send > > > the context_destroy request. It's an ugly hack though and I think I'd > > > rather not. > > 4/ Have authgss_get_private_data() consume the 'auth' argument. > > Reusing the auth in an RPC call after we've transferred the context to the kernel is in any case a bug, so why allow it at all? > This is exactly the same as '2' - though stated much more succinctly. It looks like it is the approach that everyone prefers. I'll send some patches. Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-02-04 23:43 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20130124130243.449d5d92@notabene.brown>
2013-01-24 16:13 ` Corrupted RPC_GSS_PROC_DESTROY packets coming from Linux servers Chuck Lever
2013-01-30 22:29 ` J.Bruce Fields
2013-01-30 23:19 ` Myklebust, Trond
2013-02-04 23:42 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox