* NFS/credentials leak in 2.6.29-rc1 @ 2009-01-20 11:46 Evgeniy Polyakov 2009-01-20 13:37 ` Trond Myklebust 2009-01-20 15:11 ` J. Bruce Fields 0 siblings, 2 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-20 11:46 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel Hi Trond. I recently observed several OOM crashes on the NFS server machine used for the heavy benchmarks. It did not show in .28 iirc and I never got OOM crash with other benches with different filesystems. That's what I observe right now via slabtop, both constantly grew uring the test: 1798890 1798672 99% 0.12K 59963 30 239852K cred_jar 1798320 1798307 99% 0.25K 119888 15 479552K size-256 machine then died after quite short time. Overall it took about an hour or two for the 8gb of ram. Is this a known issue or should I spent more time investigating the cause? -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov @ 2009-01-20 13:37 ` Trond Myklebust 2009-01-20 13:49 ` Evgeniy Polyakov 2009-01-20 15:11 ` J. Bruce Fields 1 sibling, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2009-01-20 13:37 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: linux-fsdevel On Tue, 2009-01-20 at 14:46 +0300, Evgeniy Polyakov wrote: > Hi Trond. > > I recently observed several OOM crashes on the NFS server machine used > for the heavy benchmarks. It did not show in .28 iirc and I never got > OOM crash with other benches with different filesystems. > > That's what I observe right now via slabtop, both constantly grew uring > the test: > 1798890 1798672 99% 0.12K 59963 30 239852K cred_jar > 1798320 1798307 99% 0.25K 119888 15 479552K size-256 > > machine then died after quite short time. > Overall it took about an hour or two for the 8gb of ram. > > Is this a known issue or should I spent more time investigating the > cause? Hi Evgeniy, This leak is occurring on the server only and not the client, is that correct? I don't think this is a known issue but I've forwarded your mail on to Bruce Fields, who is the current NFS server maintainer, for further information. Cheers Trond ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 13:37 ` Trond Myklebust @ 2009-01-20 13:49 ` Evgeniy Polyakov 0 siblings, 0 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-20 13:49 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-fsdevel On Tue, Jan 20, 2009 at 08:37:03AM -0500, Trond Myklebust (trond.myklebust@fys.uio.no) wrote: > This leak is occurring on the server only and not the client, is that > correct? Yes, I only observed it on the server. I think I can reproduce it here, at least I already crashed machine 3 times. It looks like it fires with random IO (can not say if read or write though), since the same overall amount of data read or written sequentially did not result in OOM condition, but leak still may be presented there if credential is attached in single readpages()... Do not know. > I don't think this is a known issue but I've forwarded your mail on to > Bruce Fields, who is the current NFS server maintainer, for further > information. Thank you. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov 2009-01-20 13:37 ` Trond Myklebust @ 2009-01-20 15:11 ` J. Bruce Fields 2009-01-20 15:23 ` Evgeniy Polyakov 2009-01-20 21:44 ` David Howells 1 sibling, 2 replies; 27+ messages in thread From: J. Bruce Fields @ 2009-01-20 15:11 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Trond Myklebust, linux-fsdevel, David Howells On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote: > I recently observed several OOM crashes on the NFS server machine used > for the heavy benchmarks. It did not show in .28 iirc and I never got > OOM crash with other benches with different filesystems. > > That's what I observe right now via slabtop, both constantly grew uring > the test: > 1798890 1798672 99% 0.12K 59963 30 239852K cred_jar > 1798320 1798307 99% 0.25K 119888 15 479552K size-256 > > machine then died after quite short time. > Overall it took about an hour or two for the 8gb of ram. > > Is this a known issue or should I spent more time investigating the > cause? This doesn't look familiar, no; thanks for the report. I guess we should take a careful look at the recent changes to fs/nfsd/auth.c? --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 15:11 ` J. Bruce Fields @ 2009-01-20 15:23 ` Evgeniy Polyakov 2009-01-20 23:53 ` J. Bruce Fields 2009-01-20 21:44 ` David Howells 1 sibling, 1 reply; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-20 15:23 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Trond Myklebust, linux-fsdevel, David Howells On Tue, Jan 20, 2009 at 10:11:25AM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > This doesn't look familiar, no; thanks for the report. I guess we > should take a careful look at the recent changes to fs/nfsd/auth.c? If creds are allocated in nfsd_setuser() and never freed? groups_alloc() can also explain size-256 slab grew, so this may be the place where things are allocated, but why they are not freed? This may also explain why I did not see this for the large sequential IO, since number of requests to the server was noticebly smaller, than in random IO test. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 15:23 ` Evgeniy Polyakov @ 2009-01-20 23:53 ` J. Bruce Fields 2009-01-21 12:23 ` David Howells 0 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-20 23:53 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: Trond Myklebust, linux-fsdevel, David Howells On Tue, Jan 20, 2009 at 06:23:04PM +0300, Evgeniy Polyakov wrote: > On Tue, Jan 20, 2009 at 10:11:25AM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > This doesn't look familiar, no; thanks for the report. I guess we > > should take a careful look at the recent changes to fs/nfsd/auth.c? > > If creds are allocated in nfsd_setuser() and never freed? groups_alloc() > can also explain size-256 slab grew, so this may be the place where > things are allocated, but why they are not freed? > This may also explain why I did not see this for the large sequential > IO, since number of requests to the server was noticebly smaller, than > in random IO test. Looking through nfsd_setuser(), one obvious bug: in the (flags & NFSEXP_ALLSQUASH) case, we never check the return value from the groups_alloc(0). If it returns NULL, we dereference it anyway. But that's unrelated. For the cred reference counting: - revert_creds(get_cred(current->real_cred)) modifies current->cred, putting the old value and getting the new. OK. - new = prepare_creds() creates a new object with count 1. All subsequent exits from the function go through the oom or error labels, which put new. OK. - Each of the three ALLSQUASH/ROOTSQUASH/else paths create a new reference to a groups_info struct in gi, which is unconditionally assigned to new by set_groups(). set_groups() takes its own reference on gi, then we put the original reference in the following put_group_info(). OK. - Finally, we put_cred(override_creds(new)). That modifies current->cred again, putting the old value and getting the new. Hm. But that last part's not OK; aren't we still holding our own reference to new, in addition to the one that override_creds() just took? So I think we need the following? --b. diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c index c903e04..9966e9e 100644 --- a/fs/nfsd/auth.c +++ b/fs/nfsd/auth.c @@ -85,6 +85,7 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp) new->cap_effective = cap_raise_nfsd_set(new->cap_effective, new->cap_permitted); put_cred(override_creds(new)); + put_cred(new); return 0; oom: ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 23:53 ` J. Bruce Fields @ 2009-01-21 12:23 ` David Howells 2009-01-21 12:37 ` Evgeniy Polyakov 2009-01-21 22:37 ` J. Bruce Fields 0 siblings, 2 replies; 27+ messages in thread From: David Howells @ 2009-01-21 12:23 UTC (permalink / raw) To: J. Bruce Fields Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel J. Bruce Fields <bfields@fieldses.org> wrote: > - Finally, we put_cred(override_creds(new)). That modifies > current->cred again, putting the old value and getting the > new. > > Hm. But that last part's not OK; aren't we still holding our own > reference to new, in addition to the one that override_creds() just > took? So I think we need the following? Yes, you're right. override_creds() takes an extra ref on the argument it is passed, thus leaving the caller with their original reference intact. So really, you don't want to call override_creds() as that will cost you an extra atomic_inc() and atomic_dec_and_test(). I recommend you replace: put_cred(override_creds(new)); with: revert_creds(new); I think that should do the right thing. It may look a bit odd, but it'll be quicker. If you object to using revert_creds)( because of the name, we can come up with an alternative name. > Looking through nfsd_setuser(), one obvious bug: in the (flags & > NFSEXP_ALLSQUASH) case, we never check the return value from the > groups_alloc(0). If it returns NULL, we dereference it anyway. Since a zero-length groups list must be copied before writing, can I recommend that we make groups_alloc(0) a special case that returns pointer to a statically allocated groups list (after inc'ing the refcount) that represents a zero-length list, thus meaning groups_alloc(0) will never fail? David ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 12:23 ` David Howells @ 2009-01-21 12:37 ` Evgeniy Polyakov 2009-01-21 13:17 ` David Howells ` (2 more replies) 2009-01-21 22:37 ` J. Bruce Fields 1 sibling, 3 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-21 12:37 UTC (permalink / raw) To: David Howells; +Cc: J. Bruce Fields, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells (dhowells@redhat.com) wrote: > J. Bruce Fields <bfields@fieldses.org> wrote: > > > - Finally, we put_cred(override_creds(new)). That modifies > > current->cred again, putting the old value and getting the > > new. > > > > Hm. But that last part's not OK; aren't we still holding our own > > reference to new, in addition to the one that override_creds() just > > took? So I think we need the following? > > Yes, you're right. override_creds() takes an extra ref on the argument it is > passed, thus leaving the caller with their original reference intact. > > So really, you don't want to call override_creds() as that will cost you an > extra atomic_inc() and atomic_dec_and_test(). I recommend you replace: > > put_cred(override_creds(new)); > > with: > > revert_creds(new); > > I think that should do the right thing. It may look a bit odd, but it'll be > quicker. If you object to using revert_creds)( because of the name, we can > come up with an alternative name. With additional put_cred, i.e.: put_cred(override_creds(new)); put_cred(new); return 0; I got following fun tcpdump and failed mount (it stuck, but can be interrupted): 15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null 15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358> 15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null 15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463> 15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null 15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null 15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs] 15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs] 15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs] 15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs] But no corruption in the dmesg (like oops or bug). -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 12:37 ` Evgeniy Polyakov @ 2009-01-21 13:17 ` David Howells 2009-01-21 13:18 ` Evgeniy Polyakov 2009-01-21 22:39 ` J. Bruce Fields 2009-01-27 0:49 ` J. Bruce Fields 2 siblings, 1 reply; 27+ messages in thread From: David Howells @ 2009-01-21 13:17 UTC (permalink / raw) To: Evgeniy Polyakov Cc: dhowells, J. Bruce Fields, Trond Myklebust, linux-fsdevel Evgeniy Polyakov <zbr@ioremap.net> wrote: > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > ... > But no corruption in the dmesg (like oops or bug). Are you running with CONFIG_DEBUG_SLAB=y? David ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 13:17 ` David Howells @ 2009-01-21 13:18 ` Evgeniy Polyakov 0 siblings, 0 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-21 13:18 UTC (permalink / raw) To: David Howells; +Cc: J. Bruce Fields, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 01:17:02PM +0000, David Howells (dhowells@redhat.com) wrote: > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > > ... > > But no corruption in the dmesg (like oops or bug). > > Are you running with CONFIG_DEBUG_SLAB=y? Nope: # CONFIG_DEBUG_KERNEL is not set -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 12:37 ` Evgeniy Polyakov 2009-01-21 13:17 ` David Howells @ 2009-01-21 22:39 ` J. Bruce Fields 2009-01-21 22:46 ` Evgeniy Polyakov 2009-01-27 0:49 ` J. Bruce Fields 2 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-21 22:39 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 03:37:28PM +0300, Evgeniy Polyakov wrote: > On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells (dhowells@redhat.com) wrote: > > J. Bruce Fields <bfields@fieldses.org> wrote: > > > > > - Finally, we put_cred(override_creds(new)). That modifies > > > current->cred again, putting the old value and getting the > > > new. > > > > > > Hm. But that last part's not OK; aren't we still holding our own > > > reference to new, in addition to the one that override_creds() just > > > took? So I think we need the following? > > > > Yes, you're right. override_creds() takes an extra ref on the argument it is > > passed, thus leaving the caller with their original reference intact. > > > > So really, you don't want to call override_creds() as that will cost you an > > extra atomic_inc() and atomic_dec_and_test(). I recommend you replace: > > > > put_cred(override_creds(new)); > > > > with: > > > > revert_creds(new); > > > > I think that should do the right thing. It may look a bit odd, but it'll be > > quicker. If you object to using revert_creds)( because of the name, we can > > come up with an alternative name. > > With additional put_cred, i.e.: > > put_cred(override_creds(new)); > put_cred(new); > return 0; > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): I'm really bad at reading tcpdump. Where's the problem here? It looks like every call gets a reply. --b. > > 15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null > 15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358> > 15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null > 15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463> > 15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null > 15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null > 15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs] > 15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs] > 15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs] > 15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs] > > But no corruption in the dmesg (like oops or bug). > > -- > Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 22:39 ` J. Bruce Fields @ 2009-01-21 22:46 ` Evgeniy Polyakov 2009-01-21 23:18 ` J. Bruce Fields 0 siblings, 1 reply; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-21 22:46 UTC (permalink / raw) To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 05:39:19PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > > I'm really bad at reading tcpdump. Where's the problem here? It looks > like every call gets a reply. Port fields are fun > > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null ^^^^^^^^^^ vs ^^^^ It is likely copied skb corruption, ports change, but acks correspond sometimes. > > 15:34:41.253916 IP addr2.2049 > addr1.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358> > > 15:34:41.254103 IP addr2.2049 > addr1.1835336279: reply ok 28 null > > 15:34:41.254229 IP addr1.728 > addr2.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463> > > 15:34:41.254238 IP addr1.1852113495 > addr2.2049: 44 null > > 15:34:41.254271 IP addr2.2049 > addr1.1852113495: reply ok 28 null > > 15:34:41.254378 IP addr1.1868890711 > addr2.2049: 100 fsinfo [|nfs] > > 15:34:41.254411 IP addr2.2049 > addr1.1868890711: reply ok 36 fsinfo [|nfs] > > 15:34:41.254528 IP addr1.1885667927 > addr2.2049: 100 fsinfo [|nfs] > > 15:34:41.254555 IP addr2.2049 > addr1.1885667927: reply ok 36 fsinfo [|nfs] -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 22:46 ` Evgeniy Polyakov @ 2009-01-21 23:18 ` J. Bruce Fields 2009-01-21 23:31 ` Evgeniy Polyakov 0 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-21 23:18 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Thu, Jan 22, 2009 at 01:46:59AM +0300, Evgeniy Polyakov wrote: > On Wed, Jan 21, 2009 at 05:39:19PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > > > > I'm really bad at reading tcpdump. Where's the problem here? It looks > > like every call gets a reply. > > Port fields are fun > > > > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null > ^^^^^^^^^^ vs ^^^^ > > It is likely copied skb corruption, ports change, but acks correspond > sometimes. Obviously it's normal that the client- and server- side ports would differ, so I assume you meant to point out the variation in the client port number? (1835336279, 728, 1852113495, 1868890711, 1885667927). --b. > > > > 15:34:41.253916 IP addr2.2049 > addr1.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358> > > > 15:34:41.254103 IP addr2.2049 > addr1.1835336279: reply ok 28 null > > > 15:34:41.254229 IP addr1.728 > addr2.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463> > > > 15:34:41.254238 IP addr1.1852113495 > addr2.2049: 44 null > > > 15:34:41.254271 IP addr2.2049 > addr1.1852113495: reply ok 28 null > > > 15:34:41.254378 IP addr1.1868890711 > addr2.2049: 100 fsinfo [|nfs] > > > 15:34:41.254411 IP addr2.2049 > addr1.1868890711: reply ok 36 fsinfo [|nfs] > > > 15:34:41.254528 IP addr1.1885667927 > addr2.2049: 100 fsinfo [|nfs] > > > 15:34:41.254555 IP addr2.2049 > addr1.1885667927: reply ok 36 fsinfo [|nfs] > > -- > Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 23:18 ` J. Bruce Fields @ 2009-01-21 23:31 ` Evgeniy Polyakov 0 siblings, 0 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-21 23:31 UTC (permalink / raw) To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 06:18:11PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > Port fields are fun > > > > > > 15:34:41.253911 IP addr1.1835336279 > addr2.2049: 44 null > > ^^^^^^^^^^ vs ^^^^ > > > > It is likely copied skb corruption, ports change, but acks correspond > > sometimes. > > Obviously it's normal that the client- and server- side ports would > differ, so I assume you meant to point out the variation in the client > port number? (1835336279, 728, 1852113495, 1868890711, 1885667927). I meant that ack still comes to the fun port, not only to correct one. And some ports are obviously incorrect like 4 above. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 12:37 ` Evgeniy Polyakov 2009-01-21 13:17 ` David Howells 2009-01-21 22:39 ` J. Bruce Fields @ 2009-01-27 0:49 ` J. Bruce Fields 2009-01-27 9:26 ` Evgeniy Polyakov 2009-02-05 13:22 ` David Howells 2 siblings, 2 replies; 27+ messages in thread From: J. Bruce Fields @ 2009-01-27 0:49 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 03:37:28PM +0300, Evgeniy Polyakov wrote: > With additional put_cred, i.e.: > > put_cred(override_creds(new)); > put_cred(new); > return 0; > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): I'm not able to reproduce this. Have you been able to reproduce this consistently? Any further results? If not, I'm tempted to assume you're seeing some unrelated bug. David, can you ACK the additional put_cred()? --b. > > 15:34:41.253911 IP 77.88.20.183.1835336279 > 77.88.20.182.2049: 44 null > 15:34:41.253916 IP 77.88.20.182.2049 > 77.88.20.183.728: . ack 44 win 88 <nop,nop,timestamp 125462 37402358> > 15:34:41.254103 IP 77.88.20.182.2049 > 77.88.20.183.1835336279: reply ok 28 null > 15:34:41.254229 IP 77.88.20.183.728 > 77.88.20.182.2049: . ack 29 win 89 <nop,nop,timestamp 37402358 125463> > 15:34:41.254238 IP 77.88.20.183.1852113495 > 77.88.20.182.2049: 44 null > 15:34:41.254271 IP 77.88.20.182.2049 > 77.88.20.183.1852113495: reply ok 28 null > 15:34:41.254378 IP 77.88.20.183.1868890711 > 77.88.20.182.2049: 100 fsinfo [|nfs] > 15:34:41.254411 IP 77.88.20.182.2049 > 77.88.20.183.1868890711: reply ok 36 fsinfo [|nfs] > 15:34:41.254528 IP 77.88.20.183.1885667927 > 77.88.20.182.2049: 100 fsinfo [|nfs] > 15:34:41.254555 IP 77.88.20.182.2049 > 77.88.20.183.1885667927: reply ok 36 fsinfo [|nfs] > > But no corruption in the dmesg (like oops or bug). > > -- > Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-27 0:49 ` J. Bruce Fields @ 2009-01-27 9:26 ` Evgeniy Polyakov 2009-01-27 22:07 ` J. Bruce Fields 2009-02-05 13:22 ` David Howells 1 sibling, 1 reply; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-27 9:26 UTC (permalink / raw) To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Mon, Jan 26, 2009 at 07:49:55PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > With additional put_cred, i.e.: > > > > put_cred(override_creds(new)); > > put_cred(new); > > return 0; > > > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > > I'm not able to reproduce this. > > Have you been able to reproduce this consistently? Any further results? Yes, I can easily reproduce it with the tree I work with. But I will update to the latest git soon (likely today) and rerun the tests, so it may happen that rc1-rc2 frame fixed the issue at least partially. I can also turn on some debugging options if needed. > If not, I'm tempted to assume you're seeing some unrelated bug. > > David, can you ACK the additional put_cred()? -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-27 9:26 ` Evgeniy Polyakov @ 2009-01-27 22:07 ` J. Bruce Fields 2009-01-29 14:37 ` Evgeniy Polyakov 0 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-27 22:07 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Tue, Jan 27, 2009 at 12:26:59PM +0300, Evgeniy Polyakov wrote: > On Mon, Jan 26, 2009 at 07:49:55PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > > With additional put_cred, i.e.: > > > > > > put_cred(override_creds(new)); > > > put_cred(new); > > > return 0; > > > > > > I got following fun tcpdump and failed mount (it stuck, but can be interrupted): > > > > I'm not able to reproduce this. > > > > Have you been able to reproduce this consistently? Any further results? > > Yes, I can easily reproduce it with the tree I work with. > But I will update to the latest git soon (likely today) and rerun the > tests, so it may happen that rc1-rc2 frame fixed the issue at least > partially. I can also turn on some debugging options if needed. I took a closer look at the tcpdump output. Those numbers aren't port numbers, they're rpc xid's: see "NFS Requests and Replies" in the "OUTPUT FORMAT" section of the tcpdump man page. Or compare the output of tcpdump with wireshark's output. That still doesn't explain why you were seeing a mount hang--but some network or other configuration problem may be the most likely explanation. --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-27 22:07 ` J. Bruce Fields @ 2009-01-29 14:37 ` Evgeniy Polyakov 2009-01-29 18:52 ` J. Bruce Fields 0 siblings, 1 reply; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-29 14:37 UTC (permalink / raw) To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Tue, Jan 27, 2009 at 05:07:35PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > I took a closer look at the tcpdump output. Those numbers aren't port > numbers, they're rpc xid's: see "NFS Requests and Replies" in the > "OUTPUT FORMAT" section of the tcpdump man page. Or compare the output > of tcpdump with wireshark's output. Yup, they are definitely not network ports :) > That still doesn't explain why you were seeing a mount hang--but some > network or other configuration problem may be the most likely > explanation. I've rerun the tests with the latest git to date and I do not observe neither leak (with credential patch applied) nor mount fail, so please put the patch into the tree. Feel free to add my signed/acked/tested/whatevered if needed :) -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-29 14:37 ` Evgeniy Polyakov @ 2009-01-29 18:52 ` J. Bruce Fields 2009-01-29 19:00 ` Evgeniy Polyakov 0 siblings, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-29 18:52 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Thu, Jan 29, 2009 at 05:37:53PM +0300, Evgeniy Polyakov wrote: > On Tue, Jan 27, 2009 at 05:07:35PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > I took a closer look at the tcpdump output. Those numbers aren't port > > numbers, they're rpc xid's: see "NFS Requests and Replies" in the > > "OUTPUT FORMAT" section of the tcpdump man page. Or compare the output > > of tcpdump with wireshark's output. > > Yup, they are definitely not network ports :) > > > That still doesn't explain why you were seeing a mount hang--but some > > network or other configuration problem may be the most likely > > explanation. > > I've rerun the tests with the latest git to date and I do not observe > neither leak (with credential patch applied) nor mount fail, so please > put the patch into the tree. Thanks for the confirmation--but I jumped the gun and already submitted them... > Feel free to add my signed/acked/tested/whatevered if needed :) ... and forgot to credit you--sorry, I should have. Thanks again for catching the problem! --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-29 18:52 ` J. Bruce Fields @ 2009-01-29 19:00 ` Evgeniy Polyakov 0 siblings, 0 replies; 27+ messages in thread From: Evgeniy Polyakov @ 2009-01-29 19:00 UTC (permalink / raw) To: J. Bruce Fields; +Cc: David Howells, Trond Myklebust, linux-fsdevel On Thu, Jan 29, 2009 at 01:52:53PM -0500, J. Bruce Fields (bfields@fieldses.org) wrote: > > I've rerun the tests with the latest git to date and I do not observe > > neither leak (with credential patch applied) nor mount fail, so please > > put the patch into the tree. > > Thanks for the confirmation--but I jumped the gun and already submitted > them... > > > Feel free to add my signed/acked/tested/whatevered if needed :) > > ... and forgot to credit you--sorry, I should have. Thanks again for > catching the problem! No problem. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-27 0:49 ` J. Bruce Fields 2009-01-27 9:26 ` Evgeniy Polyakov @ 2009-02-05 13:22 ` David Howells 2009-02-05 17:21 ` J. Bruce Fields 1 sibling, 1 reply; 27+ messages in thread From: David Howells @ 2009-02-05 13:22 UTC (permalink / raw) To: J. Bruce Fields Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel Sorry for not replying earlier, but I've been in Australia and my home DSL line has been down for the last two weeks. J. Bruce Fields <bfields@fieldses.org> wrote: > David, can you ACK the additional put_cred()? Feel free to add my Acked-by. David ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-02-05 13:22 ` David Howells @ 2009-02-05 17:21 ` J. Bruce Fields 0 siblings, 0 replies; 27+ messages in thread From: J. Bruce Fields @ 2009-02-05 17:21 UTC (permalink / raw) To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel On Thu, Feb 05, 2009 at 01:22:37PM +0000, David Howells wrote: > > Sorry for not replying earlier, but I've been in Australia and my home DSL > line has been down for the last two weeks. > > J. Bruce Fields <bfields@fieldses.org> wrote: > > > David, can you ACK the additional put_cred()? > > Feel free to add my Acked-by. I was impatient and already sent it upstream--but thanks for the confirmation.--b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 12:23 ` David Howells 2009-01-21 12:37 ` Evgeniy Polyakov @ 2009-01-21 22:37 ` J. Bruce Fields 2009-01-22 7:08 ` David Howells 1 sibling, 1 reply; 27+ messages in thread From: J. Bruce Fields @ 2009-01-21 22:37 UTC (permalink / raw) To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel On Wed, Jan 21, 2009 at 12:23:09PM +0000, David Howells wrote: > J. Bruce Fields <bfields@fieldses.org> wrote: > > > - Finally, we put_cred(override_creds(new)). That modifies > > current->cred again, putting the old value and getting the > > new. > > > > Hm. But that last part's not OK; aren't we still holding our own > > reference to new, in addition to the one that override_creds() just > > took? So I think we need the following? > > Yes, you're right. override_creds() takes an extra ref on the argument it is > passed, thus leaving the caller with their original reference intact. > > So really, you don't want to call override_creds() as that will cost you an > extra atomic_inc() and atomic_dec_and_test(). I recommend you replace: > > put_cred(override_creds(new)); > > with: > > revert_creds(new); > > I think that should do the right thing. It may look a bit odd, but it'll be > quicker. If you object to using revert_creds)( because of the name, we can > come up with an alternative name. If the only difference is just whether it takes a reference on the passed-in cred it might be clearest just to write set_creds(new); or set_creds(get_creds(new)); depending on which you want? In any case, yes, the revert_creds()/override_creds() names don't tell me much. > > Looking through nfsd_setuser(), one obvious bug: in the (flags & > > NFSEXP_ALLSQUASH) case, we never check the return value from the > > groups_alloc(0). If it returns NULL, we dereference it anyway. > > Since a zero-length groups list must be copied before writing, can I recommend > that we make groups_alloc(0) a special case that returns pointer to a > statically allocated groups list (after inc'ing the refcount) that represents > a zero-length list, thus meaning groups_alloc(0) will never fail? Is there a really big advantage to that? On the face of it it strikes me as a weird corner case that I'll trip over every time I look at this code. --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-21 22:37 ` J. Bruce Fields @ 2009-01-22 7:08 ` David Howells 2009-01-22 16:43 ` J. Bruce Fields 0 siblings, 1 reply; 27+ messages in thread From: David Howells @ 2009-01-22 7:08 UTC (permalink / raw) To: J. Bruce Fields Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel J. Bruce Fields <bfields@fieldses.org> wrote: > If the only difference is just whether it takes a reference on the > passed-in cred it might be clearest just to write > > set_creds(new); > > or > set_creds(get_creds(new)); > > depending on which you want? The former would be preferable, if it transfers the reference on the creds to the task_struct, thus eliminating the need for a put_cred(). > In any case, yes, the revert_creds()/override_creds() names don't tell > me much. > > > > Looking through nfsd_setuser(), one obvious bug: in the (flags & > > > NFSEXP_ALLSQUASH) case, we never check the return value from the > > > groups_alloc(0). If it returns NULL, we dereference it anyway. > > > > Since a zero-length groups list must be copied before writing, can I recommend > > that we make groups_alloc(0) a special case that returns pointer to a > > statically allocated groups list (after inc'ing the refcount) that represents > > a zero-length list, thus meaning groups_alloc(0) will never fail? > > Is there a really big advantage to that? On the face of it it strikes > me as a weird corner case that I'll trip over every time I look at this > code. It'll remove a potential OOM condition. It's a minor optimisation, I think. David ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-22 7:08 ` David Howells @ 2009-01-22 16:43 ` J. Bruce Fields 0 siblings, 0 replies; 27+ messages in thread From: J. Bruce Fields @ 2009-01-22 16:43 UTC (permalink / raw) To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel On Thu, Jan 22, 2009 at 07:08:33AM +0000, David Howells wrote: > J. Bruce Fields <bfields@fieldses.org> wrote: > > > If the only difference is just whether it takes a reference on the > > passed-in cred it might be clearest just to write > > > > set_creds(new); > > > > or > > set_creds(get_creds(new)); > > > > depending on which you want? > > The former would be preferable, if it transfers the reference on the creds to > the task_struct, thus eliminating the need for a put_cred(). I think I was unclear--but I think we're agreeing anyway? I was proposing eliminating the two separate revert_creds() and override_creds() functions and instead using a single set_creds() that always consumes a reference to its argument, requiring the caller to explicitly get a reference (as in the second example above) when necessary. > > Is there a really big advantage to that? On the face of it it strikes > > me as a weird corner case that I'll trip over every time I look at this > > code. > > It'll remove a potential OOM condition. It's a minor optimisation, I think. OK. Let's keep things simple and set that idea aside for now; we've lived with the current groups_alloc(0) behavior for a while, and keeping it another kernel version or two can't be so bad. --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 15:11 ` J. Bruce Fields 2009-01-20 15:23 ` Evgeniy Polyakov @ 2009-01-20 21:44 ` David Howells 2009-01-21 22:42 ` J. Bruce Fields 1 sibling, 1 reply; 27+ messages in thread From: David Howells @ 2009-01-20 21:44 UTC (permalink / raw) To: J. Bruce Fields Cc: dhowells, Evgeniy Polyakov, Trond Myklebust, linux-fsdevel J. Bruce Fields <bfields@fieldses.org> wrote: > On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote: > This doesn't look familiar, no; thanks for the report. I guess we > should take a careful look at the recent changes to fs/nfsd/auth.c? I don't see anything obvious in fs/nfsd/auth.c or nfsfh.c, but I proably won't be able to investigate properly until I get back from LCA in Feb. David ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: NFS/credentials leak in 2.6.29-rc1 2009-01-20 21:44 ` David Howells @ 2009-01-21 22:42 ` J. Bruce Fields 0 siblings, 0 replies; 27+ messages in thread From: J. Bruce Fields @ 2009-01-21 22:42 UTC (permalink / raw) To: David Howells; +Cc: Evgeniy Polyakov, Trond Myklebust, linux-fsdevel On Tue, Jan 20, 2009 at 09:44:58PM +0000, David Howells wrote: > J. Bruce Fields <bfields@fieldses.org> wrote: > > > On Tue, Jan 20, 2009 at 02:46:49PM +0300, Evgeniy Polyakov wrote: > > This doesn't look familiar, no; thanks for the report. I guess we > > should take a careful look at the recent changes to fs/nfsd/auth.c? > > I don't see anything obvious in fs/nfsd/auth.c or nfsfh.c, but I proably won't > be able to investigate properly until I get back from LCA in Feb. As long as we get this figured out well before the 2.6.29 release.... --b. ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2009-02-05 17:21 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-20 11:46 NFS/credentials leak in 2.6.29-rc1 Evgeniy Polyakov 2009-01-20 13:37 ` Trond Myklebust 2009-01-20 13:49 ` Evgeniy Polyakov 2009-01-20 15:11 ` J. Bruce Fields 2009-01-20 15:23 ` Evgeniy Polyakov 2009-01-20 23:53 ` J. Bruce Fields 2009-01-21 12:23 ` David Howells 2009-01-21 12:37 ` Evgeniy Polyakov 2009-01-21 13:17 ` David Howells 2009-01-21 13:18 ` Evgeniy Polyakov 2009-01-21 22:39 ` J. Bruce Fields 2009-01-21 22:46 ` Evgeniy Polyakov 2009-01-21 23:18 ` J. Bruce Fields 2009-01-21 23:31 ` Evgeniy Polyakov 2009-01-27 0:49 ` J. Bruce Fields 2009-01-27 9:26 ` Evgeniy Polyakov 2009-01-27 22:07 ` J. Bruce Fields 2009-01-29 14:37 ` Evgeniy Polyakov 2009-01-29 18:52 ` J. Bruce Fields 2009-01-29 19:00 ` Evgeniy Polyakov 2009-02-05 13:22 ` David Howells 2009-02-05 17:21 ` J. Bruce Fields 2009-01-21 22:37 ` J. Bruce Fields 2009-01-22 7:08 ` David Howells 2009-01-22 16:43 ` J. Bruce Fields 2009-01-20 21:44 ` David Howells 2009-01-21 22:42 ` J. Bruce Fields
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.