* [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
@ 2008-12-02 5:18 Neil Brown
[not found] ` <18740.50457.981544.21225-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2008-12-02 5:18 UTC (permalink / raw)
To: linux-nfs; +Cc: Kevin Coffman, J. Bruce Fields, Steve Dickson
Hi,
I have a report of an NFS server which runs out of kernel memory when
it gets heave rpcsec_gss traffic (auth_sys doesn't trigger the
problem so it must be gss related).
From looking at /proc/slab_allocators it seems that the main user of
memory is the rsc and rsi caches.
It appears entries are inserted into these caches with an expiry of
'forever' so they grow but never shrink.
We should fix this.
For the rsi (init) cache I assume the entry is only needed once so a
short expiry of (say) one minute should be plenty.
For the rsc (context) cache, the entry could be needed repeatedly
during the lifetime of a 'session'. However eventually it will
become stale and should be allowed to expire.
I assume that if the kernel requests a particular entry a second
time, an hour later, it will get the same answer - is that correct?
In that case, setting the expiry to something largish seems
appropriate.
Hence the following patch (untested yet - but I will get it tested in
due course).
Does this seem reasonable?
Thanks,
NeilBrown
diff --git a/utils/gssd/svcgssd_proc.c b/utils/gssd/svcgssd_proc.c
index 794c2f4..088a007 100644
--- a/utils/gssd/svcgssd_proc.c
+++ b/utils/gssd/svcgssd_proc.c
@@ -86,7 +86,9 @@ do_svc_downcall(gss_buffer_desc *out_handle, struct svc_cred *cred,
}
qword_printhex(f, out_handle->value, out_handle->length);
/* XXX are types OK for the rest of this? */
- qword_printint(f, 0x7fffffff); /*XXX need a better timeout */
+
+ /* 'context' could be needed for a while. */
+ qword_printint(f, time(0) + 60*60);
qword_printint(f, cred->cr_uid);
qword_printint(f, cred->cr_gid);
qword_printint(f, cred->cr_ngroups);
@@ -130,7 +132,8 @@ send_response(FILE *f, gss_buffer_desc *in_handle, gss_buffer_desc *in_token,
qword_addhex(&bp, &blen, in_handle->value, in_handle->length);
qword_addhex(&bp, &blen, in_token->value, in_token->length);
- qword_addint(&bp, &blen, 0x7fffffff); /*XXX need a better timeout */
+ /* INIT context info will only be needed for a short while */
+ qword_addint(&bp, &blen, time(0) + 60);
qword_adduint(&bp, &blen, maj_stat);
qword_adduint(&bp, &blen, min_stat);
qword_addhex(&bp, &blen, out_handle->value, out_handle->length);
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
[not found] ` <18740.50457.981544.21225-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2008-12-02 5:45 ` Kevin Coffman
[not found] ` <4d569c330812012145y2353bc9asd7a0c62fef42ed3a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-12-02 23:23 ` J. Bruce Fields
1 sibling, 1 reply; 6+ messages in thread
From: Kevin Coffman @ 2008-12-02 5:45 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs, J. Bruce Fields, Steve Dickson, Kevin Coffman
Hi Neil,
This seems reasonable.
I have a patch somewhere that gets the actual Kerberos expiration that
could be used for the rsc timeout. But I think this should be fine
for now. (Perhaps at the cost of requiring clients to negotiate a new
context every hour?)
K.C.
On Tue, Dec 2, 2008 at 12:18 AM, Neil Brown <neilb@suse.de> wrote:
>
>
> Hi,
> I have a report of an NFS server which runs out of kernel memory when
> it gets heave rpcsec_gss traffic (auth_sys doesn't trigger the
> problem so it must be gss related).
>
> From looking at /proc/slab_allocators it seems that the main user of
> memory is the rsc and rsi caches.
> It appears entries are inserted into these caches with an expiry of
> 'forever' so they grow but never shrink.
> We should fix this.
>
> For the rsi (init) cache I assume the entry is only needed once so a
> short expiry of (say) one minute should be plenty.
> For the rsc (context) cache, the entry could be needed repeatedly
> during the lifetime of a 'session'. However eventually it will
> become stale and should be allowed to expire.
>
> I assume that if the kernel requests a particular entry a second
> time, an hour later, it will get the same answer - is that correct?
>
> In that case, setting the expiry to something largish seems
> appropriate.
>
> Hence the following patch (untested yet - but I will get it tested in
> due course).
>
> Does this seem reasonable?
>
> Thanks,
> NeilBrown
>
>
> diff --git a/utils/gssd/svcgssd_proc.c b/utils/gssd/svcgssd_proc.c
> index 794c2f4..088a007 100644
> --- a/utils/gssd/svcgssd_proc.c
> +++ b/utils/gssd/svcgssd_proc.c
> @@ -86,7 +86,9 @@ do_svc_downcall(gss_buffer_desc *out_handle, struct svc_cred *cred,
> }
> qword_printhex(f, out_handle->value, out_handle->length);
> /* XXX are types OK for the rest of this? */
> - qword_printint(f, 0x7fffffff); /*XXX need a better timeout */
> +
> + /* 'context' could be needed for a while. */
> + qword_printint(f, time(0) + 60*60);
> qword_printint(f, cred->cr_uid);
> qword_printint(f, cred->cr_gid);
> qword_printint(f, cred->cr_ngroups);
> @@ -130,7 +132,8 @@ send_response(FILE *f, gss_buffer_desc *in_handle, gss_buffer_desc *in_token,
>
> qword_addhex(&bp, &blen, in_handle->value, in_handle->length);
> qword_addhex(&bp, &blen, in_token->value, in_token->length);
> - qword_addint(&bp, &blen, 0x7fffffff); /*XXX need a better timeout */
> + /* INIT context info will only be needed for a short while */
> + qword_addint(&bp, &blen, time(0) + 60);
> qword_adduint(&bp, &blen, maj_stat);
> qword_adduint(&bp, &blen, min_stat);
> qword_addhex(&bp, &blen, out_handle->value, out_handle->length);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
[not found] ` <4d569c330812012145y2353bc9asd7a0c62fef42ed3a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-12-02 16:04 ` Steve Dickson
[not found] ` <49355C78.6080607-AfCzQyP5zfLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Steve Dickson @ 2008-12-02 16:04 UTC (permalink / raw)
To: Kevin Coffman; +Cc: Neil Brown, linux-nfs, J. Bruce Fields
Kevin Coffman wrote:
> Hi Neil,
> This seems reasonable.
>
> I have a patch somewhere that gets the actual Kerberos expiration that
> could be used for the rsc timeout. But I think this should be fine
> for now. (Perhaps at the cost of requiring clients to negotiate a new
> context every hour?)
This question is a bit worrisome, imho... I understand the need to release
memory consumed by dead contexts but on the other hand, renegotiating
contexts every hour on the hours seems a bit costly as well...
Does it make sense to make this time out configurable? Yes, it would be
a very obscure knob, but in the unlikely event there is a bug in the
renegotiating code or renegotiating simply becomes too costly, I think
it would good to have a way to dial the time out back up as a work-around.
steved.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
[not found] ` <49355C78.6080607-AfCzQyP5zfLQT0dZR+AlfA@public.gmane.org>
@ 2008-12-02 17:40 ` Kevin Coffman
[not found] ` <4d569c330812020940n3b8561fexfb97d89a7d5779a4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Coffman @ 2008-12-02 17:40 UTC (permalink / raw)
To: Steve Dickson; +Cc: Neil Brown, linux-nfs, J. Bruce Fields
On Tue, Dec 2, 2008 at 11:04 AM, Steve Dickson <SteveD@redhat.com> wrote:
> Kevin Coffman wrote:
>> Hi Neil,
>> This seems reasonable.
>>
>> I have a patch somewhere that gets the actual Kerberos expiration that
>> could be used for the rsc timeout. But I think this should be fine
>> for now. (Perhaps at the cost of requiring clients to negotiate a new
>> context every hour?)
> This question is a bit worrisome, imho... I understand the need to release
> memory consumed by dead contexts but on the other hand, renegotiating
> contexts every hour on the hours seems a bit costly as well...
>
> Does it make sense to make this time out configurable? Yes, it would be
> a very obscure knob, but in the unlikely event there is a bug in the
> renegotiating code or renegotiating simply becomes too costly, I think
> it would good to have a way to dial the time out back up as a work-around.
>
> steved.
Hi Steve,
Rather than adding another config knob, how 'bout I dust off my patch
to get the "right" timeout value? I should be able to make that
available tomorrow. (I have limited resources to work on this at the
moment.)
K.C.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
[not found] ` <18740.50457.981544.21225-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-12-02 5:45 ` Kevin Coffman
@ 2008-12-02 23:23 ` J. Bruce Fields
1 sibling, 0 replies; 6+ messages in thread
From: J. Bruce Fields @ 2008-12-02 23:23 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs, Kevin Coffman, Steve Dickson
On Tue, Dec 02, 2008 at 04:18:17PM +1100, Neil Brown wrote:
> I have a report of an NFS server which runs out of kernel memory when
> it gets heave rpcsec_gss traffic (auth_sys doesn't trigger the
> problem so it must be gss related).
>
> From looking at /proc/slab_allocators it seems that the main user of
> memory is the rsc and rsi caches.
> It appears entries are inserted into these caches with an expiry of
> 'forever' so they grow but never shrink.
> We should fix this.
Yes, definitely a leak. But it's funny, I have a strong memory of a
conversation on one of these lists with someone who looked into this
problem and found that even with some silly artificial tests that
established as many contexts per second as possible, they weren't able
to see significant memory consumption--but I can't find the thread now.
Some limit on the size might also be nice depending on what the worst
case looks like with expiries of about a day.
--b.
>
> For the rsi (init) cache I assume the entry is only needed once so a
> short expiry of (say) one minute should be plenty.
> For the rsc (context) cache, the entry could be needed repeatedly
> during the lifetime of a 'session'. However eventually it will
> become stale and should be allowed to expire.
>
> I assume that if the kernel requests a particular entry a second
> time, an hour later, it will get the same answer - is that correct?
>
> In that case, setting the expiry to something largish seems
> appropriate.
>
> Hence the following patch (untested yet - but I will get it tested in
> due course).
>
> Does this seem reasonable?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc.
[not found] ` <4d569c330812020940n3b8561fexfb97d89a7d5779a4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-12-03 22:26 ` Kevin Coffman
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Coffman @ 2008-12-03 22:26 UTC (permalink / raw)
To: Steve Dickson; +Cc: Neil Brown, linux-nfs, J. Bruce Fields
> Dec 2, 2008 at 12:40 PM, Kevin Coffman <kwc@citi.umich.edu> wrote:
> On Tue, Dec 2, 2008 at 11:04 AM, Steve Dickson <SteveD@redhat.com> wrote:
>> Kevin Coffman wrote:
>>> Hi Neil,
>>> This seems reasonable.
>>>
>>> I have a patch somewhere that gets the actual Kerberos expiration that
>>> could be used for the rsc timeout. But I think this should be fine
>>> for now. (Perhaps at the cost of requiring clients to negotiate a new
>>> context every hour?)
>> This question is a bit worrisome, imho... I understand the need to release
>> memory consumed by dead contexts but on the other hand, renegotiating
>> contexts every hour on the hours seems a bit costly as well...
>>
>> Does it make sense to make this time out configurable? Yes, it would be
>> a very obscure knob, but in the unlikely event there is a bug in the
>> renegotiating code or renegotiating simply becomes too costly, I think
>> it would good to have a way to dial the time out back up as a work-around.
>>
>> steved.
>
> Hi Steve,
>
> Rather than adding another config knob, how 'bout I dust off my patch
> to get the "right" timeout value? I should be able to make that
> available tomorrow. (I have limited resources to work on this at the
> moment.)
>
> K.C.
It took me a bit longer than I anticipated to retrofit my changes for
this. I have patches that I will send out for review later tonight or
tomorrow.
K.C.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-12-03 22:26 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-02 5:18 [PATCH/RFC] svcgssd always sets an infinite expiry on authentication tokens etc Neil Brown
[not found] ` <18740.50457.981544.21225-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-12-02 5:45 ` Kevin Coffman
[not found] ` <4d569c330812012145y2353bc9asd7a0c62fef42ed3a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-12-02 16:04 ` Steve Dickson
[not found] ` <49355C78.6080607-AfCzQyP5zfLQT0dZR+AlfA@public.gmane.org>
2008-12-02 17:40 ` Kevin Coffman
[not found] ` <4d569c330812020940n3b8561fexfb97d89a7d5779a4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-12-03 22:26 ` Kevin Coffman
2008-12-02 23:23 ` J. Bruce Fields
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.