From: Harry Edmon <harry-qmPYOCrcNLLyFCzt5hm0YvZ8FUJU4vz8@public.gmane.org>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Max Kellermann <max-hDT0AjmEH7RAfugRpC6u6w@public.gmane.org>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: High load in 2.6.27, NFS / rpcauth_lookup_credcache()?
Date: Mon, 15 Dec 2008 15:44:58 -0800
Message-ID: <4946EBFA.60700@atmos.washington.edu>
In-Reply-To: <1224773745.7625.4.camel@localhost>

Trond Myklebust wrote:
> On Thu, 2008-10-23 at 14:36 +0200, Max Kellermann wrote:
>   
>> On 2008/10/22 11:12, Max Kellermann <max-hDT0AjmEH7RAfugRpC6u6w@public.gmane.org> wrote:
>>     
>>> after I was able to fix http://lkml.org/lkml/2008/10/17/147, the
>>> server which was already upgraded to 2.6.27.2 still gets very high
>>> load.  It is a web server with NFS file storage (NetApp), and while
>>> the others in the cluster (kernel 2.6.25) have a load of 1-3, 2.6.27.2
>>> gets 30-50.
>>>
>>> I did an oprofile, with the following results (server just started,
>>> load "only" 5-10):
>>>
>>> 87593    56.1116  (no location information)   vmlinux                  vmlinux                  rpcauth_lookup_credcache
>>> 16037    10.2732  auth_generic.c:0            vmlinux                  vmlinux                  generic_match
>>> 6460      4.1382  (no location information)   php4                     php4                     (no symbols)
>>> 2478      1.5874  (no location information)   libc-2.7.so              libc-2.7.so              (no symbols)
>>> [...]
>>>
>>> We haven't configured any special authentication method.  It is an NFSv3
>>> over UDP mount, but the kernel has NFSv4 and therefore KRB5 enabled.
>>>
>>> Any ideas why rpcauth_lookup_credcache() goes overboard with CPU
>>> usage?
>>>       
>> I have bisected the problem: 98a8e323 is the result ("SUNRPC: Add a
>> helper rpcauth_lookup_generic_cred()").  5c691044 is ok.
>>
>> See the attached oprofile annotation data for both commits.  I guess
>> that the function rpcauth_lookup_credcache() is waiting for a spinlock
>> too often and too long.  Trond, any idea?
>>     
>
> Can you add a '-v' to the rpc.gssd daemon startup line? I'd like to see
> how often you are creating new gss contexts.
>
>   
>> Harry: added you to Cc because your problem sounds similar.
>>     
>
> Harry's problem should be unrelated. AFAIK, he is seeing a problem
> with userland RPC code, not kernel RPC code.
>
> Trond
>
>   
I am finally getting some time to look at my problem that I originally 
reported in October (SUNRPC problem with 2.6.26 and beyond), and I am 
seeing the same behavior as Max Kellermann when my machine slows as I 
described earlier.  The system in question is currently running 
2.6.27.7.  Here is what I see when it is misbehaving:

samples  %        image name               app name                 symbol name
11380517 57.4191  sunrpc.ko                sunrpc                   rpcauth_lookup_credcache
3263657  16.4664  sunrpc.ko                sunrpc                   generic_match
1081287   5.4555  vmlinux                  vmlinux                  copy_user_generic_string
499407    2.5197  vmlinux                  vmlinux                  __posix_lock_file
[...]

And here is what I see when I stop the programs that are chewing up all 
the system time, and then start them up again:

samples  %        image name               app name                 symbol name
6372650  21.7978  vmlinux                  vmlinux                  copy_user_generic_string
5401386  18.4755  sunrpc.ko                sunrpc                   rpcauth_lookup_credcache
3018753  10.3257  vmlinux                  vmlinux                  __posix_lock_file
1050095   3.5919  sunrpc.ko                sunrpc                   generic_match


Note that I am not using Kerberos with NFSv4 (i.e. no rpc.gssd).  Did you ever 
find a solution for this problem with rpcauth_lookup_credcache?

-- 

 Dr. Harry Edmon			E-MAIL: harry-qmPYOCrcNLLyFCzt5hm0YvZ8FUJU4vz8@public.gmane.org
 206-543-0547				harry-B93hV6UPU7Z2icitjWtXSw@public.gmane.org
 Dept of Atmospheric Sciences		FAX:	206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640

