From: Willy Tarreau <w@1wt.eu>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Harry Edmon
	<harry-qmPYOCrcNLLyFCzt5hm0YvZ8FUJU4vz8@public.gmane.org>,
	Max Kellermann <max-hDT0AjmEH7RAfugRpC6u6w@public.gmane.org>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	stable@kernel.org
Subject: Re: High load in 2.6.27, NFS / rpcauth_lookup_credcache()?
Date: Tue, 16 Dec 2008 22:21:55 +0100
Message-ID: <20081216212155.GA581@1wt.eu>
In-Reply-To: <1229432553.7257.4.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>

On Tue, Dec 16, 2008 at 08:02:33AM -0500, Trond Myklebust wrote:
> On Mon, 2008-12-15 at 15:44 -0800, Harry Edmon wrote:
> > Trond Myklebust wrote:
> > > On Thu, 2008-10-23 at 14:36 +0200, Max Kellermann wrote:
> > >   
> > >> On 2008/10/22 11:12, Max Kellermann <max-hDT0AjmEH7RAfugRpC6u6w@public.gmane.org> wrote:
> > >>     
> > >>> after I was able to fix http://lkml.org/lkml/2008/10/17/147, the
> > >>> server which was already upgraded to 2.6.27.2 still gets very high
> > >>> load.  It is a web server with NFS file storage (NetApp), and while
> > >>> the others in the cluster (kernel 2.6.25) have a load of 1-3, 2.6.27.2
> > >>> gets 30-50.
> > >>>
> > >>> I did an oprofile, with the following results (server just started,
> > >>> load "only" 5-10):
> > >>>
> > >>> 87593    56.1116  (no location information)   vmlinux                  vmlinux                  rpcauth_lookup_credcache
> > >>> 16037    10.2732  auth_generic.c:0            vmlinux                  vmlinux                  generic_match
> > >>> 6460      4.1382  (no location information)   php4                     php4                     (no symbols)
> > >>> 2478      1.5874  (no location information)   libc-2.7.so              libc-2.7.so              (no symbols)
> > >>> [...]
> > >>>
> > >>> We haven't configured any special authentication method.  It is an NFSv3
> > >>> over UDP mount, but the kernel has NFSv4 and therefore KRB5 enabled.
> > >>>
> > >>> Any ideas why rpcauth_lookup_credcache() goes overboard with CPU
> > >>> usage?
> > >>>       
> > >> I have bisected the problem: 98a8e323 is the result ("SUNRPC: Add a
> > >> helper rpcauth_lookup_generic_cred()").  5c691044 is ok.
> > >>
> > >> See the attached oprofile annotation data for both commits.  I guess
> > >> that the function rpcauth_lookup_credcache() is waiting for a spinlock
> > >> too often and too long.  Trond, any idea?
> > >>     
> > >
> > > Can you add a '-v' to the rpc.gssd daemon startup line? I'd like to see
> > > how often you are creating new gss contexts.
> > >
> > >   
> > >> Harry: added you to Cc because your problem sounds similar.
> > >>     
> > >
> > > Harry's problem should be unrelated. AFAIK, he is seeing a problem
> > > with userland RPC code, not kernel RPC code.
> > >
> > > Trond
> > >
> > >   
> > I am finally getting some time to look at my problem that I originally 
> > reported in October (SUNRPC problem with 2.6.26 and beyond), and I am 
> > seeing the same behavior as Max Kellermann when my machine slows as I 
> > described earlier.  The system in question is currently running 
> > 2.6.27.7.  Here is what I see when it is misbehaving:
> > 
> > samples  %        image name               app name                 symbol name
> > 11380517 57.4191  sunrpc.ko                sunrpc                   rpcauth_lookup_credcache
> > 3263657  16.4664  sunrpc.ko                sunrpc                   generic_match
> > 1081287   5.4555  vmlinux                  vmlinux                  copy_user_generic_string
> > 499407    2.5197  vmlinux                  vmlinux                  __posix_lock_file
> > [...]
> > 
> > And here is what I see when I stop the programs that are chewing up all
> > the system time, and then start them up again:
> > 
> > samples  %        image name               app name                 symbol name
> > 6372650  21.7978  vmlinux                  vmlinux                  copy_user_generic_string
> > 5401386  18.4755  sunrpc.ko                sunrpc                   rpcauth_lookup_credcache
> > 3018753  10.3257  vmlinux                  vmlinux                  __posix_lock_file
> > 1050095   3.5919  sunrpc.ko                sunrpc                   generic_match
> > 
> > 
> > and I am not using Kerberos with NFSv4 (i.e. no rpc.gssd).  Did you ever 
> > find a solution for this problem with rpcauth_lookup_credcache?
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=23918b03060f6e572168fdde1798a905679d2e06

Trond, this should be included in the next stable release, right?

That's fortunate, because someone I know recently described the same
problem to me, under the same circumstances, when migrating from 2.6.22
to 2.6.27.

Regards,
Willy

