From: "J. Bruce Fields" <bfields@fieldses.org>
To: "Weathers, Norman R." <Norman.R.Weathers-496aOtIFJR1B+Kdf37RAV9BPR1lH4CV8@public.gmane.org>
Cc: Jeff Layton <jlayton@poochiereds.net>,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	Neil Brown <neilb@suse.de>
Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
Date: Fri, 13 Jun 2008 16:15:52 -0400
Message-ID: <20080613201552.GH8501@fieldses.org>
In-Reply-To: <0122F800A3B64C449565A9E8C297701002D75DAE-zIGg2qceuZx7uNL6xugVa6xOck334EZe@public.gmane.org>

On Thu, Jun 12, 2008 at 02:54:09PM -0500, Weathers, Norman R. wrote:
>  
> 
> > -----Original Message-----
> > From: linux-nfs-owner@vger.kernel.org 
> > [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of J. Bruce Fields
> > Sent: Wednesday, June 11, 2008 5:55 PM
> > To: Weathers, Norman R.
> > Cc: Jeff Layton; linux-kernel@vger.kernel.org; 
> > linux-nfs@vger.kernel.org
> > Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
> > 
> > On Wed, Jun 11, 2008 at 05:46:13PM -0500, Weathers, Norman R. wrote:
> > > I will try and get it patched and retested, but it may be a day or
> > > two before I can get back the information due to production jobs now
> > > running.  Once they finish up, I will get back with the info.
> > 
> > Understood.
> > 
> 
> 
> I was able to get my big user to cooperate and let me in to be able to
> get the information that you were needing.  The full output from the
> /proc/slab_allocator file is at
> http://www.shashi-weathers.net/linux/cluster/NFS_DEBUG_2 .  The 16
> thread case is very interesting.  Also, there is a small txt file in the
> directory that has some rpc errors, but I imagine the way that I am
> running the box (oversubscribed threads) has more to do with the rpc
> errors than anything else.  For those of you wanting the gist of the
> story, the size-4096 slab has the following very large allocation:
> 
> size-4096: 2 sys_init_module+0x140b/0x1980
> size-4096: 1 __vmalloc_area_node+0x188/0x1b0
> size-4096: 1 seq_read+0x1d9/0x2e0
> size-4096: 1 slabstats_open+0x2b/0x80
> size-4096: 5 vc_allocate+0x167/0x190
> size-4096: 3 input_allocate_device+0x12/0x80
> size-4096: 1 hid_add_field+0x122/0x290
> size-4096: 9 reqsk_queue_alloc+0x5f/0xf0
> size-4096: 1846825 __alloc_skb+0x7d/0x170
> size-4096: 3 alloc_netdev+0x33/0xa0
> size-4096: 10 neigh_sysctl_register+0x52/0x2b0
> size-4096: 5 devinet_sysctl_register+0x28/0x110
> size-4096: 1 pidmap_init+0x15/0x60
> size-4096: 1 netlink_proto_init+0x44/0x190
> size-4096: 1 ip_rt_init+0xfd/0x2f0
> size-4096: 1 cipso_v4_init+0x13/0x70
> size-4096: 3 journal_init_revoke+0xe7/0x270 [jbd]
> size-4096: 3 journal_init_revoke+0x18a/0x270 [jbd]
> size-4096: 2 journal_init_inode+0x84/0x150 [jbd]
> size-4096: 2 bnx2_alloc_mem+0x18/0x1f0 [bnx2]
> size-4096: 1 joydev_connect+0x53/0x390 [joydev]
> size-4096: 13 kmem_alloc+0xb3/0x100 [xfs]
> size-4096: 5 addrconf_sysctl_register+0x31/0x130 [ipv6]
> size-4096: 7 rpc_clone_client+0x84/0x140 [sunrpc]
> size-4096: 3 rpc_create+0x254/0x4d0 [sunrpc]
> size-4096: 16 __svc_create_thread+0x53/0x1f0 [sunrpc]
> size-4096: 16 __svc_create_thread+0x72/0x1f0 [sunrpc]
> size-4096: 1 nfsd_racache_init+0x2e/0x140 [nfsd]
> 
> The big one seems to be the __alloc_skb. (This is with 16 threads, and
> it says that we are using up somewhere between 12 and 14 GB of memory;
> about 2 to 3 GB of that is disk cache.)  If I were to put any more
> threads out there, the server would become almost unresponsive (it was
> bad enough as it was).
> 
> At the same time, I also noticed this:
> 
> skbuff_fclone_cache: 1842524 __alloc_skb+0x50/0x170
> 
> Don't know for sure if that is meaningful or not....

OK, so, starting at net/core/skbuff.c, this means that this memory was
allocated by __alloc_skb() calls with something nonzero in the third
("fclone") argument.  The only such caller is alloc_skb_fclone().
Callers of alloc_skb_fclone() include:

	sk_stream_alloc_skb:
		do_tcp_sendpages
		tcp_sendmsg
		tcp_fragment
		tso_fragment
		tcp_mtu_probe
	tcp_send_fin
	tcp_connect
	buf_acquire:
		lots of callers in tipc code (whatever that is).

So unless you're using tipc, or you have something in userspace going
haywire (perhaps netstat would help rule that out?), then I suppose
there's something wrong with knfsd's tcp code.  Which makes sense, I
guess.

I'd think this sort of allocation would be limited by the number of
sockets times the size of the send and receive buffers.
svc_xprt.c:svc_check_conn_limits() claims to be limiting the number of
sockets to (nrthreads+3)*20.  (You aren't hitting the "too many open
connections" printk there, are you?)  The total buffer size should be
bounded by something like 4 megs.
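
To put rough numbers on that, here is a back-of-the-envelope sketch in C.
It takes the object count from the size-4096 line quoted above and the
(nrthreads+3)*20 socket limit, and works out how many bytes each socket
would have to account for if every permitted socket were open.  The
function names are made up for illustration, and treating each size-4096
object as exactly 4096 bytes ignores slab overhead, so this is only an
estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes held by `count` slab objects of `object_size` bytes each.
 * Per-slab overhead is ignored, so this is a lower bound. */
uint64_t slab_bytes(uint64_t count, uint64_t object_size)
{
	return count * object_size;
}

/* Socket limit as quoted from svc_check_conn_limits(): (nrthreads+3)*20. */
uint64_t svc_conn_limit(uint64_t nrthreads)
{
	return (nrthreads + 3) * 20;
}

/* Average bytes per socket if every permitted socket were open. */
uint64_t bytes_per_socket(uint64_t total_bytes, uint64_t nsockets)
{
	assert(nsockets > 0);
	return total_bytes / nsockets;
}
```

With the figures from this report, slab_bytes(1846825, 4096) comes to
7564595200 bytes (about 7 GiB), svc_conn_limit(16) is 380, and the
quotient is a little under 20 MB per socket.  However the 4 meg bound is
read, the observed usage is far beyond it.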

--b.

> 
> 
> 
> > > Thanks everyone for looking at this, by the way!
> > 
> > And thanks for your persistence.
> > 
> > --b.
> > 
> 
> 
> Anytime.  This is the part of the job that is fun (except for my
> users...).  Anyone can watch a system run, it's dealing with the unknown
> that makes it interesting.

OK!  That's good, because I'm a bit stuck, so this will take some more work....

--b.

> 
> 
> Norman Weathers
> 
> 
> > > 
> > > > 
> > > > 
> > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > index 06236e4..b379e31 100644
> > > > --- a/mm/slab.c
> > > > +++ b/mm/slab.c
> > > > @@ -2202,7 +2202,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
> > > >  	 * above the next power of two: caches with object sizes just above a
> > > >  	 * power of two have a significant amount of internal fragmentation.
> > > >  	 */
> > > > -	if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
> > > > +	if (size < 8192 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
> > > >  						2 * sizeof(unsigned long long)))
> > > >  		flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
> > > >  	if (!(flags & SLAB_DESTROY_BY_RCU))
> > > > 
> > > 
> > > 
> > > Norman Weathers
> > 
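
For reference, the effect of the one-line change in the quoted patch can
be checked in user space.  The sketch below models only the single "if"
condition shown in the diff, which decides whether SLAB_RED_ZONE and
SLAB_STORE_USER get set for a cache of a given object size.  The names
fls_sketch(), debug_enabled() and REDZONE_ALIGN_ASSUMED are invented
here, and the value 8 for REDZONE_ALIGN is an assumption (the real value
depends on the architecture), so treat this as an illustration rather
than the kernel's actual behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: the real REDZONE_ALIGN in mm/slab.c is architecture
 * dependent; 8 is a plausible 64-bit value, not taken from the thread. */
#define REDZONE_ALIGN_ASSUMED 8

/* Portable stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, 0 when x is 0. */
int fls_sketch(size_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Returns 1 when the quoted condition would enable red-zoning and
 * caller tracking for objects of `size` bytes, given the size cutoff
 * (4096 before the patch, 8192 after it). */
int debug_enabled(size_t size, size_t cutoff)
{
	assert(size > 0);
	return size < cutoff ||
	       fls_sketch(size - 1) ==
	       fls_sketch(size - 1 + REDZONE_ALIGN_ASSUMED +
			  2 * sizeof(unsigned long long));
}
```

Under these assumptions, debug_enabled(4096, 4096) is 0 while
debug_enabled(4096, 8192) is 1: adding the debug fields pushes a
4096-byte object past the next power of two, so only the raised cutoff
lets the size-4096 cache keep caller information.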
