From: "J. Bruce Fields" <bfields@fieldses.org>
To: linux-kernel@vger.kernel.org
Cc: linux-nfs@vger.kernel.org, "Weathers, Norman R." <Norman.R.Weathers-496aOtIFJR1B+Kdf37RAV9BPR1lH4CV8@public.gmane.org>
Subject: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
Date: Wed, 11 Jun 2008 15:52:22 -0400
Message-ID: <20080611195222.GP15380@fieldses.org>
In-Reply-To: <20080611184613.GM15380@fieldses.org>

I'm probably missing something fundamental--why doesn't
/proc/slab_allocators show any results for size-x where x >= 4096?

Someone's seeing a performance problem with the Linux NFS server. One
of the symptoms is that the "size-4096" slab cache seems to be out of
control. I assumed that meant that memory allocated by kmalloc() might
be leaking, so I figured it might be interesting to turn on
CONFIG_DEBUG_SLAB_LEAK. As far as I can tell, what that does is list
kmalloc() callers in /proc/slab_allocators. But that doesn't seem to be
showing any results for size-4096. Can anyone provide a clue?

Thanks!

--b.

On Wed, Jun 11, 2008 at 02:46:13PM -0400, bfields wrote:
> On Tue, Jun 10, 2008 at 05:12:31PM -0500, Weathers, Norman R. wrote:
> >
> >
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > Sent: Tuesday, June 10, 2008 12:16 PM
> > > To: Weathers, Norman R.
> > > Cc: linux-nfs@vger.kernel.org
> > > Subject: Re: Problems with large number of clients and reads
> > >
> > > On Tue, Jun 10, 2008 at 09:30:18AM -0500, Weathers, Norman R. wrote:
> > > > > Unfortunately, I cannot stop the clients (middle of long running
> > > > > jobs). I might be able to test this soon. If I have the number of
> > > > > threads high, yes I can reduce the number of threads and it
> > > > > appears to lower some of the memory, but even with as little as
> > > > > three threads, the memory usage climbs very high, just not as high
> > > > > as if there are say 8 threads. When the memory usage climbs high,
> > > > > it can cause the box to not respond over the network (ssh, rsh),
> > > > > and even be very sluggish when I am connected over our serial
> > > > > console to the server(s). This same scenario has been happening
> > > > > with kernels that I have tried from 2.6.22.x on to the 2.6.25
> > > > > series. The 2.6.25 series is interesting in that I can push the
> > > > > same load from a box with the 2.6.25 kernel and not have a load
> > > > > over .3 (with 3 threads), but with the 2.6.22.x kernel, I have a
> > > > > load of over 3 when I hit the same conditions.
> > >
> > > > OK, I think what we want to do is turn on CONFIG_DEBUG_SLAB_LEAK.
> > > > I've never used it before, but it looks like it will report which
> > > > functions are allocating from each slab cache, which may be exactly
> > > > what we need to know. So:
> > >
> > > 1. Install a kernel with both CONFIG_DEBUG_SLAB ("Debug slab
> > > memory allocations") and CONFIG_DEBUG_SLAB_LEAK ("Memory leak
> > > debugging") turned on. They're both under the "kernel hacking"
> > > section of the kernel config. (If you have a file
> > > /proc/slab_allocators, then you already have these turned on and
> > > you can skip this step.)
> > >
> > > 2. Do whatever you need to do to reproduce the problem.
> > >
> > > 3. Get a copy of /proc/slabinfo and /proc/slab_allocators.
> > >
> > > Then we can take a look at that and see if it sheds any light.
> >
> >
> > I have taken several snapshots of the /proc/slab_allocators and
> > /proc/slabinfo as requested, but since there is a lot of info in them,
> > and I didn't think anyone wanted to go cross-eyed reading the data in an
> > email, I have them up on a website:
> >
> > http://shashi-weathers.net/linux/cluster/NFS/
>
> Excellent.
>
> >
> > The order of data collection is:
> >
> > slab_allocators_bad1.txt and corresponding slabinfo
> > slab_allocators_after_bad1.txt and corresponding slabinfo
> > slab_allocators_16_threads.txt and corresponding slabinfo
> > slab_allocators_16_threads_1.txt and corresponding slabinfo
> > slab_allocators_32_threads.txt and corresponding slabinfo
> > slab_allocators_really_bad.txt and corresponding slabinfo.
> >
> >
> > You will have to forgive my ignorance at this point, but I was looking
> > through the slabinfo and slab_allocators, and noticed that size-4096
> > does not show up in slab_allocators... I hope that is by design. You
> > can see it growing into the gigabytes in the slabinfo files....
>
> Argh. OK, I don't understand well enough how this works. Time to ask
> someone, I guess....
>
> --b.
>
> >
> >
> >
> > >
> > > I think that debugging will hurt the server performance, so you won't
> > > want to keep it turned on all the time.
> > >
> > > >
> > > > > Also, this is all with the SLAB cache option. SLUB crashes every
> > > > > time I use it under heavy load.
> > >
> > > Have you reported the SLUB bugs to lkml?
> >
> > No, I haven't yet. I didn't know for sure if I was doing something
> > wrong, or if SLUB was the problem there. Since the failures, I had gone
> > back to using SLAB anyway, so .... I probably should...
> >
> > >
> > > --b.
> > >
> >
> >
> > Norman Weathers
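
For anyone who wants to repeat the check discussed above (which caches appear in /proc/slabinfo but have no callers listed in /proc/slab_allocators), here is a rough Python sketch. It is not from the thread. It assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...) and that each slab_allocators line looks like "cache-name: count caller"; both layouts are assumptions and may differ on your kernel. The function names are made up for illustration.

```python
def slabinfo_caches(text):
    """Map cache name -> (active_objs, objsize) from /proc/slabinfo text.

    Assumed column layout: name active_objs num_objs objsize ...
    """
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4 or not (fields[1].isdigit() and fields[3].isdigit()):
            continue
        caches[fields[0]] = (int(fields[1]), int(fields[3]))
    return caches


def allocator_caches(text):
    """Return the set of cache names that have at least one caller line.

    Assumed line layout: "cache-name: count caller".
    """
    names = set()
    for line in text.splitlines():
        name, sep, _rest = line.partition(": ")
        if sep:
            names.add(name.strip())
    return names


def untracked(slabinfo_text, allocators_text):
    """List (cache, approx_bytes_in_use) for caches with live objects
    but no entry in slab_allocators, largest first."""
    tracked = allocator_caches(allocators_text)
    rows = [
        (name, active * objsize)
        for name, (active, objsize) in slabinfo_caches(slabinfo_text).items()
        if active > 0 and name not in tracked
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

To use it, read a matching pair of snapshots (for example one of the slabinfo files and its slab_allocators file from the list above) into strings and pass them to untracked(). The byte figure is only active_objs times objsize, so it understates the real footprint because it ignores per-slab overhead.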