* possible bug in kmem_cache related code
From: Or Gerlitz @ 2006-04-27 8:40 UTC
To: linux-kernel; +Cc: openib-general, open-iscsi
With 2.6.17-rc3 I'm running into something which seems to be a bug related
to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
later attempting to destroy it yields the following message and trace:
============================================================================
slab error in kmem_cache_destroy(): cache `my_cache': Can't free all objects
Call Trace: <ffffffff8106e46b>{kmem_cache_destroy+150}
<ffffffff88204033>{:my_kcache:kcache_cleanup_module+51}
<ffffffff81044cd3>{sys_delete_module+415} <ffffffff8112fb5b>{__up_write+20}
<ffffffff8105d42b>{sys_munmap+91} <ffffffff8100966a>{system_call+126}
Failed to destroy cache
============================================================================
I was hitting it as an Infiniband/iSCSI user, as the IB/iSCSI/SCSI code uses
kmem_caches, but since the failure happens in code which works fine on
2.6.16, I decided to try it with a synthetic module and hit it there too.
Below is sample code that reproduces it. If I only do kmem_cache_create
and later destroy, it does not happen. My .config is attached; please note
that some of the CONFIG_DEBUG_ options are enabled.
Please CC openib-general@openib.org at least with the resolution of the
matter, since it is kind of hard to do testing over 2.6.17-rcX with this
issue: the tests run fine, but some modules are crashing on rmmod, so a
reboot is needed...
thanks,
Or.
This is the related slabinfo line once the module is loaded:
my_cache 256 264 328 12 1 : tunables 32 16 8
: slabdata 22 22 0 : globalstat 264 264 22 0
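As a side note, the numbers in the slabinfo line above are self-consistent.
A small userspace check, assuming the usual /proc/slabinfo column order
(active_objs, num_objs, objsize, objperslab, pagesperslab, then active_slabs
and num_slabs after "slabdata") and a 4096-byte page. The struct and function
names are made up for illustration and are not kernel APIs:

```c
#include <stddef.h>

/* Columns of one /proc/slabinfo line (hypothetical struct, illustration only). */
struct slabinfo_line {
	unsigned long active_objs;   /* objects currently allocated */
	unsigned long num_objs;      /* total objects in all slabs */
	unsigned long objsize;       /* bytes per object as reported */
	unsigned long objperslab;    /* objects per slab */
	unsigned long pagesperslab;  /* pages per slab */
	unsigned long num_slabs;     /* slabs in the cache */
};

/* Slabs needed to hold n objects: ceil(n / objperslab). */
unsigned long slabs_needed(const struct slabinfo_line *s, unsigned long n)
{
	return (n + s->objperslab - 1) / s->objperslab;
}

/* Objects sitting unused in already-allocated slabs. */
unsigned long unused_objs(const struct slabinfo_line *s)
{
	return s->num_objs - s->active_objs;
}

/* Bytes of one slab taken up by objects. */
unsigned long slab_bytes_used(const struct slabinfo_line *s)
{
	return s->objperslab * s->objsize;
}
```

With the line above: 256 objects at 12 per slab need 22 slabs, 22 * 12 = 264
objects in total, 8 of them unused, and 12 * 328 = 3936 bytes fit in a single
4096-byte page. That matches the 256 allocations the module makes at load time.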
--- /dev/null 1970-01-01 02:00:00.000000000 +0200
+++ kcache/kcache.c 2006-04-27 10:43:18.000000000 +0300
@@ -0,0 +1,61 @@
+#include <linux/module.h>
+#include <linux/slab.h>
+
+kmem_cache_t *cache;
+
+struct foo {
+	char bar[300];
+};
+
+
+#define TRIES 256
+
+struct foo *foo_arr[TRIES];
+
+static int __init kcache_init_module(void)
+{
+	int i, j;
+
+	cache = kmem_cache_create("my_cache",
+				  sizeof (struct foo),
+				  0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL,
+				  NULL);
+	if (!cache) {
+		printk(KERN_ERR "couldn't create cache\n");
+		goto error1;
+	}
+
+	for (i = 0; i < TRIES; i++) {
+		foo_arr[i] = kmem_cache_alloc(cache, GFP_KERNEL);
+		if (foo_arr[i] == NULL) {
+			printk(KERN_ERR "couldn't allocate from cache\n");
+			goto error2;
+		}
+	}
+
+	return 0;
+error2:
+	for (j = 0; j < i; j++)
+		kmem_cache_free(cache, foo_arr[j]);
+error1:
+	return -ENOMEM;
+}
+
+static void __exit kcache_cleanup_module(void)
+{
+	int i;
+
+	for (i = 0; i < TRIES; i++)
+		kmem_cache_free(cache, foo_arr[i]);
+
+	if (kmem_cache_destroy(cache)) {
+		printk(KERN_DEBUG "Failed to destroy cache\n");
+	}
+}
+
+MODULE_LICENSE("GPL");
+
+module_init(kcache_init_module);
+module_exit(kcache_cleanup_module);
[-- Attachment #2: Type: APPLICATION/x-bzip2, Size: 10879 bytes --]
* Re: possible bug in kmem_cache related code
From: Pekka Enberg @ 2006-04-27 11:19 UTC
To: Or Gerlitz
Cc: linux-kernel, openib-general, open-iscsi, clameter, Andrew Morton
On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> With 2.6.17-rc3 I'm running into something which seems to be a bug related
> to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> later attempting to destroy it yields the following message and trace
Tested on 2.6.16.7 and works ok. Christoph, could this be related to
the cache draining patches that went in 2.6.17-rc1?
Pekka
* Re: possible bug in kmem_cache related code
From: Christoph Lameter @ 2006-04-27 22:22 UTC
To: Pekka Enberg
Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
Andrew Morton
On Thu, 27 Apr 2006, Pekka Enberg wrote:
> On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > With 2.6.17-rc3 I'm running into something which seems to be a bug related
> > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > later attempting to destroy it yields the following message and trace
>
> Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> the cache draining patches that went in 2.6.17-rc1?
What happened to that part of the slab allocator? Looks completely
changed to when I saw it the last time?
This directly fails in kmem_cache_destroy?
So it tries to free all the slab entries from the free list and then
returns 1 or 2 if there are entries left on the partial and full
list? So the bug happens if cache entries are left.
Guess the reason for this failure is then that not all cache entries have
been freed before calling kmem_cache_destroy()?
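The failure condition described above can be modelled in userspace. This is
only a toy sketch, not the code in mm/slab.c: struct toy_cache and the toy_*
functions are hypothetical names, and the model tracks a per-slab count of
live objects instead of real free/partial/full lists. It assumes, as
described, that destroy releases completely free slabs and reports failure
if any slab still holds live objects.

```c
#include <stdlib.h>

#define OBJS_PER_SLAB 12

/* Toy cache: each slab is represented only by its count of live objects. */
struct toy_cache {
	unsigned int *inuse;   /* inuse[i] = live objects in slab i */
	size_t nr_slabs;
};

/* Allocate one object: use the first slab with room, else grow the cache.
 * Returns the index of the slab the object lives in, or -1 on failure. */
int toy_alloc(struct toy_cache *c)
{
	size_t i;
	unsigned int *grown;

	for (i = 0; i < c->nr_slabs; i++) {
		if (c->inuse[i] < OBJS_PER_SLAB) {
			c->inuse[i]++;
			return (int)i;
		}
	}
	grown = realloc(c->inuse, (c->nr_slabs + 1) * sizeof(*grown));
	if (!grown)
		return -1;
	c->inuse = grown;
	c->inuse[c->nr_slabs] = 1;
	return (int)c->nr_slabs++;
}

/* Free one object that lives in the given slab. */
void toy_free(struct toy_cache *c, int slab)
{
	c->inuse[slab]--;
}

/*
 * Destroy: returns 0 only if every slab is completely free.
 * Returns 1 and leaves the cache intact if live objects remain,
 * which corresponds to the "Can't free all objects" case.
 */
int toy_destroy(struct toy_cache *c)
{
	size_t i;

	for (i = 0; i < c->nr_slabs; i++)
		if (c->inuse[i] != 0)
			return 1;
	free(c->inuse);
	c->inuse = NULL;
	c->nr_slabs = 0;
	return 0;
}
```

In this model a caller that frees everything it allocated always gets 0 from
toy_destroy(), so the model does not explain the report. It only illustrates
the condition under which the error message is printed.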
* Re: possible bug in kmem_cache related code
From: Pekka J Enberg @ 2006-04-28 6:03 UTC
To: Christoph Lameter
Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
Andrew Morton
On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > With 2.6.17-rc3 I'm running into something which seems to be a bug related
> > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > later attempting to destroy it yields the following message and trace
On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > the cache draining patches that went in 2.6.17-rc1?
On Thu, 27 Apr 2006, Christoph Lameter wrote:
> What happened to that part of the slab allocator? Looks completely
> changed to when I saw it the last time?
>
> This directly fails in kmem_cache_destroy?
>
> So it tries to free all the slab entries from the free list and then
> returns 1 or 2 if there are entries left on the partial and full
> list? So the bug happens if cache entries are left.
>
> Guess the reason for this failure is then that not all cache entries have
> been freed before calling kmem_cache_destroy()?
Yes, but if you look at Or's test case, there's no obvious reason why
that's happening. I'll see if I can reproduce the problem with 2.6.17-rc3.
Pekka
* Re: possible bug in kmem_cache related code
From: Pekka J Enberg @ 2006-04-28 8:10 UTC
To: Christoph Lameter
Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
Andrew Morton
On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > With 2.6.17-rc3 I'm running into something which seems to be a bug related
> > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > later attempting to destroy it yields the following message and trace
On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > the cache draining patches that went in 2.6.17-rc1?
On Thu, 27 Apr 2006, Christoph Lameter wrote:
> What happened to that part of the slab allocator? Looks completely
> changed to when I saw it the last time?
>
> This directly fails in kmem_cache_destroy?
>
> So it tries to free all the slab entries from the free list and then
> returns 1 or 2 if there are entries left on the partial and full
> list? So the bug happens if cache entries are left.
>
> Guess the reason for this failure is then that not all cache entries have
> been freed before calling kmem_cache_destroy()?
I can't reproduce this with Linus' git head on User-mode Linux running on
UP i386. Or, can you reproduce this at will? Any local modifications? Can
we see your .config, please.
Pekka
* Re: [openib-general] Re: possible bug in kmem_cache related code
From: Or Gerlitz @ 2006-04-28 19:24 UTC
To: Pekka J Enberg
Cc: Christoph Lameter, Andrew Morton, open-iscsi, linux-kernel,
openib-general
On 4/28/06, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:
> On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > > With 2.6.17-rc3 I'm running into something which seems to be a bug related
> > > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > > later attempting to destroy it yields the following message and trace
> On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > > the cache draining patches that went in 2.6.17-rc1?
> I can't reproduce this with Linus' git head on User-mode Linux running on
> UP i386. Or, can you reproduce this at will? Any local modifications? Can
> we see your .config, please.
Yes, I can reproduce this at will, with no local modifications. My system
is a dual AMD x86_64. I attached my .config to the first email of this
thread, and also mentioned that some CONFIG_DEBUG_ options are set,
including one related to slab debugging.
Also, by "User mode Linux" do you mean a Linux kernel that runs as a user
process on your system?
Or.
* Re: [openib-general] Re: possible bug in kmem_cache related code
From: Pekka Enberg @ 2006-04-29 6:44 UTC
To: Or Gerlitz
Cc: Christoph Lameter, Andrew Morton, open-iscsi, linux-kernel,
openib-general
On Fri, 2006-04-28 at 21:24 +0200, Or Gerlitz wrote:
> Yes, i can reproduce this at will, no local modifications, my system
> is amd dual x86_64, i have attached my .config to the first email of
> this thread, and also mentioned that some CONFIG_DEBUG_ options are
> set, including one related to slab debugging.
>
> Also, by "User mode Linux" you mean linux kernel that runs as a user
> process on your system?
Yeah, arch/um/. Unfortunately I don't have a SMP box, so I probably
can't reproduce this. You could try git bisect to isolate the offending
changeset.
Pekka
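For reference, git bisect is a binary search over the commits between a
known-good and a known-bad kernel. A minimal sketch of the idea in C, assuming
a linear history where commits are numbered, commit `good` passes and commit
`bad` fails. first_bad(), is_bad_fn and example_is_bad() are hypothetical
names, and real git bisect also copes with merges and untestable commits:

```c
/* Test callback: returns nonzero if the given commit shows the bug. */
typedef int (*is_bad_fn)(int commit);

/*
 * Find the first bad commit in (good, bad], assuming commit `good` is
 * good, commit `bad` is bad, and every commit after the first bad one
 * is also bad. Each loop iteration halves the remaining range.
 */
int first_bad(int good, int bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;   /* bug already present at mid */
		else
			good = mid;  /* bug introduced after mid */
	}
	return bad;
}

/* Example predicate for illustration: pretend commit 37 introduced the bug. */
int example_is_bad(int commit)
{
	return commit >= 37;
}
```

Here each call to is_bad stands for one build, boot and rmmod test of the
reproducer module, so a range of 100 commits needs about 7 test cycles.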
* Re: [openib-general] Re: possible bug in kmem_cache related code
From: Or Gerlitz @ 2006-05-01 13:40 UTC
To: Pekka Enberg
Cc: Or Gerlitz, Andrew Morton, open-iscsi, linux-kernel,
openib-general, Christoph Lameter
Pekka Enberg wrote:
> On Fri, 2006-04-28 at 21:24 +0200, Or Gerlitz wrote:
>> Yes, i can reproduce this at will, no local modifications, my system
>> is amd dual x86_64, i have attached my .config to the first email of
>> this thread, and also mentioned that some CONFIG_DEBUG_ options are
>> set, including one related to slab debugging.
>>
> Yeah, arch/um/. Unfortunately I don't have a SMP box, so I probably
> can't reproduce this. You could try git bisect to isolate the offending
> changeset.
Hmm, I might be able to do a git bisection later this week or next week.
In the meantime, could more people from the openib and open-iscsi
communities try 2.6.17-rcX to see whether the issue reproduces with my
synthetic module and with the IB/iSCSI code? (You know this kernel will be
out a few weeks from now...)
Or.
* Re: [openib-general] Re: possible bug in kmem_cache related code
From: Ard van Breemen @ 2006-05-03 22:04 UTC
To: linux-kernel
Or Gerlitz wrote:
> However, for the mean time can more people of the openib and
> open iscsi communities set 2.6.17-rcX to see that the issue
> reproduces with my synthetic module and with ib/iscsi code (you
> know this kernel will be out in few weeks from now...)
For what it's worth:
On a dual Opteron running 2.6.17-rc2-git6 with the reiser4 patch for
2.6.16:
md layer, raid5 on 4 disks, no other stuff than that.
At 23:38 I ran: mdadm --stop /dev/md6
May 2 20:38:27 jip kernel: <5>reiser4[dd(2791)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
May 2 20:38:27 jip kernel: NOTICE: md6 does not support write barriers, using synchronous write instead.
May 2 20:38:27 jip kernel:
May 3 23:38:19 jip kernel: slab error in kmem_cache_destroy(): cache `raid5/md6': Can't free all objects
May 3 23:38:19 jip kernel:
May 3 23:38:19 jip kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May 3 23:38:19 jip kernel: <ffffffff8044ed71>{shrink_stripes+33} <ffffffff80452993>{stop+51}
May 3 23:38:19 jip kernel: <ffffffff8045daf5>{do_md_stop+245} <ffffffff80255035>{filemap_nopage+389}
May 3 23:38:19 jip kernel: <ffffffff8045f648>{md_ioctl+744} <ffffffff802631a0>{do_no_page+576}
May 3 23:38:19 jip kernel: <ffffffff8035ba14>{blkdev_driver_ioctl+100} <ffffffff8035bc3d>{blkdev_ioctl+493}
May 3 23:38:19 jip kernel: <ffffffff80369451>{__up_read+33} <ffffffff802826bb>{block_ioctl+27}
May 3 23:38:19 jip kernel: <ffffffff8028c83a>{do_ioctl+58} <ffffffff8028cb61>{vfs_ioctl+449}
May 3 23:38:19 jip kernel: <ffffffff8028cbdd>{sys_ioctl+77} <ffffffff802a814b>{do_ioctl32_pointer+11}
May 3 23:38:19 jip kernel: <ffffffff802a61e2>{compat_sys_ioctl+386} <ffffffff8021c85e>{ia32_sysret+0}
May 3 23:38:19 jip kernel: md: md6 stopped.
Second system, same specs, except running drbd on all sata disks instead of
raid5 (yes, external module):
May 3 15:49:19 localhost kernel: drbd1: drbd_cleanup: (!list_empty(&mdev->data.work.q)) in /usr/src/kernel/tyan-s2891/git/modules/drbd/drbd/drbd_main.c:2173
May 3 15:49:19 localhost kernel: drbd1: lp = ffff81007c4f8888 in /usr/src/kernel/tyan-s2891/git/modules/drbd/drbd/drbd_main.c:2176
May 3 15:49:19 localhost kernel: slab error in kmem_cache_destroy(): cache `drbd_ee_cache': Can't free all objects
May 3 15:49:19 localhost kernel:
May 3 15:49:19 localhost kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May 3 15:49:19 localhost kernel: <ffffffff8807df01>{:drbd:drbd_destroy_mempools+113}
May 3 15:49:19 localhost kernel: <ffffffff8807f2f2>{:drbd:drbd_cleanup+1074} <ffffffff802484e8>{sys_delete_module+312}
May 3 15:49:19 localhost kernel: <ffffffff802663b5>{sys_munmap+85} <ffffffff80209b5a>{system_call+126}
May 3 15:49:19 localhost kernel: drbd: kmem_cache_destroy(drbd_ee_cache) FAILED
May 3 15:49:19 localhost kernel: slab error in kmem_cache_destroy(): cache `drbd_req_cache': Can't free all objects
May 3 15:49:19 localhost kernel:
May 3 15:49:19 localhost kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May 3 15:49:19 localhost kernel: <ffffffff8807df24>{:drbd:drbd_destroy_mempools+148}
May 3 15:49:19 localhost kernel: <ffffffff8807f2f2>{:drbd:drbd_cleanup+1074} <ffffffff802484e8>{sys_delete_module+312}
May 3 15:49:19 localhost kernel: <ffffffff802663b5>{sys_munmap+85} <ffffffff80209b5a>{system_call+126}
May 3 15:49:19 localhost kernel: drbd: kmem_cache_destroy(drbd_request_cache) FAILED
May 3 15:49:19 localhost kernel: drbd: module cleanup done.
NUMA and such is enabled