public inbox for linux-kernel@vger.kernel.org
* possible bug in kmem_cache related code
@ 2006-04-27  8:40 Or Gerlitz
  2006-04-27 11:19 ` Pekka Enberg
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2006-04-27  8:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: openib-general, open-iscsi

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2835 bytes --]

With 2.6.17-rc3 I'm running into what seems to be a bug related
to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
later attempting to destroy it yields the following message and trace:

============================================================================
slab error in kmem_cache_destroy(): cache `my_cache': Can't free all objects

Call Trace: <ffffffff8106e46b>{kmem_cache_destroy+150}
       <ffffffff88204033>{:my_kcache:kcache_cleanup_module+51}
       <ffffffff81044cd3>{sys_delete_module+415} <ffffffff8112fb5b>{__up_write+20}
       <ffffffff8105d42b>{sys_munmap+91} <ffffffff8100966a>{system_call+126}

Failed to destroy cache
============================================================================

I was hitting it as an InfiniBand/iSCSI user, since the IB/iSCSI/SCSI code
uses kmem_caches, but because the failure happens in code which works fine
on 2.6.16, I decided to try it with a synthetic module and hit it there too.

Below is sample code that reproduces it. If I only do kmem_cache_create
and later destroy, it does not happen. My .config is attached; please note
that some of the CONFIG_DEBUG_ options are enabled.

Please CC openib-general@openib.org, at least with the resolution of the
matter, since it is kind of hard to do testing on 2.6.17-rcX with this
issue: the tests run fine, but some modules crash on rmmod, so a
reboot is needed.

thanks,

Or.

This is the related slabinfo line once the module is loaded:

my_cache  256    264    328   12    1 : tunables   32   16    8 : slabdata     22     22      0 : globalstat     264    264    22    0
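
For readers decoding the slabinfo line above, here is a minimal userspace C
sketch (not part of the original report). It parses the leading columns and
the slabdata group so the arithmetic can be checked: 22 slabs of 12 objects
give 264 slots, of which 256 are in use. The names slab_line,
parse_slab_line and unused_slots are hypothetical, and the column meanings
assume the usual /proc/slabinfo header (active_objs, num_objs, objsize,
objperslab, pagesperslab, then tunables and slabdata).

```c
#include <assert.h>
#include <stdio.h>

/* Columns of one /proc/slabinfo line that matter for this report. */
struct slab_line {
	char name[32];
	unsigned long active_objs;   /* objects currently allocated */
	unsigned long num_objs;      /* object slots across all slabs */
	unsigned long objsize;       /* bytes per object slot */
	unsigned long objperslab;    /* object slots per slab */
	unsigned long pagesperslab;  /* pages backing each slab */
	unsigned long active_slabs;
	unsigned long num_slabs;
};

/* Parse the leading columns and the slabdata group; returns 0 on success. */
static int parse_slab_line(const char *line, struct slab_line *out)
{
	unsigned long limit, batchcount, sharedfactor; /* tunables, unused here */
	int n = sscanf(line,
		"%31s %lu %lu %lu %lu %lu : tunables %lu %lu %lu : slabdata %lu %lu",
		out->name, &out->active_objs, &out->num_objs, &out->objsize,
		&out->objperslab, &out->pagesperslab,
		&limit, &batchcount, &sharedfactor,
		&out->active_slabs, &out->num_slabs);

	return n == 11 ? 0 : -1;
}

/* Object slots that exist in the cache but are not handed out. */
static unsigned long unused_slots(const struct slab_line *s)
{
	assert(s->num_objs >= s->active_objs);
	return s->num_objs - s->active_objs;
}
```

Feeding the line above to parse_slab_line() and then calling unused_slots()
shows that the 256 objects held by the module occupy all but a few slots of
the 22 slabs, which is what one would expect right after loading it.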

--- /dev/null	1970-01-01 02:00:00.000000000 +0200
+++ kcache/kcache.c	2006-04-27 10:43:18.000000000 +0300
@@ -0,0 +1,61 @@
+#include <linux/module.h>
+#include <linux/slab.h>
+
+kmem_cache_t *cache;
+
+struct foo {
+	char bar[300];
+};
+
+
+#define TRIES 256
+
+struct foo *foo_arr[TRIES];
+
+static int __init kcache_init_module(void)
+{
+	int i, j;
+
+	cache = kmem_cache_create("my_cache",
+				  sizeof (struct foo),
+				  0,
+				  SLAB_HWCACHE_ALIGN,
+				  NULL,
+				  NULL);
+	if (!cache) {
+		printk(KERN_ERR "couldn't create cache\n");
+		goto error1;
+	}
+
+	for (i = 0; i < TRIES; i++) {
+		foo_arr[i] = kmem_cache_alloc(cache, GFP_KERNEL);
+		if (foo_arr[i] == NULL) {
+			printk(KERN_ERR "couldn't allocate from cache\n");
+			goto error2;
+		}
+	}
+
+	return 0;
+error2:
+	for (j = 0; j < i; j++)
+		kmem_cache_free(cache, foo_arr[j]);
+error1:
+	return -ENOMEM;
+}
+
+static void __exit kcache_cleanup_module(void)
+{
+	int i;
+
+	for (i = 0; i < TRIES; i++)
+		kmem_cache_free(cache, foo_arr[i]);
+
+	if (kmem_cache_destroy(cache)) {
+		printk(KERN_DEBUG "Failed to destroy cache\n");
+	}
+}
+
+MODULE_LICENSE("GPL");
+
+module_init(kcache_init_module);
+module_exit(kcache_cleanup_module);



[-- Attachment #2: Type: APPLICATION/x-bzip2, Size: 10879 bytes --]


* Re: possible bug in kmem_cache related code
  2006-04-27  8:40 possible bug in kmem_cache related code Or Gerlitz
@ 2006-04-27 11:19 ` Pekka Enberg
  2006-04-27 22:22   ` Christoph Lameter
  0 siblings, 1 reply; 10+ messages in thread
From: Pekka Enberg @ 2006-04-27 11:19 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: linux-kernel, openib-general, open-iscsi, clameter, Andrew Morton

On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> With 2.6.17-rc3 I'm running into something which seems as a bug related
> to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> later attempting to destroy it yields the following message and trace

Tested on 2.6.16.7 and works ok. Christoph, could this be related to
the cache draining patches that went in 2.6.17-rc1?

                                                    Pekka



* Re: possible bug in kmem_cache related code
  2006-04-27 11:19 ` Pekka Enberg
@ 2006-04-27 22:22   ` Christoph Lameter
  2006-04-28  6:03     ` Pekka J Enberg
  2006-04-28  8:10     ` Pekka J Enberg
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Lameter @ 2006-04-27 22:22 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
	Andrew Morton

On Thu, 27 Apr 2006, Pekka Enberg wrote:

> On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > With 2.6.17-rc3 I'm running into something which seems as a bug related
> > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > later attempting to destroy it yields the following message and trace
> 
> Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> the cache draining patches that went in 2.6.17-rc1?

What happened to that part of the slab allocator? It looks completely
changed from when I last saw it.

This directly fails in kmem_cache_destroy?

So it tries to free all the slab entries from the free list and then 
returns 1 or 2 if there are entries left on the partial and full 
list? So the bug happens if cache entries are left.

Guess the reason for this failure is then that not all cache entries have 
been freed before calling kmem_cache_destroy()?
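
The reading of kmem_cache_destroy() given above can be illustrated with a
toy userspace model. This is only a sketch under stated assumptions, not
the mm/slab.c code: it tracks nothing but a per-slab in-use counter, and
all names (toy_cache, toy_alloc, toy_free, toy_destroy) are hypothetical.
Destroy releases empty slabs and fails whenever some slab still holds an
object.

```c
#include <assert.h>

#define TOY_MAX_SLABS 32

/* Toy stand-in for a slab cache: only per-slab in-use counters are kept. */
struct toy_cache {
	unsigned int objs_per_slab;
	unsigned int nr_slabs;              /* slabs currently owned by the cache */
	unsigned int inuse[TOY_MAX_SLABS];  /* allocated objects in each slab */
};

static void toy_cache_init(struct toy_cache *c, unsigned int objs_per_slab)
{
	c->objs_per_slab = objs_per_slab;
	c->nr_slabs = 0;
}

/* Returns the index of the slab the object came from, or -1 if out of room. */
static int toy_alloc(struct toy_cache *c)
{
	unsigned int i;

	for (i = 0; i < c->nr_slabs; i++) {
		if (c->inuse[i] < c->objs_per_slab) {
			c->inuse[i]++;
			return (int)i;
		}
	}
	if (c->nr_slabs == TOY_MAX_SLABS)
		return -1;
	c->inuse[c->nr_slabs] = 1;          /* grow the cache by one slab */
	return (int)c->nr_slabs++;
}

static void toy_free(struct toy_cache *c, int slab)
{
	assert(slab >= 0 && (unsigned int)slab < c->nr_slabs);
	assert(c->inuse[slab] > 0);
	c->inuse[slab]--;
}

/*
 * Drop every empty slab and keep the busy ones (slab indices change).
 * Returns 0 if the cache is now empty, 1 if objects are still allocated,
 * which is the toy analogue of "Can't free all objects".
 */
static int toy_destroy(struct toy_cache *c)
{
	unsigned int i, kept = 0;

	for (i = 0; i < c->nr_slabs; i++)
		if (c->inuse[i] > 0)
			c->inuse[kept++] = c->inuse[i];
	c->nr_slabs = kept;
	return kept ? 1 : 0;
}
```

With 12 objects per slab, 256 calls to toy_alloc() grow the toy cache to 22
slabs, matching the slabinfo line in the original report. Freeing all 256
objects lets toy_destroy() return 0; leaving even one allocated makes it
return 1. The reproducer frees everything it allocated, so under this model
it would not be expected to fail.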




* Re: possible bug in kmem_cache related code
  2006-04-27 22:22   ` Christoph Lameter
@ 2006-04-28  6:03     ` Pekka J Enberg
  2006-04-28  8:10     ` Pekka J Enberg
  1 sibling, 0 replies; 10+ messages in thread
From: Pekka J Enberg @ 2006-04-28  6:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
	Andrew Morton

On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > With 2.6.17-rc3 I'm running into something which seems as a bug related
> > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > later attempting to destroy it yields the following message and trace

On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > the cache draining patches that went in 2.6.17-rc1?

On Thu, 27 Apr 2006, Christoph Lameter wrote:
> What happened to that part of the slab allocator? Looks completely  
> changed to when I saw it the last time?
> 
> This directly fails in kmem_cache_destroy?
> 
> So it tries to free all the slab entries from the free list and then 
> returns 1 or 2 if there are entries left on the partial and full 
> list? So the bug happens if cache entries are left.
> 
> Guess the reason for this failure is then that not all cache entries have 
> been freed before calling kmem_cache_destroy()?

Yes, but if you look at Or's test case, there's no obvious reason why 
that's happening. I'll see if I can reproduce the problem with 2.6.17-rc3.

					Pekka


* Re: [openib-general] Re: possible bug in kmem_cache related code
       [not found] <OF74DEDEC9.CB33A0DB-ON8725715E.0023266E-8825715E.002874C1@us.ibm.com>
@ 2006-04-28  6:46 ` Christoph Lameter
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2006-04-28  6:46 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Pekka J Enberg, Andrew Morton, linux-kernel, Or Gerlitz,
	open-iscsi, openib-general, openib-general-bounces

On Thu, 27 Apr 2006, Shirley Ma wrote:

> I hit a similar problem while calling kzalloc(). it happened on 
> linux-2.6.17-rc1 + ppc64.
> 
> kernel BUG in __cache_alloc_node at mm/slab.c:2934!
> which is 
>         BUG_ON(slabp->inuse == cachep->num);

More entries were added to a slab than allowed? This suggests a race on
slabp->inuse.


* Re: possible bug in kmem_cache related code
  2006-04-27 22:22   ` Christoph Lameter
  2006-04-28  6:03     ` Pekka J Enberg
@ 2006-04-28  8:10     ` Pekka J Enberg
  2006-04-28 19:24       ` [openib-general] " Or Gerlitz
  1 sibling, 1 reply; 10+ messages in thread
From: Pekka J Enberg @ 2006-04-28  8:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Or Gerlitz, linux-kernel, openib-general, open-iscsi,
	Andrew Morton

On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > With 2.6.17-rc3 I'm running into something which seems as a bug related
> > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > later attempting to destroy it yields the following message and trace

On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > the cache draining patches that went in 2.6.17-rc1?

On Thu, 27 Apr 2006, Christoph Lameter wrote:
> What happened to that part of the slab allocator? Looks completely  
> changed to when I saw it the last time?
> 
> This directly fails in kmem_cache_destroy?
> 
> So it tries to free all the slab entries from the free list and then 
> returns 1 or 2 if there are entries left on the partial and full 
> list? So the bug happens if cache entries are left.
> 
> Guess the reason for this failure is then that not all cache entries have 
> been freed before calling kmem_cache_destroy()?

I can't reproduce this with Linus' git head on User-mode Linux running on 
UP i386. Or, can you reproduce this at will? Any local modifications? Can 
we see your .config, please.

					Pekka


* Re: [openib-general] Re: possible bug in kmem_cache related code
  2006-04-28  8:10     ` Pekka J Enberg
@ 2006-04-28 19:24       ` Or Gerlitz
  2006-04-29  6:44         ` Pekka Enberg
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2006-04-28 19:24 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Christoph Lameter, Andrew Morton, open-iscsi, linux-kernel,
	openib-general

On 4/28/06, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:
> On 4/27/06, Or Gerlitz <ogerlitz@voltaire.com> wrote:
> > > > With 2.6.17-rc3 I'm running into something which seems as a bug related
> > > > to kmem_cache. Doing some allocations/deallocations from a kmem_cache and
> > > > later attempting to destroy it yields the following message and trace

> On Thu, 27 Apr 2006, Pekka Enberg wrote:
> > > Tested on 2.6.16.7 and works ok. Christoph, could this be related to
> > > the cache draining patches that went in 2.6.17-rc1?

> I can't reproduce this with Linus' git head on User-mode Linux running on
> UP i386. Or, can you reproduce this at will? Any local modifications? Can
> we see your .config, please.

Yes, I can reproduce this at will, with no local modifications. My system
is a dual AMD x86_64. I attached my .config to the first email of this
thread and also mentioned that some CONFIG_DEBUG_ options are set,
including one related to slab debugging.

Also, by "User-mode Linux" do you mean a Linux kernel that runs as a user
process on your system?

Or.


* Re: [openib-general] Re: possible bug in kmem_cache related code
  2006-04-28 19:24       ` [openib-general] " Or Gerlitz
@ 2006-04-29  6:44         ` Pekka Enberg
  2006-05-01 13:40           ` Or Gerlitz
  0 siblings, 1 reply; 10+ messages in thread
From: Pekka Enberg @ 2006-04-29  6:44 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Christoph Lameter, Andrew Morton, open-iscsi, linux-kernel,
	openib-general

On Fri, 2006-04-28 at 21:24 +0200, Or Gerlitz wrote:
> Yes, i can reproduce this at will, no local modifications, my system
> is amd dual x86_64, i have attached my .config to the first email of
> this thread, and also mentioned that some CONFIG_DEBUG_ options are
> set, including one related to slab debugging.
> 
> Also, by "User mode Linux" you mean linux kernel that runs as a user
> process on your system?

Yeah, arch/um/. Unfortunately I don't have an SMP box, so I probably
can't reproduce this. You could try git bisect to isolate the offending
changeset.

				Pekka



* Re: [openib-general] Re: possible bug in kmem_cache related code
  2006-04-29  6:44         ` Pekka Enberg
@ 2006-05-01 13:40           ` Or Gerlitz
  2006-05-03 22:04             ` Ard van Breemen
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2006-05-01 13:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Or Gerlitz, Andrew Morton, open-iscsi, linux-kernel,
	openib-general, Christoph Lameter

Pekka Enberg wrote:
> On Fri, 2006-04-28 at 21:24 +0200, Or Gerlitz wrote:
>> Yes, i can reproduce this at will, no local modifications, my system
>> is amd dual x86_64, i have attached my .config to the first email of
>> this thread, and also mentioned that some CONFIG_DEBUG_ options are
>> set, including one related to slab debugging.
>>

> Yeah, arch/um/. Unfortunately I don't have a SMP box, so I probably
> can't reproduce this. You could try git bisect to isolate the offending
> changeset.

Hmm, I might be able to do a git bisection later this week or next week.

In the meantime, could more people from the openib and open-iscsi
communities try 2.6.17-rcX to see whether the issue reproduces with my
synthetic module and with the IB/iSCSI code? (You know this kernel will be
out a few weeks from now...)

Or.



* Re: [openib-general] Re: possible bug in kmem_cache related code
  2006-05-01 13:40           ` Or Gerlitz
@ 2006-05-03 22:04             ` Ard van Breemen
  0 siblings, 0 replies; 10+ messages in thread
From: Ard van Breemen @ 2006-05-03 22:04 UTC (permalink / raw)
  To: linux-kernel

Or Gerlitz wrote:
> However, for the mean time can more people of the openib and
> open iscsi communities set 2.6.17-rcX to see that the issue
> reproduces with my synthetic module and with ib/iscsi code (you
> know this kernel will be out in few weeks from now...)

For what it's worth:
On a dual Opteron running 2.6.17-rc2-git6 with the reiser4 patch for
2.6.16:
md layer, raid5 on 4 disks, nothing other than that.
At 23:38 I ran: mdadm --stop /dev/md6
May  2 20:38:27 jip kernel: <5>reiser4[dd(2791)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
May  2 20:38:27 jip kernel: NOTICE: md6 does not support write barriers, using synchronous write instead.
May  2 20:38:27 jip kernel: 
May  3 23:38:19 jip kernel: slab error in kmem_cache_destroy(): cache `raid5/md6': Can't free all objects
May  3 23:38:19 jip kernel: 
May  3 23:38:19 jip kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May  3 23:38:19 jip kernel:        <ffffffff8044ed71>{shrink_stripes+33} <ffffffff80452993>{stop+51}
May  3 23:38:19 jip kernel:        <ffffffff8045daf5>{do_md_stop+245} <ffffffff80255035>{filemap_nopage+389}
May  3 23:38:19 jip kernel:        <ffffffff8045f648>{md_ioctl+744} <ffffffff802631a0>{do_no_page+576}
May  3 23:38:19 jip kernel:        <ffffffff8035ba14>{blkdev_driver_ioctl+100} <ffffffff8035bc3d>{blkdev_ioctl+493}
May  3 23:38:19 jip kernel:        <ffffffff80369451>{__up_read+33} <ffffffff802826bb>{block_ioctl+27}
May  3 23:38:19 jip kernel:        <ffffffff8028c83a>{do_ioctl+58} <ffffffff8028cb61>{vfs_ioctl+449}
May  3 23:38:19 jip kernel:        <ffffffff8028cbdd>{sys_ioctl+77} <ffffffff802a814b>{do_ioctl32_pointer+11}
May  3 23:38:19 jip kernel:        <ffffffff802a61e2>{compat_sys_ioctl+386} <ffffffff8021c85e>{ia32_sysret+0}
May  3 23:38:19 jip kernel: md: md6 stopped.

Second system, same specs, except running drbd on all sata disks instead of
raid5 (yes, external module):
May  3 15:49:19 localhost kernel: drbd1: drbd_cleanup: (!list_empty(&mdev->data.work.q)) in /usr/src/kernel/tyan-s2891/git/modules/drbd/drbd/drbd_main.c:2173
May  3 15:49:19 localhost kernel: drbd1: lp = ffff81007c4f8888 in /usr/src/kernel/tyan-s2891/git/modules/drbd/drbd/drbd_main.c:2176
May  3 15:49:19 localhost kernel: slab error in kmem_cache_destroy(): cache `drbd_ee_cache': Can't free all objects
May  3 15:49:19 localhost kernel: 
May  3 15:49:19 localhost kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May  3 15:49:19 localhost kernel:        <ffffffff8807df01>{:drbd:drbd_destroy_mempools+113}
May  3 15:49:19 localhost kernel:        <ffffffff8807f2f2>{:drbd:drbd_cleanup+1074} <ffffffff802484e8>{sys_delete_module+312}
May  3 15:49:19 localhost kernel:        <ffffffff802663b5>{sys_munmap+85} <ffffffff80209b5a>{system_call+126}
May  3 15:49:19 localhost kernel: drbd: kmem_cache_destroy(drbd_ee_cache) FAILED
May  3 15:49:19 localhost kernel: slab error in kmem_cache_destroy(): cache `drbd_req_cache': Can't free all objects
May  3 15:49:19 localhost kernel: 
May  3 15:49:19 localhost kernel: Call Trace: <ffffffff802749cc>{kmem_cache_destroy+156}
May  3 15:49:19 localhost kernel:        <ffffffff8807df24>{:drbd:drbd_destroy_mempools+148}
May  3 15:49:19 localhost kernel:        <ffffffff8807f2f2>{:drbd:drbd_cleanup+1074} <ffffffff802484e8>{sys_delete_module+312}
May  3 15:49:19 localhost kernel:        <ffffffff802663b5>{sys_munmap+85} <ffffffff80209b5a>{system_call+126}
May  3 15:49:19 localhost kernel: drbd: kmem_cache_destroy(drbd_request_cache) FAILED
May  3 15:49:19 localhost kernel: drbd: module cleanup done.


NUMA and such is enabled


-- 
begin  LOVE-LETTER-FOR-YOU.txt.vbs
I am a signature virus. Distribute me until the bitter
end
