public inbox for linux-kernel@vger.kernel.org
* 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-15 16:51 Petr Vandrovec
  2005-09-15 17:33 ` Petr Vandrovec
       [not found] ` <20050916023005.4146e499.akpm@osdl.org>
  0 siblings, 2 replies; 43+ messages in thread
From: Petr Vandrovec @ 2005-09-15 16:51 UTC (permalink / raw)
  To: Linux-kernel

Hello,
   now that the crashes on the UP system have been sorted out, I tried to
put the new kernel on my SMP host - and sorry to say, it does not
seem to work as advertised :-(  It seems that we somehow got
blocks from CPU#1 into memory blocks on CPU#0, and free_block
complains that the caller holds cachep->nodelists[0]->list_lock
while the nodeid for the block passed to free_block() comes from processor
(and node) #1...

   I cannot find how this happened.  Hopefully somebody else
will know...  Meanwhile I'll try to get rid of PREEMPT; although
it now masquerades as 'Low-latency desktop', it is apparently
still somewhat dangerous.  If it is triggered by preempt, that is.
						Thanks,
							Petr Vandrovec


ttyS0 at I/O 0x3f8 (irq = 0) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 0) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378 (0x778), irq 7, dma 3 [PCSPP,TRISTATE,COMPAT,EPP,ECP,DMA]
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
     ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
     ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:1849
invalid operand: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 8, comm: events/0 Not tainted 2.6.14-rc1-1619 #1
RIP: 0010:[<ffffffff8016e826>] <ffffffff8016e826>{free_block+294}
RSP: 0000:ffff81007ff21d88  EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000310
RDX: 0000000000000000 RSI: ffff81007ffddd10 RDI: ffff81007ffda080
RBP: ffff81007ffde000 R08: ffff81003ffaed90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007ffc9b50
R13: ffff81007ffde048 R14: ffff81007ffda080 R15: ffff81007ffda080
FS:  0000000000000000(0000) GS:ffffffff805fb800(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff81007ff20000, task ffff81003ff8c790)
Stack: 0000000000000000 0000000000000000 0000000000000213 0000000200000000
        ffff81007ffddd10 ffff81007ffddd10 ffff81007ffddce8 0000000000000002
        0000000000000000 ffff81007ffda080
Call Trace:<ffffffff8016fdc7>{drain_array_locked+167} <ffffffff8016feee>{cache_reap+206}
        <ffffffff803a2374>{_spin_lock_irqsave+36} <ffffffff8016fe20>{cache_reap+0}
        <ffffffff8014a1bc>{worker_thread+476} <ffffffff80132610>{default_wake_function+0}
        <ffffffff80132610>{default_wake_function+0} <ffffffff80149fe0>{worker_thread+0}
        <ffffffff8014ebc2>{kthread+146} <ffffffff8010ed12>{child_rip+8}
        <ffffffff80149fe0>{worker_thread+0} <ffffffff8014eb30>{kthread+0}
        <ffffffff8010ed0a>{child_rip+0}

Code: 0f 0b 68 bd aa 3d 80 c2 39 07 48 89 ee 4c 89 ff 4c 8d 75 30
RIP <ffffffff8016e826>{free_block+294} RSP <ffff81007ff21d88>
  <6>note: events/0[8] exited with preempt_count 1
hda: _NEC DVD_RW ND-3500AG, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD1200JB-00CRA0, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(100)
hdc: cache flushes not supported
  hdc: hdc1
libata version 1.12 loaded.
sata_sil version 0.9
ACPI: PCI Interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 177
<and box is dead>
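
The failing check described in this report can be modeled in user space.
What follows is a minimal sketch, not the kernel code: cache_model,
node_list, slab_model and both helper functions are hypothetical names,
and a plain boolean stands in for the real spinlock. It assumes, as the
report describes, that free_block() takes the node id from the slab
descriptor of the block and then verifies that the list_lock of that
node is the one the caller holds.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2

/* Hypothetical stand-in for the per-node list state of a cache. */
struct node_list {
    bool list_lock_held;    /* models "the caller holds list_lock" */
};

/* Hypothetical stand-in for the cache: one list per node. */
struct cache_model {
    struct node_list nodelists[MAX_NODES];
};

/* Hypothetical stand-in for a slab descriptor; only the node id matters. */
struct slab_model {
    int nodeid;
};

/*
 * Models the role of check_spinlock_acquired_node(cachep, nodeid):
 * returns true when the lock of the given node is held.
 */
static bool node_lock_check_passes(const struct cache_model *c, int nodeid)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    return c->nodelists[nodeid].list_lock_held;
}

/*
 * Models the check in the free path: the node is derived from the slab
 * being freed, not from the node whose lock the caller actually took.
 */
static bool free_block_check(const struct cache_model *c,
                             const struct slab_model *s)
{
    return node_lock_check_passes(c, s->nodeid);
}
```

With only the node 0 lock held, a slab whose nodeid is 0 passes the check
and a slab whose nodeid is 1 fails it, which would correspond to the BUG
at mm/slab.c:1849 shown above.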


* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-23 19:34 Alok Kataria
  2005-09-23 23:57 ` Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Alok Kataria @ 2005-09-23 19:34 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On Wed, 2005-09-21 at 06:33, Christoph Lameter wrote:

Hi Christoph,
I have some doubts over this...

>On Tue, 20 Sep 2005, Petr Vandrovec wrote:
>
>> slab belonging to node#1, while having acquired lock for cachep belonging
>> to node #0.  Due to this check_spinlock_acquired_node(cachep, nodeid) fails
>> (check_spinlock_acquired_node(cachep, 0) would succeed).
>
>Hmmm. If a node runs out of memory then pages from another node may end up 
>on the slab list of a node. But it seems that free_block cannot handle 
>that properly.
>
>How are you producing the problem?
>
>Could you try the following patch:
>
>---
>
>The numa slab allocator may allocate pages from foreign nodes onto the lists
>for a particular node if a node runs out of memory. Inspecting the slab->nodeid
>field will not reflect that the page is now in use for the slabs of another node.
>

IMO the slab->nodeid field just tells us which node's list3 this slab
is attached to, irrespective of the node from which the memory was
obtained.

>This patch fixes that issue by adding a node field to free_block so that the caller
>can indicate which node currently uses a slab.
>

But the nodeid is already accessible through the slab descriptor of this
object, and this nodeid is set in the cache_grow function.

>Also removes the check for the current node from kmalloc_cache_node since the
>process may shift later to another node which may lead to an allocation on another
>node than intended.
>

Yeah, that is possible, but wouldn't it be better to put a check in
__cache_alloc_node after disabling interrupts? kmalloc_node and
kmem_cache_alloc_node can be called at runtime as well, and getting the
object directly from the slabs instead of the array caches may slow
things down.
Hence I am tweaking the patch a little.


Thanks & Regards,
Alok


[-- Attachment #2: cache_alloc_node.patch --]
[-- Type: text/x-patch, Size: 1880 bytes --]

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>

Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c	2005-09-24 00:08:00.221900000 +0530
+++ linux-2.6.13/mm/slab.c	2005-09-24 00:24:12.206645250 +0530
@@ -2507,16 +2507,12 @@
 #define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
 #endif
 
-
-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *____cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
 {
-	unsigned long save_flags;
 	void* objp;
 	struct array_cache *ac;
 
-	cache_alloc_debugcheck_before(cachep, flags);
-
-	local_irq_save(save_flags);
+	check_irq_off();
 	ac = ac_data(cachep);
 	if (likely(ac->avail)) {
 		STATS_INC_ALLOCHIT(cachep);
@@ -2526,6 +2522,18 @@
 		STATS_INC_ALLOCMISS(cachep);
 		objp = cache_alloc_refill(cachep, flags);
 	}
+	return objp;
+}
+
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+	unsigned long save_flags;
+	void* objp;
+
+	cache_alloc_debugcheck_before(cachep, flags);
+
+	local_irq_save(save_flags);
+	objp = ____cache_alloc(cachep, flags);
 	local_irq_restore(save_flags);
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
 	return objp;
@@ -2841,7 +2849,7 @@
 	unsigned long save_flags;
 	void *ptr;
 
-	if (nodeid == numa_node_id() || nodeid == -1)
+	if (nodeid == -1)
 		return __cache_alloc(cachep, flags);
 
 	if (unlikely(!cachep->nodelists[nodeid])) {
@@ -2852,6 +2860,8 @@
 
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
+	if (nodeid == numa_node_id())
+		____cache_alloc(cachep, flags);
 	ptr = __cache_alloc_node(cachep, flags, nodeid);
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));

* Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
@ 2005-09-25 14:16 Alok Kataria
  2005-09-26 18:00 ` Christoph Lameter
  0 siblings, 1 reply; 43+ messages in thread
From: Alok Kataria @ 2005-09-25 14:16 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Petr Vandrovec, Andrew Morton, linux-kernel, manfred

On Sat, 2005-09-24 at 05:35, Christoph Lameter wrote:
> Comments on the code:
>
> @@ -2852,6 +2860,8 @@
>
>        cache_alloc_debugcheck_before(cachep, flags);
>        local_irq_save(save_flags);
> +      if (nodeid == numa_node_id())
> +              ____cache_alloc(cachep, flags);
>        ptr = __cache_alloc_node(cachep, flags, nodeid);
>
> This should be
>
>                ptr = ____cache_alloc(cachep, flags)
>        else
>                ptr = __cache_alloc_node(...)
>
> right?
>
>        local_irq_restore(save_flags);
>        ptr = cache_alloc_debugcheck_after(cachep, flags, ptr,
> __builtin_return_address(0));

Oh, a major blunder!! I have updated the patch.

--
As pointed out by Christoph, in kmalloc_node we check whether the allocation is for the
current node while interrupts are still on; this may lead to an allocation on a node other
than the one intended. This patch just moves the check for the current node to the
__cache_alloc_node call site, where interrupts are disabled.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Cc: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.13/mm/slab.c
===================================================================
--- linux-2.6.13.orig/mm/slab.c	2005-09-25 18:48:16.068349500 +0530
+++ linux-2.6.13/mm/slab.c	2005-09-25 18:48:18.484500500 +0530
@@ -2508,16 +2508,12 @@
  #define cache_alloc_debugcheck_after(a,b,objp,d) (objp)
  #endif

-
-static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+static inline void *____cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
  {
-	unsigned long save_flags;
  	void* objp;
  	struct array_cache *ac;

-	cache_alloc_debugcheck_before(cachep, flags);
-
-	local_irq_save(save_flags);
+	check_irq_off();
  	ac = ac_data(cachep);
  	if (likely(ac->avail)) {
  		STATS_INC_ALLOCHIT(cachep);
@@ -2527,6 +2523,18 @@
  		STATS_INC_ALLOCMISS(cachep);
  		objp = cache_alloc_refill(cachep, flags);
  	}
+	return objp;
+}
+
+static inline void *__cache_alloc(kmem_cache_t *cachep, unsigned int __nocast flags)
+{
+	unsigned long save_flags;
+	void* objp;
+
+	cache_alloc_debugcheck_before(cachep, flags);
+
+	local_irq_save(save_flags);
+	objp = ____cache_alloc(cachep, flags);
  	local_irq_restore(save_flags);
  	objp = cache_alloc_debugcheck_after(cachep, flags, objp,
  					__builtin_return_address(0));
@@ -2844,7 +2852,7 @@
  	unsigned long save_flags;
  	void *ptr;

-	if (nodeid == numa_node_id() || nodeid == -1)
+	if (nodeid == -1)
  		return __cache_alloc(cachep, flags);

  	if (unlikely(!cachep->nodelists[nodeid])) {
@@ -2855,7 +2863,10 @@

  	cache_alloc_debugcheck_before(cachep, flags);
  	local_irq_save(save_flags);
-	ptr = __cache_alloc_node(cachep, flags, nodeid);
+	if (nodeid == numa_node_id())
+		ptr = ____cache_alloc(cachep, flags);
+	else
+		ptr = __cache_alloc_node(cachep, flags, nodeid);
  	local_irq_restore(save_flags);
  	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, __builtin_return_address(0));
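
The difference between the two versions of the last hunk can be
illustrated outside the kernel. This is a sketch under stated
assumptions: alloc_model, fast_alloc, node_alloc, alloc_node_v1 and
alloc_node_v2 are hypothetical names, counters replace real allocation,
and this_node is passed in as a parameter instead of being read from
numa_node_id(). fast_alloc stands in for ____cache_alloc and node_alloc
for __cache_alloc_node.

```c
#include <assert.h>

/* Hypothetical bookkeeping for the two allocation paths. */
struct alloc_model {
    int fast_path_taken;    /* objects taken from the per-CPU array cache */
    int node_path_taken;    /* objects taken from a node's slab lists */
    int irqs_off;           /* models the local_irq_save()d region */
};

/* Dummy objects so that each path returns a distinguishable pointer. */
static int fast_obj, node_obj;

/* Models ____cache_alloc(): must be called with interrupts off. */
static void *fast_alloc(struct alloc_model *m)
{
    assert(m->irqs_off);    /* models check_irq_off() */
    m->fast_path_taken++;
    return &fast_obj;
}

/* Models __cache_alloc_node(): also called with interrupts off. */
static void *node_alloc(struct alloc_model *m)
{
    assert(m->irqs_off);
    m->node_path_taken++;
    return &node_obj;
}

/* First posting: the fast-path result is dropped and the node path
 * runs unconditionally, so a local request consumes two objects. */
static void *alloc_node_v1(struct alloc_model *m, int nodeid, int this_node)
{
    void *ptr;

    m->irqs_off = 1;
    if (nodeid == this_node)
        fast_alloc(m);      /* return value discarded */
    ptr = node_alloc(m);
    m->irqs_off = 0;
    return ptr;
}

/* Updated posting: exactly one of the two paths supplies the object. */
static void *alloc_node_v2(struct alloc_model *m, int nodeid, int this_node)
{
    void *ptr;

    m->irqs_off = 1;
    if (nodeid == this_node)
        ptr = fast_alloc(m);
    else
        ptr = node_alloc(m);
    m->irqs_off = 0;
    return ptr;
}
```

For a request on the local node, the first variant touches both paths
and returns the node-path object, while the second variant touches only
the fast path. In both variants the node comparison happens inside the
region where interrupts are off, which is the point of the change.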


Thread overview: 43+ messages
2005-09-15 16:51 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849 Petr Vandrovec
2005-09-15 17:33 ` Petr Vandrovec
     [not found] ` <20050916023005.4146e499.akpm@osdl.org>
     [not found]   ` <432AA00D.4030706@vc.cvut.cz>
     [not found]     ` <20050916230809.789d6b0b.akpm@osdl.org>
2005-09-19 16:02       ` Petr Vandrovec
2005-09-19 18:29         ` Andrew Morton
2005-09-19 18:51           ` Christoph Lameter
2005-09-19 19:28             ` Andrew Morton
2005-09-19 21:20               ` Christoph Lameter
2005-09-20  5:16                 ` Andrew Morton
2005-09-20  8:34                   ` Alok Kataria
2005-09-20 13:58                   ` Petr Vandrovec
2005-09-21  1:03                     ` Christoph Lameter
2005-09-21  1:22                       ` Petr Vandrovec
2005-09-21 15:59                         ` Christoph Lameter
2005-09-22 19:52                           ` Christoph Lameter
2005-09-22 20:01                             ` Andrew Morton
2005-09-22 21:25                               ` Petr Vandrovec
2005-09-22 21:32                                 ` Christoph Lameter
2005-09-22 21:46                                 ` Andrew Morton
2005-09-22 21:54                                   ` Christoph Lameter
2005-09-23  0:25                                     ` Petr Vandrovec
2005-09-28 21:02                     ` Ravikiran G Thirumalai
2005-09-28 22:50                       ` Christoph Lameter
2005-09-29 16:43                       ` Petr Vandrovec
2005-09-29 18:11                         ` Ravikiran G Thirumalai
2005-09-29 18:38                           ` Christoph Lameter
2005-09-30  5:45                         ` Ravikiran G Thirumalai
2005-09-30  6:05                           ` Andrew Morton
2005-09-30  6:28                             ` Ravikiran G Thirumalai
2005-09-30 15:16                               ` Bryan O'Sullivan
2005-09-30 15:57                                 ` Christoph Lameter
2005-09-30 16:45                                   ` Bryan O'Sullivan
2005-09-30 20:11                                 ` Andi Kleen
2005-09-30 20:23                                   ` Ravikiran G Thirumalai
2005-09-30 16:55                           ` Christoph Lameter
2005-09-19 18:56           ` Petr Vandrovec
2005-09-19 19:08             ` Christoph Lameter
  -- strict thread matches above, loose matches on Subject: below --
2005-09-23 19:34 Alok Kataria
2005-09-23 23:57 ` Christoph Lameter
2005-09-24  0:05 ` Christoph Lameter
2005-09-24 12:52 ` Manfred Spraul
2005-09-25 14:16 Alok Kataria
2005-09-26 18:00 ` Christoph Lameter
2005-09-26 19:34   ` Alok Kataria
