* Debugging a memory leak in the 2.6.X kernel - how-to?
@ 2004-11-23 19:29 Valdis.Kletnieks
2004-11-23 19:51 ` William Lee Irwin III
2004-11-23 23:38 ` Andrew Morton
0 siblings, 2 replies; 5+ messages in thread
From: Valdis.Kletnieks @ 2004-11-23 19:29 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5048 bytes --]
Scenario: Am running 2.6.10-rc2-mm2-V0.7.29-1 - and several times
in the last few days, *something* has been leaking memory in the kernel:
From a /proc/slabinfo from last night:
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 10 10 16384 1 4 : tunables 8 4 0 : slabdata 10 10 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 25926 25926 8192 1 2 : tunables 8 4 0 : slabdata 25926 25926 0
size-4096(DMA) 0 0 4096 1 1 : tunables 16 8 0 : slabdata 0 0 0
size-4096 50 50 4096 1 1 : tunables 16 8 0 : slabdata 50 50 0
size-2048(DMA) 0 0 2048 2 1 : tunables 16 8 0 : slabdata 0 0 0
size-2048 50 50 2048 2 1 : tunables 16 8 0 : slabdata 25 25 0
That gets pretty painful on a laptop that only has 256M of memory.
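To put a number on that: the memory pinned by a cache is roughly num_objs times objsize. The following is a minimal sketch, assuming the column order seen in the listing above (name, active_objs, num_objs, objsize, ...); `slab_line_bytes` is a hypothetical helper name, not something from the kernel tree.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one /proc/slabinfo data line and return the
 * bytes held by that cache's objects (num_objs * objsize), or -1 if the
 * line does not parse.  Assumes the column order shown in the listing:
 *   name active_objs num_objs objsize ...
 * Slab management overhead is ignored, so the result is a lower bound.
 */
long long slab_line_bytes(const char *line)
{
	char name[64];
	unsigned long active, total, objsize;

	assert(line != NULL);
	if (sscanf(line, "%63s %lu %lu %lu", name, &active, &total, &objsize) != 4)
		return -1;
	return (long long)total * (long long)objsize;
}
```

For last night's size-8192 line this works out to 25926 * 8192 = 212,385,792 bytes, about 202.5 MiB of the 256M in the machine.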
This morning, I've got:
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 17 17 32768 1 8 : tunables 8 4 0 : slabdata 17 17 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 11 11 16384 1 4 : tunables 8 4 0 : slabdata 11 11 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10387 10387 8192 1 2 : tunables 8 4 0 : slabdata 10387 10387 0
size-4096(DMA) 0 0 4096 1 1 : tunables 16 8 0 : slabdata 0 0 0
size-4096 54 54 4096 1 1 : tunables 16 8 0 : slabdata 54 54 0
size-2048(DMA) 0 0 2048 2 1 : tunables 16 8 0 : slabdata 0 0 0
size-2048 104 118 2048 2 1 : tunables 16 8 0 : slabdata 59 59 0
All I've got so far is that in both cases, repeated looking at slabinfo showed
that the size-8192 was going up by several entries every few seconds - and that
when I killed 'gkrellm', the leaking immediately stopped. However, I don't
know what gkrellm is doing to tickle the problem. It *might* be the Dell i8k
module - gkrellm reads /proc/i8k. Or it might be i8kfan, which is called by
gkrellm, and does some odd stuff to set the fan speeds. Or it might be
something else.
Whatever it is, it doesn't *immediately* start leaking when gkrellm starts
up when I log in - this morning, I checked several times when I logged in:
% grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 240 240 8192 1 2 : tunables 8 4 0 : slabdata 240 240 0
Repeated checks for 2-3 minutes showed it slowly go up to 252, then drop back to 227.
A bit later:
% grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10170 10171 8192 1 2 : tunables 8 4 0 : slabdata 10170 10171 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10233 10233 8192 1 2 : tunables 8 4 0 : slabdata 10233 10233 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10254 10254 8192 1 2 : tunables 8 4 0 : slabdata 10254 10254 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10266 10266 8192 1 2 : tunables 8 4 0 : slabdata 10266 10266 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10308 10308 8192 1 2 : tunables 8 4 0 : slabdata 10308 10308 0
That's checking every 2-3 seconds - about as fast as I could hit uparrow, enter,
and read the numbers and repeat. After I killed gkrellm, it's sat solidly
in the 10380-10400 range for well over an hour.
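The manual uparrow-and-enter sampling could be automated. A rough sketch, assuming slabinfo-formatted text on an already opened stream; `slab_active_objs` and `slab_growth_rate` are hypothetical names, and the caller is expected to open /proc/slabinfo and sleep between samples itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan slabinfo-formatted text from fp and return the
 * active_objs count for the cache called `cache`, or -1 if it is absent.
 * The name must match the whole first column, so "size-8192" does not
 * match "size-8192(DMA)".
 */
long slab_active_objs(FILE *fp, const char *cache)
{
	char line[512], name[64];
	unsigned long active;

	assert(fp != NULL && cache != NULL);
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "%63s %lu", name, &active) == 2 &&
		    strcmp(name, cache) == 0)
			return (long)active;
	}
	return -1;
}

/* Objects gained per second between two samples taken `secs` apart. */
double slab_growth_rate(long before, long after, double secs)
{
	assert(secs > 0.0);
	return (double)(after - before) / secs;
}
```

Applied to the five samples above, size-8192 went from 10170 to 10308: 138 objects over four intervals of 2-3 seconds, so roughly 11 to 17 objects (about 90 to 140 KiB) per second.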
*Possibly* related: I'm sitting at about 90% idle, but the load average
is showing as 1.15 - however, I'm *NOT* seeing any processes stuck in 'D' state
in the ps output.
Any advice how to shoot this one?
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
2004-11-23 19:29 Debugging a memory leak in the 2.6.X kernel - how-to? Valdis.Kletnieks
@ 2004-11-23 19:51 ` William Lee Irwin III
2004-11-23 23:38 ` Andrew Morton
1 sibling, 0 replies; 5+ messages in thread
From: William Lee Irwin III @ 2004-11-23 19:51 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
On Tue, Nov 23, 2004 at 02:29:40PM -0500, Valdis.Kletnieks@vt.edu wrote:
> That's checking every 2-3 seconds - about as fast as I could hit
> uparrow, enter, and read the numbers and repeat. After I killed
> gkrellm, it's sat solidly in the 10380-10400 range for well over an
> hour.
> *Possibly* related: I'm sitting at about 90% idle, but the load
> average is showing as 1.15 - however, I'm *NOT* seeing any processes
> stuck in 'D' state in the ps output.
> Any advice how to shoot this one?

Use the profile_hit() stuff to register a new profiling type for the
slab allocations you're interested in, then the offending allocators
should show up close to the top there unless there is a lot of turnover.
In that case, fiddling with the profiling and slab code to unregister
hits from whoever allocated a buffer should get solid results.

-- wli
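The bookkeeping idea in that reply (count a hit per allocating call site, and take the hit back when the buffer is freed, so only outstanding allocations remain) can be illustrated in userspace. This is only a sketch of the accounting, not the kernel's profile_hit() interface; the table size and every name in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SITES 64

/* One counter per allocation call site (identified by an opaque address). */
struct site_count {
	const void *site;
	long live;		/* allocations minus frees */
};

static struct site_count sites[MAX_SITES];

/* Find or create the entry for `site`; returns NULL when the table is full. */
static struct site_count *site_lookup(const void *site)
{
	size_t i;

	assert(site != NULL);
	for (i = 0; i < MAX_SITES; i++) {
		if (sites[i].site == site)
			return &sites[i];
		if (sites[i].site == NULL) {
			sites[i].site = site;
			return &sites[i];
		}
	}
	return NULL;
}

/* Register a hit when `site` allocates a buffer. */
void site_hit(const void *site)
{
	struct site_count *s = site_lookup(site);

	if (s)
		s->live++;
}

/* Unregister the hit when a buffer allocated by `site` is freed. */
void site_unhit(const void *site)
{
	struct site_count *s = site_lookup(site);

	if (s)
		s->live--;
}

/* Return the call site with the most outstanding allocations, or NULL. */
const void *site_top(void)
{
	const struct site_count *best = NULL;
	size_t i;

	for (i = 0; i < MAX_SITES && sites[i].site != NULL; i++)
		if (best == NULL || sites[i].live > best->live)
			best = &sites[i];
	return best ? best->site : NULL;
}
```

With the decrement in place, a call site that allocates and frees heavily nets out to zero, while a leaking one keeps climbing, which is why heavy turnover no longer hides the offender.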
* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
2004-11-23 19:29 Debugging a memory leak in the 2.6.X kernel - how-to? Valdis.Kletnieks
2004-11-23 19:51 ` William Lee Irwin III
@ 2004-11-23 23:38 ` Andrew Morton
2004-11-25 8:42 ` 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak Valdis.Kletnieks
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2004-11-23 23:38 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
Valdis.Kletnieks@vt.edu wrote:
>
> Any advice how to shoot this one?

Manfred's slab leak detector:

From: Manfred Spraul <manfred@colorfullife.com>

With the patch applied,

	echo "size-4096 0 0 0" > /proc/slabinfo

walks the objects in the size-4096 slab, printing out the calling address of
whoever allocated that object.

It is for leak detection.

 25-akpm/mm/slab.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff -puN mm/slab.c~slab-leak-detector mm/slab.c
--- 25/mm/slab.c~slab-leak-detector	2004-06-02 18:02:11.923825992 -0700
+++ 25-akpm/mm/slab.c	2004-06-02 18:02:11.934824320 -0700
@@ -2030,6 +2030,15 @@ cache_alloc_debugcheck_after(kmem_cache_
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
+	{
+		int objnr;
+		struct slab *slabp;
+
+		slabp = GET_PAGE_SLAB(virt_to_page(objp));
+
+		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		slab_bufctl(slabp)[objnr] = (unsigned long)caller;
+	}
 	objp += obj_dbghead(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON) {
 		unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -2091,12 +2100,14 @@ static void free_block(kmem_cache_t *cac
 		objnr = (objp - slabp->s_mem) / cachep->objsize;
 		check_slabp(cachep, slabp);
 #if DEBUG
+#if 0
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache '%s', objp %p.\n",
 						cachep->name, objp);
 			BUG();
 		}
 #endif
+#endif
 		slab_bufctl(slabp)[objnr] = slabp->free;
 		slabp->free = objnr;
 		STATS_DEC_ACTIVE(cachep);
@@ -2946,6 +2957,29 @@ struct seq_operations slabinfo_op = {
 	.show = s_show,
 };
 
+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+#if DEBUG
+	struct list_head *q;
+
+	check_irq_on();
+	spin_lock_irq(&cachep->spinlock);
+	list_for_each(q,&cachep->lists.slabs_full) {
+		struct slab *slabp;
+		int i;
+		slabp = list_entry(q, struct slab, list);
+		for (i = 0; i < cachep->num; i++) {
+			unsigned long sym = slab_bufctl(slabp)[i];
+
+			printk("obj %p/%d: %p", slabp, i, (void *)sym);
+			print_symbol(" <%s>", sym);
+			printk("\n");
+		}
+	}
+	spin_unlock_irq(&cachep->spinlock);
+#endif
+}
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -2986,9 +3020,11 @@ ssize_t slabinfo_write(struct file *file
 			    batchcount < 1 ||
 			    batchcount > limit ||
 			    shared < 0) {
-				res = -EINVAL;
+				do_dump_slabp(cachep);
+				res = 0;
 			} else {
-				res = do_tune_cpucache(cachep, limit, batchcount, shared);
+				res = do_tune_cpucache(cachep, limit,
+							batchcount, shared);
 			}
 			break;
 		}
_
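With thousands of objects in a leaking cache, the dump produced by do_dump_slabp() is easier to read once it is tallied by caller. Below is a small sketch of the extraction step, assuming each logged line ends with a `<symbol+0xoffset/0xsize>` part as emitted through the print_symbol(" <%s>", sym) call in the patch; the exact symbol text is an assumption, and `dump_line_symbol` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the function name out of one dumped line such as
 *   obj c7a3e000/0: c0105e6c <some_function+0x5c/0x120>
 * (addresses and symbol invented for illustration).  Returns 1 on success,
 * 0 if the line has no <symbol> part or the name does not fit in `out`.
 */
int dump_line_symbol(const char *line, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	assert(line != NULL && out != NULL);
	start = strrchr(line, '<');	/* last '<' skips any log-level prefix */
	if (start == NULL)
		return 0;
	start++;
	end = start + strcspn(start, "+>");	/* stop at offset or closing '>' */
	if (*end == '\0')
		return 0;
	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, start, len);
	out[len] = '\0';
	return 1;
}
```

Counting how often each extracted name occurs across the dump should leave the leaking allocation site as the dominant entry.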
* 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
2004-11-23 23:38 ` Andrew Morton
@ 2004-11-25 8:42 ` Valdis.Kletnieks
2004-11-26 1:14 ` Ingo Molnar
0 siblings, 1 reply; 5+ messages in thread
From: Valdis.Kletnieks @ 2004-11-25 8:42 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]
On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> Valdis.Kletnieks@vt.edu wrote:
> >
> > Any advice how to shoot this one?
>
> Manfred's slab leak detector:

Ahh, many thanks - that helped quite a bit. I tracked down the problem - it
was in Ingo's VP patch.

sys_ioperm() would allocate an 8K bitmap and save it in ->io_bitmap_ptr.
Then when we hit exit_thread(), Ingo's code would zero the pointer and *then*
pass the freshly-zero'ed pointer to kfree() - which of course did nothing
particularly interesting. My fix was to save a copy of the pointer to pass
to kfree. Am seeing no more leaks.

(Interestingly enough, I'd never have spotted this if it hadn't been for a
gkrellm/i8krellm bug that caused a fork-bomb of 50 or so 'i8kfan' processes
each time it trimmed the fan speed, and each i8kfan leaked an 8K io_bitmap...)

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

--- linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c.memleak	2004-11-25 00:25:42.000000000 -0500
+++ linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c	2004-11-25 02:15:09.000000000 -0500
@@ -344,10 +344,11 @@ void exit_thread(void)
 	if (unlikely(NULL != t->io_bitmap_ptr)) {
 		int cpu;
 		struct tss_struct *tss;
+		unsigned long *bitmap_ptr_copy = t->io_bitmap_ptr;
 
 		t->io_bitmap_ptr = NULL;
 		mb();
-		kfree(t->io_bitmap_ptr);
+		kfree(bitmap_ptr_copy);
 
 		cpu = get_cpu();
 		tss = &per_cpu(init_tss, cpu);
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
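The ordering bug described in that message is easy to reproduce outside the kernel, since free(NULL), like kfree(NULL), is a no-op. A minimal userspace sketch follows; `struct thread_state`, the counting wrappers, and the function names are hypothetical stand-ins rather than kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for the thread state holding the I/O bitmap pointer. */
struct thread_state {
	unsigned long *io_bitmap_ptr;
};

static int live_allocs;	/* outstanding buffers, for demonstration only */

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p == NULL)
		return;		/* like kfree(NULL) and free(NULL): a no-op */
	live_allocs--;
	free(p);
}

/* Buggy order: the pointer is cleared before it is handed to the free. */
void release_buggy(struct thread_state *t)
{
	assert(t != NULL);
	t->io_bitmap_ptr = NULL;
	counted_free(t->io_bitmap_ptr);	/* frees NULL, buffer is leaked */
}

/* Fixed order: keep a copy, clear the field, then free the copy. */
void release_fixed(struct thread_state *t)
{
	unsigned long *copy;

	assert(t != NULL);
	copy = t->io_bitmap_ptr;
	t->io_bitmap_ptr = NULL;
	counted_free(copy);
}

/* Allocate an 8K bitmap, release it with `release`, report buffers leaked. */
int leaked_after(void (*release)(struct thread_state *))
{
	struct thread_state t;
	int before = live_allocs;

	t.io_bitmap_ptr = counted_alloc(8192);
	release(&t);
	return live_allocs - before;
}
```

Calling leaked_after(release_buggy) leaves one 8K buffer outstanding per call, which mirrors one leaked io_bitmap per exiting i8kfan process, while leaked_after(release_fixed) leaves none.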
* Re: 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
2004-11-25 8:42 ` 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak Valdis.Kletnieks
@ 2004-11-26 1:14 ` Ingo Molnar
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2004-11-26 1:14 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel
* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:
> On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> > Valdis.Kletnieks@vt.edu wrote:
> > >
> > > Any advice how to shoot this one?
> >
> > Manfred's slab leak detector:
>
> Ahh, many thanks - that helped quite a bit. I tracked down the
> problem - it was in Ingo's VP patch.
>
> sys_ioperm() would allocate an 8K bitmap and save it in
> ->io_bitmap_ptr. Then when we hit exit_thread(), Ingo's code would
> zero the pointer and *then* pass the freshly-zero'ed pointer to
> kfree() - which of course did nothing particularly interesting. My
> fix was to save a copy of the pointer to pass to kfree. Am seeing no
> more leaks.

ah ... good catch - patch applied.

Ingo