* Debugging a memory leak in the 2.6.X kernel - how-to?
@ 2004-11-23 19:29 Valdis.Kletnieks
2004-11-23 19:51 ` William Lee Irwin III
2004-11-23 23:38 ` Andrew Morton
0 siblings, 2 replies; 5+ messages in thread
From: Valdis.Kletnieks @ 2004-11-23 19:29 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5048 bytes --]
Scenario: Am running 2.6.10-rc2-mm2-V0.7.29-1 - and several times
in the last few days, *something* has been leaking memory in the kernel:
From a /proc/slabinfo from last night:
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 10 10 16384 1 4 : tunables 8 4 0 : slabdata 10 10 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 25926 25926 8192 1 2 : tunables 8 4 0 : slabdata 25926 25926 0
size-4096(DMA) 0 0 4096 1 1 : tunables 16 8 0 : slabdata 0 0 0
size-4096 50 50 4096 1 1 : tunables 16 8 0 : slabdata 50 50 0
size-2048(DMA) 0 0 2048 2 1 : tunables 16 8 0 : slabdata 0 0 0
size-2048 50 50 2048 2 1 : tunables 16 8 0 : slabdata 25 25 0
That gets pretty painful on a laptop that only has 256M of memory.
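For scale, here is a rough sketch of the arithmetic behind that remark. It assumes the column order of the slabinfo lines quoted above (name, active_objs, num_objs, objsize, ...); the helper name slab_bytes is made up for illustration and is not part of any kernel interface.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: bytes held by the objects of one cache, given one
 * /proc/slabinfo line in the column order quoted above
 * (name active_objs num_objs objsize ...).
 * Returns -1 if the line does not parse.
 */
long long slab_bytes(const char *line)
{
	char name[64];
	unsigned long active, total, objsize;

	assert(line != NULL);
	if (sscanf(line, "%63s %lu %lu %lu", name, &active, &total, &objsize) != 4)
		return -1;
	return (long long)total * (long long)objsize;
}
```

Fed last night's size-8192 line, this works out to 25926 * 8192 = 212385792 bytes, roughly 202 MiB of the 256M in the machine.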
This morning, I've got:
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 17 17 32768 1 8 : tunables 8 4 0 : slabdata 17 17 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 11 11 16384 1 4 : tunables 8 4 0 : slabdata 11 11 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10387 10387 8192 1 2 : tunables 8 4 0 : slabdata 10387 10387 0
size-4096(DMA) 0 0 4096 1 1 : tunables 16 8 0 : slabdata 0 0 0
size-4096 54 54 4096 1 1 : tunables 16 8 0 : slabdata 54 54 0
size-2048(DMA) 0 0 2048 2 1 : tunables 16 8 0 : slabdata 0 0 0
size-2048 104 118 2048 2 1 : tunables 16 8 0 : slabdata 59 59 0
All I've got so far is that in both cases, repeated looking at slabinfo showed
that the size-8192 was going up by several entries every few seconds - and that
when I killed 'gkrellm', the leaking immediately stopped. However, I don't
know what gkrellm is doing to tickle the problem. It *might* be the Dell i8k
module - gkrellm reads /proc/i8k. Or it might be i8kfan, which is called by
gkrellm, and does some odd stuff to set the fan speeds. Or it might be
something else.
Whatever it is, it doesn't *immediately* start leaking when gkrellm starts
up when I log in - this morning, I checked several times when I logged in:
% grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 240 240 8192 1 2 : tunables 8 4 0 : slabdata 240 240 0
Repeated checks for 2-3 minutes showed it slowly go up to 252, then drop back to 227.
A bit later:
% grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10170 10171 8192 1 2 : tunables 8 4 0 : slabdata 10170 10171 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10233 10233 8192 1 2 : tunables 8 4 0 : slabdata 10233 10233 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10254 10254 8192 1 2 : tunables 8 4 0 : slabdata 10254 10254 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10266 10266 8192 1 2 : tunables 8 4 0 : slabdata 10266 10266 0
[~]2 grep 8192 /proc/slabinfo
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 10308 10308 8192 1 2 : tunables 8 4 0 : slabdata 10308 10308 0
That's checking every 2-3 seconds - about as fast as I could hit uparrow, enter,
and read the numbers and repeat. After I killed gkrellm, it's sat solidly
in the 10380-10400 range for well over an hour.
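The manual uparrow-and-enter loop above could be automated along these lines. This is only a sketch: it works on text snapshots of /proc/slabinfo that have already been read into memory, and the names slab_active and slab_growth are hypothetical. Unlike "grep 8192", it matches the cache name exactly, so size-8192(DMA) is not mixed in.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return active_objs for the cache called `cache`
 * in a slabinfo-style text snapshot, or -1 if the cache is not listed.
 */
long slab_active(const char *snapshot, const char *cache)
{
	const char *p = snapshot;

	while (*p) {
		char name[64];
		long active;

		/* First two columns of a line: cache name, active_objs. */
		if (sscanf(p, "%63s %ld", name, &active) == 2 &&
		    strcmp(name, cache) == 0)
			return active;

		p = strchr(p, '\n');	/* advance to the next line, if any */
		if (!p)
			break;
		p++;
	}
	return -1;
}

/*
 * Store in *delta the number of objects gained between two snapshots.
 * Returns 0 on success, -1 if the cache is missing from either snapshot.
 */
int slab_growth(const char *before, const char *after,
		const char *cache, long *delta)
{
	long a = slab_active(before, cache);
	long b = slab_active(after, cache);

	if (a < 0 || b < 0)
		return -1;
	*delta = b - a;
	return 0;
}
```

Between the first and last samples above, size-8192 went from 10170 to 10308 active objects, a gain of 138 buffers (about 1.1 MB) in roughly ten seconds.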
*Possibly* related: I'm sitting at about 90% idle, but the load average
is showing as 1.15 - however, I'm *NOT* seeing any processes stuck in 'D' state
in the ps output.
Any advice how to shoot this one?
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
2004-11-23 19:29 Debugging a memory leak in the 2.6.X kernel - how-to? Valdis.Kletnieks
@ 2004-11-23 19:51 ` William Lee Irwin III
2004-11-23 23:38 ` Andrew Morton
1 sibling, 0 replies; 5+ messages in thread
From: William Lee Irwin III @ 2004-11-23 19:51 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
On Tue, Nov 23, 2004 at 02:29:40PM -0500, Valdis.Kletnieks@vt.edu wrote:
> That's checking every 2-3 seconds - about as fast as I could hit
> uparrow, enter, and read the numbers and repeat. After I killed
> gkrellm, it's sat solidly in the 10380-10400 range for well over an
> hour.
> *Possibly* related: I'm sitting at about 90% idle, but the load
> average is showing as 1.15 - however, I'm *NOT* seeing any processes
> stuck in 'D' state in the ps output.
> Any advice how to shoot this one?
Use the profile_hit() stuff to register a new profiling type for the
slab allocations you're interested in, then the offending allocators
should show up close to the top there unless there is a lot of turnover.
In that case, fiddling with the profiling and slab code to unregister
hits from whoever allocated a buffer should get solid results.
-- wli
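To illustrate the bookkeeping described above (register a hit per allocating call site, unregister it when the buffer is freed), here is a small userspace model. It is not the kernel's profile_hit() interface; the table, the function names, and the use of a plain pointer as a stand-in for the caller address are all assumptions made for the sketch.

```c
#include <stddef.h>

#define MAX_SITES 16

/*
 * One counter per allocation call site. `site` stands in for the caller
 * address that kernel-side profiling would record; it must not be NULL.
 */
struct site_count {
	const void *site;
	long live;		/* allocations minus frees for this site */
};

static struct site_count sites[MAX_SITES];

/* Find the entry for `site`, claiming the first empty slot if it is new. */
static struct site_count *site_lookup(const void *site)
{
	for (size_t i = 0; i < MAX_SITES; i++) {
		if (sites[i].site == site)
			return &sites[i];
		if (sites[i].site == NULL) {
			sites[i].site = site;
			return &sites[i];
		}
	}
	return NULL;		/* table full */
}

/* Register a hit when `site` allocates a buffer. */
void site_alloc_hit(const void *site)
{
	struct site_count *s = site_lookup(site);

	if (s)
		s->live++;
}

/* Unregister the hit when a buffer allocated by `site` is freed. */
void site_free_hit(const void *site)
{
	struct site_count *s = site_lookup(site);

	if (s)
		s->live--;
}

/* Site with the most live allocations, or NULL if nothing was recorded. */
const void *site_worst(void)
{
	const struct site_count *worst = NULL;

	for (size_t i = 0; i < MAX_SITES && sites[i].site != NULL; i++)
		if (worst == NULL || sites[i].live > worst->live)
			worst = &sites[i];
	return worst ? worst->site : NULL;
}
```

With the free-side decrement in place, a site that allocates and frees heavily nets out near zero, while a site that only allocates keeps climbing and ends up as the result of site_worst().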
* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
2004-11-23 19:29 Debugging a memory leak in the 2.6.X kernel - how-to? Valdis.Kletnieks
2004-11-23 19:51 ` William Lee Irwin III
@ 2004-11-23 23:38 ` Andrew Morton
2004-11-25 8:42 ` 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak Valdis.Kletnieks
1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2004-11-23 23:38 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-kernel
Valdis.Kletnieks@vt.edu wrote:
>
> Any advice how to shoot this one?
Manfred's slab leak detector:
From: Manfred Spraul <manfred@colorfullife.com>
With the patch applied,
echo "size-4096 0 0 0" > /proc/slabinfo
walks the objects in the size-4096 slab, printing out the calling address
of whoever allocated that object.
It is for leak detection.
25-akpm/mm/slab.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 files changed, 38 insertions(+), 2 deletions(-)
diff -puN mm/slab.c~slab-leak-detector mm/slab.c
--- 25/mm/slab.c~slab-leak-detector 2004-06-02 18:02:11.923825992 -0700
+++ 25-akpm/mm/slab.c 2004-06-02 18:02:11.934824320 -0700
@@ -2030,6 +2030,15 @@ cache_alloc_debugcheck_after(kmem_cache_
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
+	{
+		int objnr;
+		struct slab *slabp;
+
+		slabp = GET_PAGE_SLAB(virt_to_page(objp));
+
+		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		slab_bufctl(slabp)[objnr] = (unsigned long)caller;
+	}
 	objp += obj_dbghead(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON) {
 		unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -2091,12 +2100,14 @@ static void free_block(kmem_cache_t *cac
 		objnr = (objp - slabp->s_mem) / cachep->objsize;
 		check_slabp(cachep, slabp);
 #if DEBUG
+#if 0
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache '%s', objp %p.\n",
 						cachep->name, objp);
 			BUG();
 		}
 #endif
+#endif
 		slab_bufctl(slabp)[objnr] = slabp->free;
 		slabp->free = objnr;
 		STATS_DEC_ACTIVE(cachep);
@@ -2946,6 +2957,29 @@ struct seq_operations slabinfo_op = {
 	.show = s_show,
 };
+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+#if DEBUG
+	struct list_head *q;
+
+	check_irq_on();
+	spin_lock_irq(&cachep->spinlock);
+	list_for_each(q,&cachep->lists.slabs_full) {
+		struct slab *slabp;
+		int i;
+		slabp = list_entry(q, struct slab, list);
+		for (i = 0; i < cachep->num; i++) {
+			unsigned long sym = slab_bufctl(slabp)[i];
+
+			printk("obj %p/%d: %p", slabp, i, (void *)sym);
+			print_symbol(" <%s>", sym);
+			printk("\n");
+		}
+	}
+	spin_unlock_irq(&cachep->spinlock);
+#endif
+}
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -2986,9 +3020,11 @@ ssize_t slabinfo_write(struct file *file
 			    batchcount < 1 ||
 			    batchcount > limit ||
 			    shared < 0) {
-				res = -EINVAL;
+				do_dump_slabp(cachep);
+				res = 0;
 			} else {
-				res = do_tune_cpucache(cachep, limit, batchcount, shared);
+				res = do_tune_cpucache(cachep, limit,
+							batchcount, shared);
 			}
 			break;
 		}
_
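With ten thousand objects in a cache, the detector's output is easier to read once it is tallied by caller. The sketch below does that in userspace on a captured kernel log. It assumes each line has the shape produced by the printk format strings in the patch ("obj %p/%d: %p" followed by " <%s>"), and that the symbol text starts with the function name followed by '+' or '>'; the function names caller_symbol and count_caller are hypothetical.

```c
#include <string.h>

/*
 * Copy the symbol name from one line of detector output, i.e. the text
 * between '<' and the first '+' or '>', into `out`.
 * Returns 0 on success, -1 if the line has no <...> part or the name
 * does not fit in `outlen` bytes.
 */
int caller_symbol(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '<');
	size_t n;

	if (!start)
		return -1;
	start++;
	n = strcspn(start, "+>");
	if (start[n] == '\0' || n == 0 || n >= outlen)
		return -1;
	memcpy(out, start, n);
	out[n] = '\0';
	return 0;
}

/* Count the lines of `log` whose caller symbol is exactly `sym`. */
long count_caller(const char *log, const char *sym)
{
	long hits = 0;

	while (*log) {
		char line[256], name[128];
		size_t len = strcspn(log, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		/* Work on a private copy so parsing never crosses lines. */
		memcpy(line, log, n);
		line[n] = '\0';
		if (caller_symbol(line, name, sizeof(name)) == 0 &&
		    strcmp(name, sym) == 0)
			hits++;

		log += len;
		if (*log == '\n')
			log++;
	}
	return hits;
}
```

A caller that accounts for nearly all objects in the leaking cache is the first place to look.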
* 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
2004-11-23 23:38 ` Andrew Morton
@ 2004-11-25 8:42 ` Valdis.Kletnieks
2004-11-26 1:14 ` Ingo Molnar
0 siblings, 1 reply; 5+ messages in thread
From: Valdis.Kletnieks @ 2004-11-25 8:42 UTC (permalink / raw)
To: Andrew Morton, Ingo Molnar; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]
On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> Valdis.Kletnieks@vt.edu wrote:
> >
> > Any advice how to shoot this one?
>
> Manfred's slab leak detector:
Ahh, many thanks - that helped quite a bit. I tracked down the problem -
it was in Ingo's VP patch.
sys_ioperm() would allocate an 8K bitmap and save it in ->io_bitmap_ptr.
Then when we hit exit_thread(), Ingo's code would zero the pointer and *then*
pass the freshly-zero'ed pointer to kfree() - which of course did nothing
particularly interesting. My fix was to save a copy of the pointer to
pass to kfree. Am seeing no more leaks.
(Interestingly enough, I'd never have spotted this if it hadn't been for
a gkrellm/i8krellm bug that caused a fork-bomb of 50 or so 'i8kfan' processes
each time it trimmed the fan speed, and each i8kfan leaked an 8K io_bitmap...)
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
--- linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c.memleak 2004-11-25 00:25:42.000000000 -0500
+++ linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c 2004-11-25 02:15:09.000000000 -0500
@@ -344,10 +344,11 @@ void exit_thread(void)
 	if (unlikely(NULL != t->io_bitmap_ptr)) {
 		int cpu;
 		struct tss_struct *tss;
+		unsigned long *bitmap_ptr_copy = t->io_bitmap_ptr;
 		t->io_bitmap_ptr = NULL;
 		mb();
-		kfree(t->io_bitmap_ptr);
+		kfree(bitmap_ptr_copy);
 		cpu = get_cpu();
 		tss = &per_cpu(init_tss, cpu);
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
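The ordering problem fixed above can be reproduced in a small userspace model. Here malloc()/free() stand in for kmalloc()/kfree(), and struct thread_model, buf_alloc(), buf_free() and live_count() are hypothetical names for the sketch; only the clear-then-free versus copy-then-free ordering is taken from the patch.

```c
#include <stdlib.h>

static int live_buffers;	/* allocations not yet released */

/* Allocate a buffer and count it as live. */
void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_buffers++;
	return p;
}

/* Like kfree(), a NULL argument is a silent no-op. */
void buf_free(void *p)
{
	if (!p)
		return;
	free(p);
	live_buffers--;
}

int live_count(void)
{
	return live_buffers;
}

/* Stand-in for the one thread_struct field that matters here. */
struct thread_model {
	unsigned long *io_bitmap_ptr;
};

/* Ordering before the fix: the pointer is cleared, then NULL is "freed". */
void exit_thread_buggy(struct thread_model *t)
{
	t->io_bitmap_ptr = NULL;
	buf_free(t->io_bitmap_ptr);	/* always NULL here: buffer leaks */
}

/* Ordering after the fix: keep a copy, clear the field, free the copy. */
void exit_thread_fixed(struct thread_model *t)
{
	unsigned long *bitmap_ptr_copy = t->io_bitmap_ptr;

	t->io_bitmap_ptr = NULL;
	buf_free(bitmap_ptr_copy);
}
```

After exit_thread_buggy() the live count stays where it was, one 8K buffer per exiting thread; after exit_thread_fixed() it drops by one.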
* Re: 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
2004-11-25 8:42 ` 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak Valdis.Kletnieks
@ 2004-11-26 1:14 ` Ingo Molnar
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2004-11-26 1:14 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel
* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:
> On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> > Valdis.Kletnieks@vt.edu wrote:
> > >
> > > Any advice how to shoot this one?
> >
> > Manfred's slab leak detector:
>
> Ahh, many thanks - that helped quite a bit. I tracked down the
> problem - it was in Ingo's VP patch.
>
> sys_ioperm() would allocate an 8K bitmap and save it in
> ->io_bitmap_ptr. Then when we hit exit_thread(), Ingo's code would
> zero the pointer and *then* pass the freshly-zero'ed pointer to
> kfree() - which of course did nothing particularly interesting. My
> fix was to save a copy of the pointer to pass to kfree. Am seeing no
> more leaks.
ah ... good catch - patch applied.
Ingo