Debugging a memory leak in the 2.6.X kernel

* Debugging a memory leak in the 2.6.X kernel - how-to?
@ 2004-11-23 19:29 Valdis.Kletnieks
  2004-11-23 19:51 ` William Lee Irwin III
  2004-11-23 23:38 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: Valdis.Kletnieks @ 2004-11-23 19:29 UTC
  To: linux-kernel

Scenario: Am running 2.6.10-rc2-mm2-V0.7.29-1 - and several times
in the last few days, *something* has been leaking memory in the kernel:

From a /proc/slabinfo snapshot last night:
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384            10     10  16384    1    4 : tunables    8    4    0 : slabdata     10     10      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          25926  25926   8192    1    2 : tunables    8    4    0 : slabdata  25926  25926      0
size-4096(DMA)         0      0   4096    1    1 : tunables   16    8    0 : slabdata      0      0      0
size-4096             50     50   4096    1    1 : tunables   16    8    0 : slabdata     50     50      0
size-2048(DMA)         0      0   2048    2    1 : tunables   16    8    0 : slabdata      0      0      0
size-2048             50     50   2048    2    1 : tunables   16    8    0 : slabdata     25     25      0

That gets pretty painful on a laptop that only has 256M of memory.
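
To put a number on "painful": the size-8192 line above accounts for 25926 objects of 8192 bytes each, which is 212,385,792 bytes, or roughly 202.5 MiB of the 256M in the machine. A minimal sketch of that arithmetic in C follows. The function name `slab_cache_bytes` is hypothetical, and it assumes the 2.6-era slabinfo column order shown above (name, active_objs, num_objs, objsize); it is not part of any kernel tool.

```c
#include <stdio.h>

/*
 * Hypothetical helper: given one data line from /proc/slabinfo in the
 * layout shown above (name, active_objs, num_objs, objsize, ...),
 * return the bytes covered by the cache's objects (num_objs * objsize).
 * Returns -1 if the line does not start with those four fields.
 */
long long slab_cache_bytes(const char *line)
{
    char name[64];
    unsigned long active_objs, num_objs, objsize;

    if (sscanf(line, "%63s %lu %lu %lu",
               name, &active_objs, &num_objs, &objsize) != 4)
        return -1;

    return (long long)num_objs * (long long)objsize;
}
```

This ignores per-slab overhead, so it is a lower bound on what the cache really pins.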

This morning, I've got:

size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768            17     17  32768    1    8 : tunables    8    4    0 : slabdata     17     17      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384            11     11  16384    1    4 : tunables    8    4    0 : slabdata     11     11      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10387  10387   8192    1    2 : tunables    8    4    0 : slabdata  10387  10387      0
size-4096(DMA)         0      0   4096    1    1 : tunables   16    8    0 : slabdata      0      0      0
size-4096             54     54   4096    1    1 : tunables   16    8    0 : slabdata     54     54      0
size-2048(DMA)         0      0   2048    2    1 : tunables   16    8    0 : slabdata      0      0      0
size-2048            104    118   2048    2    1 : tunables   16    8    0 : slabdata     59     59      0

All I've got so far is that in both cases, repeatedly checking slabinfo showed
size-8192 going up by several entries every few seconds - and that when I
killed 'gkrellm', the leak immediately stopped.  However, I don't
know what gkrellm is doing to tickle the problem.  It *might* be the Dell i8k
module - gkrellm reads /proc/i8k.  Or it might be i8kfan, which is called by
gkrellm, and does some odd stuff to set the fan speeds.  Or it might be
something else.

Whatever it is, it doesn't *immediately* start leaking when gkrellm starts
up when I log in - this morning, I checked several times when I logged in:

% grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192            240    240   8192    1    2 : tunables    8    4    0 : slabdata    240    240      0

Repeated checks for 2-3 minutes showed it slowly go up to 252, then drop back to 227.

A bit later:

% grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10170  10171   8192    1    2 : tunables    8    4    0 : slabdata  10170  10171      0
[~]2 grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10233  10233   8192    1    2 : tunables    8    4    0 : slabdata  10233  10233      0
[~]2 grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10254  10254   8192    1    2 : tunables    8    4    0 : slabdata  10254  10254      0
[~]2 grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10266  10266   8192    1    2 : tunables    8    4    0 : slabdata  10266  10266      0
[~]2 grep 8192 /proc/slabinfo 
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192          10308  10308   8192    1    2 : tunables    8    4    0 : slabdata  10308  10308      0

That's checking every 2-3 seconds - about as fast as I could hit uparrow, enter,
and read the numbers and repeat.  After I killed gkrellm, it's sat solidly
in the 10380-10400 range for well over an hour.

*Possibly* related: I'm sitting at about 90% idle, but the load average
is showing as 1.15 - however, I'm *NOT* seeing any processes stuck in 'D' state
in the ps output.

Any advice how to shoot this one?

* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
From: William Lee Irwin III @ 2004-11-23 19:51 UTC
  To: Valdis.Kletnieks; +Cc: linux-kernel

On Tue, Nov 23, 2004 at 02:29:40PM -0500, Valdis.Kletnieks@vt.edu wrote:
> That's checking every 2-3 seconds - about as fast as I could hit
> uparrow, enter, and read the numbers and repeat.  After I killed
> gkrellm, it's sat solidly in the 10380-10400 range for well over an
> hour.
> *Possibly* related: I'm sitting at about 90% idle, but the load
> average is showing as 1.15 - however, I'm *NOT* seeing any processes
> stuck in 'D' state in the ps output.
> Any advice how to shoot this one?

Use the profile_hit() stuff to register a new profiling type for the
slab allocations you're interested in, then the offending allocators
should show up close to the top there unless there is a lot of turnover.
In that case, fiddling with the profiling and slab code to unregister
hits from whoever allocated a buffer should get solid results.


-- wli
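
The idea in this reply, counting a hit per allocating call site and removing the hit again when the buffer is freed, can be illustrated in userspace. This is a toy sketch only: it does not use the kernel's profiling code, and every name in it (`site_hit`, `site_unhit`, `site_outstanding`, `MAX_SITES`) is hypothetical.

```c
#include <stddef.h>

#define MAX_SITES 16

/* One counter per call site; pc == NULL marks an unused table entry. */
struct site_count {
    const void *pc;
    long outstanding;
};

static struct site_count sites[MAX_SITES];

/* Find the entry for pc, creating it in the first unused slot.
 * pc must be non-NULL. Returns NULL if the table is full. */
static struct site_count *site_lookup(const void *pc)
{
    for (int i = 0; i < MAX_SITES; i++) {
        if (sites[i].pc == pc)
            return &sites[i];
        if (sites[i].pc == NULL) {
            sites[i].pc = pc;
            return &sites[i];
        }
    }
    return NULL;
}

/* Record an allocation made from call site pc. */
void site_hit(const void *pc)
{
    struct site_count *s = site_lookup(pc);
    if (s)
        s->outstanding++;
}

/* Record that a buffer allocated from call site pc was freed. */
void site_unhit(const void *pc)
{
    struct site_count *s = site_lookup(pc);
    if (s)
        s->outstanding--;
}

/* Allocations from pc that have not been freed yet. */
long site_outstanding(const void *pc)
{
    struct site_count *s = site_lookup(pc);
    return s ? s->outstanding : 0;
}
```

With the unhit step in place, a call site with heavy but balanced turnover stays near zero, while a leaking one keeps growing.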

* Re: Debugging a memory leak in the 2.6.X kernel - how-to?
From: Andrew Morton @ 2004-11-23 23:38 UTC
  To: Valdis.Kletnieks; +Cc: linux-kernel

Valdis.Kletnieks@vt.edu wrote:
>
> Any advice how to shoot this one?

Manfred's slab leak detector:


From: Manfred Spraul <manfred@colorfullife.com>

With the patch applied,

	echo "size-4096 0 0 0" > /proc/slabinfo

walks the objects in the size-4096 slab, printing out the calling address
of whoever allocated that object.

It is for leak detection.

 25-akpm/mm/slab.c |   40 ++++++++++++++++++++++++++++++++++++++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff -puN mm/slab.c~slab-leak-detector mm/slab.c
--- 25/mm/slab.c~slab-leak-detector	2004-06-02 18:02:11.923825992 -0700
+++ 25-akpm/mm/slab.c	2004-06-02 18:02:11.934824320 -0700
@@ -2030,6 +2030,15 @@ cache_alloc_debugcheck_after(kmem_cache_
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
+	{
+		int objnr;
+		struct slab *slabp;
+
+		slabp = GET_PAGE_SLAB(virt_to_page(objp));
+
+		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		slab_bufctl(slabp)[objnr] = (unsigned long)caller;
+	}
 	objp += obj_dbghead(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON) {
 		unsigned long	ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -2091,12 +2100,14 @@ static void free_block(kmem_cache_t *cac
 		objnr = (objp - slabp->s_mem) / cachep->objsize;
 		check_slabp(cachep, slabp);
 #if DEBUG
+#if 0
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache '%s', objp %p.\n",
 						cachep->name, objp);
 			BUG();
 		}
 #endif
+#endif
 		slab_bufctl(slabp)[objnr] = slabp->free;
 		slabp->free = objnr;
 		STATS_DEC_ACTIVE(cachep);
@@ -2946,6 +2957,29 @@ struct seq_operations slabinfo_op = {
 	.show	= s_show,
 };
 
+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+#if DEBUG
+	struct list_head *q;
+
+	check_irq_on();
+	spin_lock_irq(&cachep->spinlock);
+	list_for_each(q,&cachep->lists.slabs_full) {
+		struct slab *slabp;
+		int i;
+		slabp = list_entry(q, struct slab, list);
+		for (i = 0; i < cachep->num; i++) {
+			unsigned long sym = slab_bufctl(slabp)[i];
+
+			printk("obj %p/%d: %p", slabp, i, (void *)sym);
+			print_symbol(" <%s>", sym);
+			printk("\n");
+		}
+	}
+	spin_unlock_irq(&cachep->spinlock);
+#endif
+}
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -2986,9 +3020,11 @@ ssize_t slabinfo_write(struct file *file
 			    batchcount < 1 ||
 			    batchcount > limit ||
 			    shared < 0) {
-				res = -EINVAL;
+				do_dump_slabp(cachep);
+				res = 0;
 			} else {
-				res = do_tune_cpucache(cachep, limit, batchcount, shared);
+				res = do_tune_cpucache(cachep, limit,
+							batchcount, shared);
 			}
 			break;
 		}
_
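
As a rough userspace analogy for what the patch does (not the patch itself): each object slot carries a tag naming whoever allocated it, and a dump walks the slots that are still in use. In the patch the tag is the caller's address stored in the slab's bufctl array, which appears to be why the double-free check on that array is compiled out. In the sketch below the tag is a string label, and all names (`pool_alloc`, `pool_free`, `pool_live_from`, `POOL_SLOTS`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 8

/* One tag per slot: NULL means free, otherwise the allocating caller's label. */
static const char *slot_owner[POOL_SLOTS];

/* Claim the first free slot and remember who asked for it.
 * Returns the slot index, or -1 if the pool is exhausted. */
int pool_alloc(const char *caller)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (slot_owner[i] == NULL) {
            slot_owner[i] = caller;
            return i;
        }
    }
    return -1;
}

/* Release a slot; its owner tag is cleared. */
void pool_free(int slot)
{
    if (slot >= 0 && slot < POOL_SLOTS)
        slot_owner[slot] = NULL;
}

/* Walk the pool and count live slots allocated by the given caller. */
int pool_live_from(const char *caller)
{
    int n = 0;

    for (int i = 0; i < POOL_SLOTS; i++) {
        if (slot_owner[i] != NULL && strcmp(slot_owner[i], caller) == 0)
            n++;
    }
    return n;
}
```

A caller that shows up again and again in the walk of a steadily growing cache is the first suspect.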


* 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
From: Valdis.Kletnieks @ 2004-11-25  8:42 UTC
  To: Andrew Morton, Ingo Molnar; +Cc: linux-kernel

On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> Valdis.Kletnieks@vt.edu wrote:
> >
> > Any advice how to shoot this one?
> 
> Manfred's slab leak detector:

Ahh, many thanks - that helped quite a bit.  I tracked down the problem -
it was in Ingo's VP patch.

sys_ioperm() would allocate an 8K bitmap and save it in ->io_bitmap_ptr.
Then when we hit exit_thread(), Ingo's code would zero the pointer and *then*
pass the freshly-zero'ed pointer to kfree() - which of course did nothing
particularly interesting.  My fix was to save a copy of the pointer to
pass to kfree.  Am seeing no more leaks.

(Interestingly enough, I'd never have spotted this if it hadn't been for
a gkrellm/i8krellm bug that caused a fork-bomb of 50 or so 'i8kfan' processes
each time it trimmed the fan speed, and each i8kfan leaked an 8K io_bitmap...)

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>

--- linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c.memleak	2004-11-25 00:25:42.000000000 -0500
+++ linux-2.6.10-rc2-mm3/arch/i386/kernel/process.c	2004-11-25 02:15:09.000000000 -0500
@@ -344,10 +344,11 @@ void exit_thread(void)
 	if (unlikely(NULL != t->io_bitmap_ptr)) {
 		int cpu;
 		struct tss_struct *tss;
+		unsigned long *bitmap_ptr_copy = t->io_bitmap_ptr;
 
 		t->io_bitmap_ptr = NULL;
 		mb();
-		kfree(t->io_bitmap_ptr);
+		kfree(bitmap_ptr_copy);
 
 		cpu = get_cpu();
 		tss = &per_cpu(init_tss, cpu);
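
The same ordering bug can be reproduced in userspace, with free() standing in for kfree(); both do nothing when handed NULL. This is a sketch under that substitution: `struct thread_state`, `release_buggy` and `release_fixed` are hypothetical names that only mirror the shape of the code in the patch above.

```c
#include <stdlib.h>

/* Stand-in for the thread structure holding the I/O bitmap pointer. */
struct thread_state {
    unsigned long *io_bitmap_ptr;
};

/* Buggy order: the pointer is cleared first, so free() receives NULL
 * and the buffer is never released. Returns 1 if a real buffer was
 * handed to free(), 0 otherwise (here: always 0). */
int release_buggy(struct thread_state *t)
{
    int freed;

    t->io_bitmap_ptr = NULL;
    freed = (t->io_bitmap_ptr != NULL);
    free(t->io_bitmap_ptr);     /* free(NULL) is a no-op: buffer leaks */
    return freed;
}

/* Fixed order: keep a copy, clear the field, then free the copy. */
int release_fixed(struct thread_state *t)
{
    unsigned long *copy = t->io_bitmap_ptr;
    int freed = (copy != NULL);

    t->io_bitmap_ptr = NULL;
    free(copy);
    return freed;
}
```

Each buggy release strands one buffer, which matches the report above: every exiting i8kfan process left one 8K io_bitmap behind in size-8192.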


* Re: 2.6.10-rc2-mm3-V0.7.31-3 memory leak (was Re: Debugging a memory leak
From: Ingo Molnar @ 2004-11-26  1:14 UTC
  To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel


* Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> wrote:

> On Tue, 23 Nov 2004 15:38:58 PST, Andrew Morton said:
> > Valdis.Kletnieks@vt.edu wrote:
> > >
> > > Any advice how to shoot this one?
> > 
> > Manfred's slab leak detector:
> 
> Ahh, many thanks - that helped quite a bit.  I tracked down the
> problem - it was in Ingo's VP patch.
> 
> sys_ioperm() would allocate an 8K bitmap and save it in
> ->io_bitmap_ptr. Then when we hit exit_thread(), Ingo's code would
> zero the pointer and *then* pass the freshly-zero'ed pointer to
> kfree() - which of course did nothing particularly interesting.  My
> fix was to save a copy of the pointer to pass to kfree.  Am seeing no
> more leaks.

ah ... good catch - patch applied.

	Ingo
