2.6.0-test5/6 (and probably 7 too) size-4096 memory leak


* 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
@ 2003-10-16  2:55 Alberto Bertogli
  2003-10-16  4:19 ` Andrew Morton
  2003-10-17  5:56 ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Alberto Bertogli @ 2003-10-16  2:55 UTC
  To: linux-kernel


Hi there!

I want to report a memory leak for 2.6.0-test5 that I've noticed today on
a mail server after 32 days of uptime.

As I'm upgrading it tomorrow to test7 I wasn't going to report this until
verifying if the behaviour continued, but I saw on kernelnewbies that
others were having this issue with test7 too, so I decided to post a
report with the information before I reboot the server.

The attached files are gzipped for space reasons, and were taken at night
when the server isn't very loaded.

The workload is a simple sendmail with ipop3d and imapd, nothing much, for
about 6500 users; the machine is a dual Pentium III with 1 GB of RAM and a
couple of SCSI disks.

Slabinfo reports that size-4096 has 104341 active objects and growing.

On another box at home I see the same issue with test6, but "only" with
11612 objects; I'm not posting info on this box as I guess the mailserver
is much more important because the leak is really noticeable.

Please let me know if I can help with anything.

Thanks,
		Alberto
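
For scale, the object counts in the report can be turned into memory figures. A minimal sketch in C, assuming each size-4096 object occupies exactly 4096 bytes and ignoring slab management overhead; `slab_bytes` and `bytes_to_mib` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helper: bytes held by a slab cache, given the
 * active object count and the object size from /proc/slabinfo.
 */
unsigned long long slab_bytes(unsigned long long active_objs,
			      unsigned long long objsize)
{
	assert(objsize != 0);	/* a zero object size is a caller error */
	return active_objs * objsize;
}

/* Convert a byte count to whole MiB, rounding down. */
unsigned long long bytes_to_mib(unsigned long long bytes)
{
	return bytes >> 20;
}
```

With the figures above, slab_bytes(104341, 4096) is 427380736 bytes, about 407 MiB of the mail server's 1 GB; the 11612 objects on the home box come to about 45 MiB.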

[-- Attachment #2: config.gz --]
[-- Type: application/x-gunzip, Size: 5843 bytes --]

[-- Attachment #3: cpuinfo.gz --]
[-- Type: application/x-gunzip, Size: 298 bytes --]

[-- Attachment #4: free.gz --]
[-- Type: application/x-gunzip, Size: 154 bytes --]

[-- Attachment #5: meminfo.gz --]
[-- Type: application/x-gunzip, Size: 287 bytes --]

[-- Attachment #6: prev_dmesg.gz --]
[-- Type: application/x-gunzip, Size: 5440 bytes --]

[-- Attachment #7: ps_auxfww.gz --]
[-- Type: application/x-gunzip, Size: 1655 bytes --]

[-- Attachment #8: slabinfo.gz --]
[-- Type: application/x-gunzip, Size: 2115 bytes --]

[-- Attachment #9: sysrq+t.gz --]
[-- Type: application/x-gunzip, Size: 8945 bytes --]

[-- Attachment #10: uname.gz --]
[-- Type: application/x-gunzip, Size: 98 bytes --]

[-- Attachment #11: uptime.gz --]
[-- Type: application/x-gunzip, Size: 94 bytes --]

[-- Attachment #12: vmstat_5.gz --]
[-- Type: application/x-gunzip, Size: 1546 bytes --]


* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  2:55 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak Alberto Bertogli
@ 2003-10-16  4:19 ` Andrew Morton
  2003-10-16  4:43   ` William Lee Irwin III
  2003-10-17  5:56 ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-10-16  4:19 UTC
  To: Alberto Bertogli; +Cc: linux-kernel

Alberto Bertogli <albertogli@telpin.com.ar> wrote:
>
> I want to report a memory leak for 2.6.0-test5 that I've noticed today on
>  a mail server after 32 days of uptime.
> 
>  As I'm upgrading it tomorrow to test7 I wasn't going to report this until
>  verifying if the behaviour continued, but I saw on kernelnewbies that
>  others were having this issue with test7 too, so I decided to post a
>  report with the information before I reboot the server.
> 
>  The attached files are gzipped for space reasons, and were taken at night
>  when the server isn't very loaded.
> 
>  The workload is a simple sendmail with ipop3d and imapd, nothing much, for
>  about 6500 users; the machine is a dual Pentium III with 1 GB of RAM and a
>  couple of SCSI disks.
> 
>  Slabinfo reports that size-4096 has 104341 active objects and growing.
> 
>  On another box at home I see the same issue with test6, but "only" with
>  11612 objects; I'm not posting info on this box as I guess the mailserver
>  is much more important because the leak is really noticeable.

At least I'm not the only one; my main desktop machine does the same
thing.  It leaks two megabytes a day into size-4096, like clockwork.  It's
up to 43 megs now.

I was ignoring it and hoping it would go away.  Ho hum.  Tricky.


* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  4:19 ` Andrew Morton
@ 2003-10-16  4:43   ` William Lee Irwin III
  2003-10-16  4:58     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2003-10-16  4:43 UTC
  To: Andrew Morton; +Cc: Alberto Bertogli, linux-kernel

Alberto Bertogli <albertogli@telpin.com.ar> wrote:
>>  Slabinfo reports that size-4096 has 104341 active objects and growing.
>>  On another box at home I see the same issue with test6, but "only" with
>>  11612 objects; I'm not posting info on this box as I guess the mailserver
>>  is much more important because the leak is really noticeable.

On Wed, Oct 15, 2003 at 09:19:18PM -0700, Andrew Morton wrote:
> At least I'm not the only one; my main desktop machine does the same
> thing.  It leaks two megabytes a day into size-4096, like clockwork.  It's
> up to 43 megs now.
> I was ignoring it and hoping it would go away.  Ho hum.  Tricky.

I immediately thought of bundling this in with the do_exit() BUG() and
/proc/ oopsen, but we would see a task_t leak also in that case. I still
say the /proc/ change is swiss cheese (well, in concept there's nothing
wrong with what it wants to do, but there's something definitely wrong
with the implementation since backing it out stops things from oopsing),
but this looks unrelated therefore (which is actually depressing, since
we can't kill all three in one shot and/or get anywhere by correlating).
I should try using a dedicated stack slab to see if they're stacks even
though task_t's aren't leaking.

-- wli


* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  4:43   ` William Lee Irwin III
@ 2003-10-16  4:58     ` Andrew Morton
  2003-10-16  5:40       ` Manfred Spraul
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-10-16  4:58 UTC
  To: William Lee Irwin III; +Cc: albertogli, linux-kernel, Manfred Spraul

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Alberto Bertogli <albertogli@telpin.com.ar> wrote:
> >>  Slabinfo reports that size-4096 has 104341 active objects and growing.
> >>  On another box at home I see the same issue with test6, but "only" with
> >>  11612 objects; I'm not posting info on this box as I guess the mailserver
> >>  is much more important because the leak is really noticeable.
> 
> On Wed, Oct 15, 2003 at 09:19:18PM -0700, Andrew Morton wrote:
> > At least I'm not the only one; my main desktop machine does the same
> > thing.  It leaks two megabytes a day into size-4096, like clockwork.  It's
> > up to 43 megs now.
> > I was ignoring it and hoping it would go away.  Ho hum.  Tricky.
> 
> I immediately thought of bundling this in with the do_exit() BUG() and
> /proc/ oopsen, but we would see a task_t leak also in that case. I still
> say the /proc/ change is swiss cheese (well, in concept there's nothing
> wrong with what it wants to do, but there's something definitely wrong
> with the implementation since backing it out stops things from oopsing),
> but this looks unrelated therefore (which is actually depressing, since
> we can't kill all three in one shot and/or get anywhere by correlating).
> I should try using a dedicated stack slab to see if they're stacks even
> though task_t's aren't leaking.
> 

This leak is at least a couple of months old.

The recent (test7) /proc oops was fixed when Linus reverted the
job-control-in-signal-struct patch.

This leak is of size-4096: it isn't kernel stacks.

I did a quicky audit of all kmalloc(PAGE_SIZE) instances and everything
looked OK: one suspect in the NFS server but I wasn't able to force
speedier bloat by exercising the NFS server in my normal usage pattern.

I'm thinking we need to stuff __builtin_return_address(0) into the object and
write a dumper, but I haven't looked into that.  Maybe I can persuade
Manfred to cook up a custom patch to do that?  Just for size-4096?  Something
really crude will be fine.
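
The idea in the paragraph above, recording the allocating caller in each object and dumping whatever is still live, can be tried in userspace. This is only an illustration and not the kernel patch: `struct track_hdr`, `tracked_alloc`, `tracked_free` and `tracked_dump` are made-up names, and the sketch assumes a GCC or Clang compiler for `__builtin_return_address(0)` and the `noinline` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every tracked allocation. */
struct track_hdr {
	struct track_hdr *prev, *next;	/* links in the list of live objects */
	void *caller;			/* return address of the allocating call */
	size_t size;			/* payload size requested */
};

/* Circular list head; an empty list points at itself. */
static struct track_hdr live = { &live, &live, NULL, 0 };

/* noinline so that return address 0 is the real call site. */
__attribute__((noinline))
void *tracked_alloc(size_t size)
{
	struct track_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->caller = __builtin_return_address(0);
	h->size = size;
	h->prev = &live;
	h->next = live.next;
	live.next->prev = h;
	live.next = h;
	return h + 1;			/* payload starts after the header */
}

void tracked_free(void *p)
{
	struct track_hdr *h;

	if (!p)
		return;
	h = (struct track_hdr *)p - 1;
	h->prev->next = h->next;
	h->next->prev = h->prev;
	free(h);
}

/* The "dumper": print every live object with its caller, return the count. */
size_t tracked_dump(FILE *out)
{
	struct track_hdr *h;
	size_t n = 0;

	assert(out != NULL);
	for (h = live.next; h != &live; h = h->next) {
		fprintf(out, "obj %p (%zu bytes): caller %p\n",
			(void *)(h + 1), h->size, h->caller);
		n++;
	}
	return n;
}
```

Objects that were allocated and never freed show up in the dump together with the address they were allocated from, which can then be resolved to a symbol. The sketch spends a header per object; a kernel version has other places to keep the caller.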


* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  4:58     ` Andrew Morton
@ 2003-10-16  5:40       ` Manfred Spraul
  2003-10-16  6:31         ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Manfred Spraul @ 2003-10-16  5:40 UTC
  To: Andrew Morton; +Cc: William Lee Irwin III, albertogli, linux-kernel


Andrew Morton wrote:

>I'm thinking we need to stuff __builtin_return_address(0) into the object and
>write a dumper, but I haven't looked into that.  Maybe I can persuade
>Manfred to cook up a custom patch to do that?  Just for size-4096?  Something
>really crude will be fine.
>  
>
I've attached something: with the patch applied,
`echo "size-4096 0 0 0" > /proc/slabinfo` dumps all caller addresses.

It works fine and dumps all 25 outstanding objects of my bochs setup - 
you might have to limit the dumps if you have 100k objects.
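
With 100k objects the raw dump is unwieldy, so one option is to tally it by caller after the fact. A rough sketch in C, assuming the log lines follow the `obj %p/%d: %p` format printed by the attached patch, possibly behind a log-level prefix such as `<7>`; `struct caller_count` and `tally_line` are hypothetical names, and the addresses in the usage note are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row of the tally: a caller address and how many objects it owns. */
struct caller_count {
	unsigned long caller;
	unsigned long count;
};

/*
 * Parse one dump line of the form "obj <slab>/<index>: <caller>" and
 * add it to the table. Returns 1 if the line was counted, 0 if it did
 * not match or the table (capacity cap) is full.
 */
int tally_line(const char *line, struct caller_count *tab,
	       size_t *n, size_t cap)
{
	unsigned long slab, caller;
	int idx;
	size_t i;
	const char *p;

	assert(line && tab && n);
	p = strstr(line, "obj ");	/* skip any log-level prefix */
	if (!p || sscanf(p, "obj %lx/%d: %lx", &slab, &idx, &caller) != 3)
		return 0;

	for (i = 0; i < *n; i++) {
		if (tab[i].caller == caller) {
			tab[i].count++;
			return 1;
		}
	}
	if (*n == cap)
		return 0;
	tab[*n].caller = caller;
	tab[*n].count = 1;
	(*n)++;
	return 1;
}
```

Feeding it lines such as "obj c15f2000/0: c01a2b3c" builds a short table of distinct callers with counts; the caller owning most of the objects is the first place to look.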

--
    Manfred

[-- Attachment #2: patch-slab-extended-last-user --]
[-- Type: text/plain, Size: 1746 bytes --]

--- 2.6/mm/slab.c	2003-10-09 21:23:19.000000000 +0200
+++ build-2.6/mm/slab.c	2003-10-16 07:32:06.000000000 +0200
@@ -1891,6 +1891,15 @@
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
+	{
+		int objnr;
+		struct slab *slabp;
+
+		slabp = GET_PAGE_SLAB(virt_to_page(objp));
+
+		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		slab_bufctl(slabp)[objnr] = (int)caller;
+	}
 	objp += obj_dbghead(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON) {
 		unsigned long	ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -1952,12 +1961,14 @@
 		objnr = (objp - slabp->s_mem) / cachep->objsize;
 		check_slabp(cachep, slabp);
 #if DEBUG
+#if 0
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache '%s', objp %p.\n",
 						cachep->name, objp);
 			BUG();
 		}
 #endif
+#endif
 		slab_bufctl(slabp)[objnr] = slabp->free;
 		slabp->free = objnr;
 		STATS_DEC_ACTIVE(cachep);
@@ -2694,6 +2705,22 @@
 	.show	= s_show,
 };
 
+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+	struct list_head *q;
+
+	check_irq_on();
+	spin_lock_irq(&cachep->spinlock);
+	list_for_each(q,&cachep->lists.slabs_full) {
+		struct slab *slabp;
+		int i;
+		slabp = list_entry(q, struct slab, list);
+		for (i=0;i<cachep->num;i++)
+			printk(KERN_DEBUG "obj %p/%d: %p\n", slabp, i, (void*)(slab_bufctl(slabp)[i]));
+	}
+	spin_unlock_irq(&cachep->spinlock);
+}
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -2734,6 +2761,7 @@
 			    batchcount < 1 ||
 			    batchcount > limit ||
 			    shared < 0) {
+				do_dump_slabp(cachep);
 				res = -EINVAL;
 			} else {
 				res = do_tune_cpucache(cachep, limit, batchcount, shared);


* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  5:40       ` Manfred Spraul
@ 2003-10-16  6:31         ` Andrew Morton
  2003-10-16 15:13           ` Manfred Spraul
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-10-16  6:31 UTC
  To: Manfred Spraul; +Cc: wli, albertogli, linux-kernel

Manfred Spraul <manfred@colorfullife.com> wrote:
>
> I've attached something: with the patch applied,
> `echo "size-4096 0 0 0" > /proc/slabinfo` dumps all caller addresses.

Awesome, thanks.

I added some tweaks (why was it returning -EINVAL?).

Is there any reason why we shouldn't merge this up?



 mm/slab.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff -puN mm/slab.c~slab-leak-detector-tweaks mm/slab.c
--- 25/mm/slab.c~slab-leak-detector-tweaks	2003-10-15 23:11:19.000000000 -0700
+++ 25-akpm/mm/slab.c	2003-10-15 23:17:12.000000000 -0700
@@ -2708,6 +2708,7 @@ struct seq_operations slabinfo_op = {
 
 static void do_dump_slabp(kmem_cache_t *cachep)
 {
+#if DEBUG
 	struct list_head *q;
 
 	check_irq_on();
@@ -2716,10 +2717,17 @@ static void do_dump_slabp(kmem_cache_t *
 		struct slab *slabp;
 		int i;
 		slabp = list_entry(q, struct slab, list);
-		for (i=0;i<cachep->num;i++)
-			printk(KERN_DEBUG "obj %p/%d: %p\n", slabp, i, (void*)(slab_bufctl(slabp)[i]));
+		for (i = 0; i < cachep->num; i++) {
+			unsigned long sym = slab_bufctl(slabp)[i];
+
+			printk(KERN_DEBUG "obj %p/%d: %p",
+					slabp, i, (void *)sym);
+			print_symbol(" <%s>", sym);
+			printk("\n");
+		}
 	}
 	spin_unlock_irq(&cachep->spinlock);
+#endif
 }
 
 #define MAX_SLABINFO_WRITE 128
@@ -2763,9 +2771,10 @@ ssize_t slabinfo_write(struct file *file
 			    batchcount > limit ||
 			    shared < 0) {
 				do_dump_slabp(cachep);
-				res = -EINVAL;
+				res = 0;
 			} else {
-				res = do_tune_cpucache(cachep, limit, batchcount, shared);
+				res = do_tune_cpucache(cachep, limit,
+							batchcount, shared);
 			}
 			break;
 		}

_



* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  6:31         ` Andrew Morton
@ 2003-10-16 15:13           ` Manfred Spraul
  0 siblings, 0 replies; 8+ messages in thread
From: Manfred Spraul @ 2003-10-16 15:13 UTC
  To: Andrew Morton; +Cc: wli, albertogli, linux-kernel

Andrew Morton wrote:

>Manfred Spraul <manfred@colorfullife.com> wrote:
>  
>
>>I've attached something: with the patch applied,
>>`echo "size-4096 0 0 0" > /proc/slabinfo` dumps all caller addresses.
>>    
>>
>
>Awesome, thanks.
>
>I added some tweaks (why was it returning -EINVAL?).
>
>Is there any reason why we shouldn't merge this up?
>  
>
It works only on 32-bit archs, and I had to disable the double-free
detection (the hunk with the #if 0), because the bufctl integers were
already in use for that.
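
The 32-bit restriction comes from the `(int)caller` cast in the patch: an int-sized slot cannot hold a full pointer where pointers are wider than int. A small sketch in C that shows the effect with plain integers; `survives_int_slot` and `store_caller` are hypothetical names, and the addresses in the usage note are invented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if addr comes back unchanged after being squeezed
 * through an unsigned int, 0 if the upper bits were lost.
 */
int survives_int_slot(uintptr_t addr)
{
	unsigned int slot = (unsigned int)addr;	/* what an int-sized slot keeps */

	return (uintptr_t)slot == addr;
}

/* Store a caller address in an int-sized slot, refusing to truncate. */
unsigned int store_caller(uintptr_t addr)
{
	assert(survives_int_slot(addr));
	return (unsigned int)addr;
}
```

On a 32-bit build every address survives. On a typical 64-bit build an address such as 0xffffffff801a2b3c does not, so the dumped caller would be wrong; a slot of pointer width would be needed there.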


--
    Manfred





* Re: 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak
  2003-10-16  2:55 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak Alberto Bertogli
  2003-10-16  4:19 ` Andrew Morton
@ 2003-10-17  5:56 ` Andrew Morton
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2003-10-17  5:56 UTC
  To: Alberto Bertogli; +Cc: linux-kernel

Alberto Bertogli <albertogli@telpin.com.ar> wrote:
>
> I want to report a memory leak for 2.6.0-test5 that I've noticed today on
>  a mail server after 32 days of uptime.

'twas in ext3.

 fs/jbd/commit.c  |    8 ++++++++
 fs/jbd/journal.c |    2 ++
 2 files changed, 10 insertions(+)

diff -puN fs/jbd/commit.c~jbd-leak-fix fs/jbd/commit.c
--- 25/fs/jbd/commit.c~jbd-leak-fix	2003-10-16 21:47:28.000000000 -0700
+++ 25-akpm/fs/jbd/commit.c	2003-10-16 22:10:58.000000000 -0700
@@ -172,6 +172,14 @@ void journal_commit_transaction(journal_
 	while (commit_transaction->t_reserved_list) {
 		jh = commit_transaction->t_reserved_list;
 		JBUFFER_TRACE(jh, "reserved, unused: refile");
+		/*
+		 * A journal_get_undo_access()+journal_release_buffer() may
+		 * leave undo-committed data.
+		 */
+		if (jh->b_committed_data) {
+			kfree(jh->b_committed_data);
+			jh->b_committed_data = NULL;
+		}
 		journal_refile_buffer(journal, jh);
 	}
 
diff -puN fs/jbd/journal.c~jbd-leak-fix fs/jbd/journal.c
--- 25/fs/jbd/journal.c~jbd-leak-fix	2003-10-16 22:11:45.000000000 -0700
+++ 25-akpm/fs/jbd/journal.c	2003-10-16 22:11:56.000000000 -0700
@@ -1729,6 +1729,8 @@ static void __journal_remove_journal_hea
 			J_ASSERT_BH(bh, buffer_jbd(bh));
 			J_ASSERT_BH(bh, jh2bh(jh) == bh);
 			BUFFER_TRACE(bh, "remove journal_head");
+			J_ASSERT_BH(bh, !jh->b_frozen_data);
+			J_ASSERT_BH(bh, !jh->b_committed_data);
 			bh->b_private = NULL;
 			jh->b_bh = NULL;	/* debug, really */
 			clear_buffer_jbd(bh);

_
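
The shape of the bug the patch addresses can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not the jbd code: `struct toy_jh`, `toy_get_undo_access` and `toy_refile_reserved` are invented stand-ins, and the 4096-byte copy size is an assumption chosen to match the size-4096 cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096

/* Toy stand-in for a journal head that may carry a saved copy. */
struct toy_jh {
	char block[TOY_BLOCK_SIZE];
	void *committed_data;		/* saved copy, NULL if none */
	struct toy_jh *next;		/* next entry on the reserved list */
};

/* Stand-in for taking undo access: save a copy of the block. */
int toy_get_undo_access(struct toy_jh *jh)
{
	assert(jh != NULL);
	if (jh->committed_data)
		return 0;
	jh->committed_data = malloc(TOY_BLOCK_SIZE);
	if (!jh->committed_data)
		return -1;
	memcpy(jh->committed_data, jh->block, TOY_BLOCK_SIZE);
	return 0;
}

/*
 * Stand-in for the commit-time walk over reserved-but-unused buffers.
 * With apply_fix set, a leftover copy is freed before the buffer is
 * refiled, as the patch does. Returns how many entries still hold a
 * copy afterwards, i.e. how many allocations would be orphaned.
 */
size_t toy_refile_reserved(struct toy_jh *list, int apply_fix)
{
	struct toy_jh *jh;
	size_t leaked = 0;

	for (jh = list; jh; jh = jh->next) {
		if (apply_fix && jh->committed_data) {
			free(jh->committed_data);
			jh->committed_data = NULL;
		}
		if (jh->committed_data)
			leaked++;
	}
	return leaked;
}
```

Without the fix every reserved entry that took undo access keeps its copy; with the fix the count drops to zero. The two added J_ASSERT_BH lines in journal.c serve a similar purpose to checking that return value: they catch a journal head being torn down while it still owns a copy.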




Thread overview: 8+ messages in thread
2003-10-16  2:55 2.6.0-test5/6 (and probably 7 too) size-4096 memory leak Alberto Bertogli
2003-10-16  4:19 ` Andrew Morton
2003-10-16  4:43   ` William Lee Irwin III
2003-10-16  4:58     ` Andrew Morton
2003-10-16  5:40       ` Manfred Spraul
2003-10-16  6:31         ` Andrew Morton
2003-10-16 15:13           ` Manfred Spraul
2003-10-17  5:56 ` Andrew Morton
