Re: Recent 3.x kernels: Memory leak causing OOMs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: linux-raid@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Maxime Bizon <mbizon@freebox.fr>,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	David Rientjes <rientjes@google.com>
Subject: Re: Recent 3.x kernels: Memory leak causing OOMs
Date: Mon, 17 Mar 2014 18:07:48 +1100	[thread overview]
Message-ID: <20140317180748.644d30e2@notabene.brown> (raw)
In-Reply-To: <20140315101952.GT21483@n2100.arm.linux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 7819 bytes --]

On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > > 
> > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > > 
> > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > > 
> > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c)  r6:00000520 r5:c277bae0 r4:c68f37c0
> > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > > 
> > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > (512MB RAM), but sorry I don't have the full dump anymore.
> > > 
> > > I found a slow leaking process, and since I fixed that leak I now have
> > > uptime better than 7 days, *but* there was definitely some memory left
> > > when the OOM happened, so it appears to be related to fragmentation.
> > 
> > However, that's a side effect, not the cause - and a patch has been
> > merged to fix that OOM - but that doesn't explain where most of the
> > memory has gone!
> > 
> > I'm presently waiting for the machine to OOM again (it's probably going
> > to be something like another month) at which point I'll grab the files
> > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > /proc/slabinfo etc.)
> 
> For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> OOMs after a month or two of uptime.
> 
> Last night, it OOM'd severely again at around 5am... and rebooted soon
> after so we've lost any hope of recovering anything useful from the
> machine.
> 
> However, the new kernel re-ran the raid check, and...
> 
> md: data-check of RAID array md2
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 4194688k.
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: md2: data-check done.
> md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> md: data-check of RAID array md3
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 524544k.
> md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md3: data-check done.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> md: data-check of RAID array md5
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md5: data-check done.
> md: data-check of RAID array md4
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md4: data-check done.
> md: data-check of RAID array md6
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10409472k.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md6: data-check done.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> which totals 3077 of leaks.  So we have a memory leak.  Looking at
> the kmemleak file:
> 
> unreferenced object 0xc3c3f880 (size 256):
>   comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0  ................
>     00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<c008d4f0>] __save_stack_trace+0x34/0x40
>     [<c008d5f0>] create_object+0xf4/0x214
>     [<c02da114>] kmemleak_alloc+0x3c/0x6c
>     [<c008c0d4>] __kmalloc+0xd0/0x124
>     [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
>     [<c021206c>] r1buf_pool_alloc+0x40/0x148
>     [<c0061160>] mempool_alloc+0x54/0xfc
>     [<c0211938>] sync_request+0x168/0x85c
>     [<c021addc>] md_do_sync+0x75c/0xbc0
>     [<c021b594>] md_thread+0x138/0x154
>     [<c0037b48>] kthread+0xb0/0xbc
>     [<c0013190>] ret_from_fork+0x14/0x24
>     [<ffffffff>] 0xffffffff
> 
> with 3077 of these in the debug file.  3075 are for "md2_resync" and
> two are for "md4_resync".
> 
> /proc/slabinfo shows for this bucket:
> kmalloc-256         3237   3450    256   15    1 : tunables  120   60    0 : slabdata    230    230      0
> 
> but this would only account for about 800kB of memory usage, which itself
> is insignificant - so this is not the whole story.
> 
> It seems that this is the culpret for the allocations:
>         for (j = pi->raid_disks ; j-- ; ) {
>                 bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> 
> Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> (12 * 16 = 192) plus the size of struct bio, which would fall into this
> bucket.
> 
> I don't see anything obvious - it looks like it isn't every raid check
> which loses bios.  Not quite sure what to make of this right now.
> 

I can't see anything obvious either.

The bios allocated there are stored in a r1_bio and those pointers are never
changed.
If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
would complain that the pool wasn't completely freed.
And when the r1_bio is freed, all the bios are put as well.

I guess if something was calling bio_get() on the bio, then might stop the
bio_put from freeing the memory, but I cannot see anything that would do that.

I've tried testing on a recent mainline kernel and while kmemleak shows about
238 leaks from "swapper/0", there are none related to md or bios.

I'll let it run a while longer and see if anything pops.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

WARNING: multiple messages have this Message-ID (diff)

From: neilb@suse.de (NeilBrown)
To: linux-arm-kernel@lists.infradead.org
Subject: Recent 3.x kernels: Memory leak causing OOMs
Date: Mon, 17 Mar 2014 18:07:48 +1100	[thread overview]
Message-ID: <20140317180748.644d30e2@notabene.brown> (raw)
In-Reply-To: <20140315101952.GT21483@n2100.arm.linux.org.uk>

On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > > 
> > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > > 
> > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > > 
> > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c)  r6:00000520 r5:c277bae0 r4:c68f37c0
> > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > > 
> > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > (512MB RAM), but sorry I don't have the full dump anymore.
> > > 
> > > I found a slow leaking process, and since I fixed that leak I now have
> > > uptime better than 7 days, *but* there was definitely some memory left
> > > when the OOM happened, so it appears to be related to fragmentation.
> > 
> > However, that's a side effect, not the cause - and a patch has been
> > merged to fix that OOM - but that doesn't explain where most of the
> > memory has gone!
> > 
> > I'm presently waiting for the machine to OOM again (it's probably going
> > to be something like another month) at which point I'll grab the files
> > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > /proc/slabinfo etc.)
> 
> For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> OOMs after a month or two of uptime.
> 
> Last night, it OOM'd severely again at around 5am... and rebooted soon
> after so we've lost any hope of recovering anything useful from the
> machine.
> 
> However, the new kernel re-ran the raid check, and...
> 
> md: data-check of RAID array md2
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 4194688k.
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: md2: data-check done.
> md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> md: data-check of RAID array md3
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 524544k.
> md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md3: data-check done.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> md: data-check of RAID array md5
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md5: data-check done.
> md: data-check of RAID array md4
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md4: data-check done.
> md: data-check of RAID array md6
> md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10409472k.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md6: data-check done.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> which totals 3077 of leaks.  So we have a memory leak.  Looking at
> the kmemleak file:
> 
> unreferenced object 0xc3c3f880 (size 256):
>   comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0  ................
>     00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<c008d4f0>] __save_stack_trace+0x34/0x40
>     [<c008d5f0>] create_object+0xf4/0x214
>     [<c02da114>] kmemleak_alloc+0x3c/0x6c
>     [<c008c0d4>] __kmalloc+0xd0/0x124
>     [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
>     [<c021206c>] r1buf_pool_alloc+0x40/0x148
>     [<c0061160>] mempool_alloc+0x54/0xfc
>     [<c0211938>] sync_request+0x168/0x85c
>     [<c021addc>] md_do_sync+0x75c/0xbc0
>     [<c021b594>] md_thread+0x138/0x154
>     [<c0037b48>] kthread+0xb0/0xbc
>     [<c0013190>] ret_from_fork+0x14/0x24
>     [<ffffffff>] 0xffffffff
> 
> with 3077 of these in the debug file.  3075 are for "md2_resync" and
> two are for "md4_resync".
> 
> /proc/slabinfo shows for this bucket:
> kmalloc-256         3237   3450    256   15    1 : tunables  120   60    0 : slabdata    230    230      0
> 
> but this would only account for about 800kB of memory usage, which itself
> is insignificant - so this is not the whole story.
> 
> It seems that this is the culpret for the allocations:
>         for (j = pi->raid_disks ; j-- ; ) {
>                 bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> 
> Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> (12 * 16 = 192) plus the size of struct bio, which would fall into this
> bucket.
> 
> I don't see anything obvious - it looks like it isn't every raid check
> which loses bios.  Not quite sure what to make of this right now.
> 

I can't see anything obvious either.

The bios allocated there are stored in a r1_bio and those pointers are never
changed.
If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
would complain that the pool wasn't completely freed.
And when the r1_bio is freed, all the bios are put as well.

I guess if something was calling bio_get() on the bio, then might stop the
bio_put from freeing the memory, but I cannot see anything that would do that.

I've tried testing on a recent mainline kernel and while kmemleak shows about
238 leaks from "swapper/0", there are none related to md or bios.

I'll let it run a while longer and see if anything pops.

NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 828 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140317/505caedf/attachment-0001.sig>

next prev parent reply	other threads:[~2014-03-17  7:07 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-16 20:05 Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
2014-02-16 20:05 ` Russell King - ARM Linux
2014-02-16 21:43 ` Theodore Ts'o
2014-02-16 21:43   ` Theodore Ts'o
2014-02-17 18:34   ` Catalin Marinas
2014-02-17 18:34     ` Catalin Marinas
2014-02-16 22:17 ` David Rientjes
2014-02-16 22:17   ` David Rientjes
2014-02-16 22:50   ` Russell King - ARM Linux
2014-02-16 22:50     ` Russell King - ARM Linux
2014-02-16 23:42     ` David Rientjes
2014-02-16 23:42       ` David Rientjes
2014-02-17 21:02     ` Maxime Bizon
2014-02-17 21:02       ` Maxime Bizon
2014-02-17 21:09       ` Russell King - ARM Linux
2014-02-17 21:09         ` Russell King - ARM Linux
2014-03-15 10:19         ` Russell King - ARM Linux
2014-03-15 10:19           ` Russell King - ARM Linux
2014-03-17  7:07           ` NeilBrown [this message]
2014-03-17  7:07             ` NeilBrown
2014-03-17  8:51             ` Russell King - ARM Linux
2014-03-17  8:51               ` Russell King - ARM Linux
2014-03-17  8:51               ` Russell King - ARM Linux
2014-03-17 18:18             ` Catalin Marinas
2014-03-17 18:18               ` Catalin Marinas
2014-03-17 19:33               ` Russell King - ARM Linux
2014-03-17 19:33                 ` Russell King - ARM Linux
2014-04-01  9:19                 ` Russell King - ARM Linux
2014-04-01  9:19                   ` Russell King - ARM Linux
2014-04-01 11:38                   ` Russell King - ARM Linux
2014-04-01 11:38                     ` Russell King - ARM Linux
2014-04-01 14:04                     ` Russell King - ARM Linux
2014-04-01 14:04                       ` Russell King - ARM Linux
2014-04-02 23:28                       ` NeilBrown
2014-04-02 23:28                         ` NeilBrown
2014-04-01 15:58                     ` Catalin Marinas
2014-04-01 15:58                       ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140317180748.644d30e2@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mbizon@freebox.fr \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.