linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Maxime Bizon <mbizon@freebox.fr>,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	David Rientjes <rientjes@google.com>
Subject: Re: Recent 3.x kernels: Memory leak causing OOMs
Date: Mon, 17 Mar 2014 08:51:06 +0000	[thread overview]
Message-ID: <20140317085105.GZ21483@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <20140317180748.644d30e2@notabene.brown>

On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> 
> > On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > > > 
> > > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > > > 
> > > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > > > 
> > > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c)  r6:00000520 r5:c277bae0 r4:c68f37c0
> > > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > > > 
> > > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > > (512MB RAM), but sorry I don't have the full dump anymore.
> > > > 
> > > > I found a slow leaking process, and since I fixed that leak I now have
> > > > uptime better than 7 days, *but* there was definitely some memory left
> > > > when the OOM happened, so it appears to be related to fragmentation.
> > > 
> > > However, that's a side effect, not the cause - and a patch has been
> > > merged to fix that OOM - but that doesn't explain where most of the
> > > memory has gone!
> > > 
> > > I'm presently waiting for the machine to OOM again (it's probably going
> > > to be something like another month) at which point I'll grab the files
> > > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > > /proc/slabinfo etc.)
> > 
> > For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> > OOMs after a month or two of uptime.
> > 
> > Last night, it OOM'd severely again at around 5am... and rebooted soon
> > after so we've lost any hope of recovering anything useful from the
> > machine.
> > 
> > However, the new kernel re-ran the raid check, and...
> > 
> > md: data-check of RAID array md2
> > md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 4194688k.
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: md2: data-check done.
> > md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> > md: data-check of RAID array md3
> > md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 524544k.
> > md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> > kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md3: data-check done.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> > md: data-check of RAID array md5
> > md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md5: data-check done.
> > md: data-check of RAID array md4
> > md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md4: data-check done.
> > md: data-check of RAID array md6
> > md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10409472k.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md6: data-check done.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > 
> > which totals 3077 of leaks.  So we have a memory leak.  Looking at
> > the kmemleak file:
> > 
> > unreferenced object 0xc3c3f880 (size 256):
> >   comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> >   hex dump (first 32 bytes):
> >     00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0  ................
> >     00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<c008d4f0>] __save_stack_trace+0x34/0x40
> >     [<c008d5f0>] create_object+0xf4/0x214
> >     [<c02da114>] kmemleak_alloc+0x3c/0x6c
> >     [<c008c0d4>] __kmalloc+0xd0/0x124
> >     [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> >     [<c021206c>] r1buf_pool_alloc+0x40/0x148
> >     [<c0061160>] mempool_alloc+0x54/0xfc
> >     [<c0211938>] sync_request+0x168/0x85c
> >     [<c021addc>] md_do_sync+0x75c/0xbc0
> >     [<c021b594>] md_thread+0x138/0x154
> >     [<c0037b48>] kthread+0xb0/0xbc
> >     [<c0013190>] ret_from_fork+0x14/0x24
> >     [<ffffffff>] 0xffffffff
> > 
> > with 3077 of these in the debug file.  3075 are for "md2_resync" and
> > two are for "md4_resync".
> > 
> > /proc/slabinfo shows for this bucket:
> > kmalloc-256         3237   3450    256   15    1 : tunables  120   60    0 : slabdata    230    230      0
> > 
> > but this would only account for about 800kB of memory usage, which itself
> > is insignificant - so this is not the whole story.
> > 
> > It seems that this is the culpret for the allocations:
> >         for (j = pi->raid_disks ; j-- ; ) {
> >                 bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > 
> > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > bucket.
> > 
> > I don't see anything obvious - it looks like it isn't every raid check
> > which loses bios.  Not quite sure what to make of this right now.
> > 
> 
> I can't see anything obvious either.
> 
> The bios allocated there are stored in a r1_bio and those pointers are never
> changed.
> If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> would complain that the pool wasn't completely freed.
> And when the r1_bio is freed, all the bios are put as well.
> 
> I guess if something was calling bio_get() on the bio, then might stop the
> bio_put from freeing the memory, but I cannot see anything that would do that.
> 
> I've tried testing on a recent mainline kernel and while kmemleak shows about
> 238 leaks from "swapper/0", there are none related to md or bios.
> 
> I'll let it run a while longer and see if anything pops.

I think the interesting detail from the above is that seems a little random
- which suggests some kind of race maybe.  There are three 10G partitions,
but only one of those leaks two BIOs, but a 4G partition leaked 3075 BIOs.
md2 is /usr and md4 is /home.  Maybe it's related to other IO happening
during the check?

The underlying devices for all the raid1s are PATA (IT821x) using the ata
driver.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

  reply	other threads:[~2014-03-17  8:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20140216200503.GN30257@n2100.arm.linux.org.uk>
     [not found] ` <alpine.DEB.2.02.1402161406120.26926@chino.kir.corp.google.com>
     [not found]   ` <20140216225000.GO30257@n2100.arm.linux.org.uk>
     [not found]     ` <1392670951.24429.10.camel@sakura.staff.proxad.net>
     [not found]       ` <20140217210954.GA21483@n2100.arm.linux.org.uk>
2014-03-15 10:19         ` Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
2014-03-17  7:07           ` NeilBrown
2014-03-17  8:51             ` Russell King - ARM Linux [this message]
2014-03-17 18:18             ` Catalin Marinas
2014-03-17 19:33               ` Russell King - ARM Linux
2014-04-01  9:19                 ` Russell King - ARM Linux
2014-04-01 11:38                   ` Russell King - ARM Linux
2014-04-01 14:04                     ` Russell King - ARM Linux
2014-04-02 23:28                       ` NeilBrown
2014-04-01 15:58                     ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140317085105.GZ21483@n2100.arm.linux.org.uk \
    --to=linux@arm.linux.org.uk \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=mbizon@freebox.fr \
    --cc=neilb@suse.de \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).