* Re: Recent 3.x kernels: Memory leak causing OOMs
[not found] ` <20140217210954.GA21483@n2100.arm.linux.org.uk>
@ 2014-03-15 10:19 ` Russell King - ARM Linux
2014-03-17 7:07 ` NeilBrown
0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-03-15 10:19 UTC (permalink / raw)
To: Neil Brown, linux-raid, Linus Torvalds
Cc: Maxime Bizon, linux-mm, linux-arm-kernel, David Rientjes
On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> >
> > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> >
> > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> >
> > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> >
> > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > (512MB RAM), but sorry I don't have the full dump anymore.
> >
> > I found a slow leaking process, and since I fixed that leak I now have
> > uptime better than 7 days, *but* there was definitely some memory left
> > when the OOM happened, so it appears to be related to fragmentation.
>
> However, that's a side effect, not the cause - and a patch has been
> merged to fix that OOM - but that doesn't explain where most of the
> memory has gone!
>
> I'm presently waiting for the machine to OOM again (it's probably going
> to be something like another month) at which point I'll grab the files
> people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> /proc/slabinfo etc.)
For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
OOMs after a month or two of uptime.
Last night, it OOM'd severely again at around 5am... and rebooted soon
after so we've lost any hope of recovering anything useful from the
machine.
However, the new kernel re-ran the raid check, and...
md: data-check of RAID array md2
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 4194688k.
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
md: md2: data-check done.
md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
md: data-check of RAID array md3
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 524544k.
md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md3: data-check done.
md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
md: data-check of RAID array md5
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10486080k.
kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md5: data-check done.
md: data-check of RAID array md4
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10486080k.
md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md4: data-check done.
md: data-check of RAID array md6
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10409472k.
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md6: data-check done.
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
which totals 3077 leaks. So we have a memory leak. Looking at
the kmemleak file:
unreferenced object 0xc3c3f880 (size 256):
comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<c008d4f0>] __save_stack_trace+0x34/0x40
[<c008d5f0>] create_object+0xf4/0x214
[<c02da114>] kmemleak_alloc+0x3c/0x6c
[<c008c0d4>] __kmalloc+0xd0/0x124
[<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
[<c021206c>] r1buf_pool_alloc+0x40/0x148
[<c0061160>] mempool_alloc+0x54/0xfc
[<c0211938>] sync_request+0x168/0x85c
[<c021addc>] md_do_sync+0x75c/0xbc0
[<c021b594>] md_thread+0x138/0x154
[<c0037b48>] kthread+0xb0/0xbc
[<c0013190>] ret_from_fork+0x14/0x24
[<ffffffff>] 0xffffffff
with 3077 of these in the debug file. 3075 are for "md2_resync" and
two are for "md4_resync".
/proc/slabinfo shows for this bucket:
kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
but this would only account for about 800kB of memory usage, which itself
is insignificant - so this is not the whole story.
It seems that this is the culprit for the allocations:
for (j = pi->raid_disks ; j-- ; ) {
bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
Since RESYNC_PAGES will be 64K/4K = 16, each allocation is 16 struct
bio_vecs at 12 bytes each (12 * 16 = 192) plus the size of struct bio,
which would fall into this bucket.
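As a quick back-of-the-envelope check of that bucket claim, here is a
throwaway user-space sketch of the arithmetic (the 12-byte bio_vec figure
is taken from above; the 64-byte struct bio and the power-of-two rounding
are assumptions, not measured values):

#include <stdio.h>

int main(void)
{
        const unsigned resync_pages = 65536 / 4096;  /* RESYNC_PAGES = 16 */
        const unsigned bio_vec_size = 12;            /* figure quoted above */
        const unsigned assumed_bio_size = 64;        /* assumption: 32-bit ARM */
        unsigned total = assumed_bio_size + resync_pages * bio_vec_size;
        unsigned bucket = 32;

        /* kmalloc rounds the request up to a size class; treat the
         * classes as powers of two for this rough check */
        while (bucket < total)
                bucket <<= 1;

        printf("%u bytes requested -> kmalloc-%u\n", total, bucket);
        return 0;
}

which prints "256 bytes requested -> kmalloc-256", matching the slabinfo
line above.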
I don't see anything obvious - it looks like not every raid check loses
bios. Not quite sure what to make of this right now.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-15 10:19 ` Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
@ 2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
0 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2014-03-17 7:07 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: linux-raid, Linus Torvalds, Maxime Bizon, linux-mm,
linux-arm-kernel, David Rientjes
[-- Attachment #1: Type: text/plain, Size: 7819 bytes --]
On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > >
> > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > >
> > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > >
> > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > >
> > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > (512MB RAM), but sorry I don't have the full dump anymore.
> > >
> > > I found a slow leaking process, and since I fixed that leak I now have
> > > uptime better than 7 days, *but* there was definitely some memory left
> > > when the OOM happened, so it appears to be related to fragmentation.
> >
> > However, that's a side effect, not the cause - and a patch has been
> > merged to fix that OOM - but that doesn't explain where most of the
> > memory has gone!
> >
> > I'm presently waiting for the machine to OOM again (it's probably going
> > to be something like another month) at which point I'll grab the files
> > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > /proc/slabinfo etc.)
>
> For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> OOMs after a month or two of uptime.
>
> Last night, it OOM'd severely again at around 5am... and rebooted soon
> after so we've lost any hope of recovering anything useful from the
> machine.
>
> However, the new kernel re-ran the raid check, and...
>
> md: data-check of RAID array md2
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 4194688k.
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: md2: data-check done.
> md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> md: data-check of RAID array md3
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 524544k.
> md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md3: data-check done.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> md: data-check of RAID array md5
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md5: data-check done.
> md: data-check of RAID array md4
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md4: data-check done.
> md: data-check of RAID array md6
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10409472k.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md6: data-check done.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> which totals 3077 leaks. So we have a memory leak. Looking at
> the kmemleak file:
>
> unreferenced object 0xc3c3f880 (size 256):
> comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<c008d4f0>] __save_stack_trace+0x34/0x40
> [<c008d5f0>] create_object+0xf4/0x214
> [<c02da114>] kmemleak_alloc+0x3c/0x6c
> [<c008c0d4>] __kmalloc+0xd0/0x124
> [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> [<c021206c>] r1buf_pool_alloc+0x40/0x148
> [<c0061160>] mempool_alloc+0x54/0xfc
> [<c0211938>] sync_request+0x168/0x85c
> [<c021addc>] md_do_sync+0x75c/0xbc0
> [<c021b594>] md_thread+0x138/0x154
> [<c0037b48>] kthread+0xb0/0xbc
> [<c0013190>] ret_from_fork+0x14/0x24
> [<ffffffff>] 0xffffffff
>
> with 3077 of these in the debug file. 3075 are for "md2_resync" and
> two are for "md4_resync".
>
> /proc/slabinfo shows for this bucket:
> kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
>
> but this would only account for about 800kB of memory usage, which itself
> is insignificant - so this is not the whole story.
>
> It seems that this is the culprit for the allocations:
> for (j = pi->raid_disks ; j-- ; ) {
> bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
>
> Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> (12 * 16 = 192) plus the size of struct bio, which would fall into this
> bucket.
>
> I don't see anything obvious - it looks like it isn't every raid check
> which loses bios. Not quite sure what to make of this right now.
>
I can't see anything obvious either.
The bios allocated there are stored in a r1_bio and those pointers are never
changed.
If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
would complain that the pool wasn't completely freed.
And when the r1_bio is freed, all the bios are put as well.
I guess if something was calling bio_get() on the bio, that might stop the
bio_put() from freeing the memory, but I cannot see anything that would do that.
I've tried testing on a recent mainline kernel and while kmemleak shows about
238 leaks from "swapper/0", there are none related to md or bios.
I'll let it run a while longer and see if anything pops.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 7:07 ` NeilBrown
@ 2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
1 sibling, 0 replies; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-03-17 8:51 UTC (permalink / raw)
To: NeilBrown
Cc: linux-raid, Linus Torvalds, Maxime Bizon, linux-mm,
linux-arm-kernel, David Rientjes
On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>
> > On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > > >
> > > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > > >
> > > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > > >
> > > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > > >
> > > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > > (512MB RAM), but sorry I don't have the full dump anymore.
> > > >
> > > > I found a slow leaking process, and since I fixed that leak I now have
> > > > uptime better than 7 days, *but* there was definitely some memory left
> > > > when the OOM happened, so it appears to be related to fragmentation.
> > >
> > > However, that's a side effect, not the cause - and a patch has been
> > > merged to fix that OOM - but that doesn't explain where most of the
> > > memory has gone!
> > >
> > > I'm presently waiting for the machine to OOM again (it's probably going
> > > to be something like another month) at which point I'll grab the files
> > > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > > /proc/slabinfo etc.)
> >
> > For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> > OOMs after a month or two of uptime.
> >
> > Last night, it OOM'd severely again at around 5am... and rebooted soon
> > after so we've lost any hope of recovering anything useful from the
> > machine.
> >
> > However, the new kernel re-ran the raid check, and...
> >
> > md: data-check of RAID array md2
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 4194688k.
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: md2: data-check done.
> > md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> > md: data-check of RAID array md3
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 524544k.
> > md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> > kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md3: data-check done.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> > md: data-check of RAID array md5
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md5: data-check done.
> > md: data-check of RAID array md4
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md4: data-check done.
> > md: data-check of RAID array md6
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10409472k.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md6: data-check done.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> >
> > which totals 3077 leaks. So we have a memory leak. Looking at
> > the kmemleak file:
> >
> > unreferenced object 0xc3c3f880 (size 256):
> > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace:
> > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > [<c008d5f0>] create_object+0xf4/0x214
> > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > [<c008c0d4>] __kmalloc+0xd0/0x124
> > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > [<c0061160>] mempool_alloc+0x54/0xfc
> > [<c0211938>] sync_request+0x168/0x85c
> > [<c021addc>] md_do_sync+0x75c/0xbc0
> > [<c021b594>] md_thread+0x138/0x154
> > [<c0037b48>] kthread+0xb0/0xbc
> > [<c0013190>] ret_from_fork+0x14/0x24
> > [<ffffffff>] 0xffffffff
> >
> > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > two are for "md4_resync".
> >
> > /proc/slabinfo shows for this bucket:
> > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> >
> > but this would only account for about 800kB of memory usage, which itself
> > is insignificant - so this is not the whole story.
> >
> > It seems that this is the culprit for the allocations:
> > for (j = pi->raid_disks ; j-- ; ) {
> > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> >
> > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > bucket.
> >
> > I don't see anything obvious - it looks like it isn't every raid check
> > which loses bios. Not quite sure what to make of this right now.
> >
>
> I can't see anything obvious either.
>
> The bios allocated there are stored in a r1_bio and those pointers are never
> changed.
> If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> would complain that the pool wasn't completely freed.
> And when the r1_bio is freed, all the bios are put as well.
>
> I guess if something was calling bio_get() on the bio, then might stop the
> bio_put from freeing the memory, but I cannot see anything that would do that.
>
> I've tried testing on a recent mainline kernel and while kmemleak shows about
> 238 leaks from "swapper/0", there are none related to md or bios.
>
> I'll let it run a while longer and see if anything pops.
I think the interesting detail from the above is that it seems a little
random - which maybe suggests some kind of race. There are three 10G
partitions, yet only one of those leaked two BIOs, while a 4G partition
leaked 3075 BIOs. md2 is /usr and md4 is /home. Maybe it's related to
other IO happening during the check?
The underlying devices for all the raid1s are PATA (IT821x) using the ata
driver.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
@ 2014-03-17 18:18 ` Catalin Marinas
2014-03-17 19:33 ` Russell King - ARM Linux
1 sibling, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2014-03-17 18:18 UTC (permalink / raw)
To: NeilBrown
Cc: Russell King - ARM Linux, linux-raid, linux-mm, David Rientjes,
Maxime Bizon, Linus Torvalds, linux-arm-kernel
On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > unreferenced object 0xc3c3f880 (size 256):
> > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace:
> > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > [<c008d5f0>] create_object+0xf4/0x214
> > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > [<c008c0d4>] __kmalloc+0xd0/0x124
> > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > [<c0061160>] mempool_alloc+0x54/0xfc
> > [<c0211938>] sync_request+0x168/0x85c
> > [<c021addc>] md_do_sync+0x75c/0xbc0
> > [<c021b594>] md_thread+0x138/0x154
> > [<c0037b48>] kthread+0xb0/0xbc
> > [<c0013190>] ret_from_fork+0x14/0x24
> > [<ffffffff>] 0xffffffff
> >
> > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > two are for "md4_resync".
> >
> > /proc/slabinfo shows for this bucket:
> > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> >
> > but this would only account for about 800kB of memory usage, which itself
> > is insignificant - so this is not the whole story.
> >
> > It seems that this is the culprit for the allocations:
> > for (j = pi->raid_disks ; j-- ; ) {
> > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> >
> > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > bucket.
> >
> > I don't see anything obvious - it looks like it isn't every raid check
> > which loses bios. Not quite sure what to make of this right now.
>
> I can't see anything obvious either.
>
> The bios allocated there are stored in a r1_bio and those pointers are never
> changed.
> If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> would complain that the pool wasn't completely freed.
> And when the r1_bio is freed, all the bios are put as well.
It could be a false positive: there are areas that kmemleak doesn't scan,
like page allocations, so the pointer reference graph it tries to build
can miss references.
What's interesting to see is the first few leaks reported, as they are
always reported in the order of allocation. In this case, the pointer
returned by bio_kmalloc() is stored in the r1_bio. Is the r1_bio
reported as a leak as well?
The sync_request() function eventually gets rid of the r1_bio as it is a
variable on the stack. But it is stored in a bio->bi_private variable
and that's where I lost track of where pointers are referenced from.
A simple way to check whether it's a false positive is to do a:
echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
If an object was reported as a leak but kmemleak later no longer knows
about it, it means that it was freed and hence a false positive (maybe I
should add a warning in kmemleak if a certain number of previously
reported leaked objects are later freed).
--
Catalin
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 18:18 ` Catalin Marinas
@ 2014-03-17 19:33 ` Russell King - ARM Linux
2014-04-01 9:19 ` Russell King - ARM Linux
0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-03-17 19:33 UTC (permalink / raw)
To: Catalin Marinas
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
Linus Torvalds, linux-arm-kernel
On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > unreferenced object 0xc3c3f880 (size 256):
> > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > hex dump (first 32 bytes):
> > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > backtrace:
> > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > [<c008d5f0>] create_object+0xf4/0x214
> > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > [<c0211938>] sync_request+0x168/0x85c
> > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > [<c021b594>] md_thread+0x138/0x154
> > > [<c0037b48>] kthread+0xb0/0xbc
> > > [<c0013190>] ret_from_fork+0x14/0x24
> > > [<ffffffff>] 0xffffffff
> > >
> > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > two are for "md4_resync".
> > >
> > > /proc/slabinfo shows for this bucket:
> > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > >
> > > but this would only account for about 800kB of memory usage, which itself
> > > is insignificant - so this is not the whole story.
> > >
> > > It seems that this is the culprit for the allocations:
> > > for (j = pi->raid_disks ; j-- ; ) {
> > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > >
> > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > bucket.
> > >
> > > I don't see anything obvious - it looks like it isn't every raid check
> > > which loses bios. Not quite sure what to make of this right now.
> >
> > I can't see anything obvious either.
> >
> > The bios allocated there are stored in a r1_bio and those pointers are never
> > changed.
> > If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> > would complain that the pool wasn't completely freed.
> > And when the r1_bio is freed, all the bios are put as well.
>
> It could be a false positive, there are areas that kmemleak doesn't scan
> like page allocations and the pointer reference graph it tries to build
> would fail.
>
> What's interesting to see is the first few leaks reported as they are
> always reported in the order of allocation. In this case, the
> bio_kmalloc() returned pointer is stored in r1_bio. Is the r1_bio
> reported as a leak as well?
I'd assume that something else would likely have a different size.
All leaks are of 256 bytes. Also...
$ grep kmemleak_alloc kmemleak-20140315 -A2 |sort | uniq -c |less
3081 --
3082 [<c008c0d4>] __kmalloc+0xd0/0x124
3082 [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
3082 [<c02da114>] kmemleak_alloc+0x3c/0x6c
seems pretty conclusive that it's just one spot.
> The sync_request() function eventually gets rid of the r1_bio as it is a
> variable on the stack. But it is stored in a bio->bi_private variable
> and that's where I lost track of where pointers are referenced from.
>
> A simple way to check whether it's a false positive is to do a:
>
> echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
>
> If an object was reported as a leak but later on kmemleak doesn't know
> about it, it means that it was freed and hence a false positive (maybe I
> should add this as a warning in kmemleak if certain amount of leaked
> objects freeing is detected).
So doing that with the above leaked bio produces:
kmemleak: Object 0xc3c3f880 (size 256):
kmemleak: comm "md2_resync", pid 4680, jiffies 638245
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x3
kmemleak: checksum = 1042746691
kmemleak: backtrace:
[<c008d4f0>] __save_stack_trace+0x34/0x40
[<c008d5f0>] create_object+0xf4/0x214
[<c02da114>] kmemleak_alloc+0x3c/0x6c
[<c008c0d4>] __kmalloc+0xd0/0x124
[<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
[<c021206c>] r1buf_pool_alloc+0x40/0x148
[<c0061160>] mempool_alloc+0x54/0xfc
[<c0211938>] sync_request+0x168/0x85c
[<c021addc>] md_do_sync+0x75c/0xbc0
[<c021b594>] md_thread+0x138/0x154
[<c0037b48>] kthread+0xb0/0xbc
[<c0013190>] ret_from_fork+0x14/0x24
[<ffffffff>] 0xffffffff
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 19:33 ` Russell King - ARM Linux
@ 2014-04-01 9:19 ` Russell King - ARM Linux
2014-04-01 11:38 ` Russell King - ARM Linux
0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 9:19 UTC (permalink / raw)
To: Catalin Marinas, Linus Torvalds
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
linux-arm-kernel
Right, so the machine has died again this morning.
Excuse me if I ignore this merge window; I seem to have some debugging
to do on my own, because no one else is interested in my reports of this,
and I need to get this fixed. Modern Linux is totally unusable for me
at the moment.
On Mon, Mar 17, 2014 at 07:33:16PM +0000, Russell King - ARM Linux wrote:
> On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> > On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > > <linux@arm.linux.org.uk> wrote:
> > > > unreferenced object 0xc3c3f880 (size 256):
> > > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > > hex dump (first 32 bytes):
> > > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > > backtrace:
> > > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > > [<c008d5f0>] create_object+0xf4/0x214
> > > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > > [<c0211938>] sync_request+0x168/0x85c
> > > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > > [<c021b594>] md_thread+0x138/0x154
> > > > [<c0037b48>] kthread+0xb0/0xbc
> > > > [<c0013190>] ret_from_fork+0x14/0x24
> > > > [<ffffffff>] 0xffffffff
> > > >
> > > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > > two are for "md4_resync".
> > > >
> > > > /proc/slabinfo shows for this bucket:
> > > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > > >
> > > > but this would only account for about 800kB of memory usage, which itself
> > > > is insignificant - so this is not the whole story.
> > > >
> > > > It seems that this is the culprit for the allocations:
> > > > for (j = pi->raid_disks ; j-- ; ) {
> > > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > > >
> > > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > > bucket.
> > > >
> > > > I don't see anything obvious - it looks like it isn't every raid check
> > > > which loses bios. Not quite sure what to make of this right now.
> > >
> > > I can't see anything obvious either.
> > >
> > > The bios allocated there are stored in a r1_bio and those pointers are never
> > > changed.
> > > If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> > > would complain that the pool wasn't completely freed.
> > > And when the r1_bio is freed, all the bios are put as well.
> >
> > It could be a false positive, there are areas that kmemleak doesn't scan
> > like page allocations and the pointer reference graph it tries to build
> > would fail.
> >
> > What's interesting to see is the first few leaks reported as they are
> > always reported in the order of allocation. In this case, the
> > bio_kmalloc() returned pointer is stored in r1_bio. Is the r1_bio
> > reported as a leak as well?
>
> I'd assume that something else would likely have a different size.
> All leaks are of 256 bytes. Also...
>
> $ grep kmemleak_alloc kmemleak-20140315 -A2 |sort | uniq -c |less
> 3081 --
> 3082 [<c008c0d4>] __kmalloc+0xd0/0x124
> 3082 [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> 3082 [<c02da114>] kmemleak_alloc+0x3c/0x6c
>
> seems pretty conclusive that it's just one spot.
>
> > The sync_request() function eventually gets rid of the r1_bio as it is a
> > variable on the stack. But it is stored in a bio->bi_private variable
> > and that's where I lost track of where pointers are referenced from.
> >
> > A simple way to check whether it's a false positive is to do a:
> >
> > echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
> >
> > If an object was reported as a leak but later on kmemleak doesn't know
> > about it, it means that it was freed and hence a false positive (maybe I
> > should add this as a warning in kmemleak if certain amount of leaked
> > objects freeing is detected).
>
> So doing that with the above leaked bio produces:
>
> kmemleak: Object 0xc3c3f880 (size 256):
> kmemleak: comm "md2_resync", pid 4680, jiffies 638245
> kmemleak: min_count = 1
> kmemleak: count = 0
> kmemleak: flags = 0x3
> kmemleak: checksum = 1042746691
> kmemleak: backtrace:
> [<c008d4f0>] __save_stack_trace+0x34/0x40
> [<c008d5f0>] create_object+0xf4/0x214
> [<c02da114>] kmemleak_alloc+0x3c/0x6c
> [<c008c0d4>] __kmalloc+0xd0/0x124
> [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> [<c021206c>] r1buf_pool_alloc+0x40/0x148
> [<c0061160>] mempool_alloc+0x54/0xfc
> [<c0211938>] sync_request+0x168/0x85c
> [<c021addc>] md_do_sync+0x75c/0xbc0
> [<c021b594>] md_thread+0x138/0x154
> [<c0037b48>] kthread+0xb0/0xbc
> [<c0013190>] ret_from_fork+0x14/0x24
> [<ffffffff>] 0xffffffff
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.
>
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 9:19 ` Russell King - ARM Linux
@ 2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-01 15:58 ` Catalin Marinas
0 siblings, 2 replies; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 11:38 UTC (permalink / raw)
To: Catalin Marinas, Linus Torvalds
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
linux-arm-kernel
On Tue, Apr 01, 2014 at 10:19:59AM +0100, Russell King - ARM Linux wrote:
> On Mon, Mar 17, 2014 at 07:33:16PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> > > On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > > > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > > > <linux@arm.linux.org.uk> wrote:
> > > > > unreferenced object 0xc3c3f880 (size 256):
> > > > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > > > hex dump (first 32 bytes):
> > > > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > > > backtrace:
> > > > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > > > [<c008d5f0>] create_object+0xf4/0x214
> > > > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > > > [<c0211938>] sync_request+0x168/0x85c
> > > > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > > > [<c021b594>] md_thread+0x138/0x154
> > > > > [<c0037b48>] kthread+0xb0/0xbc
> > > > > [<c0013190>] ret_from_fork+0x14/0x24
> > > > > [<ffffffff>] 0xffffffff
> > > > >
> > > > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > > > two are for "md4_resync".
> > > > >
> > > > > /proc/slabinfo shows for this bucket:
> > > > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > > > >
> > > > > but this would only account for about 800kB of memory usage, which itself
> > > > > is insignificant - so this is not the whole story.
> > > > >
> > > > > It seems that this is the culprit for the allocations:
> > > > > for (j = pi->raid_disks ; j-- ; ) {
> > > > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > > > >
> > > > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > > > bucket.
> > > > >
> > > > > I don't see anything obvious - it looks like it isn't every raid check
> > > > > which loses bios. Not quite sure what to make of this right now.
I now see something very obvious, having had the problem again, dumped
the physical memory to a file, and inspected the full leaked struct bio.
What I find is that the leaked struct bios have a bi_cnt of one, which
confirms that they were never freed - freed struct bios would have a
bi_cnt of zero due to the atomic_dec_and_test() before bio_free() inside
bio_put().
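(For context, the put path is just a reference count drop - the following
is a paraphrase of the 3.12-era bio_put() in fs/bio.c, not a verbatim
copy:)

void bio_put(struct bio *bio)
{
        BIO_BUG_ON(!atomic_read(&bio->bi_cnt));

        /* the last put takes bi_cnt from 1 to 0 and frees the bio */
        if (atomic_dec_and_test(&bio->bi_cnt))
                bio_free(bio);
}

So a bi_cnt of one means the final bio_put() for these bios never happened.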
When looking at the bi_inline_vecs, I see that there was a failure to
allocate a page. Now, let's look at what r1buf_pool_alloc() does:
for (j = pi->raid_disks ; j-- ; ) {
bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
if (!bio)
goto out_free_bio;
r1_bio->bios[j] = bio;
}
if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
j = pi->raid_disks;
else
j = 1;
while(j--) {
bio = r1_bio->bios[j];
bio->bi_vcnt = RESYNC_PAGES;
if (bio_alloc_pages(bio, gfp_flags))
goto out_free_bio;
}
out_free_bio:
while (++j < pi->raid_disks)
bio_put(r1_bio->bios[j]);
r1bio_pool_free(r1_bio, data);
Consider what happens when bio_alloc_pages() fails. j starts off as one
for non-recovery operations, and we enter the loop to allocate the pages.
j is post-decremented to zero, so bio = r1_bio->bios[0].
bio_alloc_pages(bio) fails and we jump to out_free_bio. The first thing
that does is increment j, so we free from r1_bio->bios[1] up to the
number of raid disks, leaving r1_bio->bios[0] leaked when the r1_bio is
then freed.
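The indexing can be checked outside the kernel; here is a throwaway
user-space model of just that cleanup path (raid_disks = 2 and the failure
on the first bio are made-up values chosen to mirror the non-recovery case
described above):

#include <stdio.h>

int main(void)
{
        int raid_disks = 2;     /* assumption: two-disk raid1 */
        int j;

        j = 1;                  /* non-recovery: only bios[0] gets pages */
        while (j--) {
                /* pretend bio_alloc_pages(r1_bio->bios[j]) fails at j == 0 */
                goto out_free_bio;
        }

out_free_bio:
        /* the existing error path: starts at j + 1 == 1, so slot 0
         * is never put */
        while (++j < raid_disks)
                printf("bio_put(bios[%d])\n", j);

        printf("bios[0] was never put\n");
        return 0;
}

Running it prints only "bio_put(bios[1])" before "bios[0] was never put".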
The obvious fix is to set j to -1 before jumping to out_free_bio on
bio_alloc_pages() failure. However, that's not the end of the story -
there are more leaks here.
bio_put() will not free the pages allocated by a previously successful
bio_alloc_pages(). What's more, I don't see any BIO helper which does
that, which makes me wonder how many other places in the kernel dealing
with BIOs end up leaking like this.
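(For reference, bio_alloc_pages() only unwinds the pages of the bio it
failed on - the following is a paraphrase of the 3.12-era helper in
fs/bio.c, not a verbatim copy - so the pages attached to earlier,
successful bios are left for the caller to clean up:)

int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
{
        int i;
        struct bio_vec *bv;

        bio_for_each_segment_all(bv, bio, i) {
                bv->bv_page = alloc_page(gfp_mask);
                if (!bv->bv_page) {
                        /* unwind only *this* bio's pages */
                        while (--bv >= bio->bi_io_vec)
                                __free_page(bv->bv_page);
                        return -ENOMEM;
                }
        }

        return 0;
}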
Anyway, this is what I've come up with - it's not particularly nice,
but hopefully it will plug this leak. I'm now running with this patch
in place, and time will tell.
drivers/md/raid1.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aacf6bf352d8..604bad2fa442 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -123,8 +123,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
bio = r1_bio->bios[j];
bio->bi_vcnt = RESYNC_PAGES;
- if (bio_alloc_pages(bio, gfp_flags))
- goto out_free_bio;
+ if (bio_alloc_pages(bio, gfp_flags)) {
+ /*
+ * Mark this as having no pages - bio_alloc_pages
+ * removes any it allocated.
+ */
+ bio->bi_vcnt = 0;
+ goto out_free_all_bios;
+ }
}
/* If not user-requests, copy the page pointers to all bios */
if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -138,9 +144,25 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
return r1_bio;
+out_free_all_bios:
+ j = -1;
out_free_bio:
- while (++j < pi->raid_disks)
- bio_put(r1_bio->bios[j]);
+ while (++j < pi->raid_disks) {
+ bio = r1_bio->bios[j];
+ if (bio->bi_vcnt) {
+ struct bio_vec *bv;
+ int i;
+ /*
+ * Annoyingly, BIO has no way to do this, so we have
+ * to do it manually. Given the trouble here, and
+ * the lack of BIO support for cleaning up, I don't
+ * care about linux/bio.h's comment about this helper.
+ */
+ bio_for_each_segment_all(bv, bio, i)
+ __free_page(bv->bv_page);
+ }
+ bio_put(bio);
+ }
r1bio_pool_free(r1_bio, data);
return NULL;
}
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 11:38 ` Russell King - ARM Linux
@ 2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-02 23:28 ` NeilBrown
2014-04-01 15:58 ` Catalin Marinas
1 sibling, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 14:04 UTC (permalink / raw)
To: NeilBrown, Linus Torvalds, Kent Overstreet
Cc: Catalin Marinas, linux-raid, linux-mm, David Rientjes,
Maxime Bizon, linux-arm-kernel
On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> Consider what happens when bio_alloc_pages() fails. j starts off as one
> for non-recovery operations, and we enter the loop to allocate the pages.
> j is post-decremented to zero. So, bio = r1_bio->bios[0].
>
> bio_alloc_pages(bio) fails, we jump to out_free_bio. The first thing
> that does is increment j, so we free from r1_bio->bios[1] up to the
> number of raid disks, leaving r1_bio->bios[0] leaked as the r1_bio is
> then freed.
Neil,
Can you please review commit a07876064a0b7 (block: Add bio_alloc_pages),
which seems to have introduced this bug - it seems to have gone in during
the v3.10 merge window, and judging from the attributions on the commit
it looks like it was never reviewed.
The commit message is brief, and inadequately describes the functional
change that the patch makes - we go from "get up to RESYNC_PAGES into the
bio's io_vec" to "get all RESYNC_PAGES or fail completely".
Notwithstanding the breakage of the error cleanup paths, is this an
acceptable change of behaviour here?
Thanks.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
@ 2014-04-01 15:58 ` Catalin Marinas
1 sibling, 0 replies; 10+ messages in thread
From: Catalin Marinas @ 2014-04-01 15:58 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Linus Torvalds, NeilBrown, linux-raid@vger.kernel.org,
linux-mm@kvack.org, David Rientjes, Maxime Bizon,
linux-arm-kernel@lists.infradead.org
On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index aacf6bf352d8..604bad2fa442 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -123,8 +123,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> bio = r1_bio->bios[j];
> bio->bi_vcnt = RESYNC_PAGES;
>
> - if (bio_alloc_pages(bio, gfp_flags))
> - goto out_free_bio;
> + if (bio_alloc_pages(bio, gfp_flags)) {
> + /*
> + * Mark this as having no pages - bio_alloc_pages
> + * removes any it allocated.
> + */
> + bio->bi_vcnt = 0;
> + goto out_free_all_bios;
> + }
> }
> /* If not user-requests, copy the page pointers to all bios */
> if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
> @@ -138,9 +144,25 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>
> return r1_bio;
>
> +out_free_all_bios:
> + j = -1;
> out_free_bio:
> - while (++j < pi->raid_disks)
> - bio_put(r1_bio->bios[j]);
> + while (++j < pi->raid_disks) {
> + bio = r1_bio->bios[j];
> + if (bio->bi_vcnt) {
> + struct bio_vec *bv;
> + int i;
> + /*
> + * Annoyingly, BIO has no way to do this, so we have
> + * to do it manually. Given the trouble here, and
> + * the lack of BIO support for cleaning up, I don't
> + * care about linux/bio.h's comment about this helper.
> + */
> + bio_for_each_segment_all(bv, bio, i)
> + __free_page(bv->bv_page);
> + }
Do you still need the 'if' block here? bio_for_each_segment_all() checks
bio->bi_vcnt, which was set to 0 above.
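(For reference, the loop bound in bio_for_each_segment_all() is bi_vcnt -
here is a paraphrase of the macro from include/linux/bio.h, not its exact
text:)

#define bio_for_each_segment_all(bvl, bio, i)                          \
        for ((i) = 0, (bvl) = (bio)->bi_io_vec;                        \
             (i) < (bio)->bi_vcnt;                                     \
             (i)++, (bvl)++)

With bi_vcnt set to 0, the loop body never runs, so the explicit check
would indeed be redundant.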
--
Catalin
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 14:04 ` Russell King - ARM Linux
@ 2014-04-02 23:28 ` NeilBrown
0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2014-04-02 23:28 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Linus Torvalds, Kent Overstreet, Catalin Marinas, linux-raid,
linux-mm, David Rientjes, Maxime Bizon, linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 3942 bytes --]
On Tue, 1 Apr 2014 15:04:01 +0100 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> > Consider what happens when bio_alloc_pages() fails. j starts off as one
> > for non-recovery operations, and we enter the loop to allocate the pages.
> > j is post-decremented to zero. So, bio = r1_bio->bios[0].
> >
> > bio_alloc_pages(bio) fails, we jump to out_free_bio. The first thing
> > that does is increment j, so we free from r1_bio->bios[1] up to the
> > number of raid disks, leaving r1_bio->bios[0] leaked as the r1_bio is
> > then freed.
>
> Neil,
>
> Can you please review commit a07876064a0b7 (block: Add bio_alloc_pages)
> which seems to have introduced this bug - it seems to have gone in during
> the v3.10 merge window, and looks like it was never reviewed from the
> attributations on the commit.
>
> The commit message is brief, and inadequately describes the functional
> change that the patch has - we go from "get up to RESYNC_PAGES into the
> bio's io_vec" to "get all RESYNC_PAGES or fail completely".
>
> Not withstanding the breakage of the error cleanup paths, is this an
> acceptable change of behaviour here?
>
> Thanks.
>
Hi Russell,
thanks for finding that bug! - I'm sure I looked at that code, but obviously
missed the problem :-(
Below is the fix that I plan to submit. It is slightly different from yours
but should achieve the same effect. If you could confirm that it looks good
to you I would appreciate it.
Thanks,
NeilBrown
From 72dce88eee7259d65c6eba10c2e0beff357f713b Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 3 Apr 2014 10:19:12 +1100
Subject: [PATCH] md/raid1: r1buf_pool_alloc: free allocated pages when
subsequent allocation fails.
When performing a user-requested check/repair (MD_RECOVERY_REQUESTED is set)
on a raid1, we allocate multiple bios each with their own set of pages.
If the page allocation for one bio fails, we currently do *not* free
the pages allocated for the previous bios, nor do we free the bio itself.
This patch frees all the already-allocated pages, and makes sure that
all the bios are freed as well.
This bug can cause a memory leak which can ultimately OOM a machine.
It was introduced in 3.10-rc1.
Fixes: a07876064a0b73ab5ef1ebcf14b1cf0231c07858
Cc: Kent Overstreet <koverstreet@google.com>
Cc: stable@vger.kernel.org (3.10+)
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4a6ca1cb2e78..56e24c072b62 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -97,6 +97,7 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
struct pool_info *pi = data;
struct r1bio *r1_bio;
struct bio *bio;
+ int need_pages;
int i, j;
r1_bio = r1bio_pool_alloc(gfp_flags, pi);
@@ -119,15 +120,15 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
* RESYNC_PAGES for each bio.
*/
if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
- j = pi->raid_disks;
+ need_pages = pi->raid_disks;
else
- j = 1;
- while(j--) {
+ need_pages = 1;
+ for (j = 0; j < need_pages; j++) {
bio = r1_bio->bios[j];
bio->bi_vcnt = RESYNC_PAGES;
if (bio_alloc_pages(bio, gfp_flags))
- goto out_free_bio;
+ goto out_free_pages;
}
/* If not user-requests, copy the page pointers to all bios */
if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -141,6 +142,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
return r1_bio;
+out_free_pages:
+ while (--j >= 0) {
+ struct bio_vec *bv;
+
+ bio_for_each_segment_all(bv, r1_bio->bios[j], i)
+ __free_page(bv->bv_page);
+ }
+
out_free_bio:
while (++j < pi->raid_disks)
bio_put(r1_bio->bios[j]);
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
Thread overview: 10+ messages
[not found] <20140216200503.GN30257@n2100.arm.linux.org.uk>
[not found] ` <alpine.DEB.2.02.1402161406120.26926@chino.kir.corp.google.com>
[not found] ` <20140216225000.GO30257@n2100.arm.linux.org.uk>
[not found] ` <1392670951.24429.10.camel@sakura.staff.proxad.net>
[not found] ` <20140217210954.GA21483@n2100.arm.linux.org.uk>
2014-03-15 10:19 ` Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
2014-03-17 19:33 ` Russell King - ARM Linux
2014-04-01 9:19 ` Russell King - ARM Linux
2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-02 23:28 ` NeilBrown
2014-04-01 15:58 ` Catalin Marinas