* Recent 3.x kernels: Memory leak causing OOMs
@ 2014-02-16 20:05 Russell King - ARM Linux
2014-02-16 21:43 ` Theodore Ts'o
2014-02-16 22:17 ` David Rientjes
0 siblings, 2 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-02-16 20:05 UTC (permalink / raw)
To: linux-arm-kernel, linux-mm
Guys,
I'm seeing a lot of my machines OOM from time to time with recent 3.x
kernels. Systems which previously ended up with 1400+ days uptime with
older kernels barely manage much more than a month.
This can happen with /any/ workload, though it takes a long time with
128MB of SDRAM - on the order of 40 to 60 days. How quickly it shows
may depend on the workload.
Obviously, if it takes 1-2 months to show with 128MB of RAM, think how
long it would take with 8GB or more... which is why I doubt it's been
seen yet on x86.
I have one platform here (Versatile PB926) whose sole job is monitoring
the power system supplying the test board - it sits there running a
thread which regularly polls two I2C buses via GPIO bitbang. Userspace
doesn't change its memory requirements very often if at all. No crond,
no atd.
The only other thing which happens is the occasional network connection
to tell it to power a board up or down, or to download the power logs.
Yet somehow, this platform tonight has gone into OOM when I telnetted
to it to get a shell. See the log below.
Now, this runs cramfs with no swap, but that's irrelevant.
I have another machine which OOM'd a week ago with plenty of unused swap
- it uses ext3 on raid1 and is a more busy system. That took 41 days
to show, and upon reboot, it got a kernel with kmemleak enabled. So far,
after 7 days, kmemleak has found nothing at all.
I've previously seen it on my old firewall - which had less RAM, but I
had put that down to the workload not fitting in RAM - now I'm less sure
that was the problem.
CPU doesn't matter - I've now seen it with StrongARMs and ARM926s, so it
can't be low-vectors issues.
It is quite clear to me that there is something really bad with 3.x
kernels. Exactly when this started, I can't say - and with it taking
on the order of a month or more to show, it's not something which can be
bisected.
ash invoked oom-killer: gfp_mask=0x201d0, order=0, oom_score_adj=0
CPU: 0 PID: 598 Comm: ash Tainted: G O 3.13.0-rc3+ #663
Backtrace:
[<c0016da4>] (dump_backtrace) from [<c0017150>] (show_stack+0x18/0x1c)
r6:00000000 r5:00000000 r4:c7b8e000 r3:00400100
[<c0017138>] (show_stack) from [<c0322c94>] (dump_stack+0x20/0x2c)
[<c0322c74>] (dump_stack) from [<c0073d98>] (dump_header.clone.13+0x6c/0x1d8)
[<c0073d2c>] (dump_header.clone.13) from [<c00740f8>] (oom_kill_process+0x9c/0x3bc)
r10:000201d0 r9:00000000 r8:00000000 r7:00000000 r6:00000380 r5:c7b7b800
r4:000201d0
[<c007405c>] (oom_kill_process) from [<c00748d8>] (out_of_memory+0x2d4/0x338)
r10:000201d0 r9:000201d0 r8:00000000 r7:00007556 r6:00006924 r5:c7b7b800
r4:c7b7b800
[<c0074604>] (out_of_memory) from [<c0078408>] (__alloc_pages_nodemask+0x5cc/0x6bc)
r10:00000000 r9:00000000 r8:c04862c0 r7:00000000 r6:00000000 r5:c7b8e000
r4:000201d0
[<c0077e3c>] (__alloc_pages_nodemask) from [<c0071824>] (do_read_cache_page+0x50/0x178)
r10:000200d0 r9:000201d0 r8:c00d414c r7:00000000 r6:c74027dc r5:00000035
r4:00000000
[<c00717d4>] (do_read_cache_page) from [<c007199c>] (read_cache_page_async+0x20/0x28)
r10:00000886 r9:c782dd28 r8:c782dd30 r7:00000132 r6:c74027dc r5:c7b73000
r4:00001070
[<c007197c>] (read_cache_page_async) from [<c010049c>] (cramfs_read+0x104/0x21c)
[<c0100398>] (cramfs_read) from [<c0100bdc>] (cramfs_readpage+0xe8/0x1a4)
r10:000c6628 r9:c0a9f4a0 r8:c7b73000 r7:0012f886 r6:00000a4d r5:c7d25000
r4:c0a9f4a0
[<c0100af4>] (cramfs_readpage) from [<c0072bd4>] (filemap_fault+0x298/0x394)
r10:c7b9a318 r8:00000004 r7:c7417ea4 r6:000000b6 r5:c7417ea4 r4:c7bc0cc0
[<c007293c>] (filemap_fault) from [<c008aa3c>] (__do_fault+0xa8/0x490)
r10:c7b9edb8 r9:000005b7 r8:000000b6 r7:000000a8 r6:c7b83d40 r5:00000000
r4:c7b9a318
[<c008a994>] (__do_fault) from [<c008df3c>] (handle_mm_fault+0x3b8/0x854)
r10:00000000 r9:000005b7 r8:000000b6 r7:b6f0c000 r6:c7b83d40 r5:c7bb0430
r4:c7b9a318
[<c008db84>] (handle_mm_fault) from [<c001b3e8>] (do_page_fault+0x12c/0x28c)
r10:80000005 r9:c7b9a318 r8:c7b83d40 r7:000000a8 r6:c78257e0 r5:b6f0ced8
r4:c782dfb0
[<c001b2bc>] (do_page_fault) from [<c001b5e0>] (do_translation_fault+0x24/0xac)
r10:bef2bc70 r9:00009f30 r8:bef2bc6c r7:c782dfb0 r6:b6f0ced8 r5:c0459ad4
r4:80000005
[<c001b5bc>] (do_translation_fault) from [<c00086bc>] (do_PrefetchAbort+0x3c/0xa0)
r7:c782dfb0 r6:b6f0ced8 r5:c0459ad4 r4:00000005
[<c0008680>] (do_PrefetchAbort) from [<c0017fb8>] (ret_from_exception+0x0/0x10)
Exception stack(0xc782dfb0 to 0xc782dff8)
dfa0: 00000003 bef2b920 bef2bb98 00000003
dfc0: bef2bb98 00016658 00016880 000000c3 bef2bc6c 00009f30 bef2bc70 00016880
dfe0: 00000000 bef2b920 b6f05dbc b6f0ced8 60000010 ffffffff
r7:000000c3 r6:ffffffff r5:60000010 r4:b6f0ced8
Mem-info:
Normal per-cpu:
CPU 0: hi: 42, btch: 7 usd: 36
active_anon:28041 inactive_anon:104 isolated_anon:0
active_file:11 inactive_file:11 isolated_file:0
unevictable:0 dirty:1 writeback:6 unstable:0
free:342 slab_reclaimable:170 slab_unreclaimable:570
mapped:13 shmem:139 pagetables:95 bounce:0
free_cma:0
Normal free:1368kB min:1384kB low:1728kB high:2076kB active_anon:112164kB inactive_anon:416kB active_file:44kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:131072kB managed:120152kB mlocked:0kB dirty:4kB
writeback:24kB mapped:52kB shmem:556kB slab_reclaimable:680kB slab_unreclaimable:2280kB kernel_stack:248kB pagetables:380kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:136 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Normal: 4*4kB (R) 1*8kB (R) 0*16kB 0*32kB 1*64kB (R) 0*128kB 1*256kB (R) 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 1368kB
161 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
32768 pages of RAM
448 free pages
2730 reserved pages
681 slab pages
261905 pages shared
0 pages swap cached
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 519] 1 519 425 17 3 0 0 portmap
[ 529] 0 529 480 24 4 0 0 inetd
[ 562] 0 562 397 15 4 0 0 getty
[ 563] 0 563 397 15 3 0 0 getty
[ 564] 0 564 397 15 3 0 0 getty
[ 565] 0 565 397 15 4 0 0 getty
[ 566] 0 566 414 14 4 0 0 user_interface.
[ 570] 0 570 397 15 3 0 0 getty
[ 702] 0 702 543 41 3 0 0 in.telnetd
[ 703] 0 703 1023 57 4 0 0 ash
[ 756] 0 756 28163 27776 57 0 0 ld-linux.so.2
Out of memory: Kill process 756 (ld-linux.so.2) score 896 or sacrifice child
Killed process 756 (ld-linux.so.2) total-vm:112652kB, anon-rss:111052kB, file-rss:52kB
INFO: task ld-linux.so.2:842 blocked for more than 120 seconds.
Tainted: G O 3.13.0-rc3+ #663
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ld-linux.so.2 D c032364c 0 842 1 0x00000000
Backtrace:
[<c03232a4>] (__schedule) from [<c03237d8>] (schedule+0x98/0x9c)
r10:00000001 r9:c7b83054 r8:00000002 r7:c720fdac r6:c0071240 r5:c0aa6060
r4:c720e000 r3:00000000
[<c0323740>] (schedule) from [<c032384c>] (io_schedule+0x70/0xa8)
[<c03237dc>] (io_schedule) from [<c0071250>] (sleep_on_page+0x10/0x18)
r4:c720fda4 r3:0000000d
[<c0071240>] (sleep_on_page) from [<c0323d48>] (__wait_on_bit+0x5c/0xa4)
[<c0323cec>] (__wait_on_bit) from [<c0071544>] (wait_on_page_bit+0x98/0xa8)
r10:07f9518f r8:c71661cc r7:c7290820 r6:c0aa42a0 r5:c720fda4 r4:00000001
[<c00714ac>] (wait_on_page_bit) from [<c010b480>] (nfs_vm_page_mkwrite+0x80/0x158)
r7:c711d600 r6:c0aa42a0 r5:c7bd5318 r4:c74bbad0
[<c010b400>] (nfs_vm_page_mkwrite) from [<c008bd04>] (do_wp_page+0x238/0x7d0)
r7:c7b83020 r6:b6673000 r5:c7bd5318 r4:c0aa42a0
[<c008bacc>] (do_wp_page) from [<c008e320>] (handle_mm_fault+0x79c/0x854)
r10:07f9518f r9:c7b83054 r8:00000001 r7:b6673000 r6:c7b83020 r5:c71661cc
r4:c7bd5318
[<c008db84>] (handle_mm_fault) from [<c001b3e8>] (do_page_fault+0x12c/0x28c)
r10:0000081f r9:c7bd5318 r8:c7b83020 r7:000000a9 r6:c7290820 r5:b667300c
r4:c720ffb0
[<c001b2bc>] (do_page_fault) from [<c0008618>] (do_DataAbort+0x38/0xa0)
r10:00000003 r9:00016944 r8:c720ffb0 r7:c0459a84 r6:b667300c r5:0000000f
r4:0000081f
[<c00085e0>] (do_DataAbort) from [<c0017de0>] (__dabt_usr+0x40/0x60)
Exception stack(0xc720ffb0 to 0xc720fff8)
ffa0: 00019f0a 0000186a 0067daea 00000000
ffc0: 8407ace5 0001bf16 00000000 0000186a becdf9b4 00016944 00000003 becdf980
ffe0: b6673000 becdf958 b667300c 0000a4b4 80000010 ffffffff
r8:becdf9b4 r7:0000186a r6:ffffffff r5:80000010 r4:0000a4b4
1 lock held by ld-linux.so.2/842:
#0: (&mm->mmap_sem){++++++}, at: [<c001b35c>] do_page_fault+0xa0/0x28c
INFO: task ld-linux.so.2:842 blocked for more than 120 seconds.
Tainted: G O 3.13.0-rc3+ #663
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ld-linux.so.2 D c032364c 0 842 1 0x00000000
Backtrace:
[<c03232a4>] (__schedule) from [<c03237d8>] (schedule+0x98/0x9c)
r10:00000001 r9:c7b83054 r8:00000002 r7:c720fdac r6:c0071240 r5:c0aa5728
r4:c720e000 r3:00000000
[<c0323740>] (schedule) from [<c032384c>] (io_schedule+0x70/0xa8)
[<c03237dc>] (io_schedule) from [<c0071250>] (sleep_on_page+0x10/0x18)
r4:c720fda4 r3:0000000d
[<c0071240>] (sleep_on_page) from [<c0323d48>] (__wait_on_bit+0x5c/0xa4)
[<c0323cec>] (__wait_on_bit) from [<c0071544>] (wait_on_page_bit+0x98/0xa8)
r10:05e3e18f r8:c70ac1a4 r7:c7290820 r6:c0a617c0 r5:c720fda4 r4:00000001
[<c00714ac>] (wait_on_page_bit) from [<c010b480>] (nfs_vm_page_mkwrite+0x80/0x158)
r7:c711d280 r6:c0a617c0 r5:c7bd5420 r4:c74bcaf0
[<c010b400>] (nfs_vm_page_mkwrite) from [<c008bd04>] (do_wp_page+0x238/0x7d0)
r7:c7b83020 r6:b5269000 r5:c7bd5420 r4:c0a617c0
[<c008bacc>] (do_wp_page) from [<c008e320>] (handle_mm_fault+0x79c/0x854)
r10:05e3e18f r9:c7b83054 r8:00000001 r7:b5269000 r6:c7b83020 r5:c70ac1a4
r4:c7bd5420
[<c008db84>] (handle_mm_fault) from [<c001b3e8>] (do_page_fault+0x12c/0x28c)
r10:0000081f r9:c7bd5420 r8:c7b83020 r7:000000a9 r6:c7290820 r5:b526900c
r4:c720ffb0
[<c001b2bc>] (do_page_fault) from [<c0008618>] (do_DataAbort+0x38/0xa0)
r10:00000003 r9:00016944 r8:c720ffb0 r7:c0459a84 r6:b526900c r5:0000000f
r4:0000081f
[<c00085e0>] (do_DataAbort) from [<c0017de0>] (__dabt_usr+0x40/0x60)
Exception stack(0xc720ffb0 to 0xc720fff8)
ffa0: 00000000 00000000 00000000 00000000
ffc0: ff9c57d8 00010f33 00000000 00000000 becdf9b4 00016944 00000003 becdf980
ffe0: b5269000 becdf958 b526900c 0000a4b4 80000010 ffffffff
r8:becdf9b4 r7:00000000 r6:ffffffff r5:80000010 r4:0000a4b4
[<c010b400>] (nfs_vm_page_mkwrite) from [<c008bd04>] (do_wp_page+0x238/0x7d0)
r7:c7b83020 r6:b5269000 r5:c7bd5420 r4:c0a617c0
[<c008bacc>] (do_wp_page) from [<c008e320>] (handle_mm_fault+0x79c/0x854)
r10:05e3e18f r9:c7b83054 r8:00000001 r7:b5269000 r6:c7b83020 r5:c70ac1a4
r4:c7bd5420
[<c008db84>] (handle_mm_fault) from [<c001b3e8>] (do_page_fault+0x12c/0x28c)
r10:0000081f r9:c7bd5420 r8:c7b83020 r7:000000a9 r6:c7290820 r5:b526900c
r4:c720ffb0
[<c001b2bc>] (do_page_fault) from [<c0008618>] (do_DataAbort+0x38/0xa0)
r10:00000003 r9:00016944 r8:c720ffb0 r7:c0459a84 r6:b526900c r5:0000000f
r4:0000081f
[<c00085e0>] (do_DataAbort) from [<c0017de0>] (__dabt_usr+0x40/0x60)
Exception stack(0xc720ffb0 to 0xc720fff8)
ffa0: 00000000 00000000 00000000 00000000
ffc0: ff9c57d8 00010f33 00000000 00000000 becdf9b4 00016944 00000003 becdf980
ffe0: b5269000 becdf958 b526900c 0000a4b4 80000010 ffffffff
r8:becdf9b4 r7:00000000 r6:ffffffff r5:80000010 r4:0000a4b4
1 lock held by ld-linux.so.2/842:
#0: (&mm->mmap_sem){++++++}, at: [<c001b35c>] do_page_fault+0xa0/0x28c
nfs: server flint not responding, still trying
nfs: server flint OK
[sched_delayed] sched: RT throttling activated
--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 20:05 Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
@ 2014-02-16 21:43 ` Theodore Ts'o
2014-02-17 18:34 ` Catalin Marinas
2014-02-16 22:17 ` David Rientjes
1 sibling, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2014-02-16 21:43 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm
On Sun, Feb 16, 2014 at 08:05:04PM +0000, Russell King - ARM Linux wrote:
> I have another machine which OOM'd a week ago with plenty of unused swap
> - it uses ext3 on raid1 and is a more busy system. That took 41 days
> to show, and upon reboot, it got a kernel with kmemleak enabled. So far,
> after 7 days, kmemleak has found nothing at all.
If kmemleak doesn't show anything, then presumably it's not a leak of
the slab object. Does /proc/meminfo show anything interesting? Maybe
it's a page getting leaked (which wouldn't be noticed by kmemleak)
- Ted
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 20:05 Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
2014-02-16 21:43 ` Theodore Ts'o
@ 2014-02-16 22:17 ` David Rientjes
2014-02-16 22:50 ` Russell King - ARM Linux
1 sibling, 1 reply; 18+ messages in thread
From: David Rientjes @ 2014-02-16 22:17 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm
On Sun, 16 Feb 2014, Russell King - ARM Linux wrote:
> Mem-info:
> Normal per-cpu:
> CPU 0: hi: 42, btch: 7 usd: 36
> active_anon:28041 inactive_anon:104 isolated_anon:0
> active_file:11 inactive_file:11 isolated_file:0
> unevictable:0 dirty:1 writeback:6 unstable:0
> free:342 slab_reclaimable:170 slab_unreclaimable:570
> mapped:13 shmem:139 pagetables:95 bounce:0
> free_cma:0
> Normal free:1368kB min:1384kB low:1728kB high:2076kB active_anon:112164kB inactive_anon:416kB active_file:44kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:131072kB managed:120152kB mlocked:0kB dirty:4kB
> writeback:24kB mapped:52kB shmem:556kB slab_reclaimable:680kB slab_unreclaimable:2280kB kernel_stack:248kB pagetables:380kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:136 all_unreclaimable? yes
All memory is accounted for here, there appears to be no leakage.
> [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
...
> [ 756] 0 756 28163 27776 57 0 0 ld-linux.so.2
This is taking ~108MB of your ~117MB memory.
Three possibilities that immediately jump to mind:
- if this is an SMP kernel, then too much free memory is being accounted
for in cpu-0 vmstat differential and not returned to the ZVC pages
count,
- too little "managed" memory is attributed to ZONE_NORMAL - note the
~10MB difference between "present" and "managed" memory (see the
sketch below), or
- ld-linux.so.2 is using too much memory.
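A quick way to eyeball the first two from userspace, assuming the usual
/proc/zoneinfo layout on 3.x kernels:
  # per-zone free/present/managed page counts, plus the per-cpu pageset
  # "count:" values holding free pages outside the global counters
  grep -E 'zone|nr_free_pages|present|managed|count:' /proc/zoneinfo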
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 22:17 ` David Rientjes
@ 2014-02-16 22:50 ` Russell King - ARM Linux
2014-02-16 23:42 ` David Rientjes
2014-02-17 21:02 ` Maxime Bizon
0 siblings, 2 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-02-16 22:50 UTC (permalink / raw)
To: David Rientjes; +Cc: linux-arm-kernel, linux-mm
On Sun, Feb 16, 2014 at 02:17:44PM -0800, David Rientjes wrote:
> On Sun, 16 Feb 2014, Russell King - ARM Linux wrote:
>
> > Mem-info:
> > Normal per-cpu:
> > CPU 0: hi: 42, btch: 7 usd: 36
> > active_anon:28041 inactive_anon:104 isolated_anon:0
> > active_file:11 inactive_file:11 isolated_file:0
> > unevictable:0 dirty:1 writeback:6 unstable:0
> > free:342 slab_reclaimable:170 slab_unreclaimable:570
> > mapped:13 shmem:139 pagetables:95 bounce:0
> > free_cma:0
> > Normal free:1368kB min:1384kB low:1728kB high:2076kB active_anon:112164kB inactive_anon:416kB active_file:44kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:131072kB managed:120152kB mlocked:0kB dirty:4kB
> > writeback:24kB mapped:52kB shmem:556kB slab_reclaimable:680kB slab_unreclaimable:2280kB kernel_stack:248kB pagetables:380kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:136 all_unreclaimable? yes
>
> All memory is accounted for here, there appears to be no leakage.
>
> > [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
> ...
> > [ 756] 0 756 28163 27776 57 0 0 ld-linux.so.2
>
> This is taking ~108MB of your ~117MB memory.
Damn, you're right. I was reading that as kB not pages, which means this
is an /old/ real OOM. It should only be taking around 30MB. Sorry for
wasting people's time with that one.
However, that doesn't negate the point which I brought up in my other
mail - I have been chasing a memory leak elsewhere, and I so far have
two dumps off a different machine - both of these logs are from the
same machine, which took 41 days to OOM.
http://www.home.arm.linux.org.uk/~rmk/misc/log-20131228.txt
http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
Rik van Riel had a look at the second one, and we came to the conclusion
that running kmemleak would be a good idea - which I have been over the
last 7 days and it's found nothing yet. Both of those OOMs in those logs
required a reboot to recover the machine to a usable state - in the second
one, you could log in but it wasn't pleasant.
--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 22:50 ` Russell King - ARM Linux
@ 2014-02-16 23:42 ` David Rientjes
2014-02-17 21:02 ` Maxime Bizon
1 sibling, 0 replies; 18+ messages in thread
From: David Rientjes @ 2014-02-16 23:42 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm
On Sun, 16 Feb 2014, Russell King - ARM Linux wrote:
> However, that doesn't negate the point which I brought up in my other
> mail - I have been chasing a memory leak elsewhere, and I so far have
> two dumps off a different machine - both of these logs are from the
> same machine, which took 41 days to OOM.
>
> http://www.home.arm.linux.org.uk/~rmk/misc/log-20131228.txt
> http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
>
You actually have free memory in both of these, the problem is
fragmentation: the first log shows oom kills where order=2 and the second
log shows oom kills where order=3.
If I look at an example from the second log:
Normal free:35052kB min:1416kB low:1768kB high:2124kB active_anon:28kB
inactive_anon:60kB active_file:140kB inactive_file:140kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:131072kB managed:125848kB
mlocked:0kB dirty:0kB writeback:40kB mapped:0kB shmem:0kB
slab_reclaimable:3024kB slab_unreclaimable:9036kB kernel_stack:1248kB
pagetables:1696kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:574 all_unreclaimable? yes
you definitely are missing memory somewhere, but I'm not sure it's going
to be detected by kmemleak since the slab stats aren't very high. The
system has ~123MB of memory, ~34.5MB is user or free memory, ~12MB is
slab, and ~3MB for stack and pagetables means you're missing over half of
your memory somewhere. There are types of memory that aren't shown here,
for things like vmalloc(), things that call alloc_pages() directly,
hugepages, etc.
You also have a lot of swap available:
Free swap = 1011476kB
Total swap = 1049256kB
These ooms are coming from the high-order sk_page_frag_refill(), which has
been changed recently to fall back without calling the oom killer; you'll
need commit ed98df3361f0 ("net: use __GFP_NORETRY for high order
allocations") that Linus merged about 1.5 weeks ago.
So I'd recommend forgetting about kmemleak here, trying a kernel with that
commit to avoid the oom killing, and then capturing /proc/meminfo at regular
intervals to see if something continuously grows that isn't captured in
the oom log.
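A minimal sketch of that kind of periodic capture (the log path and the
10-minute interval are arbitrary choices, not anything specified here):
  #!/bin/sh
  # Snapshot the memory counters every 10 minutes so slow growth can be
  # spotted later, even if the box eventually OOMs.
  while :; do
        date >> /var/log/mem-snapshots.log
        cat /proc/meminfo /proc/slabinfo >> /var/log/mem-snapshots.log
        sleep 600
  done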
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 21:43 ` Theodore Ts'o
@ 2014-02-17 18:34 ` Catalin Marinas
0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2014-02-17 18:34 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Russell King - ARM Linux, linux-mm, linux-arm-kernel
On Sun, Feb 16, 2014 at 04:43:54PM -0500, Theodore Ts'o wrote:
> On Sun, Feb 16, 2014 at 08:05:04PM +0000, Russell King - ARM Linux wrote:
> > I have another machine which OOM'd a week ago with plenty of unused swap
> > - it uses ext3 on raid1 and is a more busy system. That took 41 days
> > to show, and upon reboot, it got a kernel with kmemleak enabled. So far,
> > after 7 days, kmemleak has found nothing at all.
>
> If kmemleak doesn't show anything, then presumably it's not a leak of
> the slab object. Does /proc/meminfo show anything interesting? Maybe
> it's a page getting leaked (which wouldn't be noticed by kmemleak)
Kmemleak also doesn't notice leaked objects that are added to a list for
example (still referenced). Maybe /proc/slabinfo would give some clues.
--
Catalin
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-16 22:50 ` Russell King - ARM Linux
2014-02-16 23:42 ` David Rientjes
@ 2014-02-17 21:02 ` Maxime Bizon
2014-02-17 21:09 ` Russell King - ARM Linux
1 sibling, 1 reply; 18+ messages in thread
From: Maxime Bizon @ 2014-02-17 21:02 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: David Rientjes, linux-mm, linux-arm-kernel
On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
[<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
[<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
[<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
FWIW I had OOMs with the exact same backtrace on a Kirkwood platform
(512MB RAM), but sorry, I don't have the full dump anymore.
I found a slow-leaking process, and since I fixed that leak I now have
an uptime of better than 7 days, *but* there was definitely some memory left
when the OOM happened, so it appears to be related to fragmentation.
From what I recall, clearing the page cache helped make the box live a
little bit longer. Does that make sense, or should alloc_pages() discard
page cache itself when trying to satisfy high-order allocations?
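For reference, "clearing the page cache" above presumably means the usual
drop_caches knob; a sketch (it only drops clean cache, so sync first):
  sync
  echo 1 > /proc/sys/vm/drop_caches    # page cache only
  # echo 3 > /proc/sys/vm/drop_caches  # page cache plus reclaimable slab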
--
Maxime
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-17 21:02 ` Maxime Bizon
@ 2014-02-17 21:09 ` Russell King - ARM Linux
2014-03-15 10:19 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-02-17 21:09 UTC (permalink / raw)
To: Maxime Bizon; +Cc: David Rientjes, linux-mm, linux-arm-kernel
On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
>
> On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
>
> > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
>
> [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
>
> FWIW I had OOMs with the exact same backtrace on kirkwood platform
> (512MB RAM), but sorry I don't have the full dump anymore.
>
> I found a slow leaking process, and since I fixed that leak I now have
> uptime better than 7 days, *but* there was definitely some memory left
> when the OOM happened, so it appears to be related to fragmentation.
However, that's a side effect, not the cause - and a patch has been
merged to fix that OOM - but that doesn't explain where most of the
memory has gone!
I'm presently waiting for the machine to OOM again (it's probably going
to be something like another month) at which point I'll grab the files
people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
/proc/slabinfo etc.)
--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-02-17 21:09 ` Russell King - ARM Linux
@ 2014-03-15 10:19 ` Russell King - ARM Linux
2014-03-17 7:07 ` NeilBrown
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-03-15 10:19 UTC (permalink / raw)
To: Neil Brown, linux-raid, Linus Torvalds
Cc: Maxime Bizon, linux-mm, linux-arm-kernel, David Rientjes
On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> >
> > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> >
> > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> >
> > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> >
> > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > (512MB RAM), but sorry I don't have the full dump anymore.
> >
> > I found a slow leaking process, and since I fixed that leak I now have
> > uptime better than 7 days, *but* there was definitely some memory left
> > when the OOM happened, so it appears to be related to fragmentation.
>
> However, that's a side effect, not the cause - and a patch has been
> merged to fix that OOM - but that doesn't explain where most of the
> memory has gone!
>
> I'm presently waiting for the machine to OOM again (it's probably going
> to be something like another month) at which point I'll grab the files
> people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> /proc/slabinfo etc.)
For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
OOMs after a month or two of uptime.
Last night, it OOM'd severely again at around 5am... and rebooted soon
after so we've lost any hope of recovering anything useful from the
machine.
However, the new kernel re-ran the raid check, and...
md: data-check of RAID array md2
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 4194688k.
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
md: md2: data-check done.
md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
md: data-check of RAID array md3
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 524544k.
md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md3: data-check done.
md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
md: data-check of RAID array md5
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10486080k.
kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md5: data-check done.
md: data-check of RAID array md4
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10486080k.
md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md4: data-check done.
md: data-check of RAID array md6
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
for data-check.
md: using 128k window, over a total of 10409472k.
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
md: md6: data-check done.
kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
which totals 3077 leaks. So we have a memory leak. Looking at
the kmemleak file:
unreferenced object 0xc3c3f880 (size 256):
comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<c008d4f0>] __save_stack_trace+0x34/0x40
[<c008d5f0>] create_object+0xf4/0x214
[<c02da114>] kmemleak_alloc+0x3c/0x6c
[<c008c0d4>] __kmalloc+0xd0/0x124
[<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
[<c021206c>] r1buf_pool_alloc+0x40/0x148
[<c0061160>] mempool_alloc+0x54/0xfc
[<c0211938>] sync_request+0x168/0x85c
[<c021addc>] md_do_sync+0x75c/0xbc0
[<c021b594>] md_thread+0x138/0x154
[<c0037b48>] kthread+0xb0/0xbc
[<c0013190>] ret_from_fork+0x14/0x24
[<ffffffff>] 0xffffffff
with 3077 of these in the debug file. 3075 are for "md2_resync" and
two are for "md4_resync".
/proc/slabinfo shows for this bucket:
kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
but this would only account for about 800kB of memory usage, which itself
is insignificant - so this is not the whole story.
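The ~800kB figure falls straight out of that slabinfo line; assuming the
standard /proc/slabinfo column order (name, active_objs, num_objs,
objsize, ...), something like:
  # 3237 active objects * 256 bytes each = ~809 kB
  awk '$1 == "kmalloc-256" { printf "%d objs * %d B = %d kB\n", $2, $4, $2 * $4 / 1024 }' /proc/slabinfo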
It seems that this is the culprit for the allocations:
for (j = pi->raid_disks ; j-- ; ) {
bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
(12 * 16 = 192) plus the size of struct bio, which would fall into this
bucket.
I don't see anything obvious - it looks like it isn't every raid check
which loses bios. Not quite sure what to make of this right now.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-15 10:19 ` Russell King - ARM Linux
@ 2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
0 siblings, 2 replies; 18+ messages in thread
From: NeilBrown @ 2014-03-17 7:07 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: linux-raid, Linus Torvalds, Maxime Bizon, linux-mm,
linux-arm-kernel, David Rientjes
On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > >
> > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > >
> > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > >
> > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > >
> > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > (512MB RAM), but sorry I don't have the full dump anymore.
> > >
> > > I found a slow leaking process, and since I fixed that leak I now have
> > > uptime better than 7 days, *but* there was definitely some memory left
> > > when the OOM happened, so it appears to be related to fragmentation.
> >
> > However, that's a side effect, not the cause - and a patch has been
> > merged to fix that OOM - but that doesn't explain where most of the
> > memory has gone!
> >
> > I'm presently waiting for the machine to OOM again (it's probably going
> > to be something like another month) at which point I'll grab the files
> > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > /proc/slabinfo etc.)
>
> For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> OOMs after a month or two of uptime.
>
> Last night, it OOM'd severely again at around 5am... and rebooted soon
> after so we've lost any hope of recovering anything useful from the
> machine.
>
> However, the new kernel re-ran the raid check, and...
>
> md: data-check of RAID array md2
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 4194688k.
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> md: md2: data-check done.
> md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> md: data-check of RAID array md3
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 524544k.
> md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md3: data-check done.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> md: data-check of RAID array md5
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md5: data-check done.
> md: data-check of RAID array md4
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10486080k.
> md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md4: data-check done.
> md: data-check of RAID array md6
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> for data-check.
> md: using 128k window, over a total of 10409472k.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> md: md6: data-check done.
> kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
> which totals 3077 leaks. So we have a memory leak. Looking at
> the kmemleak file:
>
> unreferenced object 0xc3c3f880 (size 256):
> comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<c008d4f0>] __save_stack_trace+0x34/0x40
> [<c008d5f0>] create_object+0xf4/0x214
> [<c02da114>] kmemleak_alloc+0x3c/0x6c
> [<c008c0d4>] __kmalloc+0xd0/0x124
> [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> [<c021206c>] r1buf_pool_alloc+0x40/0x148
> [<c0061160>] mempool_alloc+0x54/0xfc
> [<c0211938>] sync_request+0x168/0x85c
> [<c021addc>] md_do_sync+0x75c/0xbc0
> [<c021b594>] md_thread+0x138/0x154
> [<c0037b48>] kthread+0xb0/0xbc
> [<c0013190>] ret_from_fork+0x14/0x24
> [<ffffffff>] 0xffffffff
>
> with 3077 of these in the debug file. 3075 are for "md2_resync" and
> two are for "md4_resync".
>
> /proc/slabinfo shows for this bucket:
> kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
>
> but this would only account for about 800kB of memory usage, which itself
> is insignificant - so this is not the whole story.
>
> It seems that this is the culprit for the allocations:
> for (j = pi->raid_disks ; j-- ; ) {
> bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
>
> Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> (12 * 16 = 192) plus the size of struct bio, which would fall into this
> bucket.
>
> I don't see anything obvious - it looks like it isn't every raid check
> which loses bios. Not quite sure what to make of this right now.
>
I can't see anything obvious either.
The bios allocated there are stored in a r1_bio and those pointers are never
changed.
If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
would complain that the pool wasn't completely freed.
And when the r1_bio is freed, all the bios are put as well.
I guess if something was calling bio_get() on the bio, that might stop the
bio_put from freeing the memory, but I cannot see anything that would do that.
I've tried testing on a recent mainline kernel and while kmemleak shows about
238 leaks from "swapper/0", there are none related to md or bios.
I'll let it run a while longer and see if anything pops.
NeilBrown
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 7:07 ` NeilBrown
@ 2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
1 sibling, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-03-17 8:51 UTC (permalink / raw)
To: NeilBrown
Cc: linux-raid, Linus Torvalds, Maxime Bizon, linux-mm,
linux-arm-kernel, David Rientjes
On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>
> > On Mon, Feb 17, 2014 at 09:09:54PM +0000, Russell King - ARM Linux wrote:
> > > On Mon, Feb 17, 2014 at 10:02:31PM +0100, Maxime Bizon wrote:
> > > >
> > > > On Sun, 2014-02-16 at 22:50 +0000, Russell King - ARM Linux wrote:
> > > >
> > > > > http://www.home.arm.linux.org.uk/~rmk/misc/log-20140208.txt
> > > >
> > > > [<c0064ce0>] (__alloc_pages_nodemask+0x0/0x694) from [<c022273c>] (sk_page_frag_refill+0x78/0x108)
> > > > [<c02226c4>] (sk_page_frag_refill+0x0/0x108) from [<c026a3a4>] (tcp_sendmsg+0x654/0xd1c) r6:00000520 r5:c277bae0 r4:c68f37c0
> > > > [<c0269d50>] (tcp_sendmsg+0x0/0xd1c) from [<c028ca9c>] (inet_sendmsg+0x64/0x70)
> > > >
> > > > FWIW I had OOMs with the exact same backtrace on kirkwood platform
> > > > (512MB RAM), but sorry I don't have the full dump anymore.
> > > >
> > > > I found a slow leaking process, and since I fixed that leak I now have
> > > > uptime better than 7 days, *but* there was definitely some memory left
> > > > when the OOM happened, so it appears to be related to fragmentation.
> > >
> > > However, that's a side effect, not the cause - and a patch has been
> > > merged to fix that OOM - but that doesn't explain where most of the
> > > memory has gone!
> > >
> > > I'm presently waiting for the machine to OOM again (it's probably going
> > > to be something like another month) at which point I'll grab the files
> > > people have been mentioning (/proc/meminfo, /proc/vmallocinfo,
> > > /proc/slabinfo etc.)
> >
> > For those new to this report, this is a 3.12.6+ kernel, and I'm seeing
> > OOMs after a month or two of uptime.
> >
> > Last night, it OOM'd severely again at around 5am... and rebooted soon
> > after so we've lost any hope of recovering anything useful from the
> > machine.
> >
> > However, the new kernel re-ran the raid check, and...
> >
> > md: data-check of RAID array md2
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 4194688k.
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md3 until md2 has finished (they share one or more physical units)
> > md: delaying data-check of md5 until md2 has finished (they share one or more physical units)
> > md: md2: data-check done.
> > md: delaying data-check of md5 until md3 has finished (they share one or more physical units)
> > md: data-check of RAID array md3
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 524544k.
> > md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
> > md: delaying data-check of md6 until md3 has finished (they share one or more physical units)
> > kmemleak: 836 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md3: data-check done.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > md: delaying data-check of md4 until md5 has finished (they share one or more physical units)
> > md: data-check of RAID array md5
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > kmemleak: 2235 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md5: data-check done.
> > md: data-check of RAID array md4
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10486080k.
> > md: delaying data-check of md6 until md4 has finished (they share one or more physical units)
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md4: data-check done.
> > md: data-check of RAID array md6
> > md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec)
> > for data-check.
> > md: using 128k window, over a total of 10409472k.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > kmemleak: 3 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > md: md6: data-check done.
> > kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> >
> > which totals 3077 leaks. So we have a memory leak. Looking at
> > the kmemleak file:
> >
> > unreferenced object 0xc3c3f880 (size 256):
> > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace:
> > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > [<c008d5f0>] create_object+0xf4/0x214
> > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > [<c008c0d4>] __kmalloc+0xd0/0x124
> > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > [<c0061160>] mempool_alloc+0x54/0xfc
> > [<c0211938>] sync_request+0x168/0x85c
> > [<c021addc>] md_do_sync+0x75c/0xbc0
> > [<c021b594>] md_thread+0x138/0x154
> > [<c0037b48>] kthread+0xb0/0xbc
> > [<c0013190>] ret_from_fork+0x14/0x24
> > [<ffffffff>] 0xffffffff
> >
> > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > two are for "md4_resync".
> >
> > /proc/slabinfo shows for this bucket:
> > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> >
> > but this would only account for about 800kB of memory usage, which itself
> > is insignificant - so this is not the whole story.
> >
> > It seems that this is the culprit for the allocations:
> > for (j = pi->raid_disks ; j-- ; ) {
> > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> >
> > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > bucket.
> >
> > I don't see anything obvious - it looks like it isn't every raid check
> > which loses bios. Not quite sure what to make of this right now.
> >
>
> I can't see anything obvious either.
>
> The bios allocated there are stored in a r1_bio and those pointers are never
> changed.
> If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> would complain that the pool wasn't completely freed.
> And when the r1_bio is freed, all the bios are put as well.
>
> I guess if something was calling bio_get() on the bio, that might stop the
> bio_put from freeing the memory, but I cannot see anything that would do that.
>
> I've tried testing on a recent mainline kernel and while kmemleak shows about
> 238 leaks from "swapper/0", there are none related to md or bios.
>
> I'll let it run a while longer and see if anything pops.
I think the interesting detail from the above is that it seems a little random
- which suggests some kind of race, maybe. There are three 10G partitions,
yet only one of those leaked two BIOs, while a 4G partition leaked 3075 BIOs.
md2 is /usr and md4 is /home. Maybe it's related to other IO happening
during the check?
The underlying devices for all the raid1s are PATA (IT821x) using the ata
driver.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
@ 2014-03-17 18:18 ` Catalin Marinas
2014-03-17 19:33 ` Russell King - ARM Linux
1 sibling, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2014-03-17 18:18 UTC (permalink / raw)
To: NeilBrown
Cc: Russell King - ARM Linux, linux-raid, linux-mm, David Rientjes,
Maxime Bizon, Linus Torvalds, linux-arm-kernel
On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > unreferenced object 0xc3c3f880 (size 256):
> > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace:
> > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > [<c008d5f0>] create_object+0xf4/0x214
> > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > [<c008c0d4>] __kmalloc+0xd0/0x124
> > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > [<c0061160>] mempool_alloc+0x54/0xfc
> > [<c0211938>] sync_request+0x168/0x85c
> > [<c021addc>] md_do_sync+0x75c/0xbc0
> > [<c021b594>] md_thread+0x138/0x154
> > [<c0037b48>] kthread+0xb0/0xbc
> > [<c0013190>] ret_from_fork+0x14/0x24
> > [<ffffffff>] 0xffffffff
> >
> > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > two are for "md4_resync".
> >
> > /proc/slabinfo shows for this bucket:
> > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> >
> > but this would only account for about 800kB of memory usage, which itself
> > is insignificant - so this is not the whole story.
> >
> > It seems that this is the culprit for the allocations:
> > for (j = pi->raid_disks ; j-- ; ) {
> > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> >
> > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > bucket.
> >
> > I don't see anything obvious - it looks like it isn't every raid check
> > which loses bios. Not quite sure what to make of this right now.
>
> I can't see anything obvious either.
>
> The bios allocated there are stored in a r1_bio and those pointers are never
> changed.
> If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> would complain that the pool wasn't completely freed.
> And when the r1_bio is freed, all the bios are put as well.
It could be a false positive: there are areas that kmemleak doesn't scan,
like page allocations, so the pointer reference graph it tries to build
can fail.
What's interesting to see is the first few leaks reported, as they are
always reported in allocation order. In this case, the pointer returned
by bio_kmalloc() is stored in the r1_bio. Is the r1_bio reported as a
leak as well?
The sync_request() function eventually gets rid of the r1_bio as it is a
variable on the stack. But it is stored in a bio->bi_private variable
and that's where I lost track of where pointers are referenced from.
A simple way to check whether it's a false positive is to do a:
echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
If an object was reported as a leak but later on kmemleak doesn't know
about it, it means that it was freed and hence a false positive (maybe I
should add a warning in kmemleak if a certain amount of leaked-object
freeing is detected).
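A sketch of doing that check in bulk over everything currently reported
(assuming debugfs is mounted at /sys/kernel/debug; the temp file is just
a scratch path):
  # Ask kmemleak to re-dump each reported object; any address that no
  # longer produces a "kmemleak: Object 0x..." line in the kernel log
  # has been freed since the report, i.e. was a false positive.
  grep '^unreferenced object' /sys/kernel/debug/kmemleak > /tmp/leaked-addrs
  while read _ _ addr _; do
        echo "dump=$addr" > /sys/kernel/debug/kmemleak 2>/dev/null
  done < /tmp/leaked-addrs
  dmesg | grep 'kmemleak: Object'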
--
Catalin
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 18:18 ` Catalin Marinas
@ 2014-03-17 19:33 ` Russell King - ARM Linux
2014-04-01 9:19 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-03-17 19:33 UTC (permalink / raw)
To: Catalin Marinas
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
Linus Torvalds, linux-arm-kernel
On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > unreferenced object 0xc3c3f880 (size 256):
> > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > hex dump (first 32 bytes):
> > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > backtrace:
> > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > [<c008d5f0>] create_object+0xf4/0x214
> > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > [<c0211938>] sync_request+0x168/0x85c
> > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > [<c021b594>] md_thread+0x138/0x154
> > > [<c0037b48>] kthread+0xb0/0xbc
> > > [<c0013190>] ret_from_fork+0x14/0x24
> > > [<ffffffff>] 0xffffffff
> > >
> > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > two are for "md4_resync".
> > >
> > > /proc/slabinfo shows for this bucket:
> > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > >
> > > but this would only account for about 800kB of memory usage, which itself
> > > is insignificant - so this is not the whole story.
> > >
> > > It seems that this is the culprit for the allocations:
> > > for (j = pi->raid_disks ; j-- ; ) {
> > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > >
> > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > bucket.
> > >
> > > I don't see anything obvious - it looks like it isn't every raid check
> > > which loses bios. Not quite sure what to make of this right now.
> >
> > I can't see anything obvious either.
> >
> > The bios allocated there are stored in a r1_bio and those pointers are never
> > changed.
> > If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> > would complain that the pool wasn't completely freed.
> > And when the r1_bio is freed, all the bios are put as well.
>
> It could be a false positive, there are areas that kmemleak doesn't scan
> like page allocations and the pointer reference graph it tries to build
> would fail.
>
> What's interesting to see is the first few leaks reported as they are
> always reported in the order of allocation. In this case, the
> bio_kmalloc() returned pointer is stored in r1_bio. Is the r1_bio
> reported as a leak as well?
I'd expect anything else, such as the r1_bio, to show up with a different
size, and all of the reported leaks are 256 bytes. Also...
$ grep kmemleak_alloc kmemleak-20140315 -A2 |sort | uniq -c |less
3081 --
3082 [<c008c0d4>] __kmalloc+0xd0/0x124
3082 [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
3082 [<c02da114>] kmemleak_alloc+0x3c/0x6c
seems pretty conclusive that it's just one spot.
> The sync_request() function eventually gets rid of the r1_bio as it is a
> variable on the stack. But it is stored in a bio->bi_private variable
> and that's where I lost track of where pointers are referenced from.
>
> A simple way to check whether it's a false positive is to do a:
>
> echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
>
> If an object was reported as a leak but later on kmemleak doesn't know
> about it, it means that it was freed and hence a false positive (maybe I
> should add this as a warning in kmemleak if certain amount of leaked
> objects freeing is detected).
So doing that with the above leaked bio produces:
kmemleak: Object 0xc3c3f880 (size 256):
kmemleak: comm "md2_resync", pid 4680, jiffies 638245
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x3
kmemleak: checksum = 1042746691
kmemleak: backtrace:
[<c008d4f0>] __save_stack_trace+0x34/0x40
[<c008d5f0>] create_object+0xf4/0x214
[<c02da114>] kmemleak_alloc+0x3c/0x6c
[<c008c0d4>] __kmalloc+0xd0/0x124
[<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
[<c021206c>] r1buf_pool_alloc+0x40/0x148
[<c0061160>] mempool_alloc+0x54/0xfc
[<c0211938>] sync_request+0x168/0x85c
[<c021addc>] md_do_sync+0x75c/0xbc0
[<c021b594>] md_thread+0x138/0x154
[<c0037b48>] kthread+0xb0/0xbc
[<c0013190>] ret_from_fork+0x14/0x24
[<ffffffff>] 0xffffffff
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-03-17 19:33 ` Russell King - ARM Linux
@ 2014-04-01 9:19 ` Russell King - ARM Linux
2014-04-01 11:38 ` Russell King - ARM Linux
0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 9:19 UTC (permalink / raw)
To: Catalin Marinas, Linus Torvalds
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
linux-arm-kernel
Right, so the machine has died again this morning.
Excuse me if I ignore this merge window; I seem to have some debugging
to do on my own, because no one else is interested in my reports of this,
and I need to get this fixed. Modern Linux is totally unusable for me
at the moment.
On Mon, Mar 17, 2014 at 07:33:16PM +0000, Russell King - ARM Linux wrote:
> On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> > On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > > <linux@arm.linux.org.uk> wrote:
> > > > unreferenced object 0xc3c3f880 (size 256):
> > > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > > hex dump (first 32 bytes):
> > > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > > backtrace:
> > > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > > [<c008d5f0>] create_object+0xf4/0x214
> > > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > > [<c0211938>] sync_request+0x168/0x85c
> > > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > > [<c021b594>] md_thread+0x138/0x154
> > > > [<c0037b48>] kthread+0xb0/0xbc
> > > > [<c0013190>] ret_from_fork+0x14/0x24
> > > > [<ffffffff>] 0xffffffff
> > > >
> > > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > > two are for "md4_resync".
> > > >
> > > > /proc/slabinfo shows for this bucket:
> > > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > > >
> > > > but this would only account for about 800kB of memory usage, which itself
> > > > is insignificant - so this is not the whole story.
> > > >
> > > > It seems that this is the culpret for the allocations:
> > > > for (j = pi->raid_disks ; j-- ; ) {
> > > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > > >
> > > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > > bucket.
> > > >
> > > > I don't see anything obvious - it looks like it isn't every raid check
> > > > which loses bios. Not quite sure what to make of this right now.
> > >
> > > I can't see anything obvious either.
> > >
> > > The bios allocated there are stored in a r1_bio and those pointers are never
> > > changed.
> > > If the r1_bio wasn't freed then when the data-check finished, mempool_destroy
> > > would complain that the pool wasn't completely freed.
> > > And when the r1_bio is freed, all the bios are put as well.
> >
> > It could be a false positive, there are areas that kmemleak doesn't scan
> > like page allocations and the pointer reference graph it tries to build
> > would fail.
> >
> > What's interesting to see is the first few leaks reported as they are
> > always reported in the order of allocation. In this case, the
> > bio_kmalloc() returned pointer is stored in r1_bio. Is the r1_bio
> > reported as a leak as well?
>
> I'd assume that something else would likely have a different size.
> All leaks are of 256 bytes. Also...
>
> $ grep kmemleak_alloc kmemleak-20140315 -A2 |sort | uniq -c |less
> 3081 --
> 3082 [<c008c0d4>] __kmalloc+0xd0/0x124
> 3082 [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> 3082 [<c02da114>] kmemleak_alloc+0x3c/0x6c
>
> seems pretty conclusive that it's just one spot.
>
> > The sync_request() function eventually gets rid of the r1_bio as it is a
> > variable on the stack. But it is stored in a bio->bi_private variable
> > and that's where I lost track of where pointers are referenced from.
> >
> > A simple way to check whether it's a false positive is to do a:
> >
> > echo dump=<unref obj addr> > /sys/kernel/debug/kmemleak
> >
> > If an object was reported as a leak but later on kmemleak doesn't know
> > about it, it means that it was freed and hence a false positive (maybe I
> > should add this as a warning in kmemleak if certain amount of leaked
> > objects freeing is detected).
>
> So doing that with the above leaked bio produces:
>
> kmemleak: Object 0xc3c3f880 (size 256):
> kmemleak: comm "md2_resync", pid 4680, jiffies 638245
> kmemleak: min_count = 1
> kmemleak: count = 0
> kmemleak: flags = 0x3
> kmemleak: checksum = 1042746691
> kmemleak: backtrace:
> [<c008d4f0>] __save_stack_trace+0x34/0x40
> [<c008d5f0>] create_object+0xf4/0x214
> [<c02da114>] kmemleak_alloc+0x3c/0x6c
> [<c008c0d4>] __kmalloc+0xd0/0x124
> [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> [<c021206c>] r1buf_pool_alloc+0x40/0x148
> [<c0061160>] mempool_alloc+0x54/0xfc
> [<c0211938>] sync_request+0x168/0x85c
> [<c021addc>] md_do_sync+0x75c/0xbc0
> [<c021b594>] md_thread+0x138/0x154
> [<c0037b48>] kthread+0xb0/0xbc
> [<c0013190>] ret_from_fork+0x14/0x24
> [<ffffffff>] 0xffffffff
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.
>
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 9:19 ` Russell King - ARM Linux
@ 2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-01 15:58 ` Catalin Marinas
0 siblings, 2 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 11:38 UTC (permalink / raw)
To: Catalin Marinas, Linus Torvalds
Cc: NeilBrown, linux-raid, linux-mm, David Rientjes, Maxime Bizon,
linux-arm-kernel
On Tue, Apr 01, 2014 at 10:19:59AM +0100, Russell King - ARM Linux wrote:
> On Mon, Mar 17, 2014 at 07:33:16PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Mar 17, 2014 at 06:18:13PM +0000, Catalin Marinas wrote:
> > > On Mon, Mar 17, 2014 at 06:07:48PM +1100, NeilBrown wrote:
> > > > On Sat, 15 Mar 2014 10:19:52 +0000 Russell King - ARM Linux
> > > > <linux@arm.linux.org.uk> wrote:
> > > > > unreferenced object 0xc3c3f880 (size 256):
> > > > > comm "md2_resync", pid 4680, jiffies 638245 (age 8615.570s)
> > > > > hex dump (first 32 bytes):
> > > > > 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 f0 ................
> > > > > 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 ................
> > > > > backtrace:
> > > > > [<c008d4f0>] __save_stack_trace+0x34/0x40
> > > > > [<c008d5f0>] create_object+0xf4/0x214
> > > > > [<c02da114>] kmemleak_alloc+0x3c/0x6c
> > > > > [<c008c0d4>] __kmalloc+0xd0/0x124
> > > > > [<c00bb124>] bio_alloc_bioset+0x4c/0x1a4
> > > > > [<c021206c>] r1buf_pool_alloc+0x40/0x148
> > > > > [<c0061160>] mempool_alloc+0x54/0xfc
> > > > > [<c0211938>] sync_request+0x168/0x85c
> > > > > [<c021addc>] md_do_sync+0x75c/0xbc0
> > > > > [<c021b594>] md_thread+0x138/0x154
> > > > > [<c0037b48>] kthread+0xb0/0xbc
> > > > > [<c0013190>] ret_from_fork+0x14/0x24
> > > > > [<ffffffff>] 0xffffffff
> > > > >
> > > > > with 3077 of these in the debug file. 3075 are for "md2_resync" and
> > > > > two are for "md4_resync".
> > > > >
> > > > > /proc/slabinfo shows for this bucket:
> > > > > kmalloc-256 3237 3450 256 15 1 : tunables 120 60 0 : slabdata 230 230 0
> > > > >
> > > > > but this would only account for about 800kB of memory usage, which itself
> > > > > is insignificant - so this is not the whole story.
> > > > >
> > > > > It seems that this is the culpret for the allocations:
> > > > > for (j = pi->raid_disks ; j-- ; ) {
> > > > > bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
> > > > >
> > > > > Since RESYNC_PAGES will be 64K/4K=16, each struct bio_vec is 12 bytes
> > > > > (12 * 16 = 192) plus the size of struct bio, which would fall into this
> > > > > bucket.
> > > > >
> > > > > I don't see anything obvious - it looks like it isn't every raid check
> > > > > which loses bios. Not quite sure what to make of this right now.
I now see something very obvious, having had the problem again, dumped
the physical memory to a file, and inspected the full leaked struct bio.
What I find is that the leaked struct bios have a bi_cnt of one, which
confirms that they were never freed - freed struct bios would have a
bi_cnt of zero due to the atomic_dec_and_test() before bio_free() inside
bio_put().
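For readers without the source to hand, the refcounting being relied on
here looks roughly like this (a paraphrase from memory, not verbatim
fs/bio.c):

/* Sketch of the 3.x-era behaviour described above, not verbatim kernel code */
void bio_put(struct bio *bio)
{
        BIO_BUG_ON(!atomic_read(&bio->bi_cnt));

        /* the last put drops bi_cnt to zero and frees the bio */
        if (atomic_dec_and_test(&bio->bi_cnt))
                bio_free(bio);
}

So a bi_cnt of one means the initial reference taken at allocation was
never dropped.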
When looking at the bi_inline_vecs, I see that there was a failure to
allocate a page. Now, let's look at what r1buf_pool_alloc() does:
	for (j = pi->raid_disks ; j-- ; ) {
		bio = bio_kmalloc(gfp_flags, RESYNC_PAGES);
		if (!bio)
			goto out_free_bio;
		r1_bio->bios[j] = bio;
	}

	if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
		j = pi->raid_disks;
	else
		j = 1;
	while(j--) {
		bio = r1_bio->bios[j];
		bio->bi_vcnt = RESYNC_PAGES;
		if (bio_alloc_pages(bio, gfp_flags))
			goto out_free_bio;
	}

out_free_bio:
	while (++j < pi->raid_disks)
		bio_put(r1_bio->bios[j]);
	r1bio_pool_free(r1_bio, data);
Consider what happens when bio_alloc_pages() fails. j starts off as one
for non-recovery operations, and we enter the loop to allocate the pages.
j is post-decremented to zero. So, bio = r1_bio->bios[0].
When bio_alloc_pages(bio) fails, we jump to out_free_bio. The first thing
that path does is increment j, so we put r1_bio->bios[1] up to
r1_bio->bios[raid_disks - 1], leaving r1_bio->bios[0] leaked as the r1_bio
is then freed.
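To make the off-by-one easy to see in isolation, here is a tiny userspace
mock-up of just the index arithmetic (raid_disks and the failure point are
invented for illustration):

#include <stdio.h>

/*
 * Standalone mock-up of the cleanup indexing in r1buf_pool_alloc(),
 * assuming 4 raid disks and a non-recovery run, with the page
 * allocation failing for the first bio.
 */
int main(void)
{
        int raid_disks = 4;
        int j = 1;                      /* non-recovery: only one bio gets pages */

        while (j--) {
                /* j has been post-decremented to 0: this is bios[0] */
                goto out_free_bio;      /* simulate bio_alloc_pages() failing */
        }

out_free_bio:
        while (++j < raid_disks)
                printf("bio_put(bios[%d])\n", j);       /* prints 1, 2, 3 - never 0 */

        printf("bios[0] is never put, so it leaks\n");
        return 0;
}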
The obvious fix is to set j to -1 before jumping to out_free_bio on
bio_alloc_pages() failure. However, that's not the end of the story -
there are more leaks here.
bio_put() will not free the pages allocated by the previously successful
bio_alloc_pages(). What's more, I don't see any helper in the BIO layer
which does that, which makes me wonder how many other places in the
kernel dealing with BIOs end up leaking like this.
Anyway, this is what I've come up with - it's not particularly nice,
but hopefully it will plug this leak. I'm now running with this patch
in place, and time will tell.
drivers/md/raid1.c | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aacf6bf352d8..604bad2fa442 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -123,8 +123,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 		bio = r1_bio->bios[j];
 		bio->bi_vcnt = RESYNC_PAGES;
 
-		if (bio_alloc_pages(bio, gfp_flags))
-			goto out_free_bio;
+		if (bio_alloc_pages(bio, gfp_flags)) {
+			/*
+			 * Mark this as having no pages - bio_alloc_pages
+			 * removes any it allocated.
+			 */
+			bio->bi_vcnt = 0;
+			goto out_free_all_bios;
+		}
 	}
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -138,9 +144,25 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 	return r1_bio;
 
+out_free_all_bios:
+	j = -1;
 out_free_bio:
-	while (++j < pi->raid_disks)
-		bio_put(r1_bio->bios[j]);
+	while (++j < pi->raid_disks) {
+		bio = r1_bio->bios[j];
+		if (bio->bi_vcnt) {
+			struct bio_vec *bv;
+			int i;
+			/*
+			 * Annoyingly, BIO has no way to do this, so we have
+			 * to do it manually.  Given the trouble here, and
+			 * the lack of BIO support for cleaning up, I don't
+			 * care about linux/bio.h's comment about this helper.
+			 */
+			bio_for_each_segment_all(bv, bio, i)
+				__free_page(bv->bv_page);
+		}
+		bio_put(bio);
+	}
 	r1bio_pool_free(r1_bio, data);
 	return NULL;
 }
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 11:38 ` Russell King - ARM Linux
@ 2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-02 23:28 ` NeilBrown
2014-04-01 15:58 ` Catalin Marinas
1 sibling, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-04-01 14:04 UTC (permalink / raw)
To: NeilBrown, Linus Torvalds, Kent Overstreet
Cc: Catalin Marinas, linux-raid, linux-mm, David Rientjes,
Maxime Bizon, linux-arm-kernel
On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> Consider what happens when bio_alloc_pages() fails. j starts off as one
> for non-recovery operations, and we enter the loop to allocate the pages.
> j is post-decremented to zero. So, bio = r1_bio->bios[0].
>
> bio_alloc_pages(bio) fails, we jump to out_free_bio. The first thing
> that does is increment j, so we free from r1_bio->bios[1] up to the
> number of raid disks, leaving r1_bio->bios[0] leaked as the r1_bio is
> then freed.
Neil,
Can you please review commit a07876064a0b7 ("block: Add bio_alloc_pages"),
which seems to have introduced this bug? It went in during the v3.10 merge
window and, judging from the attributions on the commit, it looks like it
was never reviewed.
The commit message is brief and inadequately describes the functional
change that the patch makes - we go from "get up to RESYNC_PAGES pages
into the bio's io_vec" to "get all RESYNC_PAGES pages or fail completely".
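For reference, my understanding of the helper's behaviour - an
approximation from memory, not a verbatim copy of fs/bio.c - is that it
is all-or-nothing: it either fills every vec in the bio or frees whatever
it managed to allocate and fails:

int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
{
        int i;
        struct bio_vec *bv;

        bio_for_each_segment_all(bv, bio, i) {
                bv->bv_page = alloc_page(gfp_mask);
                if (!bv->bv_page) {
                        /* all-or-nothing: undo the pages allocated so far */
                        while (--bv >= bio->bi_io_vec)
                                __free_page(bv->bv_page);
                        return -ENOMEM;
                }
        }
        return 0;
}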
Notwithstanding the breakage of the error cleanup paths, is this an
acceptable change of behaviour here?
Thanks.
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
@ 2014-04-01 15:58 ` Catalin Marinas
1 sibling, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2014-04-01 15:58 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Linus Torvalds, NeilBrown, linux-raid@vger.kernel.org,
linux-mm@kvack.org, David Rientjes, Maxime Bizon,
linux-arm-kernel@lists.infradead.org
On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index aacf6bf352d8..604bad2fa442 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -123,8 +123,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> bio = r1_bio->bios[j];
> bio->bi_vcnt = RESYNC_PAGES;
>
> - if (bio_alloc_pages(bio, gfp_flags))
> - goto out_free_bio;
> + if (bio_alloc_pages(bio, gfp_flags)) {
> + /*
> + * Mark this as having no pages - bio_alloc_pages
> + * removes any it allocated.
> + */
> + bio->bi_vcnt = 0;
> + goto out_free_all_bios;
> + }
> }
> /* If not user-requests, copy the page pointers to all bios */
> if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
> @@ -138,9 +144,25 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>
> return r1_bio;
>
> +out_free_all_bios:
> + j = -1;
> out_free_bio:
> - while (++j < pi->raid_disks)
> - bio_put(r1_bio->bios[j]);
> + while (++j < pi->raid_disks) {
> + bio = r1_bio->bios[j];
> + if (bio->bi_vcnt) {
> + struct bio_vec *bv;
> + int i;
> + /*
> + * Annoyingly, BIO has no way to do this, so we have
> + * to do it manually. Given the trouble here, and
> + * the lack of BIO support for cleaning up, I don't
> + * care about linux/bio.h's comment about this helper.
> + */
> + bio_for_each_segment_all(bv, bio, i)
> + __free_page(bv->bv_page);
> + }
Do you still need the 'if' block here? bio_for_each_segment_all() checks
bio->bi_vcnt, which was set to 0 above.
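Roughly - from memory, not the exact linux/bio.h text - the macro is just
a loop bounded by bi_vcnt:

/* Approximate expansion; the real definition lives in include/linux/bio.h */
#define bio_for_each_segment_all(bvl, bio, i)                           \
        for ((i) = 0, (bvl) = (bio)->bi_io_vec;                         \
             (i) < (bio)->bi_vcnt;                                      \
             (i)++, (bvl)++)

so with bi_vcnt forced to 0 the loop body never runs and the 'if' looks
redundant.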
--
Catalin
* Re: Recent 3.x kernels: Memory leak causing OOMs
2014-04-01 14:04 ` Russell King - ARM Linux
@ 2014-04-02 23:28 ` NeilBrown
0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2014-04-02 23:28 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Linus Torvalds, Kent Overstreet, Catalin Marinas, linux-raid,
linux-mm, David Rientjes, Maxime Bizon, linux-arm-kernel
On Tue, 1 Apr 2014 15:04:01 +0100 Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Apr 01, 2014 at 12:38:51PM +0100, Russell King - ARM Linux wrote:
> > Consider what happens when bio_alloc_pages() fails. j starts off as one
> > for non-recovery operations, and we enter the loop to allocate the pages.
> > j is post-decremented to zero. So, bio = r1_bio->bios[0].
> >
> > bio_alloc_pages(bio) fails, we jump to out_free_bio. The first thing
> > that does is increment j, so we free from r1_bio->bios[1] up to the
> > number of raid disks, leaving r1_bio->bios[0] leaked as the r1_bio is
> > then freed.
>
> Neil,
>
> Can you please review commit a07876064a0b7 (block: Add bio_alloc_pages)
> which seems to have introduced this bug - it seems to have gone in during
> the v3.10 merge window, and looks like it was never reviewed from the
> attributations on the commit.
>
> The commit message is brief, and inadequately describes the functional
> change that the patch has - we go from "get up to RESYNC_PAGES into the
> bio's io_vec" to "get all RESYNC_PAGES or fail completely".
>
> Not withstanding the breakage of the error cleanup paths, is this an
> acceptable change of behaviour here?
>
> Thanks.
>
Hi Russell,
thanks for finding that bug! - I'm sure I looked at that code, but obviously
missed the problem :-(
Below is the fix that I plan to submit. It is slightly different from yours
but should achieve the same effect. If you could confirm that it looks good
to you I would appreciate it.
Thanks,
NeilBrown
From 72dce88eee7259d65c6eba10c2e0beff357f713b Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 3 Apr 2014 10:19:12 +1100
Subject: [PATCH] md/raid1: r1buf_pool_alloc: free allocated pages when
 subsequent allocation fails.
When performing a user-requested check/repair (MD_RECOVERY_REQUESTED is set)
on a raid1, we allocate multiple bios, each with their own set of pages.

If the page allocation for one bio fails, we currently do *not* free
the pages allocated for the previous bios, nor do we free the bio itself.

This patch frees all the already-allocated pages and makes sure that
all the bios are freed as well.

This bug can cause a memory leak which can ultimately OOM a machine.
It was introduced in 3.10-rc1.
Fixes: a07876064a0b73ab5ef1ebcf14b1cf0231c07858
Cc: Kent Overstreet <koverstreet@google.com>
Cc: stable@vger.kernel.org (3.10+)
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4a6ca1cb2e78..56e24c072b62 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -97,6 +97,7 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct pool_info *pi = data;
 	struct r1bio *r1_bio;
 	struct bio *bio;
+	int need_pages;
 	int i, j;
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, pi);
@@ -119,15 +120,15 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	 * RESYNC_PAGES for each bio.
 	 */
 	if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
-		j = pi->raid_disks;
+		need_pages = pi->raid_disks;
 	else
-		j = 1;
-	while(j--) {
+		need_pages = 1;
+	for (j = 0; j < need_pages; j++) {
 		bio = r1_bio->bios[j];
 		bio->bi_vcnt = RESYNC_PAGES;
 
 		if (bio_alloc_pages(bio, gfp_flags))
-			goto out_free_bio;
+			goto out_free_pages;
 	}
 	/* If not user-requests, copy the page pointers to all bios */
 	if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
@@ -141,6 +142,14 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 
 	return r1_bio;
 
+out_free_pages:
+	while (--j >= 0) {
+		struct bio_vec *bv;
+
+		bio_for_each_segment_all(bv, r1_bio->bios[j], i)
+			__free_page(bv->bv_page);
+	}
+
 out_free_bio:
 	while (++j < pi->raid_disks)
 		bio_put(r1_bio->bios[j]);
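As a sanity check on the new indexing, the earlier userspace mock-up redone
with the need_pages/for-loop structure (same invented scenario: 4 disks,
non-recovery, page allocation failing for the first bio) shows the error
path now putting every bio:

#include <stdio.h>

/* Mock-up of the restructured error path; values invented for illustration. */
int main(void)
{
        int raid_disks = 4;
        int need_pages = 1;     /* non-recovery run */
        int j;

        for (j = 0; j < need_pages; j++) {
                /* simulate bio_alloc_pages() failing for bios[0] */
                goto out_free_pages;
        }

out_free_pages:
        while (--j >= 0)
                printf("free pages of bios[%d]\n", j);  /* runs zero times here */

        /* out_free_bio: */
        while (++j < raid_disks)
                printf("bio_put(bios[%d])\n", j);       /* 0, 1, 2, 3 - nothing leaks */

        return 0;
}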
Thread overview: 18+ messages (newest: 2014-04-02 23:28 UTC):
2014-02-16 20:05 Recent 3.x kernels: Memory leak causing OOMs Russell King - ARM Linux
2014-02-16 21:43 ` Theodore Ts'o
2014-02-17 18:34 ` Catalin Marinas
2014-02-16 22:17 ` David Rientjes
2014-02-16 22:50 ` Russell King - ARM Linux
2014-02-16 23:42 ` David Rientjes
2014-02-17 21:02 ` Maxime Bizon
2014-02-17 21:09 ` Russell King - ARM Linux
2014-03-15 10:19 ` Russell King - ARM Linux
2014-03-17 7:07 ` NeilBrown
2014-03-17 8:51 ` Russell King - ARM Linux
2014-03-17 18:18 ` Catalin Marinas
2014-03-17 19:33 ` Russell King - ARM Linux
2014-04-01 9:19 ` Russell King - ARM Linux
2014-04-01 11:38 ` Russell King - ARM Linux
2014-04-01 14:04 ` Russell King - ARM Linux
2014-04-02 23:28 ` NeilBrown
2014-04-01 15:58 ` Catalin Marinas