* khugepaged / firefox going wild in 3.18-rc
@ 2014-11-04 23:20 Norbert Preining
2014-11-05 0:00 ` David Rientjes
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-04 23:20 UTC (permalink / raw)
To: linux-kernel
Dear all,
(please Cc)
since 3.18-rc started, regularly, but unpredictably, firefox
(Debian iceweasel) goes completely boom to 100% CPU, and there
is also a kernel task khugepaged doing the same.
I need to kill firefox, or better the X session to get it back to normal.
THe logs don't show any strange output
top's top process:
27021 norbert 20 0 1351536 449868 88788 R 100.4 5.6 4:46.16 iceweasel
35 root 39 19 0 0 0 R 100.0 0.0 2:41.10 khugepaged
Switching back to 3.17 does Not* exhibit the same behaviour.
Is this a bug in Firefox/Iceweasel, or is there something luring in the
kernel.
All the best
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-04 23:20 khugepaged / firefox going wild in 3.18-rc Norbert Preining
@ 2014-11-05 0:00 ` David Rientjes
2014-11-05 0:10 ` Norbert Preining
0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2014-11-05 0:00 UTC (permalink / raw)
To: Norbert Preining; +Cc: linux-kernel
On Wed, 5 Nov 2014, Norbert Preining wrote:
> Dear all,
>
> (please Cc)
>
> since 3.18-rc started, regularly, but unpredictably, firefox
> (Debian iceweasel) goes completely boom to 100% CPU, and there
> is also a kernel task khugepaged doing the same.
>
> I need to kill firefox, or better the X session to get it back to normal.
>
> THe logs don't show any strange output
>
> top's top process:
>
> 27021 norbert 20 0 1351536 449868 88788 R 100.4 5.6 4:46.16 iceweasel
> 35 root 39 19 0 0 0 R 100.0 0.0 2:41.10 khugepaged
>
> Switching back to 3.17 does Not* exhibit the same behaviour.
>
> Is this a bug in Firefox/Iceweasel, or is there something luring in the
> kernel.
>
If you have the ability to kill your X session, then you presumably are
able to capture /proc/pid/stack for these pids to see where it is busy.
It might also be helpful to see how the grep ^thp_ /proc/vmstat and
grep ^compact_ /proc/vmstat counters change over time.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-05 0:00 ` David Rientjes
@ 2014-11-05 0:10 ` Norbert Preining
2014-11-05 0:12 ` Norbert Preining
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-05 0:10 UTC (permalink / raw)
To: David Rientjes; +Cc: linux-kernel
Hi David,
thanks for your answer.
On Tue, 04 Nov 2014, David Rientjes wrote:
> If you have the ability to kill your X session, then you presumably are
> able to capture /proc/pid/stack for these pids to see where it is busy.
Yes I can do that, I can even reproduce it on *every* start of iceweasel
after it happened the first time.
There is no /proc/PID/stack, though .....
> It might also be helpful to see how the grep ^thp_ /proc/vmstat and
> grep ^compact_ /proc/vmstat counters change over time.
I captured that over timne, as well as the contents of
/proc/PID/stat
for iceweasel and khugepaged. There are some numbers steadily
increasing.
Here is what I got:
START
thp_fault_alloc 9092
thp_fault_fallback 87
thp_collapse_alloc 5003
thp_collapse_alloc_failed 19
thp_split 2889
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 400301
compact_free_scanned 8634053
compact_isolated 687993
compact_stall 567
compact_fail 101
compact_success 466
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
thp_fault_alloc 9127
thp_fault_fallback 100
thp_collapse_alloc 5008
thp_collapse_alloc_failed 19
thp_split 2892
thp_zero_page_alloc 2
thp_zero_page_alloc_failed 0
compact_migrate_scanned 430094
compact_free_scanned 10444689
compact_isolated 709843
compact_stall 603
compact_fail 130
compact_success 473
===
/proc/PID/stat for iceweasel over some time:
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 14849 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 2 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15130 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15460 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15724 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
/proc/PID/stat for khugepaged over some time:
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 39779 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40104 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40516 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40797 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hope that helps
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-05 0:10 ` Norbert Preining
@ 2014-11-05 0:12 ` Norbert Preining
2014-11-05 0:20 ` David Rientjes
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-05 0:12 UTC (permalink / raw)
To: David Rientjes; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5232 bytes --]
Hi David,
one more thing, attached dmesg output with some page faults,
maybe this is connected?
On Wed, 05 Nov 2014, Norbert Preining wrote:
> Hi David,
>
> thanks for your answer.
>
> On Tue, 04 Nov 2014, David Rientjes wrote:
> > If you have the ability to kill your X session, then you presumably are
> > able to capture /proc/pid/stack for these pids to see where it is busy.
>
> Yes I can do that, I can even reproduce it on *every* start of iceweasel
> after it happened the first time.
>
> There is no /proc/PID/stack, though .....
>
> > It might also be helpful to see how the grep ^thp_ /proc/vmstat and
> > grep ^compact_ /proc/vmstat counters change over time.
>
> I captured that over timne, as well as the contents of
> /proc/PID/stat
> for iceweasel and khugepaged. There are some numbers steadily
> increasing.
>
> Here is what I got:
>
> START
> thp_fault_alloc 9092
> thp_fault_fallback 87
> thp_collapse_alloc 5003
> thp_collapse_alloc_failed 19
> thp_split 2889
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 400301
> compact_free_scanned 8634053
> compact_isolated 687993
> compact_stall 567
> compact_fail 101
> compact_success 466
>
>
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
> thp_fault_alloc 9127
> thp_fault_fallback 100
> thp_collapse_alloc 5008
> thp_collapse_alloc_failed 19
> thp_split 2892
> thp_zero_page_alloc 2
> thp_zero_page_alloc_failed 0
> compact_migrate_scanned 430094
> compact_free_scanned 10444689
> compact_isolated 709843
> compact_stall 603
> compact_fail 130
> compact_success 473
> ===
>
>
> /proc/PID/stat for iceweasel over some time:
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 14849 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 2 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15130 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15460 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
> 8927 (iceweasel) R 27016 26901 26901 0 -1 4204544 100479 2248 5 0 565 15724 1 0 20 0 47 0 38224838 1134931968 84204 18446744073709551615 139840244875264 139840244981188 140737356374240 140737356331656 139840166756100 0 0 4096 17583 18446744073709551615 0 0 17 1 0 0 1 0 0 139840247078912 139840247080712 139840278519808 140737356376815 140737356376825 140737356376825 140737356378085 0
>
>
> /proc/PID/stat for khugepaged over some time:
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 39779 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40104 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40516 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 2 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 (khugepaged) R 2 0 0 0 -1 2107456 0 0 0 0 0 40797 0 0 39 19 1 0 22 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 5195 bytes --]
[28664.834183] INFO: task Socket Thread:8931 blocked for more than 120 seconds.
[28664.834190] Tainted: G O 3.18.0-rc2+ #47
[28664.834193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28664.834196] Socket Thread D ffff8800ccc775e8 0 8931 27016 0x00000000
[28664.834203] ffff88020de53d50 0000000000000082 ffff880215162048 ffff88020de53fd8
[28664.834209] ffff8800ccc771a8 0000000000011c40 ffffffff814f33d5 ffff8802090f5be0
[28664.834215] ffff8800ccc771a8 0000000000000000 00000000000000a9 ffff8800ccc771a8
[28664.834220] Call Trace:
[28664.834231] [<ffffffff814f33d5>] ? sock_recvmsg+0x55/0x76
[28664.834238] [<ffffffff815b9313>] schedule+0x64/0x66
[28664.834243] [<ffffffff815bad67>] rwsem_down_read_failed+0xc7/0xdf
[28664.834249] [<ffffffff81055c14>] ? try_to_wake_up+0x1a1/0x1b3
[28664.834256] [<ffffffff812592e4>] call_rwsem_down_read_failed+0x14/0x30
[28664.834261] [<ffffffff815ba85a>] ? down_read+0x12/0x14
[28664.834267] [<ffffffff81030433>] __do_page_fault+0x1ba/0x34a
[28664.834272] [<ffffffff814f6553>] ? release_sock+0x112/0x11b
[28664.834279] [<ffffffff81538ba2>] ? do_tcp_setsockopt.isra.23+0x60e/0x674
[28664.834286] [<ffffffff810f8186>] ? fput+0x12/0x86
[28664.834290] [<ffffffff810305f0>] do_page_fault+0xc/0xe
[28664.834295] [<ffffffff815bd692>] page_fault+0x22/0x30
[28664.834299] INFO: task JS Watchdog:8933 blocked for more than 120 seconds.
[28664.834302] Tainted: G O 3.18.0-rc2+ #47
[28664.834305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28664.834307] JS Watchdog D ffff8801f22d3768 0 8933 27016 0x00000000
[28664.834312] ffff880207243e48 0000000000000082 ffff8802151788b8 ffff880207243fd8
[28664.834317] ffff8801f22d3328 0000000000011c40 ffff880207243ee8 ffff8802090f5be0
[28664.834322] ffffffff00000001 ffff8802090f5bf8 ffff8801f22d3328 00007f2efa914000
[28664.834328] Call Trace:
[28664.834333] [<ffffffff815b9313>] schedule+0x64/0x66
[28664.834338] [<ffffffff815bb013>] rwsem_down_write_failed+0x294/0x2d2
[28664.834344] [<ffffffff81259313>] call_rwsem_down_write_failed+0x13/0x20
[28664.834349] [<ffffffff815ba87b>] ? down_write+0x1f/0x2e
[28664.834355] [<ffffffff810da3aa>] SyS_mprotect+0xbe/0x1d7
[28664.834360] [<ffffffff815bbf96>] system_call_fastpath+0x16/0x1b
[28664.834364] INFO: task pool:8937 blocked for more than 120 seconds.
[28664.834367] Tainted: G O 3.18.0-rc2+ #47
[28664.834369] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28664.834371] pool D ffff8801e8947238 0 8937 27016 0x00000000
[28664.834376] ffff880207e8bd50 0000000000000082 ffff880215162048 ffff880207e8bfd8
[28664.834381] ffff8801e8946df8 0000000000011c40 ffff880207e8bd20 ffff8802090f5be0
[28664.834386] ffff8801e8946df8 0000000000000000 00000000000000a9 ffff8801e8946df8
[28664.834391] Call Trace:
[28664.834397] [<ffffffff815b9313>] schedule+0x64/0x66
[28664.834411] [<ffffffff815bad67>] rwsem_down_read_failed+0xc7/0xdf
[28664.834416] [<ffffffff8107945b>] ? hrtimer_get_res+0x3f/0x3f
[28664.834421] [<ffffffff812592e4>] call_rwsem_down_read_failed+0x14/0x30
[28664.834426] [<ffffffff815ba85a>] ? down_read+0x12/0x14
[28664.834430] [<ffffffff81030433>] __do_page_fault+0x1ba/0x34a
[28664.834435] [<ffffffff81107d5f>] ? __dentry_kill+0x198/0x1a0
[28664.834441] [<ffffffff8110eb3d>] ? mntput_no_expire+0x2a/0x10c
[28664.834445] [<ffffffff8110ec47>] ? mntput+0x28/0x2a
[28664.834450] [<ffffffff810f837e>] ? __fput+0x184/0x193
[28664.834456] [<ffffffff810091ad>] ? read_tsc+0x9/0xb
[28664.834462] [<ffffffff810882c5>] ? SyS_futex+0xb9/0x120
[28664.834467] [<ffffffff810305f0>] do_page_fault+0xc/0xe
[28664.834471] [<ffffffff815bd692>] page_fault+0x22/0x30
[28664.834474] INFO: task Timer:8939 blocked for more than 120 seconds.
[28664.834477] Tainted: G O 3.18.0-rc2+ #47
[28664.834479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28664.834481] Timer D ffff8800cffa25c8 0 8939 27016 0x00000000
[28664.834486] ffff8801cd9c3d50 0000000000000082 ffff8802151788b8 ffff8801cd9c3fd8
[28664.834490] ffff8800cffa2188 0000000000011c40 ffff8801cd9c3d20 ffff8802090f5be0
[28664.834495] ffff8800cffa2188 0000000000000000 00000000000000a9 ffff8800cffa2188
[28664.834500] Call Trace:
[28664.834505] [<ffffffff815b9313>] schedule+0x64/0x66
[28664.834509] [<ffffffff815bad67>] rwsem_down_read_failed+0xc7/0xdf
[28664.834515] [<ffffffff812592e4>] call_rwsem_down_read_failed+0x14/0x30
[28664.834520] [<ffffffff815ba85a>] ? down_read+0x12/0x14
[28664.834524] [<ffffffff81030433>] __do_page_fault+0x1ba/0x34a
[28664.834530] [<ffffffff81122141>] ? fsnotify+0x228/0x24d
[28664.834535] [<ffffffff81058f22>] ? calc_delta_fair+0x1d/0x1f
[28664.834540] [<ffffffff810882c5>] ? SyS_futex+0xb9/0x120
[28664.834545] [<ffffffff815b913a>] ? __schedule+0x322/0x497
[28664.834549] [<ffffffff810305f0>] do_page_fault+0xc/0xe
[28664.834553] [<ffffffff815bd692>] page_fault+0x22/0x30
[28695.709435] Chrome_ChildThr[8990]: segfault at 0 ip 00007fc42ea9c3da sp 00007fc4260b9450 error 6 in libmozalloc.so[7fc42ea9b000+2000]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-05 0:12 ` Norbert Preining
@ 2014-11-05 0:20 ` David Rientjes
2014-11-06 12:25 ` Vlastimil Babka
0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2014-11-05 0:20 UTC (permalink / raw)
To: Norbert Preining; +Cc: linux-kernel
On Wed, 5 Nov 2014, Norbert Preining wrote:
> Hi David,
>
> one more thing, attached dmesg output with some page faults,
> maybe this is connected?
>
Hmm, I'm not aware of any mm->mmap_sem starvation issues in 3.18-rc, maybe
this is a duplicate of another issue that someone has reported that I
haven't seen. The lengthy output of echo t > /proc/sysrq-trigger should
give a clue as to what is holding it or perhaps this is a more generic
rwsem issue.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-05 0:20 ` David Rientjes
@ 2014-11-06 12:25 ` Vlastimil Babka
2014-11-06 12:39 ` Norbert Preining
0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2014-11-06 12:25 UTC (permalink / raw)
To: David Rientjes, Norbert Preining; +Cc: linux-kernel
On 11/05/2014 01:20 AM, David Rientjes wrote:
> On Wed, 5 Nov 2014, Norbert Preining wrote:
>
>> Hi David,
>>
>> one more thing, attached dmesg output with some page faults,
>> maybe this is connected?
>>
>
> Hmm, I'm not aware of any mm->mmap_sem starvation issues in 3.18-rc, maybe
> this is a duplicate of another issue that someone has reported that I
> haven't seen. The lengthy output of echo t > /proc/sysrq-trigger should
> give a clue as to what is holding it or perhaps this is a more generic
> rwsem issue.
Could be that another task holds the mmap_sem during THP allocation attempt on
its own page fault, and compaction goes in some kind of infinite loop. There are
two other threads that look similar:
http://article.gmane.org/gmane.linux.kernel.mm/124451/match=isolate_freepages_block+very+high+intermittent+overhead
https://lkml.org/lkml/2014/11/4/144
I suggested testing a commit revert in one thread, and a possible fix in the
other. If you can reproduce this well, that would be very useful.
khugepaged using CPU also points to either the address space scanning, or
compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
compaction, but that still leaves page faulters to possibly hold it.
So yeah we would need the stacks of processes that do hog the CPU's, not those
that sleep. As David suggested, a /proc/pid/stack could work. Also can you
please provide /proc/zoneinfo ?
Thanks,
Vlastimil
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-06 12:25 ` Vlastimil Babka
@ 2014-11-06 12:39 ` Norbert Preining
2014-11-06 13:03 ` Vlastimil Babka
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-06 12:39 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: David Rientjes, linux-kernel
Hi Vlastimil
thanks for your answer.
In the meantime I have tried rc3, too, with the same effects.
Interestingly, once it goes into a bad state, every future approach
does the same. I started shotwell (photo organizer) and it went into the
same state (khugepaged / shotwell each using about 100% of CPu time).
On Thu, 06 Nov 2014, Vlastimil Babka wrote:
> Could be that another task holds the mmap_sem during THP allocation attempt on
> its own page fault, and compaction goes in some kind of infinite loop. There are
My feeling somehow is about the plugin-container in firefox ...
(flashplayer or something similar, but I might be wrong!). With shotwell,
I have no idea why.
> I suggested testing a commit revert in one thread, and a possible fix in the
> other. If you can reproduce this well, that would be very useful.
Which commit are you talking about? I can easily revert some/all of what you
want and do test runs.
> khugepaged using CPU also points to either the address space scanning, or
> compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
> compaction, but that still leaves page faulters to possibly hold it.
So, do you mean I should try reverting 8b1645685ac?
> So yeah we would need the stacks of processes that do hog the CPU's, not those
> that sleep. As David suggested, a /proc/pid/stack could work. Also can you
> please provide /proc/zoneinfo ?
Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
this depending on some specific kerenl option?
/proc/zoneinfo I have and can send you the next time.
Thanks a lot and all the best
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-06 12:39 ` Norbert Preining
@ 2014-11-06 13:03 ` Vlastimil Babka
2014-11-07 13:07 ` Norbert Preining
0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2014-11-06 13:03 UTC (permalink / raw)
To: Norbert Preining; +Cc: David Rientjes, linux-kernel
On 11/06/2014 01:39 PM, Norbert Preining wrote:
> Hi Vlastimil
>
> thanks for your answer.
>
> In the meantime I have tried rc3, too, with the same effects.
>
> Interestingly, once it goes into a bad state, every future approach
> does the same. I started shotwell (photo organizer) and it went into the
> same state (khugepaged / shotwell each using about 100% of CPu time).
>
> On Thu, 06 Nov 2014, Vlastimil Babka wrote:
>> Could be that another task holds the mmap_sem during THP allocation attempt on
>> its own page fault, and compaction goes in some kind of infinite loop. There are
>
> My feeling somehow is about the plugin-container in firefox ...
> (flashplayer or something similar, but I might be wrong!). With shotwell,
> I have no idea why.
plugin-container is different process than firefox, so it should show its CPU
consumption separately. If you see firefox, it's firefox binary itself.
>> I suggested testing a commit revert in one thread, and a possible fix in the
>> other. If you can reproduce this well, that would be very useful.
>
> Which commit are you talking about? I can easily revert some/all of what you
> want and do test runs.
OK, one possibility is to do (it should apply cleanly)
git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973
Then there's a patch at the end of this e-mail, which however I doubt would fix
the symptoms you describe.
>> khugepaged using CPU also points to either the address space scanning, or
>> compaction going wrong. Since 8b1645685ac it shouldn't hold mmap_sem during
>> compaction, but that still leaves page faulters to possibly hold it.
>
> So, do you mean I should try reverting 8b1645685ac?
No, not that one. That one should actually reduce the problem you see.
>> So yeah we would need the stacks of processes that do hog the CPU's, not those
>> that sleep. As David suggested, a /proc/pid/stack could work. Also can you
>> please provide /proc/zoneinfo ?
>
> Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
> this depending on some specific kerenl option?
Ah I missed that. Should be CONFIG_STACKTRACE to enable that.
> /proc/zoneinfo I have and can send you the next time.
>
> Thanks a lot and all the best
Great, thank you!
Vlastimil
> Norbert
>
> ------------------------------------------------------------------------
> PREINING, Norbert http://www.preining.info
> JAIST, Japan TeX Live & Debian Developer
> GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
> ------------------------------------------------------------------------
>
------8<------
>From fe9c963cc665cdab50abb41f3babb5b19d08fab1 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 5 Nov 2014 14:19:18 +0100
Subject: [PATCH] mm, compaction: do not reset deferred compaction
optimistically
In try_to_compact_pages() we reset deferred compaction for a zone where we
think compaction has succeeded. Although this action does not reset the
counters affecting deferred compaction period, just bumping the deferred order
means that another compaction attempt will be able to pass the check in
compaction_deferred() and proceed with compaction.
This is a problem when try_to_compact_pages() thinks compaction was successful
just because the watermark check is missing proper classzone_idx parameter,
but then the allocation attempt itself will fail due to its watermark check
having the proper value. Although __alloc_pages_direct_compact() will re-defer
compaction in such case, this happens only in the case of sync compaction.
Async compaction will leave the zone open for another compaction attempt which
may reset the deferred order again. This could possibly explain what
P. Christeas reported - a system where many processes include the following
backtrace:
[<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59
[<ffffffff813b4810>] retint_kernel+0x20/0x30
[<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85
[<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c
[<ffffffff810eee56>] compact_zone+0x215/0x4b2
[<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f
[<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8
[<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf
[<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799
[<ffffffff8110d3fd>] new_slab+0x5f/0x21c
The issue has been made visible by commit 53853e2d2bfb ("mm, compaction: defer
each zone individually instead of preferred zone"), since before the commit,
deferred compaction for fallback zones (where classzone_idx matters) was not
considered separately.
Although work is underway to fix the underlying zone watermark check mismatch,
this patch fixes the immediate problem by removing the optimistic defer reset
completely. Its usefulness is questionable anyway, since if the allocation
really succeeds, a full defer reset (including the period counters) follows.
Fixes: 53853e2d2bfb748a8b5aa2fd1de15699266865e0
Reported-by: P. Christeas <xrg@linux.gr>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/compaction.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..f0335f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
alloc_flags)) {
*candidate_zone = zone;
/*
- * We think the allocation will succeed in this zone,
- * but it is not certain, hence the false. The caller
- * will repeat this with true if allocation indeed
- * succeeds in this zone.
- */
- compaction_defer_reset(zone, order, false);
- /*
* It is possible that async compaction aborted due to
* need_resched() and the watermarks were ok thanks to
* somebody else freeing memory. The allocation can
--
2.1.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-06 13:03 ` Vlastimil Babka
@ 2014-11-07 13:07 ` Norbert Preining
2014-11-07 13:38 ` Vlastimil Babka
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-07 13:07 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: David Rientjes, linux-kernel
Hi Vlastimil,
On Thu, 06 Nov 2014, Vlastimil Babka wrote:
> OK, one possibility is to do (it should apply cleanly)
> git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973
I am running with this reverted now, and I cannot reproduce the
khugepage going overboard anymore.
I tried hard since 2 days now, no chance to reproduce.
Before firefox or shotwell went into boom mode, but not now.
(For now!)
> > Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
> > this depending on some specific kerenl option?
>
> Ah I missed that. Should be CONFIG_STACKTRACE to enable that.
If you want, I can compile a "default" kernel without the above commit
reverted, and turn on STACKTRACE (I have turned it one for *all* future
kernels I compile ;-)
I also didn't try the patch you send me, just the revert from above.
Please let me know which other experiments you want to see.
Thanks
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-07 13:07 ` Norbert Preining
@ 2014-11-07 13:38 ` Vlastimil Babka
2014-11-07 13:57 ` Norbert Preining
0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2014-11-07 13:38 UTC (permalink / raw)
To: Norbert Preining; +Cc: David Rientjes, linux-kernel
On 11/07/2014 02:07 PM, Norbert Preining wrote:
> Hi Vlastimil,
>
> On Thu, 06 Nov 2014, Vlastimil Babka wrote:
>> OK, one possibility is to do (it should apply cleanly)
>> git revert e14c720efdd73c6d69cd8d07fa894bcd11fe1973
>
> I am running with this reverted now, and I cannot reproduce the
> khugepage going overboard anymore.
Great, that's good news if I understand correctly, but ...
> I tried hard since 2 days now, no chance to reproduce.
> Before firefox or shotwell went into boom mode, but not now.
> (For now!)
I suggested the commit to you for revert 1 day ago, and you say you
can't reproduce it for 2 days already? That's a bit suspicious. Did you
already tried to revert it before my suggestion, or were you unable to
reproduce it anymore even before the revert?
>>> Again, as I mentioned, I don't have /proc/pid/stack for any "pid", is
>>> this depending on some specific kerenl option?
>>
>> Ah I missed that. Should be CONFIG_STACKTRACE to enable that.
>
> If you want, I can compile a "default" kernel without the above commit
Hmm it would probably be more useful if I sent you some debug printing
patches first... the stacktrace won't probably tell us much we don't
already know.
> reverted, and turn on STACKTRACE (I have turned it one for *all* future
> kernels I compile ;-)
Yeah that's a good idea :)
> I also didn't try the patch you send me, just the revert from above.
Yeah if the revert helped, the other patch probably wouldn't change
anything. It also didn't help the other guy with similar problem.
> Please let me know which other experiments you want to see.
I'll prepare a debugging patch and send with instructions. Meanwhile you
could send the /proc/zoneinfo contents? :)
Thanks!
Vlastimil
> Thanks
>
> Norbert
>
> ------------------------------------------------------------------------
> PREINING, Norbert http://www.preining.info
> JAIST, Japan TeX Live & Debian Developer
> GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
> ------------------------------------------------------------------------
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-07 13:38 ` Vlastimil Babka
@ 2014-11-07 13:57 ` Norbert Preining
2014-11-07 15:44 ` Vlastimil Babka
0 siblings, 1 reply; 12+ messages in thread
From: Norbert Preining @ 2014-11-07 13:57 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: David Rientjes, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]
Hi Vlastimil,
On Fri, 07 Nov 2014, Vlastimil Babka wrote:
> Great, that's good news if I understand correctly, but ...
no "but ..."
> I suggested the commit to you for revert 1 day ago, and you say you
> can't reproduce it for 2 days already? That's a bit suspicious. Did
No, you suggested it yesterday during the day, and here in Japan the
next day is already over, so my feeling is two days ;-)
So all fine ;-)
> I'll prepare a debugging patch and send with instructions. Meanwhile
> you could send the /proc/zoneinfo contents? :)
When? Running which kernel? Anyway, with the current kernel (reverted
commit as before) I get the attached zoneinfo.
Thanks, and waiting for your patches ;-)
Norbert
------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------
[-- Attachment #2: zoneinfo.txt --]
[-- Type: text/plain, Size: 4639 bytes --]
Node 0, zone DMA
pages free 3974
min 66
low 82
high 99
scanned 0
spanned 4095
present 3996
managed 3974
nr_free_pages 3974
nr_alloc_batch 17
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_pages_scanned 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3399, 7878, 7878)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 6
all_unreclaimable: 1
start_pfn: 1
inactive_ratio: 1
Node 0, zone DMA32
pages free 557017
min 14552
low 18190
high 21828
scanned 0
spanned 1044480
present 889801
managed 871049
nr_free_pages 557017
nr_alloc_batch 3572
nr_inactive_anon 12444
nr_active_anon 67252
nr_inactive_file 86719
nr_active_file 125886
nr_unevictable 4
nr_mlock 4
nr_anon_pages 67107
nr_mapped 22262
nr_file_pages 225198
nr_dirty 20
nr_writeback 0
nr_slab_reclaimable 11794
nr_slab_unreclaimable 5026
nr_page_table_pages 2369
nr_kernel_stack 226
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 12593
nr_dirtied 224825
nr_written 214052
nr_pages_scanned 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_transparent_hugepages 28
nr_free_cma 0
protection: (0, 0, 4478, 4478)
pagesets
cpu: 0
count: 159
high: 186
batch: 31
vm stats threshold: 36
cpu: 1
count: 145
high: 186
batch: 31
vm stats threshold: 36
cpu: 2
count: 99
high: 186
batch: 31
vm stats threshold: 36
cpu: 3
count: 65
high: 186
batch: 31
vm stats threshold: 36
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 712051
min 19172
low 23965
high 28758
scanned 0
spanned 1179136
present 1179136
managed 1146604
nr_free_pages 712051
nr_alloc_batch 1771
nr_inactive_anon 11186
nr_active_anon 96996
nr_inactive_file 113106
nr_active_file 178825
nr_unevictable 8
nr_mlock 8
nr_anon_pages 96917
nr_mapped 31145
nr_file_pages 303206
nr_dirty 13
nr_writeback 0
nr_slab_reclaimable 13001
nr_slab_unreclaimable 5053
nr_page_table_pages 3211
nr_kernel_stack 207
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 11275
nr_dirtied 327684
nr_written 314671
nr_pages_scanned 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_transparent_hugepages 45
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 170
high: 186
batch: 31
vm stats threshold: 42
cpu: 1
count: 134
high: 186
batch: 31
vm stats threshold: 42
cpu: 2
count: 136
high: 186
batch: 31
vm stats threshold: 42
cpu: 3
count: 97
high: 186
batch: 31
vm stats threshold: 42
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 6
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: khugepaged / firefox going wild in 3.18-rc
2014-11-07 13:57 ` Norbert Preining
@ 2014-11-07 15:44 ` Vlastimil Babka
0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2014-11-07 15:44 UTC (permalink / raw)
To: Norbert Preining; +Cc: David Rientjes, linux-kernel
On 11/07/2014 02:57 PM, Norbert Preining wrote:
> Hi Vlastimil,
>
> On Fri, 07 Nov 2014, Vlastimil Babka wrote:
>> Great, that's good news if I understand correctly, but ...
>
> no "but ..."
>
>> I suggested the commit to you for revert 1 day ago, and you say you
>> can't reproduce it for 2 days already? That's a bit suspicious. Did
>
> No, you suggested it yesterday during the day, and here in Japan the
> next day is already over, so my feeling is two days ;-)
>
> So all fine ;-)
Ah, good, just wanted to be sure.
>> I'll prepare a debugging patch and send with instructions. Meanwhile
>> you could send the /proc/zoneinfo contents? :)
>
> When? Running which kernel? Anyway, with the current kernel (reverted
> commit as before) I get the attached zoneinfo.
It doesn't matter which kernel. Thanks, but I didn't find anything
suspicious there... expected some oddly aligned zones, but it all seemed
to be aligned to pageblock boundaries.
> Thanks, and waiting for your patches ;-)
Tracing patch attached. You should apply this to the broken kernel, i.e.
without the revert, and have tracing enabled, i.e. CONFIG_FTRACE. There
should be /sys/kernel/debug/tracing directory.
To avoid overhead and noise, I would just run this kernel as usual, and
only when khugepaged/firefox/whatever starts misbehaving, do the following:
cd /sys/kernel/debug/tracing
echo 1 > tracing_on
echo 1 > events/compaction/enable
cat trace_pipe | tee /tmp/trace #(or somewhere else)
You should see events scrolling on the screen, including the misbehaving
processes. After a minute or something, ctrl+c and send me the trace file.
Thanks a lot!
Vlastimil
> Norbert
>
------8<------
>From 59c93237ad2fb2317e61c8f00ea73d93ff8a2813 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 7 Nov 2014 16:12:14 +0100
Subject: [PATCH] compaction: detailed free scanner tracing
---
include/trace/events/compaction.h | 51 +++++++++++++++++++++++++++++++++------
mm/compaction.c | 9 +++++--
2 files changed, 50 insertions(+), 10 deletions(-)
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index c6814b9..db83ea4 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -12,38 +12,73 @@
DECLARE_EVENT_CLASS(mm_compaction_isolate_template,
TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),
- TP_ARGS(nr_scanned, nr_taken),
+ TP_ARGS(nr_scanned, nr_taken, last_pfn),
TP_STRUCT__entry(
__field(unsigned long, nr_scanned)
__field(unsigned long, nr_taken)
+ __field(unsigned long, last_pfn)
),
TP_fast_assign(
__entry->nr_scanned = nr_scanned;
__entry->nr_taken = nr_taken;
+ __entry->last_pfn = last_pfn;
),
- TP_printk("nr_scanned=%lu nr_taken=%lu",
+ TP_printk("nr_scanned=%lu nr_taken=%lu last_pfn=%lu",
__entry->nr_scanned,
- __entry->nr_taken)
+ __entry->nr_taken,
+ __entry->last_pfn)
);
DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_migratepages,
TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),
- TP_ARGS(nr_scanned, nr_taken)
+ TP_ARGS(nr_scanned, nr_taken, last_pfn)
);
DEFINE_EVENT(mm_compaction_isolate_template, mm_compaction_isolate_freepages,
TP_PROTO(unsigned long nr_scanned,
- unsigned long nr_taken),
+ unsigned long nr_taken,
+ unsigned long last_pfn),
- TP_ARGS(nr_scanned, nr_taken)
+ TP_ARGS(nr_scanned, nr_taken, last_pfn)
+);
+
+TRACE_EVENT(mm_compaction_isolate_freepages_loop,
+ TP_PROTO(unsigned long low_pfn,
+ unsigned long block_start_pfn,
+ unsigned long isolate_start_pfn,
+ unsigned long block_end_pfn),
+
+ TP_ARGS(low_pfn, block_start_pfn, isolate_start_pfn, block_end_pfn),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, low_pfn)
+ __field(unsigned long, block_start_pfn)
+ __field(unsigned long, isolate_start_pfn)
+ __field(unsigned long, block_end_pfn)
+ ),
+
+ TP_fast_assign(
+ __entry->low_pfn = low_pfn;
+ __entry->block_start_pfn = block_start_pfn;
+ __entry->isolate_start_pfn = isolate_start_pfn;
+ __entry->block_end_pfn = block_end_pfn;
+ ),
+
+ TP_printk("low=%lu block_start=%lu isolate_start=%lu block_end=%lu",
+ __entry->low_pfn,
+ __entry->block_start_pfn,
+ __entry->isolate_start_pfn,
+ __entry->block_end_pfn)
);
TRACE_EVENT(mm_compaction_migratepages,
diff --git a/mm/compaction.c b/mm/compaction.c
index ec74cf0..4931b21 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -426,7 +426,8 @@ isolate_fail:
/* Record how far we have got within the block */
*start_pfn = blockpfn;
- trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
+ trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated,
+ blockpfn);
/*
* If strict isolation is requested by CMA then check that all the
@@ -734,7 +735,8 @@ isolate_success:
if (low_pfn == end_pfn)
update_pageblock_skip(cc, valid_page, nr_isolated, true);
- trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
+ trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated,
+ low_pfn);
count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
if (nr_isolated)
@@ -838,6 +840,9 @@ static void isolate_freepages(struct compact_control *cc)
isolate_start_pfn = block_start_pfn) {
unsigned long isolated;
+ trace_mm_compaction_isolate_freepages_loop(low_pfn,
+ block_start_pfn, isolate_start_pfn, block_end_pfn);
+
/*
* This can iterate a massively long zone without finding any
* suitable migration targets, so periodically check if we need
--
2.1.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-11-07 15:44 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-04 23:20 khugepaged / firefox going wild in 3.18-rc Norbert Preining
2014-11-05 0:00 ` David Rientjes
2014-11-05 0:10 ` Norbert Preining
2014-11-05 0:12 ` Norbert Preining
2014-11-05 0:20 ` David Rientjes
2014-11-06 12:25 ` Vlastimil Babka
2014-11-06 12:39 ` Norbert Preining
2014-11-06 13:03 ` Vlastimil Babka
2014-11-07 13:07 ` Norbert Preining
2014-11-07 13:38 ` Vlastimil Babka
2014-11-07 13:57 ` Norbert Preining
2014-11-07 15:44 ` Vlastimil Babka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox