* [PATCH] sched/numa: use down_read_trylock for mmap_sem
From: Vlastimil Babka @ 2017-05-15 13:13 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Mel Gorman, Rik van Riel, linux-kernel, Vlastimil Babka

A customer has reported a soft lockup when running a proprietary,
memory-intensive stress test, where the trace on multiple CPUs looks like this:

 RIP: 0010:[<ffffffff810c53fe>]
  [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190
...
 Call Trace:
  [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa
  [<ffffffff811bc331>] change_protection_range+0x3b1/0x930
  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
  [<ffffffff81098322>] task_work_run+0x72/0x90

Further investigation showed that the lock contention here is pmd_lock().

The task_numa_work() function makes sure that only one thread is allowed to
perform the work in a single scan period (via cmpxchg), but if a thread holds
mmap_sem for writing for several periods, multiple threads in
task_numa_work() can build up a convoy waiting for mmap_sem for read and then
all get unblocked at once.

This patch changes the down_read() to down_read_trylock(), which prevents the
buildup. For a workload experiencing mmap_sem contention, it's probably better
to postpone the NUMA balancing work anyway. This seems to have fixed the soft
lockups involving pmd_lock(), which is in line with the convoy theory.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dea138964b91..d70f9026defc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2475,7 +2475,8 @@ void task_numa_work(struct callback_head *work)
 		return;
 
 
-	down_read(&mm->mmap_sem);
+	if (!down_read_trylock(&mm->mmap_sem))
+		return;
 	vma = find_vma(mm, start);
 	if (!vma) {
 		reset_ptenuma_scan(p);
-- 
2.12.2


Thread overview: 5+ messages
2017-05-15 13:13 [PATCH] sched/numa: use down_read_trylock for mmap_sem Vlastimil Babka
2017-05-15 14:27 ` Rik van Riel
2017-05-15 14:35 ` Mel Gorman
2017-05-16  8:15 ` Peter Zijlstra
2017-05-23  8:47 ` [tip:sched/core] sched/numa: Use down_read_trylock() for the mmap_sem tip-bot for Vlastimil Babka
