From: Michal Hocko <mhocko@kernel.org>
To: linux-mm@kvack.org
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Roman Gushchin <guro@fb.com>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>
Subject: [RFC PATCH 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guranteed to finish
Date: Mon, 10 Sep 2018 14:55:13 +0200 [thread overview]
Message-ID: <20180910125513.311-4-mhocko@kernel.org> (raw)
In-Reply-To: <20180910125513.311-1-mhocko@kernel.org>
From: Michal Hocko <mhocko@suse.com>
David Rientjes has noted that certain user space memory allocators leave
a lot of page tables behind and the current implementation of oom_reaper
doesn't deal with those workloads very well. In order to improve these
workloads define a point when exit_mmap is guaranteed to finish the tear
down without any further blocking etc. This is right after we unlink
vmas (those still depend on locks which are held while performing memory
allocations from other contexts) and before we start releasing page
tables.
Opencode free_pgtables and explicitly unlink all vmas first. Then set
mm->mmap to NULL (there shouldn't be anybody looking at it at this
stage) and check for mm->mmap in the oom_reaper path. If the mm->mmap
is NULL we rely on the exit path and won't set MMF_OOM_SKIP from the
reaper.
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
mm/mmap.c | 24 ++++++++++++++++++++----
mm/oom_kill.c | 13 +++++++------
2 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3481424717ac..99bb9ce29bc5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3085,8 +3085,27 @@ void exit_mmap(struct mm_struct *mm)
/* oom_reaper cannot race with the page tables teardown */
if (oom)
down_write(&mm->mmap_sem);
+ /*
+ * Hide vma from rmap and truncate_pagecache before freeing
+ * pgtables
+ */
+ while (vma) {
+ unlink_anon_vmas(vma);
+ unlink_file_vma(vma);
+ vma = vma->vm_next;
+ }
+ vma = mm->mmap;
+ if (oom) {
+ /*
+ * the exit path is guaranteed to finish without any unbound
+ * blocking at this stage so make it clear to the caller.
+ */
+ mm->mmap = NULL;
+ up_write(&mm->mmap_sem);
+ }
- free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
+ free_pgd_range(&tlb, vma->vm_start, vma->vm_prev->vm_end,
+ FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
/*
@@ -3099,9 +3118,6 @@ void exit_mmap(struct mm_struct *mm)
vma = remove_vma(vma);
}
vm_unacct_memory(nr_accounted);
-
- if (oom)
- up_write(&mm->mmap_sem);
}
/* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 049e67dc039b..0ebf93c76c81 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -570,12 +570,10 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
}
/*
- * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
- * work on the mm anymore. The check for MMF_OOM_SKIP must run
- * under mmap_sem for reading because it serializes against the
- * down_write();up_write() cycle in exit_mmap().
+ * If exit path clear mm->mmap then we know it will finish the tear down
+ * and we can go and bail out here.
*/
- if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (!mm->mmap) {
trace_skip_task_reaping(tsk->pid);
goto out_unlock;
}
@@ -624,8 +622,11 @@ static void oom_reap_task(struct task_struct *tsk)
/*
* Hide this mm from OOM killer because it has been either reaped or
* somebody can't call up_write(mmap_sem).
+ * Leave the MMF_OOM_SKIP to the exit path if it managed to reach the
+ * point it is guaranteed to finish without any blocking
*/
- set_bit(MMF_OOM_SKIP, &mm->flags);
+ if (mm->mmap)
+ set_bit(MMF_OOM_SKIP, &mm->flags);
/* Drop a reference taken by wake_oom_reaper */
put_task_struct(tsk);
--
2.18.0
next prev parent reply other threads:[~2018-09-10 12:55 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-08 4:54 [PATCH v2] mm, oom: Fix unnecessary killing of additional processes Tetsuo Handa
2018-09-10 9:54 ` Michal Hocko
2018-09-10 11:27 ` Tetsuo Handa
2018-09-10 11:40 ` Michal Hocko
2018-09-10 12:52 ` Tetsuo Handa
2018-09-10 12:55 ` [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover Michal Hocko
2018-09-10 12:55 ` [RFC PATCH 1/3] mm, oom: rework mmap_exit vs. oom_reaper synchronization Michal Hocko
2018-09-10 12:55 ` [RFC PATCH 2/3] mm, oom: keep retrying the oom_reap operation as long as there is substantial memory left Michal Hocko
2018-09-10 12:55 ` Michal Hocko [this message]
2018-09-10 14:59 ` [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover Tetsuo Handa
2018-09-10 15:11 ` Michal Hocko
2018-09-10 15:40 ` Tetsuo Handa
2018-09-10 16:44 ` Michal Hocko
2018-09-12 3:06 ` Tetsuo Handa
2018-09-12 7:18 ` Michal Hocko
2018-09-12 7:58 ` Tetsuo Handa
2018-09-12 8:17 ` Michal Hocko
2018-09-12 10:59 ` Tetsuo Handa
2018-09-12 11:22 ` Michal Hocko
2018-09-11 14:01 ` Tetsuo Handa
2018-09-12 7:50 ` Michal Hocko
2018-09-12 13:42 ` Michal Hocko
2018-09-13 2:44 ` Tetsuo Handa
2018-09-13 9:09 ` Michal Hocko
2018-09-13 11:20 ` Tetsuo Handa
2018-09-13 11:35 ` Michal Hocko
2018-09-13 11:53 ` Tetsuo Handa
2018-09-13 13:40 ` Michal Hocko
2018-09-14 13:54 ` Tetsuo Handa
2018-09-14 14:14 ` Michal Hocko
2018-09-14 17:07 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180910125513.311-4-mhocko@kernel.org \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).