diff for duplicates of <20170518132033.GA12219@castle>

diff --git a/a/1.txt b/N1/1.txt
index d760823..dc5ef32 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -21,3 +21,108 @@ If we consider this approach, I've prepared a separate patch for this problem
 (stripped all oom reaper list stuff).
 
 Thanks!
+
+>From 317fad44a0fe79fb76e8e4fd6bd81c52ae1712e9 Mon Sep 17 00:00:00 2001
+From: Roman Gushchin <guro@fb.com>
+Date: Tue, 16 May 2017 21:19:56 +0100
+Subject: [PATCH] mm,oom: prevent OOM double kill from a pagefault handling
+ path
+
+During the debugging of some OOM-related stuff, I've noticed
+that sometimes OOM kills two processes instead of one.
+
+The problem can be easily reproduced on a vanilla kernel:
+
+[   25.721494] allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
+[   25.725658] allocate cpuset=/ mems_allowed=0
+[   25.727033] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
+[   25.729215] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
+[   25.729598] Call Trace:
+[   25.729598]  dump_stack+0x63/0x82
+[   25.729598]  dump_header+0x97/0x21a
+[   25.729598]  ? do_try_to_free_pages+0x2d7/0x360
+[   25.729598]  ? security_capable_noaudit+0x45/0x60
+[   25.729598]  oom_kill_process+0x219/0x3e0
+[   25.729598]  out_of_memory+0x11d/0x480
+[   25.729598]  __alloc_pages_slowpath+0xc84/0xd40
+[   25.729598]  __alloc_pages_nodemask+0x245/0x260
+[   25.729598]  alloc_pages_vma+0xa2/0x270
+[   25.729598]  __handle_mm_fault+0xca9/0x10c0
+[   25.729598]  handle_mm_fault+0xf3/0x210
+[   25.729598]  __do_page_fault+0x240/0x4e0
+[   25.729598]  trace_do_page_fault+0x37/0xe0
+[   25.729598]  do_async_page_fault+0x19/0x70
+[   25.729598]  async_page_fault+0x28/0x30
+< cut >
+[   25.810868] oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
+< cut >
+[   25.817589] allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
+[   25.818821] allocate cpuset=/ mems_allowed=0
+[   25.819259] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
+[   25.819847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
+[   25.820549] Call Trace:
+[   25.820733]  dump_stack+0x63/0x82
+[   25.820961]  dump_header+0x97/0x21a
+[   25.820961]  ? security_capable_noaudit+0x45/0x60
+[   25.820961]  oom_kill_process+0x219/0x3e0
+[   25.820961]  out_of_memory+0x11d/0x480
+[   25.820961]  pagefault_out_of_memory+0x68/0x80
+[   25.820961]  mm_fault_error+0x8f/0x190
+[   25.820961]  ? handle_mm_fault+0xf3/0x210
+[   25.820961]  __do_page_fault+0x4b2/0x4e0
+[   25.820961]  trace_do_page_fault+0x37/0xe0
+[   25.820961]  do_async_page_fault+0x19/0x70
+[   25.820961]  async_page_fault+0x28/0x30
+< cut >
+[   25.863078] Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
+[   25.863634] Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
+
+This actually happens if pagefault_out_of_memory() is called
+after the calling process has already been selected as an OOM victim
+and killed. There is a race with the oom reaper: if the process
+is reaped before it enters out_of_memory(), the MMF_OOM_SKIP
+flag is set, and out_of_memory() will not consider the process
+as an eligible victim. That means that another victim will be selected
+and killed.
+
+Tetsuo Handa has noticed that this is a side effect of
+commit 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks
+in the allocator slowpath").
+
+To avoid this, out_of_memory() shouldn't be called from
+pagefault_out_of_memory() if the current task has already
+been chosen as an oom victim.
+
+v2: dropped changes related to the oom_reaper synchronization,
+    as it looks like a separate and minor issue;
+    rebased on new mm;
+    renamed, updated commit message.
+
+Signed-off-by: Roman Gushchin <guro@fb.com>
+Cc: Michal Hocko <mhocko@suse.com>
+Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
+Cc: kernel-team@fb.com
+Cc: linux-mm@kvack.org
+Cc: linux-kernel@vger.kernel.org
+---
+ mm/oom_kill.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/mm/oom_kill.c b/mm/oom_kill.c
+index 04c9143..9c643a3 100644
+--- a/mm/oom_kill.c
++++ b/mm/oom_kill.c
+@@ -1068,6 +1068,9 @@ void pagefault_out_of_memory(void)
+ 	if (mem_cgroup_oom_synchronize(true))
+ 		return;
+ 
++	if (tsk_is_oom_victim(current))
++		return;
++
+ 	if (!mutex_trylock(&oom_lock))
+ 		return;
+ 	out_of_memory(&oc);
+-- 
+2.7.4
diff --git a/a/content_digest b/N1/content_digest
index bbec673..a5f9bbd 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -9,11 +9,11 @@
  "Date\0Thu, 18 May 2017 14:20:33 +0100\0"
  "To\0Michal Hocko <mhocko@kernel.org>\0"
  "Cc\0Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>"
-  hannes@cmpxchg.org
-  vdavydov.dev@gmail.com
-  kernel-team@fb.com
-  linux-mm@kvack.org
- " linux-kernel@vger.kernel.org\0"
+  <hannes@cmpxchg.org>
+  <vdavydov.dev@gmail.com>
+  <kernel-team@fb.com>
+  <linux-mm@kvack.org>
+ " <linux-kernel@vger.kernel.org>\0"
  "\00:1\0"
  "b\0"
  "On Thu, May 18, 2017 at 11:00:39AM +0200, Michal Hocko wrote:\n"
@@ -38,6 +38,111 @@
  "If we consider this approach, I've prepared a separate patch for this problem\n"
  "(stripped all oom reaper list stuff).\n"
  "\n"
- Thanks!
+ "Thanks!\n"
+ "\n"
+ ">From 317fad44a0fe79fb76e8e4fd6bd81c52ae1712e9 Mon Sep 17 00:00:00 2001\n"
+ "From: Roman Gushchin <guro@fb.com>\n"
+ "Date: Tue, 16 May 2017 21:19:56 +0100\n"
+ "Subject: [PATCH] mm,oom: prevent OOM double kill from a pagefault handling\n"
+ " path\n"
+ "\n"
+ "During the debugging of some OOM-related stuff, I've noticed\n"
+ "that sometimes OOM kills two processes instead of one.\n"
+ "\n"
+ "The problem can be easily reproduced on a vanilla kernel:\n"
+ "\n"
+ "[   25.721494] allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0\n"
+ "[   25.725658] allocate cpuset=/ mems_allowed=0\n"
+ "[   25.727033] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181\n"
+ "[   25.729215] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014\n"
+ "[   25.729598] Call Trace:\n"
+ "[   25.729598]  dump_stack+0x63/0x82\n"
+ "[   25.729598]  dump_header+0x97/0x21a\n"
+ "[   25.729598]  ? do_try_to_free_pages+0x2d7/0x360\n"
+ "[   25.729598]  ? security_capable_noaudit+0x45/0x60\n"
+ "[   25.729598]  oom_kill_process+0x219/0x3e0\n"
+ "[   25.729598]  out_of_memory+0x11d/0x480\n"
+ "[   25.729598]  __alloc_pages_slowpath+0xc84/0xd40\n"
+ "[   25.729598]  __alloc_pages_nodemask+0x245/0x260\n"
+ "[   25.729598]  alloc_pages_vma+0xa2/0x270\n"
+ "[   25.729598]  __handle_mm_fault+0xca9/0x10c0\n"
+ "[   25.729598]  handle_mm_fault+0xf3/0x210\n"
+ "[   25.729598]  __do_page_fault+0x240/0x4e0\n"
+ "[   25.729598]  trace_do_page_fault+0x37/0xe0\n"
+ "[   25.729598]  do_async_page_fault+0x19/0x70\n"
+ "[   25.729598]  async_page_fault+0x28/0x30\n"
+ "< cut >\n"
+ "[   25.810868] oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB\n"
+ "< cut >\n"
+ "[   25.817589] allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0\n"
+ "[   25.818821] allocate cpuset=/ mems_allowed=0\n"
+ "[   25.819259] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181\n"
+ "[   25.819847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014\n"
+ "[   25.820549] Call Trace:\n"
+ "[   25.820733]  dump_stack+0x63/0x82\n"
+ "[   25.820961]  dump_header+0x97/0x21a\n"
+ "[   25.820961]  ? security_capable_noaudit+0x45/0x60\n"
+ "[   25.820961]  oom_kill_process+0x219/0x3e0\n"
+ "[   25.820961]  out_of_memory+0x11d/0x480\n"
+ "[   25.820961]  pagefault_out_of_memory+0x68/0x80\n"
+ "[   25.820961]  mm_fault_error+0x8f/0x190\n"
+ "[   25.820961]  ? handle_mm_fault+0xf3/0x210\n"
+ "[   25.820961]  __do_page_fault+0x4b2/0x4e0\n"
+ "[   25.820961]  trace_do_page_fault+0x37/0xe0\n"
+ "[   25.820961]  do_async_page_fault+0x19/0x70\n"
+ "[   25.820961]  async_page_fault+0x28/0x30\n"
+ "< cut >\n"
+ "[   25.863078] Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child\n"
+ "[   25.863634] Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB\n"
+ "\n"
+ "This actually happens if pagefault_out_of_memory() is called\n"
+ "after the calling process has already been selected as an OOM victim\n"
+ "and killed. There is a race with the oom reaper: if the process\n"
+ "is reaped before it enters out_of_memory(), the MMF_OOM_SKIP\n"
+ "flag is set, and out_of_memory() will not consider the process\n"
+ "as an eligible victim. That means that another victim will be selected\n"
+ "and killed.\n"
+ "\n"
+ "Tetsuo Handa has noticed that this is a side effect of\n"
+ "commit 9a67f6488eca926f (\"mm: consolidate GFP_NOFAIL checks\n"
+ "in the allocator slowpath\").\n"
+ "\n"
+ "To avoid this, out_of_memory() shouldn't be called from\n"
+ "pagefault_out_of_memory() if the current task has already\n"
+ "been chosen as an oom victim.\n"
+ "\n"
+ "v2: dropped changes related to the oom_reaper synchronization,\n"
+ "    as it looks like a separate and minor issue;\n"
+ "    rebased on new mm;\n"
+ "    renamed, updated commit message.\n"
+ "\n"
+ "Signed-off-by: Roman Gushchin <guro@fb.com>\n"
+ "Cc: Michal Hocko <mhocko@suse.com>\n"
+ "Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>\n"
+ "Cc: Johannes Weiner <hannes@cmpxchg.org>\n"
+ "Cc: Vladimir Davydov <vdavydov.dev@gmail.com>\n"
+ "Cc: kernel-team@fb.com\n"
+ "Cc: linux-mm@kvack.org\n"
+ "Cc: linux-kernel@vger.kernel.org\n"
+ "---\n"
+ " mm/oom_kill.c | 3 +++\n"
+ " 1 file changed, 3 insertions(+)\n"
+ "\n"
+ "diff --git a/mm/oom_kill.c b/mm/oom_kill.c\n"
+ "index 04c9143..9c643a3 100644\n"
+ "--- a/mm/oom_kill.c\n"
+ "+++ b/mm/oom_kill.c\n"
+ "@@ -1068,6 +1068,9 @@ void pagefault_out_of_memory(void)\n"
+ " \tif (mem_cgroup_oom_synchronize(true))\n"
+ " \t\treturn;\n"
+ " \n"
+ "+\tif (tsk_is_oom_victim(current))\n"
+ "+\t\treturn;\n"
+ "+\n"
+ " \tif (!mutex_trylock(&oom_lock))\n"
+ " \t\treturn;\n"
+ " \tout_of_memory(&oc);\n"
+ "-- \n"
+ 2.7.4
 
-369b4f9788e1003f0ab13a9e782b456b7cd36c189fb9e12e7ff06583bebc7595
+71286ffb0939b64b3a19ec7c4fa5ec7108656a3fd206318125f60241a45dc0e4

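The race described in the commit message above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
kernel code: every model_* name is hypothetical, victim selection is reduced
to "largest RSS among tasks not marked skip", and the 900000 kB figure for
"allocate" is invented for illustration (only firewalld's 20956 kB comes from
the log). It shows why a reaped victim that re-enters the OOM path through
the page fault handler causes a second kill, and how an early return for a
task that is already a victim avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a task; not the kernel's task_struct. */
struct model_task {
	const char *comm;
	long rss_kb;       /* crude badness proxy */
	bool oom_victim;   /* models what tsk_is_oom_victim() reports */
	bool mm_oom_skip;  /* models MMF_OOM_SKIP, set once the mm is reaped */
	bool killed;
};

/* Pick the largest task that is not marked skip; returns n if none is eligible. */
static size_t model_select_victim(const struct model_task *tasks, size_t n)
{
	size_t best = n;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].mm_oom_skip)
			continue;
		if (best == n || tasks[i].rss_kb > tasks[best].rss_kb)
			best = i;
	}
	return best;
}

/* Models out_of_memory(): kills one eligible task, returns the number of kills. */
static int model_out_of_memory(struct model_task *tasks, size_t n)
{
	size_t v = model_select_victim(tasks, n);

	if (v == n)
		return 0;
	tasks[v].oom_victim = true;
	tasks[v].killed = true;
	return 1;
}

/* Models the oom reaper: frees the victim's memory and marks its mm as skip. */
static void model_oom_reap(struct model_task *t)
{
	assert(t->oom_victim);
	t->rss_kb = 0;
	t->mm_oom_skip = true;
}

/* Models pagefault_out_of_memory(), with or without the proposed early return. */
static int model_pagefault_oom(struct model_task *tasks, size_t n,
			       size_t current, bool with_fix)
{
	if (with_fix && tasks[current].oom_victim)
		return 0;
	return model_out_of_memory(tasks, n);
}

/* Replays the sequence from the log and returns how many tasks were killed. */
int model_count_kills(bool with_fix)
{
	struct model_task tasks[] = {
		{ "allocate",  900000, false, false, false }, /* RSS is invented */
		{ "firewalld",  20956, false, false, false },
	};
	int kills = 0;

	kills += model_out_of_memory(tasks, 2);             /* allocation path */
	model_oom_reap(&tasks[0]);                          /* reaper wins the race */
	kills += model_pagefault_oom(tasks, 2, 0, with_fix); /* fault path */
	return kills;
}
```

In this model, model_count_kills(false) replays the double kill (the reaped
"allocate" task is skipped, so "firewalld" is selected as well), while
model_count_kills(true) stops after the first kill.
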