* [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
@ 2015-08-23 7:21 Tetsuo Handa
2015-08-24 10:03 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2015-08-23 7:21 UTC (permalink / raw)
To: mhocko, rientjes, hannes; +Cc: linux-mm
>From 4a3cf5be07a66cf3906a380e77ba5e2ac1b2b3d5 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 1 Aug 2015 22:39:30 +0900
Subject: [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and
__alloc_pages_high_priority().
Currently, TIF_MEMDIE is checked at gfp_to_alloc_flags() which is before
calling __alloc_pages_high_priority() and at
/* Avoid allocations with no watermarks from looping endlessly */
if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
which is after returning from __alloc_pages_high_priority(). This means
that if TIF_MEMDIE is set between returning from gfp_to_alloc_flags() and
checking test_thread_flag(TIF_MEMDIE), the allocation can fail without
calling __alloc_pages_high_priority(). We need to replace
"test_thread_flag(TIF_MEMDIE)" with "whether TIF_MEMDIE was already set
as of calling gfp_to_alloc_flags()" in order to close this race window.
Since gfp_to_alloc_flags() includes ALLOC_NO_WATERMARKS for several cases,
it will be more correct to replace "test_thread_flag(TIF_MEMDIE)" with
"whether gfp_to_alloc_flags() included ALLOC_NO_WATERMARKS" because the
purpose of test_thread_flag(TIF_MEMDIE) is to give up immediately if
__alloc_pages_high_priority() failed.
Note that we could simply do
if (alloc_flags & ALLOC_NO_WATERMARKS) {
ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
page = __alloc_pages_high_priority(gfp_mask, order, ac);
if (page)
goto got_pg;
WARN_ON_ONCE(!wait && (gfp_mask & __GFP_NOFAIL));
goto nopage;
}
instead of changing to
if ((alloc_flags & ALLOC_NO_WATERMARKS) && !(gfp_mask & __GFP_NOFAIL))
goto nopage;
if we can duplicate
if (!wait) {
WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
goto nopage;
}
.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4b220cb..37a0390 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3085,7 +3085,7 @@ retry:
goto nopage;
/* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ if ((alloc_flags & ALLOC_NO_WATERMARKS) && !(gfp_mask & __GFP_NOFAIL))
goto nopage;
/*
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
2015-08-23 7:21 [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority() Tetsuo Handa
@ 2015-08-24 10:03 ` Michal Hocko
2015-08-24 12:52 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2015-08-24 10:03 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rientjes, hannes, linux-mm
On Sun 23-08-15 16:21:41, Tetsuo Handa wrote:
> >From 4a3cf5be07a66cf3906a380e77ba5e2ac1b2b3d5 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 1 Aug 2015 22:39:30 +0900
> Subject: [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and
> __alloc_pages_high_priority().
>
> Currently, TIF_MEMDIE is checked at gfp_to_alloc_flags() which is before
> calling __alloc_pages_high_priority() and at
>
> /* Avoid allocations with no watermarks from looping endlessly */
> if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
>
> which is after returning from __alloc_pages_high_priority(). This means
> that if TIF_MEMDIE is set between returning from gfp_to_alloc_flags() and
> checking test_thread_flag(TIF_MEMDIE), the allocation can fail without
> calling __alloc_pages_high_priority(). We need to replace
> "test_thread_flag(TIF_MEMDIE)" with "whether TIF_MEMDIE was already set
> as of calling gfp_to_alloc_flags()" in order to close this race window.
>
> Since gfp_to_alloc_flags() includes ALLOC_NO_WATERMARKS for several cases,
> it will be more correct to replace "test_thread_flag(TIF_MEMDIE)" with
> "whether gfp_to_alloc_flags() included ALLOC_NO_WATERMARKS" because the
> purpose of test_thread_flag(TIF_MEMDIE) is to give up immediately if
> __alloc_pages_high_priority() failed.
Yes TIF_MEMDIE setting is inherently racy. We will fail the allocation
without diving into reserves. Why is that a problem?
The comment above the check is misleading but now you are allowing to
fail all ALLOC_NO_WATERMARKS (without __GFP_NOFAIL) allocations before
entering the direct reclaim and compaction. This seems incorrect. What
about __GFP_MEMALLOC requests?
I think the check for TIF_MEMDIE makes more sense here.
>
> Note that we could simply do
>
> if (alloc_flags & ALLOC_NO_WATERMARKS) {
> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> page = __alloc_pages_high_priority(gfp_mask, order, ac);
> if (page)
> goto got_pg;
> WARN_ON_ONCE(!wait && (gfp_mask & __GFP_NOFAIL));
> goto nopage;
> }
>
> instead of changing to
>
> if ((alloc_flags & ALLOC_NO_WATERMARKS) && !(gfp_mask & __GFP_NOFAIL))
> goto nopage;
>
> if we can duplicate
>
> if (!wait) {
> WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> goto nopage;
> }
>
> .
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4b220cb..37a0390 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3085,7 +3085,7 @@ retry:
> goto nopage;
>
> /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + if ((alloc_flags & ALLOC_NO_WATERMARKS) && !(gfp_mask & __GFP_NOFAIL))
> goto nopage;
>
> /*
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
2015-08-24 10:03 ` Michal Hocko
@ 2015-08-24 12:52 ` Tetsuo Handa
2015-08-24 13:20 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2015-08-24 12:52 UTC (permalink / raw)
To: mhocko; +Cc: rientjes, hannes, linux-mm
Michal Hocko wrote:
> The comment above the check is misleading but now you are allowing to
> fail all ALLOC_NO_WATERMARKS (without __GFP_NOFAIL) allocations before
> entering the direct reclaim and compaction. This seems incorrect. What
> about __GFP_MEMALLOC requests?
So, you want __GPP_MEMALLOC to retry forever unless TIF_MEMDIE is set, don't
you?
> I think the check for TIF_MEMDIE makes more sense here.
Since we already failed to allocate from memory reserves, I don't know if
direct reclaim and compaction can work as expected under such situation.
Maybe the OOM killer is invoked, but I worry that the OOM victim gets stuck
because we already failed to allocate from memory reserves. Unless next OOM
victims are chosen via timeout, I think that this can be one of triggers
that lead to silent hangup... (Just my suspect. I can't prove it because I
can't go to in front of customers' servers and check SysRq.)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
2015-08-24 12:52 ` Tetsuo Handa
@ 2015-08-24 13:20 ` Michal Hocko
2015-08-27 13:49 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2015-08-24 13:20 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rientjes, hannes, linux-mm
On Mon 24-08-15 21:52:08, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > The comment above the check is misleading but now you are allowing to
> > fail all ALLOC_NO_WATERMARKS (without __GFP_NOFAIL) allocations before
> > entering the direct reclaim and compaction. This seems incorrect. What
> > about __GFP_MEMALLOC requests?
>
> So, you want __GPP_MEMALLOC to retry forever unless TIF_MEMDIE is set, don't
> you?
I am not saying that. I was just pointing out that you have changed the
behavior of this gfp flag.
> > I think the check for TIF_MEMDIE makes more sense here.
>
> Since we already failed to allocate from memory reserves, I don't know if
> direct reclaim and compaction can work as expected under such situation.
Yes the allocation has failed and the reclaim might not do any
progress. Withtout trying to the reclaim we simply do not know that,
though.
The TIF_MEMDIE check was explicit for a good reason IMO. The race is not
really that important AFAICS because we would only fail the allocation
sooner for the OOM victim and that one might fail already. I might be
missing something of course but your change has a higher risk of
undesired behavior than the original code.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
2015-08-24 13:20 ` Michal Hocko
@ 2015-08-27 13:49 ` Tetsuo Handa
0 siblings, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2015-08-27 13:49 UTC (permalink / raw)
To: mhocko; +Cc: rientjes, hannes, linux-mm
Michal Hocko wrote:
> The TIF_MEMDIE check was explicit for a good reason IMO. The race is not
> really that important AFAICS because we would only fail the allocation
> sooner for the OOM victim and that one might fail already. I might be
> missing something of course but your change has a higher risk of
> undesired behavior than the original code.
In a different thread, you are trying to allow giving up !__GFP_FS
allocations without retrying because giving up __GFP_FS allocations
without retrying increases possibility of hitting obsucure bugs in the
error recovery paths. This TIF_MEMDIE v.s. __alloc_pages_high_priority()
race condition allows giving up not only !__GFP_FS allocations but also
__GFP_FS allocations; i.e. fixing this race reduces possibility of
hitting obsucure bugs.
Thus, here is an updated patch.
------------------------------------------------------------
>From 5300fa1b78130113189e72a0a09e9a49090b5f1e Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 27 Aug 2015 22:30:13 +0900
Subject: [PATCH] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority().
Currently, TIF_MEMDIE is checked at gfp_to_alloc_flags() which is before
calling __alloc_pages_high_priority() and at
/* Avoid allocations with no watermarks from looping endlessly */
if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
which is after returning from __alloc_pages_high_priority(). This means
that if TIF_MEMDIE is set between returning from gfp_to_alloc_flags() and
checking test_thread_flag(TIF_MEMDIE), the allocation will fail without
calling __alloc_pages_high_priority().
For now, we need to try to avoid failing small __GFP_FS allocations
because many of error recovery paths are untested, resulting in obscure
bugs. This patch replaces "test_thread_flag(TIF_MEMDIE)" with "whether
TIF_MEMDIE was already set as of calling gfp_to_alloc_flags()" in order
to try to avoid such failures.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
mm/page_alloc.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ff998c..8880b17 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3012,6 +3012,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
enum migrate_mode migration_mode = MIGRATE_ASYNC;
bool deferred_compaction = false;
int contended_compaction = COMPACT_CONTENDED_NONE;
+ bool memdie_pending;
/*
* In the slowpath, we sanity check order to avoid ever trying to
@@ -3036,6 +3037,7 @@ retry:
if (!(gfp_mask & __GFP_NO_KSWAPD))
wake_all_kswapds(order, ac);
+ memdie_pending = test_thread_flag(TIF_MEMDIE);
/*
* OK, we're below the kswapd watermark and have kicked background
* reclaim. Now things get more complex, so set up alloc_flags according
@@ -3091,8 +3093,13 @@ retry:
if (current->flags & PF_MEMALLOC)
goto nopage;
- /* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+ /*
+ * Give up if chosen as an OOM victim. But if the context allows,
+ * make sure that __alloc_pages_high_priority() was called before
+ * giving up, for failing small __GFP_FS allocations are prone to
+ * trigger obscure bugs.
+ */
+ if (memdie_pending && !(gfp_mask & __GFP_NOFAIL))
goto nopage;
/*
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-08-27 13:49 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-23 7:21 [REPOST] [PATCH 1/2] mm: Fix race between setting TIF_MEMDIE and __alloc_pages_high_priority() Tetsuo Handa
2015-08-24 10:03 ` Michal Hocko
2015-08-24 12:52 ` Tetsuo Handa
2015-08-24 13:20 ` Michal Hocko
2015-08-27 13:49 ` Tetsuo Handa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).