From: Vlastimil Babka <vbabka@suse.cz>
To: Vladimir Davydov <vdavydov@parallels.com>, Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
stable@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
Date: Mon, 22 Dec 2014 20:33:33 +0100 [thread overview]
Message-ID: <5498720D.5030702@suse.cz> (raw)
In-Reply-To: <20141222162558.GA21211@esperanza>
On 22.12.2014 17:25, Vladimir Davydov wrote:
>
>>> E.g. suppose processes are
>>> governed by FIFO and kswapd happens to have a higher prio than the
>>> process killed by OOM. Then after cond_resched kswapd will be picked for
>>> execution again, and the killing process won't have a chance to remove
>>> itself from the wait queue.
>> Except that kswapd runs as SCHED_NORMAL with 0 priority.
>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 744e2b491527..2a123634c220 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -2984,6 +2984,9 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>>>>> if (remaining)
>>>>> return false;
>>>>>
>>>>> + if (!pgdat_balanced(pgdat, order, classzone_idx))
>>>>> + return false;
>>>>> +
>>>> What would be consequences of not waking up pfmemalloc waiters while the
>>>> node is not balanced?
>>> They will get woken up a bit later in balanced_pgdat. This might result
>>> in latency spikes though. In order not to change the original behaviour
>>> we could always wake all pfmemalloc waiters no matter if we are going to
>>> sleep or not:
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 744e2b491527..a21e0bd563c3 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -2993,10 +2993,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>>> * so wake them now if necessary. If necessary, processes will wake
>>> * kswapd and get throttled again
>>> */
>>> - if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
>>> - wake_up(&pgdat->pfmemalloc_wait);
>>> - return false;
>>> - }
>>> + wake_up_all(&pgdat->pfmemalloc_wait);
>>>
>>> return pgdat_balanced(pgdat, order, classzone_idx);
>> So you are relying on scheduling points somewhere down the
>> balance_pgdat. That should be sufficient. I am still quite surprised
>> that we have an OOM victim still on the queue and balanced pgdat here
>> because OOM victim didn't have chance to free memory. So somebody else
>> must have released a lot of memory after OOM.
>>
>> This patch seems better than the one from Vlastimil. Care to post it
>> with the full changelog, please?
> Attached below (merged with 2/2). I haven't checked that it does fix the
> issue, because I don't have the reproducer, so it should be committed
> only if Vlastimil approves it.
I agree it's the right fix, thanks a lot. We only have a synthetic
reproducer,
as the real scenario would be hard to trigger reliably. I can test it
later, but
I think it's reasonably clear the patch will help.
I would just personaly keep the comment clarification in the patch, but it's
not a critical issue.
Vlastimil
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Vladimir Davydov <vdavydov@parallels.com>, Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
stable@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
Date: Mon, 22 Dec 2014 20:33:33 +0100 [thread overview]
Message-ID: <5498720D.5030702@suse.cz> (raw)
In-Reply-To: <20141222162558.GA21211@esperanza>
On 22.12.2014 17:25, Vladimir Davydov wrote:
>
>>> E.g. suppose processes are
>>> governed by FIFO and kswapd happens to have a higher prio than the
>>> process killed by OOM. Then after cond_resched kswapd will be picked for
>>> execution again, and the killing process won't have a chance to remove
>>> itself from the wait queue.
>> Except that kswapd runs as SCHED_NORMAL with 0 priority.
>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 744e2b491527..2a123634c220 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -2984,6 +2984,9 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>>>>> if (remaining)
>>>>> return false;
>>>>>
>>>>> + if (!pgdat_balanced(pgdat, order, classzone_idx))
>>>>> + return false;
>>>>> +
>>>> What would be consequences of not waking up pfmemalloc waiters while the
>>>> node is not balanced?
>>> They will get woken up a bit later in balanced_pgdat. This might result
>>> in latency spikes though. In order not to change the original behaviour
>>> we could always wake all pfmemalloc waiters no matter if we are going to
>>> sleep or not:
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 744e2b491527..a21e0bd563c3 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -2993,10 +2993,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
>>> * so wake them now if necessary. If necessary, processes will wake
>>> * kswapd and get throttled again
>>> */
>>> - if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
>>> - wake_up(&pgdat->pfmemalloc_wait);
>>> - return false;
>>> - }
>>> + wake_up_all(&pgdat->pfmemalloc_wait);
>>>
>>> return pgdat_balanced(pgdat, order, classzone_idx);
>> So you are relying on scheduling points somewhere down the
>> balance_pgdat. That should be sufficient. I am still quite surprised
>> that we have an OOM victim still on the queue and balanced pgdat here
>> because OOM victim didn't have chance to free memory. So somebody else
>> must have released a lot of memory after OOM.
>>
>> This patch seems better than the one from Vlastimil. Care to post it
>> with the full changelog, please?
> Attached below (merged with 2/2). I haven't checked that it does fix the
> issue, because I don't have the reproducer, so it should be committed
> only if Vlastimil approves it.
I agree it's the right fix, thanks a lot. We only have a synthetic
reproducer,
as the real scenario would be hard to trigger reliably. I can test it
later, but
I think it's reasonably clear the patch will help.
I would just personaly keep the comment clarification in the patch, but it's
not a critical issue.
Vlastimil
next prev parent reply other threads:[~2014-12-22 19:33 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-19 13:01 [PATCH 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Vlastimil Babka
2014-12-19 13:01 ` Vlastimil Babka
2014-12-19 13:01 ` Vlastimil Babka
2014-12-19 13:01 ` [PATCH 2/2] mm, vmscan: wake up all pfmemalloc-throttled processes at once Vlastimil Babka
2014-12-19 13:01 ` Vlastimil Babka
2014-12-19 13:01 ` Vlastimil Babka
2014-12-19 15:57 ` [PATCH 1/2] mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed Michal Hocko
2014-12-19 15:57 ` Michal Hocko
2014-12-19 18:28 ` Vladimir Davydov
2014-12-19 18:28 ` Vladimir Davydov
2014-12-19 18:28 ` Vladimir Davydov
2014-12-19 23:05 ` Vlastimil Babka
2014-12-19 23:05 ` Vlastimil Babka
2014-12-20 9:24 ` Vladimir Davydov
2014-12-20 9:24 ` Vladimir Davydov
2014-12-20 9:24 ` Vladimir Davydov
2014-12-20 10:47 ` Michal Hocko
2014-12-20 10:47 ` Michal Hocko
2014-12-20 14:18 ` Vladimir Davydov
2014-12-20 14:18 ` Vladimir Davydov
2014-12-20 14:18 ` Vladimir Davydov
2014-12-22 14:24 ` Michal Hocko
2014-12-22 14:24 ` Michal Hocko
2014-12-22 16:25 ` Vladimir Davydov
2014-12-22 16:25 ` Vladimir Davydov
2014-12-22 16:25 ` Vladimir Davydov
2014-12-22 19:33 ` Vlastimil Babka [this message]
2014-12-22 19:33 ` Vlastimil Babka
2014-12-23 10:16 ` Michal Hocko
2014-12-23 10:16 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5498720D.5030702@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=stable@vger.kernel.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.