From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <htejun@gmail.com>
Cc: Christoph Lameter <cl@linux.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
torvalds@linux-foundation.org, rientjes@google.com,
oleg@redhat.com, kwalker@redhat.com, akpm@linux-foundation.org,
hannes@cmpxchg.org, vdavydov@parallels.com, skozina@redhat.com,
mgorman@suse.de, riel@redhat.com
Subject: Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks
Date: Wed, 11 Nov 2015 16:44:25 +0100
Message-ID: <20151111154424.GC1432@dhcp22.suse.cz>
In-Reply-To: <20151106001648.GA18183@mtj.duckdns.org>

On Thu 05-11-15 19:16:48, Tejun Heo wrote:
> Hello,
>
> On Thu, Nov 05, 2015 at 11:45:42AM -0600, Christoph Lameter wrote:
> > Sorry but we need work queue processing for vmstat counters that is
>
> I made this analogy before but this is similar to looping with
> preemption off. If anything on workqueue stays RUNNING w/o making
> forward progress, it's buggy. I'd venture to say any code which busy
> loops without making forward progress in the time scale noticeable to
> human beings is borderline buggy too.

Well, the caller asked for memory but the request cannot succeed. Due
to the memory allocator semantics we cannot fail the request, so we have
to loop. If we had an event to wait for we would do so, of course.

Now wrt. a small sleep: we used to do that and called
congestion_wait(HZ/50) before each retry. This has proved to cause stalls
under high memory pressure, see 0e093d99763e ("writeback: do not sleep on
the congestion queue if there are no congested BDIs or if significant
congestion is not being encountered in the current zone"). I do not
really remember what CONFIG_HZ was in those reports but it is quite
possible it was 250. So there is a risk of (partially) re-introducing
those stalls with the patch from Tetsuo
(http://lkml.kernel.org/r/201510251952.CEF04109.OSOtLFHFVFJMQO@I-love.SAKURA.ne.jp).
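
For a sense of scale, the arithmetic behind those timeouts can be
sketched in userspace C. This is only an illustration: jiffy_ms() and
timeout_ms() are hypothetical helpers, not kernel functions, and the
tick rates tried below are common CONFIG_HZ choices rather than values
taken from the reports.

```c
#include <assert.h>

/* Hypothetical helper: length of one jiffy in milliseconds at tick rate hz. */
static unsigned int jiffy_ms(unsigned int hz)
{
	/* Only tick rates that divide 1000 evenly, so the result is exact. */
	assert(hz > 0 && 1000 % hz == 0);
	return 1000 / hz;
}

/* Hypothetical helper: wall-clock length of a timeout given in jiffies. */
static unsigned int timeout_ms(unsigned int jiffies, unsigned int hz)
{
	return jiffies * jiffy_ms(hz);
}
```

With hz = 250, timeout_ms(hz / 50, hz) is 20 and timeout_ms(1, hz) is 4;
with hz = 100 the same expressions give 20 and 10. An HZ/50 timeout is
therefore about 20ms whatever the tick rate, while a one-jiffy sleep
gets shorter as CONFIG_HZ grows.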

If we really have to do a short sleep, though, then I would suggest
sticking it into wait_iff_congested() rather than spreading it into more
places, and restricting it to worker threads only. This should be much
safer. Thoughts?
---
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8ed2ffd963c5..7340353f8aea 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -957,8 +957,9 @@ EXPORT_SYMBOL(congestion_wait);
  * jiffies for either a BDI to exit congestion of the given @sync queue
  * or a write to complete.
  *
- * In the absence of zone congestion, cond_resched() is called to yield
- * the processor if necessary but otherwise does not sleep.
+ * In the absence of zone congestion, a short sleep or a cond_resched() is
+ * performed to yield the processor and to allow other subsystems to make
+ * forward progress.
  *
  * The return value is 0 if the sleep is for the full timeout. Otherwise,
  * it is the number of jiffies that were still remaining when the function
@@ -978,7 +979,19 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 	 */
 	if (atomic_read(&nr_wb_congested[sync]) == 0 ||
 	    !test_bit(ZONE_CONGESTED, &zone->flags)) {
-		cond_resched();
+
+		/*
+		 * Memory allocation/reclaim might be called from a WQ
+		 * context and the current implementation of the WQ
+		 * concurrency control doesn't recognize that a particular
+		 * WQ is congested if the worker thread is looping without
+		 * ever sleeping. Therefore we have to do a short sleep
+		 * here rather than calling cond_resched().
+		 */
+		if (current->flags & PF_WQ_WORKER)
+			schedule_timeout_uninterruptible(1);
+		else
+			cond_resched();

 		/* In case we scheduled, work out time remaining */
 		ret = timeout - (jiffies - start);
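
The decision the patched function makes can be modelled in userspace
as a rough sketch. Everything here is a stand-in: MODEL_PF_WQ_WORKER,
enum backoff, pick_backoff() and remaining_timeout() are hypothetical
names, the flag value is illustrative, and clamping the remaining time
at zero is an assumption about the code that follows the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's PF_WQ_WORKER task flag; value is illustrative. */
#define MODEL_PF_WQ_WORKER 0x20u

enum backoff {
	BACKOFF_COND_RESCHED,	/* yield only if a reschedule is pending */
	BACKOFF_SHORT_SLEEP,	/* really sleep, for one jiffy */
	BACKOFF_CONGESTION_WAIT	/* sleep on the congestion queue */
};

/* Model of which kind of back-off the patched fast path would choose. */
static enum backoff pick_backoff(unsigned int task_flags,
				 int nr_wb_congested, bool zone_congested)
{
	if (nr_wb_congested == 0 || !zone_congested) {
		/* No congestion: workqueue workers must actually sleep. */
		if (task_flags & MODEL_PF_WQ_WORKER)
			return BACKOFF_SHORT_SLEEP;
		return BACKOFF_COND_RESCHED;
	}
	return BACKOFF_CONGESTION_WAIT;
}

/* Model of "ret = timeout - (jiffies - start)"; the clamp is an assumption. */
static long remaining_timeout(long timeout, unsigned long start,
			      unsigned long now)
{
	long ret = timeout - (long)(now - start);

	return ret < 0 ? 0 : ret;
}
```

A worker with no congestion reported, pick_backoff(MODEL_PF_WQ_WORKER,
0, true), gets BACKOFF_SHORT_SLEEP, while an ordinary task in the same
situation only gets BACKOFF_COND_RESCHED.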
--
Michal Hocko
SUSE Labs