From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pekka Enberg <penberg@kernel.org>,
Christoph Lameter <cl@linux.com>, Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Colin King <colin.king@canonical.com>,
Raghavendra D Prabhu <raghu.prabhu13@gmail.com>,
Jan Kara <jack@suse.cz>, Chris Mason <chris.mason@oracle.com>,
Rik van Riel <riel@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 3/3] mm: slub: Default slub_max_order to 0
Date: Thu, 12 May 2011 19:47:05 -0500
Message-ID: <1305247626.2575.111.camel@mulgrave.site>
In-Reply-To: <20110512221506.GM16531@cmpxchg.org>
On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > patches under PREEMPT. It's not a hang this time, it's just kswapd
> > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > the system.
> >
> > Just on a "if you don't know what's wrong poke about and see" basis, I
> > sliced out all the complex logic in sleeping_prematurely() and, as far
> > as I can tell, it cures the problem behaviour. I've loaded up the
> > system, and taken the tar load generator through three runs without
> > producing a spinning kswapd (this is PREEMPT). I'll try with a
> > non-PREEMPT kernel shortly.
> >
> > What this seems to say is that there's a problem with the complex logic
> > in sleeping_prematurely(). I'm pretty sure hacking up
> > sleeping_prematurely() just to dump all the calculations is the wrong
> > thing to do, but perhaps someone can see what the right thing is ...
>
> I think I see the problem: the boolean logic of sleeping_prematurely()
> is odd. If it returns true, kswapd will keep running. So if
> pgdat_balanced() returns true, kswapd should go to sleep.
>
> This?
I was going to say this was a winner, but on the third untar run on
non-PREEMPT, I hit the kswapd livelock. It got much further than
previous attempts, which all hung on the first run, but I think the
essential problem is still (at least on this machine) that
sleeping_prematurely() is doing too much work for the wakeup storm that
the allocators are causing.

Something that ratelimits the amount of time we spend in the watermark
calculations, like the patch below (which incorporates your pgdat fix),
seems to be much more stable (I've not run it for three full runs yet,
but kswapd CPU time is way lower so far).

The heuristic here is that if we'd be making the calculation more than
ten times within 1/10 of a second, we stop and sleep anyway.
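
To make the heuristic easier to reason about, here is a minimal userspace sketch of the rate limiting. It is a model under stated assumptions, not the kernel code: struct premature_limiter, tick_after(), sleeping_prematurely_model(), TICKS_PER_SEC and the "balanced" flag are hypothetical stand-ins for the static counters, time_after(), sleeping_prematurely(), HZ and the watermark checks in mm/vmscan.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TICKS_PER_SEC 1000UL               /* assumed stand-in for HZ */
#define WINDOW        (TICKS_PER_SEC / 10) /* models HZ/10 */
#define MAX_TRUE      10                   /* shortcut "true" budget per window */

/* Wrap-safe "a is after b", modelled on the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical container for the two static variables in the patch. */
struct premature_limiter {
	int returned_true;        /* consecutive "keep running" answers */
	unsigned long prev_ticks; /* time of the last full calculation */
};

/*
 * Returns true if kswapd should keep running, false if it may sleep.
 * "balanced" stands in for the result of the watermark calculations.
 */
static bool sleeping_prematurely_model(struct premature_limiter *rl,
				       unsigned long now, bool balanced)
{
	bool ret;

	assert(rl != NULL);

	/* Still inside the window: answer without redoing the calculation. */
	if (tick_after(rl->prev_ticks + WINDOW, now)) {
		/* previously returned false, do so again */
		if (rl->returned_true == 0)
			return false;
		/* too many "true" answers in this window: sleep anyway */
		if (rl->returned_true++ > MAX_TRUE)
			return false;
		return true;
	}

	/* Window expired: reset the count and do the full calculation. */
	rl->returned_true = 0;
	rl->prev_ticks = now;

	/* A balanced node means sleeping is not premature. */
	ret = !balanced;
	if (ret)
		rl->returned_true++;
	else
		rl->returned_true = 0;

	return ret;
}
```

In this model a caller that keeps asking within one window gets the full calculation once, then ten shortcut "true" answers, and "false" after that until the window expires. Inside the window the shortcut ignores the "balanced" input entirely, which is where the saving in CPU time would come from.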
James
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..545250c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,12 +2249,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 {
 	int i;
 	unsigned long balanced = 0;
-	bool all_zones_ok = true;
+	bool all_zones_ok = true, ret;
+	static int returned_true = 0;
+	static unsigned long prev_jiffies = 0;
+
 
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
 
+	/* rate limit our entry to the watermark calculations */
+	if (time_after(prev_jiffies + HZ/10, jiffies)) {
+		/* previously returned false, do so again */
+		if (returned_true == 0)
+			return false;
+		/* or we've done the true calculation too many times */
+		if (returned_true++ > 10)
+			return false;
+
+		return true;
+	} else {
+		/* haven't been here for a while, reset the true count */
+		returned_true = 0;
+	}
+
+	prev_jiffies = jiffies;
+
 	/* Check the watermark levels */
 	for (i = 0; i < pgdat->nr_zones; i++) {
 		struct zone *zone = pgdat->node_zones + i;
@@ -2286,9 +2306,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		ret = !pgdat_balanced(pgdat, balanced, classzone_idx);
+	else
+		ret = !all_zones_ok;
+
+	if (ret)
+		returned_true++;
 	else
-		return !all_zones_ok;
+		returned_true = 0;
+
+	return ret;
 }
 
 /*