public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <aarcange@redhat.com>
To: Thomas Sattler <tsattler@gmx.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Mel Gorman <mel@csn.ul.ie>
Subject: Re: iotop: khugepaged at 99.99% (2.6.38.X)
Date: Fri, 6 May 2011 19:20:19 +0200
Message-ID: <20110506172019.GB6330@random.random>
In-Reply-To: <4DC40484.3050205@gmx.de>

On Fri, May 06, 2011 at 04:24:04PM +0200, Thomas Sattler wrote:
> > Aaarg, wrong kernel tree. I patched and compiled 2.6.38.5.
> > Do you think it is important to stay with 2.6.38.2, after
> > we know 2.6.38.4 is also affected?
> 
> I booted 2.6.38.5.aa1 ("aa1" for the "make-it-worse-patch")

Sorry, the make-it-worse patch had a misplaced #if 0 that left the VM
unable to reclaim: it should have been around
__alloc_pages_direct_compact, but it was around
__alloc_pages_direct_reclaim instead (I noticed the hard way too).

The second patch I sent (the hotfix, not the make-it-worse one) should
work just fine instead.

There are other ways we could fix it, if my vmstat per-cpu theory is
right.

One would be to call the equivalent of start_cpu_timer() to
schedule_delayed_work_on() every CPU after congestion_wait() returns,
before re-evaluating too_many_isolated(). However, that would still add
a 100msec latency here and there, plus some overscheduling in
situations that may not be VM-congested at all, where just one task
quit and released all anon memory in the inactive list.

Another would be to always return false from too_many_isolated() if
nr_isolated_anon < threshold*CONFIG_NR_CPUS, which would probably be
enough to sort out the per-cpu accounting error.

Personally I prefer to nuke the function, for all the reasons mentioned
in the previous email, and go ahead and drop the isolated counter too.
A stricter fix would give more confirmation that we're not hiding a
stat accounting error and would confirm my theory, but for the long run
(after having spent a day reading that function) I don't really like to
keep it.
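
To make the nr_isolated_anon < threshold*CONFIG_NR_CPUS idea concrete,
here is a minimal user-space sketch, not kernel code. It assumes the
current check boils down to "isolated > inactive" on the global
counters, and every name in it (struct zone_counters, the two helper
functions, NR_CPUS_ASSUMED, STAT_THRESHOLD_ASSUMED and their values) is
made up for illustration only.

```c
#include <stdbool.h>

/* Illustrative values only; the real per-cpu threshold and CPU count differ. */
#define NR_CPUS_ASSUMED        4
#define STAT_THRESHOLD_ASSUMED 32

/* Hypothetical stand-in for the global (already folded) zone counters. */
struct zone_counters {
	long nr_inactive;	/* pages on the inactive list */
	long nr_isolated;	/* pages accounted as isolated */
};

/* Assumed shape of the existing check: throttle when isolated > inactive. */
static bool too_many_isolated_strict(const struct zone_counters *z)
{
	return z->nr_isolated > z->nr_inactive;
}

/*
 * Tolerant variant: a global isolated count below the worst-case per-cpu
 * drift (threshold * number of CPUS) may be pure accounting error, so it
 * is never treated as "too many".
 */
static bool too_many_isolated_tolerant(const struct zone_counters *z)
{
	long max_drift = (long)STAT_THRESHOLD_ASSUMED * NR_CPUS_ASSUMED;

	if (z->nr_isolated < max_drift)
		return false;
	return z->nr_isolated > z->nr_inactive;
}
```

With the inactive list trimmed to zero and a few stale isolated pages
left in the global counter, the strict check keeps throttling while the
tolerant one lets reclaim proceed. Above the drift bound both behave
the same.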

The correct make-it-worse patch would be this (and this time I tested
it before sending ;). This should shorten the time it takes to
reproduce, as it will always enter reclaim with __GFP_NO_KSWAPD
allocations (while previously it would enter reclaim only if compaction
failed). Entering reclaim without kswapd running (kswapd would churn
over the per-cpu stats and add pages from the active to the inactive
list even when the inactive list gets trimmed to zero by an exit())
would screw things up.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f8a97b..3dcd442 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2093,6 +2093,7 @@ rebalance:
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
+#if 0
 	/*
 	 * Try direct compaction. The first pass is asynchronous. Subsequent
 	 * attempts after direct reclaim are synchronous
@@ -2105,7 +2106,8 @@ rebalance:
 					sync_migration);
 	if (page)
 		goto got_pg;
-	sync_migration = !(gfp_mask & __GFP_NO_KSWAPD);
+#endif
+	sync_migration = true;
 
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,


