From: Andrea Arcangeli <aarcange@redhat.com>
To: Thomas Sattler <tsattler@gmx.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Mel Gorman <mel@csn.ul.ie>
Subject: Re: iotop: khugepaged at 99.99% (2.6.38.X)
Date: Fri, 6 May 2011 19:20:19 +0200
Message-ID: <20110506172019.GB6330@random.random>
In-Reply-To: <4DC40484.3050205@gmx.de>

On Fri, May 06, 2011 at 04:24:04PM +0200, Thomas Sattler wrote:
> > Aaarg, wrong kernel tree. I patched and compiled 2.6.38.5.
> > Do you think it is important to stay with 2.6.38.2, after
> > we know 2.6.38.4 is also affected?
> 
> I booted 2.6.38.5.aa1 ("aa1" for the "make-it-worse-patch")

Sorry, unfortunately the make-it-worse-patch had a misplaced #if 0
which left the VM unable to reclaim: it should have been around
__alloc_pages_direct_compact and instead it was around
__alloc_pages_direct_reclaim (I noticed the hard way too).

The second patch (hotfix, not the make-it-worse) I sent should work
just fine instead.

Other ways we could fix it, if my vmstat per-cpu theory is right:

One would be to call the equivalent of start_cpu_timer() to
schedule_delayed_work_on every CPU after congestion_wait returns,
before re-evaluating too_many_isolated. However that would still add a
100msec latency here and there, plus some overscheduling in situations
that may not be VM-congested at all, where just one task quit and
released all the anon memory in the inactive list.

Another would be to always return false from too_many_isolated if
nr_isolated_anon < threshold*CONFIG_NR_CPUS, which would probably be
enough to sort out the per-cpu accounting error.

Personally I prefer to nuke the function, for all the reasons
mentioned in the prev email, and go ahead and drop the isolated
counter too. A stricter fix would give more confirmation that we're
not hiding a stat accounting error and would confirm my theory, but
for the long run (after having spent a day reading that function) I
don't really like to keep it.
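
To make the per-cpu accounting theory and the threshold*CONFIG_NR_CPUS
idea concrete, here is a rough userspace sketch. It is not kernel code:
every model_* name, MODEL_NR_CPUS and MODEL_THRESHOLD are hypothetical
stand-ins, and it assumes the check simply compares the isolated count
against the inactive count. It only illustrates how the globally visible
counter can lag the exact value by up to threshold*nr_cpus, and how the
guarded variant tolerates that error.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NR_CPUS   4   /* stand-in for CONFIG_NR_CPUS */
#define MODEL_THRESHOLD 32  /* stand-in for the per-cpu stat threshold */

/* Hypothetical model of one vmstat-like counter with per-cpu deltas. */
struct model_counter {
	long global;                /* value that readers see */
	long delta[MODEL_NR_CPUS];  /* per-cpu updates not yet folded in */
};

/* Update from one CPU; fold into the global value only past the threshold. */
static void model_counter_add(struct model_counter *c, int cpu, long n)
{
	c->delta[cpu] += n;
	if (labs(c->delta[cpu]) > MODEL_THRESHOLD) {
		c->global += c->delta[cpu];
		c->delta[cpu] = 0;
	}
}

/* Exact value; computing it means visiting every CPU's delta. */
static long model_counter_exact(const struct model_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}

/* The check as assumed in this sketch: throttle if isolated > inactive. */
static bool model_too_many_isolated(long inactive, long isolated)
{
	return isolated > inactive;
}

/* Variant: ignore isolated counts that fit inside the per-cpu error bound. */
static bool model_too_many_isolated_guarded(long inactive, long isolated)
{
	if (isolated < (long)MODEL_THRESHOLD * MODEL_NR_CPUS)
		return false;
	return isolated > inactive;
}
```

In this model, if one CPU isolates 33 pages (which gets folded into the
global value) and three other CPUs each put 11 pages back (which stay in
their per-cpu deltas), the exact count is 0 while readers still see 33.
With an inactive list trimmed to zero, the plain check keeps throttling
and the guarded one does not.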

The correct make-it-worse patch would be this (and this time I tested
it before sending ;). This should shorten the time it takes to
reproduce, as it'll always enter reclaim with __GFP_NO_KSWAPD
allocations (while previously it'd enter reclaim only if compaction
failed). And entering reclaim without kswapd running, churning over
the per-cpu stats and adding stuff from the active to the inactive list
even when the inactive list gets trimmed to zero by an exit(), would
screw things up.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f8a97b..3dcd442 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2093,6 +2093,7 @@ rebalance:
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
 		goto nopage;
 
+#if 0
 	/*
 	 * Try direct compaction. The first pass is asynchronous. Subsequent
 	 * attempts after direct reclaim are synchronous
@@ -2105,7 +2106,8 @@ rebalance:
 					sync_migration);
 	if (page)
 		goto got_pg;
-	sync_migration = !(gfp_mask & __GFP_NO_KSWAPD);
+#endif
+	sync_migration = true;
 
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,

