From: Andrew Morton <akpm@linux-foundation.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Richard Davies <richard@arachsys.com>, KVM <kvm@vger.kernel.org>,
QEMU-devel <qemu-devel@nongnu.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Avi Kivity <avi@redhat.com>,
Shaohua Li <shli@kernel.org>
Subject: Re: [Qemu-devel] [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated
Date: Fri, 21 Sep 2012 14:36:56 -0700
Message-ID: <20120921143656.60a9a6cd.akpm@linux-foundation.org>
In-Reply-To: <1348224383-1499-9-git-send-email-mgorman@suse.de>

On Fri, 21 Sep 2012 11:46:22 +0100
Mel Gorman <mgorman@suse.de> wrote:

> When compaction was implemented it was known that scanning could potentially
> be excessive. The ideal was that a counter be maintained for each pageblock
> but maintaining this information would incur a severe penalty due to a
> shared writable cache line. It has reached the point where the scanning
> costs are a serious problem, particularly on long-lived systems where a
> large process starts and allocates a large number of THPs at the same time.
>
> Instead of using a shared counter, this patch adds another bit to the
> pageblock flags called PB_migrate_skip. If a pageblock is scanned by
> either migrate or free scanner and 0 pages were isolated, the pageblock
> is marked to be skipped in the future. When scanning, this bit is checked
> before any scanning takes place and the block skipped if set.
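
For readers following along, the skip-bit idea in the paragraph above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct zone_sim`, `scan_block()` and `scan_zone()` are hypothetical names that do not appear in the patch, and the kernel keeps the bit in the pageblock flags rather than in a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS 8	/* hypothetical zone size, in pageblocks */

/* Hypothetical stand-in for a zone: one skip flag per pageblock. */
struct zone_sim {
	bool skip[NR_BLOCKS];
	/* Pages that can be isolated from each block (simulation input). */
	unsigned int isolatable[NR_BLOCKS];
	unsigned long blocks_scanned;
};

/*
 * Scan one pageblock and return the number of pages isolated.
 * A block that yields nothing is marked so later passes skip it.
 */
static unsigned int scan_block(struct zone_sim *z, size_t block)
{
	unsigned int isolated;

	assert(block < NR_BLOCKS);

	/* Check the cached hint before doing any scanning work. */
	if (z->skip[block])
		return 0;

	z->blocks_scanned++;
	isolated = z->isolatable[block];
	if (isolated == 0)
		z->skip[block] = true;

	return isolated;
}

/* One full pass over the zone; returns the total pages isolated. */
static unsigned int scan_zone(struct zone_sim *z)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < NR_BLOCKS; i++)
		total += scan_block(z, i);
	return total;
}
```

With only two productive blocks out of eight, the first call to scan_zone() touches all eight blocks and the second touches just the two that were not marked.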
>
> The main difficulty with a patch like this is "when to ignore the cached
> information?" If it's ignored too often, the scanning rates will still
> be excessive. If the information is too stale then allocations will fail
> that might have otherwise succeeded. In this patch
>
> o CMA always ignores the information
> o If the migrate and free scanner meet then the cached information will
>   be discarded if it's at least 5 seconds since the last time the cache
>   was discarded
> o If there are a large number of allocation failures, discard the cache.
>
> The time-based heuristic is very clumsy but there are few choices for a
> better event. Depending solely on multiple allocation failures still allows
> excessive scanning when THP allocations are failing in quick succession
> due to memory pressure. Waiting until memory pressure is relieved would
> cause compaction to continually fail instead of using reclaim/compaction
> to try to allocate the page. The time-based mechanism is clumsy but a better
> option is not obvious.

ick.

Wall time has sooo little relationship to what's happening in there.
If we *have* to use polling, cannot we clock the poll with some metric
which is at least vaguely related to the amount of activity? Number
(or proportion) of pages scanned, for example? Or reset everything on
the Nth trip around the zone? Or even a combination of one of these
*and* of wall time, so the system will at least work harder when MM is
under load.

Also, what has to be done to avoid the polling altogether? eg/ie, zap
a pageblock's PB_migrate_skip synchronously, when something was done to
that pageblock which justifies repolling it?

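
To make the first suggestion concrete, here is a rough userspace sketch of clocking the reset by scan activity, with wall time as a backstop only. Everything in it is hypothetical: `struct reset_clock`, `note_scanned()`, `should_reset()` and both thresholds are invented for illustration and are not in the patch. The combination rule is just one possible reading of the options listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone bookkeeping; none of these fields exist in the patch. */
struct reset_clock {
	unsigned long zone_pages;	/* pages spanned by the zone */
	unsigned long scanned;		/* pages scanned since the last reset */
	unsigned long last_reset;	/* time of the last reset, in ticks */
};

#define RESET_PASSES	4UL	/* reset after this many zone-sized scans */
#define RESET_TICKS	5000UL	/* wall-time backstop, in ticks */

/* Account for scanning work done by either scanner. */
static void note_scanned(struct reset_clock *c, unsigned long nr_pages)
{
	c->scanned += nr_pages;
}

/*
 * Decide whether the cached skip information should be discarded.
 * Activity is the primary clock; wall time only counts once at least
 * one zone's worth of pages has been scanned since the last reset.
 */
static bool should_reset(struct reset_clock *c, unsigned long now)
{
	bool busy, stale;

	assert(c->zone_pages > 0);

	busy = c->scanned / c->zone_pages >= RESET_PASSES;
	stale = c->scanned >= c->zone_pages &&
		now - c->last_reset >= RESET_TICKS;

	if (!busy && !stale)
		return false;

	c->scanned = 0;
	c->last_reset = now;
	return true;
}
```

An idle zone never resets under this rule no matter how much time passes, and a heavily scanned zone resets well before the timeout, which is the "work harder when MM is under load" behaviour asked for.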
>
> ...
>
> +static void reset_isolation_suitable(struct zone *zone)
> +{
> +	unsigned long start_pfn = zone->zone_start_pfn;
> +	unsigned long end_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +	unsigned long pfn;
> +
> +	/*
> +	 * Do not reset more than once every five seconds. If allocations are
> +	 * failing sufficiently quickly to allow this to happen then continually
> +	 * scanning for compaction is not going to help. The choice of five
> +	 * seconds is arbitrary but will mitigate excessive scanning.
> +	 */
> +	if (time_before(jiffies, zone->compact_blockskip_expire))
> +		return;
> +	zone->compact_blockskip_expire = jiffies + (HZ * 5);
> +
> +	/* Walk the zone and mark every pageblock as suitable for isolation */
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> +		struct page *page;
> +		if (!pfn_valid(pfn))
> +			continue;
> +
> +		page = pfn_to_page(pfn);
> +		if (zone != page_zone(page))
> +			continue;
> +
> +		clear_pageblock_skip(page);
> +	}

What's the worst-case loop count here?

> +}
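
As a back-of-envelope for the question above: the loop steps by pageblock_nr_pages from zone_start_pfn to the end of the span, so it runs once per pageblock spanned by the zone, holes included. The helpers below are a hypothetical userspace calculation, not kernel code. The 4 KiB page size and 512-page (2 MiB) pageblocks are assumptions typical of x86-64, not something stated in the patch.

```c
#include <assert.h>

/*
 * Iterations of "for (pfn = start; pfn < end; pfn += pageblock_pages)"
 * over a zone spanning spanned_pages, i.e. a rounded-up division.
 */
static unsigned long long reset_loop_iterations(unsigned long long spanned_pages,
						unsigned long long pageblock_pages)
{
	assert(pageblock_pages > 0);
	return (spanned_pages + pageblock_pages - 1) / pageblock_pages;
}

/* Pages spanned by a zone of the given size in GiB, assuming 4 KiB pages. */
static unsigned long long gib_to_pages(unsigned long long gib)
{
	/* 1 GiB / 4 KiB = 2^18 pages */
	return gib << 18;
}
```

Under those assumptions a zone spanning 64 GiB covers 16,777,216 pages and the loop runs 32,768 times; a 1 TiB span gives 524,288 iterations.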
> +
>
> ...
>