Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.com>, Linux-MM <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>, Pintu Kumar <pintu.k@samsung.com>,
	Xishi Qiu <qiuxishi@huawei.com>, Gioh Kim <gioh.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand
Date: Fri, 31 Jul 2015 09:43:50 +0100	[thread overview]
Message-ID: <20150731084350.GF5840@techsingularity.net> (raw)
In-Reply-To: <55BB31C4.3000107@suse.cz>

On Fri, Jul 31, 2015 at 10:28:52AM +0200, Vlastimil Babka wrote:
> On 07/29/2015 02:53 PM, Mel Gorman wrote:
> >On Wed, Jul 29, 2015 at 01:35:17PM +0200, Vlastimil Babka wrote:
> >>On 07/20/2015 10:00 AM, Mel Gorman wrote:
> >>>+/*
> >>>+ * Used when an allocation is about to fail under memory pressure. This
> >>>+ * potentially hurts the reliability of high-order allocations when under
> >>>+ * intense memory pressure but failed atomic allocations should be easier
> >>>+ * to recover from than an OOM.
> >>>+ */
> >>>+static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
> >>>+{
> >>>+	struct zonelist *zonelist = ac->zonelist;
> >>>+	unsigned long flags;
> >>>+	struct zoneref *z;
> >>>+	struct zone *zone;
> >>>+	struct page *page;
> >>>+	int order;
> >>>+
> >>>+	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
> >>>+								ac->nodemask) {
> >>
> >>This fixed order might bias some zones over others wrt unreserving. Is it OK?
> >
> >I could not think of a situation where it mattered. It'll always be
> >preferring highest zone over lower zones. Allocation requests that can
> >use any zone that do not care. Allocation requests that are limited to
> >lower zones are protected as long as possible.
> 
> Hmm... allocation requests will follow fair zone policy and thus the
> highatomic reservations will be spread fairly among all zones? Unless the
> allocations require lower zones of course.
> 

Does that matter?

> But for unreservations, normal/high allocations failing under memory
> pressure will lead to unreserving highatomic pageblocks first in the higher
> zones and only then the lower zones, and that was my concern. But it's true
> that failing allocations that require lower zones will lead to unreserving
> the lower zones, so it might be ok in the end.
> 

Again, I don't think it matters.

> >
> >>
> >>>+		/* Preserve at least one pageblock */
> >>>+		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
> >>>+			continue;
> >>>+
> >>>+		spin_lock_irqsave(&zone->lock, flags);
> >>>+		for (order = 0; order < MAX_ORDER; order++) {
> >>
> >>Would it make more sense to look in descending order for a higher chance of
> >>unreserving a pageblock that's mostly free? Like the traditional page stealing does?
> >>
> >
> >I don't think it's worth the search cost. Traditional page stealing is
> >searching because it's trying to minimise events that cause external
> >fragmentation. Here we'd gain very little. We are under some memory
> >pressure here, if enough pages are not free then another one will get
> >freed shortly. Either way, I doubt the difference is measurable.
> 
> Hmm, I guess...
> 
> >
> >>>+			struct free_area *area = &(zone->free_area[order]);
> >>>+
> >>>+			if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
> >>>+				continue;
> >>>+
> >>>+			page = list_entry(area->free_list[MIGRATE_HIGHATOMIC].next,
> >>>+						struct page, lru);
> >>>+
> >>>+			zone->nr_reserved_highatomic -= pageblock_nr_pages;
> >>>+			set_pageblock_migratetype(page, ac->migratetype);
> >>
> >>Would it make more sense to assume MIGRATE_UNMOVABLE, as high-order allocations
> >>present in the pageblock typically would be, and apply the traditional page
> >>stealing heuristics to decide if it should be changed to ac->migratetype (if
> >>that differs)?
> >>
> >
> >Superb spot, I had to think about this for a while and initially I was
> >thinking your suggestion was a no-brainer and obviously the right thing
> >to do.
> >
> >On the pro side, it preserves the fragmentation logic because it'll force
> >the normal page stealing logic to be applied.
> >
> >On the con side, we may reassign the pageblock twice -- once to
> >MIGRATE_UNMOVABLE and once to ac->migratetype. That one does not matter
> >but the second con is that we inadvertly increase the number of unmovable
> >blocks in some cases.
> >
> >Lets say we default to MIGRATE_UNMOVABLE, ac->migratetype is MIGRATE_MOVABLE
> >and there are enough free pages to satisfy the allocation but not steal
> >the whole pageblock. The end result is that we have a new unmovable
> >pageblock that may not be necessary. The next unmovable allocation
> >potentially is forever. They key observation is that previously the
> >pageblock could have been short-lived high-order allocations that could
> >be completely free soon if it was assigned MIGRATE_MOVABLE. This may not
> >apply when SLUB is using high-order allocations but the point still
> >holds.
> 
> Yeah, I see the point. The obvious counterexample is a pageblock that we
> design as MOVABLE and yet it contains some long-lived unmovable allocation.
> More unmovable allocations could lead to choosing another movable block as a
> fallback, while if we marked this pageblock as unmovable, they could go here
> and not increase fragmentation.
> 
> The problem is, we can't known which one is the case. I've toyed with an
> idea of MIGRATE_MIXED blocks that would be for cases where the heuristics
> decide that e.g. an UNMOVABLE block is empty enough to change it to MOVABLE,
> but still it may contain some unmovable allocations. Such pageblocks should
> be preferred fallbacks for future unmovable allocations before truly
> pristine movable pageblocks.
> 

I tried that once upon a time. The number of blocks simply increased
over time and there was no sensible way to recover from it. I never
found a solution that worked for very long.

> >>>+
> >>>+	/*
> >>>+	 * If the caller is not atomic then discount the reserves. This will
> >>>+	 * over-estimate how the atomic reserve but it avoids a search
> >>>+	 */
> >>>+	if (likely(!(alloc_flags & ALLOC_HARDER)))
> >>>+		free_pages -= z->nr_reserved_highatomic;
> >>
> >>Hm, so in the case the maximum of 10% reserved blocks is already full, we deny
> >>the allocation access to another 10% of the memory and push it to reclaim. This
> >>seems rather excessive.
> >
> >It's necessary. If normal callers can use it then the reserve fills with
> >normal pages, the memory gets fragmented and high-order atomic allocations
> >fail due to fragmentation. Similarly, the number of MIGRATE_HIGHORDER
> >pageblocks cannot be unbound or everything else will be continually pushed
> >into reclaim even if there is plenty of memory free.
> 
> I understand denying normal allocations access to highatomic reserves via
> watermarks is necessary. But my concern is that for each reserved pageblock
> we effectively deny up to two pageblocks-worth-of-pages to normal
> allocations. One pageblock that is marked as MIGRATE_HIGHATOMIC, and once it
> becomes full, free_pages above are decreased twice - once by the pageblock
> becoming full, and then again by subtracting z->nr_reserved_highatomic. This
> extra gap is still usable by highatomic allocations, but they will also
> potentially mark more pageblocks highatomic and further increase the gap. In
> the worst case we have 10% of pageblocks marked highatomic and full, and
> another 10% that is only usable by highatomic allocations (but won't be
> marked as such), and if no more highatomic allocations come then the 10% is
> wasted.

Similar to what I said to Joosoo, I'll drop the size of the reserves in
the next revision. It brings things back in line with MIGRATE_RESERVE in
terms of the amount of memory we reserve while still removing the need
for high-order watermark checks. It means that the rate of atomic
high-order allocation failure will remain the same after the series as
before but that should not matter.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-07-31  8:43 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-20  8:00 [RFC PATCH 00/10] Remove zonelist cache and high-order watermark checking Mel Gorman
2015-07-20  8:00 ` [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache Mel Gorman
2015-07-21 23:47   ` David Rientjes
2015-07-23 10:58     ` Mel Gorman
2015-07-20  8:00 ` [PATCH 02/10] mm, page_alloc: Remove unnecessary parameter from zone_watermark_ok_safe Mel Gorman
2015-07-21 23:49   ` David Rientjes
2015-07-28 12:20   ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing Mel Gorman
2015-07-22  0:08   ` David Rientjes
2015-07-23 12:28     ` Mel Gorman
2015-07-28 12:25   ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 04/10] mm, page_alloc: Remove unnecessary taking of a seqlock when cpusets are disabled Mel Gorman
2015-07-22  0:11   ` David Rientjes
2015-07-28 12:32   ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 05/10] mm, page_alloc: Remove unnecessary updating of GFP flags during normal operation Mel Gorman
2015-07-28 13:36   ` Vlastimil Babka
2015-07-28 13:47     ` Peter Zijlstra
2015-07-28 15:48     ` Mel Gorman
2015-07-20  8:00 ` [PATCH 06/10] mm, page_alloc: Use jump label to check if page grouping by mobility is enabled Mel Gorman
2015-07-28 13:42   ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 07/10] mm, page_alloc: Use masks and shifts when converting GFP flags to migrate types Mel Gorman
2015-07-20  8:00 ` [PATCH 08/10] mm, page_alloc: Remove MIGRATE_RESERVE Mel Gorman
2015-07-29  9:59   ` Vlastimil Babka
2015-07-29 12:25     ` Mel Gorman
2015-07-20  8:00 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-07-29 11:35   ` Vlastimil Babka
2015-07-29 12:53     ` Mel Gorman
2015-07-31  8:28       ` Vlastimil Babka
2015-07-31  8:43         ` Mel Gorman [this message]
2015-07-31  5:54   ` Joonsoo Kim
2015-07-31  7:11     ` Mel Gorman
2015-07-31  7:25       ` Vlastimil Babka
2015-07-31  8:22         ` Mel Gorman
2015-07-31  8:30         ` Joonsoo Kim
2015-07-31  8:26       ` Joonsoo Kim
2015-07-31  8:41         ` Mel Gorman
2015-07-20  8:00 ` [PATCH 10/10] mm, page_alloc: Only enforce watermarks for order-0 allocations Mel Gorman
2015-07-29 12:25   ` Vlastimil Babka
2015-07-29 13:04     ` Mel Gorman
2015-07-31  6:08   ` Joonsoo Kim
2015-07-31  7:19     ` Mel Gorman
2015-07-31  8:40       ` Joonsoo Kim
2015-07-31  6:14 ` [RFC PATCH 00/10] Remove zonelist cache and high-order watermark checking Joonsoo Kim
2015-07-31  7:20   ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2015-08-12 10:45 [PATCH 00/10] Remove zonelist cache and high-order watermark checking v2 Mel Gorman
2015-08-12 10:45 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-09-21 10:52 [PATCH 00/10] Remove zonelist cache and high-order watermark checking v4 Mel Gorman
2015-09-21 10:52 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-09-24 13:50   ` Michal Hocko
2015-09-25 19:22   ` Johannes Weiner
2015-09-29 21:01   ` Andrew Morton
2015-09-30  8:27     ` Mel Gorman
2015-09-30 14:02       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150731084350.GF5840@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=gioh.kim@lge.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.com \
    --cc=pintu.k@samsung.com \
    --cc=qiuxishi@huawei.com \
    --cc=riel@redhat.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).