All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Linux-MM <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>, Pintu Kumar <pintu.k@samsung.com>,
	Xishi Qiu <qiuxishi@huawei.com>, Gioh Kim <gioh.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand
Date: Fri, 31 Jul 2015 09:22:21 +0100	[thread overview]
Message-ID: <20150731082221.GD5840@techsingularity.net> (raw)
In-Reply-To: <55BB22D9.5040200@suse.cz>

On Fri, Jul 31, 2015 at 09:25:13AM +0200, Vlastimil Babka wrote:
> On 07/31/2015 09:11 AM, Mel Gorman wrote:
> >On Fri, Jul 31, 2015 at 02:54:07PM +0900, Joonsoo Kim wrote:
> >>Hello, Mel.
> >>
> >>On Mon, Jul 20, 2015 at 09:00:18AM +0100, Mel Gorman wrote:
> >>>From: Mel Gorman <mgorman@suse.de>
> >>>
> >>>High-order watermark checking exists for two reasons --  kswapd high-order
> >>>awareness and protection for high-order atomic requests. Historically we
> >>>depended on MIGRATE_RESERVE to preserve min_free_kbytes as high-order free
> >>>pages for as long as possible. This patch introduces MIGRATE_HIGHATOMIC
> >>>that reserves pageblocks for high-order atomic allocations. This is expected
> >>>to be more reliable than MIGRATE_RESERVE was.
> >>
> >>I have some concerns on this patch.
> >>
> >>1) This patch breaks intention of __GFP_WAIT.
> >>__GFP_WAIT is used when we want to succeed allocation even if we need
> >>to do some reclaim/compaction work. That implies importance of
> >>allocation success. But, reserved pageblock for MIGRATE_HIGHATOMIC makes
> >>atomic allocation (~__GFP_WAIT) more successful than allocation with
> >>__GFP_WAIT in many situation. It breaks basic assumption of gfp flags
> >>and doesn't make any sense.
> >>
> >
> >Currently allocation requests that do not specify __GFP_WAIT get the
> >ALLOC_HARDER flag which allows them to dip further into watermark reserves.
> >It already is the case that there are corner cases where a high atomic
> >allocation can succeed when a non-atomic allocation would reclaim.
> 
> I think (and said so before elsewhere) is that the problem is that we don't
> currently distinguish allocations that can't wait (=are really atomic and
> have no order-0 fallback) and allocations that just don't want to wait
> (=they have fallbacks). The second ones should obviously not access the
> current ALLOC_HARDER watermark-based reserves nor the proposed highatomic
> reserves.
> 

It's a separate issue though.  There are a number of cases

1. can't wait because a spinlock is held or in interrupt
2. does not want to wait because a fallback option is available
3. does not want to wait or wake kswapd because a fallback option is available
4. should not fail as it would cause major difficulties
5. cannot fail because it's a functional failure

5 is never meant to occur might be the situation on embedded platforms.
If this is the case then they should consider modifying MIGRATE_HIGHATOMIC
in this patch series but I don't think it belongs in mainline as it has
other consequences. 1-4 are a separate series.

Right now, this series does not drastically alter the concept that in
some cases atomic allocations will succeed without delay when a !atomic
allocation would have to reclaim.

There is certainly value to ironing out 1-4 on top and teaching SLUB,
THP and networking the distinction.

> Well we do look at __GFP_NO_KSWAPD flag to treat allocation as non-atomic,
> so that covers THP allocations and two drivers. But the recent networking
> commit fb05e7a89f50 didn't add the flag and nor does Joonsoo's slub patch
> use it. Either we should rename the flag and employ it where appropriate, or
> agree that access to reserves is orthogonal concern to waking up kswapd, and
> distinguish non-atomic non-__GFP_WAIT allocations differently.
> 

Separate problem with a separate series. This one is about removing
the zonelist cache due to complexity and removing an odd anomaly where
allocations can fail due to how watermarks are calculated.

> >>>A MIGRATE_HIGHORDER pageblock is created when an allocation request steals
> >>>a pageblock but limits the total number to 10% of the zone.
> >>
> >>When steals happens, pageblock already can be fragmented and we can't
> >>fully utilize this pageblock without allowing order-0 allocation. This
> >>is very waste.
> >>
> >
> >If the pageblock was stolen, it implies there was at least 1 usable page
> >of the correct order. As the pageblock is then reserved, any pages that
> >free in that block stay free for use by high-order atomic allocations.
> >Else, the number of pageblocks will increase again until the 10% limit
> >is hit.
> 
> It's however true that many of the "any pages free in that block" may be
> order-0, so they both won't be useful to high-order atomic allocations, and
> won't be available to other allocations, so they might remain unused.

I typoed lightly and missed a letter but the same outcome applies when
slightly corrected -- any pages that are *freed* in that block stay free
if it merges with buddies for use by high-order atomic allocations.
 or else, the number of pageblocks will increase again until the 10%
limit is hit.

If the limit is hit and we are still failing then it's no different to
what can happen today except it took a lot longer and was a lot harder to
trigger. As the changelog pointed out, with this approach the allocation
failure rate was massively reduced but not eliminated.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Linux-MM <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@redhat.com>, Pintu Kumar <pintu.k@samsung.com>,
	Xishi Qiu <qiuxishi@huawei.com>, Gioh Kim <gioh.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand
Date: Fri, 31 Jul 2015 09:22:21 +0100	[thread overview]
Message-ID: <20150731082221.GD5840@techsingularity.net> (raw)
In-Reply-To: <55BB22D9.5040200@suse.cz>

On Fri, Jul 31, 2015 at 09:25:13AM +0200, Vlastimil Babka wrote:
> On 07/31/2015 09:11 AM, Mel Gorman wrote:
> >On Fri, Jul 31, 2015 at 02:54:07PM +0900, Joonsoo Kim wrote:
> >>Hello, Mel.
> >>
> >>On Mon, Jul 20, 2015 at 09:00:18AM +0100, Mel Gorman wrote:
> >>>From: Mel Gorman <mgorman@suse.de>
> >>>
> >>>High-order watermark checking exists for two reasons --  kswapd high-order
> >>>awareness and protection for high-order atomic requests. Historically we
> >>>depended on MIGRATE_RESERVE to preserve min_free_kbytes as high-order free
> >>>pages for as long as possible. This patch introduces MIGRATE_HIGHATOMIC
> >>>that reserves pageblocks for high-order atomic allocations. This is expected
> >>>to be more reliable than MIGRATE_RESERVE was.
> >>
> >>I have some concerns on this patch.
> >>
> >>1) This patch breaks intention of __GFP_WAIT.
> >>__GFP_WAIT is used when we want to succeed allocation even if we need
> >>to do some reclaim/compaction work. That implies importance of
> >>allocation success. But, reserved pageblock for MIGRATE_HIGHATOMIC makes
> >>atomic allocation (~__GFP_WAIT) more successful than allocation with
> >>__GFP_WAIT in many situation. It breaks basic assumption of gfp flags
> >>and doesn't make any sense.
> >>
> >
> >Currently allocation requests that do not specify __GFP_WAIT get the
> >ALLOC_HARDER flag which allows them to dip further into watermark reserves.
> >It already is the case that there are corner cases where a high atomic
> >allocation can succeed when a non-atomic allocation would reclaim.
> 
> I think (and said so before elsewhere) is that the problem is that we don't
> currently distinguish allocations that can't wait (=are really atomic and
> have no order-0 fallback) and allocations that just don't want to wait
> (=they have fallbacks). The second ones should obviously not access the
> current ALLOC_HARDER watermark-based reserves nor the proposed highatomic
> reserves.
> 

It's a separate issue though.  There are a number of cases

1. can't wait because a spinlock is held or in interrupt
2. does not want to wait because a fallback option is available
3. does not want to wait or wake kswapd because a fallback option is available
4. should not fail as it would cause major difficulties
5. cannot fail because it's a functional failure

5 is never meant to occur might be the situation on embedded platforms.
If this is the case then they should consider modifying MIGRATE_HIGHATOMIC
in this patch series but I don't think it belongs in mainline as it has
other consequences. 1-4 are a separate series.

Right now, this series does not drastically alter the concept that in
some cases atomic allocations will succeed without delay when a !atomic
allocation would have to reclaim.

There is certainly value to ironing out 1-4 on top and teaching SLUB,
THP and networking the distinction.

> Well we do look at __GFP_NO_KSWAPD flag to treat allocation as non-atomic,
> so that covers THP allocations and two drivers. But the recent networking
> commit fb05e7a89f50 didn't add the flag and nor does Joonsoo's slub patch
> use it. Either we should rename the flag and employ it where appropriate, or
> agree that access to reserves is orthogonal concern to waking up kswapd, and
> distinguish non-atomic non-__GFP_WAIT allocations differently.
> 

Separate problem with a separate series. This one is about removing
the zonelist cache due to complexity and removing an odd anomaly where
allocations can fail due to how watermarks are calculated.

> >>>A MIGRATE_HIGHORDER pageblock is created when an allocation request steals
> >>>a pageblock but limits the total number to 10% of the zone.
> >>
> >>When steals happens, pageblock already can be fragmented and we can't
> >>fully utilize this pageblock without allowing order-0 allocation. This
> >>is very waste.
> >>
> >
> >If the pageblock was stolen, it implies there was at least 1 usable page
> >of the correct order. As the pageblock is then reserved, any pages that
> >free in that block stay free for use by high-order atomic allocations.
> >Else, the number of pageblocks will increase again until the 10% limit
> >is hit.
> 
> It's however true that many of the "any pages free in that block" may be
> order-0, so they both won't be useful to high-order atomic allocations, and
> won't be available to other allocations, so they might remain unused.

I typoed lightly and missed a letter but the same outcome applies when
slightly corrected -- any pages that are *freed* in that block stay free
if it merges with buddies for use by high-order atomic allocations.
 or else, the number of pageblocks will increase again until the 10%
limit is hit.

If the limit is hit and we are still failing then it's no different to
what can happen today except it took a lot longer and was a lot harder to
trigger. As the changelog pointed out, with this approach the allocation
failure rate was massively reduced but not eliminated.

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2015-07-31  8:22 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-20  8:00 [RFC PATCH 00/10] Remove zonelist cache and high-order watermark checking Mel Gorman
2015-07-20  8:00 ` Mel Gorman
2015-07-20  8:00 ` [PATCH 01/10] mm, page_alloc: Delete the zonelist_cache Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-21 23:47   ` David Rientjes
2015-07-21 23:47     ` David Rientjes
2015-07-23 10:58     ` Mel Gorman
2015-07-23 10:58       ` Mel Gorman
2015-07-20  8:00 ` [PATCH 02/10] mm, page_alloc: Remove unnecessary parameter from zone_watermark_ok_safe Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-21 23:49   ` David Rientjes
2015-07-21 23:49     ` David Rientjes
2015-07-28 12:20   ` Vlastimil Babka
2015-07-28 12:20     ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 03/10] mm, page_alloc: Remove unnecessary recalculations for dirty zone balancing Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-22  0:08   ` David Rientjes
2015-07-22  0:08     ` David Rientjes
2015-07-23 12:28     ` Mel Gorman
2015-07-23 12:28       ` Mel Gorman
2015-07-28 12:25   ` Vlastimil Babka
2015-07-28 12:25     ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 04/10] mm, page_alloc: Remove unnecessary taking of a seqlock when cpusets are disabled Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-22  0:11   ` David Rientjes
2015-07-22  0:11     ` David Rientjes
2015-07-28 12:32   ` Vlastimil Babka
2015-07-28 12:32     ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 05/10] mm, page_alloc: Remove unnecessary updating of GFP flags during normal operation Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-28 13:36   ` Vlastimil Babka
2015-07-28 13:36     ` Vlastimil Babka
2015-07-28 13:47     ` Peter Zijlstra
2015-07-28 13:47       ` Peter Zijlstra
2015-07-28 15:48     ` Mel Gorman
2015-07-28 15:48       ` Mel Gorman
2015-07-20  8:00 ` [PATCH 06/10] mm, page_alloc: Use jump label to check if page grouping by mobility is enabled Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-28 13:42   ` Vlastimil Babka
2015-07-28 13:42     ` Vlastimil Babka
2015-07-20  8:00 ` [PATCH 07/10] mm, page_alloc: Use masks and shifts when converting GFP flags to migrate types Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-20  8:00 ` [PATCH 08/10] mm, page_alloc: Remove MIGRATE_RESERVE Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-29  9:59   ` Vlastimil Babka
2015-07-29  9:59     ` Vlastimil Babka
2015-07-29 12:25     ` Mel Gorman
2015-07-29 12:25       ` Mel Gorman
2015-07-20  8:00 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-29 11:35   ` Vlastimil Babka
2015-07-29 11:35     ` Vlastimil Babka
2015-07-29 12:53     ` Mel Gorman
2015-07-29 12:53       ` Mel Gorman
2015-07-31  8:28       ` Vlastimil Babka
2015-07-31  8:28         ` Vlastimil Babka
2015-07-31  8:43         ` Mel Gorman
2015-07-31  8:43           ` Mel Gorman
2015-07-31  5:54   ` Joonsoo Kim
2015-07-31  5:54     ` Joonsoo Kim
2015-07-31  7:11     ` Mel Gorman
2015-07-31  7:11       ` Mel Gorman
2015-07-31  7:25       ` Vlastimil Babka
2015-07-31  7:25         ` Vlastimil Babka
2015-07-31  8:22         ` Mel Gorman [this message]
2015-07-31  8:22           ` Mel Gorman
2015-07-31  8:30         ` Joonsoo Kim
2015-07-31  8:30           ` Joonsoo Kim
2015-07-31  8:26       ` Joonsoo Kim
2015-07-31  8:26         ` Joonsoo Kim
2015-07-31  8:41         ` Mel Gorman
2015-07-31  8:41           ` Mel Gorman
2015-07-20  8:00 ` [PATCH 10/10] mm, page_alloc: Only enforce watermarks for order-0 allocations Mel Gorman
2015-07-20  8:00   ` Mel Gorman
2015-07-29 12:25   ` Vlastimil Babka
2015-07-29 12:25     ` Vlastimil Babka
2015-07-29 13:04     ` Mel Gorman
2015-07-29 13:04       ` Mel Gorman
2015-07-31  6:08   ` Joonsoo Kim
2015-07-31  6:08     ` Joonsoo Kim
2015-07-31  7:19     ` Mel Gorman
2015-07-31  7:19       ` Mel Gorman
2015-07-31  8:40       ` Joonsoo Kim
2015-07-31  8:40         ` Joonsoo Kim
2015-07-31  6:14 ` [RFC PATCH 00/10] Remove zonelist cache and high-order watermark checking Joonsoo Kim
2015-07-31  6:14   ` Joonsoo Kim
2015-07-31  7:20   ` Mel Gorman
2015-07-31  7:20     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2015-08-12 10:45 [PATCH 00/10] Remove zonelist cache and high-order watermark checking v2 Mel Gorman
2015-08-12 10:45 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-08-12 10:45   ` Mel Gorman
2015-09-21 10:52 [PATCH 00/10] Remove zonelist cache and high-order watermark checking v4 Mel Gorman
2015-09-21 10:52 ` [PATCH 09/10] mm, page_alloc: Reserve pageblocks for high-order atomic allocations on demand Mel Gorman
2015-09-21 10:52   ` Mel Gorman
2015-09-24 13:50   ` Michal Hocko
2015-09-24 13:50     ` Michal Hocko
2015-09-25 19:22   ` Johannes Weiner
2015-09-25 19:22     ` Johannes Weiner
2015-09-29 21:01   ` Andrew Morton
2015-09-29 21:01     ` Andrew Morton
2015-09-30  8:27     ` Mel Gorman
2015-09-30  8:27       ` Mel Gorman
2015-09-30 14:02       ` Vlastimil Babka
2015-09-30 14:02         ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150731082221.GD5840@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=gioh.kim@lge.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pintu.k@samsung.com \
    --cc=qiuxishi@huawei.com \
    --cc=riel@redhat.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.