* [PATCH 1/6] mm, page_alloc: Only check PageCompound for high-order pages -fix
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
Vlastimil Babka pointed out that an unlikely() annotation in
free_pages_prepare() shrinks the generated code by moving the compound
handling to the end of the function.
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-30 (-30)
function                                     old     new   delta
free_pages_prepare                           771     741     -30
It's also consistent with the buffered_rmqueue path.
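For reference, likely()/unlikely() are the kernel's wrappers around GCC's
__builtin_expect() branch hint (include/linux/compiler.h); a minimal
sketch of the effect here:

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* hints the compiler to lay out the order > 0 block out of line,
	 * keeping the common order-0 free on the straight-line hot path */
	if (unlikely(order)) {
		/* compound/tail-page handling */
	}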
This is a fix to the mmotm patch
mm-page_alloc-only-check-pagecompound-for-high-order-pages.patch.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1da56779f8fa..d8383750bd43 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1003,7 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
* Check tail pages before head page information is cleared to
* avoid checking PageCompound for order-0 pages.
*/
- if (order) {
+ if (unlikely(order)) {
bool compound = PageCompound(page);
int i;
--
2.6.4
* [PATCH 2/6] mm, page_alloc: move might_sleep_if check to the allocator slowpath -revert
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
Vlastimil Babka pointed out that the patch weakens the debugging check
with respect to zone_reclaim, which can sleep in the fast path: the move
is "safe" but defeats the purpose of the check. As most configurations
eliminate the check at compile time anyway, I thought it better to simply
revert the patch than to add a second check in zone_reclaim.
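For context, might_sleep_if() is a thin wrapper (a sketch of the
include/linux/kernel.h definition):

#define might_sleep_if(cond)	do { if (cond) might_sleep(); } while (0)

	/* back in the fast path, this catches callers that pass
	 * __GFP_DIRECT_RECLAIM from atomic context; in most configurations
	 * it compiles away entirely, which is why it is cheap to keep here */
	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);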
This is a revert of the mmotm patch
mm-page_alloc-move-might_sleep_if-check-to-the-allocator-slowpath.patch.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8383750bd43..9ad4e68486e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3606,8 +3606,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
return NULL;
}
- might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
-
/*
* We also sanity check to catch abuse of atomic reserves being used by
* callers that are not in atomic context.
@@ -3806,6 +3804,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
lockdep_trace_alloc(gfp_mask);
+ might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
+
if (should_fail_alloc_page(gfp_mask, order))
return NULL;
--
2.6.4
* [PATCH 3/6] mm, page_alloc: Check once if a zone has isolated pageblocks -fix
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
Vlastimil Babka pointed out that the original code was protected by
the zone lock and provided a fix.
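In short, a flag that is updated under zone->lock must also be sampled
under zone->lock, or the value can be stale by the time the lock is taken:

	/* racy: isolation state can change between the read and the lock */
	bool isolated_pageblocks = has_isolate_pageblock(zone);
	spin_lock(&zone->lock);

	/* safe: the value is stable for as long as zone->lock is held */
	spin_lock(&zone->lock);
	isolated_pageblocks = has_isolate_pageblock(zone);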
This is a fix to the mmotm patch
mm-page_alloc-check-once-if-a-zone-has-isolated-pageblocks.patch. Once
applied, the following line should be removed from the changelog:
"Technically this is race-prone but so is the existing code."
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ad4e68486e9..c63e5e7e4864 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1098,9 +1098,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
int migratetype = 0;
int batch_free = 0;
unsigned long nr_scanned;
- bool isolated_pageblocks = has_isolate_pageblock(zone);
+ bool isolated_pageblocks;
spin_lock(&zone->lock);
+ isolated_pageblocks = has_isolate_pageblock(zone);
nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
if (nr_scanned)
__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
--
2.6.4
* [PATCH 4/6] mm, page_alloc: un-inline the bad part of free_pages_check
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
From: Vlastimil Babka <vbabka@suse.cz>
!DEBUG_VM size and bloat-o-meter:
add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pcppages_bulk                          1288    1171    -117
__free_pages_ok                              948     695    -253
DEBUG_VM:
add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pages_prepare                          1112     898    -214
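The change is the usual hot/cold split: the common success test stays
inline while the rarely-executed reporting moves out of line, so it is no
longer duplicated into every caller. A toy sketch of the pattern, with
hypothetical names:

	struct widget { int state; };

	static bool widget_expected_state(struct widget *w)
	{
		return w->state == 0;
	}

	/* cold: out of line, reached only when the check fails */
	static void widget_check_bad(struct widget *w)
	{
		pr_err("bad widget state %d\n", w->state);
	}

	/* hot: small enough to stay inline cheaply in every caller */
	static inline int widget_check(struct widget *w)
	{
		if (likely(widget_expected_state(w)))
			return 0;
		widget_check_bad(w);
		return 1;
	}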
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c63e5e7e4864..97894cbe2fa3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -906,18 +906,11 @@ static inline bool page_expected_state(struct page *page,
return true;
}
-static inline int free_pages_check(struct page *page)
+static void free_pages_check_bad(struct page *page)
{
const char *bad_reason;
unsigned long bad_flags;
- if (page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)) {
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return 0;
- }
-
- /* Something has gone sideways, find it */
bad_reason = NULL;
bad_flags = 0;
@@ -936,6 +929,17 @@ static inline int free_pages_check(struct page *page)
bad_reason = "page still charged to cgroup";
#endif
bad_page(page, bad_reason, bad_flags);
+}
+static inline int free_pages_check(struct page *page)
+{
+ if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ return 0;
+ }
+
+ /* Something has gone sideways, find it */
+ free_pages_check_bad(page);
return 1;
}
--
2.6.4
* [PATCH 5/6] mm, page_alloc: pull out side effects from free_pages_check
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
A check without side effects should be easier to maintain. It also
removes the cpupid and flags reset that was duplicated in the !DEBUG_VM
variants of both free_pcp_prepare() and bulkfree_pcp_prepare(). Finally,
it enables the next patch.
It shouldn't result in new branches, thanks to inlining of the check.
!DEBUG_VM bloat-o-meter:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
function                                     old     new   delta
__free_pages_ok                              748     739      -9
free_pcppages_bulk                          1403    1385     -18
DEBUG_VM:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
function                                     old     new   delta
free_pages_prepare                           806     778     -28
This is also slightly faster because cpupid information is never set on
tail pages, so the reset can be skipped for them.
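A toy sketch of the principle, with hypothetical names: the predicate
becomes pure, and the caller applies the side effects once, on the
success path only:

	struct widget { unsigned long flags; int state; };

	/* pure predicate: reads *w, never writes it */
	static inline bool widget_ok(const struct widget *w)
	{
		return w->state == 0;
	}

	/* the caller owns the side effects */
	static bool widget_prepare(struct widget *w)
	{
		if (unlikely(!widget_ok(w)))
			return false;
		w->flags &= ~0xfUL;	/* hypothetical prep-flags reset */
		return true;
	}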
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 97894cbe2fa3..b823f00c275b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -932,11 +932,8 @@ static void free_pages_check_bad(struct page *page)
}
static inline int free_pages_check(struct page *page)
{
- if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE))) {
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
return 0;
- }
/* Something has gone sideways, find it */
free_pages_check_bad(page);
@@ -1016,7 +1013,11 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++) {
if (compound)
bad += free_tail_pages_check(page, page + i);
- bad += free_pages_check(page + i);
+ if (unlikely(free_pages_check(page + i))) {
+ bad++;
+ continue;
+ }
+ (page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
}
}
if (PageAnonHead(page))
@@ -1025,6 +1026,8 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
if (bad)
return false;
+ page_cpupid_reset_last(page);
+ page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
reset_page_owner(page, order);
if (!PageHighMem(page)) {
--
2.6.4
* [PATCH 6/6] mm, page_alloc: don't duplicate code in free_pcp_prepare
From: Mel Gorman @ 2016-04-27 14:57 UTC
To: Andrew Morton
Cc: Vlastimil Babka, Jesper Dangaard Brouer, Linux-MM, LKML,
Mel Gorman
The new free_pcp_prepare() function shares a lot of code with
free_pages_prepare(), which makes it a maintenance risk when some future
patch modifies only one of them. We should be able to achieve the same
effect (skipping free_pages_check() in !DEBUG_VM configs) by adding a
parameter to free_pages_prepare() and making it inline, so the checks
(and the order != 0 parts) are eliminated from the call in
free_pcp_prepare().
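A minimal standalone sketch of the technique, with hypothetical names:
because the bool is a compile-time constant at each call site and the
function is __always_inline, the compiler folds the dead branch away:

	struct widget { int state; };

	static bool widget_ok(struct widget *w) { return w->state == 0; }
	static void do_common_prep(struct widget *w) { w->state = -1; }

	static __always_inline bool prepare(struct widget *w, bool check)
	{
		if (check && !widget_ok(w))	/* dead when check is false */
			return false;
		do_common_prep(w);
		return true;
	}

	static bool fast_prepare(struct widget *w)
	{
		return prepare(w, false);	/* check eliminated */
	}

	static bool checked_prepare(struct widget *w)
	{
		return prepare(w, true);	/* check retained */
	}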
!DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already
inlining free_pages_prepare(), and the elimination seems to work as
expected.
DEBUG_VM bloat-o-meter:
add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
function                                     old     new   delta
__free_pages_ok                              297    1060    +763
free_hot_cold_page                           480     752    +272
free_pages_prepare                           778       -    -778
Here inlining did not occur before, so forcing it adds some code, but
that's acceptable for a debug option.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 35 +++++++----------------------------
1 file changed, 7 insertions(+), 28 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b823f00c275b..bc4160bfb36b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -990,7 +990,8 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
return ret;
}
-static bool free_pages_prepare(struct page *page, unsigned int order)
+static __always_inline bool free_pages_prepare(struct page *page, unsigned int order,
+ bool check_free)
{
int bad = 0;
@@ -1022,7 +1023,8 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
}
if (PageAnonHead(page))
page->mapping = NULL;
- bad += free_pages_check(page);
+ if (check_free)
+ bad += free_pages_check(page);
if (bad)
return false;
@@ -1046,7 +1048,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
#ifdef CONFIG_DEBUG_VM
static inline bool free_pcp_prepare(struct page *page)
{
- return free_pages_prepare(page, 0);
+ return free_pages_prepare(page, 0, true);
}
static inline bool bulkfree_pcp_prepare(struct page *page)
@@ -1056,30 +1058,7 @@ static inline bool bulkfree_pcp_prepare(struct page *page)
#else
static bool free_pcp_prepare(struct page *page)
{
- VM_BUG_ON_PAGE(PageTail(page), page);
-
- trace_mm_page_free(page, 0);
- kmemcheck_free_shadow(page, 0);
- kasan_poison_free_pages(page, 0);
-
- if (PageAnonHead(page))
- page->mapping = NULL;
-
- reset_page_owner(page, 0);
-
- if (!PageHighMem(page)) {
- debug_check_no_locks_freed(page_address(page),
- PAGE_SIZE);
- debug_check_no_obj_freed(page_address(page),
- PAGE_SIZE);
- }
- arch_free_page(page, 0);
- kernel_poison_pages(page, 0, 0);
- kernel_map_pages(page, 0, 0);
-
- page_cpupid_reset_last(page);
- page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
- return true;
+ return free_pages_prepare(page, 0, false);
}
static bool bulkfree_pcp_prepare(struct page *page)
@@ -1257,7 +1236,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
int migratetype;
unsigned long pfn = page_to_pfn(page);
- if (!free_pages_prepare(page, order))
+ if (!free_pages_prepare(page, order, true))
return;
migratetype = get_pfnblock_migratetype(page, pfn);
--
2.6.4
* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
From: Vlastimil Babka @ 2016-05-02 8:54 UTC
To: Mel Gorman, Andrew Morton; +Cc: Jesper Dangaard Brouer, Linux-MM, LKML
On 04/27/2016 04:57 PM, Mel Gorman wrote:
> as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
> is fine. The nodemask pointer is the same between cpuset retries. If the
> zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
> change then there is a second harmless pass through the page allocator.
True. But I just realized (while working on direct compaction priorities)
that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
According to the comment it should be ignoring mempolicies, but it still
honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
mempolicy one.
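(For readers without the patch at hand, the fast-path change in question
does roughly this; a sketch, not the exact hunk:

	/* replace a NULL nodemask so the inlined zonelist iterator does
	 * not need a per-iteration NULL check */
	if (cpusets_enabled() && !ac.nodemask)
		ac.nodemask = &cpuset_current_mems_allowed;

so the slowpath no longer sees the NULL it used to.)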
I think it's easily fixed outside the fast path, like this. If you
agree, consider it to have my s-o-b:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f052bbca41d..7ccaa6e023f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3834,6 +3834,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
alloc_mask = memalloc_noio_flags(gfp_mask);
ac.spread_dirty_pages = false;
+ /*
+ * Restore the original nodemask, which might have been replaced with
+ * &cpuset_current_mems_allowed to optimize the fast-path attempt.
+ */
+ ac.nodemask = nodemask;
page = __alloc_pages_slowpath(alloc_mask, order, &ac);
no_zone:
* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
From: Mel Gorman @ 2016-05-03 8:50 UTC
To: Vlastimil Babka; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On Mon, May 02, 2016 at 10:54:23AM +0200, Vlastimil Babka wrote:
> On 04/27/2016 04:57 PM, Mel Gorman wrote:
> > as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
> > is fine. The nodemask pointer is the same between cpuset retries. If the
> > zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
> > change then there is a second harmless pass through the page allocator.
>
> True. But I just realized (while working on direct compaction priorities)
> that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
> According to the comment it should be ignoring mempolicies, but it still
> honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
> mempolicy one.
>
> I think it's easily fixed outside the fast path, like this. If you
> agree, consider it to have my s-o-b:
>
While I see your point, I don't necessarily see why this fixes it as the
original nodemask may also be a restricted set that ALLOC_NO_WATERMARKS
should ignore. How about this?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 79100583b9de..dbb08d102d41 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3432,9 +3432,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
/*
* Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
* the allocation is high priority and these type of
- * allocations are system rather than user orientated
+ * allocations are system rather than user orientated. If a
+ * cpuset retry occurs then these values persist across the
+ * retry but that's ok for a context ignoring watermarks.
*/
ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+ ac->high_zoneidx = MAX_NR_ZONES - 1;
+ ac->nodemask = NULL;
page = get_page_from_freelist(gfp_mask, order,
ALLOC_NO_WATERMARKS, ac);
if (page)
--
Mel Gorman
SUSE Labs
* Re: [PATCH 0/6] Optimise page alloc/free fast paths followup v2
From: Vlastimil Babka @ 2016-05-03 14:33 UTC
To: Mel Gorman; +Cc: Andrew Morton, Jesper Dangaard Brouer, Linux-MM, LKML
On 05/03/2016 10:50 AM, Mel Gorman wrote:
> On Mon, May 02, 2016 at 10:54:23AM +0200, Vlastimil Babka wrote:
>> On 04/27/2016 04:57 PM, Mel Gorman wrote:
>>> as the patch "mm, page_alloc: inline the fast path of the zonelist iterator"
>>> is fine. The nodemask pointer is the same between cpuset retries. If the
>>> zonelist changes due to ALLOC_NO_WATERMARKS *and* it races with a cpuset
>>> change then there is a second harmless pass through the page allocator.
>>
>> True. But I just realized (while working on direct compaction priorities)
>> that there's another subtle issue with the ALLOC_NO_WATERMARKS part.
>> According to the comment it should be ignoring mempolicies, but it still
>> honours ac.nodemask, and your patch is replacing NULL ac.nodemask with the
>> mempolicy one.
>>
>> I think it's easily fixed outside the fast path, like this. If you
>> agree, consider it to have my s-o-b:
>>
>
> While I see your point, I don't necessarily see why this fixes it as the
> original nodemask may also be a restricted set that ALLOC_NO_WATERMARKS
> should ignore.
I wasn't so sure about that, so I defensively went with just restoring
the pre-patch behavior. I expect that it's safe to ignore mempolicies
imposed on the process via e.g. taskset/numactl, for a critical kernel
allocation that happens to be done within the process context. But
ignoring a nodemask that's imposed by the kernel allocation itself might
be breaking some expectations. I guess vma policies can also result in a
restricted nodemask, but those should be for userspace allocations and
never ALLOC_NO_WATERMARKS?
> How about this?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79100583b9de..dbb08d102d41 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3432,9 +3432,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> /*
> * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
> * the allocation is high priority and these type of
> - * allocations are system rather than user orientated
> + * allocations are system rather than user orientated. If a
> + * cpuset retry occurs then these values persist across the
> + * retry but that's ok for a context ignoring watermarks.
> */
> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->high_zoneidx = MAX_NR_ZONES - 1;
Wouldn't altering high_zoneidx like this result in e.g. DMA allocation
ending up in a NORMAL zone? Also high_zoneidx doesn't seem to get
restricted by mempolicy/nodemask anyway. Maybe you wanted to re-set
preferred_zoneref? That would make sense, yeah.
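(For context, the zone selection in get_page_from_freelist() filters
candidates only by high_zoneidx and the nodemask, roughly:

	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
					ac->high_zoneidx, ac->nodemask) {
		...
	}

so raising high_zoneidx to MAX_NR_ZONES - 1 would let a __GFP_DMA request
be satisfied from any zone.)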
> + ac->nodemask = NULL;
> page = get_page_from_freelist(gfp_mask, order,
> ALLOC_NO_WATERMARKS, ac);
> if (page)
>