linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <js1304@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Aaron Lu <aaron.lu@intel.com>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Gu Zheng <guz.fnst@cn.fujitsu.com>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Toshi Kani <toshi.kani@hpe.com>
Subject: Re: [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
Date: Mon, 21 Dec 2015 11:46:32 +0100	[thread overview]
Message-ID: <5677D888.40704@suse.cz> (raw)
In-Reply-To: <1450678432-16593-2-git-send-email-iamjoonsoo.kim@lge.com>

On 12/21/2015 07:13 AM, Joonsoo Kim wrote:
> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
>
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
>
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
>
> Avg is improved by roughly 30% [2].
>
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
>
> v2
> o checking zone continuity after initialization
> o handle memory-hotplug case
>
> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[...]
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -505,6 +505,9 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   	unsigned long i;
>   	int err = 0;
>   	int start_sec, end_sec;
> +
> +	clear_zone_contiguous(zone);
> +
>   	/* during initialize mem_map, align hot-added range to section */
>   	start_sec = pfn_to_section_nr(phys_start_pfn);
>   	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -523,6 +526,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   	}
>   	vmemmap_populate_print_last();
>
> +	set_zone_contiguous(zone);
> +
>   	return err;
>   }
>   EXPORT_SYMBOL_GPL(__add_pages);
> @@ -770,6 +775,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   	resource_size_t start, size;
>   	int ret = 0;
>
> +	clear_zone_contiguous(zone);
> +
>   	/*
>   	 * We can only remove entire sections
>   	 */
> @@ -796,6 +803,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   		if (ret)
>   			break;
>   	}
> +
> +	set_zone_contiguous(zone);
> +
>   	return ret;

Hm I wonder how many __add_ or __remove_pages calls there might be per a 
major hotplug event (e.g. whole node). IIRC there may be many subranges 
that are onlined/offlined separately? Doing a full zone rescan on each 
sub-operation could be quite costly, no? You should have added 
mm/hotplug_memory.c people to CC to comment, as you did in the [RFC] 
theoretical race... mail. Doing that now.
If the hotplug people confirm it might be an issue, I guess one solution 
is to call set_zone_contiguous() lazily on-demand as you did in the v1 
(but not relying on cached pfn initialization to determine whether 
contiguous was already evaluated). Add another variable like 
zone->contiguous_evaluated and make hotplug code just set it to false.

>   }
>   EXPORT_SYMBOL_GPL(__remove_pages);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bac8842..4f5ad2b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1271,9 +1271,13 @@ free_range:
>   	pgdat_init_report_one_done();
>   	return 0;
>   }
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
>   void __init page_alloc_init_late(void)
>   {
> +	struct zone *zone;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>   	int nid;
>
>   	/* There will be num_node_state(N_MEMORY) threads */
> @@ -1287,8 +1291,87 @@ void __init page_alloc_init_late(void)
>
>   	/* Reinit limits that are based on free pages after the kernel is up */
>   	files_maxfiles_init();
> +#endif
> +
> +	for_each_populated_zone(zone)
> +		set_zone_contiguous(zone);
> +}
> +
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner. The scanners then need to
> + * use only pfn_valid_within() check for arches that allow holes within
> + * pageblocks.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> +				unsigned long end_pfn, struct zone *zone)
> +{
> +	struct page *start_page;
> +	struct page *end_page;
> +
> +	/* end_pfn is one past the range we are checking */
> +	end_pfn--;
> +
> +	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> +		return NULL;
> +
> +	start_page = pfn_to_page(start_pfn);
> +
> +	if (page_zone(start_page) != zone)
> +		return NULL;
> +
> +	end_page = pfn_to_page(end_pfn);
> +
> +	/* This gives a shorter code than deriving page_zone(end_page) */
> +	if (page_zone_id(start_page) != page_zone_id(end_page))
> +		return NULL;
> +
> +	return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> +	unsigned long block_start_pfn = zone->zone_start_pfn;
> +	unsigned long block_end_pfn;
> +	unsigned long pfn;
> +
> +	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> +	for (; block_start_pfn < zone_end_pfn(zone);
> +		block_start_pfn = block_end_pfn,
> +		block_end_pfn += pageblock_nr_pages) {
> +
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		if (!__pageblock_pfn_to_page(block_start_pfn,
> +					block_end_pfn, zone))
> +			return;
> +
> +		/* Check validity of pfn within pageblock */
> +		for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
> +			if (!pfn_valid_within(pfn))
> +				return;
> +		}

Hm this is suboptimal and misleading. The result of pfn_valid_within() 
doesn't affect whether we need to use __pageblock_pfn_to_page() or not, 
so zone->contiguous shouldn't depend on it.

On the other hand, if we knew that pfn_valid_within() is true 
everywhere, we wouldn't need to check it inside isolate_*pages_block().
So you could add another patch that adds another bool to struct zone and 
test for that (with #ifdef CONFIG_HOLES_IN_ZONE at appropriate places).

Thanks,
Vlastimil

> +	}
> +
> +	/* We confirm that there is no hole */
> +	zone->contiguous = true;
> +}
> +
> +void clear_zone_contiguous(struct zone *zone)
> +{
> +	zone->contiguous = false;
>   }
> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
>   #ifdef CONFIG_CMA
>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-12-21 10:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-21  6:13 [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn Joonsoo Kim
2015-12-21  6:13 ` [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
2015-12-21 10:46   ` Vlastimil Babka [this message]
2015-12-21 12:18     ` Joonsoo Kim
2015-12-21 12:38       ` Joonsoo Kim
2015-12-22 22:17   ` David Rientjes
2015-12-23  6:14     ` Vlastimil Babka
2015-12-23  6:57       ` Joonsoo Kim
2016-01-04 12:38         ` Vlastimil Babka
2016-01-08  2:52           ` Joonsoo Kim
2016-01-19  8:29   ` zhong jiang
2015-12-22 22:05 ` [PATCH 1/2] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn David Rientjes
  -- strict thread matches above, loose matches on Subject: below --
2015-12-14  5:02 Joonsoo Kim
2015-12-14  5:02 ` [PATCH 2/2] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
2015-12-14 10:29   ` Vlastimil Babka
2015-12-14 15:25     ` Joonsoo Kim
2015-12-15  1:06       ` Aaron Lu
2015-12-15  8:24         ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5677D888.40704@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=js1304@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=toshi.kani@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).