From: Vlastimil Babka <vbabka@suse.cz>
To: Dan Williams <dan.j.williams@intel.com>, akpm@linux-foundation.org
Cc: Rik van Riel <riel@redhat.com>,
linux-nvdimm@lists.01.org,
Dave Hansen <dave.hansen@linux.intel.com>,
linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
Jerome Glisse <j.glisse@gmail.com>,
Sudip Mukherjee <sudipm.mukherjee@gmail.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Laura Abbott <labbott@fedoraproject.org>
Subject: Re: [RFC PATCH] mm: support CONFIG_ZONE_DEVICE + CONFIG_ZONE_DMA
Date: Tue, 26 Jan 2016 22:42:30 +0100 [thread overview]
Message-ID: <56A7E846.30607@suse.cz> (raw)
In-Reply-To: <20160126000639.358.89668.stgit@dwillia2-desk3.amr.corp.intel.com>
On 26.1.2016 1:06, Dan Williams wrote:
> It appears devices requiring ZONE_DMA are still prevalent (see link
> below). For this reason the proposal to require turning off ZONE_DMA to
> enable ZONE_DEVICE is untenable in the short term. We want a single
> kernel image to be able to support legacy devices as well as next
> generation persistent memory platforms.
>
> Towards this end, alias ZONE_DMA and ZONE_DEVICE to work around needing
> to maintain a unique zone number for ZONE_DEVICE. Record the geometry
> of ZONE_DMA at init (->init_spanned_pages) and use that information in
> is_zone_device_page() to differentiate pages allocated via
> devm_memremap_pages() vs true ZONE_DMA pages. Otherwise, use the
> simpler definition of is_zone_device_page() when ZONE_DMA is turned off.
>
> Note that this also teaches the memory hot remove path that the zone may
> not have sections for all pfn spans (->zone_dyn_start_pfn).
>
> A user visible implication of this change is potentially an unexpectedly
> high "spanned" value in /proc/zoneinfo for the DMA zone.
[+CC Joonsoo, Laura]
Sounds like quite a hack :( Would it be possible to extend the bits encoding
zone? Potentially, ZONE_CMA could be added one day...
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Jerome Glisse <j.glisse@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
> Fixes: 033fbae988fc ("mm: ZONE_DEVICE for "device memory"")
> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++--------------
> include/linux/mmzone.h | 24 ++++++++++++++++++++----
> mm/Kconfig | 1 -
> mm/memory_hotplug.c | 15 +++++++++++----
> mm/page_alloc.c | 9 ++++++---
> 5 files changed, 69 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f1cd22f2df1a..b4bccd3d3c41 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -664,12 +664,44 @@ static inline enum zone_type page_zonenum(const struct page *page)
> return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> }
>
> +#ifdef NODE_NOT_IN_PAGE_FLAGS
> +extern int page_to_nid(const struct page *page);
> +#else
> +static inline int page_to_nid(const struct page *page)
> +{
> + return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
> +}
> +#endif
> +
> +static inline struct zone *page_zone(const struct page *page)
> +{
> + return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
> +}
> +
> #ifdef CONFIG_ZONE_DEVICE
> void get_zone_device_page(struct page *page);
> void put_zone_device_page(struct page *page);
> static inline bool is_zone_device_page(const struct page *page)
> {
> +#ifndef CONFIG_ZONE_DMA
> return page_zonenum(page) == ZONE_DEVICE;
> +#else /* ZONE_DEVICE == ZONE_DMA */
> + struct zone *zone;
> +
> + if (page_zonenum(page) != ZONE_DEVICE)
> + return false;
> +
> + /*
> + * If ZONE_DEVICE is aliased with ZONE_DMA we need to check
> + * whether this was a dynamically allocated page from
> + * devm_memremap_pages() by checking against the size of
> + * ZONE_DMA at boot.
> + */
> + zone = page_zone(page);
> + if (page_to_pfn(page) <= zone_end_pfn_boot(zone))
> + return false;
> + return true;
> +#endif
> }
> #else
> static inline void get_zone_device_page(struct page *page)
> @@ -735,15 +767,6 @@ static inline int zone_to_nid(struct zone *zone)
> #endif
> }
>
> -#ifdef NODE_NOT_IN_PAGE_FLAGS
> -extern int page_to_nid(const struct page *page);
> -#else
> -static inline int page_to_nid(const struct page *page)
> -{
> - return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
> -}
> -#endif
> -
> #ifdef CONFIG_NUMA_BALANCING
> static inline int cpu_pid_to_cpupid(int cpu, int pid)
> {
> @@ -857,11 +880,6 @@ static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
> }
> #endif /* CONFIG_NUMA_BALANCING */
>
> -static inline struct zone *page_zone(const struct page *page)
> -{
> - return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
> -}
> -
> #ifdef SECTION_IN_PAGE_FLAGS
> static inline void set_page_section(struct page *page, unsigned long section)
> {
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 33bb1b19273e..a0ef09b7f893 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -288,6 +288,13 @@ enum zone_type {
> */
> ZONE_DMA,
> #endif
> +#ifdef CONFIG_ZONE_DEVICE
> +#ifndef CONFIG_ZONE_DMA
> + ZONE_DEVICE,
> +#else
> + ZONE_DEVICE = ZONE_DMA,
> +#endif
> +#endif
> #ifdef CONFIG_ZONE_DMA32
> /*
> * x86_64 needs two ZONE_DMAs because it supports devices that are
> @@ -314,11 +321,7 @@ enum zone_type {
> ZONE_HIGHMEM,
> #endif
> ZONE_MOVABLE,
> -#ifdef CONFIG_ZONE_DEVICE
> - ZONE_DEVICE,
> -#endif
> __MAX_NR_ZONES
> -
> };
>
> #ifndef __GENERATING_BOUNDS_H
> @@ -379,12 +382,19 @@ struct zone {
>
> /* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
> unsigned long zone_start_pfn;
> + /* first dynamically added pfn of the zone */
> + unsigned long zone_dyn_start_pfn;
>
> /*
> * spanned_pages is the total pages spanned by the zone, including
> * holes, which is calculated as:
> * spanned_pages = zone_end_pfn - zone_start_pfn;
> *
> + * init_spanned_pages is the boot/init time total pages spanned
> + * by the zone for differentiating statically assigned vs
> + * dynamically hot added memory to a zone.
> + * init_spanned_pages = init_zone_end_pfn - zone_start_pfn;
> + *
> * present_pages is physical pages existing within the zone, which
> * is calculated as:
> * present_pages = spanned_pages - absent_pages(pages in holes);
> @@ -423,6 +433,7 @@ struct zone {
> */
> unsigned long managed_pages;
> unsigned long spanned_pages;
> + unsigned long init_spanned_pages;
> unsigned long present_pages;
>
> const char *name;
> @@ -546,6 +557,11 @@ static inline unsigned long zone_end_pfn(const struct zone *zone)
> return zone->zone_start_pfn + zone->spanned_pages;
> }
>
> +static inline unsigned long zone_end_pfn_boot(const struct zone *zone)
> +{
> + return zone->zone_start_pfn + zone->init_spanned_pages;
> +}
> +
> static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> {
> return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 97a4e06b15c0..08a92a9c8fbd 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -652,7 +652,6 @@ config IDLE_PAGE_TRACKING
> config ZONE_DEVICE
> bool "Device memory (pmem, etc...) hotplug support" if EXPERT
> default !ZONE_DMA
> - depends on !ZONE_DMA
> depends on MEMORY_HOTPLUG
> depends on MEMORY_HOTREMOVE
> depends on X86_64 #arch_add_memory() comprehends device memory
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4af58a3a8ffa..c3f0ff45bd47 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -300,6 +300,8 @@ static void __meminit grow_zone_span(struct zone *zone, unsigned long start_pfn,
>
> zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
> zone->zone_start_pfn;
> + if (!zone->zone_dyn_start_pfn || start_pfn < zone->zone_dyn_start_pfn)
> + zone->zone_dyn_start_pfn = start_pfn;
>
> zone_span_writeunlock(zone);
> }
> @@ -601,8 +603,9 @@ static int find_biggest_section_pfn(int nid, struct zone *zone,
> static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> unsigned long end_pfn)
> {
> - unsigned long zone_start_pfn = zone->zone_start_pfn;
> + unsigned long zone_start_pfn = zone->zone_dyn_start_pfn;
> unsigned long z = zone_end_pfn(zone); /* zone_end_pfn namespace clash */
> + bool dyn_zone = zone->zone_start_pfn == zone_start_pfn;
> unsigned long zone_end_pfn = z;
> unsigned long pfn;
> struct mem_section *ms;
> @@ -619,7 +622,9 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> pfn = find_smallest_section_pfn(nid, zone, end_pfn,
> zone_end_pfn);
> if (pfn) {
> - zone->zone_start_pfn = pfn;
> + if (dyn_zone)
> + zone->zone_start_pfn = pfn;
> + zone->zone_dyn_start_pfn = pfn;
> zone->spanned_pages = zone_end_pfn - pfn;
> }
> } else if (zone_end_pfn == end_pfn) {
> @@ -661,8 +666,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> }
>
> /* The zone has no valid section */
> - zone->zone_start_pfn = 0;
> - zone->spanned_pages = 0;
> + if (dyn_zone)
> + zone->zone_start_pfn = 0;
> + zone->zone_dyn_start_pfn = 0;
> + zone->spanned_pages = zone->init_spanned_pages;
> zone_span_writeunlock(zone);
> }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 63358d9f9aa9..2d8b1d602ff3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -209,6 +209,10 @@ EXPORT_SYMBOL(totalram_pages);
> static char * const zone_names[MAX_NR_ZONES] = {
> #ifdef CONFIG_ZONE_DMA
> "DMA",
> +#else
> +#ifdef CONFIG_ZONE_DEVICE
> + "Device",
> +#endif
> #endif
> #ifdef CONFIG_ZONE_DMA32
> "DMA32",
> @@ -218,9 +222,6 @@ static char * const zone_names[MAX_NR_ZONES] = {
> "HighMem",
> #endif
> "Movable",
> -#ifdef CONFIG_ZONE_DEVICE
> - "Device",
> -#endif
> };
>
> compound_page_dtor * const compound_page_dtors[] = {
> @@ -5082,6 +5083,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
> node_start_pfn, node_end_pfn,
> zholes_size);
> zone->spanned_pages = size;
> + zone->init_spanned_pages = size;
> + zone->zone_dyn_start_pfn = 0;
> zone->present_pages = real_size;
>
> totalpages += size;
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-01-26 21:42 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-26 0:06 [RFC PATCH] mm: support CONFIG_ZONE_DEVICE + CONFIG_ZONE_DMA Dan Williams
2016-01-26 6:00 ` Sudip Mukherjee
2016-01-26 17:07 ` Dan Williams
2016-01-26 19:10 ` Mark
2016-01-26 21:42 ` Vlastimil Babka [this message]
2016-01-26 21:48 ` Dan Williams
2016-01-26 22:11 ` Andrew Morton
2016-01-26 22:33 ` Dan Williams
2016-01-26 22:51 ` Andrew Morton
2016-01-26 23:11 ` Dan Williams
2016-01-27 1:18 ` Joonsoo Kim
2016-01-27 1:37 ` Dan Williams
2016-01-27 2:15 ` Joonsoo Kim
2016-01-27 3:23 ` Dan Williams
2016-01-27 3:52 ` Joonsoo Kim
2016-01-27 4:26 ` Dan Williams
2016-01-27 5:52 ` Joonsoo Kim
2016-01-27 7:46 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56A7E846.30607@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=hch@lst.de \
--cc=hpa@zytor.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=j.glisse@gmail.com \
--cc=labbott@fedoraproject.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@lists.01.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=riel@redhat.com \
--cc=sudipm.mukherjee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).