From: Andrew Morton <akpm@linux-foundation.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Rik van Riel <riel@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>, Mark <markk@clara.co.uk>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: Re: [RFC PATCH] mm: CONFIG_NR_ZONES_EXTENDED
Date: Mon, 1 Feb 2016 21:42:13 -0800
Message-ID: <20160201214213.2bdf9b4e.akpm@linux-foundation.org>
In-Reply-To: <20160128061914.32541.97351.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed, 27 Jan 2016 22:19:14 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new
> mm zones that are bumping up against the current maximum limit of 4
> zones, i.e. 2 bits in page->flags.  When adding a zone this equation
> still needs to be satisfied:
> 
>     SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
> 	  <= BITS_PER_LONG - NR_PAGEFLAGS
> 
> ZONE_DEVICE currently tries to satisfy this equation by requiring that
> ZONE_DMA be disabled, but this is untenable given generic kernels want
> to support ZONE_DEVICE and ZONE_DMA simultaneously.  ZONE_CMA would like
> to increase the amount of memory covered per section, but that limits
> the minimum granularity at which consecutive memory ranges can be added
> via devm_memremap_pages().
> 
> The trade-off of what is acceptable to sacrifice depends heavily on the
> platform.  For example, ZONE_CMA is targeted for 32-bit platforms where
> page->flags is constrained, but those platforms likely do not care about
> the minimum granularity of memory hotplug.  A big-iron machine with 1024
> NUMA nodes can likely sacrifice ZONE_DMA where a general-purpose
> distribution kernel cannot.
> 
> CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected
> when the number of configured zones exceeds 4.  It documents the
> configuration symbols and definitions that get modified when ZONES_WIDTH
> is greater than 2.
> 
> For now, it steals a bit from NODES_SHIFT.  Later on it can be used to
> document the definitions that get modified when a 32-bit configuration
> wants more zone bits.

So if you want ZONE_DMA, you're limited to 512 NUMA nodes?

That seems reasonable.

> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1409,8 +1409,10 @@ config NUMA_EMU
>  
>  config NODES_SHIFT
>  	int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
> -	range 1 10
> -	default "10" if MAXSMP
> +	range 1 10 if !NR_ZONES_EXTENDED
> +	range 1 9 if NR_ZONES_EXTENDED
> +	default "10" if MAXSMP && !NR_ZONES_EXTENDED
> +	default "9" if MAXSMP && NR_ZONES_EXTENDED
>  	default "6" if X86_64
>  	default "3"
>  	depends on NEED_MULTIPLE_NODES
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 28ad5f6494b0..5979c2c80140 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -329,22 +329,29 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
>   *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
>   *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
>   *
> - * ZONES_SHIFT must be <= 2 on 32 bit platforms.
> + * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
>   */
>  
> -#if 16 * ZONES_SHIFT > BITS_PER_LONG
> -#error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
> +#if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
> +/* ZONE_DEVICE is not a valid GFP zone specifier */
> +#define GFP_ZONES_SHIFT 2
> +#else
> +#define GFP_ZONES_SHIFT ZONES_SHIFT
> +#endif
> +
> +#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
> +#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
>  #endif
>  
>  #define GFP_ZONE_TABLE ( \
> -	(ZONE_NORMAL << 0 * ZONES_SHIFT)				      \
> -	| (OPT_ZONE_DMA << ___GFP_DMA * ZONES_SHIFT)			      \
> -	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * ZONES_SHIFT)		      \
> -	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT)		      \
> -	| (ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT)			      \
> -	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT)	      \
> -	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * ZONES_SHIFT)   \
> -	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * ZONES_SHIFT)   \
> +	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)					\
> +	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)			\
> +	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)		\
> +	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)		      	\
> +	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)			\
> +	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)	\
> +	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)	\
> +	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)	\
>  )

Geeze.  Congrats on decrypting this stuff.  I hope.  Do you think it's
possible to comprehensibly document it all for the next poor soul who
ventures into it?
