From: Oscar Salvador <osalvador@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v7 03/12] mm/sparsemem: Add helpers track active portions of a section at boot
Date: Thu, 2 May 2019 09:48:08 +0200
Message-ID: <20190502074803.GA3495@linux>
In-Reply-To: <155677653785.2336373.11131100812252340469.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed, May 01, 2019 at 10:55:37PM -0700, Dan Williams wrote:
> Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
> section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) /
> map_active bitmask length (64)). If it turns out that 2MB is too large
> of an active tracking granularity it is trivial to increase the size of
> the map_active bitmap.
> 
> The implications of a partially populated section is that pfn_valid()
> needs to go beyond a valid_section() check and read the sub-section
> active ranges from the bitmask.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Tested-by: Jane Chu <jane.chu@oracle.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Unfortunately, I did not hear back on the comments/questions I raised about
this patch in the previous version.

> ---
>  include/linux/mmzone.h |   29 ++++++++++++++++++++++++++++-
>  mm/page_alloc.c        |    4 +++-
>  mm/sparse.c            |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6726fc175b51..cffde898e345 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1175,6 +1175,8 @@ struct mem_section_usage {
>  	unsigned long pageblock_flags[0];
>  };
>  
> +void section_active_init(unsigned long pfn, unsigned long nr_pages);
> +
>  struct page;
>  struct page_ext;
>  struct mem_section {
> @@ -1312,12 +1314,36 @@ static inline struct mem_section *__pfn_to_section(unsigned long pfn)
>  
>  extern int __highest_present_section_nr;
>  
> +static inline int section_active_index(phys_addr_t phys)
> +{
> +	return (phys & ~(PA_SECTION_MASK)) / SECTION_ACTIVE_SIZE;
> +}
> +
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> +{
> +	int idx = section_active_index(PFN_PHYS(pfn));
> +
> +	return !!(ms->usage->map_active & (1UL << idx));

section_active_mask() also converts its arguments to an address/size pair.
Why do we need that conversion instead of working with pfns/pages directly?
It should be perfectly possible, unless I am missing something.

The only thing required would be to define these earlier in the series:

+#define PAGES_PER_SUB_SECTION (SECTION_ACTIVE_SIZE / PAGE_SIZE)
+#define PAGE_SUB_SECTION_MASK (~(PAGES_PER_SUB_SECTION-1))

and change section_active_index to:

static inline int section_active_index(unsigned long pfn)
{
	return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUB_SECTION;
}

This way we do not need to shift the values every time and can work with them
directly.
Maybe you did it this way for a reason I am missing.
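
To make the equivalence concrete, here is a minimal userspace sketch, not kernel
code. It assumes x86_64-style values (4K pages, 128M sections, 64 sub-sections
of 2M each), and the two function names are made up for the comparison: one
mirrors the address-based helper from the patch, the other is the pfn-based
variant suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (x86_64-like); none of these numbers come from the patch. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27	/* 128M sections */
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))
#define PA_SECTION_MASK		(~((1ULL << SECTION_SIZE_BITS) - 1))
#define SECTION_ACTIVE_SIZE	((1ULL << SECTION_SIZE_BITS) / 64)	/* 2M */
#define PAGES_PER_SUB_SECTION	(SECTION_ACTIVE_SIZE >> PAGE_SHIFT)	/* 512 */

/* Userspace stand-in for the kernel's PFN_PHYS(); uint64_t plays phys_addr_t. */
#define PFN_PHYS(pfn)		((uint64_t)(pfn) << PAGE_SHIFT)

/* Address-based index, shaped like the helper in the quoted patch. */
static int section_active_index_phys(uint64_t phys)
{
	return (int)((phys & ~PA_SECTION_MASK) / SECTION_ACTIVE_SIZE);
}

/* Hypothetical pfn-based index: same result without the shift to bytes. */
static int section_active_index_pfn(unsigned long pfn)
{
	return (int)((pfn & ~PAGE_SECTION_MASK) / PAGES_PER_SUB_SECTION);
}
```

With these assumed sizes, both helpers return the same index for every pfn,
because masking and dividing by a power of two commute with the page shift.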

> +static unsigned long section_active_mask(unsigned long pfn,
> +		unsigned long nr_pages)
> +{
> +	int idx_start, idx_size;
> +	phys_addr_t start, size;
> +
> +	if (!nr_pages)
> +		return 0;
> +
> +	start = PFN_PHYS(pfn);
> +	size = PFN_PHYS(min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK)));

It seems to me that we already picked the lower value back in
section_active_init(), so we should be fine dropping the min() here?

Also, why do we need to convert the values to address/size instead of
working with pfns/pages?
Unless I am missing something, it should be possible.
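
As a rough illustration of both points, here is a userspace sketch under the
same assumed x86_64-style sizes (4K pages, 128M sections, 2M sub-sections).
clamp_to_section() and section_active_mask_pfn() are hypothetical names, not
functions from the patch. The mask helper works purely on pfns and relies on
the caller having clamped the range to one section already. Note that it sets a
bit for every sub-section the range touches, which may not match the rounding
in the quoted code when pfn is not sub-section aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (x86_64-like); none of these numbers come from the patch. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define PAGES_PER_SECTION	(1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))
#define SUB_SECTIONS		64UL
#define PAGES_PER_SUB_SECTION	(PAGES_PER_SECTION / SUB_SECTIONS)	/* 512 */

/* The clamp done by the caller: pages from pfn up to the end of its section. */
static unsigned long clamp_to_section(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long left = PAGES_PER_SECTION - (pfn & ~PAGE_SECTION_MASK);

	return nr_pages < left ? nr_pages : left;
}

/* Hypothetical pfn-only mask; expects [pfn, pfn + nr_pages) within one section. */
static uint64_t section_active_mask_pfn(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long first, last, nbits;

	if (!nr_pages)
		return 0;

	first = (pfn & ~PAGE_SECTION_MASK) / PAGES_PER_SUB_SECTION;
	last = ((pfn + nr_pages - 1) & ~PAGE_SECTION_MASK) / PAGES_PER_SUB_SECTION;
	nbits = last - first + 1;

	/* Shifting a 64-bit value by 64 is undefined, so handle "all bits" apart. */
	if (nbits == SUB_SECTIONS)
		return ~0ULL;

	return ((1ULL << nbits) - 1) << first;
}
```

Clamping an already clamped count returns it unchanged, which is why a second
min() inside the mask helper should not change the result.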

> +	size = ALIGN(size, SECTION_ACTIVE_SIZE);
> +
> +	idx_start = section_active_index(start);
> +	idx_size = section_active_index(size);
> +
> +	if (idx_size == 0)
> +		return -1;

Maybe we would be better off converting that -1 into something like "FULL_SECTION",
or at least adding a comment there saying that "-1" means the section is fully
populated.
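
For illustration only, a userspace sketch of how that could read. It assumes
x86_64-style sizes (128M sections, 2M sub-sections); FULL_SECTION_MASK and
active_mask() are hypothetical names, and active_mask() just models the
address-based logic of the quoted function for a non-empty range that does not
exceed one section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (x86_64-like); none of these numbers come from the patch. */
#define SECTION_SIZE_BITS	27
#define SECTION_SIZE		(1ULL << SECTION_SIZE_BITS)	/* 128M */
#define PA_SECTION_MASK		(~(SECTION_SIZE - 1))
#define SECTION_ACTIVE_SIZE	(SECTION_SIZE / 64)		/* 2M */
#define ALIGN_UP(x, a)		(((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical named constant replacing the bare -1: all 64 bits set. */
#define FULL_SECTION_MASK	(~0ULL)

static int section_active_index(uint64_t phys)
{
	return (int)((phys & ~PA_SECTION_MASK) / SECTION_ACTIVE_SIZE);
}

/* Models the quoted logic; expects 0 < size <= SECTION_SIZE. */
static uint64_t active_mask(uint64_t start, uint64_t size)
{
	int idx_start, idx_size;

	size = ALIGN_UP(size, SECTION_ACTIVE_SIZE);
	idx_start = section_active_index(start);
	idx_size = section_active_index(size);

	/*
	 * A size of exactly one section wraps to index 0 after masking,
	 * so 0 here means "the whole section is populated".
	 */
	if (idx_size == 0)
		return FULL_SECTION_MASK;

	return ((1ULL << idx_size) - 1) << idx_start;
}
```

The named constant also documents why the special case exists: a full section
would otherwise need a 64-bit shift by 64, which is undefined in C.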

> +	return ((1UL << idx_size) - 1) << idx_start;
> +}
> +
> +void section_active_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +	int end_sec = pfn_to_section_nr(pfn + nr_pages - 1);
> +	int i, start_sec = pfn_to_section_nr(pfn);
> +
> +	if (!nr_pages)
> +		return;
> +
> +	for (i = start_sec; i <= end_sec; i++) {
> +		struct mem_section *ms;
> +		unsigned long mask;
> +		unsigned long pfns;
> +
> +		pfns = min(nr_pages, PAGES_PER_SECTION
> +				- (pfn & ~PAGE_SECTION_MASK));
> +		mask = section_active_mask(pfn, pfns);
> +
> +		ms = __nr_to_section(i);
> +		ms->usage->map_active |= mask;
> +		pr_debug("%s: sec: %d mask: %#018lx\n", __func__, i, ms->usage->map_active);
> +
> +		pfn += pfns;
> +		nr_pages -= pfns;
> +	}
> +}
> +
>  /* Record a memory area against a node. */
>  void __init memory_present(int nid, unsigned long start, unsigned long end)
>  {
> 

-- 
Oscar Salvador
SUSE L3