All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: akpm@linux-foundation.org, Michal Hocko <mhocko@suse.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Logan Gunthorpe <logang@deltatee.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 08/12] mm/sparsemem: Support sub-section hotplug
Date: Fri, 7 Jun 2019 10:33:58 +0200	[thread overview]
Message-ID: <20190607083351.GA5342@linux> (raw)
In-Reply-To: <155977192280.2443951.13941265207662462739.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed, Jun 05, 2019 at 02:58:42PM -0700, Dan Williams wrote:
> The libnvdimm sub-system has suffered a series of hacks and broken
> workarounds for the memory-hotplug implementation's awkward
> section-aligned (128MB) granularity. For example the following backtrace
> is emitted when attempting arch_add_memory() with physical address
> ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
> within a given section:
> 
>  WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
>  devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
>  [..]
>  Call Trace:
>    dump_stack+0x86/0xc3
>    __warn+0xcb/0xf0
>    warn_slowpath_fmt+0x5f/0x80
>    devm_memremap_pages+0x3b5/0x4c0
>    __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
>    pmem_attach_disk+0x19a/0x440 [nd_pmem]
> 
> Recently it was discovered that the problem goes beyond RAM vs PMEM
> collisions as some platform produce PMEM vs PMEM collisions within a
> given section. The libnvdimm workaround for that case revealed that the
> libnvdimm section-alignment-padding implementation has been broken for a
> long while. A fix for that long-standing breakage introduces as many
> problems as it solves as it would require a backward-incompatible change
> to the namespace metadata interpretation. Instead of that dubious route
> [1], address the root problem in the memory-hotplug implementation.
> 
> [1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  include/linux/memory_hotplug.h |    2 
>  mm/memory_hotplug.c            |    7 -
>  mm/page_alloc.c                |    2 
>  mm/sparse.c                    |  225 +++++++++++++++++++++++++++-------------
>  4 files changed, 155 insertions(+), 81 deletions(-)
> 
[...]
> @@ -325,6 +332,15 @@ static void __meminit sparse_init_one_section(struct mem_section *ms,
>  		unsigned long pnum, struct page *mem_map,
>  		struct mem_section_usage *usage)
>  {
> +	/*
> +	 * Given that SPARSEMEM_VMEMMAP=y supports sub-section hotplug,
> +	 * ->section_mem_map can not be guaranteed to point to a full
> +	 *  section's worth of memory.  The field is only valid / used
> +	 *  in the SPARSEMEM_VMEMMAP=n case.
> +	 */
> +	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> +		mem_map = NULL;

Will this be a problem when reading mem_map with the crash-tool?
I do not expect it to be, but I am not sure if crash internally tries
to read ms->section_mem_map and do some sort of translation.
And since ms->section_mem_map SECTION_HAS_MEM_MAP, it might be that it expects
a valid mem_map?

> +static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> +		struct vmem_altmap *altmap)
> +{
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	bool early_section = is_early_section(ms);
> +	struct page *memmap = NULL;
> +	unsigned long *subsection_map = ms->usage
> +		? &ms->usage->subsection_map[0] : NULL;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +	if (subsection_map)
> +		bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> +	if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> +				"section already deactivated (%#lx + %ld)\n",
> +				pfn, nr_pages))
> +		return;
> +
> +	/*
> +	 * There are 3 cases to handle across two configurations
> +	 * (SPARSEMEM_VMEMMAP={y,n}):
> +	 *
> +	 * 1/ deactivation of a partial hot-added section (only possible
> +	 * in the SPARSEMEM_VMEMMAP=y case).
> +	 *    a/ section was present at memory init
> +	 *    b/ section was hot-added post memory init
> +	 * 2/ deactivation of a complete hot-added section
> +	 * 3/ deactivation of a complete section from memory init
> +	 *
> +	 * For 1/, when subsection_map does not empty we will not be
> +	 * freeing the usage map, but still need to free the vmemmap
> +	 * range.
> +	 *
> +	 * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> +	 */
> +	bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +	if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> +		unsigned long section_nr = pfn_to_section_nr(pfn);
> +
> +		if (!early_section) {
> +			kfree(ms->usage);
> +			ms->usage = NULL;
> +		}
> +		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
> +	}
> +
> +	if (early_section && memmap)
> +		free_map_bootmem(memmap);
> +	else
> +		depopulate_section_memmap(pfn, nr_pages, altmap);
> +}
> +
> +static struct page * __meminit section_activate(int nid, unsigned long pfn,
> +		unsigned long nr_pages, struct vmem_altmap *altmap)
> +{
> +	DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> +	struct mem_section *ms = __pfn_to_section(pfn);
> +	struct mem_section_usage *usage = NULL;
> +	unsigned long *subsection_map;
> +	struct page *memmap;
> +	int rc = 0;
> +
> +	subsection_mask_set(map, pfn, nr_pages);
> +
> +	if (!ms->usage) {
> +		usage = kzalloc(mem_section_usage_size(), GFP_KERNEL);
> +		if (!usage)
> +			return ERR_PTR(-ENOMEM);
> +		ms->usage = usage;
> +	}
> +	subsection_map = &ms->usage->subsection_map[0];
> +
> +	if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> +		rc = -EINVAL;
> +	else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> +		rc = -EEXIST;
> +	else
> +		bitmap_or(subsection_map, map, subsection_map,
> +				SUBSECTIONS_PER_SECTION);
> +
> +	if (rc) {
> +		if (usage)
> +			ms->usage = NULL;
> +		kfree(usage);
> +		return ERR_PTR(rc);
> +	}

We should not be really looking at subsection_map stuff when running on
!CONFIG_SPARSE_VMEMMAP, right?
Would it make sense to hide the bitmap dance behind

if(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP)) ?

Sorry for nagging here

>  /**
> - * sparse_add_one_section - add a memory section
> + * sparse_add_section - add a memory section, or populate an existing one
>   * @nid: The node to add section on
>   * @start_pfn: start pfn of the memory range
> + * @nr_pages: number of pfns to add in the section
>   * @altmap: device page map
>   *
>   * This is only intended for hotplug.

Below this, the return codes are specified:

---
 * Return:
 * * 0          - On success.
 * * -EEXIST    - Section has been present.
 * * -ENOMEM    - Out of memory.
 */
---

We can get rid of -EEXIST since we do not return that anymore.

-- 
Oscar Salvador
SUSE L3

  reply	other threads:[~2019-06-07  8:33 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-05 21:57 [PATCH v9 00/12] mm: Sub-section memory hotplug support Dan Williams
2019-06-05 21:57 ` Dan Williams
2019-06-05 21:57 ` [PATCH v9 01/12] mm/sparsemem: Introduce struct mem_section_usage Dan Williams
2019-06-05 21:57   ` Dan Williams
2019-06-06 17:34   ` Oscar Salvador
2019-06-06 17:34     ` Oscar Salvador
2019-06-16 13:11   ` Wei Yang
2019-06-16 13:11     ` Wei Yang
2019-06-18 21:56     ` Dan Williams
2019-06-18 21:56       ` Dan Williams
2019-06-19  2:13       ` Wei Yang
2019-06-19  2:13         ` Wei Yang
2019-06-05 21:57 ` [PATCH v9 02/12] mm/sparsemem: Add helpers track active portions of a section at boot Dan Williams
2019-06-05 21:57   ` Dan Williams
2019-06-06 16:55   ` Oscar Salvador
2019-06-06 16:55     ` Oscar Salvador
2019-06-17 22:21   ` Wei Yang
2019-06-17 22:21     ` Wei Yang
2019-06-17 22:32     ` Dan Williams
2019-06-17 22:32       ` Dan Williams
2019-06-18  1:03       ` Wei Yang
2019-06-18  1:03         ` Wei Yang
2019-06-19  3:15       ` Dan Williams
2019-06-05 21:58 ` [PATCH v9 03/12] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-18  1:42   ` Wei Yang
2019-06-18  1:42     ` Wei Yang
2019-06-19  3:40     ` Dan Williams
2019-06-05 21:58 ` [PATCH v9 04/12] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap() Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-06 17:02   ` Oscar Salvador
2019-06-06 17:02     ` Oscar Salvador
2019-06-16  6:06   ` Aneesh Kumar K.V
2019-06-16  6:06     ` Aneesh Kumar K.V
2019-06-05 21:58 ` [PATCH v9 05/12] mm/hotplug: Kill is_dev_zone() usage in __remove_pages() Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-05 21:58 ` [PATCH v9 06/12] mm: Kill is_dev_zone() helper Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-18  3:35   ` Wei Yang
2019-06-18  3:35     ` Wei Yang
2019-06-05 21:58 ` [PATCH v9 07/12] mm/sparsemem: Prepare for sub-section ranges Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-06 17:21   ` Oscar Salvador
2019-06-06 17:21     ` Oscar Salvador
2019-06-06 18:16     ` Dan Williams
2019-06-06 18:16       ` Dan Williams
2019-06-14  8:39   ` David Hildenbrand
2019-06-14  8:39     ` David Hildenbrand
2019-06-05 21:58 ` [PATCH v9 08/12] mm/sparsemem: Support sub-section hotplug Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-07  8:33   ` Oscar Salvador [this message]
2019-06-07 15:38     ` Dan Williams
2019-06-07 15:38       ` Dan Williams
2019-06-07 21:41       ` Oscar Salvador
2019-06-05 21:58 ` [PATCH v9 09/12] mm: Document ZONE_DEVICE memory-model implications Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-05 21:58 ` [PATCH v9 10/12] mm/devm_memremap_pages: Enable sub-section remap Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-07  8:56   ` Oscar Salvador
2019-06-07  8:56     ` Oscar Salvador
2019-06-16  7:49   ` Aneesh Kumar K.V
2019-06-05 21:58 ` [PATCH v9 11/12] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields Dan Williams
2019-06-05 21:58   ` Dan Williams
2019-06-06 21:46   ` Andrew Morton
2019-06-06 21:46     ` Andrew Morton
2019-06-06 22:06     ` Dan Williams
2019-06-06 22:06       ` Dan Williams
2019-06-07 19:54       ` Andrew Morton
2019-06-07 20:09         ` Dan Williams
2019-06-12  9:41   ` Aneesh Kumar K.V
2019-06-05 21:59 ` [PATCH v9 12/12] libnvdimm/pfn: Stop padding pmem namespaces to section alignment Dan Williams
2019-06-05 21:59   ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190607083351.GA5342@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=logang@deltatee.com \
    --cc=mhocko@suse.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.