From: Oscar Salvador <osalvador@suse.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>,
linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, akpm@linux-foundation.org,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v7 09/12] mm/sparsemem: Support sub-section hotplug
Date: Fri, 3 May 2019 14:56:34 +0200
Message-ID: <20190503125634.GH15740@linux>
In-Reply-To: <155677657023.2336373.4452495266651002382.stgit@dwillia2-desk3.amr.corp.intel.com>
On Wed, May 01, 2019 at 10:56:10PM -0700, Dan Williams wrote:
> The libnvdimm sub-system has suffered a series of hacks and broken
> workarounds for the memory-hotplug implementation's awkward
> section-aligned (128MB) granularity. For example the following backtrace
> is emitted when attempting arch_add_memory() with physical address
> ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM)
> within a given section:
>
> WARNING: CPU: 0 PID: 558 at kernel/memremap.c:300 devm_memremap_pages+0x3b5/0x4c0
> devm_memremap_pages attempted on mixed region [mem 0x200000000-0x2fbffffff flags 0x200]
> [..]
> Call Trace:
> dump_stack+0x86/0xc3
> __warn+0xcb/0xf0
> warn_slowpath_fmt+0x5f/0x80
> devm_memremap_pages+0x3b5/0x4c0
> __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
> pmem_attach_disk+0x19a/0x440 [nd_pmem]
>
> Recently it was discovered that the problem goes beyond RAM vs PMEM
> collisions as some platforms produce PMEM vs PMEM collisions within a
> given section. The libnvdimm workaround for that case revealed that the
> libnvdimm section-alignment-padding implementation has been broken for a
> long while. A fix for that long-standing breakage introduces as many
> problems as it solves as it would require a backward-incompatible change
> to the namespace metadata interpretation. Instead of that dubious route
> [1], address the root problem in the memory-hotplug implementation.
>
> [1]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> mm/sparse.c | 223 ++++++++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 150 insertions(+), 73 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 198371e5fc87..419a3620af6e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -83,8 +83,15 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
> unsigned long root = SECTION_NR_TO_ROOT(section_nr);
> struct mem_section *section;
>
> + /*
> + * An existing section is possible in the sub-section hotplug
> + * case. First hot-add instantiates, follow-on hot-add reuses
> + * the existing section.
> + *
> + * The mem_hotplug_lock resolves the apparent race below.
> + */
> if (mem_section[root])
> - return -EEXIST;
> + return 0;
Just a side note: the caller does not bail out on -EEXIST, so it should be
fine to stick with returning it.
If you do switch to returning 0, I would then also clean up
sparse_add_section():
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -901,13 +901,12 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	int ret;
 	ret = sparse_index_init(section_nr, nid);
-	if (ret < 0 && ret != -EEXIST)
+	if (ret < 0)
 		return ret;
 	memmap = section_activate(nid, start_pfn, nr_pages, altmap);
 	if (IS_ERR(memmap))
 		return PTR_ERR(memmap);
-	ret = 0;
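To make the two options concrete, here is a toy userspace model of both return conventions. It is only a sketch and not kernel code: the single flag standing in for mem_section[root] and all four function names are made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: one flag stands in for mem_section[root]. */
static bool root_present;

/* Old convention: an already-present root is reported as -EEXIST. */
static int index_init_eexist(void)
{
	if (root_present)
		return -EEXIST;
	root_present = true;
	return 0;
}

/* Convention from the patch: an already-present root counts as success. */
static int index_init_reuse(void)
{
	if (root_present)
		return 0;
	root_present = true;
	return 0;
}

/* Caller for the old convention: it has to filter -EEXIST by hand. */
static int add_section_filtering(void)
{
	int ret = index_init_eexist();

	if (ret < 0 && ret != -EEXIST)
		return ret;
	return 0;
}

/* Caller for the new convention: any negative value is a real error. */
static int add_section_plain(void)
{
	int ret = index_init_reuse();

	if (ret < 0)
		return ret;
	return 0;
}
```

Either pairing gives the same result for the caller; the point is only that the -EEXIST special case should live in one place, not in both.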
> +
> + if (!mask)
> + rc = -EINVAL;
> + else if (mask & ms->usage->map_active)
else if (ms->usage->map_active) should be enough?
> + rc = -EEXIST;
> + else
> + ms->usage->map_active |= mask;
> +
> + if (rc) {
> + if (usage)
> + ms->usage = NULL;
> + kfree(usage);
> + return ERR_PTR(rc);
> + }
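For reference while discussing the condition above, the difference between testing "mask & map_active" and testing "map_active" alone can be sketched in userspace as follows. This is a toy model under stated assumptions, not the patch's code: the 64-bit map width (one bit per sub-section) and the helper names are chosen for illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption for illustration: 64 sub-sections per section, one bit each. */
typedef uint64_t subsection_map_t;

/* Build a mask covering sub-sections [first, first + count). */
static subsection_map_t subsection_mask(unsigned int first, unsigned int count)
{
	subsection_map_t bits;

	if (count == 0 || first >= 64 || count > 64 - first)
		return 0;
	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	bits = (count == 64) ? ~(subsection_map_t)0
			     : (((subsection_map_t)1 << count) - 1);
	return bits << first;
}

/* Condition as quoted above: reject only ranges that overlap active ones. */
static int activate_overlap_check(subsection_map_t *map_active,
				  subsection_map_t mask)
{
	if (!mask)
		return -EINVAL;
	if (mask & *map_active)
		return -EEXIST;
	*map_active |= mask;
	return 0;
}

/* Alternative condition: reject as soon as any sub-section is active. */
static int activate_any_check(subsection_map_t *map_active,
			      subsection_map_t mask)
{
	if (!mask)
		return -EINVAL;
	if (*map_active)
		return -EEXIST;
	*map_active |= mask;
	return 0;
}
```

In this model, adding sub-sections 0-3 and then 4-7 succeeds with the overlap check, while the second add returns -EEXIST with the any-active check. Whether that second add is supposed to be allowed is exactly the question for the real code.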
> +
> + /*
> + * The early init code does not consider partially populated
> + * initial sections, it simply assumes that memory will never be
> + * referenced. If we hot-add memory into such a section then we
> + * do not need to populate the memmap and can simply reuse what
> + * is already there.
> + */
This puzzles me a bit.
I think we cannot have partially populated early sections, can we?
And how do we even come to hot-add memory into those?
Could you please elaborate a bit here?
> + ms = __pfn_to_section(start_pfn);
> section_mark_present(ms);
> - sparse_init_one_section(ms, section_nr, memmap, usage);
> + sparse_init_one_section(ms, section_nr, memmap, ms->usage);
>
> -out:
> - if (ret < 0) {
> - kfree(usage);
> - depopulate_section_memmap(start_pfn, PAGES_PER_SECTION, altmap);
> - }
> + if (ret < 0)
> + section_deactivate(start_pfn, nr_pages, nid, altmap);
Uhm, if my eyes do not trick me, ret is only used for the return value from
sparse_index_init(), so this is not needed. Can we get rid of it?
Unfortunately I am running out of time, but I plan to keep reviewing this patch
in the next few days.
--
Oscar Salvador
SUSE L3