All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org>
To: Robert Richter <robert.richter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Cc: Ard Biesheuvel
	<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>,
	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>,
	David Daney <david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>,
	Hanjun Guo <hanjun.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	"linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
Date: Tue, 18 Oct 2016 11:18:36 +0100	[thread overview]
Message-ID: <20161018101715.GA15639@leverpostej> (raw)
In-Reply-To: <20161006161114.GH22012-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>

Hi Robert, Ard,

Sorry for the delay in getting to this; I've been travelling a lot
lately and in the meantime this managed to get buried in my inbox.

On Thu, Oct 06, 2016 at 06:11:14PM +0200, Robert Richter wrote:
> On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > On 6 October 2016 at 10:52, Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org> wrote:
> > > There is a memory setup problem on ThunderX systems with certain
> > > memory configurations. The symptom is
> > >
> > >  kernel BUG at mm/page_alloc.c:1848!
> > >
> > > This happens for some configs with 64k page size enabled. The bug
> > > triggers for page zones with some pages in the zone not assigned to
> > > this particular zone. In my case some pages that are marked as nomap
> > > were not reassigned to the new zone of node 1, so those are still
> > > assigned to node 0.
> > >
> > > The reason for the mis-configuration is a change in pfn_valid() which
> > > reports pages marked nomap as invalid:
> > >
> > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
> > 
> > These pages are owned by the firmware, which may map it with
> > attributes that conflict with the attributes we use for the linear
> > mapping. This means they should not be covered by the linear mapping.
> > 
> > > This causes pages marked as nomap being no long reassigned to the new
> > > zone in memmap_init_zone() by calling __init_single_pfn().

Why do we have pages for a nomap region? Given the region shouldn't be
in the linear mapping, and isn't suitable for general allocation, I
don't believe it makes sense to have a struct page for any part of it.

Am I missing some reason that we require a struct page?

e.g. is it just easier to allocate an unused struct page than to carve
it out?

> > This sounds like the root cause of your issue. Could we not fix that instead?
> 
> Yes, this is proposal b) from my last mail that would work too: I
> implemented an arm64 private early_pfn_valid() function that uses
> memblock_is_memory() to setup all pages of a zone. Though, I think
> this is the wrong way and thus I prefer this patch instead. I see
> serveral reasons for this:
> 
> Inconsistent use of struct *page, it is initialized but never used
> again.

As above, I don't believe we should have a struct page to initialise in
the first place.

> Other archs only do a basic range check in pfn_valid(), the default
> implementation just returns if the whole section is valid. As I
> understand the code, if the mem range is not aligned to the section,
> then there will be pfn's in the section that don't have physical mem
> attached. The page is then just initialized, it's not marked reserved
> nor the refcount is non-zero. It is then simply not used. This is how
> no-map pages should be handled too.
> 
> I think pfn_valid() is just a quick check if the pfn's struct *page
> can be used. There is a good description for this in include/linux/
> mmzone.h. So there can be memory holes that have a valid pfn.

I take it you mean the comment in the CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
ifdef (line 1266 in v4.9-rc1)?

I'm not sufficiently acquainted with the memmap code to follow; I'll
need to dig into that a bit further.

> If the no-map memory needs special handling, then additional checks
> need to be added to the particular code (as in ioremap.c). It's imo
> wrong to (mis-)use pfn_valid for that.
> 
> Variant b) involves generic mm code to fix it for arm64, this patch is
> an arm64 change only. This makes it harder to get a fix for it.
> (Though maybe only a problem of patch logistics.)
> 
> > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > memblock_is_memory().
> > 
> > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > covered by the linear mapping and the struct page array. Returning
> > reserved ranges that the kernel should not even touch only to please
> > the NUMA code seems like an inappropriate way to deal with this issue.
> 
> As said above, it is not marked as reserved, it is treated like
> non-existing memory.

I think Ard was using "reserved" in the more general sense than the
Linux-specific meaning. NOMAP is distinct from the Linux concept of
"reserved" memory, but is "reserved" in some sense.

Memory with NOMAP is meant to be treated as non-existent for the purpose
of the linear mapping (and thus for the purpose of struct page).

> This has been observed for non-numa kernels too and can happen for
> each zone that is only partly initialized.
> 
> I think the patch addresses your concerns. I can't see there the
> kernel uses memory marked as nomap in a wrong way.

I'll have to dig into this locally; I'm still not familiar enough with
this code to know what the right thing to do is.

Thanks,
Mark.

WARNING: multiple messages have this Message-ID (diff)
From: mark.rutland@arm.com (Mark Rutland)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
Date: Tue, 18 Oct 2016 11:18:36 +0100	[thread overview]
Message-ID: <20161018101715.GA15639@leverpostej> (raw)
In-Reply-To: <20161006161114.GH22012@rric.localdomain>

Hi Robert, Ard,

Sorry for the delay in getting to this; I've been travelling a lot
lately and in the meantime this managed to get buried in my inbox.

On Thu, Oct 06, 2016 at 06:11:14PM +0200, Robert Richter wrote:
> On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > On 6 October 2016 at 10:52, Robert Richter <rrichter@cavium.com> wrote:
> > > There is a memory setup problem on ThunderX systems with certain
> > > memory configurations. The symptom is
> > >
> > >  kernel BUG at mm/page_alloc.c:1848!
> > >
> > > This happens for some configs with 64k page size enabled. The bug
> > > triggers for page zones with some pages in the zone not assigned to
> > > this particular zone. In my case some pages that are marked as nomap
> > > were not reassigned to the new zone of node 1, so those are still
> > > assigned to node 0.
> > >
> > > The reason for the mis-configuration is a change in pfn_valid() which
> > > reports pages marked nomap as invalid:
> > >
> > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
> > 
> > These pages are owned by the firmware, which may map it with
> > attributes that conflict with the attributes we use for the linear
> > mapping. This means they should not be covered by the linear mapping.
> > 
> > > This causes pages marked as nomap being no long reassigned to the new
> > > zone in memmap_init_zone() by calling __init_single_pfn().

Why do we have pages for a nomap region? Given the region shouldn't be
in the linear mapping, and isn't suitable for general allocation, I
don't believe it makes sense to have a struct page for any part of it.

Am I missing some reason that we require a struct page?

e.g. is it just easier to allocate an unused struct page than to carve
it out?

> > This sounds like the root cause of your issue. Could we not fix that instead?
> 
> Yes, this is proposal b) from my last mail that would work too: I
> implemented an arm64 private early_pfn_valid() function that uses
> memblock_is_memory() to setup all pages of a zone. Though, I think
> this is the wrong way and thus I prefer this patch instead. I see
> serveral reasons for this:
> 
> Inconsistent use of struct *page, it is initialized but never used
> again.

As above, I don't believe we should have a struct page to initialise in
the first place.

> Other archs only do a basic range check in pfn_valid(), the default
> implementation just returns if the whole section is valid. As I
> understand the code, if the mem range is not aligned to the section,
> then there will be pfn's in the section that don't have physical mem
> attached. The page is then just initialized, it's not marked reserved
> nor the refcount is non-zero. It is then simply not used. This is how
> no-map pages should be handled too.
> 
> I think pfn_valid() is just a quick check if the pfn's struct *page
> can be used. There is a good description for this in include/linux/
> mmzone.h. So there can be memory holes that have a valid pfn.

I take it you mean the comment in the CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
ifdef (line 1266 in v4.9-rc1)?

I'm not sufficiently acquainted with the memmap code to follow; I'll
need to dig into that a bit further.

> If the no-map memory needs special handling, then additional checks
> need to be added to the particular code (as in ioremap.c). It's imo
> wrong to (mis-)use pfn_valid for that.
> 
> Variant b) involves generic mm code to fix it for arm64, this patch is
> an arm64 change only. This makes it harder to get a fix for it.
> (Though maybe only a problem of patch logistics.)
> 
> > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > memblock_is_memory().
> > 
> > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > covered by the linear mapping and the struct page array. Returning
> > reserved ranges that the kernel should not even touch only to please
> > the NUMA code seems like an inappropriate way to deal with this issue.
> 
> As said above, it is not marked as reserved, it is treated like
> non-existing memory.

I think Ard was using "reserved" in the more general sense than the
Linux-specific meaning. NOMAP is distinct from the Linux concept of
"reserved" memory, but is "reserved" in some sense.

Memory with NOMAP is meant to be treated as non-existent for the purpose
of the linear mapping (and thus for the purpose of struct page).

> This has been observed for non-numa kernels too and can happen for
> each zone that is only partly initialized.
> 
> I think the patch addresses your concerns. I can't see there the
> kernel uses memory marked as nomap in a wrong way.

I'll have to dig into this locally; I'm still not familiar enough with
this code to know what the right thing to do is.

Thanks,
Mark.

WARNING: multiple messages have this Message-ID (diff)
From: Mark Rutland <mark.rutland@arm.com>
To: Robert Richter <robert.richter@cavium.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	David Daney <david.daney@cavium.com>,
	Hanjun Guo <hanjun.guo@linaro.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
Date: Tue, 18 Oct 2016 11:18:36 +0100	[thread overview]
Message-ID: <20161018101715.GA15639@leverpostej> (raw)
In-Reply-To: <20161006161114.GH22012@rric.localdomain>

Hi Robert, Ard,

Sorry for the delay in getting to this; I've been travelling a lot
lately and in the meantime this managed to get buried in my inbox.

On Thu, Oct 06, 2016 at 06:11:14PM +0200, Robert Richter wrote:
> On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > On 6 October 2016 at 10:52, Robert Richter <rrichter@cavium.com> wrote:
> > > There is a memory setup problem on ThunderX systems with certain
> > > memory configurations. The symptom is
> > >
> > >  kernel BUG at mm/page_alloc.c:1848!
> > >
> > > This happens for some configs with 64k page size enabled. The bug
> > > triggers for page zones with some pages in the zone not assigned to
> > > this particular zone. In my case some pages that are marked as nomap
> > > were not reassigned to the new zone of node 1, so those are still
> > > assigned to node 0.
> > >
> > > The reason for the mis-configuration is a change in pfn_valid() which
> > > reports pages marked nomap as invalid:
> > >
> > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
> > 
> > These pages are owned by the firmware, which may map it with
> > attributes that conflict with the attributes we use for the linear
> > mapping. This means they should not be covered by the linear mapping.
> > 
> > > This causes pages marked as nomap being no long reassigned to the new
> > > zone in memmap_init_zone() by calling __init_single_pfn().

Why do we have pages for a nomap region? Given the region shouldn't be
in the linear mapping, and isn't suitable for general allocation, I
don't believe it makes sense to have a struct page for any part of it.

Am I missing some reason that we require a struct page?

e.g. is it just easier to allocate an unused struct page than to carve
it out?

> > This sounds like the root cause of your issue. Could we not fix that instead?
> 
> Yes, this is proposal b) from my last mail that would work too: I
> implemented an arm64 private early_pfn_valid() function that uses
> memblock_is_memory() to setup all pages of a zone. Though, I think
> this is the wrong way and thus I prefer this patch instead. I see
> serveral reasons for this:
> 
> Inconsistent use of struct *page, it is initialized but never used
> again.

As above, I don't believe we should have a struct page to initialise in
the first place.

> Other archs only do a basic range check in pfn_valid(), the default
> implementation just returns if the whole section is valid. As I
> understand the code, if the mem range is not aligned to the section,
> then there will be pfn's in the section that don't have physical mem
> attached. The page is then just initialized, it's not marked reserved
> nor the refcount is non-zero. It is then simply not used. This is how
> no-map pages should be handled too.
> 
> I think pfn_valid() is just a quick check if the pfn's struct *page
> can be used. There is a good description for this in include/linux/
> mmzone.h. So there can be memory holes that have a valid pfn.

I take it you mean the comment in the CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
ifdef (line 1266 in v4.9-rc1)?

I'm not sufficiently acquainted with the memmap code to follow; I'll
need to dig into that a bit further.

> If the no-map memory needs special handling, then additional checks
> need to be added to the particular code (as in ioremap.c). It's imo
> wrong to (mis-)use pfn_valid for that.
> 
> Variant b) involves generic mm code to fix it for arm64, this patch is
> an arm64 change only. This makes it harder to get a fix for it.
> (Though maybe only a problem of patch logistics.)
> 
> > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > memblock_is_memory().
> > 
> > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > covered by the linear mapping and the struct page array. Returning
> > reserved ranges that the kernel should not even touch only to please
> > the NUMA code seems like an inappropriate way to deal with this issue.
> 
> As said above, it is not marked as reserved, it is treated like
> non-existing memory.

I think Ard was using "reserved" in the more general sense than the
Linux-specific meaning. NOMAP is distinct from the Linux concept of
"reserved" memory, but is "reserved" in some sense.

Memory with NOMAP is meant to be treated as non-existent for the purpose
of the linear mapping (and thus for the purpose of struct page).

> This has been observed for non-numa kernels too and can happen for
> each zone that is only partly initialized.
> 
> I think the patch addresses your concerns. I can't see there the
> kernel uses memory marked as nomap in a wrong way.

I'll have to dig into this locally; I'm still not familiar enough with
this code to know what the right thing to do is.

Thanks,
Mark.

  parent reply	other threads:[~2016-10-18 10:18 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-06  9:52 [PATCH] arm64: mm: Fix memmap to be initialized for the entire section Robert Richter
2016-10-06  9:52 ` Robert Richter
2016-10-06  9:52 ` Robert Richter
     [not found] ` <1475747527-32387-1-git-send-email-rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
2016-10-06 10:00   ` Ard Biesheuvel
2016-10-06 10:00     ` Ard Biesheuvel
2016-10-06 10:00     ` Ard Biesheuvel
     [not found]     ` <CAKv+Gu-oYzSn_eeqaX3QVFeQjBGkaX-Wh7+oPcQFcXv7J6GOFQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-10-06 16:11       ` Robert Richter
2016-10-06 16:11         ` Robert Richter
2016-10-06 16:11         ` Robert Richter
     [not found]         ` <20161006161114.GH22012-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-10-17 18:58           ` Robert Richter
2016-10-17 18:58             ` Robert Richter
2016-10-17 18:58             ` Robert Richter
     [not found]             ` <20161017185801.GT25086-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-10-27 16:01               ` Will Deacon
2016-10-27 16:01                 ` Will Deacon
2016-10-27 16:01                 ` Will Deacon
     [not found]                 ` <20161027160136.GD24290-5wv7dgnIgG8@public.gmane.org>
2016-10-28  9:19                   ` Robert Richter
2016-10-28  9:19                     ` Robert Richter
2016-10-28  9:19                     ` Robert Richter
2016-11-07 21:05                     ` Will Deacon
2016-11-07 21:05                       ` Will Deacon
2016-11-09 19:51                       ` Robert Richter
2016-11-09 19:51                         ` Robert Richter
     [not found]                         ` <20161109195132.GZ22012-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-17 14:25                           ` Will Deacon
2016-11-17 14:25                             ` Will Deacon
2016-11-17 14:25                             ` Will Deacon
2016-11-17 15:18                             ` Robert Richter
2016-11-17 15:18                               ` Robert Richter
2016-11-17 15:18                               ` Robert Richter
     [not found]                               ` <20161117151805.GJ2151-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-20 17:07                                 ` Ard Biesheuvel
2016-11-20 17:07                                   ` Ard Biesheuvel
2016-11-20 17:07                                   ` Ard Biesheuvel
2016-11-23 21:15                                   ` Robert Richter
2016-11-23 21:15                                     ` Robert Richter
2016-11-23 21:15                                     ` Robert Richter
     [not found]                                     ` <20161123211538.GH10776-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-23 21:25                                       ` Ard Biesheuvel
2016-11-23 21:25                                         ` Ard Biesheuvel
2016-11-23 21:25                                         ` Ard Biesheuvel
2016-11-24 13:42                                         ` Robert Richter
2016-11-24 13:42                                           ` Robert Richter
2016-11-24 13:42                                           ` Robert Richter
     [not found]                                           ` <20161124134238.GI10776-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-24 13:44                                             ` Ard Biesheuvel
2016-11-24 13:44                                               ` Ard Biesheuvel
2016-11-24 13:44                                               ` Ard Biesheuvel
2016-11-24 13:51                                               ` Robert Richter
2016-11-24 13:51                                                 ` Robert Richter
     [not found]                                                 ` <20161124135151.GJ10776-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-24 13:58                                                   ` Ard Biesheuvel
2016-11-24 13:58                                                     ` Ard Biesheuvel
2016-11-24 13:58                                                     ` Ard Biesheuvel
2016-11-24 14:11                                                     ` Robert Richter
2016-11-24 14:11                                                       ` Robert Richter
2016-11-24 14:11                                                       ` Robert Richter
2016-11-24 14:23                                                       ` Ard Biesheuvel
2016-11-24 14:23                                                         ` Ard Biesheuvel
2016-11-24 14:23                                                         ` Ard Biesheuvel
2016-11-24 15:09                                                         ` Robert Richter
2016-11-24 15:09                                                           ` Robert Richter
     [not found]                                                           ` <20161124150918.GF2213-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-24 19:26                                                             ` Robert Richter
2016-11-24 19:26                                                               ` Robert Richter
2016-11-24 19:26                                                               ` Robert Richter
2016-11-24 19:42                                                               ` Ard Biesheuvel
2016-11-24 19:42                                                                 ` Ard Biesheuvel
     [not found]                                                                 ` <CAKv+Gu_C_17RtAiw2U0OOzVik3G7KkwUTo5eiGK-HDo-maQ-bA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-25 11:29                                                                   ` Robert Richter
2016-11-25 11:29                                                                     ` Robert Richter
2016-11-25 11:29                                                                     ` Robert Richter
     [not found]                                                                     ` <20161125112914.GI2213-vWBEXY7mpu582hYKe6nXyg@public.gmane.org>
2016-11-25 12:28                                                                       ` Ard Biesheuvel
2016-11-25 12:28                                                                         ` Ard Biesheuvel
2016-11-25 12:28                                                                         ` Ard Biesheuvel
2016-11-25 17:01                                                                         ` Ard Biesheuvel
2016-11-25 17:01                                                                           ` Ard Biesheuvel
2016-11-25 17:01                                                                           ` Ard Biesheuvel
2016-10-18 10:18           ` Mark Rutland [this message]
2016-10-18 10:18             ` Mark Rutland
2016-10-18 10:18             ` Mark Rutland
2016-10-18 15:02             ` Robert Richter
2016-10-18 15:02               ` Robert Richter
2016-10-18 15:02               ` Robert Richter
2016-11-01 16:55   ` Robert Richter
2016-11-01 16:55     ` Robert Richter
2016-11-01 16:55     ` Robert Richter
2016-10-10 15:33 ` David Daney
2016-10-10 15:33   ` David Daney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161018101715.GA15639@leverpostej \
    --to=mark.rutland-5wv7dgnigg8@public.gmane.org \
    --cc=ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    --cc=catalin.marinas-5wv7dgnIgG8@public.gmane.org \
    --cc=david.daney-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org \
    --cc=hanjun.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=robert.richter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org \
    --cc=will.deacon-5wv7dgnIgG8@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.