linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Rich Felker <dalias@libc.org>,
	"linux-ia64@vger.kernel.org" <linux-ia64@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Michal Hocko <mhocko@kernel.org>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Max Filippov <jcmvbkbc@gmail.com>, Guo Ren <guoren@kernel.org>,
	"linux-csky@vger.kernel.org" <linux-csky@vger.kernel.org>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	Greg Ungerer <gerg@linux-m68k.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>,
	Baoquan He <bhe@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
	linux-sh@vger.kerne
Subject: Re: [PATCH v2 17/20] mm: free_area_init: allow defining max_zone_pfn in descending order
Date: Tue, 5 May 2020 12:19:46 +0300	[thread overview]
Message-ID: <20200505091946.GG342687@linux.ibm.com> (raw)
In-Reply-To: <a0b20e15-fddb-aa9c-fd67-f1c8e735b4a4@synopsys.com>

Hi Vineet,

On Tue, May 05, 2020 at 06:23:37AM +0000, Vineet Gupta wrote:
> Hi Mike,
> 
> On 5/4/20 8:39 AM, Mike Rapoport wrote:
> > On Sun, May 03, 2020 at 11:43:00AM -0700, Guenter Roeck wrote:
> >> On Sun, May 03, 2020 at 10:41:38AM -0700, Guenter Roeck wrote:
> >>> Hi,
> >>>
> >>> On Wed, Apr 29, 2020 at 03:11:23PM +0300, Mike Rapoport wrote:
> >>>> From: Mike Rapoport <rppt@linux.ibm.com>
> >>>>
> >>>> Some architectures (e.g. ARC) have the ZONE_HIGHMEM zone below the
> >>>> ZONE_NORMAL. Allowing free_area_init() parse max_zone_pfn array even it is
> >>>> sorted in descending order allows using free_area_init() on such
> >>>> architectures.
> >>>>
> >>>> Add top -> down traversal of max_zone_pfn array in free_area_init() and use
> >>>> the latter in ARC node/zone initialization.
> >>>>
> >>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> >>>
> >>> This patch causes my microblazeel qemu boot test in linux-next to fail.
> >>> Reverting it fixes the problem.
> >>>
> >> The same problem is seen with s390 emulations.
> > 
> > Yeah, this patch breaks some others as well :(
> > 
> > My assumption that max_zone_pfn defines architectural limit for maximal
> > PFN that can belong to a zone was over-optimistic. Several arches
> > actually do that, but others do
> > 
> > 	max_zone_pfn[ZONE_DMA] = MAX_DMA_PFN;
> > 	max_zone_pfn[ZONE_NORMAL] = max_pfn;
> > 
> > where MAX_DMA_PFN is build-time constrain and max_pfn is run time limit
> > for the current system.
> > 
> > So, when max_pfn is lower than MAX_DMA_PFN, the free_init_area() will
> > consider max_zone_pfn as descending and will wrongly calculate zone
> > extents.
> > 
> > That said, instead of trying to create a generic way to special case
> > ARC, I suggest to simply use the below patch instead.
> 
> Even for ARC it will be a bit more complicated. Highmem on ARC can be setup in 2
> ways such that it is descending in one case, and ascending in other (w.r.t
> "normal" mem) :-(

Yeah, and this makes ARC really special :)

> First some basic info about an ARC MMU based system
> 
> ARC logical address space (various addresses embedded in binaries)
>  - translated (0 to 0x6FFF_FFFF)  - for userspace
>  - untranslated (0x8000_0000 to 0xFFFF_FFFF) - kernel
> 
> ARC Physical address space is typically from 0x8000_0000 to 0xF000_0000.
> Above translated space maps here via MMU. Untranslated is implicitly mapped (no
> MMU involved).
> 
> The physical address in turn maps to a Bus address / memory (done at the
> inter-connect/NoC). Typically Physical 0x8000_0000 map to DDR 0
> 
> Now,
> - HIGHMEM w/o PAE40 adds Physical address space 0 to 0x7FFF_FFFF.
> - HIGHMEM with PAE40 uses physical address space from 0x1_0000_0000 upwards.
> 
> But then you could also have a system which has both of above so the bimodal up/dn
> won't work.

From the code I've got the impression that it is either one of them. I.e
the physical memory is either at

0x8000_0000 - <end of DDR 0 bank>
0x0000_0000 - <end of DDR 1 bank>

or

0x0_8000_0000 - <end of DDR 0 bank>
0x1_0000_0000 - <end of DDR 1 bank>

Is this possible to have a system with three live ranges? Like

0x0_0000_0000 - <end of DDR 1 bank>
0x0_8000_0000 - <end of DDR 0 bank>
0x1_0000_0000 - <end of DDR 2 bank>

> While I appreciate the effort to reduce complexity, it seems the
> current way of
> setting things up allows for more flexibility in specifying the system memory map.
>
> PS: I haven't looked at your series too carefully, the mention of ARC caught my
> attention :-) I guess I need to read it more carefully to understand.
 
That would be cool :)

> > 
> > diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
> > index 41eb9be1653c..386959bac3d2 100644
> > --- a/arch/arc/mm/init.c
> > +++ b/arch/arc/mm/init.c
> > @@ -77,6 +77,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
> >  		base, TO_MB(size), !in_use ? "Not used":"");
> >  }
> >  
> > +bool arch_has_descending_max_zone_pfns(void)
> > +{
> > +	return true;
> > +}
> > +
> >  /*
> >   * First memory setup routine called from setup_arch()
> >   * 1. setup swapper's mm @init_mm
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b990e9734474..114f0e027144 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7307,6 +7307,15 @@ static void check_for_memory(pg_data_t *pgdat, int nid)
> >  	}
> >  }
> >  
> > +/*
> > + * Some architecturs, e.g. ARC may have ZONE_HIGHMEM below ZONE_NORMAL. For
> > + * such cases we allow max_zone_pfn sorted in the descending order
> > + */
> > +bool __weak arch_has_descending_max_zone_pfns(void)
> > +{
> > +	return false;
> > +}
> > +
> >  /**
> >   * free_area_init - Initialise all pg_data_t and zone data
> >   * @max_zone_pfn: an array of max PFNs for each zone
> > @@ -7324,7 +7333,7 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> >  {
> >  	unsigned long start_pfn, end_pfn;
> >  	int i, nid, zone;
> > -	bool descending = false;
> > +	bool descending;
> >  
> >  	/* Record where the zone boundaries are */
> >  	memset(arch_zone_lowest_possible_pfn, 0,
> > @@ -7333,14 +7342,7 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> >  				sizeof(arch_zone_highest_possible_pfn));
> >  
> >  	start_pfn = find_min_pfn_with_active_regions();
> > -
> > -	/*
> > -	 * Some architecturs, e.g. ARC may have ZONE_HIGHMEM below
> > -	 * ZONE_NORMAL. For such cases we allow max_zone_pfn sorted in the
> > -	 * descending order
> > -	 */
> > -	if (MAX_NR_ZONES > 1 && max_zone_pfn[0] > max_zone_pfn[1])
> > -		descending = true;
> > +	descending = arch_has_descending_max_zone_pfns();
> >  
> >  	for (i = 0; i < MAX_NR_ZONES; i++) {
> >  		if (descending)
> > 

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2020-05-05  9:19 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29 12:11 [PATCH v2 00/20] mm: rework free_area_init*() funcitons Mike Rapoport
2020-04-29 12:11 ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 01/20] mm: memblock: replace dereferences of memblock_region.nid with API calls Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 02/20] mm: make early_pfn_to_nid() and related defintions close to each other Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 03/20] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-05-26 17:11   ` Catalin Marinas
2020-04-29 12:11 ` [PATCH v2 04/20] mm: free_area_init: use maximal zone PFNs rather than zone sizes Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 05/20] mm: use free_area_init() instead of free_area_init_nodes() Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-05-26 17:13   ` Catalin Marinas
2020-04-29 12:11 ` [PATCH v2 06/20] alpha: simplify detection of memory zone boundaries Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 07/20] arm: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 08/20] arm64: simplify detection of memory zone boundaries for UMA configs Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-05-26 17:15   ` Catalin Marinas
2020-04-29 12:11 ` [PATCH v2 09/20] csky: simplify detection of memory zone boundaries Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 10/20] m68k: mm: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 11/20] parisc: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 12/20] sparc32: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 13/20] unicore32: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 14/20] xtensa: " Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 15/20] mm: memmap_init: iterate over memblock regions rather that check each PFN Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 16/20] mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 14:17   ` Christoph Hellwig
2020-04-29 14:33     ` Mike Rapoport
2020-04-29 14:33       ` Mike Rapoport
2020-04-29 16:29   ` [PATCH v2.5 " Mike Rapoport
2020-04-29 16:29     ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 17/20] mm: free_area_init: allow defining max_zone_pfn in descending order Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-05-03 17:41   ` Guenter Roeck
2020-05-03 18:43     ` Guenter Roeck
2020-05-04 15:39       ` Mike Rapoport
2020-05-04 15:39         ` Mike Rapoport
2020-05-05  6:23         ` Vineet Gupta
2020-05-05  9:19           ` Mike Rapoport [this message]
2020-05-05 18:07             ` Vineet Gupta
2020-05-05 20:15               ` Mike Rapoport
2020-05-07 20:59                 ` Mike Rapoport
2020-05-07 21:21                   ` Vineet Gupta
2020-05-05 13:18         ` Guenter Roeck
2020-05-05 13:45           ` Mike Rapoport
2020-05-05 17:27           ` Vineet Gupta
2020-04-29 12:11 ` [PATCH v2 18/20] mm: clean up free_area_init_node() and its helpers Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 19/20] mm: simplify find_min_pfn_with_active_regions() Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport
2020-04-29 12:11 ` [PATCH v2 20/20] docs/vm: update memory-models documentation Mike Rapoport
2020-04-29 12:11   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200505091946.GG342687@linux.ibm.com \
    --to=rppt@linux.ibm.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=Vineet.Gupta1@synopsys.com \
    --cc=bhe@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dalias@libc.org \
    --cc=gerg@linux-m68k.org \
    --cc=guoren@kernel.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-c6x-dev@linux-c6x.org \
    --cc=linux-csky@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kerne \
    --cc=mhocko@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).