linux-mm.kvack.org archive mirror
* [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
@ 2024-05-10  9:03 Oscar Salvador
  2024-05-10 22:03 ` Matthew Wilcox
  2024-05-12 13:45 ` Mike Rapoport
  0 siblings, 2 replies; 6+ messages in thread
From: Oscar Salvador @ 2024-05-10  9:03 UTC (permalink / raw)
  To: lsf-pc, linux-mm

Hi all,

I would like to discuss the following topic in the LSFMM.

We have the SPARSEMEM memory model and SPARSEMEM_VMEMMAP, where the only difference
between the two is that the latter allocates a virtual chunk to represent
the memmap array, which speeds up operations like
page_to_pfn/pfn_to_page/nth_page/folio_page_idx.
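
To make the difference concrete, here is a rough user-space sketch of the two
pfn_to_page() strategies. It is not the kernel code: struct page is reduced to
a stub, and the names (toy_page, section_memmap, vmemmap_base,
classic_pfn_to_page, vmemmap_pfn_to_page, vmemmap_page_to_pfn) as well as the
constants are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants, not taken from any particular architecture. */
#define PAGE_SHIFT          12
#define SECTION_SIZE_BITS   27
#define PFN_SECTION_SHIFT   (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION   (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS         8

struct toy_page { unsigned long flags; };	/* stand-in for struct page */

/* Classic SPARSEMEM: one separately allocated memmap per section. */
static struct toy_page *section_memmap[NR_SECTIONS];

/* SPARSEMEM_VMEMMAP: one virtually contiguous array indexed by pfn. */
static struct toy_page *vmemmap_base;

/* Classic: find the section first, then index into its memmap. */
static struct toy_page *classic_pfn_to_page(unsigned long pfn)
{
	unsigned long nr = pfn >> PFN_SECTION_SHIFT;

	assert(nr < NR_SECTIONS && section_memmap[nr] != NULL);
	return section_memmap[nr] + (pfn & (PAGES_PER_SECTION - 1));
}

/* vmemmap: plain pointer arithmetic, no per-section table lookup. */
static struct toy_page *vmemmap_pfn_to_page(unsigned long pfn)
{
	return vmemmap_base + pfn;
}

/* vmemmap: the reverse direction is a pointer subtraction. */
static unsigned long vmemmap_page_to_pfn(const struct toy_page *page)
{
	return (unsigned long)(page - vmemmap_base);
}
```

In the classic model the reverse direction additionally needs to know which
section a page belongs to, which is why the section number ends up in
page->flags.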

The page->flags layout would also change.

Currently we have (extracted from include/linux/page-flags-layout.h):

/*
 * page->flags layout:
 * No sparsemem or sparsemem vmemmap: |       NODE     | ZONE |             ... | FLAGS |
 *      " plus space for last_cpupid: |       NODE     | ZONE | LAST_CPUPID ... | FLAGS |
 * classic sparse with space for node:| SECTION | NODE | ZONE |             ... | FLAGS |
 *      " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
 * classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |
 */

The last three (the classic sparse layouts, which carry a SECTION field) could go away.

Also, by getting rid of SPARSEMEM we would also get rid of a non-trivial amount
of code.

I did some research on which arches use CONFIG_SPARSE_MEMMAP/VMEMMAP or
none(using flatmem?).
                   
 arch       SPARSE_MEMMAP        SPARSE_VMEMMAP
 arc
 arm              X
 arm64            X                     X
 csky
 hexagon 
 loongarch        X                     X
 m68k
 microblaze
 mips             X
 nios2
 openrisc
 parisc           X
 powerpc          X                     X
 riscv            X                     X
 s390             X                     X
 sh               X
 sparc            X                     X
 um
 x86              X                     X
 xtensa

arm, mips, parisc and sh operate with SPARSE_MEMMAP but are lacking code for
SPARSE_VMEMMAP.
I think these archs should be the first thing to focus on, to see if we can
make them work on SPARSE_VMEMMAP.
If we can, and when all arches can run on SPARSE_VMEMMAP, we can think of killing
SPARSE_MEMMAP.

This is not for free, of course.
There is a certain memory overhead because we have to allocate the virtual
memmap for each section.

Taking MIPS as an example:

- SECTION_SIZE_BITS 28 (256MB)
  PAGE_SIZE_4KB:  4MB per section
  PAGE_SIZE_16KB: 1MB per section
  PAGE_SIZE_64KB: 256KB per section
- SECTION_SIZE_BITS 29 (512MB)
  PAGE_SIZE_64KB: 512KB per section

e.g.: When the section size is 256MB and we operate on 4KB page size, we spend
      4MB for the virtual memmap array per section:
      65536 pages per section * sizeof(struct page) (64 bytes) = 4MB
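
The per-section figures above are just pages-per-section times
sizeof(struct page). A minimal sketch of that arithmetic follows; the helper
names are hypothetical, and the 64-byte struct page is an assumption (it is
the size the 4MB figure implies, but it depends on the kernel configuration).

```c
#include <assert.h>

/* Assumed sizeof(struct page); 64 bytes is what the figures above imply. */
#define ASSUMED_STRUCT_PAGE_SIZE 64UL

/* Number of base pages covered by one section. */
static unsigned long pages_per_section(unsigned int section_size_bits,
				       unsigned int page_shift)
{
	assert(section_size_bits >= page_shift);
	return 1UL << (section_size_bits - page_shift);
}

/* Bytes of memmap needed to describe one fully populated section. */
static unsigned long memmap_bytes_per_section(unsigned int section_size_bits,
					      unsigned int page_shift)
{
	return pages_per_section(section_size_bits, page_shift) *
	       ASSUMED_STRUCT_PAGE_SIZE;
}
```

With these helpers, memmap_bytes_per_section(28, 12) evaluates to 4194304
bytes, i.e. the 4MB quoted for 256MB sections with 4KB pages.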

Ideally, we can discuss:

1) whether it makes sense to definitively switch over to SPARSE_VMEMMAP
2) if yes, how this can be arranged. Implementing all the code and giving
   a grace period?
3) if not, why not? And what can be done so we can proceed?

Thanks

-- 
Oscar Salvador
SUSE Labs



* Re: [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
  2024-05-10  9:03 [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP Oscar Salvador
@ 2024-05-10 22:03 ` Matthew Wilcox
  2024-05-13  9:43   ` Oscar Salvador
  2024-05-12 13:45 ` Mike Rapoport
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2024-05-10 22:03 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: lsf-pc, linux-mm

On Fri, May 10, 2024 at 11:03:14AM +0200, Oscar Salvador wrote:
> I did some research on which arches use CONFIG_SPARSE_MEMMAP/VMEMMAP or
> none(using flatmem?).
>                    
>  arch       SPARSE_MEMMAP        SPARSE_VMEMMAP
>  arc
>  arm              X
>  arm64            X                     X
>  csky
>  hexagon 
>  loongarch        X                     X
>  m68k
>  microblaze
>  mips             X
>  nios2
>  openrisc
>  parisc           X
>  powerpc          X                     X
>  riscv            X                     X
>  s390             X                     X
>  sh               X
>  sparc            X                     X
>  um
>  x86              X                     X
>  xtensa
> 
> arm, mips, parisc and sh operate with SPARSE_MEMMAP but are lacking code for
> SPARSE_VMEMMAP.
> I think these archs should be the first thing to focus on, to see if we can
> make them work on SPARSE_VMEMMAP.
> If we can, and when all arches can run on SPARSE_VMEMMAP, we can think of killing
> SPARSE_MEMMAP.

I'm a little concerned about having this conversation without the
affected architecture maintainers in the room.  However, I can speak to
PA-RISC.

Early models have a dense memory layout and we need not be concerned
with them.  I'm not quite sure about the PA-7200 to PA-8500 ccio based
machines, would need to do some research.  For the PA-8500+ astro based
machines, the 256MB that would be in the range 3.75GB to 4GB is
relocated to 67.75-68GB to leave space for PCI mmio.  So if you have
a machine with 8GB of memory (fairly typical for a J6000 machine),
you'd have three ranges of memory:

0-3.75GB
4-8GB
67.75-68GB

and I'd like to see an analysis of how laying out memmap would differ
for those machines.

The PA-8800+ pluto chipset does something similar, except it
supports more memory and more PCI mmio, so a 32GB rp3440 would have
a memmap:

0-3GB
4-32GB
259-260GB

I think this would actually work better as three zones rather than
three sections.  It might match quite well with the TAO proposal.



* Re: [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
  2024-05-10  9:03 [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP Oscar Salvador
  2024-05-10 22:03 ` Matthew Wilcox
@ 2024-05-12 13:45 ` Mike Rapoport
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Rapoport @ 2024-05-12 13:45 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: lsf-pc, linux-mm

On Fri, May 10, 2024 at 11:03:14AM +0200, Oscar Salvador wrote:
> Hi all,
> 
> I would like to discuss the following topic in the LSFMM.
> 
> We have SPARSEMEM memory model and SPARSEMEM_VMEMMAP, where the only difference
> between the two of them is that the latter allocates a virtual chunk to represent
> the memmap array, which speeds operations like
> page_to_pfn/pfn_to_page/nth_page/folio_page_idx.
 
...
 
> I did some research on which arches use CONFIG_SPARSE_MEMMAP/VMEMMAP or
> none(using flatmem?).

FLATMEM it is.
                    
>  arch       SPARSE_MEMMAP        SPARSE_VMEMMAP
>  arc
>  arm              X
>  arm64            X                     X
>  csky
>  hexagon 
>  loongarch        X                     X
>  m68k
>  microblaze
>  mips             X
>  nios2
>  openrisc
>  parisc           X
>  powerpc          X                     X
>  riscv            X                     X
>  s390             X                     X
>  sh               X
>  sparc            X                     X
>  um
>  x86              X                     X
>  xtensa
> 
> arm, mips, parisc and sh operate with SPARSE_MEMMAP but are lacking code for
> SPARSE_VMEMMAP.

arm can use "sparse flatmem" in the sense that it can free unused parts of
the memory map, so FLATMEM does not have memory overhead with sparsely
located memory banks. So possibly we can just drop SPARSEMEM on arm.

But I agree with Matthew that for this discussion we'd need arch
maintainers to participate.

> Thanks
> 
> -- 
> Oscar Salvador
> SUSE Labs
> 

-- 
Sincerely yours,
Mike.



* Re: [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
  2024-05-10 22:03 ` Matthew Wilcox
@ 2024-05-13  9:43   ` Oscar Salvador
  2024-05-13 10:12     ` Oscar Salvador
  2024-05-13 23:02     ` Mike Rapoport
  0 siblings, 2 replies; 6+ messages in thread
From: Oscar Salvador @ 2024-05-13  9:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: lsf-pc, linux-mm

On Fri, May 10, 2024 at 11:03:58PM +0100, Matthew Wilcox wrote:
> I'm a little concerned about having this conversation without the
> affected architecture maintainers in the room.  However, I can speak to
> PA-RISC.

Yes, having the architecture maintainers would be great.

> Early models have a dense memory layout and we need not be concerned
> with them.  I'm not quite sure about the PA-7200 to PA-8500 ccio based
> machines, would need to do some research.  For the PA-8500+ astro based
> machines, the 256MB that would be in the range 3.75GB to 4GB is
> relocated to 67.75-68GB to leave space for PCI mmio.  So if you have
> a machine with 8GB of memory (fairly typical for a J6000 machine),
> you'd have three ranges of memory:
> 
> 0-3.75GB
> 4-8GB
> 67.75-68GB
> 
> and I'd like to see an analysis of how laying out memmap would differ
> for those machines.

Maybe Mike can prove me wrong, but I assume that memblock will report
the above ranges as memory, and the 3.75GB to 4GB as somewhat reserved.
Then, we only mark those sections falling within the ranges reported as
having memory by memblock as present, and we only populate the memmap
for present sections.

So, those ranges from above will be represented by present sections
and hence with the vmemmap populated, and anything that falls off
will not.
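
As a rough illustration of that idea (not the kernel's actual memblock or
sparse-init code), the sketch below marks every section overlapping a reported
memory range as present. The mem_range type, the helper names, the 128MB
section size and the 1024-section limit are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SIZE_BITS 27	/* 128MB sections, assumed */
#define MAX_SECTIONS      1024	/* toy limit, covers 128GB */

struct mem_range { uint64_t start, end; };	/* [start, end) in bytes */

static bool section_present[MAX_SECTIONS];

/* Mark every section that overlaps the range as present. */
static void mark_present(const struct mem_range *r)
{
	uint64_t first = r->start >> SECTION_SIZE_BITS;
	uint64_t last = (r->end - 1) >> SECTION_SIZE_BITS;

	assert(r->start < r->end && last < MAX_SECTIONS);
	for (uint64_t nr = first; nr <= last; nr++)
		section_present[nr] = true;
}

/* Only present sections would get their memmap populated. */
static unsigned int count_present(void)
{
	unsigned int n = 0;

	for (unsigned int nr = 0; nr < MAX_SECTIONS; nr++)
		n += section_present[nr];
	return n;
}
```

Feeding it the three astro ranges quoted above marks sections 0..29, 32..63
and 542..543 as present, 64 sections in total; sections 30 and 31 (the
3.75GB-4GB hole) stay absent and would get no memmap.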

I am not sure if I got your concern right though.


> The PA-8800+ pluto chipset does something similar, except it
> supports more memory and more PCI mmio, so a 32GB rp3440 would have
> a memmap:
> 
> 0-3GB
> 4-32GB
> 259-260GB
> 
> I think this would actually work better as three zones rather than
> three sections.  It might match quite well with the TAO proposal.

But those are not really sections, are they? But rather memory ranges.
Checking the code from parisc, sections are at 128MB granularity.
So you will have 

0-3GB ->     0    .. 23   section (memmap populated)
4-32GB ->    32   .. 255  section (memmap populated)
259-260GB -> 2072 .. 2079 section (memmap populated)



-- 
Oscar Salvador
SUSE Labs



* Re: [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
  2024-05-13  9:43   ` Oscar Salvador
@ 2024-05-13 10:12     ` Oscar Salvador
  2024-05-13 23:02     ` Mike Rapoport
  1 sibling, 0 replies; 6+ messages in thread
From: Oscar Salvador @ 2024-05-13 10:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: lsf-pc, linux-mm

On Mon, May 13, 2024 at 11:43:09AM +0200, Oscar Salvador wrote:
> 0-3GB ->     0    .. 23   section (memmap populated)
> 4-32GB ->    32   .. 255  section (memmap populated)
> 259-260GB -> 2072 .. 2079 section (memmap populated)

And just to add more info: considering a 128MB section granularity (which seems
to be what parisc has), a PAGE_SIZE of 4KB (not really sure which page sizes parisc
can deal with), PAGES_PER_SECTION then being 32768, a 64-byte struct page, and the
vmemmap range starting at 0xffffea0000000000, we would have the following vmemmap layout:

0-3GB     ->  0xffffea0000000000 - 0xffffea0003000000
4-32GB    ->  0xffffea0004000000 - 0xffffea0020000000
259-260GB ->  0xffffea0103000000 - 0xffffea0104000000
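
For anyone who wants to redo this arithmetic, here is a small sketch under the
same assumptions: 4KB pages, 128MB sections and the vmemmap base used above.
The 64-byte struct page and the helper names are assumptions for the example,
not something taken from the parisc code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT        12			/* 4KB pages */
#define SECTION_SIZE_BITS 27			/* 128MB sections */
#define STRUCT_PAGE_SIZE  64ULL			/* assumed sizeof(struct page) */
#define VMEMMAP_START     0xffffea0000000000ULL	/* base used in the example */

/* Section number containing a physical address. */
static uint64_t addr_to_section(uint64_t phys)
{
	return phys >> SECTION_SIZE_BITS;
}

/* Virtual address of the struct page describing a physical address. */
static uint64_t vmemmap_addr(uint64_t phys)
{
	return VMEMMAP_START + (phys >> PAGE_SHIFT) * STRUCT_PAGE_SIZE;
}

/* Bytes of vmemmap spanned by the physical range [start, end). */
static uint64_t vmemmap_span(uint64_t start, uint64_t end)
{
	assert(start < end);
	return vmemmap_addr(end) - vmemmap_addr(start);
}
```

Under these assumptions every 1GB of physical memory takes 16MB of vmemmap
space, so vmemmap_addr() of the 3GB and 4GB boundaries gives
0xffffea0003000000 and 0xffffea0004000000 respectively.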

I am not sure though whether we can hit any limitation when it comes to
the vmemmap space we have available.

-- 
Oscar Salvador
SUSE Labs



* Re: [LSF/MM/BFP TOPIC] Deprecate SPARSEMEM and have only SPARSEMEM_VMEMMAP
  2024-05-13  9:43   ` Oscar Salvador
  2024-05-13 10:12     ` Oscar Salvador
@ 2024-05-13 23:02     ` Mike Rapoport
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Rapoport @ 2024-05-13 23:02 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: Matthew Wilcox, lsf-pc, linux-mm

On Mon, May 13, 2024 at 11:43:09AM +0200, Oscar Salvador wrote:
> On Fri, May 10, 2024 at 11:03:58PM +0100, Matthew Wilcox wrote:
> > I'm a little concerned about having this conversation without the
> > affected architecture maintainers in the room.  However, I can speak to
> > PA-RISC.
> 
> Yes, having the architecture maintainers would be great.
> 
> > Early models have a dense memory layout and we need not be concerned
> > with them.  I'm not quite sure about the PA-7200 to PA-8500 ccio based
> > machines, would need to do some research.  For the PA-8500+ astro based
> > machines, the 256MB that would be in the range 3.75GB to 4GB is
> > relocated to 67.75-68GB to leave space for PCI mmio.  So if you have
> > a machine with 8GB of memory (fairly typical for a J6000 machine),
> > you'd have three ranges of memory:
> > 
> > 0-3.75GB
> > 4-8GB
> > 67.75-68GB
> > 
> > and I'd like to see an analysis of how laying out memmap would differ
> > for those machines.
> 
> Maybe Mike can prove me wrong, but I assume that memblock will report
> the above ranges as memory, and the 3.75GB to 4GB as somewhat reserved.

The populated ranges will be reported as memory, and it seems there will
simply be a hole in the 3.75GB-4GB range. Not that it matters from
the section layout perspective.

> -- 
> Oscar Salvador
> SUSE Labs
> 

-- 
Sincerely yours,
Mike.

