* Re: [PATCH 0/4] sparsemem intro patches
@ 2005-03-15 2:30 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2005-03-15 2:30 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-mm, linux-kernel
Dave Hansen <haveblue@us.ibm.com> wrote:
>
> The following four patches provide the last needed changes before the
> introduction of sparsemem. For a more complete description of what this
> will do, please see this patch:
>
> http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch
I don't know what to think about this. Can you describe sparsemem a little
further, differentiate it from discontigmem and tell us why we want one?
Is it for memory hotplug? If so, how does it support hotplug?
To which architectures is this useful, and what is the attitude of the
relevant maintenance teams?
Quoting from the above patch:
> Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that
> it can eventually become a complete replacement.
> ...
> This patch introduces CONFIG_FLATMEM. It is used in almost all
> cases where there used to be an #ifndef DISCONTIG, because
> SPARSEMEM and DISCONTIGMEM often have to compile out the same areas
> of code.
Would I be right to worry about increasing complexity, decreased
maintainability and generally increasing mayhem?
If a competent kernel developer who is not familiar with how all this code
hangs together wishes to acquaint himself with it, what steps should he
take?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] sparsemem intro patches
2005-03-15 2:30 ` Andrew Morton
@ 2005-03-15 3:53 ` Dave Hansen
-1 siblings, 0 replies; 22+ messages in thread
From: Dave Hansen @ 2005-03-15 3:53 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, Linux Kernel Mailing List, Matthew E Tolentino,
Jesse Barnes, Mike Kravetz, Bob Picco, Joel Schopp,
Andy Whitcroft
On Mon, 2005-03-14 at 18:30 -0800, Andrew Morton wrote:
> Dave Hansen <haveblue@us.ibm.com> wrote:
> >
> > The following four patches provide the last needed changes before the
> > introduction of sparsemem. For a more complete description of what this
> > will do, please see this patch:
> >
> > http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch
>
> I don't know what to think about this. Can you describe sparsemem a little
> further, differentiate it from discontigmem and tell us why we want one?
>
> Is it for memory hotplug? If so, how does it support hotplug?
Sparsemem is more flexible than discontig, and not tied to any existing
NUMA or MM structures like zones or pgdats. That makes it ideal for
hotplug where those structures are going to be coming and going, sliced
and diced.
Another advantage is that sparse doesn't require each NUMA node's ranges
to be contiguous. It can handle overlapping ranges between nodes with
no problems, where DISCONTIGMEM currently throws away that memory.
DISCONTIGMEM also requires that memory *inside* of a node be contiguous,
and have mem_map for all of it. A once 64GB NUMA node with 63GB of the
memory removed wouldn't have much space left for anything but its
mem_map without sparsemem.
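
To put rough numbers on that last case, here is a back-of-the-envelope
sketch. The 4 KiB page size and the 32-byte struct page are assumptions
made for illustration (neither figure is stated in this thread), and
mem_map_bytes() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not taken from the thread. */
#define PAGE_SIZE_BYTES   4096ULL   /* assumed page size */
#define STRUCT_PAGE_BYTES 32ULL     /* assumed sizeof(struct page) */
#define MiB               (1024ULL * 1024ULL)
#define GiB               (1024ULL * MiB)

/*
 * Hypothetical helper: bytes of mem_map needed to describe `span`
 * bytes of physical address space, one struct page per page frame.
 */
static uint64_t mem_map_bytes(uint64_t span)
{
    assert(span % PAGE_SIZE_BYTES == 0);
    return (span / PAGE_SIZE_BYTES) * STRUCT_PAGE_BYTES;
}
```

Under those assumptions, mem_map_bytes(64 * GiB) is 512 MiB, which is
half of the 1 GiB the node still has, while mem_map_bytes(1 * GiB),
covering only the memory that is actually present, is 8 MiB.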
> To which architectures is this useful, and what is the attitude of the
> relevant maintenance teams?
We have implementations for NUMAQ, x86 Summit, flat x86, flat x86-64,
flat and NUMA ppc64, and some ia64 configurations. All of those can
either do simulated, virtualized, or actual hardware memory hotplug of
some kind based on the sparsemem implementations.
Not to put words in their mouths, but there hasn't been anything
negative that I can recall in a while from the architecture maintainers.
What was said that was negative was months ago, and resolved. We've
been talking about this to most of them for quite a while now, and I
think they've grown accustomed to the idea. :)
I've cc'd all of the guilty parties. Perhaps they can fill in my vague
statements with actual facts. But, here are the vague statements
anyway:
i386 - Martin Bligh seems happy with it, he helped design it.
x86-64 - Matt Tolentino has approached Andi Kleen with the necessary
cleanups, and I believe the reaction has been positive. I
think Andi had some other non-hotplug plans for sparsemem, too.
ppc64 - I can bribe Anton and Paul's employer. Mike Kravetz and Joel
Schopp have been working on this port, and I believe they've
kept the maintainers informed and calm.
ia64 - Quote from Jesse Barnes (November 19, 2004):
> CONFIG_NONLINEAR (SPARSE's old name) should be the *only*
> memory init code on ia64 when this is done. That means
> getting rid of both discontig and contig and virtual memmap...
I believe Jesse's been keeping up with the development as well.
> Quoting from the above patch:
>
> > Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that
> > it can eventually become a complete replacement.
> > ...
> > This patch introduces CONFIG_FLATMEM. It is used in almost all
> > cases where there used to be an #ifndef DISCONTIG, because
> > SPARSEMEM and DISCONTIGMEM often have to compile out the same areas
> > of code.
>
> Would I be right to worry about increasing complexity, decreased
> maintainability and generally increasing mayhem?
You certainly would be. For the time being, this increases the number
of config options and places for us to screw up. However, I am
confident at this point that we're doing the right thing. We had a more
complicated version of sparsemem at first. We stripped it down to the
bare bones, and that's what we would like to submit soon. It has the
capability to replace discontig, and will eventually _reduce_
complexity.
One of my favorite ways to demonstrate why I think it's *simple* is the
set of architecture ports. The longest added function that I can find in the
ports is 17 lines including whitespace.
139 insertions(+), 36 deletions(-) for ia64:
http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-180-sparsemem-ia64.patch
75 insertions(+), 17 deletions(-) for ppc64:
http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-170-sparsemem-ppc64.patch
x86_64 is broken up a little more, but it's probably smaller than the
ppc64 one.
> If a competent kernel developer who is not familiar with how all this code
> hangs together wishes to acquaint himself with it, what steps should he
> take?
Dan Phillips spelled out the basic concepts of chopping things up into
sections a few years ago:
http://lwn.net/2002/0411/a/discontig.php3
However, we haven't yet implemented the phys_to_virt() translations that
he envisioned. We don't need that unless we need some advanced
hot-remove features which are many, many months away.
Where should a competent kernel developer look to understand the code
more?
The sparsemem implementation isn't horribly deep. At the implementation
level, it replaces pfn_to_page() and page_to_pfn(). It does that with
an array lookup and some bits from page->flags. I'd check out a few
architectures' current implementations of those functions as well as the
one in the patch referenced at the beginning of the mail:
B-sparse-150-sparsemem.patch .
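
As a rough picture of "an array lookup and some bits from page->flags",
here is a minimal user-space sketch. It is not the code from the patch:
only the names pfn_to_page(), page_to_pfn(), page->flags and
mem_section[] come from this mail. The section sizes, the struct
layouts, the placement of the section number in the top bits of flags,
and the section_attach() helper are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sizes for illustration; assumptions, not the patch's values. */
#define SECTION_SHIFT       8UL                     /* 256 pages per section */
#define PAGES_PER_SECTION   (1UL << SECTION_SHIFT)
#define SECTION_BITS        4UL                     /* 16 sections */
#define NR_SECTIONS         (1UL << SECTION_BITS)
#define FLAGS_SECTION_SHIFT (sizeof(unsigned long) * 8 - SECTION_BITS)
#define FLAGS_LOW_MASK      ((1UL << FLAGS_SECTION_SHIFT) - 1)

struct page {
    unsigned long flags;    /* assumed: top SECTION_BITS bits = section */
};

struct mem_section {
    struct page *mem_map;   /* NULL when the section has no memory */
};

static struct mem_section mem_section[NR_SECTIONS];

/* Hypothetical helper: install a section's mem_map and stamp its pages. */
static void section_attach(unsigned long sec, struct page *map)
{
    unsigned long i;

    assert(sec < NR_SECTIONS);
    mem_section[sec].mem_map = map;
    for (i = 0; i < PAGES_PER_SECTION; i++)
        map[i].flags = (map[i].flags & FLAGS_LOW_MASK) |
                       (sec << FLAGS_SECTION_SHIFT);
}

/* pfn -> page: index the section array, then index within the section. */
static struct page *pfn_to_page(unsigned long pfn)
{
    unsigned long sec = pfn >> SECTION_SHIFT;

    assert(sec < NR_SECTIONS && mem_section[sec].mem_map != NULL);
    return &mem_section[sec].mem_map[pfn & (PAGES_PER_SECTION - 1)];
}

/* page -> pfn: recover the section from flags, add the in-section offset. */
static unsigned long page_to_pfn(const struct page *page)
{
    unsigned long sec = page->flags >> FLAGS_SECTION_SHIFT;

    return (sec << SECTION_SHIFT) +
           (unsigned long)(page - mem_section[sec].mem_map);
}
```

The point of keeping the section number in the page itself is that
page_to_pfn() needs no search: each section's mem_map can live anywhere,
and the page says which section's base to subtract.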
Next, see how the memory_present() abstraction allows the memory layout
of the system to be either encoded in arch-specific discontig structures
or fed into the arch-independent structures that sparse_init() uses to
set up the mem_section[] array.
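
The flow described above can be sketched in user space roughly as
follows. Only the names memory_present(), sparse_init() and
mem_section[] come from this mail; the argument lists, the struct
fields, the return value, the toy sizes and the use of calloc() are
assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sizes for illustration; assumptions, not the patch's values. */
#define SECTION_SHIFT     8UL                   /* 256 pages per section */
#define PAGES_PER_SECTION (1UL << SECTION_SHIFT)
#define NR_SECTIONS       16UL

struct page {
    unsigned long flags;
};

struct mem_section {
    int present;            /* set by memory_present() */
    int nid;                /* node that reported the memory */
    struct page *mem_map;   /* filled in by sparse_init() */
};

static struct mem_section mem_section[NR_SECTIONS];

/*
 * Called by arch code once per range of real memory. end_pfn is
 * exclusive. Every section the range touches is marked present.
 */
static void memory_present(int nid, unsigned long start_pfn,
                           unsigned long end_pfn)
{
    unsigned long sec;

    if (start_pfn >= end_pfn)
        return;
    for (sec = start_pfn >> SECTION_SHIFT;
         sec <= (end_pfn - 1) >> SECTION_SHIFT; sec++) {
        assert(sec < NR_SECTIONS);
        mem_section[sec].present = 1;
        mem_section[sec].nid = nid;
    }
}

/*
 * Arch-independent step: give every present section its own mem_map.
 * Returns the number of sections populated, or -1 if allocation fails.
 */
static int sparse_init(void)
{
    unsigned long sec;
    int populated = 0;

    for (sec = 0; sec < NR_SECTIONS; sec++) {
        if (!mem_section[sec].present || mem_section[sec].mem_map)
            continue;
        mem_section[sec].mem_map =
            calloc(PAGES_PER_SECTION, sizeof(struct page));
        if (!mem_section[sec].mem_map)
            return -1;
        populated++;
    }
    return populated;
}
```

In this sketch the arch code only has to describe where memory is; holes
are simply sections that nobody reported, and they never get a mem_map.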
You could also go look at some of the hotplug code, but this email is
getting long enough as it is :)
-- Dave
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] sparsemem intro patches
2005-03-15 2:30 ` Andrew Morton
@ 2005-03-15 14:56 ` Martin J. Bligh
-1 siblings, 0 replies; 22+ messages in thread
From: Martin J. Bligh @ 2005-03-15 14:56 UTC (permalink / raw)
To: Andrew Morton, Dave Hansen, Andy Whitcroft, Dave McCracken,
Daniel Phillips
Cc: linux-mm, linux-kernel
>> The following four patches provide the last needed changes before the
>> introduction of sparsemem. For a more complete description of what this
>> will do, please see this patch:
>>
>> http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch
>
> I don't know what to think about this. Can you describe sparsemem a little
> further, differentiate it from discontigmem and tell us why we want one?
> Is it for memory hotplug? If so, how does it support hotplug?
>
> To which architectures is this useful, and what is the attitude of the
> relevant maintenance teams?
This isn't just for hotplug by any means. Andy wrote it to get rid of a whole
bunch of different problems, roughly based on some previous work by Dan Phillips
and Dave McCracken (I've added a cc to the actual authors of these patches).
This is the major part of what used to be called CONFIG_NONLINEAR, which we
discussed at last year's kernel summit, and people were pretty enthusiastic
about.
> Quoting from the above patch:
>
>> Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that
>> it can eventually become a complete replacement.
>> ...
>> This patch introduces CONFIG_FLATMEM. It is used in almost all
>> cases where there used to be an #ifndef DISCONTIG, because
>> SPARSEMEM and DISCONTIGMEM often have to compile out the same areas
>> of code.
>
> Would I be right to worry about increasing complexity, decreased
> maintainability and generally increasing mayhem?
Not really - it cleans up the current mess where discontigmem means, and
is used for, two distinct things: 1. the memory is significantly non-contig
in the physical layout. 2. NUMA support.
It also allows us to support discontiguous memory *within* a NUMA node, which
is important for some systems - we can scrap the added complexity of ia64's
vmemmap stuff, for instance.
Whatever your opinions are on mem hotplug, I think we want CONFIG_SPARSEMEM
to clean up the existing mess of discontig - with or without hotplug. I've
wanted this for a very long time, and was discussing it with Andy at OLS last
year; he came up with a much better, cleaner way to implement it than I had.
It also makes a lot of sense as a foundation for hotplug, which multiple
people seem to want for virtualization stuff.
Anyway, that's what I want it for ;-)
> If a competent kernel developer who is not familiar with how all this code
> hangs together wishes to acquaint himself with it, what steps should he
> take?
Andy, can you explain that further? Maybe also worth checking these are the
correct version of your patches.
M.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] sparsemem intro patches
2005-03-15 2:30 ` Andrew Morton
@ 2005-03-17 16:21 ` Andy Whitcroft
-1 siblings, 0 replies; 22+ messages in thread
From: Andy Whitcroft @ 2005-03-17 16:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Hansen, linux-mm, linux-kernel, Martin J. Bligh
Andrew Morton wrote:
> Dave Hansen <haveblue@us.ibm.com> wrote:
>
>> The following four patches provide the last needed changes before the
>> introduction of sparsemem. For a more complete description of what this
>> will do, please see this patch:
>>
>> http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch
> I don't know what to think about this. Can you describe sparsemem a little
> further, differentiate it from discontigmem and tell us why we want one?
> Is it for memory hotplug? If so, how does it support hotplug?
SPARSEMEM was born out of discussions which followed the OLS last year
over the NONLINEAR memory model which was being proposed for hotplug.
We got interested as it appeared that a simple form of NONLINEAR memory
could help us handle some problematic cases with DISCONTIG memory,
particularly the case where we have large intra-node memory holes.
The DISCONTIGMEM memory model appears to have been designed to handle
discontiguous UMA configuration. It was subsequently put into service
to provide node support under NUMA configurations. This dual use seems
to have led to confusing code and compromises on functionality. In its
current form we can only express inter-node memory spaces, making it
majorly inefficient for NUMA systems with sparse physical inter-node
memory maps, effectively not supporting some configurations. Also,
although DISCONTIGMEM is a common model between a number of
architectures there is almost no code overlap.
SPARSEMEM essentially is a replacement for DISCONTIGMEM providing
support for non-contiguous memory but with the advantage of handling
both inter- and intra-node memory holes. The goal of the implementation
was to design a clean memory model covering the needs of both UMA
and NUMA discontiguous memory layouts whilst providing a basis for
hotplug. This should allow us to consolidate the implementation of the
various "discontiguous" memory models whilst trying to fix their shortcomings.
Hotplug at its most complex puts two requirements on the memory model.
Firstly, it requires the arbitrary replacement of physical memory with
memory which may be at a different address (the breaking of V=P+c) to
cope with the case of memory replacement under unmovable kernel objects.
Secondly, it requires we cope with memory "all over" the physical map.
SPARSEMEM is geared towards providing the required infrastructure for
NONLINEAR memory needed in hotplug. The idea being that NONLINEAR would
be layered on top of it and share its implementation.
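
To illustrate what "the breaking of V=P+c" means, here is a purely
hypothetical user-space sketch. NONLINEAR is not implemented in these
patches, so nothing below reflects real code: the per-section table, the
256 MiB section size, the PAGE_OFFSET value and every function name are
assumptions chosen only to contrast a fixed-offset mapping with a
table-driven one.

```c
#include <assert.h>
#include <stdint.h>

/* All values are illustrative assumptions. */
#define PAGE_OFFSET   0xC0000000ULL         /* the constant c in V = P + c */
#define SECTION_SHIFT 28                    /* 256 MiB sections */
#define SECTION_SIZE  (1ULL << SECTION_SHIFT)
#define NR_SECTIONS   16

/* Linear model: one constant offset for all of memory. */
static uint64_t linear_virt_to_phys(uint64_t virt)
{
    return virt - PAGE_OFFSET;
}

/* Nonlinear model: each virtual section records its own physical base. */
static uint64_t section_phys_base[NR_SECTIONS];

/* Start out identical to the linear mapping. */
static void nonlinear_init(void)
{
    int i;

    for (i = 0; i < NR_SECTIONS; i++)
        section_phys_base[i] = (uint64_t)i << SECTION_SHIFT;
}

static uint64_t nonlinear_virt_to_phys(uint64_t virt)
{
    uint64_t off = virt - PAGE_OFFSET;
    uint64_t sec = off >> SECTION_SHIFT;

    assert(virt >= PAGE_OFFSET && sec < NR_SECTIONS);
    return section_phys_base[sec] + (off & (SECTION_SIZE - 1));
}

/*
 * Swap the memory behind one virtual section for memory at a different
 * physical address. Virtual addresses held by the kernel stay valid.
 */
static void replace_section(unsigned int sec, uint64_t new_phys_base)
{
    assert(sec < NR_SECTIONS);
    assert((new_phys_base & (SECTION_SIZE - 1)) == 0);
    section_phys_base[sec] = new_phys_base;
}
```

After replace_section() the two translations disagree for that section,
which is exactly the property the linear model cannot express: an
unmovable kernel object keeps its virtual address while the physical
memory underneath it changes.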
-apw.
^ permalink raw reply [flat|nested] 22+ messages in thread