* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
[not found] ` <BANLkTim+m-v-4k17HUSOYSbmNFDtJTgD6g@mail.gmail.com>
@ 2011-04-20 14:15 ` James Bottomley
2011-04-20 14:15 ` James Bottomley
2011-04-20 14:50 ` Christoph Lameter
0 siblings, 2 replies; 15+ messages in thread
From: James Bottomley @ 2011-04-20 14:15 UTC (permalink / raw)
To: Pekka Enberg
Cc: Matthew Wilcox, KOSAKI Motohiro, Christoph Lameter, Michal Hocko,
Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
David Rientjes, Ingo Molnar, x86 maintainers, linux-arch
[added linux-arch to cc since we're going to be affecting them]
On Wed, 2011-04-20 at 14:28 +0300, Pekka Enberg wrote:
> Right. My point was simply that since x86 doesn't support DISCONTIGMEM
> without NUMA, the misunderstanding is likely very wide-spread.
Why don't we approach the problem in a few separate ways, then?
1. We can look at what imposing NUMA on the DISCONTIGMEM archs
would do ... the embedded ones are going to be hardest hit, but
if it's not too much extra code, it might be palatable.
2. The other is that we can audit mm to look at all the node
assumptions in the non-numa case. My suspicion is that
accidentally or otherwise, it mostly works for the normal case,
so there might not be much needed to pull it back to working
properly for DISCONTIGMEM.
3. Finally we could look at deprecating DISCONTIGMEM in favour of
SPARSEMEM, but we'd still need to fix -stable for that case,
especially as it will take time to convert all the architectures.
I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
framework which allows machines with split physical memory ranges to
work. That's a very common case nowadays. Numa is supposed to be a
heavyweight framework to preserve node locality for non-uniform memory
access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
James
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 14:15 ` [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards James Bottomley
2011-04-20 14:15 ` James Bottomley
@ 2011-04-20 14:50 ` Christoph Lameter
2011-04-20 14:50 ` Christoph Lameter
2011-04-20 15:02 ` James Bottomley
1 sibling, 2 replies; 15+ messages in thread
From: Christoph Lameter @ 2011-04-20 14:50 UTC (permalink / raw)
To: James Bottomley
Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, 20 Apr 2011, James Bottomley wrote:
> 1. We can look at what imposing NUMA on the DISCONTIGMEM archs
> would do ... the embedded ones are going to be hardest hit, but
> if it's not too much extra code, it might be palatable.
> 2. The other is that we can audit mm to look at all the node
> assumptions in the non-numa case. My suspicion is that
> accidentally or otherwise, it mostly works for the normal case,
> so there might not be much needed to pull it back to working
> properly for DISCONTIGMEM.
The older code may work. SLAB f.e. does not call page_to_nid() in the
!NUMA case but keeps special metadata structures around in each slab page
that records the node used for allocation. The problem is with new code
added/revised in the last 5 years or so that uses page_to_nid() and
allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
the page allocator that should trigger if page_to_nid() returns strange
values. I wonder why that never occurred.
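[Editorial sketch, not part of the original message.] The mismatch described above can be modelled in user space: code that sizes its per-node storage for one node, but indexes it with whatever a page_to_nid()-style lookup reports. Every name below (the `model_` prefix, the constants, the struct layouts) is a hypothetical stand-in chosen for illustration, not kernel code; it assumes a DISCONTIGMEM machine with four pgdats and a !NUMA consumer that allocated a single slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define MODEL_MAX_PGDATS  4  /* DISCONTIGMEM: one pgdat per memory range */
#define MODEL_NR_PER_NODE 1  /* !NUMA consumer allocated a single slot */

struct model_page {
	int nid;  /* what a page_to_nid()-style lookup would report */
};

/* Mimics page_to_nid(): report the pgdat the page belongs to. */
static inline int model_page_to_nid(const struct model_page *page)
{
	return page->nid;
}

/* True if using this page's nid as an index stays inside the single slot. */
static inline bool model_index_is_safe(const struct model_page *page)
{
	int nid = model_page_to_nid(page);

	return nid >= 0 && nid < MODEL_NR_PER_NODE;
}

/* Count how many pgdats would index past the single per-node slot. */
static inline int model_count_unsafe_pgdats(void)
{
	int nid, unsafe = 0;

	for (nid = 0; nid < MODEL_MAX_PGDATS; nid++) {
		struct model_page page = { .nid = nid };

		if (!model_index_is_safe(&page))
			unsafe++;
	}
	return unsafe;
}
```

In this model, pages in the first memory range are harmless, which may be one reason the problem stays hidden on machines whose allocations mostly land in the first pgdat; that explanation is a guess, not something the thread establishes.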
> 3. Finally we could look at deprecating DISCONTIGMEM in favour of
> SPARSEMEM, but we'd still need to fix -stable for that case.
> Especially as it will take time to convert all the architectures
The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
need an audit of the core VM before removing that or making it contingent
on the configurations of various VM subsystems.
> I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> framework which allows machines with split physical memory ranges to
> work. That's a very common case nowadays. Numa is supposed to be a
> heavyweight framework to preserve node locality for non-uniform memory
> access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
Well yes but we have SPARSE for that today. DISCONTIG with multiple per
pgdat structures in a !NUMA case is just weird and unexpected for many who
have done VM coding in the last years.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 14:50 ` Christoph Lameter
2011-04-20 14:50 ` Christoph Lameter
@ 2011-04-20 15:02 ` James Bottomley
2011-04-20 15:02 ` James Bottomley
2011-04-20 15:22 ` Christoph Lameter
1 sibling, 2 replies; 15+ messages in thread
From: James Bottomley @ 2011-04-20 15:02 UTC (permalink / raw)
To: Christoph Lameter
Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, 2011-04-20 at 09:50 -0500, Christoph Lameter wrote:
> On Wed, 20 Apr 2011, James Bottomley wrote:
>
> > 1. We can look at what imposing NUMA on the DISCONTIGMEM archs
> > would do ... the embedded ones are going to be hardest hit, but
> > if it's not too much extra code, it might be palatable.
> > 2. The other is that we can audit mm to look at all the node
> > assumptions in the non-numa case. My suspicion is that
> > accidentally or otherwise, it mostly works for the normal case,
> > so there might not be much needed to pull it back to working
> > properly for DISCONTIGMEM.
>
> The older code may work. SLAB f.e. does not call page_to_nid() in the
> !NUMA case but keeps special metadata structures around in each slab page
> that records the node used for allocation. The problem is with new code
> added/revised in the last 5 years or so that uses page_to_nid() and
> allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> the page allocator that should trigger if page_to_nid() returns strange
> values. I wonder why that never occurred.
Actually, I think slab got changed when discontigmem was added ...
that's why it all works OK.
> > 3. Finally we could look at deprecating DISCONTIGMEM in favour of
> > SPARSEMEM, but we'd still need to fix -stable for that case.
> > Especially as it will take time to convert all the architectures
>
> The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> need an audit of the core VM before removing that or making it contingent
> on the configurations of various VM subsystems.
Don't be stupid ... that would cause six architectures to get marked
broken.
> > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> > framework which allows machines with split physical memory ranges to
> > work. That's a very common case nowadays. Numa is supposed to be a
> > heavyweight framework to preserve node locality for non-uniform memory
> > access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
>
> Well yes but we have SPARSE for that today. DISCONTIG with multiple per
> pgdat structures in a !NUMA case is just weird and unexpected for many who
> have done VM coding in the last years.
Look, I'm not really interested in who understands what. The fact is we
have six architectures with the possibility for DISCONTIGMEM && !NUMA,
so that's the case we need to fix in -stable.
They oops with SLUB; as far as I can tell, there are still no oops
reports with SLAB. The simplest -stable fix seems to be to mark SLUB
broken on DISCONTIGMEM && !NUMA.
James
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 15:02 ` James Bottomley
2011-04-20 15:02 ` James Bottomley
@ 2011-04-20 15:22 ` Christoph Lameter
2011-04-20 19:25 ` Matthew Wilcox
2011-04-20 21:42 ` David Rientjes
1 sibling, 2 replies; 15+ messages in thread
From: Christoph Lameter @ 2011-04-20 15:22 UTC (permalink / raw)
To: James Bottomley
Cc: Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro, Michal Hocko,
Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, 20 Apr 2011, James Bottomley wrote:
> > The older code may work. SLAB f.e. does not call page_to_nid() in the
> > !NUMA case but keeps special metadata structures around in each slab page
> > that records the node used for allocation. The problem is with new code
> > added/revised in the last 5 years or so that uses page_to_nid() and
> > allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> > the page allocator that should trigger if page_to_nid() returns strange
> > values. I wonder why that never occurred.
>
> Actually, I think slab got changed when discontigmem was added ...
> that's why it all works OK.
Could be. I was not around at the time.
> > > 3. Finally we could look at deprecating DISCONTIGMEM in favour of
> > > SPARSEMEM, but we'd still need to fix -stable for that case.
> > > Especially as it will take time to convert all the architectures
> >
> > The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> > need an audit of the core VM before removing that or making it contingent
> > on the configurations of various VM subsystems.
>
> Don't be stupid ... that would cause six architectures to get marked
> broken.
Yes they are broken right now. Marking just means showing the user that we
are aware of the situation.
> Look, I'm not really interested in who understands what. The fact is we
> have six architectures with the possibility for DISCONTIGMEM && !NUMA,
> so that's the case we need to fix in -stable.
>
> They oops with SLUB, as far as I can tell, there are still no oops
> reports with SLAB. The simplest -stable fix seems to be to mark SLUB
> broken on DISCONTIGMEM && !NUMA.
There is barely any testing going on at all of this since we have had this
issue for more than 5 years and have not noticed it. The absence of bug
reports therefore proves nothing. Code inspection of the VM shows
that this is an issue that arises in multiple subsystems and that we have
VM_BUG_ONs in the page allocator that should trigger for these situations.
Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 15:22 ` Christoph Lameter
@ 2011-04-20 19:25 ` Matthew Wilcox
2011-04-20 21:42 ` David Rientjes
1 sibling, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2011-04-20 19:25 UTC (permalink / raw)
To: Christoph Lameter
Cc: James Bottomley, Pekka Enberg, KOSAKI Motohiro, Michal Hocko,
Andrew Morton, Hugh Dickins, linux-mm, LKML, linux-parisc,
David Rientjes, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, Apr 20, 2011 at 10:22:04AM -0500, Christoph Lameter wrote:
> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.
So ... we've proven that people using these architectures use SLAB
instead of SLUB, don't enable CONFIG_DEBUG_VM and don't use hugepages
(not really a surprise ... nobody's running Oracle on these arches :-)
I don't think that qualifies as "barely any testing". I think that's
"nobody developing the Linux MM uses one of these architectures".
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 15:22 ` Christoph Lameter
2011-04-20 19:25 ` Matthew Wilcox
@ 2011-04-20 21:42 ` David Rientjes
2011-04-20 21:42 ` David Rientjes
2011-04-21 16:06 ` James Bottomley
1 sibling, 2 replies; 15+ messages in thread
From: David Rientjes @ 2011-04-20 21:42 UTC (permalink / raw)
To: Christoph Lameter
Cc: James Bottomley, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, 20 Apr 2011, Christoph Lameter wrote:
> There is barely any testing going on at all of this since we have had this
> issue for more than 5 years and have not noticed it. The absence of bug
> reports therefore proves nothing. Code inspection of the VM shows
> that this is an issue that arises in multiple subsystems and that we have
> VM_BUG_ONs in the page allocator that should trigger for these situations.
>
> Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
>
We don't actually have any bug reports in front of us that show anything
else in the VM other than slub has issues with this configuration, so
marking them as broken is probably premature. The parisc config that
triggered this debugging enables CONFIG_SLAB by default, so it probably
has gone unnoticed just because nobody other than James has actually tried
it on hppa64.
Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the
simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over
his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY
so that CONFIG_SLUB initializes kmem_cache_node correctly works and then
address issues in the core VM as they arise. Presumably someone has been
running DISCONTIGMEM on hppa64 in the past five years without issues with
defconfig, so the issue here may just be slub.
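[Editorial sketch, not part of the original message.] The N_NORMAL_MEMORY point can be illustrated with a small user-space model. It assumes, as the message implies, that SLUB only sets up a per-node structure for nodes whose bit is set in that node state, so a pgdat the architecture never flags is left with no structure at all. All `model_` names and the bitmask layout are hypothetical and do not correspond to real kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NUMNODES 4  /* hypothetical node limit for the model */

/* Stand-in for the N_NORMAL_MEMORY node state: one bit per node id. */
struct model_node_state {
	unsigned long normal_memory;
};

/* What the arch setup code would do for each pgdat that has usable memory. */
static inline void model_node_set_normal(struct model_node_state *st, int nid)
{
	st->normal_memory |= 1UL << nid;
}

struct model_kmem_cache_node {
	int initialized;
};

struct model_kmem_cache {
	struct model_kmem_cache_node storage[MODEL_MAX_NUMNODES];
	struct model_kmem_cache_node *node[MODEL_MAX_NUMNODES];
};

/*
 * Set up per-node structures only for nodes flagged in the state mask.
 * Returns the number of nodes that received a structure; all others
 * are left as NULL pointers.
 */
static inline int model_init_cache_nodes(struct model_kmem_cache *s,
					 const struct model_node_state *st)
{
	int nid, count = 0;

	for (nid = 0; nid < MODEL_MAX_NUMNODES; nid++) {
		s->node[nid] = NULL;
		if (!(st->normal_memory & (1UL << nid)))
			continue;
		s->storage[nid].initialized = 1;
		s->node[nid] = &s->storage[nid];
		count++;
	}
	return count;
}
```

With only node 0 flagged, a later lookup for a page that lives in pgdat 2 finds a NULL pointer; flagging every populated pgdat before the cache is initialized removes that hole in the model.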
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-20 21:42 ` David Rientjes
2011-04-20 21:42 ` David Rientjes
@ 2011-04-21 16:06 ` James Bottomley
2011-04-21 22:19 ` David Rientjes
1 sibling, 1 reply; 15+ messages in thread
From: James Bottomley @ 2011-04-21 16:06 UTC (permalink / raw)
To: David Rientjes
Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Wed, 2011-04-20 at 14:42 -0700, David Rientjes wrote:
> On Wed, 20 Apr 2011, Christoph Lameter wrote:
>
> > There is barely any testing going on at all of this since we have had this
> > issue for more than 5 years and have not noticed it. The absence of bug
> > reports therefore proves nothing. Code inspection of the VM shows
> > that this is an issue that arises in multiple subsystems and that we have
> > VM_BUG_ONs in the page allocator that should trigger for these situations.
> >
> > Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such.
> >
>
> We don't actually have any bug reports in front of us that show anything
> else in the VM other than slub has issues with this configuration, so
> marking them as broken is probably premature. The parisc config that
> triggered this debugging enables CONFIG_SLAB by default, so it probably
> has gone unnoticed just because nobody other than James has actually tried
> it on hppa64.
>
> Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the
> simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over
> his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY
> so that CONFIG_SLUB initializes kmem_cache_node correctly works and then
> address issues in the core VM as they arise. Presumably someone has been
> running DISCONTIGMEM on hppa64 in the past five years without issues with
> defconfig, so the issue here may just be slub.
Actually, we can fix slub. As far as all my memory hammer tests go, the
one liner below is the actual fix (it just forces slub get_node() to
return the zero node always on !NUMA). That, as far as a code
inspection goes, seems to make SLUB as good as SLAB ... as long as
no-one uses hugepages or VM DEBUG, which, I think we've demonstrated, is
the case for all the current DISCONTIGMEM users.
I think either the above or just marking slub broken in DISCONTIGMEM &&
!NUMA is sufficient for stable. The fix is getting urgent, because
Debian (which is what most of our users are running) has made SLUB the
default allocator, which is why we're now starting to run into these
panic reports.
The set memory range fix looks good for a backport too ... at least the
page cache is now no longer reluctant to use my upper 1GB ...
I worry a bit more about backporting the selection of NUMA as a -stable
fix because it's a larger change (and requires changes to all the
architectures, since NUMA is an arch local Kconfig variable).
James
----
diff --git a/mm/slub.c b/mm/slub.c
index 94d2a33..243bd9c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -235,7 +235,11 @@ int slab_is_available(void)
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 {
+#ifdef CONFIG_NUMA
 	return s->node[node];
+#else
+	return s->node[0];
+#endif
 }
 
 /* Verify that a pointer has an address that is valid within a slab page */
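[Editorial sketch, not part of the original message.] A user-space model of what the one-liner changes, under the assumption that on a !NUMA build only the node 0 entry of the per-cache node array is ever populated. The `model_` types and the two lookup variants are hypothetical names used for contrast; the real patch selects between the two behaviours with CONFIG_NUMA rather than providing both.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NUMNODES 4  /* hypothetical node limit for the model */

struct model_kmem_cache_node {
	unsigned long nr_partial;
};

struct model_kmem_cache {
	/* On the modelled !NUMA build only node[0] is ever filled in. */
	struct model_kmem_cache_node *node[MODEL_MAX_NUMNODES];
};

/* Behaviour before the patch: trust the node id handed in by the caller. */
static inline struct model_kmem_cache_node *
model_get_node_unpatched(struct model_kmem_cache *s, int node)
{
	return s->node[node];
}

/* Behaviour of the !NUMA branch after the patch: always use node 0. */
static inline struct model_kmem_cache_node *
model_get_node_patched(struct model_kmem_cache *s, int node)
{
	(void)node;  /* the node id is deliberately ignored */
	return s->node[0];
}
```

For a page whose node id is 2, the unpatched lookup in this model hands back a NULL pointer that the caller would then dereference, while the patched lookup returns the single structure that actually exists.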
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-21 16:06 ` James Bottomley
@ 2011-04-21 22:19 ` David Rientjes
2011-04-21 22:19 ` David Rientjes
2011-04-21 22:31 ` James Bottomley
0 siblings, 2 replies; 15+ messages in thread
From: David Rientjes @ 2011-04-21 22:19 UTC (permalink / raw)
To: James Bottomley
Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Thu, 21 Apr 2011, James Bottomley wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index 94d2a33..243bd9c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -235,7 +235,11 @@ int slab_is_available(void)
>
> static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> {
> +#ifdef CONFIG_NUMA
> return s->node[node];
> +#else
> + return s->node[0];
> +#endif
> }
>
> /* Verify that a pointer has an address that is valid within a slab page */
Looks like parisc may have been just fine before 7340cc84141d (slub:
reduce differences between SMP and NUMA), which was merged into 2.6.37?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mm: make expand_downwards symmetrical to expand_upwards
2011-04-21 22:19 ` David Rientjes
2011-04-21 22:19 ` David Rientjes
@ 2011-04-21 22:31 ` James Bottomley
2011-04-21 22:31 ` James Bottomley
1 sibling, 1 reply; 15+ messages in thread
From: James Bottomley @ 2011-04-21 22:31 UTC (permalink / raw)
To: David Rientjes
Cc: Christoph Lameter, Pekka Enberg, Matthew Wilcox, KOSAKI Motohiro,
Michal Hocko, Andrew Morton, Hugh Dickins, linux-mm, LKML,
linux-parisc, Ingo Molnar, x86 maintainers, linux-arch,
Mel Gorman
On Thu, 2011-04-21 at 15:19 -0700, David Rientjes wrote:
> On Thu, 21 Apr 2011, James Bottomley wrote:
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 94d2a33..243bd9c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -235,7 +235,11 @@ int slab_is_available(void)
> >
> > static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> > {
> > +#ifdef CONFIG_NUMA
> > return s->node[node];
> > +#else
> > + return s->node[0];
> > +#endif
> > }
> >
> > /* Verify that a pointer has an address that is valid within a slab page */
>
> Looks like parisc may have been just fine before 7340cc84141d (slub:
> reduce differences between SMP and NUMA), which was merged into 2.6.37?
That's possible. I've had no bug reports from the Debian 2.6.32 kernel,
which is the only other one that has SLUB by default. The m68k guys
seem to think this is the cause of their problems too.
But the basic fact is that all our testing has been done on SLAB. It
wasn't until Debian asked us to look at a 2.6.38 kernel that I
accidentally picked up SLUB by importing their config into my build
environment.
James
^ permalink raw reply [flat|nested] 15+ messages in thread