* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:17 ` Mike Kravetz
0 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-12-13 23:17 UTC (permalink / raw)
To: Arnd Bergmann
Cc: cbe-oss-dev, linuxppc-dev, linux-mm, Benjamin Herrenschmidt,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote:
> After a lot of debugging in spufs, I found that a crash that we encountered
> on Cell actually was caused by a change in the memory management.
>
> The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43,
> and this one has been discussed back and forth, but I fear that the current
> version may be broken for all setups that do memory hotplug with sparsemen
> and NUMA, at least on powerpc.
I believe you are correct. At least the memory hotplug code for powerpc
is currently broken (caused crash!).
> - both early_pfn_{in,to}_nid and early_node_map are in the __init
> section and may already have been freed at the time we are calling
> memmap_init_zone().
Well that is the root of the problem for powerpc. I believe that
__meminit attribute on memmap_init_zone() has no definition if
CONFIG_MEMORY_HOTPLUG is defined. This is so that it can be called
after boot. But, this also implies that memmap_init_zone() can not
call any routines in the __init section.
> The patch below is not a suggested fix that I want to get into mainline
> (checking slab_is_available is the wrong here), but it is a quick fix
> that you should apply if you want to run a recent (post-2.6.18) kernel
> on the IBM QS20 blade. I'm sorry for not having reported this earlier,
> but we were always trying to find the problem in my own code...
Thanks for the debug work! Just curious if you really need
CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
those types of memory layouts? If not, an easy/immediate fix for you
might be to simply turn off the option.
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!early_pfn_valid(pfn))
> continue;
> - if (!early_pfn_in_nid(pfn, nid))
> + if (!slab_is_available() &&
> + !early_pfn_in_nid(pfn, nid))
> continue;
> page = pfn_to_page(pfn);
> set_page_links(page, zone, nid, pfn);
I know you don't recommend this as a fix, but it has the interesting
quality of doing exactly what we want for powerpc. When
slab_is_available() we are performing a 'memory add' operation
and there is no need to do the 'pfn_in_nid' check. We know that
the range of added pages will all be on the same (passed) nid.
--
Mike
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:17 ` Mike Kravetz
0 siblings, 0 replies; 15+ messages in thread
From: Mike Kravetz @ 2006-12-13 23:17 UTC (permalink / raw)
To: Arnd Bergmann
Cc: cbe-oss-dev, linuxppc-dev, linux-mm, Benjamin Herrenschmidt,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote:
> After a lot of debugging in spufs, I found that a crash that we encountered
> on Cell actually was caused by a change in the memory management.
>
> The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43,
> and this one has been discussed back and forth, but I fear that the current
> version may be broken for all setups that do memory hotplug with sparsemen
> and NUMA, at least on powerpc.
I believe you are correct. At least the memory hotplug code for powerpc
is currently broken (caused crash!).
> - both early_pfn_{in,to}_nid and early_node_map are in the __init
> section and may already have been freed at the time we are calling
> memmap_init_zone().
Well that is the root of the problem for powerpc. I believe that
__meminit attribute on memmap_init_zone() has no definition if
CONFIG_MEMORY_HOTPLUG is defined. This is so that it can be called
after boot. But, this also implies that memmap_init_zone() can not
call any routines in the __init section.
> The patch below is not a suggested fix that I want to get into mainline
> (checking slab_is_available is the wrong here), but it is a quick fix
> that you should apply if you want to run a recent (post-2.6.18) kernel
> on the IBM QS20 blade. I'm sorry for not having reported this earlier,
> but we were always trying to find the problem in my own code...
Thanks for the debug work! Just curious if you really need
CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
those types of memory layouts? If not, an easy/immediate fix for you
might be to simply turn off the option.
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!early_pfn_valid(pfn))
> continue;
> - if (!early_pfn_in_nid(pfn, nid))
> + if (!slab_is_available() &&
> + !early_pfn_in_nid(pfn, nid))
> continue;
> page = pfn_to_page(pfn);
> set_page_links(page, zone, nid, pfn);
I know you don't recommend this as a fix, but it has the interesting
quality of doing exactly what we want for powerpc. When
slab_is_available() we are performing a 'memory add' operation
and there is no need to do the 'pfn_in_nid' check. We know that
the range of added pages will all be on the same (passed) nid.
--
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Bug: early_pfn_in_nid() called when not early
2006-12-13 23:17 ` Mike Kravetz
(?)
@ 2006-12-13 23:21 ` Benjamin Herrenschmidt
-1 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-12-13 23:21 UTC (permalink / raw)
To: Mike Kravetz
Cc: Andrew Morton, Arnd Bergmann, linux-mm, linux-kernel, hch,
linuxppc-dev, Paul Mackerras, Michael Kravetz, cbe-oss-dev
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
Yes, we need that for some pSeries boxes.
Ben.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:21 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-12-13 23:21 UTC (permalink / raw)
To: Mike Kravetz
Cc: Arnd Bergmann, cbe-oss-dev, linuxppc-dev, linux-mm,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
Yes, we need that for some pSeries boxes.
Ben.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:21 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2006-12-13 23:21 UTC (permalink / raw)
To: Mike Kravetz
Cc: Arnd Bergmann, cbe-oss-dev, linuxppc-dev, linux-mm,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
Yes, we need that for some pSeries boxes.
Ben.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
2006-12-13 23:17 ` Mike Kravetz
(?)
@ 2006-12-13 23:37 ` Arnd Bergmann
-1 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2006-12-13 23:37 UTC (permalink / raw)
To: Mike Kravetz
Cc: Andrew Morton, linux-mm, linux-kernel, hch, linuxppc-dev,
Paul Mackerras, Michael Kravetz, cbe-oss-dev
On Thursday 14 December 2006 00:17, Mike Kravetz wrote:
> Thanks for the debug work! =A0Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? =A0Can you get
> those types of memory layouts? =A0If not, an easy/immediate fix for you
> might be to simply turn off the option.
We want to be able to run the same kernel as pseries, e.g. on
distributions that ship only one kernel. Cell should be able to
live without CONFIG_NODES_SPAN_OTHER_NODES, but that does not solve
the problem for us.
> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> > =A0=A0=A0=A0=A0=A0for (pfn =3D start_pfn; pfn < end_pfn; pfn++) {
> > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0if (!early_pfn_valid(pfn))
> > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0conti=
nue;
> > -=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0if (!early_pfn_in_nid(pfn, nid))
> > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0if (!slab_is_available() &&
> > +=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 =A0 =A0!early_pfn_in_nid(pfn, =
nid))
> > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0conti=
nue;
> > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0page =3D pfn_to_page(pfn);
> > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0set_page_links(page, zone, ni=
d, pfn);
>=20
> I know you don't recommend this as a fix, but it has the interesting
> quality of doing exactly what we want for powerpc. =A0When
> slab_is_available() we are performing a 'memory add' operation
> and there is no need to do the 'pfn_in_nid' check. =A0We know that
> the range of added pages will all be on the same (passed) nid.
Right, that was at least my intention, though I wasn't sure
if pseries can live without the check.
Arnd <><
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:37 ` Arnd Bergmann
0 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2006-12-13 23:37 UTC (permalink / raw)
To: Mike Kravetz
Cc: cbe-oss-dev, linuxppc-dev, linux-mm, Benjamin Herrenschmidt,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
On Thursday 14 December 2006 00:17, Mike Kravetz wrote:
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
We want to be able to run the same kernel as pseries, e.g. on
distributions that ship only one kernel. Cell should be able to
live without CONFIG_NODES_SPAN_OTHER_NODES, but that does not solve
the problem for us.
> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > if (!early_pfn_valid(pfn))
> > continue;
> > - if (!early_pfn_in_nid(pfn, nid))
> > + if (!slab_is_available() &&
> > + !early_pfn_in_nid(pfn, nid))
> > continue;
> > page = pfn_to_page(pfn);
> > set_page_links(page, zone, nid, pfn);
>
> I know you don't recommend this as a fix, but it has the interesting
> quality of doing exactly what we want for powerpc. When
> slab_is_available() we are performing a 'memory add' operation
> and there is no need to do the 'pfn_in_nid' check. We know that
> the range of added pages will all be on the same (passed) nid.
Right, that was at least my intention, though I wasn't sure
if pseries can live without the check.
Arnd <><
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:37 ` Arnd Bergmann
0 siblings, 0 replies; 15+ messages in thread
From: Arnd Bergmann @ 2006-12-13 23:37 UTC (permalink / raw)
To: Mike Kravetz
Cc: cbe-oss-dev, linuxppc-dev, linux-mm, Benjamin Herrenschmidt,
Andy Whitcroft, Michael Kravetz, hch, Jeremy Kerr, linux-kernel,
Paul Mackerras, Andrew Morton
On Thursday 14 December 2006 00:17, Mike Kravetz wrote:
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
We want to be able to run the same kernel as pseries, e.g. on
distributions that ship only one kernel. Cell should be able to
live without CONFIG_NODES_SPAN_OTHER_NODES, but that does not solve
the problem for us.
> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> > for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > if (!early_pfn_valid(pfn))
> > continue;
> > - if (!early_pfn_in_nid(pfn, nid))
> > + if (!slab_is_available() &&
> > + !early_pfn_in_nid(pfn, nid))
> > continue;
> > page = pfn_to_page(pfn);
> > set_page_links(page, zone, nid, pfn);
>
> I know you don't recommend this as a fix, but it has the interesting
> quality of doing exactly what we want for powerpc. When
> slab_is_available() we are performing a 'memory add' operation
> and there is no need to do the 'pfn_in_nid' check. We know that
> the range of added pages will all be on the same (passed) nid.
Right, that was at least my intention, though I wasn't sure
if pseries can live without the check.
Arnd <><
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
2006-12-13 23:17 ` Mike Kravetz
(?)
@ 2006-12-13 23:58 ` Paul Mackerras
-1 siblings, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2006-12-13 23:58 UTC (permalink / raw)
To: Mike Kravetz
Cc: Andrew Morton, Arnd Bergmann, linux-mm, linux-kernel, hch,
linuxppc-dev, Michael Kravetz, cbe-oss-dev
Mike Kravetz writes:
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
We really need CONFIG_NODES_SPAN_OTHER_NODES for pSeries. Since we
can build a single kernel binary that runs on both Cell and pSeries,
the Cell code needs to be able to work with that option turned on.
Paul.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:58 ` Paul Mackerras
0 siblings, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2006-12-13 23:58 UTC (permalink / raw)
To: Mike Kravetz
Cc: Arnd Bergmann, cbe-oss-dev, linuxppc-dev, linux-mm,
Benjamin Herrenschmidt, Andy Whitcroft, Michael Kravetz, hch,
Jeremy Kerr, linux-kernel, Andrew Morton
Mike Kravetz writes:
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
We really need CONFIG_NODES_SPAN_OTHER_NODES for pSeries. Since we
can build a single kernel binary that runs on both Cell and pSeries,
the Cell code needs to be able to work with that option turned on.
Paul.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Bug: early_pfn_in_nid() called when not early
@ 2006-12-13 23:58 ` Paul Mackerras
0 siblings, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2006-12-13 23:58 UTC (permalink / raw)
To: Mike Kravetz
Cc: Arnd Bergmann, cbe-oss-dev, linuxppc-dev, linux-mm,
Benjamin Herrenschmidt, Andy Whitcroft, Michael Kravetz, hch,
Jeremy Kerr, linux-kernel, Andrew Morton
Mike Kravetz writes:
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.
We really need CONFIG_NODES_SPAN_OTHER_NODES for pSeries. Since we
can build a single kernel binary that runs on both Cell and pSeries,
the Cell code needs to be able to work with that option turned on.
Paul.
^ permalink raw reply [flat|nested] 15+ messages in thread