* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
@ 2016-01-04 22:42 Mark Brown
2016-01-04 23:09 ` Andrew Morton
2016-01-06 0:22 ` Guenter Roeck
0 siblings, 2 replies; 14+ messages in thread
From: Mark Brown @ 2016-01-04 22:42 UTC (permalink / raw)
To: linux-arm-kernel
Since 20151231 -next has been failing to boot on a wide range of ARM
platforms in the kernelci.org boot tests[1]. Doing bisections with
Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
tree as the first broken commit[2,3]. An example bootlog from the
failure is:
http://storage.kernelci.org/next/next-20151231/arm-multi_v7_defconfig/lab-cambridge/boot-exynos5250-arndale.html
which shows no output on the console once we start the kernel; a brief
sampling of failing boards suggests this is the normal failure mode.
x86 and arm64 targets seem fine (juno shows up as failing but the boot
log seems fine so it's probably a false positive; Mustang was failing
already) and there are a small number of ARM platforms that boot. I've
not yet had any time to investigate further than that (including trying
a revert of that commit), sorry.
[1] http://kernelci.org/boot/all/job/next/kernel/next-20151231/
[2] https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/135/console
[3] https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/136/console
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-04 22:42 Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()" Mark Brown
@ 2016-01-04 23:09 ` Andrew Morton
2016-01-04 23:55 ` Mark Brown
2016-01-06 0:22 ` Guenter Roeck
1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2016-01-04 23:09 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 4 Jan 2016 22:42:33 +0000 Mark Brown <broonie@kernel.org> wrote:
> Since 20151231 -next has been failing to boot on a wide range of ARM
> platforms in the kernelci.org boot tests[1]. Doing bisections with
> Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
> calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
> tree as the first broken commit[2,3]. An example bootlog from the
> failure is:
>
> http://storage.kernelci.org/next/next-20151231/arm-multi_v7_defconfig/lab-cambridge/boot-exynos5250-arndale.html
>
> which shows no output on the console once we start the kernel, a brief
> sampling of failing boards suggests this is the normal failure mode.
> x86 and arm64 targets seem fine (juno shows up as failing but the boot
> log seems fine so it's probably a false positive, Mustang was failing
> already) and there are a small number of ARM platforms that boot. I've
> not yet had any time to investigate further than that (including trying
> a revert of that commit), sorry.
>
> [1] http://kernelci.org/boot/all/job/next/kernel/next-20151231/
> [2] https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/135/console
> [3] https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/136/console
Thanks. That patch has rather a blooper if
CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
Arnd's tentative fix is below.
I shall drop that patchset for now.
From: Arnd Bergmann <arnd@arndb.de>
Subject: mm/page_alloc.c: set a zone_start_pfn value in zone_spanned_pages_in_node
We got a new build warning in linux-next:
mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:5278:25: warning: 'zone_start_pfn' may be used uninitialized in this function [-Wmaybe-uninitialized]
zone->zone_start_pfn = zone_start_pfn;
mm/page_alloc.c:5265:17: note: 'zone_start_pfn' was declared here
unsigned long zone_start_pfn, zone_end_pfn;
The code indeed looks wrong, but this is just a guess at what the
fix might be: I have not looked at it in detail, so please treat this
as a bug report.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 ++
1 file changed, 2 insertions(+)
diff -puN mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix
+++ a/mm/page_alloc.c
@@ -5013,6 +5013,8 @@ static inline unsigned long __meminit zo
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	*zone_start_pfn = node_start_pfn;
+	*zone_end_pfn = node_end_pfn;
 	return zones_size[zone_type];
 }
_
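To make the effect of the two added lines concrete, here is a minimal user-space sketch, not kernel code. The sketch_ names and the two-zone layout are assumptions made for illustration; only the two assignments mirror the patch above. It suggests that the warning goes away because both out-parameters are now always written, while every zone is handed the pfn range of the whole node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone indices for this sketch; real kernels define more. */
enum { SKETCH_ZONE_NORMAL, SKETCH_ZONE_HIGHMEM, SKETCH_NR_ZONES };

/*
 * Model of the !CONFIG_HAVE_MEMBLOCK_NODE_MAP helper with the tentative
 * fix applied: both out-parameters are written on every call, but each
 * zone receives the start and end pfn of the entire node.
 */
static unsigned long sketch_spanned_tentative(unsigned long node_start_pfn,
					      unsigned long node_end_pfn,
					      unsigned int zone_type,
					      unsigned long *zone_start_pfn,
					      unsigned long *zone_end_pfn,
					      const unsigned long *zones_size)
{
	assert(zone_type < SKETCH_NR_ZONES);

	*zone_start_pfn = node_start_pfn;	/* same value for every zone */
	*zone_end_pfn = node_end_pfn;		/* same value for every zone */
	return zones_size[zone_type];
}
```

With a node spanning pfns 0x80000-0xc0000 and zone sizes of 0x30000 and 0x10000 (made-up numbers), both zones would report a start of 0x80000 in this model, so the second zone would overlap the first.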
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-04 23:09 ` Andrew Morton
@ 2016-01-04 23:55 ` Mark Brown
2016-01-05 0:35 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Mark Brown @ 2016-01-04 23:55 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
> On Mon, 4 Jan 2016 22:42:33 +0000 Mark Brown <broonie@kernel.org> wrote:
> > platforms in the kernelci.org boot tests[1]. Doing bisections with
> > Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
> > calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
> > tree as the first broken commit[2,3]. An example bootlog from the
> > failure is:
> Thanks. That patch has rather a blooper if
> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
Seems to be what's making a difference from a quick run through, yes.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-04 23:55 ` Mark Brown
@ 2016-01-05 0:35 ` Andrew Morton
2016-01-05 0:49 ` Stephen Rothwell
2016-01-05 11:45 ` Mark Brown
0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2016-01-05 0:35 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
> > On Mon, 4 Jan 2016 22:42:33 +0000 Mark Brown <broonie@kernel.org> wrote:
>
> > > platforms in the kernelci.org boot tests[1]. Doing bisections with
> > > Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
> > > calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
> > > tree as the first broken commit[2,3]. An example bootlog from the
> > > failure is:
>
> > Thanks. That patch has rather a blooper if
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>
> Seems to be what's making a difference from a quick run through, yes.
OK, thanks.
Stephen, can we please retain
mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
mm-introduce-kernelcore=mirror-option.patch
mm-introduce-kernelcore=mirror-option-fix.patch
mm-introduce-kernelcore=mirror-option-fix-2.patch
and add the below?
Or don't bother - I'll do an mmotm tomorrow with these in it.
I'd still like reviewing and testing from Taku Izumi please.
From: Arnd Bergmann <arnd@arndb.de>
Subject: mm/page_alloc.c: set a zone_start_pfn value in zone_spanned_pages_in_node
We got a new build warning in linux-next:
mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:5278:25: warning: 'zone_start_pfn' may be used uninitialized in this function [-Wmaybe-uninitialized]
zone->zone_start_pfn = zone_start_pfn;
mm/page_alloc.c:5265:17: note: 'zone_start_pfn' was declared here
unsigned long zone_start_pfn, zone_end_pfn;
The code indeed looks wrong, but this is just a guess at what the
fix might be: I have not looked at it in detail, so please treat this
as a bug report.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 ++
1 file changed, 2 insertions(+)
diff -puN mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix
+++ a/mm/page_alloc.c
@@ -5013,6 +5013,8 @@ static inline unsigned long __meminit zo
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	*zone_start_pfn = node_start_pfn;
+	*zone_end_pfn = node_end_pfn;
 	return zones_size[zone_type];
 }
_
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 0:35 ` Andrew Morton
@ 2016-01-05 0:49 ` Stephen Rothwell
2016-01-05 5:47 ` Stephen Rothwell
2016-01-05 11:45 ` Mark Brown
1 sibling, 1 reply; 14+ messages in thread
From: Stephen Rothwell @ 2016-01-05 0:49 UTC (permalink / raw)
To: linux-arm-kernel
Hi Andrew,
On Mon, 4 Jan 2016 16:35:28 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Stephen, can we please retain
>
> mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
> mm-introduce-kernelcore=mirror-option.patch
> mm-introduce-kernelcore=mirror-option-fix.patch
> mm-introduce-kernelcore=mirror-option-fix-2.patch
>
> and add the below?
Sure, that is easier than dropping the above patches, anyway.
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 0:49 ` Stephen Rothwell
@ 2016-01-05 5:47 ` Stephen Rothwell
2016-01-05 5:49 ` Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Stephen Rothwell @ 2016-01-05 5:47 UTC (permalink / raw)
To: linux-arm-kernel
Hi Andrew,
On Tue, 5 Jan 2016 11:49:18 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 4 Jan 2016 16:35:28 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Stephen, can we please retain
> >
> > mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
> > mm-introduce-kernelcore=mirror-option.patch
> > mm-introduce-kernelcore=mirror-option-fix.patch
> > mm-introduce-kernelcore=mirror-option-fix-2.patch
> >
> > and add the below?
>
> Sure, that is easier than dropping the above patches, anyway.
I have done that *except* that
mm-introduce-kernelcore=mirror-option-fix-2.patch is not in mmotm and I
cannot find it anywhere.
--
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 5:47 ` Stephen Rothwell
@ 2016-01-05 5:49 ` Andrew Morton
0 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2016-01-05 5:49 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 5 Jan 2016 16:47:16 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi Andrew,
>
> On Tue, 5 Jan 2016 11:49:18 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Mon, 4 Jan 2016 16:35:28 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > Stephen, can we please retain
> > >
> > > mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
> > > mm-introduce-kernelcore=mirror-option.patch
> > > mm-introduce-kernelcore=mirror-option-fix.patch
> > > mm-introduce-kernelcore=mirror-option-fix-2.patch
> > >
> > > and add the below?
> >
> > Sure, that is easier than dropping the above patches, anyway.
>
> I have done that *except* that
> mm-introduce-kernelcore=mirror-option-fix-2.patch is not in mmotm and I
> cannot find it anywhere.
Oops, sorry: I took it out, so it isn't in today's
http://ozlabs.org/~akpm/mmots/broken-out/. Here:
From: Arnd Bergmann <arnd@arndb.de>
Subject: mm: avoid unused variables in memmap_init_zone
A quick fix on mm/page_alloc.c introduced a harmless warning:
mm/page_alloc.c: In function 'memmap_init_zone':
mm/page_alloc.c:4617:44: warning: unused variable 'tmp' [-Wunused-variable]
mm/page_alloc.c:4617:26: warning: unused variable 'r' [-Wunused-variable]
This uses another #ifdef to avoid declaring the two variables when the
code is not built.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 ++
1 file changed, 2 insertions(+)
diff -puN mm/page_alloc.c~mm-introduce-kernelcore=mirror-option-fix-2 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-introduce-kernelcore=mirror-option-fix-2
+++ a/mm/page_alloc.c
@@ -4465,7 +4465,9 @@ void __meminit memmap_init_zone(unsigned
 	unsigned long pfn;
 	struct zone *z;
 	unsigned long nr_initialised = 0;
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	struct memblock_region *r = NULL, *tmp;
+#endif
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
_
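The pattern in this patch can be sketched outside the kernel as follows. This is only an illustration: SKETCH_HAVE_NODE_MAP is a hypothetical stand-in for CONFIG_HAVE_MEMBLOCK_NODE_MAP, and the region type and function are invented for the example. The point it tries to show is that locals used only inside a conditionally compiled block are declared under the same condition, so builds without the option never see an unused variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch; leave undefined to model the option being off. */
/* #define SKETCH_HAVE_NODE_MAP 1 */

/* Hypothetical region descriptor: covers pfns [base_pfn, end_pfn). */
struct sketch_region {
	unsigned long base_pfn;
	unsigned long end_pfn;
};

/*
 * Count the pfns in [start_pfn, end_pfn). With the switch defined, only
 * pfns covered by a region are counted; without it, every pfn counts.
 */
static unsigned long sketch_count_pfns(unsigned long start_pfn,
				       unsigned long end_pfn,
				       const struct sketch_region *regions,
				       size_t nr_regions)
{
	unsigned long pfn, counted = 0;
#ifdef SKETCH_HAVE_NODE_MAP
	/* Declared only when the code below that uses them is built. */
	const struct sketch_region *r = NULL;
	size_t i;
#endif

	assert(start_pfn <= end_pfn);
#ifndef SKETCH_HAVE_NODE_MAP
	(void)regions;		/* unused in this configuration */
	(void)nr_regions;
#endif

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
#ifdef SKETCH_HAVE_NODE_MAP
		if (!r || pfn >= r->end_pfn) {
			/* Look up the region containing this pfn, if any. */
			r = NULL;
			for (i = 0; i < nr_regions; i++) {
				if (pfn >= regions[i].base_pfn &&
				    pfn < regions[i].end_pfn) {
					r = &regions[i];
					break;
				}
			}
		}
		if (!r)
			continue;	/* pfn not covered by any region */
#endif
		counted++;
	}
	return counted;
}
```

Guarding the declarations keeps both configurations free of unused-variable warnings, at the cost of one more #ifdef in the function.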
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 0:35 ` Andrew Morton
2016-01-05 0:49 ` Stephen Rothwell
@ 2016-01-05 11:45 ` Mark Brown
2016-01-05 12:21 ` Sudeep Holla
1 sibling, 1 reply; 14+ messages in thread
From: Mark Brown @ 2016-01-05 11:45 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
> > On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
> > > Thanks. That patch has rather a blooper if
> > > CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
> > Seems to be what's making a difference from a quick run through, yes.
> OK, thanks.
Seems like I was mistaken here somehow or there's some other problem -
I've kicked off another bisect for today's -next:
https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
and will follow up with any results.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 11:45 ` Mark Brown
@ 2016-01-05 12:21 ` Sudeep Holla
2016-01-05 19:24 ` Mark Brown
2016-01-05 19:59 ` Steve Capper
0 siblings, 2 replies; 14+ messages in thread
From: Sudeep Holla @ 2016-01-05 12:21 UTC (permalink / raw)
To: linux-arm-kernel
On 05/01/16 11:45, Mark Brown wrote:
> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>
>>>> Thanks. That patch has rather a blooper if
>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>
>>> Seems to be what's making a difference from a quick run through, yes.
>
>> OK, thanks.
>
> Seems like I was mistaken here somehow or there's some other problem -
> I've kicked off another bisect for today's -next:
>
> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>
> and will follow up with any results.
>
With both patches applied (one already in today's -next), I am able to
boot on an ARM64 platform, but I get a huge load (one for each pfn) of the
warning below:
-->8
BUG: Bad page state in process swapper pfn:900000
page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
flags: 0x0()
page dumped because: nonzero mapcount
Modules linked in:
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc000089830>] dump_backtrace+0x0/0x180
[<ffffffc0000899c4>] show_stack+0x14/0x20
[<ffffffc000335008>] dump_stack+0x90/0xc8
[<ffffffc0001531f8>] bad_page+0xd8/0x138
[<ffffffc000153470>] free_pages_prepare+0x218/0x290
[<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
[<ffffffc000155638>] __free_pages+0x30/0x50
[<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
[<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
[<ffffffc000925264>] mem_init+0x48/0x1b4
[<ffffffc0009217e0>] start_kernel+0x224/0x3b4
[<0000000080663000>] 0x80663000
Disabling lock debugging due to kernel taint
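One possible reading of "count:0 mapcount:1 flags: 0x0()" is a struct page that is still all zeroes, i.e. one that was never initialised before being freed. The sketch below is a heavily simplified user-space model, not kernel code: the sketch_ names are invented, and it assumes the usual convention that the map count is stored as "mappings minus one", so an initialised page holds -1 and an all-zero page reads back as 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct sketch_page {
	unsigned long flags;
	int count;		/* models the reference count */
	int mapcount_raw;	/* models the map count, stored as mappings - 1 */
};

/* Reported map count: stored value plus one. */
static int sketch_page_mapcount(const struct sketch_page *page)
{
	return page->mapcount_raw + 1;
}

/* What proper initialisation would leave behind in this model. */
static void sketch_init_page(struct sketch_page *page)
{
	memset(page, 0, sizeof(*page));
	page->count = 1;
	page->mapcount_raw = -1;	/* no mappings */
}

/* Returns a reason string if the page looks bad on free, else NULL. */
static const char *sketch_free_check(const struct sketch_page *page)
{
	assert(page != NULL);
	if (sketch_page_mapcount(page) != 0)
		return "nonzero mapcount";
	return NULL;
}
```

In this model a zero-filled page fails the check with the same reason string as in the log, whereas a page that went through sketch_init_page() and then had its count dropped to 0 passes. Whether that is what happened on Juno is an inference, not something the log proves.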
--
Regards,
Sudeep
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 12:21 ` Sudeep Holla
@ 2016-01-05 19:24 ` Mark Brown
2016-01-05 19:59 ` Steve Capper
1 sibling, 0 replies; 14+ messages in thread
From: Mark Brown @ 2016-01-05 19:24 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Jan 05, 2016 at 12:21:51PM +0000, Sudeep Holla wrote:
> On 05/01/16 11:45, Mark Brown wrote:
> >On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
> >>On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
> >>>On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
> >>>>Thanks. That patch has rather a blooper if
> >>>>CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
> >>>Seems to be what's making a difference from a quick run through, yes.
> >>OK, thanks.
> >Seems like I was mistaken here somehow or there's some other problem -
> >I've kicked off another bisect for today's -next:
> > https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
> >and will follow up with any results.
> With both patches applied(one already in today's -next), I am able to
> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
Bisect on today's -next with Arndale (an ARM platform) flags the same
patch:
https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
as does Juno which is an arm64 platform:
https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/138/console
(it does get to a console but with lots of the backtraces Sudeep
indicated).
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 12:21 ` Sudeep Holla
2016-01-05 19:24 ` Mark Brown
@ 2016-01-05 19:59 ` Steve Capper
2016-01-06 10:32 ` Sudeep Holla
2016-01-06 15:56 ` Steve Capper
1 sibling, 2 replies; 14+ messages in thread
From: Steve Capper @ 2016-01-05 19:59 UTC (permalink / raw)
To: linux-arm-kernel
On 5 January 2016 at 12:21, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
> On 05/01/16 11:45, Mark Brown wrote:
>>
>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>
>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>>>
>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>
>>
>>>>> Thanks. That patch has rather a blooper if
>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>
>>
>>>> Seems to be what's making a difference from a quick run through, yes.
>>
>>
>>> OK, thanks.
>>
>>
>> Seems like I was mistaken here somehow or there's some other problem -
>> I've kicked off another bisect for today's -next:
>>
>>
>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>
>> and will follow up with any results.
>>
>
> With both patches applied(one already in today's -next), I am able to
> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
>
> -->8
>
> BUG: Bad page state in process swapper pfn:900000
> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
> flags: 0x0()
> page dumped because: nonzero mapcount
> Modules linked in:
> Hardware name: ARM Juno development board (r0) (DT)
> Call trace:
> [<ffffffc000089830>] dump_backtrace+0x0/0x180
> [<ffffffc0000899c4>] show_stack+0x14/0x20
> [<ffffffc000335008>] dump_stack+0x90/0xc8
> [<ffffffc0001531f8>] bad_page+0xd8/0x138
> [<ffffffc000153470>] free_pages_prepare+0x218/0x290
> [<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
> [<ffffffc000155638>] __free_pages+0x30/0x50
> [<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
> [<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
> [<ffffffc000925264>] mem_init+0x48/0x1b4
> [<ffffffc0009217e0>] start_kernel+0x224/0x3b4
> [<0000000080663000>] 0x80663000
> Disabling lock debugging due to kernel taint
>
> --
I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on
an arm64 machine without errors with the following changes:
=====================================
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8bb70d..0edb608 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5013,6 +5013,15 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	unsigned int zone;
+
+	*zone_start_pfn = node_start_pfn;
+	for (zone = 0; zone < zone_type; zone++) {
+		*zone_start_pfn += zones_size[zone];
+	}
+
+	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
+
 	return zones_size[zone_type];
 }
@@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
 		(u64)start_pfn << PAGE_SHIFT,
 		end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
+#else
+	start_pfn = node_start_pfn;
 #endif
 	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
 				  zones_size, zholes_size);
=====================================
My understanding is that 904769a ("mm/page_alloc.c: calculate
zone_start_pfn at zone_spanned_pages_in_node()") inadvertently
discards information when pgdat->node_start_pfn is removed from
free_area_init_core (and zone_start_pfn is no longer updated by "size"
in the loop inside free_area_init_core). This isn't an issue with
systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled as
zone_start_pfn is set correctly. On systems without
CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
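The accumulation described here can be tried in user space with a small sketch. It is not the kernel function: the sketch_ names, the zone limit and the example numbers are assumptions for illustration, while the loop itself follows the first hunk of the diff above. Each zone starts where the previous zones end, and its end is its start plus its own size.

```c
#include <assert.h>

/* Hypothetical upper bound on zone indices for this sketch. */
enum { SKETCH_MAX_ZONES = 4 };

/*
 * Model of the non-node-map path: derive a zone's pfn range from the
 * node's start pfn and the sizes of all lower-numbered zones.
 */
static unsigned long sketch_spanned_accumulate(unsigned long node_start_pfn,
					       unsigned int zone_type,
					       unsigned long *zone_start_pfn,
					       unsigned long *zone_end_pfn,
					       const unsigned long *zones_size)
{
	unsigned int zone;

	assert(zone_type < SKETCH_MAX_ZONES);

	/* Skip past every zone that sits below the requested one. */
	*zone_start_pfn = node_start_pfn;
	for (zone = 0; zone < zone_type; zone++)
		*zone_start_pfn += zones_size[zone];

	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];

	return zones_size[zone_type];
}
```

For a node starting at pfn 0x80000 with zone sizes 0x30000 and 0x10000 (made-up numbers), zone 0 comes out as 0x80000-0xb0000 and zone 1 as 0xb0000-0xc0000, so the zones are contiguous and do not overlap.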
When I ported the above fix to linux-next
(8ef79cd05e6894c01ab9b41aa918a402fa8022a7) I was able to boot in a VM
but not on my actual machine; I'll investigate that tomorrow.
Cheers,
--
Steve
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-04 22:42 Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()" Mark Brown
2016-01-04 23:09 ` Andrew Morton
@ 2016-01-06 0:22 ` Guenter Roeck
1 sibling, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2016-01-06 0:22 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jan 04, 2016 at 10:42:33PM +0000, Mark Brown wrote:
> Since 20151231 -next has been failing to boot on a wide range of ARM
> platforms in the kernelci.org boot tests[1]. Doing bisections with
> Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
> calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
> tree as the first broken commit[2,3]. An example bootlog from the
> failure is:
>
> http://storage.kernelci.org/next/next-20151231/arm-multi_v7_defconfig/lab-cambridge/boot-exynos5250-arndale.html
>
> which shows no output on the console once we start the kernel, a brief
> sampling of failing boards suggests this is the normal failure mode.
> x86 and arm64 targets seem fine (juno shows up as failing but the boot
> log seems fine so it's probably a false positive, Mustang was failing
> already) and there are a small number of ARM platforms that boot. I've
> not yet had any time to investigate further than that (including trying
> a revert of that commit), sorry.
>
I see the same problem with my qemu tests for arm.
Reverting the offending commit together with cd31fe16b585 ("mm/page_alloc.c:
set a zone_start_pfn value in zone_spanned_pages_in_node") fixes the problem.
Guenter
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 19:59 ` Steve Capper
@ 2016-01-06 10:32 ` Sudeep Holla
2016-01-06 15:56 ` Steve Capper
1 sibling, 0 replies; 14+ messages in thread
From: Sudeep Holla @ 2016-01-06 10:32 UTC (permalink / raw)
To: linux-arm-kernel
On 05/01/16 19:59, Steve Capper wrote:
> On 5 January 2016 at 12:21, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 05/01/16 11:45, Mark Brown wrote:
>>>
>>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>>
>>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>>>>
>>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>>
>>>
>>>>>> Thanks. That patch has rather a blooper if
>>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>>
>>>
>>>>> Seems to be what's making a difference from a quick run through, yes.
>>>
>>>
>>>> OK, thanks.
>>>
>>>
>>> Seems like I was mistaken here somehow or there's some other problem -
>>> I've kicked off another bisect for today's -next:
>>>
>>>
>>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>>
>>> and will follow up with any results.
>>>
>>
>> With both patches applied(one already in today's -next), I am able to
>> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
>>
>> -->8
>>
>> BUG: Bad page state in process swapper pfn:900000
>> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
>> flags: 0x0()
>> page dumped because: nonzero mapcount
>> Modules linked in:
>> Hardware name: ARM Juno development board (r0) (DT)
>> Call trace:
>> [<ffffffc000089830>] dump_backtrace+0x0/0x180
>> [<ffffffc0000899c4>] show_stack+0x14/0x20
>> [<ffffffc000335008>] dump_stack+0x90/0xc8
>> [<ffffffc0001531f8>] bad_page+0xd8/0x138
>> [<ffffffc000153470>] free_pages_prepare+0x218/0x290
>> [<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
>> [<ffffffc000155638>] __free_pages+0x30/0x50
>> [<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
>> [<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
>> [<ffffffc000925264>] mem_init+0x48/0x1b4
>> [<ffffffc0009217e0>] start_kernel+0x224/0x3b4
>> [<0000000080663000>] 0x80663000
>> Disabling lock debugging due to kernel taint
>>
>> --
>
> I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on
> an arm64 machine without errors with the following changes:
>
> =====================================
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a8bb70d..0edb608 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5013,6 +5013,15 @@ static inline unsigned long __meminit
> zone_spanned_pages_in_node(int nid,
> unsigned long *zone_end_pfn,
> unsigned long *zones_size)
> {
> + unsigned int zone;
> +
> + *zone_start_pfn = node_start_pfn;
> + for (zone = 0; zone < zone_type; zone++) {
> + *zone_start_pfn += zones_size[zone];
> + }
> +
> + *zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
> +
> return zones_size[zone_type];
> }
>
> @@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid,
> unsigned long *zones_size,
> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> (u64)start_pfn << PAGE_SHIFT,
> end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
> +#else
> + start_pfn = node_start_pfn;
> #endif
> calculate_node_totalpages(pgdat, start_pfn, end_pfn,
> zones_size, zholes_size);
>
> =====================================
>
> My understanding is that 904769a ("mm/page_alloc.c: calculate
> zone_start_pfn at zone_spanned_pages_in_node()") inadvertently
> discards information when pgdat->node_start_pfn is removed from
> free_area_init_core (and zone_start_pfn is no longer updated by "size"
> in the loop inside free_area_init_core). This isn't an issue with
> systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled as
> zone_start_pfn is set correctly. On systems without
> CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
>
> When I ported the above fix to linux-next
> (8ef79cd05e6894c01ab9b41aa918a402fa8022a7) I was able to boot in a VM
> but not on my actual machine, I'll investigate that tomorrow.
>
It fixes the issue on real hardware too (Juno).
--
Regards,
Sudeep
^ permalink raw reply [flat|nested] 14+ messages in thread
* Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
2016-01-05 19:59 ` Steve Capper
2016-01-06 10:32 ` Sudeep Holla
@ 2016-01-06 15:56 ` Steve Capper
1 sibling, 0 replies; 14+ messages in thread
From: Steve Capper @ 2016-01-06 15:56 UTC (permalink / raw)
To: linux-arm-kernel
On 5 January 2016 at 19:59, Steve Capper <steve.capper@linaro.org> wrote:
> On 5 January 2016 at 12:21, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 05/01/16 11:45, Mark Brown wrote:
>>>
>>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>>
>>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>>>>
>>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>>
>>>
>>>>>> Thanks. That patch has rather a blooper if
>>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>>
>>>
>>>>> Seems to be what's making a difference from a quick run through, yes.
>>>
>>>
>>>> OK, thanks.
>>>
>>>
>>> Seems like I was mistaken here somehow or there's some other problem -
>>> I've kicked off another bisect for today's -next:
>>>
>>>
>>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>>
>>> and will follow up with any results.
>>>
>>
>> With both patches applied(one already in today's -next), I am able to
>> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
>>
>> -->8
>>
>> BUG: Bad page state in process swapper pfn:900000
>> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
>> flags: 0x0()
>> page dumped because: nonzero mapcount
>> Modules linked in:
>> Hardware name: ARM Juno development board (r0) (DT)
>> Call trace:
>> [<ffffffc000089830>] dump_backtrace+0x0/0x180
>> [<ffffffc0000899c4>] show_stack+0x14/0x20
>> [<ffffffc000335008>] dump_stack+0x90/0xc8
>> [<ffffffc0001531f8>] bad_page+0xd8/0x138
>> [<ffffffc000153470>] free_pages_prepare+0x218/0x290
>> [<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
>> [<ffffffc000155638>] __free_pages+0x30/0x50
>> [<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
>> [<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
>> [<ffffffc000925264>] mem_init+0x48/0x1b4
>> [<ffffffc0009217e0>] start_kernel+0x224/0x3b4
>> [<0000000080663000>] 0x80663000
>> Disabling lock debugging due to kernel taint
>>
>> --
>
> I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on
> an arm64 machine without errors with the following changes:
>
> =====================================
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a8bb70d..0edb608 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5013,6 +5013,15 @@ static inline unsigned long __meminit
> zone_spanned_pages_in_node(int nid,
> unsigned long *zone_end_pfn,
> unsigned long *zones_size)
> {
> + unsigned int zone;
> +
> + *zone_start_pfn = node_start_pfn;
> + for (zone = 0; zone < zone_type; zone++) {
> + *zone_start_pfn += zones_size[zone];
> + }
> +
> + *zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
> +
> return zones_size[zone_type];
> }
>
> @@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid,
> unsigned long *zones_size,
> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> (u64)start_pfn << PAGE_SHIFT,
> end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
> +#else
> + start_pfn = node_start_pfn;
> #endif
> calculate_node_totalpages(pgdat, start_pfn, end_pfn,
> zones_size, zholes_size);
>
> =====================================
>
> My understanding is that 904769a ("mm/page_alloc.c: calculate
> zone_start_pfn at zone_spanned_pages_in_node()") inadvertently
> discards information when pgdat->node_start_pfn is removed from
> free_area_init_core (and zone_start_pfn is no longer updated by "size"
> in the loop inside free_area_init_core). This isn't an issue with
> systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled as
> zone_start_pfn is set correctly. On systems without
> CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
>
> When I ported the above fix to linux-next
> (8ef79cd05e6894c01ab9b41aa918a402fa8022a7) I was able to boot in a VM
> but not on my actual machine, I'll investigate that tomorrow.
I got 8ef79cd05e6894c01ab9b41aa918a402fa8022a7 working on my main
devboard with the fix above. (It turned out that I had a misconfigured
earlycon command line option and this became a problem with one of the
pl011 patches.)
Cheers,
--
Steve
^ permalink raw reply [flat|nested] 14+ messages in thread