From: Sudeep Holla <sudeep.holla@arm.com>
To: Steve Capper <steve.capper@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>,
Mark Brown <broonie@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Tony Luck <tony.luck@intel.com>,
Russell King <linux@arm.linux.org.uk>,
Kernel Build Reports Mailman List
<kernel-build-reports@lists.linaro.org>,
Mel Gorman <mel@csn.ul.ie>, Tyler Baker <Tyler.Baker@linaro.org>,
Dave Hansen <dave.hansen@intel.com>,
Kevin.Hilman@linaro.org, linux-next@vger.kernel.org,
Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Xishi Qiu <qiuxishi@huawei.com>,
Taku Izumi <izumi.taku@jp.fujitsu.com>,
Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
Date: Wed, 6 Jan 2016 10:32:20 +0000
Message-ID: <568CED34.5010905@arm.com>
In-Reply-To: <CAPvkgC3w0oy53HTwDtqEXKoabL44i06hAXndno-Ms83L0Tm0CA@mail.gmail.com>
On 05/01/16 19:59, Steve Capper wrote:
> On 5 January 2016 at 12:21, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 05/01/16 11:45, Mark Brown wrote:
>>>
>>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>>
>>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>>>>
>>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>>
>>>>>> Thanks. That patch has rather a blooper if
>>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>>
>>>>> Seems to be what's making a difference from a quick run through, yes.
>>>
>>>> OK, thanks.
>>>
>>> Seems like I was mistaken here somehow or there's some other problem -
>>> I've kicked off another bisect for today's -next:
>>>
>>>
>>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>>
>>> and will follow up with any results.
>>>
>>
>> With both patches applied (one already in today's -next), I am able to
>> boot on an ARM64 platform, but I get a huge number of the warnings below
>> (one for each pfn):
>>
>> -->8
>>
>> BUG: Bad page state in process swapper pfn:900000
>> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
>> flags: 0x0()
>> page dumped because: nonzero mapcount
>> Modules linked in:
>> Hardware name: ARM Juno development board (r0) (DT)
>> Call trace:
>> [<ffffffc000089830>] dump_backtrace+0x0/0x180
>> [<ffffffc0000899c4>] show_stack+0x14/0x20
>> [<ffffffc000335008>] dump_stack+0x90/0xc8
>> [<ffffffc0001531f8>] bad_page+0xd8/0x138
>> [<ffffffc000153470>] free_pages_prepare+0x218/0x290
>> [<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
>> [<ffffffc000155638>] __free_pages+0x30/0x50
>> [<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
>> [<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
>> [<ffffffc000925264>] mem_init+0x48/0x1b4
>> [<ffffffc0009217e0>] start_kernel+0x224/0x3b4
>> [<0000000080663000>] 0x80663000
>> Disabling lock debugging due to kernel taint
>>
>> --
>
> I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on
> an arm64 machine without errors with the following changes:
>
> =====================================
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a8bb70d..0edb608 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5013,6 +5013,15 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>  					unsigned long *zone_end_pfn,
>  					unsigned long *zones_size)
>  {
> +	unsigned int zone;
> +
> +	*zone_start_pfn = node_start_pfn;
> +	for (zone = 0; zone < zone_type; zone++) {
> +		*zone_start_pfn += zones_size[zone];
> +	}
> +
> +	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
> +
>  	return zones_size[zone_type];
>  }
> 
> @@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>  	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>  		(u64)start_pfn << PAGE_SHIFT,
>  		end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
> +#else
> +	start_pfn = node_start_pfn;
>  #endif
>  	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
>  				  zones_size, zholes_size);
>
> =====================================
>
> My understanding is that 904769a ("mm/page_alloc.c: calculate
> zone_start_pfn at zone_spanned_pages_in_node()") inadvertently
> discards information when pgdat->node_start_pfn is removed from
> free_area_init_core (and zone_start_pfn is no longer updated by "size"
> in the loop inside free_area_init_core). This isn't an issue on
> systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, as
> zone_start_pfn is set correctly there. On systems without
> CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
>
> When I ported the above fix to linux-next
> (8ef79cd05e6894c01ab9b41aa918a402fa8022a7), I was able to boot in a VM
> but not on my actual machine; I'll investigate that tomorrow.
>
It fixes the issue on real hardware too (Juno).
--
Regards,
Sudeep