Re: Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"

From: Sudeep Holla <sudeep.holla@arm.com>
To: Steve Capper <steve.capper@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>,
	Mark Brown <broonie@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Tony Luck <tony.luck@intel.com>,
	Russell King <linux@arm.linux.org.uk>,
	Kernel Build Reports Mailman List
	<kernel-build-reports@lists.linaro.org>,
	Mel Gorman <mel@csn.ul.ie>, Tyler Baker <Tyler.Baker@linaro.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Kevin.Hilman@linaro.org, linux-next@vger.kernel.org,
	Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Xishi Qiu <qiuxishi@huawei.com>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>,
	Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
Date: Wed, 6 Jan 2016 10:32:20 +0000
Message-ID: <568CED34.5010905@arm.com>
In-Reply-To: <CAPvkgC3w0oy53HTwDtqEXKoabL44i06hAXndno-Ms83L0Tm0CA@mail.gmail.com>



On 05/01/16 19:59, Steve Capper wrote:
> On 5 January 2016 at 12:21, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 05/01/16 11:45, Mark Brown wrote:
>>>
>>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>>
>>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>>>>
>>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>>
>>>
>>>>>> Thanks.  That patch has rather a blooper if
>>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n.  Is that the case in your testing?
>>>
>>>
>>>>> Seems to be what's making a difference from a quick run through, yes.
>>>
>>>
>>>> OK, thanks.
>>>
>>>
>>> Seems like I was mistaken here somehow or there's some other problem -
>>> I've kicked off another bisect for today's -next:
>>>
>>>
>>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>>
>>> and will follow up with any results.
>>>
>>
>> With both patches applied (one already in today's -next), I am able to
>> boot on an ARM64 platform, but I get a huge number of the warning below
>> (one for each pfn):
>>
>> -->8
>>
>> BUG: Bad page state in process swapper  pfn:900000
>> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
>> flags: 0x0()
>> page dumped because: nonzero mapcount
>> Modules linked in:
>> Hardware name: ARM Juno development board (r0) (DT)
>> Call trace:
>> [<ffffffc000089830>] dump_backtrace+0x0/0x180
>> [<ffffffc0000899c4>] show_stack+0x14/0x20
>> [<ffffffc000335008>] dump_stack+0x90/0xc8
>> [<ffffffc0001531f8>] bad_page+0xd8/0x138
>> [<ffffffc000153470>] free_pages_prepare+0x218/0x290
>> [<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
>> [<ffffffc000155638>] __free_pages+0x30/0x50
>> [<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
>> [<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
>> [<ffffffc000925264>] mem_init+0x48/0x1b4
>> [<ffffffc0009217e0>] start_kernel+0x224/0x3b4
>> [<0000000080663000>] 0x80663000
>> Disabling lock debugging due to kernel taint
>>
>> --
>
> I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on
> an arm64 machine without errors with the following changes:
>
> =====================================
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a8bb70d..0edb608 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5013,6 +5013,15 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
>                                          unsigned long *zone_end_pfn,
>                                          unsigned long *zones_size)
>   {
> +       unsigned int zone;
> +
> +       *zone_start_pfn = node_start_pfn;
> +       for (zone = 0; zone < zone_type; zone++) {
> +               *zone_start_pfn += zones_size[zone];
> +       }
> +
> +       *zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
> +
>          return zones_size[zone_type];
>   }
>
> @@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>          pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>                  (u64)start_pfn << PAGE_SHIFT,
>                  end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
> +#else
> +       start_pfn = node_start_pfn;
>   #endif
>          calculate_node_totalpages(pgdat, start_pfn, end_pfn,
>                                    zones_size, zholes_size);
>
> =====================================
>
> My understanding is that 904769a ("mm/page_alloc.c: calculate
> zone_start_pfn at zone_spanned_pages_in_node()") inadvertently
> discards information when pgdat->node_start_pfn is removed from
> free_area_init_core (and zone_start_pfn is no longer updated by "size"
> in the loop inside free_area_init_core). This isn't an issue with
> systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled as
> zone_start_pfn is set correctly. On systems without
> CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
>
> When I ported the above fix to linux-next
> (8ef79cd05e6894c01ab9b41aa918a402fa8022a7) I was able to boot in a VM
> but not on my actual machine. I'll investigate that tomorrow.
>

It fixes the issue on real hardware too (Juno).
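
For anyone following along, the arithmetic that the quoted hunk restores
for the !CONFIG_HAVE_MEMBLOCK_NODE_MAP case can be sketched in plain
userspace C. This is only an illustration of the patch above, not the
kernel code: zone_span() and SKETCH_NR_ZONES are made-up names, and it
assumes zones are laid out back to back starting at node_start_pfn.

```c
#include <assert.h>

/* Hypothetical zone count, chosen only for this sketch. */
#define SKETCH_NR_ZONES 3

/*
 * Mirror of the logic in the quoted hunk: a zone starts where the
 * previous zones of the node end, and spans zones_size[zone_type] pages.
 * Returns the number of pages spanned by the zone.
 */
static unsigned long zone_span(unsigned long node_start_pfn,
                               unsigned int zone_type,
                               const unsigned long *zones_size,
                               unsigned long *zone_start_pfn,
                               unsigned long *zone_end_pfn)
{
        unsigned int zone;

        assert(zone_type < SKETCH_NR_ZONES);

        /* Begin at the node's first pfn rather than at 0. */
        *zone_start_pfn = node_start_pfn;

        /* Skip over every zone that precedes the requested one. */
        for (zone = 0; zone < zone_type; zone++)
                *zone_start_pfn += zones_size[zone];

        *zone_end_pfn = *zone_start_pfn + zones_size[zone_type];

        return zones_size[zone_type];
}
```

With node_start_pfn = 0x80000 and zones_size = {0x1000, 0x2000, 0}
(arbitrary values), zone 1 comes out as [0x81000, 0x83000). If the start
is left at 0, as Steve describes for the unpatched code, the zone no
longer lines up with where the node's memory actually is.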

-- 
Regards,
Sudeep
