From: Mike Rapoport <rppt@linux.ibm.com>
To: "Łukasz Majczak" <lma@semihalf.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Radosław Biernacki" <rad@semihalf.com>,
"Marcin Wojtas" <mw@semihalf.com>,
"Alex Levin" <levinale@google.com>,
"Guenter Roeck" <groeck@google.com>,
"Jesse Barnes" <jsbarnes@google.com>,
"Chris Wilson" <chris@chris-wilson.co.uk>,
"Sarvela, Tomi P" <tomi.p.sarvela@intel.com>
Subject: Re: PROBLEM: Crash after mm: fix initialization of struct page for holes in memory layout
Date: Wed, 27 Jan 2021 20:26:51 +0200
Message-ID: <20210127182651.GA281042@linux.ibm.com>
In-Reply-To: <CAFJ_xbrwLwgDfCyHA=PmJ8j_3dJXqVNxmv7e+ATQAAa9n3de2w@mail.gmail.com>
Hi Lukasz,
On Wed, Jan 27, 2021 at 02:15:53PM +0100, Łukasz Majczak wrote:
> Hi Mike,
>
> I have started bisecting your patch and I have figured out that there
> might be something wrong with the clamping - with these lines commented
> out it started to work.
> The full log (with logs from the patch below) can be found here:
> https://gist.github.com/semihalf-majczak-lukasz/3cecbab0ddc59a6c3ce11ddc29645725
> It's fresh - I haven't analyzed it yet, just sharing in the hope it will help.
Thanks, that helps!
The kernel never considers the first page as memory, so
arch_zone_lowest_possible_pfn[ZONE_DMA] is set to 0x1000. As a result,
init_unavailable_mem() skips pfn 0, and then __SetPageReserved(page) in
reserve_bootmem_region() panics because the struct page for pfn 0 remains
poisoned.
Can you please try the below patch on top of v5.11-rc5?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 783913e41f65..3ce9ef238dfc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7083,10 +7083,11 @@ void __init free_area_init_memoryless_node(int nid)
 static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
 					 int zone, int nid)
 {
-	unsigned long pfn, zone_spfn, zone_epfn;
+	unsigned long pfn, zone_spfn = 0, zone_epfn;
 	u64 pgcnt = 0;
 
-	zone_spfn = arch_zone_lowest_possible_pfn[zone];
+	if (zone > 0)
+		zone_spfn = arch_zone_highest_possible_pfn[zone - 1];
 	zone_epfn = arch_zone_highest_possible_pfn[zone];
 
 	spfn = clamp(spfn, zone_spfn, zone_epfn);
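To illustrate why clamping against the zone's lowest possible pfn can drop
pfn 0, here is a rough user-space sketch, not kernel code. The zone limits
below are made-up values, and clamp_ul(), zone_start_old(), zone_start_new()
and covered() are hypothetical helpers that only mimic the clamping logic in
init_unavailable_range(); they do not exist in the kernel.

```c
/* User-space stand-in for the kernel's clamp() macro (assumes lo <= hi). */
static unsigned long clamp_ul(unsigned long val, unsigned long lo,
			      unsigned long hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Hypothetical zone limits in pfns; NOT taken from the crash log. */
#define NR_ZONES 2
static const unsigned long zone_lowest[NR_ZONES]  = { 0x1,    0x1000   };
static const unsigned long zone_highest[NR_ZONES] = { 0x1000, 0x100000 };

/* Old behaviour: the zone starts at its lowest possible pfn. */
static unsigned long zone_start_old(int zone)
{
	return zone_lowest[zone];
}

/*
 * Patched behaviour: zone 0 starts at pfn 0, every other zone starts
 * where the previous zone ends.
 */
static unsigned long zone_start_new(int zone)
{
	return zone > 0 ? zone_highest[zone - 1] : 0;
}

/*
 * Number of pfns in the hole [spfn, epfn) that survive clamping to the
 * zone [zs, ze), i.e. the pfns whose struct page would get initialized.
 */
static unsigned long covered(unsigned long spfn, unsigned long epfn,
			     unsigned long zs, unsigned long ze)
{
	spfn = clamp_ul(spfn, zs, ze);
	epfn = clamp_ul(epfn, zs, ze);
	return epfn > spfn ? epfn - spfn : 0;
}
```

With these made-up limits, a hole that consists only of pfn 0 is lost with
the old zone start, covered(0, 1, zone_start_old(0), zone_highest[0]) == 0,
and is kept with the patched one,
covered(0, 1, zone_start_new(0), zone_highest[0]) == 1.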
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index eed54ce26ad1..9f4468c413a1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7093,9 +7093,11 @@ static u64 __init
> init_unavailable_range(unsigned long spfn, unsigned long epfn,
> zone_spfn = arch_zone_lowest_possible_pfn[zone];
> zone_epfn = arch_zone_highest_possible_pfn[zone];
>
> - spfn = clamp(spfn, zone_spfn, zone_epfn);
> - epfn = clamp(epfn, zone_spfn, zone_epfn);
> -
> + //spfn = clamp(spfn, zone_spfn, zone_epfn);
> + //epfn = clamp(epfn, zone_spfn, zone_epfn);
> + pr_info("LMA DBG: zone_spfn: %lx, zone_epfn %lx\n", zone_spfn, zone_epfn);
> + pr_info("LMA DBG: spfn: %lx, epfn %lx\n", spfn, epfn);
> + pr_info("LMA DBG: clamp_spfn: %lx, clamp_epfn %lx\n", clamp(spfn, zone_spfn, zone_epfn), clamp(epfn, zone_spfn, zone_epfn));
> for (pfn = spfn; pfn < epfn; pfn++) {
> if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
>
> Best regards,
> Lukasz
>
>
> On Wed, 27 Jan 2021 at 13:15, Łukasz Majczak <lma@semihalf.com> wrote:
> >
> > Unfortunately nothing :( my current kernel command line contains:
> > console=ttyS0,115200n8 debug earlyprintk=serial loglevel=7
> >
> > I was thinking about using earlycon, but it seems to be blocked.
> > (I think the lack of earlycon might be related to Chromebook HW
> > security design. There is an EC controller which is a part of AP ->
> > serial chain as kernel messages are considered sensitive from a
> > security standpoint.)
> >
> > Best regards,
> > Lukasz
> >
> > On Wed, 27 Jan 2021 at 12:19, Mike Rapoport <rppt@linux.ibm.com> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 11:08:17AM +0100, Łukasz Majczak wrote:
> > > > Hi Mike,
> > > >
> > > > Actually I have a serial console attached (via servo device), but
> > > > there is no output :( and also the reboot/crash is very fast/immediate
> > > > after power on.
> > >
> > > If you boot with earlyprintk=serial are there any messages?
> > >
> > > > Best regards
> > > > Lukasz
> > > >
> > > > On Wed, 27 Jan 2021 at 11:05, Mike Rapoport <rppt@linux.ibm.com> wrote:
> > > > >
> > > > > Hi Lukasz,
> > > > >
> > > > > On Wed, Jan 27, 2021 at 10:22:29AM +0100, Łukasz Majczak wrote:
> > > > > > Crash after mm: fix initialization of struct page for holes in memory layout
> > > > > >
> > > > > > Hi,
> > > > > > > I was trying to run v5.11-rc5 on my Samsung Chromebook Pro (Caroline),
> > > > > > > but it crashed. Unfortunately this seems to happen at a very early
> > > > > > > stage, with no output to the console or to the screen, so I
> > > > > > > started a bisect (between 5.11-rc4, which works just fine, and
> > > > > > > 5.11-rc5).
> > > > > > > The bisect result points to:
> > > > > >
> > > > > > d3921cb8be29 mm: fix initialization of struct page for holes in memory layout
> > > > > >
> > > > > > Reproduction is just to build and load the kernel.
> > > > > >
> > > > > > > In case it helps, I am attaching:
> > > > > > - /proc/cpuinfo (from healthy system):
> > > > > > https://gist.github.com/semihalf-majczak-lukasz/3517867bf39f07377c1a785b64a97066
> > > > > > - my .config file (for a broken system):
> > > > > > https://gist.github.com/semihalf-majczak-lukasz/584b329f1bf3e43b53efe8e18b5da33c
> > > > > >
> > > > > > If there is anything I could add/do/test to help fix this please let me know.
> > > > >
> > > > > Chris Wilson also reported boot failures on several Chromebooks:
> > > > >
> > > > > https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com
> > > > >
> > > > > I presume serial console is not an option, so if you could boot with
> > > > > earlyprintk=vga and see if there is anything useful printed on the screen
> > > > > it would be really helpful.
> > > > >
> > > > > > Best regards
> > > > > > Lukasz
> > > > >
> > > > > --
> > > > > Sincerely yours,
> > > > > Mike.
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
--
Sincerely yours,
Mike.
Thread overview: 10+ messages
2021-01-27 9:22 PROBLEM: Crash after mm: fix initialization of struct page for holes in memory layout Łukasz Majczak
2021-01-27 10:04 ` Mike Rapoport
2021-01-27 10:08 ` Łukasz Majczak
2021-01-27 11:18 ` Mike Rapoport
2021-01-27 12:15 ` Łukasz Majczak
2021-01-27 13:15 ` Łukasz Majczak
2021-01-27 18:26 ` Mike Rapoport [this message]
2021-01-27 19:18 ` Łukasz Majczak
2021-01-28 2:45 ` Baoquan He
2021-01-28 9:31 ` Mike Rapoport