All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: Baoquan He <bhe@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, cai@lca.pw, mgorman@suse.de,
	mhocko@kernel.org
Subject: Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()
Date: Thu, 21 May 2020 12:26:12 +0300	[thread overview]
Message-ID: <20200521092612.GP1059226@linux.ibm.com> (raw)
In-Reply-To: <20200521014407.29690-1-bhe@redhat.com>

Hi Baoquan,

On Thu, May 21, 2020 at 09:44:07AM +0800, Baoquan He wrote:
> Qian reported that a crash happened in compaction.
> http://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
> 
V> LTP: starting swapping01 (swapping01 -i 5)
> page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
> flags: 0x1fffe000001000(reserved)
> raw: 001fffe000001000 ffffea0000aa0008 ffffea0000aa0008 0000000000000000
> raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
> page_owner info is not present (never set?)
> ------------[ cut here ]------------
> kernel BUG at mm/page_alloc.c:533!

...

> After investigation, it turns out that this is introduced by commit of
> linux-next: commit f6edbdb71877 ("mm: memmap_init: iterate over memblock
> regions rather that check each PFN").
> 
> After investigation, it turns out that this is introduced by commit of
> linux-next, the patch subject is:
>   "mm: memmap_init: iterate over memblock regions rather that check each PFN".
> 
> Qian added debugging code. The debugging log shows that the fault page is
> 0x2a800000. From the system e820 map which is pasted at bottom, the page
> is in e820 reserved range:
> 	BIOS-e820: [mem 0x0000000029ffe000-0x000000002a80afff] reserved
> And it's in section [0x28000000, 0x2fffffff]. In that secion, there are
> several usable ranges and some e820 reserved ranges.
> 
> For this kind of e820 reserved range, it won't be added to memblock allocator.
> However, init_unavailable_mem() will initialize to add them into node 0,
> zone 0. Before that commit, later, memmap_init() will add e820 reserved
> ranges into the zone where they are contained, because it can pass
> the checking of early_pfn_valid() and early_pfn_in_nid(). In this case,
> the e820 reserved range where fault page 0x2a800000 is located is added
> into DMA32 zone. After that commit, the e820 reserved rgions are kept
> in node 0, zone 0, since we iterate over memblock regions to iniatialize
> in memmap_init() instead, their node and zone won't be changed.

I wonder why woudn't we add the reserved memory to memblock from the
very beginning...
I've tried to undestand why this was not done, but I couldn't find the
reasoning behind that.

Can you please try the below patch and see if it helps or, on the
contrary, breaks anything else :)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..b0940c618ed9 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1301,8 +1301,11 @@ void __init e820__memblock_setup(void)
 		if (end != (resource_size_t)end)
 			continue;
 
-		if (entry->type == E820_TYPE_SOFT_RESERVED)
+		if (entry->type == E820_TYPE_SOFT_RESERVED ||
+		    entry->type == E820_TYPE_RESERVED) {
+			memblock_add(entry->addr, entry->size);
 			memblock_reserve(entry->addr, entry->size);
+		}
 
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;

> Now, fast_isolate_freepages() will use min mark directly as the migration
> target if no page is found from buddy. However, the min mark is not checked
> carefully, it could be in e820 reserved range, and trigger the
> VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) when try to mark it
> as skip.
> 
> Here, let's call pageblock_pfn_to_page() to get page of min_pfn, since it
> will do careful checks and return NULL if the pfn is not qualified.
> 
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000008bfff] usable
> [    0.000000] BIOS-e820: [mem 0x000000000008c000-0x000000000009ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000028328fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000028329000-0x0000000028568fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000028569000-0x0000000028d85fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000028d86000-0x0000000028ee5fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000028ee6000-0x0000000029a04fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000029a05000-0x0000000029a08fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000029a09000-0x0000000029ee4fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000029ee5000-0x0000000029ef2fff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000029ef3000-0x0000000029f22fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000029f23000-0x0000000029f23fff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000029f24000-0x0000000029f24fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000029f25000-0x0000000029f28fff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000029f29000-0x0000000029f29fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x0000000029f2a000-0x0000000029f2bfff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000029f2c000-0x0000000029f2cfff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000029f2d000-0x0000000029f2ffff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000029f30000-0x0000000029ffdfff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000029ffe000-0x000000002a80afff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000002a80b000-0x000000002a80efff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x000000002a80f000-0x000000002a81ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000002a820000-0x000000002a822fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x000000002a823000-0x0000000033a22fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000033a23000-0x0000000033a32fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000033a33000-0x0000000033a42fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x0000000033a43000-0x0000000033a52fff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000033a53000-0x0000000077ffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000078000000-0x000000008fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000087effffff] usable
> [    0.000000] BIOS-e820: [mem 0x000000087f000000-0x000000087fffffff] reserved
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/compaction.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 67fd317f78db..9ce4cff4d407 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1418,7 +1418,9 @@ fast_isolate_freepages(struct compact_control *cc)
>  				cc->free_pfn = highest;
>  			} else {
>  				if (cc->direct_compaction && pfn_valid(min_pfn)) {
> -					page = pfn_to_page(min_pfn);
> +					page = pageblock_pfn_to_page(min_pfn,
> +						pageblock_end_pfn(min_pfn),
> +						cc->zone);
>  					cc->free_pfn = min_pfn;
>  				}
>  			}
> -- 
> 2.17.2
> 

-- 
Sincerely yours,
Mike.


  reply	other threads:[~2020-05-21  9:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21  1:44 [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() Baoquan He
2020-05-21  9:26 ` Mike Rapoport [this message]
2020-05-21 15:52   ` Baoquan He
2020-05-21 17:18     ` Mike Rapoport
2020-05-22  7:01       ` Baoquan He
2020-05-22  7:25         ` Baoquan He
2020-05-22 14:20           ` Mike Rapoport
2020-05-26  8:45             ` Baoquan He
2020-05-26  8:55               ` David Hildenbrand
2020-05-26 11:32               ` Mike Rapoport
2020-05-26 11:49                 ` David Hildenbrand
2020-05-28  9:07                   ` Baoquan He
2020-05-28  9:08                     ` David Hildenbrand
2020-05-28 15:15                     ` Steve Wahl
2020-06-01 11:42                       ` Mike Rapoport
2020-06-01 13:31                         ` Baoquan He
2020-06-01 18:41                           ` [RFC PATCH] mm: Add a new page type to mark unavailable pages kbuild test robot
2020-06-01 19:07                           ` kbuild test robot
2020-06-03  1:28                             ` Baoquan He
2020-05-21  9:36 ` [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() Mel Gorman
2020-05-21 13:38   ` Mike Rapoport
2020-05-21 15:41   ` Baoquan He
  -- strict thread matches above, loose matches on Subject: below --
2020-05-28  8:59 [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()^[ Baoquan He
2020-05-28  8:59 ` [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() ^[ Baoquan He
2020-05-28  9:08 ` [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()^[ Baoquan He
2020-05-28  9:08   ` [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() ^[ Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200521092612.GP1059226@linux.ibm.com \
    --to=rppt@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.