From: Mike Rapoport <rppt@linux.ibm.com>
To: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH] mm, page_alloc: check pfn is valid before moving to freelist
Date: Wed, 13 Apr 2022 23:48:36 +0300
Message-ID: <Ylc3JGy6DUq00ryv@linux.ibm.com>
In-Reply-To: <fb3c8c008994b2ed96f74b6b9698ff998b689bd2.1649794059.git.quic_sudaraja@quicinc.com>

On Tue, Apr 12, 2022 at 01:16:23PM -0700, Sudarshan Rajagopalan wrote:
> Check if pfn is valid or not before moving it to freelist.
> 
> There is a possible scenario where a pageblock is partly a physical hole
> and partly System RAM. This happens when a base address in the RAM
> partition table is not aligned to the pageblock size.
> 
> Example:
> 
> Say we have these first two entries in the RAM partition table -
> 
> Base Addr: 0x0000000080000000 Length: 0x0000000058000000
> Base Addr: 0x00000000E3930000 Length: 0x0000000020000000

I wonder what was done to memory DIMMs to get such an interesting
physical memory layout...

> ...
> 
> Physical hole: 0xD8000000 - 0xE3930000
> 
> On a system with a 4K page size, and hence a 4MB pageblock size, the
> base address 0xE3930000 is not aligned to the 4MB pageblock size.
> 
> Now we will have a pageblock that is partly a physical hole and partly
> System RAM -
> 
> Pageblock [0xE3800000 - 0xE3C00000] -
> 	0xE3800000 - 0xE3930000 -- physical hole
> 	0xE3930000 - 0xE3C00000 -- System RAM
> 
> Now doing __alloc_pages, say we get a valid page with PFN 0xE3B00 from
> __rmqueue_fallback; we then try to put other pages from the same pageblock
> into the freelist as well by calling steal_suitable_fallback().
> 
> We then search for free pages from the start of the pageblock due to the
> code below -
> 
>  move_freepages_block(zone, page, migratetype, ...)
> {
>     pfn = page_to_pfn(page);
>     start_pfn = pfn & ~(pageblock_nr_pages - 1);
>     end_pfn = start_pfn + pageblock_nr_pages - 1;
> ...
> }
> 
> With a pageblock that has a partial physical hole at the beginning, we will
> run into PFNs from the physical hole whose struct page is not initialized and
> is invalid, and the system would crash as we operate on an invalid struct page
> to find out whether the page is in Buddy or LRU.
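
To make the arithmetic above concrete, here is a small userspace sketch (not
kernel code) that applies the same mask as the quoted move_freepages_block()
snippet to the numbers from the report. The macro values and the toy_* helper
names are assumptions made for illustration only; a 4MB pageblock with 4K
pages is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12	/* 4K pages, as stated in the report */
#define TOY_PAGEBLOCK_NR_PAGES	1024ULL	/* assumed: 4MB pageblock / 4K page */

/* The mask trick below only works for a power-of-two block size. */
static_assert((TOY_PAGEBLOCK_NR_PAGES & (TOY_PAGEBLOCK_NR_PAGES - 1)) == 0,
	      "pageblock size must be a power of two");

/* First PFN of the pageblock that contains pfn. */
static uint64_t toy_pageblock_start(uint64_t pfn)
{
	return pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

/* Last PFN (inclusive) of the pageblock that contains pfn. */
static uint64_t toy_pageblock_end(uint64_t pfn)
{
	return toy_pageblock_start(pfn) + TOY_PAGEBLOCK_NR_PAGES - 1;
}

/*
 * Number of PFNs in pfn's pageblock that lie below the first PFN of the
 * RAM region starting at ram_base, i.e. that fall into the hole before it.
 */
static uint64_t toy_hole_pfns_in_block(uint64_t pfn, uint64_t ram_base)
{
	uint64_t first_ram_pfn = ram_base >> TOY_PAGE_SHIFT;
	uint64_t start = toy_pageblock_start(pfn);
	uint64_t end = toy_pageblock_end(pfn);

	if (first_ram_pfn <= start)
		return 0;			/* block starts inside RAM */
	if (first_ram_pfn > end)
		return TOY_PAGEBLOCK_NR_PAGES;	/* whole block precedes RAM */
	return first_ram_pfn - start;
}
```

With PFN 0xE3B00 the block spans PFNs 0xE3800 to 0xE3BFF, and a RAM base of
0xE3930000 leaves the first 0x130 (304) PFNs of that block in the hole, which
is the range the loop in move_freepages() walks first.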

struct pages must be initialized and valid even for holes in the physical
memory. When a pageblock spans both existing memory and a hole, the struct
pages for the "hole" part should be marked as PG_reserved.

If you see that struct pages for memory holes exist but are invalid, we
should solve the underlying issue that causes the wrong struct page contents.
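
As a rough illustration of that expectation, below is a userspace toy model,
not the kernel's memmap code. struct toy_page, its two flags and both helpers
are hypothetical stand-ins for struct page, PG_reserved and PageBuddy(). The
point is only that when every entry of the pageblock is initialized and the
hole entries are marked reserved, a move_freepages()-style walk can test each
entry safely and needs no pfn_valid() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_PAGES 1024	/* assumed: one 4MB pageblock of 4K pages */

/* Hypothetical stand-in for struct page; only the two bits we need. */
struct toy_page {
	bool reserved;	/* models PG_reserved */
	bool buddy;	/* models PageBuddy() */
};

/*
 * Initialize every entry of the block. Entries below first_ram model the
 * hole and are marked reserved; the rest are treated as free order-0 pages.
 */
static void toy_init_block(struct toy_page *map, size_t first_ram)
{
	assert(first_ram <= TOY_NR_PAGES);

	memset(map, 0, TOY_NR_PAGES * sizeof(*map));
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		if (i < first_ram)
			map[i].reserved = true;
		else
			map[i].buddy = true;
	}
}

/*
 * Walk the whole block the way move_freepages() does. Every entry is
 * initialized, so reading its flags is safe; hole entries are simply
 * skipped because they are not buddy pages.
 */
static size_t toy_move_freepages(const struct toy_page *map)
{
	size_t moved = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		if (!map[i].buddy)
			continue;
		moved++;
	}
	return moved;
}
```

In this model a block whose first 0x130 entries are a hole yields
1024 - 0x130 = 720 moved pages, and the hole entries are never touched beyond
reading their (initialized) flags.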

> [  107.629453][ T9688] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [  107.639214][ T9688] Mem abort info:
> [  107.642829][ T9688]   ESR = 0x96000006
> [  107.646696][ T9688]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  107.652878][ T9688]   SET = 0, FnV = 0
> [  107.656751][ T9688]   EA = 0, S1PTW = 0
> [  107.660705][ T9688]   FSC = 0x06: level 2 translation fault
> [  107.666455][ T9688] Data abort info:
> [  107.670151][ T9688]   ISV = 0, ISS = 0x00000006
> [  107.674827][ T9688]   CM = 0, WnR = 0
> [  107.678615][ T9688] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098a237000
> [  107.685970][ T9688] [0000000000000000] pgd=0800000987170003, p4d=0800000987170003, pud=0800000987170003, pmd=0000000000000000
> [  107.697582][ T9688] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> 
> [  108.209839][ T9688] pc : move_freepages_block+0x174/0x27c

Can you post the scripts/faddr2line output for this address?

> [  108.215407][ T9688] lr : steal_suitable_fallback+0x20c/0x398
> 
> [  108.305908][ T9688] Call trace:
> [  108.309151][ T9688]  move_freepages_block+0x174/0x27c        [PageLRU]
> [  108.314359][ T9688]  steal_suitable_fallback+0x20c/0x398
> [  108.319826][ T9688]  rmqueue_bulk+0x250/0x934
> [  108.324325][ T9688]  rmqueue_pcplist+0x178/0x2ac
> [  108.329086][ T9688]  rmqueue+0x5c/0xc10
> [  108.333048][ T9688]  get_page_from_freelist+0x19c/0x430
> [  108.338430][ T9688]  __alloc_pages+0x134/0x424
> [  108.343017][ T9688]  page_cache_ra_unbounded+0x120/0x324
> [  108.348494][ T9688]  do_sync_mmap_readahead+0x1b0/0x234
> [  108.353878][ T9688]  filemap_fault+0xe0/0x4c8
> [  108.358375][ T9688]  do_fault+0x168/0x6cc
> [  108.362518][ T9688]  handle_mm_fault+0x5c4/0x848
> [  108.367280][ T9688]  do_page_fault+0x3fc/0x5d0
> [  108.371867][ T9688]  do_translation_fault+0x6c/0x1b0
> [  108.376985][ T9688]  do_mem_abort+0x68/0x10c
> [  108.381389][ T9688]  el0_ia+0x50/0xbc
> [  108.385175][ T9688]  el0t_32_sync_handler+0x88/0xbc
> [  108.390208][ T9688]  el0t_32_sync+0x1b8/0x1bc
> 
> Hence, avoid operating on invalid pages within the same pageblock by checking
> if pfn is valid or not.

> Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>
> Fixes: 4c7b9896621be ("mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE")
> Cc: Mike Rapoport <rppt@linux.ibm.com>

For now the patch looks like a band-aid for a more fundamental bug, so

NAKED-by: Mike Rapoport <rppt@linux.ibm.com>


> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/page_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6e5b448..e87aa053 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2521,6 +2521,11 @@ static int move_freepages(struct zone *zone,
>  	int pages_moved = 0;
>  
>  	for (pfn = start_pfn; pfn <= end_pfn;) {
> +		if (!pfn_valid(pfn)) {
> +			pfn++;
> +			continue;
> +		}
> +
>  		page = pfn_to_page(pfn);
>  		if (!PageBuddy(page)) {
>  			/*
> -- 
> 2.7.4
> 

-- 
Sincerely yours,
Mike.

