From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	mgorman@suse.de, zhlcindy@linux.vnet.ibm.com
Subject: Re: [RFC] mm: Fix memory corruption caused by deferred page initialization
Date: Mon, 28 Mar 2016 00:48:27 +1100	[thread overview]
Message-ID: <20160327134827.GA24644@gwshan> (raw)
In-Reply-To: <20160326133708.GA382@gwshan>

On Sun, Mar 27, 2016 at 12:37:09AM +1100, Gavin Shan wrote:
>On Sat, Mar 26, 2016 at 08:47:17PM +1100, Michael Ellerman wrote:
>>Hi Gavin,
>>
>>On Fri, 2016-25-03 at 16:05:29 UTC, Gavin Shan wrote:
>>> During deferred page initialization, the pages are moved from memblock
>>> or bootmem to buddy allocator without checking they were reserved. Those
>>> reserved pages can be reallocated to somebody else by buddy/slab allocator.
>>> It leads to memory corruption and potential kernel crash eventually.
>>
>>Can you give me a bit more detail on what the bug is?
>>
>>I haven't seen any issues on my systems, but I realise now I haven't enabled
>>DEFERRED_STRUCT_PAGE_INIT - I assumed it was enabled by default.
>>
>>How did this get tested before submission?
>>
>
>Michael, I am replying with the same context as in another thread, in case
>somebody else wants to understand more: Li, who is in the cc list, is
>backporting deferred page initialization (CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>from the upstream kernel to the RHEL 7.2 or 7.3 kernel (3.10.0-357.el7). The RHEL
>kernel has (!CONFIG_NO_BOOTMEM && CONFIG_DEFERRED_STRUCT_PAGE_INIT), meaning
>bootmem is enabled. She eventually ran into a kernel crash, and I jumped
>in to help understand the root cause.
>
>There are two related kernel config options: ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
>and DEFERRED_STRUCT_PAGE_INIT. The former is enabled on PPC by default;
>the latter isn't.
>
>I had two test cases:
>
>- With (!CONFIG_NO_BOOTMEM && CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>on the PowerNV platform, the upstream kernel (4.5-rc7) and an additional patch to
>support bootmem, as it was removed from powerpc a while ago.
>
>- With (CONFIG_NO_BOOTMEM && CONFIG_DEFERRED_STRUCT_PAGE_INIT) on the PowerNV platform
>and the upstream kernel (4.5-rc7), I dumped the reserved memblock regions and added a
>printk in deferred_init_memmap() to check whether memblock-reserved PFN 0x1fff80 (one
>page in memblock reserved region #31, refer to the below kernel log) is released
>to the buddy allocator during deferred page struct initialization. I did
>see that PFN released to the buddy allocator at that time. However, I didn't see a
>kernel crash, which may be down to luck and to how deferred page struct initialization
>is currently implemented: the pages in region [0, 2GB], except the memblock-reserved
>ones, are presented to the buddy allocator at an early stage; they aren't deferred. So
>for the pages in [0, 2GB], there is no consistency issue between memblock and the buddy
>allocator. The pages in region [2GB ...] are all presented to the buddy allocator,
>whether or not they're reserved in memblock. The early handling of [0, 2GB] ensures
>the kernel text section isn't corrupted, and we were lucky not to see a program
>interrupt caused by an illegal instruction.
>

After more debugging, it turns out that Michael is correct: there is no problem
when CONFIG_NO_BOOTMEM=y. In that case, the memblock-reserved page frames in [2G ...]
are marked as reserved at an early stage (as the function calls below reveal). During
the deferred initialization stage, those reserved pages aren't released to the buddy
allocator:

- The following call chain marks reserved pages according to the memblock reserved regions:
  init/main.c::start_kernel()
  init/main.c::mm_init()
  arch/powerpc/mm/mem.c::mem_init()
  nobootmem.c::free_all_bootmem()            <-> bootmem.c::free_all_bootmem() on !CONFIG_NO_BOOTMEM
  nobootmem.c::free_low_memory_core_early()
  nobootmem.c::reserve_bootmem_region()

- In page_alloc.c::deferred_init_memmap(), the following check keeps the reserved
  pages from being released to the buddy allocator:

                        if (page->flags) {
                                VM_BUG_ON(page_zone(page) != zone);
                                goto free_range;
                        }


So the issue only exists when CONFIG_NO_BOOTMEM=n. An alternative fix would be to
mirror what we have on CONFIG_NO_BOOTMEM=y: at an early stage, initialize the page
structs for all bootmem-reserved pages and mark them with PG_reserved. I'm not sure
it's worth fixing, though, since we won't support bootmem, as Michael mentioned.

Thanks,
Gavin
