From: Bharata B Rao <bharata@linux.vnet.ibm.com>
To: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, aneesh.kumar@linux.vnet.ibm.com,
arbab@linux.vnet.ibm.com
Subject: Re: [FIX PATCH v0] powerpc: Fix memory unplug failure on radix guest
Date: Tue, 5 Sep 2017 09:50:32 +0530
Message-ID: <20170905042032.GA4230@in.ibm.com>
In-Reply-To: <e9eb4fb9-79ea-118a-f2b6-5501556a1181@linux.vnet.ibm.com>

On Fri, Sep 01, 2017 at 09:11:18AM -0500, Nathan Fontenot wrote:
> On 09/01/2017 01:53 AM, Bharata B Rao wrote:
> > On Thu, Aug 10, 2017 at 02:53:48PM +0530, Bharata B Rao wrote:
> >> For a PowerKVM guest, it is possible to specify a DIMM device in
> >> addition to the system RAM at boot time. When such a cold plugged DIMM
> >> device is removed from a radix guest, we hit the following warning in the
> >> guest kernel resulting in the eventual failure of memory unplug:
> >>
> >> remove_pud_table: unaligned range
> >> WARNING: CPU: 3 PID: 164 at arch/powerpc/mm/pgtable-radix.c:597 remove_pagetable+0x468/0xca0
> >> Call Trace:
> >> remove_pagetable+0x464/0xca0 (unreliable)
> >> radix__remove_section_mapping+0x24/0x40
> >> remove_section_mapping+0x28/0x60
> >> arch_remove_memory+0xcc/0x120
> >> remove_memory+0x1ac/0x270
> >> dlpar_remove_lmb+0x1ac/0x210
> >> dlpar_memory+0xbc4/0xeb0
> >> pseries_hp_work_fn+0x1a4/0x230
> >> process_one_work+0x1cc/0x660
> >> worker_thread+0xac/0x6d0
> >> kthread+0x16c/0x1b0
> >> ret_from_kernel_thread+0x5c/0x74
> >>
> >> The DIMM memory that is cold plugged gets merged to the same memblock
> >> region as RAM and hence gets mapped at 1G alignment. However since the
> >> removal is done for one LMB (lmb size 256MB) at a time, the address
> >> of the LMB (which is 256MB aligned) would get flagged as unaligned
> >> in remove_pud_table() resulting in the above failure.
> >>
> >> This problem is not seen for hot plugged memory because for the
> >> hot plugged memory, the mappings are created separately for each
> >> LMB and hence they all get aligned at 256MB.
> >>
> >> To fix this problem for the cold plugged memory, let us mark the
> >> cold plugged memblock region explicitly as HOTPLUGGED so that the
> >> region doesn't get merged with RAM. All the memory that is discovered
> >> via ibm,dynamic-memory-configuration is marked so (1). Next identify
> >> such regions in radix_init_pgtable() and create separate mappings
> >> within that region for each LMB so that they don't get aligned
> >> like the RAM region at 1G (2).
> >>
> >> (1) For PowerKVM guests, all boot time memory is represented via
> >> memory@XXXX nodes and hot plugged/pluggable memory is represented via
> >> ibm,dynamic-memory-reconfiguration property. We are marking all
> >> hotplugged memory that is in ASSIGNED state during boot as HOTPLUGGED.
> >> With this, only cold plugged memory gets marked for PowerKVM, but we
> >> need to check how this will affect PowerVM guests.
> >>
> >> (2) To create separate mappings for every LMB in the hot plugged
> >> region, we need lmb-size. I am currently using memory_block_size_bytes()
> >> API to get the lmb-size. Since this is early init time code, the
> >> machine type isn't probed yet and hence memory_block_size_bytes()
> >> would return the default LMB size as 16MB. Hence we end up creating
> >> separate mappings at much lower granularity than what we can ideally
> >> do for pseries machine.
> >>
> >> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >> ---
> >> arch/powerpc/kernel/prom.c | 1 +
> >> arch/powerpc/mm/pgtable-radix.c | 17 ++++++++++++++---
> >> 2 files changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> >> index f830562..24ecf53 100644
> >> --- a/arch/powerpc/kernel/prom.c
> >> +++ b/arch/powerpc/kernel/prom.c
> >> @@ -524,6 +524,7 @@ static int __init early_init_dt_scan_drconf_memory(unsigned long node)
> >> size = 0x80000000ul - base;
> >> }
> >> memblock_add(base, size);
> >> + memblock_mark_hotplug(base, size);
> >
> > One of the suggestions was to make the above conditional to radix so
> > that PowerVM doesn't get affected by this. However early_radix_enabled()
> > check isn't usable yet at this point and MMU_FTR_TYPE_RADIX will get set
> > only a bit later in early_init_devtree().
>
> We do walk the dynamic reconfiguration memory again in the numa code, see
> parse_drconf_memory() in numa.c. Would it be far enough along in boot to use
> early_radix_enabled() and mark the memory as hotplug at that point?

parse_drconf_memory() in numa.c runs after the radix page tables are set up.
Hence setting the hotplugged state from it will not help.

Regards,
Bharata.