From: RashmicaGupta <rashmicy@gmail.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	Rashmica Gupta <rashmica.g@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, bsingharora@gmail.com,
	dllehr@us.ibm.com, mpe@ellerman.id.au
Subject: Re: [RFC] Remove memory from nodes for memtrace.
Date: Thu, 23 Feb 2017 15:29:17 +1100	[thread overview]
Message-ID: <159fc026-77a1-49a6-4648-3c4e8de1c1d4@gmail.com> (raw)
In-Reply-To: <20170223135635.5321dcc5@roar.ozlabs.ibm.com>
On 23/02/17 14:56, Nicholas Piggin wrote:
> On Thu, 23 Feb 2017 08:39:10 +1100
> Rashmica Gupta <rashmica.g@gmail.com> wrote:
>
>>   Some powerpc hardware features may want to gain access to a
>>   chunk of undisturbed real memory.  This update provides a means to unplug
>>   said memory from the kernel with a set of sysfs calls.  By writing an integer
>>   containing  the size of memory to be unplugged into
>>   /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that much
>>   memory from the end of each available chip's memory space. In addition, the
>>   means to read out the contents of the unplugged memory is also provided by
>>   reading out the /sys/kernel/debug/powerpc/memtrace/<chip-id>/dump file.
>>
>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
> Thanks for picking this up. A few suggestions.
>
>> ---
>> Written by Douglas Lehr <dllehr@us.ibm.com>.
> You could mention original author in the changelog. I thought Anton
> wrote some of it too?
Yup he did. Will do that.
>
>>   arch/powerpc/mm/hash_native_64.c          |  39 +++-
>>   arch/powerpc/platforms/powernv/Makefile   |   1 +
>>   arch/powerpc/platforms/powernv/memtrace.c | 285 ++++++++++++++++++++++++++++++
>>   3 files changed, 321 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>
>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> index cc33260..44cc6ce 100644
>> --- a/arch/powerpc/mm/hash_native_64.c
>> +++ b/arch/powerpc/mm/hash_native_64.c
>> @@ -3,7 +3,7 @@
>>    *
>>    * SMP scalability work:
>>    *    Copyright (C) 2001 Anton Blanchard <anton@au.ibm.com>, IBM
>> - *
>> + *
>>    * This program is free software; you can redistribute it and/or
>>    * modify it under the terms of the GNU General Public License
>>    * as published by the Free Software Foundation; either version
>> @@ -181,7 +181,7 @@ static inline void native_lock_hpte(struct hash_pte *hptep)
>>   	while (1) {
>>   		if (!test_and_set_bit_lock(HPTE_LOCK_BIT, word))
>>   			break;
>> -		while(test_bit(HPTE_LOCK_BIT, word))
>> +		while (test_bit(HPTE_LOCK_BIT, word))
>>   			cpu_relax();
>>   	}
>>   }
>> @@ -208,10 +208,10 @@ static long native_hpte_insert(unsigned long hpte_group, unsigned long vpn,
>>   	}
>>   
>>   	for (i = 0; i < HPTES_PER_GROUP; i++) {
>> -		if (! (be64_to_cpu(hptep->v) & HPTE_V_VALID)) {
>> +		if (!(be64_to_cpu(hptep->v) & HPTE_V_VALID)) {
>>   			/* retry with lock held */
>>   			native_lock_hpte(hptep);
>> -			if (! (be64_to_cpu(hptep->v) & HPTE_V_VALID))
>> +			if (!(be64_to_cpu(hptep->v) & HPTE_V_VALID))
>>   				break;
>>   			native_unlock_hpte(hptep);
> Suggest dropping the whitespace cleanups.
>
>
>>   		}
>> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>>   	tlbie(vpn, psize, psize, ssize, 0);
>>   }
>>   
>> +/*
>> + * Remove a bolted kernel entry. Memory hotplug uses this.
>> + *
>> + * No need to lock here because we should be the only user.
>> + */
>> +static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
>> +{
>> +	unsigned long vpn;
>> +	unsigned long vsid;
>> +	long slot;
>> +	struct hash_pte *hptep;
>> +
>> +	vsid = get_kernel_vsid(ea, ssize);
>> +	vpn = hpt_vpn(ea, vsid, ssize);
>> +
>> +	slot = native_hpte_find(vpn, psize, ssize);
>> +	if (slot == -1)
>> +		return -ENOENT;
>> +
>> +	hptep = htab_address + slot;
>> +
>> +	/* Invalidate the hpte */
>> +	hptep->v = 0;
>> +
>> +	/* Invalidate the TLB */
>> +	tlbie(vpn, psize, psize, ssize, 0);
>> +	return 0;
>> +}
>> +
>> +
>>   static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>   				   int bpsize, int apsize, int ssize, int local)
>>   {
>> @@ -722,6 +752,7 @@ void __init hpte_init_native(void)
>>   	mmu_hash_ops.hpte_invalidate	= native_hpte_invalidate;
>>   	mmu_hash_ops.hpte_updatepp	= native_hpte_updatepp;
>>   	mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
>> +	mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
> I would submit this as a separate patch. Consider if it should be made
> dependent on any CONFIG options (e.g., MEMORY_HOTREMOVE), and Aneesh had
> been looking at this too so he could help review it.
>
>>   	mmu_hash_ops.hpte_insert	= native_hpte_insert;
>>   	mmu_hash_ops.hpte_remove	= native_hpte_remove;
>>   	mmu_hash_ops.hpte_clear_all	= native_hpte_clear;
>> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
>> index b5d98cb..2026661 100644
>> --- a/arch/powerpc/platforms/powernv/Makefile
>> +++ b/arch/powerpc/platforms/powernv/Makefile
>> @@ -11,4 +11,5 @@ obj-$(CONFIG_EEH)	+= eeh-powernv.o
>>   obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
>>   obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
>>   obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
>> +obj-$(CONFIG_MEMORY_HOTREMOVE)         += memtrace.o
>>   obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
> I would make a new Kconfig option for it, for the time being.
>
>
>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>> +{
>> +	if (mem->state != MEM_ONLINE)
>> +		return -1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>> +{
>> +	unsigned long state = (unsigned long)arg;
>> +
>> +	mem->state = state;
>> +	return 0;
>> +}
>> +
>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>> +{
>> +	u64 end_pfn = start_pfn + nr_pages - 1;
>> +
>> +	if (walk_memory_range(start_pfn, end_pfn, NULL,
>> +	    check_memblock_online))
>> +		return false;
>> +
>> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>> +			  change_memblock_state);
>> +
>> +	if (offline_pages(start_pfn, nr_pages)) {
>> +		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>> +				  change_memblock_state);
>> +		return false;
>> +	}
>> +
>> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>> +			  change_memblock_state);
>> +
>> +	/* RCU grace period? */
>> +	flush_memory_region((u64)__va(start_pfn << PAGE_SHIFT), nr_pages << PAGE_SHIFT);
>> +
>> +	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
>> +
>> +	return true;
>> +}
> This is the tricky part. Memory hotplug APIs don't seem well suited for
> what we're trying to do... Anyway, do a bit of grepping around for
> definitions of some of these calls, and how other code uses them. For
> example, remove_memory comment says caller must hold
> lock_device_hotplug() first, so we're missing that at least. I think
> that's also needed over the memblock state changes.
Yup, realised I missed the lock_device_hotplug()
that right after I sent the email...
>
> We don't need an RCU grace period there AFAICT, because offline_pages
> should have us covered.
>
>
> I haven't looked at memory hotplug enough to know why we're open-coding
> the memblock stuff there. It would be nice to just be able to call
> memblock_remove() like the pseries hotplug code does.
>
> I *think* it is because hot remove mostly comes from when we know about
> an online region of memory and we want to take it down. In this case we
> also are trying to discover if those addresses are covered by online
> memory. Still, I wonder if there are better memblock APIs to do this
> with? Balbir may have a better idea of that?
>
Gotcha. Will look into that and other ways to do this.
  
>> +/* XXX FIXME DEBUG CRAP */
>> +machine_device_initcall(pseries, memtrace_init);
> Can remove that, I think.
>
> Thanks,
> Nick
next prev parent reply	other threads:[~2017-02-23  4:29 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-22 21:39 [RFC] Remove memory from nodes for memtrace Rashmica Gupta
2017-02-23  3:38 ` Stewart Smith
2017-02-23  3:56 ` Nicholas Piggin
2017-02-23  4:01   ` Andrew Donnellan
2017-02-23  4:23     ` RashmicaGupta
2017-02-23  5:01       ` Andrew Donnellan
2017-02-23  4:29   ` RashmicaGupta [this message]
2017-02-23  6:19   ` RashmicaGupta
2017-02-24 23:47 ` Balbir Singh
2017-02-26 23:54 ` Oliver O'Halloran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=159fc026-77a1-49a6-4648-3c4e8de1c1d4@gmail.com \
    --to=rashmicy@gmail.com \
    --cc=bsingharora@gmail.com \
    --cc=dllehr@us.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=rashmica.g@gmail.com \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).