LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/3] powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()
From: Samuel Mendoza-Jonas @ 2018-05-18  6:39 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: alistair, paulus
In-Reply-To: <20180514125033.12000-1-mpe@ellerman.id.au>

On Mon, 2018-05-14 at 22:50 +1000, Michael Ellerman wrote:
> Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq().
> 
> This allows us to avoid sparse warnings caused by passing __be64 to
> __raw_writeq(), which takes unsigned long:
> 
>   arch/powerpc/platforms/powernv/pci-ioda.c:1981:38:
>   warning: incorrect type in argument 1 (different base types)
>       expected unsigned long [unsigned] v
>       got restricted __be64 [usertype] <noident>
> 
> It's also generally preferable to use a byte-swapping accessor rather
> than doing it by hand in the code, which is more bug prone.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

For this and the following patches:

Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>

> ---
>  arch/powerpc/include/asm/io.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index af074923d598..e0331e754568 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -367,6 +367,11 @@ static inline void __raw_writeq(unsigned long v, volatile void __iomem *addr)
>  	*(volatile unsigned long __force *)PCI_FIX_ADDR(addr) = v;
>  }
>  
> +static inline void __raw_writeq_be(unsigned long v, volatile void __iomem *addr)
> +{
> +	__raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
> +}
> +
>  /*
>   * Real mode versions of the above. Those instructions are only supposed
>   * to be used in hypervisor real mode as per the architecture spec.
> @@ -395,6 +400,11 @@ static inline void __raw_rm_writeq(u64 val, volatile void __iomem *paddr)
>  		: : "r" (val), "r" (paddr) : "memory");
>  }
>  
> +static inline void __raw_rm_writeq_be(u64 val, volatile void __iomem *paddr)
> +{
> +	__raw_rm_writeq((__force u64)cpu_to_be64(val), paddr);
> +}
> +
>  static inline u8 __raw_rm_readb(volatile void __iomem *paddr)
>  {
>  	u8 ret;

^ permalink raw reply

* Re: [PATCH v4 3/4] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
From: Simon Guo @ 2018-05-18  6:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Paul Mackerras, Naveen N.  Rao, Cyril Bur
In-Reply-To: <87o9hemktb.fsf@concordia.ellerman.id.au>

Hi Michael,
On Fri, May 18, 2018 at 12:13:52AM +1000, Michael Ellerman wrote:
> wei.guo.simon@gmail.com writes:
> > From: Simon Guo <wei.guo.simon@gmail.com>
> >
> > This patch is based on the previous VMX patch on memcmp().
> >
> > To optimize ppc64 memcmp() with VMX instruction, we need to think about
> > the VMX penalty brought with: If kernel uses VMX instruction, it needs
> > to save/restore current thread's VMX registers. There are 32 x 128 bits
> > VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store.
> >
> > The major concern regarding the memcmp() performance in kernel is KSM,
> > who will use memcmp() frequently to merge identical pages. So it will
> > make sense to take some measures/enhancement on KSM to see whether any
> > improvement can be done here.  Cyril Bur indicates that the memcmp() for
> > KSM has a higher possibility to fail (unmatch) early in previous bytes
> > in following mail.
> > 	https://patchwork.ozlabs.org/patch/817322/#1773629
> > And I am taking a follow-up on this with this patch.
> >
> > Per some testing, it shows KSM memcmp() will fail early at previous 32
> > bytes.  More specifically:
> >     - 76% cases will fail/unmatch before 16 bytes;
> >     - 83% cases will fail/unmatch before 32 bytes;
> >     - 84% cases will fail/unmatch before 64 bytes;
> > So 32 bytes looks a better choice than other bytes for pre-checking.
> >
> > This patch adds a 32 bytes pre-checking firstly before jumping into VMX
> > operations, to avoid the unnecessary VMX penalty. And the testing shows
> > ~20% improvement on memcmp() average execution time with this patch.
> >
> > The detail data and analysis is at:
> > https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
> >
> > Any suggestion is welcome.
> 
> Thanks for digging into that, really great work.
> 
> I'm inclined to make this not depend on KSM though. It seems like a good
> optimisation to do in general.
> 
> So can we just call it the 'pre-check' or something, and always do it?
> 
Sound reasonable to me.
I will expand the change to .Ldiffoffset_vmx_cmp case and test accordingly.

Thanks,
- Simon

^ permalink raw reply

* Re: [PATCH bpf 5/6] tools: bpftool: resolve calls without using imm field
From: Sandipan Das @ 2018-05-18  4:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, naveen.n.rao, linuxppc-dev, ast, daniel
In-Reply-To: <20180517115104.7666ff60@cakuba>

Hi Jakub,

On 05/18/2018 12:21 AM, Jakub Kicinski wrote:
> On Thu, 17 May 2018 12:05:47 +0530, Sandipan Das wrote:
>> Currently, we resolve the callee's address for a JITed function
>> call by using the imm field of the call instruction as an offset
>> from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
>> use this address to get the callee's kernel symbol's name.
>>
>> For some architectures, such as powerpc64, the imm field is not
>> large enough to hold this offset. So, instead of assigning this
>> offset to the imm field, the verifier now assigns the subprog
>> id. Also, a list of kernel symbol addresses for all the JITed
>> functions is provided in the program info. We now use the imm
>> field as an index for this list to lookup a callee's symbol's
>> address and resolve its name.
>>
>> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> 
> A few nit-picks below, thank you for the patch!
> 
>>  tools/bpf/bpftool/prog.c          | 31 +++++++++++++++++++++++++++++++
>>  tools/bpf/bpftool/xlated_dumper.c | 24 +++++++++++++++++-------
>>  tools/bpf/bpftool/xlated_dumper.h |  2 ++
>>  3 files changed, 50 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> index 9bdfdf2d3fbe..ac2f62a97e84 100644
>> --- a/tools/bpf/bpftool/prog.c
>> +++ b/tools/bpf/bpftool/prog.c
>> @@ -430,6 +430,10 @@ static int do_dump(int argc, char **argv)
>>  	unsigned char *buf;
>>  	__u32 *member_len;
>>  	__u64 *member_ptr;
>> +	unsigned int nr_addrs;
>> +	unsigned long *addrs = NULL;
>> +	__u32 *ksyms_len;
>> +	__u64 *ksyms_ptr;
> 
> nit: please try to keep the variables ordered longest to shortest like
> we do in networking code (please do it in all functions).
> 
>>  	ssize_t n;
>>  	int err;
>>  	int fd;
>> @@ -437,6 +441,8 @@ static int do_dump(int argc, char **argv)
>>  	if (is_prefix(*argv, "jited")) {
>>  		member_len = &info.jited_prog_len;
>>  		member_ptr = &info.jited_prog_insns;
>> +		ksyms_len = &info.nr_jited_ksyms;
>> +		ksyms_ptr = &info.jited_ksyms;
>>  	} else if (is_prefix(*argv, "xlated")) {
>>  		member_len = &info.xlated_prog_len;
>>  		member_ptr = &info.xlated_prog_insns;
>> @@ -496,10 +502,23 @@ static int do_dump(int argc, char **argv)
>>  		return -1;
>>  	}
>>  
>> +	nr_addrs = *ksyms_len;
> 
> Here and ...
> 
>> +	if (nr_addrs) {
>> +		addrs = malloc(nr_addrs * sizeof(__u64));
>> +		if (!addrs) {
>> +			p_err("mem alloc failed");
>> +			free(buf);
>> +			close(fd);
>> +			return -1;
> 
> You can just jump to err_free here.
> 
>> +		}
>> +	}
>> +
>>  	memset(&info, 0, sizeof(info));
>>  
>>  	*member_ptr = ptr_to_u64(buf);
>>  	*member_len = buf_size;
>> +	*ksyms_ptr = ptr_to_u64(addrs);
>> +	*ksyms_len = nr_addrs;
> 
> ... here - this function is getting long, so maybe I'm not seeing
> something, but are ksyms_ptr and ksyms_len guaranteed to be initialized?
> 
>>  	err = bpf_obj_get_info_by_fd(fd, &info, &len);
>>  	close(fd);
>> @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
>>  		goto err_free;
>>  	}
>>  
>> +	if (*ksyms_len > nr_addrs) {
>> +		p_err("too many addresses returned");
>> +		goto err_free;
>> +	}
>> +
>>  	if ((member_len == &info.jited_prog_len &&
>>  	     info.jited_prog_insns == 0) ||
>>  	    (member_len == &info.xlated_prog_len &&
>> @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
>>  			dump_xlated_cfg(buf, *member_len);
>>  	} else {
>>  		kernel_syms_load(&dd);
>> +		dd.jited_ksyms = ksyms_ptr;
>> +		dd.nr_jited_ksyms = *ksyms_len;
>> +
>>  		if (json_output)
>>  			dump_xlated_json(&dd, buf, *member_len, opcodes);
>>  		else
>> @@ -566,10 +593,14 @@ static int do_dump(int argc, char **argv)
>>  	}
>>  
>>  	free(buf);
>> +	if (addrs)
>> +		free(addrs);
> 
> Free can deal with NULL pointers, no need for an if.
> 
>>  	return 0;
>>  
>>  err_free:
>>  	free(buf);
>> +	if (addrs)
>> +		free(addrs);
>>  	return -1;
>>  }
>>  
>> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
>> index 7a3173b76c16..dc8e4eca0387 100644
>> --- a/tools/bpf/bpftool/xlated_dumper.c
>> +++ b/tools/bpf/bpftool/xlated_dumper.c
>> @@ -178,8 +178,12 @@ static const char *print_call_pcrel(struct dump_data *dd,
>>  		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>>  			 "%+d#%s", insn->off, sym->name);
>>  	else
> 
> else if (address)
> 
> saves us the indentation.
> 
>> -		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>> -			 "%+d#0x%lx", insn->off, address);
>> +		if (address)
>> +			snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>> +				 "%+d#0x%lx", insn->off, address);
>> +		else
>> +			snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>> +				 "%+d", insn->off);
>>  	return dd->scratch_buff;
>>  }
>>  
>> @@ -200,14 +204,20 @@ static const char *print_call(void *private_data,
>>  			      const struct bpf_insn *insn)
>>  {
>>  	struct dump_data *dd = private_data;
>> -	unsigned long address = dd->address_call_base + insn->imm;
>> -	struct kernel_sym *sym;
>> +	unsigned long address = 0;
>> +	struct kernel_sym *sym = NULL;
>>  
> 
> Hm.  Quite a bit of churn.  Why not just add these three lines here:
> 
> if (insn->src_reg == BPF_PSEUDO_CALL && 
>     insn->imm < dd->nr_jited_ksyms)
> 	address = dd->jited_ksyms[insn->imm];
> 
>> -	sym = kernel_syms_search(dd, address);
>> -	if (insn->src_reg == BPF_PSEUDO_CALL)
>> +	if (insn->src_reg == BPF_PSEUDO_CALL) {
>> +		if (dd->nr_jited_ksyms) {
>> +			address = dd->jited_ksyms[insn->imm];
> 
> Perhaps it's paranoid, but it'd please do to bound check insn->imm
> against dd->nr_jited_ksyms.
> 
>> +			sym = kernel_syms_search(dd, address);
>> +		}
>>  		return print_call_pcrel(dd, sym, address, insn);
>> -	else
>> +	} else {
>> +		address = dd->address_call_base + insn->imm;
>> +		sym = kernel_syms_search(dd, address);
>>  		return print_call_helper(dd, sym, address);
>> +	}
>>  }
>>  
>>  static const char *print_imm(void *private_data,
> 
> 

Thank you for the suggestions. Will post out v2 with these changes.

- Sandipan

^ permalink raw reply

* Re: [PATCH 0/3] Add support to disable sensor groups in P9
From: Shilpasri G Bhat @ 2018-05-18  4:03 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: mpe, linuxppc-dev, linux-hwmon, linux-kernel, stewart
In-Reply-To: <56bba97b-c6ee-e7a1-136c-a2236437484a@roeck-us.net>



On 05/17/2018 06:08 PM, Guenter Roeck wrote:
> On 05/16/2018 11:10 PM, Shilpasri G Bhat wrote:
>>
>>
>> On 05/15/2018 08:32 PM, Guenter Roeck wrote:
>>> On Thu, Mar 22, 2018 at 04:24:32PM +0530, Shilpasri G Bhat wrote:
>>>> This patch series adds support to enable/disable OCC based
>>>> inband-sensor groups at runtime. The environmental sensor groups are
>>>> managed in HWMON and the remaining platform specific sensor groups are
>>>> managed in /sys/firmware/opal.
>>>>
>>>> The firmware changes required for this patch is posted below:
>>>> https://lists.ozlabs.org/pipermail/skiboot/2018-March/010812.html
>>>>
>>>
>>> Sorry for not getting back earlier. This is a tough one.
>>>
>>
>> Thanks for the reply. I have tried to answer your questions according to my
>> understanding below:
>>
>>> Key problem is that you are changing the ABI with those new attributes.
>>> On top of that, the attributes _do_ make some sense (many chips support
>>> enabling/disabling of individual sensors), suggesting that those or
>>> similar attributes may or even should at some point be added to the ABI.
>>>
>>> At the same time, returning "0" as measurement values when sensors are
>>> disabled does not seem like a good idea, since "0" is a perfectly valid
>>> measurement, at least for most sensors.
>>
>> I agree.
>>
>>>
>>> Given that, we need to have a discussion about adding _enable attributes to
>>> the ABI
>>
>>> what is the scope,
>> IIUC the scope should be RW and the attribute is defined for each supported
>> sensor group
>>
> 
> That is _your_ need. I am not aware of any other chip where a per-sensor group
> attribute would make sense. The discussion we need has to extend beyond the need
> of a single chip.
> 
> Guenter
> 


Is it okay if the ABI provides provision for both types of attribute
power_enable and powerX_enable. And is it okay to decide which type of attribute
to be used by the capability provided by the hwmon chip?


- Shilpa

>>> when should the attributes exist and when not,
>> We control this currently via device-tree
>>
>>> do we want/need power_enable or powerX_enable or both, and so on), and
>> We need power_enable right now
>>
>>> what to return if a sensor is disabled (such as -ENODATA).
>> -ENODATA sounds good.
>>
>> Thanks and Regards,
>> Shilpa
>>
>>   Once we have an
>>> agreement, we can continue with an implementation.
>>>
>>> Guenter
>>>
>>>> Shilpasri G Bhat (3):
>>>>    powernv:opal-sensor-groups: Add support to enable sensor groups
>>>>    hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
>>>>    powernv: opal-sensor-groups: Add attributes to disable/enable sensors
>>>>
>>>>   .../ABI/testing/sysfs-firmware-opal-sensor-groups  |  34 ++++++
>>>>   Documentation/hwmon/ibmpowernv                     |  31 ++++-
>>>>   arch/powerpc/include/asm/opal-api.h                |   4 +-
>>>>   arch/powerpc/include/asm/opal.h                    |   2 +
>>>>   .../powerpc/platforms/powernv/opal-sensor-groups.c | 104 ++++++++++++-----
>>>>   arch/powerpc/platforms/powernv/opal-wrappers.S     |   1 +
>>>>   drivers/hwmon/ibmpowernv.c                         | 127
>>>> +++++++++++++++++++--
>>>>   7 files changed, 265 insertions(+), 38 deletions(-)
>>>>   create mode 100644
>>>> Documentation/ABI/testing/sysfs-firmware-opal-sensor-groups
>>>>
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
> 

^ permalink raw reply

* [PATCH] powepc: Clear PCR on boot
From: Michael Neuling @ 2018-05-18  1:37 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, sam, mikey

Clear the PCR on boot to ensure we are not running in a compat mode.

We've seen this cause problems when a crash (and kdump) occurs while
running compat mode guests. The kdump kernel then runs with the PCR
set and causes problems. The symptom in the kdump kernel (also seen in
petitboot after fast-reboot) is early userspace programs taking
sigills on newer instructions (seen in libc).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org>
---
 arch/powerpc/kernel/cpu_setup_power.S | 6 ++++++
 arch/powerpc/kernel/dt_cpu_ftrs.c     | 1 +
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 3f30c994e9..458b928dbd 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -28,6 +28,7 @@ _GLOBAL(__setup_cpu_power7)
 	beqlr
 	li	r0,0
 	mtspr	SPRN_LPID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr	r3,SPRN_LPCR
 	li	r4,(LPCR_LPES1 >> LPCR_LPES_SH)
 	bl	__init_LPCR_ISA206
@@ -41,6 +42,7 @@ _GLOBAL(__restore_cpu_power7)
 	beqlr
 	li	r0,0
 	mtspr	SPRN_LPID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr	r3,SPRN_LPCR
 	li	r4,(LPCR_LPES1 >> LPCR_LPES_SH)
 	bl	__init_LPCR_ISA206
@@ -57,6 +59,7 @@ _GLOBAL(__setup_cpu_power8)
 	beqlr
 	li	r0,0
 	mtspr	SPRN_LPID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr	r3,SPRN_LPCR
 	ori	r3, r3, LPCR_PECEDH
 	li	r4,0 /* LPES = 0 */
@@ -78,6 +81,7 @@ _GLOBAL(__restore_cpu_power8)
 	beqlr
 	li	r0,0
 	mtspr	SPRN_LPID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr   r3,SPRN_LPCR
 	ori	r3, r3, LPCR_PECEDH
 	li	r4,0 /* LPES = 0 */
@@ -99,6 +103,7 @@ _GLOBAL(__setup_cpu_power9)
 	mtspr	SPRN_PSSCR,r0
 	mtspr	SPRN_LPID,r0
 	mtspr	SPRN_PID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr	r3,SPRN_LPCR
 	LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE  | LPCR_HEIC)
 	or	r3, r3, r4
@@ -123,6 +128,7 @@ _GLOBAL(__restore_cpu_power9)
 	mtspr	SPRN_PSSCR,r0
 	mtspr	SPRN_LPID,r0
 	mtspr	SPRN_PID,r0
+	mtspr	SPRN_PCR,r0
 	mfspr   r3,SPRN_LPCR
 	LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
 	or	r3, r3, r4
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 8ab51f6ca0..c904477aba 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -101,6 +101,7 @@ static void __restore_cpu_cpufeatures(void)
 	if (hv_mode) {
 		mtspr(SPRN_LPID, 0);
 		mtspr(SPRN_HFSCR, system_registers.hfscr);
+		mtspr(SPRN_PCR, 0);
 	}
 	mtspr(SPRN_FSCR, system_registers.fscr);
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH RESEND] powerpc/lib: Fix "integer constant is too large" build failure
From: Finn Thain @ 2018-05-18  1:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

My powerpc-linux-gnu-gcc v4.4.5 compiler can't build a 32-bit kernel
any more:

arch/powerpc/lib/sstep.c: In function 'do_popcnt':
arch/powerpc/lib/sstep.c:1068: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1070: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1079: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c: In function 'do_prty':
arch/powerpc/lib/sstep.c:1117: error: integer constant is too large for 'long' type

This file gets compiled with -std=gnu89 which means a constant can be
given the type 'long' even if it won't fit. Fix the errors with a 'ULL'
suffix on the relevant constants.

Fixes: 2c979c489fee ("powerpc/lib/sstep: Add prty instruction emulation")
Fixes: dcbd19b48d31 ("powerpc/lib/sstep: Add popcnt instruction emulation")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
This change was compile tested but not regression tested.
---
I'm re-sending this because the first submission didn't show up on
patchwork.ozlabs.org, apparently because I sent it without any
message-id header. Sorry about that.
---
 arch/powerpc/lib/sstep.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 34d68f1b1b40..49427a3ee104 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1065,9 +1065,10 @@ static nokprobe_inline void do_popcnt(const struct pt_regs *regs,
 {
 	unsigned long long out = v1;
 
-	out -= (out >> 1) & 0x5555555555555555;
-	out = (0x3333333333333333 & out) + (0x3333333333333333 & (out >> 2));
-	out = (out + (out >> 4)) & 0x0f0f0f0f0f0f0f0f;
+	out -= (out >> 1) & 0x5555555555555555ULL;
+	out = (0x3333333333333333ULL & out) +
+	      (0x3333333333333333ULL & (out >> 2));
+	out = (out + (out >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
 
 	if (size == 8) {	/* popcntb */
 		op->val = out;
@@ -1076,7 +1077,7 @@ static nokprobe_inline void do_popcnt(const struct pt_regs *regs,
 	out += out >> 8;
 	out += out >> 16;
 	if (size == 32) {	/* popcntw */
-		op->val = out & 0x0000003f0000003f;
+		op->val = out & 0x0000003f0000003fULL;
 		return;
 	}
 
@@ -1114,7 +1115,7 @@ static nokprobe_inline void do_prty(const struct pt_regs *regs,
 
 	res ^= res >> 16;
 	if (size == 32) {		/* prtyw */
-		op->val = res & 0x0000000100000001;
+		op->val = res & 0x0000000100000001ULL;
 		return;
 	}
 
-- 
2.16.1

^ permalink raw reply related

* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
From: Boqun Feng @ 2018-05-17 23:50 UTC (permalink / raw)
  To: Mathieu Desnoyers, Will Deacon
  Cc: Peter Zijlstra, Paul E. McKenney, Andy Lutomirski, Dave Watson,
	linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Hunter,
	Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett,
	Linus Torvalds, Catalin Marinas, Michael Kerrisk, Joel Fernandes,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev
In-Reply-To: <277374719.2144.1526570889798.JavaMail.zimbra@efficios.com>



On Thu, May 17, 2018, at 11:28 PM, Mathieu Desnoyers wrote:
> ----- On May 16, 2018, at 9:19 PM, Boqun Feng boqun.feng@gmail.com wrote:
> 
> > On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
> >> ----- On May 16, 2018, at 12:18 PM, Peter Zijlstra peterz@infradead.org wrote:
> >> 
> >> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
> >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >> >> index c32a181a7cbb..ed21a777e8c6 100644
> >> >> --- a/arch/powerpc/Kconfig
> >> >> +++ b/arch/powerpc/Kconfig
> >> >> @@ -223,6 +223,7 @@ config PPC
> >> >>  	select HAVE_SYSCALL_TRACEPOINTS
> >> >>  	select HAVE_VIRT_CPU_ACCOUNTING
> >> >>  	select HAVE_IRQ_TIME_ACCOUNTING
> >> >> +	select HAVE_RSEQ
> >> >>  	select IRQ_DOMAIN
> >> >>  	select IRQ_FORCED_THREADING
> >> >>  	select MODULES_USE_ELF_RELA
> >> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> >> >> index 61db86ecd318..d3bb3aaaf5ac 100644
> >> >> --- a/arch/powerpc/kernel/signal.c
> >> >> +++ b/arch/powerpc/kernel/signal.c
> >> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
> >> >>  	/* Re-enable the breakpoints for the signal stack */
> >> >>  	thread_change_pc(tsk, tsk->thread.regs);
> >> >>  
> >> >> +	rseq_signal_deliver(tsk->thread.regs);
> >> >> +
> >> >>  	if (is32) {
> >> >>          	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> >> >>  			ret = handle_rt_signal32(&ksig, oldset, tsk);
> >> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long
> >> >> thread_info_flags)
> >> >>  	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
> >> >>  		clear_thread_flag(TIF_NOTIFY_RESUME);
> >> >>  		tracehook_notify_resume(regs);
> >> >> +		rseq_handle_notify_resume(regs);
> >> >>  	}
> >> >>  
> >> >>  	user_enter();
> >> > 
> >> > Again no rseq_syscall().
> >> 
> >> Same question for PowerPC as for ARM:
> >> 
> >> Considering that rseq_syscall is implemented as follows:
> >> 
> >> +void rseq_syscall(struct pt_regs *regs)
> >> +{
> >> +       unsigned long ip = instruction_pointer(regs);
> >> +       struct task_struct *t = current;
> >> +       struct rseq_cs rseq_cs;
> >> +
> >> +       if (!t->rseq)
> >> +               return;
> >> +       if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
> >> +           rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
> >> +               force_sig(SIGSEGV, t);
> >> +}
> >> 
> >> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
> >> now used in the fast-path since KPTI), I wonder where we should call
> > 
> > So we actually detect this after the syscall takes effect, right? I
> > wonder whether this could be problematic, because "disallowing syscall"
> > in rseq areas may means the syscall won't take effect to some people, I
> > guess?
> > 
> >> this on PowerPC ?  I was under the impression that PowerPC return to
> >> userspace fast-path was not calling C code unless work flags were set,
> >> but I might be wrong.
> >> 
> > 
> > I think you're right. So we have to introduce callsite to rseq_syscall()
> > in syscall path, something like:
> > 
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 51695608c68b..a25734a96640 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -222,6 +222,9 @@ system_call_exit:
> > 	mtmsrd	r11,1
> > #endif /* CONFIG_PPC_BOOK3E */
> > 
> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
> > +	bl	rseq_syscall
> > +
> > 	ld	r9,TI_FLAGS(r12)
> > 	li	r11,-MAX_ERRNO
> > 	andi.
> > 		r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
> > 
> > But I think it's important for us to first decide where (before or after
> > the syscall) we do the detection.
> 
> As Peter said, we don't really care whether it's on syscall entry or 
> exit, as
> long as the process gets killed when the erroneous use is detected. I 
> think doing
> it on syscall exit is a bit easier because we can clearly access the 
> userspace
> TLS, which AFAIU may be less straightforward on syscall entry.
>

Fair enough.
 
> We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you
> proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y.
> 

OK.

> On the ARM leg of the email thread, Will Deacon suggests to test whether 
> current->rseq
> is non-NULL before calling rseq_syscall(). I wonder if this added check 
> is justified
> as the assembly level, considering that this is just a debugging option. 
> We already do
> that check at the very beginning of rseq_syscall().
> 

Yes, I think it's better to do the check in rseq_syscall(), leaving the asm
code a bit cleaner.

Regards,
Boqun

> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Regards,
> > Boqun
> > 
> >> Thoughts ?
> >> 
> >> Thanks!
> >> 
> >> Mathieu
> >> 
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> > > http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply

* Re: [PATCH] powerpc: Ensure gcc doesn't move around cache flushing in __patch_instruction
From: Segher Boessenkool @ 2018-05-17 23:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Michael Neuling
In-Reply-To: <18a52794a30857146396dd6023abf17179db53d0.camel@kernel.crashing.org>

On Fri, May 18, 2018 at 08:30:27AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2018-05-17 at 14:23 -0500, Segher Boessenkool wrote:
> > On Thu, May 17, 2018 at 01:06:10PM +1000, Benjamin Herrenschmidt wrote:
> > > The current asm statement in __patch_instruction() for the cache flushes
> > > doesn't have a "volatile" statement and no memory clobber. That means
> > > gcc can potentially move it around (or move the store done by put_user
> > > past the flush).
> > 
> > volatile is completely superfluous here, except maybe as documentation:
> > any asm without outputs is always volatile.
> 
> I wasn't aware of that. I was drilled early on to always stick volatile
> in my asm statements if they have any form of side effect :-)

If an asm without output was not marked automatically as having another
side effect, every such asm would be immediately deleted ;-)

Adding volatile as documentation for side effects can be good; it just
doesn't do much (nothing, in fact) for asms without output as far as
the compiler is concerned.

> > (And the memory clobber does not prevent the compiler from moving the
> > asm around, or duplicating it, etc., and neither does the volatile).
> 
> It prevents load/stores from moving around doesn't it ? I wanted to
> make sure the store of the instruction doesn't move in/pass the asm. If
> you say that's not needed then ignore the patch.

No, it's fine here, and you want either that or put exactly the memory
you are touching in a constraint (probably overkill here).  I just
wanted to say that a "memory" clobber does nothing more than say the
asm touches some unspecified memory; there is no magic other meaning
to it.  Your patch is correct, just the "volatile" part isn't needed,
and the explanation was a bit cargo-culty ;-)


Segher

^ permalink raw reply

* [RFC v3 4/4] powerpc/hotplug/drcinfo: Improve code for ibm,drc-info device processing
From: Michael Bringmann @ 2018-05-17 22:41 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <41ab86ce-9557-8e89-e265-7199cb46232c@linux.vnet.ibm.com>

This patch extends the use of a common parse function for the
ibm,drc-info property that can be modified by a callback function
to the hotplug device processing.  Candidate code is replaced by
a call to the parser including a pointer to a local context-specific
functions, and local data.

In addition, the original set missed several opportunities to compress
and reuse common code which this patch attempts to provide.

Finally, a bug with the registration of slots was observed on some
systems, and the code was rewritten to prevent its reoccurrence.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmwar
e feature" -- end of patch series applied to powerpc next)
---
Changes in V3:
  -- Update code to account for latest kernel checkins.
  -- Fix bug searching for virtual device slots.
  -- Rebased to 4.17-rc5 kernel
---
 drivers/pci/hotplug/rpaphp_core.c |  188 ++++++++++++++++++++++++++-----------
 1 file changed, 130 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index dccdf62..974147a 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -222,49 +222,52 @@ static int rpaphp_check_drc_props_v1(struct device_node *dn, char *drc_name,
 	return -EINVAL;
 }
 
-static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
-				char *drc_type, unsigned int my_index)
+struct check_drc_props_v2_struct {
+	char *drc_name;
+	char *drc_type;
+        unsigned int my_index;
+};
+
+static int check_drc_props_v2_checkRun(struct of_drc_info *drc,
+                                        void *idata, void *not_used,
+					int *ret_code)
 {
-	struct property *info;
-	unsigned int entries;
-	struct of_drc_info drc;
-	const __be32 *value;
+	struct check_drc_props_v2_struct *cdata = idata;
 	char cell_drc_name[MAX_DRC_NAME_LEN];
-	int j, fndit;
-
-	info = of_find_property(dn->parent, "ibm,drc-info", NULL);
-	if (info == NULL)
-		return -EINVAL;
-
-	value = of_prop_next_u32(info, NULL, &entries);
-	if (!value)
-		return -EINVAL;
-	value++;
-
-	for (j = 0; j < entries; j++) {
-		of_read_drc_info_cell(&info, &value, &drc);
 
-		/* Should now know end of current entry */
+	(*ret_code) = -EINVAL;
 
-		if (my_index > drc.last_drc_index)
-			continue;
+	if (cdata->my_index > drc->last_drc_index)
+		return 0;
 
-		fndit = 1;
-		break;
+	/* Found drc_index.  Now match the rest. */
+	sprintf(cell_drc_name, "%s%d", drc->drc_name_prefix, 
+		cdata->my_index - drc->drc_index_start +
+		drc->drc_name_suffix_start);
+
+	if (((cdata->drc_name == NULL) ||
+	     (cdata->drc_name && !strcmp(cdata->drc_name, cell_drc_name))) &&
+	    ((cdata->drc_type == NULL) ||
+	     (cdata->drc_type && !strcmp(cdata->drc_type, drc->drc_type)))) {
+		(*ret_code) = 0;
+		return 1;
 	}
-	/* Found it */
 
-	if (fndit)
-		sprintf(cell_drc_name, "%s%d", drc.drc_name_prefix, 
-			my_index);
+        return 0;
+}
 
-	if (((drc_name == NULL) ||
-	     (drc_name && !strcmp(drc_name, cell_drc_name))) &&
-	    ((drc_type == NULL) ||
-	     (drc_type && !strcmp(drc_type, drc.drc_type))))
-		return 0;
+static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
+				char *drc_type, unsigned int my_index)
+{
+	struct device_node *root = dn;
+	struct check_drc_props_v2_struct cdata = {
+		drc_name, drc_type, be32_to_cpu(my_index) };
 
-	return -EINVAL;
+	if (!drc_type || (drc_type && strcmp(drc_type, "SLOT")))
+		root = dn->parent;
+
+	return drc_info_parser(root, check_drc_props_v2_checkRun,
+				drc_type, &cdata);
 }
 
 int rpaphp_check_drc_props(struct device_node *dn, char *drc_name,
@@ -287,7 +290,6 @@ int rpaphp_check_drc_props(struct device_node *dn, char *drc_name,
 }
 EXPORT_SYMBOL_GPL(rpaphp_check_drc_props);
 
-
 static int is_php_type(char *drc_type)
 {
 	unsigned long value;
@@ -347,17 +349,40 @@ static int is_php_dn(struct device_node *dn, const int **indexes,
  *
  * To remove a slot, it suffices to call rpaphp_deregister_slot().
  */
-int rpaphp_add_slot(struct device_node *dn)
+
+static int rpaphp_add_slot_common(struct device_node *dn,
+			u32 drc_index, char *drc_name, char *drc_type,
+			u32 drc_power_domain)
 {
 	struct slot *slot;
 	int retval = 0;
-	int i;
+
+	slot = alloc_slot_struct(dn, drc_index, drc_name,
+				 drc_power_domain);
+	if (!slot)
+		return -ENOMEM;
+
+	slot->type = simple_strtoul(drc_type, NULL, 10);
+
+	dbg("Found drc-index:0x%x drc-name:%s drc-type:%s\n",
+		drc_index, drc_name, drc_type);
+
+	retval = rpaphp_enable_slot(slot);
+	if (!retval)
+		retval = rpaphp_register_slot(slot);
+
+	if (retval)
+		dealloc_slot_struct(slot);
+
+	return retval;
+}
+
+static int rpaphp_add_slot_v1(struct device_node *dn)
+{
+	int i, retval = 0;
 	const int *indexes, *names, *types, *power_domains;
 	char *name, *type;
 
-	if (!dn->name || strcmp(dn->name, "pci"))
-		return 0;
-
 	/* If this is not a hotplug slot, return without doing anything. */
 	if (!is_php_dn(dn, &indexes, &names, &types, &power_domains))
 		return 0;
@@ -368,25 +393,13 @@ int rpaphp_add_slot(struct device_node *dn)
 	name = (char *) &names[1];
 	type = (char *) &types[1];
 	for (i = 0; i < be32_to_cpu(indexes[0]); i++) {
-		int index;
-
-		index = be32_to_cpu(indexes[i + 1]);
-		slot = alloc_slot_struct(dn, index, name,
-					 be32_to_cpu(power_domains[i + 1]));
-		if (!slot)
-			return -ENOMEM;
-
-		slot->type = simple_strtoul(type, NULL, 10);
-
-		dbg("Found drc-index:0x%x drc-name:%s drc-type:%s\n",
-				index, name, type);
-
-		retval = rpaphp_enable_slot(slot);
+	
+		retval = rpaphp_add_slot_common(dn,
+				be32_to_cpu(indexes[i + 1]),
+				name, type,
+				be32_to_cpu(power_domains[i + 1]));
 		if (!retval)
-			retval = rpaphp_register_slot(slot);
-
-		if (retval)
-			dealloc_slot_struct(slot);
+			return retval;
 
 		name += strlen(name) + 1;
 		type += strlen(type) + 1;
@@ -396,6 +409,65 @@ int rpaphp_add_slot(struct device_node *dn)
 	/* XXX FIXME: reports a failure only if last entry in loop failed */
 	return retval;
 }
+
+struct rpaphp_add_slot_v2_struct {
+	struct device_node *dn;
+};
+
+static int rpaphp_add_slot_v2_checkRun(struct of_drc_info *drc,
+                                        void *idata, void *not_used,
+					int *ret_code)
+{
+	struct rpaphp_add_slot_v2_struct *cdata = idata;
+	u32 drc_index;
+	char drc_name[MAX_DRC_NAME_LEN];
+	int i, retval;
+
+	(*ret_code) = -EINVAL;
+
+	if (!is_php_type((char *) drc->drc_type)) {
+		(*ret_code) = 0;
+		return 1;
+	}
+
+	for (i = 0, drc_index = drc->drc_index_start;
+		i < drc->num_sequential_elems; i++, drc_index++) {
+
+        	sprintf(drc_name, "%s%d", drc->drc_name_prefix,
+			drc_index - drc->drc_index_start +
+			drc->drc_name_suffix_start);
+
+		retval = rpaphp_add_slot_common(cdata->dn,
+				drc_index, drc_name, drc->drc_type,
+				drc->drc_power_domain);
+		if (!retval) {
+			(*ret_code) = retval;
+			return 1;
+		}
+	}
+
+	(*ret_code) = retval;
+	return 0;
+}
+
+static int rpaphp_add_slot_v2(struct device_node *dn)
+{
+	struct rpaphp_add_slot_v2_struct cdata = { dn };
+
+	return drc_info_parser(dn, rpaphp_add_slot_v2_checkRun,
+				NULL, &cdata);
+}
+
+int rpaphp_add_slot(struct device_node *dn)
+{
+	if (!dn->name || strcmp(dn->name, "pci"))
+		return 0;
+
+	if (firmware_has_feature(FW_FEATURE_DRC_INFO))
+		return rpaphp_add_slot_v2(dn);
+	else
+		return rpaphp_add_slot_v1(dn);
+}
 EXPORT_SYMBOL_GPL(rpaphp_add_slot);
 
 static void __exit cleanup_slots(void)

^ permalink raw reply related

* [RFC v3 3/4] powerpc/hotplug/drcinfo: Fix hot-add CPU issues
From: Michael Bringmann @ 2018-05-17 22:41 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <41ab86ce-9557-8e89-e265-7199cb46232c@linux.vnet.ibm.com>

This patch applies a common parse function for the ibm,drc-info
property that can be modified by a callback function to the
hot-add CPU code.  Candidate code is replaced by a call to the
parser including a pointer to a local context-specific functions,
and local data.

In addition, a bug in the release of the previous patch set may
break things in some of the CPU DLPAR operations.  For instance,
when attempting to hot-add a new CPU or set of CPUs, the original
patch failed to always properly calculate the available resources,
and aborted the operation.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmwar
e feature" -- end of patch series applied to powerpc next)
---
Changes in V3:
  -- Update code to account for latest kernel checkins.
  -- Rebased to 4.17-rc5 kernel
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |  129 +++++++++++++++++------
 arch/powerpc/platforms/pseries/pseries_energy.c |  112 ++++++++++----------
 2 files changed, 154 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 6ef77ca..a408217 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -411,25 +411,67 @@ static bool dlpar_cpu_exists(struct device_node *parent, u32 drc_index)
 	return found;
 }
 
-static bool valid_cpu_drc_index(struct device_node *parent, u32 drc_index)
+static bool check_cpu_drc_index(struct device_node *parent,
+				int (*checkRun)(struct of_drc_info *drc,
+						void *data,
+						void *not_used,
+						int *ret_code),
+				void *cdata)
 {
-	bool found = false;
-	int rc, index;
+	int found = 0;
+
+	if (firmware_has_feature(FW_FEATURE_DRC_INFO)) {
+		found = drc_info_parser(parent, checkRun, "CPU", cdata);
+	} else {
+		int rc, index = 0;
 
-	index = 0;
-	while (!found) {
-		u32 drc;
+		while (!found) {
+			u32 drc;
 
-		rc = of_property_read_u32_index(parent, "ibm,drc-indexes",
+			rc = of_property_read_u32_index(parent,
+						"ibm,drc-indexes",
 						index++, &drc);
-		if (rc)
-			break;
+			if (rc)
+				break;
+			found = checkRun(NULL, cdata, &drc, NULL);
+		}
+	}
 
-		if (drc == drc_index)
-			found = true;
+	return (bool)found;
+}
+
+struct valid_cpu_drc_index_struct {
+	u32 targ_drc_index;
+};
+
+static int valid_cpu_drc_index_checkRun(struct of_drc_info *drc,
+					void *idata,
+					void *drc_index,
+					int *ret_code)
+{
+	struct valid_cpu_drc_index_struct *cdata = idata;
+
+	if (drc) {
+		if ((drc->drc_index_start <= cdata->targ_drc_index) &&
+			(cdata->targ_drc_index <= drc->last_drc_index)) {
+			(*ret_code) = 1;
+			return 1;
+		}
+	} else {
+		if (*((u32*)drc_index) == cdata->targ_drc_index) {
+			(*ret_code) = 1;
+			return 1;
+		}
 	}
+	return 0;
+}
 
-	return found;
+static bool valid_cpu_drc_index(struct device_node *parent, u32 drc_index)
+{
+	struct valid_cpu_drc_index_struct cdata = { drc_index };
+
+	return check_cpu_drc_index(parent, valid_cpu_drc_index_checkRun,
+				&cdata);
 }
 
 static ssize_t dlpar_cpu_add(u32 drc_index)
@@ -721,11 +763,45 @@ static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
 	return rc;
 }
 
+struct find_dlpar_cpus_to_add_struct {
+	struct device_node *parent;
+	u32 *cpu_drcs;
+	u32 cpus_to_add;
+	u32 cpus_found;
+};
+
+static int find_dlpar_cpus_to_add_checkRun(struct of_drc_info *drc,
+					void *idata,
+					void *drc_index,
+					int *ret_code)
+{
+	struct find_dlpar_cpus_to_add_struct *cdata = idata;
+
+	if (drc) {
+		int k;
+
+		for (k = 0; (k < drc->num_sequential_elems) &&
+			(cdata->cpus_found < cdata->cpus_to_add); k++) {
+			u32 idrc = drc->drc_index_start +
+				(k * drc->sequential_inc);
+
+			if (dlpar_cpu_exists(cdata->parent, idrc))
+				continue;
+			cdata->cpu_drcs[cdata->cpus_found++] = idrc;
+		}
+	} else {
+		if (!dlpar_cpu_exists(cdata->parent, *((u32*)drc_index)))
+			cdata->cpu_drcs[cdata->cpus_found++] =
+				*((u32*)drc_index);
+	}
+	return 0;
+}
+
 static int find_dlpar_cpus_to_add(u32 *cpu_drcs, u32 cpus_to_add)
 {
 	struct device_node *parent;
-	int cpus_found = 0;
-	int index, rc;
+	struct find_dlpar_cpus_to_add_struct cdata = {
+		NULL, cpu_drcs, cpus_to_add, 0 };
 
 	parent = of_find_node_by_path("/cpus");
 	if (!parent) {
@@ -734,28 +810,15 @@ static int find_dlpar_cpus_to_add(u32 *cpu_drcs, u32 cpus_to_add)
 		return -1;
 	}
 
-	/* Search the ibm,drc-indexes array for possible CPU drcs to
-	 * add. Note that the format of the ibm,drc-indexes array is
-	 * the number of entries in the array followed by the array
-	 * of drc values so we start looking at index = 1.
+	/* Search the appropriate property for possible CPU drcs to
+	 * add.
 	 */
-	index = 1;
-	while (cpus_found < cpus_to_add) {
-		u32 drc;
-
-		rc = of_property_read_u32_index(parent, "ibm,drc-indexes",
-						index++, &drc);
-		if (rc)
-			break;
-
-		if (dlpar_cpu_exists(parent, drc))
-			continue;
-
-		cpu_drcs[cpus_found++] = drc;
-	}
+	cdata.parent = parent;
+	check_cpu_drc_index(parent, find_dlpar_cpus_to_add_checkRun,
+				&cdata);
 
 	of_node_put(parent);
-	return cpus_found;
+	return cdata.cpus_found;
 }
 
 static int dlpar_cpu_add_by_count(u32 cpus_to_add)
diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
index c7d84aa..332f3ce 100644
--- a/arch/powerpc/platforms/pseries/pseries_energy.c
+++ b/arch/powerpc/platforms/pseries/pseries_energy.c
@@ -36,6 +36,26 @@
 
 /* Helper Routines to convert between drc_index to cpu numbers */
 
+struct cpu_to_drc_index_struct {
+	u32	thread_index;
+	u32	ret;
+};
+
+static int cpu_to_drc_index_checkRun(struct of_drc_info *drc,
+				void *idata, void *not_used, int *ret_code)
+{
+        struct cpu_to_drc_index_struct *cdata = idata;
+
+	if (cdata->thread_index < drc->last_drc_index) {
+		cdata->ret = drc->drc_index_start +
+			(cdata->thread_index * drc->sequential_inc);
+		(*ret_code) = 1;
+		return 1;
+	}
+	(*ret_code) = 0;
+        return 0;
+}
+
 static u32 cpu_to_drc_index(int cpu)
 {
 	struct device_node *dn = NULL;
@@ -51,32 +71,16 @@ static u32 cpu_to_drc_index(int cpu)
 	thread_index = cpu_core_index_of_thread(cpu);
 
 	if (firmware_has_feature(FW_FEATURE_DRC_INFO)) {
-		struct property *info = NULL;
-		struct of_drc_info drc;
-		int j;
-		u32 num_set_entries;
-		const __be32 *value;
-
-		info = of_find_property(dn, "ibm,drc-info", NULL);
-		if (info == NULL)
-			goto err_of_node_put;
+		struct cpu_to_drc_index_struct cdata = {
+			thread_index, 0 };
 
-		value = of_prop_next_u32(info, NULL, &num_set_entries);
-		if (!value)
-			goto err_of_node_put;
-		value++;
-
-		for (j = 0; j < num_set_entries; j++) {
+		rc = drc_info_parser(dn, &cpu_to_drc_index_checkRun,
+				"CPU", &cdata);
 
-			of_read_drc_info_cell(&info, &value, &drc);
-			if (strncmp(drc.drc_type, "CPU", 3))
-				goto err;
-
-			if (thread_index < drc.last_drc_index)
-				break;
-		}
+		if (rc < 0)
+			goto err_of_node_put;
 
-		ret = drc.drc_index_start + (thread_index * drc.sequential_inc);
+		ret = cdata.ret;
 	} else {
 		const __be32 *indexes;
 
@@ -102,11 +106,36 @@ static u32 cpu_to_drc_index(int cpu)
 	return ret;
 }
 
+struct drc_index_to_cpu_struct {
+	u32	drc_index;
+	u32	thread_index;
+	u32	cpu;
+};
+
+static int drc_index_to_cpu_checkRun(struct of_drc_info *drc,
+				void *idata, void *not_used, int *ret_code)
+{
+        struct drc_index_to_cpu_struct *cdata = idata;
+
+	if (cdata->drc_index > drc->last_drc_index) {
+		cdata->cpu += drc->num_sequential_elems;
+		(*ret_code) = 0;
+		return 0;
+	} else {
+		cdata->cpu += ((cdata->drc_index - drc->drc_index_start) /
+				drc->sequential_inc);
+
+		cdata->thread_index = cpu_first_thread_of_core(cdata->cpu);
+		(*ret_code) = 0;
+		return 0;
+	}
+}
+
 static int drc_index_to_cpu(u32 drc_index)
 {
 	struct device_node *dn = NULL;
 	const int *indexes;
-	int thread_index = 0, cpu = 0;
+	int thread_index = 0;
 	int rc = 1;
 
 	dn = of_find_node_by_path("/cpus");
@@ -114,38 +143,13 @@ static int drc_index_to_cpu(u32 drc_index)
 		goto err;
 
 	if (firmware_has_feature(FW_FEATURE_DRC_INFO)) {
-		struct property *info = NULL;
-		struct of_drc_info drc;
-		int j;
-		u32 num_set_entries;
-		const __be32 *value;
-
-		info = of_find_property(dn, "ibm,drc-info", NULL);
-		if (info == NULL)
-			goto err_of_node_put;
-
-		value = of_prop_next_u32(info, NULL, &num_set_entries);
-		if (!value)
-			goto err_of_node_put;
-		value++;
-
-		for (j = 0; j < num_set_entries; j++) {
+		struct drc_index_to_cpu_struct cdata = {
+			drc_index, 0, 0 };
 
-			of_read_drc_info_cell(&info, &value, &drc);
-			if (strncmp(drc.drc_type, "CPU", 3))
-				goto err;
+		rc = drc_info_parser(dn, &drc_index_to_cpu_checkRun,
+					"CPU", &cdata);
+		thread_index = cdata.thread_index;
 
-			if (drc_index > drc.last_drc_index) {
-				cpu += drc.num_sequential_elems;
-				continue;
-			}
-			cpu += ((drc_index - drc.drc_index_start) /
-				drc.sequential_inc);
-
-			thread_index = cpu_first_thread_of_core(cpu);
-			rc = 0;
-			break;
-		}
 	} else {
 		unsigned long int i;
 

^ permalink raw reply related

* [RFC v3 2/4] powerpc/hotplug/drcinfo: Provide common parser for ibm,drc-info
From: Michael Bringmann @ 2018-05-17 22:41 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <41ab86ce-9557-8e89-e265-7199cb46232c@linux.vnet.ibm.com>

This patch provides a common parse function for the ibm,drc-info
property that can be modified by a callback function.  The caller
provides a pointer to the function and a pointer to their unique
data, and the parser provides the current lmb set from the struct.
The callback function may return codes indicating that the parsing
is complete, or should continue, along with an error code that may
be returned to the caller.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmwar
e feature" -- end of patch series applied to powerpc next)
---
Changes in V3:
  -- Update code to account for latest kernel checkins.
  -- Rebased to 4.17-rc5 kernel
---
 arch/powerpc/include/asm/prom.h             |    7 +++
 arch/powerpc/platforms/pseries/Makefile     |    2 -
 arch/powerpc/platforms/pseries/drchelpers.c |   66 +++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/pseries/drchelpers.c

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index b04c5ce..2e947b3 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -94,6 +94,13 @@ struct of_drc_info {
 extern int of_read_drc_info_cell(struct property **prop,
 			const __be32 **curval, struct of_drc_info *data);
 
+extern int drc_info_parser(struct device_node *dn,
+			int (*usercb)(struct of_drc_info *drc,
+					void *data,
+					void *optional_data,
+					int *ret_code),
+			char *opt_drc_type,
+			void *data);
 
 /*
  * There are two methods for telling firmware what our capabilities are.
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 13eede6..38c8547 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC64)			:= $(NO_MINIMAL_TOC)
 ccflags-$(CONFIG_PPC_PSERIES_DEBUG)	+= -DDEBUG
 
 obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
-			   of_helpers.o \
+			   of_helpers.o drchelpers.o \
 			   setup.o iommu.o event_sources.o ras.o \
 			   firmware.o power.o dlpar.o mobility.o rng.o \
 			   pci.o pci_dlpar.o eeh_pseries.o msi.o
diff --git a/arch/powerpc/platforms/pseries/drchelpers.c b/arch/powerpc/platforms/pseries/drchelpers.c
new file mode 100644
index 0000000..556e05d
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/drchelpers.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2018 Michael Bringmann <mbringm@us.ibm.com>, IBM
+ *
+ * pSeries specific routines for device-tree properties.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *    
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ * 
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/prom.h>
+#include "pseries.h"
+
+#define	MAX_DRC_NAME_LEN 64
+
+int drc_info_parser(struct device_node *dn,
+		int (*usercb)(struct of_drc_info *drc,
+				void *data,
+				void *optional_data,
+				int *ret_code),
+		char *opt_drc_type,
+		void *data)
+{
+	struct property *info;
+	unsigned int entries;
+	struct of_drc_info drc;
+	const __be32 *value;
+	int j, done = 0, ret_code = -EINVAL;
+
+	info = of_find_property(dn, "ibm,drc-info", NULL);
+	if (info == NULL)
+		return -EINVAL;
+
+	value = of_prop_next_u32(info, NULL, &entries);
+	if (!value)
+		return -EINVAL;
+	value++;
+
+	for (j = 0, done = 0; (j < entries) && (!done); j++) {
+		of_read_drc_info_cell(&info, &value, &drc);
+
+		if (opt_drc_type && strcmp(opt_drc_type, drc.drc_type))
+			continue;
+
+		done = usercb(&drc, data, NULL, &ret_code);
+	}
+
+	return ret_code;
+}
+EXPORT_SYMBOL(drc_info_parser);

^ permalink raw reply related

* [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs
From: Michael Bringmann @ 2018-05-17 22:41 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <41ab86ce-9557-8e89-e265-7199cb46232c@linux.vnet.ibm.com>

[Replace/withdraw previous patch submission to ensure that testing
of related patches on similar hardware progresses together.]

This patch fixes a memory parsing bug when using of_prop_next_u32
calls at the start of a structure.  Depending upon the value of
"cur" memory pointer argument to of_prop_next_u32, it will or it
won't advance the value of the returned memory pointer by the
size of one u32.  This patch corrects the code to deal with that
indexing feature when parsing the ibm,drc-info structs for CPUs.
Also, need to advance the pointer at the end of_read_drc_info_cell
for same reason.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmware feature" -- end of patch series applied to powerpc next)
---
Changes in V3:
  -- Rebased patch to 4.17-rc5 kernel
---
 arch/powerpc/platforms/pseries/of_helpers.c     |    5 ++---
 arch/powerpc/platforms/pseries/pseries_energy.c |    2 ++
 drivers/pci/hotplug/rpaphp_core.c               |    1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/of_helpers.c b/arch/powerpc/platforms/pseries/of_helpers.c
index 6df192f..20598b2 100644
--- a/arch/powerpc/platforms/pseries/of_helpers.c
+++ b/arch/powerpc/platforms/pseries/of_helpers.c
@@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop, const __be32 **curval,
 
 	/* Get drc-index-start:encode-int */
 	p2 = (const __be32 *)p;
-	p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
-	if (!p2)
-		return -EINVAL;
+	data->drc_index_start = of_read_number(p2, 1);
 
 	/* Get drc-name-suffix-start:encode-int */
 	p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
@@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop, const __be32 **curval,
 	p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
 	if (!p2)
 		return -EINVAL;
+	p2++;
 
 	/* Should now know end of current entry */
 	(*curval) = (void *)p2;
diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
index 6ed2212..c7d84aa 100644
--- a/arch/powerpc/platforms/pseries/pseries_energy.c
+++ b/arch/powerpc/platforms/pseries/pseries_energy.c
@@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
 		value = of_prop_next_u32(info, NULL, &num_set_entries);
 		if (!value)
 			goto err_of_node_put;
+		value++;
 
 		for (j = 0; j < num_set_entries; j++) {
 
@@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
 		value = of_prop_next_u32(info, NULL, &num_set_entries);
 		if (!value)
 			goto err_of_node_put;
+		value++;
 
 		for (j = 0; j < num_set_entries; j++) {
 
diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
index fb5e084..dccdf62 100644
--- a/drivers/pci/hotplug/rpaphp_core.c
+++ b/drivers/pci/hotplug/rpaphp_core.c
@@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
 	value = of_prop_next_u32(info, NULL, &entries);
 	if (!value)
 		return -EINVAL;
+	value++;
 
 	for (j = 0; j < entries; j++) {
 		of_read_drc_info_cell(&info, &value, &drc);

^ permalink raw reply related

* [RFC v3 0/4] powerpc/drcinfo: Fix bugs 'ibm,drc-info' property
From: Michael Bringmann @ 2018-05-17 22:41 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon

This patch set corrects some errors and omissions in the previous
set of patches adding support for the "ibm,drc-info" property to
powerpc systems.

Unfortunately, some errors in the previous patch set break things
in some of the DLPAR operations.  In particular when attempting to
hot-add a new CPU or set of CPUs, the original patch failed to
properly calculate the available resources, and aborted the operation.
In addition, the original set missed several opportunities to compress
and reuse common code, especially, in the area of device processing.

Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in V3:
  -- Update code for latest kernel checkins.
  -- Fix bug with virtual devices.
  -- Rebase on top of 4.17-rc5 kernel

^ permalink raw reply

* Re: [PATCH] powerpc: Ensure gcc doesn't move around cache flushing in __patch_instruction
From: Benjamin Herrenschmidt @ 2018-05-17 22:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, Michael Neuling
In-Reply-To: <20180517192331.GZ17342@gate.crashing.org>

On Thu, 2018-05-17 at 14:23 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, May 17, 2018 at 01:06:10PM +1000, Benjamin Herrenschmidt wrote:
> > The current asm statement in __patch_instruction() for the cache flushes
> > doesn't have a "volatile" statement and no memory clobber. That means
> > gcc can potentially move it around (or move the store done by put_user
> > past the flush).
> 
> volatile is completely superfluous here, except maybe as documentation:
> any asm without outputs is always volatile.

I wasn't aware of that. I was drilled early on to always stick volatile
in my asm statements if they have any form of side effect :-)

> (And the memory clobber does not prevent the compiler from moving the
> asm around, or duplicating it, etc., and neither does the volatile).

It prevents load/stores from moving around doesn't it ? I wanted to
make sure the store of the instruction doesn't move in/pass the asm. If
you say that's not needed then ignore the patch.

Cheers,
Ben.
 
> 
> Segher

^ permalink raw reply

* [RFC v4 3/3] powerpc migration/memory: Associativity & memory updates
From: Michael Bringmann @ 2018-05-17 22:26 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <18541bb5-1e79-09d9-767f-1af09370e076@linux.vnet.ibm.com>

powerpc migration/memory: This patch adds more recognition for changes
to the associativity of memory blocks described by the device-tree
properties and updates local and general kernel data structures to
reflect those changes.  These differences may include:

* Evaluating 'ibm,dynamic-memory' properties when processing the
  topology of LPARS in Post Migration events.  Previous efforts
  only recognized whether a memory block's assignment had changed
  in the property.  Changes here include checking the aa_index
  values for each drc_index of the old/new LMBs and to 'readd'
  any block for which the setting has changed.

* In an LPAR migration scenario, the "ibm,associativity-lookup-arrays"
  property may change.  In the event that a row of the array differs,
  locate all assigned memory blocks with that 'aa_index' and 're-add'
  them to the system memory block data structures.  In the process of
  the 're-add', the system routines will update the corresponding entry
  for the memory in the LMB structures and any other relevant kernel
  data structures.

* Extend the previous work for the 'ibm,associativity-lookup-array'
  and 'ibm,dynamic-memory' properties to support the property
  'ibm,dynamic-memory-v2' by means of the DRMEM LMB interpretation
  code.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in RFC:
  -- Simplify code to update memory nodes during mobility checks.
  -- Reuse code from DRMEM changes to scan for LMBs when updating
     aa_index
  -- Combine common code for properties 'ibm,dynamic-memory' and
     'ibm,dynamic-memory-v2' after integrating DRMEM features.
  -- Rearrange patches to co-locate memory property-related changes.
  -- Use new paired list iterator for the drmem info arrays.
  -- Use direct calls to add/remove memory from the update drconf
     function as those operations are only intended for user DLPAR
     ops, and should not occur during Migration reconfig notifier
     changes.
  -- Correct processing bug in processing of ibm,associativity-lookup-arrays
  -- Rebase to 4.17-rc5 kernel
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  172 +++++++++++++++++++----
 1 file changed, 139 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f5..15c6f74 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -994,13 +994,29 @@ static int pseries_add_mem_node(struct device_node *np)
 	return (ret < 0) ? -EINVAL : 0;
 }
 
-static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
+static void pseries_queue_memory_event(u32 drc_index, int action)
 {
-	struct of_drconf_cell_v1 *new_drmem, *old_drmem;
-	unsigned long memblock_size;
-	u32 entries;
-	__be32 *p;
-	int i, rc = -EINVAL;
+	struct pseries_hp_errorlog *hp_elog;
+
+	hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
+	if(!hp_elog)
+		return;
+
+	hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_MEM;
+	hp_elog->action = action;
+	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+	hp_elog->_drc_u.drc_index = drc_index;
+
+	queue_hotplug_event(hp_elog, NULL, NULL);
+
+	kfree(hp_elog);
+}
+
+static int pseries_update_drconf_memory(struct drmem_lmb_info *new_dinfo)
+{
+	struct drmem_lmb *old_lmb, *new_lmb;
+ 	unsigned long memblock_size;
+	int rc = 0;
 
 	if (rtas_hp_event)
 		return 0;
@@ -1009,42 +1025,123 @@ static int pseries_update_drconf_memory(struct of_reconfig_data *pr)
 	if (!memblock_size)
 		return -EINVAL;
 
-	p = (__be32 *) pr->old_prop->value;
-	if (!p)
-		return -EINVAL;
+	/* Arrays should have the same size and DRC indexes */
+	for_each_pair_drmem_lmb(drmem_info, old_lmb, new_dinfo, new_lmb) {
 
-	/* The first int of the property is the number of lmb's described
-	 * by the property. This is followed by an array of of_drconf_cell
-	 * entries. Get the number of entries and skip to the array of
-	 * of_drconf_cell's.
-	 */
-	entries = be32_to_cpu(*p++);
-	old_drmem = (struct of_drconf_cell_v1 *)p;
-
-	p = (__be32 *)pr->prop->value;
-	p++;
-	new_drmem = (struct of_drconf_cell_v1 *)p;
+		if (new_lmb->drc_index != old_lmb->drc_index)
+			continue;
 
-	for (i = 0; i < entries; i++) {
-		if ((be32_to_cpu(old_drmem[i].flags) & DRCONF_MEM_ASSIGNED) &&
-		    (!(be32_to_cpu(new_drmem[i].flags) & DRCONF_MEM_ASSIGNED))) {
+		if ((old_lmb->flags & DRCONF_MEM_ASSIGNED) &&
+		    (!(new_lmb->flags & DRCONF_MEM_ASSIGNED))) {
 			rc = pseries_remove_memblock(
-				be64_to_cpu(old_drmem[i].base_addr),
-						     memblock_size);
+				old_lmb->base_addr, memblock_size);
 			break;
-		} else if ((!(be32_to_cpu(old_drmem[i].flags) &
-			    DRCONF_MEM_ASSIGNED)) &&
-			    (be32_to_cpu(new_drmem[i].flags) &
-			    DRCONF_MEM_ASSIGNED)) {
-			rc = memblock_add(be64_to_cpu(old_drmem[i].base_addr),
-					  memblock_size);
+		} else if ((!(old_lmb->flags & DRCONF_MEM_ASSIGNED)) &&
+			   (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
+			rc = memblock_add(old_lmb->base_addr,
+					memblock_size);
 			rc = (rc < 0) ? -EINVAL : 0;
 			break;
+		} else if ((old_lmb->aa_index != new_lmb->aa_index) &&
+			   (new_lmb->flags & DRCONF_MEM_ASSIGNED)) {
+			pseries_queue_memory_event(
+				new_lmb->drc_index,
+				PSERIES_HP_ELOG_ACTION_READD);
 		}
 	}
 	return rc;
 }
 
+static void pseries_update_ala_memory_aai(int aa_index)
+{
+	struct drmem_lmb *lmb;
+
+	/* Readd all LMBs which were previously using the
+	 * specified aa_index value.
+	 */
+	for_each_drmem_lmb(lmb) {
+		if ((lmb->aa_index == aa_index) &&
+			(lmb->flags & DRCONF_MEM_ASSIGNED)) {
+			pseries_queue_memory_event(lmb->drc_index,
+				PSERIES_HP_ELOG_ACTION_READD);
+		}
+	}
+}
+
+struct assoc_arrays {
+	u32 n_arrays;
+	u32 array_sz;
+	const __be32 *arrays;
+};
+
+static int pseries_update_ala_memory(struct of_reconfig_data *pr)
+{
+	struct assoc_arrays new_ala, old_ala;
+	__be32 *p;
+	int i, lim;
+
+	if (rtas_hp_event)
+		return 0;
+
+	/*
+	 * The layout of the ibm,associativity-lookup-arrays
+	 * property is a number N indicating the number of
+	 * associativity arrays, followed by a number M
+	 * indicating the size of each associativity array,
+	 * followed by a list of N associativity arrays.
+	 */
+
+	p = (__be32 *) pr->old_prop->value;
+	if (!p)
+		return -EINVAL;
+	old_ala.n_arrays = of_read_number(p++, 1);
+	old_ala.array_sz = of_read_number(p++, 1);
+	old_ala.arrays = p;
+
+	p = (__be32 *) pr->prop->value;
+	if (!p)
+		return -EINVAL;
+	new_ala.n_arrays = of_read_number(p++, 1);
+	new_ala.array_sz = of_read_number(p++, 1);
+	new_ala.arrays = p;
+
+	lim = (new_ala.n_arrays > old_ala.n_arrays) ? old_ala.n_arrays :
+			new_ala.n_arrays;
+
+	if (old_ala.array_sz == new_ala.array_sz) {
+
+		/* Reset any entries where the old and new rows
+		 * the array have changed.
+		 */
+		for (i = 0; i < lim; i++) {
+			int index = (i * new_ala.array_sz);
+
+			if (!memcmp(&old_ala.arrays[index],
+				&new_ala.arrays[index],
+				new_ala.array_sz))
+				continue;
+
+			pseries_update_ala_memory_aai(i);
+		}
+
+		/* Reset any entries representing the extra rows.
+		 * There shouldn't be any, but just in case ...
+		 */
+		for (i = lim; i < new_ala.n_arrays; i++)
+			pseries_update_ala_memory_aai(i);
+
+	} else {
+		/* Update all entries representing these rows;
+		 * as all rows have different sizes, none can
+		 * have equivalent values.
+		 */
+		for (i = 0; i < lim; i++)
+			pseries_update_ala_memory_aai(i);
+	}
+
+	return 0;
+}
+
 static int pseries_memory_notifier(struct notifier_block *nb,
 				   unsigned long action, void *data)
 {
@@ -1059,8 +1156,17 @@ static int pseries_memory_notifier(struct notifier_block *nb,
 		err = pseries_remove_mem_node(rd->dn);
 		break;
 	case OF_RECONFIG_UPDATE_PROPERTY:
-		if (!strcmp(rd->prop->name, "ibm,dynamic-memory"))
-			err = pseries_update_drconf_memory(rd);
+		if (!strcmp(rd->prop->name, "ibm,dynamic-memory") ||
+		    !strcmp(rd->prop->name, "ibm,dynamic-memory-v2")) {
+			struct drmem_lmb_info *dinfo =
+				drmem_init_lmbs(rd->prop);
+			if (!dinfo)
+				return -EINVAL;
+			err = pseries_update_drconf_memory(dinfo);
+			drmem_lmbs_free(dinfo);
+		} else if (!strcmp(rd->prop->name,
+				"ibm,associativity-lookup-arrays"))
+			err = pseries_update_ala_memory(rd);
 		break;
 	}
 	return notifier_from_errno(err);

^ permalink raw reply related

* [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes
From: Michael Bringmann @ 2018-05-17 22:26 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <18541bb5-1e79-09d9-767f-1af09370e076@linux.vnet.ibm.com>

powerpc migration/cpu: Now apply changes to the associativity of cpus
for the topology of LPARS in Post Migration events.  Recognize more
changes to the associativity of memory blocks described by the
'cpu' properties when processing the topology of LPARS in Post Migration
events.  Previous efforts only recognized whether a memory block's
assignment had changed in the property.  Changes here include:

* Provide hotplug CPU 'readd by index' operation
* Checking for changes in cpu associativity and making 'readd' calls
  when differences are observed.
* Queue up  changes to CPU properties so that they may take place
  after all PowerPC device-tree changes have been applied i.e. after
  the device hotplug is released in the mobility code.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes include:
  -- Rearrange patches to co-locate CPU property-related changes.
  -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index acquire
     or release operations during the CPU readd process.
  -- Correct a bug in DRC index selection for queued operation.
  -- Rebase to 4.17-rc5 kernel
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  123 +++++++++++++++++++-------
 arch/powerpc/platforms/pseries/mobility.c    |    3 +
 2 files changed, 95 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a408217..23d4cb8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct device_node *parent, u32 drc_index)
 				&cdata);
 }
 
-static ssize_t dlpar_cpu_add(u32 drc_index)
+static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
 {
 	struct device_node *dn, *parent;
 	int rc, saved_rc;
@@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
 		return -EINVAL;
 	}
 
-	rc = dlpar_acquire_drc(drc_index);
-	if (rc) {
-		pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
-			rc, drc_index);
-		of_node_put(parent);
-		return -EINVAL;
+	if (acquire_drc) {
+		rc = dlpar_acquire_drc(drc_index);
+		if (rc) {
+			pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
+				rc, drc_index);
+			of_node_put(parent);
+			return -EINVAL;
+		}
 	}
 
 	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
 	if (!dn) {
 		pr_warn("Failed call to configure-connector, drc index: %x\n",
 			drc_index);
-		dlpar_release_drc(drc_index);
+		if (acquire_drc)
+			dlpar_release_drc(drc_index);
 		of_node_put(parent);
 		return -EINVAL;
 	}
@@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
 		pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
 			dn->name, rc, drc_index);
 
-		rc = dlpar_release_drc(drc_index);
-		if (!rc)
+		if (acquire_drc)
+			rc = dlpar_release_drc(drc_index);
+		if (!rc || acquire_drc)
 			dlpar_free_cc_nodes(dn);
 
 		return saved_rc;
@@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
 			dn->name, rc, drc_index);
 
 		rc = dlpar_detach_node(dn);
-		if (!rc)
+		if (!rc && acquire_drc)
 			dlpar_release_drc(drc_index);
 
 		return saved_rc;
@@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct device_node *dn)
 
 }
 
-static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
+static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
+				bool release_drc)
 {
 	int rc;
 
-	pr_debug("Attempting to remove CPU %s, drc index: %x\n",
-		 dn->name, drc_index);
+	pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
+		 dn->name, drc_index, release_drc);
 
 	rc = dlpar_offline_cpu(dn);
 	if (rc) {
@@ -621,12 +626,14 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
 		return -EINVAL;
 	}
 
-	rc = dlpar_release_drc(drc_index);
-	if (rc) {
-		pr_warn("Failed to release drc (%x) for CPU %s, rc: %d\n",
-			drc_index, dn->name, rc);
-		dlpar_online_cpu(dn);
-		return rc;
+	if (release_drc) {
+		rc = dlpar_release_drc(drc_index);
+		if (rc) {
+			pr_warn("Failed to release drc (%x) for CPU %s, rc: %d\n",
+				drc_index, dn->name, rc);
+			dlpar_online_cpu(dn);
+			return rc;
+		}
 	}
 
 	rc = dlpar_detach_node(dn);
@@ -635,7 +642,8 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
 
 		pr_warn("Failed to detach CPU %s, rc: %d", dn->name, rc);
 
-		rc = dlpar_acquire_drc(drc_index);
+		if (release_drc)
+			rc = dlpar_acquire_drc(drc_index);
 		if (!rc)
 			dlpar_online_cpu(dn);
 
@@ -664,7 +672,7 @@ static struct device_node *cpu_drc_index_to_dn(u32 drc_index)
 	return dn;
 }
 
-static int dlpar_cpu_remove_by_index(u32 drc_index)
+static int dlpar_cpu_remove_by_index(u32 drc_index, bool release_drc)
 {
 	struct device_node *dn;
 	int rc;
@@ -676,10 +684,30 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
 		return -ENODEV;
 	}
 
-	rc = dlpar_cpu_remove(dn, drc_index);
+	rc = dlpar_cpu_remove(dn, drc_index, release_drc);
 	of_node_put(dn);
 	return rc;
 }
+ 
+static int dlpar_cpu_readd_by_index(u32 drc_index)
+{
+	int rc = 0;
+
+	pr_info("Attempting to re-add CPU, drc index %x\n", drc_index);
+
+	rc = dlpar_cpu_remove_by_index(drc_index, false);
+	if (!rc)
+		rc = dlpar_cpu_add(drc_index, false);
+
+	if (rc)
+		pr_info("Failed to update cpu at drc_index %lx\n",
+				(unsigned long int)drc_index);
+	else
+		pr_info("CPU at drc_index %lx was updated\n",
+				(unsigned long int)drc_index);
+
+	return rc;
+}
 
 static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
 {
@@ -741,7 +769,7 @@ static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
 	}
 
 	for (i = 0; i < cpus_to_remove; i++) {
-		rc = dlpar_cpu_remove_by_index(cpu_drcs[i]);
+		rc = dlpar_cpu_remove_by_index(cpu_drcs[i], true);
 		if (rc)
 			break;
 
@@ -752,7 +780,7 @@ static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
 		pr_warn("CPU hot-remove failed, adding back removed CPUs\n");
 
 		for (i = 0; i < cpus_removed; i++)
-			dlpar_cpu_add(cpu_drcs[i]);
+			dlpar_cpu_add(cpu_drcs[i], true);
 
 		rc = -EINVAL;
 	} else {
@@ -843,7 +871,7 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
 	}
 
 	for (i = 0; i < cpus_to_add; i++) {
-		rc = dlpar_cpu_add(cpu_drcs[i]);
+		rc = dlpar_cpu_add(cpu_drcs[i], true);
 		if (rc)
 			break;
 
@@ -854,7 +882,7 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
 		pr_warn("CPU hot-add failed, removing any added CPUs\n");
 
 		for (i = 0; i < cpus_added; i++)
-			dlpar_cpu_remove_by_index(cpu_drcs[i]);
+			dlpar_cpu_remove_by_index(cpu_drcs[i], true);
 
 		rc = -EINVAL;
 	} else {
@@ -880,7 +908,7 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
 			rc = dlpar_cpu_remove_by_count(count);
 		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
-			rc = dlpar_cpu_remove_by_index(drc_index);
+			rc = dlpar_cpu_remove_by_index(drc_index, true);
 		else
 			rc = -EINVAL;
 		break;
@@ -888,10 +916,13 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
 		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
 			rc = dlpar_cpu_add_by_count(count);
 		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
-			rc = dlpar_cpu_add(drc_index);
+			rc = dlpar_cpu_add(drc_index, true);
 		else
 			rc = -EINVAL;
 		break;
+	case PSERIES_HP_ELOG_ACTION_READD:
+		rc = dlpar_cpu_readd_by_index(drc_index);
+		break;
 	default:
 		pr_err("Invalid action (%d) specified\n", hp_elog->action);
 		rc = -EINVAL;
@@ -913,7 +944,7 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
 	if (rc)
 		return -EINVAL;
 
-	rc = dlpar_cpu_add(drc_index);
+	rc = dlpar_cpu_add(drc_index, true);
 
 	return rc ? rc : count;
 }
@@ -934,7 +965,7 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
 		return -EINVAL;
 	}
 
-	rc = dlpar_cpu_remove(dn, drc_index);
+	rc = dlpar_cpu_remove(dn, drc_index, true);
 	of_node_put(dn);
 
 	return rc ? rc : count;
@@ -942,12 +973,38 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
 
 #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
+static int pseries_update_cpu(struct of_reconfig_data *pr)
+{
+	/* So far, we only handle the 'ibm,associativity' property,
+	 * here.
+	 */
+	struct pseries_hp_errorlog *hp_elog;
+
+        hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
+        if(!hp_elog)
+                return -ENOMEM;
+
+	hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_CPU;
+	hp_elog->action = PSERIES_HP_ELOG_ACTION_READD;
+	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
+	hp_elog->_drc_u.drc_index = be32_to_cpu(pr->dn->phandle);
+
+	queue_hotplug_event(hp_elog, NULL, NULL);
+
+	kfree(hp_elog);
+
+	return 0;
+}
+
 static int pseries_smp_notifier(struct notifier_block *nb,
 				unsigned long action, void *data)
 {
 	struct of_reconfig_data *rd = data;
 	int err = 0;
 
+	if (strcmp(rd->dn->type, "cpu"))
+		return notifier_from_errno(err);
+
 	switch (action) {
 	case OF_RECONFIG_ATTACH_NODE:
 		err = pseries_add_processor(rd->dn);
@@ -955,6 +1012,10 @@ static int pseries_smp_notifier(struct notifier_block *nb,
 	case OF_RECONFIG_DETACH_NODE:
 		pseries_remove_processor(rd->dn);
 		break;
+	case OF_RECONFIG_UPDATE_PROPERTY:
+		if (!strcmp(rd->prop->name, "ibm,associativity"))
+			err = pseries_update_cpu(rd);
+		break;
 	}
 	return notifier_from_errno(err);
 }
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a..6d98f84 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -283,6 +283,8 @@ int pseries_devicetree_update(s32 scope)
 	if (!rtas_buf)
 		return -ENOMEM;
 
+	lock_device_hotplug();
+
 	do {
 		rc = mobility_rtas_call(update_nodes_token, rtas_buf, scope);
 		if (rc && rc != 1)
@@ -321,6 +323,7 @@ int pseries_devicetree_update(s32 scope)
 	} while (rc == 1);
 
 	kfree(rtas_buf);
+	unlock_device_hotplug();
 	return rc;
 }
 

^ permalink raw reply related

* [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features
From: Michael Bringmann @ 2018-05-17 22:26 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon
In-Reply-To: <18541bb5-1e79-09d9-767f-1af09370e076@linux.vnet.ibm.com>

powerpc migration/drmem: Export many of the functions of DRMEM to
parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
hotplug operations and for Post Migration events.

Also modify the DRMEM initialization code to allow it to,

* Be called after system initialization
* Provide a separate user copy of the LMB array that is produces
* Free the user copy upon request

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
Changes in RFC:
  -- Separate DRMEM changes into a standalone patch
  -- Do not export excess functions.  Make exported names more explicit.
  -- Add new iterator to work through a pair of drmem_info arrays.
  -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
     dt_mem_next_cell, as these are only available at first boot.
  -- Rebase to 4.17-rc5 kernel
---
 arch/powerpc/include/asm/drmem.h |   10 +++++
 arch/powerpc/mm/drmem.c          |   78 +++++++++++++++++++++++++++-----------
 2 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index ce242b9..c964b89 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -35,6 +35,13 @@ struct drmem_lmb_info {
 		&drmem_info->lmbs[0],				\
 		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
 
+#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2)	\
+	for ((lmb1) = (&dinfo1->lmbs[0]),			\
+	     (lmb2) = (&dinfo2->lmbs[0]);			\
+             ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&	\
+             ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));	\
+	     (lmb1)++, (lmb2)++)
+
 /*
  * The of_drconf_cell_v1 struct defines the layout of the LMB data
  * specified in the ibm,dynamic-memory device tree property.
@@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
 			void (*func)(struct drmem_lmb *, const __be32 **));
 int drmem_update_dt(void);
 
+struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
+
 #ifdef CONFIG_PPC_PSERIES
 void __init walk_drmem_lmbs_early(unsigned long node,
 			void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 3f18036..d9b281c 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -20,6 +20,7 @@
 
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
+static int n_root_addr_cells;
 
 u64 drmem_lmb_memory_max(void)
 {
@@ -193,12 +194,13 @@ int drmem_update_dt(void)
 	return rc;
 }
 
-static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
+static void read_drconf_v1_cell(struct drmem_lmb *lmb,
 				       const __be32 **prop)
 {
 	const __be32 *p = *prop;
 
-	lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+	lmb->base_addr = of_read_number(p, n_root_addr_cells);
+	p += n_root_addr_cells;
 	lmb->drc_index = of_read_number(p++, 1);
 
 	p++; /* skip reserved field */
@@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
 	*prop = p;
 }
 
-static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *data,
 			void (*func)(struct drmem_lmb *, const __be32 **))
 {
 	struct drmem_lmb lmb;
@@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
 
 	for (i = 0; i < n_lmbs; i++) {
 		read_drconf_v1_cell(&lmb, &prop);
-		func(&lmb, &usm);
+		func(&lmb, &data);
 	}
 }
 
-static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
+static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
 				       const __be32 **prop)
 {
 	const __be32 *p = *prop;
 
 	dr_cell->seq_lmbs = of_read_number(p++, 1);
-	dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+	dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
+	p += n_root_addr_cells;
 	dr_cell->drc_index = of_read_number(p++, 1);
 	dr_cell->aa_index = of_read_number(p++, 1);
 	dr_cell->flags = of_read_number(p++, 1);
@@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
 	*prop = p;
 }
 
-static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *data,
 			void (*func)(struct drmem_lmb *, const __be32 **))
 {
 	struct of_drconf_cell_v2 dr_cell;
@@ -263,7 +266,7 @@ static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
 			lmb.aa_index = dr_cell.aa_index;
 			lmb.flags = dr_cell.flags;
 
-			func(&lmb, &usm);
+			func(&lmb, &data);
 		}
 	}
 }
@@ -275,6 +278,9 @@ void __init walk_drmem_lmbs_early(unsigned long node,
 	const __be32 *prop, *usm;
 	int len;
 
+	if (n_root_addr_cells == 0)
+		n_root_addr_cells = dt_root_addr_cells;
+
 	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
 	if (!prop || len < dt_root_size_cells * sizeof(__be32))
 		return;
@@ -353,24 +359,26 @@ void __init walk_drmem_lmbs(struct device_node *dn,
 	}
 }
 
-static void __init init_drmem_v1_lmbs(const __be32 *prop)
+static void init_drmem_v1_lmbs(const __be32 *prop,
+			struct drmem_lmb_info *dinfo)
 {
 	struct drmem_lmb *lmb;
 
-	drmem_info->n_lmbs = of_read_number(prop++, 1);
-	if (drmem_info->n_lmbs == 0)
+	dinfo->n_lmbs = of_read_number(prop++, 1);
+	if (dinfo->n_lmbs == 0)
 		return;
 
-	drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
+	dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
 				   GFP_KERNEL);
-	if (!drmem_info->lmbs)
+	if (!dinfo->lmbs)
 		return;
 
 	for_each_drmem_lmb(lmb)
 		read_drconf_v1_cell(lmb, &prop);
 }
 
-static void __init init_drmem_v2_lmbs(const __be32 *prop)
+static void init_drmem_v2_lmbs(const __be32 *prop,
+			struct drmem_lmb_info *dinfo)
 {
 	struct drmem_lmb *lmb;
 	struct of_drconf_cell_v2 dr_cell;
@@ -386,12 +394,12 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	p = prop;
 	for (i = 0; i < lmb_sets; i++) {
 		read_drconf_v2_cell(&dr_cell, &p);
-		drmem_info->n_lmbs += dr_cell.seq_lmbs;
+		dinfo->n_lmbs += dr_cell.seq_lmbs;
 	}
 
-	drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
+	dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
 				   GFP_KERNEL);
-	if (!drmem_info->lmbs)
+	if (!dinfo->lmbs)
 		return;
 
 	/* second pass, read in the LMB information */
@@ -402,25 +410,51 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 		read_drconf_v2_cell(&dr_cell, &p);
 
 		for (j = 0; j < dr_cell.seq_lmbs; j++) {
-			lmb = &drmem_info->lmbs[lmb_index++];
+			lmb = &dinfo->lmbs[lmb_index++];
 
 			lmb->base_addr = dr_cell.base_addr;
-			dr_cell.base_addr += drmem_info->lmb_size;
+			dr_cell.base_addr += dinfo->lmb_size;
 
 			lmb->drc_index = dr_cell.drc_index;
 			dr_cell.drc_index++;
 
 			lmb->aa_index = dr_cell.aa_index;
-			lmb->flags = dr_cell.flags;
 		}
 	}
 }
 
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo)
+{
+	if (dinfo) {
+		kfree(dinfo->lmbs);
+		kfree(dinfo);
+	}
+}
+
+struct drmem_lmb_info* drmem_init_lmbs(struct property *prop)
+{
+	struct drmem_lmb_info *dinfo;
+
+	dinfo = kzalloc(sizeof(*dinfo), GFP_KERNEL);
+	if (!dinfo)
+		return NULL;
+
+	if (!strcmp("ibm,dynamic-memory", prop->name))
+		init_drmem_v1_lmbs(prop->value, dinfo);
+	else if (!strcmp("ibm,dynamic-memory-v2", prop->name))
+		init_drmem_v2_lmbs(prop->value, dinfo);
+
+	return dinfo;
+}
+
 static int __init drmem_init(void)
 {
 	struct device_node *dn;
 	const __be32 *prop;
 
+	if (n_root_addr_cells == 0)
+		n_root_addr_cells = dt_root_addr_cells;
+
 	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
 	if (!dn) {
 		pr_info("No dynamic reconfiguration memory found\n");
@@ -434,11 +468,11 @@ static int __init drmem_init(void)
 
 	prop = of_get_property(dn, "ibm,dynamic-memory", NULL);
 	if (prop) {
-		init_drmem_v1_lmbs(prop);
+		init_drmem_v1_lmbs(prop, drmem_info);
 	} else {
 		prop = of_get_property(dn, "ibm,dynamic-memory-v2", NULL);
 		if (prop)
-			init_drmem_v2_lmbs(prop);
+			init_drmem_v2_lmbs(prop, drmem_info);
 	}
 
 	of_node_put(dn);

^ permalink raw reply related

* [RFC v4 0/3] powerpc/hotplug: Fix affinity assoc for LPAR migration
From: Michael Bringmann @ 2018-05-17 22:25 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michael Bringmann, Nathan Fontenot, John Allen, Tyrel Datwyler,
	Thomas Falcon

The migration of LPARs across Power systems affects many attributes
including that of the associativity of memory blocks and CPUs.  The
patches in this set execute when a system is coming up fresh upon a
migration target.  They are intended to,

* Recognize changes to the associativity of memory and CPUs recorded
  in internal data structures when compared to the latest copies in
  the device tree (e.g. ibm,dynamic-memory, ibm,dynamic-memory-v2,
  cpus),
* Recognize changes to the associativity mapping (e.g. ibm,
  associativity-lookup-arrays), locate all assigned memory blocks
  corresponding to each changed row, and readd all such blocks.
* Generate calls to other code layers to reset the data structures
  related to associativity of the CPUs and memory.
* Re-register the 'changed' entities into the target system.
  Re-registration of CPUs and memory blocks mostly entails acting as
  if they have been newly hot-added into the target system.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>

Michael Bringmann (3):
  powerpc migration/drmem: Modify DRMEM code to export more features
  powerpc migration/cpu: Associativity & cpu changes
  powerpc migration/memory: Associativity & memory updates
---
Changes in RFC:
  -- Restructure and rearrange content of patches to co-locate
     similar or related modifications
  -- Rename pseries_update_drconf_cpu to pseries_update_cpu
  -- Simplify code to update CPU nodes during mobility checks.
     Remove functions to generate extra HP_ELOG messages in favor
     of direct function calls to dlpar_cpu_readd_by_index, or
     dlpar_memory_readd_by_index.
  -- Revise code order in dlpar_cpu_readd_by_index() to present
     more appropriate error codes from underlying layers of the
     implementation.
  -- Add hotplug device lock around all property updates
  -- Schedule all CPU and memory changes due to device-tree updates /
     LPAR mobility as workqueue operations
  -- Export DRMEM accessor functions to parse 'ibm,dynamic-memory-v2'
  -- Export DRMEM functions to provide user copies of LMB array
  -- Compress code using DRMEM accessor functions.
  -- Split topology timer crash fix into new patch.
  -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
     dt_mem_next_cell, as these are only available at first boot.
  -- Correct a bug in DRC index selection for queued operation.
  -- Rebase to 4.17-rc5 kernel

^ permalink raw reply

* Re: [PATCH v17 0/9] Address error and recovery for AER and DPC
From: Bjorn Helgaas @ 2018-05-17 22:16 UTC (permalink / raw)
  To: Oza Pawandeep
  Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner,
	Greg Kroah-Hartman, Kate Stewart, linux-pci, linux-kernel,
	Dongdong Liu, Keith Busch, Wei Zhang, Sinan Kaya, Timur Tabi,
	Russell Currey, Sam Bobroff, Bryant G. Ly, linuxppc-dev,
	Sebastian Ott, linux-s390
In-Reply-To: <1526542991-5291-1-git-send-email-poza@codeaurora.org>

[+cc Russell, Sam, Bryant, linuxppc-dev, Sebastian, linux-s390]

Sorry, I should have pulled in these new CC's earlier because ppc and
s390 both have PCI error handling similar to what Oza is changing
here.

The basic issue is that the new PCIe DPC (Downstream Port Containment,
see PCIe r4.0, sec 6.2.10) feature doesn't fit very well in the
framework of the pci_error_handlers callbacks.

When DPC is enabled, a Downstream Port (either a Root Port or a Switch
Downstream Port) that receives an ERR_FATAL message automatically
disables its Link.  IIUC, this is also intended for use in hot-unplug
scenarios.

When the DPC hardware takes the Link down, it resets all the
downstream devices, and there's not much point in calling the
pci_error_handlers callbacks because the devices are unreachable.
Even after the Link comes back up, we can't be certain the same device
is there because of the hotplug possibility.

The software side of DPC recovery basically consists of detaching the
drivers of the downstream devices (calling their .remove() methods),
bringing the link back up, re-enumerating the downstream devices, and
re-attaching the drivers (calling their .probe() methods).

The existing AER code also responds to ERR_FATAL messages, but it does
call the pci_error_handlers callbacks and also resets the link.

This is a bit of a mess because things look a lot different to the
driver depending on whether the platform supports AER or DPC.

Since we can't change the way DPC works, the idea of this series is
basically to make AER handle ERR_FATAL more like DPC does, i.e., by
resetting the link, detaching, and re-attaching the drivers.

This series is currently on my pci/aer branch
(https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=pci/aer)
and is headed for v4.18 unless somebody raises major objections.

On Thu, May 17, 2018 at 03:43:02AM -0400, Oza Pawandeep wrote:
> This patch set brings in error handling support for DPC
> 
> The current implementation of AER and error message broadcasting to the
> EP driver is tightly coupled and limited to AER service driver.
> It is important to factor out broadcasting and other link handling
> callbacks. So that not only when AER gets triggered, but also when DPC get
> triggered (for e.g. ERR_FATAL), callbacks are handled appropriately.
> 
> The goal of the patch-set is:
> DPC should handle the error handling and recovery similar to AER, because 
> finally both are attempting recovery in some or the other way,
> and for that error handling and recovery framework has to be loosely
> coupled.
> 
> It achieves uniformity and transparency to the error handling agents such
> as AER, DPC, with respect to recovery and error handling.
> 
> So, this patch-set tries to unify lot of things between error agents and
> make them behave in a well defined way. (be it error (FATAL, NON_FATAL)
> handling or recovery).
> 
> The FATAL error handling is handled with remove/reset_link/re-enumerate
> sequence while the NON_FATAL follows the default path.
> Documentation/PCI/pci-error-recovery.txt talks more on that.

I applied this series with a trivial change to remove an unused variable to
pci/aer for v4.18, thanks!

> Changes since v16:
>     Bjorn's comments addressed
>     > remove call pci_walk_bus(dev->subordinate, report_resume, &result_data)
>     > pci_cleanup_aer_uncorrect_error_status(dev); happens only if service is AER
>     > aer_error_resume does not handle ERR_FATAL clearing anymore
> Changes since v15:
>     Bjorn's comments addressed
>     > minor comments fixed
>     > made FATAL sequence aligned to existing one, as far as clearing status are concerned.
>     > pcie_do_fatal_recovery and pcie_do_nonfatal_recovery functions made to modularize
>     > pcie_do_fatal_recovery now takes service as an argument
> Changes since v14:
>     Bjorn's comments addressed
>     > simplified the patch set, and moved AER_FATAL handling in the beginning.
>     > rebase the code to 4.17-rc1.
> Changes since v13:
>     Bjorn's comments addressed
>     > handke FATAL errors with remove devices followed by re-enumeration.
>     > changes in AER and DPC along with required Documentation.
> Changes since v12:
>     Bjorn's and Keith's Comments addressed.
>     > Made DPC and AER error handling identical <aligned err.c>
>     > hanldled cases for hotplug enabled system differently.
> Changes since v11:
>     Bjorn's comments addressed.
>     > rename pcie-err.c to err.c
>     > removed EXPORT_SYMBOL
>     > made generic find_serivce function in port driver.
>     > removed mutex patch as no need to have mutex in pcie_do_recovery
>     > brough in DPC_FATAL in aer.h
>     > so now all the error codes (AER and DPC) are unified in aer.h
> Changes since v10:
>     Christoph Hellwig's, David Laight's and Randy Dunlap's
>     comments addressed.
>         > renamed pci_do_recovery to pcie_do_recovery
>         > removed inner braces in conditional statements.
>         > restrctured the code in pci_wait_for_link
>         > EXPORT_SYMBOL_GPL
> Changes since v9:
>     Sinan's comments addressed.
>         > bool active = true; unnecessary variable removed.
> Changes since v8:
>     Fixed Kbuild errors.
> Changes since v7:
>     Rebased the code on pci master
>         > https://kernel.googlesource.com/pub/scm/linux/kernel/git/helgaas/pci
> Changes since v6:
>     Sinan's and Stefan's comments implemented.
>         > reordered patch 6 and 7
>         > cleaned up
> Changes since v5:
>     Sinan's and Keith's comments incorporated.
>         > made separate patch for mutex
>         > unified error repotting codes into driver/pci/pci.h
>         > got rid of wait link active/inactive and
>           made generic function in driver/pci/pci.c
> Changes since v4:
>     Bjorn's comments incorporated.
>         > Renamed only do_recovery.
>         > moved the things more locally to drivers/pci/pci.h
> Changes since v3:
>     Bjorn's comments incorporated.
>         > Made separate patch renaming generic pci_err.c
>         > Introduce pci_err.h to contain all the error types and recovery
>         > removed all the dependencies on pci.h
> Changes since v2:
>     Based on feedback from Keith:
>     "
>     When DPC is triggered due to receipt of an uncorrectable error Message,
>     the Requester ID from the Message is recorded in the DPC Error
>     Source ID register and that Message is discarded and not forwarded Upstream.
>     "
>     Removed the patch where AER checks if DPC service is active
> Changes since v1:
>     Kbuild errors fixed:
>         > pci_find_dpc_dev made static
>         > ras_event.h updated
>         > pci_find_aer_service call with CONFIG check
>         > pci_find_dpc_service call with CONFIG check
> 
> Oza Pawandeep (9):
>   PCI: Unify wait for link active into generic PCI
>   pci-error-recovery: Add AER_FATAL handling
>   PCI/AER: Handle ERRR_FATAL with removal and re-enumeration of devices
>   PCI/AER: Rename error recovery to generic PCI naming
>   PCI/AER: Factor out error reporting from AER
>   PCI/PORTDRV: Implement generic find service
>   PCI/PORTDRV: Implement generic find device
>   PCI/DPC: Unify and plumb error handling into DPC
>   PCI/DPC: Disable ERR_NONFATAL and enable ERR_FATAL for DPC
> 
>  Documentation/PCI/pci-error-recovery.txt |  35 ++-
>  drivers/pci/hotplug/pciehp_hpc.c         |  20 +-
>  drivers/pci/pci.c                        |  29 +++
>  drivers/pci/pci.h                        |   4 +
>  drivers/pci/pcie/Makefile                |   2 +-
>  drivers/pci/pcie/aer/aerdrv.c            |   2 +
>  drivers/pci/pcie/aer/aerdrv.h            |  30 ---
>  drivers/pci/pcie/aer/aerdrv_core.c       | 317 +-------------------------
>  drivers/pci/pcie/dpc.c                   |  58 +++--
>  drivers/pci/pcie/err.c                   | 374 +++++++++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h               |   4 +
>  drivers/pci/pcie/portdrv_core.c          |  67 ++++++
>  include/linux/aer.h                      |   1 +
>  include/uapi/linux/pci_regs.h            |   1 +
>  14 files changed, 540 insertions(+), 404 deletions(-)
>  create mode 100644 drivers/pci/pcie/err.c
> 
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH] powerpc: Ensure gcc doesn't move around cache flushing in __patch_instruction
From: Segher Boessenkool @ 2018-05-17 19:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Michael Neuling
In-Reply-To: <48284701fe497bb4f5bede5c55bbce9d70309562.camel@kernel.crashing.org>

Hi!

On Thu, May 17, 2018 at 01:06:10PM +1000, Benjamin Herrenschmidt wrote:
> The current asm statement in __patch_instruction() for the cache flushes
> doesn't have a "volatile" statement and no memory clobber. That means
> gcc can potentially move it around (or move the store done by put_user
> past the flush).

volatile is completely superfluous here, except maybe as documentation:
any asm without outputs is always volatile.

(And the memory clobber does not prevent the compiler from moving the
asm around, or duplicating it, etc., and neither does the volatile).


Segher

^ permalink raw reply

* Re: [PATCH bpf 5/6] tools: bpftool: resolve calls without using imm field
From: Jakub Kicinski @ 2018-05-17 18:51 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, naveen.n.rao
In-Reply-To: <20180517063548.6373-6-sandipan@linux.vnet.ibm.com>

On Thu, 17 May 2018 12:05:47 +0530, Sandipan Das wrote:
> Currently, we resolve the callee's address for a JITed function
> call by using the imm field of the call instruction as an offset
> from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
> use this address to get the callee's kernel symbol's name.
> 
> For some architectures, such as powerpc64, the imm field is not
> large enough to hold this offset. So, instead of assigning this
> offset to the imm field, the verifier now assigns the subprog
> id. Also, a list of kernel symbol addresses for all the JITed
> functions is provided in the program info. We now use the imm
> field as an index for this list to lookup a callee's symbol's
> address and resolve its name.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>

A few nit-picks below, thank you for the patch!

>  tools/bpf/bpftool/prog.c          | 31 +++++++++++++++++++++++++++++++
>  tools/bpf/bpftool/xlated_dumper.c | 24 +++++++++++++++++-------
>  tools/bpf/bpftool/xlated_dumper.h |  2 ++
>  3 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..ac2f62a97e84 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -430,6 +430,10 @@ static int do_dump(int argc, char **argv)
>  	unsigned char *buf;
>  	__u32 *member_len;
>  	__u64 *member_ptr;
> +	unsigned int nr_addrs;
> +	unsigned long *addrs = NULL;
> +	__u32 *ksyms_len;
> +	__u64 *ksyms_ptr;

nit: please try to keep the variables ordered longest to shortest like
we do in networking code (please do it in all functions).

>  	ssize_t n;
>  	int err;
>  	int fd;
> @@ -437,6 +441,8 @@ static int do_dump(int argc, char **argv)
>  	if (is_prefix(*argv, "jited")) {
>  		member_len = &info.jited_prog_len;
>  		member_ptr = &info.jited_prog_insns;
> +		ksyms_len = &info.nr_jited_ksyms;
> +		ksyms_ptr = &info.jited_ksyms;
>  	} else if (is_prefix(*argv, "xlated")) {
>  		member_len = &info.xlated_prog_len;
>  		member_ptr = &info.xlated_prog_insns;
> @@ -496,10 +502,23 @@ static int do_dump(int argc, char **argv)
>  		return -1;
>  	}
>  
> +	nr_addrs = *ksyms_len;

Here and ...

> +	if (nr_addrs) {
> +		addrs = malloc(nr_addrs * sizeof(__u64));
> +		if (!addrs) {
> +			p_err("mem alloc failed");
> +			free(buf);
> +			close(fd);
> +			return -1;

You can just jump to err_free here.

> +		}
> +	}
> +
>  	memset(&info, 0, sizeof(info));
>  
>  	*member_ptr = ptr_to_u64(buf);
>  	*member_len = buf_size;
> +	*ksyms_ptr = ptr_to_u64(addrs);
> +	*ksyms_len = nr_addrs;

... here - this function is getting long, so maybe I'm not seeing
something, but are ksyms_ptr and ksyms_len guaranteed to be initialized?

>  	err = bpf_obj_get_info_by_fd(fd, &info, &len);
>  	close(fd);
> @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
>  		goto err_free;
>  	}
>  
> +	if (*ksyms_len > nr_addrs) {
> +		p_err("too many addresses returned");
> +		goto err_free;
> +	}
> +
>  	if ((member_len == &info.jited_prog_len &&
>  	     info.jited_prog_insns == 0) ||
>  	    (member_len == &info.xlated_prog_len &&
> @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
>  			dump_xlated_cfg(buf, *member_len);
>  	} else {
>  		kernel_syms_load(&dd);
> +		dd.jited_ksyms = ksyms_ptr;
> +		dd.nr_jited_ksyms = *ksyms_len;
> +
>  		if (json_output)
>  			dump_xlated_json(&dd, buf, *member_len, opcodes);
>  		else
> @@ -566,10 +593,14 @@ static int do_dump(int argc, char **argv)
>  	}
>  
>  	free(buf);
> +	if (addrs)
> +		free(addrs);

Free can deal with NULL pointers, no need for an if.

>  	return 0;
>  
>  err_free:
>  	free(buf);
> +	if (addrs)
> +		free(addrs);
>  	return -1;
>  }
>  
> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7a3173b76c16..dc8e4eca0387 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -178,8 +178,12 @@ static const char *print_call_pcrel(struct dump_data *dd,
>  		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>  			 "%+d#%s", insn->off, sym->name);
>  	else

else if (address)

saves us the indentation.

> -		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> -			 "%+d#0x%lx", insn->off, address);
> +		if (address)
> +			snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +				 "%+d#0x%lx", insn->off, address);
> +		else
> +			snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +				 "%+d", insn->off);
>  	return dd->scratch_buff;
>  }
>  
> @@ -200,14 +204,20 @@ static const char *print_call(void *private_data,
>  			      const struct bpf_insn *insn)
>  {
>  	struct dump_data *dd = private_data;
> -	unsigned long address = dd->address_call_base + insn->imm;
> -	struct kernel_sym *sym;
> +	unsigned long address = 0;
> +	struct kernel_sym *sym = NULL;
>  

Hm.  Quite a bit of churn.  Why not just add these three lines here:

if (insn->src_reg == BPF_PSEUDO_CALL && 
    insn->imm < dd->nr_jited_ksyms)
	address = dd->jited_ksyms[insn->imm];

> -	sym = kernel_syms_search(dd, address);
> -	if (insn->src_reg == BPF_PSEUDO_CALL)
> +	if (insn->src_reg == BPF_PSEUDO_CALL) {
> +		if (dd->nr_jited_ksyms) {
> +			address = dd->jited_ksyms[insn->imm];

Perhaps it's paranoid, but it'd please do to bound check insn->imm
against dd->nr_jited_ksyms.

> +			sym = kernel_syms_search(dd, address);
> +		}
>  		return print_call_pcrel(dd, sym, address, insn);
> -	else
> +	} else {
> +		address = dd->address_call_base + insn->imm;
> +		sym = kernel_syms_search(dd, address);
>  		return print_call_helper(dd, sym, address);
> +	}
>  }
>  
>  static const char *print_imm(void *private_data,

^ permalink raw reply

* Re: [PATCH v11 01/26] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
From: Randy Dunlap @ 2018-05-17 17:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Laurent Dufour, akpm, mhocko, peterz, kirill, ak, dave, jack,
	khandual, aneesh.kumar, benh, mpe, paulus, Thomas Gleixner,
	Ingo Molnar, hpa, Will Deacon, Sergey Senozhatsky,
	sergey.senozhatsky.work, Andrea Arcangeli, Alexei Starovoitov,
	kemi.wang, Daniel Jordan, David Rientjes, Jerome Glisse,
	Ganesh Mahendran, Minchan Kim, Punit Agrawal, vinayak menon,
	Yang Shi, linux-kernel, linux-mm, haren, npiggin, bsingharora,
	paulmck, Tim Chen, linuxppc-dev, x86
In-Reply-To: <20180517171951.GB26718@bombadil.infradead.org>

On 05/17/2018 10:19 AM, Matthew Wilcox wrote:
> On Thu, May 17, 2018 at 09:36:00AM -0700, Randy Dunlap wrote:
>>> +	 If the speculative page fault fails because of a concurrency is
>>
>> 	                                     because a concurrency is
> 
> While one can use concurrency as a noun, it sounds archaic to me.  I'd
> rather:
> 
> 	If the speculative page fault fails because a concurrent modification
> 	is detected or because underlying PMD or PTE tables are not yet

Yeah, OK.

>>> +	 detected or because underlying PMD or PTE tables are not yet
>>> +	 allocating, it is failing its processing and a classic page fault
>>
>> 	 allocated, the speculative page fault fails and a classic page fault
>>
>>> +	 is then tried.


-- 
~Randy

^ permalink raw reply

* Re: [PATCH v4 4/4] powerpc/kbuild: move -mprofile-kernel check to Kconfig
From: kbuild test robot @ 2018-05-17 17:28 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: kbuild-all, linux-kbuild, Masahiro Yamada, linuxppc-dev,
	Nicholas Piggin
In-Reply-To: <20180516141458.18996-5-npiggin@gmail.com>

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-patches-for-new-Kconfig-language/20180517-224044
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        make.cross ARCH=powerpc  allmodconfig
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:468: syntax error
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:467: invalid option
   make[2]: *** [allmodconfig] Error 1
   make[1]: *** [allmodconfig] Error 2
   make: *** [sub-make] Error 2
--
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:468: syntax error
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:467: invalid option
   make[2]: *** [oldconfig] Error 1
   make[1]: *** [oldconfig] Error 2
   make: *** [sub-make] Error 2
--
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:468: syntax error
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
   arch/powerpc/Kconfig:467:warning: ignoring unsupported character '$'
>> arch/powerpc/Kconfig:467: invalid option
   make[2]: *** [olddefconfig] Error 1
   make[2]: Target 'oldnoconfig' not remade because of errors.
   make[1]: *** [oldnoconfig] Error 2
   make: *** [sub-make] Error 2

vim +468 arch/powerpc/Kconfig

e05c0e81 Kevin Hao       2013-07-16  443  
3d72bbc4 Michael Neuling 2013-02-13  444  config PPC_TRANSACTIONAL_MEM
3d72bbc4 Michael Neuling 2013-02-13  445         bool "Transactional Memory support for POWERPC"
3d72bbc4 Michael Neuling 2013-02-13  446         depends on PPC_BOOK3S_64
3d72bbc4 Michael Neuling 2013-02-13  447         depends on SMP
7b37a123 Michael Neuling 2014-01-08  448         select ALTIVEC
7b37a123 Michael Neuling 2014-01-08  449         select VSX
3d72bbc4 Michael Neuling 2013-02-13  450         default n
3d72bbc4 Michael Neuling 2013-02-13  451         ---help---
3d72bbc4 Michael Neuling 2013-02-13  452           Support user-mode Transactional Memory on POWERPC.
3d72bbc4 Michael Neuling 2013-02-13  453  
951eedeb Nicholas Piggin 2017-05-29  454  config LD_HEAD_STUB_CATCH
951eedeb Nicholas Piggin 2017-05-29  455  	bool "Reserve 256 bytes to cope with linker stubs in HEAD text" if EXPERT
951eedeb Nicholas Piggin 2017-05-29  456  	depends on PPC64
951eedeb Nicholas Piggin 2017-05-29  457  	default n
951eedeb Nicholas Piggin 2017-05-29  458  	help
951eedeb Nicholas Piggin 2017-05-29  459  	  Very large kernels can cause linker branch stubs to be generated by
951eedeb Nicholas Piggin 2017-05-29  460  	  code in head_64.S, which moves the head text sections out of their
951eedeb Nicholas Piggin 2017-05-29  461  	  specified location. This option can work around the problem.
951eedeb Nicholas Piggin 2017-05-29  462  
951eedeb Nicholas Piggin 2017-05-29  463  	  If unsure, say "N".
951eedeb Nicholas Piggin 2017-05-29  464  
8c50b72a Torsten Duwe    2016-03-03  465  config MPROFILE_KERNEL
8c50b72a Torsten Duwe    2016-03-03  466  	depends on PPC64 && CPU_LITTLE_ENDIAN
4421b963 Nicholas Piggin 2018-05-17 @467  	def_bool $(success $(srctree)/arch/powerpc/tools/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__)
8c50b72a Torsten Duwe    2016-03-03 @468  

:::::: The code at line 468 was first introduced by commit
:::::: 8c50b72a3b4f1f7cdfdfebd233b1cbd121262e65 powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel

:::::: TO: Torsten Duwe <duwe@lst.de>
:::::: CC: Michael Ellerman <mpe@ellerman.id.au>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH v11 01/26] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
From: Matthew Wilcox @ 2018-05-17 17:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Laurent Dufour, akpm, mhocko, peterz, kirill, ak, dave, jack,
	khandual, aneesh.kumar, benh, mpe, paulus, Thomas Gleixner,
	Ingo Molnar, hpa, Will Deacon, Sergey Senozhatsky,
	sergey.senozhatsky.work, Andrea Arcangeli, Alexei Starovoitov,
	kemi.wang, Daniel Jordan, David Rientjes, Jerome Glisse,
	Ganesh Mahendran, Minchan Kim, Punit Agrawal, vinayak menon,
	Yang Shi, linux-kernel, linux-mm, haren, npiggin, bsingharora,
	paulmck, Tim Chen, linuxppc-dev, x86
In-Reply-To: <2cb8256d-5822-d94d-b0e6-c46f21d84852@infradead.org>

On Thu, May 17, 2018 at 09:36:00AM -0700, Randy Dunlap wrote:
> > +	 If the speculative page fault fails because of a concurrency is
> 
> 	                                     because a concurrency is

While one can use concurrency as a noun, it sounds archaic to me.  I'd
rather:

	If the speculative page fault fails because a concurrent modification
	is detected or because underlying PMD or PTE tables are not yet

> > +	 detected or because underlying PMD or PTE tables are not yet
> > +	 allocating, it is failing its processing and a classic page fault
> 
> 	 allocated, the speculative page fault fails and a classic page fault
> 
> > +	 is then tried.

^ permalink raw reply

* Re: [PATCH v11 01/26] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
From: Randy Dunlap @ 2018-05-17 16:36 UTC (permalink / raw)
  To: Laurent Dufour, akpm, mhocko, peterz, kirill, ak, dave, jack,
	Matthew Wilcox, khandual, aneesh.kumar, benh, mpe, paulus,
	Thomas Gleixner, Ingo Molnar, hpa, Will Deacon,
	Sergey Senozhatsky, sergey.senozhatsky.work, Andrea Arcangeli,
	Alexei Starovoitov, kemi.wang, Daniel Jordan, David Rientjes,
	Jerome Glisse, Ganesh Mahendran, Minchan Kim, Punit Agrawal,
	vinayak menon, Yang Shi
  Cc: linux-kernel, linux-mm, haren, npiggin, bsingharora, paulmck,
	Tim Chen, linuxppc-dev, x86
In-Reply-To: <1526555193-7242-2-git-send-email-ldufour@linux.vnet.ibm.com>

Hi,

On 05/17/2018 04:06 AM, Laurent Dufour wrote:
> This configuration variable will be used to build the code needed to
> handle speculative page fault.
> 
> By default it is turned off, and activated depending on architecture
> support, ARCH_HAS_PTE_SPECIAL, SMP and MMU.
> 
> The architecture support is needed since the speculative page fault handler
> is called from the architecture's page faulting code, and some code has to
> be added there to handle the speculative handler.
> 
> The dependency on ARCH_HAS_PTE_SPECIAL is required because vm_normal_page()
> does processing that is not compatible with the speculative handling in the
> case ARCH_HAS_PTE_SPECIAL is not set.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  mm/Kconfig | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 1d0888c5b97a..a38796276113 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -761,3 +761,25 @@ config GUP_BENCHMARK
>  
>  config ARCH_HAS_PTE_SPECIAL
>  	bool
> +
> +config ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
> +       def_bool n
> +
> +config SPECULATIVE_PAGE_FAULT
> +       bool "Speculative page faults"
> +       default y
> +       depends on ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
> +       depends on ARCH_HAS_PTE_SPECIAL && MMU && SMP
> +       help
> +         Try to handle user space page faults without holding the mmap_sem.
> +
> +	 This should allow better concurrency for massively threaded process

	                                                             processes

> +	 since the page fault handler will not wait for other threads memory

	                                                      thread's

> +	 layout change to be done, assuming that this change is done in another
> +	 part of the process's memory space. This type of page fault is named
> +	 speculative page fault.
> +
> +	 If the speculative page fault fails because of a concurrency is

	                                     because a concurrency is

> +	 detected or because underlying PMD or PTE tables are not yet
> +	 allocating, it is failing its processing and a classic page fault

	 allocated, the speculative page fault fails and a classic page fault

> +	 is then tried.


Also, all of the help text (below the "help" line) should be indented by
1 tab + 2 spaces (in coding-style.rst).


-- 
~Randy

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox