LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs
From: Nathan Fontenot @ 2018-05-18 19:57 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev; +Cc: John Allen, Tyrel Datwyler, Thomas Falcon
In-Reply-To: <5a950883-d06d-4cdc-2716-8c1ce25f394d@linux.vnet.ibm.com>

On 05/17/2018 05:41 PM, Michael Bringmann wrote:
> [Replace/withdraw previous patch submission to ensure that testing
> of related patches on similar hardware progresses together.]
> 
> This patch fixes a memory parsing bug when using of_prop_next_u32
> calls at the start of a structure.  Depending upon the value of
> "cur" memory pointer argument to of_prop_next_u32, it will or it
> won't advance the value of the returned memory pointer by the
> size of one u32.  This patch corrects the code to deal with that
> indexing feature when parsing the ibm,drc-info structs for CPUs.
> Also, need to advance the pointer at the end of_read_drc_info_cell
> for same reason.
> 

I see that you provide an update for of_read_drc_info_cell to fix the
unexpected behavior you're seeing, but I'm not sure why you're updating
the code in pseries_energy.c and rpaphp_core.c? can you provide some
additional information as to why these functions also need to be updated.

> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info firmware feature" -- end of patch series applied to powerpc next)
> ---
> Changes in V3:
>   -- Rebased patch to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/of_helpers.c     |    5 ++---
>  arch/powerpc/platforms/pseries/pseries_energy.c |    2 ++
>  drivers/pci/hotplug/rpaphp_core.c               |    1 +
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c b/arch/powerpc/platforms/pseries/of_helpers.c
> index 6df192f..20598b2 100644
> --- a/arch/powerpc/platforms/pseries/of_helpers.c
> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
> @@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop, const __be32 **curval,
> 
>  	/* Get drc-index-start:encode-int */
>  	p2 = (const __be32 *)p;
> -	p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
> -	if (!p2)
> -		return -EINVAL;
> +	data->drc_index_start = of_read_number(p2, 1);

This appears to resolve advancing the pointer for the beginning of a struct.

> 
>  	/* Get drc-name-suffix-start:encode-int */
>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
> @@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop, const __be32 **curval,
>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>  	if (!p2)
>  		return -EINVAL;
> +	p2++;

...but why is the advancement needed here? of_prop_next_u32 should have advanced it, correct?

-Nathan

> 
>  	/* Should now know end of current entry */
>  	(*curval) = (void *)p2;
> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
> index 6ed2212..c7d84aa 100644
> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> @@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
>  		value = of_prop_next_u32(info, NULL, &num_set_entries);
>  		if (!value)
>  			goto err_of_node_put;
> +		value++;
> 
>  		for (j = 0; j < num_set_entries; j++) {
> 
> @@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
>  		value = of_prop_next_u32(info, NULL, &num_set_entries);
>  		if (!value)
>  			goto err_of_node_put;
> +		value++;
> 
>  		for (j = 0; j < num_set_entries; j++) {
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c b/drivers/pci/hotplug/rpaphp_core.c
> index fb5e084..dccdf62 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct device_node *dn, char *drc_name,
>  	value = of_prop_next_u32(info, NULL, &entries);
>  	if (!value)
>  		return -EINVAL;
> +	value++;
> 
>  	for (j = 0; j < entries; j++) {
>  		of_read_drc_info_cell(&info, &value, &drc);
> 

^ permalink raw reply

* Re: [PATCH bpf v2 5/6] tools: bpftool: resolve calls without using imm field
From: Jakub Kicinski @ 2018-05-18 19:55 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, naveen.n.rao, mpe
In-Reply-To: <20180518125039.6500-6-sandipan@linux.vnet.ibm.com>

On Fri, 18 May 2018 18:20:38 +0530, Sandipan Das wrote:
> Currently, we resolve the callee's address for a JITed function
> call by using the imm field of the call instruction as an offset
> from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
> use this address to get the callee's kernel symbol's name.
> 
> For some architectures, such as powerpc64, the imm field is not
> large enough to hold this offset. So, instead of assigning this
> offset to the imm field, the verifier now assigns the subprog
> id. Also, a list of kernel symbol addresses for all the JITed
> functions is provided in the program info. We now use the imm
> field as an index for this list to lookup a callee's symbol's
> address and resolve its name.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
> v2:
>  - Order variables from longest to shortest
>  - Make sure that ksyms_ptr and ksyms_len are always initialized
>  - Simplify code

Thanks for the improvements!  Since there will be v3 two minor nit
picks still :)

>  tools/bpf/bpftool/prog.c          | 29 +++++++++++++++++++++++++++++
>  tools/bpf/bpftool/xlated_dumper.c | 10 +++++++++-
>  tools/bpf/bpftool/xlated_dumper.h |  2 ++
>  3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..e2f8f8f259fc 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -421,19 +421,26 @@ static int do_show(int argc, char **argv)
>  static int do_dump(int argc, char **argv)
>  {
>  	struct bpf_prog_info info = {};
> +	unsigned long *addrs = NULL;
>  	struct dump_data dd = {};
>  	__u32 len = sizeof(info);
>  	unsigned int buf_size;
> +	unsigned int nr_addrs;
>  	char *filepath = NULL;
>  	bool opcodes = false;
>  	bool visual = false;
>  	unsigned char *buf;
>  	__u32 *member_len;
>  	__u64 *member_ptr;
> +	__u32 *ksyms_len;
> +	__u64 *ksyms_ptr;
>  	ssize_t n;
>  	int err;
>  	int fd;
>  
> +	ksyms_len = &info.nr_jited_ksyms;
> +	ksyms_ptr = &info.jited_ksyms;

I'm not sure why you need these, why not just access
info.nr_jited_ksyms and info.jited_ksyms directly?  "member" variables
are there because jited and xlated images get returned in different
member of struct bpf_prog_info.

>  	if (is_prefix(*argv, "jited")) {
>  		member_len = &info.jited_prog_len;
>  		member_ptr = &info.jited_prog_insns;
> @@ -496,10 +503,22 @@ static int do_dump(int argc, char **argv)
>  		return -1;
>  	}
>  
> +	nr_addrs = *ksyms_len;
> +	if (nr_addrs) {
> +		addrs = malloc(nr_addrs * sizeof(__u64));
> +		if (!addrs) {
> +			p_err("mem alloc failed");
> +			close(fd);
> +			goto err_free;
> +		}
> +	}
> +
>  	memset(&info, 0, sizeof(info));
>  
>  	*member_ptr = ptr_to_u64(buf);
>  	*member_len = buf_size;
> +	*ksyms_ptr = ptr_to_u64(addrs);
> +	*ksyms_len = nr_addrs;
>  
>  	err = bpf_obj_get_info_by_fd(fd, &info, &len);
>  	close(fd);
> @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
>  		goto err_free;
>  	}
>  
> +	if (*ksyms_len > nr_addrs) {
> +		p_err("too many addresses returned");
> +		goto err_free;
> +	}
> +
>  	if ((member_len == &info.jited_prog_len &&
>  	     info.jited_prog_insns == 0) ||
>  	    (member_len == &info.xlated_prog_len &&
> @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
>  			dump_xlated_cfg(buf, *member_len);
>  	} else {
>  		kernel_syms_load(&dd);
> +		dd.nr_jited_ksyms = *ksyms_len;
> +		dd.jited_ksyms = (__u64 *) *ksyms_ptr;
> +
>  		if (json_output)
>  			dump_xlated_json(&dd, buf, *member_len, opcodes);
>  		else

> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7a3173b76c16..fb065b55db6d 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -203,6 +207,10 @@ static const char *print_call(void *private_data,
>  	unsigned long address = dd->address_call_base + insn->imm;
>  	struct kernel_sym *sym;
>  
> +	if (insn->src_reg == BPF_PSEUDO_CALL &&
> +		(__u32) insn->imm < dd->nr_jited_ksyms)

Indentation seems off.

> +		address = dd->jited_ksyms[insn->imm];
> +
>  	sym = kernel_syms_search(dd, address);
>  	if (insn->src_reg == BPF_PSEUDO_CALL)
>  		return print_call_pcrel(dd, sym, address, insn);

^ permalink raw reply

* Re: pkeys on POWER: Default AMR, UAMOR values
From: Andy Lutomirski @ 2018-05-18 19:39 UTC (permalink / raw)
  To: linuxram; +Cc: Florian Weimer, linuxppc-dev, Linux-MM, Dave Hansen
In-Reply-To: <20180518174448.GE5479@ram.oc3035372033.ibm.com>

On Fri, May 18, 2018 at 10:45 AM Ram Pai <linuxram@us.ibm.com> wrote:

> On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> > I'm working on adding POWER pkeys support to glibc.  The coding work
> > is done, but I'm faced with some test suite failures.
> >
> > Unlike the default x86 configuration, on POWER, existing threads
> > have full access to newly allocated keys.
> >
> > Or, more precisely, in this scenario:
> >
> > * Thread A launches thread B
> > * Thread B waits
> > * Thread A allocations a protection key with pkey_alloc
> > * Thread A applies the key to a page
> > * Thread A signals thread B
> > * Thread B starts to run and accesses the page
> >
> > Then at the end, the access will be granted.
> >
> > I hope it's not too late to change this to denied access.
> >
> > Furthermore, I think the UAMOR value is wrong as well because it
> > prevents thread B at the end to set the AMR register.  In
> > particular, if I do this
> >
> > * =E2=80=A6 (as before)
> > * Thread A signals thread B
> > * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> > * Thread B reads the current access rights for the key
> >
> > then it still gets 0 (all access permitted) because the original
> > UAMOR value inherited from thread A prior to the key allocation
> > masks out the access right update for the newly allocated key.

> Florian, is the behavior on x86 any different? A key allocated in the
> context off one thread is not meaningful in the context of any other
> thread.


The difference is that x86 starts out with deny-all instead of allow-all.
The POWER semantics make it very hard for a multithreaded program to
meaningfully use protection keys to prevent accidental access to important
memory.

^ permalink raw reply

* Re: [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes
From: Nathan Fontenot @ 2018-05-18 19:28 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev; +Cc: John Allen, Tyrel Datwyler, Thomas Falcon
In-Reply-To: <17dadef4-f7f3-f5b4-8ead-f194dd5cdf2b@linux.vnet.ibm.com>

On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/cpu: Now apply changes to the associativity of cpus
> for the topology of LPARS in Post Migration events.  Recognize more
> changes to the associativity of memory blocks described by the
> 'cpu' properties when processing the topology of LPARS in Post Migration
> events.  Previous efforts only recognized whether a memory block's
> assignment had changed in the property.  Changes here include:
> 
> * Provide hotplug CPU 'readd by index' operation
> * Checking for changes in cpu associativity and making 'readd' calls
>   when differences are observed.
> * Queue up  changes to CPU properties so that they may take place
>   after all PowerPC device-tree changes have been applied i.e. after
>   the device hotplug is released in the mobility code.

This kinda feels like three different patches in one. any reason to not split
this into three patches? Perhaps at the least split the last item into it's
own patch.

> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
> Changes include:
>   -- Rearrange patches to co-locate CPU property-related changes.
>   -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index acquire
>      or release operations during the CPU readd process.
>   -- Correct a bug in DRC index selection for queued operation.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |  123 +++++++++++++++++++-------
>  arch/powerpc/platforms/pseries/mobility.c    |    3 +
>  2 files changed, 95 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a408217..23d4cb8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct device_node *parent, u32 drc_index)
>  				&cdata);
>  }
> 
> -static ssize_t dlpar_cpu_add(u32 drc_index)
> +static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
>  {
>  	struct device_node *dn, *parent;
>  	int rc, saved_rc;
> @@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  		return -EINVAL;
>  	}
> 
> -	rc = dlpar_acquire_drc(drc_index);
> -	if (rc) {
> -		pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
> -			rc, drc_index);
> -		of_node_put(parent);
> -		return -EINVAL;
> +	if (acquire_drc) {
> +		rc = dlpar_acquire_drc(drc_index);
> +		if (rc) {
> +			pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
> +				rc, drc_index);
> +			of_node_put(parent);
> +			return -EINVAL;
> +		}
>  	}
> 
>  	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>  	if (!dn) {
>  		pr_warn("Failed call to configure-connector, drc index: %x\n",
>  			drc_index);
> -		dlpar_release_drc(drc_index);
> +		if (acquire_drc)
> +			dlpar_release_drc(drc_index);
>  		of_node_put(parent);
>  		return -EINVAL;
>  	}
> @@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  		pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
>  			dn->name, rc, drc_index);
> 
> -		rc = dlpar_release_drc(drc_index);
> -		if (!rc)
> +		if (acquire_drc)
> +			rc = dlpar_release_drc(drc_index);
> +		if (!rc || acquire_drc)
>  			dlpar_free_cc_nodes(dn);

This seems like it would be more readable if everything were inside the
if (acquire_drc) block.

> 
>  		return saved_rc;
> @@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  			dn->name, rc, drc_index);
> 
>  		rc = dlpar_detach_node(dn);
> -		if (!rc)
> +		if (!rc && acquire_drc)
>  			dlpar_release_drc(drc_index);
> 
>  		return saved_rc;
> @@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct device_node *dn)
> 
>  }
> 
> -static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
> +static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
> +				bool release_drc)
>  {
>  	int rc;
> 
> -	pr_debug("Attempting to remove CPU %s, drc index: %x\n",
> -		 dn->name, drc_index);
> +	pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
> +		 dn->name, drc_index, release_drc);
> 
>  	rc = dlpar_offline_cpu(dn);
>  	if (rc) {
> @@ -621,12 +626,14 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>  		return -EINVAL;
>  	}
> 
> -	rc = dlpar_release_drc(drc_index);
> -	if (rc) {
> -		pr_warn("Failed to release drc (%x) for CPU %s, rc: %d\n",
> -			drc_index, dn->name, rc);
> -		dlpar_online_cpu(dn);
> -		return rc;
> +	if (release_drc) {
> +		rc = dlpar_release_drc(drc_index);
> +		if (rc) {
> +			pr_warn("Failed to release drc (%x) for CPU %s, rc: %d\n",
> +				drc_index, dn->name, rc);
> +			dlpar_online_cpu(dn);
> +			return rc;
> +		}
>  	}
> 
>  	rc = dlpar_detach_node(dn);
> @@ -635,7 +642,8 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
> 
>  		pr_warn("Failed to detach CPU %s, rc: %d", dn->name, rc);
> 
> -		rc = dlpar_acquire_drc(drc_index);
> +		if (release_drc)
> +			rc = dlpar_acquire_drc(drc_index);
>  		if (!rc)
>  			dlpar_online_cpu(dn);
> 
> @@ -664,7 +672,7 @@ static struct device_node *cpu_drc_index_to_dn(u32 drc_index)
>  	return dn;
>  }
> 
> -static int dlpar_cpu_remove_by_index(u32 drc_index)
> +static int dlpar_cpu_remove_by_index(u32 drc_index, bool release_drc)
>  {
>  	struct device_node *dn;
>  	int rc;
> @@ -676,10 +684,30 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
>  		return -ENODEV;
>  	}
> 
> -	rc = dlpar_cpu_remove(dn, drc_index);
> +	rc = dlpar_cpu_remove(dn, drc_index, release_drc);
>  	of_node_put(dn);
>  	return rc;
>  }
> + 
> +static int dlpar_cpu_readd_by_index(u32 drc_index)
> +{
> +	int rc = 0;

No need to initialize to 0 here.

> +
> +	pr_info("Attempting to re-add CPU, drc index %x\n", drc_index);
> +
> +	rc = dlpar_cpu_remove_by_index(drc_index, false);
> +	if (!rc)
> +		rc = dlpar_cpu_add(drc_index, false);
> +
> +	if (rc)
> +		pr_info("Failed to update cpu at drc_index %lx\n",
> +				(unsigned long int)drc_index);
> +	else
> +		pr_info("CPU at drc_index %lx was updated\n",
> +				(unsigned long int)drc_index);
> +
> +	return rc;
> +}
> 
>  static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
>  {
> @@ -741,7 +769,7 @@ static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
>  	}
> 
>  	for (i = 0; i < cpus_to_remove; i++) {
> -		rc = dlpar_cpu_remove_by_index(cpu_drcs[i]);
> +		rc = dlpar_cpu_remove_by_index(cpu_drcs[i], true);
>  		if (rc)
>  			break;
> 
> @@ -752,7 +780,7 @@ static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
>  		pr_warn("CPU hot-remove failed, adding back removed CPUs\n");
> 
>  		for (i = 0; i < cpus_removed; i++)
> -			dlpar_cpu_add(cpu_drcs[i]);
> +			dlpar_cpu_add(cpu_drcs[i], true);
> 
>  		rc = -EINVAL;
>  	} else {
> @@ -843,7 +871,7 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
>  	}
> 
>  	for (i = 0; i < cpus_to_add; i++) {
> -		rc = dlpar_cpu_add(cpu_drcs[i]);
> +		rc = dlpar_cpu_add(cpu_drcs[i], true);
>  		if (rc)
>  			break;
> 
> @@ -854,7 +882,7 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
>  		pr_warn("CPU hot-add failed, removing any added CPUs\n");
> 
>  		for (i = 0; i < cpus_added; i++)
> -			dlpar_cpu_remove_by_index(cpu_drcs[i]);
> +			dlpar_cpu_remove_by_index(cpu_drcs[i], true);
> 
>  		rc = -EINVAL;
>  	} else {
> @@ -880,7 +908,7 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>  		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>  			rc = dlpar_cpu_remove_by_count(count);
>  		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> -			rc = dlpar_cpu_remove_by_index(drc_index);
> +			rc = dlpar_cpu_remove_by_index(drc_index, true);
>  		else
>  			rc = -EINVAL;
>  		break;
> @@ -888,10 +916,13 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>  		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>  			rc = dlpar_cpu_add_by_count(count);
>  		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> -			rc = dlpar_cpu_add(drc_index);
> +			rc = dlpar_cpu_add(drc_index, true);
>  		else
>  			rc = -EINVAL;
>  		break;
> +	case PSERIES_HP_ELOG_ACTION_READD:
> +		rc = dlpar_cpu_readd_by_index(drc_index);
> +		break;
>  	default:
>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>  		rc = -EINVAL;
> @@ -913,7 +944,7 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
>  	if (rc)
>  		return -EINVAL;
> 
> -	rc = dlpar_cpu_add(drc_index);
> +	rc = dlpar_cpu_add(drc_index, true);
> 
>  	return rc ? rc : count;
>  }
> @@ -934,7 +965,7 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>  		return -EINVAL;
>  	}
> 
> -	rc = dlpar_cpu_remove(dn, drc_index);
> +	rc = dlpar_cpu_remove(dn, drc_index, true);
>  	of_node_put(dn);
> 
>  	return rc ? rc : count;
> @@ -942,12 +973,38 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
> 
>  #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> 
> +static int pseries_update_cpu(struct of_reconfig_data *pr)
> +{
> +	/* So far, we only handle the 'ibm,associativity' property,
> +	 * here.
> +	 */
> +	struct pseries_hp_errorlog *hp_elog;
> +
> +        hp_elog = kzalloc(sizeof(*hp_elog), GFP_KERNEL);
> +        if(!hp_elog)
> +                return -ENOMEM;

You seem to have some odd indentation here, may want to run this through checkpatch

Also, I think queue_hotplug_event() makes a copy of the hotplug event passed in. If
so you could just use a hp_elog struct on the stack and avoid the allocation.

-Nathan

> +
> +	hp_elog->resource = PSERIES_HP_ELOG_RESOURCE_CPU;
> +	hp_elog->action = PSERIES_HP_ELOG_ACTION_READD;
> +	hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
> +	hp_elog->_drc_u.drc_index = be32_to_cpu(pr->dn->phandle);
> +
> +	queue_hotplug_event(hp_elog, NULL, NULL);
> +
> +	kfree(hp_elog);
> +
> +	return 0;
> +}
> +
>  static int pseries_smp_notifier(struct notifier_block *nb,
>  				unsigned long action, void *data)
>  {
>  	struct of_reconfig_data *rd = data;
>  	int err = 0;
> 
> +	if (strcmp(rd->dn->type, "cpu"))
> +		return notifier_from_errno(err);
> +
>  	switch (action) {
>  	case OF_RECONFIG_ATTACH_NODE:
>  		err = pseries_add_processor(rd->dn);
> @@ -955,6 +1012,10 @@ static int pseries_smp_notifier(struct notifier_block *nb,
>  	case OF_RECONFIG_DETACH_NODE:
>  		pseries_remove_processor(rd->dn);
>  		break;
> +	case OF_RECONFIG_UPDATE_PROPERTY:
> +		if (!strcmp(rd->prop->name, "ibm,associativity"))
> +			err = pseries_update_cpu(rd);
> +		break;
>  	}
>  	return notifier_from_errno(err);
>  }
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 8a8033a..6d98f84 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -283,6 +283,8 @@ int pseries_devicetree_update(s32 scope)
>  	if (!rtas_buf)
>  		return -ENOMEM;
> 
> +	lock_device_hotplug();
> +
>  	do {
>  		rc = mobility_rtas_call(update_nodes_token, rtas_buf, scope);
>  		if (rc && rc != 1)
> @@ -321,6 +323,7 @@ int pseries_devicetree_update(s32 scope)
>  	} while (rc == 1);
> 
>  	kfree(rtas_buf);
> +	unlock_device_hotplug();
>  	return rc;
>  }
> 

^ permalink raw reply

* Re: [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features
From: Nathan Fontenot @ 2018-05-18 19:17 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev; +Cc: John Allen, Tyrel Datwyler, Thomas Falcon
In-Reply-To: <a4d036f9-990d-b1db-3de9-aa179c030743@linux.vnet.ibm.com>

On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/drmem: Export many of the functions of DRMEM to
> parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
> hotplug operations and for Post Migration events.
> 
> Also modify the DRMEM initialization code to allow it to,
> 
> * Be called after system initialization
> * Provide a separate user copy of the LMB array that is produces
> * Free the user copy upon request
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
> Changes in RFC:
>   -- Separate DRMEM changes into a standalone patch
>   -- Do not export excess functions.  Make exported names more explicit.
>   -- Add new iterator to work through a pair of drmem_info arrays.
>   -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
>      dt_mem_next_cell, as these are only available at first boot.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/include/asm/drmem.h |   10 +++++
>  arch/powerpc/mm/drmem.c          |   78 +++++++++++++++++++++++++++-----------
>  2 files changed, 66 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
> index ce242b9..c964b89 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -35,6 +35,13 @@ struct drmem_lmb_info {
>  		&drmem_info->lmbs[0],				\
>  		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
> 
> +#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2)	\
> +	for ((lmb1) = (&dinfo1->lmbs[0]),			\
> +	     (lmb2) = (&dinfo2->lmbs[0]);			\
> +             ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&	\
> +             ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));	\
> +	     (lmb1)++, (lmb2)++)
> +
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
>   * specified in the ibm,dynamic-memory device tree property.
> @@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>  			void (*func)(struct drmem_lmb *, const __be32 **));
>  int drmem_update_dt(void);
> 
> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
> +
>  #ifdef CONFIG_PPC_PSERIES
>  void __init walk_drmem_lmbs_early(unsigned long node,
>  			void (*func)(struct drmem_lmb *, const __be32 **));
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 3f18036..d9b281c 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -20,6 +20,7 @@
> 
>  static struct drmem_lmb_info __drmem_info;
>  struct drmem_lmb_info *drmem_info = &__drmem_info;
> +static int n_root_addr_cells;
> 
>  u64 drmem_lmb_memory_max(void)
>  {
> @@ -193,12 +194,13 @@ int drmem_update_dt(void)
>  	return rc;
>  }
> 
> -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
> +static void read_drconf_v1_cell(struct drmem_lmb *lmb,
>  				       const __be32 **prop)
>  {
>  	const __be32 *p = *prop;
> 
> -	lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> +	lmb->base_addr = of_read_number(p, n_root_addr_cells);
> +	p += n_root_addr_cells;
>  	lmb->drc_index = of_read_number(p++, 1);
> 
>  	p++; /* skip reserved field */
> @@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
>  	*prop = p;
>  }
> 
> -static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
> +static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *data,
>  			void (*func)(struct drmem_lmb *, const __be32 **))
>  {
>  	struct drmem_lmb lmb;
> @@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
> 
>  	for (i = 0; i < n_lmbs; i++) {
>  		read_drconf_v1_cell(&lmb, &prop);
> -		func(&lmb, &usm);
> +		func(&lmb, &data);

Is there a need to change the variable name from usm to data (bot here and below)?

>  	}
>  }
> 
> -static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
> +static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>  				       const __be32 **prop)
>  {
>  	const __be32 *p = *prop;
> 
>  	dr_cell->seq_lmbs = of_read_number(p++, 1);
> -	dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> +	dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
> +	p += n_root_addr_cells;
>  	dr_cell->drc_index = of_read_number(p++, 1);
>  	dr_cell->aa_index = of_read_number(p++, 1);
>  	dr_cell->flags = of_read_number(p++, 1);
> @@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>  	*prop = p;
>  }
> 
> -static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
> +static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *data,
>  			void (*func)(struct drmem_lmb *, const __be32 **))
>  {
>  	struct of_drconf_cell_v2 dr_cell;
> @@ -263,7 +266,7 @@ static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
>  			lmb.aa_index = dr_cell.aa_index;
>  			lmb.flags = dr_cell.flags;
> 
> -			func(&lmb, &usm);
> +			func(&lmb, &data);
>  		}
>  	}
>  }
> @@ -275,6 +278,9 @@ void __init walk_drmem_lmbs_early(unsigned long node,
>  	const __be32 *prop, *usm;
>  	int len;
> 
> +	if (n_root_addr_cells == 0)
> +		n_root_addr_cells = dt_root_addr_cells;
> +
>  	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
>  	if (!prop || len < dt_root_size_cells * sizeof(__be32))
>  		return;
> @@ -353,24 +359,26 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>  	}
>  }
> 
> -static void __init init_drmem_v1_lmbs(const __be32 *prop)
> +static void init_drmem_v1_lmbs(const __be32 *prop,
> +			struct drmem_lmb_info *dinfo)

Please make sure you line up the function args, here and other places below.

>  {
>  	struct drmem_lmb *lmb;
> 
> -	drmem_info->n_lmbs = of_read_number(prop++, 1);
> -	if (drmem_info->n_lmbs == 0)
> +	dinfo->n_lmbs = of_read_number(prop++, 1);
> +	if (dinfo->n_lmbs == 0)
>  		return;
> 
> -	drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
> +	dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
>  				   GFP_KERNEL);> -	if (!drmem_info->lmbs)
> +	if (!dinfo->lmbs)
>  		return;
> 
>  	for_each_drmem_lmb(lmb)
>  		read_drconf_v1_cell(lmb, &prop);
>  }
> 
> -static void __init init_drmem_v2_lmbs(const __be32 *prop)
> +static void init_drmem_v2_lmbs(const __be32 *prop,
> +			struct drmem_lmb_info *dinfo)
>  {
>  	struct drmem_lmb *lmb;
>  	struct of_drconf_cell_v2 dr_cell;
> @@ -386,12 +394,12 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
>  	p = prop;
>  	for (i = 0; i < lmb_sets; i++) {
>  		read_drconf_v2_cell(&dr_cell, &p);
> -		drmem_info->n_lmbs += dr_cell.seq_lmbs;
> +		dinfo->n_lmbs += dr_cell.seq_lmbs;
>  	}
> 
> -	drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
> +	dinfo->lmbs = kcalloc(dinfo->n_lmbs, sizeof(*lmb),
>  				   GFP_KERNEL);
> -	if (!drmem_info->lmbs)
> +	if (!dinfo->lmbs)
>  		return;
> 
>  	/* second pass, read in the LMB information */
> @@ -402,25 +410,51 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
>  		read_drconf_v2_cell(&dr_cell, &p);
> 
>  		for (j = 0; j < dr_cell.seq_lmbs; j++) {
> -			lmb = &drmem_info->lmbs[lmb_index++];
> +			lmb = &dinfo->lmbs[lmb_index++];
> 
>  			lmb->base_addr = dr_cell.base_addr;
> -			dr_cell.base_addr += drmem_info->lmb_size;
> +			dr_cell.base_addr += dinfo->lmb_size;
> 
>  			lmb->drc_index = dr_cell.drc_index;
>  			dr_cell.drc_index++;
> 
>  			lmb->aa_index = dr_cell.aa_index;
> -			lmb->flags = dr_cell.flags;

Why are you removing the setting of the flags field?

>  		}
>  	}
>  }
> 
> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo)
> +{
> +	if (dinfo) {
> +		kfree(dinfo->lmbs);
> +		kfree(dinfo);
> +	}
> +}
> +
> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop)

Small nit, but this should probably be named drmem_lmbs_init. This follows
more closely to the naming used elsewhere.

-Nathan

> +{
> +	struct drmem_lmb_info *dinfo;
> +
> +	dinfo = kzalloc(sizeof(*dinfo), GFP_KERNEL);
> +	if (!dinfo)
> +		return NULL;
> +
> +	if (!strcmp("ibm,dynamic-memory", prop->name))
> +		init_drmem_v1_lmbs(prop->value, dinfo);
> +	else if (!strcmp("ibm,dynamic-memory-v2", prop->name))
> +		init_drmem_v2_lmbs(prop->value, dinfo);
> +
> +	return dinfo;
> +}
> +
>  static int __init drmem_init(void)
>  {
>  	struct device_node *dn;
>  	const __be32 *prop;
> 
> +	if (n_root_addr_cells == 0)
> +		n_root_addr_cells = dt_root_addr_cells;
> +
>  	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>  	if (!dn) {
>  		pr_info("No dynamic reconfiguration memory found\n");
> @@ -434,11 +468,11 @@ static int __init drmem_init(void)
> 
>  	prop = of_get_property(dn, "ibm,dynamic-memory", NULL);
>  	if (prop) {
> -		init_drmem_v1_lmbs(prop);
> +		init_drmem_v1_lmbs(prop, drmem_info);
>  	} else {
>  		prop = of_get_property(dn, "ibm,dynamic-memory-v2", NULL);
>  		if (prop)
> -			init_drmem_v2_lmbs(prop);
> +			init_drmem_v2_lmbs(prop, drmem_info);
>  	}
> 
>  	of_node_put(dn);
> 

^ permalink raw reply

* Re: [PATCH] crypto: reorder paes test lexicographically
From: Herbert Xu @ 2018-05-18 18:21 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: Abdul Haleem, David S. Miller, Ofir Drang, linux-next,
	Stephen Rothwell, sachinp, linuxppc-dev, linux-crypto,
	linux-kernel
In-Reply-To: <1526025847-13134-1-git-send-email-gilad@benyossef.com>

On Fri, May 11, 2018 at 09:04:06AM +0100, Gilad Ben-Yossef wrote:
> Due to a snafu "paes" testmgr tests were not ordered
> lexicographically, which led to boot time warnings.
> Reorder the tests as needed.
> 
> Fixes: a794d8d ("crypto: ccree - enable support for hardware keys")
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: nx: fix spelling mistake: "seqeunce" -> "sequence"
From: Herbert Xu @ 2018-05-18 18:20 UTC (permalink / raw)
  To: Colin King
  Cc: Haren Myneni, David S . Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, linux-crypto, linuxppc-dev,
	kernel-janitors, linux-kernel
In-Reply-To: <20180509091636.13589-1-colin.king@canonical.com>

On Wed, May 09, 2018 at 10:16:36AM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in CSB_ERR error message text
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 07/14] powerpc: Add support for restartable sequences
From: Mathieu Desnoyers @ 2018-05-18 18:17 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, Paul E. McKenney, Andy Lutomirski,
	Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
	Russell King, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
	Josh Triplett, Linus Torvalds, Catalin Marinas, Michael Kerrisk,
	Joel Fernandes, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linuxppc-dev
In-Reply-To: <1526601043.1338308.1376191416.0444B8C5@webmail.messagingengine.com>

----- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.feng@gmail.com wrote:
[...]
>> > I think you're right. So we have to introduce callsite to rseq_syscall()
>> > in syscall path, something like:
>> > 
>> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> > index 51695608c68b..a25734a96640 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -222,6 +222,9 @@ system_call_exit:
>> > 	mtmsrd	r11,1
>> > #endif /* CONFIG_PPC_BOOK3E */
>> > 
>> > +	addi    r3,r1,STACK_FRAME_OVERHEAD
>> > +	bl	rseq_syscall
>> > +
>> > 	ld	r9,TI_FLAGS(r12)
>> > 	li	r11,-MAX_ERRNO
>> > 	andi.
>> > 		r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> > 

By the way, I think this is not the right spot to call rseq_syscall, because
interrupts are disabled. I think we should move this hunk right after system_call_exit.

Would you like to implement and test an updated patch adding those calls for ppc 32 and 64 ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: pkeys on POWER: Default AMR, UAMOR values
From: Ram Pai @ 2018-05-18 17:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linuxppc-dev, linux-mm, Dave Hansen, Andy Lutomirski
In-Reply-To: <36b98132-d87f-9f75-f1a9-feee36ec8ee6@redhat.com>

On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> I'm working on adding POWER pkeys support to glibc.  The coding work
> is done, but I'm faced with some test suite failures.
> 
> Unlike the default x86 configuration, on POWER, existing threads
> have full access to newly allocated keys.
> 
> Or, more precisely, in this scenario:
> 
> * Thread A launches thread B
> * Thread B waits
> * Thread A allocations a protection key with pkey_alloc
> * Thread A applies the key to a page
> * Thread A signals thread B
> * Thread B starts to run and accesses the page
> 
> Then at the end, the access will be granted.
> 
> I hope it's not too late to change this to denied access.
> 
> Furthermore, I think the UAMOR value is wrong as well because it
> prevents thread B at the end to set the AMR register.  In
> particular, if I do this
> 
> * … (as before)
> * Thread A signals thread B
> * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> * Thread B reads the current access rights for the key
> 
> then it still gets 0 (all access permitted) because the original
> UAMOR value inherited from thread A prior to the key allocation
> masks out the access right update for the newly allocated key.

Florian, is the behavior on x86 any different? A key allocated in the
context off one thread is not meaningful in the context of any other
thread. 

Since thread B was created prior to the creation of the key, and the key
was created in the context of thread A, thread B neither inherits the
key nor its permissions. Atleast that is how the semantics are supposed
to work as per the man page.

man 7 pkey 

" Applications  using  threads  and  protection  keys  should
be especially careful.  Threads inherit the protection key rights of the
parent at the time of the clone(2), system call.  Applications should
either ensure that their own permissions are appropriate for child
threads at the time when clone(2) is  called,  or ensure that each child
thread can perform its own initialization of protection key rights."


RP

^ permalink raw reply

* Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Sandipan Das @ 2018-05-18 16:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev, naveen.n.rao, linuxppc-dev, jakub.kicinski
In-Reply-To: <242592f3-c2b4-04bb-7a6c-2394ae4fee98@iogearbox.net>


On 05/18/2018 08:45 PM, Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> The imm field of a bpf instruction is a signed 32-bit integer.
>> For JIT bpf-to-bpf function calls, it stores the offset of the
>> start address of the callee's JITed image from __bpf_call_base.
>>
>> For some architectures, such as powerpc64, this offset may be
>> as large as 64 bits and cannot be accomodated in the imm field
>> without truncation.
>>
>> We resolve this by:
>>
>> [1] Additionally using the auxillary data of each function to
>>     keep a list of start addresses of the JITed images for all
>>     functions determined by the verifier.
>>
>> [2] Retaining the subprog id inside the off field of the call
>>     instructions and using it to index into the list mentioned
>>     above and lookup the callee's address.
>>
>> To make sure that the existing JIT compilers continue to work
>> without requiring changes, we keep the imm field as it is.
>>
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a9e4b1372da6..6c56cce9c4e3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>  			    insn->src_reg != BPF_PSEUDO_CALL)
>>  				continue;
>>  			subprog = insn->off;
>> -			insn->off = 0;
>>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>  				func[subprog]->bpf_func -
>>  				__bpf_call_base;
>>  		}
>> +
>> +		/* we use the aux data to keep a list of the start addresses
>> +		 * of the JITed images for each function in the program
>> +		 *
>> +		 * for some architectures, such as powerpc64, the imm field
>> +		 * might not be large enough to hold the offset of the start
>> +		 * address of the callee's JITed image from __bpf_call_base
>> +		 *
>> +		 * in such cases, we can lookup the start address of a callee
>> +		 * by using its subprog id, available from the off field of
>> +		 * the call instruction, as an index for this list
>> +		 */
>> +		func[i]->aux->func = func;
>> +		func[i]->aux->func_cnt = env->subprog_cnt + 1;
> 
> The target tree you have here is infact bpf, since in bpf-next there was a
> cleanup where the + 1 is removed. Just for the record that we need to keep
> this in mind for bpf into bpf-next merge since this would otherwise subtly
> break.
> 

Sorry about the wrong tag. This series is indeed based off bpf-next.

- Sandipan

>>  	}
>>  	for (i = 0; i < env->subprog_cnt; i++) {
>>  		old_bpf_func = func[i]->bpf_func;
>>
> 
> 

^ permalink raw reply

* Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
From: Alexei Starovoitov @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development,
	linuxppc-dev, Naveen N. Rao, Michael Ellerman, Jakub Kicinski

On Fri, May 18, 2018 at 5:50 AM, Sandipan Das
<sandipan@linux.vnet.ibm.com> wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>         __u32 xlated_prog_len;
>         __aligned_u64 jited_prog_insns;
>         __aligned_u64 xlated_prog_insns;
> +       __aligned_u64 jited_ksyms;
> +       __u32 nr_jited_ksyms;
>         __u64 load_time;        /* ns since boottime */
>         __u32 created_by_uid;
>         __u32 nr_map_ids;

this breaks uapi.
New fields can only be added to the end.

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Naveen N. Rao, ast, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <1526658936.9wk22hv49g.naveen@linux.ibm.com>

On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler. The JIT compiler converts the bpf call instructions
>>> to native branch instructions. After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known. Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits. So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address. Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxillary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>                    struct codegen_context *ctx,
>>> -                  u32 *addrs)
>>> +                  u32 *addrs, bool extra_pass)
>>>  {
>>>      const struct bpf_insn *insn = fp->insnsi;
>>>      int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>              break;
>>>  
>>>          /*
>>> -         * Call kernel helper
>>> +         * Call kernel helper or bpf function
>>>           */
>>>          case BPF_JMP | BPF_CALL:
>>>              ctx->seen |= SEEN_FUNC;
>>> -            func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +            /* bpf function call */
>>> +            if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some dummy
>> address as otherwise the 'kernel helper call' branch used for this is a bit
>> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
>> optimizes the immediate addr, I presume the JIT can handle situations where
>> in the final extra_pass the image needs to grow/shrink again (due to different
>> final address for the call)?
> 
> That's a good catch. We don't handle that -- we expect to get the size right on first pass. We could probably have PPC_FUNC_ADDR() pad the result with nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order to always
emit 4 insns for it.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Naveen N. Rao @ 2018-05-18 16:05 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <c3b23da2-0781-a1ee-0d49-71e9efb52e66@iogearbox.net>

Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> This adds support for bpf-to-bpf function calls in the powerpc64
>> JIT compiler. The JIT compiler converts the bpf call instructions
>> to native branch instructions. After a round of the usual passes,
>> the start addresses of the JITed images for the callee functions
>> are known. Finally, to fixup the branch target addresses, we need
>> to perform an extra pass.
>>=20
>> Because of the address range in which JITed images are allocated
>> on powerpc64, the offsets of the start addresses of these images
>> from __bpf_call_base are as large as 64 bits. So, for a function
>> call, we cannot use the imm field of the instruction to determine
>> the callee's address. Instead, we use the alternative method of
>> getting it from the list of function addresses in the auxillary
>> data of the caller by using the off field as an index.
>>=20
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++=
++-----
>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>=20
>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_ji=
t_comp64.c
>> index 1bdb1aff0619..25939892d8f7 100644
>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struc=
t codegen_context *ctx, u32
>>  /* Assemble the body code between the prologue & epilogue */
>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  			      struct codegen_context *ctx,
>> -			      u32 *addrs)
>> +			      u32 *addrs, bool extra_pass)
>>  {
>>  	const struct bpf_insn *insn =3D fp->insnsi;
>>  	int flen =3D fp->len;
>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp,=
 u32 *image,
>>  			break;
>> =20
>>  		/*
>> -		 * Call kernel helper
>> +		 * Call kernel helper or bpf function
>>  		 */
>>  		case BPF_JMP | BPF_CALL:
>>  			ctx->seen |=3D SEEN_FUNC;
>> -			func =3D (u8 *) __bpf_call_base + imm;
>> +
>> +			/* bpf function call */
>> +			if (insn[i].src_reg =3D=3D BPF_PSEUDO_CALL && extra_pass)
>=20
> Perhaps it might make sense here for !extra_pass to set func to some dumm=
y
> address as otherwise the 'kernel helper call' branch used for this is a b=
it
> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
> optimizes the immediate addr, I presume the JIT can handle situations whe=
re
> in the final extra_pass the image needs to grow/shrink again (due to diff=
erent
> final address for the call)?

That's a good catch. We don't handle that -- we expect to get the size=20
right on first pass. We could probably have PPC_FUNC_ADDR() pad the=20
result with nops to make it a constant 5-instruction sequence.

>=20
>> +				if (fp->aux->func && off < fp->aux->func_cnt)
>> +					/* use the subprog id from the off
>> +					 * field to lookup the callee address
>> +					 */
>> +					func =3D (u8 *) fp->aux->func[off]->bpf_func;
>> +				else
>> +					return -EINVAL;
>> +			/* kernel helper call */
>> +			else
>> +				func =3D (u8 *) __bpf_call_base + imm;
>> =20
>>  			bpf_jit_emit_func_call(image, ctx, (u64)func);
>> =20
>> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, =
u32 *image,
>>  	return 0;
>>  }
>> =20
>> +struct powerpc64_jit_data {
>> +	struct bpf_binary_header *header;
>> +	u32 *addrs;
>> +	u8 *image;
>> +	u32 proglen;
>> +	struct codegen_context ctx;
>> +};
>> +
>>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  {
>>  	u32 proglen;
>> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog=
 *fp)
>>  	u8 *image =3D NULL;
>>  	u32 *code_base;
>>  	u32 *addrs;
>> +	struct powerpc64_jit_data *jit_data;
>>  	struct codegen_context cgctx;
>>  	int pass;
>>  	int flen;
>> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog=
 *fp)
>>  	struct bpf_prog *org_fp =3D fp;
>>  	struct bpf_prog *tmp_fp;
>>  	bool bpf_blinded =3D false;
>> +	bool extra_pass =3D false;
>> =20
>>  	if (!fp->jit_requested)
>>  		return org_fp;
>> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_pro=
g *fp)
>>  		fp =3D tmp_fp;
>>  	}
>> =20
>> +	jit_data =3D fp->aux->jit_data;
>> +	if (!jit_data) {
>> +		jit_data =3D kzalloc(sizeof(*jit_data), GFP_KERNEL);
>> +		if (!jit_data) {
>> +			fp =3D org_fp;
>> +			goto out;
>> +		}
>> +		fp->aux->jit_data =3D jit_data;
>> +	}
>> +
>>  	flen =3D fp->len;
>> +	addrs =3D jit_data->addrs;
>> +	if (addrs) {
>> +		cgctx =3D jit_data->ctx;
>> +		image =3D jit_data->image;
>> +		bpf_hdr =3D jit_data->header;
>> +		proglen =3D jit_data->proglen;
>> +		alloclen =3D proglen + FUNCTION_DESCR_SIZE;
>> +		extra_pass =3D true;
>> +		goto skip_init_ctx;
>> +	}
>> +
>>  	addrs =3D kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
>>  	if (addrs =3D=3D NULL) {
>>  		fp =3D org_fp;
>=20
> In this case of !addrs, we leak the just allocated jit_data here!
>=20
>> @@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_pr=
og *fp)
>>  	cgctx.stack_size =3D round_up(fp->aux->stack_depth, 16);
>> =20
>>  	/* Scouting faux-generate pass 0 */
>> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
>> +	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>>  		/* We hit something illegal or unsupported. */
>>  		fp =3D org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>> =20
>>  	/*
>> @@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_pro=
g *fp)
>>  			bpf_jit_fill_ill_insns);
>>  	if (!bpf_hdr) {
>>  		fp =3D org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>> =20
>> +skip_init_ctx:
>>  	code_base =3D (u32 *)(image + FUNCTION_DESCR_SIZE);
>> =20
>>  	/* Code generation passes 1-2 */
>> @@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog=
 *fp)
>>  		/* Now build the prologue, body code & epilogue for real. */
>>  		cgctx.idx =3D 0;
>>  		bpf_jit_build_prologue(code_base, &cgctx);
>> -		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
>> +		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
>>  		bpf_jit_build_epilogue(code_base, &cgctx);
>> =20
>>  		if (bpf_jit_enable > 1)
>> @@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_p=
rog *fp)
>>  	((u64 *)image)[1] =3D local_paca->kernel_toc;
>>  #endif
>> =20
>> +	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE)=
);
>> +
>> +	if (!fp->is_func || extra_pass) {
>> +		bpf_jit_binary_lock_ro(bpf_hdr);
>=20
> powerpc doesn't implement set_memory_ro(). Generally this is not a proble=
m since
> set_memory_ro() defaults to 'return 0' in this case, but since the bpf_ji=
t_free()
> destructor is overridden here, there's no bpf_jit_binary_unlock_ro() and =
in case
> powerpc would get set_memory_*() support one day this will then crash in =
random
> places once the mem gets back to the allocator, thus hard to debug. Two o=
ptions:
> either you remove the bpf_jit_free() override or you remove the bpf_jit_b=
inary_lock_ro().

Yeah, we shouldn't be using the lock here.

Thanks,
Naveen

=

^ permalink raw reply

* Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
From: Daniel Borkmann @ 2018-05-18 15:51 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-7-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T11:43:38+0530  uid 0
>           xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T12:13:01+0530  uid 0
>           xlated 216B  jited 308B  memlock 65536B
>   ...

That's really nice! One comment inline below:

>   # bpftool prog dump jited id 1
>      0:   nop
>      4:   nop
>      8:   mflr    r0
>      c:   std     r0,16(r1)
>     10:   stdu    r1,-112(r1)
>     14:   std     r31,104(r1)
>     18:   addi    r31,r1,48
>     1c:   li      r3,10
>   ...
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/syscall.c | 38 ++++++++++++++++++++++++++++++++------
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 54a72fafe57c..2430d159078c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	struct bpf_prog_info info = {};
>  	u32 info_len = attr->info.info_len;
>  	char __user *uinsns;
> -	u32 ulen;
> +	u32 ulen, i;
>  	int err;
>  
>  	err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	ulen = min_t(u32, info.nr_map_ids, ulen);
>  	if (ulen) {
>  		u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
> -		u32 i;
>  
>  		for (i = 0; i < ulen; i++)
>  			if (put_user(prog->aux->used_maps[i]->id,
> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	 * for offload.
>  	 */
>  	ulen = info.jited_prog_len;
> -	info.jited_prog_len = prog->jited_len;
> +	if (prog->aux->func_cnt) {
> +		info.jited_prog_len = 0;
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			info.jited_prog_len += prog->aux->func[i]->jited_len;
> +	} else {
> +		info.jited_prog_len = prog->jited_len;
> +	}
> +
>  	if (info.jited_prog_len && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			uinsns = u64_to_user_ptr(info.jited_prog_insns);
>  			ulen = min_t(u32, info.jited_prog_len, ulen);
> -			if (copy_to_user(uinsns, prog->bpf_func, ulen))
> -				return -EFAULT;
> +
> +			/* for multi-function programs, copy the JITed
> +			 * instructions for all the functions
> +			 */
> +			if (prog->aux->func_cnt) {
> +				u32 len, free;
> +				u8 *img;
> +
> +				free = ulen;
> +				for (i = 0; i < prog->aux->func_cnt; i++) {
> +					len = prog->aux->func[i]->jited_len;
> +					img = (u8 *) prog->aux->func[i]->bpf_func;
> +					if (len > free)
> +						break;
> +					if (copy_to_user(uinsns, img, len))
> +						return -EFAULT;
> +					uinsns += len;
> +					free -= len;

Is there any way we can introduce a delimiter between the different
images such that they could be more easily correlated with the call
from the main (or other sub-)program instead of having one contiguous
dump blob?

> +				}
> +			} else {
> +				if (copy_to_user(uinsns, prog->bpf_func, ulen))
> +					return -EFAULT;
> +			}
>  		} else {
>  			info.jited_prog_insns = 0;
>  		}
> @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (info.nr_jited_ksyms && ulen) {
>  		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
>  		ulong ksym_addr;
> -		u32 i;
>  
>  		/* copy the address of the kernel symbol corresponding to
>  		 * each function
> 

^ permalink raw reply

* Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
From: Daniel Borkmann @ 2018-05-18 15:43 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-4-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds new two new fields to struct bpf_prog_info. For
> multi-function programs, these fields can be used to pass
> a list of kernel symbol addresses for all functions in a
> given program and to userspace using the bpf system call
> with the BPF_OBJ_GET_INFO_BY_FD command.
> 
> When bpf_jit_kallsyms is enabled, we can get the address
> of the corresponding kernel symbol for a callee function
> and resolve the symbol's name. The address is determined
> by adding the value of the call instruction's imm field
> to __bpf_call_base. This offset gets assigned to the imm
> field by the verifier.
> 
> For some architectures, such as powerpc64, the imm field
> is not large enough to hold this offset.
> 
> We resolve this by:
> 
> [1] Assigning the subprog id to the imm field of a call
>     instruction in the verifier instead of the offset of
>     the callee's symbol's address from __bpf_call_base.
> 
> [2] Determining the address of a callee's corresponding
>     symbol by using the imm field as an index for the
>     list of kernel symbol addresses now available from
>     the program info.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/syscall.c     | 20 ++++++++++++++++++++
>  kernel/bpf/verifier.c    |  7 +------
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>  	__u32 xlated_prog_len;
>  	__aligned_u64 jited_prog_insns;
>  	__aligned_u64 xlated_prog_insns;
> +	__aligned_u64 jited_ksyms;
> +	__u32 nr_jited_ksyms;
>  	__u64 load_time;	/* ns since boottime */
>  	__u32 created_by_uid;
>  	__u32 nr_map_ids;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bfcde949c7f8..54a72fafe57c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (!capable(CAP_SYS_ADMIN)) {
>  		info.jited_prog_len = 0;
>  		info.xlated_prog_len = 0;
> +		info.nr_jited_ksyms = 0;
>  		goto done;
>  	}
>  
> @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		}
>  	}
>  
> +	ulen = info.nr_jited_ksyms;
> +	info.nr_jited_ksyms = prog->aux->func_cnt;
> +	if (info.nr_jited_ksyms && ulen) {

Since this exposes addresses (though masked one which is correct), this
definitely needs to be guarded with bpf_dump_raw_ok() like we do in other
places here (see JIT dump for example).

> +		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
> +		ulong ksym_addr;
> +		u32 i;
> +
> +		/* copy the address of the kernel symbol corresponding to
> +		 * each function
> +		 */
> +		ulen = min_t(u32, info.nr_jited_ksyms, ulen);
> +		for (i = 0; i < ulen; i++) {
> +			ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> +			ksym_addr &= PAGE_MASK;
> +			if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
> +				return -EFAULT;
> +		}
> +	}
> +
>  done:
>  	if (copy_to_user(uinfo, &info, info_len) ||
>  	    put_user(info_len, &uattr->info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6c56cce9c4e3..e826c396aba2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  	 * later look the same as if they were interpreted only.
>  	 */
>  	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
> -		unsigned long addr;
> -
>  		if (insn->code != (BPF_JMP | BPF_CALL) ||
>  		    insn->src_reg != BPF_PSEUDO_CALL)
>  			continue;
>  		insn->off = env->insn_aux_data[i].call_imm;
>  		subprog = find_subprog(env, i + insn->off + 1);
> -		addr  = (unsigned long)func[subprog]->bpf_func;

Hmm, in current bpf tree this says 'subprog + 1' here, so this is not
rebased against bpf tree but bpf-next (unlike what subject says)?

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351

> -		addr &= PAGE_MASK;
> -		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
> -			    addr - __bpf_call_base;
> +		insn->imm = subprog;
>  	}
>  
>  	prog->jited = 1;
> 

^ permalink raw reply

* Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Daniel Borkmann @ 2018-05-18 15:15 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-2-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> The imm field of a bpf instruction is a signed 32-bit integer.
> For JIT bpf-to-bpf function calls, it stores the offset of the
> start address of the callee's JITed image from __bpf_call_base.
> 
> For some architectures, such as powerpc64, this offset may be
> as large as 64 bits and cannot be accomodated in the imm field
> without truncation.
> 
> We resolve this by:
> 
> [1] Additionally using the auxillary data of each function to
>     keep a list of start addresses of the JITed images for all
>     functions determined by the verifier.
> 
> [2] Retaining the subprog id inside the off field of the call
>     instructions and using it to index into the list mentioned
>     above and lookup the callee's address.
> 
> To make sure that the existing JIT compilers continue to work
> without requiring changes, we keep the imm field as it is.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a9e4b1372da6..6c56cce9c4e3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  			    insn->src_reg != BPF_PSEUDO_CALL)
>  				continue;
>  			subprog = insn->off;
> -			insn->off = 0;
>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>  				func[subprog]->bpf_func -
>  				__bpf_call_base;
>  		}
> +
> +		/* we use the aux data to keep a list of the start addresses
> +		 * of the JITed images for each function in the program
> +		 *
> +		 * for some architectures, such as powerpc64, the imm field
> +		 * might not be large enough to hold the offset of the start
> +		 * address of the callee's JITed image from __bpf_call_base
> +		 *
> +		 * in such cases, we can lookup the start address of a callee
> +		 * by using its subprog id, available from the off field of
> +		 * the call instruction, as an index for this list
> +		 */
> +		func[i]->aux->func = func;
> +		func[i]->aux->func_cnt = env->subprog_cnt + 1;

The target tree you have here is infact bpf, since in bpf-next there was a
cleanup where the + 1 is removed. Just for the record that we need to keep
this in mind for bpf into bpf-next merge since this would otherwise subtly
break.

>  	}
>  	for (i = 0; i < env->subprog_cnt; i++) {
>  		old_bpf_func = func[i]->bpf_func;
> 

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2018-05-18 15:30 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-3-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds support for bpf-to-bpf function calls in the powerpc64
> JIT compiler. The JIT compiler converts the bpf call instructions
> to native branch instructions. After a round of the usual passes,
> the start addresses of the JITed images for the callee functions
> are known. Finally, to fixup the branch target addresses, we need
> to perform an extra pass.
> 
> Because of the address range in which JITed images are allocated
> on powerpc64, the offsets of the start addresses of these images
> from __bpf_call_base are as large as 64 bits. So, for a function
> call, we cannot use the imm field of the instruction to determine
> the callee's address. Instead, we use the alternative method of
> getting it from the list of function addresses in the auxillary
> data of the caller by using the off field as an index.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 69 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 1bdb1aff0619..25939892d8f7 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>  /* Assemble the body code between the prologue & epilogue */
>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  			      struct codegen_context *ctx,
> -			      u32 *addrs)
> +			      u32 *addrs, bool extra_pass)
>  {
>  	const struct bpf_insn *insn = fp->insnsi;
>  	int flen = fp->len;
> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  			break;
>  
>  		/*
> -		 * Call kernel helper
> +		 * Call kernel helper or bpf function
>  		 */
>  		case BPF_JMP | BPF_CALL:
>  			ctx->seen |= SEEN_FUNC;
> -			func = (u8 *) __bpf_call_base + imm;
> +
> +			/* bpf function call */
> +			if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)

Perhaps it might make sense here for !extra_pass to set func to some dummy
address as otherwise the 'kernel helper call' branch used for this is a bit
misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
optimizes the immediate addr, I presume the JIT can handle situations where
in the final extra_pass the image needs to grow/shrink again (due to different
final address for the call)?

> +				if (fp->aux->func && off < fp->aux->func_cnt)
> +					/* use the subprog id from the off
> +					 * field to lookup the callee address
> +					 */
> +					func = (u8 *) fp->aux->func[off]->bpf_func;
> +				else
> +					return -EINVAL;
> +			/* kernel helper call */
> +			else
> +				func = (u8 *) __bpf_call_base + imm;
>  
>  			bpf_jit_emit_func_call(image, ctx, (u64)func);
>  
> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  	return 0;
>  }
>  
> +struct powerpc64_jit_data {
> +	struct bpf_binary_header *header;
> +	u32 *addrs;
> +	u8 *image;
> +	u32 proglen;
> +	struct codegen_context ctx;
> +};
> +
>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  {
>  	u32 proglen;
> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	u8 *image = NULL;
>  	u32 *code_base;
>  	u32 *addrs;
> +	struct powerpc64_jit_data *jit_data;
>  	struct codegen_context cgctx;
>  	int pass;
>  	int flen;
> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	struct bpf_prog *org_fp = fp;
>  	struct bpf_prog *tmp_fp;
>  	bool bpf_blinded = false;
> +	bool extra_pass = false;
>  
>  	if (!fp->jit_requested)
>  		return org_fp;
> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  		fp = tmp_fp;
>  	}
>  
> +	jit_data = fp->aux->jit_data;
> +	if (!jit_data) {
> +		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
> +		if (!jit_data) {
> +			fp = org_fp;
> +			goto out;
> +		}
> +		fp->aux->jit_data = jit_data;
> +	}
> +
>  	flen = fp->len;
> +	addrs = jit_data->addrs;
> +	if (addrs) {
> +		cgctx = jit_data->ctx;
> +		image = jit_data->image;
> +		bpf_hdr = jit_data->header;
> +		proglen = jit_data->proglen;
> +		alloclen = proglen + FUNCTION_DESCR_SIZE;
> +		extra_pass = true;
> +		goto skip_init_ctx;
> +	}
> +
>  	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
>  	if (addrs == NULL) {
>  		fp = org_fp;

In this case of !addrs, we leak the just allocated jit_data here!

> @@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
>  
>  	/* Scouting faux-generate pass 0 */
> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
> +	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>  		/* We hit something illegal or unsupported. */
>  		fp = org_fp;
> -		goto out;
> +		goto out_addrs;
>  	}
>  
>  	/*
> @@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  			bpf_jit_fill_ill_insns);
>  	if (!bpf_hdr) {
>  		fp = org_fp;
> -		goto out;
> +		goto out_addrs;
>  	}
>  
> +skip_init_ctx:
>  	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
>  
>  	/* Code generation passes 1-2 */
> @@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  		/* Now build the prologue, body code & epilogue for real. */
>  		cgctx.idx = 0;
>  		bpf_jit_build_prologue(code_base, &cgctx);
> -		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
> +		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
>  		bpf_jit_build_epilogue(code_base, &cgctx);
>  
>  		if (bpf_jit_enable > 1)
> @@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	((u64 *)image)[1] = local_paca->kernel_toc;
>  #endif
>  
> +	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
> +
> +	if (!fp->is_func || extra_pass) {
> +		bpf_jit_binary_lock_ro(bpf_hdr);

powerpc doesn't implement set_memory_ro(). Generally this is not a problem since
set_memory_ro() defaults to 'return 0' in this case, but since the bpf_jit_free()
destructor is overridden here, there's no bpf_jit_binary_unlock_ro() and in case
powerpc would get set_memory_*() support one day this will then crash in random
places once the mem gets back to the allocator, thus hard to debug. Two options:
either you remove the bpf_jit_free() override or you remove the bpf_jit_binary_lock_ro().

> +	} else {
> +		jit_data->addrs = addrs;
> +		jit_data->ctx = cgctx;
> +		jit_data->proglen = proglen;
> +		jit_data->image = image;
> +		jit_data->header = bpf_hdr;
> +	}
> +
>  	fp->bpf_func = (void *)image;
>  	fp->jited = 1;
>  	fp->jited_len = alloclen;
>  
> -	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
> +	if (!fp->is_func || extra_pass) {
> +out_addrs:
> +		kfree(addrs);
> +		kfree(jit_data);
> +		fp->aux->jit_data = NULL;
> +	}
>  
>  out:
> -	kfree(addrs);
> -
>  	if (bpf_blinded)
>  		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
>  
> 

^ permalink raw reply

* Re: [PATCH 0/7] Miscellaneous patches and bug fixes
From: Martin K. Petersen @ 2018-05-18 15:23 UTC (permalink / raw)
  To: Uma Krishnan
  Cc: linux-scsi, James Bottomley, Martin K. Petersen, Matthew R. Ochs,
	Manoj N. Kumar, linuxppc-dev, Andrew Donnellan, Frederic Barrat,
	Christophe Lombard
In-Reply-To: <1526065440-38806-1-git-send-email-ukrishn@linux.vnet.ibm.com>


Uma,

> This patch series adds few improvements to the cxlflash driver and it
> also contains couple of bug fixes.

Applied to 4.18/scsi-queue, thank you!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes
From: Segher Boessenkool @ 2018-05-18 15:20 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, linux-kernel
In-Reply-To: <7a2c3de9-4223-ec47-b3c0-1336c9cdbeee@c-s.fr>

On Fri, May 18, 2018 at 12:35:48PM +0200, Christophe Leroy wrote:
> On 05/17/2018 03:55 PM, Segher Boessenkool wrote:
> >On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote:
> >>In my 8xx configuration, I get 208 calls to memcmp()
> >Could you show results with a more recent GCC?  What version was this?
> 
> It was with the latest GCC version I have available in my environment, 
> that is GCC 5.4. Is that too old ?

Since GCC 7 the compiler knows how to do this, for powerpc; in GCC 8
it has improved still.

> It seems that version inlines memcmp() when length is 1. All other 
> lengths call memcmp()

Yup.

> c000d018 <tstcmp4>:
> c000d018:	80 64 00 00 	lwz     r3,0(r4)
> c000d01c:	81 25 00 00 	lwz     r9,0(r5)
> c000d020:	7c 69 18 50 	subf    r3,r9,r3
> c000d024:	4e 80 00 20 	blr

This is incorrect, it does not get the sign of the result correct.
Say when comparing 0xff 0xff 0xff 0xff to 0 0 0 0.  This should return
positive, but it returns negative.

For Power9 GCC does

        lwz 3,0(3)
        lwz 9,0(4)
        cmpld 7,3,9
        setb 3,7

and for Power7/Power8,

        lwz 9,0(3)
        lwz 3,0(4)
        subfc 3,3,9
        popcntd 3,3
        subfe 9,9,9
        or 3,3,9

(and it gives up for earlier CPUs, there is no nice simple code sequence
as far as we know.  Code size matters when generating inline code).

(Generating code for -m32 it is the same, just w instead of d in a few
places).

> c000d09c <tstcmp8>:
> c000d09c:	81 25 00 04 	lwz     r9,4(r5)
> c000d0a0:	80 64 00 04 	lwz     r3,4(r4)
> c000d0a4:	81 04 00 00 	lwz     r8,0(r4)
> c000d0a8:	81 45 00 00 	lwz     r10,0(r5)
> c000d0ac:	7c 69 18 10 	subfc   r3,r9,r3
> c000d0b0:	7d 2a 41 10 	subfe   r9,r10,r8
> c000d0b4:	7d 2a fe 70 	srawi   r10,r9,31
> c000d0b8:	7d 48 4b 79 	or.     r8,r10,r9
> c000d0bc:	4d a2 00 20 	bclr+   12,eq
> c000d0c0:	7d 23 4b 78 	mr      r3,r9
> c000d0c4:	4e 80 00 20 	blr

> This shows that on PPC32, the 8 bytes comparison is not optimal, I will 
> improve it.

It's not correct either (same problem as with length 4).


Segher

^ permalink raw reply

* [PATCH] powerpc/32: Implement csum_ipv6_magic in assembly
From: Christophe Leroy @ 2018-05-18 15:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, netdev

The generic csum_ipv6_magic() generates a pretty bad result

00000000 <csum_ipv6_magic>:
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	81 03 00 04 	lwz     r8,4(r3)
   8:	7c e7 4a 14 	add     r7,r7,r9
   c:	7d 29 38 10 	subfc   r9,r9,r7
  10:	7d 4a 51 10 	subfe   r10,r10,r10
  14:	7d 27 42 14 	add     r9,r7,r8
  18:	7d 2a 48 50 	subf    r9,r10,r9
  1c:	80 e3 00 08 	lwz     r7,8(r3)
  20:	7d 08 48 10 	subfc   r8,r8,r9
  24:	7d 4a 51 10 	subfe   r10,r10,r10
  28:	7d 29 3a 14 	add     r9,r9,r7
  2c:	81 03 00 0c 	lwz     r8,12(r3)
  30:	7d 2a 48 50 	subf    r9,r10,r9
  34:	7c e7 48 10 	subfc   r7,r7,r9
  38:	7d 4a 51 10 	subfe   r10,r10,r10
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	7d 2a 48 50 	subf    r9,r10,r9
  44:	80 e4 00 00 	lwz     r7,0(r4)
  48:	7d 08 48 10 	subfc   r8,r8,r9
  4c:	7d 4a 51 10 	subfe   r10,r10,r10
  50:	7d 29 3a 14 	add     r9,r9,r7
  54:	7d 2a 48 50 	subf    r9,r10,r9
  58:	81 04 00 04 	lwz     r8,4(r4)
  5c:	7c e7 48 10 	subfc   r7,r7,r9
  60:	7d 4a 51 10 	subfe   r10,r10,r10
  64:	7d 29 42 14 	add     r9,r9,r8
  68:	7d 2a 48 50 	subf    r9,r10,r9
  6c:	80 e4 00 08 	lwz     r7,8(r4)
  70:	7d 08 48 10 	subfc   r8,r8,r9
  74:	7d 4a 51 10 	subfe   r10,r10,r10
  78:	7d 29 3a 14 	add     r9,r9,r7
  7c:	7d 2a 48 50 	subf    r9,r10,r9
  80:	81 04 00 0c 	lwz     r8,12(r4)
  84:	7c e7 48 10 	subfc   r7,r7,r9
  88:	7d 4a 51 10 	subfe   r10,r10,r10
  8c:	7d 29 42 14 	add     r9,r9,r8
  90:	7d 2a 48 50 	subf    r9,r10,r9
  94:	7d 08 48 10 	subfc   r8,r8,r9
  98:	7d 4a 51 10 	subfe   r10,r10,r10
  9c:	7d 29 2a 14 	add     r9,r9,r5
  a0:	7d 2a 48 50 	subf    r9,r10,r9
  a4:	7c a5 48 10 	subfc   r5,r5,r9
  a8:	7c 63 19 10 	subfe   r3,r3,r3
  ac:	7d 29 32 14 	add     r9,r9,r6
  b0:	7d 23 48 50 	subf    r9,r3,r9
  b4:	7c c6 48 10 	subfc   r6,r6,r9
  b8:	7c 63 19 10 	subfe   r3,r3,r3
  bc:	7c 63 48 50 	subf    r3,r3,r9
  c0:	54 6a 80 3e 	rotlwi  r10,r3,16
  c4:	7c 63 52 14 	add     r3,r3,r10
  c8:	7c 63 18 f8 	not     r3,r3
  cc:	54 63 84 3e 	rlwinm  r3,r3,16,16,31
  d0:	4e 80 00 20 	blr

This patch implements it in assembly for PPC32

Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/checksum.h |  8 ++++++++
 arch/powerpc/lib/checksum_32.S      | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 54065caa40b3..c41c280c252f 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -13,6 +13,7 @@
 #include <asm-generic/checksum.h>
 #else
 #include <linux/bitops.h>
+#include <linux/in6.h>
 /*
  * Computes the checksum of a memory block at src, length len,
  * and adds in "sum" (32-bit), while copying the block to dst.
@@ -211,6 +212,13 @@ static inline __sum16 ip_compute_csum(const void *buff, int len)
 	return csum_fold(csum_partial(buff, len, 0));
 }
 
+#ifdef CONFIG_PPC32
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
+#endif
+
 #endif
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 9a671c774b22..f9c047145696 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -293,3 +293,36 @@ dst_error:
 	EX_TABLE(51b, dst_error);
 
 EXPORT_SYMBOL(csum_partial_copy_generic)
+
+/*
+ * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ *				      const struct in6_addr *daddr,
+ *				      __u32 len, __u8 proto, __wsum sum)
+ */
+
+_GLOBAL(csum_ipv6_magic)
+	lwz	r8, 0(r3)
+	lwz	r9, 4(r3)
+	lwz	r10, 8(r3)
+	lwz	r11, 12(r3)
+	addc	r0, r5, r6
+	adde	r0, r0, r7
+	adde	r0, r0, r8
+	adde	r0, r0, r9
+	adde	r0, r0, r10
+	adde	r0, r0, r11
+	lwz	r8, 0(r4)
+	lwz	r9, 4(r4)
+	lwz	r10, 8(r4)
+	lwz	r11, 12(r4)
+	adde	r0, r0, r8
+	adde	r0, r0, r9
+	adde	r0, r0, r10
+	adde	r0, r0, r11
+	addze	r0
+	rotlwi	r3, r0, 16
+	add	r3, r0, r3
+	not	r3, r3
+	rlwinm	r3, r3, 16, 16, 31
+	blr
+EXPORT_SYMBOL(csum_ipv6_magic)
-- 
2.13.3

^ permalink raw reply related

* Re: pkeys on POWER: Default AMR, UAMOR values
From: Andy Lutomirski @ 2018-05-18 14:35 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linuxppc-dev, Linux-MM, linuxram, Dave Hansen
In-Reply-To: <36b98132-d87f-9f75-f1a9-feee36ec8ee6@redhat.com>

On Fri, May 18, 2018 at 6:17 AM Florian Weimer <fweimer@redhat.com> wrote:

> I'm working on adding POWER pkeys support to glibc.  The coding work is
> done, but I'm faced with some test suite failures.

> Unlike the default x86 configuration, on POWER, existing threads have
> full access to newly allocated keys.

> Or, more precisely, in this scenario:

> * Thread A launches thread B
> * Thread B waits
> * Thread A allocations a protection key with pkey_alloc
> * Thread A applies the key to a page
> * Thread A signals thread B
> * Thread B starts to run and accesses the page

> Then at the end, the access will be granted.

> I hope it's not too late to change this to denied access.

> Furthermore, I think the UAMOR value is wrong as well because it
> prevents thread B at the end to set the AMR register.  In particular, if
> I do this

> * =E2=80=A6 (as before)
> * Thread A signals thread B
> * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> * Thread B reads the current access rights for the key

> then it still gets 0 (all access permitted) because the original UAMOR
> value inherited from thread A prior to the key allocation masks out the
> access right update for the newly allocated key.

This type of issue is why I think that a good protection key ISA would not
have a usermode read-the-whole-register or write-the-whole-register
operation at all.  It's still not clear to me that there is any good
kernel-mode solution.  But at least x86 defaults to deny-everything, which
is more annoying but considerably safer than POWER's behavior.

--Andy

^ permalink raw reply

* pkeys on POWER: Access rights not reset on execve
From: Florian Weimer @ 2018-05-18 14:27 UTC (permalink / raw)
  To: linuxppc-dev, linux-mm, Ram Pai, Dave Hansen, Andy Lutomirski

This test program:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <err.h>

/* Return the value of the AMR register.  */
static inline unsigned long int
pkey_read (void)
{
   unsigned long int result;
   __asm__ volatile ("mfspr %0, 13" : "=r" (result));
   return result;
}

/* Overwrite the AMR register with VALUE.  */
static inline void
pkey_write (unsigned long int value)
{
   __asm__ volatile ("mtspr 13, %0" : : "r" (value));
}

int
main (int argc, char **argv)
{
   printf ("AMR (PID %d): 0x%016lx\n", (int) getpid (), pkey_read());
   if (argc > 1)
     {
       int key = syscall (__NR_pkey_alloc, 0, 0);
       if (key < 0)
         err (1, "pkey_alloc");
       printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
       return 0;
     }

   pid_t pid = fork ();
   if (pid == 0)
     {
       execl ("/proc/self/exe", argv[0], "subprocess", NULL);
       _exit (1);
     }
   if (pid < 0)
     err (1, "fork");
   int status;
   if (waitpid (pid, &status, 0) < 0)
     err (1, "waitpid");

   int key = syscall (__NR_pkey_alloc, 0, 0);
   if (key < 0)
     err (1, "pkey_alloc");
   printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);

   unsigned long int amr = -1;
   printf ("Setting AMR: 0x%016lx\n", amr);
   pkey_write (amr);
   printf ("New AMR value (PID %d, before execl): 0x%016lx\n",
           (int) getpid (), pkey_read());
   execl ("/proc/self/exe", argv[0], "subprocess", NULL);
   err (1, "exec");
   return 1;
}

shows that the AMR register value is not reset on execve:

AMR (PID 112291): 0x0000000000000000
AMR (PID 112292): 0x0000000000000000
Allocated key (PID 112292): 2
Allocated key (PID 112291): 2
Setting AMR: 0xffffffffffffffff
New AMR value (PID 112291, before execl): 0x0c00000000000000
AMR (PID 112291): 0x0c00000000000000
Allocated key (PID 112291): 2

I think this is a real bug and needs to be fixed even if the defaults 
are kept as-is (see the other thread).

(Seen on 4.17.0-rc5.)

Thanks,
Florian

^ permalink raw reply

* [GIT PULL] Please pull powerpc/linux.git powerpc-4.17-6 tag
From: Michael Ellerman @ 2018-05-18 13:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: fbarrat, felix, linux-kernel, linuxppc-dev, npiggin

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 4.17:

The following changes since commit 6c0a8f6b5a45ac892a763b6299bd3c5324fc5e02:

  powerpc/pseries: Fix CONFIG_NUMA=n build (2018-05-08 14:59:56 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.17-6

for you to fetch changes up to c1d2a31397ec51f0370f6bd17b19b39152c263cb:

  powerpc/powernv: Fix NVRAM sleep in invalid context when crashing (2018-05-18 00:23:07 +1000)

- ----------------------------------------------------------------
powerpc fixes for 4.17 #6

Just three commits.

The two cxl ones are not fixes per se, but they modify code that was added this
cycle so that it will work with a recent firmware change.

And then a fix for a recent commit that added sleeps in the NVRAM code, which
needs to be more careful and not sleep if eg. we're called in the panic() path.

Thanks to:
  Nicholas Piggin, Philippe Bergheaud, Christophe Lombard.

- ----------------------------------------------------------------
Nicholas Piggin (1):
      powerpc/powernv: Fix NVRAM sleep in invalid context when crashing

Philippe Bergheaud (2):
      cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
      cxl: Report the tunneled operations status

 Documentation/ABI/testing/sysfs-class-cxl   |  8 ++++++++
 arch/powerpc/platforms/powernv/opal-nvram.c | 14 ++++++++++++--
 drivers/misc/cxl/cxl.h                      |  1 +
 drivers/misc/cxl/pci.c                      | 12 ++++++++++++
 drivers/misc/cxl/sysfs.c                    | 10 ++++++++++
 5 files changed, 43 insertions(+), 2 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCAAGBQJa/tj4AAoJEFHr6jzI4aWAZBoP/2/kddNPHzMwnzcx0B5WzEfo
3JQcTLZPEqsb5n0xwaWOgb0o0u8EpHDtPZlJlZZWwPKbphlZvQ+reAJAXCQXXj6P
n6BrbfpHlJo1J2wZVeihOuVTtSiXqq82YEtCzdUCUQNa8+2rQ7bLOB56cLdnyFWx
wvK/xOM4rG/nPDi3tlH+IvhNeUdc2Y1vCB6N/RVvJVJOduCED1zCxcWj7uJYGt1x
UdFPcah4OPSF9qPemReu6CjiwesvEm3y3S9TYLLJLXBDmKvs7BDRbfczP52tY90P
XMErJN4MtnNasotCJVcZylfNhSTCyyaQ4si+G/COAM0eh8oPdcIH3C2yXnQHXVPz
pUpe7TF3N7DOoa66WHXsh9dr9G7E0DhcpBgpz30/HmYasaYSEYpDsCH6UvbmvpzO
xtyocfgtuyaF28Rz4WVYSVGkJwE/+NhdsvfQUFI6DGQXoVu08Xxx3rhOlNKVecTw
8yuaIpPSz0i1pFg3SRa1MaS0F4FqGs0T3OLFsEihjdsc3EpJeB3Vp0IiqbmhR6Pa
1rhsjPY4yS5tPJS9K4CQszgw3Ke6C7KYIttGo7BxVKPLXsp52HhTqFpp0LHDcw8y
zFjOS7kh07NCgn8tSwMLUTRUUXEDLE2fJOVWCDdvkXEzpd/LfaP15gjZ3m66OMTS
Oj72EjBAGNX4dN4yQuo7
=4xib
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads
From: Michael Ellerman @ 2018-05-18 13:21 UTC (permalink / raw)
  To: Gautham R. Shenoy, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Balbir Singh, Oliver O'Halloran, Nicholas Piggin
  Cc: linuxppc-dev, linux-kernel, Gautham R. Shenoy
In-Reply-To: <1526037444-22876-2-git-send-email-ego@linux.vnet.ibm.com>

"Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:

> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 0af5c11..884dff2 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -436,8 +438,56 @@ static void __init cpu_init_thread_core_maps(int tpc)
>  	printk(KERN_DEBUG " (thread shift is %d)\n", threads_shift);
>  }
>  
> -
>  u32 *cpu_to_phys_id = NULL;
> +/*
> + * check_for_interleaved_big_core - Checks if the core represented by
> + *	 dn is a big-core whose threads are interleavings of the
> + *	 threads of the component small cores.
> + *
> + * @dn: device node corresponding to the core.
> + *
> + * Returns true if the core is a interleaved big-core.
> + * Returns false otherwise.
> + */
> +static inline bool check_for_interleaved_big_core(struct device_node *dn)
> +{
> +	int len, nr_groups, threads_per_group;
> +	const __be32 *thread_groups;
> +	__be32 *thread_list, *first_cpu_idx;
> +	int cur_cpu, next_cpu, i, j;
> +
> +	thread_groups = of_get_property(dn, "ibm,thread-groups", &len);
> +	if (!thread_groups)
> +		return false;

There are better device tree APIs than bare of_get_property() these
days, can you try to use those?

> +	nr_groups = be32_to_cpu(*(thread_groups + 1));
> +	if (nr_groups <= 1)
> +		return false;

eg. this would be of_property_read_u32_index()

> +
> +	threads_per_group = be32_to_cpu(*(thread_groups + 2));
> +	thread_list = (__be32 *)thread_groups + 3;
> +
> +	/*
> +	 * In case of an interleaved big-core, the thread-ids of the
> +	 * big-core can be obtained by interleaving the the thread-ids
> +	 * of the component small
> +	 *
> +	 * Eg: On a 8-thread big-core with two SMT4 small cores, the
> +	 * threads of the two component small cores will be
> +	 * {0, 2, 4, 6} and {1, 3, 5, 7}.
> +	 */
> +	for (i = 0; i < nr_groups; i++) {
> +		first_cpu_idx = thread_list + i * threads_per_group;
> +
> +		for (j = 0; j < threads_per_group - 1; j++) {
> +			cur_cpu = be32_to_cpu(*(first_cpu_idx + j));
> +			next_cpu = be32_to_cpu(*(first_cpu_idx + j + 1));
> +			if (next_cpu != cur_cpu + nr_groups)
> +				return false;
> +		}
> +	}
> +	return true;
> +}
>  
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
> @@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void)
>  	vdso_data->processorCount = num_present_cpus();
>  #endif /* CONFIG_PPC64 */
>  
> -        /* Initialize CPU <=> thread mapping/
> +	dn = of_find_node_by_type(NULL, "cpu");
> +	if (dn) {
> +		if (check_for_interleaved_big_core(dn)) {
> +			has_interleaved_big_core = true;
> +			pr_info("Detected interleaved big-cores\n");
> +		}
> +		of_node_put(dn);
> +	}

This is a bit untidy, given how unlikely it is that you would have no
CPUs :)

You should be able to do the lookup of the property and the setting of
has_interleaved_big_core all inside check_for_interleaved_big_core().

cheers

^ permalink raw reply

* pkeys on POWER: Default AMR, UAMOR values
From: Florian Weimer @ 2018-05-18 13:17 UTC (permalink / raw)
  To: linuxppc-dev, linux-mm, Ram Pai, Dave Hansen, Andy Lutomirski

I'm working on adding POWER pkeys support to glibc.  The coding work is 
done, but I'm faced with some test suite failures.

Unlike the default x86 configuration, on POWER, existing threads have 
full access to newly allocated keys.

Or, more precisely, in this scenario:

* Thread A launches thread B
* Thread B waits
* Thread A allocations a protection key with pkey_alloc
* Thread A applies the key to a page
* Thread A signals thread B
* Thread B starts to run and accesses the page

Then at the end, the access will be granted.

I hope it's not too late to change this to denied access.

Furthermore, I think the UAMOR value is wrong as well because it 
prevents thread B at the end to set the AMR register.  In particular, if 
I do this

* … (as before)
* Thread A signals thread B
* Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
* Thread B reads the current access rights for the key

then it still gets 0 (all access permitted) because the original UAMOR 
value inherited from thread A prior to the key allocation masks out the 
access right update for the newly allocated key.

Thanks,
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox