LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
From: Laurent Dufour @ 2020-05-20 17:35 UTC
  To: Greg Kurz; +Cc: linuxram, linux-kernel, kvm-ppc, paulus, sukadev, linuxppc-dev
In-Reply-To: <20200520193259.0b66db32@bahia.lan>

On 20/05/2020 at 19:32, Greg Kurz wrote:
> On Wed, 20 May 2020 18:51:10 +0200
> Laurent Dufour <ldufour@linux.ibm.com> wrote:
> 
>> The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
>> Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
>> reserved to the Ultravisor.
>>
>> However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
>> context of the VM calling UV_ESM. This allows the Hypervisor to return to
>> the guest without going through the Ultravisor. Thus the Secure bit of SRR1
>> is not set in that particular case.
>>
>> In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
>> filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
>> not set in that case.
>>
> 
> Why not checking vcpu->kvm->arch.secure_guest then ?

I don't think that's the right place.
> 
>> Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>   arch/powerpc/kvm/book3s_hv.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 93493f0cbfe8..eb1f96cb7b72 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>   			ret = kvmppc_h_svm_init_done(vcpu->kvm);
>>   		break;
>>   	case H_SVM_INIT_ABORT:
>> -		ret = H_UNSUPPORTED;
>> -		if (kvmppc_get_srr1(vcpu) & MSR_S)
>> -			ret = kvmppc_h_svm_init_abort(vcpu->kvm);
> 
> or at least put a comment to explain why H_SVM_INIT_ABORT
> doesn't have the same sanity check as the other SVM hcalls.

I agree that might help. I'll send a v2 with a comment there.
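
To make the reasoning behind such a comment concrete, here is a toy model in
plain C (not kernel code). All names below (model_vm, model_svm_generic,
model_svm_init_abort, and the MODEL_* codes) are hypothetical; the sketch only
assumes what the commit message states, namely that H_SVM_INIT_ABORT arrives
with the Secure bit of SRR1 clear and is filtered on the VM's secure_guest
state instead:

```c
#include <stdbool.h>

/* Hypothetical return codes for this model; not the kernel's H_* values. */
enum model_rc { MODEL_OK = 0, MODEL_UNSUPPORTED = 1 };

struct model_vm {
	bool srr1_secure_bit;      /* models (SRR1 & MSR_S) at hcall time */
	bool secure_guest_started; /* models kvm->arch.secure_guest being set */
};

/* Other H_SVM_* hcalls: only honoured when issued with the secure bit set. */
static enum model_rc model_svm_generic(const struct model_vm *vm)
{
	return vm->srr1_secure_bit ? MODEL_OK : MODEL_UNSUPPORTED;
}

/*
 * H_SVM_INIT_ABORT: issued by the Ultravisor in the context of the VM that
 * called UV_ESM, so the secure bit is clear. Filtering therefore relies on
 * the VM's secure_guest state rather than on SRR1.
 */
static enum model_rc model_svm_init_abort(const struct model_vm *vm)
{
	return vm->secure_guest_started ? MODEL_OK : MODEL_UNSUPPORTED;
}
```

In this model a regular VM (secure_guest_started == false) is still rejected
by model_svm_init_abort(), which is the property the patch relies on.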

> 
>> +		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
>>   		break;
>>   
>>   	default:
> 



* Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-05-20 17:15 UTC
  To: Ira Weiny
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Steven Rostedt,
	linux-nvdimm
In-Reply-To: <20200520145430.GB3660833@iweiny-DESK2.sc.intel.com>


Thanks for reviewing this patch, Ira. My responses are below:

Ira Weiny <ira.weiny@intel.com> writes:

> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>> Implement support for fetching nvdimm health information via
>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>> 'struct papr_scm_priv' and subsequently partially exposed to
>> user-space via newly introduced dimm specific attribute
>> 'papr/flags'. Since the hcall is costly, the health information is
>> cached and only re-queried, 60s after the previous successful hcall.
>> 
>> The patch also adds documentation text describing the flags reported by
>> the new sysfs attribute 'papr/flags' at
>> Documentation/ABI/testing/sysfs-bus-papr-scm.
>> 
>> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
>> PAPR hcalls")
>> 
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>> Changelog:
>> 
>> Resend:
>> * None
>> 
>> v6..v7 :
>> * Used the exported buf_seq_printf() function to generate content for
>>   'papr/flags'
>> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
>>   and removed the papr_scm.h file [Mpe]
>> * Some minor consistency issued in sysfs-bus-papr-scm
>>   documentation. [Mpe]
>> * s/dimm_mutex/health_mutex/g [Mpe]
>> * Split drc_pmem_query_health() into two function one of which takes
>>   care of caching and locking. [Mpe]
>> * Fixed a local copy creation of dimm health information using
>>   READ_ONCE(). [Mpe]
>> 
>> v5..v6 :
>> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
>>   [Dan Williams]
>> * Include documentation for 'papr/flags' attr [Dan Williams]
>> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
>> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
>> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
>> * Replaced two __be64 integers from papr_scm_priv to a single u64
>>   integer [Mpe]
>> * Updated patch description to reflect the changes made in this
>>   version.
>> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
>>   flags_show() [Dan Williams]
>> 
>> v4..v5 :
>> * None
>> 
>> v3..v4 :
>> * None
>> 
>> v2..v3 :
>> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>>        	 NVDIMM unarmed [Aneesh]
>> 
>> v1..v2 :
>> * New patch in the series.
>> ---
>>  Documentation/ABI/testing/sysfs-bus-papr-scm |  27 +++
>>  arch/powerpc/platforms/pseries/papr_scm.c    | 169 ++++++++++++++++++-
>>  2 files changed, 194 insertions(+), 2 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
>> 
>> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm b/Documentation/ABI/testing/sysfs-bus-papr-scm
>> new file mode 100644
>> index 000000000000..6143d06072f1
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm
>> @@ -0,0 +1,27 @@
>> +What:		/sys/bus/nd/devices/nmemX/papr/flags
>> +Date:		Apr, 2020
>> +KernelVersion:	v5.8
>> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
>> +Description:
>> +		(RO) Report flags indicating various states of a
>> +		papr-scm NVDIMM device. Each flag maps to a one or
>> +		more bits set in the dimm-health-bitmap retrieved in
>> +		response to H_SCM_HEALTH hcall. The details of the bit
>> +		flags returned in response to this hcall is available
>> +		at 'Documentation/powerpc/papr_hcalls.rst' . Below are
>> +		the flags reported in this sysfs file:
>> +
>> +		* "not_armed"	: Indicates that NVDIMM contents will not
>> +				  survive a power cycle.
>> +		* "flush_fail"	: Indicates that NVDIMM contents
>> +				  couldn't be flushed during last
>> +				  shut-down event.
>> +		* "restore_fail": Indicates that NVDIMM contents
>> +				  couldn't be restored during NVDIMM
>> +				  initialization.
>> +		* "encrypted"	: NVDIMM contents are encrypted.
>> +		* "smart_notify": There is health event for the NVDIMM.
>> +		* "scrubbed"	: Indicating that contents of the
>> +				  NVDIMM have been scrubbed.
>> +		* "locked"	: Indicating that NVDIMM contents cant
>> +				  be modified until next power cycle.
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index f35592423380..142636e1a59f 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -12,6 +12,7 @@
>>  #include <linux/libnvdimm.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/delay.h>
>> +#include <linux/seq_buf.h>
>>  
>>  #include <asm/plpar_wrappers.h>
>>  
>> @@ -22,6 +23,44 @@
>>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
>>  	 (1ul << ND_CMD_SET_CONFIG_DATA))
>>  
>> +/* DIMM health bitmap bitmap indicators */
>> +/* SCM device is unable to persist memory contents */
>> +#define PAPR_SCM_DIMM_UNARMED                   (1ULL << (63 - 0))
>> +/* SCM device failed to persist memory contents */
>> +#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY            (1ULL << (63 - 1))
>> +/* SCM device contents are persisted from previous IPL */
>> +#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN            (1ULL << (63 - 2))
>> +/* SCM device contents are not persisted from previous IPL */
>> +#define PAPR_SCM_DIMM_EMPTY                     (1ULL << (63 - 3))
>> +/* SCM device memory life remaining is critically low */
>> +#define PAPR_SCM_DIMM_HEALTH_CRITICAL           (1ULL << (63 - 4))
>> +/* SCM device will be garded off next IPL due to failure */
>> +#define PAPR_SCM_DIMM_HEALTH_FATAL              (1ULL << (63 - 5))
>> +/* SCM contents cannot persist due to current platform health status */
>> +#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY          (1ULL << (63 - 6))
>> +/* SCM device is unable to persist memory contents in certain conditions */
>> +#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL       (1ULL << (63 - 7))
>> +/* SCM device is encrypted */
>> +#define PAPR_SCM_DIMM_ENCRYPTED                 (1ULL << (63 - 8))
>> +/* SCM device has been scrubbed and locked */
>> +#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED       (1ULL << (63 - 9))
>> +
>> +/* Bits status indicators for health bitmap indicating unarmed dimm */
>> +#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED |		\
>> +				    PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
>> +
>> +/* Bits status indicators for health bitmap indicating unflushed dimm */
>> +#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
>> +
>> +/* Bits status indicators for health bitmap indicating unrestored dimm */
>> +#define PAPR_SCM_DIMM_BAD_RESTORE_MASK  (PAPR_SCM_DIMM_EMPTY)
>> +
>> +/* Bit status indicators for smart event notification */
>> +#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
>> +					PAPR_SCM_DIMM_HEALTH_FATAL |	\
>> +					PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
>> +
>> +/* private struct associated with each region */
>>  struct papr_scm_priv {
>>  	struct platform_device *pdev;
>>  	struct device_node *dn;
>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>  	struct resource res;
>>  	struct nd_region *region;
>>  	struct nd_interleave_set nd_set;
>> +
>> +	/* Protect dimm health data from concurrent read/writes */
>> +	struct mutex health_mutex;
>> +
>> +	/* Last time the health information of the dimm was updated */
>> +	unsigned long lasthealth_jiffies;
>> +
>> +	/* Health information for the dimm */
>> +	u64 health_bitmap;
>
> I wonder if this should be typed big endian as you mention that it is in the
> commit message?
This was discussed in an earlier review of the patch series at
https://lore.kernel.org/linux-nvdimm/878sjetcis.fsf@mpe.ellerman.id.au

Even though the health bitmap is returned in big-endian format (for example,
the value 0xC000000000000000 indicates that bits 0 and 1 are set), its numeric
value is never used. Instead, we only test whether specific bits are set in
the register.

Hence a native CPU type is used instead of __be64 to store this value.
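
To illustrate the bit numbering above, here is a minimal sketch in plain C
(not kernel code). The helper names papr_bit() and health_flag_set() are
hypothetical; they only mirror the (1ULL << (63 - n)) pattern used by the
PAPR_SCM_DIMM_* macros in the patch, where bit 0 is the most significant bit:

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM/PAPR bit numbering: bit 0 is the most significant bit of 64. */
static inline uint64_t papr_bit(unsigned int n)
{
	return 1ULL << (63 - n);
}

/* Hypothetical helper: true if any bit of 'mask' is set in 'bitmap'. */
static inline bool health_flag_set(uint64_t bitmap, uint64_t mask)
{
	return (bitmap & mask) != 0;
}
```

With these helpers, papr_bit(0) | papr_bit(1) evaluates to
0xC000000000000000, and only individual bits are ever tested, which is why the
value can live in a native u64.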

>
>>  };
>>  
>>  static int drc_pmem_bind(struct papr_scm_priv *p)
>> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>>  	return drc_pmem_bind(p);
>>  }
>>  
>> +/*
>> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
>> + * health information.
>> + */
>> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> +{
>> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
>
> Is this exclusive to 64bit?  Why not u64?
Yes, this is specific to 64-bit, as the array holds 64-bit register values
returned from PHYP. We could use u64 here, but that would be a departure from
the existing practice within arch/powerpc code of using an unsigned long array
to fetch values returned from PHYP.

>
>> +	s64 rc;
>
> plpar_hcall() returns long and this function returns int and rc is declared
> s64?
>
> Why not have them all be long to follow plpar_hcall?
Yes, the 'long' type is better suited for variable 'rc', and I will get it
fixed.

However, the value of 'rc' is never directly returned from this
function; we always return kernel error codes instead. Hence the
return type of this function is consistent.
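
As a rough sketch of that separation (plain C, not the actual patch): the
hcall status is kept in a 'long' and translated to an 'int' errno before
returning. The helper name hcall_rc_to_errno() is hypothetical, and H_SUCCESS
is redefined locally on the assumption that it is 0, as in the kernel's
hvcall.h:

```c
#include <errno.h>

/* Assumption: mirrors the kernel's definition, H_SUCCESS == 0. */
#define H_SUCCESS 0

/*
 * Hypothetical helper: translate a 'long' hcall status into the 'int'
 * kernel-style error code that the query function would return.
 */
static int hcall_rc_to_errno(long hcall_rc)
{
	return hcall_rc == H_SUCCESS ? 0 : -ENXIO;
}
```

The caller then only ever sees 0 or a negative errno, regardless of the width
of the raw hcall status.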

>
>> +
>> +	/* issue the hcall */
>> +	rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
>> +	if (rc != H_SUCCESS) {
>> +		dev_err(&p->pdev->dev,
>> +			 "Failed to query health information, Err:%lld\n", rc);
>> +		rc = -ENXIO;
>> +		goto out;
>> +	}
>> +
>> +	p->lasthealth_jiffies = jiffies;
>> +	p->health_bitmap = ret[0] & ret[1];
>> +
>> +	dev_dbg(&p->pdev->dev,
>> +		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> +		ret[0], ret[1]);
>> +out:
>> +	return rc;
>> +}
>> +
>> +/* Min interval in seconds for assuming stable dimm health */
>> +#define MIN_HEALTH_QUERY_INTERVAL 60
>> +
>> +/* Query cached health info and if needed call drc_pmem_query_health */
>> +static int drc_pmem_query_health(struct papr_scm_priv *p)
>> +{
>> +	unsigned long cache_timeout;
>> +	s64 rc;
>> +
>> +	/* Protect concurrent modifications to papr_scm_priv */
>> +	rc = mutex_lock_interruptible(&p->health_mutex);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/* Jiffies offset for which the health data is assumed to be same */
>> +	cache_timeout = p->lasthealth_jiffies +
>> +		msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
>> +
>> +	/* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
>> +	if (time_after(jiffies, cache_timeout))
>> +		rc = __drc_pmem_query_health(p);
>
> And back to s64 after returning int?
Agree, will change 's64 rc' to 'int rc'.

>
>> +	else
>> +		/* Assume cached health data is valid */
>> +		rc = 0;
>> +
>> +	mutex_unlock(&p->health_mutex);
>> +	return rc;
>> +}
>>  
>>  static int papr_scm_meta_get(struct papr_scm_priv *p,
>>  			     struct nd_cmd_get_config_data_hdr *hdr)
>> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>>  	return 0;
>>  }
>>  
>> +static ssize_t flags_show(struct device *dev,
>> +				struct device_attribute *attr, char *buf)
>> +{
>> +	struct nvdimm *dimm = to_nvdimm(dev);
>> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> +	struct seq_buf s;
>> +	u64 health;
>> +	int rc;
>> +
>> +	rc = drc_pmem_query_health(p);
>
> and back to int...
>
drc_pmem_query_health() returns an 'int' so the type of variable 'rc'
looks correct to me.

> Just make them long all through...
I think the return type of all the above functions is 'int'; the only
inconsistency is the 's64 rc' in drc_pmem_query_health() that you pointed out.

With that fixed, the usage of the 'int' return type across these functions
will be consistent.

>
> Ira
>
>> +	if (rc)
>> +		return rc;
>> +
>> +	/* Copy health_bitmap locally, check masks & update out buffer */
>> +	health = READ_ONCE(p->health_bitmap);
>> +
>> +	seq_buf_init(&s, buf, PAGE_SIZE);
>> +	if (health & PAPR_SCM_DIMM_UNARMED_MASK)
>> +		seq_buf_printf(&s, "not_armed ");
>> +
>> +	if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
>> +		seq_buf_printf(&s, "flush_fail ");
>> +
>> +	if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
>> +		seq_buf_printf(&s, "restore_fail ");
>> +
>> +	if (health & PAPR_SCM_DIMM_ENCRYPTED)
>> +		seq_buf_printf(&s, "encrypted ");
>> +
>> +	if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
>> +		seq_buf_printf(&s, "smart_notify ");
>> +
>> +	if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
>> +		seq_buf_printf(&s, "scrubbed locked ");
>> +
>> +	if (seq_buf_used(&s))
>> +		seq_buf_printf(&s, "\n");
>> +
>> +	return seq_buf_used(&s);
>> +}
>> +DEVICE_ATTR_RO(flags);
>> +
>> +/* papr_scm specific dimm attributes */
>> +static struct attribute *papr_scm_nd_attributes[] = {
>> +	&dev_attr_flags.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group papr_scm_nd_attribute_group = {
>> +	.name = "papr",
>> +	.attrs = papr_scm_nd_attributes,
>> +};
>> +
>> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
>> +	&papr_scm_nd_attribute_group,
>> +	NULL,
>> +};
>> +
>>  static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  {
>>  	struct device *dev = &p->pdev->dev;
>> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  	dimm_flags = 0;
>>  	set_bit(NDD_LABELING, &dimm_flags);
>>  
>> -	p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
>> -				  PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>> +	p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
>> +				  dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>>  	if (!p->nvdimm) {
>>  		dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
>>  		goto err;
>> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
>>  	if (!p)
>>  		return -ENOMEM;
>>  
>> +	/* Initialize the dimm mutex */
>> +	mutex_init(&p->health_mutex);
>> +
>>  	/* optional DT properties */
>>  	of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>>  
>> -- 
>> 2.26.2

-- 
Cheers
~ Vaibhav


* Re: [PATCH] input: i8042: Remove special PowerPC handling
From: Dmitry Torokhov @ 2020-05-20 17:16 UTC
  To: Michael Ellerman
  Cc: kbuild test robot, linux-kernel, clang-built-linux,
	Paul Mackerras, linux-input, Nathan Chancellor, linuxppc-dev
In-Reply-To: <87ftbv87i3.fsf@mpe.ellerman.id.au>

Hi Michael,

On Wed, May 20, 2020 at 04:07:00PM +1000, Michael Ellerman wrote:
> [ + Dmitry & linux-input ]
> 
> Nathan Chancellor <natechancellor@gmail.com> writes:
> > This causes a build error with CONFIG_WALNUT because kb_cs and kb_data
> > were removed in commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and
> > include/asm-ppc").
> >
> > ld.lld: error: undefined symbol: kb_cs
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >
> > ld.lld: error: undefined symbol: kb_data
> >> referenced by i8042.c:309 (drivers/input/serio/i8042.c:309)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:33 (drivers/input/serio/i8042-ppcio.h:33)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042.c:319 (drivers/input/serio/i8042.c:319)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced 15 more times
> >
> > Presumably since nobody has noticed this for the last 12 years, there is
> > not anyone actually trying to use this driver so we can just remove this
> > special walnut code and use the generic header so it builds for all
> > configurations.
> >
> > Fixes: 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc")
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > ---
> >  drivers/input/serio/i8042-ppcio.h | 57 -------------------------------
> >  drivers/input/serio/i8042.h       |  2 --
> >  2 files changed, 59 deletions(-)
> >  delete mode 100644 drivers/input/serio/i8042-ppcio.h
> 
> This LGTM.
> 
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
> 
> I assumed drivers/input/serio would be pretty quiet, but there's
> actually some commits to it in linux-next. So perhaps this should go via
> the input tree.
> 
> Dmitry do you want to take this, or should I take it via powerpc?
> 
> Original patch is here:
>   https://lore.kernel.org/lkml/20200518181043.3363953-1-natechancellor@gmail.com

I'm fine with you taking it through powerpc.

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Also, while I have your attention ;), could you please ack or take
https://lore.kernel.org/lkml/20191002214854.GA114387@dtor-ws/ as I
believe this is the last user of the input_polled_dev API and I would like
to drop it from the tree.

Thanks!

-- 
Dmitry


* Re: [RESEND PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
From: Christoph Hellwig @ 2020-05-20 17:01 UTC
  To: Vaibhav Jain
  Cc: Cezary Rojewski, linux-nvdimm, linux-kernel, Steven Rostedt,
	Piotr Maziarz, Borislav Petkov, Aneesh Kumar K . V, linuxppc-dev
In-Reply-To: <20200519190058.257981-3-vaibhav@linux.ibm.com>

s/seq_buf: Export seq_buf_printf() to external modules/
  seq_buf: export seq_buf_printf/


* [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
From: Laurent Dufour @ 2020-05-20 16:51 UTC
  To: kvm-ppc, linuxppc-dev, linux-kernel, paulus; +Cc: sukadev, linuxram

The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
reserved to the Ultravisor.

However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
context of the VM calling UV_ESM. This allows the Hypervisor to return to
the guest without going through the Ultravisor. Thus the Secure bit of SRR1
is not set in that particular case.

In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
not set in that case.

Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 93493f0cbfe8..eb1f96cb7b72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 			ret = kvmppc_h_svm_init_done(vcpu->kvm);
 		break;
 	case H_SVM_INIT_ABORT:
-		ret = H_UNSUPPORTED;
-		if (kvmppc_get_srr1(vcpu) & MSR_S)
-			ret = kvmppc_h_svm_init_abort(vcpu->kvm);
+		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
 		break;

 	default:
-- 
2.26.2


* [PATCH] net/ethernet/freescale: rework quiesce/activate for ucc_geth
From: Valentin Longchamp @ 2020-05-20 15:53 UTC
  To: linuxppc-dev, netdev, kuba, davem, hkallweit1
  Cc: Matteo Ghidoni, Valentin Longchamp

ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This however
causes the initial activation of the netdevice to fail precisely because
it's detached. For details, see [1].

A possible workaround was the revert of commit
net: linkwatch: add check for netdevice being present to linkwatch_do_dev
However, the check introduced in the above commit is correct and shall be
kept.

netif_device_detach() is thus replaced with
netif_tx_stop_all_queues(), which prevents any transmission. This allows
the MAC config change required by the link change to be performed without
detaching the corresponding netdevice, and thus without preventing its
initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
Acked-by: Matteo Ghidoni <matteo.ghidoni@ch.abb.com>
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 6e5f6dd169b5..552e7554a9f8 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -42,6 +42,7 @@
 #include <soc/fsl/qe/ucc.h>
 #include <soc/fsl/qe/ucc_fast.h>
 #include <asm/machdep.h>
+#include <net/sch_generic.h>
 
 #include "ucc_geth.h"
 
@@ -1548,11 +1549,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-	/* Prevent any further xmits, plus detach the device. */
-	netif_device_detach(ugeth->ndev);
-
-	/* Wait for any current xmits to finish. */
-	netif_tx_disable(ugeth->ndev);
+	/* Prevent any further xmits */
+	netif_tx_stop_all_queues(ugeth->ndev);
 
 	/* Disable the interrupt to avoid NAPI rescheduling. */
 	disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1565,7 +1563,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
 	napi_enable(&ugeth->napi);
 	enable_irq(ugeth->ug_info->uf_info.irq);
-	netif_device_attach(ugeth->ndev);
+
+	/* allow to xmit again  */
+	netif_tx_wake_all_queues(ugeth->ndev);
+	__netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1



* Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
From: Ira Weiny @ 2020-05-20 15:32 UTC
  To: Vaibhav Jain
  Cc: linux-nvdimm, linux-kernel, Steven Rostedt, Aneesh Kumar K . V,
	linuxppc-dev
In-Reply-To: <20200519190058.257981-5-vaibhav@linux.ibm.com>

On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
> modules and add the command family to the white list of NVDIMM command
> sets. Also advertise support for ND_CMD_CALL for the dimm
> command mask and implement necessary scaffolding in the module to
> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
> 
> The layout of the PDSM request as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_scm_pdsm.h' which
> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
> to communicate the PDSM request via member
> 'nd_pkg_papr_scm->nd_command' and size of payload that need to be
> sent/received for servicing the PDSM.
> 
> A new function is_cmd_valid() is implemented that reads the args to
> papr_scm_ndctl() and performs sanity tests on them. A new function
> papr_scm_service_pdsm() is introduced and is called from
> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
> command from libnvdimm.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * None
> 
> v6..v7 :
> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>   [Mpe].
> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>   [Mpe].
> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
> 
> v5..v6 :
> * Changed the usage of the term DSM to PDSM to distinguish it from the
>   ACPI term [ Dan Williams ]
> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
>   to reflect the new terminology.
> * Updated the patch description and title to reflect the new terminology.
> * Squashed patch to introduce new command family in 'ndctl.h' with
>   this patch [ Dan Williams ]
> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>   [ Dan Williams ]
> * Removed redundant license text from the papr_scm_psdm.h file.
>   [ Dan Williams ]
> * s/envelop/envelope/ at various places [ Dan Williams ]
> * Added '__packed' attribute to command package header to gaurd
>   against different compiler adding paddings between the fields.
>   [ Dan Williams]
> * Converted various pr_debug to dev_debug [ Dan Williams ]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
> 
> v1..v2 :
> * None
> ---
>  arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++++++++++++++++++
>  arch/powerpc/platforms/pseries/papr_scm.c     | 101 ++++++++++++-
>  include/uapi/linux/ndctl.h                    |   1 +
>  3 files changed, 230 insertions(+), 6 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
> 
> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
> new file mode 100644
> index 000000000000..671693439c1c
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
> +
> +#include <linux/types.h>
> +
> +/*
> + * PDSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * 'envelopes' which consists of a header and user-defined payload sections.
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> + * payload following it and offset of which relative to the struct is provided
> + * by 'nd_pdsm_cmd_pkg.payload_offset'. *
> + *
> + *  +-------------+---------------------+---------------------------+
> + *  |   64-Bytes  |       8-Bytes       |       Max 184-Bytes       |
> + *  +-------------+---------------------+---------------------------+
> + *  |               nd_pdsm_cmd_pkg |                           |
> + *  |-------------+                     |                           |
> + *  |  nd_cmd_pkg |                     |                           |
> + *  +-------------+---------------------+---------------------------+
> + *  | nd_family   |			|			    |
> + *  | nd_size_out | cmd_status          |			    |
> + *  | nd_size_in  | payload_version     |      PAYLOAD		    |
> + *  | nd_command  | payload_offset ----->			    |
> + *  | nd_fw_size  |                     |			    |
> + *  +-------------+---------------------+---------------------------+
> + *
> + * PDSM Header:
> + *
> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> + * contained in 'struct nd_cmd_pkg', the header also has members following
> + * members:
> + *
> + * 'cmd_status'		: (Out) Errors if any encountered while servicing PDSM.
> + * 'payload_version'	: (In/Out) Version number associated with the payload.
> + * 'payload_offset'	: (In)Relative offset of payload from start of envelope.
> + *
> + * PDSM Payload:
> + *
> + * The layout of the PDSM Payload is defined by various structs shared between
> + * papr_scm and libndctl so that contents of payload can be interpreted. During
> + * servicing of a PDSM the papr_scm module will read input args from the payload
> + * field by casting its contents to an appropriate struct pointer based on the
> + * PDSM command. Similarly the output of servicing the PDSM command will be
> + * copied to the payload field using the same struct.
> + *
> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> + * leaves around 184 bytes for the envelope payload (ignoring any padding that
> + * the compiler may silently introduce).
> + *
> + * Payload Version:
> + *
> + * A 'payload_version' field is present in PDSM header that indicates a specific
> + * version of the structure present in PDSM Payload for a given PDSM command.
> + * This provides backward compatibility in case the PDSM Payload structure
> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> + *
> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> + * module when servicing the PDSM envelope checks the 'payload_version' and then
> + * uses 'payload struct version' == MIN('payload_version field',
> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> + * struct in returned 'payload_version' field.

FWIW many people believe using a size rather than version is more sustainable.
It is expected that new payload structures are larger (more features) than the
previous payload structure.

I can't find references at the moment, though.

What does payload_version provide us that the command size in/out does not?
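
To make the question concrete, here is a minimal userspace sketch of
size-based negotiation. The struct names (health_v1, health_v2) and the
helper copy_negotiated() are hypothetical and not part of the patch; the
only assumption is that a newer payload struct extends the older one by
appending fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical payload layouts: v2 only appends fields to v1. */
struct health_v1 {
	unsigned int flags;
};

struct health_v2 {
	unsigned int flags;
	unsigned int life_remaining;
};

/*
 * Copy at most 'caller_size' bytes of the provider's newest struct into the
 * caller's buffer and return the number of bytes actually written.
 */
static size_t copy_negotiated(void *dst, size_t caller_size,
			      const void *src, size_t provider_size)
{
	size_t n = caller_size < provider_size ? caller_size : provider_size;

	memcpy(dst, src, n);
	return n;
}
```

An old caller passing sizeof(struct health_v1) only gets the fields it
knows about, and the returned size would play the role that the negotiated
'payload_version' plays in the patch.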

> + *
> + * Libndctl on receiving the envelope back from papr_scm again checks the
> + * 'payload_version' field and based on it uses the appropriate version dsm
> + * struct to parse the results.
> + *
> + * Backward Compatibility:
> + *
> + * Above scheme of exchanging different versioned PDSM struct between libndctl
> + * and papr_scm should provide backward compatibility as long as the following
> + * two assumptions/conditions hold when defining new PDSM structs:
> + *
> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> + *
> + * 1. T(X) is a proper subset of T(Y) if X > Y.

Proper superset?  Or Y > X?
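
To spell out the relation I would expect, here is a sketch only; the
bitmask encoding of struct attributes and the helper name are made up for
illustration and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each bit stands for one struct attribute; the encoding is hypothetical. */
static bool is_proper_superset(uint32_t a, uint32_t b)
{
	/* 'a' contains every attribute of 'b' and at least one more */
	return (a & b) == b && a != b;
}
```

With T(1) = 0x3 and T(2) = 0x7, T(2) is a proper superset of T(1), which
is what "retain existing struct attributes from previous version" implies.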

Ira

> + *    i.e Each new version of PDSM struct should retain existing struct
> + *    attributes from previous version
> + *
> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> + *    it should also support T(1), T(2)...T(X - 1).
> + *    i.e When adding support for new version of a PDSM struct, libndctl
> + *    and papr_scm should retain support of the existing PDSM struct
> + *    version they support.
> + */
> +
> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
> +	__u16 payload_offset;	/* In: offset from start of struct */
> +	__u16 payload_version;	/* In/Out: version of the payload */
> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
> +} __packed;
> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_scm_pdsm {
> +	PAPR_SCM_PDSM_MIN = 0x0,
> +	PAPR_SCM_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> +	return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> +		return NULL;
> +	else
> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
> +}
> +
> +#endif /* _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 142636e1a59f..ed4b49a6f1e1 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -15,13 +15,15 @@
>  #include <linux/seq_buf.h>
>  
>  #include <asm/plpar_wrappers.h>
> +#include <asm/papr_scm_pdsm.h>
>  
>  #define BIND_ANY_ADDR (~0ul)
>  
>  #define PAPR_SCM_DIMM_CMD_MASK \
>  	((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
> -	 (1ul << ND_CMD_SET_CONFIG_DATA))
> +	 (1ul << ND_CMD_SET_CONFIG_DATA) | \
> +	 (1ul << ND_CMD_CALL))
>  
>  /* DIMM health bitmap bitmap indicators */
>  /* SCM device is unable to persist memory contents */
> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>  	return 0;
>  }
>  
> +/*
> + * Validate the inputs args to dimm-control function and return '0' if valid.
> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> + */
> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> +		       unsigned int buf_len)
> +{
> +	unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> +	struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> +	struct papr_scm_priv *p;
> +
> +	/* Only dimm-specific calls are supported atm */
> +	if (!nvdimm)
> +		return -EINVAL;
> +
> +	/* get the provider data from struct nvdimm */
> +	p = nvdimm_provider_data(nvdimm);
> +
> +	if (!test_bit(cmd, &cmd_mask)) {
> +		dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> +		return -EINVAL;
> +	} else if (cmd == ND_CMD_CALL) {
> +
> +		/* Verify the envelope package */
> +		if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> +			dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> +				buf_len);
> +			return -EINVAL;
> +		}
> +
> +		/* Verify that the PDSM family is valid */
> +		if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR_SCM) {
> +			dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> +				pkg->hdr.nd_family);
> +			return -EINVAL;
> +
> +		}
> +
> +		/* We expect a payload with all PDSM commands */
> +		if (pdsm_cmd_to_payload(pkg) == NULL) {
> +			dev_dbg(&p->pdev->dev,
> +				"Empty payload for sub-command=0x%llx\n",
> +				pkg->hdr.nd_command);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	/* Command looks valid */
> +	return 0;
> +}
> +
> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> +				struct nd_pdsm_cmd_pkg *call_pkg)
> +{
> +	/* unknown subcommands return error in packages */
> +	if (call_pkg->hdr.nd_command <= PAPR_SCM_PDSM_MIN ||
> +	    call_pkg->hdr.nd_command >= PAPR_SCM_PDSM_MAX) {
> +		dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> +			call_pkg->hdr.nd_command);
> +		call_pkg->cmd_status = -EINVAL;
> +		return 0;
> +	}
> +
> +	/* Depending on the DSM command call appropriate service routine */
> +	switch (call_pkg->hdr.nd_command) {
> +	default:
> +		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> +			call_pkg->hdr.nd_command);
> +		call_pkg->cmd_status = -ENOENT;
> +		return 0;
> +	}
> +}
> +
>  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  			  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>  			  unsigned int buf_len, int *cmd_rc)
>  {
>  	struct nd_cmd_get_config_size *get_size_hdr;
>  	struct papr_scm_priv *p;
> +	struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> +	int rc;
>  
> -	/* Only dimm-specific calls are supported atm */
> -	if (!nvdimm)
> -		return -EINVAL;
> +	/* Use a local variable in case cmd_rc pointer is NULL */
> +	if (cmd_rc == NULL)
> +		cmd_rc = &rc;
> +
> +	*cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> +	if (*cmd_rc) {
> +		pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> +		return *cmd_rc;
> +	}
>  
>  	p = nvdimm_provider_data(nvdimm);
>  
> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  		*cmd_rc = papr_scm_meta_set(p, buf);
>  		break;
>  
> +	case ND_CMD_CALL:
> +		call_pkg = nd_to_pdsm_cmd_pkg(buf);
> +		*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> +		break;
> +
>  	default:
> -		return -EINVAL;
> +		dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> +		*cmd_rc = -EINVAL;
>  	}
>  
>  	dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>  
> -	return 0;
> +	return *cmd_rc;
>  }
>  
>  static ssize_t flags_show(struct device *dev,
> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> index de5d90212409..99fb60600ef8 100644
> --- a/include/uapi/linux/ndctl.h
> +++ b/include/uapi/linux/ndctl.h
> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>  #define NVDIMM_FAMILY_HPE2 2
>  #define NVDIMM_FAMILY_MSFT 3
>  #define NVDIMM_FAMILY_HYPERV 4
> +#define NVDIMM_FAMILY_PAPR_SCM 5
>  
>  #define ND_IOCTL_CALL			_IOWR(ND_IOCTL, ND_CMD_CALL,\
>  					struct nd_cmd_pkg)
> -- 
> 2.26.2
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org


* Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Ira Weiny @ 2020-05-20 14:54 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: linux-nvdimm, linux-kernel, Steven Rostedt, Aneesh Kumar K . V,
	linuxppc-dev
In-Reply-To: <20200519190058.257981-4-vaibhav@linux.ibm.com>

On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit big-endian integers, bitwise-and of which is then stored in
> 'struct papr_scm_priv' and subsequently partially exposed to
> user-space via newly introduced dimm specific attribute
> 'papr/flags'. Since the hcall is costly, the health information is
> cached and only re-queried, 60s after the previous successful hcall.
> 
> The patch also adds documentation text describing the flags reported by
> the new sysfs attribute 'papr/flags' at
> Documentation/ABI/testing/sysfs-bus-papr-scm.
> 
> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Changelog:
> 
> Resend:
> * None
> 
> v6..v7 :
> * Used the exported buf_seq_printf() function to generate content for
>   'papr/flags'
> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
>   and removed the papr_scm.h file [Mpe]
> * Fixed some minor consistency issues in sysfs-bus-papr-scm
>   documentation. [Mpe]
> * s/dimm_mutex/health_mutex/g [Mpe]
> * Split drc_pmem_query_health() into two function one of which takes
>   care of caching and locking. [Mpe]
> * Fixed a local copy creation of dimm health information using
>   READ_ONCE(). [Mpe]
> 
> v5..v6 :
> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
>   [Dan Williams]
> * Include documentation for 'papr/flags' attr [Dan Williams]
> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
> * Replaced two __be64 integers from papr_scm_priv to a single u64
>   integer [Mpe]
> * Updated patch description to reflect the changes made in this
>   version.
> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
>   flags_show() [Dan Williams]
> 
> v4..v5 :
> * None
> 
> v3..v4 :
> * None
> 
> v2..v3 :
> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>        	 NVDIMM unarmed [Aneesh]
> 
> v1..v2 :
> * New patch in the series.
> ---
>  Documentation/ABI/testing/sysfs-bus-papr-scm |  27 +++
>  arch/powerpc/platforms/pseries/papr_scm.c    | 169 ++++++++++++++++++-
>  2 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm b/Documentation/ABI/testing/sysfs-bus-papr-scm
> new file mode 100644
> index 000000000000..6143d06072f1
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm
> @@ -0,0 +1,27 @@
> +What:		/sys/bus/nd/devices/nmemX/papr/flags
> +Date:		Apr, 2020
> +KernelVersion:	v5.8
> +Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
> +Description:
> +		(RO) Report flags indicating various states of a
> +		papr-scm NVDIMM device. Each flag maps to one or
> +		more bits set in the dimm-health-bitmap retrieved in
> +		response to H_SCM_HEALTH hcall. The details of the bit
> +		flags returned in response to this hcall are available
> +		at 'Documentation/powerpc/papr_hcalls.rst' . Below are
> +		the flags reported in this sysfs file:
> +
> +		* "not_armed"	: Indicates that NVDIMM contents will not
> +				  survive a power cycle.
> +		* "flush_fail"	: Indicates that NVDIMM contents
> +				  couldn't be flushed during last
> +				  shut-down event.
> +		* "restore_fail": Indicates that NVDIMM contents
> +				  couldn't be restored during NVDIMM
> +				  initialization.
> +		* "encrypted"	: NVDIMM contents are encrypted.
> +		* "smart_notify": There is a health event for the NVDIMM.
> +		* "scrubbed"	: Indicating that contents of the
> +				  NVDIMM have been scrubbed.
> +		* "locked"	: Indicating that NVDIMM contents can't
> +				  be modified until next power cycle.
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index f35592423380..142636e1a59f 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,7 @@
>  #include <linux/libnvdimm.h>
>  #include <linux/platform_device.h>
>  #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>  
>  #include <asm/plpar_wrappers.h>
>  
> @@ -22,6 +23,44 @@
>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
>  	 (1ul << ND_CMD_SET_CONFIG_DATA))
>  
> +/* DIMM health bitmap bitmap indicators */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_SCM_DIMM_UNARMED                   (1ULL << (63 - 0))
> +/* SCM device failed to persist memory contents */
> +#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY            (1ULL << (63 - 1))
> +/* SCM device contents are persisted from previous IPL */
> +#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN            (1ULL << (63 - 2))
> +/* SCM device contents are not persisted from previous IPL */
> +#define PAPR_SCM_DIMM_EMPTY                     (1ULL << (63 - 3))
> +/* SCM device memory life remaining is critically low */
> +#define PAPR_SCM_DIMM_HEALTH_CRITICAL           (1ULL << (63 - 4))
> +/* SCM device will be garded off next IPL due to failure */
> +#define PAPR_SCM_DIMM_HEALTH_FATAL              (1ULL << (63 - 5))
> +/* SCM contents cannot persist due to current platform health status */
> +#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY          (1ULL << (63 - 6))
> +/* SCM device is unable to persist memory contents in certain conditions */
> +#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL       (1ULL << (63 - 7))
> +/* SCM device is encrypted */
> +#define PAPR_SCM_DIMM_ENCRYPTED                 (1ULL << (63 - 8))
> +/* SCM device has been scrubbed and locked */
> +#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED       (1ULL << (63 - 9))
> +
> +/* Bits status indicators for health bitmap indicating unarmed dimm */
> +#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED |		\
> +				    PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +/* Bits status indicators for health bitmap indicating unflushed dimm */
> +#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
> +
> +/* Bits status indicators for health bitmap indicating unrestored dimm */
> +#define PAPR_SCM_DIMM_BAD_RESTORE_MASK  (PAPR_SCM_DIMM_EMPTY)
> +
> +/* Bit status indicators for smart event notification */
> +#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
> +					PAPR_SCM_DIMM_HEALTH_FATAL |	\
> +					PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +/* private struct associated with each region */
>  struct papr_scm_priv {
>  	struct platform_device *pdev;
>  	struct device_node *dn;
> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>  	struct resource res;
>  	struct nd_region *region;
>  	struct nd_interleave_set nd_set;
> +
> +	/* Protect dimm health data from concurrent read/writes */
> +	struct mutex health_mutex;
> +
> +	/* Last time the health information of the dimm was updated */
> +	unsigned long lasthealth_jiffies;
> +
> +	/* Health information for the dimm */
> +	u64 health_bitmap;

I wonder if this should be typed big endian as you mention that it is in the
commit message?

>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>  	return drc_pmem_bind(p);
>  }
>  
> +/*
> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> + * health information.
> + */
> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];

Is this exclusive to 64bit?  Why not u64?

> +	s64 rc;

plpar_hcall() returns long and this function returns int and rc is declared
s64?

Why not have them all be long to follow plpar_hcall?
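
Something along these lines, as a userspace sketch only. fake_hcall() is a
hypothetical stand-in for plpar_hcall(), its return values are made up, and
H_SUCCESS is redefined locally so the example compiles outside the kernel.

```c
#include <errno.h>

#define H_SUCCESS 0L	/* redefined locally for this sketch */

/* Hypothetical stand-in for plpar_hcall(): returns a long status. */
static long fake_hcall(long status, unsigned long *ret)
{
	ret[0] = 0xf0UL;	/* made-up health bitmap */
	ret[1] = 0x30UL;	/* made-up validity mask */
	return status;
}

/* 'rc' and the return type are both long, so nothing is narrowed. */
static long query_health(long hcall_status, unsigned long *bitmap)
{
	unsigned long ret[2] = { 0, 0 };
	long rc;

	rc = fake_hcall(hcall_status, ret);
	if (rc != H_SUCCESS)
		return -ENXIO;

	*bitmap = ret[0] & ret[1];
	return 0;
}
```

The caller can then keep its own 'rc' as long as well, so the value never
changes type on the way up.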

> +
> +	/* issue the hcall */
> +	rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> +	if (rc != H_SUCCESS) {
> +		dev_err(&p->pdev->dev,
> +			 "Failed to query health information, Err:%lld\n", rc);
> +		rc = -ENXIO;
> +		goto out;
> +	}
> +
> +	p->lasthealth_jiffies = jiffies;
> +	p->health_bitmap = ret[0] & ret[1];
> +
> +	dev_dbg(&p->pdev->dev,
> +		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> +		ret[0], ret[1]);
> +out:
> +	return rc;
> +}
> +
> +/* Min interval in seconds for assuming stable dimm health */
> +#define MIN_HEALTH_QUERY_INTERVAL 60
> +
> +/* Query cached health info and if needed call drc_pmem_query_health */
> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> +	unsigned long cache_timeout;
> +	s64 rc;
> +
> +	/* Protect concurrent modifications to papr_scm_priv */
> +	rc = mutex_lock_interruptible(&p->health_mutex);
> +	if (rc)
> +		return rc;
> +
> +	/* Jiffies offset for which the health data is assumed to be same */
> +	cache_timeout = p->lasthealth_jiffies +
> +		msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> +
> +	/* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
> +	if (time_after(jiffies, cache_timeout))
> +		rc = __drc_pmem_query_health(p);

And back to s64 after returning int?

> +	else
> +		/* Assume cached health data is valid */
> +		rc = 0;
> +
> +	mutex_unlock(&p->health_mutex);
> +	return rc;
> +}
>  
>  static int papr_scm_meta_get(struct papr_scm_priv *p,
>  			     struct nd_cmd_get_config_data_hdr *hdr)
> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  	return 0;
>  }
>  
> +static ssize_t flags_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct nvdimm *dimm = to_nvdimm(dev);
> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> +	struct seq_buf s;
> +	u64 health;
> +	int rc;
> +
> +	rc = drc_pmem_query_health(p);

and back to int...

Just make them long all through...

Ira

> +	if (rc)
> +		return rc;
> +
> +	/* Copy health_bitmap locally, check masks & update out buffer */
> +	health = READ_ONCE(p->health_bitmap);
> +
> +	seq_buf_init(&s, buf, PAGE_SIZE);
> +	if (health & PAPR_SCM_DIMM_UNARMED_MASK)
> +		seq_buf_printf(&s, "not_armed ");
> +
> +	if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
> +		seq_buf_printf(&s, "flush_fail ");
> +
> +	if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
> +		seq_buf_printf(&s, "restore_fail ");
> +
> +	if (health & PAPR_SCM_DIMM_ENCRYPTED)
> +		seq_buf_printf(&s, "encrypted ");
> +
> +	if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
> +		seq_buf_printf(&s, "smart_notify ");
> +
> +	if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
> +		seq_buf_printf(&s, "scrubbed locked ");
> +
> +	if (seq_buf_used(&s))
> +		seq_buf_printf(&s, "\n");
> +
> +	return seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(flags);
> +
> +/* papr_scm specific dimm attributes */
> +static struct attribute *papr_scm_nd_attributes[] = {
> +	&dev_attr_flags.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group papr_scm_nd_attribute_group = {
> +	.name = "papr",
> +	.attrs = papr_scm_nd_attributes,
> +};
> +
> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
> +	&papr_scm_nd_attribute_group,
> +	NULL,
> +};
> +
>  static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  {
>  	struct device *dev = &p->pdev->dev;
> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>  	dimm_flags = 0;
>  	set_bit(NDD_LABELING, &dimm_flags);
>  
> -	p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> -				  PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> +	p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
> +				  dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>  	if (!p->nvdimm) {
>  		dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
>  		goto err;
> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
>  	if (!p)
>  		return -ENOMEM;
>  
> +	/* Initialize the dimm mutex */
> +	mutex_init(&p->health_mutex);
> +
>  	/* optional DT properties */
>  	of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>  
> -- 
> 2.26.2


* Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
From: Aneesh Kumar K.V @ 2020-05-20 14:29 UTC (permalink / raw)
  To: Christophe Leroy, Rui Salvaterra; +Cc: debian-powerpc, linuxppc-dev
In-Reply-To: <dbaa79c9-dfae-9cb1-cac4-3a198ca28cf0@csgroup.eu>

On 5/20/20 7:23 PM, Christophe Leroy wrote:
> 
> 
> Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>
>>> Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :
>>>> Hi again, Christophe,
>>>>
>>>> On Mon, 18 May 2020 at 15:03, Christophe Leroy
>>>> <christophe.leroy@csgroup.eu> wrote:
>>>>>
>>>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may
>>>>> have broken swap.
>>>>
>>>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top
>>>> survives the beating. I'll be happy to test a definitive patch!
>>>>
>>>
>>> Yeah I discovered recently that the way swap is implemented on powerpc
>>> expects RW and other important bits not be one of the 3 least
>>> significant bits (see __pte_to_swp_entry() )
>>
>> The last 3 bits are there to track the _PAGE_PRESENT right? What is the
>> RW dependency there? Are you suggesting a read/write migration entry?
>> A swap entry should not retain the pte rw bits right?
>>
>> A swap entry is built using swap type + offset. And it should not have a
>> dependency on pte RW bits. Along with type and offset we also should
>> have the ability to mark it as a pte entry and also set not present
>> bits. With that understanding what am I missing here?
> 
> That's probably me who is missing something, I have not dug into the
> swap functioning yet indeed, so that was only my first feeling.
> 
> By the way, the problem is definitely due to the order changes in the
> PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits
> or whether that's because _PAGE_PRESENT was moved out of the last 3
> bits, I don't know yet.
> 
> My (bad) understanding is from the fact that __pte_to_swp_entry() is a
> right shift by 3 bits, so it loses the last 3 bits, and therefore
> __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a
> PTE.
> 
> Is there somewhere a description of how swap works exactly ?
> 

Looking at __set_pte_at(), I am wondering whether this was due to
_PAGE_HASHPTE. This would mean we end up wrongly updating some swap
entry details. We call set_pte_at() on swap pte entries.

-aneesh





* Re: Endless soft-lockups for compiling workload since next-20200519
From: Qian Cai @ 2020-05-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexandre Chartre, Paul E. McKenney, Frederic Weisbecker,
	Linux Kernel Mailing List, Borislav Petkov, Thomas Gleixner,
	linuxppc-dev
In-Reply-To: <20200520125056.GC325280@hirez.programming.kicks-ass.net>

On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > Just a head up. Repeatedly compiling kernels for a while would trigger
> > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > .config are in,
> 
> Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> not seen anything like that myself. Let me go have a look.

Yes, I ended up figuring out the same commit a bit earlier. Since then I
reverted that commit and its dependency,

2a0a24ebb499 ("sched: Make scheduler_ipi inline")

Everything works fine so far.

> 
> 
> In as far as the logs are readable (they're a wrapped mess, please don't
> do that!), they contain very little useful, as is typical with IPIs :/

Sorry about that. I forgot that gmail webUI will wrap things around. I will
switch to mutt.

> 
> > [ 1167.993773][    C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > flush_smp_call_function_queue+0x1fa/0x2e0
> > [ 1168.003333][    C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat
> > fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars
> > acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi
> > scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror
> > dm_region_hash dm_log dm_mod
> > [ 1168.029492][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> > 5.7.0-rc6-next-20200519 #1
> > [ 1168.037665][    C1] Hardware name: HPE ProLiant DL385
> > Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > [ 1168.046978][    C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0
> > [ 1168.053658][    C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe
> > ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b
> > e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25
> > 10 52 3f 71
> > [ 1168.073262][    C1] RSP: 0018:ffffc90000178918 EFLAGS: 00010046
> > [ 1168.079253][    C1] RAX: 0000000000000000 RBX: ffff8888430c58f8
> > RCX: ffffffff8ec26083
> > [ 1168.087156][    C1] RDX: 0000000000000003 RSI: dffffc0000000000
> > RDI: ffff8888430c58f8
> > [ 1168.095054][    C1] RBP: ffffc900001789a8 R08: ffffed1108618cec
> > R09: ffffed1108618cec
> > [ 1168.102964][    C1] R10: ffff8888430c675b R11: 0000000000000000
> > R12: ffff8888430c58e0
> > [ 1168.110866][    C1] R13: ffffffff8eb30c40 R14: ffff8888430c5880
> > R15: ffff8888430c58e0
> > [ 1168.118767][    C1] FS:  0000000000000000(0000)
> > GS:ffff888843080000(0000) knlGS:0000000000000000
> > [ 1168.127628][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1168.134129][    C1] CR2: 000055b169604560 CR3: 0000000d08a14000
> > CR4: 00000000003406e0
> > [ 1168.142026][    C1] Call Trace:
> > [ 1168.145206][    C1]  <IRQ>
> > [ 1168.147957][    C1]  ? smp_call_on_cpu_callback+0xd0/0xd0
> > [ 1168.153421][    C1]  ? rcu_read_lock_sched_held+0xac/0xe0
> > [ 1168.158880][    C1]  ? rcu_read_lock_bh_held+0xc0/0xc0
> > [ 1168.164076][    C1]  generic_smp_call_function_single_interrupt+0x13/0x2b
> > [ 1168.170938][    C1]  smp_call_function_single_interrupt+0x157/0x4e0
> > [ 1168.177278][    C1]  ? smp_call_function_interrupt+0x4e0/0x4e0
> > [ 1168.183172][    C1]  ? interrupt_entry+0xe4/0xf0
> > [ 1168.187846][    C1]  ? trace_hardirqs_off_caller+0x8d/0x1f0
> > [ 1168.193478][    C1]  ? trace_hardirqs_on_caller+0x1f0/0x1f0
> > [ 1168.199116][    C1]  ? _nohz_idle_balance+0x221/0x360
> > [ 1168.204228][    C1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > [ 1168.209690][    C1]  call_function_single_interrupt+0xf/0x20


* Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
From: Christophe Leroy @ 2020-05-20 13:53 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Rui Salvaterra; +Cc: debian-powerpc, linuxppc-dev
In-Reply-To: <877dx6g1rr.fsf@linux.ibm.com>

Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> 
>> Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :
>>> Hi again, Christophe,
>>>
>>> On Mon, 18 May 2020 at 15:03, Christophe Leroy
>>> <christophe.leroy@csgroup.eu> wrote:
>>>>
>>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may
>>>> have broken swap.
>>>
>>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top
>>> survives the beating. I'll be happy to test a definitive patch!
>>>
>>
>> Yeah I discovered recently that the way swap is implemented on powerpc
>> expects RW and other important bits not be one of the 3 least
>> significant bits (see __pte_to_swp_entry() )
> 
> The last 3 bits are there to track the _PAGE_PRESENT right? What is the
> RW dependency there? Are you suggesting a read/write migration entry?
> A swap entry should not retain the pte rw bits right?
> 
> A swap entry is built using swap type + offset. And it should not have a
> dependency on pte RW bits. Along with type and offset we also should
> have the ability to mark it as a pte entry and also set not present
> bits. With that understanding what am I missing here?

That's probably me who is missing something, I have not dug into the
swap functioning yet indeed, so that was only my first feeling.

By the way, the problem is definitely due to the order changes in the
PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits
or whether that's because _PAGE_PRESENT was moved out of the last 3
bits, I don't know yet.

My (bad) understanding is from the fact that __pte_to_swp_entry() is a
right shift by 3 bits, so it loses the last 3 bits, and therefore
__swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a PTE.
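
To illustrate what I mean, here is a simplified userspace model. It assumes
the conversion is nothing more than a 3-bit shift in each direction, as
described above; the model_*() helpers and SWP_SHIFT are made-up names and
the real macros may do more than this.

```c
#include <stdint.h>

#define SWP_SHIFT 3	/* low bits dropped by the conversion in this model */

/* Simplified model of __pte_to_swp_entry(): shift the PTE value right. */
static uint32_t model_pte_to_swp(uint32_t pte)
{
	return pte >> SWP_SHIFT;
}

/* Simplified model of __swp_entry_to_pte(): shift the entry back left. */
static uint32_t model_swp_to_pte(uint32_t swp)
{
	return swp << SWP_SHIFT;
}

/* Bits of 'pte' that do not survive the round trip. */
static uint32_t model_lost_bits(uint32_t pte)
{
	return pte ^ model_swp_to_pte(model_pte_to_swp(pte));
}
```

In this model any flag placed in the 3 least significant bits is gone after
the round trip, while everything above them is preserved.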

Is there somewhere a description of how swap works exactly ?

Christophe


* Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
From: rananta @ 2020-05-20 13:49 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Greg KH, andrew, linuxppc-dev, linux-kernel
In-Reply-To: <cb5bd2b2-f33a-b541-ed3c-70da14c7252d@suse.cz>

On 2020-05-20 02:38, Jiri Slaby wrote:
> On 15. 05. 20, 1:22, rananta@codeaurora.org wrote:
>> On 2020-05-13 00:04, Greg KH wrote:
>>> On Tue, May 12, 2020 at 02:39:50PM -0700, rananta@codeaurora.org wrote:
>>>> On 2020-05-12 01:25, Greg KH wrote:
>>>> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
>>>> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
>>>> > > Author: Jiri Slaby <jslaby@suse.cz>
>>>> > > Date:   Tue Aug 7 21:48:04 2012 +0200
>>>> > >
>>>> > >     TTY: hvc_console, add tty install
>>>> > >
>>>> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
>>>> > > hvc_open's fail path to hvc_cleanup.
>>>> > >
>>>> > > IOW hvc_open now NULLs tty->driver_data even for another task which
>>>> > > opened the tty earlier. The same holds for
>>>> > > "tty_port_tty_set(&hp->port, NULL);" there. And actually
>>>> > > "tty_port_put(&hp->port);" is also incorrect for the 2nd task
>>>> > > opening the tty.
>>>> > >
> 
> ...
> 
>> These are the traces you get when the issue happens:
>> [  154.212291] hvc_install called for pid: 666
>> [  154.216705] hvc_open called for pid: 666
>> [  154.233657] hvc_open: request_irq failed with rc -22.
>> [  154.238934] hvc_open called for pid: 678
>> [  154.243012] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c4
>> # hvc_install isn't called for pid: 678 as the file wasn't closed yet.
> 
> Nice. Does the attached help?
Yeah, it looks good. I think it also eliminates the port.count reference
issue discussed on the v2 patch. Are you planning to mainline this?
> 
> I wonder how comes the tty_port_put in hvc_open does not cause a UAF? I
> would say hvc_open fails, tty_port_put is called. It decrements the
> reference taken in hvc_install. So far so good.
> 
> Now, this should happen IMO:
> tty_open
>   -> hvc_open (fails)
>     -> tty_port_put
hvc_console driver defines port->ops->destruct(). Upon tty_port_put(), the
tty_port_destructor() calls port->ops->destruct(), rather than kfree(port).
>   -> tty_release
>     -> tty_release_struct
>       -> tty_kref_put
>         -> queue_release_one_tty
> SCHEDULED WORKQUEUE
> release_one_tty
>   -> hvc_cleanup
>     -> tty_port_put (should die terrible death now)
Since port is not free'd, I think we should be good.
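
A rough userspace model of the dispatch I am referring to. This is only a
sketch: the model_port types and helpers are hypothetical names, and the
refcounting is reduced to a plain integer instead of a kref.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_port;

struct model_port_ops {
	void (*destruct)(struct model_port *port);
};

struct model_port {
	int refcount;
	const struct model_port_ops *ops;
	bool destruct_called;
};

/* Driver-provided hook: the driver decides what gets released. */
static void model_destruct(struct model_port *port)
{
	port->destruct_called = true;
}

/* Drop one reference; on the last put, prefer ops->destruct() over free(). */
static bool model_port_put(struct model_port *port)
{
	if (--port->refcount > 0)
		return false;

	if (port->ops && port->ops->destruct)
		port->ops->destruct(port);
	else
		free(port);
	return true;	/* the last reference is gone */
}
```

In this model the hook only runs when the last reference is dropped, and
the generic code does not free the port itself when a hook is installed.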
> 
> What am I missing?
> 
> thanks,

Thank you.
Raghavendra


* Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6, 7)
From: Aneesh Kumar K.V @ 2020-05-20 13:43 UTC (permalink / raw)
  To: Christophe Leroy, Rui Salvaterra; +Cc: debian-powerpc, linuxppc-dev
In-Reply-To: <c00ed41c-e13e-6bd6-4084-501ca14adb4c@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :
>> Hi again, Christophe,
>> 
>> On Mon, 18 May 2020 at 15:03, Christophe Leroy
>> <christophe.leroy@csgroup.eu> wrote:
>>>
>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may
>>> have broken swap.
>> 
>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top
>> survives the beating. I'll be happy to test a definitive patch!
>> 
>
> Yeah, I discovered recently that the way swap is implemented on powerpc
> expects RW and other important bits not to be among the 3 least
> significant bits (see __pte_to_swp_entry())

The last 3 bits are there to track _PAGE_PRESENT, right? What is the
RW dependency there? Are you suggesting a read/write migration entry?
A swap entry should not retain the pte RW bits, right?

A swap entry is built using swap type + offset. And it should not have a
dependency on pte RW bits. Along with type and offset we also should
have the ability to mark it as a pte entry and also set not present
bits. With that understanding what am I missing here?
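
A small sketch of the "type + offset" encoding described above. The bit
layout is hypothetical (3 reserved low bits, 5 type bits, the rest for
the offset) and is not powerpc's actual swap entry format; it only shows
that such an entry can be built without carrying any pte permission bits.

```c
#include <assert.h>
#include <stdint.h>

#define SWP_RESERVED_BITS 3u	/* hypothetical: low bits kept clear */
#define SWP_TYPE_BITS     5u
#define SWP_TYPE_MASK     ((1u << SWP_TYPE_BITS) - 1)
#define SWP_OFFSET_SHIFT  (SWP_RESERVED_BITS + SWP_TYPE_BITS)

/* Pack a swap type and offset into a 32-bit entry. */
static uint32_t swp_encode(uint32_t type, uint32_t offset)
{
	assert(type <= SWP_TYPE_MASK);
	assert(offset < (1u << (32 - SWP_OFFSET_SHIFT)));
	return (offset << SWP_OFFSET_SHIFT) | (type << SWP_RESERVED_BITS);
}

static uint32_t swp_type(uint32_t entry)
{
	return (entry >> SWP_RESERVED_BITS) & SWP_TYPE_MASK;
}

static uint32_t swp_offset(uint32_t entry)
{
	return entry >> SWP_OFFSET_SHIFT;
}
```

If a permission bit happened to live inside the reserved low bits, the
shift in swp_encode() would be the place where the two uses collide.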

>
> I guess the easiest for the time being is to revert the commit with a 
> proper explanation of the issue, then one day we'll modify the way 
> powerpc manages swap.
>

-aneesh

^ permalink raw reply

* [PATCH] powerpc/64s: Disable STRICT_KERNEL_RWX
From: Michael Ellerman @ 2020-05-20 13:36 UTC (permalink / raw)
  To: linuxppc-dev

Several strange crashes have been eventually traced back to
STRICT_KERNEL_RWX and its interaction with code patching.

Various paths in our ftrace, kprobes and other patching code need to
be hardened against patching failures, otherwise we can end up running
with partially/incorrectly patched ftrace paths, kprobes or jump
labels, which can then cause strange crashes.

Although fixes for those are in development, they're not -rc material.

There also seem to be problems with the underlying strict RWX logic,
which needs further debugging.

So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from
enabling the option and tripping over the bugs.

Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..d13b5328ca10 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -130,7 +130,7 @@ config PPC
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
-	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC32 && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
 	select ARCH_HAS_UACCESS_MCSAFE		if PPC64
-- 
2.25.1

^ permalink raw reply related

* Re: Endless soft-lockups for compiling workload since next-20200519
From: Peter Zijlstra @ 2020-05-20 12:50 UTC (permalink / raw)
  To: Qian Cai
  Cc: Paul E. McKenney, Frederic Weisbecker, Linux Kernel Mailing List,
	Borislav Petkov, Thomas Gleixner, linuxppc-dev
In-Reply-To: <CAG=TAF6jUsQrW-fjbS3vpjkMfn8=MUDsuQxjk3NMfvQa250RHA@mail.gmail.com>

On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> Just a heads-up. Repeatedly compiling kernels for a while would trigger
> endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> .config are in,

Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
not seen anything like that myself. Let me go have a look.


In as far as the logs are readable (they're a wrapped mess, please don't
do that!), they contain very little that is useful, as is typical with IPIs :/

> [ 1167.993773][    C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> flush_smp_call_function_queue+0x1fa/0x2e0
> [ 1168.003333][    C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat
> fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars
> acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi
> scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror
> dm_region_hash dm_log dm_mod
> [ 1168.029492][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
> 5.7.0-rc6-next-20200519 #1
> [ 1168.037665][    C1] Hardware name: HPE ProLiant DL385
> Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> [ 1168.046978][    C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0
> [ 1168.053658][    C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe
> ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b
> e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25
> 10 52 3f 71
> [ 1168.073262][    C1] RSP: 0018:ffffc90000178918 EFLAGS: 00010046
> [ 1168.079253][    C1] RAX: 0000000000000000 RBX: ffff8888430c58f8
> RCX: ffffffff8ec26083
> [ 1168.087156][    C1] RDX: 0000000000000003 RSI: dffffc0000000000
> RDI: ffff8888430c58f8
> [ 1168.095054][    C1] RBP: ffffc900001789a8 R08: ffffed1108618cec
> R09: ffffed1108618cec
> [ 1168.102964][    C1] R10: ffff8888430c675b R11: 0000000000000000
> R12: ffff8888430c58e0
> [ 1168.110866][    C1] R13: ffffffff8eb30c40 R14: ffff8888430c5880
> R15: ffff8888430c58e0
> [ 1168.118767][    C1] FS:  0000000000000000(0000)
> GS:ffff888843080000(0000) knlGS:0000000000000000
> [ 1168.127628][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1168.134129][    C1] CR2: 000055b169604560 CR3: 0000000d08a14000
> CR4: 00000000003406e0
> [ 1168.142026][    C1] Call Trace:
> [ 1168.145206][    C1]  <IRQ>
> [ 1168.147957][    C1]  ? smp_call_on_cpu_callback+0xd0/0xd0
> [ 1168.153421][    C1]  ? rcu_read_lock_sched_held+0xac/0xe0
> [ 1168.158880][    C1]  ? rcu_read_lock_bh_held+0xc0/0xc0
> [ 1168.164076][    C1]  generic_smp_call_function_single_interrupt+0x13/0x2b
> [ 1168.170938][    C1]  smp_call_function_single_interrupt+0x157/0x4e0
> [ 1168.177278][    C1]  ? smp_call_function_interrupt+0x4e0/0x4e0
> [ 1168.183172][    C1]  ? interrupt_entry+0xe4/0xf0
> [ 1168.187846][    C1]  ? trace_hardirqs_off_caller+0x8d/0x1f0
> [ 1168.193478][    C1]  ? trace_hardirqs_on_caller+0x1f0/0x1f0
> [ 1168.199116][    C1]  ? _nohz_idle_balance+0x221/0x360
> [ 1168.204228][    C1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [ 1168.209690][    C1]  call_function_single_interrupt+0xf/0x20

^ permalink raw reply

* Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
From: rananta @ 2020-05-20 12:43 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: gregkh, andrew, linuxppc-dev, linux-kernel, stable
In-Reply-To: <f84a9da7-bb0f-7538-fa00-968c9625335b@suse.cz>

On 2020-05-20 01:59, Jiri Slaby wrote:
> On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
>> Potentially, hvc_open() can be called in parallel when two tasks call
>> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
>> callback in the function fails, where it sets the tty->driver_data to
>> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
>> Hence, do a NULL check at the beginning, before proceeding ahead.
>> 
>> The issue can be easily reproduced by launching two tasks simultaneously
>> that do an open() call on /dev/hvcX.
>> For example:
>> $ cat /dev/hvc0 & cat /dev/hvc0 &
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Raghavendra Rao Ananta <rananta@codeaurora.org>
>> ---
>>  drivers/tty/hvc/hvc_console.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
>> index 436cc51c92c3..80709f754cc8 100644
>> --- a/drivers/tty/hvc/hvc_console.c
>> +++ b/drivers/tty/hvc/hvc_console.c
>> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>>  	unsigned long flags;
>>  	int rc = 0;
>> 
>> +	if (!hp)
>> +		return -ENODEV;
>> +
> 
> This is still not fixing the bug properly. See:
> https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f30c2@suse.cz/
> 
> In particular, the paragraph starting "IOW".
> 
You are right. This doesn't fix the problem entirely. There are other parts
of it that the driver does not handle cleanly. Apart from the things you've
mentioned, it doesn't seem to handle hp->port.count correctly either.

hvc_open:
   hp->port.count++
   hp->ops->notifier_add(hp, hp->data) fails
   tty->driver_data = NULL

hvc_close:
   returns immediately as tty->driver_data == NULL, without hp->port.count--

This would leave the port in a stale state, and the second caller of
hvc_open doesn't get a chance to call/retry hp->ops->notifier_add(hp, hp->data);
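
The sequence above can be sketched as a user-space model. This is not the
driver code: struct model_hvc, struct model_tty and the notifier_fails
flag are made up for illustration, and -EINVAL merely mirrors the rc -22
seen in the log earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_hvc {
	int count;		/* stands in for hp->port.count */
	int notifier_calls;	/* how often notifier_add was attempted */
};

struct model_tty {
	struct model_hvc *driver_data;
};

static int model_hvc_open(struct model_tty *tty, int notifier_fails)
{
	struct model_hvc *hp = tty->driver_data;

	if (!hp)
		return -ENODEV;		/* the NULL check added by the patch */
	if (hp->count++ > 0)
		return 0;		/* not the first opener: notifier skipped */
	hp->notifier_calls++;
	if (notifier_fails) {
		tty->driver_data = NULL;	/* count is left elevated */
		return -EINVAL;
	}
	return 0;
}

static void model_hvc_close(struct model_tty *tty)
{
	struct model_hvc *hp = tty->driver_data;

	if (!hp)
		return;			/* early return: count never dropped */
	assert(hp->count > 0);
	hp->count--;
}
```

After a failed first open followed by close, count is still 1 in this
model, so a later opener takes the "not first" path and the notifier is
never retried.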

However, the patch is not trying to address the logical issues with hvc_open
and hvc_close. It's only trying to eliminate the potential NULL pointer
dereference, leading to a panic. From what I see, hvc_open is serialized by
tty_lock, and adding a NULL check here prevents the second caller from
proceeding.
> thanks,

Thank you.
Raghavendra

^ permalink raw reply

* Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
From: Mark Brown @ 2020-05-20 12:38 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: sumit.semwal, linaro-mm-sig, Linux-ALSA, linuxppc-dev,
	linux-kernel, Timur Tabi, Xiubo Li, shawnguo, Shengjiu Wang,
	Takashi Iwai, Liam Girdwood, dri-devel, perex, Nicolin Chen,
	linux-imx, kernel, linux-media, Fabio Estevam, s.hauer,
	linux-arm-kernel, Lucas Stach
In-Reply-To: <CAA+D8APAMRwtVneqFsuBgAhozmQo3R0AQi0bVdUCQO4Af4xVfw@mail.gmail.com>

On Wed, May 20, 2020 at 07:22:19PM +0800, Shengjiu Wang wrote:

> I see some drivers also request the DMA channel in open() or hw_params().
> How can they avoid the deferred probe issue?
> For example:
> sound/arm/pxa2xx-pcm-lib.c
> sound/soc/sprd/sprd-pcm-dma.c

Other drivers having problems means those drivers should be fixed, not
that we should copy the problems.  In the case of the PXA driver that's
very old code which predates deferred probe by I'd guess a decade.

^ permalink raw reply

* Re: [PATCH] powerpc: Add ppc_inst_next()
From: Christophe Leroy @ 2020-05-20 12:30 UTC (permalink / raw)
  To: Jordan Niethe, Michael Ellerman; +Cc: Christophe Leroy, linuxppc-dev
In-Reply-To: <CACzsE9p2c2ZLny86eOEtbyoiYtSNp0kmw9KE7GdfdxhqhWwLOQ@mail.gmail.com>



On 20/05/2020 at 14:21, Jordan Niethe wrote:
> On Wed, May 20, 2020 at 9:44 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> In a few places we want to calculate the address of the next
>> instruction. Previously that was simple, we just added 4 bytes, or if
>> using a u32 * we incremented that pointer by 1.
>>
>> But prefixed instructions make it more complicated, we need to advance
>> by either 4 or 8 bytes depending on the actual instruction. We also
>> can't do pointer arithmetic using struct ppc_inst, because it is
>> always 8 bytes in size on 64-bit, even though we might only need to
>> advance by 4 bytes.
>>
>> So add a ppc_inst_next() helper which calculates the location of the
>> next instruction, if the given instruction was located at the given
>> address. Note the instruction doesn't need to actually be at the
>> address in memory.
>>
>> Convert several locations to use it.
>>
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>> ---
>>   arch/powerpc/include/asm/inst.h   |  9 +++++++++
>>   arch/powerpc/kernel/uprobes.c     |  2 +-
>>   arch/powerpc/lib/feature-fixups.c | 10 +++++-----
>>   arch/powerpc/xmon/xmon.c          |  2 +-
>>   4 files changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
>> index d82e0c99cfa1..7d5ee1309b92 100644
>> --- a/arch/powerpc/include/asm/inst.h
>> +++ b/arch/powerpc/include/asm/inst.h
>> @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x)
>>          return ppc_inst_prefixed(x) ? 8 : 4;
>>   }
>>
>> +/*
>> + * Return the address of the next instruction, if the instruction @value was
>> + * located at @location.
>> + */
>> +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value)
>> +{
>> +       return location + ppc_inst_len(value);
>> +}
> I think this is a good idea. I tried something similar in the initial
> post for an instruction type. I had:
> +#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr))))
> but how you've got it is much more clear/usable.

Yes I agree

> I wonder why not
> +static inline struct ppc_inst *ppc_inst_next(void *location)
> +{
> +       return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location));
> +}

Because, as Michael explains, the instruction to be skipped might not yet
be at the pointed-to memory location (for instance in insert_bpts())
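
A user-space sketch of why the helper takes the instruction by value.
The struct and function names are stand-ins for the kernel's struct
ppc_inst helpers, and the prefix test (primary opcode 1 in the top six
bits of the first word) is an assumption made for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_inst {
	uint32_t val;		/* first (or only) word */
	uint32_t suffix;	/* second word of a prefixed instruction */
};

static int model_inst_prefixed(struct model_inst x)
{
	return (x.val >> 26) == 1;	/* assumed prefix encoding */
}

static int model_inst_len(struct model_inst x)
{
	return model_inst_prefixed(x) ? 8 : 4;
}

/*
 * Address of the instruction that would follow @value if @value were
 * stored at @location. The memory at @location is never read.
 */
static void *model_inst_next(void *location, struct model_inst value)
{
	assert(location != NULL);
	return (char *)location + model_inst_len(value);
}
```

Because only the value is inspected, the caller can compute the next
address before the instruction has been patched into place.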

> 
>> +
>>   int probe_user_read_inst(struct ppc_inst *inst,
>>                           struct ppc_inst __user *nip);
>>
>> diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
>> index 83e883e1a42d..683ba76919a7 100644
>> --- a/arch/powerpc/kernel/uprobes.c
>> +++ b/arch/powerpc/kernel/uprobes.c
>> @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>>           * support doesn't exist and have to fix-up the next instruction
>>           * to be executed.
>>           */
>> -       regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn));
>> +       regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn);
>>
>>          user_disable_single_step(current);
>>          return 0;
>> diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
>> index 80f320c2e189..0ad01eebf112 100644
>> --- a/arch/powerpc/lib/feature-fixups.c
>> +++ b/arch/powerpc/lib/feature-fixups.c
>> @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur)
>>          src = alt_start;
>>          dest = start;
>>
>> -       for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)),
>> -            (dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest)))) {
>> +       for (; src < alt_end; src = ppc_inst_next(src, *src),
>> +                             dest = ppc_inst_next(dest, *dest)) {
> The reason to maybe use ppc_inst_read() in the helper instead of just
> *dest would be we don't always need to read 8 bytes.

And reading 8 bytes might trigger a page fault if we are reading the
very last non-prefixed instruction of the last page.
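
One way to picture the concern is a read that fetches the second word only
when the first word is a prefix. This is a user-space sketch with
hypothetical names, not the kernel's ppc_inst_read(); the prefix test
(top six bits equal to 1) is assumed for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_inst {
	uint32_t val;
	uint32_t suffix;
};

static int sketch_word_is_prefix(uint32_t word)
{
	return (word >> 26) == 1;	/* assumed prefix encoding */
}

/*
 * Read one instruction starting at @p. The second word is touched only
 * when the first word is a prefix; @words_read reports how many 32-bit
 * words were actually accessed.
 */
static struct sketch_inst sketch_inst_read(const uint32_t *p, int *words_read)
{
	struct sketch_inst x;

	assert(p != NULL && words_read != NULL);
	x.val = p[0];
	x.suffix = 0;
	*words_read = 1;
	if (sketch_word_is_prefix(x.val)) {
		x.suffix = p[1];
		*words_read = 2;
	}
	return x;
}
```

A plain struct dereference would instead access both words
unconditionally, which is the case that can cross into an unmapped page.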

>>                  if (patch_alt_instruction(src, dest, alt_start, alt_end))
>>                          return 1;
>>          }
>>
>> -       for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP)))
>> +       for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP)))
> But then you wouldn't be able to do this as easily I guess.
>>                  raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP));
>>
>>          return 0;
>> @@ -405,8 +405,8 @@ static void do_final_fixups(void)
>>          while (src < end) {
>>                  inst = ppc_inst_read(src);
>>                  raw_patch_instruction(dest, inst);
>> -               src = (void *)src + ppc_inst_len(inst);
>> -               dest = (void *)dest + ppc_inst_len(inst);
>> +               src = ppc_inst_next(src, *src);
>> +               dest = ppc_inst_next(dest, *dest);
>>          }
>>   #endif
>>   }
>> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
>> index fb135f2cd6b0..aa123f56b7d4 100644
>> --- a/arch/powerpc/xmon/xmon.c
>> +++ b/arch/powerpc/xmon/xmon.c
>> @@ -939,7 +939,7 @@ static void insert_bpts(void)
>>                  }
>>
>>                  patch_instruction(bp->instr, instr);
>> -               patch_instruction((void *)bp->instr + ppc_inst_len(instr),
>> +               patch_instruction(ppc_inst_next(bp->instr, instr),
>>                                    ppc_inst(bpinstr));
>>                  if (bp->enabled & BP_CIABR)
>>                          continue;
>> --
>> 2.25.1
>>

Christophe

^ permalink raw reply

* Re: [PATCH] powerpc: Add ppc_inst_next()
From: Jordan Niethe @ 2020-05-20 12:21 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Christophe Leroy, linuxppc-dev
In-Reply-To: <20200520114446.956215-1-mpe@ellerman.id.au>

On Wed, May 20, 2020 at 9:44 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> In a few places we want to calculate the address of the next
> instruction. Previously that was simple, we just added 4 bytes, or if
> using a u32 * we incremented that pointer by 1.
>
> But prefixed instructions make it more complicated, we need to advance
> by either 4 or 8 bytes depending on the actual instruction. We also
> can't do pointer arithmetic using struct ppc_inst, because it is
> always 8 bytes in size on 64-bit, even though we might only need to
> advance by 4 bytes.
>
> So add a ppc_inst_next() helper which calculates the location of the
> next instruction, if the given instruction was located at the given
> address. Note the instruction doesn't need to actually be at the
> address in memory.
>
> Convert several locations to use it.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/include/asm/inst.h   |  9 +++++++++
>  arch/powerpc/kernel/uprobes.c     |  2 +-
>  arch/powerpc/lib/feature-fixups.c | 10 +++++-----
>  arch/powerpc/xmon/xmon.c          |  2 +-
>  4 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index d82e0c99cfa1..7d5ee1309b92 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x)
>         return ppc_inst_prefixed(x) ? 8 : 4;
>  }
>
> +/*
> + * Return the address of the next instruction, if the instruction @value was
> + * located at @location.
> + */
> +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value)
> +{
> +       return location + ppc_inst_len(value);
> +}
I think this is a good idea. I tried something similar in the initial
post for an instruction type. I had:
+#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr))))
but how you've got it is much more clear/usable.
I wonder why not
+static inline struct ppc_inst *ppc_inst_next(void *location)
+{
+       return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location));
+}

> +
>  int probe_user_read_inst(struct ppc_inst *inst,
>                          struct ppc_inst __user *nip);
>
> diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
> index 83e883e1a42d..683ba76919a7 100644
> --- a/arch/powerpc/kernel/uprobes.c
> +++ b/arch/powerpc/kernel/uprobes.c
> @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
>          * support doesn't exist and have to fix-up the next instruction
>          * to be executed.
>          */
> -       regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn));
> +       regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn);
>
>         user_disable_single_step(current);
>         return 0;
> diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
> index 80f320c2e189..0ad01eebf112 100644
> --- a/arch/powerpc/lib/feature-fixups.c
> +++ b/arch/powerpc/lib/feature-fixups.c
> @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur)
>         src = alt_start;
>         dest = start;
>
> -       for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)),
> -            (dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest)))) {
> +       for (; src < alt_end; src = ppc_inst_next(src, *src),
> +                             dest = ppc_inst_next(dest, *dest)) {
The reason to maybe use ppc_inst_read() in the helper instead of just
*dest would be we don't always need to read 8 bytes.
>                 if (patch_alt_instruction(src, dest, alt_start, alt_end))
>                         return 1;
>         }
>
> -       for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP)))
> +       for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP)))
But then you wouldn't be able to do this as easily I guess.
>                 raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP));
>
>         return 0;
> @@ -405,8 +405,8 @@ static void do_final_fixups(void)
>         while (src < end) {
>                 inst = ppc_inst_read(src);
>                 raw_patch_instruction(dest, inst);
> -               src = (void *)src + ppc_inst_len(inst);
> -               dest = (void *)dest + ppc_inst_len(inst);
> +               src = ppc_inst_next(src, *src);
> +               dest = ppc_inst_next(dest, *dest);
>         }
>  #endif
>  }
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index fb135f2cd6b0..aa123f56b7d4 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -939,7 +939,7 @@ static void insert_bpts(void)
>                 }
>
>                 patch_instruction(bp->instr, instr);
> -               patch_instruction((void *)bp->instr + ppc_inst_len(instr),
> +               patch_instruction(ppc_inst_next(bp->instr, instr),
>                                   ppc_inst(bpinstr));
>                 if (bp->enabled & BP_CIABR)
>                         continue;
> --
> 2.25.1
>

^ permalink raw reply

* [PATCH] powerpc/configs/64s: Enable CONFIG_PRINTK_CALLER
From: Michael Ellerman @ 2020-05-20 12:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds the CPU or thread number to printk messages. This helps a
lot when deciphering concurrent oopses that have been interleaved.

Example output, of PID1 (T1) triggering a warning:

  [    1.581678][    T1] WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:539 pkcs1pad_verify+0x38/0x140
  [    1.581681][    T1] Modules linked in:
  [    1.581693][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty #1515
  [    1.581700][    T1] NIP:  c000000000207d64 LR: c000000000207d3c CTR: c000000000207d2c
  [    1.581708][    T1] REGS: c0000000fd2e7560 TRAP: 0700   Not tainted  (5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty)
  [    1.581712][    T1] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44000222  XER: 00040000
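
A user-space sketch of how a caller field like the one in the example
output could be formatted. It assumes a 32-bit caller id whose top bit
marks CPU context and whose remaining bits hold the CPU or thread number;
the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_CALLER_CPU_FLAG 0x80000000u	/* assumed: set for CPU context */

/* Format "[    T1]" or "[    C1]" style prefixes into @buf. */
static int model_format_caller(char *buf, size_t len, uint32_t caller_id)
{
	char caller[12];	/* 'C' or 'T' + up to 10 digits + NUL */

	assert(buf != NULL && len > 0);
	snprintf(caller, sizeof(caller), "%c%u",
		 (caller_id & MODEL_CALLER_CPU_FLAG) ? 'C' : 'T',
		 (unsigned int)(caller_id & ~MODEL_CALLER_CPU_FLAG));
	return snprintf(buf, len, "[%6s]", caller);	/* right-aligned, width 6 */
}
```

With this layout, messages printed from task context carry a T prefix and
messages printed from interrupt context carry a C prefix, which is what
makes interleaved oopses separable.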

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/configs/powernv_defconfig | 1 +
 arch/powerpc/configs/ppc64_defconfig   | 1 +
 arch/powerpc/configs/pseries_defconfig | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index df8bdbaa5d8f..2de9aadf0f50 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -347,3 +347,4 @@ CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=m
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
+CONFIG_PRINTK_CALLER=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index bae8170d7401..57142a648ebd 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -358,6 +358,7 @@ CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_CRYPTO_DEV_VMX=y
 CONFIG_PRINTK_TIME=y
+CONFIG_PRINTK_CALLER=y
 CONFIG_MAGIC_SYSRQ=y
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_STACK_USAGE=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 0bea4d3ffb85..dfa4a726333b 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -322,3 +322,4 @@ CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=m
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
+CONFIG_PRINTK_CALLER=y
-- 
2.25.1


^ permalink raw reply related

* [PATCH] powerpc: Add ppc_inst_next()
From: Michael Ellerman @ 2020-05-20 11:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: christophe.leroy, jniethe5

In a few places we want to calculate the address of the next
instruction. Previously that was simple, we just added 4 bytes, or if
using a u32 * we incremented that pointer by 1.

But prefixed instructions make it more complicated, we need to advance
by either 4 or 8 bytes depending on the actual instruction. We also
can't do pointer arithmetic using struct ppc_inst, because it is
always 8 bytes in size on 64-bit, even though we might only need to
advance by 4 bytes.

So add a ppc_inst_next() helper which calculates the location of the
next instruction, if the given instruction was located at the given
address. Note the instruction doesn't need to actually be at the
address in memory.

Convert several locations to use it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/include/asm/inst.h   |  9 +++++++++
 arch/powerpc/kernel/uprobes.c     |  2 +-
 arch/powerpc/lib/feature-fixups.c | 10 +++++-----
 arch/powerpc/xmon/xmon.c          |  2 +-
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
index d82e0c99cfa1..7d5ee1309b92 100644
--- a/arch/powerpc/include/asm/inst.h
+++ b/arch/powerpc/include/asm/inst.h
@@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x)
 	return ppc_inst_prefixed(x) ? 8 : 4;
 }
 
+/*
+ * Return the address of the next instruction, if the instruction @value was
+ * located at @location.
+ */
+static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value)
+{
+	return location + ppc_inst_len(value);
+}
+
 int probe_user_read_inst(struct ppc_inst *inst,
 			 struct ppc_inst __user *nip);
 
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 83e883e1a42d..683ba76919a7 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	 * support doesn't exist and have to fix-up the next instruction
 	 * to be executed.
 	 */
-	regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn));
+	regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn);
 
 	user_disable_single_step(current);
 	return 0;
diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
index 80f320c2e189..0ad01eebf112 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur)
 	src = alt_start;
 	dest = start;
 
-	for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)),
-	     (dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest)))) {
+	for (; src < alt_end; src = ppc_inst_next(src, *src),
+			      dest = ppc_inst_next(dest, *dest)) {
 		if (patch_alt_instruction(src, dest, alt_start, alt_end))
 			return 1;
 	}
 
-	for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP)))
+	for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP)))
 		raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP));
 
 	return 0;
@@ -405,8 +405,8 @@ static void do_final_fixups(void)
 	while (src < end) {
 		inst = ppc_inst_read(src);
 		raw_patch_instruction(dest, inst);
-		src = (void *)src + ppc_inst_len(inst);
-		dest = (void *)dest + ppc_inst_len(inst);
+		src = ppc_inst_next(src, *src);
+		dest = ppc_inst_next(dest, *dest);
 	}
 #endif
 }
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fb135f2cd6b0..aa123f56b7d4 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -939,7 +939,7 @@ static void insert_bpts(void)
 		}
 
 		patch_instruction(bp->instr, instr);
-		patch_instruction((void *)bp->instr + ppc_inst_len(instr),
+		patch_instruction(ppc_inst_next(bp->instr, instr),
 				  ppc_inst(bpinstr));
 		if (bp->enabled & BP_CIABR)
 			continue;
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
From: Shengjiu Wang @ 2020-05-20 11:22 UTC (permalink / raw)
  To: Lucas Stach
  Cc: sumit.semwal, linaro-mm-sig, Linux-ALSA, linuxppc-dev,
	linux-kernel, Timur Tabi, Xiubo Li, shawnguo, Shengjiu Wang,
	Takashi Iwai, Liam Girdwood, dri-devel, perex, Nicolin Chen,
	Mark Brown, linux-imx, kernel, Fabio Estevam, s.hauer,
	linux-arm-kernel, linux-media
In-Reply-To: <53258cd99caaf1199036737f8fad6cc097939567.camel@pengutronix.de>

Hi

On Wed, May 20, 2020 at 5:42 PM Lucas Stach <l.stach@pengutronix.de> wrote:
>
> On Wednesday, 20.05.2020 at 16:20 +0800, Shengjiu Wang wrote:
> > Hi
> >
> > On Tue, May 19, 2020 at 6:04 PM Lucas Stach <l.stach@pengutronix.de> wrote:
> > > On Tuesday, 19.05.2020 at 17:41 +0800, Shengjiu Wang wrote:
> > > > There are two requirements that we need to move the request
> > > > of dma channel from probe to open.
> > >
> > > How do you handle -EPROBE_DEFER return code from the channel request if
> > > you don't do it in probe?
> >
> > I use dma_request_slave_channel or dma_request_channel instead
> > of dmaengine_pcm_request_chan_of, so there should be no -EPROBE_DEFER
> > return code.
>
> This is a pretty weak argument. The dmaengine device might probe after
> you try to get the channel. Using a function to request the channel
> that doesn't allow you to handle probe deferral is IMHO a bug and
> should be fixed, instead of building even more assumptions on top of
> it.
>
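
A user-space sketch of the probe-deferral pattern being discussed. The
model_* names are hypothetical; only the idea is taken from the thread:
request the channel in probe and pass a "not ready yet" error back so
the probe can be retried once the DMA provider exists. The value 517
matches the kernel's EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

struct model_dma {
	int provider_ready;	/* has the dmaengine device probed yet? */
};

/* Channel request that reports deferral instead of failing outright. */
static int model_request_chan(const struct model_dma *dma)
{
	return dma->provider_ready ? 0 : -MODEL_EPROBE_DEFER;
}

/* Probe: propagate the error so the caller can retry later. */
static int model_probe(const struct model_dma *dma, int *chan_acquired)
{
	int ret;

	assert(dma != NULL && chan_acquired != NULL);
	ret = model_request_chan(dma);
	if (ret)
		return ret;
	*chan_acquired = 1;
	return 0;
}
```

If the request were made in open() instead, there would be no retry
path, and the same "provider not ready" condition would surface as a
hard failure to user space.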

I see some drivers also request the DMA channel in open() or hw_params().
How can they avoid the deferred probe issue?
For example:
sound/arm/pxa2xx-pcm-lib.c
sound/soc/sprd/sprd-pcm-dma.c

> > > > - When dma device binds with power-domains, the power will
> > > > be enabled when we request dma channel. If the request of dma
> > > > channel happen on probe, then the power-domains will be always
> > > > enabled after kernel boot up,  which is not good for power
> > > > saving,  so we need to move the request of dma channel to .open();
> > >
> > > This is certainly something which could be fixed in the dmaengine
> > > driver.
> >
> > The DMA driver always calls pm_runtime_get_sync in
> > device_alloc_chan_resources, and device_alloc_chan_resources is
> > called when a channel is requested, so power is enabled on channel
> > request.
>
> So why can't you fix the dmaengine driver to do that RPM call at a
> later time when the channel is actually going to be used? This will
> allow further power savings with other slave devices than the audio
> PCM.
>
> Regards,
> Lucas
>

It seems the best place for calling pm_runtime_get_sync is
.device_alloc_chan_resources, with pm_runtime_put_sync called in
.device_free_chan_resources.

In slave_sg mode, .device_prep_slave_sg and .device_issue_pending
will be called many times after .device_alloc_chan_resources, so it is
not good to call pm_runtime_get_sync in .device_prep_slave_sg or
.device_issue_pending.
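
A user-space sketch of the scheme described above: one power reference
taken when channel resources are allocated and dropped when they are
freed, with no get/put per transfer. The model_* names and the pm_usage
counter are illustrative and are not the runtime PM API.

```c
#include <assert.h>

struct model_dma_dev {
	int pm_usage;	/* models the runtime PM usage count */
	int transfers;	/* prep + issue cycles performed */
};

static void model_alloc_chan_resources(struct model_dma_dev *d)
{
	d->pm_usage++;			/* stands in for pm_runtime_get_sync */
}

static void model_free_chan_resources(struct model_dma_dev *d)
{
	assert(d->pm_usage > 0);
	d->pm_usage--;			/* stands in for pm_runtime_put_sync */
}

static void model_prep_and_issue(struct model_dma_dev *d)
{
	assert(d->pm_usage > 0);	/* relies on the reference from alloc */
	d->transfers++;
}

static int model_powered(const struct model_dma_dev *d)
{
	return d->pm_usage > 0;
}
```

In this model the device stays powered for as long as the channel is
held, so when the channel is requested (probe or open) decides how long
the power domain stays on.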

best regards
wang shengjiu

^ permalink raw reply

* [PATCH] powerpc/xmon: Show task->thread.regs in process display
From: Michael Ellerman @ 2020-05-20 11:17 UTC (permalink / raw)
  To: linuxppc-dev

Show the address of the task's regs in the process listing in xmon. The
regs should always be on the stack page that we also print the address
of, but it's still helpful not to have to find them by hand.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/xmon/xmon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index de585204d1d2..fb135f2cd6b0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3185,8 +3185,8 @@ static void show_task(struct task_struct *tsk)
 		(tsk->exit_state & EXIT_DEAD) ? 'E' :
 		(tsk->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
 
-	printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
-		tsk->thread.ksp,
+	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
+	        tsk->thread.ksp, tsk->thread.regs,
 		tsk->pid, rcu_dereference(tsk->parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
@@ -3309,7 +3309,7 @@ static void show_tasks(void)
 	unsigned long tskv;
 	struct task_struct *tsk = NULL;
 
-	printf("     task_struct     ->thread.ksp    PID   PPID S  P CMD\n");
+	printf("     task_struct     ->thread.ksp    ->thread.regs    PID   PPID S  P CMD\n");
 
 	if (scanhex(&tskv))
 		tsk = (struct task_struct *)tskv;
-- 
2.25.1



* Re: [PATCH] powerpc/5200: update contact email
From: Michael Ellerman @ 2020-05-20 11:00 UTC (permalink / raw)
  To: linux-kernel, Wolfram Sang
  Cc: devicetree, Rob Herring, Paul Mackerras, kernel, linuxppc-dev
In-Reply-To: <20200502142642.18979-1-wsa@kernel.org>

On Sat, 2 May 2020 16:26:42 +0200, Wolfram Sang wrote:
> My 'pengutronix' address has been defunct for years. Merge the entries and use
> the proper contact address.

Applied to powerpc/next.

[1/1] powerpc/5200: update contact email
      https://git.kernel.org/powerpc/c/ad0f522df1b2f4fe5d4ae6418e1ea216154a0662

cheers


* Re: [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously
From: Michael Ellerman @ 2020-05-20 11:00 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev; +Cc: Nathan Lynch, Oliver O'Halloran
In-Reply-To: <cover.1588045502.git.sbobroff@linux.ibm.com>

On Tue, 28 Apr 2020 13:45:04 +1000, Sam Bobroff wrote:
> Here are some fixes and cleanups that have come from other work but that I
> think stand on their own.
> 
> Only one patch ("Release EEH device state synchronously", suggested by Oliver
> O'Halloran) is a significant change: it moves the cleanup of some EEH device
> data out of the (possibly asynchronous) device release handler and into the
> (synchronously called) bus notifier. This is useful for future work as it makes
> it easier to reason about the lifetimes of EEH structures.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/eeh: Fix pseries_eeh_configure_bridge()
      https://git.kernel.org/powerpc/c/6fa13640aea7bb0760846981aa2da4245307bd26
[2/2] powerpc/eeh: Release EEH device state synchronously
      https://git.kernel.org/powerpc/c/466381ecdc741b1767d980e10b1ec49f6bde56f3

cheers

