LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Rich Felker @ 2020-06-04 17:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604171232.GG31009@gate.crashing.org>

On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > well, ppc64le already cannot be run on those, as far as I know (I
> > don't think it's possible to build ppc64le userland without VSX in
> > any configuration)
> 
> VSX is required by the ELFv2 ABI:
> 
> """
> Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> compliant processors must implement the following categories:

This is not actually ABI but IBM policy laundered into an ABI
document, which musl does not honor.

Rich

^ permalink raw reply

* Re: ppc64le and 32-bit LE userland compatibility
From: Segher Boessenkool @ 2020-06-04 17:20 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: libc-alpha, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, linuxppc-dev,
	Joseph Myers
In-Reply-To: <20200602152724.GU25173@kitsune.suse.cz>

On Tue, Jun 02, 2020 at 05:27:24PM +0200, Michal Suchánek wrote:
> Naturally on POWER the first cpu that has LE support is POWER8 so you
> can count on all other POWER8 features to be present.

This is not true.

The oldest CPU the ELFv2 ABI (and so, powerpc64le-linux) supports is
POWER8, but most 6xx/7xx CPUs had a working LE mode already.  There are
very old ABIs that support LE as well.


Segher

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Segher Boessenkool @ 2020-06-04 17:33 UTC (permalink / raw)
  To: Rich Felker
  Cc: libc-alpha, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604171844.GO1079@brightrain.aerifal.cx>

On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > well, ppc64le already cannot be run on those, as far as I know (I
> > > don't think it's possible to build ppc64le userland without VSX in
> > > any configuration)
> > 
> > VSX is required by the ELFv2 ABI:
> > 
> > """
> > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > compliant processors must implement the following categories:
> 
> This is not actually ABI but IBM policy laundered into an ABI
> document, which musl does not honor.

It is the ABI.  If you think it should be different, make your own ABI,
don't pretend the existing ABI is different than what it is.  Thank you.


Segher

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Rich Felker @ 2020-06-04 17:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: libc-alpha, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604173312.GI31009@gate.crashing.org>

On Thu, Jun 04, 2020 at 12:33:12PM -0500, Segher Boessenkool wrote:
> On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> > On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > > well, ppc64le already cannot be run on those, as far as I know (I
> > > > don't think it's possible to build ppc64le userland without VSX in
> > > > any configuration)
> > > 
> > > VSX is required by the ELFv2 ABI:
> > > 
> > > """
> > > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > > compliant processors must implement the following categories:
> > 
> > This is not actually ABI but IBM policy laundered into an ABI
> > document, which musl does not honor.
> 
> It is the ABI.  If you think it should be different, make your own ABI,
> don't pretend the existing ABI is different than what it is.  Thank you.

Our ABI is as specified in the ELFv2 document, but with ld as ld64,
and minus gratuitous requirements on ISA level that are not part of
implementing linkage.

Rich

^ permalink raw reply

* Re: [PATCH] mm: Fix pud_alloc_track()
From: Guenter Roeck @ 2020-06-04 18:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-arch, Stephen Rothwell, jroedel, linux-mm, peterz,
	linuxppc-dev, linux-kernel, Steven Rostedt, Abdul Haleem,
	linux-next, Satheesh Rajendran, Andy Lutomirski, Andrew Morton,
	manvanth, hch
In-Reply-To: <20200604074446.23944-1-joro@8bytes.org>

On Thu, Jun 04, 2020 at 09:44:46AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The pud_alloc_track() needs to do different checks based on whether
> __ARCH_HAS_5LEVEL_HACK is defined, like it already does in
> pud_alloc(). Otherwise it causes boot failures on PowerPC.
> 
> Provide the correct implementations for both possible settings of
> __ARCH_HAS_5LEVEL_HACK to fix the boot problems.
> 
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Fixes: d8626138009b ("mm: add functions to track page directory modifications")
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Tested-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  include/asm-generic/5level-fixup.h |  5 +++++
>  include/linux/mm.h                 | 26 +++++++++++++-------------
>  2 files changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
> index 58046ddc08d0..afbab31fbd7e 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -17,6 +17,11 @@
>  	((unlikely(pgd_none(*(p4d))) && __pud_alloc(mm, p4d, address)) ? \
>  		NULL : pud_offset(p4d, address))
>  
> +#define pud_alloc_track(mm, p4d, address, mask)					\
> +	((unlikely(pgd_none(*(p4d))) &&						\
> +	  (__pud_alloc(mm, p4d, address) || ({*(mask)|=PGTBL_P4D_MODIFIED;0;})))?	\
> +	  NULL : pud_offset(p4d, address))
> +
>  #define p4d_alloc(mm, pgd, address)		(pgd)
>  #define p4d_alloc_track(mm, pgd, address, mask)	(pgd)
>  #define p4d_offset(pgd, start)			(pgd)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 66e0977f970a..ad3b31c5bcc3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2088,35 +2088,35 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, p4d_t *p4d,
>  		NULL : pud_offset(p4d, address);
>  }
>  
> -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
>  				     unsigned long address,
>  				     pgtbl_mod_mask *mod_mask)
> -
>  {
> -	if (unlikely(pgd_none(*pgd))) {
> -		if (__p4d_alloc(mm, pgd, address))
> +	if (unlikely(p4d_none(*p4d))) {
> +		if (__pud_alloc(mm, p4d, address))
>  			return NULL;
> -		*mod_mask |= PGTBL_PGD_MODIFIED;
> +		*mod_mask |= PGTBL_P4D_MODIFIED;
>  	}
>  
> -	return p4d_offset(pgd, address);
> +	return pud_offset(p4d, address);
>  }
>  
> -#endif /* !__ARCH_HAS_5LEVEL_HACK */
> -
> -static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
> +static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
>  				     unsigned long address,
>  				     pgtbl_mod_mask *mod_mask)
> +
>  {
> -	if (unlikely(p4d_none(*p4d))) {
> -		if (__pud_alloc(mm, p4d, address))
> +	if (unlikely(pgd_none(*pgd))) {
> +		if (__p4d_alloc(mm, pgd, address))
>  			return NULL;
> -		*mod_mask |= PGTBL_P4D_MODIFIED;
> +		*mod_mask |= PGTBL_PGD_MODIFIED;
>  	}
>  
> -	return pud_offset(p4d, address);
> +	return p4d_offset(pgd, address);
>  }
>  
> +#endif /* !__ARCH_HAS_5LEVEL_HACK */
> +
>  static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
>  {
>  	return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: [RESEND PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-04 18:42 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Santosh Sivaraj, linux-nvdimm, Aneesh Kumar K . V, linux-kernel,
	Steven Rostedt, Oliver O'Halloran, Dan Williams, linuxppc-dev
In-Reply-To: <20200603224612.GJ1505637@iweiny-DESK2.sc.intel.com>


Ira Weiny <ira.weiny@intel.com> writes:

> On Wed, Jun 03, 2020 at 11:41:42PM +0530, Vaibhav Jain wrote:
>> Hi Ira,
>> 
>> Thanks for reviewing this patch. My responses below:
>> 
>> Ira Weiny <ira.weiny@intel.com> writes:
>> 
>
> ...
>
>> >> + *
>> >> + * Payload Version:
>> >> + *
>> >> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> >> + * version of the structure present in PDSM Payload for a given PDSM command.
>> >> + * This provides backward compatibility in case the PDSM Payload structure
>> >> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> >> + *
>> >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> >> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> >> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> >> + * uses 'payload struct version' == MIN('payload_version field',
>> >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> >> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> >> + * struct in returned 'payload_version' field.
>> >> + *
>> >> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> >> + * 'payload_version' field and based on it use the appropriate version dsm
>> >> + * struct to parse the results.
>> >> + *
>> >> + * Backward Compatibility:
>> >> + *
>> >> + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> >> + * and papr_scm should provide backward compatibility until following two
>> >> + * assumptions/conditions when defining new PDSM structs hold:
>> >> + *
>> >> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> >> + *
>> >> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> >> + *    i.e Each new version of PDSM struct should retain existing struct
>> >> + *    attributes from previous version
>> >> + *
>> >> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> >> + *    it should also support T(1), T(2)...T(X - 1).
>> >> + *    i.e When adding support for new version of a PDSM struct, libndctl
>> >> + *    and papr_scm should retain support of the existing PDSM struct
>> >> + *    version they support.
>> >
>> > Please see this thread for an example why versions are a bad idea in UAPIs:
>> >
>> > https://lkml.org/lkml/2020/3/26/213
>> >
>> 
>> > While the use of version is different in that thread the fundamental issues are
>> > the same.  You end up with some weird matrix of supported features and
>> > structure definitions.  For example, you are opening up the possibility of
>> > changing structures with a different version for no good reason.
>> 
>> Not really sure I understand the statement correctly "you are opening up
>> the possibility of changing structures with a different version for no
>> good reason."
>
[..]
> What I mean is:
>
> struct v1 {
> 	u32 x;
> 	u32 y;
> };
>
> struct v2 {
> 	u32 y;
> 	u32 x;
> };
>
> x and y are the same data but you have now redefined the order of the struct.
> You don't need that flexibility/complexity.
>
> Generally I think you are defining:
>
> struct v1 {
> 	u32 x;
> 	u32 y;
> };
>
> struct v2 {
> 	u32 x;
> 	u32 y;
> 	u32 z;
> 	u32 a;
> };
>
> Which becomes 2 structures...  There is no need.
>
> The easiest thing to do is:
>
> struct user_data {
> 	u32 x;
> 	u32 y;
> };
>
> And later you modify user_data to:
>
> struct user_data {
> 	u32 x;
> 	u32 y;
> 	u32 z;
> 	u32 a;
> };
>
> libndctl always passes sizeof(struct user_data) to the call. [Do ensure
> structures are 64bit aligned for this to work.]
>
> The kernel sees the size and returns the amount of data up to that size.
>
> Therefore, older kernels automatically fill in x and y,  newer kernels fill in
> z/a if the buffer was big enough.  libndctl only uses the fields it knows about.
>
> It is _much_ easier this way.  Almost nothing needs to get changed as versions
> roll forward.  The only big issue is if libndctl _needs_ z then it has to check
> if z is returned.
>
> In that case add a cap_mask with bit fields which the kernel can fill in for
> which fields are valid.
>
> struct user_data {
> 	u64 cap_mask;  /* where bits define extra future capabilities */
> 	u32 x;
> 	u32 y;
> };
>
> IFF you need to add data within fields which are reserved you can use
> capability flags to indicate which fields are requested and which are returned
> by the kernel.
>
> But I _think_ for what you want libndctl must survive if z/a are not available
> right?  So just adding to the structure should be fine.
Agreed. But as I mentioned in my response to Dan's review comments [1], we
will be removing the version field altogether and instead will introduce
new psdm requests bound to new struct definitions in conjuntion to
nvdimm-flags. I have a patchset ready which I will be sending out soon.

[1] https://lore.kernel.org/linux-nvdimm/87h7vrgpzx.fsf@linux.ibm.com/
>
>> We want to return more data in the struct in future iterations. So
>> 'changing structure with different version' is something we are
>> expecting. 
>> 
>> With the backward compatibility constraints 1 & 2 above, it will ensure
>> that support matrix looks like a lower traingular matrix with each
>> successive version supporting previous version attributes. So supporting
>> future versions is relatively simplified.
>
> But you end up with weird switch/if's etc to deal with the multiple structures.
>
> With the size method the kernel simply returns the same size data as the user
> requested and everything is done.  No logic required at all.  Literally it can
> just copy the data it has (truncating if necessary).
>
Agreed. But with version field gone now we will instead use new psdm
requests bound to new struct definitions in conjuntion to nvdimm-flags
to retrive extended data from papr_scm.

>> 
>> >
>> > Also having the user query with version Z and get back version X (older) is
>> > odd.  Generally if the kernel does not know about a feature (ie version Z of
>> > the structure) it should return -EINVAL and let the user figure out what to do.
>> > The user may just give up or they could try a different query.
>> >
>> Considering the flow of ndctl/libndctl this is needed. libndctl will
>> usually issues only one CMD_CALL ioctl to kernel and if that fails then
>> an error is reported and ndctl will exit loosing state.
>> 
>> Adding mechanism in libndctl to reissue CMD_CALL ioctl to fetch a
>> appropriate version of pdsm struct is going to be considerably more
>> work.
>> 
>> This version fall-back mechanism, ensures that libndctl will receive
>> usable data without having to reissue a more CMD_CALL ioctls.
>
> Define usable?
>
> What happens if libndctl does not get 'z' in my example above?  What does it
> do?  If I understand correctly it does not _need_ z.  So why have a check on
> the version from the kernel?
>
> What if we change to:
>
> struct v3 {
> 	u32 x;
> 	u32 y;
> 	u32 z;
> 	u32 a;
> 	u32 b;
> 	u32 c;
> };
>
> Now it has to
>
> 	if(version 2)
> 		z/a valid do something()
>
> 	if(version 3)
> 		b/c valid do something else()
>
> if z, a, b, c are all 0 does it matter?
>
> If not, the logic above disappears.
>
> If so, then you need a cap mask.  Then the kernel can say c and a are valid
> (but c is 0) or other flexible stuff like that.
>
>> 
>> >> + */
>> >> +
>> >> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> >> +struct nd_pdsm_cmd_pkg {
>> >> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
>> >> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
>> >> +	__u16 reserved[5];	/* Ignored and to be used in future */
>> >
>> > How do you know when reserved is used for something else in the future?  Is
>> > reserved guaranteed (and checked by the code) to be 0?
>> 
>> For current set of pdsm requests ignore these reserved fields. However a
>> future pdsm request can leverage these reserved fields. So papr_scm
>> just bind the usage of these fields with the value of
>> 'nd_cmd_pkg.nd_command' that indicates the pdsm request.
>> 
>> That being said checking if the reserved fields are set to 0 will be a
>> good measure. Will add this check in next iteration.
>
> Exactly, if you don't check them now you will end up with an older libndctl
> which passes in garbage and breaks future users...  Basically rendering the
> reserved fields useless.
I have addressed this in my new patch-series which adds checks for
reserved fields to be '0'

>
>> 
>> >
>> >> +	__u16 payload_version;	/* In/Out: version of the payload */
>> >
>> > Why is payload_version after reserved?
>> Want to place the payload version field just before the payload data so
>> that it can be accessed with simple pointer arithmetic.
>
> I did not see that in the patch.  I thought you were using
> nd_pdsm_cmd_pkg->payload_version?
Thats right, but it just provided another simple way to retrive
payload_version without resorting to 'container_of' macro. Anyways the
version field is now gone hence 'payload' follows the reserved fields.

>
>> 
>> >
>> >> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> >> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> >> + */
>> >> +enum papr_pdsm {
>> >> +	PAPR_PDSM_MIN = 0x0,
>> >> +	PAPR_PDSM_MAX,
>> >> +};
>> >> +
>> >> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> >> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> >> +{
>> >> +	return (struct nd_pdsm_cmd_pkg *) cmd;
>> >> +}
>> >> +
>> >> +/* Return the payload pointer for a given pcmd */
>> >> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >> +{
>> >> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> >> +		return NULL;
>> >> +	else
>> >> +		return (void *)(pcmd->payload);
>> >> +}
>> >> +
>> >> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 149431594839..5e2237e7ec08 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -15,13 +15,15 @@
>> >>  #include <linux/seq_buf.h>
>> >>  
>> >>  #include <asm/plpar_wrappers.h>
>> >> +#include <asm/papr_pdsm.h>
>> >>  
>> >>  #define BIND_ANY_ADDR (~0ul)
>> >>  
>> >>  #define PAPR_SCM_DIMM_CMD_MASK \
>> >>  	((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> >>  	 (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> >> -	 (1ul << ND_CMD_SET_CONFIG_DATA))
>> >> +	 (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> >> +	 (1ul << ND_CMD_CALL))
>> >>  
>> >>  /* DIMM health bitmap bitmap indicators */
>> >>  /* SCM device is unable to persist memory contents */
>> >> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> >>  	return 0;
>> >>  }
>> >>  
>> >> +/*
>> >> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> >> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> >> + */
>> >> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >> +		       unsigned int buf_len)
>> >> +{
>> >> +	unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> >> +	struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> >> +	struct papr_scm_priv *p;
>> >> +
>> >> +	/* Only dimm-specific calls are supported atm */
>> >> +	if (!nvdimm)
>> >> +		return -EINVAL;
>> >> +
>> >> +	/* get the provider date from struct nvdimm */
>> >
>> > s/date/data
>> Thanks for point this out. Will fix this in next iteration.
>> 
>> >
>> >> +	p = nvdimm_provider_data(nvdimm);
>> >> +
>> >> +	if (!test_bit(cmd, &cmd_mask)) {
>> >> +		dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> >> +		return -EINVAL;
>> >> +	} else if (cmd == ND_CMD_CALL) {
>> >> +
>> >> +		/* Verify the envelope package */
>> >> +		if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> >> +			dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> >> +				buf_len);
>> >> +			return -EINVAL;
>> >> +		}
>> >> +
>> >> +		/* Verify that the PDSM family is valid */
>> >> +		if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> >> +			dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> >> +				pkg->hdr.nd_family);
>> >> +			return -EINVAL;
>> >> +
>> >> +		}
>> >> +
>> >> +		/* We except a payload with all PDSM commands */
>> >> +		if (pdsm_cmd_to_payload(pkg) == NULL) {
>> >> +			dev_dbg(&p->pdev->dev,
>> >> +				"Empty payload for sub-command=0x%llx\n",
>> >> +				pkg->hdr.nd_command);
>> >> +			return -EINVAL;
>> >> +		}
>> >> +	}
>> >> +
>> >> +	/* Command looks valid */
>> >
>> > I assume the first command to be implemented also checks the { nd_command,
>> > payload_version, payload length } for correctness?
>> Yes the pdsm service functions do check the payload_version and
>> payload_length. Please see the papr_pdsm_health() that services the
>> PAPR_PDSM_HEALTH pdsm in Patch-5
>> 
>
> cool.
>
>> >
>> >> +	return 0;
>> >> +}
>> >> +
>> >> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >> +				struct nd_pdsm_cmd_pkg *call_pkg)
>> >> +{
>> >> +	/* unknown subcommands return error in packages */
>> >> +	if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
>> >> +	    call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
>> >> +		dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
>> >> +			call_pkg->hdr.nd_command);
>> >> +		call_pkg->cmd_status = -EINVAL;
>> >> +		return 0;
>> >> +	}
>> >> +
>> >> +	/* Depending on the DSM command call appropriate service routine */
>> >> +	switch (call_pkg->hdr.nd_command) {
>> >> +	default:
>> >> +		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> >> +			call_pkg->hdr.nd_command);
>> >> +		call_pkg->cmd_status = -ENOENT;
>> >> +		return 0;
>> >> +	}
>> >> +}
>> >> +
>> >>  static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >>  			  struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >>  			  unsigned int buf_len, int *cmd_rc)
>> >>  {
>> >>  	struct nd_cmd_get_config_size *get_size_hdr;
>> >>  	struct papr_scm_priv *p;
>> >> +	struct nd_pdsm_cmd_pkg *call_pkg = NULL;
>> >> +	int rc;
>> >>  
>> >> -	/* Only dimm-specific calls are supported atm */
>> >> -	if (!nvdimm)
>> >> -		return -EINVAL;
>> >> +	/* Use a local variable in case cmd_rc pointer is NULL */
>> >> +	if (cmd_rc == NULL)
>> >> +		cmd_rc = &rc;
>> >
>> > Why is this needed?  AFAICT The caller of papr_scm_ndctl does not specify null
>> > and you did not change it.
>> This pointer is coming from outside the papr_scm code hence need to be
>> defensive here. Also as per[1] cmd_rc is "translation of firmware status"
>> and not every caller would need it hence making this pointer optional.
>> 
>> This is evident in acpi_nfit_blk_get_flags() where the 'nd_desc->ndctl'
>> is called with 'cmd_rc == NULL'.
>> 
>> [1] https://lore.kernel.org/linux-nvdimm/CAPcyv4hE_FG0YZXJVA1G=CBq8b9e0K54jxk5Sq5UKU-dnWT2Kg@mail.gmail.com/
>
> Ah... Ok.  So this is a bug fix which needs to happen regardless of the status
> of this patch...
>
>> 
>> >
>> >> +
>> >> +	*cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
>> >> +	if (*cmd_rc) {
>> >> +		pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
>> >> +		return *cmd_rc;
>> >> +	}
>> >>  
>> >>  	p = nvdimm_provider_data(nvdimm);
>> >>  
>> >> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >>  		*cmd_rc = papr_scm_meta_set(p, buf);
>
> ... Because this will break here. even without this new code...  right?
>
> Lets get this fix in as a prelim-patch.
Yes, I have moved the changes proposed in this hunk to a separate prelim
patch in the v10 of the patch series.

>
>> >>  		break;
>> >>  
>> >> +	case ND_CMD_CALL:
>> >> +		call_pkg = nd_to_pdsm_cmd_pkg(buf);
>> >> +		*cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>> >> +		break;
>> >> +
>> >>  	default:
>> >> -		return -EINVAL;
>> >> +		dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>> >> +		*cmd_rc = -EINVAL;
>> >
>> > Is this change related?  If there is a bug where there is a caller of
>> > papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
>> > that issue.
>> This simplifies a bit debugging of errors reported in
>> papr_scm_ndctl() as it ensures that subsequest dev_dbg "Returned with
>> cmd_rc" is always logged.
>> 
>> I think, this is a too small change to be carved out as an independent
>> patch. Also this doesnt change the behaviour of the code except logging
>> some more error info.
>> 
>> However, If you feel too strongly about it I will spin a separate patch
>> in this patch series for this.
[..]
>
> This can go in as part of a 'protect against cmd_rc == NULL'
> preliminary patch.
Yes, have coalesced this change with changes I reffered to in
my previous comment into a single prelim patch.

[..]
>
> I flagged this because at first I could not figure out what this had to do with
> the ND_CMD_CALL...
>
> For reviewers you want to make your patches concise to what you are
> fixing/adding...
Sure. Will be more careful of this in future patches.

>
> Also, based on acpi_nfit_blk_get_flags() using cmd_rc == NULL it looks like we
> have a bug which needs to get fixed regardless of the this patch.  And if that
> bug exists in earlier kernels you will need a separate patch to backport as a
> fix.
>
> So lets get that in first and separate...  :-D
Sure, will send out a separate independent patch fixing the cmd_rc ==
NULL issue in acpi_nfit_blk_get_flags addressing it to stable tree.

>
> Ira
>
>> 
>> >
>> > Ira
>> >
>> >>  	}
>> >>  
>> >>  	dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>> >>  
>> >> -	return 0;
>> >> +	return *cmd_rc;
>> >>  }
>> >>  
>> >>  static ssize_t flags_show(struct device *dev,
>> >> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
>> >> index de5d90212409..0e09dc5cec19 100644
>> >> --- a/include/uapi/linux/ndctl.h
>> >> +++ b/include/uapi/linux/ndctl.h
>> >> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>> >>  #define NVDIMM_FAMILY_HPE2 2
>> >>  #define NVDIMM_FAMILY_MSFT 3
>> >>  #define NVDIMM_FAMILY_HYPERV 4
>> >> +#define NVDIMM_FAMILY_PAPR 5
>> >>  
>> >>  #define ND_IOCTL_CALL			_IOWR(ND_IOCTL, ND_CMD_CALL,\
>> >>  					struct nd_cmd_pkg)
>> >> -- 
>> >> 2.26.2
>> >> 
>> 
>> -- 
>> Cheers
>> ~ Vaibhav

-- 
Cheers
~ Vaibhav

^ permalink raw reply

* Re: [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
From: Vaibhav Jain @ 2020-06-04 18:49 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Santosh Sivaraj, linux-nvdimm, Aneesh Kumar K . V, linux-kernel,
	Steven Rostedt, Oliver O'Halloran, Dan Williams, linuxppc-dev
In-Reply-To: <20200603231814.GK1505637@iweiny-DESK2.sc.intel.com>

Hi Ira,

Thanks again for looking into patch. My responses below:

Ira Weiny <ira.weiny@intel.com> writes:

> On Thu, Jun 04, 2020 at 12:34:04AM +0530, Vaibhav Jain wrote:
>> Hi Ira,
>> 
>> Thanks for reviewing this patch. My responses below:
>> 
>> Ira Weiny <ira.weiny@intel.com> writes:
>> 
>> > On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
>> >> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
>> >> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
>> >> containing dimm health information back to user space in response to
>> >> ND_CMD_CALL. This functionality is implemented in newly introduced
>> >> papr_pdsm_health() that queries the nvdimm health information and
>> >> then copies this information to the package payload whose layout is
>> >> defined by 'struct nd_papr_pdsm_health'.
>> >> 
>> >> The patch also introduces a new member 'struct papr_scm_priv.health'
>> >> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
>> >> information of a nvdimm. As a result functions drc_pmem_query_health()
>> >> and flags_show() are updated to populate and use this new struct
>> >> instead of a u64 integer that was earlier used.
>> >> 
>> >> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
>> >> Cc: Dan Williams <dan.j.williams@intel.com>
>> >> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> >> Cc: Ira Weiny <ira.weiny@intel.com>
>> >> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> >> ---
>> >> Changelog:
>> >> 
>> >> Resend:
>> >> * Added ack from Aneesh.
>> >> 
>> >> v8..v9:
>> >> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
>> >> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
>> >> * Renamed papr_scm_get_health() to papr_psdm_health()
>> >> * Updated patch description to replace papr-scm dimm with nvdimm.
>> >> 
>> >> v7..v8:
>> >> * None
>> >> 
>> >> Resend:
>> >> * None
>> >> 
>> >> v6..v7:
>> >> * Updated flags_show() to use seq_buf_printf(). [Mpe]
>> >> * Updated papr_scm_get_health() to use newly introduced
>> >>   __drc_pmem_query_health() bypassing the cache [Mpe].
>> >> 
>> >> v5..v6:
>> >> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>> >>   gaurd against possibility of different compilers adding different
>> >>   paddings to the struct [ Dan Williams ]
>> >> 
>> >> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>> >>   'bool' and also updated drc_pmem_query_health() to take this into
>> >>   account. [ Dan Williams ]
>> >> 
>> >> v4..v5:
>> >> * None
>> >> 
>> >> v3..v4:
>> >> * Call the DSM_PAPR_SCM_HEALTH service function from
>> >>   papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>> >> 
>> >> v2..v3:
>> >> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>> >>   as its exported to the userspace [Aneesh]
>> >> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>> >>   from enum to #defines [Aneesh]
>> >> 
>> >> v1..v2:
>> >> * New patch in the series
>> >> ---
>> >>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++++++
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
>> >>  2 files changed, 147 insertions(+), 17 deletions(-)
>> >> 
>> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> index 6407fefcc007..411725a91591 100644
>> >> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>> >>   */
>> >>  enum papr_pdsm {
>> >>  	PAPR_PDSM_MIN = 0x0,
>> >> +	PAPR_PDSM_HEALTH,
>> >>  	PAPR_PDSM_MAX,
>> >>  };
>> >>  
>> >> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >>  		return (void *)(pcmd->payload);
>> >>  }
>> >>  
>> >> +/* Various nvdimm health indicators */
>> >> +#define PAPR_PDSM_DIMM_HEALTHY       0
>> >> +#define PAPR_PDSM_DIMM_UNHEALTHY     1
>> >> +#define PAPR_PDSM_DIMM_CRITICAL      2
>> >> +#define PAPR_PDSM_DIMM_FATAL         3
>> >> +
>> >> +/*
>> >> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>> >> + * Various flags indicate the health status of the dimm.
>> >> + *
>> >> + * dimm_unarmed		: Dimm not armed. So contents wont persist.
>> >> + * dimm_bad_shutdown	: Previous shutdown did not persist contents.
>> >> + * dimm_bad_restore	: Contents from previous shutdown werent restored.
>> >> + * dimm_scrubbed	: Contents of the dimm have been scrubbed.
>> >> + * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
>> >> + * dimm_encrypted	: Contents of dimm are encrypted.
>> >> + * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
>> >> + */
>> >> +struct nd_papr_pdsm_health_v1 {
>> >> +	__u8 dimm_unarmed;
>> >> +	__u8 dimm_bad_shutdown;
>> >> +	__u8 dimm_bad_restore;
>> >> +	__u8 dimm_scrubbed;
>> >> +	__u8 dimm_locked;
>> >> +	__u8 dimm_encrypted;
>> >> +	__u16 dimm_health;
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Typedef the current struct for dimm_health so that any application
>> >> + * or kernel recompiled after introducing a new version automatically
>> >> + * supports the new version.
>> >> + */
>> >> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
>> >> +
>> >> +/* Current version number for the dimm health struct */
>> >
>> > This can't be the 'current' version.  You will need a list of versions you
>> > support.  Because if the user passes in an old version you need to be able to
>> > respond with that old version.  Also if you plan to support 'return X for a Y
>> > query' then the user will need both X and Y defined to interpret X.
>> Yes, and that change will be introduced with addition of version-2 of
>> nd_papr_pdsm_health. Earlier version of the patchset[1] had such a table
>> implemented. But to simplify the patchset, as we are only dealing with
>> version-1 of the structs right now, it was dropped.
>> 
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/20200220095805.197229-9-vaibhav@linux.ibm.com/
>
> I'm not sure I follow that comment.
>
> I feel like there is some confusion about what firmware can return vs the UAPI
> structure.  You have already marshaled the data between the 2.  We can define
> whatever we want for the UAPI structures throwing away data the kernel does not
> understand from the firmware.
>
>> 
>> >
>> >> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
>> >> +
>> >>  #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 5e2237e7ec08..c0606c0c659c 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -88,7 +88,7 @@ struct papr_scm_priv {
>> >>  	unsigned long lasthealth_jiffies;
>> >>  
>> >>  	/* Health information for the dimm */
>> >> -	u64 health_bitmap;
>> >> +	struct nd_papr_pdsm_health health;
>> >
>> > ok so we are throwing away all the #defs from patch 1?  Are they still valid?
>> >
>> > I'm confused that patch 3 added this and we are throwing it away
>> > here...
>> The #defines are still valid, only the usage moved to a __drc_pmem_query_health().
>> 
>> >
>> >>  };
>> >>  
>> >>  static int drc_pmem_bind(struct papr_scm_priv *p)
>> >> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>> >>  static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >>  {
>> >>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
>> >> +	u64 health;
>> >>  	long rc;
>> >>  
>> >>  	/* issue the hcall */
>> >> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >>  	if (rc != H_SUCCESS) {
>> >>  		dev_err(&p->pdev->dev,
>> >>  			 "Failed to query health information, Err:%ld\n", rc);
>> >> -		rc = -ENXIO;
>> >> -		goto out;
>> >> +		return -ENXIO;
>> >
>> > I missed this...  probably did not need the goto in the first patch?
>> Yes, will get rid of the goto from patch-1.
>
> Cool.
>
>> 
>> >
>> >>  	}
>> >>  
>> >>  	p->lasthealth_jiffies = jiffies;
>> >> -	p->health_bitmap = ret[0] & ret[1];
>> >> +	health = ret[0] & ret[1];
>> >>  
>> >>  	dev_dbg(&p->pdev->dev,
>> >>  		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> >>  		ret[0], ret[1]);
>> >> -out:
>> >> -	return rc;
>> >> +
>> >> +	memset(&p->health, 0, sizeof(p->health));
>> >> +
>> >> +	/* Check for various masks in bitmap and set the buffer */
>> >> +	if (health & PAPR_PMEM_UNARMED_MASK)
>> >
>> > Oh ok...  odd.  (don't add code then just take it away in a series)
>> > You could have lead with the user structure and put this code in patch
>> > 3.
>> The struct nd_papr_pdsm_health in only introduced this patch in header
>> 'papr_pdsm.h' as means of exchanging nvdimm health information with
>> userspace. Introducing this struct without introducing the necessary
>> scafolding in 'papr_pdsm.h' would have been very counter-intutive.
>
> I respectfully disagree.  You intended to use a copy of this structure in
> kernel to store the data.  Just do that.
Have addressed this in v10 that doesnt resort to removing the
functionality that was introduced in an earlier patch.

>
>> 
>> >
>> > Why does the user need u8 to represent a single bit?  Does this help protect
>> > against endian issues?
>> This was 'bool' earlier but since type 'bool' isnt suitable for ioctl abi
>> and I wanted to avoid bit fields here as not sure if their packing may
>> differ across compilers hence replaced with u8.
>> 
>
> ok works for me...
>
>> >
>> >> +		p->health.dimm_unarmed = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> +		p->health.dimm_bad_shutdown = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> +		p->health.dimm_bad_restore = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_ENCRYPTED)
>> >> +		p->health.dimm_encrypted = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
>> >> +		p->health.dimm_locked = 1;
>> >> +		p->health.dimm_scrubbed = 1;
>> >> +	}
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_CRITICAL)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_FATAL)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
>> >> +
>> >> +	return 0;
>> >>  }
>> >>  
>> >>  /* Min interval in seconds for assuming stable dimm health */
>> >> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >>  	return 0;
>> >>  }
>> >>  
>> >> +/* Fetch the DIMM health info and populate it in provided package. */
>> >> +static int papr_pdsm_health(struct papr_scm_priv *p,
>> >> +			       struct nd_pdsm_cmd_pkg *pkg)
>> >> +{
>> >> +	int rc;
>> >> +	size_t copysize = sizeof(p->health);
>> >> +
>> >> +	/* Ensure dimm health mutex is taken preventing concurrent access */
>> >> +	rc = mutex_lock_interruptible(&p->health_mutex);
>> >> +	if (rc)
>> >> +		goto out;
>> >> +
>> >> +	/* Always fetch upto date dimm health data ignoring cached values */
>> >> +	rc = __drc_pmem_query_health(p);
>> >> +	if (rc)
>> >> +		goto out_unlock;
>> >> +	/*
>> >> +	 * If the requested payload version is greater than one we know
>> >> +	 * about, return the payload version we know about and let
>> >> +	 * caller/userspace handle.
>> >> +	 */
>> >> +	if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
>> >> +		pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
>> >
>> > I know this seems easy now but I do think you will run into trouble later.
>> 
>> I did addressed this in an earlier iteration of this patchset[1] and
>> dropped it in favour of simplicity.
>> 
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/20200220095805.197229-9-vaibhav@linux.ibm.com/
>  
> I don't see how that addresses this?  See my other email.
>
> Ira
>
>> 
>> > Ira
>> >
>> >> +
>> >> +	if (pkg->hdr.nd_size_out < copysize) {
>> >> +		dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
>> >> +			pkg->hdr.nd_size_out, copysize);
>> >> +		rc = -ENOSPC;
>> >> +		goto out_unlock;
>> >> +	}
>> >> +
>> >> +	dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
>> >> +		copysize, pkg->payload_version);
>> >> +
>> >> +	/* Copy the health struct to the payload */
>> >> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
>> >> +	pkg->hdr.nd_fw_size = copysize;
>> >> +
>> >> +out_unlock:
>> >> +	mutex_unlock(&p->health_mutex);
>> >> +
>> >> +out:
>> >> +	/*
>> >> +	 * Put the error in out package and return success from function
>> >> +	 * so that errors if any are propogated back to userspace.
>> >> +	 */
>> >> +	pkg->cmd_status = rc;
>> >> +	dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
>> >> +
>> >> +	return 0;
>> >> +}
>> >> +
>> >>  static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >>  				struct nd_pdsm_cmd_pkg *call_pkg)
>> >>  {
>> >> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >>  
>> >>  	/* Depending on the DSM command call appropriate service routine */
>> >>  	switch (call_pkg->hdr.nd_command) {
>> >> +	case PAPR_PDSM_HEALTH:
>> >> +		return papr_pdsm_health(p, call_pkg);
>> >> +
>> >>  	default:
>> >>  		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> >>  			call_pkg->hdr.nd_command);
>> >> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
>> >>  	struct nvdimm *dimm = to_nvdimm(dev);
>> >>  	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> >>  	struct seq_buf s;
>> >> -	u64 health;
>> >>  	int rc;
>> >>  
>> >>  	rc = drc_pmem_query_health(p);
>> >>  	if (rc)
>> >>  		return rc;
>> >>  
>> >> -	/* Copy health_bitmap locally, check masks & update out buffer */
>> >> -	health = READ_ONCE(p->health_bitmap);
>> >> -
>> >>  	seq_buf_init(&s, buf, PAGE_SIZE);
>> >> -	if (health & PAPR_PMEM_UNARMED_MASK)
>> >> +
>> >> +	/* Protect concurrent modifications to papr_scm_priv */
>> >> +	rc = mutex_lock_interruptible(&p->health_mutex);
>> >> +	if (rc)
>> >> +		return rc;
>> >> +
>> >> +	if (p->health.dimm_unarmed)
>> >>  		seq_buf_printf(&s, "not_armed ");
>> >>  
>> >> -	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> +	if (p->health.dimm_bad_shutdown)
>> >>  		seq_buf_printf(&s, "flush_fail ");
>> >>  
>> >> -	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> +	if (p->health.dimm_bad_restore)
>> >>  		seq_buf_printf(&s, "restore_fail ");
>> >>  
>> >> -	if (health & PAPR_PMEM_ENCRYPTED)
>> >> +	if (p->health.dimm_encrypted)
>> >>  		seq_buf_printf(&s, "encrypted ");
>> >>  
>> >> -	if (health & PAPR_PMEM_SMART_EVENT_MASK)
>> >> +	if (p->health.dimm_health)
>> >>  		seq_buf_printf(&s, "smart_notify ");
>> >>  
>> >> -	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
>> >> -		seq_buf_printf(&s, "scrubbed locked ");
>> >> +	if (p->health.dimm_scrubbed)
>> >> +		seq_buf_printf(&s, "scrubbed ");
>> >> +
>> >> +	if (p->health.dimm_locked)
>> >> +		seq_buf_printf(&s, "locked ");
>> >> +
>> >> +	mutex_unlock(&p->health_mutex);
>> >>  
>> >>  	if (seq_buf_used(&s))
>> >>  		seq_buf_printf(&s, "\n");
>> >> -- 
>> >> 2.26.2
>> >> 
>> 
>> -- 
>> Cheers
>> ~ Vaibhav

-- 
Cheers
~ Vaibhav

^ permalink raw reply

* [PATCH v4] powerpc/fadump: fix race between pstore write and fadump crash trigger
From: Sourabh Jain @ 2020-06-04 19:33 UTC (permalink / raw)
  To: mpe; +Cc: mahesh, linux-kernel, hbathini, linuxppc-dev

When we enter into fadump crash path via system reset we fail to update
the pstore.

On the system reset path we first update the pstore then we go for fadump
crash. But the problem here is when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPUs will acquire the lock
and proceed with the pstore write. Since it in NMI context CPUs that fail
to get lock do not wait for their turn to write to the pstore and simply
proceed with the next operation which is fadump crash. One of the CPU who
proceeded with fadump crash path triggers the crash and does not wait for
the CPU who gets the pstore lock to complete the pstore update.

Timeline diagram to depicts the sequence of events that leads to an
unsuccessful pstore update when we hit fadump crash path via system reset.

                 1    2     3    ...      n   CPU Threads
                 |    |     |             |
                 |    |     |             |
 Reached to   -->|--->|---->| ----------->|
 system reset    |    |     |             |
 path            |    |     |             |
                 |    |     |             |
 Try to       -->|--->|---->|------------>|
 acquire the     |    |     |             |
 pstore lock     |    |     |             |
                 |    |     |             |
                 |    |     |             |
 Got the      -->| +->|     |             |<-+
 pstore lock     | |  |     |             |  |-->  Didn't get the
                 | --------------------------+     lock and moving
                 |    |     |             |        ahead on fadump
                 |    |     |             |        crash path
                 |    |     |             |
  Begins the  -->|    |     |             |
  process to     |    |     |             |<-- Got the chance to
  update the     |    |     |             |    trigger the crash
  pstore         | -> |     |    ... <-   |
                 | |  |     |         |   |
                 | |  |     |         |   |<-- Triggers the
                 | |  |     |         |   |    crash
                 | |  |     |         |   |      ^
                 | |  |     |         |   |      |
  Writing to  -->| |  |     |         |   |      |
  pstore         | |  |     |         |   |      |
                   |                  |          |
       ^           |__________________|          |
       |               CPU Relax                 |
       |                                         |
       +-----------------------------------------+
                          |
                          v
            Race: crash triggered before pstore
                  update completes

To avoid this race condition a barrier is added on crash_fadump path, it
prevents the CPU to trigger the crash until all the online CPUs completes
their task.

A barrier is added to make sure all the secondary CPUs hit the
crash_fadump function before we initiates the crash. A timeout is kept to
ensure the primary CPU (one who initiates the crash) do not wait for
secondary CPUs indefinitely.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
---
 arch/powerpc/kernel/fadump.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

 ---
Chanagelog:

v1 -> v3:
   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208267.html

v3 -> v4:

   - Now the primary CPU (one who triggers dump) waits for all secondary
     CPUs to enter and then initiates the crash.

 ---

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 59e60a9a9f5c..4953f3246220 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,6 +32,14 @@
 #include <asm/fadump-internal.h>
 #include <asm/setup.h>
 
+/*
+ * The CPU who acquired the lock to trigger the fadump crash should
+ * wait for other CPUs to enter.
+ *
+ * The timeout is in milliseconds.
+ */
+#define CRASH_TIMEOUT		500
+
 static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(u64 base);
@@ -46,6 +54,8 @@ struct fadump_mrange_info reserved_mrange_info = { "reserved", NULL, 0, 0, 0 };
 #ifdef CONFIG_CMA
 static struct cma *fadump_cma;
 
+static atomic_t cpus_in_crash;
+
 /*
  * fadump_cma_init() - Initialize CMA area from a fadump reserved memory
  *
@@ -596,8 +606,10 @@ early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
 void crash_fadump(struct pt_regs *regs, const char *str)
 {
+	unsigned int msecs;
 	struct fadump_crash_info_header *fdh = NULL;
 	int old_cpu, this_cpu;
+	unsigned int ncpus = num_online_cpus() - 1; /* Do not include first CPU */
 
 	if (!should_fadump_crash())
 		return;
@@ -613,6 +625,8 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 	old_cpu = cmpxchg(&crashing_cpu, -1, this_cpu);
 
 	if (old_cpu != -1) {
+		atomic_inc(&cpus_in_crash);
+
 		/*
 		 * We can't loop here indefinitely. Wait as long as fadump
 		 * is in force. If we race with fadump un-registration this
@@ -636,6 +650,16 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 
 	fdh->online_mask = *cpu_online_mask;
 
+	/*
+	 * If we came in via system reset, wait a while for the secondary
+	 * CPUs to enter.
+	 */
+	if (TRAP(&(fdh->regs)) == 0x100) {
+		msecs = CRASH_TIMEOUT;
+		while ((atomic_read(&cpus_in_crash) < ncpus) && (--msecs > 0))
+			mdelay(1);
+	}
+
 	fw_dump.ops->fadump_trigger(fdh, str);
 }
 
-- 
2.25.4


^ permalink raw reply related

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Rich Felker @ 2020-06-04 19:37 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GNU C Library, eery, Daniel Kolesa, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <CAGWvnymiNUWSMeGNDvJk_uc=8OdS1Y_8uE=cd0s2BKCFFe7Dkg@mail.gmail.com>

On Thu, Jun 04, 2020 at 03:00:51PM -0400, David Edelsohn wrote:
> On Thu, Jun 4, 2020 at 1:46 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Thu, Jun 04, 2020 at 12:33:12PM -0500, Segher Boessenkool wrote:
> > > On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> > > > On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > > > > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > > > > well, ppc64le already cannot be run on those, as far as I know (I
> > > > > > don't think it's possible to build ppc64le userland without VSX in
> > > > > > any configuration)
> > > > >
> > > > > VSX is required by the ELFv2 ABI:
> > > > >
> > > > > """
> > > > > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > > > > compliant processors must implement the following categories:
> > > >
> > > > This is not actually ABI but IBM policy laundered into an ABI
> > > > document, which musl does not honor.
> > >
> > > It is the ABI.  If you think it should be different, make your own ABI,
> > > don't pretend the existing ABI is different than what it is.  Thank you.
> >
> > Our ABI is as specified in the ELFv2 document, but with ld as ld64,
> > and minus gratuitous requirements on ISA level that are not part of
> > implementing linkage.
> 
> Rich,
> 
> If you are changing the Power ELFv2 ABI then it is not the Power ELFv2
> ABI.  You can't cherry-pick what you like and claim that it is
> compatible.  You are not conforming to the ABI.

You are aware of this and you are aware that I don't care about your
opinion on the matter, and that I don't appreciate your ongoing
harassment of users of musl who run on pre-POWER8 hardware that IBM
does not approve them using.

Rich

^ permalink raw reply

* Re: [PATCH] mm: Fix pud_alloc_track()
From: Andrew Morton @ 2020-06-04 20:36 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arch, Stephen Rothwell, jroedel, linux-mm, peterz,
	linuxppc-dev, Joerg Roedel, linux-kernel, Steven Rostedt,
	Abdul Haleem, linux-next, Satheesh Rajendran, Andy Lutomirski,
	manvanth, hch
In-Reply-To: <20200604164814.GA7600@kernel.org>

On Thu, 4 Jun 2020 19:48:14 +0300 Mike Rapoport <rppt@kernel.org> wrote:

> On Thu, Jun 04, 2020 at 09:44:46AM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <jroedel@suse.de>
> > 
> > The pud_alloc_track() needs to do different checks based on whether
> > __ARCH_HAS_5LEVEL_HACK is defined, like it already does in
> > pud_alloc(). Otherwise it causes boot failures on PowerPC.
> > 
> > Provide the correct implementations for both possible settings of
> > __ARCH_HAS_5LEVEL_HACK to fix the boot problems.
> 
> There is a patch in mmotm [1] that completely removes
> __ARCH_HAS_5LEVEL_HACK which is a part of the series [2] that updates
> p4d folding accross architectures. This should fix boot on PowerPC and
> the addition of pXd_alloc_track() for __ARCH_HAS_5LEVEL_HACK wouldn't be
> necessary.
> 
> 
> [1] https://github.com/hnaz/linux-mm/commit/cfae68792af3731ac902ea6ba5ed8df5a0f6bd2f
> [2] https://lore.kernel.org/kvmarm/20200414153455.21744-1-rppt@kernel.org/

That patchset is stacked up behind many other patches, including all
the powerpc stuff in linux-next :(

As it's a big bug fix, I'll pull those patches forward, hopefully send it
all Linuswards later today...

^ permalink raw reply

* Re: [RFC][PATCH v3 1/5] sparc64: Fix asm/percpu.h build error
From: Peter Zijlstra @ 2020-06-04 20:41 UTC (permalink / raw)
  To: David Miller
  Cc: linux-s390, linuxppc-dev, bigeasy, x86, heiko.carstens,
	linux-kernel, rostedt, a.darwish, sparclinux, tglx, will, mingo
In-Reply-To: <20200604165703.GG3976@hirez.programming.kicks-ass.net>

On Thu, Jun 04, 2020 at 06:57:03PM +0200, Peter Zijlstra wrote:

> I think I see, what happens is that these headers end up in the VDSO
> build, and that doesn't have these CFLAGS, because userspace.
> 
> Let me see what to do about that.

I feel like the below is cheating, but it's the best I could find :/
VDSO including kernel headers and the utter maze that our kernel headers
are makes it really hard to untangle :/

This builds sparc64-defconfig and sparc64-all{no,mod}config.

Dave, does this work for you, or should I try hardder?

---
 arch/sparc/include/asm/percpu_64.h  | 2 ++
 arch/sparc/include/asm/trap_block.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/sparc/include/asm/percpu_64.h b/arch/sparc/include/asm/percpu_64.h
index 32ef6f05cc565..a8786a4b90b6b 100644
--- a/arch/sparc/include/asm/percpu_64.h
+++ b/arch/sparc/include/asm/percpu_64.h
@@ -4,7 +4,9 @@

 #include <linux/compiler.h>

+#ifndef BUILD_VDSO
 register unsigned long __local_per_cpu_offset asm("g5");
+#endif

 #ifdef CONFIG_SMP

diff --git a/arch/sparc/include/asm/trap_block.h b/arch/sparc/include/asm/trap_block.h
index 0f6d0c4f66838..ace0d48e837e5 100644
--- a/arch/sparc/include/asm/trap_block.h
+++ b/arch/sparc/include/asm/trap_block.h
@@ -2,6 +2,8 @@
 #ifndef _SPARC_TRAP_BLOCK_H
 #define _SPARC_TRAP_BLOCK_H

+#include <linux/threads.h>
+
 #include <asm/hypervisor.h>
 #include <asm/asi.h>



^ permalink raw reply related

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Daniel Kolesa @ 2020-06-04 20:39 UTC (permalink / raw)
  To: Segher Boessenkool, Rich Felker
  Cc: libc-alpha, eery, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604173312.GI31009@gate.crashing.org>

On Thu, Jun 4, 2020, at 19:33, Segher Boessenkool wrote:
> On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> > On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > > well, ppc64le already cannot be run on those, as far as I know (I
> > > > don't think it's possible to build ppc64le userland without VSX in
> > > > any configuration)
> > > 
> > > VSX is required by the ELFv2 ABI:
> > > 
> > > """
> > > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > > compliant processors must implement the following categories:
> > 
> > This is not actually ABI but IBM policy laundered into an ABI
> > document, which musl does not honor.
> 
> It is the ABI.  If you think it should be different, make your own ABI,
> don't pretend the existing ABI is different than what it is.  Thank you.
> 

Well then - in that case, what do you suggest that I do?

Void currently ships an ELFv2 (or apparently not, I guess) 64-bit big endian port that works on 970/G5 up. It is important to me that it stays that way (a large amount of users are running 970s, so introducing a VSX dependency means I might as well abandon the port entirely).

It currently works out of box - there are no changes required in glibc, and nearly the entire userland builds and works (about ~11500 out of ~12000 software packages, those that don't work either don't work on ppc64le either, or have issues related to endianness, or some other unrelated reason).

I'd like to eventually get this into a state where I don't have to worry about glibc arbitrarily breaking it - which means it would be necessary to stabilize it upstream. While I can probably maintain a downstream patchset when it comes to it, I'd much prefer if I didn't have to - but this sounds like an official ELFv2 glibc BE port would be impossible unless the VSX requirement (and thus IEEE 128-bit long double and so on) was in place, which would defeat the point of the port.

Is there *any* way I can take that would make upstreams of all parts of the toolchain happy? I explicitly don't want to go back to ELFv1. While at it, I'd like to transition to ld64 long double format, to match musl and improve software compatibility, which I feel will raise more objections from IBM side.

> 
> Segher
>

Daniel

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Ira Weiny @ 2020-06-04 20:51 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	Guenter Roeck, linux-xtensa, Borislav Petkov, Al Viro,
	Andy Lutomirski, Thomas Gleixner, linux-arm-kernel, Chris Zankel,
	Thomas Bogendoerfer, linux-parisc, linux-kernel, Christian Koenig,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200604094133.GC202650@kernel.org>

On Thu, Jun 04, 2020 at 12:41:33PM +0300, Mike Rapoport wrote:
> On Wed, Jun 03, 2020 at 04:44:17PM -0700, Guenter Roeck wrote:
> > 
> > sparc32 smp images in next-20200603 still crash for me with a spinlock
> > recursion. s390 images hang early in boot. Several others (alpha, arm64,
> > various ppc) don't even compile. I can run some more bisects over time,
> > but this is becoming a full-time job :-(.
> 
> I've been able to bisect s390 hang to commit b614345f52bc ("x86/entry:
> Clarify irq_{enter,exit}_rcu()").
> 
> After this commit, lockdep_hardirq_exit() is called twice on s390 (and
> others) - one time in irq_exit_rcu() and another one in irq_exit():
> 
> /**
>  * irq_exit_rcu() - Exit an interrupt context without updating RCU
>  *
>  * Also processes softirqs if needed and possible.
>  */
> void irq_exit_rcu(void)
> {
> 	__irq_exit_rcu();
> 	 /* must be last! */
> 	lockdep_hardirq_exit();
> }
> 
> /**
>  * irq_exit - Exit an interrupt context, update RCU and lockdep
>  *
>  * Also processes softirqs if needed and possible.
>  */
> void irq_exit(void)
> {
> 	irq_exit_rcu();
> 	rcu_irq_exit();
> 	 /* must be last! */
> 	lockdep_hardirq_exit();
> }
> 
> Removing the call in irq_exit() make s390 boot again, and judgung by the
> x86 entry code, the comment /* must be last! */ is stale...

FWIW I got s390 to compile and this patch fixes s390 booting for me as well.

13:05:25 > /home/iweiny/dev/linux-build-test/rootfs/s390/run-qemu-s390.sh 
Build reference: next-20200603-4-g840714292d8c

Building s390:defconfig:initrd ... running ........... passed
Building s390:defconfig:virtio-blk-ccw:rootfs ... running ........... passed
Building s390:defconfig:scsi[virtio-ccw]:rootfs ... running ..............  passed
Building s390:defconfig:virtio-pci:rootfs ... running ........... passed
Building s390:defconfig:scsi[virtio-pci]:rootfs ... running ........... passed

Ira

> 
> @Peter, @Thomas, can you comment please?
> 
> From e51d50ee6f4d1f446decf91c2c67230da14ff82c Mon Sep 17 00:00:00 2001
> From: Mike Rapoport <rppt@linux.ibm.com>
> Date: Thu, 4 Jun 2020 12:37:03 +0300
> Subject: [PATCH] softirq: don't call lockdep_hardirq_exit() twice
> 
> After commit b614345f52bc ("x86/entry: Clarify irq_{enter,exit}_rcu()")
> lockdep_hardirq_exit() is called twice on every architecture that uses
> irq_exit(): one time in irq_exit_rcu() and another one in irq_exit().
> 
> Remove the extra call in irq_exit().
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  kernel/softirq.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index a3eb6eba8c41..7523f4ce4c1d 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -427,7 +427,6 @@ static inline void __irq_exit_rcu(void)
>  void irq_exit_rcu(void)
>  {
>  	__irq_exit_rcu();
> -	 /* must be last! */
>  	lockdep_hardirq_exit();
>  }
>  
> @@ -440,8 +439,6 @@ void irq_exit(void)
>  {
>  	irq_exit_rcu();
>  	rcu_irq_exit();
> -	 /* must be last! */
> -	lockdep_hardirq_exit();
>  }
>  
>  /*
> -- 
> 2.26.2
> 
> 
> 
> > Guenter
> 
> -- 
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v3 2/7] documentation for stats_fs
From: Emanuele Giuseppe Esposito @ 2020-06-04 15:34 UTC (permalink / raw)
  To: Randy Dunlap, Emanuele Giuseppe Esposito, kvm
  Cc: linux-s390, linux-doc, netdev, linux-kernel, kvm-ppc,
	Jonathan Adams, Christian Borntraeger, Alexander Viro,
	David Rientjes, linux-fsdevel, Paolo Bonzini, linux-mips,
	linuxppc-dev, linux-arm-kernel, Jim Mattson
In-Reply-To: <c9ddaed1-0efc-650b-6a51-ad5fc431af69@infradead.org>

Hi,

>> +
>> +The STATS_FS_HIDDEN attribute won't affect the aggregation, it will only
>> +block the creation of the files.
> 
> Why does HIDDEN block the creation of files?  instead of their visibility?

The file itself is used to allow the user to view the content of a 
value. In order to make it hidden, the framework just doesn't create the 
file.
The structure is still present and considered in statsfs, however.

Hidden in this case means not visible at all thus not created, not the 
hidden file concept of dotted files (".filename")

> 
>> +
>> +Add values to parent and child (also here order doesn't matter)::
>> +
>> +        struct kvm *base_ptr = kmalloc(..., sizeof(struct kvm));
>> +        ...
>> +        stats_fs_source_add_values(child_source, kvm_stats, base_ptr, 0);
>> +        stats_fs_source_add_values(parent_source, kvm_stats, NULL, STATS_FS_HIDDEN);
>> +
>> +``child_source`` will be a simple value, since it has a non-NULL base
>> +pointer, while ``parent_source`` will be an aggregate. During the adding
>> +phase, also values can optionally be marked as hidden, so that the folder
>> +and other values can be still shown.
>> +
>> +Of course the same ``struct stats_fs_value`` array can be also passed with a
>> +different base pointer, to represent the same value but in another instance
>> +of the kvm struct.
>> +
>> +Search:
>> +
>> +Fetch a value from the child source, returning the value
>> +pointed by ``(uint64_t *) base_ptr + kvm_stats[0].offset``::
>> +
>> +        uint64_t ret_child, ret_parent;
>> +
>> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
>> +
>> +Fetch an aggregate value, searching all subsources of ``parent_source`` for
>> +the specified ``struct stats_fs_value``::
>> +
>> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
>> +
>> +        assert(ret_child == ret_parent); // check expected result
>> +
>> +To make it more interesting, add another child::
>> +
>> +        struct stats_fs_source child_source2 = stats_fs_source_create(0, "child2");
>> +
>> +        stats_fs_source_add_subordinate(parent_source, child_source2);
>> +        // now  the structure is parent -> child1
>> +        //                              -> child2
> 
> Is that the same as                 parent -> child1 -> child2
> ?  It could almost be read as
>                                      parent -> child1
>                                      parent -> child2

No the example in the documentation shows the relationship
parent -> child1 and
parent -> child2.
It's not the same as
parent -> child1 -> child2.
In order to do the latter, one would need to do:

stats_fs_source_add_subordinate(parent_source, child_source1);
stats_fs_source_add_subordinate(child_source1, child_source2);

Hope that this clarifies it.

> 
> Whichever it is, can you make it more explicit, please?
> 
> 
>> +
>> +        struct kvm *other_base_ptr = kmalloc(..., sizeof(struct kvm));
>> +        ...
>> +        stats_fs_source_add_values(child_source2, kvm_stats, other_base_ptr, 0);
>> +
>> +Note that other_base_ptr points to another instance of kvm, so the struct
>> +stats_fs_value is the same but the address at which they point is not.
>> +
>> +Now get the aggregate value::
>> +
>> +        uint64_t ret_child, ret_child2, ret_parent;
>> +
>> +        stats_fs_source_get_value(child_source, &kvm_stats[0], &ret_child);
>> +        stats_fs_source_get_value(parent_source, &kvm_stats[0], &ret_parent);
>> +        stats_fs_source_get_value(child_source2, &kvm_stats[0], &ret_child2);
>> +
>> +        assert((ret_child + ret_child2) == ret_parent);
>> +
>> +Cleanup::
>> +
>> +        stats_fs_source_remove_subordinate(parent_source, child_source);
>> +        stats_fs_source_revoke(child_source);
>> +        stats_fs_source_put(child_source);
>> +
>> +        stats_fs_source_remove_subordinate(parent_source, child_source2);
>> +        stats_fs_source_revoke(child_source2);
>> +        stats_fs_source_put(child_source2);
>> +
>> +        stats_fs_source_put(parent_source);
>> +        kfree(other_base_ptr);
>> +        kfree(base_ptr);
>> +
>> +Calling stats_fs_source_revoke is very important, because it will ensure
> 
>             stats_fs_source_revoke()
> 
>> +that stats_fs will not access the data that were passed to
>> +stats_fs_source_add_value for this source.
>> +
>> +Because open files increase the reference count for a stats_fs_source, the
>> +source can end up living longer than the data that provides the values for
>> +the source.  Calling stats_fs_source_revoke just before the backing data
> 
>                          stats_fs_source_revoke()
> 
>> +is freed avoids accesses to freed data structures. The sources will return
>> +0.
>> +
>> +This is not needed for the parent_source, since it just contains
>> +aggregates that would be 0 anyways if no matching child value exist.
>> +
>> +API Documentation
>> +=================
>> +
>> +.. kernel-doc:: include/linux/stats_fs.h
>> +   :export: fs/stats_fs/*.c
>> \ No newline at end of file
> 
> Please fix that. ^^^^^
> 
> 
> Thanks for the documentation.
> 

Thank you for the feedback,
Emanuele

^ permalink raw reply

* Re: Boot issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-04 15:46 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Darren Stevens, linuxppc-dev, Christoph Hellwig, R.T.Dickinson
In-Reply-To: <f7f1b233-6101-2316-7996-4654586b7d24@csgroup.eu>

[-- Attachment #1: Type: text/plain, Size: 1111 bytes --]

Hello Christophe,

I tested it on my Nemo board with a P.A. Semi PA6T CPU [1], on my Cyrus+ board with a FSL P5040 CPU [2] and in a virtual e5500 QEMU machine. You can find the kernel configs in the following package.

Link: http://www.xenosoft.de/linux-image-5.8-alpha1-X1000_X5000.tar.gz

Cheers,
Christian

[1] https://en.m.wikipedia.org/wiki/AmigaOne_X1000
[2] https://www.amigaos.net/hardware/133/amigaone-x5000


> On 4. Jun 2020, at 16:29, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> 
>> Le 04/06/2020 à 16:26, Christophe Leroy a écrit :
>> Hi,
>>> Le 04/06/2020 à 16:16, Christian Zigotzky a écrit :
>>> Hi All,
>>> 
>>> I tested the latest Git kernel today. [1]. Unfortunately it doesn't boot on my PowerPC machines.
>>> 
>>> Could you please test the latest Git kernel with your PowerPC machine?
>>> 
>>> BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
>>> 
>> Which machine/platform ? Which defconfig are you using ?
> 
> 
> And are you able to perform a 'git bisect' to identify the guilty commit ?
> 
> Thanks
> Christophe

[-- Attachment #2: Type: text/html, Size: 2635 bytes --]

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: David Edelsohn @ 2020-06-04 19:00 UTC (permalink / raw)
  To: musl
  Cc: GNU C Library, eery, Daniel Kolesa, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604174613.GP1079@brightrain.aerifal.cx>

On Thu, Jun 4, 2020 at 1:46 PM Rich Felker <dalias@libc.org> wrote:
>
> On Thu, Jun 04, 2020 at 12:33:12PM -0500, Segher Boessenkool wrote:
> > On Thu, Jun 04, 2020 at 01:18:44PM -0400, Rich Felker wrote:
> > > On Thu, Jun 04, 2020 at 12:12:32PM -0500, Segher Boessenkool wrote:
> > > > On Tue, Jun 02, 2020 at 05:13:25PM +0200, Daniel Kolesa wrote:
> > > > > well, ppc64le already cannot be run on those, as far as I know (I
> > > > > don't think it's possible to build ppc64le userland without VSX in
> > > > > any configuration)
> > > >
> > > > VSX is required by the ELFv2 ABI:
> > > >
> > > > """
> > > > Specifically, to use this ABI and ABI-compliant programs, OpenPOWER-
> > > > compliant processors must implement the following categories:
> > >
> > > This is not actually ABI but IBM policy laundered into an ABI
> > > document, which musl does not honor.
> >
> > It is the ABI.  If you think it should be different, make your own ABI,
> > don't pretend the existing ABI is different than what it is.  Thank you.
>
> Our ABI is as specified in the ELFv2 document, but with ld as ld64,
> and minus gratuitous requirements on ISA level that are not part of
> implementing linkage.

Rich,

If you are changing the Power ELFv2 ABI then it is not the Power ELFv2
ABI.  You can't cherry-pick what you like and claim that it is
compatible.  You are not conforming to the ABI.

Thanks, David

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Segher Boessenkool @ 2020-06-04 21:10 UTC (permalink / raw)
  To: Daniel Kolesa
  Cc: Rich Felker, libc-alpha, eery, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <a43aeb5d-3704-4540-969e-085790ff0477@www.fastmail.com>

Hi!

On Thu, Jun 04, 2020 at 10:39:30PM +0200, Daniel Kolesa wrote:
> On Thu, Jun 4, 2020, at 19:33, Segher Boessenkool wrote:
> > It is the ABI.  If you think it should be different, make your own ABI,
> > don't pretend the existing ABI is different than what it is.  Thank you.
> 
> Well then - in that case, what do you suggest that I do?
> 
> Void currently ships an ELFv2 (or apparently not, I guess) 64-bit big endian port that works on 970/G5 up. It is important to me that it stays that way (a large amount of users are running 970s, so introducing a VSX dependency means I might as well abandon the port entirely).

You can just clearly document what ABI changes you use, and try to make
sure that everyone who uses your distro / your tools / your ABI variant
knows about it.  Telling your users that it is ELFv2, without telling
them it is not compliant, namely X Y Z are different, is a bit of a
disservice to your users, and worse to everyone else involved.

If you always use -mcpu=970 (or similar), then not very much is
different for you most likely -- except of course there is no promise
to the user that they can use VSX and all instructions in ISA 2.07,
which is a very useful promise to have normally.

> It currently works out of box - there are no changes required in glibc, and nearly the entire userland builds and works (about ~11500 out of ~12000 software packages, those that don't work either don't work on ppc64le either, or have issues related to endianness, or some other unrelated reason).

Very nice!

> I'd like to eventually get this into a state where I don't have to worry about glibc arbitrarily breaking it - which means it would be necessary to stabilize it upstream. While I can probably maintain a downstream patchset when it comes to it, I'd much prefer if I didn't have to - but this sounds like an official ELFv2 glibc BE port would be impossible unless the VSX requirement (and thus IEEE 128-bit long double and so on) was in place, which would defeat the point of the port.
> 
> Is there *any* way I can take that would make upstreams of all parts of the toolchain happy? I explicitly don't want to go back to ELFv1.

Oh absolutely, it sounds like things are in quite good shape already!
It will safe a lot of grief on all sides if you make clear this is not
"plain" ELFv2, and in what ways it differs.

Btw, if you use GCC, *please* send in testresults?  :-)

> While at it, I'd like to transition to ld64 long double format, to match musl and improve software compatibility, which I feel will raise more objections from IBM side.

I have no idea what "ld64 long double" is?  Is that just IEEE DP float?
Aka "long double is the same as double".  That is likely easier for new
ports than "double-double", yes, even if the eventual goal should be
IEEE QP float -- a much smoother transition.

Same goes here: document it!  If your users know that the ELFv2 variant
you give them is not *the* ELFv2, but it differs in some clear ways,
everyone will be happier :-)

Segher

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Daniel Kolesa @ 2020-06-04 21:43 UTC (permalink / raw)
  To: Segher Boessenkool, musl
  Cc: Rich Felker, libc-alpha, eery, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev, Joseph Myers
In-Reply-To: <20200604211009.GK31009@gate.crashing.org>

On Thu, Jun 4, 2020, at 23:10, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jun 04, 2020 at 10:39:30PM +0200, Daniel Kolesa wrote:
> > On Thu, Jun 4, 2020, at 19:33, Segher Boessenkool wrote:
> > > It is the ABI.  If you think it should be different, make your own ABI,
> > > don't pretend the existing ABI is different than what it is.  Thank you.
> > 
> > Well then - in that case, what do you suggest that I do?
> > 
> > Void currently ships an ELFv2 (or apparently not, I guess) 64-bit big endian port that works on 970/G5 up. It is important to me that it stays that way (a large amount of users are running 970s, so introducing a VSX dependency means I might as well abandon the port entirely).
> 
> You can just clearly document what ABI changes you use, and try to make
> sure that everyone who uses your distro / your tools / your ABI variant
> knows about it.  Telling your users that it is ELFv2, without telling
> them it is not compliant, namely X Y Z are different, is a bit of a
> disservice to your users, and worse to everyone else involved.

The thing is, I've yet to see in which way the ELFv2 ABI *actually* requires VSX - I don't think compiling for 970 introduces any actual differences. There will be omissions, yes - but then the more accurate thing would be to say that a subset of ELFv2 is used, rather than it being a different ABI per se.

The ELFv2 document specifies things like passing of quadruple precision floats. Indeed, VSX is needed there, but that's not a concern if you *don't* use quadruple precision floats.

> 
> If you always use -mcpu=970 (or similar), then not very much is
> different for you most likely -- except of course there is no promise
> to the user that they can use VSX and all instructions in ISA 2.07,
> which is a very useful promise to have normally.

Yes, -mcpu=970 is used for default packages. *However*, it is possible that the user compiles their own packages with -mcpu=power9 or something similar, and then it'll be possible to utilize VSX and all, and it should still work with the existing userland. When speaking of ABI, what matters is... well, the binary interface, which is the same - so I believe this is still ELFv2. A subset is always compliant with the whole.

That's why I'm worried when you speak of introducing a new ABI. As it is, we can benefit from having the compiler being generally the same (-mabi=elfv2 producing correct results even for 970) and retaining interoperability when people compile their own code for modern targets that cover the ELFv2 ABI as a whole. As I said, it's perfectly possible for somebody to run BE Void on their POWER9 machine, then compile their software for POWER9, and still have it work with the system packages built for 970 baseline. Pretty sure glibc will still provide optimized stuff (e.g. memcpy and so on) for the modern targets based on runtime detection, too.

So the "differences" in our case come down to "This is ELFv2, except you can't strictly assume that all features are present. In general that means no quad precision floating point for you if you want things to run on 970, since you don't have VSX regs"

> 
> > It currently works out of box - there are no changes required in glibc, and nearly the entire userland builds and works (about ~11500 out of ~12000 software packages, those that don't work either don't work on ppc64le either, or have issues related to endianness, or some other unrelated reason).
> 
> Very nice!
> 
> > I'd like to eventually get this into a state where I don't have to worry about glibc arbitrarily breaking it - which means it would be necessary to stabilize it upstream. While I can probably maintain a downstream patchset when it comes to it, I'd much prefer if I didn't have to - but this sounds like an official ELFv2 glibc BE port would be impossible unless the VSX requirement (and thus IEEE 128-bit long double and so on) was in place, which would defeat the point of the port.
> > 
> > Is there *any* way I can take that would make upstreams of all parts of the toolchain happy? I explicitly don't want to go back to ELFv1.
> 
> Oh absolutely, it sounds like things are in quite good shape already!
> It will safe a lot of grief on all sides if you make clear this is not
> "plain" ELFv2, and in what ways it differs.

See above.

> 
> Btw, if you use GCC, *please* send in testresults?  :-)

Yes, it's all gcc (we do have clang, but compiling repo packages with clang is generally frowned upon in the project, as we have vast majority of packages cross-compilable, and our cross-compiling infrastructure is gcc-centric, plus we enable certain things by default such as hardening flags that clang does not support). I'll try to remember next time I'm running tests.

> 
> > While at it, I'd like to transition to ld64 long double format, to match musl and improve software compatibility, which I feel will raise more objections from IBM side.
> 
> I have no idea what "ld64 long double" is?  Is that just IEEE DP float?
> Aka "long double is the same as double".  That is likely easier for new
> ports than "double-double", yes, even if the eventual goal should be
> IEEE QP float -- a much smoother transition.

Yes, I mean double == long double (like musl, and glibc before 2.4). I don't think there is any other viable way, since the IBM double-double format is legacy/often broken and real QP is constrained by hardware in this case. We're as much of a source distro as a binary distro, so a potential ppc64 build doesn't have to have any vector requirements at all, even if the default binary packages do.

I have a feeling that glibc would object to such port, since it means it would have to exist in parallel with a potential different ELFv2 port that does have a POWER8 minimal requirement; gcc would need a way to tell them apart, too (based on what would they be told apart? the simplest way would probably be something like, if --with-abi=elfv2 { if --with-cpu < power8 -> use glibc-novsx else use glibc-vsx } ...)

> 
> Same goes here: document it!  If your users know that the ELFv2 variant
> you give them is not *the* ELFv2, but it differs in some clear ways,
> everyone will be happier :-)

See above.

> 
> 
> Segher
>

Daniel

^ permalink raw reply

* Re: [PATCH] pwm: Add missing "CONFIG_" prefix
From: Kees Cook @ 2020-06-04 21:52 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-pwm, Uwe Kleine-König, linux-kernel, Thierry Reding,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <b08611018fdb6d88757c6008a5c02fa0e07b32fb.camel@perches.com>

On Wed, Jun 03, 2020 at 04:04:31PM -0700, Joe Perches wrote:
> On Wed, 2020-06-03 at 15:40 -0700, Kees Cook wrote:
> > The IS_ENABLED() use was missing the CONFIG_ prefix which would have
> > lead to skipping this code.
> > 
> > Fixes: 3ad1f3a33286 ("pwm: Implement some checks for lowlevel drivers")
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  drivers/pwm/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
> > index 9973c442b455..6b3cbc0490c6 100644
> > --- a/drivers/pwm/core.c
> > +++ b/drivers/pwm/core.c
> > @@ -121,7 +121,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
> >  		pwm->chip->ops->get_state(pwm->chip, pwm, &pwm->state);
> >  		trace_pwm_get(pwm, &pwm->state);
> >  
> > -		if (IS_ENABLED(PWM_DEBUG))
> > +		if (IS_ENABLED(CONFIG_PWM_DEBUG))
> >  			pwm->last = pwm->state;
> >  	}
> >  
> > -- 
> > 2.25.1
> > 
> 
> more odd uses (mostly in comments)
> 
> $ git grep -P -oh '\bIS_ENABLED\s*\(\s*\w+\s*\)'| \
>   sed -r 's/\s+//g'| \
>   grep -v '(CONFIG_' | \
>   sort | uniq -c | sort -rn
>       7 IS_ENABLED(DEBUG)
>       4 IS_ENABLED(DRM_I915_SELFTEST)
>       4 IS_ENABLED(cfg)
>       2 IS_ENABLED(opt_name)
>       2 IS_ENABLED(DEBUG_PRINT_TRIE_GRAPHVIZ)
>       2 IS_ENABLED(config)
>       2 IS_ENABLED(cond)
>       2 IS_ENABLED(__BIG_ENDIAN)
>       1 IS_ENABLED(x)
>       1 IS_ENABLED(STRICT_KERNEL_RWX)
>       1 IS_ENABLED(PWM_DEBUG)
>       1 IS_ENABLED(option)
>       1 IS_ENABLED(ETHTOOL_NETLINK)
>       1 IS_ENABLED(DEBUG_RANDOM_TRIE)
>       1 IS_ENABLED(DEBUG_CHACHA20POLY1305_SLOW_CHUNK_TEST)
> 
> STRICT_KERNEL_RWX is misused here in ppc
> 
> ---
> 
> Fix pr_warn without newline too.
> 
>  arch/powerpc/mm/book3s64/hash_utils.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 51e3c15f7aff..dd60c5f2b991 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -660,11 +660,10 @@ static void __init htab_init_page_sizes(void)
>  		 * Pick a size for the linear mapping. Currently, we only
>  		 * support 16M, 1M and 4K which is the default
>  		 */
> -		if (IS_ENABLED(STRICT_KERNEL_RWX) &&
> +		if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) &&
>  		    (unsigned long)_stext % 0x1000000) {
>  			if (mmu_psize_defs[MMU_PAGE_16M].shift)
> -				pr_warn("Kernel not 16M aligned, "
> -					"disabling 16M linear map alignment");
> +				pr_warn("Kernel not 16M aligned, disabling 16M linear map alignment\n");
>  			aligned = false;
>  		}

Joe, I was going to send all of the fixes for these issues, but your
patch doesn't have a SoB. Shall I add one for the above patch?

-- 
Kees Cook

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Daniel Kolesa @ 2020-06-04 22:06 UTC (permalink / raw)
  To: Phil Blundell
  Cc: libc-alpha, eery, Will Springer, Michal Suchánek,
	linuxppc-dev
In-Reply-To: <20200604215511.GB28641@pbcl.net>

On Thu, Jun 4, 2020, at 23:55, Phil Blundell wrote:
> On Thu, Jun 04, 2020 at 10:39:30PM +0200, Daniel Kolesa wrote:
> > Is there *any* way I can take that would make upstreams of all parts 
> > of the toolchain happy? I explicitly don't want to go back to ELFv1. 
> > While at it, I'd like to transition to ld64 long double format, to 
> > match musl and improve software compatibility, which I feel will raise 
> > more objections from IBM side.
> 
> Although I don't pretend to understand all the nuances of your port, and 
> in particular I have no idea what the thing about "ld64 long double 
> format" means, this doesn't sound like a particularly unusual situation.  
> If I understand correctly, you are in the position of essentially 
> wanting to implement the calling-standard part of the ABI on hardware 
> that isn't capable of implementing the full ABI as documented.

Well, the ld64 part is a separate issue. Defining a new long double ABI would break the ELFv2 ABI, since ELFv2 says long double must be 16-byte, of either IBM double-double format or IEEE754 binary128 :)

However, when I was talking about ELFv2 on 970 being a subset, I meant with the IBM double-double format, which has been present since glibc 2.4 at least, and doesn't require any vector functionality (it works even on 32-bit PowerPC)

So, defining a new long double ABI would indeed be a change compared to standard ELFv2. But, if we were doing a new port anyway, I think it'd be potentially worth it.

> 
> If that's the case then, depending on exactly what instructions are
> missing, I think your choices are:
> 
> 1a. Define your own subset of ELFv2 which is interworkable with the full 
> ABI at the function call interface but doesn't make all the same 
> guarantees about binary compatibility.  That would mean that a binary 
> built with your toolchain and conforming to the subset ABI would run on 
> any system that implements the full ELFv2 ABI, but the opposite is not 
> necessarily true.  There should be no impediment to getting support for 
> such an ABI upstream in any part of the GNU toolchain where it's 
> required if you can demonstrate that there's a non-trivial userbase for 
> it.  The hardest part may be thinking of a name.

Yes, this is the approach I would like to take.

> 
> 1b. Or, if the missing instructions are severe enough that it simply 
> isn't possible to have an interworkable implementation, you just need to 
> define your own ABI that fits your needs.  You can still borrow as much 
> as necessary from ELFv2 but you definitely need to call it something 
> else at that point.  All the other comments from 1a above still apply.
> 
> 2. Implement kernel emulation for the missing instructions.  If they
> are seldom used in practice then this might be adequate.  Of course,
> binaries that use them intensively will be slow; you'd have to judge
> whether this is better or worse than having them not run at all.  If
> you do this then you can implement the full ELFv2 ABI; your own
> toolchain might still choose not to use the instructions that it knows
> are going to be emulated, but at least other binaries will still run
> and you can call yourself compatible.
> 
> 3. Persuade whoever controls the ELFv2 ABI to relax their requirements.
> But I assume they didn't make the original decision capriciously so
> this might be hard/impossible.  ABI definitions from hardware vendors
> are always slightly political and we just have to accept this.

IBM has their commercial interests here and I don't think it'd be wise to take this kind of path. Implementing a new variant would probably be better; if we were documenting such differences, it'd probably be worthwhile to sync up with musl, since it'd be exactly the same ABI.

> 
> FWIW, we faced a similar situation about 20 years ago when the then-new 
> ARM EABI was defined.  This essentially required implementations to 
> support the ARMv5T instruction set; the committee that defined the ABI 
> took the view that requiring implementations to cater for older 
> architectures would be too onerous.  It was entirely possible to 
> implement 99% of the EABI on older processors; such implementations 
> weren't strictly conforming but they were interworkable enough to be 
> useful in practice, and the "almost-EABI" was still significantly
> better than what had gone before.
> 
> Phil
>

Daniel

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Joseph Myers @ 2020-06-04 22:08 UTC (permalink / raw)
  To: Daniel Kolesa
  Cc: Rich Felker, libc-alpha, eery, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev
In-Reply-To: <60fa8bd7-2439-4403-a0eb-166a2fb49a4b@www.fastmail.com>

On Thu, 4 Jun 2020, Daniel Kolesa wrote:

> The ELFv2 document specifies things like passing of quadruple precision 
> floats. Indeed, VSX is needed there, but that's not a concern if you 
> *don't* use quadruple precision floats.

My understanding is that the registers used for argument passing are all 
ones that exactly correspond to the Vector registers in earlier 
instruction set versions.  In other words, you could *in principle* 
produce an object, or a whole libm shared library, that (a) passes or 
receives _Float128 values in registers, (b) does not use any instructions 
beyond those available with -mcpu=970, (c) would work as intended whether 
executed on a 970 or on POWER8 and (d) when executed on POWER8, would 
fully interoperate with objects receiving or passing _Float128 values and 
compiled for POWER8 to use VSX instructions for that purpose.  GCC may not 
support _Float128 for older processors, but that doesn't prevent you from 
maintaining patches to add such support.  (But if you want to support 
those 64-bit processors that don't have Vector registers at all, you 
indeed can't use binary128 and interoperate with code using VSX for that 
format in POWER8.)

(Cf. how the Arm hard-float ABI variant works even on processors with 
single-precision-only VFP, because such processors still have the 
double-precision loads and stores although not double-precision 
arithmetic.  When working on that ABI support in GCC some years ago, I 
also made sure that GNU vector types corresponding to NEON vector types 
were passed consistently for the hard-float ABI whether or not any vector 
instructions were present - thus, avoiding depending on the machine modes 
for those vector types because GCC could choose a different machine mode 
depending on the instructions available.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Daniel Kolesa @ 2020-06-04 22:26 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Rich Felker, libc-alpha, eery, musl, Will Springer,
	Palmer Dabbelt via binutils, via libc-dev, Michal Suchánek,
	linuxppc-dev
In-Reply-To: <alpine.DEB.2.21.2006042154060.8237@digraph.polyomino.org.uk>

On Fri, Jun 5, 2020, at 00:08, Joseph Myers wrote:
> On Thu, 4 Jun 2020, Daniel Kolesa wrote:
> 
> > The ELFv2 document specifies things like passing of quadruple precision 
> > floats. Indeed, VSX is needed there, but that's not a concern if you 
> > *don't* use quadruple precision floats.
> 
> My understanding is that the registers used for argument passing are all 
> ones that exactly correspond to the Vector registers in earlier 
> instruction set versions.  In other words, you could *in principle* 
> produce an object, or a whole libm shared library, that (a) passes or 
> receives _Float128 values in registers, (b) does not use any instructions 
> beyond those available with -mcpu=970, (c) would work as intended whether 
> executed on a 970 or on POWER8 and (d) when executed on POWER8, would 
> fully interoperate with objects receiving or passing _Float128 values and 
> compiled for POWER8 to use VSX instructions for that purpose.  GCC may not 
> support _Float128 for older processors, but that doesn't prevent you from 
> maintaining patches to add such support.  (But if you want to support 
> those 64-bit processors that don't have Vector registers at all, you 
> indeed can't use binary128 and interoperate with code using VSX for that 
> format in POWER8.)

There's a potential userbase with 64-bit BE processors from Freescale/NXP that don't have any AltiVec support, I believe they are still in production - I'd like to retain support for these targets, as well as older IBM processors. The userland generally also supports that, and we've had multiple requests for support of this kind of hardware.

And while implementing it with just VMX may be possible, most hardware running this ABI wouldn't have any support for quad precision FP, and would perform better with using just double-precision.

We're not a commercial project, so we're just trying to support users within the FOSS community; I definitely wouldn't mind having this be just an ABI variant parallel to the others. Using 64-bit long doubles also has the benefit of being the same ABI as musl, which would enable things such as gcompat to work.

Either way I'll think about it some more and possibly prepare an RFC port. I'm definitely willing to put in the work and later maintenance effort if that's what it takes to make it happen.

> 
> (Cf. how the Arm hard-float ABI variant works even on processors with 
> single-precision-only VFP, because such processors still have the 
> double-precision loads and stores although not double-precision 
> arithmetic.  When working on that ABI support in GCC some years ago, I 
> also made sure that GNU vector types corresponding to NEON vector types 
> were passed consistently for the hard-float ABI whether or not any vector 
> instructions were present - thus, avoiding depending on the machine modes 
> for those vector types because GCC could choose a different machine mode 
> depending on the instructions available.)
> 
> -- 
> Joseph S. Myers
> joseph@codesourcery.com
>

Daniel

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Phil Blundell @ 2020-06-04 21:55 UTC (permalink / raw)
  To: Daniel Kolesa
  Cc: libc-alpha, eery, Will Springer, Michal Suchánek,
	linuxppc-dev
In-Reply-To: <a43aeb5d-3704-4540-969e-085790ff0477@www.fastmail.com>

On Thu, Jun 04, 2020 at 10:39:30PM +0200, Daniel Kolesa wrote:
> Is there *any* way I can take that would make upstreams of all parts 
> of the toolchain happy? I explicitly don't want to go back to ELFv1. 
> While at it, I'd like to transition to ld64 long double format, to 
> match musl and improve software compatibility, which I feel will raise 
> more objections from IBM side.

Although I don't pretend to understand all the nuances of your port, and 
in particular I have no idea what the thing about "ld64 long double 
format" means, this doesn't sound like a particularly unusual situation.  
If I understand correctly, you are in the position of essentially 
wanting to implement the calling-standard part of the ABI on hardware 
that isn't capable of implementing the full ABI as documented.

If that's the case then, depending on exactly what instructions are
missing, I think your choices are:

1a. Define your own subset of ELFv2 which is interworkable with the full 
ABI at the function call interface but doesn't make all the same 
guarantees about binary compatibility.  That would mean that a binary 
built with your toolchain and conforming to the subset ABI would run on 
any system that implements the full ELFv2 ABI, but the opposite is not 
necessarily true.  There should be no impediment to getting support for 
such an ABI upstream in any part of the GNU toolchain where it's 
required if you can demonstrate that there's a non-trivial userbase for 
it.  The hardest part may be thinking of a name.

1b. Or, if the missing instructions are severe enough that it simply 
isn't possible to have an interworkable implementation, you just need to 
define your own ABI that fits your needs.  You can still borrow as much 
as necessary from ELFv2 but you definitely need to call it something 
else at that point.  All the other comments from 1a above still apply.

2. Implement kernel emulation for the missing instructions.  If they
are seldom used in practice then this might be adequate.  Of course,
binaries that use them intensively will be slow; you'd have to judge
whether this is better or worse than having them not run at all.  If
you do this then you can implement the full ELFv2 ABI; your own
toolchain might still choose not to use the instructions that it knows
are going to be emulated, but at least other binaries will still run
and you can call yourself compatible.

3. Persuade whoever controls the ELFv2 ABI to relax their requirements.
But I assume they didn't make the original decision capriciously so
this might be hard/impossible.  ABI definitions from hardware vendors
are always slightly political and we just have to accept this.

FWIW, we faced a similar situation about 20 years ago when the then-new 
ARM EABI was defined.  This essentially required implementations to 
support the ARMv5T instruction set; the committee that defined the ABI 
took the view that requiring implementations to cater for older 
architectures would be too onerous.  It was entirely possible to 
implement 99% of the EABI on older processors; such implementations 
weren't strictly conforming but they were interworkable enough to be 
useful in practice, and the "almost-EABI" was still significantly
better than what had gone before.

Phil

^ permalink raw reply

* Re: [PATCH] net: ethernet: freescale: remove unneeded include for ucc_geth
From: David Miller @ 2020-06-04 22:56 UTC (permalink / raw)
  To: valentin; +Cc: netdev, leoyang.li, linuxppc-dev, kuba
In-Reply-To: <20200603212823.12501-1-valentin@longchamp.me>

From: Valentin Longchamp <valentin@longchamp.me>
Date: Wed,  3 Jun 2020 23:28:23 +0200

> net/sch_generic.h does not need to be included, remove it.
> 
> Signed-off-by: Valentin Longchamp <valentin@longchamp.me>

Applied.

^ permalink raw reply

* Re: [musl] Re: ppc64le and 32-bit LE userland compatibility
From: Segher Boessenkool @ 2020-06-04 23:06 UTC (permalink / raw)
  To: Phil Blundell
  Cc: libc-alpha, eery, Daniel Kolesa, Will Springer,
	Michal Suchánek, linuxppc-dev
In-Reply-To: <20200604215511.GB28641@pbcl.net>

Hi!

On Thu, Jun 04, 2020 at 11:55:11PM +0200, Phil Blundell wrote:
> 1a. Define your own subset of ELFv2 which is interworkable with the full 
> ABI at the function call interface but doesn't make all the same 
> guarantees about binary compatibility.  That would mean that a binary 
> built with your toolchain and conforming to the subset ABI would run on 
> any system that implements the full ELFv2 ABI, but the opposite is not 
> necessarily true.  There should be no impediment to getting support for 
> such an ABI upstream in any part of the GNU toolchain where it's 
> required if you can demonstrate that there's a non-trivial userbase for 
> it.  The hardest part may be thinking of a name.

And you can only use shared objects also built for that subset ABI.  If
you use some binary distribution, then it will also have to be built for
that subset, practically anyway.

This is very similar to soft-float targets.  There are "standard" ways
to deal with this.  Distros usually balk at having to maintain multiple
variants of a target, and users do not usually want to be restricted to
the lowest common denominator.  There always is that tension.

> 1b. Or, if the missing instructions are severe enough that it simply 
> isn't possible to have an interworkable implementation, you just need to 
> define your own ABI that fits your needs.  You can still borrow as much 
> as necessary from ELFv2 but you definitely need to call it something 
> else at that point.  All the other comments from 1a above still apply.

A different name is handy in casual conversation then, yes; but also in
case 1a, it should be clear what is what somehow.

> 2. Implement kernel emulation for the missing instructions.  If they
> are seldom used in practice then this might be adequate.  Of course,
> binaries that use them intensively will be slow; you'd have to judge
> whether this is better or worse than having them not run at all.  If
> you do this then you can implement the full ELFv2 ABI; your own
> toolchain might still choose not to use the instructions that it knows
> are going to be emulated, but at least other binaries will still run
> and you can call yourself compatible.

But not just instructions, there are actual new registers!  This might
be way too much work in the case of VSX.

But it is possible that implementing QP float (binary128) this way is
a feasible way forward, _if_ you have AltiVec enabled.

> 3. Persuade whoever controls the ELFv2 ABI to relax their requirements.
> But I assume they didn't make the original decision capriciously so
> this might be hard/impossible.  ABI definitions from hardware vendors
> are always slightly political and we just have to accept this.

There is more process involved than most open source people are
comfortable with :-/

> FWIW, we faced a similar situation about 20 years ago when the then-new 
> ARM EABI was defined.  This essentially required implementations to 
> support the ARMv5T instruction set; the committee that defined the ABI 
> took the view that requiring implementations to cater for older 
> architectures would be too onerous.  It was entirely possible to 
> implement 99% of the EABI on older processors; such implementations 
> weren't strictly conforming but they were interworkable enough to be 
> useful in practice, and the "almost-EABI" was still significantly
> better than what had gone before.

Yeah, this situation is quite similar in some ways :-)

The compilers should be able to adjust to what you need pretty easily.
Since you seem to have a distribution on-board already, the biggest
hurdle left is getting glibc to accept the new port, I think.  I don't
know if it will be easy to them, or a lot of work instead.

Thanks,


Segher

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox