* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 8:38 UTC (permalink / raw)
To: Andrew Morton, Michael Ellerman
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604165246.436f02ba@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 1314 bytes --]
Hi all,
On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 25c3cb8272c0..a6799723cd98 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1008,6 +1008,12 @@ extern struct page *p4d_page(p4d_t p4d);
> #define pud_page_vaddr(pud) __va(pud_val(pud) & ~PUD_MASKED_BITS)
> #define p4d_page_vaddr(p4d) __va(p4d_val(p4d) & ~P4D_MASKED_BITS)
>
> +static inline unsigned long pgd_index(unsigned long address)
> +{
> + return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
> +}
> +#define pgd_index pgd_index
> +
> #define pte_ERROR(e) \
> pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
> #define pmd_ERROR(e) \
I have added that hunk to linux-next for tomorrow as a fix for
mm-consolidate-pgd_index-and-pgd_offset_k-definitions.
Its not strickly necessary, but Michael expressed a preference for the
inline function. I was wondering if pgd_index "Must be a compile-time
constant" on one (or a few) architectures, then why not leave the
default as an inline function and special case it as a macro where
needed ...
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 8:45 UTC (permalink / raw)
To: Andrew Morton, Michael Ellerman
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604165246.436f02ba@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 3455 bytes --]
Hi all,
On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --git a/arch/powerpc/mm/kasan/8xx.c b/arch/powerpc/mm/kasan/8xx.c
> index db4ef44af22f..569d98a41881 100644
> --- a/arch/powerpc/mm/kasan/8xx.c
> +++ b/arch/powerpc/mm/kasan/8xx.c
> @@ -10,7 +10,7 @@
> static int __init
> kasan_init_shadow_8M(unsigned long k_start, unsigned long k_end, void *block)
> {
> - pmd_t *pmd = pmd_ptr_k(k_start);
> + pmd_t *pmd = pmd_off_k(k_start);
> unsigned long k_cur, k_next;
>
> for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd += 2, block += SZ_8M) {
> @@ -59,7 +59,7 @@ int __init kasan_init_region(void *start, size_t size)
> return ret;
>
> for (; k_cur < k_end; k_cur += PAGE_SIZE) {
> - pmd_t *pmd = pmd_ptr_k(k_cur);
> + pmd_t *pmd = pmd_off_k(k_cur);
> void *va = block + k_cur - k_start;
> pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>
> diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
> index 4bc491a4a1fd..a32b4640b9de 100644
> --- a/arch/powerpc/mm/kasan/book3s_32.c
> +++ b/arch/powerpc/mm/kasan/book3s_32.c
> @@ -46,7 +46,7 @@ int __init kasan_init_region(void *start, size_t size)
> kasan_update_early_region(k_start, k_cur, __pte(0));
>
> for (; k_cur < k_end; k_cur += PAGE_SIZE) {
> - pmd_t *pmd = pmd_ptr_k(k_cur);
> + pmd_t *pmd = pmd_off_k(k_cur);
> void *va = block + k_cur - k_start;
> pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>
> diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
> index 286441bbbe49..92e8929cbe3e 100644
> --- a/arch/powerpc/mm/nohash/8xx.c
> +++ b/arch/powerpc/mm/nohash/8xx.c
> @@ -74,7 +74,7 @@ static pte_t __init *early_hugepd_alloc_kernel(hugepd_t *pmdp, unsigned long va)
> static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
> pgprot_t prot, int psize, bool new)
> {
> - pmd_t *pmdp = pmd_ptr_k(va);
> + pmd_t *pmdp = pmd_off_k(va);
> pte_t *ptep;
>
> if (WARN_ON(psize != MMU_PAGE_512K && psize != MMU_PAGE_8M))
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 45a0556089e8..1136257c3a99 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -264,7 +264,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> #if defined(CONFIG_PPC_8xx)
> void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
> {
> - pmd_t *pmd = pmd_ptr(mm, addr);
> + pmd_t *pmd = pmd_off(mm, addr);
> pte_basic_t val;
> pte_basic_t *entry = &ptep->pte;
> int num = is_hugepd(*((hugepd_t *)pmd)) ? 1 : SZ_512K / SZ_4K;
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index e2d054c9575e..6eb4eab79385 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -40,7 +40,7 @@ notrace void __init early_ioremap_init(void)
> {
> unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE);
> pte_t *ptep = (pte_t *)early_fixmap_pagetable;
> - pmd_t *pmdp = pmd_ptr_k(addr);
> + pmd_t *pmdp = pmd_off_k(addr);
>
> for (; (s32)(FIXADDR_TOP - addr) > 0;
> addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++)
I have added the above hunks as to linux-next for tomorrow as a fix for
mm-pgtable-add-shortcuts-for-accessing-kernel-pmd-and-pte.
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH 13/13] fs: move binfmt_misc sysctl to its own file
From: Xiaoming Ni @ 2020-06-04 8:45 UTC (permalink / raw)
To: Luis Chamberlain, keescook, yzaikin, ebiederm, axboe, clemens,
arnd, gregkh, jani.nikula, joonas.lahtinen, rodrigo.vivi, airlied,
daniel, benh, rdna, viro, mark, jlbec, joseph.qi, vbabka, sfr,
jack, amir73il, rafael, tytso
Cc: wangle6, intel-gfx, linux-kernel, dri-devel, julia.lawall,
alex.huangjianhui, laiyuanyuan.lai, akpm, linuxppc-dev,
ocfs2-devel
In-Reply-To: <20200529074108.16928-14-mcgrof@kernel.org>
On 2020/5/29 15:41, Luis Chamberlain wrote:
> This moves the binfmt_misc sysctl to its own file to help remove
> clutter from kernel/sysctl.c.
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> fs/binfmt_misc.c | 1 +
> kernel/sysctl.c | 7 -------
> 2 files changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index f69a043f562b..656b3f5f3bbf 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -821,6 +821,7 @@ static int __init init_misc_binfmt(void)
> int err = register_filesystem(&bm_fs_type);
> if (!err)
> insert_binfmt(&misc_format);
> + register_sysctl_empty_subdir("fs", "binfmt_misc");
> return err;
> }
build error when CONFIG_BINFMT_MISC=m
ERROR: modpost: "register_sysctl_empty_subdir" [fs/binfmt_misc.ko]
undefined!
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 27f0c9ea..4129dfb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2853,6 +2853,7 @@ void register_sysctl_empty_subdir(const char *base,
{
register_sysctl_subdir(base, subdir, sysctl_mount_point);
}
+EXPORT_SYMBOL_GPL(register_sysctl_empty_subdir);
#endif /* CONFIG_SYSCTL */
Thanks
Xiaoming Ni
^ permalink raw reply related
* Re: [mainline][Oops][bisected 2ba3e6 ] 5.7.0 boot fails with kernel panic on powerpc
From: Naresh Kamboju @ 2020-06-04 8:55 UTC (permalink / raw)
To: Joerg Roedel
Cc: sachinp, Stephen Rothwell, linux-kernel, manvanth, Abdul Haleem,
linux-next, Steven Rostedt, aneesh.kumar, lkft-triage,
Andrew Morton, linuxppc-dev, hch
In-Reply-To: <20200603133257.GL6857@suse.de>
On Wed, 3 Jun 2020 at 19:03, Joerg Roedel <jroedel@suse.de> wrote:
>
> On Wed, Jun 03, 2020 at 04:20:57PM +0530, Abdul Haleem wrote:
> > @Joerg, Could you please have a look?
>
> Can you please try the attached patch?
@Joerg, Linaro test farm noticed this kernel crash on nxp ls2088
Machine model: Freescale Layerscape 2088A RDB Board
while booting Linux mainline 5.7.0 version kernel.
After applying your proposed patch fixed boot problem.
Tested-by: Naresh Kamboju <nareshj.kamboju@linaro.org>
Test ref:
https://lavalab.nxp.com/scheduler/job/23787#L426
Here is the kernel crash log before patch applied,
[ 0.000000] Linux version 5.7.0-03887-gf6aee505c71b
(TuxBuild@ecb9ef34f06f) (gcc version 9.3.0 (Debian 9.3.0-8), GNU ld
(GNU Binutils for Debian) 2.34) #1 SMP PREEMPT Wed Jun 3 18:21:26 UTC
2020
[ 0.000000] Machine model: Freescale Layerscape 2088A RDB Board
<>
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] Unable to handle kernel paging request at virtual
address fffeffff80000000
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x96000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004
[ 0.000000] CM = 0, WnR = 0
[ 0.000000] [fffeffff80000000] address between user and kernel address ranges
[ 0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
5.7.0-03887-gf6aee505c71b #1
[ 0.000000] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
[ 0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO BTYPE=--)
[ 0.000000] pc : map_kernel_range_noflush+0xc0/0x280
[ 0.000000] lr : __vmalloc_node_range+0x154/0x2a0
[ 0.000000] sp : ffffb3b1dcbc3e20
[ 0.000000] x29: ffffb3b1dcbc3e20 x28: fffeffff80000000
[ 0.000000] x27: ffff800010004000 x26: ffff800010000000
[ 0.000000] x25: 0000000000402dc2 x24: ffffb3b1dc53c000
[ 0.000000] x23: 0068000000000f13 x22: 0000000000000004
[ 0.000000] x21: ffffb3b1dc53cf48 x20: 0000000000000000
[ 0.000000] x19: ffffb3b1dc627800 x18: 00000000000000c0
[ 0.000000] x17: 0000000000000000 x16: 0000000000000007
[ 0.000000] x15: dead000000000100 x14: fffffe020b990600
[ 0.000000] x13: dead000000000122 x12: 0000000000000001
[ 0.000000] x11: 0000000000000000 x10: ffff0082fe3fdec0
[ 0.000000] x9 : ffff0082fe342d58 x8 : ffff4cd121ba5000
[ 0.000000] x7 : ffff808010000000 x6 : 0000000000000004
[ 0.000000] x5 : 000000000000fffd x4 : 0000000000004000
[ 0.000000] x3 : ffff800050000000 x2 : 0001000080000000
[ 0.000000] x1 : 0000000000000000 x0 : ffff800010003fff
[ 0.000000] Call trace:
[ 0.000000] map_kernel_range_noflush+0xc0/0x280
[ 0.000000] __vmalloc_node_range+0x154/0x2a0
[ 0.000000] __vmalloc_node+0x5c/0x70
[ 0.000000] init_IRQ+0xac/0xf8
[ 0.000000] start_kernel+0x2d0/0x4dc
[ 0.000000] Code: f90047e0 d503201f d2a80003 8b030343 (f9400380)
[ 0.000000] random: get_random_bytes called from
print_oops_end_marker+0x2c/0x58 with crng_init=0
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
ref:
https://lavalab.nxp.com/scheduler/job/23596#L603
--
Linaro LKFT
https://lkft.linaro.org
^ permalink raw reply
* RE: [RESEND PATCH v9 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-04 9:05 UTC (permalink / raw)
To: Williams, Dan J, linuxppc-dev@lists.ozlabs.org,
linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org
Cc: Aneesh Kumar K . V, Santosh Sivaraj, Oliver O'Halloran,
Weiny, Ira, Steven Rostedt
In-Reply-To: <BN6PR11MB413223B333153721405DFD91C6890@BN6PR11MB4132.namprd11.prod.outlook.com>
Hi Dan,
Thanks for review and insights on this. My responses below:
"Williams, Dan J" <dan.j.williams@intel.com> writes:
> [ forgive formatting I'm temporarily stuck using Outlook this week... ]
>
>> From: Vaibhav Jain <vaibhav@linux.ibm.com>
> [..]
>>
>> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
>> module and add the command family NVDIMM_FAMILY_PAPR to the white
>> list of NVDIMM command sets. Also advertise support for ND_CMD_CALL for
>> the nvdimm command mask and implement necessary scaffolding in the
>> module to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>
>> The layout of the PDSM request as we expect from libnvdimm/libndctl is
>> described in newly introduced uapi header 'papr_pdsm.h' which defines a
>> new 'struct nd_pdsm_cmd_pkg' header. This header is used to communicate
>> the PDSM request via member 'nd_cmd_pkg.nd_command' and size of
>> payload that need to be sent/received for servicing the PDSM.
>>
>> A new function is_cmd_valid() is implemented that reads the args to
>> papr_scm_ndctl() and performs sanity tests on them. A new function
>> papr_scm_service_pdsm() is introduced and is called from
>> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
>> command from libnvdimm.
>>
>> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>> Changelog:
>>
>> Resend:
>> * Added ack from Aneesh.
>>
>> v8..v9:
>> * Reduced the usage of term SCM replacing it with appropriate
>> replacement [ Dan Williams, Aneesh ]
>> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
>> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
>> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
>> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
>> * Minor update to patch description.
>>
>> v7..v8:
>> * Removed the 'payload_offset' field from 'struct
>> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
>> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
>> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
>> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>> [ Ira ]
>>
>> Resend:
>> * None
>>
>> v6..v7 :
>> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>> [Mpe].
>> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
>> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>> [Mpe].
>> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>>
>> v5..v6 :
>> * Changed the usage of the term DSM to PDSM to distinguish it from the
>> ACPI term [ Dan Williams ]
>> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various
>> struct
>> to reflect the new terminology.
>> * Updated the patch description and title to reflect the new terminology.
>> * Squashed patch to introduce new command family in 'ndctl.h' with
>> this patch [ Dan Williams ]
>> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>> [ Dan Williams ]
>> * Removed redundant license text from the papr_scm_psdm.h file.
>> [ Dan Williams ]
>> * s/envelop/envelope/ at various places [ Dan Williams ]
>> * Added '__packed' attribute to command package header to gaurd
>> against different compiler adding paddings between the fields.
>> [ Dan Williams]
>> * Converted various pr_debug to dev_debug [ Dan Williams ]
>>
>> v4..v5 :
>> * None
>>
>> v3..v4 :
>> * None
>>
>> v2..v3 :
>> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>>
>> v1..v2 :
>> * None
>> ---
>> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136
>> ++++++++++++++++++++++ arch/powerpc/platforms/pseries/papr_scm.c |
>> 101 +++++++++++++++-
>> include/uapi/linux/ndctl.h | 1 +
>> 3 files changed, 232 insertions(+), 6 deletions(-) create mode 100644
>> arch/powerpc/include/uapi/asm/papr_pdsm.h
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> new file mode 100644
>> index 000000000000..6407fefcc007
>> --- /dev/null
>> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> @@ -0,0 +1,136 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> +/*
>> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
>> + *
>> + * (C) Copyright IBM 2020
>> + *
>> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com> */
>> +
>> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/*
>> + * PDSM Envelope:
>> + *
>> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel
>> +via
>> + * envelope which consists of a header and user-defined payload sections.
>> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
>> + * There is reserved field that can used to introduce new fields to the
>> + * structure in future. It also tries to ensure that
>> 'nd_pdsm_cmd_pkg.payload'
>> + * lies at a 8-byte boundary.
>> + *
>> + * +-------------+---------------------+---------------------------+
>> + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_pdsm_cmd_pkg | |
>> + * |-------------+ | |
>> + * | nd_cmd_pkg | | |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_family | | |
>> + * | nd_size_out | cmd_status | |
>> + * | nd_size_in | payload_version | payload |
>> + * | nd_command | reserved | |
>> + * | nd_fw_size | | |
>> + * +-------------+---------------------+---------------------------+
>> + *
>> + * PDSM Header:
>> + *
>> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to
>> member
>> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the
>> envelope
>> +which is
>> + * contained in 'struct nd_cmd_pkg', the header also has members
>> +following
>> + * members:
>> + *
>> + * 'cmd_status' : (Out) Errors if any encountered while
>> servicing PDSM.
>> + * 'payload_version' : (In/Out) Version number associated with the
>> payload.
>> + * 'reserved' : Not used and reserved for future.
>> + *
>> + * PDSM Payload:
>> + *
>> + * The layout of the PDSM Payload is defined by various structs shared
>> +between
>> + * papr_scm and libndctl so that contents of payload can be
>> +interpreted. During
>> + * servicing of a PDSM the papr_scm module will read input args from
>> +the payload
>> + * field by casting its contents to an appropriate struct pointer based
>> +on the
>> + * PDSM command. Similarly the output of servicing the PDSM command
>> +will be
>> + * copied to the payload field using the same struct.
>> + *
>> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size,
>> +which
>> + * leaves around 176 bytes for the envelope payload (ignoring any
>> +padding that
>> + * the compiler may silently introduce).
>> + *
>> + * Payload Version:
>> + *
>> + * A 'payload_version' field is present in PDSM header that indicates a
>> +specific
>> + * version of the structure present in PDSM Payload for a given PDSM
>> command.
>> + * This provides backward compatibility in case the PDSM Payload
>> +structure
>> + * evolves and different structures are supported by 'papr_scm' and
>> 'libndctl'.
>> + *
>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send
>> +the version
>> + * of the payload struct it supports via 'payload_version' field. The
>> 'papr_scm'
>> + * module when servicing the PDSM envelope checks the 'payload_version'
>> +and then
>> + * uses 'payload struct version' == MIN('payload_version field',
>> + * 'max payload-struct-version supported by papr_scm') to service the
>> PDSM.
>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of
>> +payload
>> + * struct in returned 'payload_version' field.
>> + *
>> + * Libndctl on receiving the envelope back from papr_scm again checks
>> +the
>> + * 'payload_version' field and based on it use the appropriate version
>> +dsm
>> + * struct to parse the results.
>> + *
>> + * Backward Compatibility:
>> + *
>> + * Above scheme of exchanging different versioned PDSM struct between
>> +libndctl
>> + * and papr_scm should provide backward compatibility until following
>> +two
>> + * assumptions/conditions when defining new PDSM structs hold:
>> + *
>> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> + *
>> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> + * i.e Each new version of PDSM struct should retain existing struct
>> + * attributes from previous version
>> + *
>> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> + * it should also support T(1), T(2)...T(X - 1).
>> + * i.e When adding support for new version of a PDSM struct, libndctl
>> + * and papr_scm should retain support of the existing PDSM struct
>> + * version they support.
>> + */
>> +
>> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from
>> libnvdimm
>> +*/ struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-
>> cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 reserved[5]; /* Ignored and to be used in future */
>> + __u16 payload_version; /* In/Out: version of the payload */
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to
>> the
>> +kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct */
>> +enum papr_pdsm {
>> + PAPR_PDSM_MIN = 0x0,
>> + PAPR_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */ static inline
>> +struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg
>> *cmd) {
>> + return (struct nd_pdsm_cmd_pkg *) cmd; }
>> +
>> +/* Return the payload pointer for a given pcmd */ static inline void
>> +*pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd) {
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)(pcmd->payload);
>> +}
>> +
>> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 149431594839..5e2237e7ec08 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -15,13 +15,15 @@
>> #include <linux/seq_buf.h>
>>
>> #include <asm/plpar_wrappers.h>
>> +#include <asm/papr_pdsm.h>
>>
>> #define BIND_ANY_ADDR (~0ul)
>>
>> #define PAPR_SCM_DIMM_CMD_MASK \
>> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> + (1ul << ND_CMD_CALL))
>>
>> /* DIMM health bitmap bitmap indicators */
>> /* SCM device is unable to persist memory contents */ @@ -350,16 +352,97
>> @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> return 0;
>> }
>>
>> +/*
>> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> + * This also does initial sanity validation to ND_CMD_CALL sub-command
>> packages.
>> + */
>> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void
>> *buf,
>> + unsigned int buf_len)
>> +{
>> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> + struct papr_scm_priv *p;
>> +
>> + /* Only dimm-specific calls are supported atm */
>> + if (!nvdimm)
>> + return -EINVAL;
>> +
>> + /* get the provider date from struct nvdimm */
>> + p = nvdimm_provider_data(nvdimm);
>> +
>> + if (!test_bit(cmd, &cmd_mask)) {
>> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> + return -EINVAL;
>> + } else if (cmd == ND_CMD_CALL) {
>> +
>> + /* Verify the envelope package */
>> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> + buf_len);
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify that the PDSM family is valid */
>> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg
>> family=0x%llx\n",
>> + pkg->hdr.nd_family);
>> + return -EINVAL;
>> +
>> + }
>> +
>> + /* We except a payload with all PDSM commands */
>> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> + dev_dbg(&p->pdev->dev,
>> + "Empty payload for sub-command=0x%llx\n",
>> + pkg->hdr.nd_command);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + /* Command looks valid */
>
<snip>
> So this is where I would expect the kernel to validate the command vs
> a known list of supported commands / payloads. One of the goals of
> requiring public documentation of any commands that libnvdimm might
> support for the ioctl path is to give the kernel the ability to gate
> future enabling on consideration of a common kernel front-end
> interface. I believe this would also address questions about the
> versioning scheme because userspace would be actively prevented from
> sending command payloads that were not first explicitly enabled in the
> kernel. This interface as it stands in this patch set seems to be a
> very thin / "anything goes" passthrough with no consideration for that
> policy.
>
> As an example of the utility of this policy, consider the recent
> support for nvdimm security commands that allow a passphrase to be set
> and issue commands like "unlock" and "secure erase". The kernel
> actively prevents those commands from being sent from userspace. See
> acpi_nfit_clear_to_send() and nd_cmd_clear_to_send(). The reasoning is
> that it enforces the kernel's nvdimm security model that uses
> encrypted/trusted keys to protect key material (clear text keys
> only-ever exist in kernel-space). Yes, that restriction is painful for
> people that don't want the kernel's security model and just want the
> simplicity of passing clear-text keys around, but it's necessary for
> the kernel to have any chance to provide a common abstraction across
> vendors. The pain of negotiating every single command with what the
> kernel will support is useful for the long term health of the
> kernel. It forces ongoing conversations across vendors to consolidate
> interfaces and reuse kernel best practices like encrypted/trusted
> keys. Code acceptance is the only real gate the kernel has to enforce
> cooperation across vendors.
>
> The expectation is that the kernel does not allow any command to pass
> that is not explicitly listed in a bitmap of known commands. I would
> expect that if you changed the payload of an existing command that
> would likely require a new entry in this bitmap. The goal is to give
> the kernel a chance to constrain the passthrough interface to afford a
> chance to have a discussion of what might done in a common
> implementation. Another example is the label-area read-write
> commands. The kernel needs explicit control to ensure that it owns the
> label area and that userspace is not able to corrupt it (write it
> behind the kernel's back).
>
> Now that said, I have battle scars with some OEMs that just want a
> generic passthrough interface so they never need to work with the
> kernel community again and can just write their custom validation
> tooling and be done. I've mostly been successful in that fight outside
> of the gaping hole of ND_CMD_VENDOR. That's the path that ipmctl has
> used to issue commands that have not made it into the public
> specification on docs.pmem.io. My warning shot for that is the
> "disable_vendor_specific" module option that administrators can set to
> only allow commands that the kernel explicitly knows the effects of to
> be issued. The result is only tooling / enabling that submits to this
> auditing regime is guaranteed to work everywhere.
Agree with points made above. With this patchset we arent really trying
to push an ioctl passthrough to exchange arbitary data with
papr-scm module. Nor do we want to bypass the kernel community for any
future enhancements on this interface. We made some design choices based on
our understanding of certain restriction we saw in
ndctl/libndctl. Specifically wanted to avoid issuing two CMD_CALL ioctl
roundtrips.
That being said I had an extended discussion with Aneesh rethinking the
'version' field and we both agreed *to remove this field* from the
proposed 'struct nd_pdsm_cmd_pkg'. This should resolve the contentions
around this Patch-4 in this patchset. Since the 'version' field isnt
extensively used right now the impact on the patchset would be small.
>
> So, that long explanation out of the way, what does that mean for this
> patch set? I'd like to understand if you still see a need for a
> versioning scheme if the implementation is required to explicitly list
> all the commands it supports? I.e. that the kernel need not worry
> about userspace sending future unknown payloads because unknown
> payloads are blocked. Also if your interface has anything similar to a
> "vendor specific" passthrough I would like to require that go through
> the ND_CMD_VENDOR ioctl, so that the kernel still has a common check
> point to prevent vendor specific "I don't want to talk to the kernel
> community" shenanigans, but even better if ND_CMD_VENDOR is something
> the kernel can eventually jettison because nobody is using it.
As I mentioned above this isn't a 'vendor specific passthrough'
machenism. The 'version' field was proposed to avoid two CMD_CALL ioctl
roundtrip to fetch and report extended nvdimm health data like
'life-remaining' which isnt always available for papr-scm.
However we just realized instead of relying on 'version' field we can
advertise support for these extended attributes via nvdimm-flags from
sysfs. Looking at the nvdimm-flags libndctl can use an appropriate
pdsm command and struct to fetch the dimm health information from
papr_scm via CMD_CALL.
But thats something we plan to do in future and not with the current
patchset which only reports fixed set of nvdimm health attributes.
>
> I feel like this is a conversation that will take a few days to
> resolve, which does not leave time to push this for v5.8. That said, I
> do think the health flags patches at the beginning of this series are
> low risk and uncontentious. How about I merge those for v5.8 and
> circle back to get this ioctl path queued early in v5.8-rc? Apologies
> for the late feedback on this relative to v5.8.
>
Thanks for this consideration. Agree to the proposal. However changes to
patchset with removal of 'version' field is fairly small hence can
quickly push an updated patch series cumulating rest of the review
comments from Ira.
Does that sounds reasonable ?
Thanks,
~ Vaibhav
^ permalink raw reply
* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Guenter Roeck @ 2020-06-04 9:38 UTC (permalink / raw)
To: Ira Weiny
Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
James E.J. Bottomley, Max Filippov, Paul Mackerras,
H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
linux-xtensa, Borislav Petkov, Al Viro, Andy Lutomirski,
Thomas Gleixner, linux-arm-kernel, Chris Zankel,
Thomas Bogendoerfer, linux-parisc, linux-kernel, Christian Koenig,
Andrew Morton, linuxppc-dev, David S. Miller, Mike Rapoport
In-Reply-To: <20200604062226.GA1740345@iweiny-DESK2.sc.intel.com>
On 6/3/20 11:22 PM, Ira Weiny wrote:
[ ... ]
>
> s390: (does not compile)
>
> <stdin>:1511:2: warning: #warning syscall clone3 not implemented [-Wcpp]
> In file included from ./arch/sparc/include/asm/bug.h:6:0,
> from ./include/linux/bug.h:5,
> from ./include/linux/mmdebug.h:5,
> from ./include/linux/mm.h:9,
> from mm/huge_memory.c:8:
> mm/huge_memory.c: In function 'hugepage_init':
> ./include/linux/compiler.h:403:38: error: call to '__compiletime_assert_127' declared with attribute error: BUILD_BUG_ON failed: ((13 + (13-3))-13) >= 9
> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> ^
> ./include/linux/compiler.h:384:4: note: in definition of macro '__compiletime_assert'
> prefix ## suffix(); \
> ^~~~~~
> ./include/linux/compiler.h:403:2: note: in expansion of macro '_compiletime_assert'
> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> ^~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> ^~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
> ^~~~~~~~~~~~~~~~
> ./include/linux/bug.h:24:4: note: in expansion of macro 'BUILD_BUG_ON'
> BUILD_BUG_ON(cond); \
> ^~~~~~~~~~~~
> mm/huge_memory.c:403:2: note: in expansion of macro 'MAYBE_BUILD_BUG_ON'
> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER >= MAX_ORDER);
> ^~~~~~~~~~~~~~~~~~
> make[1]: *** [scripts/Makefile.build:267: mm/huge_memory.o] Error 1
> make[1]: *** Waiting for unfinished jobs....
> make: *** [Makefile:1735: mm] Error 2
> make: *** Waiting for unfinished jobs....
> ------------
>
>
> The s390 error is the same on Linus' master and linux-next. So whatever is
> causing that has slipped into mainline and/or is something I've broken in the
> test scripts.
>
Compiler version related. gcc version 8.x and later no longer work.
Bisect points to commit a148866489f ("sched: Replace rq::wake_list").
Oddly enough x86 images are broken as well. You'll have to use an
older version of gcc (or presumably clang) until this is fixed.
Guenter
^ permalink raw reply
* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Mike Rapoport @ 2020-06-04 9:41 UTC (permalink / raw)
To: Guenter Roeck
Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
James E.J. Bottomley, Max Filippov, Paul Mackerras,
H. Peter Anvin, sparclinux, Ira Weiny, Dan Williams, Helge Deller,
x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
linux-xtensa, Borislav Petkov, Al Viro, Andy Lutomirski,
Thomas Gleixner, linux-arm-kernel, Chris Zankel,
Thomas Bogendoerfer, linux-parisc, linux-kernel, Christian Koenig,
Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <3538c8ad-674e-d310-d870-4ef6888092ed@roeck-us.net>
On Wed, Jun 03, 2020 at 04:44:17PM -0700, Guenter Roeck wrote:
>
> sparc32 smp images in next-20200603 still crash for me with a spinlock
> recursion. s390 images hang early in boot. Several others (alpha, arm64,
> various ppc) don't even compile. I can run some more bisects over time,
> but this is becoming a full-time job :-(.
I've been able to bisect s390 hang to commit b614345f52bc ("x86/entry:
Clarify irq_{enter,exit}_rcu()").
After this commit, lockdep_hardirq_exit() is called twice on s390 (and
others) - one time in irq_exit_rcu() and another one in irq_exit():
/**
* irq_exit_rcu() - Exit an interrupt context without updating RCU
*
* Also processes softirqs if needed and possible.
*/
void irq_exit_rcu(void)
{
__irq_exit_rcu();
/* must be last! */
lockdep_hardirq_exit();
}
/**
* irq_exit - Exit an interrupt context, update RCU and lockdep
*
* Also processes softirqs if needed and possible.
*/
void irq_exit(void)
{
irq_exit_rcu();
rcu_irq_exit();
/* must be last! */
lockdep_hardirq_exit();
}
Removing the call in irq_exit() make s390 boot again, and judgung by the
x86 entry code, the comment /* must be last! */ is stale...
@Peter, @Thomas, can you comment please?
From e51d50ee6f4d1f446decf91c2c67230da14ff82c Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.ibm.com>
Date: Thu, 4 Jun 2020 12:37:03 +0300
Subject: [PATCH] softirq: don't call lockdep_hardirq_exit() twice
After commit b614345f52bc ("x86/entry: Clarify irq_{enter,exit}_rcu()")
lockdep_hardirq_exit() is called twice on every architecture that uses
irq_exit(): one time in irq_exit_rcu() and another one in irq_exit().
Remove the extra call in irq_exit().
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
kernel/softirq.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index a3eb6eba8c41..7523f4ce4c1d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -427,7 +427,6 @@ static inline void __irq_exit_rcu(void)
void irq_exit_rcu(void)
{
__irq_exit_rcu();
- /* must be last! */
lockdep_hardirq_exit();
}
@@ -440,8 +439,6 @@ void irq_exit(void)
{
irq_exit_rcu();
rcu_irq_exit();
- /* must be last! */
- lockdep_hardirq_exit();
}
/*
--
2.26.2
> Guenter
--
Sincerely yours,
Mike.
^ permalink raw reply related
* Re: [PATCH v4 08/14] powerpc: add support for folded p4d page tables
From: Qian Cai @ 2020-06-04 9:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Rich Felker, linux-ia64, Geert Uytterhoeven, linux-sh, linux-mm,
Paul Mackerras, linux-hexagon, Will Deacon, kvmarm, Jonas Bonn,
linux-arch, Brian Cain, Marc Zyngier, Russell King, Ley Foon Tan,
Mike Rapoport, Catalin Marinas, Julien Thierry, uclinux-h8-devel,
Fenghua Yu, Arnd Bergmann, Suzuki K Poulose, kvm-ppc,
Stefan Kristiansson, openrisc, Stafford Horne, Guan Xuetao,
linux-arm-kernel, Christophe Leroy, Tony Luck, Yoshinori Sato,
linux-kernel, James Morse, nios2-dev, linuxppc-dev, Mike Rapoport
In-Reply-To: <20200603120522.7646d56a23088416a7d3fc1a@linux-foundation.org>
> On Jun 3, 2020, at 3:05 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> A bunch of new material just landed in linux-next/powerpc.
>
> The timing is awkward! I trust this will be going into mainline during
> this merge window? If not, please drop it and repull after -rc1.
I have noticed the same pattern over and over again, i.e., many powerpc new material has only shown up in linux-next for only a few days before sending for a pull request to Linus.
There are absolutely no safe net for this kind of practice. The main problem is that Linus seems totally fine with it.
^ permalink raw reply
* linux-next: build failure on powerpc 8xx with 16k pages
From: Christophe Leroy @ 2020-06-04 10:48 UTC (permalink / raw)
To: Andrew Morton, Michael Ellerman, PowerPC, Will Deacon,
Thomas Gleixner, Stephen Rothwell
Cc: Linux Next Mailing List, Linux Kernel Mailing List
Hi all,
Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
CONFIG_PPC_4K_PAGES, getting the following build failure:
CC mm/gup.o
In file included from ./include/linux/kernel.h:11:0,
from mm/gup.c:2:
In function 'gup_hugepte.constprop',
inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
./include/linux/compiler.h:392:38: error: call to
'__compiletime_assert_257' declared with attribute error: Unsupported
access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro
'__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro
'_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro
'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro
'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
pte = READ_ONCE(*ptep);
^
In function 'gup_get_pte',
inlined from 'gup_pte_range' at mm/gup.c:2228:9,
inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
inlined from 'gup_pud_range' at mm/gup.c:2641:15,
inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
./include/linux/compiler.h:392:38: error: call to
'__compiletime_assert_254' declared with attribute error: Unsupported
access size for {READ,WRITE}_ONCE().
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:373:4: note: in definition of macro
'__compiletime_assert'
prefix ## suffix(); \
^
./include/linux/compiler.h:392:2: note: in expansion of macro
'_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
./include/linux/compiler.h:405:2: note: in expansion of macro
'compiletime_assert'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
./include/linux/compiler.h:291:2: note: in expansion of macro
'compiletime_assert_rwonce_type'
compiletime_assert_rwonce_type(x); \
^
mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
return READ_ONCE(*ptep);
^
make[2]: *** [mm/gup.o] Error 1
Bisected to:
2ab3a0a02905 (HEAD, refs/bisect/bad) READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses
Christophe
^ permalink raw reply
* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 11:08 UTC (permalink / raw)
To: Andrew Morton, Michael Ellerman
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604174925.3610fdd1@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]
Hi all,
On Thu, 4 Jun 2020 17:49:25 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --cc arch/powerpc/include/asm/nohash/32/pgtable.h
> index 639f3b3713ec,eb8538c85077..1927e1b653f2
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@@ -204,13 -205,6 +205,9 @@@ static inline void pmd_clear(pmd_t *pmd
> *pmdp = __pmd(0);
> }
>
> -
> - /* to find an entry in a kernel page-table-directory */
> - #define pgd_offset_k(address) pgd_offset(&init_mm, address)
> -
> +/* to find an entry in a page-table-directory */
> +#define pgd_index(address) ((address) >> PGDIR_SHIFT)
> +#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
>
> /*
> * PTE updates. This function is called whenever an existing
> @@@ -240,7 -234,7 +237,7 @@@ static inline pte_basic_t pte_update(st
> pte_basic_t old = pte_val(*p);
> pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
> int num, i;
> -- pmd_t *pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
> ++ pmd_t *pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, addr), addr), addr), addr);
>
> if (!huge)
> num = PAGE_SIZE / SZ_4K;
I have added those hunks (more or less) to linux-next for tomorrow as a
fix for mm-consolidate-pgd_index-and-pgd_offset_k-definitions.
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Will Deacon @ 2020-06-04 11:17 UTC (permalink / raw)
To: Christophe Leroy
Cc: Stephen Rothwell, peterz, Linux Kernel Mailing List,
Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <dc2b16e1-b719-5500-508d-ae97bf50c4a6@csgroup.eu>
Hi, [+Peter]
On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
> Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
> CONFIG_PPC_4K_PAGES, getting the following build failure:
>
> CC mm/gup.o
> In file included from ./include/linux/kernel.h:11:0,
> from mm/gup.c:2:
> In function 'gup_hugepte.constprop',
> inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
> declared with attribute error: Unsupported access size for
> {READ,WRITE}_ONCE().
> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> ^
> ./include/linux/compiler.h:373:4: note: in definition of macro
> '__compiletime_assert'
> prefix ## suffix(); \
> ^
> ./include/linux/compiler.h:392:2: note: in expansion of macro
> '_compiletime_assert'
> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> ^
> ./include/linux/compiler.h:405:2: note: in expansion of macro
> 'compiletime_assert'
> compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
> ^
> ./include/linux/compiler.h:291:2: note: in expansion of macro
> 'compiletime_assert_rwonce_type'
> compiletime_assert_rwonce_type(x); \
> ^
> mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
> pte = READ_ONCE(*ptep);
> ^
> In function 'gup_get_pte',
> inlined from 'gup_pte_range' at mm/gup.c:2228:9,
> inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
> inlined from 'gup_pud_range' at mm/gup.c:2641:15,
> inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
> inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
> inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
At first glance, this looks like a real bug in the 16k page code -- you're
loading the pte non-atomically on the fast GUP path and so you're prone to
tearing, which probably isn't what you want. For a short-term hack, I'd
suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
you want to support this them you'll need to rework your pte_t so that it
can be loaded atomically.
Will
^ permalink raw reply
* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Stephen Rothwell @ 2020-06-04 11:22 UTC (permalink / raw)
To: Andrew Morton, Michael Ellerman
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604174925.3610fdd1@canb.auug.org.au>
[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]
Hi all,
On Thu, 4 Jun 2020 17:49:25 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> diff --cc arch/powerpc/include/asm/nohash/32/pgtable.h
> index 639f3b3713ec,eb8538c85077..1927e1b653f2
> --- a/arch/powerpc/include/asm/nohash/32/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
> @@@ -342,15 -334,6 +337,10 @@@ static inline int pte_young(pte_t pte
> pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
> #endif
>
> - /* Find an entry in the third-level page table.. */
> - #define pte_index(address) \
> - (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
> +#define pte_offset_kernel(dir, addr) \
> + (pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
> + pte_index(addr))
> - #define pte_offset_map(dir, addr) pte_offset_kernel((dir), (addr))
> - static inline void pte_unmap(pte_t *pte) { }
> +
> /*
> * Encode and decode a swap entry.
> * Note that the bits we use in a PTE for representing a swap entry
I have added this hunk (sort of - see below) to linux-next for tomorrow
as a fix for mm-consolidate-pte_index-and-pte_offset_-definitions.
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 4 Jun 2020 21:16:19 +1000
Subject: [PATCH] mm-consolidate-pte_index-and-pte_offset_-definitions-fix
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
arch/powerpc/include/asm/nohash/32/pgtable.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index c188a6f64bcd..d94bcd117c5b 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -341,6 +341,10 @@ static inline int pte_young(pte_t pte)
pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
#endif
+#define pte_offset_kernel(dir, addr) \
+ (pmd_bad(*(dir)) ? NULL : (pte_t *)pmd_page_vaddr(*(dir)) + \
+ pte_index(addr))
+
/*
* Encode and decode a swap entry.
* Note that the bits we use in a PTE for representing a swap entry
--
2.27.0.rc2
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply related
* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Michael Ellerman @ 2020-06-04 11:28 UTC (permalink / raw)
To: Stephen Rothwell, Andrew Morton
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604183805.6f384b23@canb.auug.org.au>
Stephen Rothwell <sfr@canb.auug.org.au> writes:
> Hi all,
>
> On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 25c3cb8272c0..a6799723cd98 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1008,6 +1008,12 @@ extern struct page *p4d_page(p4d_t p4d);
>> #define pud_page_vaddr(pud) __va(pud_val(pud) & ~PUD_MASKED_BITS)
>> #define p4d_page_vaddr(p4d) __va(p4d_val(p4d) & ~P4D_MASKED_BITS)
>>
>> +static inline unsigned long pgd_index(unsigned long address)
>> +{
>> + return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
>> +}
>> +#define pgd_index pgd_index
>> +
>> #define pte_ERROR(e) \
>> pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
>> #define pmd_ERROR(e) \
>
> I have added that hunk to linux-next for tomorrow as a fix for
> mm-consolidate-pgd_index-and-pgd_offset_k-definitions.
>
> Its not strickly necessary, but Michael expressed a preference for the
> inline function.
That was because we just recently converted it into a static inline to
avoid UBSAN warnings:
commit c2e929b18cea6cbf71364f22d742d9aad7f4677a
Author: Qian Cai <cai@lca.pw>
AuthorDate: Thu Mar 5 23:48:52 2020 -0500
powerpc/64s/pgtable: fix an undefined behaviour
Booting a power9 server with hash MMU could trigger an undefined
behaviour because pud_offset(p4d, 0) will do,
0 >> (PAGE_SHIFT:16 + PTE_INDEX_SIZE:8 + H_PMD_INDEX_SIZE:10)
Fix it by converting pud_index() and friends to static inline
functions.
UBSAN: shift-out-of-bounds in arch/powerpc/mm/ptdump/ptdump.c:282:15
shift exponent 34 is too large for 32-bit type 'int'
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200303+ #13
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
ubsan_epilogue+0x18/0x78
__ubsan_handle_shift_out_of_bounds+0x160/0x21c
walk_pagetables+0x2cc/0x700
walk_pud at arch/powerpc/mm/ptdump/ptdump.c:282
(inlined by) walk_pagetables at arch/powerpc/mm/ptdump/ptdump.c:311
ptdump_check_wx+0x8c/0xf0
mark_rodata_ro+0x48/0x80
kernel_init+0x74/0x194
ret_from_kernel_thread+0x5c/0x74
> I was wondering if pgd_index "Must be a compile-time
> constant" on one (or a few) architectures, then why not leave the
> default as an inline function and special case it as a macro where
> needed ...
AIUI that requirement comes from x86 which has:
#define KERNEL_PGD_BOUNDARY pgd_index(PAGE_OFFSET)
#define KERNEL_PGD_PTRS (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)
...
#define MAX_PREALLOCATED_USER_PMDS KERNEL_PGD_PTRS
...
pgd_t *pgd_alloc(struct mm_struct *mm)
{
pgd_t *pgd;
pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS];
Which will produce a variable length array if pgd_index() isn't a
compile-time constant.
cheers
^ permalink raw reply
* Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics
From: Amit Kucheria @ 2020-06-04 11:59 UTC (permalink / raw)
To: David Rientjes
Cc: Emanuele Giuseppe Esposito, linux-s390, kvm, David Hildenbrand,
Cornelia Huck, Emanuele Giuseppe Esposito, LKML, kvm-ppc,
Jonathan Adams, Christian Borntraeger, Alexander Viro,
linux-fsdevel, Paolo Bonzini, Vitaly Kuznetsov, open list:MIPS,
linuxppc-dev, Jim Mattson
In-Reply-To: <alpine.DEB.2.22.394.2005041429210.224786@chino.kir.corp.google.com>
On Tue, May 5, 2020 at 3:07 AM David Rientjes <rientjes@google.com> wrote:
>
> On Mon, 4 May 2020, Emanuele Giuseppe Esposito wrote:
>
> > There is currently no common way for Linux kernel subsystems to expose
> > statistics to userspace shared throughout the Linux kernel; subsystems
> > have to take care of gathering and displaying statistics by themselves,
> > for example in the form of files in debugfs. For example KVM has its own
> > code section that takes care of this in virt/kvm/kvm_main.c, where it sets
> > up debugfs handlers for displaying values and aggregating them from
> > various subfolders to obtain information about the system state (i.e.
> > displaying the total number of exits, calculated by summing all exits of
> > all cpus of all running virtual machines).
> >
> > Allowing each section of the kernel to do so has two disadvantages. First,
> > it will introduce redundant code. Second, debugfs is anyway not the right
> > place for statistics (for example it is affected by lockdown)
> >
> > In this patch series I introduce statsfs, a synthetic ram-based virtual
> > filesystem that takes care of gathering and displaying statistics for the
> > Linux kernel subsystems.
> >
>
> This is exciting, we have been looking in the same area recently. Adding
> Jonathan Adams <jwadams@google.com>.
>
> In your diffstat, one thing I notice that is omitted: an update to
> Documentation/* :) Any chance of getting some proposed Documentation/
> updates with structure of the fs, the per subsystem breakdown, and best
> practices for managing the stats from the kernel level?
>
> > The file system is mounted on /sys/kernel/stats and would be already used
> > by kvm. Statsfs was initially introduced by Paolo Bonzini [1].
> >
> > Statsfs offers a generic and stable API, allowing any kind of
> > directory/file organization and supporting multiple kind of aggregations
> > (not only sum, but also average, max, min and count_zero) and data types
> > (all unsigned and signed types plus boolean). The implementation, which is
> > a generalization of KVM’s debugfs statistics code, takes care of gathering
> > and displaying information at run time; users only need to specify the
> > values to be included in each source.
> >
> > Statsfs would also be a different mountpoint from debugfs, and would not
> > suffer from limited access due to the security lock down patches. Its main
> > function is to display each statistics as a file in the desired folder
> > hierarchy defined through the API. Statsfs files can be read, and possibly
> > cleared if their file mode allows it.
> >
> > Statsfs has two main components: the public API defined by
> > include/linux/statsfs.h, and the virtual file system which should end up
> > in /sys/kernel/stats.
> >
> > The API has two main elements, values and sources. Kernel subsystems like
> > KVM can use the API to create a source, add child
> > sources/values/aggregates and register it to the root source (that on the
> > virtual fs would be /sys/kernel/statsfs).
> >
> > Sources are created via statsfs_source_create(), and each source becomes a
> > directory in the file system. Sources form a parent-child relationship;
> > root sources are added to the file system via statsfs_source_register().
> > Every other source is added to or removed from a parent through the
> > statsfs_source_add_subordinate and statsfs_source_remote_subordinate APIs.
> > Once a source is created and added to the tree (via add_subordinate), it
> > will be used to compute aggregate values in the parent source.
> >
> > Values represent quantites that are gathered by the statsfs user. Examples
> > of values include the number of vm exits of a given kind, the amount of
> > memory used by some data structure, the length of the longest hash table
> > chain, or anything like that. Values are defined with the
> > statsfs_source_add_values function. Each value is defined by a struct
> > statsfs_value; the same statsfs_value can be added to many different
> > sources. A value can be considered "simple" if it fetches data from a
> > user-provided location, or "aggregate" if it groups all values in the
> > subordinates sources that include the same statsfs_value.
> >
>
> This seems like it could have a lot of overhead if we wanted to
> periodically track the totality of subsystem stats as a form of telemetry
> gathering from userspace. To collect telemetry for 1,000 different stats,
> do we need to issue lseek()+read() syscalls for each of them individually
> (or, worse, open()+read()+close())?
>
> Any thoughts on how that can be optimized? A couple of ideas:
>
> - an interface that allows gathering of all stats for a particular
> interface through a single file that would likely be encoded in binary
> and the responsibility of userspace to disseminate, or
>
> - an interface that extends beyond this proposal and allows the reader to
> specify which stats they are interested in collecting and then the
> kernel will only provide these stats in a well formed structure and
> also be binary encoded.
Something akin to how ftrace allows you specify the list of functions
in /sys/kernel/debug/tracing/set_ftrace_filter would make this a lot
easier to use than the one-file-per-stat interface.
That would be useful, e.g. in capturing correlated stats periodically
e.g. scheduler, power and thermal stats
> We've found that the one-file-per-stat method is pretty much a show
> stopper from the performance view and we always must execute at least two
> syscalls to obtain a single stat.
>
> Since this is becoming a generic API (good!!), maybe we can discuss
> possible ways to optimize gathering of stats in mass?
>
> > For more information, please consult the kerneldoc documentation in patch
> > 2 and the sample uses in the kunit tests and in KVM.
> >
> > This series of patches is based on my previous series "libfs: group and
> > simplify linux fs code" and the single patch sent to kvm "kvm_host: unify
> > VM_STAT and VCPU_STAT definitions in a single place". The former
> > simplifies code duplicated in debugfs and tracefs (from which statsfs is
> > based on), the latter groups all macros definition for statistics in kvm
> > in a single common file shared by all architectures.
> >
> > Patch 1 adds a new refcount and kref destructor wrappers that take a
> > semaphore, as those are used later by statsfs. Patch 2 introduces the
> > statsfs API, patch 3 provides extensive tests that can also be used as
> > example on how to use the API and patch 4 adds the file system support.
> > Finally, patch 5 provides a real-life example of statsfs usage in KVM.
> >
> > [1] https://lore.kernel.org/kvm/5d6cdcb1-d8ad-7ae6-7351-3544e2fa366d@redhat.com/?fbclid=IwAR18LHJ0PBcXcDaLzILFhHsl3qpT3z2vlG60RnqgbpGYhDv7L43n0ZXJY8M
> >
> > Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >
> > v1->v2 remove unnecessary list_foreach_safe loops, fix wrong indentation,
> > change statsfs in stats_fs
> >
> > Emanuele Giuseppe Esposito (5):
> > refcount, kref: add dec-and-test wrappers for rw_semaphores
> > stats_fs API: create, add and remove stats_fs sources and values
> > kunit: tests for stats_fs API
> > stats_fs fs: virtual fs to show stats to the end-user
> > kvm_main: replace debugfs with stats_fs
> >
> > MAINTAINERS | 7 +
> > arch/arm64/kvm/Kconfig | 1 +
> > arch/arm64/kvm/guest.c | 2 +-
> > arch/mips/kvm/Kconfig | 1 +
> > arch/mips/kvm/mips.c | 2 +-
> > arch/powerpc/kvm/Kconfig | 1 +
> > arch/powerpc/kvm/book3s.c | 6 +-
> > arch/powerpc/kvm/booke.c | 8 +-
> > arch/s390/kvm/Kconfig | 1 +
> > arch/s390/kvm/kvm-s390.c | 16 +-
> > arch/x86/include/asm/kvm_host.h | 2 +-
> > arch/x86/kvm/Kconfig | 1 +
> > arch/x86/kvm/Makefile | 2 +-
> > arch/x86/kvm/debugfs.c | 64 --
> > arch/x86/kvm/stats_fs.c | 56 ++
> > arch/x86/kvm/x86.c | 6 +-
> > fs/Kconfig | 12 +
> > fs/Makefile | 1 +
> > fs/stats_fs/Makefile | 6 +
> > fs/stats_fs/inode.c | 337 ++++++++++
> > fs/stats_fs/internal.h | 35 +
> > fs/stats_fs/stats_fs-tests.c | 1088 +++++++++++++++++++++++++++++++
> > fs/stats_fs/stats_fs.c | 773 ++++++++++++++++++++++
> > include/linux/kref.h | 11 +
> > include/linux/kvm_host.h | 39 +-
> > include/linux/refcount.h | 2 +
> > include/linux/stats_fs.h | 304 +++++++++
> > include/uapi/linux/magic.h | 1 +
> > lib/refcount.c | 32 +
> > tools/lib/api/fs/fs.c | 21 +
> > virt/kvm/arm/arm.c | 2 +-
> > virt/kvm/kvm_main.c | 314 ++-------
> > 32 files changed, 2772 insertions(+), 382 deletions(-)
> > delete mode 100644 arch/x86/kvm/debugfs.c
> > create mode 100644 arch/x86/kvm/stats_fs.c
> > create mode 100644 fs/stats_fs/Makefile
> > create mode 100644 fs/stats_fs/inode.c
> > create mode 100644 fs/stats_fs/internal.h
> > create mode 100644 fs/stats_fs/stats_fs-tests.c
> > create mode 100644 fs/stats_fs/stats_fs.c
> > create mode 100644 include/linux/stats_fs.h
> >
> > --
> > 2.25.2
> >
> >
^ permalink raw reply
* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Peter Zijlstra @ 2020-06-04 12:00 UTC (permalink / raw)
To: Will Deacon
Cc: Stephen Rothwell, Linux Kernel Mailing List,
Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <20200604111723.GA1267@willie-the-truck>
On Thu, Jun 04, 2020 at 12:17:23PM +0100, Will Deacon wrote:
> Hi, [+Peter]
>
> On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
> > Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
> > CONFIG_PPC_4K_PAGES, getting the following build failure:
> >
> > CC mm/gup.o
> > In file included from ./include/linux/kernel.h:11:0,
> > from mm/gup.c:2:
> > In function 'gup_hugepte.constprop',
> > inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
> > ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
> > declared with attribute error: Unsupported access size for
> > {READ,WRITE}_ONCE().
> > _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> > ^
> > ./include/linux/compiler.h:373:4: note: in definition of macro
> > '__compiletime_assert'
> > prefix ## suffix(); \
> > ^
> > ./include/linux/compiler.h:392:2: note: in expansion of macro
> > '_compiletime_assert'
> > _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> > ^
> > ./include/linux/compiler.h:405:2: note: in expansion of macro
> > 'compiletime_assert'
> > compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
> > ^
> > ./include/linux/compiler.h:291:2: note: in expansion of macro
> > 'compiletime_assert_rwonce_type'
> > compiletime_assert_rwonce_type(x); \
> > ^
> > mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
> > pte = READ_ONCE(*ptep);
> > ^
> > In function 'gup_get_pte',
> > inlined from 'gup_pte_range' at mm/gup.c:2228:9,
> > inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
> > inlined from 'gup_pud_range' at mm/gup.c:2641:15,
> > inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
> > inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
> > inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
>
> At first glance, this looks like a real bug in the 16k page code -- you're
> loading the pte non-atomically on the fast GUP path and so you're prone to
> tearing, which probably isn't what you want. For a short-term hack, I'd
> suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
> you want to support this them you'll need to rework your pte_t so that it
> can be loaded atomically.
Looking at commit 55c8fc3f49302, they're all the exact same value, so
what they could do is grow another special gup_get_pte() variant that
just loads the first value.
Also, per that very same commit, there's a distinct lack of WRITE_ONCE()
in the pte_update() / __set_pte_at() paths for much of Power.
^ permalink raw reply
* Re: linux-next: fix ups for clashes between akpm and powerpc trees
From: Michael Ellerman @ 2020-06-04 12:43 UTC (permalink / raw)
To: Stephen Rothwell, Andrew Morton
Cc: Linux Next Mailing List, PowerPC, Linux Kernel Mailing List
In-Reply-To: <20200604184501.1ea5ba36@canb.auug.org.au>
Stephen Rothwell <sfr@canb.auug.org.au> writes:
> Hi all,
>
> On Thu, 4 Jun 2020 16:52:46 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> diff --git a/arch/powerpc/mm/kasan/8xx.c b/arch/powerpc/mm/kasan/8xx.c
>> index db4ef44af22f..569d98a41881 100644
>> --- a/arch/powerpc/mm/kasan/8xx.c
>> +++ b/arch/powerpc/mm/kasan/8xx.c
>> @@ -10,7 +10,7 @@
>> static int __init
>> kasan_init_shadow_8M(unsigned long k_start, unsigned long k_end, void *block)
>> {
>> - pmd_t *pmd = pmd_ptr_k(k_start);
>> + pmd_t *pmd = pmd_off_k(k_start);
>> unsigned long k_cur, k_next;
>>
>> for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd += 2, block += SZ_8M) {
>> @@ -59,7 +59,7 @@ int __init kasan_init_region(void *start, size_t size)
>> return ret;
>>
>> for (; k_cur < k_end; k_cur += PAGE_SIZE) {
>> - pmd_t *pmd = pmd_ptr_k(k_cur);
>> + pmd_t *pmd = pmd_off_k(k_cur);
>> void *va = block + k_cur - k_start;
>> pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>>
>> diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
>> index 4bc491a4a1fd..a32b4640b9de 100644
>> --- a/arch/powerpc/mm/kasan/book3s_32.c
>> +++ b/arch/powerpc/mm/kasan/book3s_32.c
>> @@ -46,7 +46,7 @@ int __init kasan_init_region(void *start, size_t size)
>> kasan_update_early_region(k_start, k_cur, __pte(0));
>>
>> for (; k_cur < k_end; k_cur += PAGE_SIZE) {
>> - pmd_t *pmd = pmd_ptr_k(k_cur);
>> + pmd_t *pmd = pmd_off_k(k_cur);
>> void *va = block + k_cur - k_start;
>> pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
>>
>> diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
>> index 286441bbbe49..92e8929cbe3e 100644
>> --- a/arch/powerpc/mm/nohash/8xx.c
>> +++ b/arch/powerpc/mm/nohash/8xx.c
>> @@ -74,7 +74,7 @@ static pte_t __init *early_hugepd_alloc_kernel(hugepd_t *pmdp, unsigned long va)
>> static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
>> pgprot_t prot, int psize, bool new)
>> {
>> - pmd_t *pmdp = pmd_ptr_k(va);
>> + pmd_t *pmdp = pmd_off_k(va);
>> pte_t *ptep;
>>
>> if (WARN_ON(psize != MMU_PAGE_512K && psize != MMU_PAGE_8M))
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index 45a0556089e8..1136257c3a99 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -264,7 +264,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> #if defined(CONFIG_PPC_8xx)
>> void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
>> {
>> - pmd_t *pmd = pmd_ptr(mm, addr);
>> + pmd_t *pmd = pmd_off(mm, addr);
>> pte_basic_t val;
>> pte_basic_t *entry = &ptep->pte;
>> int num = is_hugepd(*((hugepd_t *)pmd)) ? 1 : SZ_512K / SZ_4K;
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index e2d054c9575e..6eb4eab79385 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -40,7 +40,7 @@ notrace void __init early_ioremap_init(void)
>> {
>> unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE);
>> pte_t *ptep = (pte_t *)early_fixmap_pagetable;
>> - pmd_t *pmdp = pmd_ptr_k(addr);
>> + pmd_t *pmdp = pmd_off_k(addr);
>>
>> for (; (s32)(FIXADDR_TOP - addr) > 0;
>> addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++)
>
> I have added the above hunks as to linux-next for tomorrow as a fix for
> mm-pgtable-add-shortcuts-for-accessing-kernel-pmd-and-pte.
Looks good. Thanks.
cheers
^ permalink raw reply
* [PATCH v3 0/3] selftests: powerpc: Fixes and execute-disable test for pkeys
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
To: mpe
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
This fixes the way the Authority Mask Register (AMR) is updated
by the existing pkey tests and adds a new test to verify the
functionality of execute-disabled pkeys.
Previous versions can be found at:
v2: https://lore.kernel.org/linuxppc-dev/20200527030342.13712-1-sandipan@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20200508162332.65316-1-sandipan@linux.ibm.com/
Changes in v3:
- Fixed AMR writes for existing pkey tests (new patch).
- Moved Hash MMU check under utilities (new patch) and removed duplicate
code.
- Fixed comments on why the pkey permission bits were redefined.
- Switched to existing mfspr() macro for reading AMR.
- Switched to sig_atomic_t as data type for variables updated in the
signal handlers.
- Switched to exit()-ing if the signal handlers come across an unexpected
condition instead of trying to reset page and pkey permissions.
- Switched to write() from printf() for printing error messages from
the signal handlers.
- Switched to getpagesize().
- Renamed fault counter to denote remaining faults.
- Dropped unnecessary randomization for choosing an address to fault at.
- Added additional information on change in permissions due to AMR and
IAMR bits in comments.
- Switched the first instruction word of the executable region to a trap
to test if it is actually overwritten by a no-op later.
- Added an new test scenario where the pkey imposes no restrictions and
an attempt is made to jump to the executable region again.
Changes in v2:
- Added .gitignore entry for test binary.
- Fixed builds for older distros where siginfo_t might not have si_pkey as
a formal member based on discussion with Michael.
Sandipan Das (3):
selftests: powerpc: Fix pkey access right updates
selftests: powerpc: Move Hash MMU check to utilities
selftests: powerpc: Add test for execute-disabled pkeys
tools/testing/selftests/powerpc/include/reg.h | 6 +
.../testing/selftests/powerpc/include/utils.h | 1 +
tools/testing/selftests/powerpc/mm/.gitignore | 1 +
tools/testing/selftests/powerpc/mm/Makefile | 5 +-
.../selftests/powerpc/mm/bad_accesses.c | 28 --
.../selftests/powerpc/mm/pkey_exec_prot.c | 388 ++++++++++++++++++
.../selftests/powerpc/ptrace/core-pkey.c | 2 +-
.../selftests/powerpc/ptrace/ptrace-pkey.c | 2 +-
tools/testing/selftests/powerpc/utils.c | 28 ++
9 files changed, 429 insertions(+), 32 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
--
2.25.1
^ permalink raw reply
* [PATCH v3 2/3] selftests: powerpc: Move Hash MMU check to utilities
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
To: mpe
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>
This moves a function to test if the MMU is in Hash mode
under the generic test utilities.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
.../testing/selftests/powerpc/include/utils.h | 1 +
tools/testing/selftests/powerpc/mm/Makefile | 2 +-
.../selftests/powerpc/mm/bad_accesses.c | 28 -------------------
tools/testing/selftests/powerpc/utils.c | 28 +++++++++++++++++++
4 files changed, 30 insertions(+), 29 deletions(-)
diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h
index e089a0c30d9a..ad2728736ae5 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -60,6 +60,7 @@ static inline bool have_hwcap2(unsigned long ftr2)
#endif
bool is_ppc64le(void);
+int using_hash_mmu(bool *using_hash);
/* Yes, this is evil */
#define FAIL_IF(x) \
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index b9103c4bb414..2389bf791fd6 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -10,7 +10,7 @@ TEST_GEN_FILES := tempfile
top_srcdir = ../../../../..
include ../../lib.mk
-$(TEST_GEN_PROGS): ../harness.c
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
$(OUTPUT)/prot_sao: ../utils.c
diff --git a/tools/testing/selftests/powerpc/mm/bad_accesses.c b/tools/testing/selftests/powerpc/mm/bad_accesses.c
index adc465f499ef..a864ed7e2008 100644
--- a/tools/testing/selftests/powerpc/mm/bad_accesses.c
+++ b/tools/testing/selftests/powerpc/mm/bad_accesses.c
@@ -64,34 +64,6 @@ int bad_access(char *p, bool write)
return 0;
}
-static int using_hash_mmu(bool *using_hash)
-{
- char line[128];
- FILE *f;
- int rc;
-
- f = fopen("/proc/cpuinfo", "r");
- FAIL_IF(!f);
-
- rc = 0;
- while (fgets(line, sizeof(line), f) != NULL) {
- if (strcmp(line, "MMU : Hash\n") == 0) {
- *using_hash = true;
- goto out;
- }
-
- if (strcmp(line, "MMU : Radix\n") == 0) {
- *using_hash = false;
- goto out;
- }
- }
-
- rc = -1;
-out:
- fclose(f);
- return rc;
-}
-
static int test(void)
{
unsigned long i, j, addr, region_shift, page_shift, page_size;
diff --git a/tools/testing/selftests/powerpc/utils.c b/tools/testing/selftests/powerpc/utils.c
index 5ee0e98c4896..933678f1ed0a 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -293,3 +293,31 @@ void set_dscr(unsigned long val)
asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR));
}
+
+int using_hash_mmu(bool *using_hash)
+{
+ char line[128];
+ FILE *f;
+ int rc;
+
+ f = fopen("/proc/cpuinfo", "r");
+ FAIL_IF(!f);
+
+ rc = 0;
+ while (fgets(line, sizeof(line), f) != NULL) {
+ if (strcmp(line, "MMU : Hash\n") == 0) {
+ *using_hash = true;
+ goto out;
+ }
+
+ if (strcmp(line, "MMU : Radix\n") == 0) {
+ *using_hash = false;
+ goto out;
+ }
+ }
+
+ rc = -1;
+out:
+ fclose(f);
+ return rc;
+}
--
2.25.1
^ permalink raw reply related
* [PATCH v3 3/3] selftests: powerpc: Add test for execute-disabled pkeys
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
To: mpe
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>
Apart from read and write access, memory protection keys can
also be used for restricting execute permission of pages on
powerpc. This adds a test to verify if the feature works as
expected.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
tools/testing/selftests/powerpc/mm/.gitignore | 1 +
tools/testing/selftests/powerpc/mm/Makefile | 3 +-
.../selftests/powerpc/mm/pkey_exec_prot.c | 388 ++++++++++++++++++
3 files changed, 391 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
diff --git a/tools/testing/selftests/powerpc/mm/.gitignore b/tools/testing/selftests/powerpc/mm/.gitignore
index 2ca523255b1b..8f841f925baa 100644
--- a/tools/testing/selftests/powerpc/mm/.gitignore
+++ b/tools/testing/selftests/powerpc/mm/.gitignore
@@ -8,3 +8,4 @@ wild_bctr
large_vm_fork_separation
bad_accesses
tlbie_test
+pkey_exec_prot
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index 2389bf791fd6..f9fa0ba7435c 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -3,7 +3,7 @@ noarg:
$(MAKE) -C ../
TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \
- large_vm_fork_separation bad_accesses
+ large_vm_fork_separation bad_accesses pkey_exec_prot
TEST_GEN_PROGS_EXTENDED := tlbie_test
TEST_GEN_FILES := tempfile
@@ -17,6 +17,7 @@ $(OUTPUT)/prot_sao: ../utils.c
$(OUTPUT)/wild_bctr: CFLAGS += -m64
$(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
$(OUTPUT)/bad_accesses: CFLAGS += -m64
+$(OUTPUT)/pkey_exec_prot: CFLAGS += -m64
$(OUTPUT)/tempfile:
dd if=/dev/zero of=$@ bs=64k count=1
diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
new file mode 100644
index 000000000000..7c7c93425c5e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
@@ -0,0 +1,388 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/*
+ * Copyright 2020, Sandipan Das, IBM Corp.
+ *
+ * Test if applying execute protection on pages using memory
+ * protection keys works as expected.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "reg.h"
+#include "utils.h"
+
+/*
+ * Older versions of libc use the Intel-specific access rights.
+ * Hence, override the definitions as they might be incorrect.
+ */
+#undef PKEY_DISABLE_ACCESS
+#define PKEY_DISABLE_ACCESS 0x3
+
+#undef PKEY_DISABLE_WRITE
+#define PKEY_DISABLE_WRITE 0x2
+
+#undef PKEY_DISABLE_EXECUTE
+#define PKEY_DISABLE_EXECUTE 0x4
+
+/* Older versions of libc do not not define this */
+#ifndef SEGV_PKUERR
+#define SEGV_PKUERR 4
+#endif
+
+#define SI_PKEY_OFFSET 0x20
+
+#define SYS_pkey_mprotect 386
+#define SYS_pkey_alloc 384
+#define SYS_pkey_free 385
+
+#define PKEY_BITS_PER_PKEY 2
+#define NR_PKEYS 32
+#define PKEY_BITS_MASK ((1UL << PKEY_BITS_PER_PKEY) - 1)
+
+#define PPC_INST_NOP 0x60000000
+#define PPC_INST_TRAP 0x7fe00008
+#define PPC_INST_BLR 0x4e800020
+
+#define sigsafe_err(msg) ({ \
+ ssize_t nbytes __attribute__((unused)); \
+ nbytes = write(STDERR_FILENO, msg, strlen(msg)); })
+
+static inline unsigned long pkeyreg_get(void)
+{
+ return mfspr(SPRN_AMR);
+}
+
+static inline void pkeyreg_set(unsigned long amr)
+{
+ set_amr(amr);
+}
+
+static void pkey_set_rights(int pkey, unsigned long rights)
+{
+ unsigned long amr, shift;
+
+ shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
+ amr = pkeyreg_get();
+ amr &= ~(PKEY_BITS_MASK << shift);
+ amr |= (rights & PKEY_BITS_MASK) << shift;
+ pkeyreg_set(amr);
+}
+
+static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
+{
+ return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
+}
+
+static int sys_pkey_alloc(unsigned long flags, unsigned long rights)
+{
+ return syscall(SYS_pkey_alloc, flags, rights);
+}
+
+static int sys_pkey_free(int pkey)
+{
+ return syscall(SYS_pkey_free, pkey);
+}
+
+static volatile sig_atomic_t fault_pkey, fault_code, fault_type;
+static volatile sig_atomic_t remaining_faults;
+static volatile unsigned int *fault_addr;
+static unsigned long pgsize, numinsns;
+static unsigned int *insns;
+
+static void trap_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+ /* Check if this fault originated from the expected address */
+ if (sinfo->si_addr != (void *) fault_addr)
+ sigsafe_err("got a fault for an unexpected address\n");
+
+ _exit(1);
+}
+
+static void segv_handler(int signum, siginfo_t *sinfo, void *ctx)
+{
+ int signal_pkey;
+
+ /*
+ * In older versions of libc, siginfo_t does not have si_pkey as
+ * a member.
+ */
+#ifdef si_pkey
+ signal_pkey = sinfo->si_pkey;
+#else
+ signal_pkey = *((int *)(((char *) sinfo) + SI_PKEY_OFFSET));
+#endif
+
+ fault_code = sinfo->si_code;
+
+ /* Check if this fault originated from the expected address */
+ if (sinfo->si_addr != (void *) fault_addr) {
+ sigsafe_err("got a fault for an unexpected address\n");
+ _exit(1);
+ }
+
+ /* Check if too many faults have occurred for a single test case */
+ if (!remaining_faults) {
+ sigsafe_err("got too many faults for the same address\n");
+ _exit(1);
+ }
+
+
+ /* Restore permissions in order to continue */
+ switch (fault_code) {
+ case SEGV_ACCERR:
+ if (mprotect(insns, pgsize, PROT_READ | PROT_WRITE)) {
+ sigsafe_err("failed to set access permissions\n");
+ _exit(1);
+ }
+ break;
+ case SEGV_PKUERR:
+ if (signal_pkey != fault_pkey) {
+ sigsafe_err("got a fault for an unexpected pkey\n");
+ _exit(1);
+ }
+
+ switch (fault_type) {
+ case PKEY_DISABLE_ACCESS:
+ pkey_set_rights(fault_pkey, 0);
+ break;
+ case PKEY_DISABLE_EXECUTE:
+ /*
+ * Reassociate the exec-only pkey with the region
+ * to be able to continue. Unlike AMR, we cannot
+ * set IAMR directly from userspace to restore the
+ * permissions.
+ */
+ if (mprotect(insns, pgsize, PROT_EXEC)) {
+ sigsafe_err("failed to set execute permissions\n");
+ _exit(1);
+ }
+ break;
+ default:
+ sigsafe_err("got a fault with an unexpected type\n");
+ _exit(1);
+ }
+ break;
+ default:
+ sigsafe_err("got a fault with an unexpected code\n");
+ _exit(1);
+ }
+
+ remaining_faults--;
+}
+
+static int pkeys_unsupported(void)
+{
+ bool hash_mmu = false;
+ int pkey;
+
+ /* Protection keys are currently supported on Hash MMU only */
+ FAIL_IF(using_hash_mmu(&hash_mmu));
+ SKIP_IF(!hash_mmu);
+
+ /* Check if the system call is supported */
+ pkey = sys_pkey_alloc(0, 0);
+ SKIP_IF(pkey < 0);
+ sys_pkey_free(pkey);
+
+ return 0;
+}
+
+static int test(void)
+{
+ struct sigaction segv_act, trap_act;
+ int pkey, ret, i;
+
+ ret = pkeys_unsupported();
+ if (ret)
+ return ret;
+
+ /* Setup SIGSEGV handler */
+ segv_act.sa_handler = 0;
+ segv_act.sa_sigaction = segv_handler;
+ FAIL_IF(sigprocmask(SIG_SETMASK, 0, &segv_act.sa_mask) != 0);
+ segv_act.sa_flags = SA_SIGINFO;
+ segv_act.sa_restorer = 0;
+ FAIL_IF(sigaction(SIGSEGV, &segv_act, NULL) != 0);
+
+ /* Setup SIGTRAP handler */
+ trap_act.sa_handler = 0;
+ trap_act.sa_sigaction = trap_handler;
+ FAIL_IF(sigprocmask(SIG_SETMASK, 0, &trap_act.sa_mask) != 0);
+ trap_act.sa_flags = SA_SIGINFO;
+ trap_act.sa_restorer = 0;
+ FAIL_IF(sigaction(SIGTRAP, &trap_act, NULL) != 0);
+
+ /* Setup executable region */
+ pgsize = getpagesize();
+ numinsns = pgsize / sizeof(unsigned int);
+ insns = (unsigned int *) mmap(NULL, pgsize, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ FAIL_IF(insns == MAP_FAILED);
+
+ /* Write the instruction words */
+ for (i = 1; i < numinsns - 1; i++)
+ insns[i] = PPC_INST_NOP;
+
+ /*
+ * Set the first instruction as an unconditional trap. If
+ * the last write to this address succeeds, this should
+ * get overwritten by a no-op.
+ */
+ insns[0] = PPC_INST_TRAP;
+
+ /*
+ * Later, to jump to the executable region, we use a branch
+ * and link instruction (bctrl) which sets the return address
+ * automatically in LR. Use that to return back.
+ */
+ insns[numinsns - 1] = PPC_INST_BLR;
+
+ /* Allocate a pkey that restricts execution */
+ pkey = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
+ FAIL_IF(pkey < 0);
+
+ /*
+ * Pick the first instruction's address from the executable
+ * region.
+ */
+ fault_addr = insns;
+
+ /* The following two cases will avoid SEGV_PKUERR */
+ fault_type = -1;
+ fault_pkey = -1;
+
+ /*
+ * Read an instruction word from the address when AMR bits
+ * are not set i.e. the pkey permits both read and write
+ * access.
+ *
+ * This should not generate a fault as having PROT_EXEC
+ * implies PROT_READ on GNU systems. The pkey currently
+ * restricts execution only based on the IAMR bits. The
+ * AMR bits are cleared.
+ */
+ remaining_faults = 0;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ printf("read from %p, pkey is execute-disabled, access-enabled\n",
+ (void *) fault_addr);
+ i = *fault_addr;
+ FAIL_IF(remaining_faults != 0);
+
+ /*
+ * Write an instruction word to the address when AMR bits
+ * are not set i.e. the pkey permits both read and write
+ * access.
+ *
+ * This should generate an access fault as having just
+ * PROT_EXEC also restricts writes. The pkey currently
+ * restricts execution only based on the IAMR bits. The
+ * AMR bits are cleared.
+ */
+ remaining_faults = 1;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ printf("write to %p, pkey is execute-disabled, access-enabled\n",
+ (void *) fault_addr);
+ *fault_addr = PPC_INST_TRAP;
+ FAIL_IF(remaining_faults != 0 || fault_code != SEGV_ACCERR);
+
+ /* The following three cases will generate SEGV_PKUERR */
+ fault_type = PKEY_DISABLE_ACCESS;
+ fault_pkey = pkey;
+
+ /*
+ * Read an instruction word from the address when AMR bits
+ * are set i.e. the pkey permits neither read nor write
+ * access.
+ *
+ * This should generate a pkey fault based on AMR bits only
+ * as having PROT_EXEC implicitly allows reads.
+ */
+ remaining_faults = 1;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ printf("read from %p, pkey is execute-disabled, access-disabled\n",
+ (void *) fault_addr);
+ pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+ i = *fault_addr;
+ FAIL_IF(remaining_faults != 0 || fault_code != SEGV_PKUERR);
+
+ /*
+ * Write an instruction word to the address when AMR bits
+ * are set i.e. the pkey permits neither read nor write
+ * access.
+ *
+ * This should generate two faults. First, a pkey fault
+ * based on AMR bits and then an access fault since
+ * PROT_EXEC does not allow writes.
+ */
+ remaining_faults = 2;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ printf("write to %p, pkey is execute-disabled, access-disabled\n",
+ (void *) fault_addr);
+ pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+ *fault_addr = PPC_INST_NOP;
+ FAIL_IF(remaining_faults != 0 || fault_code != SEGV_ACCERR);
+
+ /*
+ * Jump to the executable region when AMR bits are set i.e.
+ * the pkey permits neither read nor write access.
+ *
+ * This should generate a pkey fault based on IAMR bits which
+ * are set to not permit execution. AMR bits should not affect
+ * execution.
+ *
+ * This also checks if the overwrite of the first instruction
+ * word from a trap to a no-op succeeded.
+ */
+ fault_addr = insns;
+ fault_type = PKEY_DISABLE_EXECUTE;
+ fault_pkey = pkey;
+ remaining_faults = 1;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ pkey_set_rights(pkey, PKEY_DISABLE_ACCESS);
+ printf("execute at %p, pkey is execute-disabled, access-disabled\n",
+ (void *) fault_addr);
+ asm volatile("mtctr %0; bctrl" : : "r"(insns));
+ FAIL_IF(remaining_faults != 0 || fault_code != SEGV_PKUERR);
+
+ /*
+ * Free the current pkey and allocate a new one that is
+ * fully permissive.
+ */
+ sys_pkey_free(pkey);
+ pkey = sys_pkey_alloc(0, 0);
+
+ /*
+ * Jump to the executable region when AMR bits are not set
+ * i.e. the pkey permits read and write access.
+ *
+ * This should not generate any faults as the IAMR bits are
+ * also not set and hence will the pkey will not restrict
+ * execution.
+ */
+ fault_pkey = pkey;
+ remaining_faults = 0;
+ FAIL_IF(sys_pkey_mprotect(insns, pgsize, PROT_EXEC, pkey) != 0);
+ printf("execute at %p, pkey is execute-enabled, access-enabled\n",
+ (void *) fault_addr);
+ asm volatile("mtctr %0; bctrl" : : "r"(insns));
+ FAIL_IF(remaining_faults != 0);
+
+ /* Cleanup */
+ munmap((void *) insns, pgsize);
+ sys_pkey_free(pkey);
+
+ return 0;
+}
+
+int main(void)
+{
+ test_harness(test, "pkey_exec_prot");
+}
--
2.25.1
^ permalink raw reply related
* [PATCH v3 1/3] selftests: powerpc: Fix pkey access right updates
From: Sandipan Das @ 2020-06-04 12:56 UTC (permalink / raw)
To: mpe
Cc: fweimer, aneesh.kumar, linuxram, linux-mm, linux-kselftest,
linuxppc-dev, bauerman
In-Reply-To: <20200604125610.649668-1-sandipan@linux.ibm.com>
The Power ISA mandates that all writes to the Authority
Mask Register (AMR) must always be preceded as well as
succeeded by a context synchronizing instruction.
This makes sure that the tests follow this requirement
when attempting to update a pkey's access rights.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
---
tools/testing/selftests/powerpc/include/reg.h | 6 ++++++
tools/testing/selftests/powerpc/ptrace/core-pkey.c | 2 +-
tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 2 +-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/include/reg.h b/tools/testing/selftests/powerpc/include/reg.h
index 022c5076b2c5..c0f2742a3a59 100644
--- a/tools/testing/selftests/powerpc/include/reg.h
+++ b/tools/testing/selftests/powerpc/include/reg.h
@@ -57,6 +57,12 @@
#define SPRN_PPR 896 /* Program Priority Register */
#define SPRN_AMR 13 /* Authority Mask Register - problem state */
+#define set_amr(v) asm volatile("isync;" \
+ "mtspr " __stringify(SPRN_AMR) ",%0;" \
+ "isync" : \
+ : "r" ((unsigned long)(v)) \
+ : "memory")
+
/* TEXASR register bits */
#define TEXASR_FC 0xFE00000000000000
#define TEXASR_FP 0x0100000000000000
diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index d5c64fee032d..bbc05ffc5860 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -150,7 +150,7 @@ static int child(struct shared_info *info)
printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
user_write, info->amr, pkey1, pkey2, pkey3);
- mtspr(SPRN_AMR, info->amr);
+ set_amr(info->amr);
/*
* We won't use pkey3. This tests whether the kernel restores the UAMOR
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index bdbbbe8431e0..904c04f8c919 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -126,7 +126,7 @@ static int child(struct shared_info *info)
printf("%-30s AMR: %016lx pkey1: %d pkey2: %d pkey3: %d\n",
user_write, info->amr1, pkey1, pkey2, pkey3);
- mtspr(SPRN_AMR, info->amr1);
+ set_amr(info->amr1);
/* Wait for parent to read our AMR value and write a new one. */
ret = prod_parent(&info->child_sync);
--
2.25.1
^ permalink raw reply related
* Re: linux-next: build failure on powerpc 8xx with 16k pages
From: Christophe Leroy @ 2020-06-04 13:55 UTC (permalink / raw)
To: Will Deacon
Cc: Stephen Rothwell, peterz, Linux Kernel Mailing List,
Linux Next Mailing List, Andrew Morton, PowerPC, Thomas Gleixner
In-Reply-To: <20200604111723.GA1267@willie-the-truck>
On 06/04/2020 11:17 AM, Will Deacon wrote:
> Hi, [+Peter]
>
> On Thu, Jun 04, 2020 at 10:48:03AM +0000, Christophe Leroy wrote:
>> Using mpc885_ads_defconfig with CONFIG_PPC_16K_PAGES instead of
>> CONFIG_PPC_4K_PAGES, getting the following build failure:
>>
>> CC mm/gup.o
>> In file included from ./include/linux/kernel.h:11:0,
>> from mm/gup.c:2:
>> In function 'gup_hugepte.constprop',
>> inlined from 'gup_huge_pd.isra.78' at mm/gup.c:2465:8:
>> ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_257'
>> declared with attribute error: Unsupported access size for
>> {READ,WRITE}_ONCE().
>> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>> ^
>> ./include/linux/compiler.h:373:4: note: in definition of macro
>> '__compiletime_assert'
>> prefix ## suffix(); \
>> ^
>> ./include/linux/compiler.h:392:2: note: in expansion of macro
>> '_compiletime_assert'
>> _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>> ^
>> ./include/linux/compiler.h:405:2: note: in expansion of macro
>> 'compiletime_assert'
>> compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
>> ^
>> ./include/linux/compiler.h:291:2: note: in expansion of macro
>> 'compiletime_assert_rwonce_type'
>> compiletime_assert_rwonce_type(x); \
>> ^
>> mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
>> pte = READ_ONCE(*ptep);
>> ^
>> In function 'gup_get_pte',
>> inlined from 'gup_pte_range' at mm/gup.c:2228:9,
>> inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
>> inlined from 'gup_pud_range' at mm/gup.c:2641:15,
>> inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
>> inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
>> inlined from 'internal_get_user_pages_fast' at mm/gup.c:2785:3:
>
> At first glance, this looks like a real bug in the 16k page code -- you're
> loading the pte non-atomically on the fast GUP path and so you're prone to
> tearing, which probably isn't what you want. For a short-term hack, I'd
> suggest having CONFIG_HAVE_FAST_GUP depend on !CONFIG_PPC_16K_PAGES, but if
> you want to support this them you'll need to rework your pte_t so that it
> can be loaded atomically.
What do you mean by *rework* pte_t ?
pte are 32 bits words in size and are spread every 4 words in memory.
Therefore pte_t has to be 128 bits because unlike huge_pte handling
which always use huge_pte_offset() in loops, many many places in the
kernel do pte++, so we need the pte type to be the size of the interval
from one pte to the next one.
Christophe
^ permalink raw reply
* Re: [PATCH] ASoC: fsl-asoc-card: Defer probe when fail to find codec device
From: Mark Brown @ 2020-06-04 14:05 UTC (permalink / raw)
To: lgirdwood, tiwai, perex, nicoleotsuka, Xiubo.Lee, timur,
Shengjiu Wang, linuxppc-dev, festevam, linux-kernel, alsa-devel
In-Reply-To: <1591251930-4111-1-git-send-email-shengjiu.wang@nxp.com>
On Thu, 4 Jun 2020 14:25:30 +0800, Shengjiu Wang wrote:
> Defer probe when fail to find codec device, because the codec
> device maybe probed later than machine driver.
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/1] ASoC: fsl-asoc-card: Defer probe when fail to find codec device
commit: e396dec46c5600d426b2ca8a01a877928b50d1d9
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* Boot issue with the latest Git kernel
From: Christian Zigotzky @ 2020-06-04 14:16 UTC (permalink / raw)
To: linuxppc-dev, Christoph Hellwig, Darren Stevens; +Cc: R.T.Dickinson
Hi All,
I tested the latest Git kernel today. [1]. Unfortunately it doesn't boot
on my PowerPC machines.
Could you please test the latest Git kernel with your PowerPC machine?
BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
Thanks,
Christian
[1]
https://forum.hyperion-entertainment.com/viewtopic.php?f=58&p=50758&sid=3f816f078869510dea9fe4baca3605db#p50758
^ permalink raw reply
* Re: Boot issue with the latest Git kernel
From: Christophe Leroy @ 2020-06-04 14:26 UTC (permalink / raw)
To: Christian Zigotzky, linuxppc-dev, Christoph Hellwig,
Darren Stevens
Cc: R.T.Dickinson
In-Reply-To: <cca69c35-899b-38d8-73ee-2d62997484e5@xenosoft.de>
Hi,
Le 04/06/2020 à 16:16, Christian Zigotzky a écrit :
> Hi All,
>
> I tested the latest Git kernel today. [1]. Unfortunately it doesn't boot
> on my PowerPC machines.
>
> Could you please test the latest Git kernel with your PowerPC machine?
>
> BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
>
Which machine/platform ? Which defconfig are you using ?
Christophe
^ permalink raw reply
* Re: Boot issue with the latest Git kernel
From: Christophe Leroy @ 2020-06-04 14:29 UTC (permalink / raw)
To: Christian Zigotzky, linuxppc-dev, Christoph Hellwig,
Darren Stevens
Cc: R.T.Dickinson
In-Reply-To: <16b46b7d-7bd0-83b8-307f-2b9be830b096@csgroup.eu>
Le 04/06/2020 à 16:26, Christophe Leroy a écrit :
> Hi,
>
>
> Le 04/06/2020 à 16:16, Christian Zigotzky a écrit :
>> Hi All,
>>
>> I tested the latest Git kernel today. [1]. Unfortunately it doesn't
>> boot on my PowerPC machines.
>>
>> Could you please test the latest Git kernel with your PowerPC machine?
>>
>> BTW, it doesn't boot in a virtual QEMU PowerPC machine either.
>>
>
> Which machine/platform ? Which defconfig are you using ?
>
And are you able to perform a 'git bisect' to identify the guilty commit ?
Thanks
Christophe
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox