* Re: [PATCH v3 25/41] powerpc/book3s64/kuep: Store/restore userspace IAMR correctly on entry and exit from kernel
From: kernel test robot @ 2020-06-11 0:03 UTC (permalink / raw)
To: Aneesh Kumar K.V, linuxppc-dev, mpe
Cc: clang-built-linux, kbuild-all, bauerman, linuxram,
Aneesh Kumar K.V
In-Reply-To: <20200610095204.608183-26-aneesh.kumar@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 9130 bytes --]
Hi Aneesh,
I love your patch! Yet something to improve:
[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20200610]
[cannot apply to v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
url: https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Kernel-userspace-access-execution-prevention-with-hash-translation/20200610-191943
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r006-20200608 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project bc2b70982be8f5250cd0082a7190f8b417bd4dfe)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>, old ones prefixed by <<):
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
In file included from arch/powerpc/include/asm/kup.h:18:
arch/powerpc/include/asm/book3s/64/kup.h:181:24: error: no member named 'kuap' in 'struct pt_regs'
mtspr(SPRN_AMR, regs->kuap);
~~~~ ^
arch/powerpc/include/asm/reg.h:1386:33: note: expanded from macro 'mtspr'
: "r" ((unsigned long)(v))
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
In file included from arch/powerpc/include/asm/kup.h:18:
>> arch/powerpc/include/asm/book3s/64/kup.h:182:25: error: no member named 'kuep' in 'struct pt_regs'
mtspr(SPRN_IAMR, regs->kuep);
~~~~ ^
arch/powerpc/include/asm/reg.h:1386:33: note: expanded from macro 'mtspr'
: "r" ((unsigned long)(v))
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
In file included from arch/powerpc/include/asm/kup.h:18:
arch/powerpc/include/asm/book3s/64/kup.h:194:22: error: no member named 'kuap' in 'struct pt_regs'
if (unlikely(regs->kuap != amr)) {
~~~~ ^
include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
In file included from arch/powerpc/include/asm/kup.h:18:
arch/powerpc/include/asm/book3s/64/kup.h:196:26: error: no member named 'kuap' in 'struct pt_regs'
mtspr(SPRN_AMR, regs->kuap);
~~~~ ^
arch/powerpc/include/asm/reg.h:1386:33: note: expanded from macro 'mtspr'
: "r" ((unsigned long)(v))
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
In file included from arch/powerpc/include/asm/kup.h:18:
arch/powerpc/include/asm/book3s/64/kup.h:293:14: error: no member named 'kuap' in 'struct pt_regs'
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
~~~~ ^
include/asm-generic/bug.h:122:25: note: expanded from macro 'WARN'
int __ret_warn_on = !!(condition);
^~~~~~~~~
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
arch/powerpc/include/asm/kup.h:56:20: error: redefinition of 'allow_user_access'
static inline void allow_user_access(void __user *to, const void __user *from,
^
arch/powerpc/include/asm/book3s/64/kup.h:254:29: note: previous definition is here
static __always_inline void allow_user_access(void __user *to, const void __user *from,
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
arch/powerpc/include/asm/kup.h:58:20: error: redefinition of 'prevent_user_access'
static inline void prevent_user_access(void __user *to, const void __user *from,
^
arch/powerpc/include/asm/book3s/64/kup.h:269:20: note: previous definition is here
static inline void prevent_user_access(void __user *to, const void __user *from,
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
arch/powerpc/include/asm/kup.h:60:29: error: redefinition of 'prevent_user_access_return'
static inline unsigned long prevent_user_access_return(void) { return 0UL; }
^
arch/powerpc/include/asm/book3s/64/kup.h:275:29: note: previous definition is here
static inline unsigned long prevent_user_access_return(void)
^
In file included from arch/powerpc/kernel/asm-offsets.c:14:
In file included from include/linux/compat.h:15:
In file included from include/linux/socket.h:8:
In file included from include/linux/uio.h:10:
In file included from include/crypto/hash.h:11:
In file included from include/linux/crypto.h:21:
In file included from include/linux/uaccess.h:11:
In file included from arch/powerpc/include/asm/uaccess.h:9:
vim +182 arch/powerpc/include/asm/book3s/64/kup.h
174
175 static inline void kuap_restore_user_amr(struct pt_regs *regs)
176 {
177 if (!mmu_has_feature(MMU_FTR_PKEY))
178 return;
179
180 isync();
181 mtspr(SPRN_AMR, regs->kuap);
> 182 mtspr(SPRN_IAMR, regs->kuep);
183 /*
184 * No isync required here because we are about to rfi
185 * back to previous context before any user accesses
186 * would be made, which is a CSI.
187 */
188 }
189 static inline void kuap_restore_kernel_amr(struct pt_regs *regs,
190 unsigned long amr)
191 {
192 if (mmu_has_feature(MMU_FTR_KUAP) || mmu_has_feature(MMU_FTR_PKEY)) {
193
194 if (unlikely(regs->kuap != amr)) {
195 isync();
196 mtspr(SPRN_AMR, regs->kuap);
197 /*
198 * No isync required here because we are about to rfi
199 * back to previous context before any user accesses
200 * would be made, which is a CSI.
201 */
202 }
203 }
204 /*
205 * No need to restore IAMR when returning to kernel space.
206 */
207 }
208
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37467 bytes --]
^ permalink raw reply
* Re: [RFC PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions
From: Nicholas Piggin @ 2020-06-11 2:34 UTC (permalink / raw)
To: linuxppc-dev, Matheus Castanho
In-Reply-To: <cc609f82-1d11-b1f9-2594-153936d7fe48@linux.ibm.com>
Excerpts from Matheus Castanho's message of May 14, 2020 6:55 am:
> Hi Nicholas,
>
> Small comment below:
>
> On 4/30/20 1:02 AM, Nicholas Piggin wrote:
>> Add support for the scv instruction on POWER9 and later CPUs.
>>
>> For now this implements the zeroth scv vector 'scv 0', as identical
>> to 'sc' system calls, with the exception that lr is not preserved, and
>> it is 64-bit only. There may yet be changes made to this ABI, so it's
>> for testing only.
>>
>> rfscv is implemented to return from scv type system calls. It can not
>> be used to return from sc system calls because those are defined to
>> preserve lr.
>>
>> In a comparison of getpid syscall, the test program had scv taking
>> about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling.
>> getpid syscall throughput on POWER9 is improved by 33%, mostly due to
>> reducing mtmsr and mtspr.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> Documentation/powerpc/syscall64-abi.rst | 42 ++++--
>
> [...]
>
>> +Return value
>> +------------
>> +- For the sc instruction, both a return value and a return error code are
>> + returned. cr0.SO is the return error code, and r3 is the return value or
>> + error code. When cr0.SO is clear, the syscall succeeded and r3 is the return
>> + value. When cr0.SO is set, the syscall failed and r3 is the error code that
>> + generally corresponds to errno.
>> +
>> +- For the scv 0 instruction, the return value indicates failure if it
>> + is >= -MAX_ERRNO (-4095) as an unsigned comparison, in which case it is the
>> + negated return error code. Otherwise it is the successful return value.
>
> I believe this last paragraph is a bit confusing (didn't quite get the
> unsigned comparison with negative values). So instead of cr0.SO to
> indicate failure, scv returns the negated error code, and positive in
> case of success?
Yes, it will be like other major architectures and return values from
-4095..-1 indicate an error with error value equal to -return value.
I will try to make it a bit clearer.
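
A minimal sketch of that check in C, to show why the unsigned comparison
works. The helper names are illustrative only and do not come from the patch;
the only value taken from the text above is MAX_ERRNO (4095).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ERRNO 4095

/*
 * True when a raw scv 0 return value encodes an error. Viewed as unsigned,
 * the values -4095..-1 are the top 4095 values of the range, so a single
 * unsigned comparison identifies them.
 */
static bool scv_ret_is_error(unsigned long ret)
{
	return ret >= (unsigned long)-MAX_ERRNO;
}

/* Recover the positive error code from a failing return value. */
static long scv_ret_errno(unsigned long ret)
{
	return -(long)ret;
}
```

With these helpers a raw return of -22 is reported as an error with code 22,
while 0 and -4096 are both treated as successful return values.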
Thanks,
Nick
^ permalink raw reply
* [PATCH kernel] KVM: PPC: Fix nested guest RC bits update
From: Alexey Kardashevskiy @ 2020-06-11 3:05 UTC (permalink / raw)
To: linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
Before commit 6cdf30375f82 ("powerpc/kvm/book3s: Use kvm helpers
to walk shadow or secondary table") we called __find_linux_pte() with
a page table pointer from a kvm_nested_guest struct but
now we rely on kvmhv_find_nested() which takes an L1 LPID and returns
a kvm_nested_guest pointer, however we pass a L0 LPID there and
the L2 guest hangs.
This fixes the LPID passed to kvmppc_hv_handle_set_rc().
Fixes: 6cdf30375f82 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 99011f1b772a..f36f0a2993c0 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1234,7 +1234,7 @@ static long kvmhv_handle_nested_set_rc(struct kvm_vcpu *vcpu,
/* Set the rc bit in the pte of the shadow_pgtable for the nest guest */
ret = kvmppc_hv_handle_set_rc(kvm, true, writing,
- n_gpa, gp->shadow_lpid);
+ n_gpa, gp->l1_lpid);
if (!ret)
ret = -EINVAL;
else
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 2/5] powerpc/lib: Initialize a temporary mm for code patching
From: Christopher M. Riedl @ 2020-06-11 3:29 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <4ffced42-ee3a-841f-2d3f-34daec11b05b@csgroup.eu>
On Wed Jun 3, 2020 at 9:01 AM, Christophe Leroy wrote:
>
>
>
>
> Le 03/06/2020 à 07:19, Christopher M. Riedl a écrit :
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped with permissive memory
> > protections. Currently, a per-cpu vmalloc patch area is used for this
> > purpose. While the patch area is per-cpu, the temporary page mapping is
> > inserted into the kernel page tables for the duration of the patching.
> > The mapping is exposed to CPUs other than the patching CPU - this is
> > undesirable from a hardening perspective.
> >
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > portion. The next patch uses the temporary mm and patching address for
> > code patching.
> >
> > Based on x86 implementation:
> >
> > commit 4fc19708b165
> > ("x86/alternatives: Initialize temporary mm for patching")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> > arch/powerpc/lib/code-patching.c | 33 ++++++++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 5ecf0d635a8d..599114f63b44 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -11,6 +11,8 @@
> > #include <linux/cpuhotplug.h>
> > #include <linux/slab.h>
> > #include <linux/uaccess.h>
> > +#include <linux/sched/task.h>
> > +#include <linux/random.h>
> >
> > #include <asm/pgtable.h>
> > #include <asm/tlbflush.h>
> > @@ -45,6 +47,37 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> > }
> >
> > #ifdef CONFIG_STRICT_KERNEL_RWX
> > +
> > +static struct mm_struct *patching_mm __ro_after_init;
> > +static unsigned long patching_addr __ro_after_init;
> > +
> > +void __init poking_init(void)
> > +{
> > + spinlock_t *ptl; /* for protecting pte table */
> > + pte_t *ptep;
> > +
> > + /*
> > + * Some parts of the kernel (static keys for example) depend on
> > + * successful code patching. Code patching under STRICT_KERNEL_RWX
> > + * requires this setup - otherwise we cannot patch at all. We use
> > + * BUG_ON() here and later since an early failure is preferred to
> > + * buggy behavior and/or strange crashes later.
> > + */
> > + patching_mm = copy_init_mm();
> > + BUG_ON(!patching_mm);
> > +
> > + /*
> > + * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
> > + * XXX: Do we want additional bits of entropy for radix?
> > + */
> > + patching_addr = (get_random_long() & PAGE_MASK) %
> > + (DEFAULT_MAP_WINDOW - PAGE_SIZE);
> > +
> > + ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > + BUG_ON(!ptep);
> > + pte_unmap_unlock(ptep, ptl);
>
>
> Is this needed? What's the point in getting the pte to unmap it
> immediately without doing anything with it?
>
We pre-allocate the PTE here since later the allocation may fail
(GFP_KERNEL) badly when interrupts are disabled during patching.
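
For reference, the address selection quoted above can be modelled in user
space. This is only a sketch: the 64K page size and 128TB map window are
assumed values for illustration, and the sketch_* names are hypothetical. It
shows that the result stays page-aligned after the modulo, because the modulus
is itself a multiple of the page size, and how an in-page offset can later be
combined with the chosen page address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only (64K pages, 128TB window). */
#define SKETCH_PAGE_SIZE   (UINT64_C(1) << 16)
#define SKETCH_PAGE_MASK   (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MAP_WINDOW  (UINT64_C(1) << 47)

/* Model of: (get_random_long() & PAGE_MASK) % (DEFAULT_MAP_WINDOW - PAGE_SIZE) */
static uint64_t sketch_patching_addr(uint64_t random)
{
	uint64_t limit = SKETCH_MAP_WINDOW - SKETCH_PAGE_SIZE;

	/* The modulus is page-aligned, so alignment survives the '%'. */
	assert((limit & ~SKETCH_PAGE_MASK) == 0);
	return (random & SKETCH_PAGE_MASK) % limit;
}

/* Combine the page-aligned patching address with the target's page offset. */
static uint64_t sketch_patch_target(uint64_t patching_addr, uint64_t addr)
{
	return patching_addr | (addr & ~SKETCH_PAGE_MASK);
}
```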
>
> Christophe
>
>
> > +}
> > +
> > static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >
> > static int text_area_cpu_up(unsigned int cpu)
> >
>
>
>
>
^ permalink raw reply
* Re: [PATCH 3/5] powerpc/lib: Use a temporary mm for code patching
From: Christopher M. Riedl @ 2020-06-11 3:31 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <7c72a249-dff1-ca22-393b-dabe35665375@csgroup.eu>
On Wed Jun 3, 2020 at 9:12 AM, Christophe Leroy wrote:
>
>
>
>
> Le 03/06/2020 à 07:19, Christopher M. Riedl a écrit :
> > Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
> > mappings to other CPUs. These mappings should be kept local to the CPU
> > doing the patching. Use the pre-initialized temporary mm and patching
> > address for this purpose. Also add a check after patching to ensure the
> > patch succeeded.
> >
> > Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
> > mapping for patching uses a userspace address (to keep the mapping
> > local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
> > the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
> > (KUAP) protection is ignored (see PowerISA v3.0b, Fig. 35).
> >
> > Based on x86 implementation:
> >
> > commit b3fd8e83ada0
> > ("x86/alternatives: Use temporary mm for text poking")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> > arch/powerpc/lib/code-patching.c | 148 ++++++++++++-------------------
> > 1 file changed, 55 insertions(+), 93 deletions(-)
> >
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 599114f63b44..df0765845204 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -20,6 +20,7 @@
> > #include <asm/code-patching.h>
> > #include <asm/setup.h>
> > #include <asm/inst.h>
> > +#include <asm/mmu_context.h>
> >
> > static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
> > struct ppc_inst *patch_addr)
> > @@ -78,101 +79,58 @@ void __init poking_init(void)
> > pte_unmap_unlock(ptep, ptl);
> > }
> >
> > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> > -
> > -static int text_area_cpu_up(unsigned int cpu)
> > -{
> > - struct vm_struct *area;
> > -
> > - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > - if (!area) {
> > - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > - cpu);
> > - return -1;
> > - }
> > - this_cpu_write(text_poke_area, area);
> > -
> > - return 0;
> > -}
> > -
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > - free_vm_area(this_cpu_read(text_poke_area));
> > - return 0;
> > -}
> > -
> > -/*
> > - * Run as a late init call. This allows all the boot time patching to be done
> > - * simply by patching the code, and then we're called here prior to
> > - * mark_rodata_ro(), which happens after all init calls are run. Although
> > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> > - * it as being preferable to a kernel that will crash later when someone tries
> > - * to use patch_instruction().
> > - */
> > -static int __init setup_text_poke_area(void)
> > -{
> > - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > - "powerpc/text_poke:online", text_area_cpu_up,
> > - text_area_cpu_down));
> > -
> > - return 0;
> > -}
> > -late_initcall(setup_text_poke_area);
> > +struct patch_mapping {
> > + spinlock_t *ptl; /* for protecting pte table */
> > + pte_t *ptep;
> > + struct temp_mm temp_mm;
> > +};
> >
> > /*
> > * This can be called for kernel text or a module.
> > */
> > -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> > {
> > - unsigned long pfn;
> > - int err;
> > + struct page *page;
> > + pte_t pte;
> > + pgprot_t pgprot;
> >
> > if (is_vmalloc_addr(addr))
> > - pfn = vmalloc_to_pfn(addr);
> > + page = vmalloc_to_page(addr);
> > else
> > - pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > + page = virt_to_page(addr);
> >
> > - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> > + if (radix_enabled())
> > + pgprot = PAGE_KERNEL;
> > + else
> > + pgprot = PAGE_SHARED;
> >
> > - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
> > - if (err)
> > + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
> > + &patch_mapping->ptl);
> > + if (unlikely(!patch_mapping->ptep)) {
> > + pr_warn("map patch: failed to allocate pte for patching\n");
> > return -1;
> > + }
> > +
> > + pte = mk_pte(page, pgprot);
> > + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> > + pte = pte_mkdirty(pte);
>
>
> Are you sure you don't need the DIRTY bit for BOOK3S/64 non radix?
>
>
> I think the DIRTY bit is always needed, and adding it when it is already
> there is harmless, so it should be done unconditionally.
>
I tested this and it doesn't seem to make a difference, so I can make this
common in the next spin.
>
> > + set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
> > +
> > + init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > + use_temporary_mm(&patch_mapping->temp_mm);
> >
> > return 0;
> > }
> >
> > -static inline int unmap_patch_area(unsigned long addr)
> > +static void unmap_patch(struct patch_mapping *patch_mapping)
> > {
> > - pte_t *ptep;
> > - pmd_t *pmdp;
> > - pud_t *pudp;
> > - pgd_t *pgdp;
> > -
> > - pgdp = pgd_offset_k(addr);
> > - if (unlikely(!pgdp))
> > - return -EINVAL;
> > -
> > - pudp = pud_offset(pgdp, addr);
> > - if (unlikely(!pudp))
> > - return -EINVAL;
> > -
> > - pmdp = pmd_offset(pudp, addr);
> > - if (unlikely(!pmdp))
> > - return -EINVAL;
> > -
> > - ptep = pte_offset_kernel(pmdp, addr);
> > - if (unlikely(!ptep))
> > - return -EINVAL;
> > + /* In hash, pte_clear flushes the tlb */
> > + pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
> > + unuse_temporary_mm(&patch_mapping->temp_mm);
> >
> > - pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
> > -
> > - /*
> > - * In hash, pte_clear flushes the tlb, in radix, we have to
> > - */
> > - pte_clear(&init_mm, addr, ptep);
> > - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > -
> > - return 0;
> > + /* In radix, we have to explicitly flush the tlb (no-op in hash) */
> > + local_flush_tlb_mm(patching_mm);
> > + pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
> > }
> >
> > static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> > @@ -180,32 +138,36 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> > int err;
> > struct ppc_inst *patch_addr = NULL;
> > unsigned long flags;
> > - unsigned long text_poke_addr;
> > - unsigned long kaddr = (unsigned long)addr;
> > + struct patch_mapping patch_mapping;
> >
> > /*
> > - * During early early boot patch_instruction is called
> > - * when text_poke_area is not ready, but we still need
> > - * to allow patching. We just do the plain old patching
> > + * The patching_mm is initialized before calling mark_rodata_ro. Prior
> > + * to this, patch_instruction is called when we don't have (and don't
> > + * need) the patching_mm so just do plain old patching.
> > */
> > - if (!this_cpu_read(text_poke_area))
> > + if (!patching_mm)
> > return raw_patch_instruction(addr, instr);
> >
> > local_irq_save(flags);
> >
> > - text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
> > - if (map_patch_area(addr, text_poke_addr)) {
> > - err = -1;
> > + err = map_patch(addr, &patch_mapping);
> > + if (err)
> > goto out;
> > - }
> >
> > - patch_addr = (struct ppc_inst *)(text_poke_addr + (kaddr & ~PAGE_MASK));
> > + patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
> >
> > - __patch_instruction(addr, instr, patch_addr);
> > + if (!radix_enabled())
> > + allow_write_to_user(patch_addr, sizeof(instr));
>
>
> Can't use sizeof(instr), you have to use ppc_inst_size()
>
Good catch, will fix this in the next spin (the other one below too).
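
A small sketch of why sizeof() is the wrong length here, under the assumption
that the instruction type carries room for a prefixed (two-word) instruction.
The struct and helper below are hypothetical stand-ins, not the kernel's
definitions; the only architectural fact relied on is that prefixed
instructions use primary opcode 1 and are 8 bytes long, while ordinary
instructions are 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an instruction that may have a suffix word. */
struct sketch_inst {
	uint32_t val;     /* first (or only) instruction word */
	uint32_t suffix;  /* second word, meaningful only when prefixed */
};

#define SKETCH_OP_PREFIX 1u

/* Length in bytes of the instruction actually encoded in 'x'. */
static size_t sketch_inst_len(struct sketch_inst x)
{
	/* The primary opcode lives in the top six bits of the first word. */
	return (x.val >> 26) == SKETCH_OP_PREFIX ? 8 : 4;
}
```

In this model sizeof(struct sketch_inst) is always 8, whereas a plain nop
(0x60000000) is only 4 bytes long, so a length derived from the instruction
itself is what the write window should be sized by.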
>
> > + err = __patch_instruction(addr, instr, patch_addr);
> > + if (!radix_enabled())
> > + prevent_write_to_user(patch_addr, sizeof(instr));
>
>
> Same
>
>
> >
> > - err = unmap_patch_area(text_poke_addr);
> > - if (err)
> > - pr_warn("failed to unmap %lx\n", text_poke_addr);
> > + unmap_patch(&patch_mapping);
> > + /*
> > + * Something is wrong if what we just wrote doesn't match what we
> > + * think we just wrote.
> > + */
> > + WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
> >
> > out:
> > local_irq_restore(flags);
> >
>
>
> Christophe
>
>
>
>
^ permalink raw reply
* Re: [PATCH 1/5] powerpc/mm: Introduce temporary mm
From: Christopher M. Riedl @ 2020-06-11 3:34 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev, kernel-hardening
In-Reply-To: <ff05b833-720e-e1e2-f43b-8285d520a563@csgroup.eu>
On Wed Jun 3, 2020 at 8:58 AM, Christophe Leroy wrote:
>
>
>
>
> Le 03/06/2020 à 07:19, Christopher M. Riedl a écrit :
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. A side benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> >
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> >
> > Interrupts must be disabled while the temporary mm is in use. HW
> > breakpoints, which may have been set by userspace as watchpoints on
> > addresses now within the temporary mm, are saved and disabled when
> > loading the temporary mm. The HW breakpoints are restored when unloading
> > the temporary mm. All HW breakpoints are indiscriminately disabled while
> > the temporary mm is in use.
> >
> > Based on x86 implementation:
> >
> > commit cefa929c034e
> > ("x86/mm: Introduce temporary mm structs")
> >
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> > arch/powerpc/include/asm/debug.h | 1 +
> > arch/powerpc/include/asm/mmu_context.h | 64 ++++++++++++++++++++++++++
> > arch/powerpc/kernel/process.c | 5 ++
> > 3 files changed, 70 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> > index ec57daf87f40..827350c9bcf3 100644
> > --- a/arch/powerpc/include/asm/debug.h
> > +++ b/arch/powerpc/include/asm/debug.h
> > @@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
> > #endif
> >
> > void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
> > bool ppc_breakpoint_available(void);
> > #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> > extern void do_send_trap(struct pt_regs *regs, unsigned long address,
> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > index 1a474f6b1992..9269c7c7b04e 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -10,6 +10,7 @@
> > #include <asm/mmu.h>
> > #include <asm/cputable.h>
> > #include <asm/cputhreads.h>
> > +#include <asm/debug.h>
> >
> > /*
> > * Most if the context management is out of line
> > @@ -300,5 +301,68 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
> > return 0;
> > }
> >
> > +struct temp_mm {
> > + struct mm_struct *temp;
> > + struct mm_struct *prev;
> > + bool is_kernel_thread;
> > + struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> > +};
> > +
> > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> > +{
> > + temp_mm->temp = mm;
> > + temp_mm->prev = NULL;
> > + temp_mm->is_kernel_thread = false;
> > + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> > +}
> > +
> > +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > + lockdep_assert_irqs_disabled();
> > +
> > + temp_mm->is_kernel_thread = current->mm == NULL;
> > + if (temp_mm->is_kernel_thread)
> > + temp_mm->prev = current->active_mm;
> > + else
> > + temp_mm->prev = current->mm;
>
>
> Is it necessary to treat kernel threads differently? When I look at the
> x86 implementation, they don't do such a thing.
>
Yup, in do_slb_fault we error out if current->mm is NULL, resulting
in spectacular failures during patching w/ hash mmu.
>
> > +
> > + /*
> > + * Hash requires a non-NULL current->mm to allocate a userspace address
> > + * when handling a page fault. Does not appear to hurt in Radix either.
> > + */
> > + current->mm = temp_mm->temp;
> > + switch_mm_irqs_off(NULL, temp_mm->temp, current);
> > +
> > + if (ppc_breakpoint_available()) {
> > + struct arch_hw_breakpoint null_brk = {0};
> > + int i = 0;
> > +
> > + for (; i < nr_wp_slots(); ++i) {
> > + __get_breakpoint(i, &temp_mm->brk[i]);
> > + if (temp_mm->brk[i].type != 0)
> > + __set_breakpoint(i, &null_brk);
> > + }
> > + }
> > +}
> > +
> > +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > + lockdep_assert_irqs_disabled();
> > +
> > + if (temp_mm->is_kernel_thread)
> > + current->mm = NULL;
> > + else
> > + current->mm = temp_mm->prev;
> > + switch_mm_irqs_off(NULL, temp_mm->prev, current);
> > +
> > + if (ppc_breakpoint_available()) {
> > + int i = 0;
> > +
> > + for (; i < nr_wp_slots(); ++i)
> > + if (temp_mm->brk[i].type != 0)
> > + __set_breakpoint(i, &temp_mm->brk[i]);
> > + }
> > +}
> > +
> > #endif /* __KERNEL__ */
> > #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 048d64c4e115..3973144f6980 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -825,6 +825,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
> > return 0;
> > }
> >
> > +void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> > +{
> > + memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
> > +}
> > +
> > void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
> > {
> > memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
> >
>
>
> Christophe
>
>
>
>
^ permalink raw reply
* [PATCH v2] powerpc: Remove inaccessible CMDLINE default
From: Chris Packham @ 2020-06-11 3:41 UTC (permalink / raw)
To: mpe, benh, paulus, christophe.leroy
Cc: Chris Packham, linuxppc-dev, linux-kernel
Since commit cbe46bd4f510 ("powerpc: remove CONFIG_CMDLINE #ifdef mess")
CONFIG_CMDLINE has always had a value regardless of CONFIG_CMDLINE_BOOL.
For example:
$ make ARCH=powerpc defconfig
$ cat .config
# CONFIG_CMDLINE_BOOL is not set
CONFIG_CMDLINE=""
When enabling CONFIG_CMDLINE_BOOL this value is kept making the 'default
"..." if CONFIG_CMDLINE_BOOL' ineffective.
$ ./scripts/config --enable CONFIG_CMDLINE_BOOL
$ cat .config
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE=""
Remove CONFIG_CMDLINE_BOOL and the inaccessible default.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
It took me a while to get round to sending a v2; for a refresher, v1 can be found here:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190802050232.22978-1-chris.packham@alliedtelesis.co.nz/
Changes in v2:
- Rebase on top of Linus's tree
- Fix some typos in commit message
- Add review from Christophe
- Remove CONFIG_CMDLINE_BOOL
arch/powerpc/Kconfig | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..51abc59c3334 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -859,12 +859,8 @@ config PPC_DENORMALISATION
Add support for handling denormalisation of single precision
values. Useful for bare metal only. If unsure say Y here.
-config CMDLINE_BOOL
- bool "Default bootloader kernel arguments"
-
config CMDLINE
- string "Initial kernel command string" if CMDLINE_BOOL
- default "console=ttyS0,9600 console=tty0 root=/dev/sda2" if CMDLINE_BOOL
+ string "Initial kernel command string"
default ""
help
On some platforms, there is currently no way for the boot loader to
--
2.27.0
^ permalink raw reply related
* [PATCH v2] All arch: remove system call sys_sysctl
From: Xiaoming Ni @ 2020-06-11 3:54 UTC (permalink / raw)
To: acme, ak, akpm, alexander.shishkin, arnd, axboe, bauerman, benh,
borntraeger, bp, brgerst, catalin.marinas, christian, chris,
cyphar, dalias, davem, deller, dhowells, dvyukov, ebiederm, elver,
fenghua.yu, flameeyes, geert, gor, haolee.swjtu, heiko.carstens,
hpa, ink, James.Bottomley, jcmvbkbc, jiri, jolsa, jongk, keescook,
krzk, linux-alpha, linux-api, linux-arm-kernel, linux, linux,
linux-fsdevel, linux-ia64, linux-kernel, linux-m68k, linux-mips,
linux-parisc, linuxppc-dev, linux-s390, linux-sh, linux-xtensa,
luto, mark.rutland, martin.petersen, mattst88, mcgrof, minchan,
mingo, monstr, mpe, mszeredi, namhyung, naveen.n.rao, nixiaoming,
npiggin, oleg, olof, paulburton, paulmck, paulus, peterz,
ravi.bangoria, rdunlap, rth, samitolvanen, sargun, sfr, shawnguo,
sparclinux, sudeep.holla, surenb, svens, tglx, tony.luck,
tsbogend, vbabka, viro, will, x86, yamada.masahiro, ysato,
yzaikin, zhouyanjie
Cc: young.liuyang, alex.huangjianhui
Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.
We have been warning people about using the sysctl system call for years
and believe there are no more users. Even if there are users of this
interface, if they have not complained or fixed their code by now, they
probably are not going to, so there is no point in warning them any
longer.
So completely remove sys_sysctl on all architectures.
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Changes in v2:
Following Kees Cook's suggestion, completely remove sys_sysctl on all architectures
Following Eric W. Biederman's suggestion, update the commit log
V1: https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaoming@huawei.com/
V1 deleted the code of sys_sysctl and returned -ENOSYS directly at the function entry
---
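The table changes below all follow one pattern: the syscall number keeps its slot, but the slot now points at the generic "not implemented" handler. A minimal userspace sketch of that idea follows. It assumes a made-up three-entry table; every `demo_*` / `DEMO_*` name and slot number is hypothetical, and `demo_ni_syscall()` merely stands in for the kernel's sys_ni_syscall().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of a syscall table; not kernel code. */
typedef long (*demo_syscall_fn)(void);

#define DEMO_NR_SYSCALLS 3
#define DEMO_NR_GETSID   0	/* made-up slot numbers for illustration */
#define DEMO_NR_SYSCTL   1
#define DEMO_NR_MLOCK    2

long demo_getsid(void) { return 42; }
long demo_mlock(void)  { return 0; }

/* Stand-in for sys_ni_syscall(): always reports "not implemented". */
long demo_ni_syscall(void) { return -ENOSYS; }

const demo_syscall_fn demo_table[DEMO_NR_SYSCALLS] = {
	[DEMO_NR_GETSID] = demo_getsid,
	[DEMO_NR_SYSCTL] = demo_ni_syscall,	/* slot kept, handler retired */
	[DEMO_NR_MLOCK]  = demo_mlock,
};

/* Look up and call the handler; out-of-range numbers also yield -ENOSYS. */
long demo_dispatch(size_t nr)
{
	if (nr >= DEMO_NR_SYSCALLS)
		return -ENOSYS;
	assert(demo_table[nr] != NULL);
	return demo_table[nr]();
}
```

In this sketch demo_dispatch(DEMO_NR_SYSCTL) returns -ENOSYS while the neighbouring slots keep their numbers and behaviour, which is why the tables are edited in place rather than renumbered.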
arch/alpha/kernel/syscalls/syscall.tbl | 2 +-
arch/arm/configs/am200epdkit_defconfig | 1 -
arch/arm/tools/syscall.tbl | 2 +-
arch/arm64/include/asm/unistd32.h | 4 +-
arch/ia64/kernel/syscalls/syscall.tbl | 2 +-
arch/m68k/kernel/syscalls/syscall.tbl | 2 +-
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +-
arch/mips/configs/cu1000-neo_defconfig | 1 -
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +-
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +-
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +-
arch/parisc/kernel/syscalls/syscall.tbl | 2 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +-
arch/s390/kernel/syscalls/syscall.tbl | 2 +-
arch/sh/configs/dreamcast_defconfig | 1 -
arch/sh/configs/espt_defconfig | 1 -
arch/sh/configs/hp6xx_defconfig | 1 -
arch/sh/configs/landisk_defconfig | 1 -
arch/sh/configs/lboxre2_defconfig | 1 -
arch/sh/configs/microdev_defconfig | 1 -
arch/sh/configs/migor_defconfig | 1 -
arch/sh/configs/r7780mp_defconfig | 1 -
arch/sh/configs/r7785rp_defconfig | 1 -
arch/sh/configs/rts7751r2d1_defconfig | 1 -
arch/sh/configs/rts7751r2dplus_defconfig | 1 -
arch/sh/configs/se7206_defconfig | 1 -
arch/sh/configs/se7343_defconfig | 1 -
arch/sh/configs/se7619_defconfig | 1 -
arch/sh/configs/se7705_defconfig | 1 -
arch/sh/configs/se7750_defconfig | 1 -
arch/sh/configs/se7751_defconfig | 1 -
arch/sh/configs/secureedge5410_defconfig | 1 -
arch/sh/configs/sh03_defconfig | 1 -
arch/sh/configs/sh7710voipgw_defconfig | 1 -
arch/sh/configs/sh7757lcr_defconfig | 1 -
arch/sh/configs/sh7763rdp_defconfig | 1 -
arch/sh/configs/shmin_defconfig | 1 -
arch/sh/configs/titan_defconfig | 1 -
arch/sh/include/uapi/asm/unistd_64.h | 2 +-
arch/sh/kernel/syscalls/syscall.tbl | 2 +-
arch/sh/kernel/syscalls_64.S | 2 +-
arch/sparc/kernel/syscalls/syscall.tbl | 2 +-
arch/x86/entry/syscalls/syscall_32.tbl | 2 +-
arch/x86/entry/syscalls/syscall_64.tbl | 2 +-
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +-
include/linux/compat.h | 1 -
include/linux/syscalls.h | 2 -
include/linux/sysctl.h | 6 +-
include/uapi/linux/sysctl.h | 15 --
kernel/Makefile | 2 +-
kernel/sys_ni.c | 1 -
kernel/sysctl_binary.c | 171 ---------------------
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2 +-
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2 +-
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 +-
55 files changed, 26 insertions(+), 244 deletions(-)
delete mode 100644 kernel/sysctl_binary.c
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index b249824..0da7f1c 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -249,7 +249,7 @@
316 common mlockall sys_mlockall
317 common munlockall sys_munlockall
318 common sysinfo sys_sysinfo
-319 common _sysctl sys_sysctl
+319 common _sysctl sys_ni_syscall
# 320 was sys_idle
321 common oldumount sys_oldumount
322 common swapon sys_swapon
diff --git a/arch/arm/configs/am200epdkit_defconfig b/arch/arm/configs/am200epdkit_defconfig
index f56ac39..4e49d6c 100644
--- a/arch/arm/configs/am200epdkit_defconfig
+++ b/arch/arm/configs/am200epdkit_defconfig
@@ -3,7 +3,6 @@ CONFIG_LOCALVERSION="gum"
CONFIG_SYSVIPC=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_EXPERT=y
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_EPOLL is not set
# CONFIG_SHMEM is not set
# CONFIG_VM_EVENT_COUNTERS is not set
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 7b3832d..f36fda6 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -162,7 +162,7 @@
146 common writev sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index f8dafe9..ca41bb7 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -308,8 +308,8 @@
__SYSCALL(__NR_getsid, sys_getsid)
#define __NR_fdatasync 148
__SYSCALL(__NR_fdatasync, sys_fdatasync)
-#define __NR__sysctl 149
-__SYSCALL(__NR__sysctl, compat_sys_sysctl)
+ /* 149 was sys_sysctl */
+__SYSCALL(149, sys_ni_syscall)
#define __NR_mlock 150
__SYSCALL(__NR_mlock, sys_mlock)
#define __NR_munlock 151
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 6636a1a..75b880b 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -135,7 +135,7 @@
123 common writev sys_writev
124 common pread64 sys_pread64
125 common pwrite64 sys_pwrite64
-126 common _sysctl sys_sysctl
+126 common _sysctl sys_ni_syscall
127 common mmap sys_mmap
128 common munmap sys_munmap
129 common mlock sys_mlock
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 8cd84a7..91b21ad 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
146 common writev sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index f581a02..dc5f9fb 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
146 common writev sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/mips/configs/cu1000-neo_defconfig b/arch/mips/configs/cu1000-neo_defconfig
index 9b05a8f..244654c 100644
--- a/arch/mips/configs/cu1000-neo_defconfig
+++ b/arch/mips/configs/cu1000-neo_defconfig
@@ -17,7 +17,6 @@ CONFIG_CGROUP_CPUACCT=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
-CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS_ALL=y
CONFIG_EMBEDDED=y
# CONFIG_VM_EVENT_COUNTERS is not set
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c85bdc3..2653b28 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -159,7 +159,7 @@
149 n32 munlockall sys_munlockall
150 n32 vhangup sys_vhangup
151 n32 pivot_root sys_pivot_root
-152 n32 _sysctl compat_sys_sysctl
+152 n32 _sysctl sys_ni_syscall
153 n32 prctl sys_prctl
154 n32 adjtimex sys_adjtimex_time32
155 n32 setrlimit compat_sys_setrlimit
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 9e08c40..a4fd3bf 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -159,7 +159,7 @@
149 n64 munlockall sys_munlockall
150 n64 vhangup sys_vhangup
151 n64 pivot_root sys_pivot_root
-152 n64 _sysctl sys_sysctl
+152 n64 _sysctl sys_ni_syscall
153 n64 prctl sys_prctl
154 n64 adjtimex sys_adjtimex
155 n64 setrlimit sys_setrlimit
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index a2b591d..a30cfd4 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -164,7 +164,7 @@
150 o32 unused150 sys_ni_syscall
151 o32 getsid sys_getsid
152 o32 fdatasync sys_fdatasync
-153 o32 _sysctl sys_sysctl compat_sys_sysctl
+153 o32 _sysctl sys_ni_syscall
154 o32 mlock sys_mlock
155 o32 munlock sys_munlock
156 o32 mlockall sys_mlockall
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 98e7442..a47bc19 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -163,7 +163,7 @@
146 common writev sys_writev compat_sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl compat_sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 72fb9dd..a60163f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -197,7 +197,7 @@
146 common writev sys_writev compat_sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 nospu _sysctl sys_sysctl compat_sys_sysctl
+149 nospu _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b731fcb..f17aaf6 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -138,7 +138,7 @@
146 common writev sys_writev compat_sys_writev
147 common getsid sys_getsid sys_getsid
148 common fdatasync sys_fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl compat_sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock sys_mlock
151 common munlock sys_munlock sys_munlock
152 common mlockall sys_mlockall sys_mlockall
diff --git a/arch/sh/configs/dreamcast_defconfig b/arch/sh/configs/dreamcast_defconfig
index ae067e0..6a82c7b 100644
--- a/arch/sh/configs/dreamcast_defconfig
+++ b/arch/sh/configs/dreamcast_defconfig
@@ -1,7 +1,6 @@
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/espt_defconfig b/arch/sh/configs/espt_defconfig
index a5b865a..9a988c3 100644
--- a/arch/sh/configs/espt_defconfig
+++ b/arch/sh/configs/espt_defconfig
@@ -5,7 +5,6 @@ CONFIG_LOG_BUF_SHIFT=14
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/hp6xx_defconfig b/arch/sh/configs/hp6xx_defconfig
index a92db66..70e6605 100644
--- a/arch/sh/configs/hp6xx_defconfig
+++ b/arch/sh/configs/hp6xx_defconfig
@@ -3,7 +3,6 @@ CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_CPU_SUBTYPE_SH7709=y
diff --git a/arch/sh/configs/landisk_defconfig b/arch/sh/configs/landisk_defconfig
index 567af75..ba6ec04 100644
--- a/arch/sh/configs/landisk_defconfig
+++ b/arch/sh/configs/landisk_defconfig
@@ -1,6 +1,5 @@
CONFIG_SYSVIPC=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_SLAB=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/lboxre2_defconfig b/arch/sh/configs/lboxre2_defconfig
index 10f6d37..05e4ac6 100644
--- a/arch/sh/configs/lboxre2_defconfig
+++ b/arch/sh/configs/lboxre2_defconfig
@@ -1,6 +1,5 @@
CONFIG_SYSVIPC=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_SLAB=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/microdev_defconfig b/arch/sh/configs/microdev_defconfig
index ed84d13..c65667d 100644
--- a/arch/sh/configs/microdev_defconfig
+++ b/arch/sh/configs/microdev_defconfig
@@ -2,7 +2,6 @@ CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_CPU_SUBTYPE_SH4_202=y
diff --git a/arch/sh/configs/migor_defconfig b/arch/sh/configs/migor_defconfig
index 494a1675..dec9316 100644
--- a/arch/sh/configs/migor_defconfig
+++ b/arch/sh/configs/migor_defconfig
@@ -4,7 +4,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/r7780mp_defconfig b/arch/sh/configs/r7780mp_defconfig
index 0a18f80..ff8f8d4 100644
--- a/arch/sh/configs/r7780mp_defconfig
+++ b/arch/sh/configs/r7780mp_defconfig
@@ -3,7 +3,6 @@ CONFIG_BSD_PROCESS_ACCT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_FUTEX is not set
# CONFIG_EPOLL is not set
CONFIG_SLAB=y
diff --git a/arch/sh/configs/r7785rp_defconfig b/arch/sh/configs/r7785rp_defconfig
index 7226ac5..d9afce5 100644
--- a/arch/sh/configs/r7785rp_defconfig
+++ b/arch/sh/configs/r7785rp_defconfig
@@ -7,7 +7,6 @@ CONFIG_RCU_TRACE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/rts7751r2d1_defconfig b/arch/sh/configs/rts7751r2d1_defconfig
index 6a3cfe0..fc9c221 100644
--- a/arch/sh/configs/rts7751r2d1_defconfig
+++ b/arch/sh/configs/rts7751r2d1_defconfig
@@ -1,7 +1,6 @@
CONFIG_SYSVIPC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/rts7751r2dplus_defconfig b/arch/sh/configs/rts7751r2dplus_defconfig
index 2b3d7d2..ff3fd678 100644
--- a/arch/sh/configs/rts7751r2dplus_defconfig
+++ b/arch/sh/configs/rts7751r2dplus_defconfig
@@ -1,7 +1,6 @@
CONFIG_SYSVIPC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
index a93402b..19f0dae 100644
--- a/arch/sh/configs/se7206_defconfig
+++ b/arch/sh/configs/se7206_defconfig
@@ -18,7 +18,6 @@ CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS_ALL=y
# CONFIG_ELF_CORE is not set
# CONFIG_COMPAT_BRK is not set
diff --git a/arch/sh/configs/se7343_defconfig b/arch/sh/configs/se7343_defconfig
index 06d067c..553c7aa 100644
--- a/arch/sh/configs/se7343_defconfig
+++ b/arch/sh/configs/se7343_defconfig
@@ -2,7 +2,6 @@
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_FUTEX is not set
# CONFIG_EPOLL is not set
# CONFIG_SHMEM is not set
diff --git a/arch/sh/configs/se7619_defconfig b/arch/sh/configs/se7619_defconfig
index f54722d..baf1c84 100644
--- a/arch/sh/configs/se7619_defconfig
+++ b/arch/sh/configs/se7619_defconfig
@@ -1,7 +1,6 @@
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_KALLSYMS is not set
# CONFIG_HOTPLUG is not set
# CONFIG_ELF_CORE is not set
diff --git a/arch/sh/configs/se7705_defconfig b/arch/sh/configs/se7705_defconfig
index ddfc698..805966f 100644
--- a/arch/sh/configs/se7705_defconfig
+++ b/arch/sh/configs/se7705_defconfig
@@ -2,7 +2,6 @@
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_KALLSYMS is not set
# CONFIG_HOTPLUG is not set
CONFIG_SLAB=y
diff --git a/arch/sh/configs/se7750_defconfig b/arch/sh/configs/se7750_defconfig
index b23f675..3f1c137 100644
--- a/arch/sh/configs/se7750_defconfig
+++ b/arch/sh/configs/se7750_defconfig
@@ -5,7 +5,6 @@ CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_HOTPLUG is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/se7751_defconfig b/arch/sh/configs/se7751_defconfig
index 1623436..4a02406 100644
--- a/arch/sh/configs/se7751_defconfig
+++ b/arch/sh/configs/se7751_defconfig
@@ -3,7 +3,6 @@ CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_HOTPLUG is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/secureedge5410_defconfig b/arch/sh/configs/secureedge5410_defconfig
index 360592d..8422599 100644
--- a/arch/sh/configs/secureedge5410_defconfig
+++ b/arch/sh/configs/secureedge5410_defconfig
@@ -1,7 +1,6 @@
# CONFIG_SWAP is not set
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_HOTPLUG is not set
CONFIG_SLAB=y
# CONFIG_BLK_DEV_BSG is not set
diff --git a/arch/sh/configs/sh03_defconfig b/arch/sh/configs/sh03_defconfig
index 87db9a8..f0073ed 100644
--- a/arch/sh/configs/sh03_defconfig
+++ b/arch/sh/configs/sh03_defconfig
@@ -3,7 +3,6 @@ CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
diff --git a/arch/sh/configs/sh7710voipgw_defconfig b/arch/sh/configs/sh7710voipgw_defconfig
index c86f284..12a1395 100644
--- a/arch/sh/configs/sh7710voipgw_defconfig
+++ b/arch/sh/configs/sh7710voipgw_defconfig
@@ -2,7 +2,6 @@
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_FUTEX is not set
# CONFIG_EPOLL is not set
# CONFIG_SHMEM is not set
diff --git a/arch/sh/configs/sh7757lcr_defconfig b/arch/sh/configs/sh7757lcr_defconfig
index 9f2aed0..ca327d1 100644
--- a/arch/sh/configs/sh7757lcr_defconfig
+++ b/arch/sh/configs/sh7757lcr_defconfig
@@ -8,7 +8,6 @@ CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_BLK_DEV_INITRD=y
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS_ALL=y
CONFIG_SLAB=y
CONFIG_MODULES=y
diff --git a/arch/sh/configs/sh7763rdp_defconfig b/arch/sh/configs/sh7763rdp_defconfig
index d0a0aa7..26c5fd0 100644
--- a/arch/sh/configs/sh7763rdp_defconfig
+++ b/arch/sh/configs/sh7763rdp_defconfig
@@ -5,7 +5,6 @@ CONFIG_LOG_BUF_SHIFT=14
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
diff --git a/arch/sh/configs/shmin_defconfig b/arch/sh/configs/shmin_defconfig
index d589cfd..5504ca4 100644
--- a/arch/sh/configs/shmin_defconfig
+++ b/arch/sh/configs/shmin_defconfig
@@ -1,7 +1,6 @@
# CONFIG_SWAP is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_UID16 is not set
-# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_KALLSYMS is not set
# CONFIG_HOTPLUG is not set
# CONFIG_BUG is not set
diff --git a/arch/sh/configs/titan_defconfig b/arch/sh/configs/titan_defconfig
index 4ec961a..ba887f1 100644
--- a/arch/sh/configs/titan_defconfig
+++ b/arch/sh/configs/titan_defconfig
@@ -6,7 +6,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
CONFIG_BLK_DEV_INITRD=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
-# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
diff --git a/arch/sh/include/uapi/asm/unistd_64.h b/arch/sh/include/uapi/asm/unistd_64.h
index 75da548..04fe2ef 100644
--- a/arch/sh/include/uapi/asm/unistd_64.h
+++ b/arch/sh/include/uapi/asm/unistd_64.h
@@ -164,7 +164,7 @@
#define __NR_writev 146
#define __NR_getsid 147
#define __NR_fdatasync 148
-#define __NR__sysctl 149
+ /* 149 was sys_sysctl */
#define __NR_mlock 150
#define __NR_munlock 151
#define __NR_mlockall 152
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index e7a4804..7456845 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -156,7 +156,7 @@
146 common writev sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/arch/sh/kernel/syscalls_64.S b/arch/sh/kernel/syscalls_64.S
index 1bcb86f..e4c1d54 100644
--- a/arch/sh/kernel/syscalls_64.S
+++ b/arch/sh/kernel/syscalls_64.S
@@ -166,7 +166,7 @@ sys_call_table:
.long sys_writev
.long sys_getsid
.long sys_fdatasync
- .long sys_sysctl
+ .long sys_ni_syscall /* 149: for sys_sysctl */
.long sys_mlock /* 150 */
.long sys_munlock
.long sys_mlockall
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index d6126ee..74adaeca 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -300,7 +300,7 @@
249 64 nanosleep sys_nanosleep
250 32 mremap sys_mremap
250 64 mremap sys_64_mremap
-251 common _sysctl sys_sysctl compat_sys_sysctl
+251 common _sysctl sys_ni_syscall
252 common getsid sys_getsid
253 common fdatasync sys_fdatasync
254 32 nfsservctl sys_ni_syscall sys_nis_syscall
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 686d59d..ef76360 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -160,7 +160,7 @@
146 i386 writev sys_writev compat_sys_writev
147 i386 getsid sys_getsid
148 i386 fdatasync sys_fdatasync
-149 i386 _sysctl sys_sysctl compat_sys_sysctl
+149 i386 _sysctl sys_ni_syscall
150 i386 mlock sys_mlock
151 i386 munlock sys_munlock
152 i386 mlockall sys_mlockall
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b345b35..6a3b0b3 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -164,7 +164,7 @@
153 common vhangup sys_vhangup
154 common modify_ldt sys_modify_ldt
155 common pivot_root sys_pivot_root
-156 64 _sysctl sys_sysctl
+156 64 _sysctl sys_ni_syscall
157 common prctl sys_prctl
158 common arch_prctl sys_arch_prctl
159 common adjtimex sys_adjtimex
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 96cb070..34cbbf5 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -222,7 +222,7 @@
204 common quotactl sys_quotactl
# 205 was old nfsservctl
205 common nfsservctl sys_ni_syscall
-206 common _sysctl sys_sysctl
+206 common _sysctl sys_ni_syscall
207 common bdflush sys_bdflush
208 common uname sys_newuname
209 common sysinfo sys_sysinfo
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 86b61e8..c55d245 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -859,7 +859,6 @@ asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
asmlinkage long compat_sys_ustat(unsigned dev, struct compat_ustat __user *u32);
asmlinkage long compat_sys_recv(int fd, void __user *buf, compat_size_t len,
unsigned flags);
-asmlinkage long compat_sys_sysctl(struct compat_sysctl_args __user *args);
/* obsolete: fs/readdir.c */
asmlinkage long compat_sys_old_readdir(unsigned int fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 63ffa6d..915233a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -48,7 +48,6 @@
struct statfs64;
struct statx;
struct fsinfo_params;
-struct __sysctl_args;
struct sysinfo;
struct timespec;
struct __kernel_old_timeval;
@@ -1125,7 +1124,6 @@ asmlinkage long sys_epoll_wait(int epfd, struct epoll_event __user *events,
asmlinkage long sys_bdflush(int func, long data);
asmlinkage long sys_oldumount(char __user *name);
asmlinkage long sys_uselib(const char __user *library);
-asmlinkage long sys_sysctl(struct __sysctl_args __user *args);
asmlinkage long sys_sysfs(int option,
unsigned long arg1, unsigned long arg2);
asmlinkage long sys_fork(void);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8112c15..299f9cb 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -90,15 +90,13 @@ int proc_do_static_key(struct ctl_table *table, int write, void *buffer,
* sysctl names can be mirrored automatically under /proc/sys. The
* procname supplied controls /proc naming.
*
- * The table's mode will be honoured both for sys_sysctl(2) and
- * proc-fs access.
+ * The table's mode will be honoured for proc-fs access.
*
* Leaf nodes in the sysctl tree will be represented by a single file
* under /proc; non-leaf nodes will be represented by directories. A
* null procname disables /proc mirroring at this node.
*
- * sysctl(2) can automatically manage read and write requests through
- * the sysctl table. The data and maxlen fields of the ctl_table
+ * The data and maxlen fields of the ctl_table
* struct enable minimal validation of the values being written to be
* performed, and the mode field allows minimal authentication.
*
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 27c1ed2..84b44c3 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -27,21 +27,6 @@
#include <linux/types.h>
#include <linux/compiler.h>
-#define CTL_MAXNAME 10 /* how many path components do we allow in a
- call to sysctl? In other words, what is
- the largest acceptable value for the nlen
- member of a struct __sysctl_args to have? */
-
-struct __sysctl_args {
- int __user *name;
- int nlen;
- void __user *oldval;
- size_t __user *oldlenp;
- void __user *newval;
- size_t newlen;
- unsigned long __unused[4];
-};
-
/* Define sysctl names first */
/* Top-level names: */
diff --git a/kernel/Makefile b/kernel/Makefile
index 0bd4ed7..a3f7c08 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -5,7 +5,7 @@
obj-y = fork.o exec_domain.o panic.o \
cpu.o exit.o softirq.o resource.o \
- sysctl.o sysctl_binary.o capability.o ptrace.o user.o \
+ sysctl.o capability.o ptrace.o user.o \
signal.o sys.o umh.o workqueue.o pid.o task_work.o \
extable.o params.o \
kthread.o sys_ni.o nsproxy.o \
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index fad48ac..c935c18 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -373,7 +373,6 @@ asmlinkage long sys_ni_syscall(void)
COND_SYSCALL_COMPAT(socketcall);
/* compat syscalls for arm64, x86, ... */
-COND_SYSCALL_COMPAT(sysctl);
COND_SYSCALL_COMPAT(fanotify_mark);
/* x86 */
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
deleted file mode 100644
index 7d550cc..00000000
--- a/kernel/sysctl_binary.c
+++ /dev/null
@@ -1,171 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/stat.h>
-#include <linux/sysctl.h>
-#include "../fs/xfs/xfs_sysctl.h"
-#include <linux/sunrpc/debug.h>
-#include <linux/string.h>
-#include <linux/syscalls.h>
-#include <linux/namei.h>
-#include <linux/mount.h>
-#include <linux/fs.h>
-#include <linux/nsproxy.h>
-#include <linux/pid_namespace.h>
-#include <linux/file.h>
-#include <linux/ctype.h>
-#include <linux/netdevice.h>
-#include <linux/kernel.h>
-#include <linux/uuid.h>
-#include <linux/slab.h>
-#include <linux/compat.h>
-
-static ssize_t binary_sysctl(const int *name, int nlen,
- void __user *oldval, size_t oldlen, void __user *newval, size_t newlen)
-{
- return -ENOSYS;
-}
-
-static void deprecated_sysctl_warning(const int *name, int nlen)
-{
- int i;
-
- /*
- * CTL_KERN/KERN_VERSION is used by older glibc and cannot
- * ever go away.
- */
- if (nlen >= 2 && name[0] == CTL_KERN && name[1] == KERN_VERSION)
- return;
-
- if (printk_ratelimit()) {
- printk(KERN_INFO
- "warning: process `%s' used the deprecated sysctl "
- "system call with ", current->comm);
- for (i = 0; i < nlen; i++)
- printk(KERN_CONT "%d.", name[i]);
- printk(KERN_CONT "\n");
- }
- return;
-}
-
-#define WARN_ONCE_HASH_BITS 8
-#define WARN_ONCE_HASH_SIZE (1<<WARN_ONCE_HASH_BITS)
-
-static DECLARE_BITMAP(warn_once_bitmap, WARN_ONCE_HASH_SIZE);
-
-#define FNV32_OFFSET 2166136261U
-#define FNV32_PRIME 0x01000193
-
-/*
- * Print each legacy sysctl (approximately) only once.
- * To avoid making the tables non-const use a external
- * hash-table instead.
- * Worst case hash collision: 6, but very rarely.
- * NOTE! We don't use the SMP-safe bit tests. We simply
- * don't care enough.
- */
-static void warn_on_bintable(const int *name, int nlen)
-{
- int i;
- u32 hash = FNV32_OFFSET;
-
- for (i = 0; i < nlen; i++)
- hash = (hash ^ name[i]) * FNV32_PRIME;
- hash %= WARN_ONCE_HASH_SIZE;
- if (__test_and_set_bit(hash, warn_once_bitmap))
- return;
- deprecated_sysctl_warning(name, nlen);
-}
-
-static ssize_t do_sysctl(int __user *args_name, int nlen,
- void __user *oldval, size_t oldlen, void __user *newval, size_t newlen)
-{
- int name[CTL_MAXNAME];
- int i;
-
- /* Check args->nlen. */
- if (nlen < 0 || nlen > CTL_MAXNAME)
- return -ENOTDIR;
- /* Read in the sysctl name for simplicity */
- for (i = 0; i < nlen; i++)
- if (get_user(name[i], args_name + i))
- return -EFAULT;
-
- warn_on_bintable(name, nlen);
-
- return binary_sysctl(name, nlen, oldval, oldlen, newval, newlen);
-}
-
-SYSCALL_DEFINE1(sysctl, struct __sysctl_args __user *, args)
-{
- struct __sysctl_args tmp;
- size_t oldlen = 0;
- ssize_t result;
-
- if (copy_from_user(&tmp, args, sizeof(tmp)))
- return -EFAULT;
-
- if (tmp.oldval && !tmp.oldlenp)
- return -EFAULT;
-
- if (tmp.oldlenp && get_user(oldlen, tmp.oldlenp))
- return -EFAULT;
-
- result = do_sysctl(tmp.name, tmp.nlen, tmp.oldval, oldlen,
- tmp.newval, tmp.newlen);
-
- if (result >= 0) {
- oldlen = result;
- result = 0;
- }
-
- if (tmp.oldlenp && put_user(oldlen, tmp.oldlenp))
- return -EFAULT;
-
- return result;
-}
-
-
-#ifdef CONFIG_COMPAT
-
-struct compat_sysctl_args {
- compat_uptr_t name;
- int nlen;
- compat_uptr_t oldval;
- compat_uptr_t oldlenp;
- compat_uptr_t newval;
- compat_size_t newlen;
- compat_ulong_t __unused[4];
-};
-
-COMPAT_SYSCALL_DEFINE1(sysctl, struct compat_sysctl_args __user *, args)
-{
- struct compat_sysctl_args tmp;
- compat_size_t __user *compat_oldlenp;
- size_t oldlen = 0;
- ssize_t result;
-
- if (copy_from_user(&tmp, args, sizeof(tmp)))
- return -EFAULT;
-
- if (tmp.oldval && !tmp.oldlenp)
- return -EFAULT;
-
- compat_oldlenp = compat_ptr(tmp.oldlenp);
- if (compat_oldlenp && get_user(oldlen, compat_oldlenp))
- return -EFAULT;
-
- result = do_sysctl(compat_ptr(tmp.name), tmp.nlen,
- compat_ptr(tmp.oldval), oldlen,
- compat_ptr(tmp.newval), tmp.newlen);
-
- if (result >= 0) {
- oldlen = result;
- result = 0;
- }
-
- if (compat_oldlenp && put_user(oldlen, compat_oldlenp))
- return -EFAULT;
-
- return result;
-}
-
-#endif /* CONFIG_COMPAT */
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 35b61bf..6d29d9a 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -193,7 +193,7 @@
146 common writev sys_writev compat_sys_writev
147 common getsid sys_getsid
148 common fdatasync sys_fdatasync
-149 nospu _sysctl sys_sysctl compat_sys_sysctl
+149 nospu _sysctl sys_ni_syscall
150 common mlock sys_mlock
151 common munlock sys_munlock
152 common mlockall sys_mlockall
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index b38d484..0193f9b 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -138,7 +138,7 @@
146 common writev sys_writev compat_sys_writev
147 common getsid sys_getsid sys_getsid
148 common fdatasync sys_fdatasync sys_fdatasync
-149 common _sysctl sys_sysctl compat_sys_sysctl
+149 common _sysctl sys_ni_syscall
150 common mlock sys_mlock compat_sys_mlock
151 common munlock sys_munlock compat_sys_munlock
152 common mlockall sys_mlockall sys_mlockall
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f..4e50062 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -164,7 +164,7 @@
153 common vhangup sys_vhangup
154 common modify_ldt sys_modify_ldt
155 common pivot_root sys_pivot_root
-156 64 _sysctl sys_sysctl
+156 64 _sysctl sys_ni_syscall
157 common prctl sys_prctl
158 common arch_prctl sys_arch_prctl
159 common adjtimex sys_adjtimex
--
1.8.5.6
* Re: [PATCH v2] All arch: remove system call sys_sysctl
From: Stephen Rothwell @ 2020-06-11 4:12 UTC (permalink / raw)
To: Xiaoming Ni
Cc: linux-sh, catalin.marinas, paulus, ak, paulburton, geert,
mattst88, brgerst, acme, cyphar, viro, luto, tglx, surenb, rth,
young.liuyang, linux-parisc, rdunlap, linux-kernel, mcgrof,
linux-fsdevel, akpm, mark.rutland, linux-ia64, linux-xtensa,
jongk, linux, James.Bottomley, jcmvbkbc, linux-s390, ysato,
deller, yzaikin, mszeredi, gor, linux-alpha, linux-m68k,
linux-arm-kernel, chris, tony.luck, linux-api, zhouyanjie,
minchan, ebiederm, sargun, alexander.shishkin, heiko.carstens,
alex.huangjianhui, will, krzk, borntraeger, vbabka, samitolvanen,
flameeyes, ravi.bangoria, elver, keescook, arnd, bp, christian,
tsbogend, jiri, martin.petersen, yamada.masahiro, oleg,
sudeep.holla, olof, shawnguo, davem, bauerman, dalias, fenghua.yu,
peterz, dhowells, hpa, sparclinux, jolsa, svens, x86, linux,
mingo, naveen.n.rao, paulmck, npiggin, namhyung, dvyukov, axboe,
monstr, haolee.swjtu, linux-mips, ink, linuxppc-dev
In-Reply-To: <1591847640-124894-1-git-send-email-nixiaoming@huawei.com>
Hi Xiaoming,
On Thu, 11 Jun 2020 11:54:00 +0800 Xiaoming Ni <nixiaoming@huawei.com> wrote:
>
> arch/sh/configs/dreamcast_defconfig | 1 -
> arch/sh/configs/espt_defconfig | 1 -
> arch/sh/configs/hp6xx_defconfig | 1 -
> arch/sh/configs/landisk_defconfig | 1 -
> arch/sh/configs/lboxre2_defconfig | 1 -
> arch/sh/configs/microdev_defconfig | 1 -
> arch/sh/configs/migor_defconfig | 1 -
> arch/sh/configs/r7780mp_defconfig | 1 -
> arch/sh/configs/r7785rp_defconfig | 1 -
> arch/sh/configs/rts7751r2d1_defconfig | 1 -
> arch/sh/configs/rts7751r2dplus_defconfig | 1 -
> arch/sh/configs/se7206_defconfig | 1 -
> arch/sh/configs/se7343_defconfig | 1 -
> arch/sh/configs/se7619_defconfig | 1 -
> arch/sh/configs/se7705_defconfig | 1 -
> arch/sh/configs/se7750_defconfig | 1 -
> arch/sh/configs/se7751_defconfig | 1 -
> arch/sh/configs/secureedge5410_defconfig | 1 -
> arch/sh/configs/sh03_defconfig | 1 -
> arch/sh/configs/sh7710voipgw_defconfig | 1 -
> arch/sh/configs/sh7757lcr_defconfig | 1 -
> arch/sh/configs/sh7763rdp_defconfig | 1 -
> arch/sh/configs/shmin_defconfig | 1 -
> arch/sh/configs/titan_defconfig | 1 -
> arch/sh/include/uapi/asm/unistd_64.h | 2 +-
> arch/sh/kernel/syscalls/syscall.tbl | 2 +-
> arch/sh/kernel/syscalls_64.S | 2 +-
You might want to rebase this onto v5.8-rc1 when it is released this
weekend as the 64bit sh code (sh5) has been removed.
--
Cheers,
Stephen Rothwell
* Re: [PATCH v2] powerpc: Remove inaccessible CMDLINE default
From: Christophe Leroy @ 2020-06-11 5:46 UTC (permalink / raw)
To: Chris Packham, mpe, benh, paulus, christophe.leroy
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200611034140.9133-1-chris.packham@alliedtelesis.co.nz>
On 11/06/2020 at 05:41, Chris Packham wrote:
> Since commit cbe46bd4f510 ("powerpc: remove CONFIG_CMDLINE #ifdef mess")
> CONFIG_CMDLINE has always had a value regardless of CONFIG_CMDLINE_BOOL.
>
> For example:
>
> $ make ARCH=powerpc defconfig
> $ cat .config
> # CONFIG_CMDLINE_BOOL is not set
> CONFIG_CMDLINE=""
>
> When enabling CONFIG_CMDLINE_BOOL this value is kept making the 'default
> "..." if CONFIG_CMDLINE_BOOL' ineffective.
>
> $ ./scripts/config --enable CONFIG_CMDLINE_BOOL
> $ cat .config
> CONFIG_CMDLINE_BOOL=y
> CONFIG_CMDLINE=""
>
> Remove CONFIG_CMDLINE_BOOL and the inaccessible default.
You also have to remove all CONFIG_CMDLINE_BOOL entries from the defconfigs.
Christophe
>
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> It took me a while to get round to sending a v2, for a refresher v1 can be found here:
>
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190802050232.22978-1-chris.packham@alliedtelesis.co.nz/
>
> Changes in v2:
> - Rebase on top of Linus's tree
> - Fix some typos in commit message
> - Add review from Christophe
> - Remove CONFIG_CMDLINE_BOOL
>
> arch/powerpc/Kconfig | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9fa23eb320ff..51abc59c3334 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -859,12 +859,8 @@ config PPC_DENORMALISATION
> Add support for handling denormalisation of single precision
> values. Useful for bare metal only. If unsure say Y here.
>
> -config CMDLINE_BOOL
> - bool "Default bootloader kernel arguments"
> -
> config CMDLINE
> - string "Initial kernel command string" if CMDLINE_BOOL
> - default "console=ttyS0,9600 console=tty0 root=/dev/sda2" if CMDLINE_BOOL
> + string "Initial kernel command string"
> default ""
> help
> On some platforms, there is currently no way for the boot loader to
>
* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-06-11 6:43 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <bug-205183-206035@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=205183
--- Comment #4 from Daniel Black (daniel@linux.ibm.com) ---
Still broken.
danielgb@talos2:~$ gcc -g -Wall -O stacktest.c
danielgb@talos2:~$ ./a.out 1240000 &
[1] 494618
danielgb@talos2:~$ cat /proc/$(pidof a.out)/maps | grep stack
7fffcde80000-7fffcdfb0000 rw-p 00000000 00:00 0
[stack]
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ signal delivered, stack base 0x7fffcdfb0000 top
0x7fffcde81427 (1240025 used)
[1]+ Done ./a.out 1240000
danielgb@talos2:~$ ./a.out 1241000 &
[1] 494677
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$
[1]+ Segmentation fault ./a.out 1241000
danielgb@talos2:~$
danielgb@talos2:~$ dmesg | grep a.out
[10617.616145] a.out[494587]: bad frame in setup_rt_frame: 00007fffdea30010 nip
000000011a0a09fc lr 00007fffa1c404c8
[10865.752876] a.out[494677]: bad frame in setup_rt_frame: 00007fffcc420030 nip
0000000135a70a3c lr 00007fff952604c8
danielgb@talos2:~$ uname -a
Linux talos2 5.7.0-rc5-77151-gfea086b627a0 #1 SMP Mon May 11 16:00:00 AEST 2020
ppc64le ppc64le ppc64le GNU/Linux
--
You are receiving this mail because:
You are watching the assignee of the bug.
* Re: [PATCH v2] All arch: remove system call sys_sysctl
From: Will Deacon @ 2020-06-11 7:07 UTC (permalink / raw)
To: Xiaoming Ni
Cc: linux-sh, catalin.marinas, paulus, ak, paulburton, geert,
mattst88, brgerst, acme, cyphar, viro, luto, tglx, surenb, rth,
young.liuyang, linux-parisc, rdunlap, linux-kernel, mcgrof,
linux-fsdevel, akpm, mark.rutland, linux-ia64, linux-xtensa,
jongk, linux, James.Bottomley, jcmvbkbc, linux-s390, ysato,
deller, yzaikin, mszeredi, gor, linux-alpha, linux-m68k,
linux-arm-kernel, chris, tony.luck, linux-api, zhouyanjie,
minchan, ebiederm, sargun, alexander.shishkin, heiko.carstens,
alex.huangjianhui, krzk, borntraeger, vbabka, samitolvanen,
flameeyes, ravi.bangoria, elver, keescook, arnd, bp, christian,
tsbogend, jiri, martin.petersen, yamada.masahiro, oleg,
sudeep.holla, olof, shawnguo, davem, bauerman, dalias, fenghua.yu,
peterz, dhowells, hpa, sparclinux, jolsa, svens, x86, linux,
mingo, naveen.n.rao, paulmck, sfr, npiggin, namhyung, dvyukov,
axboe, monstr, haolee.swjtu, linux-mips, ink, linuxppc-dev
In-Reply-To: <1591847640-124894-1-git-send-email-nixiaoming@huawei.com>
On Thu, Jun 11, 2020 at 11:54:00AM +0800, Xiaoming Ni wrote:
> Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
> sys_sysctl is actually unavailable: any input can only return an error.
>
> We have been warning about people using the sysctl system call for years
> and believe there are no more users. Even if there are users of this
> interface if they have not complained or fixed their code by now they
> probably are not going to, so there is no point in warning them any
> longer.
>
> So completely remove sys_sysctl on all architectures.
>
> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
>
> changes in v2:
> According to Kees Cook's suggestion, completely remove sys_sysctl on all arch
> According to Eric W. Biederman's suggestion, update the commit log
>
> V1: https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaoming@huawei.com/
> Delete the code of sys_sysctl and return -ENOSYS directly at the function entry
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 2 +-
> arch/arm/configs/am200epdkit_defconfig | 1 -
> arch/arm/tools/syscall.tbl | 2 +-
> arch/arm64/include/asm/unistd32.h | 4 +-
For the arm/arm64 parts:
Acked-by: Will Deacon <will@kernel.org>
Will
* Linux powerpc new system call instruction and ABI
From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: libc-dev, musl, Nicholas Piggin, linux-api
Thanks to everyone who has given feedback on the proposed new system
call instruction and ABI. I think it has reached agreement and the
implementation can be merged into Linux.
I have a hacked glibc implementation (it doesn't do all the right
HWCAP detection and misses a few things) with which I've tested several
things, including some kernel selftests involving signals and syscalls.
System Call Vectored (scv) ABI
==============================
The scv instruction is introduced with POWER9 / ISA3, and it comes with an
rfscv counterpart. The benefit of these instructions is performance
(trading slower SRR0/1 for faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR
updates). The scv instruction has 128 levels (not enough to cover the Linux
system call space).
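As a rough illustration of the 128-level limit, the sketch below packs a
level into the instruction word. The base opcode and the 7-bit field at bit 5
are taken from PPC_INST_SCV and __PPC_LEV() in the accompanying patch; the
helper names scv_encode() and scv_level() are hypothetical and exist only for
this example.

```c
#include <stdint.h>

/* Base opcode for scv, matching PPC_INST_SCV in the accompanying patch. */
#define SCV_BASE_OPCODE 0x44000001u

/*
 * Hypothetical helper: place a 7-bit level in the LEV field, mirroring
 * the __PPC_LEV() macro from the patch. Levels above 127 are truncated
 * by the mask, which is why only 128 levels exist.
 */
static inline uint32_t scv_encode(unsigned int level)
{
	return SCV_BASE_OPCODE | ((uint32_t)(level & 0x7f) << 5);
}

/* Hypothetical helper: extract the level from an scv instruction word. */
static inline unsigned int scv_level(uint32_t insn)
{
	return (insn >> 5) & 0x7f;
}
```

With these definitions, 'scv 0' is simply the base opcode, and a level of 128
wraps back to level 0 because the field holds only seven bits.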
Assignment and advertisement
----------------------------
The proposal is to assign scv levels conservatively, and advertise them
with HWCAP feature bits as we add support for more.
Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will
cause the kernel to log a "SCV facility unavailable" message, and deliver a
SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit
PPC_FEATURE2_SCV for SCV support, but does not set it.
This change allocates the zero level ('scv 0'), advertised with
PPC_FEATURE2_SCV, which will be used to provide normal Linux system
calls (equivalent to 'sc').
Attempting to execute scv with other levels will cause a SIGILL to be
delivered the same as before, but will not log a "SCV facility unavailable"
message (because the processor facility is enabled).
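A minimal sketch of how userspace might test for the advertised feature
follows. The numeric value of PPC_FEATURE2_SCV is an assumption here (it is
believed to match the powerpc uapi cputable.h header, and real code should
include that header rather than redefine it); hwcap2_has_scv() is a
hypothetical helper name.

```c
#include <stdbool.h>

/*
 * Assumed value of PPC_FEATURE2_SCV; real code should take the
 * definition from the powerpc uapi headers instead.
 */
#define SKETCH_PPC_FEATURE2_SCV 0x00100000UL

/* Hypothetical helper: true if the HWCAP2 word advertises 'scv 0'. */
static bool hwcap2_has_scv(unsigned long hwcap2)
{
	return (hwcap2 & SKETCH_PPC_FEATURE2_SCV) != 0;
}
```

On a powerpc Linux system the argument would typically come from
getauxval(AT_HWCAP2) in <sys/auxv.h>. A libc would fall back to 'sc' when
the bit is clear.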
Calling convention
------------------
The proposal is for scv 0 to provide the standard Linux system call ABI
with the following differences from sc convention[1]:
- lr is to be volatile across scv calls. This is necessary because the
scv instruction clobbers lr. From previous discussion, this should be
possible to deal with in GCC clobbers and CFI.
- cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the
kernel system call exit to avoid restoring the volatile cr registers
(although we probably still would anyway to avoid information leaks).
- Error handling: The consensus among kernel, glibc, and musl is to move to
using negative return values in r3 rather than CR0[SO]=1 to indicate error,
which matches most other architectures, and is closer to a function call.
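The error convention above can be sketched as follows, assuming the
-4095..-1 error range described in the syscall64-abi.rst update in patch 2/2.
The helper names and the SKETCH_MAX_ERRNO constant are hypothetical and only
for illustration.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's MAX_ERRNO (4095). */
#define SKETCH_MAX_ERRNO 4095L

/*
 * Hypothetical helper: an 'scv 0' return value in r3 indicates failure
 * if it lies in -4095..-1, i.e. it is >= -MAX_ERRNO when compared as
 * an unsigned value.
 */
static bool scv_ret_is_error(long r3)
{
	return (unsigned long)r3 >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Hypothetical helper: the errno value for a failed call, else 0. */
static int scv_ret_errno(long r3)
{
	return scv_ret_is_error(r3) ? (int)-r3 : 0;
}
```

A libc wrapper following this sketch would set errno to scv_ret_errno(r3)
and return -1 on failure, with no need to inspect CR0[SO].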
Notes
-----
- r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as
submitted currently preserves them. This is to leave room for deciding
which way to go with these. Some small benefit was found by preserving
them[2] but I'm not convinced it's worth deviating from the C function
call ABI just for this. Release code should follow the ABI.
Previous discussions:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html
[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst
[2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
The following patches to add scv support to Linux are posted to
https://lists.ozlabs.org/pipermail/linuxppc-dev/
Nicholas Piggin (2):
powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
powerpc/64s: system call support for scv/rfscv instructions
Thanks,
Nick
--
2.23.0
* [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: libc-dev, musl, Nicholas Piggin, linux-api
In-Reply-To: <20200611081203.995112-1-npiggin@gmail.com>
The scv instruction causes an interrupt which can enter the kernel with
MSR[EE]=1, thus allowing interrupts to hit at any time. These must not
be taken as normal interrupts, because they come from MSR[PR]=0 context,
and yet the kernel stack is not yet set up and r13 is not set to the
PACA.
Treat this as a soft-masked interrupt regardless of the soft masked
state. This does not affect behaviour yet, because currently all
interrupts are taken with MSR[EE]=0.
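The decision described above can be sketched in C. This is a simplified
model of the assembly in the diff that follows, not the kernel's actual code:
the function name and parameters are hypothetical, and it assumes imask is
nonzero (the check is only generated for maskable interrupts).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the soft-mask test:
 *  - interrupts from user mode (MSR[PR]=1) are never soft-masked;
 *  - kernel code below __end_interrupts is implicitly soft-masked;
 *  - otherwise the PACA soft-mask byte is tested against the
 *    interrupt's mask bit.
 */
static bool interrupt_is_soft_masked(bool from_user, uint64_t nia,
				     uint64_t end_interrupts,
				     uint8_t softmask, uint8_t imask)
{
	if (from_user)
		return false;
	if (nia < end_interrupts)
		return true;
	return (softmask & imask) != 0;
}
```

The second branch is the new behaviour: an interrupted address in the low
vector area counts as masked even when the soft-mask byte says interrupts
are enabled.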
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/exceptions-64s.S | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e70ebb5c318c..388e34665b4a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -508,8 +508,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
.macro __GEN_COMMON_BODY name
.if IMASK
+ .if ! ISTACK
+ .error "No support for masked interrupt to use custom stack"
+ .endif
+
+ /* If coming from user, skip soft-mask tests. */
+ andi. r10,r12,MSR_PR
+ bne 2f
+
+ /* Kernel code running below __end_interrupts is implicitly
+ * soft-masked */
+ LOAD_HANDLER(r10, __end_interrupts)
+ cmpld r11,r10
+ li r10,IMASK
+ blt- 1f
+
+ /* Test the soft mask state against our interrupt's bit */
lbz r10,PACAIRQSOFTMASK(r13)
- andi. r10,r10,IMASK
+1: andi. r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
.if IVEC == 0x500 || IVEC == 0xea0
li r10,PACA_IRQ_EE
@@ -540,7 +556,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
.if ISTACK
andi. r10,r12,MSR_PR /* See if coming from user */
- mr r10,r1 /* Save r1 */
+2: mr r10,r1 /* Save r1 */
subi r1,r1,INT_FRAME_SIZE /* alloc frame on kernel stack */
beq- 100f
ld r1,PACAKSAVE(r13) /* kernel stack to use */
@@ -2838,7 +2854,8 @@ masked_interrupt:
ld r10,PACA_EXGEN+EX_R10(r13)
ld r11,PACA_EXGEN+EX_R11(r13)
ld r12,PACA_EXGEN+EX_R12(r13)
- /* returns to kernel where r13 must be set up, so don't restore it */
+ ld r13,PACA_EXGEN+EX_R13(r13)
+ /* May return to masked low address where r13 is not set up */
.if \hsrr
HRFI_TO_KERNEL
.else
@@ -2997,6 +3014,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline)
USE_FIXED_SECTION(virt_trampolines)
/*
+ * All code below __end_interrupts is treated as soft-masked. If
+ * any code runs here with MSR[EE]=1, it must then cope with pending
+ * soft interrupt being raised (i.e., by ensuring it is replayed).
+ *
* The __end_interrupts marker must be past the out-of-line (OOL)
* handlers, so that they are copied to real address 0x100 when running
* a relocatable kernel. This ensures they can be reached from the short
--
2.23.0
* [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions
From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw)
To: linuxppc-dev; +Cc: libc-dev, musl, Nicholas Piggin, linux-api
In-Reply-To: <20200611081203.995112-1-npiggin@gmail.com>
Add support for the scv instruction on POWER9 and later CPUs.
For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that lr is not preserved, nor are
volatile cr registers, and error is not indicated with CR0[SO], but by
returning a negative errno.
rfscv is implemented to return from scv type system calls. It cannot be
used to return from sc system calls because those are defined to
preserve lr.
getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Documentation/powerpc/syscall64-abi.rst | 42 ++++--
arch/powerpc/include/asm/asm-prototypes.h | 2 +-
arch/powerpc/include/asm/exception-64s.h | 6 +
arch/powerpc/include/asm/head-64.h | 2 +-
arch/powerpc/include/asm/ppc-opcode.h | 2 +
arch/powerpc/include/asm/ppc_asm.h | 2 +
arch/powerpc/include/asm/ptrace.h | 7 +-
arch/powerpc/include/asm/setup.h | 4 +-
arch/powerpc/include/asm/sstep.h | 1 +
arch/powerpc/kernel/cpu_setup_power.S | 10 +-
arch/powerpc/kernel/cputable.c | 3 +-
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/kernel/entry_64.S | 171 +++++++++++++++++++++-
arch/powerpc/kernel/exceptions-64s.S | 123 +++++++++++++++-
arch/powerpc/kernel/process.c | 10 +-
arch/powerpc/kernel/setup_64.c | 5 +-
arch/powerpc/kernel/signal.c | 19 ++-
arch/powerpc/kernel/syscall_64.c | 32 +++-
arch/powerpc/lib/sstep.c | 14 ++
arch/powerpc/platforms/pseries/setup.c | 8 +-
arch/powerpc/xmon/xmon.c | 1 +
21 files changed, 421 insertions(+), 44 deletions(-)
diff --git a/Documentation/powerpc/syscall64-abi.rst b/Documentation/powerpc/syscall64-abi.rst
index e49f69f941b9..46caaadbb029 100644
--- a/Documentation/powerpc/syscall64-abi.rst
+++ b/Documentation/powerpc/syscall64-abi.rst
@@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI
syscall
=======
+Invocation
+----------
+The syscall is made with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the
+scv 0 instruction is an alternative that may provide better performance,
+with some differences to calling sequence.
+
syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI
specification C function calling sequence, including register preservation
rules, with the following differences.
@@ -12,16 +21,23 @@ rules, with the following differences.
.. [1] Some syscalls (typically low-level management functions) may have
different calling sequences (e.g., rt_sigreturn).
-Parameters and return value
----------------------------
+Parameters
+----------
The system call number is specified in r0.
There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
-Both a return value and a return error code are returned. cr0.SO is the return
-error code, and r3 is the return value or error code. When cr0.SO is clear,
-the syscall succeeded and r3 is the return value. When cr0.SO is set, the
-syscall failed and r3 is the error code that generally corresponds to errno.
+Return value
+------------
+- For the sc instruction, both a value and an error condition are returned.
+ cr0.SO is the error condition, and r3 is the return value. When cr0.SO is
+ clear, the syscall succeeded and r3 is the return value. When cr0.SO is set,
+ the syscall failed and r3 is the error value (that normally corresponds to
+ errno).
+
+- For the scv 0 instruction, the return value indicates failure if it is
+ -4095..-1 (i.e., it is >= -MAX_ERRNO (-4095) as an unsigned comparison),
+ in which case the error value is the negated return value.
Stack
-----
@@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the
following differences:
=========== ============= ========================================
+--- For the sc instruction, differences with the ELF ABI ---
r0 Volatile (System call number.)
r3 Volatile (Parameter 1, and return value.)
r4-r8 Volatile (Parameters 2-6.)
-cr0 Volatile (cr0.SO is the return error condition)
+cr0 Volatile (cr0.SO is the return error condition.)
cr1, cr5-7 Nonvolatile
lr Nonvolatile
+
+--- For the scv 0 instruction, differences with the ELF ABI ---
+r0 Volatile (System call number.)
+r3 Volatile (Parameter 1, and return value.)
+r4-r8 Volatile (Parameters 2-6.)
=========== ============= ========================================
All floating point and vector data registers as well as control and status
registers are nonvolatile.
-Invocation
-----------
-The syscall is performed with the sc instruction, and returns with execution
-continuing at the instruction following the sc instruction.
-
Transactional Memory
--------------------
Syscall behavior can change if the processor is in transactional or suspended
@@ -75,6 +92,7 @@ auxiliary vector.
returning to the caller. This case is not well defined or supported, so this
behavior should not be relied upon.
+scv 0 syscalls will always behave as PPC_FEATURE2_HTM_NOSC.
vsyscall
========
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 7d81e86a1e5d..fb47bf5818c8 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -98,7 +98,7 @@ unsigned long __init early_init(unsigned long dt_ptr);
void __init machine_init(u64 dt_ptr);
#endif
long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
-notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
+notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv);
notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 47bd4ea0837d..0c2fe7f042d1 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -123,6 +123,12 @@
hrfid; \
b hrfi_flush_fallback
+#define RFSCV_TO_USER \
+ STF_EXIT_BARRIER_SLOT; \
+ RFI_FLUSH_SLOT; \
+ RFSCV; \
+ b rfscv_flush_fallback
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_EXCEPTION_H */
diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index 2dabcf668292..4cb9efa2eb21 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -128,7 +128,7 @@ end_##sname:
.if ((start) % (size) != 0); \
.error "Fixed section exception vector misalignment"; \
.endif; \
- .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100); \
+ .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100) && ((size) != 0x1000); \
.error "Fixed section exception vector bad size"; \
.endif; \
.if (start) < sname##_start; \
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 2a39c716c343..b2bdc4de1292 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -257,6 +257,7 @@
#define PPC_INST_MFVSRD 0x7c000066
#define PPC_INST_MTVSRD 0x7c000166
#define PPC_INST_SC 0x44000002
+#define PPC_INST_SCV 0x44000001
#define PPC_INST_SLBFEE 0x7c0007a7
#define PPC_INST_SLBIA 0x7c0003e4
@@ -411,6 +412,7 @@
#define __PPC_CT(t) (((t) & 0x0f) << 21)
#define __PPC_SPR(r) ((((r) & 0x1f) << 16) | ((((r) >> 5) & 0x1f) << 11))
#define __PPC_RC21 (0x1 << 10)
+#define __PPC_LEV(l) (((l) & 0x7f) << 5)
/*
* Both low and high 16 bits are added as SIGNED additions, so if low 16 bits
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 6b03dff61a05..160f3bb77ea4 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -755,6 +755,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
#define N_SLINE 68
#define N_SO 100
+#define RFSCV .long 0x4c0000a4
+
/*
* Create an endian fixup trampoline
*
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index ac3970fff0d5..f194339cef3b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -222,9 +222,14 @@ static inline void set_trap(struct pt_regs *regs, unsigned long val)
regs->trap = (regs->trap & TRAP_FLAGS_MASK) | (val & ~TRAP_FLAGS_MASK);
}
+static inline bool trap_is_scv(struct pt_regs *regs)
+{
+ return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x3000);
+}
+
static inline bool trap_is_syscall(struct pt_regs *regs)
{
- return TRAP(regs) == 0xc00;
+ return (trap_is_scv(regs) || TRAP(regs) == 0xc00);
}
static inline bool trap_norestart(struct pt_regs *regs)
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 65676e2325b8..9efbddee2bca 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -30,12 +30,12 @@ void setup_panic(void);
#define ARCH_PANIC_TIMEOUT 180
#ifdef CONFIG_PPC_PSERIES
-extern void pseries_enable_reloc_on_exc(void);
+extern bool pseries_enable_reloc_on_exc(void);
extern void pseries_disable_reloc_on_exc(void);
extern void pseries_big_endian_exceptions(void);
extern void pseries_little_endian_exceptions(void);
#else
-static inline void pseries_enable_reloc_on_exc(void) {}
+static inline bool pseries_enable_reloc_on_exc(void) { return false; }
static inline void pseries_disable_reloc_on_exc(void) {}
static inline void pseries_big_endian_exceptions(void) {}
static inline void pseries_little_endian_exceptions(void) {}
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 3b01c69a44aa..eaa4fb6c8960 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -40,6 +40,7 @@ enum instruction_type {
CACHEOP,
BARRIER,
SYSCALL,
+ SYSCALL_VECTORED_0,
MFMSR,
MTMSR,
RFI,
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index efdcfa714106..86527d19348c 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -98,7 +98,7 @@ _GLOBAL(__setup_cpu_power10)
_GLOBAL(__setup_cpu_power9)
mflr r11
- bl __init_FSCR
+ bl __init_FSCR_power9
1: bl __init_PMU
bl __init_hvmode_206
mtlr r11
@@ -128,7 +128,7 @@ _GLOBAL(__restore_cpu_power10)
_GLOBAL(__restore_cpu_power9)
mflr r11
- bl __init_FSCR
+ bl __init_FSCR_power9
1: bl __init_PMU
mfmsr r3
rldicl. r0,r3,4,63
@@ -198,6 +198,12 @@ __init_FSCR_power10:
mtspr SPRN_FSCR, r3
// fall through
+__init_FSCR_power9:
+ mfspr r3, SPRN_FSCR
+ ori r3, r3, FSCR_SCV
+ mtspr SPRN_FSCR, r3
+ // fall through
+
__init_FSCR:
mfspr r3,SPRN_FSCR
ori r3,r3,FSCR_TAR|FSCR_EBB
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index b4066354f073..3d406a9626e8 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -120,7 +120,8 @@ extern void __restore_cpu_e6500(void);
#define COMMON_USER2_POWER9 (COMMON_USER2_POWER8 | \
PPC_FEATURE2_ARCH_3_00 | \
PPC_FEATURE2_HAS_IEEE128 | \
- PPC_FEATURE2_DARN )
+ PPC_FEATURE2_DARN | \
+ PPC_FEATURE2_SCV)
#define COMMON_USER_POWER10 COMMON_USER_POWER9
#define COMMON_USER2_POWER10 (COMMON_USER2_POWER9 | \
PPC_FEATURE2_ARCH_3_1 | \
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 3a409517c031..50b2d544361e 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -587,6 +587,7 @@ static struct dt_cpu_feature_match __initdata
{"little-endian", feat_enable_le, CPU_FTR_REAL_LE},
{"smt", feat_enable_smt, 0},
{"interrupt-facilities", feat_enable, 0},
+ {"system-call-vectored", feat_enable, 0},
{"timer-facilities", feat_enable, 0},
{"timer-facilities-v3", feat_enable, 0},
{"debug-facilities", feat_enable, 0},
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9d49338e0c85..223c4f008e63 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -64,15 +64,173 @@ exception_marker:
.section ".text"
.align 7
+#ifdef CONFIG_PPC_BOOK3S
+.macro system_call_vectored name trapnr
+ .globl system_call_vectored_\name
+system_call_vectored_\name:
+_ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+BEGIN_FTR_SECTION
+ extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
+ bne .Ltabort_syscall
+END_FTR_SECTION_IFSET(CPU_FTR_TM)
+#endif
+ INTERRUPT_TO_KERNEL
+ mr r10,r1
+ ld r1,PACAKSAVE(r13)
+ std r10,0(r1)
+ std r11,_NIP(r1)
+ std r12,_MSR(r1)
+ std r0,GPR0(r1)
+ std r10,GPR1(r1)
+ std r2,GPR2(r1)
+ ld r2,PACATOC(r13)
+ mfcr r12
+ li r11,0
+ /* Can we avoid saving r3-r8 in common case? */
+ std r3,GPR3(r1)
+ std r4,GPR4(r1)
+ std r5,GPR5(r1)
+ std r6,GPR6(r1)
+ std r7,GPR7(r1)
+ std r8,GPR8(r1)
+ /* Zero r9-r12, this should only be required when restoring all GPRs */
+ std r11,GPR9(r1)
+ std r11,GPR10(r1)
+ std r11,GPR11(r1)
+ std r11,GPR12(r1)
+ std r9,GPR13(r1)
+ SAVE_NVGPRS(r1)
+ std r11,_XER(r1)
+ std r11,_LINK(r1)
+ std r11,_CTR(r1)
+
+ li r11,\trapnr
+ std r11,_TRAP(r1)
+ std r12,_CCR(r1)
+ std r3,ORIG_GPR3(r1)
+ addi r10,r1,STACK_FRAME_OVERHEAD
+ ld r11,exception_marker@toc(r2)
+ std r11,-16(r10) /* "regshere" marker */
+
+ /*
+ * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+ * would clobber syscall parameters. Also we always enter with IRQs
+ * enabled and nothing pending. system_call_exception() will call
+ * trace_hardirqs_off().
+ *
+ * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS. The
+ * entry vector already sets PACAIRQSOFTMASK to IRQS_ALL_DISABLED.
+ */
+
+ /* Calling convention has r9 = orig r0, r10 = regs */
+ mr r9,r0
+ bl system_call_exception
+
+.Lsyscall_vectored_\name\()_exit:
+ addi r4,r1,STACK_FRAME_OVERHEAD
+ li r5,1 /* scv */
+ bl syscall_exit_prepare
+
+ ld r2,_CCR(r1)
+ ld r4,_NIP(r1)
+ ld r5,_MSR(r1)
+
+BEGIN_FTR_SECTION
+ stdcx. r0,0,r1 /* to clear the reservation */
+END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+BEGIN_FTR_SECTION
+ HMT_MEDIUM_LOW
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+ cmpdi r3,0
+ bne .Lsyscall_vectored_\name\()_restore_regs
+
+ /* rfscv returns with LR->NIA and CTR->MSR */
+ mtlr r4
+ mtctr r5
+
+ /* Could zero these as per ABI, but we may consider a stricter ABI
+ * which preserves these if libc implementations can benefit, so
+ * restore them for now until further measurement is done. */
+ ld r0,GPR0(r1)
+ ld r4,GPR4(r1)
+ ld r5,GPR5(r1)
+ ld r6,GPR6(r1)
+ ld r7,GPR7(r1)
+ ld r8,GPR8(r1)
+ /* Zero volatile regs that may contain sensitive kernel data */
+ li r9,0
+ li r10,0
+ li r11,0
+ li r12,0
+ mtspr SPRN_XER,r0
+
+ /*
+ * We don't need to restore AMR on the way back to userspace for KUAP.
+ * The value of AMR only matters while we're in the kernel.
+ */
+ mtcr r2
+ ld r2,GPR2(r1)
+ ld r3,GPR3(r1)
+ ld r13,GPR13(r1)
+ ld r1,GPR1(r1)
+ RFSCV_TO_USER
+ b . /* prevent speculative execution */
+
+.Lsyscall_vectored_\name\()_restore_regs:
+ li r3,0
+ mtmsrd r3,1
+ mtspr SPRN_SRR0,r4
+ mtspr SPRN_SRR1,r5
+
+ ld r3,_CTR(r1)
+ ld r4,_LINK(r1)
+ ld r5,_XER(r1)
+
+ REST_NVGPRS(r1)
+ ld r0,GPR0(r1)
+ mtcr r2
+ mtctr r3
+ mtlr r4
+ mtspr SPRN_XER,r5
+ REST_10GPRS(2, r1)
+ REST_2GPRS(12, r1)
+ ld r1,GPR1(r1)
+ RFI_TO_USER
+.endm
+
+system_call_vectored common 0x3000
+/*
+ * We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0
+ * which is tested by system_call_exception when r0 is -1 (as set by vector
+ * entry code).
+ */
+system_call_vectored sigill 0x7ff0
+
+
+/*
+ * Entered via kernel return set up by kernel/sstep.c, must match entry regs
+ */
+ .globl system_call_vectored_emulate
+system_call_vectored_emulate:
+_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
+ li r10,IRQS_ALL_DISABLED
+ stb r10,PACAIRQSOFTMASK(r13)
+ b system_call_vectored_common
+#endif
+
+ .balign IFETCH_ALIGN_BYTES
.globl system_call_common
system_call_common:
+_ASM_NOKPROBE_SYMBOL(system_call_common)
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
BEGIN_FTR_SECTION
extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
bne .Ltabort_syscall
END_FTR_SECTION_IFSET(CPU_FTR_TM)
#endif
-_ASM_NOKPROBE_SYMBOL(system_call_common)
mr r10,r1
ld r1,PACAKSAVE(r13)
std r10,0(r1)
@@ -138,6 +296,7 @@ END_BTB_FLUSH_SECTION
.Lsyscall_exit:
addi r4,r1,STACK_FRAME_OVERHEAD
+ li r5,0 /* !scv */
bl syscall_exit_prepare
ld r2,_CCR(r1)
@@ -224,10 +383,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
b . /* prevent speculative execution */
#endif
+#ifdef CONFIG_PPC_BOOK3S
+_GLOBAL(ret_from_fork_scv)
+ bl schedule_tail
+ REST_NVGPRS(r1)
+ li r3,0 /* fork() return value */
+ b .Lsyscall_vectored_common_exit
+#endif
+
_GLOBAL(ret_from_fork)
bl schedule_tail
REST_NVGPRS(r1)
- li r3,0
+ li r3,0 /* fork() return value */
b .Lsyscall_exit
_GLOBAL(ret_from_kernel_thread)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 388e34665b4a..f5f24e6c685f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -756,6 +756,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* guarantee they will be delivered virtually. Some conditions (see the ISA)
* cause exceptions to be delivered in real mode.
*
+ * The scv instructions are a special case. They get a 0x3000 offset applied.
+ * scv exceptions have unique reentrancy properties, see below.
+ *
* It's impossible to receive interrupts below 0x300 via AIL.
*
* KVM: None of the virtual exceptions are from the guest. Anything that
@@ -765,8 +768,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* We layout physical memory as follows:
* 0x0000 - 0x00ff : Secondary processor spin code
* 0x0100 - 0x18ff : Real mode pSeries interrupt vectors
- * 0x1900 - 0x3fff : Real mode trampolines
- * 0x4000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors
+ * 0x1900 - 0x2fff : Real mode trampolines
+ * 0x3000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors
* 0x5900 - 0x6fff : Relon mode trampolines
* 0x7000 - 0x7fff : FWNMI data area
* 0x8000 - .... : Common interrupt handlers, remaining early
@@ -777,8 +780,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
* vectors there.
*/
OPEN_FIXED_SECTION(real_vectors, 0x0100, 0x1900)
-OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x4000)
-OPEN_FIXED_SECTION(virt_vectors, 0x4000, 0x5900)
+OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x3000)
+OPEN_FIXED_SECTION(virt_vectors, 0x3000, 0x5900)
OPEN_FIXED_SECTION(virt_trampolines, 0x5900, 0x7000)
#ifdef CONFIG_PPC_POWERNV
@@ -814,6 +817,77 @@ USE_FIXED_SECTION(real_vectors)
.globl __start_interrupts
__start_interrupts:
+/**
+ * Interrupt 0x3000 - System Call Vectored Interrupt (syscall).
+ * This is a synchronous interrupt invoked with the "scv" instruction. The
+ * system call does not alter the HV bit, so it is directed to the OS.
+ *
+ * Handling:
+ * scv instructions enter the kernel without changing EE, RI, ME, or HV.
+ * In particular, this means we can take a maskable interrupt at any point
+ * in the scv handler, which is unlike any other interrupt. This is solved
+ * by treating the instruction addresses below __end_interrupts as being
+ * soft-masked.
+ *
+ * AIL-0 mode scv exceptions go to 0x17000-0x17fff, but we set AIL-3 and
+ * ensure scv is never executed with relocation off, which means AIL-0
+ * should never happen.
+ *
+ * Before leaving the below __end_interrupts text, at least one of the following
+ * must be true:
+ * - MSR[PR]=1 (i.e., return to userspace)
+ * - MSR_EE|MSR_RI is set (no reentrant exceptions)
+ * - Standard kernel environment is set up (stack, paca, etc)
+ *
+ * Call convention:
+ *
+ * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
+ */
+EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
+ /* SCV 0 */
+ mr r9,r13
+ GET_PACA(r13)
+ mflr r11
+ mfctr r12
+ li r10,IRQS_ALL_DISABLED
+ stb r10,PACAIRQSOFTMASK(r13)
+#ifdef CONFIG_RELOCATABLE
+ b system_call_vectored_tramp
+#else
+ b system_call_vectored_common
+#endif
+ nop
+
+ /* SCV 1 - 127 */
+ .rept 127
+ mr r9,r13
+ GET_PACA(r13)
+ mflr r11
+ mfctr r12
+ li r10,IRQS_ALL_DISABLED
+ stb r10,PACAIRQSOFTMASK(r13)
+ li r0,-1 /* cause failure */
+#ifdef CONFIG_RELOCATABLE
+ b system_call_vectored_sigill_tramp
+#else
+ b system_call_vectored_sigill
+#endif
+ .endr
+EXC_VIRT_END(system_call_vectored, 0x3000, 0x1000)
+
+#ifdef CONFIG_RELOCATABLE
+TRAMP_VIRT_BEGIN(system_call_vectored_tramp)
+ __LOAD_HANDLER(r10, system_call_vectored_common)
+ mtctr r10
+ bctr
+
+TRAMP_VIRT_BEGIN(system_call_vectored_sigill_tramp)
+ __LOAD_HANDLER(r10, system_call_vectored_sigill)
+ mtctr r10
+ bctr
+#endif
+
+
/* No virt vectors corresponding with 0x0..0x100 */
EXC_VIRT_NONE(0x4000, 0x100)
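Editorial note on the vector layout above: the scv block is declared as 0x1000 bytes starting at 0x3000 and holds 128 entry stubs ("SCV 0" plus ".rept 127"). Assuming every stub has the same size, the stride works out to 0x1000 / 128 = 0x20 bytes. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Values taken from EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000). */
#define SCV_VIRT_BASE	0x3000u	/* start of the scv vector block */
#define SCV_VIRT_SIZE	0x1000u	/* size of the scv vector block */
#define SCV_NR_VECTORS	128u	/* "SCV 0" plus ".rept 127" */

/* Assumption: all entry stubs are equally sized, so stride = size / count. */
#define SCV_STRIDE	(SCV_VIRT_SIZE / SCV_NR_VECTORS)

/* Hypothetical helper: offset of the entry stub for "scv lev", or -1. */
static inline int64_t scv_entry_offset(unsigned int lev)
{
	if (lev >= SCV_NR_VECTORS)
		return -1;
	return (int64_t)SCV_VIRT_BASE + (int64_t)lev * SCV_STRIDE;
}

/*
 * Only vector 0 branches to system_call_vectored_common. Vectors 1-127
 * set r0 = -1 and branch to the sigill variant.
 */
static inline int scv_vector_is_supported(unsigned int lev)
{
	return lev == 0;
}
```

With these definitions scv_entry_offset(1) is 0x3020 and the last stub, for vector 127, starts at 0x3fe0, so the block ends exactly at 0x4000 where the existing virt vectors begin.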
@@ -2963,6 +3037,47 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
GET_SCRATCH0(r13);
hrfid
+TRAMP_REAL_BEGIN(rfscv_flush_fallback)
+ /* system call volatile */
+ mr r7,r13
+ GET_PACA(r13);
+ mr r8,r1
+ ld r1,PACAKSAVE(r13)
+ mfctr r9
+ ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
+ ld r11,PACA_L1D_FLUSH_SIZE(r13)
+ srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
+ mtctr r11
+ DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
+
+ /* order ld/st prior to dcbt stop all streams with flushing */
+ sync
+
+ /*
+ * The load addresses are at staggered offsets within cachelines,
+ * which suits some pipelines better (on others it should not
+ * hurt).
+ */
+1:
+ ld r11,(0x80 + 8)*0(r10)
+ ld r11,(0x80 + 8)*1(r10)
+ ld r11,(0x80 + 8)*2(r10)
+ ld r11,(0x80 + 8)*3(r10)
+ ld r11,(0x80 + 8)*4(r10)
+ ld r11,(0x80 + 8)*5(r10)
+ ld r11,(0x80 + 8)*6(r10)
+ ld r11,(0x80 + 8)*7(r10)
+ addi r10,r10,0x80*8
+ bdnz 1b
+
+ mtctr r9
+ li r9,0
+ li r10,0
+ li r11,0
+ mr r1,r8
+ mr r13,r7
+ RFSCV
+
USE_TEXT_SECTION()
MASKED_INTERRUPT
MASKED_INTERRUPT hsrr=1
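Editorial note on rfscv_flush_fallback above: the loop count is the flush size shifted right by (7 + 3), and each iteration issues eight loads at offsets (0x80 + 8) * k before advancing by 0x80 * 8. The sketch below only models that address arithmetic in C to show what "staggered offsets" means; the function names are hypothetical and nothing here touches a real cache.

```c
#include <stddef.h>

#define LINE_SIZE	0x80u	/* 128-byte cache lines, per the srdi comment */
#define UNROLL		8u	/* loads per loop iteration */

/* Loop trip count: srdi r11,r11,(7 + 3) */
static inline size_t flush_iterations(size_t l1d_flush_size)
{
	return l1d_flush_size >> (7 + 3);
}

/* Byte offset from the start of the fallback area of load k in iteration iter. */
static inline size_t flush_load_offset(size_t iter, unsigned int k)
{
	return iter * (LINE_SIZE * UNROLL) + (LINE_SIZE + 8u) * k;
}

/* Which 128-byte line the load lands in. */
static inline size_t flush_load_line(size_t iter, unsigned int k)
{
	return flush_load_offset(iter, k) / LINE_SIZE;
}

/* Position of the load inside its line. */
static inline size_t flush_load_pos(size_t iter, unsigned int k)
{
	return flush_load_offset(iter, k) % LINE_SIZE;
}
```

Load k of any iteration touches a distinct line (8 * iter + k) at byte 8 * k within that line, so all eight lines of an iteration are covered while the in-line position moves from 0 to 56.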
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 7bb7faf84490..a0c2746f8c11 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1596,6 +1596,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
{
struct pt_regs *childregs, *kregs;
extern void ret_from_fork(void);
+ extern void ret_from_fork_scv(void);
extern void ret_from_kernel_thread(void);
void (*f)(void);
unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
@@ -1632,7 +1633,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
if (usp)
childregs->gpr[1] = usp;
p->thread.regs = childregs;
- childregs->gpr[3] = 0; /* Result from fork() */
+ /* 64s sets this in ret_from_fork */
+ if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+ childregs->gpr[3] = 0; /* Result from fork() */
if (clone_flags & CLONE_SETTLS) {
if (!is_32bit_task())
childregs->gpr[13] = tls;
@@ -1640,7 +1643,10 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
childregs->gpr[2] = tls;
}
- f = ret_from_fork;
+ if (trap_is_scv(regs))
+ f = ret_from_fork_scv;
+ else
+ f = ret_from_fork;
}
childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 0ba1ed77dc68..6be430107c6f 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -196,7 +196,10 @@ static void __init configure_exceptions(void)
/* Under a PAPR hypervisor, we need hypercalls */
if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
/* Enable AIL if possible */
- pseries_enable_reloc_on_exc();
+ if (!pseries_enable_reloc_on_exc()) {
+ init_task.thread.fscr &= ~FSCR_SCV;
+ cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
+ }
/*
* Tell the hypervisor that we want our exceptions to
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index b4143b6ff093..d15a98c758b8 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -205,8 +205,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
return;
/* error signalled ? */
- if (!(regs->ccr & 0x10000000))
+ if (trap_is_scv(regs)) {
+ /* 32-bit compat mode sign extend? */
+ if (!IS_ERR_VALUE(ret))
+ return;
+ ret = -ret;
+ } else if (!(regs->ccr & 0x10000000)) {
return;
+ }
switch (ret) {
case ERESTART_RESTARTBLOCK:
@@ -239,9 +245,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
regs->nip -= 4;
regs->result = 0;
} else {
- regs->result = -EINTR;
- regs->gpr[3] = EINTR;
- regs->ccr |= 0x10000000;
+ if (trap_is_scv(regs)) {
+ regs->result = -EINTR;
+ regs->gpr[3] = -EINTR;
+ } else {
+ regs->result = -EINTR;
+ regs->gpr[3] = EINTR;
+ regs->ccr |= 0x10000000;
+ }
}
}
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 79edba3ab312..a783fd324cb0 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -60,6 +60,11 @@ notrace long system_call_exception(long r3, long r4, long r5,
local_irq_enable();
if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
+ if (unlikely(regs->trap == 0x7ff0)) {
+ /* Unsupported scv vector */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return regs->gpr[3];
+ }
/*
* We use the return value of do_syscall_trace_enter() as the
* syscall number. If the syscall was rejected for any reason
@@ -78,6 +83,11 @@ notrace long system_call_exception(long r3, long r4, long r5,
r8 = regs->gpr[8];
} else if (unlikely(r0 >= NR_syscalls)) {
+ if (unlikely(regs->trap == 0x7ff0)) {
+ /* Unsupported scv vector */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return regs->gpr[3];
+ }
return -ENOSYS;
}
@@ -105,16 +115,20 @@ notrace long system_call_exception(long r3, long r4, long r5,
* local irqs must be disabled. Returns false if the caller must re-enable
* them, check for new work, and try again.
*/
-static notrace inline bool prep_irq_for_enabled_exit(void)
+static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri)
{
/* This must be done with RI=1 because tracing may touch vmaps */
trace_hardirqs_on();
/* This pattern matches prep_irq_for_idle */
- __hard_EE_RI_disable();
+ if (clear_ri)
+ __hard_EE_RI_disable();
+ else
+ __hard_irq_disable();
if (unlikely(lazy_irq_pending_nocheck())) {
/* Took an interrupt, may have more exit work to do. */
- __hard_RI_enable();
+ if (clear_ri)
+ __hard_RI_enable();
trace_hardirqs_off();
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
@@ -136,7 +150,8 @@ static notrace inline bool prep_irq_for_enabled_exit(void)
* because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
*/
notrace unsigned long syscall_exit_prepare(unsigned long r3,
- struct pt_regs *regs)
+ struct pt_regs *regs,
+ long scv)
{
 unsigned long *ti_flagsp = &current_thread_info()->flags;
unsigned long ti_flags;
@@ -151,7 +166,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
ti_flags = *ti_flagsp;
- if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
+ if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
r3 = -r3;
regs->ccr |= 0x10000000; /* Set SO bit in CR */
@@ -211,7 +226,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
}
}
- if (unlikely(!prep_irq_for_enabled_exit())) {
+ /* scv need not set RI=0 because SRRs are not used */
+ if (unlikely(!prep_irq_for_enabled_exit(!scv))) {
local_irq_enable();
goto again;
}
@@ -282,7 +298,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned
}
}
- if (unlikely(!prep_irq_for_enabled_exit())) {
+ if (unlikely(!prep_irq_for_enabled_exit(true))) {
local_irq_enable();
local_irq_disable();
goto again;
@@ -345,7 +361,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
}
}
- if (unlikely(!prep_irq_for_enabled_exit())) {
+ if (unlikely(!prep_irq_for_enabled_exit(true))) {
/*
* Can't local_irq_restore to replay if we were in
* interrupt context. Must replay directly.
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5abe98216dc2..161bfccbc309 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -16,6 +16,7 @@
#include <asm/disassemble.h>
extern char system_call_common[];
+extern char system_call_vectored_emulate[];
#ifdef CONFIG_PPC64
/* Bits in SRR1 that are copied from MSR */
@@ -1236,6 +1237,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
case 17: /* sc */
if ((word & 0xfe2) == 2)
op->type = SYSCALL;
+ else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) &&
+ (word & 0xfe3) == 1)
+ op->type = SYSCALL_VECTORED_0;
else
op->type = UNKNOWN;
return 0;
@@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr)
regs->msr = MSR_KERNEL;
return 1;
+ case SYSCALL_VECTORED_0: /* scv 0 */
+ regs->gpr[9] = regs->gpr[13];
+ regs->gpr[10] = MSR_KERNEL;
+ regs->gpr[11] = regs->nip + 4;
+ regs->gpr[12] = regs->msr & MSR_MASK;
+ regs->gpr[13] = (unsigned long) get_paca();
+ regs->nip = (unsigned long) &system_call_vectored_emulate;
+ regs->msr = MSR_KERNEL;
+ return 1;
+
case RFI:
return -1;
#endif
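Editorial note on the analyse_instr() hunk above: both sc and scv use primary opcode 17, and the two masks tell them apart. The sketch below applies the same masks in a stand-alone function. The enum and function names are hypothetical, the IS_ENABLED(CONFIG_PPC_BOOK3S_64) condition is left out, and the example encodings in the note after the code (0x44000002 for sc, 0x44000001 for scv 0) are stated from memory of the ISA rather than taken from this thread.

```c
#include <stdint.h>

/* Hypothetical result type, standing in for op->type. */
enum sc_kind {
	SC_UNKNOWN,
	SC_SYSCALL,		/* sc with LEV = 0 */
	SC_SYSCALL_VECTORED_0,	/* scv 0 */
};

/* Classify an instruction word using the masks from analyse_instr(). */
static enum sc_kind classify_opcode17(uint32_t word)
{
	/* Primary opcode is the top 6 bits. */
	if ((word >> 26) != 17)
		return SC_UNKNOWN;

	/* 0xfe2 covers the 7-bit LEV field (bits 5-11) and bit 1. */
	if ((word & 0xfe2) == 2)
		return SC_SYSCALL;

	/* 0xfe3 additionally covers bit 0: LEV = 0 and low bits = 01. */
	if ((word & 0xfe3) == 1)
		return SC_SYSCALL_VECTORED_0;

	return SC_UNKNOWN;
}
```

Under those assumed encodings, 0x44000002 classifies as a plain system call and 0x44000001 as scv 0, while a non-zero LEV such as 0x44000021 (scv 1) falls through to unknown, so only vector 0 is emulated.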
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2db8469e475f..8c85466e0dd8 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -358,7 +358,7 @@ static void pseries_lpar_idle(void)
* to ever be a problem in practice we can move this into a kernel thread to
* finish off the process later in boot.
*/
-void pseries_enable_reloc_on_exc(void)
+bool pseries_enable_reloc_on_exc(void)
{
long rc;
unsigned int delay, total_delay = 0;
@@ -369,11 +369,13 @@ void pseries_enable_reloc_on_exc(void)
if (rc == H_P2) {
pr_info("Relocation on exceptions not"
" supported\n");
+ return false;
} else if (rc != H_SUCCESS) {
pr_warn("Unable to enable relocation"
" on exceptions: %ld\n", rc);
+ return false;
}
- break;
+ return true;
}
delay = get_longbusy_msecs(rc);
@@ -382,7 +384,7 @@ void pseries_enable_reloc_on_exc(void)
pr_warn("Warning: Giving up waiting to enable "
"relocation on exceptions (%u msec)!\n",
total_delay);
- return;
+ return false;
}
mdelay(delay);
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 7efe4bc3ccf6..3203c3606737 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1593,6 +1593,7 @@ const char *getvecname(unsigned long vec)
case 0x1300: ret = "(Instruction Breakpoint)"; break;
case 0x1500: ret = "(Denormalisation)"; break;
case 0x1700: ret = "(Altivec Assist)"; break;
+ case 0x3000: ret = "(System Call Vectored)"; break;
default: ret = "";
}
return ret;
--
2.23.0
^ permalink raw reply related
* Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too
From: Madhavan Srinivasan @ 2020-06-11 8:52 UTC (permalink / raw)
To: Oleg Nesterov, Benjamin Herrenschmidt, Madhavan Srinivasan,
Michael Ellerman, Paul Mackerras
Cc: linuxppc-dev, Jan Kratochvil, linux-kernel
In-Reply-To: <20200610150224.GA6793@redhat.com>
On 6/10/20 8:37 PM, Oleg Nesterov wrote:
> Hi,
>
> looks like this patch was forgotten.
yep, I missed this. But mpe did have comments for the patch.
https://lkml.org/lkml/2019/9/19/107
Maddy
>
> Do you think this should be fixed or should we document that
> PTRACE_GETREGS is not consistent with PTRACE_PEEKUSER on ppc64?
>
>
> On 09/17, Oleg Nesterov wrote:
>> I don't have a ppc machine, this patch wasn't even compile tested,
>> could you please review?
>>
>> The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in
>> ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1,
>> but PTRACE_GETREGS still copies pt_regs->softe as is.
>>
>> This is not consistent and this breaks
>> http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke
>>
>> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
>> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>> ---
>> arch/powerpc/kernel/ptrace.c | 25 +++++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
>> index 8c92feb..291acfb 100644
>> --- a/arch/powerpc/kernel/ptrace.c
>> +++ b/arch/powerpc/kernel/ptrace.c
>> @@ -363,11 +363,36 @@ static int gpr_get(struct task_struct *target, const struct user_regset *regset,
>> BUILD_BUG_ON(offsetof(struct pt_regs, orig_gpr3) !=
>> offsetof(struct pt_regs, msr) + sizeof(long));
>>
>> +#ifdef CONFIG_PPC64
>> + if (!ret)
>> + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>> + &target->thread.regs->orig_gpr3,
>> + offsetof(struct pt_regs, orig_gpr3),
>> + offsetof(struct pt_regs, softe));
>> +
>> + if (!ret) {
>> + unsigned long softe = 0x1;
>> + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &softe,
>> + offsetof(struct pt_regs, softe),
>> + offsetof(struct pt_regs, softe) +
>> + sizeof(softe));
>> + }
>> +
>> + BUILD_BUG_ON(offsetof(struct pt_regs, trap) !=
>> + offsetof(struct pt_regs, softe) + sizeof(long));
>> +
>> + if (!ret)
>> + ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>> + &target->thread.regs->trap,
>> + offsetof(struct pt_regs, trap),
>> + sizeof(struct user_pt_regs));
>> +#else
>> if (!ret)
>> ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>> &target->thread.regs->orig_gpr3,
>> offsetof(struct pt_regs, orig_gpr3),
>> sizeof(struct user_pt_regs));
>> +#endif
>> if (!ret)
>> ret = user_regset_copyout_zero(&pos, &count, &kbuf, &ubuf,
>> sizeof(struct user_pt_regs), -1);
>> --
>> 2.5.0
>>
^ permalink raw reply
* Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too
From: Oleg Nesterov @ 2020-06-11 10:58 UTC (permalink / raw)
To: Madhavan Srinivasan
Cc: Madhavan Srinivasan, linuxppc-dev, linux-kernel, Paul Mackerras,
Jan Kratochvil
In-Reply-To: <321e6865-1762-c459-56c4-0cc89c7c2a7e@linux.ibm.com>
On 06/11, Madhavan Srinivasan wrote:
>
>
> On 6/10/20 8:37 PM, Oleg Nesterov wrote:
> >Hi,
> >
> >looks like this patch was forgotten.
>
> yep, I missed this. But mpe did have comments for the patch.
>
> https://lkml.org/lkml/2019/9/19/107
Yes, and I thought that I had replied... apparently not, sorry!
So let me repeat, I am fine either way, I do not understand this
ppc-specific code and I can't really test this change.
Let me quote that email from Michael:
> We could do it like below. I'm 50/50 though on whether it's worth it, or
> if we should just go with the big ifdef like in your patch.
up to you ;)
Hmm. And yes,
> >>This is not consistent and this breaks
> >>http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke
this is 404.
Jan, could you correct the link above?
Oleg.
^ permalink raw reply
* Re: [PATCH? v2] powerpc: Hard wire PT_SOFTE value to 1 in gpr_get() too
From: Jan Kratochvil @ 2020-06-11 11:11 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Madhavan Srinivasan, linux-kernel, Madhavan Srinivasan,
Paul Mackerras, linuxppc-dev
In-Reply-To: <20200611105830.GB12500@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 371 bytes --]
On Thu, 11 Jun 2020 12:58:31 +0200, Oleg Nesterov wrote:
> On 06/11, Madhavan Srinivasan wrote:
> > On 6/10/20 8:37 PM, Oleg Nesterov wrote:
> > > > This is not consistent and this breaks
> > > > http://sourceware.org/systemtap/wiki/utrace/tests/user-regs-peekpoke
>
> this is 404.
Attaching the testcase, the CVS web interface no longer works on
sourceware.org.
Jan
[-- Attachment #2: user-regs-peekpoke.c --]
[-- Type: text/plain, Size: 10499 bytes --]
/* Test case for PTRACE_SETREGS modifying the requested registers.
x86* counterpart of the s390* testcase `user-area-access.c'.
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely. */
/* FIXME: EFLAGS should be tested restricted on the appropriate bits. */
#define _GNU_SOURCE 1
#if defined __powerpc__ || defined __sparc__
# define user_regs_struct pt_regs
#endif
#ifdef __ia64__
#define ia64_fpreg ia64_fpreg_DISABLE
#define pt_all_user_regs pt_all_user_regs_DISABLE
#endif /* __ia64__ */
#include <sys/ptrace.h>
#ifdef __ia64__
#undef ia64_fpreg
#undef pt_all_user_regs
#endif /* __ia64__ */
#include <linux/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#if defined __i386__ || defined __x86_64__
#include <sys/debugreg.h>
#endif
#include <asm/unistd.h>
#include <assert.h>
#include <errno.h>
#include <error.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <string.h>
#include <stddef.h>
/* ia64 has PTRACE_SETREGS but it has no USER_REGS_STRUCT. */
#if !defined PTRACE_SETREGS || defined __ia64__
int
main (void)
{
return 77;
}
#else /* PTRACE_SETREGS */
/* The minimal alignment we use for the random access ranges. */
#define REGALIGN (sizeof (long))
static pid_t child;
static void
cleanup (void)
{
if (child > 0)
kill (child, SIGKILL);
child = 0;
}
static void
handler_fail (int signo)
{
cleanup ();
signal (SIGABRT, SIG_DFL);
abort ();
}
int
main (void)
{
long l;
int status, i;
pid_t pid;
union
{
struct user_regs_struct user;
unsigned char byte[sizeof (struct user_regs_struct)];
} u, u2;
int start;
setbuf (stdout, NULL);
atexit (cleanup);
signal (SIGABRT, handler_fail);
signal (SIGALRM, handler_fail);
signal (SIGINT, handler_fail);
i = alarm (10);
assert (i == 0);
child = fork ();
switch (child)
{
case -1:
assert_perror (errno);
assert (0);
case 0:
l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
assert (l == 0);
// Prevent rt_sigprocmask() call called by glibc after raise().
syscall (__NR_tkill, getpid (), SIGSTOP);
assert (0);
default:
break;
}
pid = waitpid (child, &status, 0);
assert (pid == child);
assert (WIFSTOPPED (status));
assert (WSTOPSIG (status) == SIGSTOP);
/* Fetch U2 from the inferior. */
errno = 0;
# ifdef __sparc__
l = ptrace (PTRACE_GETREGS, child, &u2.user, NULL);
# else
l = ptrace (PTRACE_GETREGS, child, NULL, &u2.user);
# endif
assert_perror (errno);
assert (l == 0);
/* Initialize U with a pattern. */
for (i = 0; i < sizeof u.byte; i++)
u.byte[i] = i;
#ifdef __x86_64__
/* non-EFLAGS modifications fail with EIO, EFLAGS gets back different. */
u.user.eflags = u2.user.eflags;
u.user.cs = u2.user.cs;
u.user.ds = u2.user.ds;
u.user.es = u2.user.es;
u.user.fs = u2.user.fs;
u.user.gs = u2.user.gs;
u.user.ss = u2.user.ss;
u.user.fs_base = u2.user.fs_base;
u.user.gs_base = u2.user.gs_base;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.rip = (unsigned long) handler_fail;
/* 2.6.25 always truncates and sign-extends orig_rax. */
u.user.orig_rax = (int) u.user.orig_rax;
#endif /* __x86_64__ */
#ifdef __i386__
/* These values get back different. */
u.user.xds = u2.user.xds;
u.user.xes = u2.user.xes;
u.user.xfs = u2.user.xfs;
u.user.xgs = u2.user.xgs;
u.user.xcs = u2.user.xcs;
u.user.eflags = u2.user.eflags;
u.user.xss = u2.user.xss;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.eip = (unsigned long) handler_fail;
#endif /* __i386__ */
#ifdef __powerpc__
/* These fields are constrained. */
u.user.msr = u2.user.msr;
# ifdef __powerpc64__
u.user.softe = u2.user.softe;
# else
u.user.mq = u2.user.mq;
# endif /* __powerpc64__ */
u.user.trap = u2.user.trap;
u.user.dar = u2.user.dar;
u.user.dsisr = u2.user.dsisr;
u.user.result = u2.user.result;
#endif /* __powerpc__ */
/* Poke U. */
# ifdef __sparc__
l = ptrace (PTRACE_SETREGS, child, &u.user, NULL);
# else
l = ptrace (PTRACE_SETREGS, child, NULL, &u.user);
# endif
assert (l == 0);
/* Peek into U2. */
# ifdef __sparc__
l = ptrace (PTRACE_GETREGS, child, &u2.user, NULL);
# else
l = ptrace (PTRACE_GETREGS, child, NULL, &u2.user);
# endif
assert (l == 0);
/* Verify it matches. */
if (memcmp (&u.user, &u2.user, sizeof u.byte) != 0)
{
for (start = 0; start + REGALIGN <= sizeof u.byte; start += REGALIGN)
if (*(unsigned long *) (u.byte + start)
!= *(unsigned long *) (u2.byte + start))
printf ("\
mismatch at offset %#x: SETREGS wrote %lx GETREGS read %lx\n",
start, *(unsigned long *) (u.byte + start),
*(unsigned long *) (u2.byte + start));
return 1;
}
/* Reverse the pattern. */
for (i = 0; i < sizeof u.byte; i++)
u.byte[i] ^= -1;
#ifdef __x86_64__
/* non-EFLAGS modifications fail with EIO, EFLAGS gets back different. */
u.user.eflags = u2.user.eflags;
u.user.cs = u2.user.cs;
u.user.ds = u2.user.ds;
u.user.es = u2.user.es;
u.user.fs = u2.user.fs;
u.user.gs = u2.user.gs;
u.user.ss = u2.user.ss;
u.user.fs_base = u2.user.fs_base;
u.user.gs_base = u2.user.gs_base;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.rip = (unsigned long) handler_fail;
/* 2.6.25 always truncates and sign-extends orig_rax. */
u.user.orig_rax = (int) u.user.orig_rax;
#endif /* __x86_64__ */
#ifdef __i386__
/* These values get back different. */
u.user.xds = u2.user.xds;
u.user.xes = u2.user.xes;
u.user.xfs = u2.user.xfs;
u.user.xgs = u2.user.xgs;
u.user.xcs = u2.user.xcs;
u.user.eflags = u2.user.eflags;
u.user.xss = u2.user.xss;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.eip = (unsigned long) handler_fail;
#endif /* __i386__ */
#ifdef __powerpc__
/* These fields are constrained. */
u.user.msr = u2.user.msr;
# ifdef __powerpc64__
u.user.softe = u2.user.softe;
# else
u.user.mq = u2.user.mq;
# endif /* __powerpc64__ */
u.user.trap = u2.user.trap;
u.user.dar = u2.user.dar;
u.user.dsisr = u2.user.dsisr;
u.user.result = u2.user.result;
#endif /* __powerpc__ */
/* Poke U. */
# ifdef __sparc__
l = ptrace (PTRACE_SETREGS, child, &u.user, NULL);
# else
l = ptrace (PTRACE_SETREGS, child, NULL, &u.user);
# endif
assert (l == 0);
/* Peek into U2. */
# ifdef __sparc__
l = ptrace (PTRACE_GETREGS, child, &u2.user, NULL);
# else
l = ptrace (PTRACE_GETREGS, child, NULL, &u2.user);
# endif
assert (l == 0);
/* Verify it matches. */
if (memcmp (&u.user, &u2.user, sizeof u.byte) != 0)
{
for (start = 0; start + REGALIGN <= sizeof u.byte; start += REGALIGN)
if (*(unsigned long *) (u.byte + start)
!= *(unsigned long *) (u2.byte + start))
printf ("\
mismatch at offset %#x: SETREGS wrote %lx GETREGS read %lx\n",
start, *(unsigned long *) (u.byte + start),
*(unsigned long *) (u2.byte + start));
return 1;
}
/* Now try poking arbitrary ranges and verifying it reads back right.
We expect the U area is already a random enough pattern. */
for (start = 0; start + REGALIGN <= sizeof u.byte; start += REGALIGN)
{
for (i = start; i < start + REGALIGN; i++)
u.byte[i]++;
#ifdef __x86_64__
/* non-EFLAGS modifications fail with EIO, EFLAGS gets back different. */
u.user.eflags = u2.user.eflags;
u.user.cs = u2.user.cs;
u.user.ds = u2.user.ds;
u.user.es = u2.user.es;
u.user.fs = u2.user.fs;
u.user.gs = u2.user.gs;
u.user.ss = u2.user.ss;
u.user.fs_base = u2.user.fs_base;
u.user.gs_base = u2.user.gs_base;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.rip = (unsigned long) handler_fail;
/* 2.6.25 always truncates and sign-extends orig_rax. */
u.user.orig_rax = (int) u.user.orig_rax;
#endif /* __x86_64__ */
#ifdef __i386__
/* These values get back different. */
u.user.xds = u2.user.xds;
u.user.xes = u2.user.xes;
u.user.xfs = u2.user.xfs;
u.user.xgs = u2.user.xgs;
u.user.xcs = u2.user.xcs;
u.user.eflags = u2.user.eflags;
u.user.xss = u2.user.xss;
/* RHEL-4 refuses to set too high (and invalid) PC values. */
u.user.eip = (unsigned long) handler_fail;
#endif /* __i386__ */
#ifdef __powerpc__
/* These fields are constrained. */
u.user.msr = u2.user.msr;
# ifdef __powerpc64__
u.user.softe = u2.user.softe;
# else
u.user.mq = u2.user.mq;
# endif /* __powerpc64__ */
u.user.trap = u2.user.trap;
u.user.dar = u2.user.dar;
u.user.dsisr = u2.user.dsisr;
u.user.result = u2.user.result;
if (start > offsetof (struct pt_regs, ccr))
break;
#endif /* __powerpc__ */
/* Poke U. */
l = ptrace (PTRACE_POKEUSER, child, (void *) (unsigned long) start,
(void *) *(unsigned long *) (u.byte + start));
if (l != 0)
error (1, errno, "PTRACE_POKEUSER at %x", start);
/* Peek into U2. */
# ifdef __sparc__
l = ptrace (PTRACE_GETREGS, child, &u2.user, NULL);
# else
l = ptrace (PTRACE_GETREGS, child, NULL, &u2.user);
# endif
assert (l == 0);
/* Verify it matches. */
if (memcmp (&u.user, &u2.user, sizeof u.byte) != 0)
{
printf ("mismatch at offset %#x: poked %lx but GETREGS read %lx\n",
start, *(unsigned long *) (u.byte + start),
*(unsigned long *) (u2.byte + start));
return 1;
}
}
/* Now try peeking arbitrary ranges and verifying it is the same.
We expect the U area is already a random enough pattern. */
for (start = 0; start + REGALIGN <= sizeof u.byte; start += REGALIGN)
{
/* Peek for the U comparison. */
errno = 0;
l = ptrace (PTRACE_PEEKUSER, child, (void *) (unsigned long) start,
NULL);
assert_perror (errno);
/* Verify it matches. */
if (*(unsigned long *) (u.byte + start) != l)
{
printf ("mismatch at offset %#x: poked %lx but peeked %lx\n",
start, *(unsigned long *) (u.byte + start), l);
return 1;
}
}
return 0;
}
#endif /* PTRACE_SETREGS */
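
Editorial note: the ppc64 inconsistency discussed in this thread can be modeled without ptrace. In the sketch below, peek_reg() plays the role of ptrace_get_reg() with PT_SOFTE hard-wired to 1, getregs_raw() plays the old gpr_get() that copies the structure as is, and getregs_fixed() copies in three segments the way the proposed patch does. struct mini_regs and all three function names are hypothetical; this is an illustration under those assumptions, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the tail of ppc64 struct pt_regs. */
struct mini_regs {
	unsigned long orig_gpr3;
	unsigned long softe;
	unsigned long trap;
};

/* Models the PEEKUSER path: softe always reads back as 1. */
static unsigned long peek_reg(const struct mini_regs *regs, size_t off)
{
	unsigned long val;

	if (off == offsetof(struct mini_regs, softe))
		return 0x1;
	memcpy(&val, (const char *)regs + off, sizeof(val));
	return val;
}

/* Models the unpatched GETREGS path: a raw copy, softe included. */
static void getregs_raw(const struct mini_regs *regs, struct mini_regs *out)
{
	memcpy(out, regs, sizeof(*out));
}

/* Models the patched GETREGS path: copy before softe, report 1, copy the rest. */
static void getregs_fixed(const struct mini_regs *regs, struct mini_regs *out)
{
	const size_t softe_off = offsetof(struct mini_regs, softe);
	const size_t trap_off = offsetof(struct mini_regs, trap);
	unsigned long softe = 0x1;

	memcpy(out, regs, softe_off);
	memcpy((char *)out + softe_off, &softe, sizeof(softe));
	memcpy((char *)out + trap_off, (const char *)regs + trap_off,
	       sizeof(*regs) - trap_off);
}
```

When the stored softe is anything other than 1, getregs_raw() and peek_reg() disagree at the softe offset, which is the mismatch the final PEEKUSER loop of the test case reports. getregs_fixed() agrees with peek_reg() at every offset.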
^ permalink raw reply
* [PATCH] powerpc/kvm/book3s64/nested: Fix kernel crash with nested kvm
From: Aneesh Kumar K.V @ 2020-06-11 12:01 UTC (permalink / raw)
To: paulus, kvm-ppc; +Cc: Aneesh Kumar K.V, linuxppc-dev
__pa() checks the address passed to it and triggers a BUG if it is
below PAGE_OFFSET:
#define __pa(x) \
({ \
VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET); \
(unsigned long)(x) & 0x0fffffffffffffffUL; \
})
kvmhv_copy_tofrom_guest_radix() uses a NULL value for
to/from to indicate the direction of the copy. Avoid calling __pa() if the
value is NULL.
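
Editorial note: a minimal user-space model of the check and of the guard follows. It assumes a PAGE_OFFSET of 0xc000000000000000, which matches the kernel addresses in the trace below but is not stated in this message. Both function names are hypothetical, and the model returns -1 where the kernel would hit the BUG.

```c
#include <stdint.h>

/* Assumed linear-map base for ppc64; not taken from this message. */
#define PAGE_OFFSET_MODEL	0xc000000000000000ULL

/*
 * Models __pa() with the address check: returns 0 and stores the
 * physical address, or returns -1 where the kernel would BUG.
 */
static int pa_checked(uint64_t vaddr, uint64_t *pa)
{
	if (vaddr < PAGE_OFFSET_MODEL)
		return -1;
	*pa = vaddr & 0x0fffffffffffffffULL;
	return 0;
}

/* Models the guard in the patch: NULL is passed through as 0. */
static int pa_or_zero(uint64_t vaddr, uint64_t *pa)
{
	if (vaddr == 0) {
		*pa = 0;
		return 0;
	}
	return pa_checked(vaddr, pa);
}
```

In this model pa_checked(0, ...) fails, which corresponds to the crash, while pa_or_zero(0, ...) yields 0 and a valid linear-map address still translates as before.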
kernel BUG at arch/powerpc/kvm/book3s_64_mmu_radix.c:43!
cpu 0x70: Vector: 700 (Program Check) at [c0000018a2187360]
pc: c000000000161b30: __kvmhv_copy_tofrom_guest_radix+0x130/0x1f0
lr: c000000000161d5c: kvmhv_copy_from_guest_radix+0x3c/0x80
....
[c0000018a2187670] c000000000161d5c kvmhv_copy_from_guest_radix+0x3c/0x80
[c0000018a21876b0] c00000000014feb8 kvmhv_load_from_eaddr+0x48/0xc0
[c0000018a21876e0] c000000000135828 kvmppc_ld+0x98/0x1e0
[c0000018a2187780] c00000000013bc20 kvmppc_load_last_inst+0x50/0x90
[c0000018a21877b0] c00000000015e9e8 kvmppc_hv_emulate_mmio+0x288/0x2b0
[c0000018a2187810] c000000000164888 kvmppc_book3s_radix_page_fault+0xd8/0x2b0
[c0000018a21878c0] c00000000015ed8c kvmppc_book3s_hv_page_fault+0x37c/0x1050
[c0000018a2187a00] c00000000015a518 kvmppc_vcpu_run_hv+0xbb8/0x1080
[c0000018a2187b20] c00000000013d204 kvmppc_vcpu_run+0x34/0x50
[c0000018a2187b40] c00000000013949c kvm_arch_vcpu_ioctl_run+0x2fc/0x410
[c0000018a2187bd0] c00000000012a2a4 kvm_vcpu_ioctl+0x2b4/0x8f0
[c0000018a2187d50] c0000000005b12a4 ksys_ioctl+0xf4/0x150
[c0000018a2187da0] c0000000005b1328 sys_ioctl+0x28/0x80
[c0000018a2187dc0] c000000000030584 system_call_exception+0x104/0x1d0
[c0000018a2187e20] c00000000000ca68 system_call_common+0xe8/0x214
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 02219e28b1e4..84acb4769487 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -40,7 +40,8 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
if (kvmhv_on_pseries())
return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
- __pa(to), __pa(from), n);
+ (to != NULL) ? __pa(to): 0,
+ (from != NULL) ? __pa(from): 0, n);
quadrant = 1;
if (!pid)
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v2] All arch: remove system call sys_sysctl
From: Eric W. Biederman @ 2020-06-11 11:43 UTC (permalink / raw)
To: Xiaoming Ni
Cc: linux-sh, catalin.marinas, paulus, ak, paulburton, geert,
mattst88, brgerst, acme, cyphar, viro, luto, tglx, surenb, rth,
young.liuyang, linux-parisc, rdunlap, linux-kernel, mcgrof,
linux-fsdevel, akpm, mark.rutland, linux-ia64, linux-xtensa,
jongk, linux, James.Bottomley, jcmvbkbc, linux-s390, ysato,
deller, yzaikin, mszeredi, gor, linux-alpha, linux-m68k,
linux-arm-kernel, chris, tony.luck, linux-api, zhouyanjie,
minchan, sargun, alexander.shishkin, heiko.carstens,
alex.huangjianhui, will, krzk, borntraeger, vbabka, samitolvanen,
flameeyes, ravi.bangoria, elver, keescook, arnd, bp, christian,
tsbogend, jiri, martin.petersen, yamada.masahiro, oleg,
sudeep.holla, olof, shawnguo, davem, bauerman, dalias, fenghua.yu,
peterz, dhowells, hpa, sparclinux, jolsa, svens, x86, linux,
mingo, naveen.n.rao, paulmck, sfr, npiggin, namhyung, dvyukov,
axboe, monstr, haolee.swjtu, linux-mips, ink, linuxppc-dev
In-Reply-To: <1591847640-124894-1-git-send-email-nixiaoming@huawei.com>
Xiaoming Ni <nixiaoming@huawei.com> writes:
> Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
> sys_sysctl is actually unavailable: any input can only return an error.
>
> We have been warning about people using the sysctl system call for years
> and believe there are no more users. Even if there are users of this
> interface if they have not complained or fixed their code by now they
> probably are not going to, so there is no point in warning them any
> longer.
>
> So completely remove sys_sysctl on all architectures.
>
> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
>
> changes in v2:
> According to Kees Cook's suggestion, completely remove sys_sysctl on all arch
> According to Eric W. Biederman's suggestion, update the commit log
>
> V1: https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaoming@huawei.com/
> Delete the code of sys_sysctl and return -ENOSYS directly at the function entry
> ---
> include/uapi/linux/sysctl.h | 15 --
[snip]
> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 27c1ed2..84b44c3 100644
> --- a/include/uapi/linux/sysctl.h
> +++ b/include/uapi/linux/sysctl.h
> @@ -27,21 +27,6 @@
> #include <linux/types.h>
> #include <linux/compiler.h>
>
> -#define CTL_MAXNAME 10 /* how many path components do we allow in a
> - call to sysctl? In other words, what is
> - the largest acceptable value for the nlen
> - member of a struct __sysctl_args to have? */
> -
> -struct __sysctl_args {
> - int __user *name;
> - int nlen;
> - void __user *oldval;
> - size_t __user *oldlenp;
> - void __user *newval;
> - size_t newlen;
> - unsigned long __unused[4];
> -};
> -
> /* Define sysctl names first */
>
> /* Top-level names: */
[snip]
The uapi header change does not make sense. The entire point of the
header is to allow userspace programs to be able to call sys_sysctl.
It either needs to all stay or all go.
As the concern with the uapi header is about userspace programs being
able to compile, please leave the header for now.
We should leave auditing userspace, and seeing whether userspace code will
still compile if we remove this header, for a separate patch. The
concerns and justifications for the uapi header are completely different
from those for removing the sys_sysctl implementation.
Otherwise
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Eric
^ permalink raw reply
* [PATCH] powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread
From: Nicholas Piggin @ 2020-06-11 12:11 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin
Using blrl for an indirect function call is not recommended, as it may
corrupt the link stack predictor.
This is not a performance-critical path, but it should be fixed for
consistency.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kernel/entry_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 223c4f008e63..f59a17471d4d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -400,12 +400,12 @@ _GLOBAL(ret_from_fork)
_GLOBAL(ret_from_kernel_thread)
bl schedule_tail
REST_NVGPRS(r1)
- mtlr r14
+ mtctr r14
mr r3,r15
#ifdef PPC64_ELF_ABI_v2
mr r12,r14
#endif
- blrl
+ bctrl
li r3,0
b .Lsyscall_exit
--
2.23.0
^ permalink raw reply related
* Re: [PATCH] powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread
From: Christophe Leroy @ 2020-06-11 12:26 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20200611121119.1015740-1-npiggin@gmail.com>
On 11/06/2020 at 14:11, Nicholas Piggin wrote:
> blrl is not recommended to use as an indirect function call, as it may
> corrupt the link stack predictor.
>
> This is not a performance critical path but this should be fixed for
> consistency.
Exactly the same pattern exists in entry_32.S.
Should it be changed there too ... for consistency :) ?
ppc32 also uses blrl for calling the syscall handler; should that be changed
as well?
Christophe
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/kernel/entry_64.S | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 223c4f008e63..f59a17471d4d 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -400,12 +400,12 @@ _GLOBAL(ret_from_fork)
> _GLOBAL(ret_from_kernel_thread)
> bl schedule_tail
> REST_NVGPRS(r1)
> - mtlr r14
> + mtctr r14
> mr r3,r15
> #ifdef PPC64_ELF_ABI_v2
> mr r12,r14
> #endif
> - blrl
> + bctrl
> li r3,0
> b .Lsyscall_exit
>
>
^ permalink raw reply
* Re: PowerPC KVM-PR issue
From: Christian Zigotzky @ 2020-06-11 14:47 UTC (permalink / raw)
To: linuxppc-dev, npiggin, kvm-ppc@vger.kernel.org
Cc: Darren Stevens, R.T.Dickinson, Christian Zigotzky
In-Reply-To: <7e859f68-9455-f98f-1fa3-071619fa1731@xenosoft.de>
On 10 June 2020 at 01:23 pm, Christian Zigotzky wrote:
> On 10 June 2020 at 11:06 am, Christian Zigotzky wrote:
>> On 10 June 2020 at 00:18 am, Christian Zigotzky wrote:
>>> Hello,
>>>
>>> KVM-PR doesn't work anymore on my Nemo board [1]. I figured out that
>>> the Git kernels and the kernel 5.7 are affected.
>>>
>>> Error message: Fienix kernel: kvmppc_exit_pr_progint: emulation at
>>> 700 failed (00000000)
>>>
>>> I can boot virtual QEMU PowerPC machines with KVM-PR with the kernel
>>> 5.6 without any problems on my Nemo board.
>>>
>>> I tested it with QEMU 2.5.0 and QEMU 5.0.0 today.
>>>
>>> Could you please check KVM-PR on your PowerPC machine?
>>>
>>> Thanks,
>>> Christian
>>>
>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
>>
>> I figured out that the PowerPC updates 5.7-1 [1] are responsible for
>> the KVM-PR issue. Please test KVM-PR on your PowerPC machines and
>> check the PowerPC updates 5.7-1 [1].
>>
>> Thanks
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969
>>
>>
> I tested the latest Git kernel with Mac-on-Linux/KVM-PR today.
> Unfortunately I can't use KVM-PR with MoL anymore because of this
> issue (see screenshots [1]). Please check the PowerPC updates 5.7-1.
>
> Thanks
>
> [1]
> -
> https://i.pinimg.com/originals/0c/b3/64/0cb364a40241fa2b7f297d4272bbb8b7.png
> -
> https://i.pinimg.com/originals/9a/61/d1/9a61d170b1c9f514f7a78a3014ffd18f.png
>
Hi All,
I bisected today because of the KVM-PR issue.
Result:
9600f261acaaabd476d7833cec2dd20f2919f1a0 is the first bad commit
commit 9600f261acaaabd476d7833cec2dd20f2919f1a0
Author: Nicholas Piggin <npiggin@gmail.com>
Date: Wed Feb 26 03:35:21 2020 +1000
powerpc/64s/exception: Move KVM test to common code
This allows more code to be moved out of unrelocated regions. The
system call KVMTEST is changed to be open-coded and remain in the
tramp area to avoid having to move it to entry_64.S. The custom nature
of the system call entry code means the hcall case can be made more
streamlined than regular interrupt handlers.
mpe: Incorporate fix from Nick:
Moving KVM test to the common entry code missed the case of HMI and
MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
switch to virt mode).
This means a MCE or HMI exception that is taken while KVM is running a
guest context will not be switched out of that context, and KVM won't
be notified. Found by running sigfuz in guest with patched host on
POWER9 DD2.3, which causes some TM related HMI interrupts (which are
expected and supposed to be handled by KVM).
This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
the KVM test. This makes them look a little more like other handlers
that all use __GEN_COMMON_ENTRY.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link:
https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
:040000 040000 ec21cec22d165f8696d69532734cb2985d532cb0
87dd49a9cd7202ec79350e8ca26cea01f1dbd93d M arch
-----
The following commit is the problem: powerpc/64s/exception: Move KVM
test to common code [1]
These changes were included in the PowerPC updates 5.7-1. [2]
Another test:
git checkout d38c07afc356ddebaa3ed8ecb3f553340e05c969 (PowerPC updates
5.7-1 [2] ) -> KVM-PR doesn't work.
After that: git revert d38c07afc356ddebaa3ed8ecb3f553340e05c969 -m 1 ->
KVM-PR works.
Could you please check the first bad commit? [1]
Thanks,
Christian
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9600f261acaaabd476d7833cec2dd20f2919f1a0
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d38c07afc356ddebaa3ed8ecb3f553340e05c969
^ permalink raw reply
* Re: [PATCH v11 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
From: Vaibhav Jain @ 2020-06-11 18:03 UTC (permalink / raw)
To: Dan Williams
Cc: Santosh Sivaraj, linux-nvdimm, Aneesh Kumar K . V,
Linux Kernel Mailing List, Oliver O'Halloran, linuxppc-dev
In-Reply-To: <CAPcyv4h_0qSqS2P0=vNk9KWy-=WZq-giNupks+Q0+wmYVt9iLA@mail.gmail.com>
Dan Williams <dan.j.williams@intel.com> writes:
> On Wed, Jun 10, 2020 at 5:10 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > On Tue, Jun 9, 2020 at 10:54 AM Vaibhav Jain <vaibhav@linux.ibm.com> wrote:
>> >>
>> >> Thanks Dan for the consideration and taking time to look into this.
>> >>
>> >> My responses below:
>> >>
>> >> Dan Williams <dan.j.williams@intel.com> writes:
>> >>
>> >> > On Mon, Jun 8, 2020 at 5:16 PM kernel test robot <lkp@intel.com> wrote:
>> >> >>
>> >> >> Hi Vaibhav,
>> >> >>
>> >> >> Thank you for the patch! Perhaps something to improve:
>> >> >>
>> >> >> [auto build test WARNING on powerpc/next]
>> >> >> [also build test WARNING on linus/master v5.7 next-20200605]
>> >> >> [cannot apply to linux-nvdimm/libnvdimm-for-next scottwood/next]
>> >> >> [if your patch is applied to the wrong git tree, please drop us a note to help
>> >> >> improve the system. BTW, we also suggest to use '--base' option to specify the
>> >> >> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>> >> >>
>> >> >> url: https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Add-support-for-reporting-nvdimm-health/20200607-211653
>> >> >> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
>> >> >> config: powerpc-randconfig-r016-20200607 (attached as .config)
>> >> >> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project e429cffd4f228f70c1d9df0e5d77c08590dd9766)
>> >> >> reproduce (this is a W=1 build):
>> >> >> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>> >> >> chmod +x ~/bin/make.cross
>> >> >> # install powerpc cross compiling tool for clang build
>> >> >> # apt-get install binutils-powerpc-linux-gnu
>> >> >> # save the attached .config to linux build tree
>> >> >> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
>> >> >>
>> >> >> If you fix the issue, kindly add following tag as appropriate
>> >> >> Reported-by: kernel test robot <lkp@intel.com>
>> >> >>
>> >> >> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>> >> >>
>> >> >> In file included from <built-in>:1:
>> >> >> >> ./usr/include/asm/papr_pdsm.h:69:20: warning: field 'hdr' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> >> >> struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> >> >
>> >> > Hi Vaibhav,
>> >> >
>> >> [.]
>> >> > This looks like it's going to need another round to get this fixed. I
>> >> > don't think 'struct nd_pdsm_cmd_pkg' should embed a definition of
>> >> > 'struct nd_cmd_pkg'. An instance of 'struct nd_cmd_pkg' carries a
>> >> > payload that is the 'pdsm' specifics. As the code has it now it's
>> >> > defined as a superset of 'struct nd_cmd_pkg' and the compiler warning
>> >> > is pointing out a real 'struct' organization problem.
>> >> >
>> >> > Given the soak time needed in -next after the code is finalized this
>> >> > there's no time to do another round of updates and still make the v5.8
>> >> > merge window.
>> >>
>> >> Agreed that this looks bad, a solution will probably need some more
>> >> review cycles resulting in this series missing the merge window.
>> >>
>> >> I am investigating into the possible solutions for this reported issue
>> >> and made few observations:
>> >>
>> >> I see command pkg for Intel, Hpe, Msft and Hyperv families using a
>> >> similar layout of embedding nd_cmd_pkg at the head of the
>> >> command-pkg. struct nd_pdsm_cmd_pkg is following the same pattern.
>> >>
>> >> struct nd_pdsm_cmd_pkg {
>> >> struct nd_cmd_pkg hdr;
>> >> /* other members */
>> >> };
>> >>
>> >> struct ndn_pkg_msft {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> >> };
>> >> struct nd_pkg_intel {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> >> };
>> >> struct ndn_pkg_hpe1 {
>> >> struct nd_cmd_pkg gen;
>> >> /* other members */
>> [.]
>> >
>> > In those cases the other members are a union and there is no second
>> > variable length array. Perhaps that is why those definitions are not
>> > getting flagged? I'm not seeing anything in ndctl build options that
>> > would explicitly disable this warning, but I'm not sure if the ndctl
>> > build environment is missing this build warning by accident.
>>
>> I tried building ndctl master with clang-10 with CC=clang and
>> CFLAGS="". Seeing the same warning messages reported for all command
>> package struct for existing command families.
>>
>> ./hpe1.h:334:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> struct nd_cmd_pkg gen;
>> ^
>> ./msft.h:59:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> struct nd_cmd_pkg gen;
>> ^
>> ./hyperv.h:34:20: warning: field 'gen' with variable sized type 'struct nd_cmd_pkg' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
>> struct nd_cmd_pkg gen;
>> ^
>
[.]
> Good to know, but ugh now I'm just realizing this warning is only
> coming from clang and not gcc. Frankly I'm not as concerned about
> clang warnings and I should have been more careful looking at the
> source of this warning.
Thanks for acknowledging this.
I dug deeper into this today and it seems that with clang, kernel code
is compiled with the diagnostic flag '-Wno-gnu' [1][2], which implies
'-Wno-gnu-variable-sized-type-not-at-end'. Hence structures with
flexible arrays that are not at the end of the containing struct are not
flagged in kernel code.
[1] https://github.com/torvalds/linux/blob/b29482fde649c72441d5478a4ea2c52c56d97a5e/Makefile#L788
[2] https://clang.llvm.org/docs/DiagnosticsReference.html#wgnu
However, this diagnostic flag is not used for the uapi header test, hence
the build robot emitted this warning while trying to test-compile the
'papr_pdsm.h' uapi header.
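A minimal sketch of the construct being flagged, assuming simplified stand-in structs (demo_hdr and demo_pkg are hypothetical and much smaller than the real nd_cmd_pkg). Clang is expected to warn on it unless '-Wno-gnu' or '-Wno-gnu-variable-sized-type-not-at-end' is passed:

```c
#include <stddef.h>
#include <stdint.h>

/* Header ending in a flexible array member, like struct nd_cmd_pkg. */
struct demo_hdr {
	uint32_t size;
	uint8_t payload[];
};

/*
 * Embedding demo_hdr anywhere but last relies on a GNU extension:
 * 'status' is placed where the flexible 'payload' array would begin.
 */
struct demo_pkg {
	struct demo_hdr hdr;
	int32_t status;
};

/* Returns 1 when 'status' sits at the offset where the payload begins. */
static int status_aliases_payload(void)
{
	return offsetof(struct demo_pkg, status) ==
	       offsetof(struct demo_pkg, hdr) + offsetof(struct demo_hdr, payload);
}
```

On common ABIs the helper returns 1, which shows why the layout is questionable: anything written through hdr.payload lands on the members that follow the header.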
>
>> >
>> > Those variable size payloads are also not being used in any code paths
>> > that would look at the size of the command payload, like the kernel
>> > ioctl() path. The payload validation code needs static sizes and the
>> > payload parsing code wants to cast the payload to a known type. I
>> > don't think you can use the same struct definition for both those
>> > cases which is why the ndctl parsing code uses the union layout, but
>> > the kernel command marshaling code does strict layering.
>> Even if I switch to union layout and replacing the flexible array 'payload'
>> at end to a fixed size array something like below, I still see
>> '-Wgnu-variable-sized-type-not-at-end' warning reported by clang:
>>
>> union nd_pdsm_cmd_payload {
>> struct nd_papr_pdsm_health health;
>> __u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
>> };
>>
>> struct nd_pdsm_cmd_pkg {
>> struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> __u16 reserved[2]; /* Ignored and to be used in future */
>> union nd_pdsm_cmd_payload payload;
>> } __attribute__((packed));
>
> Even though this is a clang warning, I'm still not sure it's a good
> idea to copy the ndctl approach into the kernel. Could you perhaps
> handle this the way that drivers/acpi/nfit/intel.c handles submitting
> commands through the ND_CMD_CALL interface, i.e. by just defining the
> command locally like this (from intel_security_flags()):
>
> struct {
> struct nd_cmd_pkg pkg;
> struct nd_intel_get_security_state cmd;
> } nd_cmd = {
> .pkg = {
> .nd_command = NVDIMM_INTEL_GET_SECURITY_STATE,
> .nd_family = NVDIMM_FAMILY_INTEL,
> .nd_size_out =
> sizeof(struct nd_intel_get_security_state),
> .nd_fw_size =
> sizeof(struct nd_intel_get_security_state),
> },
> };
>
> That way it's clear that the payload is 'struct
> nd_intel_get_security_state' without needing to have a pre-existing
> definition. For parsing in the ioctl path I think it's clearer to cast
> the payload to the local pdsm structure for the command.
>
In userspace, the libndctl code doesn't use '-Wno-gnu' (yet), hence this would
still be reported as a warning. Also, for each pdsm I want a consistent
way to report errors back. The approach above would force me to define a
'status' field to report errors in every pdsm payload struct.
I have two possible solutions to work around the clang and 'status'
field issue:
1. I remove the instance of 'struct nd_cmd_pkg' from 'nd_pdsm_cmd_pkg', like
below. This should make the clang warning go away while still keeping the
'cmd_status' field.
struct nd_pdsm_cmd_pkg {
__s32 cmd_status; /* Out: Sub-cmd status returned back */
__u16 reserved[2]; /* Ignored and to be used in future */
__u8 payload[]; /* In/Out: Sub-cmd data buffer */
} __packed;
When sending CMD_CALL, allocate and populate an envelope large enough to
hold the generic nd_cmd header, the pdsm header and the payload, like below:
0            64                72                             255
+------------+-----------------+--------------------------------+
|nd_cmd_pkg  | nd_pdsm_cmd_pkg | payload                        |
+------------+-----------------+--------------------------------+
pdsm handling code introduced in this patchset already uses helpers to
convert nd_cmd_pkg -> nd_pdsm_cmd_pkg and nd_pdsm_cmd_pkg -> payload. So
the impact to the patchset should be contained to these helper
functions. There are places in pdsm service functions that directly
access members of nd_cmd_pkg which may need some tweaking.
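A rough userspace sketch of option 1, assuming mirrored struct definitions: the demo_* names and helpers are hypothetical, they are not the helpers from this patchset, and fixed-width stdint types stand in for the __u64/__u32 kernel types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace mirror of struct nd_cmd_pkg (64 bytes before the payload). */
struct demo_nd_cmd_pkg {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
	unsigned char nd_payload[];
};

/* PDSM header from option 1: 8 bytes, then the sub-command payload. */
struct demo_pdsm_hdr {
	int32_t cmd_status;
	uint16_t reserved[2];
	uint8_t payload[];
} __attribute__((packed));

/* Allocate a zeroed envelope: generic header + PDSM header + payload. */
static struct demo_nd_cmd_pkg *demo_alloc_envelope(size_t payload_size)
{
	return calloc(1, sizeof(struct demo_nd_cmd_pkg) +
			 sizeof(struct demo_pdsm_hdr) + payload_size);
}

/* The PDSM header lives at the start of the generic payload area. */
static struct demo_pdsm_hdr *demo_to_pdsm(struct demo_nd_cmd_pkg *pkg)
{
	return (struct demo_pdsm_hdr *)pkg->nd_payload;
}

/* The sub-command data follows the PDSM header. */
static void *demo_to_payload(struct demo_pdsm_hdr *pdsm)
{
	return pdsm->payload;
}
```

With a 184-byte payload the envelope is 256 bytes, matching the 0/64/72/255 offsets in the diagram above. Neither struct embeds the other, so no variable-sized type ends up in the middle of a struct.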
2. I open-code the members of 'struct nd_cmd_pkg' at the start of 'struct
nd_pdsm_cmd_pkg', except the nd_payload field, like below. This struct
should ensure ABI compatibility with 'struct nd_cmd_pkg'.
struct nd_pdsm_cmd_pkg {
__u64 nd_family; /* family of commands */
__u64 nd_command;
__u32 nd_size_in; /* INPUT: size of input args */
__u32 nd_size_out; /* INPUT: size of payload */
__u32 nd_reserved2[9]; /* reserved must be zero */
__u32 nd_fw_size; /* OUTPUT: size fw wants to return */
__s32 cmd_status; /* Out: Sub-cmd status returned back */
__u16 reserved[2]; /* Ignored and to be used in future */
__u8 payload[]; /* In/Out: Sub-cmd data buffer */
} __packed;
BUILD_BUG_ON((sizeof(struct nd_cmd_pkg) + 8) > sizeof(struct nd_pdsm_cmd_pkg))
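The ABI compatibility claim of option 2 could be checked at compile time roughly like this. It is a userspace sketch under assumptions: the demo_* structs are hypothetical mirrors of the definitions above, and C11 _Static_assert stands in for BUILD_BUG_ON().

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct nd_cmd_pkg. */
struct demo_nd_cmd_pkg {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
	unsigned char nd_payload[];
};

/* Option 2: generic header members open-coded, then the PDSM fields. */
struct demo_pdsm_cmd_pkg {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
	int32_t cmd_status;
	uint16_t reserved[2];
	uint8_t payload[];
} __attribute__((packed));

/* Compile-time layout checks, in the spirit of BUILD_BUG_ON(). */
_Static_assert(offsetof(struct demo_pdsm_cmd_pkg, nd_fw_size) ==
	       offsetof(struct demo_nd_cmd_pkg, nd_fw_size),
	       "generic header fields must line up");
_Static_assert(offsetof(struct demo_pdsm_cmd_pkg, cmd_status) ==
	       offsetof(struct demo_nd_cmd_pkg, nd_payload),
	       "cmd_status must start where nd_payload starts");
_Static_assert(sizeof(struct demo_pdsm_cmd_pkg) ==
	       sizeof(struct demo_nd_cmd_pkg) + 8,
	       "PDSM fields add exactly 8 bytes");

/* Offset of the sub-command payload inside the combined package. */
static size_t demo_pdsm_header_size(void)
{
	return offsetof(struct demo_pdsm_cmd_pkg, payload);
}
```

Checking individual offsets is stricter than the single size comparison above, since it also catches reordered or resized header members.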
>>
>>
>> >
>> >> };
>> >>
>> >> Even though other command families implement similar command-package
>> >> layout they were not flagged (yet) as they are (I am guessing) serviced
>> >> in vendor acpi drivers rather than in kernel like in case of papr-scm
>> >> command family.
>> >
>> > I sincerely hope there are no vendor acpi kernel drivers outside of
>> > the upstream one.
>> Apologies if I was not clear. Was referring to nvdimm vendor uefi
>> drivers which ultimately service the DSM commands. Since CMD_CALL serves
>> as a conduit to send the command payload to these vendor drivers,
>> libnvdimm never needs to peek into the nd_cmd_pkg.payload
>> field. Consequently nfit module never hit this warning in kernel before.
>
> Ah, understood, and no, that's not the root reason this problem is not
> present in the kernel. The expectation is that any payload that the
> kernel would need to consider should probably have a kernel specific
> translation defined. For example,
>
> ND_CMD_GET_CONFIG_SIZE
> ND_CMD_GET_CONFIG_DATA
> ND_CMD_SET_CONFIG_DATA
>
> ...are payloads that the kernel needs to understand. However instead
> of supporting each way to read / write the label area the expectation
> is that all drivers just parse this common kernel payload and
> translate it if necessary. For example ND_CMD_{GET,SET}_CONFIG_DATA is
> optionally translated to the Intel DSMs, generic ACPI _LSR/_LSW, or
> papr_scm_meta_{get,set}.
>
> Outside of validating command numbers the expectation is that the
> kernel does not validate/consume the contents of the ND_CMD_CALL
> payload, it passes it to the backend where ACPI DSM or pdsm protocol
> takes over.
Right, but aren't those independent IOCTLs to libnvdimm, with a fixed
predefined struct that's exchanged with libndctl? I am not sure how that can
help with exchanging pdsms with papr_scm, which are variable in length and
can only rely on the CMD_CALL ioctl.
--
Cheers
~ Vaibhav
^ permalink raw reply
* Re: [PATCH v5 2/4] riscv: Introduce CONFIG_RELOCATABLE
From: Alex Ghiti @ 2020-06-11 19:43 UTC (permalink / raw)
To: Jerome Forissier, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Anup Patel, Atish Patra, Zong Li, linux-kernel, linuxppc-dev,
linux-riscv
Cc: Anup Patel
In-Reply-To: <b588dd9e-dff8-3458-0c7d-149e3990bca7@forissier.org>
Hi Jerome,
On 6/10/20 at 10:10 AM, Jerome Forissier wrote:
> On 6/7/20 9:59 AM, Alexandre Ghiti wrote:
> [...]
>
>> +config RELOCATABLE
>> + bool
>> + depends on MMU
>> + help
>> + This builds a kernel as a Position Independent Executable (PIE),
>> + which retains all relocation metadata required to relocate the
>> + kernel binary at runtime to a different virtual address than the
>> + address it was linked at.
>> + Since RISCV uses the RELA relocation format, this requires a
>> + relocation pass at runtime even if the kernel is loaded at the
>> + same address it was linked at.
> Is this true? I thought that the GNU linker would write the "proper"
> values by default, contrary to the LLVM linker (ld.lld) which would need
> a special flag: --apply-dynamic-relocs (by default the relocated places
> are set to zero). At least, it is my experience with Aarch64 on a
> different project. So, sorry if I'm talking nonsense here -- I have not
> looked at the details.
>
>
It seems that you're right, at least for aarch64, since they explicitly
pass the --no-apply-dynamic-relocs option. I retried booting without
relocating at runtime, and it fails on riscv. Can this be arch-specific?
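For reference, a simplified sketch of what a RELA relative-relocation pass does. It assumes a reduced stand-in for Elf64_Rela (demo_rela, demo_apply_relocs and DEMO_R_RELATIVE are hypothetical names) and is not the riscv kernel code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_R_RELATIVE 3 /* numeric value of R_RISCV_RELATIVE */

/* Simplified stand-in for Elf64_Rela. */
struct demo_rela {
	uint64_t offset; /* link-time address of the slot to patch */
	uint64_t type;   /* relocation type */
	int64_t addend;  /* link-time address the slot should hold */
};

/*
 * Patch every relative relocation in 'image', which holds the bytes that
 * were linked at 'link_addr' and are now loaded at 'load_addr'.
 * Returns the number of slots patched.
 */
static size_t demo_apply_relocs(uint8_t *image, uint64_t link_addr,
				uint64_t load_addr,
				const struct demo_rela *rela, size_t count)
{
	size_t i, patched = 0;

	for (i = 0; i < count; i++) {
		uint64_t value;

		if (rela[i].type != DEMO_R_RELATIVE)
			continue;
		/* RELA keeps the value in the addend, not in the slot itself. */
		value = (uint64_t)rela[i].addend - link_addr + load_addr;
		memcpy(image + (rela[i].offset - link_addr), &value, sizeof(value));
		patched++;
	}
	return patched;
}
```

If the linker leaves the slots zeroed instead of pre-applying the addends, this pass is needed even when load_addr equals link_addr, which would be consistent with the boot failure I see.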
Thanks,
Alex
^ permalink raw reply