* Re: [PATCH v15 10/10] arm64: Add IMA log information in kimage used for kexec
From: Lakshmi Ramasubramanian @ 2021-01-28 4:05 UTC (permalink / raw)
To: Will Deacon, Mimi Zohar
Cc: mark.rutland, bhsharma, tao.li, paulus, vincenzo.frascino,
frowand.list, sashal, robh, masahiroy, jmorris, takahiro.akashi,
linux-arm-kernel, catalin.marinas, serge, devicetree,
pasha.tatashin, prsriva, hsinyi, allison, christophe.leroy,
mbrugger, balajib, gregkh, linux-kernel, james.morse,
dmitry.kasatkin, linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <20210127231334.GB1016@willie-the-truck>
On 1/27/21 3:13 PM, Will Deacon wrote:
> On Wed, Jan 27, 2021 at 01:31:02PM -0500, Mimi Zohar wrote:
>> On Wed, 2021-01-27 at 10:24 -0800, Lakshmi Ramasubramanian wrote:
>>> On 1/27/21 10:02 AM, Will Deacon wrote:
>>>> On Wed, Jan 27, 2021 at 09:56:53AM -0800, Lakshmi Ramasubramanian wrote:
>>>>> On 1/27/21 8:54 AM, Will Deacon wrote:
>>>>>> On Fri, Jan 15, 2021 at 09:30:17AM -0800, Lakshmi Ramasubramanian wrote:
>>>>>>> Address and size of the buffer containing the IMA measurement log need
>>>>>>> to be passed from the current kernel to the next kernel on kexec.
>>>>>>>
>>>>>>> Add address and size fields to "struct kimage_arch" for ARM64 platform
>>>>>>> to hold the address and size of the IMA measurement log buffer.
>>>>>>>
>>>>>>> Update CONFIG_KEXEC_FILE to select CONFIG_HAVE_IMA_KEXEC, if CONFIG_IMA
>>>>>>> is enabled, to indicate that the IMA measurement log information is
>>>>>>> present in the device tree for ARM64.
>>>>>>>
>>>>>>> Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>>>>>>> Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>>>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>>>>> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
>>>>>>> ---
>>>>>>> arch/arm64/Kconfig | 1 +
>>>>>>> arch/arm64/include/asm/kexec.h | 5 +++++
>>>>>>> 2 files changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>> index 1d466addb078..ea7f7fe3dccd 100644
>>>>>>> --- a/arch/arm64/Kconfig
>>>>>>> +++ b/arch/arm64/Kconfig
>>>>>>> @@ -1094,6 +1094,7 @@ config KEXEC
>>>>>>> config KEXEC_FILE
>>>>>>> bool "kexec file based system call"
>>>>>>> select KEXEC_CORE
>>>>>>> + select HAVE_IMA_KEXEC if IMA
>>>>>>> help
>>>>>>> This is new version of kexec system call. This system call is
>>>>>>> file based and takes file descriptors as system call argument
>>>>>>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>>>>>>> index d24b527e8c00..2bd19ccb6c43 100644
>>>>>>> --- a/arch/arm64/include/asm/kexec.h
>>>>>>> +++ b/arch/arm64/include/asm/kexec.h
>>>>>>> @@ -100,6 +100,11 @@ struct kimage_arch {
>>>>>>> void *elf_headers;
>>>>>>> unsigned long elf_headers_mem;
>>>>>>> unsigned long elf_headers_sz;
>>>>>>> +
>>>>>>> +#ifdef CONFIG_IMA_KEXEC
>>>>>>> + phys_addr_t ima_buffer_addr;
>>>>>>> + size_t ima_buffer_size;
>>>>>>> +#endif
>>>>>>
>>>>>> Why do these need to be in the arch structure instead of 'struct kimage'?
>>>>>>
>>>>>
>>>>> Currently, only powerpc and, with this patch set, arm64 have support for
>>>>> carrying forward IMA measurement list across kexec system call. The above
>>>>> fields are used for tracking IMA measurement list.
>>>>>
>>>>> Do you see a reason to move these fields to "struct kimage"?
>>>>
>>>> If they're gated on CONFIG_IMA_KEXEC, then it seems harmless for them to
>>>> be added to the shared structure. Or are you saying that there are
>>>> architectures which have CONFIG_IMA_KEXEC but do not want these fields?
>>>>
>>>
>>> As far as I know, there are no other architectures that define
>>> CONFIG_IMA_KEXEC, but do not use these fields.
>>
>> Yes, CONFIG_IMA_KEXEC enables "carrying the IMA measurement list across
>> a soft boot". The only arch that currently carries the IMA
>> measurement across kexec is powerpc.
>
> Ok, in which case this sounds like it should be in the shared structure, no?
>
Ok - I'll move the IMA kexec buffer fields from "struct kimage_arch" to
"struct kimage" for both powerpc and arm64.
thanks,
-lakshmi
^ permalink raw reply
* [PATCH v4 07/10] powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Previously restore_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite restore_sigcontext() to assume that a userspace read access
window is open. Replace all uaccess functions with their 'unsafe'
versions which avoid the repeated uaccess switches.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 68 ++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 4248e4489ff1..d668f8af18fe 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -326,14 +326,14 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
/*
* Restore the sigcontext from the signal frame.
*/
-
-static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
- struct sigcontext __user *sc)
+#define unsafe_restore_sigcontext(tsk, set, sig, sc, e) \
+ unsafe_op_wrap(__unsafe_restore_sigcontext(tsk, set, sig, sc), e)
+static long notrace __unsafe_restore_sigcontext(struct task_struct *tsk, sigset_t *set,
+ int sig, struct sigcontext __user *sc)
{
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
- unsigned long err = 0;
unsigned long save_r13 = 0;
unsigned long msr;
struct pt_regs *regs = tsk->thread.regs;
@@ -348,27 +348,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
save_r13 = regs->gpr[13];
/* copy the GPRs */
- err |= __copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr));
- err |= __get_user(regs->nip, &sc->gp_regs[PT_NIP]);
+ unsafe_copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr),
+ efault_out);
+ unsafe_get_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out);
/* get MSR separately, transfer the LE bit if doing signal return */
- err |= __get_user(msr, &sc->gp_regs[PT_MSR]);
+ unsafe_get_user(msr, &sc->gp_regs[PT_MSR], efault_out);
if (sig)
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
- err |= __get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3]);
- err |= __get_user(regs->ctr, &sc->gp_regs[PT_CTR]);
- err |= __get_user(regs->link, &sc->gp_regs[PT_LNK]);
- err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]);
- err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]);
+ unsafe_get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3], efault_out);
+ unsafe_get_user(regs->ctr, &sc->gp_regs[PT_CTR], efault_out);
+ unsafe_get_user(regs->link, &sc->gp_regs[PT_LNK], efault_out);
+ unsafe_get_user(regs->xer, &sc->gp_regs[PT_XER], efault_out);
+ unsafe_get_user(regs->ccr, &sc->gp_regs[PT_CCR], efault_out);
/* Don't allow userspace to set SOFTE */
set_trap_norestart(regs);
- err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]);
- err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]);
- err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]);
+ unsafe_get_user(regs->dar, &sc->gp_regs[PT_DAR], efault_out);
+ unsafe_get_user(regs->dsisr, &sc->gp_regs[PT_DSISR], efault_out);
+ unsafe_get_user(regs->result, &sc->gp_regs[PT_RESULT], efault_out);
if (!sig)
regs->gpr[13] = save_r13;
if (set != NULL)
- err |= __get_user(set->sig[0], &sc->oldmask);
+ unsafe_get_user(set->sig[0], &sc->oldmask, efault_out);
/*
* Force reload of FP/VEC.
@@ -378,29 +379,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
- err |= __get_user(v_regs, &sc->v_regs);
- if (err)
- return err;
+ unsafe_get_user(v_regs, &sc->v_regs, efault_out);
if (v_regs && !access_ok(v_regs, 34 * sizeof(vector128)))
return -EFAULT;
/* Copy 33 vec registers (vr0..31 and vscr) from the stack */
if (v_regs != NULL && (msr & MSR_VEC) != 0) {
- err |= __copy_from_user(&tsk->thread.vr_state, v_regs,
- 33 * sizeof(vector128));
+ unsafe_copy_from_user(&tsk->thread.vr_state, v_regs,
+ 33 * sizeof(vector128), efault_out);
tsk->thread.used_vr = true;
} else if (tsk->thread.used_vr) {
memset(&tsk->thread.vr_state, 0, 33 * sizeof(vector128));
}
/* Always get VRSAVE back */
if (v_regs != NULL)
- err |= __get_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
+ unsafe_get_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33],
+ efault_out);
else
tsk->thread.vrsave = 0;
if (cpu_has_feature(CPU_FTR_ALTIVEC))
mtspr(SPRN_VRSAVE, tsk->thread.vrsave);
#endif /* CONFIG_ALTIVEC */
/* restore floating point */
- err |= copy_fpr_from_user(tsk, &sc->fp_regs);
+ unsafe_copy_fpr_from_user(tsk, &sc->fp_regs, efault_out);
#ifdef CONFIG_VSX
/*
* Get additional VSX data. Update v_regs to point after the
@@ -409,14 +409,17 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
*/
v_regs += ELF_NVRREG;
if ((msr & MSR_VSX) != 0) {
- err |= copy_vsx_from_user(tsk, v_regs);
+ unsafe_copy_vsx_from_user(tsk, v_regs, efault_out);
tsk->thread.used_vsr = true;
} else {
for (i = 0; i < 32 ; i++)
tsk->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
}
#endif
- return err;
+ return 0;
+
+efault_out:
+ return -EFAULT;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -701,8 +704,14 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
do_exit(SIGSEGV);
set_current_blocked(&set);
- if (restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext))
+
+ if (!user_read_access_begin(new_ctx, ctx_size))
+ return -EFAULT;
+ if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) {
+ user_read_access_end();
do_exit(SIGSEGV);
+ }
+ user_read_access_end();
/* This returns like rt_sigreturn */
set_thread_flag(TIF_RESTOREALL);
@@ -807,8 +816,13 @@ SYSCALL_DEFINE0(rt_sigreturn)
* causing a TM bad thing.
*/
current->thread.regs->msr &= ~MSR_TS_MASK;
- if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+ if (!user_read_access_begin(uc, sizeof(*uc)))
+ return -EFAULT;
+ if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) {
+ user_read_access_end();
goto badframe;
+ }
+ user_read_access_end();
}
if (restore_altstack(&uc->uc_stack))
--
2.26.1
^ permalink raw reply related
* [PATCH v4 03/10] powerpc/signal64: Move non-inline functions out of setup_sigcontext()
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
There are non-inline functions which get called in setup_sigcontext() to
save register state to the thread struct. Move these functions into a
separate prepare_setup_sigcontext() function so that
setup_sigcontext() can be refactored later into an "unsafe" version
which assumes an open uaccess window. Non-inline functions should be
avoided when uaccess is open.
The majority of setup_sigcontext() can be refactored to execute in an
"unsafe" context (uaccess window is opened) except for some non-inline
functions. Move these out into a separate prepare_setup_sigcontext()
function which must be called first and before opening up a uaccess
window. A follow-up commit converts setup_sigcontext() to be "unsafe".
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index f9e4a1ac440f..b211a8ea4f6e 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -79,6 +79,24 @@ static elf_vrreg_t __user *sigcontext_vmx_regs(struct sigcontext __user *sc)
}
#endif
+static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_region)
+{
+#ifdef CONFIG_ALTIVEC
+ /* save altivec registers */
+ if (tsk->thread.used_vr)
+ flush_altivec_to_thread(tsk);
+ if (cpu_has_feature(CPU_FTR_ALTIVEC))
+ tsk->thread.vrsave = mfspr(SPRN_VRSAVE);
+#endif /* CONFIG_ALTIVEC */
+
+ flush_fp_to_thread(tsk);
+
+#ifdef CONFIG_VSX
+ if (tsk->thread.used_vsr && ctx_has_vsx_region)
+ flush_vsx_to_thread(tsk);
+#endif /* CONFIG_VSX */
+}
+
/*
* Set up the sigcontext for the signal frame.
*/
@@ -97,7 +115,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
*/
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc);
- unsigned long vrsave;
#endif
struct pt_regs *regs = tsk->thread.regs;
unsigned long msr = regs->msr;
@@ -112,7 +129,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* save altivec registers */
if (tsk->thread.used_vr) {
- flush_altivec_to_thread(tsk);
/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
33 * sizeof(vector128));
@@ -124,17 +140,10 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* We always copy to/from vrsave, it's 0 if we don't have or don't
* use altivec.
*/
- vrsave = 0;
- if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
- vrsave = mfspr(SPRN_VRSAVE);
- tsk->thread.vrsave = vrsave;
- }
-
- err |= __put_user(vrsave, (u32 __user *)&v_regs[33]);
+ err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
- flush_fp_to_thread(tsk);
/* copy fpr regs and fpscr */
err |= copy_fpr_to_user(&sc->fp_regs, tsk);
@@ -150,7 +159,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
* VMX data.
*/
if (tsk->thread.used_vsr && ctx_has_vsx_region) {
- flush_vsx_to_thread(tsk);
v_regs += ELF_NVRREG;
err |= copy_vsx_to_user(v_regs, tsk);
/* set MSR_VSX in the MSR value in the frame to
@@ -655,6 +663,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
ctx_has_vsx_region = 1;
if (old_ctx != NULL) {
+ prepare_setup_sigcontext(current, ctx_has_vsx_region);
if (!access_ok(old_ctx, ctx_size)
|| setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
ctx_has_vsx_region)
@@ -842,6 +851,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
#endif
{
err |= __put_user(0, &frame->uc.uc_link);
+ prepare_setup_sigcontext(tsk, 1);
err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
NULL, (unsigned long)ksig->ka.sa.sa_handler,
1);
--
2.26.1
^ permalink raw reply related
* [PATCH v4 08/10] powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
From: Daniel Axtens <dja@axtens.net>
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
There is no 'unsafe' version of copy_siginfo_to_user, so move it
slightly to allow for a "longer" uaccess block.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 54 +++++++++++++++++++++------------
1 file changed, 34 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d668f8af18fe..a471e97589a8 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -849,44 +849,51 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
unsigned long msr = regs->msr;
frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
- if (!access_ok(frame, sizeof(*frame)))
- goto badframe;
- err |= __put_user(&frame->info, &frame->pinfo);
- err |= __put_user(&frame->uc, &frame->puc);
- err |= copy_siginfo_to_user(&frame->info, &ksig->info);
- if (err)
+ /* This only applies when calling unsafe_setup_sigcontext() and must be
+ * called before opening the uaccess window.
+ */
+ if (!MSR_TM_ACTIVE(msr))
+ prepare_setup_sigcontext(tsk, 1);
+
+ if (!user_write_access_begin(frame, sizeof(*frame)))
goto badframe;
+ unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
+ unsafe_put_user(&frame->uc, &frame->puc, badframe_block);
+
/* Create the ucontext. */
- err |= __put_user(0, &frame->uc.uc_flags);
- err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
+ unsafe_put_user(0, &frame->uc.uc_flags, badframe_block);
+ unsafe_save_altstack(&frame->uc.uc_stack, regs->gpr[1], badframe_block);
if (MSR_TM_ACTIVE(msr)) {
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* The ucontext_t passed to userland points to the second
* ucontext_t (for transactional state) with its uc_link ptr.
*/
- err |= __put_user(&frame->uc_transact, &frame->uc.uc_link);
+ unsafe_put_user(&frame->uc_transact, &frame->uc.uc_link, badframe_block);
+
+ user_write_access_end();
+
err |= setup_tm_sigcontexts(&frame->uc.uc_mcontext,
&frame->uc_transact.uc_mcontext,
tsk, ksig->sig, NULL,
(unsigned long)ksig->ka.sa.sa_handler,
msr);
+
+ if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
+ goto badframe;
+
#endif
} else {
- err |= __put_user(0, &frame->uc.uc_link);
- prepare_setup_sigcontext(tsk, 1);
- if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
- return -EFAULT;
- err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk,
- ksig->sig, NULL,
- (unsigned long)ksig->ka.sa.sa_handler, 1);
- user_write_access_end();
+ unsafe_put_user(0, &frame->uc.uc_link, badframe_block);
+ unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
+ NULL, (unsigned long)ksig->ka.sa.sa_handler,
+ 1, badframe_block);
}
- err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
- if (err)
- goto badframe;
+
+ unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block);
+ user_write_access_end();
/* Make sure signal handler doesn't get spurious FP exceptions */
tsk->thread.fp_state.fpscr = 0;
@@ -901,6 +908,11 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
regs->nip = (unsigned long) &frame->tramp[0];
}
+
+ /* Save the siginfo outside of the unsafe block. */
+ if (copy_siginfo_to_user(&frame->info, &ksig->info))
+ goto badframe;
+
/* Allocate a dummy caller frame for the signal handler. */
newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);
@@ -940,6 +952,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
return 0;
+badframe_block:
+ user_write_access_end();
badframe:
signal_fault(current, regs, "handle_rt_signal64", frame);
--
2.26.1
^ permalink raw reply related
* [PATCH v4 06/10] powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Previously setup_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite setup_sigcontext() to assume that a userspace write access window
is open. Replace all uaccess functions with their 'unsafe' versions
which avoid the repeated uaccess switches.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 70 ++++++++++++++++++++-------------
1 file changed, 43 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 8e1d804ce552..4248e4489ff1 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -101,9 +101,13 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
* Set up the sigcontext for the signal frame.
*/
-static long setup_sigcontext(struct sigcontext __user *sc,
- struct task_struct *tsk, int signr, sigset_t *set,
- unsigned long handler, int ctx_has_vsx_region)
+#define unsafe_setup_sigcontext(sc, tsk, signr, set, handler, \
+ ctx_has_vsx_region, e) \
+ unsafe_op_wrap(__unsafe_setup_sigcontext(sc, tsk, signr, set, \
+ handler, ctx_has_vsx_region), e)
+static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc,
+ struct task_struct *tsk, int signr, sigset_t *set,
+ unsigned long handler, int ctx_has_vsx_region)
{
/* When CONFIG_ALTIVEC is set, we _always_ setup v_regs even if the
* process never used altivec yet (MSR_VEC is zero in pt_regs of
@@ -118,20 +122,19 @@ static long setup_sigcontext(struct sigcontext __user *sc,
#endif
struct pt_regs *regs = tsk->thread.regs;
unsigned long msr = regs->msr;
- long err = 0;
/* Force usr to alway see softe as 1 (interrupts enabled) */
unsigned long softe = 0x1;
BUG_ON(tsk != current);
#ifdef CONFIG_ALTIVEC
- err |= __put_user(v_regs, &sc->v_regs);
+ unsafe_put_user(v_regs, &sc->v_regs, efault_out);
/* save altivec registers */
if (tsk->thread.used_vr) {
/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
- err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
- 33 * sizeof(vector128));
+ unsafe_copy_to_user(v_regs, &tsk->thread.vr_state,
+ 33 * sizeof(vector128), efault_out);
/* set MSR_VEC in the MSR value in the frame to indicate that sc->v_reg)
* contains valid data.
*/
@@ -140,12 +143,12 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* We always copy to/from vrsave, it's 0 if we don't have or don't
* use altivec.
*/
- err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
+ unsafe_put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33], efault_out);
#else /* CONFIG_ALTIVEC */
- err |= __put_user(0, &sc->v_regs);
+ unsafe_put_user(0, &sc->v_regs, efault_out);
#endif /* CONFIG_ALTIVEC */
/* copy fpr regs and fpscr */
- err |= copy_fpr_to_user(&sc->fp_regs, tsk);
+ unsafe_copy_fpr_to_user(&sc->fp_regs, tsk, efault_out);
/*
* Clear the MSR VSX bit to indicate there is no valid state attached
@@ -160,24 +163,27 @@ static long setup_sigcontext(struct sigcontext __user *sc,
*/
if (tsk->thread.used_vsr && ctx_has_vsx_region) {
v_regs += ELF_NVRREG;
- err |= copy_vsx_to_user(v_regs, tsk);
+ unsafe_copy_vsx_to_user(v_regs, tsk, efault_out);
/* set MSR_VSX in the MSR value in the frame to
* indicate that sc->vs_reg) contains valid data.
*/
msr |= MSR_VSX;
}
#endif /* CONFIG_VSX */
- err |= __put_user(&sc->gp_regs, &sc->regs);
+ unsafe_put_user(&sc->gp_regs, &sc->regs, efault_out);
WARN_ON(!FULL_REGS(regs));
- err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
- err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __put_user(softe, &sc->gp_regs[PT_SOFTE]);
- err |= __put_user(signr, &sc->signal);
- err |= __put_user(handler, &sc->handler);
+ unsafe_copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE, efault_out);
+ unsafe_put_user(msr, &sc->gp_regs[PT_MSR], efault_out);
+ unsafe_put_user(softe, &sc->gp_regs[PT_SOFTE], efault_out);
+ unsafe_put_user(signr, &sc->signal, efault_out);
+ unsafe_put_user(handler, &sc->handler, efault_out);
if (set != NULL)
- err |= __put_user(set->sig[0], &sc->oldmask);
+ unsafe_put_user(set->sig[0], &sc->oldmask, efault_out);
- return err;
+ return 0;
+
+efault_out:
+ return -EFAULT;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -664,12 +670,15 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
if (old_ctx != NULL) {
prepare_setup_sigcontext(current, ctx_has_vsx_region);
- if (!access_ok(old_ctx, ctx_size)
- || setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
- ctx_has_vsx_region)
- || __copy_to_user(&old_ctx->uc_sigmask,
- ¤t->blocked, sizeof(sigset_t)))
+ if (!user_write_access_begin(old_ctx, ctx_size))
return -EFAULT;
+
+ unsafe_setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL,
+ 0, ctx_has_vsx_region, efault_out);
+ unsafe_copy_to_user(&old_ctx->uc_sigmask, ¤t->blocked,
+ sizeof(sigset_t), efault_out);
+
+ user_write_access_end();
}
if (new_ctx == NULL)
return 0;
@@ -698,6 +707,10 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
/* This returns like rt_sigreturn */
set_thread_flag(TIF_RESTOREALL);
return 0;
+
+efault_out:
+ user_write_access_end();
+ return -EFAULT;
}
@@ -850,9 +863,12 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
} else {
err |= __put_user(0, &frame->uc.uc_link);
prepare_setup_sigcontext(tsk, 1);
- err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
- NULL, (unsigned long)ksig->ka.sa.sa_handler,
- 1);
+ if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
+ return -EFAULT;
+ err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk,
+ ksig->sig, NULL,
+ (unsigned long)ksig->ka.sa.sa_handler, 1);
+ user_write_access_end();
}
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
if (err)
--
2.26.1
^ permalink raw reply related
* [PATCH v4 05/10] powerpc/signal64: Remove TM ifdefery in middle of if/else block
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Rework the messy ifdef breaking up the if-else for TM similar to
commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of if/else").
Unlike that commit for ppc32, the ifdef can't be removed entirely since
uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index b211a8ea4f6e..8e1d804ce552 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -710,9 +710,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
struct pt_regs *regs = current_pt_regs();
struct ucontext __user *uc = (struct ucontext __user *)regs->gpr[1];
sigset_t set;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
unsigned long msr;
-#endif
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
@@ -765,7 +763,10 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
goto badframe;
+#endif
+
if (MSR_TM_ACTIVE(msr)) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* We recheckpoint on return. */
struct ucontext __user *uc_transact;
@@ -778,9 +779,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
&uc_transact->uc_mcontext))
goto badframe;
- } else
#endif
- {
+ } else {
/*
* Fall through, for non-TM restore
*
@@ -818,10 +818,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
unsigned long newsp = 0;
long err = 0;
struct pt_regs *regs = tsk->thread.regs;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* Save the thread's msr before get_tm_stackpointer() changes it */
unsigned long msr = regs->msr;
-#endif
frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
if (!access_ok(frame, sizeof(*frame)))
@@ -836,8 +834,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
/* Create the ucontext. */
err |= __put_user(0, &frame->uc.uc_flags);
err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+
if (MSR_TM_ACTIVE(msr)) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* The ucontext_t passed to userland points to the second
* ucontext_t (for transactional state) with its uc_link ptr.
*/
@@ -847,9 +846,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
tsk, ksig->sig, NULL,
(unsigned long)ksig->ka.sa.sa_handler,
msr);
- } else
#endif
- {
+ } else {
err |= __put_user(0, &frame->uc.uc_link);
prepare_setup_sigcontext(tsk, 1);
err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
--
2.26.1
^ permalink raw reply related
* [PATCH v4 00/10] Improve signal performance on PPC64 with KUAP
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
As reported by Anton, there is a large penalty to signal handling
performance on radix systems using KUAP. The signal handling code
performs many user access operations, each of which needs to switch the
KUAP permissions bit to open and then close user access. This involves a
costly 'mtspr' operation [0].
There is existing work done on x86 and by Christopher Leroy for PPC32 to
instead open up user access in "blocks" using user_*_access_{begin,end}.
We can do the same in PPC64 to bring performance back up on KUAP-enabled
radix and now also hash MMU systems [1].
Hash MMU KUAP support along with uaccess flush has landed in linuxppc/next
since the last revision. This series also provides a large benefit on hash
with KUAP. However, in the hash implementation of KUAP the user AMR is
always restored during system_call_exception() which cannot be avoided.
Fewer user access switches naturally also result in less uaccess flushing.
The first two patches add some needed 'unsafe' versions of copy-from
functions. While these do not make use of asm-goto they still allow for
avoiding the repeated uaccess switches.
The third patch moves functions called by setup_sigcontext() into a new
prepare_setup_sigcontext() to simplify converting setup_sigcontext()
into an 'unsafe' version which assumes an open uaccess window later.
The fourth and fifths patches clean-up some of the Transactional Memory
ifdef stuff to simplify using uaccess blocks later.
The next two patches rewrite some of the signal64 helper functions to
be 'unsafe'. Finally, the last three patches update the main signal
handling functions to make use of the new 'unsafe' helpers and eliminate
some additional uaccess switching.
I used the will-it-scale signal1 benchmark to measure and compare
performance [2]. The below results are from running a minimal
kernel+initramfs QEMU/KVM guest on a POWER9 Blackbird:
signal1_threads -t1 -s10
| | hash | radix |
| --------------------------- | ------ | ------ |
| linuxppc/next | 118693 | 133296 |
| linuxppc/next w/o KUAP+KUEP | 228911 | 228654 |
| unsafe-signal64 | 199443 | 234716 |
[0]: https://github.com/linuxppc/issues/issues/277
[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278
[2]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/signal1.c
v4: * Fix issues identified by Christophe Leroy (thanks for review)
* Use __get_user() directly to copy the 8B sigset_t
v3: * Rebase on latest linuxppc/next
* Reword confusing commit messages
* Add missing comma in macro in signal.h which broke compiles without
CONFIG_ALTIVEC
* Validate hash KUAP signal performance improvements
v2: * Rebase on latest linuxppc/next + Christophe Leroy's PPC32
signal series
* Simplify/remove TM ifdefery similar to PPC32 series and clean
up the uaccess begin/end calls
* Isolate non-inline functions so they are not called when
uaccess window is open
Christopher M. Riedl (8):
powerpc/uaccess: Add unsafe_copy_from_user
powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
powerpc/signal64: Move non-inline functions out of setup_sigcontext()
powerpc: Reference param in MSR_TM_ACTIVE() macro
powerpc/signal64: Remove TM ifdefery in middle of if/else block
powerpc/signal64: Replace setup_sigcontext() w/
unsafe_setup_sigcontext()
powerpc/signal64: Replace restore_sigcontext() w/
unsafe_restore_sigcontext()
powerpc/signal64: Use __get_user() to copy sigset_t
Daniel Axtens (2):
powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess
switches
powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
arch/powerpc/include/asm/reg.h | 2 +-
arch/powerpc/include/asm/uaccess.h | 3 +
arch/powerpc/kernel/signal.h | 33 ++++
arch/powerpc/kernel/signal_64.c | 251 ++++++++++++++++++-----------
4 files changed, 196 insertions(+), 93 deletions(-)
--
2.26.1
^ permalink raw reply
* [PATCH v4 01/10] powerpc/uaccess: Add unsafe_copy_from_user
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Just wrap __copy_tofrom_user() for the usual 'unsafe' pattern which
takes in a label to goto on error.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/include/asm/uaccess.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 501c9a79038c..036e82eefac9 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -542,6 +542,9 @@ user_write_access_begin(const void __user *ptr, size_t len)
#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
#define unsafe_put_user(x, p, e) __put_user_goto(x, p, e)
+#define unsafe_copy_from_user(d, s, l, e) \
+ unsafe_op_wrap(__copy_tofrom_user((__force void __user *)d, s, l), e)
+
#define unsafe_copy_to_user(d, s, l, e) \
do { \
u8 __user *_dst = (u8 __user *)(d); \
--
2.26.1
^ permalink raw reply related
* [PATCH v4 02/10] powerpc/signal: Add unsafe_copy_{vsx, fpr}_from_user()
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Reuse the "safe" implementation from signal.c except for calling
unsafe_copy_from_user() to copy into a local buffer.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 2559a681536e..c18402d625f1 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
&buf[i], label);\
} while (0)
+#define unsafe_copy_fpr_from_user(task, from, label) do { \
+ struct task_struct *__t = task; \
+ u64 __user *__f = (u64 __user *)from; \
+ u64 buf[ELF_NFPREG]; \
+ int i; \
+ \
+ unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), \
+ label); \
+ for (i = 0; i < ELF_NFPREG - 1; i++) \
+ __t->thread.TS_FPR(i) = buf[i]; \
+ __t->thread.fp_state.fpscr = buf[i]; \
+} while (0)
+
+#define unsafe_copy_vsx_from_user(task, from, label) do { \
+ struct task_struct *__t = task; \
+ u64 __user *__f = (u64 __user *)from; \
+ u64 buf[ELF_NVSRHALFREG]; \
+ int i; \
+ \
+ unsafe_copy_from_user(buf, __f, \
+ ELF_NVSRHALFREG * sizeof(double), \
+ label); \
+ for (i = 0; i < ELF_NVSRHALFREG ; i++) \
+ __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \
+} while (0)
+
+
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
#define unsafe_copy_ckfpr_to_user(to, task, label) do { \
struct task_struct *__t = task; \
@@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
unsafe_copy_to_user(to, (task)->thread.fp_state.fpr, \
ELF_NFPREG * sizeof(double), label)
+#define unsafe_copy_fpr_from_user(task, from, label) \
+ unsafe_copy_from_user((task)->thread.fp_state.fpr, from, \
+ ELF_NFPREG * sizeof(double), label)
+
static inline unsigned long
copy_fpr_to_user(void __user *to, struct task_struct *task)
{
@@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
#else
#define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
+#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
+
static inline unsigned long
copy_fpr_to_user(void __user *to, struct task_struct *task)
{
--
2.26.1
^ permalink raw reply related
* [PATCH v4 09/10] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
From: Daniel Axtens <dja@axtens.net>
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index a471e97589a8..817b64e1e409 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -782,9 +782,13 @@ SYSCALL_DEFINE0(rt_sigreturn)
* restore_tm_sigcontexts.
*/
regs->msr &= ~MSR_TS_MASK;
+#endif
- if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
+ if (!user_read_access_begin(uc, sizeof(*uc)))
goto badframe;
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+ unsafe_get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR], badframe_block);
#endif
if (MSR_TM_ACTIVE(msr)) {
@@ -794,10 +798,12 @@ SYSCALL_DEFINE0(rt_sigreturn)
/* Trying to start TM on non TM system */
if (!cpu_has_feature(CPU_FTR_TM))
- goto badframe;
+ goto badframe_block;
+
+ unsafe_get_user(uc_transact, &uc->uc_link, badframe_block);
+
+ user_read_access_end();
- if (__get_user(uc_transact, &uc->uc_link))
- goto badframe;
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
&uc_transact->uc_mcontext))
goto badframe;
@@ -816,12 +822,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
* causing a TM bad thing.
*/
current->thread.regs->msr &= ~MSR_TS_MASK;
- if (!user_read_access_begin(uc, sizeof(*uc)))
- return -EFAULT;
- if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) {
- user_read_access_end();
- goto badframe;
- }
+ unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext,
+ badframe_block);
+
user_read_access_end();
}
@@ -831,6 +834,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
set_thread_flag(TIF_RESTOREALL);
return 0;
+badframe_block:
+ user_read_access_end();
badframe:
signal_fault(current, regs, "rt_sigreturn", uc);
--
2.26.1
^ permalink raw reply related
* [PATCH v4 04/10] powerpc: Reference param in MSR_TM_ACTIVE() macro
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
causes an 'unused variable' compile warning unless the variable is also
guarded with CONFIG_PPC_TRANSACTIONAL_MEM.
Reference but do nothing with the argument in the macro to avoid a
potential compile warning.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/include/asm/reg.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e40a921d78f9..c5a3e856191c 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -124,7 +124,7 @@
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
#define MSR_TM_ACTIVE(x) (((x) & MSR_TS_MASK) != 0) /* Transaction active? */
#else
-#define MSR_TM_ACTIVE(x) 0
+#define MSR_TM_ACTIVE(x) ((void)(x), 0)
#endif
#if defined(CONFIG_PPC_BOOK3S_64)
--
2.26.1
^ permalink raw reply related
* [PATCH v4 10/10] powerpc/signal64: Use __get_user() to copy sigset_t
From: Christopher M. Riedl @ 2021-01-28 4:04 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210128040424.12720-1-cmr@codefail.de>
Usually sigset_t is exactly 8B which is a "trivial" size and does not
warrant using __copy_from_user(). Use __get_user() directly in
anticipation of future work to remove the trivial size optimizations
from __copy_from_user(). Calling __get_user() also results in a small
boost to signal handling throughput here.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 817b64e1e409..42fdc4a7ff72 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -97,6 +97,14 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
#endif /* CONFIG_VSX */
}
+static inline int get_user_sigset(sigset_t *dst, const sigset_t *src)
+{
+ if (sizeof(sigset_t) <= 8)
+ return __get_user(dst->sig[0], &src->sig[0]);
+ else
+ return __copy_from_user(dst, src, sizeof(sigset_t));
+}
+
/*
* Set up the sigcontext for the signal frame.
*/
@@ -701,8 +709,9 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
* We kill the task with a SIGSEGV in this situation.
*/
- if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
+ if (get_user_sigset(&set, &new_ctx->uc_sigmask))
do_exit(SIGSEGV);
+
set_current_blocked(&set);
if (!user_read_access_begin(new_ctx, ctx_size))
@@ -740,8 +749,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (!access_ok(uc, sizeof(*uc)))
goto badframe;
- if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
+ if (get_user_sigset(&set, &uc->uc_sigmask))
goto badframe;
+
set_current_blocked(&set);
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
--
2.26.1
^ permalink raw reply related
* Re: [PATCH 2/2] powerpc/vdso64: remove meaningless vgettimeofday.o build rule
From: Masahiro Yamada @ 2021-01-28 4:01 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
linuxppc-dev
Cc: Catalin Marinas, Nicholas Piggin, Greentime Hu,
Linux Kernel Mailing List
In-Reply-To: <20201223171142.707053-2-masahiroy@kernel.org>
On Thu, Dec 24, 2020 at 2:12 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> VDSO64 is only built for the 64-bit kernel, hence vgettimeofday.o is
> built by the generic rule in scripts/Makefile.build.
>
> This line does not provide anything useful.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Michael, please take a look at this too.
> ---
>
> arch/powerpc/kernel/vdso64/Makefile | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index b50b39fedf74..422addf394c7 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -32,8 +32,6 @@ asflags-y := -D__VDSO64__ -s
> targets += vdso64.lds
> CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
> -$(obj)/vgettimeofday.o: %.o: %.c FORCE
> -
> # link rule for the .so file, .lds has to be first
> $(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
> $(call if_changed,vdso64ld_and_check)
> --
> 2.27.0
>
--
Best Regards
Masahiro Yamada
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o
From: Masahiro Yamada @ 2021-01-28 4:01 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
linuxppc-dev
Cc: Ravi Bangoria, Linux Kernel Mailing List, Nicholas Piggin,
Oliver O'Halloran, Greentime Hu, Michal Suchanek,
Ard Biesheuvel, Daniel Axtens
In-Reply-To: <20201223171142.707053-1-masahiroy@kernel.org>
On Thu, Dec 24, 2020 at 2:12 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> vgettimeofday.o is unnecessarily rebuilt. Adding it to 'targets' is not
> enough to fix the issue. Kbuild is correctly rebuilding it because the
> command line is changed.
>
> PowerPC builds each vdso directory twice; first in vdso_prepare to
> generate vdso{32,64}-offsets.h, second as part of the ordinary build
> process to embed vdso{32,64}.so.dbg into the kernel.
>
> The problem shows up when CONFIG_PPC_WERROR=y due to the following line
> in arch/powerpc/Kbuild:
>
> subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
>
> In the preparation stage, Kbuild directly visits the vdso directories,
> hence it does not inherit subdir-ccflags-y. In the second descend,
> Kbuild adds -Werror, which results in the command line flipping
> with/without -Werror.
>
> It implies a potential danger; if a more critical flag that would impact
> the resulted vdso, the offsets recorded in the headers might be different
> from real offsets in the embedded vdso images.
>
> Removing the unneeded second descend solves the problem.
>
> Link: https://lore.kernel.org/linuxppc-dev/87tuslxhry.fsf@mpe.ellerman.id.au/
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
Michael, please take a look at this.
The unneeded rebuild problem is still remaining.
>
> arch/powerpc/kernel/Makefile | 4 ++--
> arch/powerpc/kernel/vdso32/Makefile | 5 +----
> arch/powerpc/kernel/{vdso32 => }/vdso32_wrapper.S | 0
> arch/powerpc/kernel/vdso64/Makefile | 6 +-----
> arch/powerpc/kernel/{vdso64 => }/vdso64_wrapper.S | 0
> 5 files changed, 4 insertions(+), 11 deletions(-)
> rename arch/powerpc/kernel/{vdso32 => }/vdso32_wrapper.S (100%)
> rename arch/powerpc/kernel/{vdso64 => }/vdso64_wrapper.S (100%)
>
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index fe2ef598e2ea..79ee7750937d 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -51,7 +51,7 @@ obj-y += ptrace/
> obj-$(CONFIG_PPC64) += setup_64.o \
> paca.o nvram_64.o note.o syscall_64.o
> obj-$(CONFIG_COMPAT) += sys_ppc32.o signal_32.o
> -obj-$(CONFIG_VDSO32) += vdso32/
> +obj-$(CONFIG_VDSO32) += vdso32_wrapper.o
> obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
> obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> obj-$(CONFIG_PPC_DAWR) += dawr.o
> @@ -60,7 +60,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o
> obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o
> obj-$(CONFIG_PPC_BOOK3E_64) += exceptions-64e.o idle_book3e.o
> obj-$(CONFIG_PPC_BARRIER_NOSPEC) += security.o
> -obj-$(CONFIG_PPC64) += vdso64/
> +obj-$(CONFIG_PPC64) += vdso64_wrapper.o
> obj-$(CONFIG_ALTIVEC) += vecemu.o
> obj-$(CONFIG_PPC_BOOK3S_IDLE) += idle_book3s.o
> procfs-y := proc_powerpc.o
> diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
> index 59aa2944ecae..42fc3de89b39 100644
> --- a/arch/powerpc/kernel/vdso32/Makefile
> +++ b/arch/powerpc/kernel/vdso32/Makefile
> @@ -30,7 +30,7 @@ CC32FLAGS += -m32
> KBUILD_CFLAGS := $(filter-out -mcmodel=medium,$(KBUILD_CFLAGS))
> endif
>
> -targets := $(obj-vdso32) vdso32.so.dbg
> +targets := $(obj-vdso32) vdso32.so.dbg vgettimeofday.o
> obj-vdso32 := $(addprefix $(obj)/, $(obj-vdso32))
>
> GCOV_PROFILE := n
> @@ -46,9 +46,6 @@ obj-y += vdso32_wrapper.o
> targets += vdso32.lds
> CPPFLAGS_vdso32.lds += -P -C -Upowerpc
>
> -# Force dependency (incbin is bad)
> -$(obj)/vdso32_wrapper.o : $(obj)/vdso32.so.dbg
> -
> # link rule for the .so file, .lds has to be first
> $(obj)/vdso32.so.dbg: $(src)/vdso32.lds $(obj-vdso32) $(obj)/vgettimeofday.o FORCE
> $(call if_changed,vdso32ld_and_check)
> diff --git a/arch/powerpc/kernel/vdso32/vdso32_wrapper.S b/arch/powerpc/kernel/vdso32_wrapper.S
> similarity index 100%
> rename from arch/powerpc/kernel/vdso32/vdso32_wrapper.S
> rename to arch/powerpc/kernel/vdso32_wrapper.S
> diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
> index d365810a689a..b50b39fedf74 100644
> --- a/arch/powerpc/kernel/vdso64/Makefile
> +++ b/arch/powerpc/kernel/vdso64/Makefile
> @@ -17,7 +17,7 @@ endif
>
> # Build rules
>
> -targets := $(obj-vdso64) vdso64.so.dbg
> +targets := $(obj-vdso64) vdso64.so.dbg vgettimeofday.o
> obj-vdso64 := $(addprefix $(obj)/, $(obj-vdso64))
>
> GCOV_PROFILE := n
> @@ -29,15 +29,11 @@ ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
> -Wl,-soname=linux-vdso64.so.1 -Wl,--hash-style=both
> asflags-y := -D__VDSO64__ -s
>
> -obj-y += vdso64_wrapper.o
> targets += vdso64.lds
> CPPFLAGS_vdso64.lds += -P -C -U$(ARCH)
>
> $(obj)/vgettimeofday.o: %.o: %.c FORCE
>
> -# Force dependency (incbin is bad)
> -$(obj)/vdso64_wrapper.o : $(obj)/vdso64.so.dbg
> -
> # link rule for the .so file, .lds has to be first
> $(obj)/vdso64.so.dbg: $(src)/vdso64.lds $(obj-vdso64) $(obj)/vgettimeofday.o FORCE
> $(call if_changed,vdso64ld_and_check)
> diff --git a/arch/powerpc/kernel/vdso64/vdso64_wrapper.S b/arch/powerpc/kernel/vdso64_wrapper.S
> similarity index 100%
> rename from arch/powerpc/kernel/vdso64/vdso64_wrapper.S
> rename to arch/powerpc/kernel/vdso64_wrapper.S
> --
> 2.27.0
>
--
Best Regards
Masahiro Yamada
^ permalink raw reply
* Re: [PATCH v15 09/10] arm64: Call kmalloc() to allocate DTB buffer
From: Lakshmi Ramasubramanian @ 2021-01-28 4:00 UTC (permalink / raw)
To: Thiago Jung Bauermann, Will Deacon
Cc: mark.rutland, bhsharma, tao.li, zohar, paulus, vincenzo.frascino,
frowand.list, sashal, robh, masahiroy, jmorris, takahiro.akashi,
linux-arm-kernel, catalin.marinas, serge, devicetree,
pasha.tatashin, prsriva, hsinyi, allison, christophe.leroy,
mbrugger, balajib, gregkh, linux-kernel, james.morse,
dmitry.kasatkin, linux-integrity, linuxppc-dev
In-Reply-To: <871re5soof.fsf@manicouagan.localdomain>
On 1/27/21 7:52 PM, Thiago Jung Bauermann wrote:
>
> Will Deacon <will@kernel.org> writes:
>
>> On Wed, Jan 27, 2021 at 09:59:38AM -0800, Lakshmi Ramasubramanian wrote:
>>> On 1/27/21 8:52 AM, Will Deacon wrote:
>>>
>>> Hi Will,
>>>
>>>> On Fri, Jan 15, 2021 at 09:30:16AM -0800, Lakshmi Ramasubramanian wrote:
>>>>> create_dtb() function allocates kernel virtual memory for
>>>>> the device tree blob (DTB). This is not consistent with other
>>>>> architectures, such as powerpc, which calls kmalloc() for allocating
>>>>> memory for the DTB.
>>>>>
>>>>> Call kmalloc() to allocate memory for the DTB, and kfree() to free
>>>>> the allocated memory.
>>>>>
>>>>> Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>>>>> Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>>> ---
>>>>> arch/arm64/kernel/machine_kexec_file.c | 12 +++++++-----
>>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>>>>> index 7de9c47dee7c..51c40143d6fa 100644
>>>>> --- a/arch/arm64/kernel/machine_kexec_file.c
>>>>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>>>>> @@ -29,7 +29,7 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>>>>> int arch_kimage_file_post_load_cleanup(struct kimage *image)
>>>>> {
>>>>> - vfree(image->arch.dtb);
>>>>> + kfree(image->arch.dtb);
>>>>> image->arch.dtb = NULL;
>>>>> vfree(image->arch.elf_headers);
>>>>> @@ -59,19 +59,21 @@ static int create_dtb(struct kimage *image,
>>>>> + cmdline_len + DTB_EXTRA_SPACE;
>>>>> for (;;) {
>>>>> - buf = vmalloc(buf_size);
>>>>> + buf = kmalloc(buf_size, GFP_KERNEL);
>>>>
>>>> Is there a functional need for this patch? I build the 'dtbs' target just
>>>> now and sdm845-db845c.dtb is approaching 100K, which feels quite large
>>>> for kmalloc().
>>>
>>> Changing the allocation from vmalloc() to kmalloc() would help us further
>>> consolidate the DTB setup code for powerpc and arm64.
>>
>> Ok, but at the risk of allocation failure. Can powerpc use vmalloc()
>> instead?
>
> I believe this patch stems from this suggestion by Rob Herring:
>
>> This could be taken a step further and do the allocation of the new
>> FDT. The difference is arm64 uses vmalloc and powerpc uses kmalloc. The
>> arm64 version also retries with a bigger allocation. That seems
>> unnecessary.
>
> in https://lore.kernel.org/linux-integrity/20201211221006.1052453-3-robh@kernel.org/
>
> The problem is that this patch implements only part of the suggestion,
> which isn't useful in itself. So the patch series should either drop
> this patch or consolidate the FDT allocation between the arches.
>
> I just tested on powernv and pseries platforms and powerpc can use
> vmalloc for the FDT buffer.
>
Thanks for verifying on powerpc platform Thiago.
I'll update the patch to do the following:
=> Use vmalloc for FDT buffer allocation on powerpc
=> Keep vmalloc for arm64, but remove the retry on allocation.
=> Also, there was a memory leak of FDT buffer in the error code path on
arm64, which I'll fix as well.
Did I miss anything?
thanks,
-lakshmi
^ permalink raw reply
* Re: [PATCH v15 09/10] arm64: Call kmalloc() to allocate DTB buffer
From: Thiago Jung Bauermann @ 2021-01-28 3:52 UTC (permalink / raw)
To: Will Deacon
Cc: mark.rutland, bhsharma, tao.li, zohar, paulus, vincenzo.frascino,
frowand.list, sashal, robh, Lakshmi Ramasubramanian, masahiroy,
jmorris, takahiro.akashi, linux-arm-kernel, catalin.marinas,
serge, devicetree, pasha.tatashin, prsriva, hsinyi, allison,
christophe.leroy, mbrugger, balajib, dmitry.kasatkin,
linux-kernel, james.morse, gregkh, linux-integrity, linuxppc-dev
In-Reply-To: <20210127184319.GA676@willie-the-truck>
Will Deacon <will@kernel.org> writes:
> On Wed, Jan 27, 2021 at 09:59:38AM -0800, Lakshmi Ramasubramanian wrote:
>> On 1/27/21 8:52 AM, Will Deacon wrote:
>>
>> Hi Will,
>>
>> > On Fri, Jan 15, 2021 at 09:30:16AM -0800, Lakshmi Ramasubramanian wrote:
>> > > create_dtb() function allocates kernel virtual memory for
>> > > the device tree blob (DTB). This is not consistent with other
>> > > architectures, such as powerpc, which calls kmalloc() for allocating
>> > > memory for the DTB.
>> > >
>> > > Call kmalloc() to allocate memory for the DTB, and kfree() to free
>> > > the allocated memory.
>> > >
>> > > Co-developed-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>> > > Signed-off-by: Prakhar Srivastava <prsriva@linux.microsoft.com>
>> > > Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>> > > ---
>> > > arch/arm64/kernel/machine_kexec_file.c | 12 +++++++-----
>> > > 1 file changed, 7 insertions(+), 5 deletions(-)
>> > >
>> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> > > index 7de9c47dee7c..51c40143d6fa 100644
>> > > --- a/arch/arm64/kernel/machine_kexec_file.c
>> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
>> > > @@ -29,7 +29,7 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>> > > int arch_kimage_file_post_load_cleanup(struct kimage *image)
>> > > {
>> > > - vfree(image->arch.dtb);
>> > > + kfree(image->arch.dtb);
>> > > image->arch.dtb = NULL;
>> > > vfree(image->arch.elf_headers);
>> > > @@ -59,19 +59,21 @@ static int create_dtb(struct kimage *image,
>> > > + cmdline_len + DTB_EXTRA_SPACE;
>> > > for (;;) {
>> > > - buf = vmalloc(buf_size);
>> > > + buf = kmalloc(buf_size, GFP_KERNEL);
>> >
>> > Is there a functional need for this patch? I build the 'dtbs' target just
>> > now and sdm845-db845c.dtb is approaching 100K, which feels quite large
>> > for kmalloc().
>>
>> Changing the allocation from vmalloc() to kmalloc() would help us further
>> consolidate the DTB setup code for powerpc and arm64.
>
> Ok, but at the risk of allocation failure. Can powerpc use vmalloc()
> instead?
I believe this patch stems from this suggestion by Rob Herring:
> This could be taken a step further and do the allocation of the new
> FDT. The difference is arm64 uses vmalloc and powerpc uses kmalloc. The
> arm64 version also retries with a bigger allocation. That seems
> unnecessary.
in https://lore.kernel.org/linux-integrity/20201211221006.1052453-3-robh@kernel.org/
The problem is that this patch implements only part of the suggestion,
which isn't useful in itself. So the patch series should either drop
this patch or consolidate the FDT allocation between the arches.
I just tested on powernv and pseries platforms and powerpc can use
vmalloc for the FDT buffer.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page
From: Ding Tianhong @ 2021-01-28 3:13 UTC (permalink / raw)
To: Nicholas Piggin, linux-mm, Andrew Morton
Cc: linux-arch, linux-kernel, Christoph Hellwig, Jonathan Cameron,
Rick Edgecombe, linuxppc-dev, Christoph Hellwig
In-Reply-To: <20210126044510.2491820-2-npiggin@gmail.com>
On 2021/1/26 12:44, Nicholas Piggin wrote:
> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
> Whether or not a vmap is huge depends on the architecture details,
> alignments, boot options, etc., which the caller can not be expected
> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
>
> This change teaches vmalloc_to_page about larger pages, and returns
> the struct page that corresponds to the offset within the large page.
> This makes the API agnostic to mapping implementation details.
>
> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
> fail gracefully on unexpected huge vmap mappings")
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> mm/vmalloc.c | 41 ++++++++++++++++++++++++++---------------
> 1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e6f352bf0498..62372f9e0167 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -34,7 +34,7 @@
> #include <linux/bitops.h>
> #include <linux/rbtree_augmented.h>
> #include <linux/overflow.h>
> -
> +#include <linux/pgtable.h>
> #include <linux/uaccess.h>
> #include <asm/tlbflush.h>
> #include <asm/shmparam.h>
> @@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
> }
>
> /*
> - * Walk a vmap address to the struct page it maps.
> + * Walk a vmap address to the struct page it maps. Huge vmap mappings will
> + * return the tail page that corresponds to the base page address, which
> + * matches small vmap mappings.
> */
> struct page *vmalloc_to_page(const void *vmalloc_addr)
> {
> @@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>
> if (pgd_none(*pgd))
> return NULL;
> + if (WARN_ON_ONCE(pgd_leaf(*pgd)))
> + return NULL; /* XXX: no allowance for huge pgd */
> + if (WARN_ON_ONCE(pgd_bad(*pgd)))
> + return NULL;
> +
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> return NULL;
> - pud = pud_offset(p4d, addr);
> + if (p4d_leaf(*p4d))
> + return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
> + if (WARN_ON_ONCE(p4d_bad(*p4d)))
> + return NULL;
>
> - /*
> - * Don't dereference bad PUD or PMD (below) entries. This will also
> - * identify huge mappings, which we may encounter on architectures
> - * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
> - * identified as vmalloc addresses by is_vmalloc_addr(), but are
> - * not [unambiguously] associated with a struct page, so there is
> - * no correct value to return for them.
> - */
> - WARN_ON_ONCE(pud_bad(*pud));
> - if (pud_none(*pud) || pud_bad(*pud))
> + pud = pud_offset(p4d, addr);
> + if (pud_none(*pud))
> + return NULL;
> + if (pud_leaf(*pud))
> + return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
Hi Nicho:
/builds/1mzfdQzleCy69KZFb5qHNSEgabZ/mm/vmalloc.c: In function 'vmalloc_to_page':
/builds/1mzfdQzleCy69KZFb5qHNSEgabZ/include/asm-generic/pgtable-nop4d-hack.h:48:27: error: implicit declaration of function 'pud_page'; did you mean 'put_page'? [-Werror=implicit-function-declaration]
48 | #define pgd_page(pgd) (pud_page((pud_t){ pgd }))
| ^~~~~~~~
the pug_page is not defined for aarch32 when enabling 2-level page config, it break the system building.
> + if (WARN_ON_ONCE(pud_bad(*pud)))
> return NULL;
> +
> pmd = pmd_offset(pud, addr);
> - WARN_ON_ONCE(pmd_bad(*pmd));
> - if (pmd_none(*pmd) || pmd_bad(*pmd))
> + if (pmd_none(*pmd))
> + return NULL;
> + if (pmd_leaf(*pmd))
> + return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> + if (WARN_ON_ONCE(pmd_bad(*pmd)))
> return NULL;
>
> ptep = pte_offset_map(pmd, addr);
> @@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
> if (pte_present(pte))
> page = pte_page(pte);
> pte_unmap(ptep);
> +
> return page;
> }
> EXPORT_SYMBOL(vmalloc_to_page);
>
^ permalink raw reply
* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Jens Axboe @ 2021-01-28 3:06 UTC (permalink / raw)
To: Zorro Lang, Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <20210128031355.GP14354@localhost.localdomain>
On 1/27/21 8:13 PM, Zorro Lang wrote:
> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>> The fail source line is:
>>>>>
>>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>> return SIGSEGV;
>>>>>
>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>> io_uring, where the helper thread can assume the user app identity
>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>> if this is valid or not.
>>>>
>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>> it to block any unallowed access from kernel to user memory
>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>> that is what is_user provides.
>>>>
>>>> If the kernel access is legitimate, kernel should have opened
>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>> by KUAP!".
>>>>
>>>> As far as I understand, the fault occurs in
>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>> access and you really should get a KUAP fault.
>>>>
>>>> So the problem is somewhere else, I think you proposed patch just
>>>> hides the problem, it doesn't fix it.
>>>
>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>
>> Yeah the io uring code is fine, provided it uses the uaccess primitives
>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>
>>> We should be able to copy to/from user space, and including faults, if
>>> that's been done and the new mm assigned. Because it really should be.
>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>
>>> I'm assuming this may be breakage related to the recent uaccess changes
>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>> side?
>>>
>>> Zorro, did 5.10 work?
>>
>> Would be interesting to know.
>
> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> commit(be the HEAD) in 5.10 phase?
I forget which versions had what series of this, but 5.10 final - and if
that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
5.10 definitely has them.
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING
From: Zorro Lang @ 2021-01-28 3:13 UTC (permalink / raw)
To: Nicholas Piggin, Jens Axboe; +Cc: linuxppc-dev
In-Reply-To: <1611792928.nw4g8h8kj4.astroid@bobo.none>
On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> >>
> >>
> >> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> >>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> >>> The fail source line is:
> >>>
> >>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> >>> return SIGSEGV;
> >>>
> >>> The is_user() is based on user_mod(regs) only. This's not suit for
> >>> io_uring, where the helper thread can assume the user app identity
> >>> and could perform this fault just fine. So turn to use mm to decide
> >>> if this is valid or not.
> >>
> >> I don't understand why testing is_user would be an issue. KUAP purpose
> >> it to block any unallowed access from kernel to user memory
> >> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> >> that is what is_user provides.
> >>
> >> If the kernel access is legitimate, kernel should have opened
> >> userspace access then you shouldn't get this "Bug: Read fault blocked
> >> by KUAP!".
> >>
> >> As far as I understand, the fault occurs in
> >> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> >> fault_in_pages_readable() uses __get_user() so it is a legitimate
> >> access and you really should get a KUAP fault.
> >>
> >> So the problem is somewhere else, I think you proposed patch just
> >> hides the problem, it doesn't fix it.
> >
> > If we do kthread_use_mm(), can we agree that the user access is valid?
>
> Yeah the io uring code is fine, provided it uses the uaccess primitives
> like any other kernel code. It's looking more like a an arch/powerpc bug.
>
> > We should be able to copy to/from user space, and including faults, if
> > that's been done and the new mm assigned. Because it really should be.
> > If SMAP was a problem on x86, we would have seen it long ago.
> >
> > I'm assuming this may be breakage related to the recent uaccess changes
> > related to set_fs and friends? Or maybe recent changes on the powerpc
> > side?
> >
> > Zorro, did 5.10 work?
>
> Would be interesting to know.
Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
commit(be the HEAD) in 5.10 phase?
Thanks,
Zorro
>
> Thanks,
> Nick
>
^ permalink raw reply
* Re: [PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range
From: Miaohe Lin @ 2021-01-28 2:38 UTC (permalink / raw)
To: Nicholas Piggin
Cc: linux-arch, Ding Tianhong, linux-kernel, Christoph Hellwig,
linux-mm, Jonathan Cameron, Andrew Morton, Rick Edgecombe,
linuxppc-dev
In-Reply-To: <20210126044510.2491820-5-npiggin@gmail.com>
Hi:
On 2021/1/26 12:45, Nicholas Piggin wrote:
> This will be used as a generic kernel virtual mapping function, so
> re-name it in preparation.
>
Looks good to me. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> mm/ioremap.c | 64 +++++++++++++++++++++++++++-------------------------
> 1 file changed, 33 insertions(+), 31 deletions(-)
>
> diff --git a/mm/ioremap.c b/mm/ioremap.c
> index 5fa1ab41d152..3f4d36f9745a 100644
> --- a/mm/ioremap.c
> +++ b/mm/ioremap.c
> @@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; }
> static inline int ioremap_pmd_enabled(void) { return 0; }
> #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>
> -static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
> - pgtbl_mod_mask *mask)
> +static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + pgtbl_mod_mask *mask)
> {
> pte_t *pte;
> u64 pfn;
> @@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
> return 0;
> }
>
> -static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr,
> - pgprot_t prot)
> +static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot)
> {
> if (!ioremap_pmd_enabled())
> return 0;
> @@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> return pmd_set_huge(pmd, phys_addr, prot);
> }
>
> -static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
> - pgtbl_mod_mask *mask)
> +static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + pgtbl_mod_mask *mask)
> {
> pmd_t *pmd;
> unsigned long next;
> @@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> do {
> next = pmd_addr_end(addr, end);
>
> - if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
> + if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
> *mask |= PGTBL_PMD_MODIFIED;
> continue;
> }
>
> - if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask))
> + if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
> return -ENOMEM;
> } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
> return 0;
> }
>
> -static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr,
> - pgprot_t prot)
> +static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot)
> {
> if (!ioremap_pud_enabled())
> return 0;
> @@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
> return pud_set_huge(pud, phys_addr, prot);
> }
>
> -static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
> - pgtbl_mod_mask *mask)
> +static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + pgtbl_mod_mask *mask)
> {
> pud_t *pud;
> unsigned long next;
> @@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
> do {
> next = pud_addr_end(addr, end);
>
> - if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
> + if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
> *mask |= PGTBL_PUD_MODIFIED;
> continue;
> }
>
> - if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask))
> + if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask))
> return -ENOMEM;
> } while (pud++, phys_addr += (next - addr), addr = next, addr != end);
> return 0;
> }
>
> -static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr,
> - pgprot_t prot)
> +static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot)
> {
> if (!ioremap_p4d_enabled())
> return 0;
> @@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
> return p4d_set_huge(p4d, phys_addr, prot);
> }
>
> -static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
> - pgtbl_mod_mask *mask)
> +static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + pgtbl_mod_mask *mask)
> {
> p4d_t *p4d;
> unsigned long next;
> @@ -208,19 +205,19 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
> do {
> next = p4d_addr_end(addr, end);
>
> - if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot)) {
> + if (vmap_try_huge_p4d(p4d, addr, next, phys_addr, prot)) {
> *mask |= PGTBL_P4D_MODIFIED;
> continue;
> }
>
> - if (ioremap_pud_range(p4d, addr, next, phys_addr, prot, mask))
> + if (vmap_pud_range(p4d, addr, next, phys_addr, prot, mask))
> return -ENOMEM;
> } while (p4d++, phys_addr += (next - addr), addr = next, addr != end);
> return 0;
> }
>
> -int ioremap_page_range(unsigned long addr,
> - unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> +static int vmap_range(unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot)
> {
> pgd_t *pgd;
> unsigned long start;
> @@ -235,8 +232,7 @@ int ioremap_page_range(unsigned long addr,
> pgd = pgd_offset_k(addr);
> do {
> next = pgd_addr_end(addr, end);
> - err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot,
> - &mask);
> + err = vmap_p4d_range(pgd, addr, next, phys_addr, prot, &mask);
> if (err)
> break;
> } while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
> @@ -249,6 +245,12 @@ int ioremap_page_range(unsigned long addr,
> return err;
> }
>
> +int ioremap_page_range(unsigned long addr,
> + unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> +{
> + return vmap_range(addr, end, phys_addr, prot);
> +}
> +
> #ifdef CONFIG_GENERIC_IOREMAP
> void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot)
> {
>
^ permalink raw reply
* Re: [PATCH 01/27] scripts: add generic syscalltbl.sh
From: Masahiro Yamada @ 2021-01-28 1:01 UTC (permalink / raw)
To: linux-arch, X86 ML
Cc: open list:TENSILICA XTENSA PORT (xtensa), linux-ia64,
linux-parisc, Linux Kbuild mailing list, Linux-sh list, linux-um,
Linux Kernel Mailing List, linux-mips, linux-m68k, linux-alpha,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210128005110.2613902-2-masahiroy@kernel.org>
On Thu, Jan 28, 2021 at 9:51 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> Most of architectures generate syscall headers at the compile time
> in the almost same way.
>
> The syscall table has the same format for all architectures. Each line
> has 3, 4 or 5 fields; syscall number, ABI, syscall name, native entry
> point, and compat entry point. The syscall table is processed by
> syscalltbl.sh script into header files.
>
> Despite the same pattern, scripts are maintained per architecture,
> which results in code duplication and bad maintainability.
>
> As of v5.11-rc1, 12 architectures duplicate similar shell scripts:
>
> $ find arch -name syscalltbl.sh | sort
> arch/alpha/kernel/syscalls/syscalltbl.sh
> arch/arm/tools/syscalltbl.sh
> arch/ia64/kernel/syscalls/syscalltbl.sh
> arch/m68k/kernel/syscalls/syscalltbl.sh
> arch/microblaze/kernel/syscalls/syscalltbl.sh
> arch/mips/kernel/syscalls/syscalltbl.sh
> arch/parisc/kernel/syscalls/syscalltbl.sh
> arch/powerpc/kernel/syscalls/syscalltbl.sh
> arch/sh/kernel/syscalls/syscalltbl.sh
> arch/sparc/kernel/syscalls/syscalltbl.sh
> arch/x86/entry/syscalls/syscalltbl.sh
> arch/xtensa/kernel/syscalls/syscalltbl.sh
>
> My goal is to unify them into a single file, scripts/syscalltbl.sh.
>
> For example, the i386 syscall table looks like this:
>
> 0 i386 restart_syscall sys_restart_syscall
> 1 i386 exit sys_exit
> 2 i386 fork sys_fork
> 3 i386 read sys_read
> 4 i386 write sys_write
> 5 i386 open sys_open compat_sys_open
> ...
>
> scripts/syscalltbl.sh generates the following code:
>
> __SYSCALL(0, sys_restart_syscall)
> __SYSCALL(1, sys_exit)
> __SYSCALL(2, sys_fork)
> __SYSCALL(3, sys_read)
> __SYSCALL(4, sys_write)
> __SYSCALL_WITH_COMPAT(5, sys_open, compat_sys_open)
> ...
>
> Then, the i386 kernel will do:
>
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
>
> and the x86_64 kernel will do:
>
> #define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, compat)
>
> I noticed all 32/64 bit architectures can be covered by the same
> pattern. Having an arch-specific script is fine if there is a good
> reason to do so, but a single generic script should work for this case.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
> scripts/syscalltbl.sh | 52 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
> create mode 100644 scripts/syscalltbl.sh
>
> diff --git a/scripts/syscalltbl.sh b/scripts/syscalltbl.sh
> new file mode 100644
> index 000000000000..15bf4e09f88c
> --- /dev/null
> +++ b/scripts/syscalltbl.sh
> @@ -0,0 +1,52 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Usage:
> +# scripts/syscalltbl.sh INFILE OUTFILE [ABIS] [OFFSET]
> +#
> +# INFILE: input syscall table
> +# OUTFILE: output file
> +# ABIS (optional): specify the ABIs to handle.
> +# If omitted, all lines are handled.
> +# OFFSET (optinal): spefify the offset of the syscall numbers.
> +# If omitted, the offset is zero.
> +#
> +# The syscall table format:
> +# nr abi name native [compat]
This line should be
nr abi name [native] [compat]
because the native entry point is also optional.
(if it is missing, sys_ni_syscall is used)
> +#
> +# nr: syscall number
> +# abi: ABI name
> +# name: syscall name
> +# native: native entry point
native (optional): native entry point
> +# compat (optional): compat entry point
> +
> +set -e
> +
> +in="$1"
> +out="$2"
> +abis=$(echo "($3)" | tr ',' '|')
> +offset="${4:-0}"
> +
> +nxt=$offset
> +
> +grep -E "^[0-9]+[[:space:]]+${abis}" "$in" | sort -n | {
> +
> + while read nr abi name native compat ; do
> +
> + nr=$((nr + $offset))
> +
> + while [ $nxt -lt $nr ]; do
> + echo "__SYSCALL($nxt, sys_ni_syscall)"
> + nxt=$((nxt + 1))
> + done
> +
> + if [ -n "$compat" ]; then
> + echo "__SYSCALL_WITH_COMPAT($nr, $native, $compat)"
> + elif [ -n "$native" ]; then
> + echo "__SYSCALL($nr, $native)"
> + else
> + echo "__SYSCALL($nr, sys_ni_syscall)"
> + fi
> + nxt=$((nr + 1))
> + done
> +} > "$out"
> --
> 2.27.0
>
--
Best Regards
Masahiro Yamada
^ permalink raw reply
* Re: [PATCH 02/27] x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()
From: Masahiro Yamada @ 2021-01-28 0:55 UTC (permalink / raw)
To: linux-arch, X86 ML
Cc: open list:TENSILICA XTENSA PORT (xtensa), linux-ia64,
linux-parisc, Linux Kbuild mailing list, Linux-sh list, linux-um,
Linux Kernel Mailing List, linux-mips, linux-m68k, linux-alpha,
sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210128005110.2613902-3-masahiroy@kernel.org>
On Thu, Jan 28, 2021 at 9:52 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> Building kernel/sys_ni.c with W=1 omits tons of -Wmissing-prototypes
This is a typo. "omits" -> "emits"
> warnings.
>
> $ make W=1 kernel/sys_ni.o
> [ snip ]
> CC kernel/sys_ni.o
> In file included from kernel/sys_ni.c:10:
> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype for '__x64_sys_io_setup' [-Wmissing-prototypes]
> 83 | __weak long __##abi##_##name(const struct pt_regs *__unused) \
> | ^~
> ./arch/x86/include/asm/syscall_wrapper.h:100:2: note: in expansion of macro '__COND_SYSCALL'
> 100 | __COND_SYSCALL(x64, sys_##name)
> | ^~~~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:256:2: note: in expansion of macro '__X64_COND_SYSCALL'
> 256 | __X64_COND_SYSCALL(name) \
> | ^~~~~~~~~~~~~~~~~~
> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
> 39 | COND_SYSCALL(io_setup);
> | ^~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype for '__ia32_sys_io_setup' [-Wmissing-prototypes]
> 83 | __weak long __##abi##_##name(const struct pt_regs *__unused) \
> | ^~
> ./arch/x86/include/asm/syscall_wrapper.h:120:2: note: in expansion of macro '__COND_SYSCALL'
> 120 | __COND_SYSCALL(ia32, sys_##name)
> | ^~~~~~~~~~~~~~
> ./arch/x86/include/asm/syscall_wrapper.h:257:2: note: in expansion of macro '__IA32_COND_SYSCALL'
> 257 | __IA32_COND_SYSCALL(name)
> | ^~~~~~~~~~~~~~~~~~~
> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
> 39 | COND_SYSCALL(io_setup);
> | ^~~~~~~~~~~~
> ...
>
> __SYS_STUB0() and __SYS_STUBx() defined a few lines above have forward
> declarations. Let's do likewise for __COND_SYSCALL() to fix the
> warnings.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
> arch/x86/include/asm/syscall_wrapper.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
> index a84333adeef2..80c08c7d5e72 100644
> --- a/arch/x86/include/asm/syscall_wrapper.h
> +++ b/arch/x86/include/asm/syscall_wrapper.h
> @@ -80,6 +80,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
> }
>
> #define __COND_SYSCALL(abi, name) \
> + __weak long __##abi##_##name(const struct pt_regs *__unused); \
> __weak long __##abi##_##name(const struct pt_regs *__unused) \
> { \
> return sys_ni_syscall(); \
> --
> 2.27.0
>
--
Best Regards
Masahiro Yamada
^ permalink raw reply
* [PATCH 20/27] sh: syscalls: switch to generic syscalltbl.sh
From: Masahiro Yamada @ 2021-01-28 0:51 UTC (permalink / raw)
To: linux-arch, x86
Cc: linux-xtensa, linux-ia64, linux-parisc, linux-kbuild,
Masahiro Yamada, linux-sh, linux-um, linux-kernel, linux-mips,
linux-m68k, linux-alpha, sparclinux, linuxppc-dev,
linux-arm-kernel
In-Reply-To: <20210128005110.2613902-1-masahiroy@kernel.org>
As of v5.11-rc1, 12 architectures duplicate similar shell scripts in
order to generate syscall table headers. My goal is to unify them into
the single scripts/syscalltbl.sh.
This commit converts sh to use scripts/syscalltbl.sh.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
arch/sh/kernel/syscalls/Makefile | 7 ++----
arch/sh/kernel/syscalls/syscalltbl.sh | 32 ---------------------------
2 files changed, 2 insertions(+), 37 deletions(-)
delete mode 100644 arch/sh/kernel/syscalls/syscalltbl.sh
diff --git a/arch/sh/kernel/syscalls/Makefile b/arch/sh/kernel/syscalls/Makefile
index 1c42d2d2926d..6610130c67bc 100644
--- a/arch/sh/kernel/syscalls/Makefile
+++ b/arch/sh/kernel/syscalls/Makefile
@@ -7,7 +7,7 @@ _dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \
syscall := $(srctree)/$(src)/syscall.tbl
syshdr := $(srctree)/$(src)/syscallhdr.sh
-systbl := $(srctree)/$(src)/syscalltbl.sh
+systbl := $(srctree)/scripts/syscalltbl.sh
quiet_cmd_syshdr = SYSHDR $@
cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
@@ -16,10 +16,7 @@ quiet_cmd_syshdr = SYSHDR $@
'$(syshdr_offset_$(basetarget))'
quiet_cmd_systbl = SYSTBL $@
- cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \
- '$(systbl_abis_$(basetarget))' \
- '$(systbl_abi_$(basetarget))' \
- '$(systbl_offset_$(basetarget))'
+ cmd_systbl = $(CONFIG_SHELL) $(systbl) $< $@
$(uapi)/unistd_32.h: $(syscall) $(syshdr) FORCE
$(call if_changed,syshdr)
diff --git a/arch/sh/kernel/syscalls/syscalltbl.sh b/arch/sh/kernel/syscalls/syscalltbl.sh
deleted file mode 100644
index 904b8e6e625d..000000000000
--- a/arch/sh/kernel/syscalls/syscalltbl.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0
-
-in="$1"
-out="$2"
-my_abis=`echo "($3)" | tr ',' '|'`
-my_abi="$4"
-offset="$5"
-
-emit() {
- t_nxt="$1"
- t_nr="$2"
- t_entry="$3"
-
- while [ $t_nxt -lt $t_nr ]; do
- printf "__SYSCALL(%s,sys_ni_syscall)\n" "${t_nxt}"
- t_nxt=$((t_nxt+1))
- done
- printf "__SYSCALL(%s,%s)\n" "${t_nxt}" "${t_entry}"
-}
-
-grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
- nxt=0
- if [ -z "$offset" ]; then
- offset=0
- fi
-
- while read nr abi name entry ; do
- emit $((nxt+offset)) $((nr+offset)) $entry
- nxt=$((nr+1))
- done
-) > "$out"
--
2.27.0
^ permalink raw reply related
* [PATCH 16/27] mips: syscalls: switch to generic syscalltbl.sh
From: Masahiro Yamada @ 2021-01-28 0:50 UTC (permalink / raw)
To: linux-arch, x86
Cc: linux-xtensa, linux-ia64, linux-parisc, linux-kbuild,
Masahiro Yamada, linux-sh, linux-um, linux-kernel, linux-mips,
linux-m68k, linux-alpha, sparclinux, linuxppc-dev,
linux-arm-kernel
In-Reply-To: <20210128005110.2613902-1-masahiroy@kernel.org>
As of v5.11-rc1, 12 architectures duplicate similar shell scripts in
order to generate syscall table headers. My goal is to unify them into
the single scripts/syscalltbl.sh.
This commit converts mips to use scripts/syscalltbl.sh. This also
unifies syscall_table_32_o32.h and syscall_table_64_o32.h into
syscall_table_o32.h.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
arch/mips/include/asm/Kbuild | 7 +++--
arch/mips/kernel/scall32-o32.S | 4 +--
arch/mips/kernel/scall64-n32.S | 3 +--
arch/mips/kernel/scall64-n64.S | 3 +--
arch/mips/kernel/scall64-o32.S | 4 +--
arch/mips/kernel/syscalls/Makefile | 34 ++++++++---------------
arch/mips/kernel/syscalls/syscalltbl.sh | 36 -------------------------
7 files changed, 20 insertions(+), 71 deletions(-)
delete mode 100644 arch/mips/kernel/syscalls/syscalltbl.sh
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 95b4fa7bd0d1..70b15857369d 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -1,9 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# MIPS headers
-generated-y += syscall_table_32_o32.h
-generated-y += syscall_table_64_n32.h
-generated-y += syscall_table_64_n64.h
-generated-y += syscall_table_64_o32.h
+generated-y += syscall_table_n32.h
+generated-y += syscall_table_n64.h
+generated-y += syscall_table_o32.h
generic-y += export.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index b449b68662a9..84e8624e83a2 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -217,9 +217,9 @@ einval: li v0, -ENOSYS
#define sys_sched_getaffinity mipsmt_sys_sched_getaffinity
#endif /* CONFIG_MIPS_MT_FPAFF */
+#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, native)
#define __SYSCALL(nr, entry) PTR entry
.align 2
.type sys_call_table, @object
EXPORT(sys_call_table)
-#include <asm/syscall_table_32_o32.h>
-#undef __SYSCALL
+#include <asm/syscall_table_o32.h>
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 35d8c86b160e..f650c55a17dc 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -104,5 +104,4 @@ not_n32_scall:
#define __SYSCALL(nr, entry) PTR entry
.type sysn32_call_table, @object
EXPORT(sysn32_call_table)
-#include <asm/syscall_table_64_n32.h>
-#undef __SYSCALL
+#include <asm/syscall_table_n32.h>
diff --git a/arch/mips/kernel/scall64-n64.S b/arch/mips/kernel/scall64-n64.S
index 23b2e2b1609c..c71c13f9fcbc 100644
--- a/arch/mips/kernel/scall64-n64.S
+++ b/arch/mips/kernel/scall64-n64.S
@@ -113,5 +113,4 @@ illegal_syscall:
.align 3
.type sys_call_table, @object
EXPORT(sys_call_table)
-#include <asm/syscall_table_64_n64.h>
-#undef __SYSCALL
+#include <asm/syscall_table_n64.h>
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 50c9a57e0d3a..cedc8bd88804 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -213,9 +213,9 @@ einval: li v0, -ENOSYS
jr ra
END(sys32_syscall)
+#define __SYSCALL_WITH_COMPAT(nr, native, compat) __SYSCALL(nr, compat)
#define __SYSCALL(nr, entry) PTR entry
.align 3
.type sys32_call_table,@object
EXPORT(sys32_call_table)
-#include <asm/syscall_table_64_o32.h>
-#undef __SYSCALL
+#include <asm/syscall_table_o32.h>
diff --git a/arch/mips/kernel/syscalls/Makefile b/arch/mips/kernel/syscalls/Makefile
index f15842bda464..265dab4253ab 100644
--- a/arch/mips/kernel/syscalls/Makefile
+++ b/arch/mips/kernel/syscalls/Makefile
@@ -10,7 +10,7 @@ syscalln64 := $(srctree)/$(src)/syscall_n64.tbl
syscallo32 := $(srctree)/$(src)/syscall_o32.tbl
syshdr := $(srctree)/$(src)/syscallhdr.sh
sysnr := $(srctree)/$(src)/syscallnr.sh
-systbl := $(srctree)/$(src)/syscalltbl.sh
+systbl := $(srctree)/scripts/syscalltbl.sh
quiet_cmd_syshdr = SYSHDR $@
cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
@@ -25,10 +25,7 @@ quiet_cmd_sysnr = SYSNR $@
'$(sysnr_offset_$(basetarget))'
quiet_cmd_systbl = SYSTBL $@
- cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \
- '$(systbl_abis_$(basetarget))' \
- '$(systbl_abi_$(basetarget))' \
- '$(systbl_offset_$(basetarget))'
+ cmd_systbl = $(CONFIG_SHELL) $(systbl) $< $@ "" $(offset)
syshdr_offset_unistd_n32 := __NR_Linux
$(uapi)/unistd_n32.h: $(syscalln32) $(syshdr) FORCE
@@ -57,24 +54,16 @@ sysnr_offset_unistd_nr_o32 := 4000
$(uapi)/unistd_nr_o32.h: $(syscallo32) $(sysnr) FORCE
$(call if_changed,sysnr)
-systbl_abi_syscall_table_32_o32 := 32_o32
-systbl_offset_syscall_table_32_o32 := 4000
-$(kapi)/syscall_table_32_o32.h: $(syscallo32) $(systbl) FORCE
+$(kapi)/syscall_table_n32.h: offset := 6000
+$(kapi)/syscall_table_n32.h: $(syscalln32) $(systbl) FORCE
$(call if_changed,systbl)
-systbl_abi_syscall_table_64_n32 := 64_n32
-systbl_offset_syscall_table_64_n32 := 6000
-$(kapi)/syscall_table_64_n32.h: $(syscalln32) $(systbl) FORCE
+$(kapi)/syscall_table_n64.h: offset := 5000
+$(kapi)/syscall_table_n64.h: $(syscalln64) $(systbl) FORCE
$(call if_changed,systbl)
-systbl_abi_syscall_table_64_n64 := 64_n64
-systbl_offset_syscall_table_64_n64 := 5000
-$(kapi)/syscall_table_64_n64.h: $(syscalln64) $(systbl) FORCE
- $(call if_changed,systbl)
-
-systbl_abi_syscall_table_64_o32 := 64_o32
-systbl_offset_syscall_table_64_o32 := 4000
-$(kapi)/syscall_table_64_o32.h: $(syscallo32) $(systbl) FORCE
+$(kapi)/syscall_table_o32.h: offset := 4000
+$(kapi)/syscall_table_o32.h: $(syscallo32) $(systbl) FORCE
$(call if_changed,systbl)
uapisyshdr-y += unistd_n32.h \
@@ -83,10 +72,9 @@ uapisyshdr-y += unistd_n32.h \
unistd_nr_n32.h \
unistd_nr_n64.h \
unistd_nr_o32.h
-kapisyshdr-y += syscall_table_32_o32.h \
- syscall_table_64_n32.h \
- syscall_table_64_n64.h \
- syscall_table_64_o32.h
+kapisyshdr-y += syscall_table_n32.h \
+ syscall_table_n64.h \
+ syscall_table_o32.h
uapisyshdr-y := $(addprefix $(uapi)/, $(uapisyshdr-y))
kapisyshdr-y := $(addprefix $(kapi)/, $(kapisyshdr-y))
diff --git a/arch/mips/kernel/syscalls/syscalltbl.sh b/arch/mips/kernel/syscalls/syscalltbl.sh
deleted file mode 100644
index 1e2570740c20..000000000000
--- a/arch/mips/kernel/syscalls/syscalltbl.sh
+++ /dev/null
@@ -1,36 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0
-
-in="$1"
-out="$2"
-my_abis=`echo "($3)" | tr ',' '|'`
-my_abi="$4"
-offset="$5"
-
-emit() {
- t_nxt="$1"
- t_nr="$2"
- t_entry="$3"
-
- while [ $t_nxt -lt $t_nr ]; do
- printf "__SYSCALL(%s,sys_ni_syscall)\n" "${t_nxt}"
- t_nxt=$((t_nxt+1))
- done
- printf "__SYSCALL(%s,%s)\n" "${t_nxt}" "${t_entry}"
-}
-
-grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
- nxt=0
- if [ -z "$offset" ]; then
- offset=0
- fi
-
- while read nr abi name entry compat ; do
- if [ "$my_abi" = "64_o32" ] && [ ! -z "$compat" ]; then
- emit $((nxt+offset)) $((nr+offset)) $compat
- else
- emit $((nxt+offset)) $((nr+offset)) $entry
- fi
- nxt=$((nr+1))
- done
-) > "$out"
--
2.27.0
^ permalink raw reply related
* [PATCH 15/27] mips: add missing FORCE and fix 'targets' to make if_changed work
From: Masahiro Yamada @ 2021-01-28 0:50 UTC (permalink / raw)
To: linux-arch, x86
Cc: linux-xtensa, linux-ia64, linux-parisc, linux-kbuild,
Masahiro Yamada, linux-sh, linux-um, linux-kernel, linux-mips,
linux-m68k, linux-alpha, sparclinux, linuxppc-dev,
linux-arm-kernel
In-Reply-To: <20210128005110.2613902-1-masahiroy@kernel.org>
The rules in this Makefile cannot detect the command line change because
the prerequisite 'FORCE' is missing.
Adding 'FORCE' will result in the headers being rebuilt every time
because the 'targets' addition is also wrong; the file paths in
'targets' must be relative to the current Makefile.
Fix all of them so the if_changed rules work correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---
arch/mips/kernel/syscalls/Makefile | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/arch/mips/kernel/syscalls/Makefile b/arch/mips/kernel/syscalls/Makefile
index 6efb2f6889a7..f15842bda464 100644
--- a/arch/mips/kernel/syscalls/Makefile
+++ b/arch/mips/kernel/syscalls/Makefile
@@ -31,50 +31,50 @@ quiet_cmd_systbl = SYSTBL $@
'$(systbl_offset_$(basetarget))'
syshdr_offset_unistd_n32 := __NR_Linux
-$(uapi)/unistd_n32.h: $(syscalln32) $(syshdr)
+$(uapi)/unistd_n32.h: $(syscalln32) $(syshdr) FORCE
$(call if_changed,syshdr)
syshdr_offset_unistd_n64 := __NR_Linux
-$(uapi)/unistd_n64.h: $(syscalln64) $(syshdr)
+$(uapi)/unistd_n64.h: $(syscalln64) $(syshdr) FORCE
$(call if_changed,syshdr)
syshdr_offset_unistd_o32 := __NR_Linux
-$(uapi)/unistd_o32.h: $(syscallo32) $(syshdr)
+$(uapi)/unistd_o32.h: $(syscallo32) $(syshdr) FORCE
$(call if_changed,syshdr)
sysnr_pfx_unistd_nr_n32 := N32
sysnr_offset_unistd_nr_n32 := 6000
-$(uapi)/unistd_nr_n32.h: $(syscalln32) $(sysnr)
+$(uapi)/unistd_nr_n32.h: $(syscalln32) $(sysnr) FORCE
$(call if_changed,sysnr)
sysnr_pfx_unistd_nr_n64 := 64
sysnr_offset_unistd_nr_n64 := 5000
-$(uapi)/unistd_nr_n64.h: $(syscalln64) $(sysnr)
+$(uapi)/unistd_nr_n64.h: $(syscalln64) $(sysnr) FORCE
$(call if_changed,sysnr)
sysnr_pfx_unistd_nr_o32 := O32
sysnr_offset_unistd_nr_o32 := 4000
-$(uapi)/unistd_nr_o32.h: $(syscallo32) $(sysnr)
+$(uapi)/unistd_nr_o32.h: $(syscallo32) $(sysnr) FORCE
$(call if_changed,sysnr)
systbl_abi_syscall_table_32_o32 := 32_o32
systbl_offset_syscall_table_32_o32 := 4000
-$(kapi)/syscall_table_32_o32.h: $(syscallo32) $(systbl)
+$(kapi)/syscall_table_32_o32.h: $(syscallo32) $(systbl) FORCE
$(call if_changed,systbl)
systbl_abi_syscall_table_64_n32 := 64_n32
systbl_offset_syscall_table_64_n32 := 6000
-$(kapi)/syscall_table_64_n32.h: $(syscalln32) $(systbl)
+$(kapi)/syscall_table_64_n32.h: $(syscalln32) $(systbl) FORCE
$(call if_changed,systbl)
systbl_abi_syscall_table_64_n64 := 64_n64
systbl_offset_syscall_table_64_n64 := 5000
-$(kapi)/syscall_table_64_n64.h: $(syscalln64) $(systbl)
+$(kapi)/syscall_table_64_n64.h: $(syscalln64) $(systbl) FORCE
$(call if_changed,systbl)
systbl_abi_syscall_table_64_o32 := 64_o32
systbl_offset_syscall_table_64_o32 := 4000
-$(kapi)/syscall_table_64_o32.h: $(syscallo32) $(systbl)
+$(kapi)/syscall_table_64_o32.h: $(syscallo32) $(systbl) FORCE
$(call if_changed,systbl)
uapisyshdr-y += unistd_n32.h \
@@ -88,9 +88,10 @@ kapisyshdr-y += syscall_table_32_o32.h \
syscall_table_64_n64.h \
syscall_table_64_o32.h
-targets += $(uapisyshdr-y) $(kapisyshdr-y)
+uapisyshdr-y := $(addprefix $(uapi)/, $(uapisyshdr-y))
+kapisyshdr-y := $(addprefix $(kapi)/, $(kapisyshdr-y))
+targets += $(addprefix ../../../../, $(uapisyshdr-y) $(kapisyshdr-y))
PHONY += all
-all: $(addprefix $(uapi)/,$(uapisyshdr-y))
-all: $(addprefix $(kapi)/,$(kapisyshdr-y))
+all: $(uapisyshdr-y) $(kapisyshdr-y)
@:
--
2.27.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox