* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
From: Michal Hocko @ 2019-04-25 12:14 UTC (permalink / raw)
To: Matthew Garrett; +Cc: linux-mm, linux-kernel, Matthew Garrett, linux-api
In-Reply-To: <20190424211038.204001-1-matthewgarrett@google.com>
Please cc linux-api for user visible API proposals (now done). Keep the
rest of the email intact for reference.
On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> From: Matthew Garrett <mjg59@google.com>
>
> Applications that hold secrets and wish to avoid them leaking can use
> mlock() to prevent the page from being pushed out to swap and
> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> can also use atexit() handlers to overwrite secrets on application exit.
> However, if an attacker can reboot the system into another OS, they can
> dump the contents of RAM and extract secrets. We can avoid this by setting
> CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> firmware wipe the contents of RAM before booting another OS, but this means
> rebooting takes a *long* time - the expected behaviour is for a clean
> shutdown to remove the request after scrubbing secrets from RAM in order to
> avoid this.
>
> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.
>
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
>
> Modified to wipe when the VMA is released rather than on page freeing
>
> include/linux/mm.h | 6 ++++++
> include/uapi/asm-generic/mman-common.h | 2 ++
> mm/madvise.c | 21 +++++++++++++++++++++
> mm/memory.c | 3 +++
> 4 files changed, 32 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6b10c21630f5..64bdab679275 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
> #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
> #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
> +
> +#define VM_WIPEONRELEASE BIT(37) /* Clear pages when releasing them */
> #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
>
> #ifdef CONFIG_ARCH_HAS_PKEYS
> @@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
> # define VM_GROWSUP VM_NONE
> #endif
>
> +#ifndef VM_WIPEONRELEASE
> +# define VM_WIPEONRELEASE VM_NONE
> +#endif
> +
> /* Bits set in the VMA until the stack is in its final location */
> #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
>
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index abd238d0f7a4..82dfff4a8e3d 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -64,6 +64,8 @@
> #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
> #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
>
> +#define MADV_WIPEONRELEASE 20
> +#define MADV_DONTWIPEONRELEASE 21
> /* compatibility flags */
> #define MAP_FILE 0
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 21a7881a2db4..989c2fde15cf 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> case MADV_KEEPONFORK:
> new_flags &= ~VM_WIPEONFORK;
> break;
> + case MADV_WIPEONRELEASE:
> + /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> + if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> + vma->vm_flags & VM_SHARED) {
> + error = -EINVAL;
> + goto out;
> + }
> + new_flags |= VM_WIPEONRELEASE;
> + break;
> + case MADV_DONTWIPEONRELEASE:
> + if (VM_WIPEONRELEASE == 0) {
> + error = -EINVAL;
> + goto out;
> + }
> + new_flags &= ~VM_WIPEONRELEASE;
> + break;
> case MADV_DONTDUMP:
> new_flags |= VM_DONTDUMP;
> break;
> @@ -727,6 +743,8 @@ madvise_behavior_valid(int behavior)
> case MADV_DODUMP:
> case MADV_WIPEONFORK:
> case MADV_KEEPONFORK:
> + case MADV_WIPEONRELEASE:
> + case MADV_DONTWIPEONRELEASE:
> #ifdef CONFIG_MEMORY_FAILURE
> case MADV_SOFT_OFFLINE:
> case MADV_HWPOISON:
> @@ -785,6 +803,9 @@ madvise_behavior_valid(int behavior)
> * MADV_DONTDUMP - the application wants to prevent pages in the given range
> * from being included in its core dump.
> * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
> + * MADV_WIPEONRELEASE - clear the contents of the memory after the last
> + * reference to it has been released
> + * MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
> *
> * return values:
> * zero - success
> diff --git a/mm/memory.c b/mm/memory.c
> index ab650c21bccd..ff78b527660e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> page_remove_rmap(page, false);
> if (unlikely(page_mapcount(page) < 0))
> print_bad_pte(vma, addr, ptent, page);
> + if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> + page_mapcount(page) == 0)
> + clear_highpage(page);
> if (unlikely(__tlb_remove_page(tlb, page))) {
> force_flush = 1;
> addr += PAGE_SIZE;
> --
> 2.21.0.593.g511ec345e18-goog
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [RFC PATCH v6 22/26] x86/cet/shstk: ELF header parsing of Shadow Stack
From: Dave Martin @ 2019-04-25 11:02 UTC (permalink / raw)
To: Yu-cheng Yu
Cc: x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
linux-doc, linux-mm, linux-arch, linux-api, Arnd Bergmann,
Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Dave Hansen,
Eugene Syromiatnikov, Florian Weimer, H.J. Lu, Jann Horn,
Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
Oleg Nesterov, Pa
In-Reply-To: <20181119214809.6086-23-yu-cheng.yu@intel.com>
On Mon, Nov 19, 2018 at 01:48:05PM -0800, Yu-cheng Yu wrote:
> Look in .note.gnu.property of an ELF file and check if Shadow Stack needs
> to be enabled for the task.
What's the status of this series? I don't see anything in linux-next
yet.
For describing ELF features, Arm has recently adopted
NT_GNU_PROPERTY_TYPE_0, with properties closely modelled on
GNU_PROPERTY_X86_FEATURE_1_AND etc. [1]
So, arm64 will be need something like this patch for supporting new
features (such as the Branch Target Identification feature of ARMv8.5-A
[2]).
If this series isn't likely to merge soon, can we split this patch into
generic and x86-specific parts and handle them separately?
It would be good to see the generic ELF note parsing move to common
code -- I'll take a look and comment in more detail.
[...]
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 69c0f892e310..557ed0ba71c7 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -381,4 +381,9 @@ struct va_alignment {
>
> extern struct va_alignment va_align;
> extern unsigned long align_vdso_addr(unsigned long);
> +
> +#ifdef CONFIG_ARCH_HAS_PROGRAM_PROPERTIES
> +extern int arch_setup_features(void *ehdr, void *phdr, struct file *file,
> + bool interp);
> +#endif
> #endif /* _ASM_X86_ELF_H */
> diff --git a/arch/x86/include/uapi/asm/elf_property.h b/arch/x86/include/uapi/asm/elf_property.h
> new file mode 100644
> index 000000000000..af361207718c
> --- /dev/null
> +++ b/arch/x86/include/uapi/asm/elf_property.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _UAPI_ASM_X86_ELF_PROPERTY_H
> +#define _UAPI_ASM_X86_ELF_PROPERTY_H
> +
> +/*
> + * pr_type
> + */
> +#define GNU_PROPERTY_X86_FEATURE_1_AND (0xc0000002)
> +
> +/*
> + * Bits for GNU_PROPERTY_X86_FEATURE_1_AND
> + */
> +#define GNU_PROPERTY_X86_FEATURE_1_SHSTK (0x00000002)
> +
Generally we seem to collect all ELF definitions in <linux/uapi/elf.h>,
including arch-specific ones.
Is a new header really needed here?
[...]
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 54207327f98f..007ff0fbae84 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1081,6 +1081,21 @@ static int load_elf_binary(struct linux_binprm *bprm)
> goto out_free_dentry;
> }
>
> +#ifdef CONFIG_ARCH_HAS_PROGRAM_PROPERTIES
> + if (interpreter) {
> + retval = arch_setup_features(&loc->interp_elf_ex,
> + interp_elf_phdata,
> + interpreter, true);
Can we dummy no-op functions in the common headers to avoid this
ifdeffery? Logically all arches will always do this step, even if it's
a no-op today.
> + } else {
> + retval = arch_setup_features(&loc->elf_ex,
> + elf_phdata,
> + bprm->file, false);
> + }
> +
> + if (retval < 0)
> + goto out_free_dentry;
> +#endif
> +
> if (elf_interpreter) {
> unsigned long interp_map_addr = 0;
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index c5358e0ae7c5..5ef25a565e88 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -372,6 +372,7 @@ typedef struct elf64_shdr {
> #define NT_PRFPREG 2
> #define NT_PRPSINFO 3
> #define NT_TASKSTRUCT 4
> +#define NT_GNU_PROPERTY_TYPE_0 5
IIUC, note type codes are namespaced by the note name. This section
currently only seems to have codes for name == "LINUX".
There are conflicts: for example NT_GNU_ABI_TAG == NT_PRSTATUS.
We should probably split out the codes for name == "GNU" into a separate
list, otherwise people are likely to get confused.
As noted above, can the GNU_PRPOERTY_<arch>_* definitions just go in
here instead of a separate header?
[...]
Cheers
---Dave
[1] https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q1-documentation
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Thomas Gleixner @ 2019-04-25 10:50 UTC (permalink / raw)
To: Enrico Weigelt, metux IT consult
Cc: Li, Aubrey, mingo, peterz, hpa, ak, tim.c.chen, dave.hansen,
arjan, adobriyan, akpm, aubrey.li, linux-api, linux-kernel,
Andy Lutomirski
In-Reply-To: <ee1620d5-9650-5bfa-c458-fa2d051522b1@metux.net>
On Thu, 25 Apr 2019, Enrico Weigelt, metux IT consult wrote:
> On 25.04.19 12:42, Li, Aubrey wrote:
> >
> > Yep, I'll make it disabled by default and not switchable and let arch select it.
> >
>
> That's not quite what I've suggested. Instead:
>
> #1: make the switch depend on the arch's that support it
No. That's what select is for.
> #2: still leave it selectable to the user, so somebody who doesn't need
> it, can just disable it.
Well, the number of knobs is exploding over time and the number of people
actually tweaking them is close to 0. So no, we don't want to have the
extra tunable for everything and the world.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Enrico Weigelt, metux IT consult @ 2019-04-25 10:46 UTC (permalink / raw)
To: Li, Aubrey, Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <4456cc68-5120-e820-90ed-cb5b5dfcd760@linux.intel.com>
On 25.04.19 12:42, Li, Aubrey wrote:
>
> Yep, I'll make it disabled by default and not switchable and let arch select it.
>
That's not quite what I've suggested. Instead:
#1: make the switch depend on the arch's that support it
#2: still leave it selectable to the user, so somebody who doesn't need
it, can just disable it.
(I, personally, tend to disable everything I don't need)
--mtx
--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Li, Aubrey @ 2019-04-25 10:42 UTC (permalink / raw)
To: Enrico Weigelt, metux IT consult, Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <6a9e6bd9-8e1a-9fe5-135d-c7a49697b5a4@metux.net>
On 2019/4/25 18:11, Enrico Weigelt, metux IT consult wrote:
> On 25.04.19 03:50, Li, Aubrey wrote:
>
>>>> +>>> +config PROC_PID_ARCH_STATUS>>> + bool "Enable
> /proc/<pid>/arch_status file">>>> Why is this switchable? x86 selects it
> if PROC_FS is enabled and all other>> architectures are absolutely not
> interested in this.> > Above and this, I was trying to avoid an empty
> arch_file on other architectures.> In previous proposal the entry only
> exists on the platform with AVX512.
> Why not making it depend on those archs that actually support it ?
Yep, I'll make it disabled by default and not switchable and let arch select it.
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Enrico Weigelt, metux IT consult @ 2019-04-25 10:40 UTC (permalink / raw)
To: Thomas Gleixner, Aubrey Li
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <alpine.DEB.2.21.1904242310040.1762@nanos.tec.linutronix.de>
On 24.04.19 23:18, Thomas Gleixner wrote:
Hi,
>> +config PROC_PID_ARCH_STATUS
>> + bool "Enable /proc/<pid>/arch_status file"
>
> Why is this switchable? x86 selects it if PROC_FS is enabled and all other
> architectures are absolutely not interested in this.
IMHO, it's good to have a switch, but that way doesn't make much sense.
Instead, I'd do it the other way round: make that switch depending on
those archs that actually support it. Something like this:
config PROC_PID_ARCH_STATUS
bool "Enable /proc/<pid>/arch_status file"
depends on PROC_FS
depends on BROKEN
When x86 comes in, it would change to:
config PROC_PID_ARCH_STATUS
bool "Enable /proc/<pid>/arch_status file"
depends on PROC_FS
depends on X86
And later arm coming in:
config PROC_PID_ARCH_STATUS
bool "Enable /proc/<pid>/arch_status file"
depends on PROC_FS
depends on X86 || ARM
>> + default n
>> + help
>> + Provides a way to examine process architecture specific information.
>> + See <file:Documentation/filesystems/proc.txt> for more information.
>
> Which contains zero information about this file when only this patch is
> applied.
hmm, the patch alone doesn't do anything useful anyway. it only becomes
useful with subsequent patches that add some arch. I wonder if there's
anything more useful to document at that point.
>> +/*
>> + * Add support for task architecture specific output in /proc/pid/arch_status.
>> + * task_arch_status() must be defined in asm/processor.h
>> + */
>> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
>> +# ifndef task_arch_status
>> +# define task_arch_status(m, task)
>> +# endif
>
> What exactly is the point of this macro mess? If an architecture selects
> CONFIG_PROC_PID_ARCH_STATUS then it has to provide proc_task_arch_status()
> and the prototype should be in include/linux/proc_fs.h.
ACK.
>> +static int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
>> + struct pid *pid, struct task_struct *task)
>> +{
>> + task_arch_status(m, task);
>> + return 0;
>> +}
Is that wrapper really neeeded ?
--mtx
--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Enrico Weigelt, metux IT consult @ 2019-04-25 10:11 UTC (permalink / raw)
To: Li, Aubrey, Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <da1fcb4f-e856-d31f-7c72-76c969af3935@linux.intel.com>
On 25.04.19 03:50, Li, Aubrey wrote:
>>> +>>> +config PROC_PID_ARCH_STATUS>>> + bool "Enable
/proc/<pid>/arch_status file">>>> Why is this switchable? x86 selects it
if PROC_FS is enabled and all other>> architectures are absolutely not
interested in this.> > Above and this, I was trying to avoid an empty
arch_file on other architectures.> In previous proposal the entry only
exists on the platform with AVX512.
Why not making it depend on those archs that actually support it ?
> In that way proc_task_arch_status() should be defined in asm/processor.h,> but proc_task_arch_status() has four parameters, I don't want
unnecessary> "struct pid_namespace *ns" and "struct pid *pid" leaked
into arch headers,> so I defined task_arch_status(m, task) to avoid that.
--mtx
--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Li, Aubrey @ 2019-04-25 8:24 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <alpine.DEB.2.21.1904251019520.1960@nanos.tec.linutronix.de>
On 2019/4/25 16:20, Thomas Gleixner wrote:
> On Thu, 25 Apr 2019, Li, Aubrey wrote:
>> On 2019/4/25 15:20, Thomas Gleixner wrote:
>>> Let the arch select CONFIG_PROC_PID_ARCH_STATUS
>>
>> Sorry, I didn't get the point here, above you mentioned not mixing arch and proc code
>> and not enabling this on x86 right away, then how to let x86 select it?
>>
>
> By using 'select PROC_PID_ARCH_STATUS' in the arch/xxxx/Kconfig file in the
> patch which implements the required arch function perhaps?
Oh,oh, I misunderstood, thanks for the clarification!
Thanks,
-Aubrey
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Thomas Gleixner @ 2019-04-25 8:20 UTC (permalink / raw)
To: Li, Aubrey
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <1d7954b3-32e9-b3ac-1157-6b9c29ddbfe7@linux.intel.com>
On Thu, 25 Apr 2019, Li, Aubrey wrote:
> On 2019/4/25 15:20, Thomas Gleixner wrote:
> > Let the arch select CONFIG_PROC_PID_ARCH_STATUS
>
> Sorry, I didn't get the point here, above you mentioned not mixing arch and proc code
> and not enabling this on x86 right away, then how to let x86 select it?
>
By using 'select PROC_PID_ARCH_STATUS' in the arch/xxxx/Kconfig file in the
patch which implements the required arch function perhaps?
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Li, Aubrey @ 2019-04-25 8:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <alpine.DEB.2.21.1904250917400.1762@nanos.tec.linutronix.de>
On 2019/4/25 15:20, Thomas Gleixner wrote:
> On Thu, 25 Apr 2019, Li, Aubrey wrote:
>
>> On 2019/4/25 5:18, Thomas Gleixner wrote:
>>> On Mon, 22 Apr 2019, Aubrey Li wrote:
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 5ad92419be19..d5a9c5ddd453 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -208,6 +208,7 @@ config X86
>>>> select USER_STACKTRACE_SUPPORT
>>>> select VIRT_TO_BUS
>>>> select X86_FEATURE_NAMES if PROC_FS
>>>> + select PROC_PID_ARCH_STATUS if PROC_FS
>>>
>>> Can you please stop mixing arch and proc code? There is no point in
>>> enabling this on x86 right away.
>>>
>>>> +
>>>> +config PROC_PID_ARCH_STATUS
>>>> + bool "Enable /proc/<pid>/arch_status file"
>>>
>>> Why is this switchable? x86 selects it if PROC_FS is enabled and all other
>>> architectures are absolutely not interested in this.
>>
>> Above and this, I was trying to avoid an empty arch_file on other architectures.
>> In previous proposal the entry only exists on the platform with AVX512.
>
> What's the benefit of having a conditional enabled empty file for all other
> architectures? Nothing AFAICT.
>
>>>> +/*
>>>> + * Add support for task architecture specific output in /proc/pid/arch_status.
>>>> + * task_arch_status() must be defined in asm/processor.h
>>>> + */
>>>> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
>>>> +# ifndef task_arch_status
>>>> +# define task_arch_status(m, task)
>>>> +# endif
>>>
>>> What exactly is the point of this macro mess? If an architecture selects
>>> CONFIG_PROC_PID_ARCH_STATUS then it has to provide proc_task_arch_status()
>>> and the prototype should be in include/linux/proc_fs.h.
>>
>> I was trying to address Andy's last comments. If we have the prototype in
>> include/linux/proc_fs.h, we'll have a weak function definition in fs/proc/array.c,
>> which bloats other architectures.
>>
>> In that way proc_task_arch_status() should be defined in asm/processor.h,
>> but proc_task_arch_status() has four parameters, I don't want unnecessary
>> "struct pid_namespace *ns" and "struct pid *pid" leaked into arch headers,
>> so I defined task_arch_status(m, task) to avoid that.
>
> This #define mess is ugly and pointless.
> Let the arch select CONFIG_PROC_PID_ARCH_STATUS
Sorry, I didn't get the point here, above you mentioned not mixing arch and proc code
and not enabling this on x86 right away, then how to let x86 select it?
Thanks,
-Aubrey
>and if it selects it it has to provide the function. No waek function required at all.
>
> Thanks,
>
> tglx
>
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Thomas Gleixner @ 2019-04-25 7:20 UTC (permalink / raw)
To: Li, Aubrey
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <da1fcb4f-e856-d31f-7c72-76c969af3935@linux.intel.com>
On Thu, 25 Apr 2019, Li, Aubrey wrote:
> On 2019/4/25 5:18, Thomas Gleixner wrote:
> > On Mon, 22 Apr 2019, Aubrey Li wrote:
> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> >> index 5ad92419be19..d5a9c5ddd453 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -208,6 +208,7 @@ config X86
> >> select USER_STACKTRACE_SUPPORT
> >> select VIRT_TO_BUS
> >> select X86_FEATURE_NAMES if PROC_FS
> >> + select PROC_PID_ARCH_STATUS if PROC_FS
> >
> > Can you please stop mixing arch and proc code? There is no point in
> > enabling this on x86 right away.
> >
> >> +
> >> +config PROC_PID_ARCH_STATUS
> >> + bool "Enable /proc/<pid>/arch_status file"
> >
> > Why is this switchable? x86 selects it if PROC_FS is enabled and all other
> > architectures are absolutely not interested in this.
>
> Above and this, I was trying to avoid an empty arch_file on other architectures.
> In previous proposal the entry only exists on the platform with AVX512.
What's the benefit of having a conditional enabled empty file for all other
architectures? Nothing AFAICT.
> >> +/*
> >> + * Add support for task architecture specific output in /proc/pid/arch_status.
> >> + * task_arch_status() must be defined in asm/processor.h
> >> + */
> >> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
> >> +# ifndef task_arch_status
> >> +# define task_arch_status(m, task)
> >> +# endif
> >
> > What exactly is the point of this macro mess? If an architecture selects
> > CONFIG_PROC_PID_ARCH_STATUS then it has to provide proc_task_arch_status()
> > and the prototype should be in include/linux/proc_fs.h.
>
> I was trying to address Andy's last comments. If we have the prototype in
> include/linux/proc_fs.h, we'll have a weak function definition in fs/proc/array.c,
> which bloats other architectures.
>
> In that way proc_task_arch_status() should be defined in asm/processor.h,
> but proc_task_arch_status() has four parameters, I don't want unnecessary
> "struct pid_namespace *ns" and "struct pid *pid" leaked into arch headers,
> so I defined task_arch_status(m, task) to avoid that.
This #define mess is ugly and pointless. Let the arch select
CONFIG_PROC_PID_ARCH_STATUS and if it selects it it has to provide the
function. No waek function required at all.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Li, Aubrey @ 2019-04-25 1:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <alpine.DEB.2.21.1904242310040.1762@nanos.tec.linutronix.de>
On 2019/4/25 5:18, Thomas Gleixner wrote:
> On Mon, 22 Apr 2019, Aubrey Li wrote:
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 5ad92419be19..d5a9c5ddd453 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -208,6 +208,7 @@ config X86
>> select USER_STACKTRACE_SUPPORT
>> select VIRT_TO_BUS
>> select X86_FEATURE_NAMES if PROC_FS
>> + select PROC_PID_ARCH_STATUS if PROC_FS
>
> Can you please stop mixing arch and proc code? There is no point in
> enabling this on x86 right away.
>
>> +
>> +config PROC_PID_ARCH_STATUS
>> + bool "Enable /proc/<pid>/arch_status file"
>
> Why is this switchable? x86 selects it if PROC_FS is enabled and all other
> architectures are absolutely not interested in this.
Above and this, I was trying to avoid an empty arch_file on other architectures.
In previous proposal the entry only exists on the platform with AVX512.
>
>> + default n
>> + help
>> + Provides a way to examine process architecture specific information.
>> + See <file:Documentation/filesystems/proc.txt> for more information.
>
> Which contains zero information about this file when only this patch is
> applied.
>
>> @@ -94,6 +94,7 @@
>> #include <linux/sched/debug.h>
>> #include <linux/sched/stat.h>
>> #include <linux/posix-timers.h>
>> +#include <linux/processor.h>
>
> That include is required because it does NOT contain anything useful for
> this, right?
>
>> +/*
>> + * Add support for task architecture specific output in /proc/pid/arch_status.
>> + * task_arch_status() must be defined in asm/processor.h
>> + */
>> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
>> +# ifndef task_arch_status
>> +# define task_arch_status(m, task)
>> +# endif
>
> What exactly is the point of this macro mess? If an architecture selects
> CONFIG_PROC_PID_ARCH_STATUS then it has to provide proc_task_arch_status()
> and the prototype should be in include/linux/proc_fs.h.
I was trying to address Andy's last comments. If we have the prototype in
include/linux/proc_fs.h, we'll have a weak function definition in fs/proc/array.c,
which bloats other architectures.
In that way proc_task_arch_status() should be defined in asm/processor.h,
but proc_task_arch_status() has four parameters, I don't want unnecessary
"struct pid_namespace *ns" and "struct pid *pid" leaked into arch headers,
so I defined task_arch_status(m, task) to avoid that.
>
>> +static int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
>> + struct pid *pid, struct task_struct *task)
>> +{
>> + task_arch_status(m, task);
>> + return 0;
>> +}
>> +#endif /* CONFIG_PROC_PID_ARCH_STATUS */
>
> Thanks,
>
> tglx
>
Thanks,
-Aubrey
^ permalink raw reply
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
From: Maciej W. Rozycki @ 2019-04-25 0:41 UTC (permalink / raw)
To: Paul Burton
Cc: Mathieu Desnoyers, Carlos O'Donell, Will Deacon, Boqun Feng,
heiko carstens, gor, schwidefsky, Russell King, ARM Linux,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos,
Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
Thomas Gleixner, Ben Maurer, Peter Zijlstra,
"Paul E. McKenney" <paulmc>
In-Reply-To: <20190424231303.zu2irxd5g3v7yqey@pburton-laptop>
On Wed, 24 Apr 2019, Paul Burton wrote:
> > > Any idea why 0x7273 is not accepted by my assembler ?
>
> I don't know why the assembler wants a smaller immediate than the
> instruction encoding allows... There's a comment in the binutils file
> include/opcode/mips.h that reads:
>
> > A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> > breakpoint instruction are not defined; Kane says the breakpoint code
> > field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> > ten bits). An optional two-operand form of break/sdbbp allows the
> > lower ten bits to be set too, and MIPS32 and later architectures allow
> > 20 bits to be set with a signal operand (using CODE20).
>
> I suspect there's some history here that predates my involvement (or
> possibly just predates me).
A useful explanation is in the Linux kernel (always good to look there),
in arch/mips/kernel/traps.c:
/*
* There is the ancient bug in the MIPS assemblers that the break
* code starts left to bit 16 instead to bit 6 in the opcode.
* Gas is bug-compatible, but not always, grrr...
* We handle both cases with a simple heuristics. --macro
*/
Unfortunately the bug has been carried over to the microMIPS instruction
encoding in libopcodes for no reason (i.e. likely by copying the table
mechanically without analysing it) and I didn't catch it when upstreaming.
We should have permitted setting all bits in the 20-bit code field in the
microMIPS encoding with a single operand, but you need two, like with the
regular MIPS instruction set.
The note on the MIPS32 assembly ISA permitting to set all the 20 bits
with a single operand is a stale comment referring to the situation before
binutils commit 1586d91e32ea ("/ 0 should send SIGFPE not SIGTRAP..."),
<https://sourceware.org/ml/binutils/2004-07/msg00260.html>, which
addressed a user ABI compatibility issue as discussed upthread here:
<https://sourceware.org/ml/binutils/2004-06/msg00188.html> and previously:
<https://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=40C9F5A4.2050606%40avtrex.com>.
As this is my mistake with the stale note, I have applied a fix to
binutils now, commit cd0923370be1 ("MIPS/include: opcode/mips.h: Update
stale comment for CODE20 operand"), so that it is clear that it is only
SDBBP that accepts a single 20-bit operand for the code field (for the
MIPS32 and later ISAs).
FWIW,
Maciej
^ permalink raw reply
* Re: [RFC PATCH for 5.2 10/10] rseq/selftests: mips: use break instruction for RSEQ_SIG
From: Mathieu Desnoyers @ 2019-04-24 23:22 UTC (permalink / raw)
To: Paul Burton
Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, linux-kernel,
linux-api, Thomas Gleixner, Andy Lutomirski, Dave Watson,
Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
Josh Triplett, Linus Torvalds, Catalin Marinas, Will
In-Reply-To: <20190424231717.os4p6fq7fbx6afxa@pburton-laptop>
----- On Apr 24, 2019, at 7:17 PM, Paul Burton paul.burton@mips.com wrote:
> Hi Mathieu,
>
> On Wed, Apr 24, 2019 at 07:12:03PM -0400, Mathieu Desnoyers wrote:
>> Does the following comment above the forest of #ifdef work for you ?
>>
>> /*
>> * RSEQ_SIG uses the break instruction. The instruction pattern is:
>> *
>> * On MIPS:
>> * 0350000d break 0x350
>> *
>> * On nanoMIPS32:
>> * 00100350 break 0x350
>> *
>> * On microMIPS:
>> * 0000d407 break 0x350
>> *
>> * For nanoMIPS32 and microMIPS, the instruction stream is encoded as 16-bit
>> * halfwords, so the signature halfwords need to be swapped accordingly for
>> * little-endian.
>> */
>
> I'd probably just say nanoMIPS rather than nanoMIPS32, because when we
> get nanoMIPS64 in the future it'll be a superset of nanoMIPS32 & the
> break encoding will be the same.
Done.
>
> But otherwise it looks good to me :)
Great! I've added your "Suggested-by" tag to the patch.
Thanks,
Mathieu
>
> Thanks,
> Paul
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [RFC PATCH for 5.2 10/10] rseq/selftests: mips: use break instruction for RSEQ_SIG
From: Paul Burton @ 2019-04-24 23:17 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, linux-kernel,
linux-api, Thomas Gleixner, Andy Lutomirski, Dave Watson,
Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
Josh Triplett, Linus Torvalds, Catalin Marinas, Will
In-Reply-To: <2114325810.1301.1556147523476.JavaMail.zimbra@efficios.com>
Hi Mathieu,
On Wed, Apr 24, 2019 at 07:12:03PM -0400, Mathieu Desnoyers wrote:
> Does the following comment above the forest of #ifdef work for you ?
>
> /*
> * RSEQ_SIG uses the break instruction. The instruction pattern is:
> *
> * On MIPS:
> * 0350000d break 0x350
> *
> * On nanoMIPS32:
> * 00100350 break 0x350
> *
> * On microMIPS:
> * 0000d407 break 0x350
> *
> * For nanoMIPS32 and microMIPS, the instruction stream is encoded as 16-bit
> * halfwords, so the signature halfwords need to be swapped accordingly for
> * little-endian.
> */
I'd probably just say nanoMIPS rather than nanoMIPS32, because when we
get nanoMIPS64 in the future it'll be a superset of nanoMIPS32 & the
break encoding will be the same.
But otherwise it looks good to me :)
Thanks,
Paul
^ permalink raw reply
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7)
From: Paul Burton @ 2019-04-24 23:13 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens, gor,
schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, carlos, Florian Weimer,
Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner,
Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson
In-Reply-To: <1103046939.521.1556118342613.JavaMail.zimbra@efficios.com>
Hi Mathieu,
Just following up on a couple of things here.
On Wed, Apr 24, 2019 at 11:05:42AM -0400, Mathieu Desnoyers wrote:
> >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon)
> >>
> >> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS
> >> & classic MIPS) so that could be something like:
> >>
> >> break 0x7273 # ASCII 'rs'
> >>
> >
> > I like this uncommon break instruction as signature choice.
> >
> > However, if I try to compile assembler with a break 0x7273 instruction
> > with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu
> > 8.2.0-1ubuntu2~18.04))
> > I get:
> >
> > /tmp/ccVh9F7T.s: Assembler messages:
> > /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273'
> >
> > It works up to the value 0x3FF, which seems to use the top 10
> > code bits:
> >
> > a: 03ff 0007 break 0x3ff
> >
> > Would a "break 0x350" be a good choice as well ?
The "break 0x350" instruction seems good to me - it's still going to be
rare.
> > Any idea why 0x7273 is not accepted by my assembler ?
I don't know why the assembler wants a smaller immediate than the
instruction encoding allows... There's a comment in the binutils file
include/opcode/mips.h that reads:
> A breakpoint instruction uses OP, CODE and SPEC (10 bits of the
> breakpoint instruction are not defined; Kane says the breakpoint code
> field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use
> ten bits). An optional two-operand form of break/sdbbp allows the
> lower ten bits to be set too, and MIPS32 and later architectures allow
> 20 bits to be set with a signal operand (using CODE20).
I suspect there's some history here that predates my involvement (or
possibly just predates me).
> > I also tried crafting the assembler with values between 0x3FF and 0x7273
> > in the 20 code bits. It seems fine from an objdump perspective:
> >
> > ".long 0x03FFFC7\n\t"
> >
> > generates:
> >
> > 10: 003f ffc7 break 0x3f,0x3ff
> >
> > What I don't understand is why the instruction generated by my
> > toolchain ends with the last 6 bits "000111", whereas the mips32
> > instruction set specifies break as ending with "001101" [1].
> > What am I missing ?
Were you targeting microMIPS by any chance? There the break32
instructions ends with 000111.
> > Also, the nanomips break code [2] has a completely different
> > instruction layout. Should we use a different signature when
> > compiling for nanomips ? What #ifdef should we use ?
Yes, and __nanomips__. I included the encoding in my reply to your RFC
patch.
> > Do I need a special toolchain to generate nanomips binaries ?
Yes, you can find it here:
https://codescape.mips.com/components/toolchain/nanomips/latest/index.html
We don't have the nanoMIPS kernel support in mainline yet, I've gotten
various things applied in preparation but also been swamped with other
things so it's taking a while. If you want to see a working downstream
kernel though you can find it here:
git://git.linux-mips.org/pub/scm/linux-mti.git nanomips-v4.15
> Hi Paul, I'm still waiting for feedback on the MIPS front.
>
> Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to:
>
> 0350000d break 0x350
>
> and use RSEQ_SIG in assembly with:
>
> ".word " __rseq_str(RSEQ_SIG) "\n\t"
>
> on big and little endian MIPS, for MIPS32 and MIPS64, based on
> code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04).
>
> Let me know if it needs to be tweaked.
That's fine for the classic MIPS ISA, but won't decode as a break for
microMIPS or nanoMIPS. See my reply to your RFC for valid encodings for
both of those.
Thanks,
Paul
^ permalink raw reply
* Re: [RFC PATCH for 5.2 10/10] rseq/selftests: mips: use break instruction for RSEQ_SIG
From: Mathieu Desnoyers @ 2019-04-24 23:12 UTC (permalink / raw)
To: Paul Burton
Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, linux-kernel,
linux-api, Thomas Gleixner, Andy Lutomirski, Dave Watson,
Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
H. Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
Josh Triplett, Linus Torvalds, Catalin Marinas, Will
In-Reply-To: <20190424220609.4kryfcgsv46iu3ds@pburton-laptop>
----- On Apr 24, 2019, at 6:06 PM, Paul Burton paul.burton@mips.com wrote:
> Hi Mathieu,
>
> On Wed, Apr 24, 2019 at 11:25:02AM -0400, Mathieu Desnoyers wrote:
>> diff --git a/tools/testing/selftests/rseq/rseq-mips.h
>> b/tools/testing/selftests/rseq/rseq-mips.h
>> index fe3eabcdcbe5..eb53a6adfbbb 100644
>> --- a/tools/testing/selftests/rseq/rseq-mips.h
>> +++ b/tools/testing/selftests/rseq/rseq-mips.h
>> @@ -7,7 +7,11 @@
>> * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> */
>>
>> -#define RSEQ_SIG 0x53053053
>> +/*
>> + * RSEQ_SIG uses the break instruction. The instruction pattern is
>> + * 0350000d break 0x350
>> + */
>> +#define RSEQ_SIG 0x0350000d
>
> My apologies for taking a while to get back to you on the various ISAs &
> endian issues here, but I think we'll want this to be something like:
>
> #if defined(__nanomips__)
> # ifdef __MIPSEL__
> # define RSEQ_SIG 0x03500010
> # else
> # define RSEQ_SIG 0x00100350
> # endif
> #elif defined(__mips_micromips)
> # ifdef __MIPSEL__
> # define RSEQ_SIG 0xd4070000
> # else
> # define RSEQ_SIG 0x0000d407
> # endif
> #else
> # define RSEQ_SIG 0x0350000d
> #endif
>
> For plain old MIPS the .word directive will be fine endian-wise, but for
> microMIPS & nanoMIPS we need to take into account that the instruction
> stream is encoded as 16b halfwords & swap those accordingly for little
> endian.
Hi Paul,
Thanks for looking into it!
Does the following comment above the forest of #ifdef work for you ?
/*
* RSEQ_SIG uses the break instruction. The instruction pattern is:
*
* On MIPS:
* 0350000d break 0x350
*
* On nanoMIPS32:
* 00100350 break 0x350
*
* On microMIPS:
* 0000d407 break 0x350
*
* For nanoMIPS32 and microMIPS, the instruction stream is encoded as 16-bit
* halfwords, so the signature halfwords need to be swapped accordingly for
* little-endian.
*/
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [RFC PATCH for 5.2 10/10] rseq/selftests: mips: use break instruction for RSEQ_SIG
From: Paul Burton @ 2019-04-24 22:06 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
Thomas Gleixner, Andy Lutomirski, Dave Watson, Paul Turner,
Andrew Morton, Russell King, Ingo Molnar, H . Peter Anvin,
Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
Josh Triplett, Linus Torvalds, Catalin
In-Reply-To: <20190424152502.14246-11-mathieu.desnoyers@efficios.com>
Hi Mathieu,
On Wed, Apr 24, 2019 at 11:25:02AM -0400, Mathieu Desnoyers wrote:
> diff --git a/tools/testing/selftests/rseq/rseq-mips.h b/tools/testing/selftests/rseq/rseq-mips.h
> index fe3eabcdcbe5..eb53a6adfbbb 100644
> --- a/tools/testing/selftests/rseq/rseq-mips.h
> +++ b/tools/testing/selftests/rseq/rseq-mips.h
> @@ -7,7 +7,11 @@
> * (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> */
>
> -#define RSEQ_SIG 0x53053053
> +/*
> + * RSEQ_SIG uses the break instruction. The instruction pattern is
> + * 0350000d break 0x350
> + */
> +#define RSEQ_SIG 0x0350000d
My apologies for taking a while to get back to you on the various ISAs &
endian issues here, but I think we'll want this to be something like:
#if defined(__nanomips__)
# ifdef __MIPSEL__
# define RSEQ_SIG 0x03500010
# else
# define RSEQ_SIG 0x00100350
# endif
#elif defined(__mips_micromips)
# ifdef __MIPSEL__
# define RSEQ_SIG 0xd4070000
# else
# define RSEQ_SIG 0x0000d407
# endif
#else
# define RSEQ_SIG 0x0350000d
#endif
For plain old MIPS the .word directive will be fine endian-wise, but for
microMIPS & nanoMIPS we need to take into account that the instruction
stream is encoded as 16b halfwords & swap those accordingly for little
endian.
Thanks,
Paul
^ permalink raw reply
* Re: [PATCH v17 3/3] Documentation/filesystems/proc.txt: add arch_status file
From: Thomas Gleixner @ 2019-04-24 21:27 UTC (permalink / raw)
To: Aubrey Li
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <20190421183529.9141-3-aubrey.li@linux.intel.com>
On Mon, 22 Apr 2019, Aubrey Li wrote:
> +3.12 /proc/<pid>/arch_status - task architecture specific status
> +-------------------------------------------------------------------
> +When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
> +architecture specific status of the task.
> +
> +Example
> +-------
> + $ cat /proc/6753/arch_status
> + AVX512_elapsed_ms: 8
Examples are nice, but as this is architecture specific this section needs
to list the entries which are exposed by a particular architecture.
x86 specific entries:
---------------------
AVX512_elapsed_ms:
------------------
If AVX512 is supported ....
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH v17 2/3] /proc/pid/arch_status: Add AVX-512 usage elapsed time
From: Thomas Gleixner @ 2019-04-24 21:19 UTC (permalink / raw)
To: Aubrey Li
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <20190421183529.9141-2-aubrey.li@linux.intel.com>
On Mon, 22 Apr 2019, Aubrey Li wrote:
$Subject: /proc/pid/arch_status: Add AVX-512 usage elapsed time
x86 patches surely have a different prefix.
>
> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
> +/* Add support for task architecture specific output in /proc/pid/arch_status */
> +void task_arch_status(struct seq_file *m, struct task_struct *task);
> +#define task_arch_status task_arch_status
> +#endif /* CONFIG_PROC_PID_ARCH_STATUS */
See previous reply.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH v17 1/3] proc: add /proc/<pid>/arch_status
From: Thomas Gleixner @ 2019-04-24 21:18 UTC (permalink / raw)
To: Aubrey Li
Cc: mingo, peterz, hpa, ak, tim.c.chen, dave.hansen, arjan, adobriyan,
akpm, aubrey.li, linux-api, linux-kernel, Andy Lutomirski
In-Reply-To: <20190421183529.9141-1-aubrey.li@linux.intel.com>
On Mon, 22 Apr 2019, Aubrey Li wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5ad92419be19..d5a9c5ddd453 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -208,6 +208,7 @@ config X86
> select USER_STACKTRACE_SUPPORT
> select VIRT_TO_BUS
> select X86_FEATURE_NAMES if PROC_FS
> + select PROC_PID_ARCH_STATUS if PROC_FS
Can you please stop mixing arch and proc code? There is no point in
enabling this on x86 right away.
> +
> +config PROC_PID_ARCH_STATUS
> + bool "Enable /proc/<pid>/arch_status file"
Why is this switchable? x86 selects it if PROC_FS is enabled and all other
architectures are absolutely not interested in this.
> + default n
> + help
> + Provides a way to examine process architecture specific information.
> + See <file:Documentation/filesystems/proc.txt> for more information.
Which contains zero information about this file when only this patch is
applied.
> @@ -94,6 +94,7 @@
> #include <linux/sched/debug.h>
> #include <linux/sched/stat.h>
> #include <linux/posix-timers.h>
> +#include <linux/processor.h>
That include is required because it does NOT contain anything useful for
this, right?
> +/*
> + * Add support for task architecture specific output in /proc/pid/arch_status.
> + * task_arch_status() must be defined in asm/processor.h
> + */
> +#ifdef CONFIG_PROC_PID_ARCH_STATUS
> +# ifndef task_arch_status
> +# define task_arch_status(m, task)
> +# endif
What exactly is the point of this macro mess? If an architecture selects
CONFIG_PROC_PID_ARCH_STATUS then it has to provide proc_task_arch_status()
and the prototype should be in include/linux/proc_fs.h.
> +static int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
> + struct pid *pid, struct task_struct *task)
> +{
> + task_arch_status(m, task);
> + return 0;
> +}
> +#endif /* CONFIG_PROC_PID_ARCH_STATUS */
Thanks,
tglx
^ permalink raw reply
* Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment
From: Mathieu Desnoyers @ 2019-04-24 17:02 UTC (permalink / raw)
To: Mark Rutland
Cc: Will Deacon, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
linux-kernel, linux-api, Thomas Gleixner, Andy Lutomirski,
Dave Watson, Paul Turner, Andrew Morton, Russell King,
Ingo Molnar, H. Peter Anvin, Andi Kleen, Chris Lameter,
Ben Maurer, rostedt, Josh Triplett, Linus Torvalds
In-Reply-To: <697611694.604.1556125254852.JavaMail.zimbra@efficios.com>
----- On Apr 24, 2019, at 1:00 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> ----- On Apr 24, 2019, at 12:51 PM, Mark Rutland mark.rutland@arm.com wrote:
>
>> On Wed, Apr 24, 2019 at 05:45:38PM +0100, Will Deacon wrote:
>>> On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
>>> > +/*
>>> > + * aarch64 -mbig-endian generates mixed endianness code vs data:
>>> > + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>>> > + * matches code endianness.
>>> > + */
>>> > +#define RSEQ_SIG_CODE 0xd428bc00 /* BRK #0x45E0. */
>>> > +
>>> > +#ifdef __AARCH64EB__
>>> > +#define RSEQ_SIG_DATA 0x00bc28d4 /* BRK #0x45E0. */
>>>
>>> It would be neater to implement swab32 and use that with RSEQ_SIG_CODE,
>>
>> If possible, marginally neater than that would be using
>> le32_to_cpu(RSEQ_SIG_CODE), without any ifdeffery necessary.
>>
>> It looks like that's defined in tools/include/linux/kernel.h, but I'm
>> not sure if that gets pulled into your include path.
>
> Considering that those RSEQ_SIG* define will end up in public bits/rseq.h
> headers within glibc, I'm tempted to keep the amount of dependencies on
> external headers to a minimum, if it's OK with you.
Also, I'm not sure the le32_to_cpu() approach would work in __ASSEMBLER__
builds.
Thanks,
Mathieu
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment
From: Mathieu Desnoyers @ 2019-04-24 17:00 UTC (permalink / raw)
To: Mark Rutland
Cc: Will Deacon, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
linux-kernel, linux-api, Thomas Gleixner, Andy Lutomirski,
Dave Watson, Paul Turner, Andrew Morton, Russell King,
Ingo Molnar, H. Peter Anvin, Andi Kleen, Chris Lameter,
Ben Maurer, rostedt, Josh Triplett, Linus Torvalds
In-Reply-To: <20190424165150.GF21101@lakrids.cambridge.arm.com>
----- On Apr 24, 2019, at 12:51 PM, Mark Rutland mark.rutland@arm.com wrote:
> On Wed, Apr 24, 2019 at 05:45:38PM +0100, Will Deacon wrote:
>> On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
>> > +/*
>> > + * aarch64 -mbig-endian generates mixed endianness code vs data:
>> > + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>> > + * matches code endianness.
>> > + */
>> > +#define RSEQ_SIG_CODE 0xd428bc00 /* BRK #0x45E0. */
>> > +
>> > +#ifdef __AARCH64EB__
>> > +#define RSEQ_SIG_DATA 0x00bc28d4 /* BRK #0x45E0. */
>>
>> It would be neater to implement swab32 and use that with RSEQ_SIG_CODE,
>
> If possible, marginally neater than that would be using
> le32_to_cpu(RSEQ_SIG_CODE), without any ifdeffery necessary.
>
> It looks like that's defined in tools/include/linux/kernel.h, but I'm
> not sure if that gets pulled into your include path.
Considering that those RSEQ_SIG* define will end up in public bits/rseq.h
headers within glibc, I'm tempted to keep the amount of dependencies on
external headers to a minimum, if it's OK with you.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply
* Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment
From: Mark Rutland @ 2019-04-24 16:51 UTC (permalink / raw)
To: Will Deacon
Cc: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
linux-kernel, linux-api, Thomas Gleixner, Andy Lutomirski,
Dave Watson, Paul Turner, Andrew Morton, Russell King,
Ingo Molnar, H . Peter Anvin, Andi Kleen, Chris Lameter,
Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds
In-Reply-To: <20190424164538.GC18611@fuggles.cambridge.arm.com>
On Wed, Apr 24, 2019 at 05:45:38PM +0100, Will Deacon wrote:
> On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
> > +/*
> > + * aarch64 -mbig-endian generates mixed endianness code vs data:
> > + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
> > + * matches code endianness.
> > + */
> > +#define RSEQ_SIG_CODE 0xd428bc00 /* BRK #0x45E0. */
> > +
> > +#ifdef __AARCH64EB__
> > +#define RSEQ_SIG_DATA 0x00bc28d4 /* BRK #0x45E0. */
>
> It would be neater to implement swab32 and use that with RSEQ_SIG_CODE,
If possible, marginally neater than that would be using
le32_to_cpu(RSEQ_SIG_CODE), without any ifdeffery necessary.
It looks like that's defined in tools/include/linux/kernel.h, but I'm
not sure if that gets pulled into your include path.
Thanks,
Mark.
^ permalink raw reply
* Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment
From: Will Deacon @ 2019-04-24 16:45 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Paul E . McKenney, Boqun Feng, linux-kernel,
linux-api, Thomas Gleixner, Andy Lutomirski, Dave Watson,
Paul Turner, Andrew Morton, Russell King, Ingo Molnar,
H . Peter Anvin, Andi Kleen, Chris Lameter, Ben Maurer,
Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas,
Michael Kerrisk <mtk.manpage>
In-Reply-To: <20190424152502.14246-9-mathieu.desnoyers@efficios.com>
On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
> Handle compiling with -mbig-endian on aarch64, which generates binaries
> with mixed code vs data endianness (little endian code, big endian
> data).
>
> Else mismatch between code endianness for the generated signatures and
> data endianness for the RSEQ_SIG parameter passed to the rseq
> registration will trigger application segmentation faults when the
> kernel try to abort rseq critical sections.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Joel Fernandes <joelaf@google.com>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Dave Watson <davejwatson@fb.com>
> CC: Will Deacon <will.deacon@arm.com>
> CC: Shuah Khan <shuah@kernel.org>
> CC: Andi Kleen <andi@firstfloor.org>
> CC: linux-kselftest@vger.kernel.org
> CC: "H . Peter Anvin" <hpa@zytor.com>
> CC: Chris Lameter <cl@linux.com>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>
> CC: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Paul Turner <pjt@google.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ben Maurer <bmaurer@fb.com>
> CC: linux-api@vger.kernel.org
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> tools/testing/selftests/rseq/rseq-arm64.h | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/rseq/rseq-arm64.h b/tools/testing/selftests/rseq/rseq-arm64.h
> index b41a2a48e965..200dae9e4208 100644
> --- a/tools/testing/selftests/rseq/rseq-arm64.h
> +++ b/tools/testing/selftests/rseq/rseq-arm64.h
> @@ -6,7 +6,20 @@
> * (C) Copyright 2018 - Will Deacon <will.deacon@arm.com>
> */
>
> -#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0 */
> +/*
> + * aarch64 -mbig-endian generates mixed endianness code vs data:
> + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
> + * matches code endianness.
> + */
> +#define RSEQ_SIG_CODE 0xd428bc00 /* BRK #0x45E0. */
> +
> +#ifdef __AARCH64EB__
> +#define RSEQ_SIG_DATA 0x00bc28d4 /* BRK #0x45E0. */
It would be neater to implement swab32 and use that with RSEQ_SIG_CODE,
but either way:
Acked-by: Will Deacon <will.deacon@arm.com>
Will
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox