Linux Documentation
 help / color / mirror / Atom feed
* [PATCH RESEND v4 1/2] arm64: KVM: export the capability to set guest SError syndrome
From: Dongjiu Geng @ 2018-06-08 19:48 UTC (permalink / raw)
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1528487320-2873-1-git-send-email-gengdongjiu@huawei.com>

For the arm64 RAS Extension, user space can inject a virtual-SError
with specified ESR. So user space needs to know whether KVM support
to inject such SError, this interface adds this query for this capability.

KVM will check whether system support RAS Extension, if supported, KVM
returns true to user space, otherwise returns false.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
 Documentation/virtual/kvm/api.txt | 11 +++++++++++
 arch/arm64/kvm/reset.c            |  3 +++
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 15 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 758bf40..fdac969 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4603,3 +4603,14 @@ Architectures: s390
 This capability indicates that kvm will implement the interfaces to handle
 reset, migration and nested KVM for branch prediction blocking. The stfle
 facility 82 should not be provided to the guest without this capability.
+
+8.14 KVM_CAP_ARM_SET_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify the syndrome value reported
+to the guest OS when guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can only specify the ISS field for the ESR
+syndrome, it can not specify the EC field which is not under control by KVM.
+If this virtual SError is taken to EL1 using AArch64, this value will be reported
+in ISS filed of ESR_EL1.
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
 		break;
+	case KVM_CAP_ARM_INJECT_SERROR_ESR:
+		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b02c41e..e88f976 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -948,6 +948,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_BPB 152
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
+#define KVM_CAP_ARM_INJECT_SERROR_ESR 155
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] doc: Update synchronize_rcu() definition in whatisRCU.txt
From: Paul E. McKenney @ 2018-06-08  9:38 UTC (permalink / raw)
  To: Andrea Parri
  Cc: linux-kernel, linux-doc, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jonathan Corbet
In-Reply-To: <1528365717-7213-1-git-send-email-andrea.parri@amarulasolutions.com>

On Thu, Jun 07, 2018 at 12:01:57PM +0200, Andrea Parri wrote:
> The synchronize_rcu() definition based on RW-locks in whatisRCU.txt
> does not meet the "Memory-Barrier Guarantees" in Requirements.html;
> for example, the following SB-like test:
> 
>     P0:                      P1:
> 
>     WRITE_ONCE(x, 1);        WRITE_ONCE(y, 1);
>     synchronize_rcu();       smp_mb();
>     r0 = READ_ONCE(y);       r1 = READ_ONCE(x);
> 
> should not be allowed to reach the state "r0 = 0 AND r1 = 0", but
> the current write_lock()+write_unlock() definition can not ensure
> this. Remedies this by inserting an smp_mb__after_spinlock().
> 
> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>

Queued for review, thank you!

							Thanx, Paul

> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/RCU/whatisRCU.txt | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
> index a27fbfb0efb82..86a54ff911fc2 100644
> --- a/Documentation/RCU/whatisRCU.txt
> +++ b/Documentation/RCU/whatisRCU.txt
> @@ -586,6 +586,7 @@ It is extremely simple:
>  	void synchronize_rcu(void)
>  	{
>  		write_lock(&rcu_gp_mutex);
> +		smp_mb__after_spinlock();
>  		write_unlock(&rcu_gp_mutex);
>  	}
> 
> @@ -607,12 +608,15 @@ don't forget about them when submitting patches making use of RCU!]
> 
>  The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
>  and release a global reader-writer lock.  The synchronize_rcu()
> -primitive write-acquires this same lock, then immediately releases
> -it.  This means that once synchronize_rcu() exits, all RCU read-side
> -critical sections that were in progress before synchronize_rcu() was
> -called are guaranteed to have completed -- there is no way that
> -synchronize_rcu() would have been able to write-acquire the lock
> -otherwise.
> +primitive write-acquires this same lock, then releases it.  This means
> +that once synchronize_rcu() exits, all RCU read-side critical sections
> +that were in progress before synchronize_rcu() was called are guaranteed
> +to have completed -- there is no way that synchronize_rcu() would have
> +been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
> +promotes synchronize_rcu() to a full memory barrier in compliance with
> +the "Memory-Barrier Guarantees" listed in:
> +
> +	Documentation/RCU/Design/Requirements/Requirements.html.
> 
>  It is possible to nest rcu_read_lock(), since reader-writer locks may
>  be recursively acquired.  Note also that rcu_read_lock() is immune
> -- 
> 2.7.4
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Ravi Bangoria @ 2018-06-08  6:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria
In-Reply-To: <20180608141431.c0196a2c80a8d4701bc9db58@kernel.org>

Hi Masami,

>> So for kernel modules,
>>
>> is it fine to change current ABI from
>>     uprobe_register(inode, offset, consumer)
>> to
>>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
>>
>> Or I should introduce new function for this:
>>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>> and export it to kernel module?
>>
>> What's your suggestion?
> 
> Latter is fine to me. Since the refctr is introduced totally in userspace
> (for SDT) and free-address userspace probing doesn't need refctr, maybe
> we should keep those separated.

Sure.

> 
>> [...]
>>
>>>>
>>>>  - This patches still has one issue. If there are multiple instances of
>>>>    same application running and user wants to trace any particular
>>>>    instance, trace_uprobe is updating reference counter in all instances.
>>>>    This is not a problem on user side because instruction is not replaced
>>>>    with trap/int3 and thus user will only see samples from his interested
>>>>    process. But still this is more of a correctness issue. I'm working on
>>>>    a fix for this.
>>>
>>> Hmm, it sounds like not a correctness issue, but there maybe a performace
>>> tradeoff. Tracing one particulear instance, other instances also will get
>>> a performance loss
>>
>>
>> Right, but it's temporary. I mean, putting everything in to this series was making
>> it complex. So this is the initial one and I'll send followup patches which will
>> optimize the reference counter update.
> 
> Ah, OK. If you have prepared the followup patches, could you also send it
> with this series? Perhups it will help us to understand the issue clearer.

Not ready as such.. it's making the code bit complicated. I'm working on it
and will send the next series with those patches included.

> 
>>
>>> (Only if the parameter preparation block is heavy,
>>> because the heaviest part of probing - trap/int3 and recording data - isn't
>>> executed.)
>>>> BTW, why this happens? I thought the refcounter part is just a data which
>>> is not shared among processes...
>>>
>>
>> This happens because we are not calling consumer_filter function. consumer_filter
>> is the one who decides whether to change the instruction to trap or not in a given
>> mm. We also need to call it before updating reference counter.
> 
> Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
> remove_breakpoint?

Not really, it would be simpler if I can put it inside install_breakpoint().
Consider an mmap() case. Probed instruction resides in the text section whereas
reference counter resides in the data section. These sections gets mapped using
separate mmap() calls. So, when process mmaps the text section we will change the
instruction, but section holding the reference counter may not have been mapped
yet in the virtual memory. If so, we will fail to update the reference counter.

Thanks,
Ravi

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: kbuild test robot @ 2018-06-08 14:13 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-7-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 2368 bytes --]

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   In file included from arch/xtensa/include/asm/pgtable.h:444,
                    from include/linux/memremap.h:8,
                    from include/linux/mm.h:27,
                    from include/linux/pid_namespace.h:7,
                    from include/linux/ptrace.h:10,
                    from arch/xtensa/kernel/asm-offsets.c:21:
   include/asm-generic/pgtable.h: In function 'huge_ptep_set_wrorptect_flush':
>> include/asm-generic/pgtable.h:129:2: error: implicit declaration of function 'huge_ptep_set_wrprotect'; did you mean 'ptep_set_wrprotect'? [-Werror=implicit-function-declaration]
     huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
     ^~~~~~~~~~~~~~~~~~~~~~~
     ptep_set_wrprotect
   cc1: some warnings being treated as errors
   make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +129 include/asm-generic/pgtable.h

   123	
   124	#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
   125	static inline void huge_ptep_set_wrorptect_flush(struct vm_area_struct *vma,
   126							 unsigned long addr,
   127							 pte_t *ptep)
   128	{
 > 129		huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
   130	}
   131	#endif
   132	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 53194 bytes --]

^ permalink raw reply

* Re: [PATCH 5/9] x86/mm: Introduce _PAGE_DIRTY_SW
From: kbuild test robot @ 2018-06-08  5:15 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-6-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 6964 bytes --]

Hi Yu-cheng,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on asm-generic/master]
[also build test WARNING on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-randconfig-x003-201822 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/current.h:5:0,
                    from include/linux/sched.h:12,
                    from include/linux/context_tracking.h:5,
                    from arch/x86/kernel/traps.c:15:
   arch/x86/kernel/traps.c: In function 'do_control_protection':
   arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function); did you mean 'X86_FEATURE_EST'?
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
     ^~
>> arch/x86/kernel/traps.c:605:7: note: in expansion of macro 'cpu_feature_enabled'
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
          ^~~~~~~~~~~~~~~~~~~
   arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
     ^~
>> arch/x86/kernel/traps.c:605:7: note: in expansion of macro 'cpu_feature_enabled'
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
          ^~~~~~~~~~~~~~~~~~~
   arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function); did you mean 'X86_FEATURE_IBS'?
         !cpu_feature_enabled(X86_FEATURE_IBT)) {
                              ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> arch/x86/kernel/traps.c:605:2: note: in expansion of macro 'if'
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
     ^~
   arch/x86/kernel/traps.c:606:7: note: in expansion of macro 'cpu_feature_enabled'
         !cpu_feature_enabled(X86_FEATURE_IBT)) {
          ^~~~~~~~~~~~~~~~~~~

vim +/if +605 arch/x86/kernel/traps.c

a74a6de2 Yu-cheng Yu 2018-06-07  589  
a74a6de2 Yu-cheng Yu 2018-06-07  590  /*
a74a6de2 Yu-cheng Yu 2018-06-07  591   * When a control protection exception occurs, send a signal
a74a6de2 Yu-cheng Yu 2018-06-07  592   * to the responsible application.  Currently, control
a74a6de2 Yu-cheng Yu 2018-06-07  593   * protection is only enabled for the user mode.  This
a74a6de2 Yu-cheng Yu 2018-06-07  594   * exception should not come from the kernel mode.
a74a6de2 Yu-cheng Yu 2018-06-07  595   */
a74a6de2 Yu-cheng Yu 2018-06-07  596  dotraplinkage void
a74a6de2 Yu-cheng Yu 2018-06-07  597  do_control_protection(struct pt_regs *regs, long error_code)
a74a6de2 Yu-cheng Yu 2018-06-07  598  {
a74a6de2 Yu-cheng Yu 2018-06-07  599  	struct task_struct *tsk;
a74a6de2 Yu-cheng Yu 2018-06-07  600  
a74a6de2 Yu-cheng Yu 2018-06-07  601  	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
a74a6de2 Yu-cheng Yu 2018-06-07  602  	cond_local_irq_enable(regs);
a74a6de2 Yu-cheng Yu 2018-06-07  603  
a74a6de2 Yu-cheng Yu 2018-06-07  604  	tsk = current;
a74a6de2 Yu-cheng Yu 2018-06-07 @605  	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
a74a6de2 Yu-cheng Yu 2018-06-07  606  	    !cpu_feature_enabled(X86_FEATURE_IBT)) {
a74a6de2 Yu-cheng Yu 2018-06-07  607  		goto exit;
a74a6de2 Yu-cheng Yu 2018-06-07  608  	}
a74a6de2 Yu-cheng Yu 2018-06-07  609  
a74a6de2 Yu-cheng Yu 2018-06-07  610  	if (!user_mode(regs)) {
a74a6de2 Yu-cheng Yu 2018-06-07  611  		tsk->thread.error_code = error_code;
a74a6de2 Yu-cheng Yu 2018-06-07  612  		tsk->thread.trap_nr = X86_TRAP_CP;
a74a6de2 Yu-cheng Yu 2018-06-07  613  		if (notify_die(DIE_TRAP, "control protection fault", regs,
a74a6de2 Yu-cheng Yu 2018-06-07  614  			       error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
a74a6de2 Yu-cheng Yu 2018-06-07  615  			die("control protection fault", regs, error_code);
a74a6de2 Yu-cheng Yu 2018-06-07  616  		return;
a74a6de2 Yu-cheng Yu 2018-06-07  617  	}
a74a6de2 Yu-cheng Yu 2018-06-07  618  
a74a6de2 Yu-cheng Yu 2018-06-07  619  	tsk->thread.error_code = error_code;
a74a6de2 Yu-cheng Yu 2018-06-07  620  	tsk->thread.trap_nr = X86_TRAP_CP;
a74a6de2 Yu-cheng Yu 2018-06-07  621  
a74a6de2 Yu-cheng Yu 2018-06-07  622  	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
a74a6de2 Yu-cheng Yu 2018-06-07  623  	    printk_ratelimit()) {
a74a6de2 Yu-cheng Yu 2018-06-07  624  		unsigned int max_idx, err_idx;
a74a6de2 Yu-cheng Yu 2018-06-07  625  
a74a6de2 Yu-cheng Yu 2018-06-07  626  		max_idx = ARRAY_SIZE(control_protection_err) - 1;
a74a6de2 Yu-cheng Yu 2018-06-07  627  		err_idx = min((unsigned int)error_code - 1, max_idx);
a74a6de2 Yu-cheng Yu 2018-06-07  628  		pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
a74a6de2 Yu-cheng Yu 2018-06-07  629  			tsk->comm, task_pid_nr(tsk),
a74a6de2 Yu-cheng Yu 2018-06-07  630  			regs->ip, regs->sp, error_code,
a74a6de2 Yu-cheng Yu 2018-06-07  631  			control_protection_err[err_idx]);
a74a6de2 Yu-cheng Yu 2018-06-07  632  		print_vma_addr(" in ", regs->ip);
a74a6de2 Yu-cheng Yu 2018-06-07  633  		pr_cont("\n");
a74a6de2 Yu-cheng Yu 2018-06-07  634  	}
a74a6de2 Yu-cheng Yu 2018-06-07  635  
a74a6de2 Yu-cheng Yu 2018-06-07  636  exit:
a74a6de2 Yu-cheng Yu 2018-06-07  637  	force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
a74a6de2 Yu-cheng Yu 2018-06-07  638  }
a74a6de2 Yu-cheng Yu 2018-06-07  639  NOKPROBE_SYMBOL(do_control_protection);
a74a6de2 Yu-cheng Yu 2018-06-07  640  

:::::: The code at line 605 was first introduced by commit
:::::: a74a6de2a3290257798598ae1f816eddb04f63f2 x86/cet: Control protection exception handler

:::::: TO: Yu-cheng Yu <yu-cheng.yu@intel.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33603 bytes --]

^ permalink raw reply

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Masami Hiramatsu @ 2018-06-08  5:14 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao
In-Reply-To: <0d487b32-3141-ec30-6cca-eba7159d70b0@linux.ibm.com>

On Fri, 8 Jun 2018 07:59:38 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Hi Masami,
> 
> On 06/08/2018 06:40 AM, Masami Hiramatsu wrote:
> > On Wed,  6 Jun 2018 14:03:37 +0530
> > Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> > 
> >> Why RFC again:
> >>
> >> This series is different from earlier versions[1]. Earlier series
> >> implemented this feature in trace_uprobe while this has implemented
> >> the logic in core uprobe. Few reasons for this:
> >>  1. One of the major reason was the deadlock between uprobe_lock and
> >>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
> >>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
> >>  to take uprobe_lock to loop over trace_uprobe list. More details can
> >>  be found at[2]. With this new approach, there are no deadlocks found
> >>  so far.
> >>  2. Many of the core uprobe function and data-structures needs to be
> >>  exported to make earlier implementation simple. With this new approach,
> >>  reference counter logic is been implemented in core uprobe and thus
> >>  no need to export anything.
> > 
> > I agree with you. Moreover, since uprobe_register/unregister() are
> > exported to modules, this enablement would better be implemented
> > inside uprobe so that all uprobe users benefit from this.
> 
> 
> Sorry, I think you got me wrong. I meant, I don't need to expose all core
> uprobe _static_ functions to tarce_uprobe.
> 
> Now, about kernel modules, basically uprobe_register() takes three parameters:
>     inode, offset and consumer.
> There is no scope for the reference counter there. So I've created one more
> function: uprobe_register_refctr(). But this function is not exported as ABI
> to kernel module. i.e. kernel modules still does not have a way to create
> uprobe with reference counter.

OK, I got it from your patches. :)

> So for kernel modules,
> 
> is it fine to change current ABI from
>     uprobe_register(inode, offset, consumer)
> to
>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
> 
> Or I should introduce new function for this:
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> and export it to kernel module?
> 
> What's your suggestion?

Latter is fine to me. Since the refctr is introduced totally in userspace
(for SDT) and free-address userspace probing doesn't need refctr, maybe
we should keep those separated.

> [...]
> 
> >> 
> >>  - This patches still has one issue. If there are multiple instances of
> >>    same application running and user wants to trace any particular
> >>    instance, trace_uprobe is updating reference counter in all instances.
> >>    This is not a problem on user side because instruction is not replaced
> >>    with trap/int3 and thus user will only see samples from his interested
> >>    process. But still this is more of a correctness issue. I'm working on
> >>    a fix for this.
> > 
> > Hmm, it sounds like not a correctness issue, but there maybe a performace
> > tradeoff. Tracing one particulear instance, other instances also will get
> > a performance loss
> 
> 
> Right, but it's temporary. I mean, putting everything in to this series was making
> it complex. So this is the initial one and I'll send followup patches which will
> optimize the reference counter update.

Ah, OK. If you have prepared the followup patches, could you also send it
with this series? Perhups it will help us to understand the issue clearer.

> 
> > (Only if the parameter preparation block is heavy,
> > because the heaviest part of probing - trap/int3 and recording data - isn't
> > executed.)
> >> BTW, why this happens? I thought the refcounter part is just a data which
> > is not shared among processes...
> > 
> 
> This happens because we are not calling consumer_filter function. consumer_filter
> is the one who decides whether to change the instruction to trap or not in a given
> mm. We also need to call it before updating reference counter.

Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
remove_breakpoint?

Thank you,

> 
> Let me know your thoughts.
> 
> Thanks,
> Ravi
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: kbuild test robot @ 2018-06-08  4:43 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-7-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 3067 bytes --]

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   In file included from arch/sparc/include/asm/pgtable_64.h:1064:0,
                    from arch/sparc/include/asm/pgtable.h:5,
                    from include/linux/memremap.h:8,
                    from include/linux/mm.h:27,
                    from mm//swap.c:16:
   include/asm-generic/pgtable.h: In function 'huge_ptep_set_wrorptect_flush':
>> include/asm-generic/pgtable.h:129:2: error: implicit declaration of function 'huge_ptep_set_wrprotect'; did you mean 'huge_ptep_set_wrorptect_flush'? [-Werror=implicit-function-declaration]
     huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
     ^~~~~~~~~~~~~~~~~~~~~~~
     huge_ptep_set_wrorptect_flush
   In file included from include/linux/hugetlb.h:445:0,
                    from mm//swap.c:35:
   arch/sparc/include/asm/hugetlb.h: At top level:
>> arch/sparc/include/asm/hugetlb.h:59:20: warning: conflicting types for 'huge_ptep_set_wrprotect'
    static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
                       ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/sparc/include/asm/hugetlb.h:59:20: error: static declaration of 'huge_ptep_set_wrprotect' follows non-static declaration
   In file included from arch/sparc/include/asm/pgtable_64.h:1064:0,
                    from arch/sparc/include/asm/pgtable.h:5,
                    from include/linux/memremap.h:8,
                    from include/linux/mm.h:27,
                    from mm//swap.c:16:
   include/asm-generic/pgtable.h:129:2: note: previous implicit declaration of 'huge_ptep_set_wrprotect' was here
     huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
     ^~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +129 include/asm-generic/pgtable.h

   123	
   124	#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT_FLUSH
   125	static inline void huge_ptep_set_wrorptect_flush(struct vm_area_struct *vma,
   126							 unsigned long addr,
   127							 pte_t *ptep)
   128	{
 > 129		huge_ptep_set_wrprotect(vma->vm_mm, addr, ptep);
   130	}
   131	#endif
   132	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 53366 bytes --]

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-08  4:38 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOpeDrkwi-AG0vsiZy4NwkmavhB5Empv58FSHxtr3rpapw@mail.gmail.com>

On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> >>
> >> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> >> > >
> >> >> > > The following operations are provided.
> >> >> > >
> >> >> > > ARCH_CET_STATUS:
> >> >> > >         return the current CET status
> >> >> > >
> >> >> > > ARCH_CET_DISABLE:
> >> >> > >         disable CET features
> >> >> > >
> >> >> > > ARCH_CET_LOCK:
> >> >> > >         lock out CET features
> >> >> > >
> >> >> > > ARCH_CET_EXEC:
> >> >> > >         set CET features for exec()
> >> >> > >
> >> >> > > ARCH_CET_ALLOC_SHSTK:
> >> >> > >         allocate a new shadow stack
> >> >> > >
> >> >> > > ARCH_CET_PUSH_SHSTK:
> >> >> > >         put a return address on shadow stack
> >> >> > >
> >> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> >> >> > > the implementation of GLIBC ucontext related APIs.
> >> >> >
> >> >> > Please document exactly what these all do and why.  I don't understand
> >> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> >> >> > each ELF program, so I think there should be no need for a magic
> >> >> > override.
> >> >>
> >> >> CET is initially enabled if the loader has CET capability.  Then the
> >> >> loader decides if the application can run with CET.  If the application
> >> >> cannot run with CET (e.g. a dependent library does not have CET), then
> >> >> the loader turns off CET before passing to the application.  When the
> >> >> loader is done, it locks out CET and the feature cannot be turned off
> >> >> anymore until the next exec() call.
> >> >
> >> > Why is the lockout necessary?  If user code enables CET and tries to
> >> > run code that doesn't support CET, it will crash.  I don't see why we
> >> > need special code in the kernel to prevent a user program from calling
> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
> >> > do that :)
> >>
> >> On CET enabled machine, not all programs nor shared libraries are
> >> CET enabled.  But since ld.so is CET enabled, all programs start
> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
> >> checking so that CET can't no longer be disabled afterwards.
> >
> > Yeah, I got that.  No one has explained *why*.
>
> It is to prevent malicious code from disabling CET.
>

By the time malicious code issue its own syscalls, you've already lost
the battle.  I could probably be convinced that a lock-CET-on feature
that applies *only* to the calling thread and is not inherited by
clone() is a decent idea, but I'd want to see someone who understands
the state of the art in exploit design justify it.  You're also going
to need to figure out how to make CRIU work if you allow locking CET
on.

A priori, I think we should just not provide a lock mechanism.

> > (Also, shouldn't the vDSO itself be marked as supporting CET?)
>
> No. vDSO is loaded by kernel.  vDSO in CET kernel is CET
> compatible.
>

I think the vDSO should do its best to act like a real DSO.  That
means that, if the vDSO supports CET, it should advertise support for
CET using the Linux ABI.  Since you're going to require GCC 8 anyway,
this should be a single line of code in the Makefile.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-08  4:35 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOphjpPd3HnKAdU-RmG0RGj6c2oAbnq+C2Jd1srsqTA7=w@mail.gmail.com>

On Thu, Jun 7, 2018 at 9:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>
> >>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>> > >
> >>> > > The following operations are provided.
> >>> > >
> >>> > > ARCH_CET_STATUS:
> >>> > >         return the current CET status
> >>> > >
> >>> > > ARCH_CET_DISABLE:
> >>> > >         disable CET features
> >>> > >
> >>> > > ARCH_CET_LOCK:
> >>> > >         lock out CET features
> >>> > >
> >>> > > ARCH_CET_EXEC:
> >>> > >         set CET features for exec()
> >>> > >
> >>> > > ARCH_CET_ALLOC_SHSTK:
> >>> > >         allocate a new shadow stack
> >>> > >
> >>> > > ARCH_CET_PUSH_SHSTK:
> >>> > >         put a return address on shadow stack
> >>> > >
>
> >> And why do we need ARCH_CET_EXEC?
> >>
> >> For background, I really really dislike adding new state that persists
> >> across exec().  It's nice to get as close to a clean slate as possible
> >> after exec() so that programs can run in a predictable environment.
> >> exec() is also a security boundary, and anything a task can do to
> >> affect itself after exec() needs to have its security implications
> >> considered very carefully.  (As a trivial example, you should not be
> >> able to use cetcmd ... sudo [malicious options here] to cause sudo to
> >> run with CET off and then try to exploit it via the malicious options.
> >>
> >> If a shutoff is needed for testing, how about teaching ld.so to parse
> >> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
> >> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
> >>
> >
> > I will take a look.
>
> We can use LD_CET to turn off CET.   Since most of legacy binaries
> are compatible with shadow stack,  ARCH_CET_EXEC can be used
> to turn on shadow stack on legacy binaries:

Is there any reason you can't use LD_CET=force to do it for
dynamically linked binaries?

I find it quite hard to believe that forcibly CET-ifying a legacy
statically linked binary is a good idea.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-08  4:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOr49V8rqRa_KVsw61PWd+crkQvPDgPKtvowazjmsfgWWQ@mail.gmail.com>

On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>
>>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>> > >
>>> > > The following operations are provided.
>>> > >
>>> > > ARCH_CET_STATUS:
>>> > >         return the current CET status
>>> > >
>>> > > ARCH_CET_DISABLE:
>>> > >         disable CET features
>>> > >
>>> > > ARCH_CET_LOCK:
>>> > >         lock out CET features
>>> > >
>>> > > ARCH_CET_EXEC:
>>> > >         set CET features for exec()
>>> > >
>>> > > ARCH_CET_ALLOC_SHSTK:
>>> > >         allocate a new shadow stack
>>> > >
>>> > > ARCH_CET_PUSH_SHSTK:
>>> > >         put a return address on shadow stack
>>> > >

>> And why do we need ARCH_CET_EXEC?
>>
>> For background, I really really dislike adding new state that persists
>> across exec().  It's nice to get as close to a clean slate as possible
>> after exec() so that programs can run in a predictable environment.
>> exec() is also a security boundary, and anything a task can do to
>> affect itself after exec() needs to have its security implications
>> considered very carefully.  (As a trivial example, you should not be
>> able to use cetcmd ... sudo [malicious options here] to cause sudo to
>> run with CET off and then try to exploit it via the malicious options.
>>
>> If a shutoff is needed for testing, how about teaching ld.so to parse
>> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
>> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>>
>
> I will take a look.

We can use LD_CET to turn off CET.   Since most of legacy binaries
are compatible with shadow stack,  ARCH_CET_EXEC can be used
to turn on shadow stack on legacy binaries:

[hjl@gnu-cet-1 glibc]$ readelf -n /bin/ls| head -10

Displaying notes found in: .note.ABI-tag
  Owner                 Data size Description
  GNU                  0x00000010 NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.property
  Owner                 Data size Description
  GNU                  0x00000020 NT_GNU_PROPERTY_TYPE_0
      Properties: x86 ISA used:
[hjl@gnu-cet-1 glibc]$ cetcmd --on -- /bin/ls /
Segmentation fault
[hjl@gnu-cet-1 glibc]$ cetcmd --on -f shstk -- /bin/ls /
bin   dev  export  lib   libx32      media  mnt  opt root  sbin  sys  usr
boot  etc  home    lib64  lost+found  misc   net  proc run   srv   tmp  var
[hjl@gnu-cet-1 glibc]$ cetcmd --on -f ibt -- /bin/ls /
Segmentation fault
[hjl@gnu-cet-1 glibc]$

-- 
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/9] x86/cet: Control protection exception handler
From: kbuild test robot @ 2018-06-08  4:18 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-2-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 4669 bytes --]

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/thread_info.h:53:0,
                    from include/linux/thread_info.h:38,
                    from arch/x86/include/asm/preempt.h:7,
                    from include/linux/preempt.h:81,
                    from include/linux/rcupdate.h:40,
                    from include/linux/rculist.h:11,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/context_tracking.h:5,
                    from arch/x86/kernel/traps.c:15:
   arch/x86/kernel/traps.c: In function 'do_control_protection':
>> arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function); did you mean 'X86_FEATURE_EST'?
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^~~
   arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^~~
>> arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function); did you mean 'X86_FEATURE_IBS'?
         !cpu_feature_enabled(X86_FEATURE_IBT)) {
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^~~

vim +605 arch/x86/kernel/traps.c

   589	
   590	/*
   591	 * When a control protection exception occurs, send a signal
   592	 * to the responsible application.  Currently, control
   593	 * protection is only enabled for the user mode.  This
   594	 * exception should not come from the kernel mode.
   595	 */
   596	dotraplinkage void
   597	do_control_protection(struct pt_regs *regs, long error_code)
   598	{
   599		struct task_struct *tsk;
   600	
   601		RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
   602		cond_local_irq_enable(regs);
   603	
   604		tsk = current;
 > 605		if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
 > 606		    !cpu_feature_enabled(X86_FEATURE_IBT)) {
   607			goto exit;
   608		}
   609	
   610		if (!user_mode(regs)) {
   611			tsk->thread.error_code = error_code;
   612			tsk->thread.trap_nr = X86_TRAP_CP;
   613			if (notify_die(DIE_TRAP, "control protection fault", regs,
   614				       error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
   615				die("control protection fault", regs, error_code);
   616			return;
   617		}
   618	
   619		tsk->thread.error_code = error_code;
   620		tsk->thread.trap_nr = X86_TRAP_CP;
   621	
   622		if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
   623		    printk_ratelimit()) {
   624			unsigned int max_idx, err_idx;
   625	
   626			max_idx = ARRAY_SIZE(control_protection_err) - 1;
   627			err_idx = min((unsigned int)error_code - 1, max_idx);
   628			pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
   629				tsk->comm, task_pid_nr(tsk),
   630				regs->ip, regs->sp, error_code,
   631				control_protection_err[err_idx]);
   632			print_vma_addr(" in ", regs->ip);
   633			pr_cont("\n");
   634		}
   635	
   636	exit:
   637		force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
   638	}
   639	NOKPROBE_SYMBOL(do_control_protection);
   640	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6721 bytes --]

^ permalink raw reply

* Re: [PATCH 1/9] x86/cet: Control protection exception handler
From: kbuild test robot @ 2018-06-08  4:17 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-2-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 4628 bytes --]

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-randconfig-a0-201822 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/thread_info.h:53:0,
                    from include/linux/thread_info.h:38,
                    from arch/x86/include/asm/preempt.h:7,
                    from include/linux/preempt.h:81,
                    from include/linux/rcupdate.h:40,
                    from include/linux/rculist.h:11,
                    from include/linux/pid.h:5,
                    from include/linux/sched.h:14,
                    from include/linux/context_tracking.h:5,
                    from arch/x86/kernel/traps.c:15:
   arch/x86/kernel/traps.c: In function 'do_control_protection':
>> arch/x86/kernel/traps.c:605:27: error: 'X86_FEATURE_SHSTK' undeclared (first use in this function)
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^
   arch/x86/kernel/traps.c:605:27: note: each undeclared identifier is reported only once for each function it appears in
     if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^
>> arch/x86/kernel/traps.c:606:27: error: 'X86_FEATURE_IBT' undeclared (first use in this function)
         !cpu_feature_enabled(X86_FEATURE_IBT)) {
                              ^
   arch/x86/include/asm/cpufeature.h:127:24: note: in definition of macro 'cpu_feature_enabled'
     (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
                           ^

vim +/X86_FEATURE_SHSTK +605 arch/x86/kernel/traps.c

   589	
   590	/*
   591	 * When a control protection exception occurs, send a signal
   592	 * to the responsible application.  Currently, control
   593	 * protection is only enabled for the user mode.  This
   594	 * exception should not come from the kernel mode.
   595	 */
   596	dotraplinkage void
   597	do_control_protection(struct pt_regs *regs, long error_code)
   598	{
   599		struct task_struct *tsk;
   600	
   601		RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
   602		cond_local_irq_enable(regs);
   603	
   604		tsk = current;
 > 605		if (!cpu_feature_enabled(X86_FEATURE_SHSTK) &&
 > 606		    !cpu_feature_enabled(X86_FEATURE_IBT)) {
   607			goto exit;
   608		}
   609	
   610		if (!user_mode(regs)) {
   611			tsk->thread.error_code = error_code;
   612			tsk->thread.trap_nr = X86_TRAP_CP;
   613			if (notify_die(DIE_TRAP, "control protection fault", regs,
   614				       error_code, X86_TRAP_CP, SIGSEGV) != NOTIFY_STOP)
   615				die("control protection fault", regs, error_code);
   616			return;
   617		}
   618	
   619		tsk->thread.error_code = error_code;
   620		tsk->thread.trap_nr = X86_TRAP_CP;
   621	
   622		if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
   623		    printk_ratelimit()) {
   624			unsigned int max_idx, err_idx;
   625	
   626			max_idx = ARRAY_SIZE(control_protection_err) - 1;
   627			err_idx = min((unsigned int)error_code - 1, max_idx);
   628			pr_info("%s[%d] control protection ip:%lx sp:%lx error:%lx(%s)",
   629				tsk->comm, task_pid_nr(tsk),
   630				regs->ip, regs->sp, error_code,
   631				control_protection_err[err_idx]);
   632			print_vma_addr(" in ", regs->ip);
   633			pr_cont("\n");
   634		}
   635	
   636	exit:
   637		force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
   638	}
   639	NOKPROBE_SYMBOL(do_control_protection);
   640	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29261 bytes --]

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-08  4:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrV1GG5rq_kwxkS-o3x8Ldr72ThdYgkJKQ9cx9Q63SxgTQ@mail.gmail.com>

On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>
>> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> > >
>> >> > > The following operations are provided.
>> >> > >
>> >> > > ARCH_CET_STATUS:
>> >> > >         return the current CET status
>> >> > >
>> >> > > ARCH_CET_DISABLE:
>> >> > >         disable CET features
>> >> > >
>> >> > > ARCH_CET_LOCK:
>> >> > >         lock out CET features
>> >> > >
>> >> > > ARCH_CET_EXEC:
>> >> > >         set CET features for exec()
>> >> > >
>> >> > > ARCH_CET_ALLOC_SHSTK:
>> >> > >         allocate a new shadow stack
>> >> > >
>> >> > > ARCH_CET_PUSH_SHSTK:
>> >> > >         put a return address on shadow stack
>> >> > >
>> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> >> > > the implementation of GLIBC ucontext related APIs.
>> >> >
>> >> > Please document exactly what these all do and why.  I don't understand
>> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> >> > each ELF program, so I think there should be no need for a magic
>> >> > override.
>> >>
>> >> CET is initially enabled if the loader has CET capability.  Then the
>> >> loader decides if the application can run with CET.  If the application
>> >> cannot run with CET (e.g. a dependent library does not have CET), then
>> >> the loader turns off CET before passing to the application.  When the
>> >> loader is done, it locks out CET and the feature cannot be turned off
>> >> anymore until the next exec() call.
>> >
>> > Why is the lockout necessary?  If user code enables CET and tries to
>> > run code that doesn't support CET, it will crash.  I don't see why we
>> > need special code in the kernel to prevent a user program from calling
>> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> > do that :)
>>
>> On CET enabled machine, not all programs nor shared libraries are
>> CET enabled.  But since ld.so is CET enabled, all programs start
>> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
>> checking so that CET can't no longer be disabled afterwards.
>
> Yeah, I got that.  No one has explained *why*.

It is to prevent malicious code from disabling CET.

> (Also, shouldn't the vDSO itself be marked as supporting CET?)

No. vDSO is loaded by kernel.  vDSO in CET kernel is CET
compatible.

-- 
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 4/9] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW
From: kbuild test robot @ 2018-06-08  3:53 UTC (permalink / raw)
  To: Yu-cheng Yu
  Cc: kbuild-all, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz,
	Yu-cheng Yu
In-Reply-To: <20180607143705.3531-5-yu-cheng.yu@intel.com>

[-- Attachment #1: Type: text/plain, Size: 4971 bytes --]

Hi Yu-cheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on asm-generic/master]
[also build test ERROR on v4.17 next-20180607]
[cannot apply to tip/x86/core mmotm/master]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Yu-cheng-Yu/Control-Flow-Enforcement-Part-2/20180608-111152 HEAD 71d9d315a5e241d9b500540a452d0bec292e1dbb builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   In file included from include/linux/memremap.h:8:0,
                    from include/linux/mm.h:27,
                    from include/linux/memcontrol.h:29,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:13:
   arch/x86/include/asm/pgtable.h: In function 'pte_dirty':
>> arch/x86/include/asm/pgtable.h:121:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pte_flags(pte) & _PAGE_DIRTY;
                             ^~~~~~~~~~~
                             _PAGE_DIRTY_HW
   arch/x86/include/asm/pgtable.h:121:26: note: each undeclared identifier is reported only once for each function it appears in
   arch/x86/include/asm/pgtable.h: In function 'pmd_dirty':
   arch/x86/include/asm/pgtable.h:145:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pmd_flags(pmd) & _PAGE_DIRTY;
                             ^~~~~~~~~~~
                             _PAGE_DIRTY_HW
   arch/x86/include/asm/pgtable.h: In function 'pud_dirty':
   arch/x86/include/asm/pgtable.h:155:26: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pud_flags(pud) & _PAGE_DIRTY;
                             ^~~~~~~~~~~
                             _PAGE_DIRTY_HW
   arch/x86/include/asm/pgtable.h: In function 'pte_mkclean':
   arch/x86/include/asm/pgtable.h:286:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pte_clear_flags(pte, _PAGE_DIRTY);
                                 ^~~~~~~~~~~
                                 _PAGE_DIRTY_HW
   arch/x86/include/asm/pgtable.h: In function 'pmd_mkclean':
   arch/x86/include/asm/pgtable.h:370:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pmd_clear_flags(pmd, _PAGE_DIRTY);
                                 ^~~~~~~~~~~
                                 _PAGE_DIRTY_HW
   arch/x86/include/asm/pgtable.h: In function 'pud_mkclean':
   arch/x86/include/asm/pgtable.h:429:30: error: '_PAGE_DIRTY' undeclared (first use in this function); did you mean '_PAGE_DIRTY_HW'?
     return pud_clear_flags(pud, _PAGE_DIRTY);
                                 ^~~~~~~~~~~
                                 _PAGE_DIRTY_HW
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +121 arch/x86/include/asm/pgtable.h

54321d947 arch/x86/include/asm/pgtable.h Jeremy Fitzhardinge 2009-02-11  114  
8405b122a include/asm-x86/pgtable.h      Jeremy Fitzhardinge 2008-01-30  115  /*
4614139c6 include/asm-x86/pgtable.h      Jeremy Fitzhardinge 2008-01-30  116   * The following only work if pte_present() is true.
4614139c6 include/asm-x86/pgtable.h      Jeremy Fitzhardinge 2008-01-30  117   * Undefined behaviour if not..
4614139c6 include/asm-x86/pgtable.h      Jeremy Fitzhardinge 2008-01-30  118   */
3cbaeafeb include/asm-x86/pgtable.h      Joe Perches         2008-03-23  119  static inline int pte_dirty(pte_t pte)
3cbaeafeb include/asm-x86/pgtable.h      Joe Perches         2008-03-23  120  {
a15af1c9e include/asm-x86/pgtable.h      Jeremy Fitzhardinge 2008-05-26 @121  	return pte_flags(pte) & _PAGE_DIRTY;
3cbaeafeb include/asm-x86/pgtable.h      Joe Perches         2008-03-23  122  }
3cbaeafeb include/asm-x86/pgtable.h      Joe Perches         2008-03-23  123  

:::::: The code at line 121 was first introduced by commit
:::::: a15af1c9ea2750a9ff01e51615c45950bad8221b x86/paravirt: add pte_flags to just get pte flags

:::::: TO: Jeremy Fitzhardinge <jeremy@goop.org>
:::::: CC: Thomas Gleixner <tglx@linutronix.de>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6732 bytes --]

^ permalink raw reply

* Re: [PATCH v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS
From: kbuild test robot @ 2018-06-08  3:20 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: kbuild-all, rkrcmar, corbet, christoffer.dall, marc.zyngier,
	linux, catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1528401833-29963-3-git-send-email-gengdongjiu@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 2015 bytes --]

Hi Dongjiu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on v4.17]
[cannot apply to next-20180607]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dongjiu-Geng/support-exception-state-migration-and-set-VSESR_EL2-by-user-space/20180608-063827
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All warnings (new ones prefixed by >>):

   arch/arm64/kvm/guest.c: In function 'kvm_arm_vcpu_get_events':
>> arch/arm64/kvm/guest.c:295:26: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to dereference it? [-Wsizeof-pointer-memaccess]
     memset(events, 0, sizeof(events));
                             ^

vim +295 arch/arm64/kvm/guest.c

   291	
   292	int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
   293				struct kvm_vcpu_events *events)
   294	{
 > 295		memset(events, 0, sizeof(events));
   296	
   297		events->exception.serror_pending = !!(vcpu->arch.hcr_el2 & HCR_VSE);
   298		events->exception.serror_has_esr =
   299						cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
   300	
   301		if (events->exception.serror_pending &&
   302			events->exception.serror_has_esr)
   303			events->exception.serror_esr = vcpu_get_vsesr(vcpu);
   304		else
   305			events->exception.serror_esr = 0;
   306	
   307		return 0;
   308	}
   309	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37498 bytes --]

^ permalink raw reply

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Ravi Bangoria @ 2018-06-08  2:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria
In-Reply-To: <20180608101023.cbf20db485026213d490cb7c@kernel.org>

Hi Masami,

On 06/08/2018 06:40 AM, Masami Hiramatsu wrote:
> On Wed,  6 Jun 2018 14:03:37 +0530
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> 
>> Why RFC again:
>>
>> This series is different from earlier versions[1]. Earlier series
>> implemented this feature in trace_uprobe while this has implemented
>> the logic in core uprobe. Few reasons for this:
>>  1. One of the major reason was the deadlock between uprobe_lock and
>>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>>  to take uprobe_lock to loop over trace_uprobe list. More details can
>>  be found at[2]. With this new approach, there are no deadlocks found
>>  so far.
>>  2. Many of the core uprobe function and data-structures needs to be
>>  exported to make earlier implementation simple. With this new approach,
>>  reference counter logic is been implemented in core uprobe and thus
>>  no need to export anything.
> 
> I agree with you. Moreover, since uprobe_register/unregister() are
> exported to modules, this enablement would better be implemented
> inside uprobe so that all uprobe users benefit from this.


Sorry, I think you got me wrong. I meant, I don't need to expose all core
uprobe _static_ functions to tarce_uprobe.

Now, about kernel modules, basically uprobe_register() takes three parameters:
    inode, offset and consumer.
There is no scope for the reference counter there. So I've created one more
function: uprobe_register_refctr(). But this function is not exported as ABI
to kernel module. i.e. kernel modules still does not have a way to create
uprobe with reference counter. So for kernel modules,

is it fine to change current ABI from
    uprobe_register(inode, offset, consumer)
to
    uprobe_register(inode, offset, ref_ctr_offset, consumer)

Or I should introduce new function for this:
    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
and export it to kernel module?

What's your suggestion?

[...]

>> 
>>  - This patches still has one issue. If there are multiple instances of
>>    same application running and user wants to trace any particular
>>    instance, trace_uprobe is updating reference counter in all instances.
>>    This is not a problem on user side because instruction is not replaced
>>    with trap/int3 and thus user will only see samples from his interested
>>    process. But still this is more of a correctness issue. I'm working on
>>    a fix for this.
> 
> Hmm, it sounds like not a correctness issue, but there maybe a performace
> tradeoff. Tracing one particulear instance, other instances also will get
> a performance loss


Right, but it's temporary. I mean, putting everything in to this series was making
it complex. So this is the initial one and I'll send followup patches which will
optimize the reference counter update.


> (Only if the parameter preparation block is heavy,
> because the heaviest part of probing - trap/int3 and recording data - isn't
> executed.)
>> BTW, why this happens? I thought the refcounter part is just a data which
> is not shared among processes...
> 

This happens because we are not calling consumer_filter function. consumer_filter
is the one who decides whether to change the instruction to trap or not in a given
mm. We also need to call it before updating reference counter.

Let me know your thoughts.

Thanks,
Ravi

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: Dave Hansen @ 2018-06-08  1:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrU7uNpSp8DWKnpH28wHE3JOeXkmp-H97n2nWHJEu4pDEA@mail.gmail.com>

On 06/07/2018 05:59 PM, Andy Lutomirski wrote:
> Can you ask the architecture folks to clarify the situation?  And, if
> your notes are indeed correct, don't we need code to handle spurious
> faults?

I'll double check that I didn't misunderstand the situation and that it
has not changed on processors with shadow stacks.

But, as far as spurious faults, wouldn't it just be a fault because
we've transiently gone to Present=0?  We already do that when clearing
the Dirty bit, so I'm not sure that's new.  We surely already handle
that one.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Masami Hiramatsu @ 2018-06-08  1:10 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao
In-Reply-To: <20180606083344.31320-1-ravi.bangoria@linux.ibm.com>

On Wed,  6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Why RFC again:
> 
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>  to take uprobe_lock to loop over trace_uprobe list. More details can
>  be found at[2]. With this new approach, there are no deadlocks found
>  so far.
>  2. Many of the core uprobe function and data-structures needs to be
>  exported to make earlier implementation simple. With this new approach,
>  reference counter logic is been implemented in core uprobe and thus
>  no need to export anything.

I agree with you. Moreover, since uprobe_register/unregister() are
exported to modules, this enablement would better be implemented
inside uprobe so that all uprobe users benefit from this.

> 
> Description:
> 
> Userspace Statically Defined Tracepoints[3] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of reference counter is 0. Tracer has to increment the 
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the trace_uprobe infrastructure. Ex,[4]
> 
>   # cat tick.c
>     ... 
>     for (i = 0; i < 100; i++) {
> 	DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i); 
>         }
>         printf("hi: %d\n", i); 
>         sleep(1);
>     }   
>     ... 
> 
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C  
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> 
> Note:
>  - 'reference counter' is called as 'semaphore' in original Dtrace
>    (or Systemtap, bcc and even in ELF) documentation and code. But the 
>    term 'semaphore' is misleading in this context. This is just a counter
>    used to hold number of tracers tracing on a marker. This is not haeally
>    used for any synchronization. So we are referring it as 'reference
>    counter' in kernel / perf code.

+1. I like this "reference counter" term :)

>  - This patches still has one issue. If there are multiple instances of
>    same application running and user wants to trace any particular
>    instance, trace_uprobe is updating reference counter in all instances.
>    This is not a problem on user side because instruction is not replaced
>    with trap/int3 and thus user will only see samples from his interested
>    process. But still this is more of a correctness issue. I'm working on
>    a fix for this.

Hmm, it sounds like not a correctness issue, but there maybe a performace
tradeoff. Tracing one particulear instance, other instances also will get
a performance loss (Only if the parameter preparation block is heavy,
because the heaviest part of probing - trap/int3 and recording data - isn't
executed.)

BTW, why this happens? I thought the refcounter part is just a data which
is not shared among processes...

Thank you,


> 
> [1] https://lkml.org/lkml/2018/4/17/23
> [2] https://lkml.org/lkml/2018/5/25/111
> [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
> 
> Ravi Bangoria (7):
>   Uprobes: Simplify uprobe_register() body
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Fix multiple update of same reference counter
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Document about reference counter
>   perf probe: Support SDT markers having reference counter (semaphore)
> 
>  Documentation/trace/uprobetracer.rst |  16 +-
>  include/linux/uprobes.h              |   5 +
>  kernel/events/uprobes.c              | 502 +++++++++++++++++++++++++++++++----
>  kernel/trace/trace.c                 |   2 +-
>  kernel/trace/trace_uprobe.c          |  74 +++++-
>  tools/perf/util/probe-event.c        |  39 ++-
>  tools/perf/util/probe-event.h        |   1 +
>  tools/perf/util/probe-file.c         |  34 ++-
>  tools/perf/util/probe-file.h         |   1 +
>  tools/perf/util/symbol-elf.c         |  46 +++-
>  tools/perf/util/symbol.h             |   7 +
>  11 files changed, 643 insertions(+), 84 deletions(-)
> 
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
From: Andy Lutomirski @ 2018-06-08  0:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Jonathan Corbet,
	Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <5c39caf1-2198-3c2b-b590-8c38a525747f@linux.intel.com>

On Thu, Jun 7, 2018 at 1:30 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> +                                           unsigned long addr, pte_t *ptep)
> >> +{
> >> +       bool rw;
> >> +
> >> +       rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> +       if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> +               struct mm_struct *mm = vma->vm_mm;
> >> +               pte_t pte;
> >> +
> >> +               if (rw && (atomic_read(&mm->mm_users) > 1))
> >> +                       pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I found my notes on the subject. :)
>
> Here's the sequence that causes the problem.  This could happen any time
> we try to take a PTE from read-write to read-only.  P==Present, W=Write,
> D=Dirty:
>
> CPU0 does a write, sees PTE with P=1,W=1,D=0
> CPU0 decides to set D=1
> CPU1 comes in and sets W=0
> CPU0 does locked operation to set D=1
>         CPU0 sees P=1,W=0,D=0
>         CPU0 sets back P=1,W=0,D=1
> CPU0 loads P=1,W=0,D=1 into the TLB
> CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
> is generated because of the write fault.
>
> The problem with this is that we end up with a shadowstack-PTE
> (Write=0,Dirty=1) where we didn't want one.  This, unfortunately,
> imposes extra TLB flushing overhead on the R/W->R/O transitions that
> does not exist before shadowstack enabling.
>

So what exactly do the architects want the OS to do?  AFAICS the only
valid ways to clear the dirty bit are:

--- Choice 1 ---
a) Set P=0.
b) Flush using an IPI
c) Read D (so we know if the page was actually dirty)
d) Set P=1,W=0,D=0

and we need to handle spurious faults that happen between steps (a)
and (c).  This isn't so easy because the straightforward "is the fault
spurious" check is going to think it's *not* spurious.

--- Choice 2 ---
a) Set W=0
b) flush
c) Test and clear D

and we need to handle the spurious fault between b and c.  At least
this particular spurious fault is easier to handle since we can check
the error code.

But surely the right solution is to get the architecture team to see
if they can fix the dirty-bit-setting algorithm or, even better, to
look and see if the dirty-bit-setting algorithm is *already* better
and just document it.  If the cpu does a locked set-bit on D in your
example, the CPU is just being silly.  The CPU should make the whole
operation fully atomic: when trying to write to a page that's D=0 in
the TLB, it should re-walk the page tables and, atomically, load the
PTE and, if it's W=1,D=0, set D=1.  I'd honestly be a bit surprised if
modern CPUs don't already do this.

(Hmm.  If the CPUs were that smart, then we wouldn't need a flush at
all in some cases.  If we lock cmpxchg to change W=1,D=0 to W=0,D=0,
then we know that no other CPU can subsequently write the page without
re-walking, and we don't need to flush.)

Can you ask the architecture folks to clarify the situation?  And, if
your notes are indeed correct, don't we need code to handle spurious
faults?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 5/7] x86: Insert endbr32/endbr64 to vDSO
From: Andy Lutomirski @ 2018-06-08  0:47 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOphRmP4aEGGwmM=d+D7GyZWrC_hMTWaiy1tEtBd0PLdxA@mail.gmail.com>

On Thu, Jun 7, 2018 at 5:31 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 4:00 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 3:03 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Thu, Jun 7, 2018 at 1:50 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> > On Thu, Jun 7, 2018 at 7:42 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> >>
> >> >> From: "H.J. Lu" <hjl.tools@gmail.com>
> >> >>
> >> >> When Intel indirect branch tracking is enabled, functions in vDSO which
> >> >> may be called indirectly should have endbr32 or endbr64 as the first
> >> >> instruction.  We try to compile vDSO with -fcf-protection=branch -mibt
> >> >> if possible.  Otherwise, we insert endbr32 or endbr64 by hand to assembly
> >> >> codes generated by the compiler.
> >> >
> >> > Wow, that's... a genuine abomination.  Do we really need to support
> >> > CET on kernels built with old toolchains?
> >> >
> >>
> >> Yes.  GCC 7 should be able to build CET kernel.
> >>
> >
> > Why?  Presumably people running distros that use CET are going to have
> > kernels build with a CET-supporting compiler.
> >
>
> Good point.  It was needed before GCC 8 was released.   We can drop
> arch/x86/entry/vdso/endbr.sh now.
>

Phew!  We'll still need a patch to prevent configuring CET on a
compiler that can't support it.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 5/7] x86: Insert endbr32/endbr64 to vDSO
From: H.J. Lu @ 2018-06-08  0:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrXOc3-2QLks3iaTTvEE=4+2Oi3PORa8HPbYiDTtKV6CLQ@mail.gmail.com>

On Thu, Jun 7, 2018 at 4:00 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 3:03 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 1:50 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Thu, Jun 7, 2018 at 7:42 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>
>> >> From: "H.J. Lu" <hjl.tools@gmail.com>
>> >>
>> >> When Intel indirect branch tracking is enabled, functions in vDSO which
>> >> may be called indirectly should have endbr32 or endbr64 as the first
>> >> instruction.  We try to compile vDSO with -fcf-protection=branch -mibt
>> >> if possible.  Otherwise, we insert endbr32 or endbr64 by hand to assembly
>> >> codes generated by the compiler.
>> >
>> > Wow, that's... a genuine abomination.  Do we really need to support
>> > CET on kernels built with old toolchains?
>> >
>>
>> Yes.  GCC 7 should be able to build CET kernel.
>>
>
> Why?  Presumably people running distros that use CET are going to have
> kernels build with a CET-supporting compiler.
>

Good point.  It was needed before GCC 8 was released.   We can drop
arch/x86/entry/vdso/endbr.sh now.

-- 
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-07 23:01 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOr49V8rqRa_KVsw61PWd+crkQvPDgPKtvowazjmsfgWWQ@mail.gmail.com>

On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >> > >
> >> > > The following operations are provided.
> >> > >
> >> > > ARCH_CET_STATUS:
> >> > >         return the current CET status
> >> > >
> >> > > ARCH_CET_DISABLE:
> >> > >         disable CET features
> >> > >
> >> > > ARCH_CET_LOCK:
> >> > >         lock out CET features
> >> > >
> >> > > ARCH_CET_EXEC:
> >> > >         set CET features for exec()
> >> > >
> >> > > ARCH_CET_ALLOC_SHSTK:
> >> > >         allocate a new shadow stack
> >> > >
> >> > > ARCH_CET_PUSH_SHSTK:
> >> > >         put a return address on shadow stack
> >> > >
> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
> >> > > the implementation of GLIBC ucontext related APIs.
> >> >
> >> > Please document exactly what these all do and why.  I don't understand
> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
> >> > each ELF program, so I think there should be no need for a magic
> >> > override.
> >>
> >> CET is initially enabled if the loader has CET capability.  Then the
> >> loader decides if the application can run with CET.  If the application
> >> cannot run with CET (e.g. a dependent library does not have CET), then
> >> the loader turns off CET before passing to the application.  When the
> >> loader is done, it locks out CET and the feature cannot be turned off
> >> anymore until the next exec() call.
> >
> > Why is the lockout necessary?  If user code enables CET and tries to
> > run code that doesn't support CET, it will crash.  I don't see why we
> > need special code in the kernel to prevent a user program from calling
> > arch_prctl() and crashing itself.  There are already plenty of ways to
> > do that :)
>
> On CET enabled machine, not all programs nor shared libraries are
> CET enabled.  But since ld.so is CET enabled, all programs start
> as CET enabled.  ld.so will disable CET if a program or any of its shared
> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
> checking so that CET can't no longer be disabled afterwards.

Yeah, I got that.  No one has explained *why*.

(Also, shouldn't the vDSO itself be marked as supporting CET?)
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 5/7] x86: Insert endbr32/endbr64 to vDSO
From: Andy Lutomirski @ 2018-06-07 23:00 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOp9YtNnTzsDPFXukK6ps7VNa2MU6o0GwnoES5Ma_AC65A@mail.gmail.com>

On Thu, Jun 7, 2018 at 3:03 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 1:50 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 7:42 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> From: "H.J. Lu" <hjl.tools@gmail.com>
> >>
> >> When Intel indirect branch tracking is enabled, functions in vDSO which
> >> may be called indirectly should have endbr32 or endbr64 as the first
> >> instruction.  We try to compile vDSO with -fcf-protection=branch -mibt
> >> if possible.  Otherwise, we insert endbr32 or endbr64 by hand to assembly
> >> codes generated by the compiler.
> >
> > Wow, that's... a genuine abomination.  Do we really need to support
> > CET on kernels built with old toolchains?
> >
>
> Yes.  GCC 7 should be able to build CET kernel.
>

Why?  Presumably people running distros that use CET are going to have
kernels build with a CET-supporting compiler.

If we really really need this patch, then I want some kind of
assurance that selftests will catch the failure if something breaks it
or a new vDSO entry point is added.  But my inclination is to NAK this
patch and let the distros carry it if they really really want it.  As
it stands, this sucks for maintainability.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 5/7] x86: Insert endbr32/endbr64 to vDSO
From: H.J. Lu @ 2018-06-07 22:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrVRH8LeYoo7V1VBPqg4WS0Enxtizt=T7dPvgoeWfJrdzA@mail.gmail.com>

On Thu, Jun 7, 2018 at 1:50 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 7:42 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> From: "H.J. Lu" <hjl.tools@gmail.com>
>>
>> When Intel indirect branch tracking is enabled, functions in vDSO which
>> may be called indirectly should have endbr32 or endbr64 as the first
>> instruction.  We try to compile vDSO with -fcf-protection=branch -mibt
>> if possible.  Otherwise, we insert endbr32 or endbr64 by hand to assembly
>> codes generated by the compiler.
>
> Wow, that's... a genuine abomination.  Do we really need to support
> CET on kernels built with old toolchains?
>

Yes.  GCC 7 should be able to build CET kernel.


-- 
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-07 22:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrXz3WWgZwUXJsDTWvmqKUArQFuMH1xJdSLVKFpTysNWxg@mail.gmail.com>

On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> > >
>> > > The following operations are provided.
>> > >
>> > > ARCH_CET_STATUS:
>> > >         return the current CET status
>> > >
>> > > ARCH_CET_DISABLE:
>> > >         disable CET features
>> > >
>> > > ARCH_CET_LOCK:
>> > >         lock out CET features
>> > >
>> > > ARCH_CET_EXEC:
>> > >         set CET features for exec()
>> > >
>> > > ARCH_CET_ALLOC_SHSTK:
>> > >         allocate a new shadow stack
>> > >
>> > > ARCH_CET_PUSH_SHSTK:
>> > >         put a return address on shadow stack
>> > >
>> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> > > the implementation of GLIBC ucontext related APIs.
>> >
>> > Please document exactly what these all do and why.  I don't understand
>> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> > each ELF program, so I think there should be no need for a magic
>> > override.
>>
>> CET is initially enabled if the loader has CET capability.  Then the
>> loader decides if the application can run with CET.  If the application
>> cannot run with CET (e.g. a dependent library does not have CET), then
>> the loader turns off CET before passing to the application.  When the
>> loader is done, it locks out CET and the feature cannot be turned off
>> anymore until the next exec() call.
>
> Why is the lockout necessary?  If user code enables CET and tries to
> run code that doesn't support CET, it will crash.  I don't see why we
> need special code in the kernel to prevent a user program from calling
> arch_prctl() and crashing itself.  There are already plenty of ways to
> do that :)

On CET enabled machine, not all programs nor shared libraries are
CET enabled.  But since ld.so is CET enabled, all programs start
as CET enabled.  ld.so will disable CET if a program or any of its shared
libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
checking so that CET can't no longer be disabled afterwards.

>> When the next exec() is called, CET
>> feature is turned on/off based on the values set by ARCH_CET_EXEC.
>
> And why do we need ARCH_CET_EXEC?
>
> For background, I really really dislike adding new state that persists
> across exec().  It's nice to get as close to a clean slate as possible
> after exec() so that programs can run in a predictable environment.
> exec() is also a security boundary, and anything a task can do to
> affect itself after exec() needs to have its security implications
> considered very carefully.  (As a trivial example, you should not be
> able to use cetcmd ... sudo [malicious options here] to cause sudo to
> run with CET off and then try to exploit it via the malicious options.
>
> If a shutoff is needed for testing, how about teaching ld.so to parse
> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>

I will take a look.


-- 
H.J.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox