Linux Documentation


* [PATCH RESEND v4 1/2] arm64: KVM: export the capability to set guest SError syndrome
From: Dongjiu Geng @ 2018-06-08 19:48 UTC
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1528487320-2873-1-git-send-email-gengdongjiu@huawei.com>

With the arm64 RAS Extension, user space can inject a virtual SError
with a specified ESR. User space therefore needs to know whether KVM
supports injecting such an SError; this patch adds a capability that it
can query.

KVM checks whether the system supports the RAS Extension. If it does, KVM
returns true to user space; otherwise it returns false.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
---
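For illustration only (not part of the patch): a minimal sketch of how
user space might probe this capability with KVM_CHECK_EXTENSION. The helper
name kvm_can_set_serror_esr() is hypothetical, and the fallback value 155 is
the number proposed in this patch, which may differ from what is finally
merged.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallback for older headers: 155 is the value proposed by this patch
 * and is an assumption, not necessarily the merged number. */
#ifndef KVM_CAP_ARM_INJECT_SERROR_ESR
#define KVM_CAP_ARM_INJECT_SERROR_ESR 155
#endif

/*
 * Hypothetical helper: returns 1 if KVM reports that user space may
 * specify the SError syndrome, 0 otherwise (including on ioctl failure).
 * kvm_fd is expected to be an open descriptor for /dev/kvm.
 */
static int kvm_can_set_serror_esr(int kvm_fd)
{
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
		      (unsigned long)KVM_CAP_ARM_INJECT_SERROR_ESR);

	/* KVM_CHECK_EXTENSION returns 0 for "unsupported", -1 on error. */
	return r > 0;
}
```

A VMM would typically call this once at start-up and only offer a
user-specified syndrome when it returns 1.
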
 Documentation/virtual/kvm/api.txt | 11 +++++++++++
 arch/arm64/kvm/reset.c            |  3 +++
 include/uapi/linux/kvm.h          |  1 +
 3 files changed, 15 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 758bf40..fdac969 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4603,3 +4603,14 @@ Architectures: s390
 This capability indicates that kvm will implement the interfaces to handle
 reset, migration and nested KVM for branch prediction blocking. The stfle
 facility 82 should not be provided to the guest without this capability.
+
+8.14 KVM_CAP_ARM_INJECT_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify the syndrome value reported
+to the guest OS when the guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can only specify the ISS field of the ESR
+syndrome; it cannot specify the EC field, which is not under KVM's control.
+If this virtual SError is taken to EL1 using AArch64, this value will be reported
+in the ISS field of ESR_EL1.
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
 		break;
+	case KVM_CAP_ARM_INJECT_SERROR_ESR:
+		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b02c41e..e88f976 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -948,6 +948,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_BPB 152
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
+#define KVM_CAP_ARM_INJECT_SERROR_ESR 155
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH RESEND v4 0/2] support exception state migration and set VSESR_EL2 by user space
From: Dongjiu Geng @ 2018-06-08 19:48 UTC
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi

This patch series was split out from https://www.spinics.net/lists/kvm/msg168917.html

1. Detect whether KVM can set the guest SError syndrome
2. Support setting VSESR_EL2 and injecting an SError from user space
3. Support live migration by preserving the SError pending state and the VSESR_EL2 value

The user space patch is here: https://www.mail-archive.com/qemu-devel@nongnu.org/msg539534.html

Dongjiu Geng (2):
  arm64: KVM: export the capability to set guest SError syndrome
  arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

 Documentation/virtual/kvm/api.txt    | 42 +++++++++++++++++++++++++++++++++---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm/include/uapi/asm/kvm.h      | 12 +++++++++++
 arch/arm/kvm/guest.c                 | 12 +++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_host.h    |  7 ++++++
 arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++
 arch/arm64/kvm/guest.c               | 36 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  6 +++---
 arch/arm64/kvm/reset.c               |  4 ++++
 include/uapi/linux/kvm.h             |  1 +
 virt/kvm/arm/arm.c                   | 19 ++++++++++++++++
 12 files changed, 157 insertions(+), 6 deletions(-)

-- 
2.7.4


* [PATCH RESEND v4 2/2] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS
From: Dongjiu Geng @ 2018-06-08 19:48 UTC
  To: rkrcmar, corbet, christoffer.dall, marc.zyngier, linux,
	catalin.marinas, will.deacon, kvm, linux-doc, james.morse,
	gengdongjiu, linux-arm-kernel, linux-kernel, linux-acpi
In-Reply-To: <1528487320-2873-1-git-send-email-gengdongjiu@huawei.com>

When migrating a VM, user space may need to know the exception
state. For example, if KVM has made an SError pending on machine A,
then after migration to machine B, KVM also needs to pend an SError.

This new ioctl exports user-invisible state related to SError.
Together with appropriate user space changes, user space can get/set
the SError exception state to support migration, snapshot, and suspend.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
change since v3:
1. Fix the memset() issue in the kvm_arm_vcpu_get_events()

change since v2:
1. Add kvm_vcpu_events structure definition for arm platform to avoid the build errors.

change since v1:
Address Marc's comments; thanks to Marc for the review.
1. serror_has_esr is always true when ARM64_HAS_RAS_EXTN is set
2. Remove spurious blank line in kvm_arm_vcpu_set_events()
3. Rename pend_guest_serror() to kvm_set_sei_esr()
4. Make kvm_arm_vcpu_get_events() do all the work rather than having this split responsibility.
5. Use sizeof(events) instead of sizeof(struct kvm_vcpu_events)

This patch series was split out from https://www.spinics.net/lists/kvm/msg168917.html
The user space patch is here: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html

change since V12:
1. Change (vcpu->arch.hcr_el2 & HCR_VSE) to !!(vcpu->arch.hcr_el2 & HCR_VSE) in kvm_arm_vcpu_get_events()

Change since V11:
Address James's comments; thanks, James.
1. Align the kvm_vcpu_events struct to 64 bytes
2. Avoid exposing the stale ESR value in kvm_arm_vcpu_get_events()
3. Rename the variable 'injected' to 'serror_pending' in kvm_arm_vcpu_set_events()
4. Change sizeof(struct kvm_vcpu_events) to sizeof(events) in kvm_arch_vcpu_ioctl()

Change since V10:
Address James's comments; thanks, James.
1. Merge the helper function with the user.
2. Move the ISS_MASK into pend_guest_serror() to clear top bits
3. Make kvm_vcpu_events struct align to 4 bytes
4. Add some checks in kvm_arm_vcpu_set_events()
5. Check the return value of kvm_arm_vcpu_get/set_events().
6. Initialise kvm_vcpu_events to 0 so that padding transferred to user-space doesn't
   contain kernel stack.
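
For illustration only (not part of the patch): a small user-space model of
the get/set logic in this patch, showing how a pending SError would survive
a migration round trip. All model_* names are hypothetical. HCR_VSE and the
ESR_ELx_* constants are restated from the arm64 headers as an assumption and
should be checked against the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, restated from the arm64 kernel headers. */
#define HCR_VSE          (UINT64_C(1) << 8)
#define ESR_ELx_ISV      (UINT64_C(1) << 24)
#define ESR_ELx_ISS_MASK ((UINT64_C(1) << 25) - 1)	/* bits [24:0] */

/* Same layout as the struct kvm_vcpu_events proposed for arm/arm64. */
struct model_vcpu_events {
	struct {
		uint8_t serror_pending;
		uint8_t serror_has_esr;
		uint8_t pad[6];
		uint64_t serror_esr;
	} exception;
	uint32_t reserved[12];
};

/* The changelog asks for a 64-byte structure with an 8-byte aligned ESR. */
_Static_assert(sizeof(struct model_vcpu_events) == 64, "unexpected size");
_Static_assert(offsetof(struct model_vcpu_events, exception.serror_esr) == 8,
	       "unexpected serror_esr offset");

/* Hypothetical stand-in for the vcpu state the patch touches. */
struct model_vcpu {
	uint64_t hcr_el2;
	uint64_t vsesr_el2;
	bool has_ras;		/* models cpus_have_const_cap(ARM64_HAS_RAS_EXTN) */
};

static void model_get_events(const struct model_vcpu *v,
			     struct model_vcpu_events *e)
{
	memset(e, 0, sizeof(*e));	/* no stale data in padding */
	e->exception.serror_pending = !!(v->hcr_el2 & HCR_VSE);
	e->exception.serror_has_esr = v->has_ras;
	if (e->exception.serror_pending && e->exception.serror_has_esr)
		e->exception.serror_esr = v->vsesr_el2;
}

static int model_set_events(struct model_vcpu *v,
			    const struct model_vcpu_events *e)
{
	bool pending = e->exception.serror_pending;
	bool has_esr = e->exception.serror_has_esr;

	if (pending && has_esr) {
		if (!v->has_ras)
			return -1;	/* the patch returns -EINVAL here */
		v->vsesr_el2 = e->exception.serror_esr & ESR_ELx_ISS_MASK;
		v->hcr_el2 |= HCR_VSE;
	} else if (pending) {
		/* Mirrors kvm_inject_vabt(): pend with the default syndrome. */
		v->vsesr_el2 = ESR_ELx_ISV & ESR_ELx_ISS_MASK;
		v->hcr_el2 |= HCR_VSE;
	}
	return 0;
}
```

In this model, saving on the source with model_get_events() and restoring
on the destination with model_set_events() reproduces both the pending bit
and the ISS value, while a destination without the RAS Extension rejects an
event that carries a syndrome.
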
---
 Documentation/virtual/kvm/api.txt    | 31 ++++++++++++++++++++++++++++---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm/include/uapi/asm/kvm.h      | 12 ++++++++++++
 arch/arm/kvm/guest.c                 | 12 ++++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_host.h    |  7 +++++++
 arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++++
 arch/arm64/kvm/guest.c               | 36 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  6 +++---
 arch/arm64/kvm/reset.c               |  1 +
 virt/kvm/arm/arm.c                   | 19 +++++++++++++++++++
 11 files changed, 142 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fdac969..8896737 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -835,11 +835,13 @@ struct kvm_clock_data {
 
 Capability: KVM_CAP_VCPU_EVENTS
 Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
+Architectures: x86, arm, arm64
 Type: vm ioctl
 Parameters: struct kvm_vcpu_event (out)
 Returns: 0 on success, -1 on error
 
+X86:
+
 Gets currently pending exceptions, interrupts, and NMIs as well as related
 states of the vcpu.
 
@@ -881,15 +883,32 @@ Only two fields are defined in the flags field:
 - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
   smi contains a valid state.
 
+ARM, ARM64:
+
+Gets currently pending SError exceptions as well as related states of the vcpu.
+
+struct kvm_vcpu_events {
+	struct {
+		__u8 serror_pending;
+		__u8 serror_has_esr;
+		/* Align it to 8 bytes */
+		__u8 pad[6];
+		__u64 serror_esr;
+	} exception;
+	__u32 reserved[12];
+};
+
 4.32 KVM_SET_VCPU_EVENTS
 
-Capability: KVM_CAP_VCPU_EVENTS
+Capability: KVM_CAP_VCPU_EVENTS
 Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
+Architectures: x86, arm, arm64
 Type: vm ioctl
 Parameters: struct kvm_vcpu_event (in)
 Returns: 0 on success, -1 on error
 
+X86:
+
 Set pending exceptions, interrupts, and NMIs as well as related states of the
 vcpu.
 
@@ -910,6 +929,12 @@ shall be written into the VCPU.
 
 KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
 
+ARM, ARM64:
+
+Set pending SError exceptions as well as related states of the vcpu.
+
+See KVM_GET_VCPU_EVENTS for the data structure.
+
 
 4.33 KVM_GET_DEBUGREGS
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c7c28c8..39f9901 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -213,6 +213,12 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index caae484..c3e6975 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -124,6 +124,18 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/* for KVM_GET/SET_VCPU_EVENTS */
+struct kvm_vcpu_events {
+	struct {
+		__u8 serror_pending;
+		__u8 serror_has_esr;
+		/* Align it to 8 bytes */
+		__u8 pad[6];
+		__u64 serror_esr;
+	} exception;
+	__u32 reserved[12];
+};
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index a18f33e..c685f0e 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -261,6 +261,18 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	return -EINVAL;
+}
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 1dab3a9..18f61ff 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -81,6 +81,11 @@ static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
 	return (unsigned long *)&vcpu->arch.hcr_el2;
 }
 
+static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.vsesr_el2;
+}
+
 static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
 {
 	vcpu->arch.vsesr_el2 = vsesr;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 469de8a..357304a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -335,6 +335,11 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
@@ -363,6 +368,8 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
 int kvm_perf_init(void);
 int kvm_perf_teardown(void);
 
+void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
+
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
 void __kvm_set_tpidr_el2(u64 tpidr_el2);
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 04b3256..df4faee 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -39,6 +39,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_READONLY_MEM
+#define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
@@ -153,6 +154,18 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };
 
+/* for KVM_GET/SET_VCPU_EVENTS */
+struct kvm_vcpu_events {
+	struct {
+		__u8 serror_pending;
+		__u8 serror_has_esr;
+		/* Align it to 8 bytes */
+		__u8 pad[6];
+		__u64 serror_esr;
+	} exception;
+	__u32 reserved[12];
+};
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 56a0260..4426915 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -289,6 +289,42 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	memset(events, 0, sizeof(*events));
+
+	events->exception.serror_pending = !!(vcpu->arch.hcr_el2 & HCR_VSE);
+	events->exception.serror_has_esr =
+					cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+
+	if (events->exception.serror_pending &&
+		events->exception.serror_has_esr)
+		events->exception.serror_esr = vcpu_get_vsesr(vcpu);
+	else
+		events->exception.serror_esr = 0;
+
+	return 0;
+}
+
+int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
+			struct kvm_vcpu_events *events)
+{
+	bool serror_pending = events->exception.serror_pending;
+	bool has_esr = events->exception.serror_has_esr;
+
+	if (serror_pending && has_esr) {
+		if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
+			return -EINVAL;
+
+		kvm_set_sei_esr(vcpu, events->exception.serror_esr);
+	} else if (serror_pending) {
+		kvm_inject_vabt(vcpu);
+	}
+
+	return 0;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index d8e7165..a55e91d 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -164,9 +164,9 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 		inject_undef64(vcpu);
 }
 
-static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
+void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 esr)
 {
-	vcpu_set_vsesr(vcpu, esr);
+	vcpu_set_vsesr(vcpu, esr & ESR_ELx_ISS_MASK);
 	*vcpu_hcr(vcpu) |= HCR_VSE;
 }
 
@@ -184,5 +184,5 @@ static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	pend_guest_serror(vcpu, ESR_ELx_ISV);
+	kvm_set_sei_esr(vcpu, ESR_ELx_ISV);
 }
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 38c8a64..20e919a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -82,6 +82,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
+	case KVM_CAP_VCPU_EVENTS:
 		r = 1;
 		break;
 	default:
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a4c1b76..79ecba9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1107,6 +1107,25 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = kvm_arm_vcpu_has_attr(vcpu, &attr);
 		break;
 	}
+	case KVM_GET_VCPU_EVENTS: {
+		struct kvm_vcpu_events events;
+
+		if (kvm_arm_vcpu_get_events(vcpu, &events))
+			return -EINVAL;
+
+		if (copy_to_user(argp, &events, sizeof(events)))
+			return -EFAULT;
+
+		return 0;
+	}
+	case KVM_SET_VCPU_EVENTS: {
+		struct kvm_vcpu_events events;
+
+		if (copy_from_user(&events, argp, sizeof(events)))
+			return -EFAULT;
+
+		return kvm_arm_vcpu_set_events(vcpu, &events);
+	}
 	default:
 		r = -EINVAL;
 	}
-- 
2.7.4


* Re: [PATCH 03/10] x86/cet: Signal handling for shadow stack
From: Cyrill Gorcunov @ 2018-06-08 12:07 UTC
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, Florian Weimer, Dmitry Safonov, LKML, linux-doc,
	Linux-MM, linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar,
	Dave Hansen, Jonathan Corbet, Oleg Nesterov, Arnd Bergmann,
	mike.kravetz
In-Reply-To: <CALCETrXAoPsHK49c1Dpa8N0ccsxjwnVOTktKVaY++xjHxdmUzg@mail.gmail.com>

On Thu, Jun 07, 2018 at 01:57:03PM -0700, Andy Lutomirski wrote:
...
> >
> > I didn't read the whole series of patches in details
> > yet, hopefully will be able tomorrow. Thanks Andy for
> > CC'ing!
> 
> We have uc_flags.  It might be useful to carve out some of the flag
> space (24 bits?) to indicate something like the *size* of sigcontext
> and teach the kernel that new sigcontext fields should only be parsed
> on sigreturn() if the size is large enough.

Yes, this should do the trick.
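
For illustration only: a rough sketch of the size-in-uc_flags idea quoted
above. The bit position (bits 8..31) and every name below are hypothetical;
nothing here reflects an existing kernel ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 24 bits of uc_flags, starting at bit 8,
 * carry the size of the sigcontext written by the kernel. */
#define UC_SIGCTX_SIZE_SHIFT 8
#define UC_SIGCTX_SIZE_MASK  (UINT64_C(0xFFFFFF) << UC_SIGCTX_SIZE_SHIFT)

/* Store a size in the carved-out bits, leaving other flags untouched. */
static uint64_t uc_flags_set_size(uint64_t uc_flags, uint32_t size)
{
	uc_flags &= ~UC_SIGCTX_SIZE_MASK;
	return uc_flags |
	       (((uint64_t)size << UC_SIGCTX_SIZE_SHIFT) & UC_SIGCTX_SIZE_MASK);
}

static uint32_t uc_flags_get_size(uint64_t uc_flags)
{
	return (uint32_t)((uc_flags & UC_SIGCTX_SIZE_MASK) >>
			  UC_SIGCTX_SIZE_SHIFT);
}

/* A field would be parsed on sigreturn() only if it lies entirely
 * within the advertised size. */
static bool sigctx_field_present(uint64_t uc_flags, size_t offset, size_t len)
{
	size_t size = uc_flags_get_size(uc_flags);

	return offset <= size && len <= size - offset;
}
```

With this encoding, a frame built by old code advertises size 0, so any new
field (such as a shadow stack pointer) would simply be ignored on sigreturn().
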

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-08 12:17 UTC
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrWd+1iNmt36EFiLxMv8bQ-GodU=XygPRGb4h+xanhHHLQ@mail.gmail.com>

On Thu, Jun 7, 2018 at 9:35 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 9:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 3:02 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> > On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>>
>> >>> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >>> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >>> > >
>> >>> > > The following operations are provided.
>> >>> > >
>> >>> > > ARCH_CET_STATUS:
>> >>> > >         return the current CET status
>> >>> > >
>> >>> > > ARCH_CET_DISABLE:
>> >>> > >         disable CET features
>> >>> > >
>> >>> > > ARCH_CET_LOCK:
>> >>> > >         lock out CET features
>> >>> > >
>> >>> > > ARCH_CET_EXEC:
>> >>> > >         set CET features for exec()
>> >>> > >
>> >>> > > ARCH_CET_ALLOC_SHSTK:
>> >>> > >         allocate a new shadow stack
>> >>> > >
>> >>> > > ARCH_CET_PUSH_SHSTK:
>> >>> > >         put a return address on shadow stack
>> >>> > >
>>
>> >> And why do we need ARCH_CET_EXEC?
>> >>
>> >> For background, I really really dislike adding new state that persists
>> >> across exec().  It's nice to get as close to a clean slate as possible
>> >> after exec() so that programs can run in a predictable environment.
>> >> exec() is also a security boundary, and anything a task can do to
>> >> affect itself after exec() needs to have its security implications
>> >> considered very carefully.  (As a trivial example, you should not be
>> >> able to use cetcmd ... sudo [malicious options here] to cause sudo to
>> >> run with CET off and then try to exploit it via the malicious options.
>> >>
>> >> If a shutoff is needed for testing, how about teaching ld.so to parse
>> >> LD_CET=no or similar and protect it the same way as LD_PRELOAD is
>> >> protected.  Or just do LD_PRELOAD=/lib/libdoesntsupportcet.so.
>> >>
>> >
>> > I will take a look.
>>
>> We can use LD_CET to turn off CET.   Since most of legacy binaries
>> are compatible with shadow stack,  ARCH_CET_EXEC can be used
>> to turn on shadow stack on legacy binaries:
>
> Is there any reason you can't use LD_CET=force to do it for
> dynamically linked binaries?

We need to enable shadow stack from the start.  Otherwise function
return will fail when returning from callee with shadow stack to caller
without shadow stack.

> I find it quite hard to believe that forcibly CET-ifying a legacy
> statically linked binary is a good idea.

We'd like to provide protection as much as we can.

-- 
H.J.

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: H.J. Lu @ 2018-06-08 12:24 UTC
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Shanbhogue, Vedvyas,
	Ravi V. Shankar, Dave Hansen, Jonathan Corbet, Oleg Nesterov,
	Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrWhMmqGWKx-yw55YKHMJwGyLZio5f8Pskh8X69zfQMy7A@mail.gmail.com>

On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> > On Thu, Jun 7, 2018 at 3:02 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >>
>> >> On Thu, Jun 7, 2018 at 2:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> >> > On Thu, Jun 7, 2018 at 1:33 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> >>
>> >> >> On Thu, 2018-06-07 at 11:48 -0700, Andy Lutomirski wrote:
>> >> >> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>> >> >> > >
>> >> >> > > The following operations are provided.
>> >> >> > >
>> >> >> > > ARCH_CET_STATUS:
>> >> >> > >         return the current CET status
>> >> >> > >
>> >> >> > > ARCH_CET_DISABLE:
>> >> >> > >         disable CET features
>> >> >> > >
>> >> >> > > ARCH_CET_LOCK:
>> >> >> > >         lock out CET features
>> >> >> > >
>> >> >> > > ARCH_CET_EXEC:
>> >> >> > >         set CET features for exec()
>> >> >> > >
>> >> >> > > ARCH_CET_ALLOC_SHSTK:
>> >> >> > >         allocate a new shadow stack
>> >> >> > >
>> >> >> > > ARCH_CET_PUSH_SHSTK:
>> >> >> > >         put a return address on shadow stack
>> >> >> > >
>> >> >> > > ARCH_CET_ALLOC_SHSTK and ARCH_CET_PUSH_SHSTK are intended only for
>> >> >> > > the implementation of GLIBC ucontext related APIs.
>> >> >> >
>> >> >> > Please document exactly what these all do and why.  I don't understand
>> >> >> > what purpose ARCH_CET_LOCK and ARCH_CET_EXEC serve.  CET is opt in for
>> >> >> > each ELF program, so I think there should be no need for a magic
>> >> >> > override.
>> >> >>
>> >> >> CET is initially enabled if the loader has CET capability.  Then the
>> >> >> loader decides if the application can run with CET.  If the application
>> >> >> cannot run with CET (e.g. a dependent library does not have CET), then
>> >> >> the loader turns off CET before passing to the application.  When the
>> >> >> loader is done, it locks out CET and the feature cannot be turned off
>> >> >> anymore until the next exec() call.
>> >> >
>> >> > Why is the lockout necessary?  If user code enables CET and tries to
>> >> > run code that doesn't support CET, it will crash.  I don't see why we
>> >> > need special code in the kernel to prevent a user program from calling
>> >> > arch_prctl() and crashing itself.  There are already plenty of ways to
>> >> > do that :)
>> >>
>> >> On CET enabled machine, not all programs nor shared libraries are
>> >> CET enabled.  But since ld.so is CET enabled, all programs start
>> >> as CET enabled.  ld.so will disable CET if a program or any of its shared
>> >> libraries aren't CET enabled.  ld.so will lock up CET once it is done CET
>> >> checking so that CET can't no longer be disabled afterwards.
>> >
>> > Yeah, I got that.  No one has explained *why*.
>>
>> It is to prevent malicious code from disabling CET.
>>
>
> By the time malicious code issue its own syscalls, you've already lost
> the battle.  I could probably be convinced that a lock-CET-on feature
> that applies *only* to the calling thread and is not inherited by
> clone() is a decent idea, but I'd want to see someone who understands
> the state of the art in exploit design justify it.  You're also going
> to need to figure out how to make CRIU work if you allow locking CET
> on.
>
> A priori, I think we should just not provide a lock mechanism.

We need a door for CET.  But it is a very bad idea to leave it open
all the time.  I don't know much about CRIU.  If it is Checkpoint/Restore
In Userspace, can you freeze an application with AVX512 on an AVX512
machine and restore it on a non-AVX512 machine?

>> > (Also, shouldn't the vDSO itself be marked as supporting CET?)
>>
>> No. vDSO is loaded by kernel.  vDSO in CET kernel is CET
>> compatible.
>>
>
> I think the vDSO should do its best to act like a real DSO.  That
> means that, if the vDSO supports CET, it should advertise support for
> CET using the Linux ABI.  Since you're going to require GCC 8 anyway,
> this should be a single line of code in the Makefile.

Sure.  A couple lines.

-- 
H.J.

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
From: Florian Weimer @ 2018-06-08 14:53 UTC
  To: Andy Lutomirski
  Cc: Yu-cheng Yu, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrWPDXpbVcuFK_1M5DtaCOW_LSf-XHFAD0vpc735oFWLPg@mail.gmail.com>

On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
>>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>>
>>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>>>> needs a separate program stack and a separate shadow stack.
>>>> This patch handles allocation and freeing of the thread shadow
>>>> stack.
>>>
>>> Aha -- you're trying to make this automatic.  I'm not convinced this
>>> is a good idea.  The Linux kernel has a long and storied history of
>>> enabling new hardware features in ways that are almost entirely
>>> useless for userspace.
>>>
>>> Florian, do you have any thoughts on how the user/kernel interaction
>>> for the shadow stack should work?
>>
>> I have not looked at this in detail, have not played with the emulator,
>> and have not been privy to any discussions before these patches have
>> been posted, however …
>>
>> I believe that we want as little code in userspace for shadow stack
>> management as possible.  One concern I have is that even with the code
>> we arguably need for various kinds of stack unwinding, we might have
>> unwittingly built a generic trampoline that leads to full CET bypass.
> 
> I was imagining an API like "allocate a shadow stack for the current
> thread, fail if the current thread already has one, and turn on the
> shadow stack".  glibc would call clone and then call this ABI pretty
> much immediately (i.e. before making any calls from which it expects
> to return).

Ahh.  So you propose not to enable shadow stack enforcement on the new 
thread even if it is enabled for the current thread?  For the cases 
where CLONE_VM is involved?

It will still need a new assembler wrapper which sets up the shadow 
stack, and it's probably required to disable signals.

I think it should be reasonably safe and actually implementable.  But 
the benefits are not immediately obvious to me.

> We definitely want strong enough user control that tools like CRIU can
> continue to work.  I haven't looked at the SDM recently enough to
> remember for sure, but I'm reasonably confident that user code can
> learn the address of its own shadow stack.  If nothing else, CRIU
> needs to be able to restore from a context where there's a signal on
> the stack and the signal frame contains a shadow stack pointer.

CRIU also needs the shadow stack *contents*, which shouldn't be directly 
available to the process.  So it needs very special interfaces anyway.

Does CRIU implement MPX support?

Thanks,
Florian

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Andy Lutomirski @ 2018-06-08 14:57 UTC
  To: H. J. Lu, Cyrill Gorcunov, Dmitry Safonov
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CAMe9rOpLDzWk=xdZqN1QJVnP-c_dti5Fy=C_GqbeQpS_a=0ewA@mail.gmail.com>

On Fri, Jun 8, 2018 at 5:24 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >>
> >
> > By the time malicious code issue its own syscalls, you've already lost
> > the battle.  I could probably be convinced that a lock-CET-on feature
> > that applies *only* to the calling thread and is not inherited by
> > clone() is a decent idea, but I'd want to see someone who understands
> > the state of the art in exploit design justify it.  You're also going
> > to need to figure out how to make CRIU work if you allow locking CET
> > on.
> >
> > A priori, I think we should just not provide a lock mechanism.
>
> We need a door for CET.  But it is a very bad idea to leave it open
> all the time.  I don't know much about CRIU,  If it is Checkpoint/Restore
> In Userspace.  Can you free any application with AVX512 on AVX512
> machine and restore it on non-AVX512 machine?

Presumably not -- if the program uses AVX512 and AVX512 goes away,
then the program won't be happy.

Anyway, having thought about this, here's a straw man proposal.  We
add a lock flag like in these patches.  The lock flag is set by
arch_prctl(), inherited on clone, and cleared on exec().  ptrace()
gains a new API to clear the lock flag and can modify the CET
configuration regardless of the lock flag.  (So ptrace() needs APIs to
read and write SSP, to read and write the shadow stack itself, and to
change the mode.)  By the time an attacker has gotten enough control
of a victim process to get it to use ptrace(), I don't think that
trying to protect CET serves any purpose.
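
The straw man above can be modelled as a small state machine.  The C
sketch below is only an illustration: struct demo_cet_state and every
function name in it are hypothetical and are not part of the patch
series.  It also assumes that a locked task cannot turn its own shadow
stack off, which is what "lock" is taken to mean here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task CET state; field names are illustrative only. */
struct demo_cet_state {
	bool shstk_enabled;	/* shadow stack currently on */
	bool locked;		/* task may no longer change its CET mode */
};

/* arch_prctl()-style request from the task itself: set the lock flag. */
void demo_cet_lock(struct demo_cet_state *s)
{
	assert(s != NULL);
	s->locked = true;
}

/* The task asks to disable its own shadow stack; refused when locked. */
int demo_cet_disable_self(struct demo_cet_state *s)
{
	assert(s != NULL);
	if (s->locked)
		return -1;
	s->shstk_enabled = false;
	return 0;
}

/* clone(): the child inherits both the mode and the lock flag. */
struct demo_cet_state demo_cet_on_clone(const struct demo_cet_state *parent)
{
	assert(parent != NULL);
	return *parent;
}

/* exec(): the lock flag is cleared; the new binary decides the mode. */
struct demo_cet_state demo_cet_on_exec(bool new_binary_wants_cet)
{
	struct demo_cet_state s = {
		.shstk_enabled = new_binary_wants_cet,
		.locked = false,
	};
	return s;
}

/* ptrace(): a tracer may clear the lock flag ... */
void demo_cet_ptrace_unlock(struct demo_cet_state *s)
{
	assert(s != NULL);
	s->locked = false;
}

/* ... and may change the mode regardless of the lock flag. */
void demo_cet_ptrace_set_enabled(struct demo_cet_state *s, bool on)
{
	assert(s != NULL);
	s->shstk_enabled = on;
}
```

In this model only exec() and a tracer can undo the lock, which is the
property a tool like CRIU would rely on.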

As an aside, where are the latest CET docs?  I've found the "CET
technology preview 2.0", but it doesn't seem to be very clear or
entirely complete.

On Fri, Jun 8, 2018 at 5:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Jun 7, 2018 at 9:35 PM, Andy Lutomirski <luto@kernel.org> wrote:

> > Is there any reason you can't use LD_CET=force to do it for
> > dynamically linked binaries?
>
> We need to enable shadow stack from the start.  Otherwise function
> return will fail when returning from callee with shadow stack to caller
> without shadow stack.

I don't see the problem.  A CET-supporting ld.so will be started with
CET on regardless of what the final binary says.  If ld.so sees
LD_CET=force, it can keep CET on regardless of the flags in the loaded
binary.
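
A rough sketch of that ld.so decision, for illustration only.  The
struct demo_binary type and the function name are hypothetical and do
not come from glibc; the LD_CET value is passed in as a plain string
(NULL when the variable is unset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what the loader learned from the binary. */
struct demo_binary {
	bool marked_cet;	/* the binary's own flags ask for CET */
};

/*
 * Decide whether ld.so, which was itself started with CET on, should
 * keep CET enabled once the final binary has been loaded.
 */
bool demo_keep_cet_on(const struct demo_binary *bin, const char *ld_cet)
{
	assert(bin != NULL);

	if (ld_cet != NULL && strcmp(ld_cet, "force") == 0)
		return true;	/* forced on regardless of the binary */
	return bin->marked_cet;
}
```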

>
> > I find it quite hard to believe that forcibly CET-ifying a legacy
> > statically linked binary is a good idea.
>
> We'd like to provide protection as much as we can.
>

I agree that this is a nice sentiment, but I don't think that a simple
"force CET on next exec()" flag is a good way to accomplish this.
I've had the pleasure of using legacy binaries, and there are all
kinds of gotchas.  First, a bunch of them aren't binaries at all --
they're shell scripts.  There's big_expensive_program that starts with
#!/bin/bash and eventually execs
/opt/blahblahblah/big_expensive_program_bin, and that involves two
execs.  (Heck, even Firefox is set up more or less like this.)  Some
programs can re-exec themselves.  All of this is not to mention that
it would be really annoying when your program crashes after you've
been using it for hours because you finally triggered the code path
that did longjmp() and CET kills it.

And you don't really need kernel support for this anyway.  It should
be relatively straightforward to write a loader that opens and loads a
static binary.

I think that this entire CET-on-exec concept should be dropped from
this patch series.  If someone really wants it, make it a separate
patch on top after everything has been merged, and we can poke holes
in it then.

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
From: Andy Lutomirski @ 2018-06-08 15:01 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Andrew Lutomirski, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	H. J. Lu, Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <6ee29e8b-4a0a-3459-a1ee-03923ba4e15d@redhat.com>

On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>>
> >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >>>> needs a separate program stack and a separate shadow stack.
> >>>> This patch handles allocation and freeing of the thread shadow
> >>>> stack.
> >>>
> >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> >>> is a good idea.  The Linux kernel has a long and storied history of
> >>> enabling new hardware features in ways that are almost entirely
> >>> useless for userspace.
> >>>
> >>> Florian, do you have any thoughts on how the user/kernel interaction
> >>> for the shadow stack should work?
> >>
> >> I have not looked at this in detail, have not played with the emulator,
> >> and have not been privy to any discussions before these patches have
> >> been posted, however …
> >>
> >> I believe that we want as little code in userspace for shadow stack
> >> management as possible.  One concern I have is that even with the code
> >> we arguably need for various kinds of stack unwinding, we might have
> >> unwittingly built a generic trampoline that leads to full CET bypass.
> >
> > I was imagining an API like "allocate a shadow stack for the current
> > thread, fail if the current thread already has one, and turn on the
> > shadow stack".  glibc would call clone and then call this ABI pretty
> > much immediately (i.e. before making any calls from which it expects
> > to return).
>
> Ahh.  So you propose not to enable shadow stack enforcement on the new
> thread even if it is enabled for the current thread?  For the cases
> where CLONE_VM is involved?
>
> It will still need a new assembler wrapper which sets up the shadow
> stack, and it's probably required to disable signals.
>
> I think it should be reasonable safe and actually implementable.  But
> the benefits are not immediately obvious to me.

Doing it this way would have been my first inclination.  It would
avoid all the oddities of the kernel magically creating a VMA when
clone() is called, guessing the shadow stack size, etc.  But I'm okay
with having the kernel do it automatically, too.  I think it would be
very nice to have a way for user code to find out the size of the
shadow stack and change it, though.  (And relocate it, but maybe
that's impossible.  The CET documentation doesn't have a clear
description of the shadow stack layout.)
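
To make the explicit-allocation idea concrete, here is a userspace
model of "allocate a shadow stack for the current thread, fail if it
already has one" plus a size query.  Everything here is hypothetical:
struct demo_thread, the function names, and the error codes chosen are
assumptions, and calloc() merely stands in for whatever the kernel
would map.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread bookkeeping for a shadow stack. */
struct demo_thread {
	void *shstk_base;	/* NULL while the thread has no shadow stack */
	size_t shstk_size;
};

/* Allocate a shadow stack; fail if the thread already has one. */
int demo_shstk_alloc(struct demo_thread *t, size_t size)
{
	void *p;

	assert(t != NULL);
	if (t->shstk_base != NULL)
		return -EEXIST;
	if (size == 0)
		return -EINVAL;

	p = calloc(1, size);	/* stand-in for a kernel mapping */
	if (p == NULL)
		return -ENOMEM;

	t->shstk_base = p;
	t->shstk_size = size;
	return 0;
}

/* Let user code find out where its shadow stack is and how big it is. */
int demo_shstk_query(const struct demo_thread *t, void **base, size_t *size)
{
	assert(t != NULL && base != NULL && size != NULL);
	if (t->shstk_base == NULL)
		return -ENOENT;
	*base = t->shstk_base;
	*size = t->shstk_size;
	return 0;
}
```

With an interface of this shape the size is chosen by the caller, so
nothing has to guess it at clone() time.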

>
> > We definitely want strong enough user control that tools like CRIU can
> > continue to work.  I haven't looked at the SDM recently enough to
> > remember for sure, but I'm reasonably confident that user code can
> > learn the address of its own shadow stack.  If nothing else, CRIU
> > needs to be able to restore from a context where there's a signal on
> > the stack and the signal frame contains a shadow stack pointer.
>
> CRIU also needs the shadow stack *contents*, which shouldn't be directly
> available to the process.  So it needs very special interfaces anyway.

True.  I proposed in a different email that ptrace() have full control
of the shadow stack (read, write, lock, unlock, etc).

>
> Does CRIU implement MPX support?

Dunno.  But given that MPX seems to be dying, I'm not sure it matters.

--Andy

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Masami Hiramatsu @ 2018-06-08 15:45 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao
In-Reply-To: <71f19b63-a641-1705-f087-a39b8b81c4be@linux.ibm.com>

On Fri, 8 Jun 2018 12:04:25 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Hi Masami,
> 
> >> So for kernel modules,
> >>
> >> is it fine to change current ABI from
> >>     uprobe_register(inode, offset, consumer)
> >> to
> >>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
> >>
> >> Or I should introduce new function for this:
> >>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> >> and export it to kernel module?
> >>
> >> What's your suggestion?
> > 
> > Latter is fine to me. Since the refctr is introduced totally in userspace
> > (for SDT) and free-address userspace probing doesn't need refctr, maybe
> > we should keep those separated.
> 
> Sure.
> 
> > 
> >> [...]
> >>
> >>>>
> >>>>  - This patches still has one issue. If there are multiple instances of
> >>>>    same application running and user wants to trace any particular
> >>>>    instance, trace_uprobe is updating reference counter in all instances.
> >>>>    This is not a problem on user side because instruction is not replaced
> >>>>    with trap/int3 and thus user will only see samples from his interested
> >>>>    process. But still this is more of a correctness issue. I'm working on
> >>>>    a fix for this.
> >>>
> >>> Hmm, it sounds like not a correctness issue, but there maybe a performace
> >>> tradeoff. Tracing one particulear instance, other instances also will get
> >>> a performance loss
> >>
> >>
> >> Right, but it's temporary. I mean, putting everything in to this series was making
> >> it complex. So this is the initial one and I'll send followup patches which will
> >> optimize the reference counter update.
> > 
> > Ah, OK. If you have prepared the followup patches, could you also send it
> > with this series? Perhups it will help us to understand the issue clearer.
> 
> Not ready as such.. it's making the code bit complicated. I'm working on it
> and will send the next series with those patches included.

OK, thanks!

> >>> (Only if the parameter preparation block is heavy,
> >>> because the heaviest part of probing - trap/int3 and recording data - isn't
> >>> executed.)
> >>> BTW, why this happens? I thought the refcounter part is just a data which
> >>> is not shared among processes...
> >>>
> >>
> >> This happens because we are not calling consumer_filter function. consumer_filter
> >> is the one who decides whether to change the instruction to trap or not in a given
> >> mm. We also need to call it before updating reference counter.
> > 
> > Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
> > remove_breakpoint?
> 
> Not really, it would be simpler if I can put it inside install_breakpoint().
> Consider an mmap() case. Probed instruction resides in the text section whereas
> reference counter resides in the data section. These sections gets mapped using
> separate mmap() calls. So, when process mmaps the text section we will change the
> instruction, but section holding the reference counter may not have been mapped
> yet in the virtual memory. If so, we will fail to update the reference counter.

Got it.
In that case, maybe we can hook the point where the target page is mmapped
and do install_breakpoint() there. Since the instruction is guarded by the
refctr, the program cannot reach the tracepoint until the page holding the
refctr is mapped. Is that right?
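
The ordering problem and the deferral idea can be shown with a toy
model.  This is not uprobes code: struct demo_mm, enum demo_region and
demo_on_mmap() are hypothetical names, and the model assumes the text
and data sections arrive through two separate mmap() events in either
order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two mappings that matter for one SDT probe. */
enum demo_region {
	DEMO_TEXT,		/* holds the probed instruction */
	DEMO_REFCTR_DATA,	/* holds the reference counter */
};

/* Hypothetical per-mm probe state. */
struct demo_mm {
	bool text_mapped;
	bool refctr_mapped;
	bool bp_installed;
	int refctr;
};

/* Called for each mmap() of a region that belongs to the probed binary. */
void demo_on_mmap(struct demo_mm *mm, enum demo_region r)
{
	assert(mm != NULL);

	if (r == DEMO_TEXT)
		mm->text_mapped = true;
	else
		mm->refctr_mapped = true;

	/*
	 * Defer the work until both pages exist, so the counter update
	 * cannot fail because its page is not mapped yet.
	 */
	if (mm->text_mapped && mm->refctr_mapped && !mm->bp_installed) {
		mm->refctr++;
		mm->bp_installed = true;
	}
}
```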

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: [PATCH 06/10] x86/cet: Add arch_prctl functions for shadow stack
From: Cyrill Gorcunov @ 2018-06-08 15:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. J. Lu, Dmitry Safonov, Yu-cheng Yu, LKML, linux-doc, Linux-MM,
	linux-arch, X86 ML, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrUyapFiiXrHH23NW8XbqEkfKdGGU2wMUZ2DU=A+GWGqvw@mail.gmail.com>

On Fri, Jun 08, 2018 at 07:57:22AM -0700, Andy Lutomirski wrote:
> On Fri, Jun 8, 2018 at 5:24 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Jun 7, 2018 at 9:38 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > > On Thu, Jun 7, 2018 at 9:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Thu, Jun 7, 2018 at 4:01 PM, Andy Lutomirski <luto@kernel.org> wrote:
> > >>
> > >
> > > By the time malicious code issue its own syscalls, you've already lost
> > > the battle.  I could probably be convinced that a lock-CET-on feature
> > > that applies *only* to the calling thread and is not inherited by
> > > clone() is a decent idea, but I'd want to see someone who understands
> > > the state of the art in exploit design justify it.  You're also going
> > > to need to figure out how to make CRIU work if you allow locking CET
> > > on.
> > >
> > > A priori, I think we should just not provide a lock mechanism.
> >
> > We need a door for CET.  But it is a very bad idea to leave it open
> > all the time.  I don't know much about CRIU,  If it is Checkpoint/Restore
> > In Userspace.  Can you free any application with AVX512 on AVX512
> > machine and restore it on non-AVX512 machine?
> 
> Presumably not -- if the program uses AVX512 and AVX512 goes away,
> then the program won't be happy.

Yes. In most scenarios we require the FPU capabilities to be the same
on both machines (in case of migration) and/or to remain unchanged
between c/r cycles.
...
> As an aside, where are the latest CET docs?  I've found the "CET
> technology preview 2.0", but it doesn't seem to be very clear or
> entirely complete.

+1

* Re: [PATCH 04/10] x86/cet: Handle thread shadow stack
From: Yu-cheng Yu @ 2018-06-08 15:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Florian Weimer, LKML, linux-doc, Linux-MM, linux-arch, X86 ML,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H. J. Lu,
	Shanbhogue, Vedvyas, Ravi V. Shankar, Dave Hansen,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, mike.kravetz
In-Reply-To: <CALCETrV_V68nVhCpUSGXrwUKCu4utbdp01snmG=G=+_xAo0KJA@mail.gmail.com>

On Fri, 2018-06-08 at 08:01 -0700, Andy Lutomirski wrote:
> On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >>>>
> > >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> > >>>> needs a separate program stack and a separate shadow stack.
> > >>>> This patch handles allocation and freeing of the thread shadow
> > >>>> stack.
> > >>>
> > >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> > >>> is a good idea.  The Linux kernel has a long and storied history of
> > >>> enabling new hardware features in ways that are almost entirely
> > >>> useless for userspace.
> > >>>
> > >>> Florian, do you have any thoughts on how the user/kernel interaction
> > >>> for the shadow stack should work?
> > >>
> > >> I have not looked at this in detail, have not played with the emulator,
> > >> and have not been privy to any discussions before these patches have
> > >> been posted, however …
> > >>
> > >> I believe that we want as little code in userspace for shadow stack
> > >> management as possible.  One concern I have is that even with the code
> > >> we arguably need for various kinds of stack unwinding, we might have
> > >> unwittingly built a generic trampoline that leads to full CET bypass.
> > >
> > > I was imagining an API like "allocate a shadow stack for the current
> > > thread, fail if the current thread already has one, and turn on the
> > > shadow stack".  glibc would call clone and then call this ABI pretty
> > > much immediately (i.e. before making any calls from which it expects
> > > to return).
> >
> > Ahh.  So you propose not to enable shadow stack enforcement on the new
> > thread even if it is enabled for the current thread?  For the cases
> > where CLONE_VM is involved?
> >
> > It will still need a new assembler wrapper which sets up the shadow
> > stack, and it's probably required to disable signals.
> >
> > I think it should be reasonable safe and actually implementable.  But
> > the benefits are not immediately obvious to me.
> 
> Doing it this way would have been my first inclination.  It would
> avoid all the oddities of the kernel magically creating a VMA when
> clone() is called, guessing the shadow stack size, etc.  But I'm okay
> with having the kernel do it automatically, too.

HJ wanted to add an arch_prctl that allocates a new shadow stack and
switches to it.  That was mainly for swapcontext.  Perhaps we can also
use that for threads?  HJ, can you comment on this?

> I think it would be
> very nice to have a way for user code to find out the size of the
> shadow stack and change it, though.  (And relocate it, but maybe
> that's impossible.  The CET documentation doesn't have a clear
> description of the shadow stack layout.)

The shadow stack is vm_mmap'ed from memory and does not have any special
layout.  We can add an arch_prctl to find out the shadow stack's address
and size.

> >
> > > We definitely want strong enough user control that tools like CRIU can
> > > continue to work.  I haven't looked at the SDM recently enough to
> > > remember for sure, but I'm reasonably confident that user code can
> > > learn the address of its own shadow stack.  If nothing else, CRIU
> > > needs to be able to restore from a context where there's a signal on
> > > the stack and the signal frame contains a shadow stack pointer.
> >
> > CRIU also needs the shadow stack *contents*, which shouldn't be directly
> > available to the process.  So it needs very special interfaces anyway.
> 
> True.  I proposed in a different email that ptrace() have full control
> of the shadow stack (read, write, lock, unlock, etc).

ptrace() can do PTRACE_POKEDATA on the shadow stack.  We can add lock/unlock.

> >
> > Does CRIU implement MPX support?
> 
> Dunno.  But given that MPX seems to be dying, I'm not sure it matters.
> 
> --Andy

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
From: Oleg Nesterov @ 2018-06-08 16:36 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao
In-Reply-To: <20180606083344.31320-1-ravi.bangoria@linux.ibm.com>

Hello,

I am travelling till the end of the next week, can't read this version
until I return. Just one question,

On 06/06, Ravi Bangoria wrote:
>
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix

Could you remind me what exactly was wrong?

I can't find your previous email about this problem, but iirc you didn't
explain the deadlock in detail, just copied some traces from lockdep...

Oleg.

* [PATCH v11 00/13] Intel SGX1 support
From: Jarkko Sakkinen @ 2018-06-08 17:09 UTC (permalink / raw)
  To: x86, platform-driver-x86
  Cc: dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	Jarkko Sakkinen, Alexei Starovoitov, Andi Kleen, Andrew Morton,
	Andy Lutomirski, Borislav Petkov, David S. Miller,
	David Woodhouse, Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar,
	open list:INTEL SGX, Janakarajan Natarajan, Kirill A. Shutemov,
	Konrad Rzeszutek Wilk,
	open list:KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86), Len Brown,
	Linus Walleij, open list:CRYPTO API, open list:DOCUMENTATION,
	open list, open list:SPARSE CHECKER, Mauro Carvalho Chehab,
	Peter Zijlstra, Rafael J. Wysocki, Randy Dunlap, Ricardo Neri,
	Thomas Gleixner, Tom Lendacky, Vikas Shivappa

Intel(R) SGX is a set of CPU instructions that applications can use to set
aside private regions of code and data, called enclaves. CPU access control
prevents code outside an enclave from accessing the memory inside it. In a
way, SGX provides an inverted sandbox: it protects the application from a
malicious host.

There is a new hardware unit in the processor called the Memory Encryption
Engine (MEE), starting from the Skylake microarchitecture. The BIOS can
define one or more MEE regions that can hold enclave data by configuring
them with PRMRR registers.

The MEE automatically encrypts the data leaving the processor package to
the MEE regions. The data is encrypted using a random key whose life-time
is exactly one power cycle.

You can tell if your CPU supports SGX by looking into /proc/cpuinfo:

	cat /proc/cpuinfo  | grep sgx
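
The same check can be done from a program.  The sketch below is only an
illustration with made-up function names; it looks for "sgx" as a whole
token on the "flags" lines, which is slightly stricter than the grep
above because a longer flag name that merely contains "sgx" does not
count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` occurs in `line` as a whole whitespace-delimited token. */
bool demo_line_has_flag(const char *line, const char *flag)
{
	size_t n;
	const char *p;

	assert(line != NULL && flag != NULL);
	n = strlen(flag);
	if (n == 0)
		return false;

	for (p = line; (p = strstr(p, flag)) != NULL; p++) {
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[n] == '\0' || p[n] == ' ' ||
			    p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return true;
	}
	return false;
}

/* Scan /proc/cpuinfo; returns false if the file cannot be read. */
bool demo_cpu_has_sgx(void)
{
	static char line[16384];	/* flags lines can be long */
	bool found = false;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (f == NULL)
		return false;
	while (!found && fgets(line, sizeof(line), f) != NULL) {
		if (strncmp(line, "flags", 5) == 0)
			found = demo_line_has_flag(line, "sgx");
	}
	fclose(f);
	return found;
}
```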

v11:
* Polished ENCLS wrappers with refined exception handling.
* ksgxswapd was not stopped (regression in v5) in
  sgx_page_cache_teardown(), which causes a leaked kthread after driver
  deinitialization.
* Shut down sgx_le_proxy when going to suspend because its EPC pages will be
  invalidated when resuming, after which it no longer functions properly.
* Set EINITTOKEN.VALID to zero for a token that is passed when
  SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
  page.
* Fixed the check in sgx_edbgrd() for a TCS page. It allowed reads at offsets
  around the flags field, which cause a #GP. Only the flags field is
  readable.
* On read access, the memcpy() call inside sgx_vma_access() had its src and
  dest parameters in the wrong order.
* The build issue with CONFIG_KASAN is now fixed. The build added undefined
  symbols to the LE even if “KASAN_SANITIZE := false” was set in the
  makefile.
* Fixed a regression in the #PF handler. If a page has
  SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
  It did not, which caused weird races when trying to change other parts of
  swapping code.
* EPC management has been refactored to a flat LRU cache and moved to
  arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
  of them. It can now swap from multiple enclaves in the same round.
* For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
  when an enclave is already initialized or dead instead of zero.

v10:
* Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
  of the driver.
* Unset the reserved flag from an enclave page if EDBGRD/WR fails
  (regression in v6).
* Close the anon inode when LE is stopped (regression in v9).
* Update the documentation with a more detailed description of SGX.

v9:
* Replaced kernel-LE IPC based on pipes with an anonymous inode.
  The driver no longer requires new exports.

v8:
* Check that public key MSRs match the LE public key hash in the
  driver initialization when the MSRs are read-only.
* Fix the race in VA slot allocation by checking the fullness
  immediately after successful allocation.
* Fix the race in hash mrsigner calculation between the launch
  enclave and user enclaves by having a separate lock for hash
  calculation.

v7:
* Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
  when it should have been masked with ~PAGE_MASK.
* Fixed a memory leak in sgx_ioc_enclave_create().
* Simplified swapping code by using a pointer array for a cluster
  instead of a linked list.
* Squeezed struct sgx_encl_page to 32 bytes.
* Fixed dereferencing of an RSA key on OpenSSL 1.1.0.
* Modified TC's CMAC to use kernel AES-NI. Restructured the code
  a bit in order to better align with kernel conventions.
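
For readers unfamiliar with the PAGE_MASK convention behind the
sgx_edbgr/wr() fix in the v7 list: in the kernel, PAGE_MASK keeps the
page-frame bits, so the offset within a page is obtained with
~PAGE_MASK.  The sketch below uses DEMO_-prefixed stand-ins that follow
the x86 definitions with 4 KiB pages; the helper names are made up for
illustration.

```c
#include <assert.h>

/* Stand-ins that follow the kernel's x86 definitions (4 KiB pages). */
#define DEMO_PAGE_SHIFT	12
#define DEMO_PAGE_SIZE	(1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Start address of the page that contains addr. */
unsigned long demo_page_base(unsigned long addr)
{
	return addr & DEMO_PAGE_MASK;
}

/* Offset of addr within its page; note the inverted mask. */
unsigned long demo_page_offset(unsigned long addr)
{
	return addr & ~DEMO_PAGE_MASK;
}
```

Masking with PAGE_MASK where an offset is wanted yields the page base
instead, which is the kind of mix-up the v7 entry describes.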

v6:
* Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
* In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
* Removed virtualization chapter from the documentation.
* Changed the default filename for the signing key as signing_key.pem.
* Reworked EPC management in a way that instead of a linked list of
  struct sgx_epc_page instances there is an array of integers that
  encodes address and bank of an EPC page (the same data as 'pa' field
  earlier). The locking has been moved to the EPC bank level instead
  of a global lock.
* Relaxed locking requirements for EPC management. EPC pages can be
  released back to the EPC bank concurrently.
* Cleaned up ptrace() code.
* Refined commit messages for new architectural constants.
* Sorted includes in every source file.
* Sorted local variable declarations according to the line length in
  every function.
* Style fixes based on Darren's comments to sgx_le.c.

v5:
* Described IPC between the Launch Enclave and kernel in the commit messages.
* Fixed all relevant checkpatch.pl issues that I forgot to fix in earlier
  versions except those that exist in the imported TinyCrypt code.
* Fixed spelling mistakes in the documentation.
* Forgot to check the return value of sgx_drv_subsys_init().
* Encapsulated properly page cache init and teardown.
* Collect epc pages to a temp list in sgx_add_epc_bank
* Removed SGX_ENCLAVE_INIT_ARCH constant.

v4:
* Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
* Removed __exit annotation from sgx_drv_subsys_exit().
* Fixed a leak of a backing page in sgx_process_add_page_req() in the
  case when vm_insert_pfn() fails.
* Removed unused symbol exports for sgx_page_cache.c.
* Updated sgx_alloc_page() to require encl parameter and documented the
  behavior (Sean Christopherson).
* Refactored sgx_encl_find() into a leaner API and documented the behavior.
* Moved #PF handler to sgx_fault.c.
* Replaced subsys_system_register() with plain bus_register().
* Retry EINIT 2nd time only if MSRs are not locked.

v3:
* Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
* Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
* Use unused bits in epc_page->pa to store the bank number.
* Removed #ifdef for WQ_NONREENTRANT.
* If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
* Added --remove-section=.got.plt to objcopy flags in order to prevent a
  dummy .got.plt, which will cause an inconsistent size for the LE.
* Documented sgx_encl_* functions.
* Added remark about AES implementation used inside the LE.
* Removed redundant sgx_sys_exit() from le/main.c.
* Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
* Validate miscselect in sgx_encl_create().
* Fixed SSA frame size calculation to take the misc region into account.
* Implemented consistent exception handling to __encls() and __encls_ret().
* Implemented a proper device model in order to allow sysfs attributes
  and in-kernel API.
* Cleaned up various "find enclave" implementations to the unified
  sgx_encl_find().
* Validate that vm_pgoff is zero.
* Discard backing pages with shmem_truncate_range() after EADD.
* Added missing EEXTEND operations to LE signing and launch.
* Fixed SSA size for GPRS region from 168 to 184 bytes.
* Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
* Check that TCS addresses are in ELRANGE and not just page aligned.
* Require kernel to be compiled with X86_64 and CPU_SUP_INTEL.
* Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.
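
The v3 item about storing the bank number in unused bits of
epc_page->pa works because EPC page addresses are page aligned, so
their low bits are always zero.  A minimal sketch of that kind of
packing follows; the function names and the choice to use all 12 low
bits are assumptions for illustration, not taken from the driver.

```c
#include <assert.h>

/* Low bits that are zero in any 4 KiB-aligned physical address. */
#define DEMO_PAGE_SHIFT	12
#define DEMO_BANK_MASK	((1UL << DEMO_PAGE_SHIFT) - 1)

/* Pack a page-aligned address and a small bank number into one word. */
unsigned long demo_epc_encode(unsigned long pa, unsigned int bank)
{
	assert((pa & DEMO_BANK_MASK) == 0);	/* must be page aligned */
	assert(bank <= DEMO_BANK_MASK);		/* must fit in the low bits */
	return pa | bank;
}

/* Recover the bank number from a packed entry. */
unsigned int demo_epc_bank(unsigned long entry)
{
	return (unsigned int)(entry & DEMO_BANK_MASK);
}

/* Recover the physical address from a packed entry. */
unsigned long demo_epc_pa(unsigned long entry)
{
	return entry & ~DEMO_BANK_MASK;
}
```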

v2:
* get_rand_uint32() changed the value of the pointer instead of the value
  it points to.
* Launch enclave incorrectly used sigstruct attributes-field instead of
  enclave attributes-field.
* Removed unused struct sgx_add_page_req from sgx_ioctl.c
* Removed unused sgx_has_sgx2.
* Updated arch/x86/include/asm/sgx.h so that it provides stub
  implementations when SGX is not enabled.
* Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
* return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
* removed unused global sgx_nr_pids
* moved sgx_encl_release to sgx_encl.c
* return -ERESTARTSYS instead of -EINTR in sgx_encl_init()

Jarkko Sakkinen (7):
  x86, sgx: updated MAINTAINERS
  x86, sgx: added ENCLS wrappers
  x86, sgx: basic routines for enclave page cache
  intel_sgx: driver for Intel Software Guard Extensions
  intel_sgx: ptrace() support
  intel_sgx: driver documentation
  intel_sgx: in-kernel launch enclave

Kai Huang (1):
  x86, sgx: add SGX definitions to cpufeature

Sean Christopherson (5):
  compiler.h, kasan: add __SANITIZE_ADDRESS__ check for
    __no_kasan_or_inline
  x86, sgx: add SGX definitions to msr-index.h
  x86, cpufeatures: add Intel-defined SGX leaf CPUID_12_EAX
  crypto: aesni: add minimal build option for SGX LE
  x86, sgx: detect Intel SGX

 Documentation/index.rst                       |   1 +
 Documentation/x86/intel_sgx.rst               | 195 ++++
 MAINTAINERS                                   |   7 +
 arch/x86/Kconfig                              |  19 +
 arch/x86/crypto/aesni-intel_asm.S             |  11 +
 arch/x86/include/asm/cpufeature.h             |   7 +-
 arch/x86/include/asm/cpufeatures.h            |  10 +-
 arch/x86/include/asm/disabled-features.h      |   3 +-
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/required-features.h      |   3 +-
 arch/x86/include/asm/sgx.h                    | 289 +++++
 arch/x86/include/asm/sgx_arch.h               | 224 ++++
 arch/x86/include/asm/sgx_le.h                 |  17 +
 arch/x86/include/asm/sgx_pr.h                 |  36 +
 arch/x86/include/uapi/asm/sgx.h               | 138 +++
 arch/x86/kernel/cpu/Makefile                  |   1 +
 arch/x86/kernel/cpu/common.c                  |   7 +
 arch/x86/kernel/cpu/intel_sgx.c               | 492 +++++++++
 arch/x86/kvm/cpuid.h                          |   1 +
 drivers/platform/x86/Kconfig                  |   2 +
 drivers/platform/x86/Makefile                 |   1 +
 drivers/platform/x86/intel_sgx/Kconfig        |  34 +
 drivers/platform/x86/intel_sgx/Makefile       |  32 +
 drivers/platform/x86/intel_sgx/le/Makefile    |  34 +
 .../x86/intel_sgx/le/enclave/Makefile         |  54 +
 .../intel_sgx/le/enclave/aesni-intel_asm.S    |   1 +
 .../x86/intel_sgx/le/enclave/cmac_mode.c      | 209 ++++
 .../x86/intel_sgx/le/enclave/cmac_mode.h      |  54 +
 .../x86/intel_sgx/le/enclave/encl_bootstrap.S | 116 ++
 .../platform/x86/intel_sgx/le/enclave/main.c  | 146 +++
 .../platform/x86/intel_sgx/le/enclave/main.h  |  19 +
 .../x86/intel_sgx/le/enclave/sgx_le.lds       |  33 +
 .../x86/intel_sgx/le/enclave/sgxsign.c        | 551 ++++++++++
 .../x86/intel_sgx/le/enclave/string.c         |   1 +
 drivers/platform/x86/intel_sgx/le/entry.S     |  70 ++
 .../x86/intel_sgx/le/include/sgx_asm.h        |  15 +
 drivers/platform/x86/intel_sgx/le/main.c      | 140 +++
 drivers/platform/x86/intel_sgx/le/main.h      |  30 +
 .../platform/x86/intel_sgx/le/sgx_le_piggy.S  |  22 +
 drivers/platform/x86/intel_sgx/le/string.c    |  39 +
 drivers/platform/x86/intel_sgx/sgx.h          | 181 ++++
 drivers/platform/x86/intel_sgx/sgx_encl.c     | 988 ++++++++++++++++++
 .../platform/x86/intel_sgx/sgx_encl_page.c    | 294 ++++++
 drivers/platform/x86/intel_sgx/sgx_fault.c    | 159 +++
 drivers/platform/x86/intel_sgx/sgx_ioctl.c    | 235 +++++
 drivers/platform/x86/intel_sgx/sgx_le.c       | 303 ++++++
 .../x86/intel_sgx/sgx_le_proxy_piggy.S        |  22 +
 drivers/platform/x86/intel_sgx/sgx_main.c     | 373 +++++++
 drivers/platform/x86/intel_sgx/sgx_vma.c      | 182 ++++
 include/linux/compiler.h                      |   2 +-
 50 files changed, 5805 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/x86/intel_sgx.rst
 create mode 100644 arch/x86/include/asm/sgx.h
 create mode 100644 arch/x86/include/asm/sgx_arch.h
 create mode 100644 arch/x86/include/asm/sgx_le.h
 create mode 100644 arch/x86/include/asm/sgx_pr.h
 create mode 100644 arch/x86/include/uapi/asm/sgx.h
 create mode 100644 arch/x86/kernel/cpu/intel_sgx.c
 create mode 100644 drivers/platform/x86/intel_sgx/Kconfig
 create mode 100644 drivers/platform/x86/intel_sgx/Makefile
 create mode 100644 drivers/platform/x86/intel_sgx/le/Makefile
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/Makefile
 create mode 120000 drivers/platform/x86/intel_sgx/le/enclave/aesni-intel_asm.S
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/cmac_mode.c
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/cmac_mode.h
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/encl_bootstrap.S
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/main.c
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/main.h
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/sgx_le.lds
 create mode 100644 drivers/platform/x86/intel_sgx/le/enclave/sgxsign.c
 create mode 120000 drivers/platform/x86/intel_sgx/le/enclave/string.c
 create mode 100644 drivers/platform/x86/intel_sgx/le/entry.S
 create mode 100644 drivers/platform/x86/intel_sgx/le/include/sgx_asm.h
 create mode 100644 drivers/platform/x86/intel_sgx/le/main.c
 create mode 100644 drivers/platform/x86/intel_sgx/le/main.h
 create mode 100644 drivers/platform/x86/intel_sgx/le/sgx_le_piggy.S
 create mode 100644 drivers/platform/x86/intel_sgx/le/string.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx.h
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_encl.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_encl_page.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_fault.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_ioctl.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_le.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_le_proxy_piggy.S
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_main.c
 create mode 100644 drivers/platform/x86/intel_sgx/sgx_vma.c

-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v11 12/13] intel_sgx: driver documentation
From: Jarkko Sakkinen @ 2018-06-08 17:09 UTC (permalink / raw)
  To: x86, platform-driver-x86
  Cc: dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	Jarkko Sakkinen, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, open list:DOCUMENTATION, open list
In-Reply-To: <20180608171216.26521-1-jarkko.sakkinen@linux.intel.com>


Documentation of the features of the Software Guard eXtensions usable
for the Linux kernel and how the driver internals use these features.
In addition, contains documentation for the ioctl API.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 Documentation/index.rst         |   1 +
 Documentation/x86/intel_sgx.rst | 195 ++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 Documentation/x86/intel_sgx.rst

diff --git a/Documentation/index.rst b/Documentation/index.rst
index 3b99ab931d41..b9fb92928e8c 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -100,6 +100,7 @@ implementation.
    :maxdepth: 2
 
    sh/index
+   x86/index
 
 Korean translations
 -------------------
diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst
new file mode 100644
index 000000000000..ecbe544eb2cb
--- /dev/null
+++ b/Documentation/x86/intel_sgx.rst
@@ -0,0 +1,195 @@
+===================
+Intel(R) SGX driver
+===================
+
+Introduction
+============
+
+Intel(R) SGX is a set of CPU instructions that can be used by applications to
+set aside private regions of code and data, called enclaves. CPU access control
+prevents code outside an enclave from accessing the memory inside it.
+In a way, you can think of SGX as providing an inverted sandbox: it protects
+the application from a malicious host.
+
+You can tell if your CPU supports SGX by looking into ``/proc/cpuinfo``:
+
+	``cat /proc/cpuinfo  | grep sgx``
+
+Overview of SGX
+===============
+
+SGX has a set of data structures to maintain information about the enclaves and
+their security properties. The BIOS reserves a fixed-size region of physical
+memory for these structures by setting Processor Reserved Memory Range
+Registers (PRMRR).
+
+This memory range is protected from outside access by the CPU and all the data
+coming in and out of the CPU package is encrypted by a key that is generated for
+each boot cycle.
+
+Enclaves execute in ring-3 in a special enclave submode using pages from the
+reserved memory range. A fixed logical address range for the enclave is reserved
+by ENCLS(ECREATE), a leaf instruction used to create enclaves. It is commonly
+referred to in the documentation as the ELRANGE.
+
+Every memory access to the ELRANGE is checked by the CPU. If the CPU is not
+executing in the enclave mode inside the enclave, #GP is raised. On the other
+hand, enclave code can make memory accesses both inside and outside of the
+ELRANGE.
+
+An enclave can only execute code inside the ELRANGE. Instructions that may
+cause VMEXIT, IO instructions and instructions that require a privilege change
+are prohibited inside the enclave. Interrupts and exceptions always cause the
+enclave to exit and jump to an address outside the enclave given when the
+enclave is entered by using the leaf instruction ENCLU(EENTER).
+
+Data types
+----------
+
+The protected memory range contains the following data:
+
+* **Enclave Page Cache (EPC):** protected pages
+* **Enclave Page Cache Map (EPCM):** a database that describes the state of the
+  pages and links them to an enclave.
+
+EPC has a number of different types of pages:
+
+* **SGX Enclave Control Structure (SECS)**: describes the global
+  properties of an enclave.
+* **Regular (REG):** code and data pages in the ELRANGE.
+* **Thread Control Structure (TCS):** pages that define entry points inside an
+  enclave. The enclave can only be entered through these entry points and each
+  can host a single hardware thread at a time.
+* **Version Array (VA)**: 64-bit version numbers for pages that have been
+  swapped outside the enclave. Each page contains 512 version numbers.
+
+Launch control
+--------------
+
+To launch an enclave, two structures must be provided for ENCLS(EINIT):
+
+1. **SIGSTRUCT:** signed measurement of the enclave binary.
+2. **EINITTOKEN:** a cryptographic token CMAC-signed with an AES256 key called
+   the *launch key*, which is re-generated for each boot cycle.
+
+The CPU holds a SHA256 hash of a 3072-bit RSA public key inside
+IA32_SGXLEPUBKEYHASHn MSRs. Enclaves with a SIGSTRUCT that is signed with this
+key do not require a valid EINITTOKEN and can be authorized with special
+privileges. One of those privileges is the ability to acquire the launch key
+with ENCLU(EGETKEY).
+
+**IA32_FEATURE_CONTROL[17]** is used by the BIOS to configure whether
+IA32_SGXLEPUBKEYHASH MSRs are read-only or read-write before locking the
+feature control register and handing over control to the operating system.
+
+Enclave construction
+--------------------
+
+The construction is started by filling out the SECS that contains the enclave
+address range, privileged attributes and measurement of TCS and REG pages (pages
+that will be mapped to the address range) among other things. This structure
+is passed to ENCLS(ECREATE) together with the physical address of a page
+in EPC that will hold the SECS.
+
+Then pages are added with ENCLS(EADD) and measured, i.e. hashed into the
+enclave measurement, with ENCLS(EEXTEND). Finally, the enclave is initialized
+with ENCLS(EINIT). ENCLS(EINIT) checks that the SIGSTRUCT is signed with the
+contained public key and that the supplied EINITTOKEN is valid (CMAC'd with
+the launch key). If these hold, the enclave is successfully initialized.
+
+Swapping pages
+--------------
+
+Enclave pages can be swapped out with ENCLS(EWB) to unprotected memory. In
+addition to the EPC page, ENCLS(EWB) takes in a VA page and the address of a
+PCMD structure (Page Crypto MetaData) as input. The VA page will seal a version
+number for the page. PCMD is a 128-byte structure that contains tracking
+information for the page, most importantly its MAC. With these structures the
+enclave page is sealed and rollback protected while it resides in unprotected
+memory.
+
+Before the page can be swapped out it must not have any active TLB references.
+ENCLS(EBLOCK) marks the page so that no new TLB entries can be created for it.
+After this, a counter called *epoch*, associated with the hardware threads
+inside the enclave, is increased with ENCLS(ETRACK). After all the threads from
+the previous epoch have exited, the page can be safely swapped out.
+
+An enclave memory access to a swapped-out page will cause a #PF. The #PF
+handler can fault the page back in by using ENCLS(ELDU).
+
+Kernel internals
+================
+
+Requirements
+------------
+
+Because SGX has an ever-evolving and expanding feature set, it's possible for
+a BIOS or VMM to configure a system in such a way that not all CPUs are equal,
+e.g. where Launch Control is only enabled on a subset of CPUs.  Linux does
+*not* support such a heterogeneous system configuration, nor does it even
+attempt to play nice in the face of a misconfigured system.  With the exception
+of Launch Control's hash MSRs, which can vary per CPU, Linux assumes that all
+CPUs have a configuration that is identical to the boot CPU.
+
+
+Roles and responsibilities
+--------------------------
+
+SGX introduces system resources, e.g. EPC memory, that must be accessible to
+multiple entities, e.g. the native kernel driver (to expose SGX to userspace)
+and KVM (to expose SGX to VMs), ideally without introducing any dependencies
+between each SGX entity.  To that end, the kernel owns and manages the shared
+system resources, i.e. the EPC and Launch Control MSRs, and defines functions
+that provide appropriate access to the shared resources.  SGX support for
+userspace and VMs is left to the SGX platform driver and KVM respectively.
+
+Launching enclaves
+------------------
+
+For privileged enclaves the launch is performed simply by submitting the
+SIGSTRUCT for that enclave to ENCLS(EINIT). For unprivileged enclaves the
+driver hosts a process in ring-3 that hosts a launch enclave signed with a key
+supplied for kbuild.
+
+The current implementation of the launch enclave generates a token for any
+enclave. In the future it could potentially be extended with ways to
+configure a policy for what can be launched.
+
+The driver will fail to initialize if it cannot start its own launch enclave.
+A user space application can submit a SIGSTRUCT instance through the ioctl API.
+The kernel will take care of the rest.
+
+This design ensures that the Linux kernel always has full control over which
+enclaves get to launch and which do not, even if the public key MSRs are
+read-only. Having launch intrinsics inside the kernel also enables easy
+development of enclaves without necessarily needing any heavyweight SDK.
+Having a low barrier to implementing enclaves could make sense, for example,
+for system daemons where the number of dependencies ought to be minimized.
+
+EPC management
+--------------
+
+Due to the unique requirements for swapping EPC pages, and because EPC pages
+(currently) do not have associated page structures, management of the EPC is
+not handled by the standard Linux swapper.  SGX directly handles swapping
+of EPC pages, including a kthread to initiate reclaim and a rudimentary LRU
+mechanism.  Consumers of EPC pages, e.g. the SGX driver, are required to
+implement function callbacks that can be invoked by the kernel to age,
+swap, and/or forcefully reclaim a target EPC page.  In effect, the kernel
+controls what happens and when, while the consumers (driver, KVM, etc.) do
+the actual work.
+
+SGX uapi
+========
+
+.. kernel-doc:: drivers/platform/x86/intel_sgx/sgx_ioctl.c
+   :functions: sgx_ioc_enclave_create
+               sgx_ioc_enclave_add_page
+               sgx_ioc_enclave_init
+
+.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
+
+References
+==========
+
+* System Programming Manual: 39.1.4 Intel® SGX Launch Control Configuration
-- 
2.17.0


* Re: [PATCH 04/24] 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
From: Catalin Marinas @ 2018-06-08 17:32 UTC (permalink / raw)
  To: Yury Norov
  Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, linux-doc,
	linux-arch, linux-api, Szabolcs Nagy, Heiko Carstens,
	Philipp Tomsich, Joseph Myers, Steve Ellcey, Prasun Kapoor,
	Andreas Schwab, Alexander Graf, Bamvor Zhangjian,
	Geert Uytterhoeven, Dave Martin, Adam Borowski, Manuel Montezelo,
	James Hogan, Chris Metcalf, Andrew Pinski, Lin Yongting,
	Alexey Klimov, Mark Brown, Maxim Kuvyrkov, Florian Weimer,
	Nathan_Lynch, James Morse, Ramana Radhakrishnan,
	Martin Schwidefsky, David S . Miller, Christoph Muellner
In-Reply-To: <20180516081910.10067-5-ynorov@caviumnetworks.com>

On Wed, May 16, 2018 at 11:18:49AM +0300, Yury Norov wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 76c0b54443b1..ee079244dc3c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -264,6 +264,21 @@ config ARCH_THREAD_STACK_ALLOCATOR
>  config ARCH_WANTS_DYNAMIC_TASK_STRUCT
>  	bool
>  
> +config ARCH_32BIT_OFF_T
> +	bool
> +	depends on !64BIT
> +	help
> +	  All new 32-bit architectures should have 64-bit off_t type on
> +	  userspace side which corresponds to the loff_t kernel type. This
> +	  is the requirement for modern ABIs. Some existing architectures
> +	  already have 32-bit off_t. This option is enabled for all such
> +	  architectures explicitly. Namely: arc, arm, blackfin, cris, frv,
> +	  h8300, hexagon, m32r, m68k, metag, microblaze, mips32, mn10300,
> +	  nios2, openrisc, parisc32, powerpc32, score, sh, sparc, tile32,
> +	  unicore32, x86_32 and xtensa. This is the complete list. Any
> +	  new 32-bit architecture should declare 64-bit off_t type on user
> +	  side and so should not enable this option.

Do you know if this is the case for riscv and nds32, merged in the
meantime? If not, I suggest you drop this patch altogether and just
define force_o_largefile() for arm64/ilp32 as we don't seem to stick to
"all new 32-bit architectures should have 64-bit off_t".

-- 
Catalin

* Re: [PATCH v11 12/13] intel_sgx: driver documentation
From: Jethro Beekman @ 2018-06-08 18:32 UTC (permalink / raw)
  To: Jarkko Sakkinen, x86, platform-driver-x86
  Cc: dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	open list:DOCUMENTATION, open list
In-Reply-To: <20180608171216.26521-13-jarkko.sakkinen@linux.intel.com>


On 2018-06-08 10:09, Jarkko Sakkinen wrote:
> +Launching enclaves
> +------------------
> +
> +For privileged enclaves the launch is performed simply by submitting the
> +SIGSTRUCT for that enclave to ENCLS(EINIT). For unprivileged enclaves the
> +driver hosts a process in ring-3 that hosts a launch enclave signed with a key
> +supplied for kbuild.
> +
> +The current implementation of the launch enclave generates a token for any
> +enclave. In the future it could be potentially extended to have ways to
> +configure policy what can be lauched.
> +
> +The driver will fail to initialize if it cannot start its own launch enclave.
> +A user space application can submit a SIGSTRUCT instance through the ioctl API.
> +The kernel will take care of the rest.
> +
> +This design assures that the Linux kernel has always full control, which
> +enclaves get to launch and which do not, even if the public key MSRs are

As discussed previously at length, since the kernel needs to execute 
ENCLS[EINIT], it has full control to deny the launching of enclaves 
regardless of any launch enclave implementation. Please change this 
misleading statement.

> +read-only. Having launch intrinsics inside the kernel also enables easy
> +development of enclaves without necessarily needing any heavy weight SDK.
> +Having a low-barrier to implement enclaves could make sense for example for
> +system daemons where amount of dependecies ought to be minimized.

--
Jethro Beekman | Fortanix



* Re: [PATCH v3 1/3] usb: gadget: ccid: add support for USB CCID Gadget Device
From: Marcus Folkesson @ 2018-06-08 18:54 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Greg Kroah-Hartman, Jonathan Corbet, davem, Mauro Carvalho Chehab,
	Andrew Morton, Randy Dunlap, Ruslan Bilovol, Thomas Gleixner,
	Kate Stewart, linux-usb, linux-doc, linux-kernel
In-Reply-To: <20180530140415.GE2939@gmail.com>

Hi Felipe,

Should I send out v4 or what do you think?

On Wed, May 30, 2018 at 04:04:15PM +0200, Marcus Folkesson wrote:
> Hi Filipe,
> 
> On Wed, May 30, 2018 at 03:28:18PM +0300, Felipe Balbi wrote:
> > Marcus Folkesson <marcus.folkesson@gmail.com> writes:
> > 
> > > Chip Card Interface Device (CCID) protocol is a USB protocol that
> > > allows a smartcard device to be connected to a computer via a card
> > > reader using a standard USB interface, without the need for each manufacturer
> > > of smartcards to provide its own reader or protocol.
> > >
> > > This gadget driver makes Linux show up as a CCID device to the host and let a
> > > userspace daemon act as the smartcard.
> > >
> > > This is useful when the Linux gadget itself should act as a cryptographic
> > > device or forward APDUs to an embedded smartcard device.
> > >
> > > Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
> > 
> > this could be done entirely in userspace with functionfs, why do we need
> > this part in the kernel? It does very little.
> 
> Andrzej pointed this out, and I actually do not have any good answer
> more than that the userspace application could be kept small and the
> important configuration of the CCID device is done with well (I hope)
> documented configfs attributes.
> 
> > 
> > -- 
> > balbi
> 
> Best regards,
> Marcus Folkesson
> 
> 

Thank you,
Marcus Folkesson

* Re: [PATCH v11 12/13] intel_sgx: driver documentation
From: Randy Dunlap @ 2018-06-08 21:41 UTC (permalink / raw)
  To: Jarkko Sakkinen, x86, platform-driver-x86
  Cc: dave.hansen, sean.j.christopherson, nhorman, npmccallum,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	open list:DOCUMENTATION, open list
In-Reply-To: <20180608171216.26521-13-jarkko.sakkinen@linux.intel.com>

On 06/08/2018 10:09 AM, Jarkko Sakkinen wrote:
> Documentation of the features of the  Software Guard eXtensions usable
> for the Linux kernel and how the driver internals uses these features.
> In addition, contains documentation for the ioctl API.
> 
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

Hi,

I have a few corrections below...


> ---
>  Documentation/index.rst         |   1 +
>  Documentation/x86/intel_sgx.rst | 195 ++++++++++++++++++++++++++++++++
>  2 files changed, 196 insertions(+)
>  create mode 100644 Documentation/x86/intel_sgx.rst
> 
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 3b99ab931d41..b9fb92928e8c 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -100,6 +100,7 @@ implementation.
>     :maxdepth: 2
>  
>     sh/index
> +   x86/index
>  
>  Korean translations
>  -------------------
> diff --git a/Documentation/x86/intel_sgx.rst b/Documentation/x86/intel_sgx.rst
> new file mode 100644
> index 000000000000..ecbe544eb2cb
> --- /dev/null
> +++ b/Documentation/x86/intel_sgx.rst
> @@ -0,0 +1,195 @@
> +===================
> +Intel(R) SGX driver
> +===================
> +
> +Introduction
> +============
> +
> +Intel(R) SGX is a set of CPU instructions that can be used by applications to
> +set aside private regions of code and data. The code outside the enclave is
> +disallowed to access the memory inside the enclave by the CPU access control.
> +In a way you can think that SGX provides inverted sandbox. It protects the
> +application from a malicious host.
> +
> +You can tell if your CPU supports SGX by looking into ``/proc/cpuinfo``:
> +
> +	``cat /proc/cpuinfo  | grep sgx``
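
A side note on the check quoted above: a plain grep also matches related
flags (for example sgx_lc on kernels that expose it). If an exact match is
wanted from a program, something along these lines might do. This is only a
minimal sketch: cpuinfo_has_flag is a made-up helper name, not anything from
the patch, and the caller is assumed to have already read /proc/cpuinfo into
a NUL-terminated buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not kernel code): return true if `flag` appears as a
 * whole, whitespace-delimited word on a line starting with "flags" in
 * /proc/cpuinfo-style text.
 */
bool cpuinfo_has_flag(const char *text, const char *flag)
{
	assert(text && flag);

	size_t flen = strlen(flag);
	const char *line = text;

	if (flen == 0)
		return false;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *end = line + len;

		if (len >= 5 && strncmp(line, "flags", 5) == 0) {
			const char *p = memchr(line, ':', len);

			/* Walk the tokens after the colon one by one. */
			for (p = p ? p + 1 : end; p < end; ) {
				while (p < end && (*p == ' ' || *p == '\t'))
					p++;
				const char *tok = p;
				while (p < end && *p != ' ' && *p != '\t')
					p++;
				if ((size_t)(p - tok) == flen &&
				    memcmp(tok, flag, flen) == 0)
					return true;
			}
		}
		line = eol ? eol + 1 : NULL;
	}
	return false;
}
```

Matching whole tokens rather than substrings is the only point here; how the
buffer gets filled is left to the caller.
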
> +
> +Overview of SGX
> +===============
> +
> +SGX has a set of data structures to maintain information about the enclaves and
> +their security properties. BIOS reserves a fixed size region of physical memory
> +for these structures by setting Processor Reserved Memory Range Registers
> +(PRMRR).
> +
> +This memory range is protected from outside access by the CPU and all the data
> +coming in and out of the CPU package is encrypted by a key that is generated for
> +each boot cycle.
> +
> +Enclaves execute in ring-3 in a special enclave submode using pages from the
> +reserved memory range. A fixed logical address range for the enclave is reserved
> +by ENCLS(ECREATE), a leaf instruction used to create enclaves. It is referred in
> +the documentation commonly as the ELRANGE.
> +
> +Every memory access to the ELRANGE is asserted by the CPU. If the CPU is not
> +executing in the enclave mode inside the enclave, #GP is raised. On the other
> +hand enclave code can make memory accesses both inside and outside of the
> +ELRANGE.
> +
> +Enclave can only execute code inside the ELRANGE. Instructions that may cause
> +VMEXIT, IO instructions and instructions that require a privilege change are
> +prohibited inside the enclave. Interrupts and exceptions always cause enclave
> +to exit and jump to an address outside the enclave given when the enclave is
> +entered by using the leaf instruction ENCLS(EENTER).
> +
> +Data types
> +----------
> +
> +The protected memory range contains the following data:
> +
> +* **Enclave Page Cache (EPC):** protected pages
> +* **Enclave Page Cache Map (EPCM):** a database that describes the state of the
> +  pages and link them to an enclave.
> +
> +EPC has a number of different types of pages:
> +
> +* **SGX Enclave Control Structure (SECS)**: describes the global
> +  properties of an enclave.
> +* **Regular (REG):** code and data pages in the ELRANGE.
> +* **Thread Control Structure (TCS):** pages that define entry points inside an
> +  enclave. The enclave can only be entered through these entry points and each
> +  can host a single hardware thread at a time.
> +* **Version Array (VA)**: 64-bit version numbers for pages that have been
> +  swapped outside the enclave. Each page contains 512 version numbers.
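
The 512 figure above follows from the slot size if one assumes a 4 KiB EPC
page (the page size is my assumption here, the text does not state it):
4096 / 8 = 512. A small sketch of that arithmetic, with made-up macro and
function names that are not taken from the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EPC_PAGE_SIZE 4096u              /* assumed 4 KiB EPC page */
#define VA_SLOT_SIZE  sizeof(uint64_t)   /* one 64-bit version number */
#define VA_SLOT_COUNT (EPC_PAGE_SIZE / VA_SLOT_SIZE)

static_assert(VA_SLOT_COUNT == 512, "a VA page holds 512 version numbers");

/* Byte offset of slot `idx` inside a VA page, or -1 if idx is out of range. */
long va_slot_offset(size_t idx)
{
	if (idx >= VA_SLOT_COUNT)
		return -1;
	return (long)(idx * VA_SLOT_SIZE);
}
```
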
> +
> +Launch control
> +--------------
> +
> +To launch an enclave, two structures must be provided for ENCLS(EINIT):
> +
> +1. **SIGSTRUCT:** signed measurement of the enclave binary.
> +2. **EINITTOKEN:** a cryptographic token CMAC-signed with a AES256-key called
> +   *launch key*, which is re-generated for each boot cycle.
> +
> +The CPU holds a SHA256 hash of a 3072-bit RSA public key inside
> +IA32_SGXLEPUBKEYHASHn MSRs. Enclaves with a SIGSTRUCT that is signed with this
> +key do not require a valid EINITTOKEN and can be authorized with special
> +privileges. One of those privileges is ability to acquire the launch key with
> +ENCLS(EGETKEY).
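
It might help readers to see how a 256-bit hash maps onto the "n" in
IA32_SGXLEPUBKEYHASHn: four 64-bit MSRs hold 4 * 64 = 256 bits. A rough
sketch of the split is below. The function name is invented for illustration,
and the little-endian byte order within each 64-bit word is an assumption on
my part; the actual MSR writes (wrmsr) are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define LE_HASH_MSR_COUNT 4	/* 4 x 64 bits = 256-bit SHA256 digest */

/*
 * Hypothetical helper: split a 32-byte digest into four 64-bit values,
 * one per hash MSR, treating each 8-byte group as little-endian (assumed).
 */
void le_hash_to_msr_values(const uint8_t digest[32],
			   uint64_t out[LE_HASH_MSR_COUNT])
{
	assert(digest && out);

	for (int i = 0; i < LE_HASH_MSR_COUNT; i++) {
		uint64_t v = 0;

		/* Highest byte of the group first, so byte 0 ends up lowest. */
		for (int b = 7; b >= 0; b--)
			v = (v << 8) | digest[i * 8 + b];
		out[i] = v;
	}
}
```
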
> +
> +**IA32_FEATURE_CONTROL[17]** is used by to BIOS configure whether

                                        by the BIOS to configure whether

> +IA32_SGXLEPUBKEYHASH MSRs are read-only or read-write before locking the
> +feature control register and handing over control to the operating system.
> +
> +Enclave construction
> +--------------------
> +
> +The construction is started by filling out the SECS that contains enclave
> +address range, privileged attributes and measurement of TCS and REG pages (pages
> +that will be mapped to the address range) among the other things. This structure
> +is passed out to the ENCLS(ECREATE) together with a physical address of a page
> +in EPC that will hold the SECS.
> +
> +Then pages are added with ENCLS(EADD) and measured with ENCLS(EEXTEND).  Finally

"measured"?  what does that mean?

> +enclave is initialized with ENCLS(EINIT). ENCLS(INIT) checks that the SIGSTRUCT
> +is signed with the contained public key and that the supplied EINITTOKEN is
> +valid (CMAC'd with the launch key). If these hold, the enclave is successfully
> +initialized.
> +
> +Swapping pages
> +--------------
> +
> +Enclave pages can be swapped out with ENCLS(EWB) to the unprotected memory. In
> +addition to the EPC page, ENCLS(EWB) takes in a VA page and address for PCMD
> +structure (Page Crypto MetaData) as input. The VA page will seal a version
> +number for the page. PCMD is 128 byte structure that contains tracking
> +information for the page, most importantly its MAC. With these structures the
> +enclave is sealed and rollback protected while it resides in the unprotected
> +memory.
> +
> +Before the page can be swapped out it must not have any active TLB references.
> +By using ENCLS(EBLOCK) instructions no new TLB entries can be created to it.
> +After this the a counter called *epoch* associated hardware threads inside the

huh?

> +enclave is increased with ENCLS(ETRACK). After all the threads from the previous
> +epoch have exited the page can be safely swapped out.
> +
> +An enclave memory access to a swapped out pages will cause #PF. #PF handler can
> +fault the page back by using ENCLS(ELDU).
> +
> +Kernel internals
> +================
> +
> +Requirements
> +------------
> +
> +Because SGX has an ever evolving and expanding feature set, it's possible for
> +a BIOS or VMM to configure a system in such a way that not all cpus are equal,

                                                                  CPUs

> +e.g. where Launch Control is only enabled on a subset of cpus.  Linux does

                                                            CPUs.

> +*not* support such a heterogenous system configuration, nor does it even

                        heterogeneous

> +attempt to play nice in the face of a misconfigured system.  With the exception
> +of Launch Control's hash MSRs, which can vary per cpu, Linux assumes that all

                                                     CPU,

> +cpus have a configuration that is identical to the boot cpu.

   CPUs                                                    CPU.

> +
> +
> +Roles and responsibilities
> +--------------------------
> +
> +SGX introduces system resources, e.g. EPC memory, that must be accessible to
> +multiple entities, e.g. the native kernel driver (to expose SGX to userspace)
> +and KVM (to expose SGX to VMs), ideally without introducing any dependencies
> +between each SGX entity.  To that end, the kernel owns and manages the shared
> +system resources, i.e. the EPC and Launch Control MSRs, and defines functions
> +that provide appropriate access to the shared resources.  SGX support for
> +userpace and VMs is left to the SGX platform driver and KVM respectively.

   userspace

> +
> +Launching enclaves
> +------------------
> +
> +For privileged enclaves the launch is performed simply by submitting the
> +SIGSTRUCT for that enclave to ENCLS(EINIT). For unprivileged enclaves the
> +driver hosts a process in ring-3 that hosts a launch enclave signed with a key
> +supplied for kbuild.
> +
> +The current implementation of the launch enclave generates a token for any
> +enclave. In the future it could be potentially extended to have ways to
> +configure policy what can be lauched.

                                launched.

> +
> +The driver will fail to initialize if it cannot start its own launch enclave.
> +A user space application can submit a SIGSTRUCT instance through the ioctl API.
> +The kernel will take care of the rest.
> +
> +This design assures that the Linux kernel has always full control, which
> +enclaves get to launch and which do not, even if the public key MSRs are
> +read-only. Having launch intrinsics inside the kernel also enables easy
> +development of enclaves without necessarily needing any heavy weight SDK.
> +Having a low-barrier to implement enclaves could make sense for example for

            low barrier

> +system daemons where amount of dependecies ought to be minimized.

                                  dependencies

> +
> +EPC management
> +--------------
> +
> +Due to the unique requirements for swapping EPC pages, and because EPC pages
> +(currently) do not have associated page structures, management of the EPC is
> +not handled by the standard Linux swapper.  SGX directly handles swapping
> +of EPC pages, including a kthread to initiate reclaim and a rudimentary LRU
> +mechanism.  Consumsers of EPC pages, e.g. the SGX driver, are required to

               Consumers

> +implement function callbacks that can be invoked by the kernel to age,
> +swap, and/or forcefully reclaim a target EPC page.  In effect, the kernel
> +controls what happens and when, while the consumers (driver, KVM, etc..) do
> +the actual work.
> +
> +SGX uapi
> +========
> +
> +.. kernel-doc:: drivers/platform/x86/intel_sgx/sgx_ioctl.c
> +   :functions: sgx_ioc_enclave_create
> +               sgx_ioc_enclave_add_page
> +               sgx_ioc_enclave_init
> +
> +.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
> +
> +References
> +==========
> +
> +* System Programming Manual: 39.1.4 Intel® SGX Launch Control Configuration
> 


-- 
~Randy

* Re: [PATCH 04/24] 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
From: Palmer Dabbelt @ 2018-06-08 22:33 UTC (permalink / raw)
  To: catalin.marinas
  Cc: ynorov, Arnd Bergmann, linux-arm-kernel, linux-kernel, linux-doc,
	linux-arch, linux-api, szabolcs.nagy, heiko.carstens,
	philipp.tomsich, joseph, sellcey, Prasun.Kapoor, schwab, agraf,
	bamv2005, geert, Dave.Martin, kilobyte, manuel.montezelo,
	james.hogan, cmetcalf, pinskia, linyongting, klimov.linux,
	broonie, maxim.kuvyrkov, fweimer, Nathan_Lynch, james.morse,
	ramana.gcc, schwidefsky, davem, christoph.muellner
In-Reply-To: <20180608173207.nwoi25jee52gpdwy@armageddon.cambridge.arm.com>

On Fri, 08 Jun 2018 10:32:07 PDT (-0700), catalin.marinas@arm.com wrote:
> On Wed, May 16, 2018 at 11:18:49AM +0300, Yury Norov wrote:
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 76c0b54443b1..ee079244dc3c 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -264,6 +264,21 @@ config ARCH_THREAD_STACK_ALLOCATOR
>>  config ARCH_WANTS_DYNAMIC_TASK_STRUCT
>>  	bool
>>
>> +config ARCH_32BIT_OFF_T
>> +	bool
>> +	depends on !64BIT
>> +	help
>> +	  All new 32-bit architectures should have 64-bit off_t type on
>> +	  userspace side which corresponds to the loff_t kernel type. This
>> +	  is the requirement for modern ABIs. Some existing architectures
>> +	  already have 32-bit off_t. This option is enabled for all such
>> +	  architectures explicitly. Namely: arc, arm, blackfin, cris, frv,
>> +	  h8300, hexagon, m32r, m68k, metag, microblaze, mips32, mn10300,
>> +	  nios2, openrisc, parisc32, powerpc32, score, sh, sparc, tile32,
>> +	  unicore32, x86_32 and xtensa. This is the complete list. Any
>> +	  new 32-bit architecture should declare 64-bit off_t type on user
>> +	  side and so should not enable this option.
>
> Do you know if this is the case for riscv and nds32, merged in the
> meantime? If not, I suggest you drop this patch altogether and just
> define force_o_largefile() for arm64/ilp32 as we don't seem to stick to
> "all new 32-bit architectures should have 64-bit off_t".

We (RISC-V) don't have support for rv32i in glibc yet, so there really isn't a 
fixed ABI there yet.  From my understanding the rv32i port as it currently 
stands has a 32-bit off_t (via __kernel_off_t being defined as long), so this 
change would technically be a kernel ABI break.

Since we don't have rv32i glibc yet I'm not fundamentally opposed to an ABI 
break.  Is there a concrete advantage to this?

* Re: [PATCH 5/5] Documentation/x86: Add CET description
From: Randy Dunlap @ 2018-06-09  0:10 UTC (permalink / raw)
  To: Yu-cheng Yu, linux-kernel, linux-doc, linux-mm, linux-arch, x86,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, H.J. Lu,
	Vedvyas Shanbhogue, Ravi V. Shankar, Dave Hansen, Andy Lutomirski,
	Jonathan Corbet, Oleg Nesterov, Arnd Bergmann, Mike Kravetz
In-Reply-To: <20180607143544.3477-6-yu-cheng.yu@intel.com>

On 06/07/2018 07:35 AM, Yu-cheng Yu wrote:
> Explain how CET works and the noshstk/noibt kernel parameters.
> 
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   6 +
>  Documentation/x86/intel_cet.txt                 | 161 ++++++++++++++++++++++++
>  2 files changed, 167 insertions(+)
>  create mode 100644 Documentation/x86/intel_cet.txt
> 

> diff --git a/Documentation/x86/intel_cet.txt b/Documentation/x86/intel_cet.txt
> new file mode 100644
> index 000000000000..1b902a6c49f4
> --- /dev/null
> +++ b/Documentation/x86/intel_cet.txt
> @@ -0,0 +1,161 @@
> +-----------------------------------------
> +Control Flow Enforcement Technology (CET)
> +-----------------------------------------
> +
> +[1] Overview
> +
> +Control Flow Enforcement Technology (CET) provides protection against
> +return/jump-oriented programing (ROP) attacks.  It can be implemented to

                        programming

> +protect both the kernel and applications.  In the first phase, only the
> +user-mode protection is implemented for the 64-bit kernel.  Thirty-two bit
> +applications are supported under the compatibility mode.
> +
> +CET includes shadow stack (SHSTK) and indirect branch tracking (IBT) and
> +they are enabled from two kernel configuration options:
> +
> +  INTEL_X86_SHADOW_STACK_USER, and

no comma.

> +  INTEL_X86_BRANCH_TRACKING_USER.
> +
> +There are two command-line options for disabling CET features:
> +
> +  noshstk - disables shadow stack, and
> +  noibt - disables indirect branch tracking.
> +
> +At run time, /proc/cpuinfo shows the availability of SHSTK and IBT.
> +
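As an illustration of the run-time check described above, here is a minimal
sketch of a whole-word lookup in a "flags" line such as the one in
/proc/cpuinfo. The helper name cpu_flags_contain() is hypothetical, and the
flag spellings "shstk" and "ibt" are assumptions based on the option names in
this patch, not something the quoted text confirms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole,
 * whitespace-delimited word in `flags` (for example the "flags" line
 * read from /proc/cpuinfo). The flag names passed in are assumptions.
 */
static bool cpu_flags_contain(const char *flags, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		/* A match counts only if delimiters surround it. */
		bool starts = (p == flags) || p[-1] == ' ' ||
			      p[-1] == '\t' || p[-1] == ':';
		bool ends = p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\t' || p[len] == '\n';

		if (starts && ends)
			return true;
		p++;
	}
	return false;
}
```

A caller would read the "flags" line with fgets() and pass it in together with
the flag name of interest; a substring such as "ibt" inside a longer flag name
is deliberately not treated as a match.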

[snip]


thanks,
-- 
~Randy

* Re: [PATCH v3 03/10] PCI: Update xxx_pcie_ep_raise_irq() and pci_epc_raise_irq() signatures
From: kbuild test robot @ 2018-06-09  7:21 UTC (permalink / raw)
  To: Gustavo Pimentel
  Cc: kbuild-all, bhelgaas, lorenzo.pieralisi, Joao.Pinto, jingoohan1,
	kishon, adouglas, jesper.nilsson, linux-pci, linux-doc,
	linux-kernel, Gustavo Pimentel
In-Reply-To: <b4eba65e4ae7b8d15a5770a31d6a339de518cea7.1527862777.git.gustavo.pimentel@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

Hi Gustavo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on pci/next]
[also build test ERROR on next-20180608]
[cannot apply to v4.17]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Gustavo-Pimentel/Add-MSI-X-support-on-pcitest-tool/20180609-143316
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

>> drivers/pci/host/pcie-rockchip-ep.c:516:15: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
     .raise_irq = rockchip_pcie_ep_raise_irq,
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/pci/host/pcie-rockchip-ep.c:516:15: note: (near initialization for 'rockchip_pcie_epc_ops.raise_irq')
   cc1: some warnings being treated as errors

vim +516 drivers/pci/host/pcie-rockchip-ep.c

cf590b07 Shawn Lin 2018-05-09  507  
cf590b07 Shawn Lin 2018-05-09  508  static const struct pci_epc_ops rockchip_pcie_epc_ops = {
cf590b07 Shawn Lin 2018-05-09  509  	.write_header	= rockchip_pcie_ep_write_header,
cf590b07 Shawn Lin 2018-05-09  510  	.set_bar	= rockchip_pcie_ep_set_bar,
cf590b07 Shawn Lin 2018-05-09  511  	.clear_bar	= rockchip_pcie_ep_clear_bar,
cf590b07 Shawn Lin 2018-05-09  512  	.map_addr	= rockchip_pcie_ep_map_addr,
cf590b07 Shawn Lin 2018-05-09  513  	.unmap_addr	= rockchip_pcie_ep_unmap_addr,
cf590b07 Shawn Lin 2018-05-09  514  	.set_msi	= rockchip_pcie_ep_set_msi,
cf590b07 Shawn Lin 2018-05-09  515  	.get_msi	= rockchip_pcie_ep_get_msi,
cf590b07 Shawn Lin 2018-05-09 @516  	.raise_irq	= rockchip_pcie_ep_raise_irq,
cf590b07 Shawn Lin 2018-05-09  517  	.start		= rockchip_pcie_ep_start,
cf590b07 Shawn Lin 2018-05-09  518  };
cf590b07 Shawn Lin 2018-05-09  519  

:::::: The code at line 516 was first introduced by commit
:::::: cf590b07839133146842d2d3d9a68f804c2edc4b PCI: rockchip: Add EP driver for Rockchip PCIe controller

:::::: TO: Shawn Lin <shawn.lin@rock-chips.com>
:::::: CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63133 bytes --]


* Re: [PATCH v3 03/10] PCI: Update xxx_pcie_ep_raise_irq() and pci_epc_raise_irq() signatures
From: kbuild test robot @ 2018-06-09  7:22 UTC (permalink / raw)
  To: Gustavo Pimentel
  Cc: kbuild-all, bhelgaas, lorenzo.pieralisi, Joao.Pinto, jingoohan1,
	kishon, adouglas, jesper.nilsson, linux-pci, linux-doc,
	linux-kernel, Gustavo Pimentel
In-Reply-To: <b4eba65e4ae7b8d15a5770a31d6a339de518cea7.1527862777.git.gustavo.pimentel@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 2893 bytes --]

Hi Gustavo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on pci/next]
[also build test ERROR on next-20180608]
[cannot apply to v4.17]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Gustavo-Pimentel/Add-MSI-X-support-on-pcitest-tool/20180609-143316
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

>> drivers/pci/host/pcie-rockchip-ep.c:516:15: error: initialization of 'int (*)(struct pci_epc *, u8,  enum pci_epc_irq_type,  u16)' {aka 'int (*)(struct pci_epc *, unsigned char,  enum pci_epc_irq_type,  short unsigned int)'} from incompatible pointer type 'int (*)(struct pci_epc *, u8,  enum pci_epc_irq_type,  u8)' {aka 'int (*)(struct pci_epc *, unsigned char,  enum pci_epc_irq_type,  unsigned char)'} [-Werror=incompatible-pointer-types]
     .raise_irq = rockchip_pcie_ep_raise_irq,
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/pci/host/pcie-rockchip-ep.c:516:15: note: (near initialization for 'rockchip_pcie_epc_ops.raise_irq')
   cc1: some warnings being treated as errors
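
The error above says the ops member expects a u16 as its last parameter while
the driver callback still takes a u8. A minimal user-space sketch of the same
situation follows; struct epc, struct epc_ops, enum epc_irq_type and
demo_raise_irq() are hypothetical stand-ins, not the kernel's definitions. The
initializer compiles cleanly only because the callback's last parameter was
widened to match the member's type.

```c
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Hypothetical stand-ins for the kernel types named in the error. */
enum epc_irq_type { EPC_IRQ_LEGACY, EPC_IRQ_MSI, EPC_IRQ_MSIX };

struct epc {
	int last_irq;
};

struct epc_ops {
	/* The ops member takes a 16-bit interrupt number. */
	int (*raise_irq)(struct epc *epc, u8 func_no,
			 enum epc_irq_type type, u16 interrupt_num);
};

/*
 * The callback must use exactly the same parameter types. Declaring
 * interrupt_num as u8 here would reproduce the
 * -Wincompatible-pointer-types diagnostic in the initializer below.
 */
static int demo_raise_irq(struct epc *epc, u8 func_no,
			  enum epc_irq_type type, u16 interrupt_num)
{
	(void)func_no;
	(void)type;
	epc->last_irq = interrupt_num;
	return 0;
}

static const struct epc_ops demo_epc_ops = {
	.raise_irq = demo_raise_irq,
};
```

In a series like this one, every driver that fills in .raise_irq presumably
needs the same signature update in the patch that changes the ops member.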

vim +516 drivers/pci/host/pcie-rockchip-ep.c

cf590b07 Shawn Lin 2018-05-09  507  
cf590b07 Shawn Lin 2018-05-09  508  static const struct pci_epc_ops rockchip_pcie_epc_ops = {
cf590b07 Shawn Lin 2018-05-09  509  	.write_header	= rockchip_pcie_ep_write_header,
cf590b07 Shawn Lin 2018-05-09  510  	.set_bar	= rockchip_pcie_ep_set_bar,
cf590b07 Shawn Lin 2018-05-09  511  	.clear_bar	= rockchip_pcie_ep_clear_bar,
cf590b07 Shawn Lin 2018-05-09  512  	.map_addr	= rockchip_pcie_ep_map_addr,
cf590b07 Shawn Lin 2018-05-09  513  	.unmap_addr	= rockchip_pcie_ep_unmap_addr,
cf590b07 Shawn Lin 2018-05-09  514  	.set_msi	= rockchip_pcie_ep_set_msi,
cf590b07 Shawn Lin 2018-05-09  515  	.get_msi	= rockchip_pcie_ep_get_msi,
cf590b07 Shawn Lin 2018-05-09 @516  	.raise_irq	= rockchip_pcie_ep_raise_irq,
cf590b07 Shawn Lin 2018-05-09  517  	.start		= rockchip_pcie_ep_start,
cf590b07 Shawn Lin 2018-05-09  518  };
cf590b07 Shawn Lin 2018-05-09  519  

:::::: The code at line 516 was first introduced by commit
:::::: cf590b07839133146842d2d3d9a68f804c2edc4b PCI: rockchip: Add EP driver for Rockchip PCIe controller

:::::: TO: Shawn Lin <shawn.lin@rock-chips.com>
:::::: CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49947 bytes --]


* Re: [PATCH 04/24] 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
From: Yury Norov @ 2018-06-09  7:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Arnd Bergmann, linux-arm-kernel, linux-kernel, linux-doc,
	linux-arch, linux-api, Szabolcs Nagy, Heiko Carstens,
	Philipp Tomsich, Joseph Myers, Steve Ellcey, Prasun Kapoor,
	Andreas Schwab, Alexander Graf, Bamvor Zhangjian,
	Geert Uytterhoeven, Dave Martin, Adam Borowski, Manuel Montezelo,
	James Hogan, Chris Metcalf, Andrew Pinski, Lin Yongting,
	Alexey Klimov, Mark Brown, Maxim Kuvyrkov, Florian Weimer,
	Nathan_Lynch, James Morse, Ramana Radhakrishnan,
	Martin Schwidefsky, David S . Miller, Christoph Muellner
In-Reply-To: <20180608173207.nwoi25jee52gpdwy@armageddon.cambridge.arm.com>

On Fri, Jun 08, 2018 at 06:32:07PM +0100, Catalin Marinas wrote:
> On Wed, May 16, 2018 at 11:18:49AM +0300, Yury Norov wrote:
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 76c0b54443b1..ee079244dc3c 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -264,6 +264,21 @@ config ARCH_THREAD_STACK_ALLOCATOR
> >  config ARCH_WANTS_DYNAMIC_TASK_STRUCT
> >  	bool
> >  
> > +config ARCH_32BIT_OFF_T
> > +	bool
> > +	depends on !64BIT
> > +	help
> > +	  All new 32-bit architectures should have 64-bit off_t type on
> > +	  userspace side which corresponds to the loff_t kernel type. This
> > +	  is the requirement for modern ABIs. Some existing architectures
> > +	  already have 32-bit off_t. This option is enabled for all such
> > +	  architectures explicitly. Namely: arc, arm, blackfin, cris, frv,
> > +	  h8300, hexagon, m32r, m68k, metag, microblaze, mips32, mn10300,
> > +	  nios2, openrisc, parisc32, powerpc32, score, sh, sparc, tile32,
> > +	  unicore32, x86_32 and xtensa. This is the complete list. Any
> > +	  new 32-bit architecture should declare 64-bit off_t type on user
> > +	  side and so should not enable this option.
> 
> Do you know if this is the case for riscv and nds32, merged in the
> meantime? If not, I suggest you drop this patch altogether and just
> define force_o_largefile() for arm64/ilp32 as we don't seem to stick to
> "all new 32-bit architectures should have 64-bit off_t".

I wrote this patch at the request of Arnd Bergmann. These are actually his
words, that all new 32-bit architectures should have 64-bit off_t. So
I was surprised when riscv was merged with 32-bit off_t (and I didn't
follow nds32).

If this rule is still in force, we'd better add new exceptions to this
patch. Otherwise, we can drop it.

Arnd, could you please comment on it?

Yury

* Re: [PATCH 04/24] 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
From: Yury Norov @ 2018-06-09  7:43 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: catalin.marinas, Arnd Bergmann, linux-arm-kernel, linux-kernel,
	linux-doc, linux-arch, linux-api, szabolcs.nagy, heiko.carstens,
	philipp.tomsich, joseph, sellcey, Prasun.Kapoor, schwab, agraf,
	bamv2005, geert, Dave.Martin, kilobyte, manuel.montezelo,
	james.hogan, cmetcalf, pinskia, linyongting, klimov.linux,
	broonie, maxim.kuvyrkov, fweimer, Nathan_Lynch, james.morse,
	ramana.gcc, schwidefsky, davem, christoph.muellner
In-Reply-To: <mhng-e1922456-a05b-46f9-8644-d45ad70a55e5@palmer-si-x1c4>

On Fri, Jun 08, 2018 at 03:33:51PM -0700, Palmer Dabbelt wrote:
> On Fri, 08 Jun 2018 10:32:07 PDT (-0700), catalin.marinas@arm.com wrote:
> > On Wed, May 16, 2018 at 11:18:49AM +0300, Yury Norov wrote:
> > > diff --git a/arch/Kconfig b/arch/Kconfig
> > > index 76c0b54443b1..ee079244dc3c 100644
> > > --- a/arch/Kconfig
> > > +++ b/arch/Kconfig
> > > @@ -264,6 +264,21 @@ config ARCH_THREAD_STACK_ALLOCATOR
> > >  config ARCH_WANTS_DYNAMIC_TASK_STRUCT
> > >  	bool
> > > 
> > > +config ARCH_32BIT_OFF_T
> > > +	bool
> > > +	depends on !64BIT
> > > +	help
> > > +	  All new 32-bit architectures should have 64-bit off_t type on
> > > +	  userspace side which corresponds to the loff_t kernel type. This
> > > +	  is the requirement for modern ABIs. Some existing architectures
> > > +	  already have 32-bit off_t. This option is enabled for all such
> > > +	  architectures explicitly. Namely: arc, arm, blackfin, cris, frv,
> > > +	  h8300, hexagon, m32r, m68k, metag, microblaze, mips32, mn10300,
> > > +	  nios2, openrisc, parisc32, powerpc32, score, sh, sparc, tile32,
> > > +	  unicore32, x86_32 and xtensa. This is the complete list. Any
> > > +	  new 32-bit architecture should declare 64-bit off_t type on user
> > > +	  side and so should not enable this option.
> > 
> > Do you know if this is the case for riscv and nds32, merged in the
> > meantime? If not, I suggest you drop this patch altogether and just
> > define force_o_largefile() for arm64/ilp32 as we don't seem to stick to
> > "all new 32-bit architectures should have 64-bit off_t".
> 
> We (RISC-V) don't have support for rv32i in glibc yet, so there really isn't
> a fixed ABI there yet.  From my understanding the rv32i port as it currently
> stands has a 32-bit off_t (via __kernel_off_t being defined as long), so
> this change would technically be a kernel ABI break.
> 
> Since we don't have rv32i glibc yet I'm not fundamentally opposed to an ABI
> break.  Is there a concrete advantage to this?

One obvious advantage is manipulating large files: if a file is larger than
2G, you cannot easily mmap(), lseek() etc. with a 32-bit offset.

Another point is unification of layouts for structures like struct
stat between the 32- and 64-bit worlds.

On the glibc side it helps to unify the 32-bit and 64-bit versions of syscalls.
See, for example, this commit:
3c7f1f59cd161 (Consolidate lseek/lseek64/llseek implementations).
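
To make the large-file point concrete, here is a minimal sketch, assuming a
signed 32-bit off_t. The helper name fits_in_32bit_off_t() is hypothetical; it
only shows that the largest representable offset is 2G minus one byte, so any
position at or beyond 2G cannot be passed through such a type.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can `offset` be represented in a signed
 * 32-bit off_t? Offsets at or above 2 GiB (2^31) cannot.
 */
static bool fits_in_32bit_off_t(int64_t offset)
{
	return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

With a 64-bit off_t no such check is needed for any realistic file size,
which is what lets the 32-bit and 64-bit code paths share one implementation.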

Yury