[PATCH v4 0/1] Enable automatic SVN updates for SGX enclaves

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/1] Enable automatic SVN updates for SGX enclaves
@ 2025-05-07 11:13 Elena Reshetova
  2025-05-07 11:14 ` [PATCH v4 1/1] x86/sgx: " Elena Reshetova
  0 siblings, 1 reply; 10+ messages in thread
From: Elena Reshetova @ 2025-05-07 11:13 UTC (permalink / raw)
  To: dave.hansen
  Cc: jarkko, seanjc, kai.huang, linux-sgx, linux-kernel, x86,
	asit.k.mallick, vincent.r.scarlata, chongc, erdemaktas,
	vannapurve, dionnaglaze, bondarn, scott.raynor, Elena Reshetova

Changes since v3 following reviews by Kai and Sean:

 - Change the overall approach to the one suggested
  by Sean and do the EUPDATESVN execution during
  sgx_open() and sgx_vepc_open().
  Note, I do not try to do EUPDATESVN during the release()
  flows since it doesnt save any noticable amount
  of time during next open flow per underlying EUPDATESVN
  microcode logic.
 - In sgx_update_svn() remove the check that we are
  running under VMM and expect the VMM to instead
  expose correct CPUID
 - Move the actual CPUID leaf check out of
  sgx_update_svn() into sgx_init()
 - Use existing RDRAND_RETRY_LOOPS define instead of 10
 - Change the sgx_update_svn() to return 0 only in
 success cases (or if unsupported)
 - Various smaller cosmetic fixes

The still to be discussed question is what sgx_update_svn()
should return in case of various failures. The current version
follows suggestion by Kai and would return an error (and block
sgx_(vepc_)open() in all cases, including running out of entropy.
I think this might be the correct approach for SGX_INSUFFICIENT_ENTROPY
since in such cases userspace can retry the open() and also
will get an info about what is actually blocking the EUPDATEVSN
(and can act on this). However, this is a change in existing API
and therefore debatable and I would like to hear people's feedback.

Changes since v2 following review by Jarkko:

 - formatting of comments is fixed
 - change from pr_error to ENCLS_WARN to communicate errors from
 EUPDATESVN
 - In case an unknown error is detected (must not happen per spec),
 make page allocation from EPC fail in order to prevent EPC usage

Changes since v1 following review by Jarkko:

 - first and second patch are squashed together and a better
   explanation of the change is added into the commit message
 - third and fourth patch are also combined for better understanding
   of error code purposes used in 4th patch
 - implementation of sgx_updatesvn adjusted following Jarkko's 
   suggestions
 - minor fixes in both commit messages and code from the review
 - dropping co-developed-by tag since the code now differs enough
   from the original submission. However, the reference where the
   original code came from and credits to original author is kept

Background
----------

In case an SGX vulnerability is discovered and TCB recovery
for SGX is triggered, Intel specifies a process that must be
followed for a given vulnerability. Steps to mitigate can vary
based on vulnerability type, affected components, etc.
In some cases, a vulnerability can be mitigated via a runtime
recovery flow by shutting down all running SGX enclaves,
clearing enclave page cache (EPC), applying a microcode patch
that does not require a reboot (via late microcode loading) and
restarting all SGX enclaves.

Problem statement
-------------------------
Even when the above-described runtime recovery flow to mitigate the
SGX vulnerability is followed, the SGX attestation evidence will
still reflect the security SVN version being equal to the previous
state of security SVN (containing vulnerability) that created
and managed the enclave until the runtime recovery event. This
limitation currently can be only overcome via a platform reboot,
which negates all the benefits from the rebootless late microcode
loading and not required in this case for functional or security
purposes.

Proposed solution
-----------------

SGX architecture introduced  a new instruction called EUPDATESVN [1]
to Ice Lake. It allows updating security SVN version, given that EPC
is completely empty. The latter is required for security reasons
in order to reason that enclave security posture is as secure as the
security SVN version of the TCB that created it.

This series enables opportunistic execution of EUPDATESVN upon first
EPC page allocation for a first enclave to be run on the platform.

This series is partly based on the previous work done by Cathy Zhang
[2], which attempted to enable forceful destruction of all SGX
enclaves and execution of EUPDATESVN upon successful application of
any microcode patch. This approach is determined as being too
intrusive for the running SGX enclaves, especially taking into account
that it would be performed upon *every* microcode patch application
regardless if it changes the security SVN version or not (change to the
security SVN version is a rare event).

Testing
-------

Tested on EMR machine using kernel-6.15.0_rc5 & sgx selftests.
If Google folks in CC can test on their side, it would be greatly appreciated.

References
----------

[1] https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
[2] https://lore.kernel.org/all/20220520103904.1216-1-cathy.zhang@intel.com/

Elena Reshetova (1):
  x86/sgx: Enable automatic SVN updates for SGX enclaves

 arch/x86/include/asm/sgx.h       | 42 ++++++++++++-------
 arch/x86/kernel/cpu/sgx/driver.c |  4 ++
 arch/x86/kernel/cpu/sgx/encl.c   |  1 +
 arch/x86/kernel/cpu/sgx/encls.h  |  5 +++
 arch/x86/kernel/cpu/sgx/main.c   | 70 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h    |  3 ++
 arch/x86/kernel/cpu/sgx/virt.c   |  6 +++
 7 files changed, 116 insertions(+), 15 deletions(-)

-- 
2.45.2

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-07 11:13 [PATCH v4 0/1] Enable automatic SVN updates for SGX enclaves Elena Reshetova
@ 2025-05-07 11:14 ` Elena Reshetova
  2025-05-07 16:04   ` Dave Hansen
  2025-05-08 20:13   ` Jarkko Sakkinen
  0 siblings, 2 replies; 10+ messages in thread
From: Elena Reshetova @ 2025-05-07 11:14 UTC (permalink / raw)
  To: dave.hansen
  Cc: jarkko, seanjc, kai.huang, linux-sgx, linux-kernel, x86,
	asit.k.mallick, vincent.r.scarlata, chongc, erdemaktas,
	vannapurve, dionnaglaze, bondarn, scott.raynor, Elena Reshetova

In case an SGX vulnerability is discovered and TCB recovery
for SGX is triggered, Intel specifies a process that must be
followed for a given vulnerability. Steps to mitigate can vary
based on vulnerability type, affected components, etc.
In some cases, a vulnerability can be mitigated via a runtime
recovery flow by shutting down all running SGX enclaves,
clearing enclave page cache (EPC), applying a microcode patch
that does not require a reboot (via late microcode loading) and
restarting all SGX enclaves.

Even when the above-described runtime recovery flow to mitigate the
SGX vulnerability is followed, the SGX attestation evidence will
still reflect the security SVN version being equal to the previous
state of security SVN (containing vulnerability) that created
and managed the enclave until the runtime recovery event. This
limitation currently can be only overcome via a platform reboot,
which negates all the benefits from the rebootless late microcode
loading and not required in this case for functional or security
purposes.

SGX architecture introduced a new instruction called ENCLS[EUPDATESVN]
to Ice Lake [1]. It allows updating security SVN version, given that EPC
is completely empty. The latter is required for security reasons
in order to reason that enclave security posture is as secure as the
security SVN version of the TCB that created it.

Additionally it is important to ensure that while ENCLS[EUPDATESVN]
runs, no concurrent page creation happens in EPC, because it might
result in #GP delivered to the creator. Legacy SW might not be prepared
to handle such unexpected #GPs and therefore this patch introduces
a locking mechanism in sgx_(vepc_)open to ensure no concurrent EPC
allocations can happen.

Implement ENCLS[EUPDATESVN] and enable kernel to opportunistically
call it during sgx_(vepc_)open().
The patch is loosely based on previous submission by Cathy Zhang [2].

[1]
https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
[2]
https://lore.kernel.org/all/20220520103904.1216-1-cathy.zhang@intel.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 arch/x86/include/asm/sgx.h       | 42 ++++++++++++-------
 arch/x86/kernel/cpu/sgx/driver.c |  4 ++
 arch/x86/kernel/cpu/sgx/encl.c   |  1 +
 arch/x86/kernel/cpu/sgx/encls.h  |  5 +++
 arch/x86/kernel/cpu/sgx/main.c   | 70 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h    |  3 ++
 arch/x86/kernel/cpu/sgx/virt.c   |  6 +++
 7 files changed, 116 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 6a0069761508..8ac026ef6365 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -27,22 +27,26 @@
 /* The bitmask for the EPC section type. */
 #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
 
+/* EUPDATESVN support in CPUID.0x12.0.EAX */
+#define SGX_CPUID_EUPDATESVN	BIT(10)
+
 enum sgx_encls_function {
-	ECREATE	= 0x00,
-	EADD	= 0x01,
-	EINIT	= 0x02,
-	EREMOVE	= 0x03,
-	EDGBRD	= 0x04,
-	EDGBWR	= 0x05,
-	EEXTEND	= 0x06,
-	ELDU	= 0x08,
-	EBLOCK	= 0x09,
-	EPA	= 0x0A,
-	EWB	= 0x0B,
-	ETRACK	= 0x0C,
-	EAUG	= 0x0D,
-	EMODPR	= 0x0E,
-	EMODT	= 0x0F,
+	ECREATE		= 0x00,
+	EADD		= 0x01,
+	EINIT		= 0x02,
+	EREMOVE		= 0x03,
+	EDGBRD		= 0x04,
+	EDGBWR		= 0x05,
+	EEXTEND		= 0x06,
+	ELDU		= 0x08,
+	EBLOCK		= 0x09,
+	EPA			= 0x0A,
+	EWB			= 0x0B,
+	ETRACK		= 0x0C,
+	EAUG		= 0x0D,
+	EMODPR		= 0x0E,
+	EMODT		= 0x0F,
+	EUPDATESVN	= 0x18,
 };
 
 /**
@@ -73,6 +77,11 @@ enum sgx_encls_function {
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_PAGE_NOT_MODIFIABLE:	The EPC page cannot be modified because it
  *				is in the PENDING or MODIFIED state.
+ * %SGX_INSUFFICIENT_ENTROPY:	Insufficient entropy in RNG.
+ * %SGX_EPC_NOT_READY:			EPC is not ready for SVN update.
+ * %SGX_NO_UPDATE:		EUPDATESVN was successful, but CPUSVN was not
+ *				updated because current SVN was not newer than
+ *				CPUSVN.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
@@ -81,6 +90,9 @@ enum sgx_return_code {
 	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_PAGE_NOT_MODIFIABLE		= 20,
+	SGX_INSUFFICIENT_ENTROPY	= 29,
+	SGX_EPC_NOT_READY			= 30,
+	SGX_NO_UPDATE				= 31,
 	SGX_UNMASKED_EVENT		= 128,
 };
 
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 7f8d1e11dbee..669e44d61f9f 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
 	struct sgx_encl *encl;
 	int ret;
 
+	ret = sgx_inc_usage_count();
+	if (ret)
+		return ret;
+
 	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
 	if (!encl)
 		return -ENOMEM;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..3b54889ae4a4 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -765,6 +765,7 @@ void sgx_encl_release(struct kref *ref)
 	WARN_ON_ONCE(encl->secs.epc_page);
 
 	kfree(encl);
+	sgx_dec_usage_count();
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 99004b02e2ed..788acf8ec563 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -233,4 +233,9 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, void *addr)
 	return __encls_2(EAUG, pginfo, addr);
 }
 
+/* Update CPUSVN at runtime. */
+static inline int __eupdatesvn(void)
+{
+	return __encls_ret_1(EUPDATESVN, "");
+}
 #endif /* _X86_ENCLS_H */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8ce352fc72ac..2872567cd22b 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -15,6 +15,7 @@
 #include <linux/sysfs.h>
 #include <linux/vmalloc.h>
 #include <asm/sgx.h>
+#include <asm/archrandom.h>
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
@@ -914,6 +915,73 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static bool sgx_has_eupdatesvn;
+static atomic_t sgx_usage_count;
+static DEFINE_MUTEX(sgx_svn_lock);
+
+/**
+ * sgx_updatesvn() - Issue ENCLS[EUPDATESVN]
+ * If EPC is empty, this instruction will update CPUSVN to the currently
+ * loaded microcode update SVN and generate new cryptographic assets.
+ *
+ * Return:
+ * 0: Success or not supported
+ * errno on error
+ */
+static int sgx_update_svn(void)
+{
+	int retry = RDRAND_RETRY_LOOPS;
+	int ret;
+
+	if (!sgx_has_eupdatesvn)
+		return 0;
+
+	do {
+		ret = __eupdatesvn();
+	} while (ret == SGX_INSUFFICIENT_ENTROPY && --retry);
+
+	if (!ret || ret == SGX_NO_UPDATE) {
+		/*
+		 * SVN successfully updated, or it was already up-to-date.
+		 * Let users know when the update was successful.
+		 */
+		if (!ret)
+			pr_info("SVN updated successfully\n");
+		return 0;
+	}
+
+	/*
+	 * EUPDATESVN was called when EPC is empty, all other error
+	 * codes are unexcepted except running out of entropy.
+	 */
+	if (ret != SGX_INSUFFICIENT_ENTROPY)
+		ENCLS_WARN(ret, "EUPDATESVN");
+	return ret;
+}
+
+int sgx_inc_usage_count(void)
+{
+	int ret;
+
+	if (atomic_inc_not_zero(&sgx_usage_count))
+		return 0;
+
+	guard(mutex)(&sgx_svn_lock);
+
+	if (atomic_inc_not_zero(&sgx_usage_count))
+		return 0;
+
+	ret = sgx_update_svn();
+	if (!ret)
+		atomic_inc(&sgx_usage_count);
+	return ret;
+}
+
+void sgx_dec_usage_count(void)
+{
+	atomic_dec(&sgx_usage_count);
+}
+
 static int __init sgx_init(void)
 {
 	int ret;
@@ -947,6 +1015,8 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	sgx_has_eupdatesvn = (cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN);
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d2dad21259a8..f5940393d9bd 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void)
 }
 #endif
 
+int sgx_inc_usage_count(void);
+void sgx_dec_usage_count(void);
+
 void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
 
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 7aaa3652e31d..e548de340c2e 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -255,12 +255,18 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
 	xa_destroy(&vepc->page_array);
 	kfree(vepc);
 
+	sgx_dec_usage_count();
 	return 0;
 }
 
 static int sgx_vepc_open(struct inode *inode, struct file *file)
 {
 	struct sgx_vepc *vepc;
+	int ret;
+
+	ret = sgx_inc_usage_count();
+	if (ret)
+		return ret;
 
 	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
 	if (!vepc)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-07 11:14 ` [PATCH v4 1/1] x86/sgx: " Elena Reshetova
@ 2025-05-07 16:04   ` Dave Hansen
  2025-05-08 14:07     ` Reshetova, Elena
  2025-05-08 20:13   ` Jarkko Sakkinen
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2025-05-07 16:04 UTC (permalink / raw)
  To: Elena Reshetova
  Cc: jarkko, seanjc, kai.huang, linux-sgx, linux-kernel, x86,
	asit.k.mallick, vincent.r.scarlata, chongc, erdemaktas,
	vannapurve, dionnaglaze, bondarn, scott.raynor

On 5/7/25 04:14, Elena Reshetova wrote:
> In case an SGX vulnerability is discovered and TCB recovery
> for SGX is triggered, Intel specifies a process that must be
> followed for a given vulnerability. Steps to mitigate can vary
> based on vulnerability type, affected components, etc.
> In some cases, a vulnerability can be mitigated via a runtime
> recovery flow by shutting down all running SGX enclaves,
> clearing enclave page cache (EPC), applying a microcode patch
> that does not require a reboot (via late microcode loading) and
> restarting all SGX enclaves.

Plain language and brevity have a lot of value in changelogs. There's a
substantial amount of irrelevant content here.

> Even when the above-described runtime recovery flow to mitigate the
> SGX vulnerability is followed, the SGX attestation evidence will
> still reflect the security SVN version being equal to the previous
> state of security SVN (containing vulnerability) that created
> and managed the enclave until the runtime recovery event. This
> limitation currently can be only overcome via a platform reboot,
> which negates all the benefits from the rebootless late microcode
> loading and not required in this case for functional or security
> purposes.

Can this please be trimmed down?

Actually, I think I wrote changelogs for this once upon a time. Could
you please go dig those up and use them?

> SGX architecture introduced a new instruction called ENCLS[EUPDATESVN]
> to Ice Lake [1].

Is it really on "Ice Lake" parts? Like, does it show up on
INTEL_ICELAKE? If not, this is only confusing and mostly irrelevant
information.

> It allows updating security SVN version, given that EPC
> is completely empty. The latter is required for security reasons
> in order to reason that enclave security posture is as secure as the
> security SVN version of the TCB that created it.
> 
> Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> runs, no concurrent page creation happens in EPC, because it might
> result in #GP delivered to the creator. Legacy SW might not be prepared
> to handle such unexpected #GPs and therefore this patch introduces
> a locking mechanism in sgx_(vepc_)open to ensure no concurrent EPC
> allocations can happen.
> 
> Implement ENCLS[EUPDATESVN] and enable kernel to opportunistically
> call it during sgx_(vepc_)open().


> [1]
> https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true

These become stale almost immediately. Please also cite the document title.

>  arch/x86/include/asm/sgx.h       | 42 ++++++++++++-------
>  arch/x86/kernel/cpu/sgx/driver.c |  4 ++
>  arch/x86/kernel/cpu/sgx/encl.c   |  1 +
>  arch/x86/kernel/cpu/sgx/encls.h  |  5 +++
>  arch/x86/kernel/cpu/sgx/main.c   | 70 ++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/sgx.h    |  3 ++
>  arch/x86/kernel/cpu/sgx/virt.c   |  6 +++
>  7 files changed, 116 insertions(+), 15 deletions(-)

Gah, how did this get squished back down to a single patch? It was
multiple patches before. There are multiple logical things going on here
and they need to be broken out.

> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 6a0069761508..8ac026ef6365 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -27,22 +27,26 @@
>  /* The bitmask for the EPC section type. */
>  #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
>  
> +/* EUPDATESVN support in CPUID.0x12.0.EAX */
> +#define SGX_CPUID_EUPDATESVN	BIT(10)
> +
>  enum sgx_encls_function {
> -	ECREATE	= 0x00,
> -	EADD	= 0x01,
> -	EINIT	= 0x02,
> -	EREMOVE	= 0x03,
> -	EDGBRD	= 0x04,
> -	EDGBWR	= 0x05,
> -	EEXTEND	= 0x06,
> -	ELDU	= 0x08,
> -	EBLOCK	= 0x09,
> -	EPA	= 0x0A,
> -	EWB	= 0x0B,
> -	ETRACK	= 0x0C,
> -	EAUG	= 0x0D,
> -	EMODPR	= 0x0E,
> -	EMODT	= 0x0F,
> +	ECREATE		= 0x00,
> +	EADD		= 0x01,
> +	EINIT		= 0x02,
> +	EREMOVE		= 0x03,
> +	EDGBRD		= 0x04,
> +	EDGBWR		= 0x05,
> +	EEXTEND		= 0x06,
> +	ELDU		= 0x08,
> +	EBLOCK		= 0x09,
> +	EPA			= 0x0A,
> +	EWB			= 0x0B,
> +	ETRACK		= 0x0C,
> +	EAUG		= 0x0D,
> +	EMODPR		= 0x0E,
> +	EMODT		= 0x0F,
> +	EUPDATESVN	= 0x18,
>  };
>  
>  /**
> @@ -73,6 +77,11 @@ enum sgx_encls_function {
>   *				public key does not match IA32_SGXLEPUBKEYHASH.
>   * %SGX_PAGE_NOT_MODIFIABLE:	The EPC page cannot be modified because it
>   *				is in the PENDING or MODIFIED state.
> + * %SGX_INSUFFICIENT_ENTROPY:	Insufficient entropy in RNG.
> + * %SGX_EPC_NOT_READY:			EPC is not ready for SVN update.
> + * %SGX_NO_UPDATE:		EUPDATESVN was successful, but CPUSVN was not
> + *				updated because current SVN was not newer than
> + *				CPUSVN.
>   * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
>   */
>  enum sgx_return_code {
> @@ -81,6 +90,9 @@ enum sgx_return_code {
>  	SGX_CHILD_PRESENT		= 13,
>  	SGX_INVALID_EINITTOKEN		= 16,
>  	SGX_PAGE_NOT_MODIFIABLE		= 20,
> +	SGX_INSUFFICIENT_ENTROPY	= 29,
> +	SGX_EPC_NOT_READY			= 30,
> +	SGX_NO_UPDATE				= 31,
>  	SGX_UNMASKED_EVENT		= 128,
>  };

I'd *much* rather that these mechanical constant introductions and
mindless refactoring (like reindenting) not be mixed with actual logic code.

>  
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index 7f8d1e11dbee..669e44d61f9f 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
>  	struct sgx_encl *encl;
>  	int ret;
>  
> +	ret = sgx_inc_usage_count();
> +	if (ret)
> +		return ret;
> +
>  	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
>  	if (!encl)
>  		return -ENOMEM;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 279148e72459..3b54889ae4a4 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -765,6 +765,7 @@ void sgx_encl_release(struct kref *ref)
>  	WARN_ON_ONCE(encl->secs.epc_page);
>  
>  	kfree(encl);
> +	sgx_dec_usage_count();
>  }
>  
>  /*
> diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
> index 99004b02e2ed..788acf8ec563 100644
> --- a/arch/x86/kernel/cpu/sgx/encls.h
> +++ b/arch/x86/kernel/cpu/sgx/encls.h
> @@ -233,4 +233,9 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, void *addr)
>  	return __encls_2(EAUG, pginfo, addr);
>  }
>  
> +/* Update CPUSVN at runtime. */
> +static inline int __eupdatesvn(void)
> +{
> +	return __encls_ret_1(EUPDATESVN, "");
> +}
>  #endif /* _X86_ENCLS_H */
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8ce352fc72ac..2872567cd22b 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -15,6 +15,7 @@
>  #include <linux/sysfs.h>
>  #include <linux/vmalloc.h>
>  #include <asm/sgx.h>
> +#include <asm/archrandom.h>
>  #include "driver.h"
>  #include "encl.h"
>  #include "encls.h"
> @@ -914,6 +915,73 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
>  }
>  EXPORT_SYMBOL_GPL(sgx_set_attribute);
>  
> +static bool sgx_has_eupdatesvn;

We have CPUID "caches" of sorts. Why open code this?

> +static atomic_t sgx_usage_count;

Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?

> +static DEFINE_MUTEX(sgx_svn_lock);

What does this lock protect?

> +/**
> + * sgx_updatesvn() - Issue ENCLS[EUPDATESVN]
> + * If EPC is empty, this instruction will update CPUSVN to the currently
> + * loaded microcode update SVN and generate new cryptographic assets.

This is *NOT* EUPDATESVN. "this instruction" is not what's happening here.

sgx_updatesvn() _tries_ to update the SVN. Most of the time, there will
be no update and that's OK. This should only be called with EPC is empty.

> + * Return:
> + * 0: Success or not supported
> + * errno on error

I'm not a big fan of filling thing out just because.

What value is there in saying "errno on error"?

> + */
> +static int sgx_update_svn(void)
> +{
> +	int retry = RDRAND_RETRY_LOOPS;
> +	int ret;
> +
> +	if (!sgx_has_eupdatesvn)
> +		return 0;

This looks goofy. Why is it OK to just silently ignore an update
request? (I know the answer, but it needs to be obvious)

> +	do {
> +		ret = __eupdatesvn();
> +	} while (ret == SGX_INSUFFICIENT_ENTROPY && --retry);


	for (i = 0; i < RDRAND_RETRY_LOOPS; i++) {
		ret = __eupdatesvn();

		/* Stop on success or unexpected errors: */
		if (ret != SGX_INSUFFICIENT_ENTROPY)
			break;
	}

> +	if (!ret || ret == SGX_NO_UPDATE) {
> +		/*
> +		 * SVN successfully updated, or it was already up-to-date.
> +		 * Let users know when the update was successful.
> +		 */
> +		if (!ret)
> +			pr_info("SVN updated successfully\n");
> +		return 0;
> +	}

Isn't this a lot simpler?

	if (ret == SGX_NO_UPDATE)
		return 0;

	if (!ret) {
		pr_info("SVN updated successfully\n");
		return 0;
	}

> +	/*
> +	 * EUPDATESVN was called when EPC is empty, all other error
> +	 * codes are unexcepted except running out of entropy.

			^ unexpected

Spell check, please.

> +	 */
> +	if (ret != SGX_INSUFFICIENT_ENTROPY)
> +		ENCLS_WARN(ret, "EUPDATESVN");
> +	return ret;
> +}

The indentation here is backwards. The error paths should be indented
and the success path at the lowest indent whenever possible. This, for
example:

	if (ret == SGX_NO_UPDATE)
		return 0;

	if (ret) {
		ENCLS_WARN(ret, "EUPDATESVN");
		return ret;
	}

	pr_info("SVN updated successfully\n");
	return 0;

Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
was supposed to be horribly rare. Shouldn't we warn on it?

> +int sgx_inc_usage_count(void)
> +{

No comments, eh?

What does success _mean_? What does failure mean?

> +	int ret;
> +
> +	if (atomic_inc_not_zero(&sgx_usage_count))
> +		return 0;
> +
> +	guard(mutex)(&sgx_svn_lock);
> +
> +	if (atomic_inc_not_zero(&sgx_usage_count))
> +		return 0;
> +
> +	ret = sgx_update_svn();
> +	if (!ret)
> +		atomic_inc(&sgx_usage_count);
> +	return ret;
> +}

Gah, this is 100% *NOT* obvious what's going on. Yet there are zero
comments on it. The lock is uncommented. The atomic is uncommented.

What does any of this mean? What do the components do?

> +
> +void sgx_dec_usage_count(void)
> +{
> +	atomic_dec(&sgx_usage_count);
> +}



>  static int __init sgx_init(void)
>  {
>  	int ret;
> @@ -947,6 +1015,8 @@ static int __init sgx_init(void)
>  	if (sgx_vepc_init() && ret)
>  		goto err_provision;
>  
> +	sgx_has_eupdatesvn = (cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN);
> +
>  	return 0;
>  
>  err_provision:
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index d2dad21259a8..f5940393d9bd 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void)
>  }
>  #endif
>  
> +int sgx_inc_usage_count(void);
> +void sgx_dec_usage_count(void);
> +
>  void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
>  
>  #endif /* _X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 7aaa3652e31d..e548de340c2e 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -255,12 +255,18 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
>  	xa_destroy(&vepc->page_array);
>  	kfree(vepc);
>  
> +	sgx_dec_usage_count();
>  	return 0;
>  }
>  
>  static int sgx_vepc_open(struct inode *inode, struct file *file)
>  {
>  	struct sgx_vepc *vepc;
> +	int ret;
> +
> +	ret = sgx_inc_usage_count();
> +	if (ret)
> +		return ret;
>  
>  	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
>  	if (!vepc)

I think I'd do this in at least 4 patches:

1. Introduce the usage count tracking: the atomic and the open/release
   "hooks", maybe without error handling on the open() side
2. Introduce the EUPDATESVN mechanical bits: the CPUID bit, the
   enumeration, the bool, the new error enum values
3. Introduce the mechanical eupdatesvn function. The retry loop and the
   "no entropy" handling
4. Plumb #3 into #1

#4 is your place to argue if EUPDATESVN failures should cascade to open().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-07 16:04   ` Dave Hansen
@ 2025-05-08 14:07     ` Reshetova, Elena
  2025-05-08 15:45       ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Reshetova, Elena @ 2025-05-08 14:07 UTC (permalink / raw)
  To: Hansen, Dave
  Cc: jarkko@kernel.org, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

Thank you very much for your detailed review, Dave!
Responses inline below. 


> On 5/7/25 04:14, Elena Reshetova wrote:
> > In case an SGX vulnerability is discovered and TCB recovery
> > for SGX is triggered, Intel specifies a process that must be
> > followed for a given vulnerability. Steps to mitigate can vary
> > based on vulnerability type, affected components, etc.
> > In some cases, a vulnerability can be mitigated via a runtime
> > recovery flow by shutting down all running SGX enclaves,
> > clearing enclave page cache (EPC), applying a microcode patch
> > that does not require a reboot (via late microcode loading) and
> > restarting all SGX enclaves.
> 
> Plain language and brevity have a lot of value in changelogs. There's a
> substantial amount of irrelevant content here.
> 
> > Even when the above-described runtime recovery flow to mitigate the
> > SGX vulnerability is followed, the SGX attestation evidence will
> > still reflect the security SVN version being equal to the previous
> > state of security SVN (containing vulnerability) that created
> > and managed the enclave until the runtime recovery event. This
> > limitation currently can be only overcome via a platform reboot,
> > which negates all the benefits from the rebootless late microcode
> > loading and not required in this case for functional or security
> > purposes.
> 
> Can this please be trimmed down?

Sure, I can work on trimming the text. 

> 
> Actually, I think I wrote changelogs for this once upon a time. Could
> you please go dig those up and use them?

Could you please suggest where can I find them? Was it for the previous
submission done by Cathy? 

> 
> > SGX architecture introduced a new instruction called ENCLS[EUPDATESVN]
> > to Ice Lake [1].
> 
> Is it really on "Ice Lake" parts? Like, does it show up on
> INTEL_ICELAKE? If not, this is only confusing and mostly irrelevant
> information.

Ok, will remove. 

> 
> > It allows updating security SVN version, given that EPC
> > is completely empty. The latter is required for security reasons
> > in order to reason that enclave security posture is as secure as the
> > security SVN version of the TCB that created it.
> >
> > Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> > runs, no concurrent page creation happens in EPC, because it might
> > result in #GP delivered to the creator. Legacy SW might not be prepared
> > to handle such unexpected #GPs and therefore this patch introduces
> > a locking mechanism in sgx_(vepc_)open to ensure no concurrent EPC
> > allocations can happen.
> >
> > Implement ENCLS[EUPDATESVN] and enable kernel to opportunistically
> > call it during sgx_(vepc_)open().
> 
> 
> > [1]
> > https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
> 
> These become stale almost immediately. Please also cite the document title.

Sure, I can add the title, but I dont have another link. 

> 
> >  arch/x86/include/asm/sgx.h       | 42 ++++++++++++-------
> >  arch/x86/kernel/cpu/sgx/driver.c |  4 ++
> >  arch/x86/kernel/cpu/sgx/encl.c   |  1 +
> >  arch/x86/kernel/cpu/sgx/encls.h  |  5 +++
> >  arch/x86/kernel/cpu/sgx/main.c   | 70
> ++++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/sgx.h    |  3 ++
> >  arch/x86/kernel/cpu/sgx/virt.c   |  6 +++
> >  7 files changed, 116 insertions(+), 15 deletions(-)
> 
> Gah, how did this get squished back down to a single patch? It was
> multiple patches before. There are multiple logical things going on here
> and they need to be broken out.

Yes, but the code actually did get smaller, so it didn’t feel
like it is worth breaking this into different patches. 
But I will follow your suggestion.

> 
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > index 6a0069761508..8ac026ef6365 100644
> > --- a/arch/x86/include/asm/sgx.h
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -27,22 +27,26 @@
> >  /* The bitmask for the EPC section type. */
> >  #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
> >
> > +/* EUPDATESVN support in CPUID.0x12.0.EAX */
> > +#define SGX_CPUID_EUPDATESVN	BIT(10)
> > +
> >  enum sgx_encls_function {
> > -	ECREATE	= 0x00,
> > -	EADD	= 0x01,
> > -	EINIT	= 0x02,
> > -	EREMOVE	= 0x03,
> > -	EDGBRD	= 0x04,
> > -	EDGBWR	= 0x05,
> > -	EEXTEND	= 0x06,
> > -	ELDU	= 0x08,
> > -	EBLOCK	= 0x09,
> > -	EPA	= 0x0A,
> > -	EWB	= 0x0B,
> > -	ETRACK	= 0x0C,
> > -	EAUG	= 0x0D,
> > -	EMODPR	= 0x0E,
> > -	EMODT	= 0x0F,
> > +	ECREATE		= 0x00,
> > +	EADD		= 0x01,
> > +	EINIT		= 0x02,
> > +	EREMOVE		= 0x03,
> > +	EDGBRD		= 0x04,
> > +	EDGBWR		= 0x05,
> > +	EEXTEND		= 0x06,
> > +	ELDU		= 0x08,
> > +	EBLOCK		= 0x09,
> > +	EPA			= 0x0A,
> > +	EWB			= 0x0B,
> > +	ETRACK		= 0x0C,
> > +	EAUG		= 0x0D,
> > +	EMODPR		= 0x0E,
> > +	EMODT		= 0x0F,
> > +	EUPDATESVN	= 0x18,
> >  };
> >
> >  /**
> > @@ -73,6 +77,11 @@ enum sgx_encls_function {
> >   *				public key does not match
> IA32_SGXLEPUBKEYHASH.
> >   * %SGX_PAGE_NOT_MODIFIABLE:	The EPC page cannot be modified
> because it
> >   *				is in the PENDING or MODIFIED state.
> > + * %SGX_INSUFFICIENT_ENTROPY:	Insufficient entropy in RNG.
> > + * %SGX_EPC_NOT_READY:			EPC is not ready for SVN
> update.
> > + * %SGX_NO_UPDATE:		EUPDATESVN was successful, but
> CPUSVN was not
> > + *				updated because current SVN was not newer
> than
> > + *				CPUSVN.
> >   * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was
> received
> >   */
> >  enum sgx_return_code {
> > @@ -81,6 +90,9 @@ enum sgx_return_code {
> >  	SGX_CHILD_PRESENT		= 13,
> >  	SGX_INVALID_EINITTOKEN		= 16,
> >  	SGX_PAGE_NOT_MODIFIABLE		= 20,
> > +	SGX_INSUFFICIENT_ENTROPY	= 29,
> > +	SGX_EPC_NOT_READY			= 30,
> > +	SGX_NO_UPDATE				= 31,
> >  	SGX_UNMASKED_EVENT		= 128,
> >  };
> 
> I'd *much* rather that these mechanical constant introductions and
> mindless refactoring (like reindenting) not be mixed with actual logic code.

Understood. 

> 
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> b/arch/x86/kernel/cpu/sgx/driver.c
> > index 7f8d1e11dbee..669e44d61f9f 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver.c
> > @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file
> *file)
> >  	struct sgx_encl *encl;
> >  	int ret;
> >
> > +	ret = sgx_inc_usage_count();
> > +	if (ret)
> > +		return ret;
> > +
> >  	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> >  	if (!encl)
> >  		return -ENOMEM;
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 279148e72459..3b54889ae4a4 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -765,6 +765,7 @@ void sgx_encl_release(struct kref *ref)
> >  	WARN_ON_ONCE(encl->secs.epc_page);
> >
> >  	kfree(encl);
> > +	sgx_dec_usage_count();
> >  }
> >
> >  /*
> > diff --git a/arch/x86/kernel/cpu/sgx/encls.h
> b/arch/x86/kernel/cpu/sgx/encls.h
> > index 99004b02e2ed..788acf8ec563 100644
> > --- a/arch/x86/kernel/cpu/sgx/encls.h
> > +++ b/arch/x86/kernel/cpu/sgx/encls.h
> > @@ -233,4 +233,9 @@ static inline int __eaug(struct sgx_pageinfo *pginfo,
> void *addr)
> >  	return __encls_2(EAUG, pginfo, addr);
> >  }
> >
> > +/* Update CPUSVN at runtime. */
> > +static inline int __eupdatesvn(void)
> > +{
> > +	return __encls_ret_1(EUPDATESVN, "");
> > +}
> >  #endif /* _X86_ENCLS_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c
> b/arch/x86/kernel/cpu/sgx/main.c
> > index 8ce352fc72ac..2872567cd22b 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/sysfs.h>
> >  #include <linux/vmalloc.h>
> >  #include <asm/sgx.h>
> > +#include <asm/archrandom.h>
> >  #include "driver.h"
> >  #include "encl.h"
> >  #include "encls.h"
> > @@ -914,6 +915,73 @@ int sgx_set_attribute(unsigned long
> *allowed_attributes,
> >  }
> >  EXPORT_SYMBOL_GPL(sgx_set_attribute);
> >
> > +static bool sgx_has_eupdatesvn;
> 
> We have CPUID "caches" of sorts. Why open code this?

You mean X86_FEATURE_*? 

SGX CPUID is only defined in SGX code currently (btw, I am not sure
why they are made special) so it doesn’t support this.
Is this a past decision that you want to change? 

> 
> > +static atomic_t sgx_usage_count;
> 
> Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?

I can increase the value easily, but even with current number we are talking
about 2^32 enclaves, i dont think this is realistic to happen in practice. 

> 
> > +static DEFINE_MUTEX(sgx_svn_lock);
> 
> What does this lock protect?

Will add description. 

> 
> > +/**
> > + * sgx_updatesvn() - Issue ENCLS[EUPDATESVN]
> > + * If EPC is empty, this instruction will update CPUSVN to the currently
> > + * loaded microcode update SVN and generate new cryptographic assets.
> 
> This is *NOT* EUPDATESVN. "this instruction" is not what's happening here.
> 
> sgx_updatesvn() _tries_ to update the SVN. Most of the time, there will
> be no update and that's OK. This should only be called with EPC is empty.

Will fix description. 

> 
> > + * Return:
> > + * 0: Success or not supported
> > + * errno on error
> 
> I'm not a big fan of filling thing out just because.
> 
> What value is there in saying "errno on error"?

Will remove. 

> 
> > + */
> > +static int sgx_update_svn(void)
> > +{
> > +	int retry = RDRAND_RETRY_LOOPS;
> > +	int ret;
> > +
> > +	if (!sgx_has_eupdatesvn)
> > +		return 0;
> 
> This looks goofy. Why is it OK to just silently ignore an update
> request? (I know the answer, but it needs to be obvious)

Will add the explicit comment. 

> 
> > +	do {
> > +		ret = __eupdatesvn();
> > +	} while (ret == SGX_INSUFFICIENT_ENTROPY && --retry);
> 
> 
> 	for (i = 0; i < RDRAND_RETRY_LOOPS; i++) {
> 		ret = __eupdatesvn();
> 
> 		/* Stop on success or unexpected errors: */
> 		if (ret != SGX_INSUFFICIENT_ENTROPY)
> 			break;
> 	}

Will change.

> 
> > +	if (!ret || ret == SGX_NO_UPDATE) {
> > +		/*
> > +		 * SVN successfully updated, or it was already up-to-date.
> > +		 * Let users know when the update was successful.
> > +		 */
> > +		if (!ret)
> > +			pr_info("SVN updated successfully\n");
> > +		return 0;
> > +	}
> 
> Isn't this a lot simpler?
> 
> 	if (ret == SGX_NO_UPDATE)
> 		return 0;
> 
> 	if (!ret) {
> 		pr_info("SVN updated successfully\n");
> 		return 0;
> 	}

Sure, will change. 

> 
> > +	/*
> > +	 * EUPDATESVN was called when EPC is empty, all other error
> > +	 * codes are unexcepted except running out of entropy.
> 
> 			^ unexpected
> 
> Spell check, please.

Will fix. 

> 
> > +	 */
> > +	if (ret != SGX_INSUFFICIENT_ENTROPY)
> > +		ENCLS_WARN(ret, "EUPDATESVN");
> > +	return ret;
> > +}
> 
> The indentation here is backwards. The error paths should be indented
> and the success path at the lowest indent whenever possible. This, for
> example:

Will fix. 

> 
> 	if (ret == SGX_NO_UPDATE)
> 		return 0;
> 
> 	if (ret) {
> 		ENCLS_WARN(ret, "EUPDATESVN");
> 		return ret;
> 	}
> 
> 	pr_info("SVN updated successfully\n");
> 	return 0;
> 
> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
> was supposed to be horribly rare. Shouldn't we warn on it?

The entropy comes from RDSEED in this case, not RDRAND.
This is not a horribly rare but very realistic failure. 

> 
> > +int sgx_inc_usage_count(void)
> > +{
> 
> No comments, eh?
> 
> What does success _mean_? What does failure mean?

Will add. 

> 
> > +	int ret;
> > +
> > +	if (atomic_inc_not_zero(&sgx_usage_count))
> > +		return 0;
> > +
> > +	guard(mutex)(&sgx_svn_lock);
> > +
> > +	if (atomic_inc_not_zero(&sgx_usage_count))
> > +		return 0;
> > +
> > +	ret = sgx_update_svn();
> > +	if (!ret)
> > +		atomic_inc(&sgx_usage_count);
> > +	return ret;
> > +}
> 
> Gah, this is 100% *NOT* obvious what's going on. Yet there are zero
> comments on it. The lock is uncommented. The atomic is uncommented.
> 
> What does any of this mean? What do the components do?

Will fix. 

> 
> > +
> > +void sgx_dec_usage_count(void)
> > +{
> > +	atomic_dec(&sgx_usage_count);
> > +}
> 
> 
> 
> >  static int __init sgx_init(void)
> >  {
> >  	int ret;
> > @@ -947,6 +1015,8 @@ static int __init sgx_init(void)
> >  	if (sgx_vepc_init() && ret)
> >  		goto err_provision;
> >
> > +	sgx_has_eupdatesvn = (cpuid_eax(SGX_CPUID) &
> SGX_CPUID_EUPDATESVN);
> > +
> >  	return 0;
> >
> >  err_provision:
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index d2dad21259a8..f5940393d9bd 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void)
> >  }
> >  #endif
> >
> > +int sgx_inc_usage_count(void);
> > +void sgx_dec_usage_count(void);
> > +
> >  void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
> >
> >  #endif /* _X86_SGX_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> > index 7aaa3652e31d..e548de340c2e 100644
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -255,12 +255,18 @@ static int sgx_vepc_release(struct inode *inode,
> struct file *file)
> >  	xa_destroy(&vepc->page_array);
> >  	kfree(vepc);
> >
> > +	sgx_dec_usage_count();
> >  	return 0;
> >  }
> >
> >  static int sgx_vepc_open(struct inode *inode, struct file *file)
> >  {
> >  	struct sgx_vepc *vepc;
> > +	int ret;
> > +
> > +	ret = sgx_inc_usage_count();
> > +	if (ret)
> > +		return ret;
> >
> >  	vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
> >  	if (!vepc)
> 
> I think I'd do this in at least 4 patches:
> 
> 1. Introduce the usage count tracking: the atomic and the open/release
>    "hooks", maybe without error handling on the open() side
> 2. Introduce the EUPDATESVN mechanical bits: the CPUID bit, the
>    enumeration, the bool, the new error enum values
> 3. Introduce the mechanical eupdatesvn function. The retry loop and the
>    "no entropy" handling
> 4. Plumb #3 into #1
> 
> #4 is your place to argue if EUPDATESVN failures should cascade to open().

I will use this breakout for the next version, thank you!

Best Regards,
Elena.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-08 14:07     ` Reshetova, Elena
@ 2025-05-08 15:45       ` Dave Hansen
  2025-05-12 12:54         ` Reshetova, Elena
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2025-05-08 15:45 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: jarkko@kernel.org, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

On 5/8/25 07:07, Reshetova, Elena wrote:
...
>> Actually, I think I wrote changelogs for this once upon a time. Could
>> you please go dig those up and use them?
> 
> Could you please suggest where can I find them? Was it for the previous
> submission done by Cathy? 

Yes. There were also some long discussions on an Intel internal mailing
list. I'll send you the details.

>>> https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
>>
>> These become stale almost immediately. Please also cite the document title.
> 
> Sure, I can add the title, but I dont have another link. 

Right, Intel doesn't provide long-term stable links. Also, the
"explicitVersion" should be able to be removed.

>> Gah, how did this get squished back down to a single patch? It was
>> multiple patches before. There are multiple logical things going on here
>> and they need to be broken out.
> 
> Yes, but the code actually did get smaller, so it didn’t feel
> like it is worth breaking this into different patches. 
> But I will follow your suggestion.

It's not really about the size. It's about making it as reviewable as
possible and minimizing the time a reviewer needs to spend on it. Even
for small things.

>>> +static bool sgx_has_eupdatesvn;
>>
>> We have CPUID "caches" of sorts. Why open code this?
> 
> You mean X86_FEATURE_*? 

Yes.

> SGX CPUID is only defined in SGX code currently (btw, I am not sure
> why they are made special) so it doesn’t support this.

It's only used in SGX code and each subleaf is only used once, at init
time. There's really no reason to add any caching of any parts of it for
a single init-time use.

> Is this a past decision that you want to change? 

*This* patch changes things. It moves the one user from a single
init-time use to a runtime user. It adds a global variable. It's not
about *me* wanting change. It's about this patch changing the status quo.

I don't really care about the #define. That's not the question. The
question is whether we want the a scattered X86_FEATURE bit for
'sgx_has_eupdatesvn' or not.

>>> +static atomic_t sgx_usage_count;
>>
>> Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?
> 
> I can increase the value easily, but even with current number we are talking
> about 2^32 enclaves, i dont think this is realistic to happen in practice. 

I don't mean to be pedantic, but is it 2^32? I thought we had signed
ints underneath atomic_t and we allow negative atomic_t values. Are you
suggesting that since we're just using a counter that the overflows into
the negative space are OK and it doesn't matter until it wraps all the
way back around to 0?

Assuming that you actually mean 2^32... It is it about 2^32 enclaves? Or
2^32 *opens*? Those are very, very different things.

Also, this is kinda the second question that I asked that hasn't really
been answered directly. I really asked it for a good reason, and I'd
really prefer that it be answered, even if you disagree with the premise.

I really want to know what goes boom when it overflows, so I'll go into
detail why it matters.

If it's just that an extra EUPDATESVN gets run and some random SGX user
might see a random #GP, that's not so bad. But if it's kernel crash or
use-after-free, that _is_ bad.

Is 4 bytes (maybe even 0 with the alignment of other things in practice)
worth it to take the problem from "not realistic" to "utterly  impossible".

sizeof(struct file) == 240, so I guess it would take almost 1TB of
'struct file' to overflow the counter, plus the overhead of 2^32 'struct
sgx_encl's. I'd call that "stupid" but not impossible on a bit modern
machine.

But with a atomic64_t, you literally can't overflow it with 'struct
file' in a 64-bit address space because the address space fills up
before the counter overflows. It's in "utterly impossible" territory.

I'll admit, I was rushing through the review and didn't spell all of
that out my first pass around. Yeah, it's a little bit overkill to
explain this for the type of one variable. But, I'm hoping that with all
of that logic laid out, you will consider answering my question this time.

I'd also very much appreciate if you could directly answer my questions
in future reviews, even if they appear silly. It'll make this all go faster.

...
>> 	if (ret == SGX_NO_UPDATE)
>> 		return 0;
>>
>> 	if (ret) {
>> 		ENCLS_WARN(ret, "EUPDATESVN");
>> 		return ret;
>> 	}
>>
>> 	pr_info("SVN updated successfully\n");
>> 	return 0;
>>
>> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
>> was supposed to be horribly rare. Shouldn't we warn on it?
> 
> The entropy comes from RDSEED in this case, not RDRAND.
> This is not a horribly rare but very realistic failure. 

Fair enough. If you want to keep it this way, could you run an
experiment and see how realistic it is? Surely, if it's very realistic,
you should be able to show it on real hardware quite easily.

We had a loooooooong discussion about this not so long ago. I believe
older CPUs were much more likely to be able to observe RDSEED failures.
But, even on those CPUs, repeated failures were pretty darn rare. New
CPUs are less likely to observe a single RDSEED failures. They are very
unlikely to see repeated failures.

So really want to know if this is needed in _practice_. Sure,
theoretically, the architecture allows CPUs to fail RDSEED all the time.
But do the CPUs with EUPDATESVN also have RDSEED implementations that
fail easily?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-07 11:14 ` [PATCH v4 1/1] x86/sgx: " Elena Reshetova
  2025-05-07 16:04   ` Dave Hansen
@ 2025-05-08 20:13   ` Jarkko Sakkinen
  2025-05-12 13:54     ` Reshetova, Elena
  1 sibling, 1 reply; 10+ messages in thread
From: Jarkko Sakkinen @ 2025-05-08 20:13 UTC (permalink / raw)
  To: Elena Reshetova
  Cc: dave.hansen, seanjc, kai.huang, linux-sgx, linux-kernel, x86,
	asit.k.mallick, vincent.r.scarlata, chongc, erdemaktas,
	vannapurve, dionnaglaze, bondarn, scott.raynor

On Wed, May 07, 2025 at 02:14:00PM +0300, Elena Reshetova wrote:
  
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index 7f8d1e11dbee..669e44d61f9f 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
>  	struct sgx_encl *encl;
>  	int ret;
>  
> +	ret = sgx_inc_usage_count();
> +	if (ret)
> +		return ret;
> +
>  	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
>  	if (!encl)
>  		return -ENOMEM;

The rollback looks broken to me.

Let's clean up error handling a bit:

	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
	if (!encl) {
		ret = -ENOMEM;
		goto err_usage_count;
	}

And later on in the same function:

	ret = init_srcu_struct(&encl->srcu);
	if (ret)
		goto err_encl;

And finally tail:

	return 0;

err_encl:
	kfree(encl);

err_usage_count:
	sgx_dec_usage_count();
	return ret;
}

BR, Jarkko

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-08 15:45       ` Dave Hansen
@ 2025-05-12 12:54         ` Reshetova, Elena
  2025-05-14  7:32           ` Reshetova, Elena
  0 siblings, 1 reply; 10+ messages in thread
From: Reshetova, Elena @ 2025-05-12 12:54 UTC (permalink / raw)
  To: Hansen, Dave
  Cc: jarkko@kernel.org, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

> >>> +static bool sgx_has_eupdatesvn;
> >>
> >> We have CPUID "caches" of sorts. Why open code this?
> >
> > You mean X86_FEATURE_*?
> 
> Yes.
> 
> > SGX CPUID is only defined in SGX code currently (btw, I am not sure
> > why they are made special) so it doesn’t support this.
> 
> It's only used in SGX code and each subleaf is only used once, at init
> time. There's really no reason to add any caching of any parts of it for
> a single init-time use.
> 
> > Is this a past decision that you want to change?
> 
> *This* patch changes things. It moves the one user from a single
> init-time use to a runtime user. It adds a global variable. It's not
> about *me* wanting change. It's about this patch changing the status quo.
> 
> I don't really care about the #define. That's not the question. The
> question is whether we want the a scattered X86_FEATURE bit for
> 'sgx_has_eupdatesvn' or not.

I don't have a strong opinion about this.
What is your suggestion here? 

> 
> >>> +static atomic_t sgx_usage_count;
> >>
> >> Is 32 bits enough for sgx_usage_count? What goes boom when it
> overflows?
> >
> > I can increase the value easily, but even with current number we are talking
> > about 2^32 enclaves, i dont think this is realistic to happen in practice.
> 
> I don't mean to be pedantic, but is it 2^32? I thought we had signed
> ints underneath atomic_t and we allow negative atomic_t values. Are you
> suggesting that since we're just using a counter that the overflows into
> the negative space are OK and it doesn't matter until it wraps all the
> way back around to 0?

Yes, you would think that wrapping to negative is ok in this case as soon as
dont end up hitting 0. But when it does happen, we will think that EPC is empty
and attempt to execute EUPDATESVN. In this case, eupdatevsn fails with
SGX_EPC_NOT_READY or SGX_LOCKFAIL errors depending on the state
of the system, see the microcode flow: 

IF (Other instruction is accessing EPC) THEN
 RFLAGS.ZF := 1
 RAX := SGX_LOCKFAIL;
 GOTO ERROR_EXIT;
FI
(* Verify EPC is ready *)
IF (the EPC contains any valid pages) THEN
 RFLAGS.ZF := 1;
 RAX := SGX_EPC_NOT_READY;
 GOTO ERROR_EXIT;
FI

This is ok on its own, but in the case any *other* ENCLS
instruction is executing in parallel during this, it might #GP,
which is going to be unexpected behaviour for legacy SW
and smth we would like to prevent. So, I would state that
this counter must be designed against possible overflows
because of this.

> 
> Assuming that you actually mean 2^32... It is it about 2^32 enclaves? Or
> 2^32 *opens*? Those are very, very different things.

We will increment counter on every open, so 2^32 opens at the same time,
because each release drops a counter. 

> 
> Also, this is kinda the second question that I asked that hasn't really
> been answered directly. I really asked it for a good reason, and I'd
> really prefer that it be answered, even if you disagree with the premise.
> 
> I really want to know what goes boom when it overflows, so I'll go into
> detail why it matters.
> 
> If it's just that an extra EUPDATESVN gets run and some random SGX user
> might see a random #GP, that's not so bad. But if it's kernel crash or
> use-after-free, that _is_ bad.

Well, I would still say that a random legacy SGX user getting a #GP is not a good
behaviour also, but yes, this is the worst case. 

> 
> Is 4 bytes (maybe even 0 with the alignment of other things in practice)
> worth it to take the problem from "not realistic" to "utterly  impossible".
> 
> sizeof(struct file) == 240, so I guess it would take almost 1TB of
> 'struct file' to overflow the counter, plus the overhead of 2^32 'struct
> sgx_encl's. I'd call that "stupid" but not impossible on a bit modern
> machine.
> 
> But with a atomic64_t, you literally can't overflow it with 'struct
> file' in a 64-bit address space because the address space fills up
> before the counter overflows. It's in "utterly impossible" territory.

Yes, so I will change it to atomic64_t. 

> 
> I'll admit, I was rushing through the review and didn't spell all of
> that out my first pass around. Yeah, it's a little bit overkill to
> explain this for the type of one variable. But, I'm hoping that with all
> of that logic laid out, you will consider answering my question this time.
> 
> I'd also very much appreciate if you could directly answer my questions
> in future reviews, even if they appear silly. It'll make this all go faster.

Yes, of course, I am sorry not to answer to this point properly before.
Hopefully you have an answer now. 

> 
> ...
> >> 	if (ret == SGX_NO_UPDATE)
> >> 		return 0;
> >>
> >> 	if (ret) {
> >> 		ENCLS_WARN(ret, "EUPDATESVN");
> >> 		return ret;
> >> 	}
> >>
> >> 	pr_info("SVN updated successfully\n");
> >> 	return 0;
> >>
> >> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought
> it
> >> was supposed to be horribly rare. Shouldn't we warn on it?
> >
> > The entropy comes from RDSEED in this case, not RDRAND.
> > This is not a horribly rare but very realistic failure.
> 
> Fair enough. If you want to keep it this way, could you run an
> experiment and see how realistic it is? Surely, if it's very realistic,
> you should be able to show it on real hardware quite easily.
> 
> We had a loooooooong discussion about this not so long ago. I believe
> older CPUs were much more likely to be able to observe RDSEED failures.
> But, even on those CPUs, repeated failures were pretty darn rare. New
> CPUs are less likely to observe a single RDSEED failures. They are very
> unlikely to see repeated failures.
> 
> So really want to know if this is needed in _practice_. Sure,
> theoretically, the architecture allows CPUs to fail RDSEED all the time.
> But do the CPUs with EUPDATESVN also have RDSEED implementations that
> fail easily?

This was the recent discussion I am aware we had on this matter:
https://lkml.org/lkml/2024/2/5/1595
The measurements were done for older platform (skylake), but I am not
aware of any architectural changes since that time to improve this. 


Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-08 20:13   ` Jarkko Sakkinen
@ 2025-05-12 13:54     ` Reshetova, Elena
  0 siblings, 0 replies; 10+ messages in thread
From: Reshetova, Elena @ 2025-05-12 13:54 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Hansen, Dave, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

> On Wed, May 07, 2025 at 02:14:00PM +0300, Elena Reshetova wrote:
> 
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> b/arch/x86/kernel/cpu/sgx/driver.c
> > index 7f8d1e11dbee..669e44d61f9f 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver.c
> > @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file
> *file)
> >  	struct sgx_encl *encl;
> >  	int ret;
> >
> > +	ret = sgx_inc_usage_count();
> > +	if (ret)
> > +		return ret;
> > +
> >  	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> >  	if (!encl)
> >  		return -ENOMEM;
> 
> The rollback looks broken to me.
> 
> Let's clean up error handling a bit:
> 
> 	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> 	if (!encl) {
> 		ret = -ENOMEM;
> 		goto err_usage_count;
> 	}
> 
> And later on in the same function:
> 
> 	ret = init_srcu_struct(&encl->srcu);
> 	if (ret)
> 		goto err_encl;
> 
> And finally tail:
> 
> 	return 0;
> 
> err_encl:
> 	kfree(encl);
> 
> err_usage_count:
> 	sgx_dec_usage_count();
> 	return ret;
> }


Yes, thank you for catching this!
Will fix. 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-12 12:54         ` Reshetova, Elena
@ 2025-05-14  7:32           ` Reshetova, Elena
  2025-05-15 15:37             ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Reshetova, Elena @ 2025-05-14  7:32 UTC (permalink / raw)
  To: Reshetova, Elena, Hansen, Dave
  Cc: jarkko@kernel.org, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

> > >>> +static bool sgx_has_eupdatesvn;
> > >>
> > >> We have CPUID "caches" of sorts. Why open code this?
> > >
> > > You mean X86_FEATURE_*?
> >
> > Yes.
> >
> > > SGX CPUID is only defined in SGX code currently (btw, I am not sure
> > > why they are made special) so it doesn’t support this.
> >
> > It's only used in SGX code and each subleaf is only used once, at init
> > time. There's really no reason to add any caching of any parts of it for
> > a single init-time use.
> >
> > > Is this a past decision that you want to change?
> >
> > *This* patch changes things. It moves the one user from a single
> > init-time use to a runtime user. It adds a global variable. It's not
> > about *me* wanting change. It's about this patch changing the status quo.
> >
> > I don't really care about the #define. That's not the question. The
> > question is whether we want the a scattered X86_FEATURE bit for
> > 'sgx_has_eupdatesvn' or not.
> 
> I don't have a strong opinion about this.
> What is your suggestion here?
> 
> >
> > >>> +static atomic_t sgx_usage_count;
> > >>
> > >> Is 32 bits enough for sgx_usage_count? What goes boom when it
> > overflows?
> > >
> > > I can increase the value easily, but even with current number we are
> talking
> > > about 2^32 enclaves, i dont think this is realistic to happen in practice.
> >
> > I don't mean to be pedantic, but is it 2^32? I thought we had signed
> > ints underneath atomic_t and we allow negative atomic_t values. Are you
> > suggesting that since we're just using a counter that the overflows into
> > the negative space are OK and it doesn't matter until it wraps all the
> > way back around to 0?
> 
> Yes, you would think that wrapping to negative is ok in this case as soon as
> dont end up hitting 0. But when it does happen, we will think that EPC is
> empty
> and attempt to execute EUPDATESVN. In this case, eupdatevsn fails with
> SGX_EPC_NOT_READY or SGX_LOCKFAIL errors depending on the state
> of the system, see the microcode flow:
> 
> IF (Other instruction is accessing EPC) THEN
>  RFLAGS.ZF := 1
>  RAX := SGX_LOCKFAIL;
>  GOTO ERROR_EXIT;
> FI
> (* Verify EPC is ready *)
> IF (the EPC contains any valid pages) THEN
>  RFLAGS.ZF := 1;
>  RAX := SGX_EPC_NOT_READY;
>  GOTO ERROR_EXIT;
> FI
> 
> This is ok on its own, but in the case any *other* ENCLS
> instruction is executing in parallel during this, it might #GP,
> which is going to be unexpected behaviour for legacy SW
> and smth we would like to prevent. So, I would state that
> this counter must be designed against possible overflows
> because of this.
> 
> >
> > Assuming that you actually mean 2^32... It is it about 2^32 enclaves? Or
> > 2^32 *opens*? Those are very, very different things.
> 
> We will increment counter on every open, so 2^32 opens at the same time,
> because each release drops a counter.
> 
> >
> > Also, this is kinda the second question that I asked that hasn't really
> > been answered directly. I really asked it for a good reason, and I'd
> > really prefer that it be answered, even if you disagree with the premise.
> >
> > I really want to know what goes boom when it overflows, so I'll go into
> > detail why it matters.
> >
> > If it's just that an extra EUPDATESVN gets run and some random SGX user
> > might see a random #GP, that's not so bad. But if it's kernel crash or
> > use-after-free, that _is_ bad.
> 
> Well, I would still say that a random legacy SGX user getting a #GP is not a
> good
> behaviour also, but yes, this is the worst case.
> 
> >
> > Is 4 bytes (maybe even 0 with the alignment of other things in practice)
> > worth it to take the problem from "not realistic" to "utterly  impossible".
> >
> > sizeof(struct file) == 240, so I guess it would take almost 1TB of
> > 'struct file' to overflow the counter, plus the overhead of 2^32 'struct
> > sgx_encl's. I'd call that "stupid" but not impossible on a bit modern
> > machine.
> >
> > But with a atomic64_t, you literally can't overflow it with 'struct
> > file' in a 64-bit address space because the address space fills up
> > before the counter overflows. It's in "utterly impossible" territory.
> 
> Yes, so I will change it to atomic64_t.
> 
> >
> > I'll admit, I was rushing through the review and didn't spell all of
> > that out my first pass around. Yeah, it's a little bit overkill to
> > explain this for the type of one variable. But, I'm hoping that with all
> > of that logic laid out, you will consider answering my question this time.
> >
> > I'd also very much appreciate if you could directly answer my questions
> > in future reviews, even if they appear silly. It'll make this all go faster.
> 
> Yes, of course, I am sorry not to answer to this point properly before.
> Hopefully you have an answer now.
> 
> >
> > ...
> > >> 	if (ret == SGX_NO_UPDATE)
> > >> 		return 0;
> > >>
> > >> 	if (ret) {
> > >> 		ENCLS_WARN(ret, "EUPDATESVN");
> > >> 		return ret;
> > >> 	}
> > >>
> > >> 	pr_info("SVN updated successfully\n");
> > >> 	return 0;
> > >>
> > >> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I
> thought
> > it
> > >> was supposed to be horribly rare. Shouldn't we warn on it?
> > >
> > > The entropy comes from RDSEED in this case, not RDRAND.
> > > This is not a horribly rare but very realistic failure.
> >
> > Fair enough. If you want to keep it this way, could you run an
> > experiment and see how realistic it is? Surely, if it's very realistic,
> > you should be able to show it on real hardware quite easily.
> >
> > We had a loooooooong discussion about this not so long ago. I believe
> > older CPUs were much more likely to be able to observe RDSEED failures.
> > But, even on those CPUs, repeated failures were pretty darn rare. New
> > CPUs are less likely to observe a single RDSEED failures. They are very
> > unlikely to see repeated failures.
> >
> > So really want to know if this is needed in _practice_. Sure,
> > theoretically, the architecture allows CPUs to fail RDSEED all the time.
> > But do the CPUs with EUPDATESVN also have RDSEED implementations that
> > fail easily?
> 
> This was the recent discussion I am aware we had on this matter:
> https://lkml.org/lkml/2024/2/5/1595
> The measurements were done for older platform (skylake), but I am not
> aware of any architectural changes since that time to improve this.

And to make it more concrete, I made some simple tests based
on program for stress testing rdrand/rdseed that was discussed in that
threat earlier: https://lkml.org/lkml/2024/2/6/746 
Using this stress testing and enough threads, I can make EUPDATESVN fail
reliably and quite easily even with 10 time retry loop by kernel. So,
given this, what should be the correct behaviour be here? WARN_ON
is not justified imo. But should we return an error to userspace and
block open()? 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
  2025-05-14  7:32           ` Reshetova, Elena
@ 2025-05-15 15:37             ` Dave Hansen
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2025-05-15 15:37 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: jarkko@kernel.org, seanjc@google.com, Huang, Kai,
	linux-sgx@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, Mallick, Asit K, Scarlata, Vincent R, Cai, Chong,
	Aktas, Erdem, Annapurve, Vishal, dionnaglaze@google.com,
	bondarn@google.com, Raynor, Scott

On 5/14/25 00:32, Reshetova, Elena wrote:
>> This was the recent discussion I am aware we had on this matter:
>> https://lkml.org/lkml/2024/2/5/1595
>> The measurements were done for older platform (skylake), but I am not
>> aware of any architectural changes since that time to improve this.
> And to make it more concrete, I made some simple tests based
> on program for stress testing rdrand/rdseed that was discussed in that
> threat earlier: https://lkml.org/lkml/2024/2/6/746 
> Using this stress testing and enough threads, I can make EUPDATESVN fail
> reliably and quite easily even with 10 time retry loop by kernel. So,
> given this, what should be the correct behaviour be here? WARN_ON
> is not justified imo. But should we return an error to userspace and
> block open()? 

No, we can't block open. Just silently fail the EUPDATESVN for now.

We can have a separate bikeshedding session about how to warn about the
failure (or not).

I can also hear Sean screaming from across the room that silent failure
is going to be really nasty to debug. Once we settle on this set, we can
have another discussion on an explicit update ABI so that folks who
truly control their environment can stop all the guests and explicitly
run EUPDATESVN at a nice convenient time. *THAT* interface can punt
retries out to userspace and warn for sure.

But, one thing at a time, please. This set will cover *normal* users:
those who don't know exactly where each SGX user is and what their
lifetimes are. Oh, and normal users also aren't under RDSEED attacks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-15 15:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-07 11:13 [PATCH v4 0/1] Enable automatic SVN updates for SGX enclaves Elena Reshetova
2025-05-07 11:14 ` [PATCH v4 1/1] x86/sgx: " Elena Reshetova
2025-05-07 16:04   ` Dave Hansen
2025-05-08 14:07     ` Reshetova, Elena
2025-05-08 15:45       ` Dave Hansen
2025-05-12 12:54         ` Reshetova, Elena
2025-05-14  7:32           ` Reshetova, Elena
2025-05-15 15:37             ` Dave Hansen
2025-05-08 20:13   ` Jarkko Sakkinen
2025-05-12 13:54     ` Reshetova, Elena

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox