[PATCH v4] x86/sgx: Fix RCU Tasks stall in EPC sanitization loop

From: Jun Miao @ 2026-06-24 14:20 UTC (permalink / raw)
  To: jarkko, dave.hansen, kai.huang
  Cc: challvy.tee, fan.du, linux-kernel, linux-sgx, qiang.zhang,
	jun.miao

The kernel resets all EPC pages to a clean state in a loop before using them
for enclaves.  The EPC can be large (e.g., GBs), so resetting all of its
pages can take a fair amount of time.  Because of that, during early boot,
the kernel resets EPC pages in a kernel thread, ksgxd(), and calls
cond_resched() after resetting each EPC page.

This is fine in most cases, but becomes a problem when other kernel code is
waiting for an RCU-Tasks grace period and the cond_resched() in ksgxd()
never triggers rescheduling.  Because cond_resched() doesn't report a
quiescent state when it doesn't trigger rescheduling, the thread waiting
for the RCU-Tasks grace period has to wait until all EPC pages are reset.

For instance, the BPF LSM subsystem can invoke synchronize_rcu_tasks() at
kernel boot time.  A VM that has a large EPC assigned and BPF LSM enabled
can take a long time to boot and triggers a call trace such as:

    rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631 jiffies old.
    INFO: task systemd:1 blocked for more than 122 seconds.
    ...
    task:systemd  state:D stack:0  pid:1  tpid:1  ppid:0  flags:0x00000002
    Call Trace:
    ...
    schedule_timeout+0x157/0x170
    wait_for_completion+0x88/0x150
    __wait_rcu_gp+0x17e/0x190
    synchronize_rcu_tasks_generic+0x64/0x60
    ...
    synchronize_rcu_tasks+0x15/0x20
    register_ftrace_direct+0x31f/0x350
    ...
    bpf_trampoline_link_prog+0x33/0x60
    bpf_tracing_prog_attach+0x3c5/0x5f0

Replace cond_resched() with cond_resched_tasks_rcu_qs(), which explicitly
reports a quiescent state regardless of whether rescheduling is actually
triggered.  Resetting all EPC pages in ksgxd() isn't performance critical,
so the extra cost of cond_resched_tasks_rcu_qs() isn't a problem.
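
The difference between the two helpers can be illustrated with a small
userspace toy model.  This is only a sketch, not kernel code: struct
toy_task, its "holdout" flag and every toy_*() function are hypothetical
names invented for illustration, and the model assumes that nothing else
wants the CPU while the loop runs.

```c
#include <stdbool.h>

/* Toy userspace model.  Every name below is hypothetical, not a kernel API. */
struct toy_task {
	bool need_resched;	/* the scheduler wants this task to yield */
	bool holdout;		/* an RCU-Tasks grace period still waits on it */
};

/* Models cond_resched(): a quiescent state is seen only on a real switch. */
static void toy_cond_resched(struct toy_task *t)
{
	if (t->need_resched) {
		t->need_resched = false;
		t->holdout = false;	/* voluntary context switch */
	}
}

/* Models cond_resched_tasks_rcu_qs(): report first, then maybe reschedule. */
static void toy_cond_resched_tasks_rcu_qs(struct toy_task *t)
{
	t->holdout = false;
	toy_cond_resched(t);
}

/*
 * Count the loop iterations during which the grace-period waiter is still
 * blocked on the sanitizing task, assuming nothing else wants the CPU.
 */
static unsigned long toy_blocked_iterations(unsigned long npages, bool use_qs)
{
	struct toy_task ksgxd = { .need_resched = false, .holdout = true };
	unsigned long blocked = 0;
	unsigned long i;

	for (i = 0; i < npages; i++) {
		/* one EPC page would be reset here */
		if (use_qs)
			toy_cond_resched_tasks_rcu_qs(&ksgxd);
		else
			toy_cond_resched(&ksgxd);

		if (ksgxd.holdout)
			blocked++;
	}
	return blocked;
}
```

Under these assumptions, toy_blocked_iterations(1000, false) returns 1000
(the waiter stays blocked for the whole loop), while
toy_blocked_iterations(1000, true) returns 0.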

Tests showed this reduced the VM kernel boot time from ~50s to ~700ms.

Reported-by: Challvy Tee <challvy.tee@gmail.com>
Link: https://github.com/systemd/systemd/issues/40423
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Tested-by: Challvy Tee <challvy.tee@gmail.com>
Suggested-by: Kai Huang <kai.huang@intel.com>
Co-developed-by: Fan Du <fan.du@intel.com>
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>

---
v1 -> v2:
 - Clarify the RCU Tasks stall root cause.
 - Use cond_resched_rcu_qs() following Kai's suggestion.

v2 -> v3:
 - Use cond_resched_tasks_rcu_qs(), following commit cee439398933
   ("rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs()").

v3 -> v4:
 - Trim down/rewrite changelog following Kai's suggestion.

---
 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4505f808af5e..7ba3d0a5a05d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -106,7 +106,7 @@ static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 			left_dirty++;
 		}

-		cond_resched();
+		cond_resched_tasks_rcu_qs();
 	}

 	list_splice(&dirty, dirty_page_list);
-- 
2.32.0