* [PATCH] x86/sgx: Periodically yield in EPC sanitization to unblock rcu_tasks GP
@ 2026-06-18 10:04 Jun Miao
  2026-06-19  0:41 ` Jarkko Sakkinen
  0 siblings, 1 reply; 2+ messages in thread
From: Jun Miao @ 2026-06-18 10:04 UTC (permalink / raw)
  To: jarkko, dave.hansen
  Cc: linux-sgx, linux-kernel, fan.du, challvy.tee, jun.miao

During early boot, ksgxd (the SGX kernel thread) sanitizes all post-kexec
dirty EPC pages in a tight loop, calling cond_resched() after each page.
On isolated CPUs (a common configuration in cloud VMs), however,
cond_resched() never triggers a real context switch, because
TIF_NEED_RESCHED is not set when no competing runnable task exists on
that CPU.

synchronize_rcu_tasks(), invoked by BPF LSM during initialization, must
wait for every task that was running at the start of the grace period to
pass through a quiescent state (a voluntary sleep or preemption point).
If ksgxd never leaves the CPU, the rcu_tasks grace period stalls, causing
boot delays exceeding 60 seconds on machines with large EPC regions.

Fix this by introducing SGX_SANITIZE_RESCHED_INTERVAL (32768) and forcing
ksgxd to sleep for one jiffy once every 32768 pages, which guarantees that
an rcu_tasks quiescent state is reached in bounded time regardless of CPU
isolation.  Keep cond_resched() for all other iterations.

Without this patch, virtual machines (VMs) experience long boot times:

[    4.110549] systemd[1]: Detected architecture x86-64.
[    4.115279] systemd[1]: Hostname set to <i2bp1g0g0m0i8406er0g1zX2>.
[    4.115554] systemd[1]: Installed transient /etc/machine-id file.
[   14.262158] rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 10087 jiffies old.
[   14.374158] rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 40199 jiffies old.
[  134.806157] rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631 jiffies old.
[  248.086158] INFO: task systemd:1 blocked for more than 122 seconds.
[  248.086491] Not tainted 6.8.0-90-generic #91-Ubuntu
[  248.086739] 'echo 0 > /proc/sys/kernel/hung_task_timeout_secs' disables this message.
[  248.086993] task:systemd    state:D stack:0    pid:1    tpid:1    ppid:0    flags:0x00000002
[  248.087274] Call Trace:
[  248.087434] <TASK>
[  248.087557] __schedule+0x27c/0x6b0
[  248.087770] schedule+0x33/0x110
[  248.087939] schedule_timeout+0x157/0x170
[  248.088120] wait_for_completion+0x88/0x150
[  248.088304] __wait_rcu_gp+0x17e/0x190
[  248.088481] synchronize_rcu_tasks_generic+0x64/0x60
[  248.088672] ? __pfx_call_rcu_tasks+0x10/0x10
[  248.088858] ? __pfx_wakeme_after_rcu+0x10/0x10
[  248.089047] synchronize_rcu_tasks+0x15/0x20
[  248.089260] register_ftrace_direct+0x31f/0x350
[  248.089445] ? __pfx_bpf_lsm_file_open+0x10/0x10
[  248.089629] bpf_trampoline_update+0x469/0x650
[  248.089814] ? 0xffffffffffffffff
[  248.089988] ? 0xffffffffffffffff
[  248.090153] __bpf_trampoline_link_prog+0x10d/0x330
[  248.090339] bpf_trampoline_link_prog+0x33/0x60
[  248.090518] bpf_tracing_prog_attach+0x3c5/0x5f0
[  248.090699] link_create+0x1a5/0x280
[  248.090886] ? security_bpf+0x3c/0x70
[  248.091101] __sys_bpf+0x4ae/0x10
[  248.091312] __x64_sys_bpf+0x1a/0x30
[  248.091477] x64_sys_call+0x199/0x250
[  248.091647] do_syscall_64+0x7f/0x180
[  248.091818] ? arch_exit_to_user_mode_prepare.isa.0+0x1a/0x60
[  248.092022] ? irqentry_exit_to_user_mode+0x38/0x1e0
[  248.092246] ? irqentry_exit+0x43/0x50
[  248.092401] entry_SYSCALL_64_after_hwframe+0x78/0x80
[  248.092590] RIP: 0033:0x7b53e592728d
[  248.092756] RSP: 002b:00007ffdaa9d696 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[  248.092856] RAX: ffffffffffffffda RBX: 00007ffdaa9d696 RCX: 00007b53e592728d
[  248.092956] RDX: 0000000000000000 RSI: 00007ffdaa9d696 RDI: 0000000000000001
[  248.093056] RBP: 00007ffdaa9d696 R08: 00007b53e5a03a8 R09: 00007ffdaa9d696
[  248.093156] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  248.093256] R13: 0000000000000000 R14: 00005d81ed2cfd0 R15: 00005d81ed2b7ec0
[  248.093406] </TASK>

Reported-by: challvy <challvy.tee@gmail.com>
Link: https://github.com/systemd/systemd/issues/40423
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Co-developed-by: Fan Du <fan.du@intel.com>
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>
---
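As a note on the interval logic described in the changelog, here is a
minimal userspace sketch of the counter/mask test used in the loop. It
only replays the arithmetic. The helper names forced_sleeps() and
forced_sleeps_closed_form() are hypothetical and not part of the patch,
and the kernel calls are represented by comments, not invoked.

```c
#include <assert.h>

/* Same value as the patch: 1 << 15 == 32768 pages. */
#define SGX_SANITIZE_RESCHED_INTERVAL	(1UL << 15)

/*
 * Hypothetical helper, not part of the patch: replay the loop's counter
 * logic and return how many iterations take the forced-sleep branch.
 */
static unsigned long forced_sleeps(unsigned long nr_pages)
{
	unsigned long count = 0;
	unsigned long sleeps = 0;
	unsigned long i;

	/* The mask test equals a modulo test only for powers of two. */
	assert((SGX_SANITIZE_RESCHED_INTERVAL &
		(SGX_SANITIZE_RESCHED_INTERVAL - 1)) == 0);

	for (i = 0; i < nr_pages; i++) {
		if (!(++count & (SGX_SANITIZE_RESCHED_INTERVAL - 1)))
			sleeps++;	/* patch: schedule_timeout_interruptible(1) */
		/* else branch in the patch: cond_resched() */
	}

	return sleeps;
}

/* Hypothetical helper: the same count in closed form. */
static unsigned long forced_sleeps_closed_form(unsigned long nr_pages)
{
	return nr_pages / SGX_SANITIZE_RESCHED_INTERVAL;
}
```

With 4 KiB EPC pages, one interval covers 128 MiB of EPC. Assuming, for
illustration only, a 1 GiB EPC (262144 pages), forced_sleeps(262144)
yields 8 forced sleeps of roughly one jiffy each.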
 arch/x86/kernel/cpu/sgx/main.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4505f808af5e..4642d2d47186 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -52,6 +52,13 @@ static struct sgx_numa_node *sgx_numa_nodes;
 
 static LIST_HEAD(sgx_dirty_page_list);
 
+/*
+ * Force a voluntary context switch every SGX_SANITIZE_RESCHED_INTERVAL
+ * iterations to let synchronize_rcu_tasks() (e.g. called by BPF LSM at
+ * init) complete its grace period.
+ */
+#define SGX_SANITIZE_RESCHED_INTERVAL	(1 << 15)
+
 /*
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
@@ -63,6 +70,7 @@ static LIST_HEAD(sgx_dirty_page_list);
 static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	unsigned long left_dirty = 0;
+	unsigned long count = 0;
 	struct sgx_epc_page *page;
 	LIST_HEAD(dirty);
 	int ret;
@@ -72,6 +80,18 @@ static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		if (kthread_should_stop())
 			return 0;
 
+		/*
+		 * On isolated CPUs cond_resched() does not trigger a real
+		 * context switch when no competing runnable task exists.
+		 * Periodically force ksgxd to sleep so that synchronize_rcu_tasks()
+		 * (e.g. BPF LSM) can complete the grace period in bounded time.
+		 * Keep cond_resched() between forced sleeps for higher-priority tasks.
+		 */
+		if (!(++count & (SGX_SANITIZE_RESCHED_INTERVAL - 1)))
+			schedule_timeout_interruptible(1);
+		else
+			cond_resched();
+
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
 		/*
@@ -105,8 +125,6 @@ static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 			list_move_tail(&page->list, &dirty);
 			left_dirty++;
 		}
-
-		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
-- 
2.43.0

