Linux EFI development
 help / color / mirror / Atom feed
* [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
@ 2026-06-09 11:55 Breno Leitao
  2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
  2026-06-09 11:55 ` [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang Breno Leitao
  0 siblings, 2 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-09 11:55 UTC (permalink / raw)
  To: Ard Biesheuvel, Ilias Apalodimas
  Cc: linux-efi, linux-kernel, Breno Leitao, kernel-team

When an EFI runtime service call hangs in firmware, the kworker on
efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
The kernel currently waits indefinitely on the completion, and the
caller holds efi_runtime_lock for the duration, so every subsequent
EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
handlers, ...) is wedged until reboot. The only externally visible
symptom is a "workqueue lockup" message and userspace processes
piling up uninterruptibly on the semaphore.

A real example from one of our NVIDIA Grace hosts:

  BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0 nice=0 stuck for 127s!
  ...
  CPU: 28 PID: 590 Comm: kworker/u288:6
  Workqueue: efi_rts_wq efi_call_rts
  Call trace:
   0x4052f11ecc (P)
   0x4052f10ed4
   ...
   __efi_rt_asm_wrapper+0x50/0x78
   efi_call_rts+0x178/0x240
   process_scheduled_works+0x17c/0x420
   worker_thread+0x184/0x4d8
   kthread+0xcc/0x1f8
   ret_from_fork+0x10/0x20

PC and LR are inside EFI runtime services firmware memory; firmware
never returned; the worker stayed stuck across the 127s / 157s / 188s
"workqueue lockup" reports until external monitoring eventually rebooted
the host.

This series doesn't fix the firmware bug - that's vendor territory -
but it stops one stuck EFI call from taking the rest of userspace
down with it, and turns a generic stalled-task mystery into an
unambiguous "EFI firmware is at fault" signal in dmesg, which is
especially valuable at fleet scale where the same symptom could
otherwise be attributed to dozens of unrelated stalls.

Patch 1 bounds the wait at 120 seconds via wait_for_completion_timeout().
On timeout it logs the wedged runtime service id and returns
EFI_TIMEOUT to the caller instead of letting the task hang forever.

Patch 2 introduces the efi_rts_dead flag set on timeout and checked
at the entry of __efi_queue_work() so all subsequent callers fail
fast with EFI_DEVICE_ERROR rather than each paying another 120
seconds. The flag is also required for correctness - without it the
next caller after a timeout walks into INIT_WORK() and
init_completion() on the work_struct and completion the leaked
worker still owns. Patch 1 and patch 2 should land together;
reviewers may prefer to squash them.

The wedged worker is intentionally leaked - it is still inside
firmware and cannot be cancelled - and the shared efi_rts_work is
abandoned to it. EFI runtime services are unavailable until reboot,
but the rest of userspace keeps running.

Known limitation: the union efi_rts_args that the worker receives
contains pointers into the caller's stack frame (the compound literal
in efi_queue_work() and the in/out buffers it points to, e.g. *tm in
GetTime). Once the caller returns -EIO and unwinds, those slots are
reusable. If firmware eventually unblocks and writes the output
buffers after the timeout has fired, the writes land in whatever now
occupies that memory. In practice firmware that hangs for more than
120 seconds tends to stay hung, but the trade-off is real. A
follow-up bouncing args and output buffers through kmalloc would
close this gap.

Tested under virtme-ng + OVMF with a debug hook that hangs one
runtime service on demand: pr_err fires at +120s, the syscall that
triggered it (mount -t efivarfs) returns with EFI_TIMEOUT
(status=0x8000000000000012) propagated through efivars instead of
blocking indefinitely.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (2):
      efi/runtime-wrappers: bound the wait for EFI runtime service calls
      efi/runtime-wrappers: disable EFI runtime services after a hang

 drivers/firmware/efi/runtime-wrappers.c | 35 ++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260609-efi_timeout-6f51d5bbcfb7

Best regards,
-- 
Breno Leitao <leitao@debian.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
  2026-06-09 11:55 [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls Breno Leitao
@ 2026-06-09 11:55 ` Breno Leitao
  2026-06-11 10:21   ` Ard Biesheuvel
  2026-06-09 11:55 ` [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang Breno Leitao
  1 sibling, 1 reply; 7+ messages in thread
From: Breno Leitao @ 2026-06-09 11:55 UTC (permalink / raw)
  To: Ard Biesheuvel, Ilias Apalodimas
  Cc: linux-efi, linux-kernel, Breno Leitao, kernel-team

When an EFI runtime service call hangs in firmware, the kworker on
efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
The kernel currently waits indefinitely on the completion, and the
caller holds efi_runtime_lock for the duration, so every subsequent
EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
handlers, ...) is wedged until reboot. The only externally visible
symptom is a "workqueue lockup" message and userspace processes piling
up uninterruptibly on the semaphore.

Replace the indefinite wait_for_completion() in __efi_queue_work()
with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
(120 seconds). On timeout, log the wedged runtime service id and
return EFI_TIMEOUT to the caller.

The wedged worker is intentionally leaked - it is still inside
firmware and cannot be cancelled - and the shared efi_rts_work is
abandoned to it. EFI runtime services become unavailable until reboot;
the alternative is permanent userspace hang.

This patch should land together with the follow-up that introduces
the efi_rts_dead fast-fail flag: between the two, a subsequent
__efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
on the work_struct and completion that the leaked worker still owns,
which can corrupt workqueue state and let the next caller observe
the leaked call's status as if it were its own. The split exists for
review clarity; reviewers may prefer to squash.

Known limitation: the union efi_rts_args that __efi_queue_work() hands
to the worker contains pointers into the caller's stack frame (the
compound literal in efi_queue_work() and the in/out buffers it points
to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
those slots are reusable. If firmware eventually unblocks and writes
to the output buffers after the timeout has fired, the writes land in
whatever now occupies that memory. In practice firmware that hangs for
more than 120 seconds tends to stay hung. A follow-up bouncing args
and output buffers through kmalloc would close this gap.

At fleet scale this turns a generic "workqueue lockup" or stalled-task
mystery into an unambiguous "EFI firmware is at fault" signal in
dmesg.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index da8d29621644..6ce6d094066e 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -118,6 +118,14 @@ union efi_rts_args {
 
 struct efi_runtime_work efi_rts_work;
 
+/*
+ * Upper bound on how long we wait for a single EFI runtime service
+ * call to finish before declaring firmware wedged. Chosen to be longer
+ * than any plausible legitimate call (including UpdateCapsule on slow
+ * SPI-NOR) while still bounding userspace wait time.
+ */
+#define EFI_RTS_TIMEOUT		(120 * HZ)
+
 /*
  * efi_queue_work:	Queue EFI runtime service call and wait for completion
  * @_rts:		EFI runtime service function identifier
@@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
 	 * queue_work() returns 0 if work was already on queue,
 	 * _ideally_ this should never happen.
 	 */
-	if (queue_work(efi_rts_wq, &efi_rts_work.work))
-		wait_for_completion(&efi_rts_work.efi_rts_comp);
-	else
+	if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
 		pr_err("Failed to queue work to efi_rts_wq.\n");
+		goto exit;
+	}
+
+	if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
+					 EFI_RTS_TIMEOUT)) {
+		pr_err("EFI runtime service %d wedged in firmware\n", id);
+		return EFI_TIMEOUT;
+	}
 
 	WARN_ON_ONCE(efi_rts_work.status == EFI_ABORTED);
 exit:

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang
  2026-06-09 11:55 [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls Breno Leitao
  2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
@ 2026-06-09 11:55 ` Breno Leitao
  1 sibling, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-09 11:55 UTC (permalink / raw)
  To: Ard Biesheuvel, Ilias Apalodimas
  Cc: linux-efi, linux-kernel, Breno Leitao, kernel-team

Once a runtime call has tripped EFI_RTS_TIMEOUT the firmware is
wedged and any subsequent call would also hang for 120 seconds
before timing out, blocking userspace each time.

Introduce a one-shot efi_rts_dead flag set on timeout, and check it
at the entry of __efi_queue_work() so further calls fail fast with
EFI_DEVICE_ERROR.

Beyond the wall-clock saving, the flag is required for correctness:
without it the next __efi_queue_work() caller after a timeout walks
into INIT_WORK() and init_completion() on the work_struct and
completion that the leaked worker still owns, which can corrupt the
workqueue's bookkeeping for the leaked work and let the next caller
observe the leaked call's status as if it were its own.

The flag is intentionally never cleared: the worker that wedged is
leaked inside firmware, so reusing efi_rts_work is unsafe.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/firmware/efi/runtime-wrappers.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 6ce6d094066e..a76ba819855f 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -126,6 +126,16 @@ struct efi_runtime_work efi_rts_work;
  */
 #define EFI_RTS_TIMEOUT		(120 * HZ)
 
+/*
+ * Set once an EFI runtime service call has hung past EFI_RTS_TIMEOUT.
+ * Subsequent __efi_queue_work() callers fail fast: the wedged worker
+ * is still inside firmware, owns efi_rts_work, and any reuse of the
+ * shared work_struct or completion would corrupt the workqueue's
+ * bookkeeping for the leaked work. The flag is intentionally never
+ * cleared - recovery requires reboot.
+ */
+static bool efi_rts_dead;
+
 /*
  * efi_queue_work:	Queue EFI runtime service call and wait for completion
  * @_rts:		EFI runtime service function identifier
@@ -328,6 +338,9 @@ static void __nocfi efi_call_rts(struct work_struct *work)
 static efi_status_t __efi_queue_work(enum efi_rts_ids id,
 				     union efi_rts_args *args)
 {
+	if (READ_ONCE(efi_rts_dead))
+		return EFI_DEVICE_ERROR;
+
 	efi_rts_work.efi_rts_id = id;
 	efi_rts_work.args = args;
 	efi_rts_work.caller = __builtin_return_address(0);
@@ -353,7 +366,9 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
 
 	if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
 					 EFI_RTS_TIMEOUT)) {
-		pr_err("EFI runtime service %d wedged in firmware\n", id);
+		WRITE_ONCE(efi_rts_dead, true);
+		pr_err("EFI runtime service %d wedged in firmware; disabling EFI runtime services\n",
+		       id);
 		return EFI_TIMEOUT;
 	}
 

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
  2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
@ 2026-06-11 10:21   ` Ard Biesheuvel
  2026-06-11 10:57     ` Ard Biesheuvel
  2026-06-12 10:05     ` Breno Leitao
  0 siblings, 2 replies; 7+ messages in thread
From: Ard Biesheuvel @ 2026-06-11 10:21 UTC (permalink / raw)
  To: Breno Leitao, Ilias Apalodimas; +Cc: linux-efi, linux-kernel, kernel-team

Hi Breno,

On Tue, 9 Jun 2026, at 13:55, Breno Leitao wrote:
> When an EFI runtime service call hangs in firmware, the kworker on
> efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
> The kernel currently waits indefinitely on the completion, and the
> caller holds efi_runtime_lock for the duration, so every subsequent
> EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
> handlers, ...) is wedged until reboot. The only externally visible
> symptom is a "workqueue lockup" message and userspace processes piling
> up uninterruptibly on the semaphore.
>
> Replace the indefinite wait_for_completion() in __efi_queue_work()
> with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
> (120 seconds). On timeout, log the wedged runtime service id and
> return EFI_TIMEOUT to the caller.
>
> The wedged worker is intentionally leaked - it is still inside
> firmware and cannot be cancelled - and the shared efi_rts_work is
> abandoned to it. EFI runtime services become unavailable until reboot;
> the alternative is permanent userspace hang.
>
> This patch should land together with the follow-up that introduces
> the efi_rts_dead fast-fail flag: between the two, a subsequent
> __efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
> on the work_struct and completion that the leaked worker still owns,
> which can corrupt workqueue state and let the next caller observe
> the leaked call's status as if it were its own. The split exists for
> review clarity; reviewers may prefer to squash.
>
> Known limitation: the union efi_rts_args that __efi_queue_work() hands
> to the worker contains pointers into the caller's stack frame (the
> compound literal in efi_queue_work() and the in/out buffers it points
> to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
> those slots are reusable. If firmware eventually unblocks and writes
> to the output buffers after the timeout has fired, the writes land in
> whatever now occupies that memory. In practice firmware that hangs for
> more than 120 seconds tends to stay hung. A follow-up bouncing args
> and output buffers through kmalloc would close this gap.
>
> At fleet scale this turns a generic "workqueue lockup" or stalled-task
> mystery into an unambiguous "EFI firmware is at fault" signal in
> dmesg.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/efi/runtime-wrappers.c 
> b/drivers/firmware/efi/runtime-wrappers.c
> index da8d29621644..6ce6d094066e 100644
> --- a/drivers/firmware/efi/runtime-wrappers.c
> +++ b/drivers/firmware/efi/runtime-wrappers.c
> @@ -118,6 +118,14 @@ union efi_rts_args {
> 
>  struct efi_runtime_work efi_rts_work;
> 
> +/*
> + * Upper bound on how long we wait for a single EFI runtime service
> + * call to finish before declaring firmware wedged. Chosen to be longer
> + * than any plausible legitimate call (including UpdateCapsule on slow
> + * SPI-NOR) while still bounding userspace wait time.
> + */
> +#define EFI_RTS_TIMEOUT		(120 * HZ)
> +
>  /*
>   * efi_queue_work:	Queue EFI runtime service call and wait for completion
>   * @_rts:		EFI runtime service function identifier
> @@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
>  	 * queue_work() returns 0 if work was already on queue,
>  	 * _ideally_ this should never happen.
>  	 */
> -	if (queue_work(efi_rts_wq, &efi_rts_work.work))
> -		wait_for_completion(&efi_rts_work.efi_rts_comp);
> -	else
> +	if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
>  		pr_err("Failed to queue work to efi_rts_wq.\n");
> +		goto exit;
> +	}
> +
> +	if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
> +					 EFI_RTS_TIMEOUT)) {
> +		pr_err("EFI runtime service %d wedged in firmware\n", id);
> +		return EFI_TIMEOUT;

Could we just clear the EFI_RUNTIME_SERVICES bit here right away? That
way, we probably won't need the second patch (unless I'm mistaken). It
/should/ also block the calls that are not routed via the workqueue,
e.g., any EFI pstore calls to the non-blocking SetVariable() variant,
but I just noticed that we never check EFI_RUNTIME_SERVICES on those
code paths, which is probably a bug. 

And please return EFI_ABORTED rather than EFI_TIMEOUT - probably doesn't
matter in practice but I'd like to avoid introducing more EFI return codes
in the runtime context that the spec mentions only for boot services stuff.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
  2026-06-11 10:21   ` Ard Biesheuvel
@ 2026-06-11 10:57     ` Ard Biesheuvel
  2026-06-12 10:28       ` Breno Leitao
  2026-06-12 10:05     ` Breno Leitao
  1 sibling, 1 reply; 7+ messages in thread
From: Ard Biesheuvel @ 2026-06-11 10:57 UTC (permalink / raw)
  To: Breno Leitao, Ilias Apalodimas; +Cc: linux-efi, linux-kernel, kernel-team



On Thu, 11 Jun 2026, at 12:21, Ard Biesheuvel wrote:
> Hi Breno,
>
> On Tue, 9 Jun 2026, at 13:55, Breno Leitao wrote:
>> When an EFI runtime service call hangs in firmware, the kworker on
>> efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
>> The kernel currently waits indefinitely on the completion, and the
>> caller holds efi_runtime_lock for the duration, so every subsequent
>> EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
>> handlers, ...) is wedged until reboot. The only externally visible
>> symptom is a "workqueue lockup" message and userspace processes piling
>> up uninterruptibly on the semaphore.
>>
>> Replace the indefinite wait_for_completion() in __efi_queue_work()
>> with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
>> (120 seconds). On timeout, log the wedged runtime service id and
>> return EFI_TIMEOUT to the caller.
>>
>> The wedged worker is intentionally leaked - it is still inside
>> firmware and cannot be cancelled - and the shared efi_rts_work is
>> abandoned to it. EFI runtime services become unavailable until reboot;
>> the alternative is permanent userspace hang.
>>
>> This patch should land together with the follow-up that introduces
>> the efi_rts_dead fast-fail flag: between the two, a subsequent
>> __efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
>> on the work_struct and completion that the leaked worker still owns,
>> which can corrupt workqueue state and let the next caller observe
>> the leaked call's status as if it were its own. The split exists for
>> review clarity; reviewers may prefer to squash.
>>
>> Known limitation: the union efi_rts_args that __efi_queue_work() hands
>> to the worker contains pointers into the caller's stack frame (the
>> compound literal in efi_queue_work() and the in/out buffers it points
>> to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
>> those slots are reusable. If firmware eventually unblocks and writes
>> to the output buffers after the timeout has fired, the writes land in
>> whatever now occupies that memory. In practice firmware that hangs for
>> more than 120 seconds tends to stay hung. A follow-up bouncing args
>> and output buffers through kmalloc would close this gap.
>>
>> At fleet scale this turns a generic "workqueue lockup" or stalled-task
>> mystery into an unambiguous "EFI firmware is at fault" signal in
>> dmesg.
>>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>  drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/runtime-wrappers.c 
>> b/drivers/firmware/efi/runtime-wrappers.c
>> index da8d29621644..6ce6d094066e 100644
>> --- a/drivers/firmware/efi/runtime-wrappers.c
>> +++ b/drivers/firmware/efi/runtime-wrappers.c
>> @@ -118,6 +118,14 @@ union efi_rts_args {
>> 
>>  struct efi_runtime_work efi_rts_work;
>> 
>> +/*
>> + * Upper bound on how long we wait for a single EFI runtime service
>> + * call to finish before declaring firmware wedged. Chosen to be longer
>> + * than any plausible legitimate call (including UpdateCapsule on slow
>> + * SPI-NOR) while still bounding userspace wait time.
>> + */
>> +#define EFI_RTS_TIMEOUT		(120 * HZ)
>> +
>>  /*
>>   * efi_queue_work:	Queue EFI runtime service call and wait for completion
>>   * @_rts:		EFI runtime service function identifier
>> @@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
>>  	 * queue_work() returns 0 if work was already on queue,
>>  	 * _ideally_ this should never happen.
>>  	 */
>> -	if (queue_work(efi_rts_wq, &efi_rts_work.work))
>> -		wait_for_completion(&efi_rts_work.efi_rts_comp);
>> -	else
>> +	if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
>>  		pr_err("Failed to queue work to efi_rts_wq.\n");
>> +		goto exit;
>> +	}
>> +
>> +	if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
>> +					 EFI_RTS_TIMEOUT)) {
>> +		pr_err("EFI runtime service %d wedged in firmware\n", id);
>> +		return EFI_TIMEOUT;
>
> Could we just clear the EFI_RUNTIME_SERVICES bit here right away? That
> way, we probably won't need the second patch (unless I'm mistaken). It
> /should/ also block the calls that are not routed via the workqueue,
> e.g., any EFI pstore calls to the non-blocking SetVariable() variant,
> but I just noticed that we never check EFI_RUNTIME_SERVICES on those
> code paths, which is probably a bug. 
>
> And please return EFI_ABORTED rather than EFI_TIMEOUT - probably doesn't
> matter in practice but I'd like to avoid introducing more EFI return codes
> in the runtime context that the spec mentions only for boot services stuff.

Also, could we prevent the kthread that runs the workqueue from being scheduled
again if we decide that the runtime services are wedged? x86 has some logic for
this when a page fault occurs, and I wonder if there is a generic way to do
something similar.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
  2026-06-11 10:21   ` Ard Biesheuvel
  2026-06-11 10:57     ` Ard Biesheuvel
@ 2026-06-12 10:05     ` Breno Leitao
  1 sibling, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-12 10:05 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Ilias Apalodimas, linux-efi, linux-kernel, kernel-team

On Thu, Jun 11, 2026 at 12:21:42PM +0200, Ard Biesheuvel wrote:
> Hi Breno,
> 
> On Tue, 9 Jun 2026, at 13:55, Breno Leitao wrote:
> > When an EFI runtime service call hangs in firmware, the kworker on
> > efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
> > The kernel currently waits indefinitely on the completion, and the
> > caller holds efi_runtime_lock for the duration, so every subsequent
> > EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
> > handlers, ...) is wedged until reboot. The only externally visible
> > symptom is a "workqueue lockup" message and userspace processes piling
> > up uninterruptibly on the semaphore.
> >
> > Replace the indefinite wait_for_completion() in __efi_queue_work()
> > with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
> > (120 seconds). On timeout, log the wedged runtime service id and
> > return EFI_TIMEOUT to the caller.
> >
> > The wedged worker is intentionally leaked - it is still inside
> > firmware and cannot be cancelled - and the shared efi_rts_work is
> > abandoned to it. EFI runtime services become unavailable until reboot;
> > the alternative is permanent userspace hang.
> >
> > This patch should land together with the follow-up that introduces
> > the efi_rts_dead fast-fail flag: between the two, a subsequent
> > __efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
> > on the work_struct and completion that the leaked worker still owns,
> > which can corrupt workqueue state and let the next caller observe
> > the leaked call's status as if it were its own. The split exists for
> > review clarity; reviewers may prefer to squash.
> >
> > Known limitation: the union efi_rts_args that __efi_queue_work() hands
> > to the worker contains pointers into the caller's stack frame (the
> > compound literal in efi_queue_work() and the in/out buffers it points
> > to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
> > those slots are reusable. If firmware eventually unblocks and writes
> > to the output buffers after the timeout has fired, the writes land in
> > whatever now occupies that memory. In practice firmware that hangs for
> > more than 120 seconds tends to stay hung. A follow-up bouncing args
> > and output buffers through kmalloc would close this gap.
> >
> > At fleet scale this turns a generic "workqueue lockup" or stalled-task
> > mystery into an unambiguous "EFI firmware is at fault" signal in
> > dmesg.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c 
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index da8d29621644..6ce6d094066e 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -118,6 +118,14 @@ union efi_rts_args {
> > 
> >  struct efi_runtime_work efi_rts_work;
> > 
> > +/*
> > + * Upper bound on how long we wait for a single EFI runtime service
> > + * call to finish before declaring firmware wedged. Chosen to be longer
> > + * than any plausible legitimate call (including UpdateCapsule on slow
> > + * SPI-NOR) while still bounding userspace wait time.
> > + */
> > +#define EFI_RTS_TIMEOUT		(120 * HZ)
> > +
> >  /*
> >   * efi_queue_work:	Queue EFI runtime service call and wait for completion
> >   * @_rts:		EFI runtime service function identifier
> > @@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
> >  	 * queue_work() returns 0 if work was already on queue,
> >  	 * _ideally_ this should never happen.
> >  	 */
> > -	if (queue_work(efi_rts_wq, &efi_rts_work.work))
> > -		wait_for_completion(&efi_rts_work.efi_rts_comp);
> > -	else
> > +	if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
> >  		pr_err("Failed to queue work to efi_rts_wq.\n");
> > +		goto exit;
> > +	}
> > +
> > +	if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
> > +					 EFI_RTS_TIMEOUT)) {
> > +		pr_err("EFI runtime service %d wedged in firmware\n", id);
> > +		return EFI_TIMEOUT;
> 
> Could we just clear the EFI_RUNTIME_SERVICES bit here right away?

Ack. This is better than createing a new variable. Thanks!

I am planing to move that entry check ahead of the efi_rts_work
assignments (its own small prep patch), Otherwise a post-timeout caller
would overwrite efi_rts_work.efi_rts_id and reset it to EFI_NONE on the
way out.

> It /should/ also block the calls that are not routed via
> the workqueue, e.g., any EFI pstore calls to the non-blocking
> SetVariable() variant, but I just noticed that we never check
> EFI_RUNTIME_SERVICES on those code paths, which is probably a bug. 

Confirmed, fixed as a separate patch: virt_efi_set_variable_nb(),
virt_efi_query_variable_info_nb() and virt_efi_reset_system() now check
efi_enabled(EFI_RUNTIME_SERVICES) before calling firmware.

> And please return EFI_ABORTED rather than EFI_TIMEOUT - probably
> doesn't matter in practice but I'd like to avoid introducing more EFI
> return codes in the runtime context that the spec mentions only for
> boot services stuff.

Ack. The timeout path returns EFI_ABORTED, matching what
efi_recover_from_page_fault() already reports

Thanks for the review,
--breno

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
  2026-06-11 10:57     ` Ard Biesheuvel
@ 2026-06-12 10:28       ` Breno Leitao
  0 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-12 10:28 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Ilias Apalodimas, linux-efi, linux-kernel, kernel-team

Hi Ard,

On Thu, Jun 11, 2026 at 12:57:50PM +0200, Ard Biesheuvel wrote:
> > Could we just clear the EFI_RUNTIME_SERVICES bit here right away? That
> > way, we probably won't need the second patch (unless I'm mistaken). It
> > /should/ also block the calls that are not routed via the workqueue,
> > e.g., any EFI pstore calls to the non-blocking SetVariable() variant,
> > but I just noticed that we never check EFI_RUNTIME_SERVICES on those
> > code paths, which is probably a bug. 
> >
> > And please return EFI_ABORTED rather than EFI_TIMEOUT - probably doesn't
> > matter in practice but I'd like to avoid introducing more EFI return codes
> > in the runtime context that the spec mentions only for boot services stuff.
> 
> Also, could we prevent the kthread that runs the workqueue from being scheduled
> again if we decide that the runtime services are wedged?

Sure, I can park it at for (;;) schedule()). Maybe with a helper?!

> x86 has some logic for this when a page fault occurs, and I wonder if
> there is a generic way to do something similar.

I have no clue, to be honest. If you have something specific in mind,
I'll dig in.

Thanks for the review,
--breno



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-12 10:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 11:55 [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls Breno Leitao
2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
2026-06-11 10:21   ` Ard Biesheuvel
2026-06-11 10:57     ` Ard Biesheuvel
2026-06-12 10:28       ` Breno Leitao
2026-06-12 10:05     ` Breno Leitao
2026-06-09 11:55 ` [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox