From: Breno Leitao <leitao@debian.org>
To: Ard Biesheuvel <ardb@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Breno Leitao <leitao@debian.org>,
	kernel-team@meta.com
Subject: [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
Date: Tue, 09 Jun 2026 04:55:26 -0700
Message-ID: <20260609-efi_timeout-v1-0-69a896faa805@debian.org>

When an EFI runtime service call hangs in firmware, the kworker on
efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
The kernel currently waits indefinitely on the completion, and the
caller holds efi_runtime_lock for the duration, so every subsequent
EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
handlers, ...) is wedged until reboot. The only externally visible
symptom is a "workqueue lockup" message and userspace processes
piling up uninterruptibly on the semaphore.
A real example from one of our NVIDIA Grace hosts:

  BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0 nice=0 stuck for 127s!
  ...
  CPU: 28 PID: 590 Comm: kworker/u288:6
  Workqueue: efi_rts_wq efi_call_rts
  Call trace:
   0x4052f11ecc (P)
   0x4052f10ed4
   ...
   __efi_rt_asm_wrapper+0x50/0x78
   efi_call_rts+0x178/0x240
   process_scheduled_works+0x17c/0x420
   worker_thread+0x184/0x4d8
   kthread+0xcc/0x1f8
   ret_from_fork+0x10/0x20

PC and LR point into EFI runtime services firmware memory. Firmware
never returned, and the worker stayed stuck across the 127s / 157s /
188s "workqueue lockup" reports until external monitoring eventually
rebooted the host.
This series doesn't fix the firmware bug - that's vendor territory -
but it stops one stuck EFI call from taking the rest of userspace
down with it, and turns a generic stalled-task mystery into an
unambiguous "EFI firmware is at fault" signal in dmesg, which is
especially valuable at fleet scale where the same symptom could
otherwise be attributed to dozens of unrelated stalls.
Patch 1 bounds the wait at 120 seconds via wait_for_completion_timeout().
On timeout it logs the wedged runtime service id and returns
EFI_TIMEOUT to the caller instead of letting the task hang forever.
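
For readers without the patch at hand, the bounded wait can be modelled
in userspace. This is a minimal sketch, not the code in patch 1: the
names (completion_model and its helpers) are hypothetical, and
pthread_cond_timedwait() stands in for the kernel's
wait_for_completion_timeout(), which returns 0 on timeout and the
remaining jiffies otherwise.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_model_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Called by the worker once the (simulated) firmware call returns. */
static void completion_model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Returns true if completed before the deadline, false on timeout. */
static bool completion_model_wait_timeout(struct completion_model *c,
					  long timeout_ms)
{
	struct timespec deadline;
	bool done;

	assert(timeout_ms >= 0);

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		/* Loop to absorb spurious wakeups; stop once the deadline passes. */
		if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) == ETIMEDOUT)
			break;
	}
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

A caller would treat a false return the way patch 1 treats a zero
return from wait_for_completion_timeout(): log which service is stuck
and report a timeout instead of waiting again.
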
Patch 2 introduces the efi_rts_dead flag, which is set on timeout and
checked at the entry of __efi_queue_work(), so all subsequent callers
fail fast with EFI_DEVICE_ERROR rather than each paying another 120
seconds. The flag is also required for correctness: without it, the
next caller after a timeout walks into INIT_WORK() and
init_completion() on the work_struct and completion that the leaked
worker still owns. Patch 1 and patch 2 should land together;
reviewers may prefer to squash them.
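
The fail-fast behaviour can be sketched as a small single-threaded
model. This is an illustration under assumptions, not the code in
patch 2: rts_dead_model, queue_work_model() and the bounded_call_fn
hook are hypothetical names, the enum values only stand in for the
EFI status codes named above, and the real flag would need whatever
locking or memory ordering the kernel code requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for EFI_SUCCESS, EFI_TIMEOUT and EFI_DEVICE_ERROR (assumption). */
enum model_status {
	MODEL_SUCCESS,
	MODEL_TIMEOUT,
	MODEL_DEVICE_ERROR,
};

/* Models the efi_rts_dead flag: set once on timeout, never cleared. */
static bool rts_dead_model;

/* Hypothetical hook: returns true if the call completed within the bound. */
typedef bool (*bounded_call_fn)(void);

static enum model_status queue_work_model(bounded_call_fn call)
{
	assert(call != NULL);

	/*
	 * Fail fast: after a timeout the shared work item still belongs
	 * to the leaked worker, so it must not be reinitialised.
	 */
	if (rts_dead_model)
		return MODEL_DEVICE_ERROR;

	if (!call()) {
		/* First timeout: mark runtime services dead until reboot. */
		rts_dead_model = true;
		return MODEL_TIMEOUT;
	}
	return MODEL_SUCCESS;
}

/* Simulated calls: one that returns in time and one that hangs. */
static bool call_returns(void) { return true; }
static bool call_hangs(void) { return false; }
```

In this model only the caller that hits the hang sees the timeout
status; every later caller gets the device error immediately, without
touching the abandoned work item.
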
The wedged worker is intentionally leaked - it is still inside
firmware and cannot be cancelled - and the shared efi_rts_work is
abandoned to it. EFI runtime services are unavailable until reboot,
but the rest of userspace keeps running.
Known limitation: the union efi_rts_args that the worker receives
contains pointers into the caller's stack frame (the compound literal
in efi_queue_work() and the in/out buffers it points to, e.g. *tm in
GetTime). Once the caller returns with an error and unwinds, those slots are
reusable. If firmware eventually unblocks and writes the output
buffers after the timeout has fired, the writes land in whatever now
occupies that memory. In practice firmware that hangs for more than
120 seconds tends to stay hung, but the trade-off is real. A
follow-up bouncing args and output buffers through kmalloc would
close this gap.
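
One possible shape for that follow-up, sketched in userspace with
malloc in place of kmalloc. Everything here is hypothetical
(time_model, fw_get_time_fn, get_time_bounced() and the
completes_in_time switch that simulates the wait result); it only
illustrates the idea that firmware writes into a heap buffer which is
abandoned on timeout, so a late write never lands on the caller's
stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for efi_time_t (assumption, not the kernel type). */
struct time_model {
	int year;
	int month;
	int day;
};

/* Hypothetical firmware entry point: fills *out whenever it returns. */
typedef void (*fw_get_time_fn)(struct time_model *out);

/*
 * Returns 0 and copies the result to *out when the call completed in
 * time. Returns -1 on timeout (or allocation failure) and leaves *out
 * untouched; on timeout the heap buffer is handed back via *abandoned
 * and deliberately not freed, because firmware may still write to it.
 */
static int get_time_bounced(struct time_model *out, fw_get_time_fn fw,
			    int completes_in_time,
			    struct time_model **abandoned)
{
	struct time_model *bounce;

	assert(out != NULL && fw != NULL && abandoned != NULL);
	*abandoned = NULL;

	bounce = calloc(1, sizeof(*bounce));
	if (!bounce)
		return -1;

	if (!completes_in_time) {
		/* Simulated timeout: leak the buffer to the stuck call. */
		*abandoned = bounce;
		return -1;
	}

	fw(bounce);       /* firmware writes to the heap buffer only */
	*out = *bounce;   /* copy back after a successful return */
	free(bounce);
	return 0;
}

/* Simulated firmware call used to exercise both paths. */
static void fake_fw_get_time(struct time_model *out)
{
	out->year = 2026;
	out->month = 6;
	out->day = 9;
}
```

The cost of this design is one leaked allocation per hang, which
matches the already accepted cost of leaking the wedged worker.
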
Tested under virtme-ng + OVMF with a debug hook that hangs one
runtime service on demand: pr_err fires at +120s, and the syscall
that triggered it (issued by mount -t efivarfs) returns with
EFI_TIMEOUT (status=0x8000000000000012) propagated through efivars
instead of blocking indefinitely.
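
For reference, the status value above decodes as the error bit plus
the UEFI error code 18 (EFI_TIMEOUT). The helpers below are a small
sketch with hypothetical names, assuming a 64-bit efi_status_t where
the top bit marks an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes a 64-bit efi_status_t: the most significant bit flags an error. */
#define MODEL_EFI_ERROR_BIT (UINT64_C(1) << 63)

/* Build an error status from a UEFI error code (18 for EFI_TIMEOUT). */
static uint64_t model_efi_error(uint64_t code)
{
	assert(code < MODEL_EFI_ERROR_BIT);
	return MODEL_EFI_ERROR_BIT | code;
}

/* True when the status has the error bit set. */
static bool model_efi_is_error(uint64_t status)
{
	return (status & MODEL_EFI_ERROR_BIT) != 0;
}

/* Strip the error bit to recover the numeric code. */
static uint64_t model_efi_code(uint64_t status)
{
	return status & ~MODEL_EFI_ERROR_BIT;
}
```

With these, model_efi_error(18) yields 0x8000000000000012, and
EFI_DEVICE_ERROR (code 7) would be 0x8000000000000007.
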
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (2):
      efi/runtime-wrappers: bound the wait for EFI runtime service calls
      efi/runtime-wrappers: disable EFI runtime services after a hang

 drivers/firmware/efi/runtime-wrappers.c | 35 ++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260609-efi_timeout-6f51d5bbcfb7
Best regards,
--
Breno Leitao <leitao@debian.org>