From: Breno Leitao <leitao@debian.org>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>,
linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
Date: Fri, 12 Jun 2026 03:05:29 -0700 [thread overview]
Message-ID: <aivZN29oru1TWOvu@gmail.com> (raw)
In-Reply-To: <71de953a-43e9-4987-8b8e-00d9b99a099e@app.fastmail.com>
On Thu, Jun 11, 2026 at 12:21:42PM +0200, Ard Biesheuvel wrote:
> Hi Breno,
>
> On Tue, 9 Jun 2026, at 13:55, Breno Leitao wrote:
> > When an EFI runtime service call hangs in firmware, the kworker on
> > efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
> > The kernel currently waits indefinitely on the completion, and the
> > caller holds efi_runtime_lock for the duration, so every subsequent
> > EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
> > handlers, ...) is wedged until reboot. The only externally visible
> > symptom is a "workqueue lockup" message and userspace processes piling
> > up uninterruptibly on the semaphore.
> >
> > Replace the indefinite wait_for_completion() in __efi_queue_work()
> > with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
> > (120 seconds). On timeout, log the wedged runtime service id and
> > return EFI_TIMEOUT to the caller.
> >
> > The wedged worker is intentionally leaked - it is still inside
> > firmware and cannot be cancelled - and the shared efi_rts_work is
> > abandoned to it. EFI runtime services become unavailable until reboot;
> > the alternative is permanent userspace hang.
> >
> > This patch should land together with the follow-up that introduces
> > the efi_rts_dead fast-fail flag: between the two, a subsequent
> > __efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
> > on the work_struct and completion that the leaked worker still owns,
> > which can corrupt workqueue state and let the next caller observe
> > the leaked call's status as if it were its own. The split exists for
> > review clarity; reviewers may prefer to squash.
> >
> > Known limitation: the union efi_rts_args that __efi_queue_work() hands
> > to the worker contains pointers into the caller's stack frame (the
> > compound literal in efi_queue_work() and the in/out buffers it points
> > to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
> > those slots are reusable. If firmware eventually unblocks and writes
> > to the output buffers after the timeout has fired, the writes land in
> > whatever now occupies that memory. In practice firmware that hangs for
> > more than 120 seconds tends to stay hung. A follow-up bouncing args
> > and output buffers through kmalloc would close this gap.
> >
> > At fleet scale this turns a generic "workqueue lockup" or stalled-task
> > mystery into an unambiguous "EFI firmware is at fault" signal in
> > dmesg.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index da8d29621644..6ce6d094066e 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -118,6 +118,14 @@ union efi_rts_args {
> >
> > struct efi_runtime_work efi_rts_work;
> >
> > +/*
> > + * Upper bound on how long we wait for a single EFI runtime service
> > + * call to finish before declaring firmware wedged. Chosen to be longer
> > + * than any plausible legitimate call (including UpdateCapsule on slow
> > + * SPI-NOR) while still bounding userspace wait time.
> > + */
> > +#define EFI_RTS_TIMEOUT (120 * HZ)
> > +
> > /*
> > * efi_queue_work: Queue EFI runtime service call and wait for completion
> > * @_rts: EFI runtime service function identifier
> > @@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
> > * queue_work() returns 0 if work was already on queue,
> > * _ideally_ this should never happen.
> > */
> > - if (queue_work(efi_rts_wq, &efi_rts_work.work))
> > - wait_for_completion(&efi_rts_work.efi_rts_comp);
> > - else
> > + if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
> > pr_err("Failed to queue work to efi_rts_wq.\n");
> > + goto exit;
> > + }
> > +
> > + if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
> > + EFI_RTS_TIMEOUT)) {
> > + pr_err("EFI runtime service %d wedged in firmware\n", id);
> > + return EFI_TIMEOUT;
>
> Could we just clear the EFI_RUNTIME_SERVICES bit here right away?
Ack. This is better than createing a new variable. Thanks!
I am planing to move that entry check ahead of the efi_rts_work
assignments (its own small prep patch), Otherwise a post-timeout caller
would overwrite efi_rts_work.efi_rts_id and reset it to EFI_NONE on the
way out.
> It /should/ also block the calls that are not routed via
> the workqueue, e.g., any EFI pstore calls to the non-blocking
> SetVariable() variant, but I just noticed that we never check
> EFI_RUNTIME_SERVICES on those code paths, which is probably a bug.
Confirmed, fixed as a separate patch: virt_efi_set_variable_nb(),
virt_efi_query_variable_info_nb() and virt_efi_reset_system() now check
efi_enabled(EFI_RUNTIME_SERVICES) before calling firmware.
> And please return EFI_ABORTED rather than EFI_TIMEOUT - probably
> doesn't matter in practice but I'd like to avoid introducing more EFI
> return codes in the runtime context that the spec mentions only for
> boot services stuff.
Ack. The timeout path returns EFI_ABORTED, matching what
efi_recover_from_page_fault() already reports
Thanks for the review,
--breno
next prev parent reply other threads:[~2026-06-12 10:05 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 11:55 [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls Breno Leitao
2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
2026-06-11 10:21 ` Ard Biesheuvel
2026-06-11 10:57 ` Ard Biesheuvel
2026-06-12 10:28 ` Breno Leitao
2026-06-12 10:05 ` Breno Leitao [this message]
2026-06-09 11:55 ` [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aivZN29oru1TWOvu@gmail.com \
--to=leitao@debian.org \
--cc=ardb@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kernel-team@meta.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.