From: Breno Leitao <leitao@debian.org>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>,
linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH 1/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls
Date: Fri, 12 Jun 2026 03:05:29 -0700 [thread overview]
Message-ID: <aivZN29oru1TWOvu@gmail.com> (raw)
In-Reply-To: <71de953a-43e9-4987-8b8e-00d9b99a099e@app.fastmail.com>
On Thu, Jun 11, 2026 at 12:21:42PM +0200, Ard Biesheuvel wrote:
> Hi Breno,
>
> On Tue, 9 Jun 2026, at 13:55, Breno Leitao wrote:
> > When an EFI runtime service call hangs in firmware, the kworker on
> > efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
> > The kernel currently waits indefinitely on the completion, and the
> > caller holds efi_runtime_lock for the duration, so every subsequent
> > EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
> > handlers, ...) is wedged until reboot. The only externally visible
> > symptom is a "workqueue lockup" message and userspace processes piling
> > up uninterruptibly on the semaphore.
> >
> > Replace the indefinite wait_for_completion() in __efi_queue_work()
> > with wait_for_completion_timeout() bounded by EFI_RTS_TIMEOUT
> > (120 seconds). On timeout, log the wedged runtime service id and
> > return EFI_TIMEOUT to the caller.
> >
> > The wedged worker is intentionally leaked - it is still inside
> > firmware and cannot be cancelled - and the shared efi_rts_work is
> > abandoned to it. EFI runtime services become unavailable until reboot;
> > the alternative is permanent userspace hang.
> >
> > This patch should land together with the follow-up that introduces
> > the efi_rts_dead fast-fail flag: between the two, a subsequent
> > __efi_queue_work() caller will re-INIT_WORK() and re-init_completion()
> > on the work_struct and completion that the leaked worker still owns,
> > which can corrupt workqueue state and let the next caller observe
> > the leaked call's status as if it were its own. The split exists for
> > review clarity; reviewers may prefer to squash.
> >
> > Known limitation: the union efi_rts_args that __efi_queue_work() hands
> > to the worker contains pointers into the caller's stack frame (the
> > compound literal in efi_queue_work() and the in/out buffers it points
> > to, e.g. *tm in GetTime). Once the caller returns -EIO and unwinds,
> > those slots are reusable. If firmware eventually unblocks and writes
> > to the output buffers after the timeout has fired, the writes land in
> > whatever now occupies that memory. In practice firmware that hangs for
> > more than 120 seconds tends to stay hung. A follow-up bouncing args
> > and output buffers through kmalloc would close this gap.
> >
> > At fleet scale this turns a generic "workqueue lockup" or stalled-task
> > mystery into an unambiguous "EFI firmware is at fault" signal in
> > dmesg.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/firmware/efi/runtime-wrappers.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/runtime-wrappers.c
> > b/drivers/firmware/efi/runtime-wrappers.c
> > index da8d29621644..6ce6d094066e 100644
> > --- a/drivers/firmware/efi/runtime-wrappers.c
> > +++ b/drivers/firmware/efi/runtime-wrappers.c
> > @@ -118,6 +118,14 @@ union efi_rts_args {
> >
> > struct efi_runtime_work efi_rts_work;
> >
> > +/*
> > + * Upper bound on how long we wait for a single EFI runtime service
> > + * call to finish before declaring firmware wedged. Chosen to be longer
> > + * than any plausible legitimate call (including UpdateCapsule on slow
> > + * SPI-NOR) while still bounding userspace wait time.
> > + */
> > +#define EFI_RTS_TIMEOUT (120 * HZ)
> > +
> > /*
> > * efi_queue_work: Queue EFI runtime service call and wait for completion
> > * @_rts: EFI runtime service function identifier
> > @@ -338,10 +346,16 @@ static efi_status_t __efi_queue_work(enum efi_rts_ids id,
> > * queue_work() returns 0 if work was already on queue,
> > * _ideally_ this should never happen.
> > */
> > - if (queue_work(efi_rts_wq, &efi_rts_work.work))
> > - wait_for_completion(&efi_rts_work.efi_rts_comp);
> > - else
> > + if (!queue_work(efi_rts_wq, &efi_rts_work.work)) {
> > pr_err("Failed to queue work to efi_rts_wq.\n");
> > + goto exit;
> > + }
> > +
> > + if (!wait_for_completion_timeout(&efi_rts_work.efi_rts_comp,
> > + EFI_RTS_TIMEOUT)) {
> > + pr_err("EFI runtime service %d wedged in firmware\n", id);
> > + return EFI_TIMEOUT;
>
> Could we just clear the EFI_RUNTIME_SERVICES bit here right away?
Ack. This is better than createing a new variable. Thanks!
I am planing to move that entry check ahead of the efi_rts_work
assignments (its own small prep patch), Otherwise a post-timeout caller
would overwrite efi_rts_work.efi_rts_id and reset it to EFI_NONE on the
way out.
> It /should/ also block the calls that are not routed via
> the workqueue, e.g., any EFI pstore calls to the non-blocking
> SetVariable() variant, but I just noticed that we never check
> EFI_RUNTIME_SERVICES on those code paths, which is probably a bug.
Confirmed, fixed as a separate patch: virt_efi_set_variable_nb(),
virt_efi_query_variable_info_nb() and virt_efi_reset_system() now check
efi_enabled(EFI_RUNTIME_SERVICES) before calling firmware.
> And please return EFI_ABORTED rather than EFI_TIMEOUT - probably
> doesn't matter in practice but I'd like to avoid introducing more EFI
> return codes in the runtime context that the spec mentions only for
> boot services stuff.
Ack. The timeout path returns EFI_ABORTED, matching what
efi_recover_from_page_fault() already reports
Thanks for the review,
--breno
next prev parent reply other threads:[~2026-06-12 10:05 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 11:55 [PATCH 0/2] efi/runtime-wrappers: bound the wait for EFI runtime service calls Breno Leitao
2026-06-09 11:55 ` [PATCH 1/2] " Breno Leitao
2026-06-11 10:21 ` Ard Biesheuvel
2026-06-11 10:57 ` Ard Biesheuvel
2026-06-12 10:28 ` Breno Leitao
2026-06-12 10:05 ` Breno Leitao [this message]
2026-06-09 11:55 ` [PATCH 2/2] efi/runtime-wrappers: disable EFI runtime services after a hang Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aivZN29oru1TWOvu@gmail.com \
--to=leitao@debian.org \
--cc=ardb@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kernel-team@meta.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox