Linux EFI development
From: Breno Leitao <leitao@debian.org>
To: Ard Biesheuvel <ardb@kernel.org>,
	 Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	 Borislav Petkov <bp@suse.de>, Andy Lutomirski <luto@kernel.org>,
	 Kees Cook <kees@kernel.org>, Tony Luck <tony.luck@intel.com>,
	 "Guilherme G. Piccoli" <gpiccoli@igalia.com>,
	 Thomas Gleixner <tglx@kernel.org>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	 x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Breno Leitao <leitao@debian.org>,
	 kernel-team@meta.com
Subject: [PATCH v2 0/6] efi/runtime-wrappers: bound the wait for EFI runtime service calls
Date: Fri, 12 Jun 2026 04:01:27 -0700
Message-ID: <20260612-efi_timeout-v2-0-f714bb016df6@debian.org>

When an EFI runtime service call hangs in firmware, the kworker on
efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
The caller currently waits for the work item's completion with no
timeout while holding efi_runtime_lock, so every subsequent EFI runtime
caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM handlers, ...)
is wedged until reboot. The only externally visible symptoms are a
"workqueue lockup" message and userspace processes piling up
uninterruptibly on the efi_runtime_lock semaphore.

A real example from one of our NVIDIA Grace hosts:

  BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0 nice=0 stuck for 127s!
  ...
  CPU: 28 PID: 590 Comm: kworker/u288:6
  Workqueue: efi_rts_wq efi_call_rts
  Call trace:
   0x4052f11ecc (P)
   0x4052f10ed4
   ...
   __efi_rt_asm_wrapper+0x50/0x78
   efi_call_rts+0x178/0x240
   process_scheduled_works+0x17c/0x420
   worker_thread+0x184/0x4d8
   kthread+0xcc/0x1f8
   ret_from_fork+0x10/0x20

Both PC and LR point into EFI runtime services firmware memory: the
firmware never returned, and the worker stayed stuck through the
successive "workqueue lockup" reports (127s, 157s, 188s) until external
monitoring eventually rebooted the host.

This series doesn't fix the firmware bug (that's vendor territory),
but it stops one stuck EFI call from taking the rest of userspace
down with it. It also turns a generic stalled-task mystery into an
unambiguous "EFI firmware is at fault" signal in dmesg, which is
especially valuable at fleet scale, where the same symptom could
otherwise be attributed to any of dozens of unrelated stalls.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Drop v1's efi_rts_dead flag and reuse the existing EFI_RUNTIME_SERVICES
  bit (cleared on timeout); return EFI_ABORTED instead of EFI_TIMEOUT
  (per Ard).
- Also guard the non-blocking paths (set_variable/query_variable_info/reset_system),
  and park the leaked worker via a shared efi_rts_park_worker() that is
  also reused by x86's page-fault handler.
- Split into smaller prep patches.
- Link to v1: https://lore.kernel.org/r/20260609-efi_timeout-v1-0-69a896faa805@debian.org
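To make the bounded-wait idea concrete, here is a minimal userspace model written with POSIX threads. It is an illustration under stated assumptions, not the code in this series. A detached worker thread stands in for the efi_rts_wq kworker, rts_lock for efi_runtime_lock, rts_available for the EFI_RUNTIME_SERVICES bit, and the single static rts_work slot for efi_rts_work; rts_call_bounded(), rts_worker(), rts_retired, the RTS_* status codes and the fw_* demo calls are all hypothetical names. The caller waits at most timeout_ms. On timeout it clears the flag, returns an aborted status and leaves the stuck worker behind; later callers fail fast, and if the wedged call ever returns, the worker publishes nothing and simply retires. A kernel implementation would more likely build the wait on wait_for_completion_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for efi_status_t values. */
enum rts_status { RTS_SUCCESS, RTS_ABORTED, RTS_DISABLED };

static atomic_bool rts_available = true; /* models the EFI_RUNTIME_SERVICES bit */
static atomic_int rts_retired = 0;       /* wedged calls that returned late */
static pthread_mutex_t rts_lock = PTHREAD_MUTEX_INITIALIZER; /* models efi_runtime_lock */

/* Single in-flight call slot (hypothetical model of efi_rts_work). */
static struct {
	enum rts_status (*fw_fn)(void); /* cannot be cancelled once started */
	enum rts_status result;
	bool done;      /* worker has published a result */
	bool abandoned; /* caller timed out; a late result is discarded */
	pthread_mutex_t lock;
	pthread_cond_t cond;
} rts_work = { .lock = PTHREAD_MUTEX_INITIALIZER, .cond = PTHREAD_COND_INITIALIZER };

static void *rts_worker(void *arg)
{
	enum rts_status r;

	(void)arg;
	r = rts_work.fw_fn(); /* may block forever */
	pthread_mutex_lock(&rts_work.lock);
	if (rts_work.abandoned) {
		/* The wedged call finally returned: publish nothing, just retire. */
		atomic_fetch_add(&rts_retired, 1);
	} else {
		rts_work.result = r;
		rts_work.done = true;
		pthread_cond_signal(&rts_work.cond);
	}
	pthread_mutex_unlock(&rts_work.lock);
	return NULL;
}

/* Run fw_fn on a worker thread and wait at most timeout_ms for it. */
enum rts_status rts_call_bounded(enum rts_status (*fw_fn)(void), long timeout_ms)
{
	enum rts_status status = RTS_DISABLED;
	struct timespec dl;
	pthread_t tid;

	assert(timeout_ms > 0);
	pthread_mutex_lock(&rts_lock);
	if (!atomic_load(&rts_available))
		goto out; /* an earlier call wedged: fail fast */
	rts_work.fw_fn = fw_fn;
	rts_work.done = false;
	status = RTS_ABORTED;
	if (pthread_create(&tid, NULL, rts_worker, NULL) != 0)
		goto out;
	pthread_detach(tid);

	clock_gettime(CLOCK_REALTIME, &dl); /* condvars default to CLOCK_REALTIME */
	dl.tv_nsec += (timeout_ms % 1000) * 1000000L;
	dl.tv_sec += timeout_ms / 1000 + dl.tv_nsec / 1000000000L;
	dl.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&rts_work.lock);
	while (!rts_work.done) {
		/* Any error (normally ETIMEDOUT) is treated as a timeout. */
		if (pthread_cond_timedwait(&rts_work.cond, &rts_work.lock, &dl) != 0 &&
		    !rts_work.done) {
			rts_work.abandoned = true;
			atomic_store(&rts_available, false);
			break;
		}
	}
	if (rts_work.done)
		status = rts_work.result;
	pthread_mutex_unlock(&rts_work.lock);
out:
	pthread_mutex_unlock(&rts_lock); /* other callers are no longer wedged */
	return status;
}

/* Demo "firmware" calls: one well-behaved, one that returns 200 ms late. */
enum rts_status fw_ok(void) { return RTS_SUCCESS; }
enum rts_status fw_wedged(void)
{
	nanosleep(&(struct timespec){ .tv_nsec = 200 * 1000000L }, NULL);
	return RTS_SUCCESS;
}
```

In this model, rts_call_bounded(fw_ok, 1000) returns RTS_SUCCESS, rts_call_bounded(fw_wedged, 50) returns RTS_ABORTED after roughly 50 ms, and every later call returns RTS_DISABLED without touching the firmware. A single static slot is enough here because rts_lock serialises callers and nothing new is queued once a call has been abandoned.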

---
Breno Leitao (6):
      efi: fix stale reference to efi_recover_from_page_fault()
      efi/runtime-wrappers: handle queue_work() failure with goto exit
      efi/runtime-wrappers: check EFI_RUNTIME_SERVICES before using efi_rts_work
      efi/runtime-wrappers: bound the wait for EFI runtime service calls
      efi/runtime-wrappers: honour EFI_RUNTIME_SERVICES in the non-blocking paths
      efi/runtime-wrappers: retire the worker if a wedged call ever returns
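Patch 5 concerns the non-blocking variants, which (assuming the existing wrappers work this way) take efi_runtime_lock with down_trylock() and call into firmware directly instead of going through efi_rts_wq. The minimal userspace sketch below models that shape with hypothetical names (rts_call_nonblocking(), rts_lock, rts_available, the RTS_* codes and fw_ok()). It shows why the availability check matters on this path too: assuming a timed-out blocking caller releases the lock, a non-blocking caller could otherwise acquire it and re-enter firmware that is already known to be wedged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for efi_status_t values. */
enum rts_status { RTS_SUCCESS, RTS_NOT_READY, RTS_DISABLED };

static atomic_bool rts_available = true; /* models the EFI_RUNTIME_SERVICES bit */
static pthread_mutex_t rts_lock = PTHREAD_MUTEX_INITIALIZER; /* models efi_runtime_lock */

/*
 * Non-blocking path: never sleeps on the lock and calls the firmware
 * directly on the current thread, so there is no worker and no timeout.
 */
enum rts_status rts_call_nonblocking(enum rts_status (*fw_fn)(void))
{
	enum rts_status status;

	assert(fw_fn != NULL);
	if (pthread_mutex_trylock(&rts_lock) != 0)
		return RTS_NOT_READY; /* lock busy: do not wait */
	if (!atomic_load(&rts_available)) {
		/* A blocking call already wedged the firmware: do not re-enter it. */
		pthread_mutex_unlock(&rts_lock);
		return RTS_DISABLED;
	}
	status = fw_fn();
	pthread_mutex_unlock(&rts_lock);
	return status;
}

/* Demo "firmware" call. */
enum rts_status fw_ok(void) { return RTS_SUCCESS; }
```

The flag is checked only after the trylock succeeds, so it is read under the same lock that a timed-out caller would hold while clearing it.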

 arch/x86/platform/efi/quirks.c          |  9 +----
 drivers/firmware/efi/runtime-wrappers.c | 65 ++++++++++++++++++++++++++++-----
 include/linux/efi.h                     |  6 ++-
 3 files changed, 61 insertions(+), 19 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260609-efi_timeout-6f51d5bbcfb7

Best regards,
-- 
Breno Leitao <leitao@debian.org>

