From: Jon Hunter <jonathanh@nvidia.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
linux-fsdevel@vger.kernel.org, linux-efi@vger.kernel.org,
linux-pm@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>, Jeremy Kerr <jk@ozlabs.org>,
Christian Brauner <brauner@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Lennart Poettering <mzxreary@0pointer.de>,
"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>
Subject: Re: [PATCH 2/2] efivarfs: add variable resync after hibernation
Date: Fri, 28 Feb 2025 17:53:43 +0000 [thread overview]
Message-ID: <253edd1f-044b-4d7d-a718-b79015618b80@nvidia.com> (raw)
In-Reply-To: <3c371af00ce12575ab11522189cd37d4167cc3aa.camel@HansenPartnership.com>
On 28/02/2025 16:49, James Bottomley wrote:
> On Fri, 2025-02-28 at 16:44 +0000, Jon Hunter wrote:
>> Hi James,
>>
>> On 07/01/2025 21:31, James Bottomley wrote:
>>> Hibernation allows other OSs to boot and thus the variable state
>>> might
>>> be altered by the time the hibernation image is resumed. Resync
>>> the
>>> variable state by looping over all the dentries and update the size
>>> (in case of alteration) delete any which no-longer exist. Finally,
>>> loop over all efi variables creating any which don't have
>>> corresponding dentries.
>>>
>>> Signed-off-by: James Bottomley
>>> <James.Bottomley@HansenPartnership.com>
>>> ---
>>> fs/efivarfs/internal.h | 3 +-
>>> fs/efivarfs/super.c | 151
>>> ++++++++++++++++++++++++++++++++++++++++-
>>> fs/efivarfs/vars.c | 5 +-
>>> 3 files changed, 155 insertions(+), 4 deletions(-)
>>
>> ...
>>
>>> +static int efivarfs_pm_notify(struct notifier_block *nb, unsigned
>>> long action,
>>> + void *ptr)
>>> +{
>>> + struct efivarfs_fs_info *sfi = container_of(nb, struct
>>> efivarfs_fs_info,
>>> + pm_nb);
>>> + struct path path = { .mnt = NULL, .dentry = sfi->sb-
>>>> s_root, };
>>> + struct efivarfs_ctx ectx = {
>>> + .ctx = {
>>> + .actor = efivarfs_actor,
>>> + },
>>> + .sb = sfi->sb,
>>> + };
>>> + struct file *file;
>>> + static bool rescan_done = true;
>>> +
>>> + if (action == PM_HIBERNATION_PREPARE) {
>>> + rescan_done = false;
>>> + return NOTIFY_OK;
>>> + } else if (action != PM_POST_HIBERNATION) {
>>> + return NOTIFY_DONE;
>>> + }
>>> +
>>> + if (rescan_done)
>>> + return NOTIFY_DONE;
>>> +
>>> + pr_info("efivarfs: resyncing variable state\n");
>>> +
>>> + /* O_NOATIME is required to prevent oops on NULL mnt */
>>> + file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY |
>>> O_NOATIME,
>>> + current_cred());
>>> + if (!file)
>>> + return NOTIFY_DONE;
>>> +
>>> + rescan_done = true;
>>> +
>>> + /*
>>> + * First loop over the directory and verify each entry
>>> exists,
>>> + * removing it if it doesn't
>>> + */
>>> + file->f_pos = 2; /* skip . and .. */
>>> + do {
>>> + ectx.dentry = NULL;
>>> + iterate_dir(file, &ectx.ctx);
>>> + if (ectx.dentry) {
>>> + pr_info("efivarfs: removing variable
>>> %pd\n",
>>> + ectx.dentry);
>>> + simple_recursive_removal(ectx.dentry,
>>> NULL);
>>> + dput(ectx.dentry);
>>> + }
>>> + } while (ectx.dentry);
>>> + fput(file);
>>> +
>>> + /*
>>> + * then loop over variables, creating them if there's no
>>> matching
>>> + * dentry
>>> + */
>>> + efivar_init(efivarfs_check_missing, sfi->sb, false);
>>> +
>>> + return NOTIFY_OK;
>>> +}
>>
>>
>> With the current mainline I have observed the following crash when
>> testing suspend on one of our Tegra devices ...
>>
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Feb 28 16:25:55
>> 2025
>> [ 246.593485] Unable to handle kernel NULL pointer dereference at
>> virtual address 0000000000000068
>> [ 246.602601] Mem abort info:
>> [ 246.602603] ESR = 0x0000000096000004
>> [ 246.602605] EC = 0x25: DABT (current EL), IL = 32 bits
>> [ 246.602608] SET = 0, FnV = 0
>> [ 246.602610] EA = 0, S1PTW = 0
>> [ 246.602612] FSC = 0x04: level 0 translation fault
>> [ 246.602615] Data abort info:
>> [ 246.602617] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> [ 246.634959] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [ 246.634961] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [ 246.634964] user pgtable: 4k pages, 48-bit VAs,
>> pgdp=0000000105205000
>> [ 246.634967] [0000000000000068] pgd=0000000000000000,
>> p4d=0000000000000000
>> [ 246.634974] Internal error: Oops: 0000000096000004 [#1] PREEMPT
>> SMP
>> [ 246.665796] Modules linked in: qrtr bridge stp llc usb_f_ncm
>> usb_f_mass_storage usb_f_acm u_serial usb_f_rndis u_ether
>> libcomposite tegra_drm btusb btrtl drm_dp_aux_bus btintel cec nvme
>> btmtk nvme_core btbcm drm_display_helper bluetoot
>> h snd_soc_tegra210_admaif drm_client_lib snd_soc_tegra210_dmic
>> snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra_pcm
>> snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_adx
>> snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc
>> _tegra210_i2s drm_kms_helper ecdh_generic tegra_se ucsi_ccg ecc
>> snd_hda_codec_hdmi typec_ucsi snd_soc_tegra_audio_graph_card
>> snd_soc_tegra210_ahub rfkill tegra210_adma snd_hda_tegra
>> crypto_engine snd_soc_audio_graph_card typec snd_hda_cod
>> ec snd_soc_simple_card_utils tegra_aconnect arm_dsu_pmu
>> snd_soc_rt5640 snd_hda_core tegra_xudc ramoops phy_tegra194_p2u
>> snd_soc_rl6231 at24 pcie_tegra194 tegra_bpmp_thermal host1x
>> reed_solomon lm90 pwm_tegra ina3221 pwm_fan fuse drm backl
>> ight dm_mod ip_tables x_tables ipv6
>> [ 246.756182] CPU: 9 UID: 0 PID: 1255 Comm: rtcwake Not tainted
>> 6.14.0-rc4-g8d538b296d56 #61
>> [ 246.764677] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer
>> Kit/Jetson, BIOS 00.0.0-dev-main_88214_5a0f5_a213e 02/26/2025
>> [ 246.776569] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [ 246.783718] pc : efivarfs_pm_notify+0x48/0x180
>> [ 246.788285] lr : blocking_notifier_call_chain_robust+0x78/0xe4
>> [ 246.794286] sp : ffff800085cebb60
>> [ 246.797684] x29: ffff800085cebb60 x28: ffff000093f1b480 x27:
>> 0000000000000000
>> [ 246.805021] x26: 0000000000000004 x25: ffff8000828d3638 x24:
>> 0000000000000003
>> [ 246.812355] x23: 0000000000000000 x22: 0000000000000005 x21:
>> 0000000000000006
>> [ 246.819694] x20: ffff000087f698c0 x19: 0000000000000003 x18:
>> 000000007f19bcda
>> [ 246.827029] x17: 00000000c42545a5 x16: 000000000000001c x15:
>> 000000001709e0a9
>> [ 246.834372] x14: 00000000ac6b3a37 x13: 18286bf36c021b08 x12:
>> 00000039694cf81c
>> [ 246.841713] x11: 00000000f1e0faad x10: 0000000000000001 x9 :
>> 000000004ff99d57
>> [ 246.849046] x8 : 00000000bb51f9d6 x7 : 00000001f4d4185c x6 :
>> 00000001f4d4185c
>> [ 246.856382] x5 : ffff800085cebb58 x4 : ffff800080952930 x3 :
>> ffff8000804cba44
>> [ 246.863713] x2 : 0000000000000000 x1 : 0000000000000003 x0 :
>> ffff8000804cbf84
>> [ 246.871045] Call trace:
>> [ 246.873550] efivarfs_pm_notify+0x48/0x180 (P)
>> [ 246.878119] blocking_notifier_call_chain_robust+0x78/0xe4
>> [ 246.883753] pm_notifier_call_chain_robust+0x28/0x48
>> [ 246.888852] pm_suspend+0x138/0x1c8
>> [ 246.892438] state_store+0x8c/0xfc
>> [ 246.895931] kobj_attr_store+0x18/0x2c
>> [ 246.899791] sysfs_kf_write+0x44/0x54
>> [ 246.903553] kernfs_fop_write_iter+0x118/0x1a8
>> [ 246.908113] vfs_write+0x2b0/0x35c
>> [ 246.911608] ksys_write+0x68/0xfc
>> [ 246.915013] __arm64_sys_write+0x1c/0x28
>> [ 246.919038] invoke_syscall+0x48/0x110
>> [ 246.922897] el0_svc_common.constprop.0+0x40/0xe8
>> [ 246.927731] do_el0_svc+0x20/0x2c
>> [ 246.931127] el0_svc+0x30/0xd0
>> [ 246.934265] el0t_64_sync_handler+0x144/0x168
>> [ 246.938737] el0t_64_sync+0x198/0x19c
>> [ 246.942505] Code: f9400682 f90027ff a906ffe2 f100043f (f9403440)
>> [ 246.948767] ---[ end trace 0000000000000000 ]---
>>
>>
>> Bisect is pointing to this commit. I had a quick look at this and the
>> following fixes it for me ...
>
> Yes, this occurs because the notifier can be triggered before the
> superblock information is filled. Ard fixed this:
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/commit/?h=urgent&id=cb6ae457bc6af58c84a7854df5e7e32ba1c6a715
Thanks for the quick response. I can confirm that this fixes it for me.
Feel free to add my ...
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Cheers
Jon
--
nvpublic
prev parent reply other threads:[~2025-02-28 17:53 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-07 21:31 [PATCH 0/2] efivarfs: fix hibernation problem with EFI variables James Bottomley
2025-01-07 21:31 ` [PATCH 1/2] efivarfs: abstract initial variable creation routine James Bottomley
2025-01-07 21:31 ` [PATCH 2/2] efivarfs: add variable resync after hibernation James Bottomley
2025-02-28 16:44 ` Jon Hunter
2025-02-28 16:49 ` James Bottomley
2025-02-28 17:53 ` Jon Hunter [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=253edd1f-044b-4d7d-a718-b79015618b80@nvidia.com \
--to=jonathanh@nvidia.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=ardb@kernel.org \
--cc=brauner@kernel.org \
--cc=jk@ozlabs.org \
--cc=linux-efi@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=mzxreary@0pointer.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox