public inbox for linux-efi@vger.kernel.org
 help / color / mirror / Atom feed
From: Jon Hunter <jonathanh@nvidia.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-fsdevel@vger.kernel.org, linux-efi@vger.kernel.org,
	linux-pm@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>, Jeremy Kerr <jk@ozlabs.org>,
	Christian Brauner <brauner@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Lennart Poettering <mzxreary@0pointer.de>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>
Subject: Re: [PATCH 2/2] efivarfs: add variable resync after hibernation
Date: Fri, 28 Feb 2025 17:53:43 +0000	[thread overview]
Message-ID: <253edd1f-044b-4d7d-a718-b79015618b80@nvidia.com> (raw)
In-Reply-To: <3c371af00ce12575ab11522189cd37d4167cc3aa.camel@HansenPartnership.com>


On 28/02/2025 16:49, James Bottomley wrote:
> On Fri, 2025-02-28 at 16:44 +0000, Jon Hunter wrote:
>> Hi James,
>>
>> On 07/01/2025 21:31, James Bottomley wrote:
>>> Hibernation allows other OSs to boot and thus the variable state
>>> might
>>> be altered by the time the hibernation image is resumed.  Resync
>>> the
>>> variable state by looping over all the dentries and update the size
>>> (in case of alteration) delete any which no-longer exist.  Finally,
>>> loop over all efi variables creating any which don't have
>>> corresponding dentries.
>>>
>>> Signed-off-by: James Bottomley
>>> <James.Bottomley@HansenPartnership.com>
>>> ---
>>>    fs/efivarfs/internal.h |   3 +-
>>>    fs/efivarfs/super.c    | 151
>>> ++++++++++++++++++++++++++++++++++++++++-
>>>    fs/efivarfs/vars.c     |   5 +-
>>>    3 files changed, 155 insertions(+), 4 deletions(-)
>>
>> ...
>>    
>>> +static int efivarfs_pm_notify(struct notifier_block *nb, unsigned
>>> long action,
>>> +			      void *ptr)
>>> +{
>>> +	struct efivarfs_fs_info *sfi = container_of(nb, struct
>>> efivarfs_fs_info,
>>> +						    pm_nb);
>>> +	struct path path = { .mnt = NULL, .dentry = sfi->sb-
>>>> s_root, };
>>> +	struct efivarfs_ctx ectx = {
>>> +		.ctx = {
>>> +			.actor	= efivarfs_actor,
>>> +		},
>>> +		.sb = sfi->sb,
>>> +	};
>>> +	struct file *file;
>>> +	static bool rescan_done = true;
>>> +
>>> +	if (action == PM_HIBERNATION_PREPARE) {
>>> +		rescan_done = false;
>>> +		return NOTIFY_OK;
>>> +	} else if (action != PM_POST_HIBERNATION) {
>>> +		return NOTIFY_DONE;
>>> +	}
>>> +
>>> +	if (rescan_done)
>>> +		return NOTIFY_DONE;
>>> +
>>> +	pr_info("efivarfs: resyncing variable state\n");
>>> +
>>> +	/* O_NOATIME is required to prevent oops on NULL mnt */
>>> +	file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY |
>>> O_NOATIME,
>>> +				current_cred());
>>> +	if (!file)
>>> +		return NOTIFY_DONE;
>>> +
>>> +	rescan_done = true;
>>> +
>>> +	/*
>>> +	 * First loop over the directory and verify each entry
>>> exists,
>>> +	 * removing it if it doesn't
>>> +	 */
>>> +	file->f_pos = 2;	/* skip . and .. */
>>> +	do {
>>> +		ectx.dentry = NULL;
>>> +		iterate_dir(file, &ectx.ctx);
>>> +		if (ectx.dentry) {
>>> +			pr_info("efivarfs: removing variable
>>> %pd\n",
>>> +				ectx.dentry);
>>> +			simple_recursive_removal(ectx.dentry,
>>> NULL);
>>> +			dput(ectx.dentry);
>>> +		}
>>> +	} while (ectx.dentry);
>>> +	fput(file);
>>> +
>>> +	/*
>>> +	 * then loop over variables, creating them if there's no
>>> matching
>>> +	 * dentry
>>> +	 */
>>> +	efivar_init(efivarfs_check_missing, sfi->sb, false);
>>> +
>>> +	return NOTIFY_OK;
>>> +}
>>
>>
>> With the current mainline I have observed the following crash when
>> testing suspend on one of our Tegra devices ...
>>
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Feb 28 16:25:55
>> 2025
>> [  246.593485] Unable to handle kernel NULL pointer dereference at
>> virtual address 0000000000000068
>> [  246.602601] Mem abort info:
>> [  246.602603]   ESR = 0x0000000096000004
>> [  246.602605]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [  246.602608]   SET = 0, FnV = 0
>> [  246.602610]   EA = 0, S1PTW = 0
>> [  246.602612]   FSC = 0x04: level 0 translation fault
>> [  246.602615] Data abort info:
>> [  246.602617]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> [  246.634959]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [  246.634961]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [  246.634964] user pgtable: 4k pages, 48-bit VAs,
>> pgdp=0000000105205000
>> [  246.634967] [0000000000000068] pgd=0000000000000000,
>> p4d=0000000000000000
>> [  246.634974] Internal error: Oops: 0000000096000004 [#1] PREEMPT
>> SMP
>> [  246.665796] Modules linked in: qrtr bridge stp llc usb_f_ncm
>> usb_f_mass_storage usb_f_acm u_serial usb_f_rndis u_ether
>> libcomposite tegra_drm btusb btrtl drm_dp_aux_bus btintel cec nvme
>> btmtk nvme_core btbcm drm_display_helper bluetoot
>> h snd_soc_tegra210_admaif drm_client_lib snd_soc_tegra210_dmic
>> snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra_pcm
>> snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_adx
>> snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc
>> _tegra210_i2s drm_kms_helper ecdh_generic tegra_se ucsi_ccg ecc
>> snd_hda_codec_hdmi typec_ucsi snd_soc_tegra_audio_graph_card
>> snd_soc_tegra210_ahub rfkill tegra210_adma snd_hda_tegra
>> crypto_engine snd_soc_audio_graph_card typec snd_hda_cod
>> ec snd_soc_simple_card_utils tegra_aconnect arm_dsu_pmu
>> snd_soc_rt5640 snd_hda_core tegra_xudc ramoops phy_tegra194_p2u
>> snd_soc_rl6231 at24 pcie_tegra194 tegra_bpmp_thermal host1x
>> reed_solomon lm90 pwm_tegra ina3221 pwm_fan fuse drm backl
>> ight dm_mod ip_tables x_tables ipv6
>> [  246.756182] CPU: 9 UID: 0 PID: 1255 Comm: rtcwake Not tainted
>> 6.14.0-rc4-g8d538b296d56 #61
>> [  246.764677] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer
>> Kit/Jetson, BIOS 00.0.0-dev-main_88214_5a0f5_a213e 02/26/2025
>> [  246.776569] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [  246.783718] pc : efivarfs_pm_notify+0x48/0x180
>> [  246.788285] lr : blocking_notifier_call_chain_robust+0x78/0xe4
>> [  246.794286] sp : ffff800085cebb60
>> [  246.797684] x29: ffff800085cebb60 x28: ffff000093f1b480 x27:
>> 0000000000000000
>> [  246.805021] x26: 0000000000000004 x25: ffff8000828d3638 x24:
>> 0000000000000003
>> [  246.812355] x23: 0000000000000000 x22: 0000000000000005 x21:
>> 0000000000000006
>> [  246.819694] x20: ffff000087f698c0 x19: 0000000000000003 x18:
>> 000000007f19bcda
>> [  246.827029] x17: 00000000c42545a5 x16: 000000000000001c x15:
>> 000000001709e0a9
>> [  246.834372] x14: 00000000ac6b3a37 x13: 18286bf36c021b08 x12:
>> 00000039694cf81c
>> [  246.841713] x11: 00000000f1e0faad x10: 0000000000000001 x9 :
>> 000000004ff99d57
>> [  246.849046] x8 : 00000000bb51f9d6 x7 : 00000001f4d4185c x6 :
>> 00000001f4d4185c
>> [  246.856382] x5 : ffff800085cebb58 x4 : ffff800080952930 x3 :
>> ffff8000804cba44
>> [  246.863713] x2 : 0000000000000000 x1 : 0000000000000003 x0 :
>> ffff8000804cbf84
>> [  246.871045] Call trace:
>> [  246.873550]  efivarfs_pm_notify+0x48/0x180 (P)
>> [  246.878119]  blocking_notifier_call_chain_robust+0x78/0xe4
>> [  246.883753]  pm_notifier_call_chain_robust+0x28/0x48
>> [  246.888852]  pm_suspend+0x138/0x1c8
>> [  246.892438]  state_store+0x8c/0xfc
>> [  246.895931]  kobj_attr_store+0x18/0x2c
>> [  246.899791]  sysfs_kf_write+0x44/0x54
>> [  246.903553]  kernfs_fop_write_iter+0x118/0x1a8
>> [  246.908113]  vfs_write+0x2b0/0x35c
>> [  246.911608]  ksys_write+0x68/0xfc
>> [  246.915013]  __arm64_sys_write+0x1c/0x28
>> [  246.919038]  invoke_syscall+0x48/0x110
>> [  246.922897]  el0_svc_common.constprop.0+0x40/0xe8
>> [  246.927731]  do_el0_svc+0x20/0x2c
>> [  246.931127]  el0_svc+0x30/0xd0
>> [  246.934265]  el0t_64_sync_handler+0x144/0x168
>> [  246.938737]  el0t_64_sync+0x198/0x19c
>> [  246.942505] Code: f9400682 f90027ff a906ffe2 f100043f (f9403440)
>> [  246.948767] ---[ end trace 0000000000000000 ]---
>>
>>
>> Bisect is pointing to this commit. I had a quick look at this and the
>> following fixes it for me ...
> 
> Yes, this occurs because the notifier can be triggered before the
> superblock information is filled.  Ard fixed this:
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/commit/?h=urgent&id=cb6ae457bc6af58c84a7854df5e7e32ba1c6a715


Thanks for the quick response. I can confirm that this fixes it for me. 
Feel free to add my ...

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic


      reply	other threads:[~2025-02-28 17:53 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-07 21:31 [PATCH 0/2] efivarfs: fix hibernation problem with EFI variables James Bottomley
2025-01-07 21:31 ` [PATCH 1/2] efivarfs: abstract initial variable creation routine James Bottomley
2025-01-07 21:31 ` [PATCH 2/2] efivarfs: add variable resync after hibernation James Bottomley
2025-02-28 16:44   ` Jon Hunter
2025-02-28 16:49     ` James Bottomley
2025-02-28 17:53       ` Jon Hunter [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=253edd1f-044b-4d7d-a718-b79015618b80@nvidia.com \
    --to=jonathanh@nvidia.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=ardb@kernel.org \
    --cc=brauner@kernel.org \
    --cc=jk@ozlabs.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=mzxreary@0pointer.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox