From: David Woodhouse <dwmw2@infradead.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@kernel.org>,
linux-pm <linux-pm@vger.kernel.org>,
Marc Zyngier <maz@kernel.org>,
linux-arm-kernel@lists.infradead.org,
"Saidi, Ali" <alisaidi@amazon.com>,
"oliver.upton" <oliver.upton@linux.dev>,
Joey Gouly <joey.gouly@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
"Heyne, Maximilian" <mheyne@amazon.de>,
Alexander Graf <graf@amazon.com>,
"Stamatis, Ilias" <ilstam@amazon.com>
Subject: Re: Memory corruption after resume from hibernate with Arm GICv3 ITS
Date: Thu, 24 Jul 2025 15:48:51 +0200
Message-ID: <aefe451c31d693cfe8f3cc157fd0009040fd48c7.camel@infradead.org>
In-Reply-To: <CAJZ5v0iF7xAF105byp4j777Aks8KDKAh0-hJyfzkUFq5pm-JVQ@mail.gmail.com>
On Thu, 2025-07-24 at 11:51 +0200, Rafael J. Wysocki wrote:
>
> > So the hibernated kernel seems to be doing the right thing in both
> > suspend and resume phases but it looks like the *boot* kernel doesn't
> > call the suspend method before transitioning;
>
> No, it does this, but the messages are missing from the log.
>
> The last message you see from the boot/restore kernel is about loading
> the image; a lot of stuff happens afterwards.
>
> This message:
>
> [ 1.871617] PM: hibernation: Read 462616 kbytes in 0.47 seconds (984.28 MB/s)
>
> is printed by load_compressed_image() which gets called by
> swsusp_read(), which is invoked by load_image_and_restore().
>
> It is successful, so hibernation_restore() gets called and it does
> quite a bit of work, including calling resume_target_kernel(), which
> among other things calls syscore_suspend(), from where your messages
> should be printed if I'm not mistaken.
>
> I have no idea why those messages don't get into the log (that would
> happen if your boot kernel were different from the image kernel and it
> didn't actually print them).

This is serial console output (stdout from 'qemu -serial mon:stdio'). I
guess the missing messages were in the boot kernel's printk buffer but
just never got flushed? I added some --trace arguments to QEMU to see
what is actually happening.

So when resuming, the boot kernel's startup looks like this:

gicv3_its_process_command GICv3 ITS: processing command at offset 0x4: 0x8
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10f0a20 V 1
gicv3_its_dte_write GICv3 ITS: Device Table write for DeviceID 0x10: valid 1 size 0x6 ITTaddr 0x10f0a20
[ 27.440351] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10f0a2000

And then the transition goes:

[ 47.668973] PM: Image loading progress: 90%
[ 48.030462] PM: Image loading progress: 100%
[ 48.031307] PM: Image loading done
[ 48.031773] PM: hibernation: Read 460728 kbytes in 13.11 seconds (35.14 MB/s)
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10f0a20 V 0
gicv3_its_dte_write GICv3 ITS: Device Table write for DeviceID 0x10: valid 0 size 0x6 ITTaddr 0x10f0a20
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10e3130 V 1
gicv3_its_dte_write GICv3 ITS: Device Table write for DeviceID 0x10: valid 1 size 0x6 ITTaddr 0x10e3130
[ 178.261284] Disabling non-boot CPUs ...
[ 178.261674] its_save_disable
[ 178.261674] its_build_mapd_cmd dev 0x10 valid 0 addr 0x10e313000
[ 178.261674] PM: hibernation: Creating image:
[ 178.261674] PM: hibernation: Need to copy 152532 pages
[ 178.261674] hibernate: Restored 0 MTE pages
[ 178.261674] its_restore_enable
[ 178.261674] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10e313000
[ 178.831481] OOM killer enabled.
[ 178.831614] Restarting tasks: Starting

So we don't see the *printk* from the boot kernel, as you said. But the
boot kernel *is* unmapping the ITT at its old address (MAPD, ITT_addr
0x10f0a20, Valid 0) before the resumed kernel maps the ITT at the
address *it* was using (MAPD, ITT_addr 0x10e3130, Valid 1). Looking
just at the MAPD traces:

gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10e3130 V 1 ← Original clean boot
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10e3130 V 0 ← Prior to generating hibernate image
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10e3130 V 1 ← Before *writing* hibernate image and powering down (actually reboot in this case)
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10f0a20 V 1 ← Boot kernel starting up prior to resume
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10f0a20 V 0 ← Boot kernel unmapping when we don't see its printk
gicv3_its_cmd_mapd GICv3 ITS: command MAPD DeviceID 0x10 Size 0x6 ITT_addr 0x10e3130 V 1 ← Hibernated kernel remapping the ITT

So it looks like my test patch is doing the right thing, at least for
hibernation? I'm not sure about kexec, though.

There are also *other* tables that the GIC scribbles on: the pending
tables for KVM guests' virtual LPIs (vLPI pending tables). We've had
problems with those too¹, causing machines to crash on kexec because
the GIC scribbles on pages which are *actually* now the new kernel's
text. I'm not sure whether we should try to come up with a unified
solution or deal with them separately; the fix there seems to involve
iterating ∀ kvm ∀ vCPU, so I suspect it does need to live in KVM.

¹ https://lore.kernel.org/all/20250623132714.965474-2-dwmw2@infradead.org/

Thread overview: 4+ messages
2025-07-23 10:04 Memory corruption after resume from hibernate with Arm GICv3 ITS David Woodhouse
2025-07-24  9:25 ` David Woodhouse
2025-07-24  9:51 ` Rafael J. Wysocki
2025-07-24 13:48 ` David Woodhouse [this message]