From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Russ Anderson <rja@hpe.com>
Cc: Ingo Molnar <mingo@kernel.org>, Steve Wahl <steve.wahl@hpe.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org,
Linux regressions mailing list <regressions@lists.linux.dev>,
Pavin Joseph <me@pavinjoseph.com>,
stable@vger.kernel.org, Eric Hagberg <ehagberg@gmail.com>,
Simon Horman <horms@verge.net.au>,
Dave Young <dyoung@redhat.com>, Sarah Brofeldt <srhb@dbc.dk>,
Dimitri Sivanich <sivanich@hpe.com>
Subject: Re: [PATCH] x86/mm/ident_map: Use full gbpages in identity maps except on UV platform.
Date: Mon, 25 Mar 2024 10:04:41 -0500
Message-ID: <87o7b273p2.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20240325020334.GA10309@hpe.com> (Russ Anderson's message of "Sun, 24 Mar 2024 21:03:34 -0500")
Russ Anderson <rja@hpe.com> writes:
> On Sun, Mar 24, 2024 at 11:31:39AM +0100, Ingo Molnar wrote:
>>
>> * Steve Wahl <steve.wahl@hpe.com> wrote:
>>
>> > Some systems have ACPI tables that don't include everything that needs
>> > to be mapped for a successful kexec. These systems rely on identity
>> > maps that include the full gigabyte surrounding any smaller region
>> > requested for kexec success. Without this, they fail to kexec and end
>> > up doing a full firmware reboot.
>> >
>> > So, reduce the use of GB pages only on systems where this is known to
>> > be necessary (specifically, UV systems).
>> >
>> > Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
>> > Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
>> > Reported-by: Pavin Joseph <me@pavinjoseph.com>
>>
>> Sigh, why was d794734c9bbf marked for a -stable backport? The commit
>> never explains ...
>
> I will try to explain, since Steve is offline. That commit fixes a
> legitimate bug where more address range is mapped (1G) than the
> requested address range. The fix avoids the issue of the cpu speculatively
> loading beyond the requested range, which includes speculative loads
> from reserved memory. That is why it was marked for -stable.
To call that a bug presumes that the memory type range registers
were not set up properly by the boot firmware.
I think I saw something suggesting that the existence of memory type
range registers is changing or has changed in recent cpus, but
historically it has been the job of the memory type range registers to
ensure that the attributes of specific addresses are correct.
The memory attributes should guide the speculation.
To depend upon page tables to ensure the attributes are correct would
presumably require a cpu that does not have support for disabling page
tables in 32bit mode and does not have 16bit mode.
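As a rough illustration of the point above, here is a minimal model of how
variable-range memory type range registers assign a type to a physical
address without consulting page tables at all. This is a sketch, not kernel
code: the 36-bit physical address width, the two example ranges, and the names
VariableMtrr and effective_type are assumptions made up for the example. The
base/mask match and the rule that UC wins on overlap follow the general scheme
documented for x86 MTRRs.

```python
from dataclasses import dataclass

# Memory type encodings used by x86 MTRRs.
UC, WC, WT, WP, WB = 0, 1, 4, 5, 6

PHYS_BITS = 36  # assumed physical address width for this example


@dataclass(frozen=True)
class VariableMtrr:
    """One variable-range MTRR, reduced to base, mask, and type."""
    base: int
    mask: int
    mem_type: int

    @classmethod
    def for_range(cls, base, size, mem_type):
        # size must be a power of two and base must be size-aligned.
        assert size & (size - 1) == 0 and base % size == 0
        mask = ((1 << PHYS_BITS) - 1) & ~(size - 1)
        return cls(base, mask, mem_type)

    def matches(self, phys_addr):
        # An address is in range when its masked bits equal the masked base.
        return (phys_addr & self.mask) == (self.base & self.mask)


def effective_type(phys_addr, mtrrs, default_type=UC):
    """Memory type for a physical address; page tables play no part."""
    types = {m.mem_type for m in mtrrs if m.matches(phys_addr)}
    if not types:
        return default_type          # no range matched
    if UC in types:
        return UC                    # UC always wins on overlap
    if len(types) == 1:
        return next(iter(types))
    if types == {WT, WB}:
        return WT                    # WT wins over WB
    raise ValueError("overlap of these types is undefined")


# Hypothetical firmware setup: 2 GiB of write-back RAM with a 128 MiB
# uncacheable reserved window carved out near the top.
mtrrs = [
    VariableMtrr.for_range(0x0000_0000, 2 << 30, WB),
    VariableMtrr.for_range(0x7800_0000, 128 << 20, UC),
]
```

In this model an address inside the reserved window is UC no matter how large
a page happens to map it, which is why correctly programmed registers would
be expected to keep speculation away from such a region.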
On older systems (I haven't looked lately) I have seen all kinds of
oddities in the descriptions of memory, such as not describing the memory
at address 0 where the real mode IDT lives. So I am not at all certain
any firmware information can be depended upon or reasonably expected to
be complete. For a while there was no concept of firmware memory areas,
so on some older systems it was actually required for there to be gaps
in the description of memory provided to the system, so that operating
systems would not touch memory used by the firmware.
Which definitely means that in the case of kexec there are legitimate
reasons to access memory areas that are well known but have not always
been described by the boot firmware. So the assertion that it is
necessarily a firmware bug not to describe all of memory is at least
historically incorrect on x86_64.
There may be different requirements for the kexec identity map and the
ordinary kernel boot type memory map, and that can reasonably be
explored as we look at solutions.
> Some memory ends up not being mapped, but it is not
> clear if it is due to some other bug, such as the BIOS not accurately
> providing the right memory map or some other kernel code path that did
> not map what it should.
> The 1G mapping covers up that type of issue.
I have seen this assertion repeated several times, and at least
historically on x86_64 it is most definitely false. The E820 map, which
was the primary information source for a long time, could not describe
all of memory, so depending upon it to be complete is erroneous.
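To make the granularity argument concrete, here is a small sketch of why an
identity map built from 1 GiB pages reaches memory the firmware map never
described, while a map built from 2 MiB pages does not. It is only an
illustration: the addresses, the undescribed table location, and the helper
names mapped_span and covers are invented for the example and are not taken
from the kernel or from any real machine.

```python
GIB = 1 << 30
MIB_2 = 2 << 20


def align_down(addr, size):
    return addr & ~(size - 1)


def align_up(addr, size):
    return (addr + size - 1) & ~(size - 1)


def mapped_span(start, end, page_size):
    """Range actually mapped when [start, end) is covered with whole pages."""
    return align_down(start, page_size), align_up(end, page_size)


def covers(span, addr):
    lo, hi = span
    return lo <= addr < hi


# Hypothetical region the memory map describes and kexec asks to map.
requested = (0x4010_0000, 0x4800_0000)

# Hypothetical table that the firmware uses but never listed in its map.
undescribed_table = 0x7F00_0000

with_gbpages = mapped_span(*requested, GIB)
with_2m_pages = mapped_span(*requested, MIB_2)

print(hex(with_gbpages[0]), hex(with_gbpages[1]))
print(hex(with_2m_pages[0]), hex(with_2m_pages[1]))
print(covers(with_gbpages, undescribed_table))
print(covers(with_2m_pages, undescribed_table))
```

With 1 GiB pages the undescribed table happens to fall inside the mapped
span, so accesses to it succeed; with 2 MiB pages it is left unmapped, which
matches the kind of kexec failure reported in this thread.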
>> When there's boot breakage with new patches, we back out the bad patch
>> and re-try in 99.9% of the cases.
>
> Steve can certainly merge his two patches and resubmit, to replace the
> reverted original patch. He should be on in the morning to speak for
> himself.
I am going to push back and suggest that this is perhaps a bug in the
HPE UV system firmware not setting up the cpus' memory type range
registers correctly.
Unless those systems are using newfangled cpus that don't have 16bit
and 32bit support, and don't implement memory type range registers,
I don't see how something that only affects HPE UV systems could be
anything except an HPE UV specific bug.
Eric