All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Jason Andryuk" <jason.andryuk@amd.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH 00/12] const-ify vendor checks
Date: Mon, 9 Feb 2026 15:37:59 +0100	[thread overview]
Message-ID: <DGAIAEFBG8RK.3EP3TSR06KTOZ@amd.com> (raw)
In-Reply-To: <57c4d4d5-5588-40cf-919c-1e337328858f@suse.com>

On Mon Feb 9, 2026 at 1:52 PM CET, Jan Beulich wrote:
> On 09.02.2026 12:56, Alejandro Vallejo wrote:
>> On Mon Feb 9, 2026 at 11:15 AM CET, Jan Beulich wrote:
>>> On 09.02.2026 11:05, Alejandro Vallejo wrote:
>>>> On Mon Feb 9, 2026 at 10:21 AM CET, Jan Beulich wrote:
>>>>> On 06.02.2026 17:15, Alejandro Vallejo wrote:
>>>>>> High level description
>>>>>> ======================
>>>>>>
>>>>>> When compared to the RFC this makes a different approach The series introduces
>>>>>> cpu_vendor() which maps to a constant in the single vendor case and to
>>>>>> (boot_cpu_data.vendor & X86_ENABLED_VENDORS), where X86_ENABLED_VENDORS is a
>>>>>> mask of the compile-time chosen vendors. This enables the compiler to detect
>>>>>> dead-code at the uses and remove all unreachable branches, including in
>>>>>> switches.
>>>>>>
>>>>>> When compared to the x86_vendor_is() macro introduced in the RFC, this is
>>>>>> simpler. It achieves MOST of what the older macro did without touching the
>>>>>> switches, with a few caveats:
>>>>>>
>>>>>>   1. Compiled-out vendors cause a panic, they don't fallback onto the unknown
>>>>>>      vendor case. In retrospect, this is a much saner thing to do.
>>>>>
>>>>> I'm unconvinced here. Especially our Centaur and Shanghai support is at best
>>>>> rudimentary. Treating those worse when configured-out than when configured-in
>>>>> feels questionable.
>>>>
>>>> Isn't that the point of configuring out?
>>>
>>> That's what I'm unsure about.
>> 
>> I'm really missing what you're trying to make, sorry. How, if at all, is it
>> helpful for a hypervisor with a compiled out vendor to be bootable on a machine
>> of that vendor?
>
> No more and no less than for a system with CPUs from a vendor we don't have
> support for at all. Let's assume someone wants to start adding support for
> a new vendor. They may first try Xen as-is. This wouldn't panic.
> on how exactly they would start adding stuff, Xen might suddenly panic,
> despite functionally nothing having changed.

There's an "unknown CPU vendor" option for that. With that option enabled
unknown vendor keep working as intended. If it booted in the first place it's
because the unknown vendor option was enabled. The panic would happen iff they
add their own vendor to the code, hook it up on the lookup function and miss
the X86_ENABLED_VENDORS addition. Which, incidentally, would notify them right
away through a panic rather than having them waste time trying to trigger code
DCE is intentionally removing.

>
>>>> Besides the philosophical matter of whether or not a compiled-out vendor
>>>> should be allowed to run there's the more practical matter of what to do
>>>> with the x86_vendor field of boot_cpu_data. Because at that point our take
>>>> that cross-vendor support is forbidden is a lot weaker. If I can run an
>>>> AMD-hypervisor on an Intel host, what then? What policies would be allowed? If I
>>>> wipe x86_vendor then policies with some unknown vendor would be fine. Should the
>>>> leaves match too? If I do not wipe the field, should I do black magic to ensure
>>>> the behaviour is different depending on whether the vendor is compiled in or
>>>> not? What if I want to migrate a VM currently running in this hypothetical
>>>> hypervisor? The rules becomes seriously complex.
>>>>
>>>> It's just a lot cleaner to take the stance that compiled out vendors can't run.
>>>> Then everything else is crystal clear and we avoid a universe's worth of corner
>>>> cases. I expect upstream Xen to support all cases (I'm skeptical about the
>>>> utility of the unknown vendor path, but oh well), but many downstreams might
>>>> benefit from killing off support for vendors they really will never touch.
>>>
>>> To them, will panic()ing (or not) make a difference?
>> 
>> One would hope not because the're compiling them out for a reason.
>> But for upstream, not panicking brings a sea of corner cases. The ones I
>> mentioned above is not the whole list.
>> 
>> Turning the question around. Who benefits from not panicking?
>
> Certain things may work. But more generally - see above. Turning this
> question around also isn't quite appropriate imo: You want to introduce
> the panic(), so it's on you to justify doing so (which includes making
> clear why omitting that small piece of code would be a bad idea).

The justification is twofold. To aleviate complexity in Xen, and fullful a
security invariant in the presence of misuse. Letting a guest run is easy (not
quite a oneliner, but not much more). However it raises a number of questions
with complicated answers:

Save/restore | Live migration
=============================

The biggest hurdle, as I mentioned, is live-migrations/save-restore.

If I xl restore a Hygon VM on a host running in "unknown" mode (being an
AMD-only hypervisor with the unknown vendor cfg enabled on an Centaur host): What
should the vendor policy check say when I restore?

    a. They are incompatible: Because Hygon is compiled out.
    b. They are incompatible: Because Hygon doesn't match the vendor bytes of the host.
    b. They are compatible: Because they are both "unknown".
    c. What if the host _was_ Hygon, but Hygon was disabled?

It's just a mess. And the fact that the unknown vendor path has even less
testing than that of Centaur we can probably count on it being extremely buggy.

And on the topic of the consequences of bugs, there's the other argument...

Security argument
=================

If an AMD-only hypervisor runs on Intel hardware it just wouldn't be safe. We
can make statements that this configuration being unsupported and known unsafe,
and everything along those lines but mistakes happen. A mistake in production
where a vendor-trimmed hypervisor lands on the very trimmed vendor means a
security hole. And saying "that's a 'you' problem" doesn't cut it, because I can
make the same statement to users and developers of new vendors.

I can only see very theoretical use cases for the lack of a panic and a very
clear danger of misuse. early_cpu_init() runs after COM ports are configured so
it's not like we'd reboot without stating why.

================================================

If it's a hard requirement, I can change it so the panic behaves as it did
on the RFC, but IMO it creates real problems for the benefit of imaginary use
cases.

Cheers,
Alejandro


      reply	other threads:[~2026-02-09 14:38 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-06 16:15 [PATCH 00/12] const-ify vendor checks Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 01/12] x86: Reject CPU policies with vendors other than the host's Alejandro Vallejo
2026-02-10  8:19   ` Roger Pau Monné
2026-02-10 10:11     ` Alejandro Vallejo
2026-02-11 15:41   ` Jan Beulich
2026-02-11 17:41     ` Alejandro Vallejo
2026-02-12  7:16       ` Jan Beulich
2026-02-06 16:15 ` [PATCH 02/12] x86: Add more granularity to the vendors in Kconfig Alejandro Vallejo
2026-02-10  8:26   ` Roger Pau Monné
2026-02-10 10:04     ` Alejandro Vallejo
2026-02-11 16:06   ` Jan Beulich
2026-02-11 17:51     ` Alejandro Vallejo
2026-02-12  7:24       ` Jan Beulich
2026-02-06 16:15 ` [PATCH 03/12] x86: Add cpu_vendor() as a wrapper for the host's CPU vendor Alejandro Vallejo
2026-02-10  8:46   ` Roger Pau Monné
2026-02-10 10:35     ` Alejandro Vallejo
2026-02-10 12:06       ` Roger Pau Monné
2026-02-11 16:04   ` Jan Beulich
2026-02-11 17:35     ` Alejandro Vallejo
2026-02-11 17:57       ` Alejandro Vallejo
2026-02-12 10:52   ` Jan Beulich
2026-02-12 14:36     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 04/12] x86: Migrate MSR handler vendor checks to cpu_vendor() Alejandro Vallejo
2026-02-11 16:15   ` Jan Beulich
2026-02-06 16:15 ` [PATCH 05/12] x86: Migrate spec_ctrl " Alejandro Vallejo
2026-02-12 10:49   ` Jan Beulich
2026-02-12 14:55     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 06/12] x86: Migrate switch " Alejandro Vallejo
2026-02-12 11:06   ` Jan Beulich
2026-02-12 15:06     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 07/12] x86: Have x86_emulate/ implement the single-vendor optimisation Alejandro Vallejo
2026-02-12 11:26   ` Jan Beulich
2026-02-12 15:29     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 07/12] x86: Migrate x86_emulate/ to use cpu_vendor() Alejandro Vallejo
2026-02-12 11:31   ` Jan Beulich
2026-02-12 15:30     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 08/12] x86/acpi: Migrate vendor checks to cpu_vendor() Alejandro Vallejo
2026-02-12 11:52   ` Jan Beulich
2026-02-12 15:34     ` Alejandro Vallejo
2026-02-12 15:52       ` Jan Beulich
2026-02-06 16:15 ` [PATCH 09/12] x86/pv: " Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 10/12] x86/mcheck: Migrate vendor checks to use cpu_vendor() Alejandro Vallejo
2026-02-12 12:02   ` Jan Beulich
2026-02-12 12:21     ` Jan Beulich
2026-02-12 15:46     ` Alejandro Vallejo
2026-02-06 16:15 ` [PATCH 11/12] x86/cpu: " Alejandro Vallejo
2026-02-12 13:17   ` Jan Beulich
2026-02-06 16:15 ` [PATCH 12/12] x86: Migrate every remaining raw vendor check to cpu_vendor() Alejandro Vallejo
2026-02-12 13:29   ` Jan Beulich
2026-02-09  9:21 ` [PATCH 00/12] const-ify vendor checks Jan Beulich
2026-02-09 10:05   ` Alejandro Vallejo
2026-02-09 10:15     ` Jan Beulich
2026-02-09 11:56       ` Alejandro Vallejo
2026-02-09 12:52         ` Jan Beulich
2026-02-09 14:37           ` Alejandro Vallejo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DGAIAEFBG8RK.3EP3TSR06KTOZ@amd.com \
    --to=alejandro.garciavallejo@amd.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jason.andryuk@amd.com \
    --cc=jbeulich@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.