* Xen Summit 2025 - Design Discussion Notes - Xen ABI
@ 2025-09-17  4:49 Alexander M. Merritt
From: Alexander M. Merritt @ 2025-09-17  4:49 UTC
  To: xen-devel

Hi all, it was requested I send the notes I took during the design discussion on the ABI / APIs to the list.

Normally I keep this as personal notes, so there may be errors (esp if I did not hear correctly), so please feel free to correct or expand. Details may be missing where I am unaware of the history behind something.

-Alex Merritt




Design Discussion: Xen ABIs and APIs

- Chris on remote: Andrew has been working, and wants to continue working, on a new ABI
- Andrew: put together a collection of documents to understand what we have
to work with, what we want to improve, before starting the work on any
design or iterations on interfaces we currently have
- link to document in the design session
        ‐ https://design-sessions.xenproject.org/uid/discussion/disc_3IEQbyaCTkqLf2fFzoze/view
- number of things we have been aware of for a while
- some attempts to address them on the list
- one problem: if you only try to fix one of them, it brings in discussion of
fixing many other items
- everyone has opinion on what the end result will look like
- existing designs only fix subsets, not the whole thing
- we want to identify all the problems from the start, before deciding on a plan
to fix them
- enumerate the ABIs and APIs that currently exist
        ‐ problems not apparent if you just think about this
        ‐ many folks think this is just the hypercalls
        ‐ there is the enumeration information
        ‐ xen has a long history of bugs here - originally a monorepo with xen,
        linux, qemu, BSDs, bochs, ... with “make world” you got a whole system.
        All guests were required to have event channels, so no discovery
        mechanism exists because they all had it
        ‐ e.g. grant table v2: migrating from an old version of xen to a new one
        exercised new code paths, and then the kernel crashed
        ‐ initial state of vcpus - many folks don’t think about it, but it is part
        of what xen presents, and we have bugs in how it is described via the
        hypercalls we use
        ‐ the hypercalls themselves -- 46? -- half of them specific to PV guests
                - x86 HVM / ARM HVM are only a small fraction of the total
                hypercalls that exist
- the hypercalls look like this now because Xen started with PV guests on
x86, where passing virtual addresses made sense
        ‐ when HVM guests came along, we ended up with hacks fitting the PV model into HVM
        ‐ Xen has to walk the page tables of the guest just to get the
        information it needs, you cannot do that in encrypted VMs by design
        ‐ need to change the way we deal with pointers in the API
- evtchn send: passes a pointer to the information on the stack, rather than just the event channel id
        ‐ get interrupt for someone else!
- look over all APIs and ABIs that exist because they have different problems in
different areas
- what XenServer cares about most right now is host UEFI secure boot
        ‐ new priv boundary that did not exist previously
        ‐ admin with root cannot (should not) violate the security boundary, cannot
        read/write arbitrary memory
        ‐ hypercalls: user space opens /dev/xen/privcmd and passes pointers into
        user space memory, but nothing stops it passing pointers to kernel memory
                - giant privilege escalation hole in UEFI secure boot
                - root user space is not meant to be privileged enough to execute arbitrary code
        ‐ all problems compound, thus we want to look at all of them before we
        start figuring out what to do
- another example: being based on x86 originally, many hypercalls have a shift
by 12, assuming 4k pages, which is a problem with ARM wanting 64k pages
        ‐ even the data layout wants to change
- if you change the version of Xen, you break user space (library versions)
        ‐ was an intentional choice early on, doesn’t scale
        ‐ get rid of unstable APIs -- they are killing xen
- security hotfix - recompile QEMU
        ‐ ABI rules say any change in the hypervisor means rebuilding user space,
        including QEMU -- anything that links against the xen packages!
- Bertrand: look at the problem from yesterday: how we create and configure a
guest, and coherency with dom0less
        ‐ two sets of code to create a guest, duplicated code
        ‐ duplicated configuration format
        ‐ if we modify the ABI between dom0 and Xen, need to look at having a
        coherent format so we can reuse the same code
- Alex M: can we hide hypercalls via libraries?
        ‐ yes, but currently the libraries break whenever the Xen version changes
        ‐ definitely an option going forward
        ‐ still doesn’t solve the issue, because other libraries in other languages
        won’t be shielded from unstable ABIs
- Jan: both knowing what to do and where we go is useful
        ‐ Andrew: have to have broad idea where to go....
- Jan: carrying out hypercalls is independent of the mechanism we define
        ‐ Andrew: still needs backwards compatibility
        ‐ Andrew: use higher op numbers
- Alex M: is our problem unique to us?
        ‐ Andrew: we have enough corner cases that, taken together, yes
        ‐ Bertrand: PV guests require a large number of hypercalls
        ‐ Jan: keep VA for PV hypercalls
- Rich on call: work together with Chris to write down something difficult in
scope
        ‐ any work written down, useful for folks on other side where we may
        encounter failures
        ‐ newcomers: xen forked by HP (?)
        ‐ everyone tried to narrow to vertical markets, focusing on specific markets
        ‐ Xen: is last entity standing, still trying to pull all stakeholders together,
        but not sure how long it will last
        ‐ if collapses: accidental or intentional interoperability, carve out the
        pieces so that the ppl at table today have a chance to know what
        results from it
        ‐ what will last longest: certified entities that have long lifecycles,
        decades or more
        ‐ certified snapshots will become longest lived design choices
- Andrew: shared info page
        ‐ layout was done with unsigned longs, which change size between 32 and 64 bit
        ‐ layout of the shared info page changes
        ‐ different vcpus can be in different modes at a time
        ‐ we cache the mode of the cpu at the point at which it makes one of two
        types of hypercalls
- another design session tomorrow



* Re: Xen Summit 2025 - Design Discussion Notes - Xen ABI
From: Jan Beulich @ 2025-09-17 14:22 UTC
  To: Alexander M. Merritt; +Cc: xen-devel

On 17.09.2025 06:49, Alexander M. Merritt wrote:
> Hi all, it was requested I send the notes I took during the design discussion on the ABI / APIs to the list.

Thanks much. We will want to try to have notes taken more systematically
today.

Jan



* Re: Xen Summit 2025 - Design Discussion Notes - Xen ABI
From: Alex Brett @ 2025-09-17 17:40 UTC
  To: Alexander M. Merritt, xen-devel@lists.xenproject.org

I also took notes during this session - as usual apologies for any misquotes etc:

Christopher Clark (CC): Preparing a collection of docs to understand the problem space and what we want to achieve ahead of doing the work
- List of API/ABIs
- List of known limitations / issues / concerns with them, indexed back to the API/ABIs

Andrew Cooper (AC): Previously tried addressing some of the issues, but addressing only part of it results in a lot of the feedback being "what about this other thing?" etc. Designs presented so far fix part/some of the problems, but not all. Trying this time to identify all the problems before identifying a design to fix them.

First enumerate the APIs and ABIs that currently exist. Some problems are not apparent when you just think about API/ABIs, e.g. people think it's "just hypercalls", but it's not; there are things like enumeration information. There is a long history of bugs due to Xen originally being a monorepo and having forks of kernel/qemu etc, so many ABIs were never properly defined/enumerated. Also e.g. the initial state of vCPUs is an ABI. There are ~46 hypercalls, but around half are specific to PV guests. Hypercalls are the way they are because Xen started with x86 PV guests; HVM guests were created by making minimal adjustments to PV, which resulted in lots of legacy poor choices (e.g. Xen having to walk the guest page tables).

Other examples:
- evtchn_send passes a pointer to a structure holding the 32-bit event channel port to use, rather than just the port number itself.
- hypercalls have a shift by 12 assuming 4k pages, which causes problems on ARM (e.g. with 64k pages)
- Changing the version of Xen breaks userspace (intentional choice early on, but causes user pain) - unstable ABIs are killing Xen
- Security patches nominally/officially require rebuilding userspace (including qemu)

This is relevant to work like host UEFI secure boot, which introduces a new privilege boundary: an admin with root in dom0 must not be able to violate the MS-defined boundary by reading/writing arbitrary memory, yet hypercalls currently work via /dev/xen/privcmd, which entirely violates this.

All problems compound, want to look at all of them before figuring out solutions.

Bertrand Marquis (BM): Need to look at the problem discussed yesterday - how we create/configure a guest (currently two ways, dom0less and xl), duplicating code, configuration format etc. If modifying the ABI between dom0 and Xen, can we have a coherent format we can use in multiple places, to solve problems like this, prevent duplication and reduce the number of hypercalls required to create a guest?

Alexander Merritt (AM): Regular applications don't make syscalls directly; they use libraries. Is that an option?
AC: Xen has libraries, but they currently need to be recompiled every time Xen changes
Jan Beulich (JB): This could be solved by having a user library vs the library that actually calls into the hypervisor
AC: Libraries currently all C, would like libraries for other languages that are stable. Need to consider all these possibilities when designing a solution.

JB: Concern about taking a global approach - will changing everything in one go mean we never end?
AC: Idea is to come up with one global design. Parties have specific interests (e.g. XS around secure boot, Vates for SEV) that will lead to implementation, but we should agree the approach up front rather than having people pulling in different directions.

JB: Could we deal with e.g. the hypercall problem up front rather than having to redesign everything else at the same time?
AC: Don't need e.g. function prototypes, but need the broad strokes agreed to ensure things don't diverge later on.

AC: Important to ensure backwards compatibility (can't break HVM guests)

AM: Is the problem unique to us, or are there historical references we can copy?
AC: No one problem we've found is unique, but there's enough overlap between problems in different areas that we can't take something off the shelf
BM: Potential problems with lots of compatibility code etc - do we still need it? (Room: Yes)

AC/JB/BM: discussion around whether to use physical or virtual addresses from PV guests. HVM is currently made consistent with PV; should it be the other way round?

Rich Persaud (RP): Unlikely this task can ever be completed, but any work done will be very useful to people after this task has failed. Xen already forked once (Bromium), other hypervisors focusing on specific verticals etc. Xen is the last entity standing trying to pull all stakeholders together; it is unclear how long this will last, but the products which will last the longest are certified items, and whatever snapshot is taken at that point will thus be the things which last the longest.

Andrei Smenov (AS): Can Xen and guest negotiate which version to use?
AC/JB: Not really; there can be multiple instances within a guest (bootloader, kernel, kexec kernels etc). Another example: OVMF coming up to provide boot services (blkback), loading a file, then handing over to the kernel etc (which can leave the shared info page in a bad state)

AC: The shared info page is another problem by itself. Its layout was done with unsigned longs, which change size between 32 and 64 bit code, so the layout of the shared info page changes. Different vCPUs can be in 32 or 64 bit mode, so we cache the mode of the vCPU at the point it makes one of two hypercalls; the final vCPU to crash / go through the kexec hypercall will thus change the format of the page. Something we need to address: whatever we provide must keep aspects like this in mind.


