From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Liuqiming (John)" <john.liuqiming@huawei.com>
Subject: Re: [RFC v2] xSplice design
Date: Mon, 18 May 2015 20:54:22 +0800
Message-ID: <5559E0FE.2060307@huawei.com>
References: <20150515194440.GA24313@l.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20150515194440.GA24313@l.oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>, jeremy@goop.org, hanweidong@huawei.com, jbeulich@suse.com, Paul Voccio <paul.voccio@rackspace.com>, Daniel Kiper <daniel.kiper@oracle.com>, Major Hayden <major.hayden@rackspace.com>, liuyingdong@huawei.com, aliguori@amazon.com, konrad@darnok.org, xiantao.zxt@alibaba-inc.com, lars.kurth@citrix.com, Steven Wilson <steven.wilson@rackspace.com>, peter.huangpeng@huawei.com, msw@amazon.com, xen-devel@lists.xenproject.org, Rick Harris <rick.harris@rackspace.com>, boris.ostrovsky@oracle.com, Josh Kearney <josh.kearney@rackspace.com>, jinsong.liu@alibaba-inc.com, Antony Messerli <amesserl@rackspace.com>, fanhenglong@huawei.com, andrew.cooper3@citrix.com

Hi Konrad,

Will this design include the hotpatch build toolchain?
For example, how are these .xsplice_ sections created, and how are Xen symbols handled when creating the hotpatch ELF file?

On 2015/5/16 3:44, Konrad Rzeszutek Wilk wrote:
> Hey!
>
> During the Xen Hacka^H^H^H^HProject Summit? we chatted about live-patching
> the hypervisor. We sketched out how it could be done, and brainstormed
> some of the problems.
>
> I took that and wrote a design, which is very much RFC. The design is
> laid out in two sections: the format of the ELF payload, and then the
> hypercalls to act on it.
>
> Hypercall preemption has caused a couple of XSAs, so I've baked the need
> for it into the design so that we hopefully won't have an XSA for this code.
>
> There are two big *TODO*s in the design which I had hoped to get done
> before sending this out; however, I am going on vacation for two weeks,
> so I figured it would be better to send this off for folks to mull over now
> than to have it languish.
>
> Please feel free to add more folks on the CC list.
>
> Enjoy!
>
>
> # xSplice Design v1 (EXTERNAL RFC v2)
>
> ## Rationale
>
> A mechanism is required to binary-patch the running hypervisor with new
> opcodes that have come about primarily due to security updates.
>
> This document describes the design of the API that would allow us to
> upload binary patches to the hypervisor.
>
> ## Glossary
>
>   * splice - patch in the binary code with new opcodes
>   * trampoline - a jump to a new instruction.
>   * payload - telemetry of the old code along with a binary blob of the new
>     function (if needed).
>   * reloc - telemetry contained in the payload to construct a proper trampoline.
>
> ## Multiple ways to patch
>
> The mechanism needs to be flexible to patch the hypervisor in multiple ways
> and be as simple as possible. The compiled code is contiguous in memory with
> no gaps, so we do not have the luxury of 'moving' existing code; we must either
> insert a trampoline to the new code to be executed, or modify the code in place
> if there is sufficient space. The placement of new code has to be done by the
> hypervisor, and the virtual address for the new code is allocated dynamically.
>
> This implies that the hypervisor must compute the new offsets when splicing
> in the new trampoline code. Where the trampoline is added (inside
> the function we are patching or just the callers?) is also important.
>
> To lessen the amount of code in the hypervisor, the consumer of the API
> is responsible for identifying which mechanism to employ and how many locations
> to patch. Combinations of modifying in-place code, adding trampolines, etc.
> have to be supported. The API should allow reading and writing any memory
> within the hypervisor virtual address space.
>
> We must also have a mechanism to query what has been applied and a mechanism
> to revert it if needed.
>
> We must also have a mechanism to provide: a copy of the old code (so that
> the hypervisor can verify it against the code in memory); the new code; and
> the location to patch, given as the symbol name of the function to be patched,
> an offset from the symbol, or a virtual address.
>
> The complications that this design will encounter are explained later
> in this document.
>
> ## Patching code
>
> The first mechanism to patch that comes to mind is in-place replacement,
> that is, replacing the affected code with new code. Unfortunately, x86
> instructions are variable-length, which limits how much space we have
> available to replace the instructions.
>
> The second mechanism is by replacing the call or jump to the
> old function with the address of the new function.
>
> A third mechanism is to add a jump to the new function at the
> start of the old function.
>
> ### Example of trampoline and in-place splicing
>
> As an example, we will assume the hypervisor does not have XSA-132 (see
> *domctl/sysctl: don't leak hypervisor stack to toolstacks*
> 4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
> the hypervisor with it. The original code looks like this:
>
> <pre>
>     48 89 e0                  mov    %rsp,%rax
>     48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
> </pre>
>
> while the new patched hypervisor would be:
>
> <pre>
>     48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
>     48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
>     48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
>     48 89 e0                  mov    %rsp,%rax
>     48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
> </pre>
>
> This is inside arch_do_domctl. This new change adds 24 extra bytes
> of code (three 8-byte `movq` instructions), which alters all the offsets
> inside the function. To alter these offsets and add the extra 24 bytes of
> code, we might not have enough space in .text to squeeze this in.
>
> As such we could simplify this problem by only patching the site
> which calls arch_do_domctl:
>
> <pre>
> <do_domctl>:
>   e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>
> </pre>
>
> with a new address for where the new `arch_do_domctl` would be (this
> area would be allocated dynamically).
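
The following is a minimal sketch of the offset computation only, not the hypervisor's implementation. An x86-64 near `callq` is `e8` followed by a signed 32-bit displacement measured from the end of the 5-byte instruction, so retargeting a call site means recomputing that displacement. The helper names `call_rel32` and `retarget_call` are hypothetical. The call-site address used in the usage check, ffff82d080104a69, is back-computed from the encoded `4b b1 05 00` and the target ffff82d08015fbb9. It is not taken from a real build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 near call: opcode 0xe8 + rel32, relative to the next instruction. */
#define CALL_REL32_LEN 5

/* Compute the rel32 for a call at 'site' to 'target'. Returns false if the
 * target is outside the signed 32-bit (+/-2GB) reach of the call. */
static bool call_rel32(uint64_t site, uint64_t target, int32_t *rel)
{
    /* Wrap-around subtraction, then reinterpret as signed (two's complement). */
    int64_t delta = (int64_t)(target - (site + CALL_REL32_LEN));

    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *rel = (int32_t)delta;
    return true;
}

/* Rewrite a copy of the 5-byte call found at 'site' so it calls 'new_target'.
 * The opcode is checked first, in the spirit of verifying the old code. */
static bool retarget_call(uint8_t insn[CALL_REL32_LEN], uint64_t site,
                          uint64_t new_target)
{
    int32_t rel;

    if (insn[0] != 0xe8 || !call_rel32(site, new_target, &rel))
        return false;
    for (int i = 0; i < 4; i++)            /* little-endian rel32 */
        insn[1 + i] = (uint8_t)((uint32_t)rel >> (8 * i));
    return true;
}
```

The real patching would also need all CPUs quiesced (the lock-step requirement in the Hypercalls section), so that no CPU executes a half-written instruction.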
>
> Astute readers will wonder what we need to do if we were to patch `do_domctl`,
> which is not called directly by the hypervisor but on behalf of the guests via
> the `compat_hypercall_table` and `hypercall_table`.
> It is possible to patch the entry in `hypercall_table` for `do_domctl`
> (ffff82d080103079 <do_domctl>:)
> <pre>
>
>   ffff82d08024d490:   79 30
>   ffff82d08024d492:   10 80 d0 82 ff ff
>
> </pre>
> with the address of the new `do_domctl`. The other
> place where it is used is in `hvm_hypercall64_table`, which would need
> to be patched in a similar way. This would require an in-place splicing
> of the new virtual address of `do_domctl`.
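
The table entries shown above are plain 64-bit pointers stored little-endian: `79 30 10 80 d0 82 ff ff` is 0xffff82d080103079. Below is a minimal sketch of verifying and replacing such an entry. It assumes the caller has already made the `.rodata` page writable, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit table entry stored little-endian. */
static uint64_t table_entry_read(const uint8_t entry[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | entry[i];
    return v;
}

/* Store 'addr' little-endian into the entry. */
static void table_entry_write(uint8_t entry[8], uint64_t addr)
{
    for (int i = 0; i < 8; i++)
        entry[i] = (uint8_t)(addr >> (8 * i));
}

/* Replace the entry only if it still holds 'old_addr', so a payload built
 * against a different hypervisor cannot clobber an unexpected pointer. */
static int table_entry_replace(uint8_t entry[8], uint64_t old_addr,
                               uint64_t new_addr)
{
    if (table_entry_read(entry) != old_addr)
        return -1;
    table_entry_write(entry, new_addr);
    return 0;
}
```

In the hypervisor itself, the replacement would be a single aligned 8-byte store rather than a byte loop. That way a concurrent reader can never observe a half-updated pointer.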
>
> In summary, this example patched the callers of the affected function by:
>   * allocating memory for the new code to live in,
>   * changing the call target in all the functions which called the old
>     code (computing the new offset and patching the callq with a new callq),
>   * changing the function pointer tables with the new virtual address of
>     the function (splicing in the new virtual address). Since these tables
>     reside in the .rodata section, we would need to temporarily change the
>     page table permissions during this part.
>
>
> However, it has a severe drawback: the safety checks, which have to make sure
> the function is not on the stack, must also check every caller. For some
> patches, if there were a sufficiently large number of callers, we might
> never be able to apply the update.
>
> ### Example of different trampoline patching.
>
> An alternative mechanism exists where we can insert a trampoline in the
> existing function to be patched to jump directly to the new code. This
> reduces the number of locations to be patched to one, but it puts pressure
> on the CPU branching logic (I-cache), although it is just one unconditional jump.
>
> For this example we will assume that the hypervisor has not been compiled
> with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
> for certain HYPERVISOR_xen_version sub-ops*), which mem-sets a structure
> in the `xen_version` hypercall. This function is not called **anywhere** in
> the hypervisor (it is called by the guest) but is referenced in the
> `compat_hypercall_table` and `hypercall_table` (and indirectly called
> from there). It is possible to patch the entry in `hypercall_table` for the
> old `do_xen_version` (ffff82d080112f9e <do_xen_version>)
>
> <pre>
>   ffff82d08024b270 <hypercall_table>
>   ...
>   ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff
>
> </pre>
> with the address of the new `do_xen_version`. The other
> place where it is used is in `hvm_hypercall64_table`, which would need
> to be patched in a similar way. This would require an in-place splicing
> of the new virtual address of `do_xen_version`.
>
> An alternative solution would be to insert a trampoline in the old
> `do_xen_version` function that jumps directly to the new `do_xen_version`.
>
> <pre>
>   ffff82d080112f9e <do_xen_version>:
>   ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
>   ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi
>   ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 <do_xen_version+0x534>
> </pre>
>
> with:
>
> <pre>
>   ffff82d080112f9e <do_xen_version>:
>   ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]
> </pre>
>
> which would lessen the amount of patching to just one location.
>
> In summary, this example patched the affected function to jump to the
> new replacement function, which required:
>   * allocating memory for the new code to live in,
>   * inserting a trampoline with the new offset in the old function to point
>     to the new function,
>   * optionally, inserting in the old function a trampoline jump to a
>     function providing a BUG_ON to catch errant code.
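
The steps above can be sketched as follows. The sketch assumes the old function's first five bytes are writable and that `expected` holds the old code shipped in the payload. `struct trampoline`, `trampoline_apply` and `trampoline_revert` are hypothetical names. Saving the overwritten bytes is what makes a later revert possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_LEN 5   /* e9 + rel32 */

/* Hypothetical per-patch record: enough to apply and later revert. */
struct trampoline {
    uint64_t old_addr;              /* address of the old function */
    uint64_t new_addr;              /* address of the replacement */
    uint8_t saved[JMP_REL32_LEN];   /* bytes overwritten by the jump */
};

/* Overwrite the first 5 bytes of 'text' (the old function, mapped at
 * t->old_addr) with "jmpq new_addr". 'expected' is the old code shipped
 * in the payload; refuse to patch if the running code differs. */
static bool trampoline_apply(struct trampoline *t, uint8_t *text,
                             const uint8_t expected[JMP_REL32_LEN])
{
    int64_t delta = (int64_t)(t->new_addr - (t->old_addr + JMP_REL32_LEN));
    uint32_t rel;

    if (memcmp(text, expected, JMP_REL32_LEN) != 0)
        return false;                   /* not the code we expect */
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;                   /* beyond the +/-2GB jmpq reach */

    memcpy(t->saved, text, JMP_REL32_LEN);
    rel = (uint32_t)(int32_t)delta;
    text[0] = 0xe9;
    for (int i = 0; i < 4; i++)
        text[1 + i] = (uint8_t)(rel >> (8 * i));
    return true;
}

/* Undo trampoline_apply() by restoring the saved bytes. */
static void trampoline_revert(const struct trampoline *t, uint8_t *text)
{
    memcpy(text, t->saved, JMP_REL32_LEN);
}
```

Applied to `do_xen_version`, the five bytes cut the original 7-byte `mov` in half. That is harmless only because execution never continues past the jump, which is why the safety checks must ensure no CPU is executing inside the old function.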
>
> The disadvantage of this is that the unconditional jump incurs a small
> I-cache penalty. However, the simplicity of the patching and of the safety
> checks makes this a worthwhile option.
>
> ### Security
>
> With this method we can re-write the hypervisor - and as such we **MUST** be
> diligent in only allowing certain guests to perform this operation.
>
> Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
> of the payload to be certain it came from a trusted source.
>
> As such the hypercall **MUST** support an XSM policy to limit which
> guests are allowed to perform it. If the system is booted with signature
> checking, the signature checking will be enforced.
>
> ## Payload format
>
> The payload **MUST** contain enough data to allow us to apply the update
> and also safely reverse it. As such we **MUST** know:
>
>   * What the old code is expected to be. We **MUST** verify it against the
>     runtime code.
>   * The locations in memory to be patched. This can be determined dynamically
>     via symbols or via virtual addresses.
>   * The new code to be used.
>   * Signature to verify the payload.
>
> This payload could be constructed using a custom binary format, but that
> has severe disadvantages given our requirements:
>
>   * The format might need to change, and we need a mechanism to accommodate
>     that.
>   * It has to be platform agnostic.
>   * It should be easily constructed using existing tools.
>
> As such, carrying the payload in an ELF file is the sensible choice. The
> various sets of structures (and data) would be carried in ELF sections under
> different names and with their own definitions. The prefix for the ELF section
> name would always be: *.xsplice_*
>
> Note that every structure has padding. This is added so that the hypervisor
> can re-use those fields as it sees fit.
>
> The *.xsplice_* sections come in five groups:
>
>   * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be referenced
>     during the update. This can contain the symbols (functions) that will be
>     patched, or the list of symbols (functions) to be checked pre-patching, which
>     may not be on the stack.
>
>   * `.xsplice_reloc` and `.xsplice_reloc_howto`. How to properly construct
>     trampolines for a patch. We can have multiple locations for which we
>     need to insert a trampoline for a payload, and each location might require
>     a different way of handling it. This would naturally reference the `.text`
>     section and its proper offset. The `.xsplice_reloc` is not directly concerned
>     with patches but rather is an ELF relocation, describing the target
>     of a relocation and how that is performed. They are also used where
>     the new code references the running code.
>
>   * `.xsplice_sections`. The safety data for the old code and new code.
>     This contains an array of symbols (pointing to `.xsplice_symbols`
>     and `.text`) which are to be used during safety and dependency checking.
>
>   * `.xsplice_patches`: The description of the new functions to be patched
>     in (size, type, pointer to code, etc.).
>
>   * `.xsplice_change`. The structure that ties all of this together and defines
>     the payload.
>
> Additionally the ELF file would contain:
>
>   * `.text` section for the new and old code (function).
>   * `.rela.text` relocation data for the `.text` (both new and old).
>   * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as offsets
>     to the `.text`, `.xsplice_symbols`, or `.xsplice_reloc` section).
>   * `.bss` section for the new code (function)
>   * `.data` and `.data.read_mostly` section for the new and old code (function)
>   * `.rodata` section for the new and old code (function).
>
> In short the *.xsplice_* sections represent various structures and the
> ELF provides the mechanism to glue it all together when loaded in memory.
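
A loader might start by checking that the expected *.xsplice_* sections are all present. The sketch below works on section names already read from the payload's section header string table. It uses the names from the list above. Note that the later structure section is titled `xsplice_code` rather than `.xsplice_change`, so the exact set is an assumption. Rejecting unknown or duplicated *.xsplice_* sections is likewise a hypothetical policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define XSPLICE_PREFIX ".xsplice_"

/* Section names listed in this design; the set is illustrative only. */
static const char *const xsplice_required[] = {
    ".xsplice_symbols", ".xsplice_str", ".xsplice_reloc",
    ".xsplice_reloc_howto", ".xsplice_sections", ".xsplice_patches",
    ".xsplice_change",
};
#define NR_REQUIRED (sizeof(xsplice_required) / sizeof(xsplice_required[0]))

/* Returns 0 if every required .xsplice_* section is present exactly once
 * and no unknown .xsplice_* section appears, -1 otherwise. */
static int check_xsplice_sections(const char *const *names, size_t n)
{
    unsigned int seen[NR_REQUIRED] = { 0 };

    for (size_t i = 0; i < n; i++) {
        bool known = false;

        if (strncmp(names[i], XSPLICE_PREFIX, strlen(XSPLICE_PREFIX)) != 0)
            continue;                   /* .text, .rela.*, .bss, ... */
        for (size_t j = 0; j < NR_REQUIRED; j++)
            if (strcmp(names[i], xsplice_required[j]) == 0) {
                seen[j]++;
                known = true;
            }
        if (!known)
            return -1;                  /* unknown .xsplice_* section */
    }
    for (size_t j = 0; j < NR_REQUIRED; j++)
        if (seen[j] != 1)
            return -1;                  /* missing or duplicated */
    return 0;
}
```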
>
> Note that a lot of these ideas are borrowed from kSplice which is
> available at: https://github.com/jirislaby/ksplice
>
> For ELF understanding the best starting point is the OSDev Wiki
> (http://wiki.osdev.org/ELF). Furthermore the ELF specification is
> at http://www.skyfree.org/linux/references/ELF_Format.pdf and
> at Oracle's web site:
> http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc
>
> ### ASCII art of the ELF structures
>
> *TODO*: Include an ASCII art of how the sections are tied together.
>
> ### xsplice_symbols
>
> The section contains an array of a structure that outlines the name
> of the symbol to be patched (or checked against). The structure is
> as follows:
>
> <pre>
> struct xsplice_symbol {
>      const char *name; /* The ELF name of the symbol. */
>      const char *label; /* A unique xSplice name for the symbol. */
>      uint8_t pad[16]; /* Must be zero. */
> };
> </pre>
> The structures may be in the section in any order and in any amount
> (duplicate entries are permitted).
>
> Both `name` and `label` would be pointing to entries in `.xsplice_str`.
>
> The `label` is used for diagnostic purposes - such as including the
> name and the offset.
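
The sketch below shows both uses of a symbol: resolving a `name` to an address, and producing a "name+offset" diagnostic. It assumes the hypervisor's own symbol table is available as a hypothetical `struct hv_sym` array. The addresses in the usage check come from the disassembly earlier in this document.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of the hypervisor's own symbol table. */
struct hv_sym {
    const char *name;
    uint64_t addr;
};

/* Resolve an xsplice_symbol name to an address; 0 if unknown. */
static uint64_t sym_lookup(const struct hv_sym *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return 0;
}

/* Format 'addr' as "symbol+0xoff" using the closest preceding symbol.
 * Returns -1 if no symbol precedes 'addr'. */
static int sym_label(const struct hv_sym *tab, size_t n, uint64_t addr,
                     char *buf, size_t len)
{
    const struct hv_sym *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (tab[i].addr <= addr && (!best || tab[i].addr > best->addr))
            best = &tab[i];
    if (!best)
        return -1;
    snprintf(buf, len, "%s+0x%" PRIx64, best->name, addr - best->addr);
    return 0;
}
```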
>
> ### xsplice_reloc and xsplice_reloc_howto
>
> The section contains an array of a structure that outlines the different
> locations (and howto) for which a trampoline is to be inserted.
>
> The howto defines the change in detail. It contains the type,
> whether the relocation is relative, the size of the relocation,
> a bitmask for which parts of the instruction or data are to be replaced,
> the amount the final relocation is shifted by (to drop unwanted data), and
> whether the replacement should be interpreted as a signed value.
>
> The structure is as follows:
>
> <pre>
> #define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */
> #define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */
> #define XSPLICE_HOWTO_RELOC_DATA    2 /*  __DATE__ type change. */
> #define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type change. */
> #define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced.*/
> #define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */
> #define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */
>
> #define XSPLICE_HOWTO_FLAG_PC_REL    0x00000001 /* Is PC relative. */
> #define XSPLICE_HOWTO_FLAG_SIGN      0x00000002 /* Should the new value be treated as a signed value. */
>
> struct xsplice_reloc_howto {
>      uint32_t    type; /* XSPLICE_HOWTO_* */
>      uint32_t    flag; /* XSPLICE_HOWTO_FLAG_* */
>      uint32_t    size; /* Size, in bytes, of the item to be relocated. */
>      uint32_t    r_shift; /* The value the final relocation is shifted right by; used to drop unwanted data from the relocation. */
>      uint64_t    mask; /* Bitmask for which parts of the instruction or data are replaced with the relocated value. */
>      uint8_t     pad[8]; /* Must be zero. */
> };
>
> </pre>
>
> This structure is used in:
>
> <pre>
> struct xsplice_reloc {
>      uint64_t addr; /* The address of the relocation (if known). */
>      struct xsplice_symbol *symbol; /* Symbol for this relocation. */
>      struct xsplice_reloc_howto  *howto; /* Pointer to the above structure. */
>      int64_t isns_added; /* ELF addend resulting from quirks of the instruction one of whose operands is the relocation. For example, this is -4 on x86 pc-relative jumps. */
>      int64_t isns_target; /* Rest of the ELF addend. This is equal to the offset against the symbol that the relocation refers to. */
>      uint8_t pad[8];  /* Must be zero. */
> };
> </pre>
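
The sketch below shows how a loader might apply one `xsplice_reloc` using its howto. 'S' is the resolved symbol address and 'P' the address of the field being relocated. Only the `flag`, `size`, `r_shift` and `mask` fields are used. The parameters `insn_addend` and `target_addend` stand for `isns_added` and `isns_target`, and `reloc_apply` is a hypothetical name. For an x86 `jmpq` rel32, the field starts one byte after the `e9` opcode, so the -4 addend turns `S - P` into `S - (end of instruction)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XSPLICE_HOWTO_FLAG_PC_REL 0x00000001
#define XSPLICE_HOWTO_FLAG_SIGN   0x00000002

/* The fields of struct xsplice_reloc_howto used by this sketch. */
struct howto {
    uint32_t flag;     /* XSPLICE_HOWTO_FLAG_* */
    uint32_t size;     /* bytes: 1, 2, 4 or 8 */
    uint32_t r_shift;
    uint64_t mask;
};

/* value = S + insn_addend + target_addend, minus P if PC-relative; shifted
 * right by r_shift, range-checked for 'size' bytes, then merged under
 * 'mask' into the little-endian field at 'where'. */
static bool reloc_apply(const struct howto *h, uint8_t *where, uint64_t p,
                        uint64_t s, int64_t insn_addend, int64_t target_addend)
{
    unsigned int bits = h->size * 8;
    uint64_t v = s + (uint64_t)insn_addend + (uint64_t)target_addend;
    uint64_t field = 0;

    if (h->flag & XSPLICE_HOWTO_FLAG_PC_REL)
        v -= p;

    if (h->flag & XSPLICE_HOWTO_FLAG_SIGN) {
        /* Right shift of a negative value is implementation-defined in C;
         * GCC and Clang perform an arithmetic shift. */
        int64_t sv = (int64_t)v >> h->r_shift;

        if (bits < 64 && (sv < -(INT64_C(1) << (bits - 1)) ||
                          sv > (INT64_C(1) << (bits - 1)) - 1))
            return false;               /* does not fit in the field */
        v = (uint64_t)sv;
    } else {
        v >>= h->r_shift;
        if (bits < 64 && (v >> bits) != 0)
            return false;
    }

    for (unsigned int i = 0; i < h->size; i++)
        field |= (uint64_t)where[i] << (8 * i);
    field = (field & ~h->mask) | (v & h->mask);
    for (unsigned int i = 0; i < h->size; i++)
        where[i] = (uint8_t)(field >> (8 * i));
    return true;
}
```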
>
> ### xsplice_sections
>
> The structure defined in this section is used to verify that it is safe
> to update with the new changes. It can contain safety data on the old code
> and what kind of matching we are to expect.
>
> It can also contain safety data on what to check when about to patch,
> that is, whether any of the addresses (either provided or resolved
> when the payload is loaded by referencing the symbols) hold in memory
> what we expect them to.
>
> The following flags can be OR-ed together:
>
> <pre>
> #define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */
> #define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */
> #define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .data */
> #define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */
> #define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has .altinstructions. */
> #define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */
> #define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */
> #define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. */
>
> struct xsplice_section {
>      struct xsplice_symbol *symbol; /* The symbol associated with this change. */
>      uint64_t address; /* The address of the section (if known). */
>      uint64_t size; /* The size of the section. */
>      uint64_t flags; /* Various XSPLICE_SECTION_* flags. */
>      uint8_t pad[16]; /* Must be zero. */
> };
>
> </pre>
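
The sketch below shows the exact-match check for a `.text` section. It assumes `symbol` has already been resolved into `address`, and that the running hypervisor's `.text` is visible through a hypothetical `struct text_region`. Other section types and non-exact matching are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSPLICE_SECTION_TEXT        0x00000001
#define XSPLICE_SECTION_MATCH_EXACT 0x00000400

/* The fields of struct xsplice_section used here, with the symbol
 * already resolved into 'address'. */
struct section {
    uint64_t address;
    uint64_t size;
    uint64_t flags;     /* XSPLICE_SECTION_* */
};

/* Hypothetical view of the running hypervisor's .text. */
struct text_region {
    uint64_t start;         /* virtual address of the first byte */
    const uint8_t *bytes;   /* where those bytes can be read */
    uint64_t len;
};

/* Return 0 if the section may be patched, -1 otherwise. */
static int section_verify(const struct section *s, const struct text_region *t,
                          const uint8_t *old_code)
{
    if (!(s->flags & XSPLICE_SECTION_TEXT))
        return 0;                       /* only .text is handled in this sketch */
    /* Overflow-safe form of: start <= address && address + size <= start + len */
    if (s->address < t->start || s->size > t->len ||
        s->address - t->start > t->len - s->size)
        return -1;
    if ((s->flags & XSPLICE_SECTION_MATCH_EXACT) &&
        memcmp(t->bytes + (s->address - t->start), old_code, s->size) != 0)
        return -1;                      /* running code differs from payload */
    return 0;
}
```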
>
> ### xsplice_patches
>
> Within this section we have an array of a structure defining the new code (patch).
>
> This structure consists of a pointer to the new code (which in ELF ends up
> pointing to an offset in the `.text` or `.data` section); the type of patch:
> inline (either text or data) or requesting a trampoline; and the size of the patch.
>
> The structure is as follows:
>
> <pre>
> #define XSPLICE_PATCH_INLINE_TEXT   0
> #define XSPLICE_PATCH_INLINE_DATA   1
> #define XSPLICE_PATCH_RELOC_TEXT    2
>
> struct xsplice_patch {
>      uint32_t type; /* XSPLICE_PATCH_* .*/
>      uint32_t size; /* Size of patch. */
>      uint64_t addr; /* The address of the new code (or data). */
>      void *content; /* The bytes to be installed. */
>      uint8_t pad[16]; /* Must be zero. */
> };
>
> </pre>
>
> ### xsplice_code
>
> The structure embedded within this section ties it all together.
> It has the name of the patch, and pointers to all the above
> mentioned structures (the start and end addresses).
>
> The structure is as follows:
>
> <pre>
> struct xsplice_code {
>      const char *name; /* A sensible name for the patch. Up to 40 characters. */
>      struct xsplice_reloc *relocs, *relocs_end; /* How to patch it */
>      struct xsplice_section *sections, *sections_end; /* Safety data */
>      struct xsplice_patch *patches, *patches_end; /* Patch code & data */
>      uint8_t pad[32]; /* Must be zero. */
> };
> </pre>
>
> There should only be one such structure in the section.
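
Because every structure carries padding that must be zero, and `xsplice_code` delimits each array with start/end pointers, a loader can sanity-check the patches before acting on them. Below is a sketch of that walk. The struct layouts are copied from this design, and `check_patches` and its rejection rules (such as refusing a zero `size`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSPLICE_PATCH_INLINE_TEXT   0
#define XSPLICE_PATCH_INLINE_DATA   1
#define XSPLICE_PATCH_RELOC_TEXT    2

struct xsplice_reloc;     /* not needed for this sketch */
struct xsplice_section;

struct xsplice_patch {
    uint32_t type;        /* XSPLICE_PATCH_* */
    uint32_t size;
    uint64_t addr;
    void *content;
    uint8_t pad[16];      /* Must be zero. */
};

struct xsplice_code {
    const char *name;
    struct xsplice_reloc *relocs, *relocs_end;
    struct xsplice_section *sections, *sections_end;
    struct xsplice_patch *patches, *patches_end;
    uint8_t pad[32];      /* Must be zero. */
};

static int pad_is_zero(const uint8_t *pad, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (pad[i])
            return 0;
    return 1;
}

/* Walk [patches, patches_end) and reject the payload on an unknown type,
 * a zero size or non-zero padding. Returns the patch count or -1. */
static int check_patches(const struct xsplice_code *c)
{
    int n = 0;

    if (!pad_is_zero(c->pad, sizeof(c->pad)) || c->patches_end < c->patches)
        return -1;
    for (const struct xsplice_patch *p = c->patches; p < c->patches_end; p++, n++)
        if (p->type > XSPLICE_PATCH_RELOC_TEXT || p->size == 0 ||
            !pad_is_zero(p->pad, sizeof(p->pad)))
            return -1;
    return n;
}
```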
>
> ### Example
>
> *TODO*: Include an objdump of how the ELF would look like for the XSA
> mentioned earlier.
>
> ## Signature checking requirements.
>
> The signature checking requires that the layout of the data in memory
> **MUST** be the same for the signature to be verified. This means that the
> payload data layout in ELF format **MUST** match what the hypervisor would be
> expecting, such that it can properly do signature verification.
>
> The signature covers all of the payload data laid out contiguously
> in memory. The signature is to be appended at the end of the ELF payload,
> prefixed with the string "~Module signature appended~\n", followed by
> a signature header, then followed by the signature, key identifier, and
> signer's name.
>
> Specifically the signature header would be:
>
> <pre>
> #define PKEY_ALGO_DSA       0
> #define PKEY_ALGO_RSA       1
>
> #define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
> #define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
>
> #define HASH_ALGO_MD4          0
> #define HASH_ALGO_MD5          1
> #define HASH_ALGO_SHA1         2
> #define HASH_ALGO_RIPE_MD_160  3
> #define HASH_ALGO_SHA256       4
> #define HASH_ALGO_SHA384       5
> #define HASH_ALGO_SHA512       6
> #define HASH_ALGO_SHA224       7
> #define HASH_ALGO_RIPE_MD_128  8
> #define HASH_ALGO_RIPE_MD_256  9
> #define HASH_ALGO_RIPE_MD_320 10
> #define HASH_ALGO_WP_256      11
> #define HASH_ALGO_WP_384      12
> #define HASH_ALGO_WP_512      13
> #define HASH_ALGO_TGR_128     14
> #define HASH_ALGO_TGR_160     15
> #define HASH_ALGO_TGR_192     16
>
>
> struct elf_payload_signature {
>     u8    algo;        /* Public-key crypto algorithm PKEY_ALGO_*. */
>     u8    hash;        /* Digest algorithm: HASH_ALGO_*. */
>     u8    id_type;    /* Key identifier type PKEY_ID*. */
>     u8    signer_len;    /* Length of signer's name */
>     u8    key_id_len;    /* Length of key identifier */
>     u8    __pad[3];
>     __be32    sig_len;    /* Length of signature data */
> };
>
> </pre>
> (Note that this has been borrowed from the Linux module signature code.)
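
Below is a parsing sketch that follows the layout described above: magic string, header, then signature, key identifier and signer's name. It assumes the caller already knows where the ELF image ends. It keeps `sig_len` as four raw bytes to sidestep alignment and the kernel-only `__be32` type. The names `parse_sig_trailer` and `struct sig_view` are hypothetical, and the actual cryptographic verification is out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char sig_magic[] = "~Module signature appended~\n";
#define SIG_MAGIC_LEN (sizeof(sig_magic) - 1)

/* Same fields as struct elf_payload_signature; sig_len kept as raw
 * big-endian bytes so the struct has no alignment requirements. */
struct elf_payload_signature {
    uint8_t algo;          /* PKEY_ALGO_* */
    uint8_t hash;          /* HASH_ALGO_* */
    uint8_t id_type;       /* PKEY_ID_* */
    uint8_t signer_len;
    uint8_t key_id_len;
    uint8_t pad[3];
    uint8_t sig_len_be[4]; /* __be32 */
};

struct sig_view {
    const uint8_t *sig, *key_id, *signer;
    uint32_t sig_len;
    uint8_t key_id_len, signer_len;
};

/* 'p'/'len' cover everything after the ELF image. Returns 0 on success. */
static int parse_sig_trailer(const uint8_t *p, size_t len, struct sig_view *v)
{
    struct elf_payload_signature h;

    if (len < SIG_MAGIC_LEN + sizeof(h) || memcmp(p, sig_magic, SIG_MAGIC_LEN) != 0)
        return -1;
    memcpy(&h, p + SIG_MAGIC_LEN, sizeof(h));
    p += SIG_MAGIC_LEN + sizeof(h);
    len -= SIG_MAGIC_LEN + sizeof(h);

    v->sig_len = ((uint32_t)h.sig_len_be[0] << 24) | ((uint32_t)h.sig_len_be[1] << 16) |
                 ((uint32_t)h.sig_len_be[2] << 8) | h.sig_len_be[3];
    v->key_id_len = h.key_id_len;
    v->signer_len = h.signer_len;
    /* The three variable-length fields must exactly fill the rest. */
    if ((uint64_t)v->sig_len + h.key_id_len + h.signer_len != len)
        return -1;
    v->sig = p;
    v->key_id = p + v->sig_len;
    v->signer = v->key_id + v->key_id_len;
    return 0;
}
```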
>
>
> ## Hypercalls
>
> We will employ the sub operations of the system management hypercall (sysctl).
> There are to be four sub-operations:
>
>   * upload the payloads.
>   * list the summaries of the uploaded payloads and their state.
>   * get a particular payload's summary and its state.
>   * command to apply, delete, or revert the payload.
>
> The patching is asynchronous, therefore the caller is responsible
> for verifying that it has been applied properly by retrieving its summary
> and checking that there are no error codes associated with the payload.
>
> We **MUST** make it asynchronous due to the nature of patching: it requires
> every physical CPU to be in lock-step with the others. The patching mechanism,
> while an implementation detail, is not a short operation, and as such
> the design **MUST** assume it will be a long-running operation.
>
> Furthermore, it is possible to have multiple different payloads for the same
> function. As such, a unique id has to be visible to allow proper manipulation.
>
> The hypercall is part of the `xen_sysctl`. The top level structure contains
> one uint32_t to determine the sub-operations:
>
> <pre>
> struct xen_sysctl_xsplice_op {
>      uint32_t cmd;
>     union {
>            ... see below ...
>          } u;
> };
>
> </pre>
> while the rest of the hypercall-specific structures are part of this structure.
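
Put together, the top-level structure could look like the sketch below. In it, guest handles are modelled as plain 64-bit addresses, and the union member names (`upload`, `get`, `list`, `action`) and the helper `xsplice_prep_action` are hypothetical. Requiring the `id` to fit in 40 bytes including its terminating NUL is also an assumption; the design does not say.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XEN_SYSCTL_XSPLICE_UPLOAD 0
#define XEN_SYSCTL_XSPLICE_GET    1
#define XEN_SYSCTL_XSPLICE_LIST   2
#define XEN_SYSCTL_XSPLICE_ACTION 3

#define XSPLICE_ACTION_CHECK  0
#define XSPLICE_ACTION_UNLOAD 1
#define XSPLICE_ACTION_REVERT 2
#define XSPLICE_ACTION_APPLY  3

#define XSPLICE_ID_LEN 40

/* Guest handles are modelled as plain 64-bit addresses in this sketch. */
struct xen_sysctl_xsplice_upload  { char id[XSPLICE_ID_LEN]; uint64_t size; uint64_t payload; };
struct xen_sysctl_xsplice_summary { char id[XSPLICE_ID_LEN]; uint32_t status; int32_t rc; };
struct xen_sysctl_xsplice_list    { uint32_t idx; uint32_t count; uint64_t summary; };
struct xen_sysctl_xsplice_action  { char id[XSPLICE_ID_LEN]; uint32_t cmd; };

struct xen_sysctl_xsplice_op {
    uint32_t cmd;                               /* XEN_SYSCTL_XSPLICE_* */
    union {
        struct xen_sysctl_xsplice_upload  upload;
        struct xen_sysctl_xsplice_summary get;
        struct xen_sysctl_xsplice_list    list;
        struct xen_sysctl_xsplice_action  action;
    } u;
};

/* Fill in an ACTION request; returns -1 on a bad id or unknown action. */
static int xsplice_prep_action(struct xen_sysctl_xsplice_op *op,
                               const char *id, uint32_t action)
{
    size_t len = strlen(id);

    if (len == 0 || len >= XSPLICE_ID_LEN || action > XSPLICE_ACTION_APPLY)
        return -1;
    memset(op, 0, sizeof(*op));
    op->cmd = XEN_SYSCTL_XSPLICE_ACTION;
    memcpy(op->u.action.id, id, len + 1);
    op->u.action.cmd = action;
    return 0;
}
```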
>
>
> ### XEN_SYSCTL_XSPLICE_UPLOAD (0)
>
> Upload a payload to the hypervisor. The payload is verified and if there
> are any issues the proper return code will be returned. The payload is
> not applied at this time - that is controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
>
> The caller provides:
>
>   * `id` unique id.
>   * `size` the size of the ELF payload.
>   * `payload` the virtual address of where the ELF payload is.
>
> The return value is zero if the payload was successfully uploaded and the
> signature was verified. Otherwise an EXX return value is provided.
> Duplicate `id`s are not supported.
>
> The `payload` is the ELF payload as mentioned in the `Payload format` section.
>
> The structure is as follows:
>
> <pre>
> struct xen_sysctl_xsplice_upload {
>      char id[40];  /* IN, name of the patch. */
>      uint64_t size; /* IN, size of the ELF file. */
>      XEN_GUEST_HANDLE_64(uint8) payload; /* ELF file. */
> };
> </pre>
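
Before issuing the upload, a toolstack could cheaply reject buffers that are obviously not an ELF64 payload. The sketch below uses a hypothetical helper to check the standard ELF identification bytes (`\x7fELF`, `ELFCLASS64`, `ELFDATA2LSB`) and the 64-byte minimum size of an ELF64 header. The hypervisor must still perform its own full validation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ELF64_EHDR_SIZE 64   /* sizeof(Elf64_Ehdr) */

/* Return 0 if 'p' plausibly starts a little-endian ELF64 file. */
static int payload_precheck(const uint8_t *p, uint64_t size)
{
    static const uint8_t elfmag[4] = { 0x7f, 'E', 'L', 'F' };

    if (size < ELF64_EHDR_SIZE || memcmp(p, elfmag, sizeof(elfmag)) != 0)
        return -1;
    if (p[4] != 2 /* EI_CLASS: ELFCLASS64 */ || p[5] != 1 /* EI_DATA: ELFDATA2LSB */)
        return -1;
    return 0;
}
```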
>
> ### XEN_SYSCTL_XSPLICE_GET (1)
>
> Retrieve a summary of a specific payload. The caller provides:
>
>   * `id` the unique id.
>   * `status` *MUST* be set to zero.
>   * `rc` *MUST* be set to zero.
>
> The `summary` structure contains a summary of the payload, which includes:
>
>   * `id` the unique id.
>   * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details.
>   * `rc` - its error state if any.
>
> The structure is as follows:
>
> <pre>
> #define XSPLICE_STATUS_LOADED    0
> #define XSPLICE_STATUS_PROGRESS  1
> #define XSPLICE_STATUS_CHECKED   2
> #define XSPLICE_STATUS_APPLIED   3
> #define XSPLICE_STATUS_REVERTED  4
> #define XSPLICE_STATUS_IN_ERROR  5
>
> struct xen_sysctl_xsplice_summary {
>      char id[40];  /* IN/OUT, name of the patch. */
>      uint32_t status;   /* OUT */
>      int32_t rc;  /* OUT */
> };
> </pre>
>
> ### XEN_SYSCTL_XSPLICE_LIST (2)
>
> Retrieve an array of abbreviated summaries of the payloads that are loaded in
> the hypervisor.
>
> The caller provides:
>
>   * `idx` index iterator. Initially it *MUST* be zero.
>   * `count` the max number of entries to populate.
>   * `summary` virtual address of where to write payload summaries.
>
> The hypercall returns zero on success and updates the `idx` (index) iterator
> with the number of payloads returned, `count` to the number of remaining
> payloads, and fills `summary` with the payload summaries.
>
> If the hypercall returns E2BIG the `count` is too big and should be
> lowered.
>
> Note that due to the asynchronous nature of hypercalls, the domain might have
> added or removed payloads in the meantime, making this information stale. It is
> the responsibility of the domain to provide proper accounting.
>
> The `summary` structure contains a summary of the payload, which includes:
>
>   * `id` unique id.
>   * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED*  (2) the payload `old` and `addr` match with the hypervisor.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details.
>   * `rc` - its error state if any.
>
> The structure is as follows:
>
> <pre>
> struct xen_sysctl_xsplice_list {
>      uint32_t idx;  /* IN/OUT */
>      uint32_t count;  /* IN/OUT */
>      XEN_GUEST_HANDLE_64(xen_sysctl_xsplice_summary) summary;  /* OUT */
> };
>
> struct xen_sysctl_xsplice_summary {
>      char id[40];  /* OUT, name of the patch. */
>      uint32_t status;   /* OUT */
>      int32_t rc;  /* OUT */
> };
>
> </pre>
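
The `idx`/`count` description above can be read in more than one way. The sketch below assumes `idx` is advanced by the number of entries written, `count` is set to the number of payloads still to be fetched, and the hypervisor caps a single request at a hypothetical `XSPLICE_LIST_MAX`. Both sides are modelled as plain C functions (`xsplice_list` and `fetch_all`, both hypothetical).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XSPLICE_LIST_MAX 64   /* hypothetical per-call limit */

struct summary { char id[40]; uint32_t status; int32_t rc; };

/* Hypervisor side of XEN_SYSCTL_XSPLICE_LIST (simulated). */
static int xsplice_list(const struct summary *loaded, uint32_t nr_loaded,
                        uint32_t *idx, uint32_t *count, struct summary *out)
{
    uint32_t n = 0;

    if (*count > XSPLICE_LIST_MAX)
        return -E2BIG;                 /* caller should lower 'count' */
    if (*idx > nr_loaded)
        return -EINVAL;                /* stale iterator: payloads were removed */
    for (uint32_t i = *idx; i < nr_loaded && n < *count; i++)
        out[n++] = loaded[i];
    *idx += n;
    *count = nr_loaded - *idx;         /* payloads still to fetch */
    return 0;
}

/* Caller side: fetch all summaries in batches of 'batch' entries.
 * 'dst' must have room for 'nr_loaded' entries. */
static uint32_t fetch_all(const struct summary *loaded, uint32_t nr_loaded,
                          struct summary *dst, uint32_t batch)
{
    uint32_t idx = 0, count, got = 0;

    do {
        uint32_t before = idx;

        count = batch;
        if (xsplice_list(loaded, nr_loaded, &idx, &count, dst + got) != 0 ||
            idx == before)
            break;                     /* error, or no progress */
        got += idx - before;
    } while (count != 0);
    return got;
}
```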
>
> ### XEN_SYSCTL_XSPLICE_ACTION (3)
>
> Perform an operation on the payload structure referenced by the `id` field.
> The operation request is asynchronous and the status should be retrieved
> by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
>
> The caller provides:
>
>   * `id` the unique id.
>   * `cmd` the command requested:
>    1. *XSPLICE_ACTION_CHECK* (0) check that the payload will apply properly.
>    2. *XSPLICE_ACTION_UNLOAD* (1) unload the payload.
>    3. *XSPLICE_ACTION_REVERT* (2) revert the payload.
>    4. *XSPLICE_ACTION_APPLY* (3) apply the payload.
>
>
> The return value will be zero unless the provided fields are incorrect.
>
> The structure is as follows:
>
> <pre>
> #define XSPLICE_ACTION_CHECK  0
> #define XSPLICE_ACTION_UNLOAD 1
> #define XSPLICE_ACTION_REVERT 2
> #define XSPLICE_ACTION_APPLY  3
>
> struct xen_sysctl_xsplice_action {
>      char id[40];  /* IN, name of the patch. */
>      uint32_t cmd; /* IN */
> };
>
> </pre>
>
> ## Sequence of events.
>
> The normal sequence of events is to:
>
>   1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors, *STOP* here.
>   2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If it is *XSPLICE_STATUS_PROGRESS*, spin. If it is *XSPLICE_STATUS_LOADED*, go to the next step.
>   3. *XEN_SYSCTL_XSPLICE_ACTION* with the *XSPLICE_ACTION_CHECK* command to verify that the payload can be successfully applied.
>   4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If it is *XSPLICE_STATUS_PROGRESS*, spin. If it is *XSPLICE_STATUS_CHECKED*, go to the next step.
>   5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
>   6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If it is *XSPLICE_STATUS_PROGRESS*, spin. If it is *XSPLICE_STATUS_APPLIED*, exit with success.
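
The sequence above is a small state machine on the caller's side. Below is a sketch of it as a pure function, with the hypercalls themselves left out. `enum step`, `wait_for` and `next_step` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define XSPLICE_STATUS_LOADED    0
#define XSPLICE_STATUS_PROGRESS  1
#define XSPLICE_STATUS_CHECKED   2
#define XSPLICE_STATUS_APPLIED   3
#define XSPLICE_STATUS_REVERTED  4
#define XSPLICE_STATUS_IN_ERROR  5

enum step {
    STEP_UPLOAD, STEP_WAIT_LOADED,     /* steps 1-2 */
    STEP_CHECK, STEP_WAIT_CHECKED,     /* steps 3-4 */
    STEP_APPLY, STEP_WAIT_APPLIED,     /* steps 5-6 */
    STEP_DONE, STEP_FAILED,
};

/* A WAIT step stays put while the payload is in progress, advances when the
 * wanted status shows up, and fails on anything else. */
static enum step wait_for(enum step s, uint32_t status, uint32_t want,
                          enum step next)
{
    if (status == XSPLICE_STATUS_PROGRESS)
        return s;                       /* spin: poll GET again */
    return status == want ? next : STEP_FAILED;
}

/* 'rc' is the return value of the hypercall issued for step 's'; 'status'
 * is the ->status from XEN_SYSCTL_XSPLICE_GET (only used by WAIT steps). */
static enum step next_step(enum step s, int rc, uint32_t status)
{
    if (s == STEP_DONE || s == STEP_FAILED)
        return s;
    if (rc != 0)
        return STEP_FAILED;
    switch (s) {
    case STEP_UPLOAD:       return STEP_WAIT_LOADED;
    case STEP_WAIT_LOADED:  return wait_for(s, status, XSPLICE_STATUS_LOADED, STEP_CHECK);
    case STEP_CHECK:        return STEP_WAIT_CHECKED;
    case STEP_WAIT_CHECKED: return wait_for(s, status, XSPLICE_STATUS_CHECKED, STEP_APPLY);
    case STEP_APPLY:        return STEP_WAIT_APPLIED;
    case STEP_WAIT_APPLIED: return wait_for(s, status, XSPLICE_STATUS_APPLIED, STEP_DONE);
    default:                return STEP_FAILED;
    }
}
```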
>
>
> ## Addendum
>
> Implementation quirks should not be discussed in a design document.
>
> However, these observations may help when developing against this
> document.
>
>
> ### Alternative assembler
>
> Alternative assembler is a mechanism to use different instructions depending
> on what the CPU supports. This is done by providing multiple streams of code
> that can be patched in or, if the CPU does not support them, padded with
> `nop` operations. The alternative assembler macros cause the compiler to
> place the most generic code in line and to emit an entry in a special ELF
> section (`.altinstructions`) that tags this location. At run time the
> hypervisor can leave these areas alone or patch them with better-suited opcodes.
>
> As we might be patching the alternative assembler sections as well (by
> providing new, better-suited opcodes or perhaps nops), we also need to
> re-run the alternative assembler patching after we have done our
> patching.
>
> Also, when we are doing safety checks, the code we are checking might be
> utilizing alternative assembler. As such, we should relax our checks to
> accommodate that.
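
One way to relax the check is to compare everything except the alternative-patched sites. This sketch assumes the payload (or the hypervisor's alternatives table) can tell us which byte ranges of the old code those sites occupy. `struct byte_range` and `match_relaxed` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A byte range, relative to the start of the region being compared. */
struct byte_range { size_t off, len; };

/* Compare running code against the expected old code, ignoring bytes that
 * fall inside any 'skip' range (e.g. an alternatives site that may hold
 * either the generic or the CPU-specific instructions). */
static bool match_relaxed(const uint8_t *mem, const uint8_t *expected,
                          size_t len, const struct byte_range *skip,
                          size_t nskip)
{
    for (size_t i = 0; i < len; i++) {
        bool skipped = false;

        for (size_t j = 0; j < nskip && !skipped; j++)
            skipped = i >= skip[j].off && i - skip[j].off < skip[j].len;
        if (!skipped && mem[i] != expected[i])
            return false;
    }
    return true;
}
```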
>
> ### .rodata sections
>
> The patching might require strings to be updated as well. As such we must
> also be able to patch the strings as needed. This sounds simple, but the compiler
> has a habit of coalescing strings that are the same - which means if we in-place
> alter the strings - other users will be inadvertently affected as well.
>
> This is also where pointers to functions live - and we may need to patch this
> as well.
>
> To guard against that we must be prepared to do patching similar to
> trampoline patching or in-line depending on the flavour. If we can
> do in-line patching we would need to:
>
>   * alter `.rodata` to be writeable.
>   * inline patch.
>   * alter `.rodata` to be read-only.
>
> If we are doing trampoline patching we would need to:
>
>   * allocate a new memory location for the string.
>   * update all locations which use this string to use the offset of the
>     new string.
>   * mark the region RO when we are done.
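
The in-line flavour can be illustrated with a user-space analogue that uses POSIX `mmap`/`mprotect` in place of the hypervisor's page-table changes. It is only an illustration, and the helpers `map_ro_string` and `patch_ro` are hypothetical. It does nothing about the string-coalescing problem described above.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page, copy 'init' into it and make it read-only. */
static char *map_ro_string(const char *init, size_t *page_len)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    strncpy(p, init, len - 1);     /* page is zero-filled, so NUL-terminated */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    *page_len = len;
    return p;
}

/* In-line patch: verify the old bytes, make the page writable, patch,
 * then restore read-only. 'page' must be page-aligned. */
static int patch_ro(void *page, size_t page_len, size_t off,
                    const void *new_bytes, const void *old_bytes, size_t len)
{
    uint8_t *dst = (uint8_t *)page + off;

    if (off > page_len || len > page_len - off || memcmp(dst, old_bytes, len) != 0)
        return -1;                 /* out of range or unexpected contents */
    if (mprotect(page, page_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(dst, new_bytes, len);
    return mprotect(page, page_len, PROT_READ);
}
```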
>
> ### .bss sections
>
> Patching writable data is not suitable as it is unclear what should be done
> depending on the current state of data. As such it should not be attempted.
>
>
> ### Patching code which is on the stack
>
> We should not patch code which is on the stack (that is, code that a CPU
> may still be executing or return into). That can lead to corruption.
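
A minimal sketch of the check follows. It assumes the return addresses on every CPU's stack have already been collected while the CPUs are held for patching; that collection is not shown. `struct func_range` and `stack_is_safe` are hypothetical, and the 0x540 function size used in the usage check is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range of a function that is about to be patched. */
struct func_range { uint64_t start, size; };

/* True if none of the captured return addresses lies inside any function
 * being patched. */
static bool stack_is_safe(const uint64_t *ret_addrs, size_t n,
                          const struct func_range *funcs, size_t nfuncs)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < nfuncs; j++)
            /* Unsigned trick: true iff start <= addr < start + size. */
            if (ret_addrs[i] - funcs[j].start < funcs[j].size)
                return false;
    return true;
}
```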
>
> ### Trampoline (e9 opcode)
>
> The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
> the new code must be placed within 2GB of the old code in virtual address
> space. That should not be a problem since the Xen hypervisor has
> a very small footprint.
>
> However, if needed, we can always add two trampolines: one at the 2GB
> limit that jumps to the next trampoline.
>
> ### Time rendezvous code instead of stop_machine for patching
>
> The hypervisor's time rendezvous code runs synchronously across all CPUs
> every second. Using stop_machine to patch can stall the time rendezvous
> code and result in an NMI. As such, doing the patching at the tail
> of the rendezvous code should avoid this problem.
>
> ### Security
>
> Only the privileged domain should be allowed to do this operation.
>
>