From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Liuqiming (John)"
Subject: Re: [RFC v2] xSplice design
Date: Mon, 18 May 2015 20:54:22 +0800
Message-ID: <5559E0FE.2060307@huawei.com>
References: <20150515194440.GA24313@l.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YuKa9-0005D2-JR for xen-devel@lists.xenproject.org; Mon, 18 May 2015 12:55:45 +0000
In-Reply-To: <20150515194440.GA24313@l.oracle.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Elena Ufimtseva, jeremy@goop.org, hanweidong@huawei.com, jbeulich@suse.com, Paul Voccio, Daniel Kiper, Major Hayden, liuyingdong@huawei.com, aliguori@amazon.com, konrad@darnok.org, xiantao.zxt@alibaba-inc.com, lars.kurth@citrix.com, Steven Wilson, peter.huangpeng@huawei.com, msw@amazon.com, xen-devel@lists.xenproject.org, Rick Harris, boris.ostrovsky@oracle.com, Josh Kearney, jinsong.liu@alibaba-inc.com, Antony Messerli, fanhenglong@huawei.com, andrew.cooper3@citrix.com
List-Id: xen-devel@lists.xenproject.org

Hi Konrad,

Will this design include the hotpatch build tool chain?
For example, how are these .xsplice_ sections created? And how are Xen
symbols handled when creating the hotpatch ELF file?

On 2015/5/16 3:44, Konrad Rzeszutek Wilk wrote:
> Hey!
>
> During the Xen Hacka^H^H^H^HProject Summit? we chatted about live-patching
> the hypervisor. We sketched out how it could be done, and brainstormed
> some of the problems.
>
> I took that and wrote a design - which is very much RFC. The design is
> laid out in two sections - the format of the ELF payload - and then the
> hypercalls to act on it.
>
> Hypercall preemption has caused a couple of XSAs so I've baked the need
> for that in the design so we hopefully won't have an XSA for this code.
>
> There are two big *TODO* in the design which I had hoped to get done
> before sending this out - however I am going on vacation for two weeks
> so I figured it would be better to send this off for folks to mull now
> than to have it languish.
>
> Please feel free to add more folks on the CC list.
>
> Enjoy!
>
>
> # xSplice Design v1 (EXTERNAL RFC v2)
>
> ## Rationale
>
> A mechanism is required to binarily patch the running hypervisor with new
> opcodes that have come about primarily due to security updates.
>
> This document describes the design of the API that would allow us to
> upload to the hypervisor binary patches.
>
> ## Glossary
>
> * splice - patch in the binary code with new opcodes
> * trampoline - a jump to a new instruction.
> * payload - telemetries of the old code along with binary blob of the new
>   function (if needed).
> * reloc - telemetries contained in the payload to construct proper trampoline.
>
> ## Multiple ways to patch
>
> The mechanism needs to be flexible to patch the hypervisor in multiple ways
> and be as simple as possible. The compiled code is contiguous in memory with
> no gaps - so we have no luxury of 'moving' existing code and must either
> insert a trampoline to the new code to be executed - or only modify in-place
> the code if there is sufficient space. The placement of new code has to be done
> by hypervisor and the virtual address for the new code is allocated dynamically.
>
> This implies that the hypervisor must compute the new offsets when splicing
> in the new trampoline code. Where the trampoline is added (inside
> the function we are patching or just the callers?) is also important.
>
> To lessen the amount of code in hypervisor, the consumer of the API
> is responsible for identifying which mechanism to employ and how many locations
> to patch.
Combinations of modifying in-place code, adding trampoline, etc > has to be supported. The API should allow read/write any memory within > the hypervisor virtual address space. > > We must also have a mechanism to query what has been applied and a mechanism > to revert it if needed. > > We must also have a mechanism to: provide an copy of the old code - so that > the hypervisor can verify it against the code in memory; the new code; > the symbol name of the function to be patched; or offset from the symbol; > or virtual address. > > The complications that this design will encounter are explained later > in this document. > > ## Patching code > > The first mechanism to patch that comes in mind is in-place replacement. > That is replace the affected code with new code. Unfortunately the x86 > ISA is variable size which places limits on how much space we have available > to replace the instructions. > > The second mechanism is by replacing the call or jump to the > old function with the address of the new function. > > A third mechanism is to add a jump to the new function at the > start of the old function. > > ### Example of trampoline and in-place splicing > > As example we will assume the hypervisor does not have XSA-132 (see > *domctl/sysctl: don't leak hypervisor stack to toolstacks* > 4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch > the hypervisor with it. The original code looks as so: > >
>     48 89 e0                  mov    %rsp,%rax
>     48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
> 
> > while the new patched hypervisor would be: > >
>     48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
>     48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
>     48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
>     48 89 e0                  mov    %rsp,%rax
>     48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
> 
>
> This is inside the arch_do_domctl. This new change adds 21 extra
> bytes of code which alters all the offsets inside the function. To alter
> these offsets and add the extra 21 bytes of code we might not have enough
> space in .text to squeeze this in.
>
> As such we could simplify this problem by only patching the site
> which calls arch_do_domctl:
>
> :
>   e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>
> 
>
> with a new address for where the new `arch_do_domctl` would be (this
> area would be allocated dynamically).
>
> Astute readers will wonder what we need to do if we were to patch `do_domctl`
> - which is not called directly by hypervisor but on behalf of the guests via
> the `compat_hypercall_table` and `hypercall_table`.
> Patching the offset in `hypercall_table` for `do_domctl`
> (ffff82d080103079 <do_domctl>):
>
>
>   ffff82d08024d490:   79 30
>   ffff82d08024d492:   10 80 d0 82 ff ff
>
> 
> with the new address where the new `do_domctl` is possible. The other
> place where it is used is in `hvm_hypercall64_table` which would need
> to be patched in a similar way. This would require an in-place splicing
> of the new virtual address of `arch_do_domctl`.
>
> In summary this example patched the callers of the affected function by
>  * allocating memory for the new code to live in,
>  * changing the virtual address of all the functions which called the old
>    code (computing the new offset, patching the callq with a new callq).
>  * changing the function pointer tables with the new virtual address of
>    the function (splicing in the new virtual address). Since this table
>    resides in the .rodata section we would need to temporarily change the
>    page table permissions during this part.
>
>
> However it has severe drawbacks - the safety checks which have to make sure
> the function is not on the stack - must also check every caller. For some
> patches this could mean, if there were a sufficiently large number of
> callers, that we would never be able to apply the update.
>
> ### Example of different trampoline patching.
>
> An alternative mechanism exists where we can insert a trampoline in the
> existing function to be patched to jump directly to the new code. This
> lessens the locations to be patched to one but it puts pressure on the
> CPU branching logic (I-cache, but it is just one unconditional jump).
>
> For this example we will assume that the hypervisor has not been compiled
> with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
> for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
> in `xen_version` hypercall. This function is not called **anywhere** in
> the hypervisor (it is called by the guest) but referenced in the
> `compat_hypercall_table` and `hypercall_table` (and indirectly called
> from that).
> Patching the offset in `hypercall_table` for the old
> `do_xen_version` (ffff82d080112f9e <do_xen_version>)
>
>
>  ffff82d08024b270 <hypercall_table>
>  ...
>  ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff
>
>
> with the new address where the new `do_xen_version` is possible. The other
> place where it is used is in `hvm_hypercall64_table` which would need
> to be patched in a similar way. This would require an in-place splicing
> of the new virtual address of `do_xen_version`.
>
> An alternative solution would be to insert a trampoline in the
> old `do_xen_version` function to directly jump to the new `do_xen_version`.
>
>   ffff82d080112f9e <do_xen_version>:
>   ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
>   ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi
>   ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 
> 
> > with: > >
>   ffff82d080112f9e <do_xen_version>:
>   ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]
> 
>
> which would lessen the amount of patching to just one location.
>
> In summary this example patched the affected function to jump to the
> new replacement function which required:
>  * allocating memory for the new code to live in,
>  * inserting a trampoline with the new offset in the old function to point
>    to the new function.
>  * Optionally we can insert in the old function a trampoline jump to a
>    function providing a BUG_ON to catch errant code.
>
> The disadvantage of this is that the unconditional jump incurs a small
> I-cache penalty. However the simplicity of the patching and of the safety
> checks makes this a worthwhile option.
>
> ### Security
>
> With this method we can re-write the hypervisor - and as such we **MUST** be
> diligent in only allowing certain guests to perform this operation.
>
> Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
> of the payload to be certain it came from a trusted source.
>
> As such the hypercall **MUST** support an XSM policy to limit which
> guest is allowed. If the system is booted with signature checking the
> signature checking will be enforced.
>
> ## Payload format
>
> The payload **MUST** contain enough data to allow us to apply the update
> and also safely reverse it. As such we **MUST** know:
>
>  * What the old code is expected to be. We **MUST** verify it against the
>    runtime code.
>  * The locations in memory to be patched. This can be determined dynamically
>    via symbols or via virtual addresses.
>  * The new code to be used.
>  * Signature to verify the payload.
>
> This binary format can be constructed using a custom binary format but
> there are severe disadvantages of it:
>
>  * The format might need to be changed and we need a mechanism to
>    accommodate that.
>  * It has to be platform agnostic.
>  * Easily constructed using existing tools.
>
> As such having the payload in an ELF file is the sensible way.
We would be > carrying the various set of structures (and data) in the ELF sections under > different names and with definitions. The prefix for the ELF section name > would always be: *.xsplice_* > > Note that every structure has padding. This is added so that the hypervisor > can re-use those fields as it sees fit. > > There are five sections *.xsplice_* sections: > > * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be referenced > during the update. This can contain the symbols (functions) that will be > patched, or the list of symbols (functions) to be checked pre-patching which > may not be on the stack. > > * `.xsplice_reloc` and `.xsplice_reloc_howto`. The howto properly construct > trampolines for an patch. We can have multiple locations for which we > need to insert an trampoline for a payload and each location might require > a different way of handling it. This would naturally reference the `.text` > section and its proper offset. The `.xsplice_reloc` is not directly concerned > with patches but rather is an ELF relocation - describing the target > of a relocation and how that is performed. They're also used for where > the new code references the run code too. > > * `.xsplice_sections`. The safety data for the old code and new code. > This contains an array of symbols (pointing to `.xsplice_symbols` to > and `.text`) which are to be used during safety and dependency checking. > > > * `.xsplice_patches`: The description of the new functions to be patched > in (size, type, pointer to code, etc.). > > * `.xsplice_change`. The structure that ties all of this together and defines > the payload. > > Additionally the ELF file would contain: > > * `.text` section for the new and old code (function). > * `.rela.text` relocation data for the `.text` (both new and old). > * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as offset > to the `.text` ,`.xsplice_symbols`, or `.xsplice_reloc` section). 
> * `.bss` section for the new code (function) > * `.data` and `.data.read_mostly` section for the new and old code (function) > * `.rodata` section for the new and old code (function). > > In short the *.xsplice_* sections represent various structures and the > ELF provides the mechanism to glue it all together when loaded in memory. > > Note that a lot of these ideas are borrowed from kSplice which is > available at: https://github.com/jirislaby/ksplice > > For ELF understanding the best starting point is the OSDev Wiki > (http://wiki.osdev.org/ELF). Furthermore the ELF specification is > at http://www.skyfree.org/linux/references/ELF_Format.pdf and > at Oracle's web site: > http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc > > ### ASCII art of the ELF structures > > *TODO*: Include an ASCII art of how the sections are tied together. > > ### xsplice_symbols > > The section contains an array of an structure that outlines the name > of the symbol to be patched (or checked against). The structure is > as follow: > >
> struct xsplice_symbol {
>      const char *name; /* The ELF name of the symbol. */
>      const char *label; /* A unique xSplice name for the symbol. */
>      uint8_t pad[16]; /* Must be zero. */
> };
> 
> The structures may be in the section in any order and in any amount
> (duplicate entries are permitted).
>
> Both `name` and `label` would be pointing to entries in `.xsplice_str`.
>
> The `label` is used for diagnostic purposes - such as including the
> name and the offset.
>
> ### xsplice_reloc and xsplice_reloc_howto
>
> The section contains an array of a structure that outlines the different
> locations (and howto) for which a trampoline is to be inserted.
>
> The howto defines the change in detail. It contains the type,
> whether the relocation is relative, the size of the relocation,
> the bitmask for which parts of the instruction or data are to be replaced,
> the amount the final relocation is shifted by (to drop unwanted data), and
> whether the replacement should be interpreted as a signed value.
>
> The structure is as follows:
>
> #define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */
> #define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */
> #define XSPLICE_HOWTO_RELOC_DATA    2 /*  __DATE__ type change. */
> #define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type change. */
> #define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced.*/
> #define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */
> #define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */
>
> #define XSPLICE_HOWTO_FLAG_PC_REL    0x00000001 /* Is PC relative. */
> #define XSPLICE_HOWTO_FLAG_SIGN      0x00000002 /* Should the new value be treated as signed value. */
>
> struct xsplice_reloc_howto {
>      uint32_t    type; /* XSPLICE_HOWTO_* */
>      uint32_t    flag; /* XSPLICE_HOWTO_FLAG_* */
>      uint32_t    size; /* Size, in bytes, of the item to be relocated. */
>      uint32_t    r_shift; /* The value the final relocation is shifted right by; used to drop unwanted data from the relocation. */
>      uint64_t    mask; /* Bitmask for which parts of the instruction or data are replaced with the relocated value. */
>      uint8_t     pad[8]; /* Must be zero. */
> };
>
> 
> > This structure is used in: > >
> struct xsplice_reloc {
>      uint64_t addr; /* The address of the relocation (if known). */
>      struct xsplice_symbol *symbol; /* Symbol for this relocation. */
>      struct xsplice_reloc_howto  *howto; /* Pointer to the above structure. */
>      uint64_t isns_added; /* ELF addend resulting from quirks of instruction one of whose operands is the relocation. For example, this is -4 on x86 pc-relative jumps. */
>      uint64_t isns_target; /* rest of the ELF addend.  This is equal to the offset against the symbol that the relocation refers to. */
>      uint8_t pad[8];  /* Must be zero. */
> };
> 
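A minimal sketch of how one relocation and its howto might be applied, to make the roles of the flags, `size`, `r_shift` and `mask` concrete. The `reloc_view` structure and the `apply_reloc` helper are hypothetical names and not part of the proposed format. The sketch assumes that the two addend fields have already been summed into one signed addend, that the field is stored little-endian (x86), and that `mask` selects the bits that get overwritten.

```c
#include <assert.h>
#include <stdint.h>

#define XSPLICE_HOWTO_FLAG_PC_REL 0x00000001 /* Is PC relative. */
#define XSPLICE_HOWTO_FLAG_SIGN   0x00000002 /* Value is signed. */

/* Hypothetical flattened view of one relocation plus its howto. */
struct reloc_view {
    uint64_t addr;    /* Address of the field being relocated. */
    uint64_t target;  /* Resolved address of the referenced symbol. */
    int64_t  addend;  /* Assumed: isns_added + isns_target. */
    uint32_t flag;    /* XSPLICE_HOWTO_FLAG_* */
    uint32_t size;    /* Size, in bytes, of the field (1..8). */
    uint32_t r_shift; /* Right shift applied to the final value. */
    uint64_t mask;    /* Bits of the field that get replaced. */
};

/* Returns 0 and rewrites 'field' in place, or -1 if the value does not fit. */
static int apply_reloc(const struct reloc_view *r, uint8_t *field)
{
    uint64_t val, old = 0, out;
    unsigned int i, bits;

    assert(r->size >= 1 && r->size <= 8 && r->r_shift < 64);

    /* Unsigned arithmetic wraps modulo 2^64, so a negative addend works. */
    val = r->target + (uint64_t)r->addend;
    if (r->flag & XSPLICE_HOWTO_FLAG_PC_REL)
        val -= r->addr;

    /* Arithmetic shift for negative signed values, logical otherwise. */
    if ((r->flag & XSPLICE_HOWTO_FLAG_SIGN) && (val >> 63))
        val = ~(~val >> r->r_shift);
    else
        val >>= r->r_shift;

    bits = r->size * 8;
    if (bits < 64) {
        uint64_t limit = (uint64_t)1 << bits;

        if (r->flag & XSPLICE_HOWTO_FLAG_SIGN) {
            /* Fits iff -(2^(bits-1)) <= val < 2^(bits-1). */
            if (val + (limit >> 1) >= limit)
                return -1;
        } else if (val >= limit) {
            return -1;
        }
    }

    for (i = 0; i < r->size; i++)       /* Read the field little-endian. */
        old |= (uint64_t)field[i] << (8 * i);
    out = (old & ~r->mask) | (val & r->mask);
    for (i = 0; i < r->size; i++)       /* Write it back little-endian. */
        field[i] = (uint8_t)(out >> (8 * i));
    return 0;
}
```

For an `e9` jump placed at 0x1000 that should reach new code at 0x2000, the 32-bit field lives at 0x1001 and the addend is -4, which gives a displacement of 0xffb. A target more than 2GB away is rejected rather than silently truncated.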
>
> ### xsplice_sections
>
> The structure defined in this section is used to verify that it is safe
> to update with the new changes. It can contain safety data on the old code
> and what kind of matching we are to expect.
>
> It also can contain safety data of what to check when about to patch.
> That is whether any of the addresses (either provided or resolved
> when payload is loaded by referencing the symbols) are in memory
> with what we expect it to be.
>
> As such the flags can be or-ed together:
>
> #define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */
> #define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */
> #define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .data */
> #define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */
> #define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has .altinstructions. */
> #define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */
> #define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */
> #define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. */
>
> struct xsplice_section {
>      struct xsplice_symbol *symbol; /* The symbol associated with this change. */
>      uint64_t address; /* The address of the section (if known). */
>      uint64_t size; /* The size of the section. */
>      uint64_t flags; /* Various XSPLICE_SECTION_* flags. */
>      uint8_t pad[16]; /* Must be zero. */
> };
>
> 
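A minimal sketch of the "old code must match the runtime code" check for a section flagged `XSPLICE_SECTION_MATCH_EXACT`. The helper `check_old_code` and its result codes are hypothetical. How a non-exact match would be relaxed (for example around alternative assembler sites) is not defined by the text above, so the sketch reports that case as unsupported instead of guessing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSPLICE_SECTION_TEXT        0x00000001 /* Section is in .text */
#define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */

/* Hypothetical result codes for this sketch only. */
enum check_result {
    CHECK_OK          =  0, /* Runtime bytes equal the expected old code. */
    CHECK_MISMATCH    = -1, /* Bytes differ: refuse to patch. */
    CHECK_UNSUPPORTED = -2  /* Relaxed matching is not sketched here. */
};

/*
 * Compare 'size' bytes of the running code with the copy of the old code
 * carried in the payload.
 */
static int check_old_code(const uint8_t *runtime, const uint8_t *expected,
                          uint64_t size, uint64_t flags)
{
    assert(runtime != NULL && expected != NULL);

    if (!(flags & XSPLICE_SECTION_MATCH_EXACT))
        return CHECK_UNSUPPORTED;

    return memcmp(runtime, expected, (size_t)size) == 0 ? CHECK_OK
                                                        : CHECK_MISMATCH;
}
```

With the unpatched XSA-132 bytes from the earlier example (`48 89 e0 48 25 00 80 ff ff`) as the expected old code, a hypervisor that already carries the fix would have different bytes at that address and the check would fail before anything is written.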
>
> ### xsplice_patches
>
> Within this section we have an array of a structure defining the new code (patch).
>
> This structure consists of a pointer to the new code (which in ELF ends up
> pointing to an offset in `.text` or `.data` section); the type of patch:
> inline - either text or data, or requesting a trampoline; and size of patch.
>
> The structure is as follows:
>
> #define XSPLICE_PATCH_INLINE_TEXT   0
> #define XSPLICE_PATCH_INLINE_DATA   1
> #define XSPLICE_PATCH_RELOC_TEXT    2
>
> struct xsplice_patch {
>      uint32_t type; /* XSPLICE_PATCH_*. */
>      uint32_t size; /* Size of patch. */
>      uint64_t addr; /* The address of the new code (or data). */
>      void *content; /* The bytes to be installed. */
>      uint8_t pad[16]; /* Must be zero. */
> };
>
> 
>
> ### xsplice_code
>
> The structure embedded within this section ties it all together.
> It has the name of the patch, and pointers to all the above
> mentioned structures (the start and end addresses).
>
> The structure is as follows:
>
> struct xsplice_code {
>      const char *name; /* A sensible name for the patch. Up to 40 characters. */
>      struct xsplice_reloc *relocs, *relocs_end; /* How to patch it */
>      struct xsplice_section *sections, *sections_end; /* Safety data */
>      struct xsplice_patch *patches, *patches_end; /* Patch code & data */
>      uint8_t pad[32]; /* Must be zero. */
> };
> 
>
> There should only be one such structure in the section.
>
> ### Example
>
> *TODO*: Include an objdump of what the ELF would look like for the XSA
> mentioned earlier.
>
> ## Signature checking requirements.
>
> The signature checking requires that the layout of the data in memory
> **MUST** be the same for the signature to be verified. This means that the
> payload data layout in ELF format **MUST** match what the hypervisor would be
> expecting such that it can properly do signature verification.
>
> The signature is based on all of the payloads continuously laid out
> in memory. The signature is to be appended at the end of the ELF payload
> prefixed with the string '~Module signature appended~\n', followed by
> a signature header then followed by the signature, key identifier, and
> signer's name.
>
> Specifically the signature header would be:
>
> #define PKEY_ALGO_DSA       0
> #define PKEY_ALGO_RSA       1
>
> #define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
> #define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
>
> #define HASH_ALGO_MD4          0
> #define HASH_ALGO_MD5          1
> #define HASH_ALGO_SHA1         2
> #define HASH_ALGO_RIPE_MD_160  3
> #define HASH_ALGO_SHA256       4
> #define HASH_ALGO_SHA384       5
> #define HASH_ALGO_SHA512       6
> #define HASH_ALGO_SHA224       7
> #define HASH_ALGO_RIPE_MD_128  8
> #define HASH_ALGO_RIPE_MD_256  9
> #define HASH_ALGO_RIPE_MD_320 10
> #define HASH_ALGO_WP_256      11
> #define HASH_ALGO_WP_384      12
> #define HASH_ALGO_WP_512      13
> #define HASH_ALGO_TGR_128     14
> #define HASH_ALGO_TGR_160     15
> #define HASH_ALGO_TGR_192     16
>
>
> struct elf_payload_signature {
>     u8    algo;        /* Public-key crypto algorithm PKEY_ALGO_*. */
>     u8    hash;        /* Digest algorithm: HASH_ALGO_*. */
>     u8    id_type;    /* Key identifier type PKEY_ID*. */
>     u8    signer_len;    /* Length of signer's name */
>     u8    key_id_len;    /* Length of key identifier */
>     u8    __pad[3];
>     __be32    sig_len;    /* Length of signature data */
> };
>
> 
> (Note that this has been borrowed from Linux module signature code.). > > > ## Hypercalls > > We will employ the sub operations of the system management hypercall (sysctl). > There are to be four sub-operations: > > * upload the payloads. > * listing of payloads summary uploaded and their state. > * getting an particular payload summary and its state. > * command to apply, delete, or revert the payload. > > The patching is asynchronous therefore the caller is responsible > to verify that it has been applied properly by retrieving the summary of it > and verifying that there are no error codes associated with the payload. > > We **MUST** make it asynchronous due to the nature of patching: it requires > every physical CPU to be lock-step with each other. The patching mechanism > while an implementation detail, is not an short operation and as such > the design **MUST** assume it will be an long-running operation. > > Furthermore it is possible to have multiple different payloads for the same > function. As such an unique id has to be visible to allow proper manipulation. > > The hypercall is part of the `xen_sysctl`. The top level structure contains > one uint32_t to determine the sub-operations: > >
> struct xen_sysctl_xsplice_op {
>      uint32_t cmd;
>      union {
>          ... see below ...
>      } u;
> };
>
> 
> while the rest of the hypercall specific structures are part of this structure.
>
>
> ### XEN_SYSCTL_XSPLICE_UPLOAD (0)
>
> Upload a payload to the hypervisor. The payload is verified and if there
> are any issues the proper return code will be returned. The payload is
> not applied at this time - that is controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
>
> The caller provides:
>
>  * `id` unique id.
>  * `payload` the virtual address of where the ELF payload is.
>
> The return value is zero if the payload was successfully uploaded and the
> signature was verified. Otherwise an EXX return value is provided.
> Duplicate `id` are not supported.
>
> The `payload` is the ELF payload as mentioned in the `Payload format` section.
>
> The structure is as follows:
>
> struct xen_sysctl_xsplice_upload {
>      char id[40];  /* IN, name of the patch. */
>      uint64_t size; /* IN, size of the ELF file. */
>      XEN_GUEST_HANDLE_64(uint8) payload; /* ELF file. */
> };
> 
>
> ### XEN_SYSCTL_XSPLICE_GET (1)
>
> Retrieve a summary of a specific payload. The caller provides:
>
>  * `id` the unique id.
>  * `status` *MUST* be set to zero.
>  * `rc` *MUST* be set to zero.
>
> The `summary` structure contains a summary of the payload which includes:
>
>  * `id` the unique id.
>  * `status` - whether it has been:
>    1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>    2. *XSPLICE_STATUS_PROGRESS* (1) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command.
>    3. *XSPLICE_STATUS_CHECKED* (2) the ELF payload safety checks passed.
>    4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>    5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted.
>    6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details.
>  * `rc` - its error state if any.
>
> The structure is as follows:
>
> #define XSPLICE_STATUS_LOADED    0
> #define XSPLICE_STATUS_PROGRESS  1
> #define XSPLICE_STATUS_CHECKED   2
> #define XSPLICE_STATUS_APPLIED   3
> #define XSPLICE_STATUS_REVERTED  4
> #define XSPLICE_STATUS_IN_ERROR  5
>
> struct xen_sysctl_xsplice_summary {
>      char id[40];  /* IN/OUT, name of the patch. */
>      uint32_t status;   /* OUT */
>      int32_t rc;  /* OUT */
> };
> 
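A small sketch of two helpers a toolstack might want around the summary structure: one that turns `status` into a printable name and one that sanity-checks an `id` before it is sent. Both function names and `XSPLICE_ID_LEN` are hypothetical. The rule that an `id` must be non-empty and NUL-terminated within its 40 bytes is an assumption of this sketch; the design above only fixes the size of the field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSPLICE_STATUS_LOADED    0
#define XSPLICE_STATUS_PROGRESS  1
#define XSPLICE_STATUS_CHECKED   2
#define XSPLICE_STATUS_APPLIED   3
#define XSPLICE_STATUS_REVERTED  4
#define XSPLICE_STATUS_IN_ERROR  5

#define XSPLICE_ID_LEN 40 /* Size of the id[] field in the structures. */

/* Map a status value to a short name for diagnostics. */
static const char *xsplice_status_name(uint32_t status)
{
    static const char *const names[] = {
        [XSPLICE_STATUS_LOADED]   = "LOADED",
        [XSPLICE_STATUS_PROGRESS] = "PROGRESS",
        [XSPLICE_STATUS_CHECKED]  = "CHECKED",
        [XSPLICE_STATUS_APPLIED]  = "APPLIED",
        [XSPLICE_STATUS_REVERTED] = "REVERTED",
        [XSPLICE_STATUS_IN_ERROR] = "IN_ERROR",
    };

    if (status >= sizeof(names) / sizeof(names[0]))
        return "UNKNOWN";
    return names[status];
}

/* Assumed rule: non-empty and NUL-terminated within the 40-byte field. */
static int xsplice_id_valid(const char id[XSPLICE_ID_LEN])
{
    assert(id != NULL);
    return id[0] != '\0' && memchr(id, '\0', XSPLICE_ID_LEN) != NULL;
}
```

Rejecting an unterminated `id` on the caller side keeps later string handling of the field bounded, whichever rule the hypervisor ends up enforcing.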
> > ### XEN_SYSCTL_XSPLICE_LIST (2) > > Retrieve an array of abbreviated summary of payloads that are loaded in the > hypervisor. > > The caller provides: > > * `idx` index iterator. Initially it *MUST* be zero. > * `count` the max number of entries to populate. > * `summary` virtual address of where to write payload summaries. > > The hypercall returns zero on success and updates the `idx` (index) iterator > with the number of payloads returned, `count` to the number of remaining > payloads, and `summary` with an number of payload summaries. > > If the hypercall returns E2BIG the `count` is too big and should be > lowered. > > Note that due to the asynchronous nature of hypercalls the domain might have > added or removed the number of payloads making this information stale. It is > the responsibility of the domain to provide proper accounting. > > The `summary` structure contains an summary of payload which includes: > > * `id` unique id. > * `status` - whether it has been: > 1. *XSPLICE_STATUS_LOADED* (0) has been loaded. > 2. *XSPLICE_STATUS_PROGRESS* (1) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command. > 3. *XSPLICE_STATUS_CHECKED* (2) the payload `old` and `addr` match with the hypervisor. > 4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied. > 5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted. > 6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details. > * `rc` - its error state if any. > > The structure is as follow: > >
> struct xen_sysctl_xsplice_list {
>      uint32_t idx;  /* IN/OUT */
>      uint32_t count;  /* IN/OUT */
>      XEN_GUEST_HANDLE_64(xen_sysctl_xsplice_summary) summary;  /* OUT */
> };
>
> struct xen_sysctl_xsplice_summary {
>      char id[40];  /* OUT, name of the patch. */
>      uint32_t status;   /* OUT */
>      int32_t rc;  /* OUT */
> };
>
> 
> ### XEN_SYSCTL_XSPLICE_ACTION (3)
>
> Perform an operation on the payload structure referenced by the `id` field.
> The operation request is asynchronous and the status should be retrieved
> by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
>
> The caller provides:
>
>  * `id` the unique id.
>  * `cmd` the command requested:
>    1. *XSPLICE_ACTION_CHECK* (0) check that the payload will apply properly.
>    2. *XSPLICE_ACTION_UNLOAD* (1) unload the payload.
>    3. *XSPLICE_ACTION_REVERT* (2) revert the payload.
>    4. *XSPLICE_ACTION_APPLY* (3) apply the payload.
>
>
> The return value will be zero unless the provided fields are incorrect.
>
> The structure is as follows:
>
> #define XSPLICE_ACTION_CHECK  0
> #define XSPLICE_ACTION_UNLOAD 1
> #define XSPLICE_ACTION_REVERT 2
> #define XSPLICE_ACTION_APPLY  3
>
> struct xen_sysctl_xsplice_action {
>      char id[40];  /* IN, name of the patch. */
>      uint32_t cmd; /* IN */
> };
>
> 
>
> ## Sequence of events.
>
> The normal sequence of events is to:
>
>  1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
>  2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_LOADED* go to next step.
>  3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to verify that the payload can be successfully applied.
>  4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_CHECKED* go to next step.
>  5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
>  6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_APPLIED* exit with success.
>
>
> ## Addendum
>
> Implementation quirks should not be discussed in a design document.
>
> However these observations can provide aid when developing against this
> document.
>
>
> ### Alternative assembler
>
> Alternative assembler is a mechanism to use different instructions depending
> on what the CPU supports. This is done by providing multiple streams of code
> that can be patched in - or if the CPU does not support it - padded with
> `nop` operations. The alternative assembler macros cause the compiler to
> expand the code to place the most generic code in place - and emit a special
> ELF .section header to tag this location. During run-time the hypervisor
> can leave the areas alone or patch them with better suited opcodes.
>
> As we might be patching the alternative assembler sections as well - by
> providing new better suited op-codes or perhaps with nops - we need to
> also re-run the alternative assembler patching after we have done our
> patching.
>
> Also when we are doing safety checks the code we are checking might be
> utilizing alternative assembler. As such we should relax our checks to
> accommodate that.
>
> ### .rodata sections
>
> The patching might require strings to be updated as well. As such we must be
> also able to patch the strings as needed. This sounds simple - but the compiler
> has a habit of coalescing strings that are the same - which means if we in-place
> alter the strings - other users will be inadvertently affected as well.
>
> This is also where pointers to functions live - and we may need to patch this
> as well.
>
> To guard against that we must be prepared to do patching similar to
> trampoline patching or in-line depending on the flavour. If we can
> do in-line patching we would need to:
>
>  * alter `.rodata` to be writeable.
>  * inline patch.
>  * alter `.rodata` to be read-only.
>
> If we are doing trampoline patching we would need to:
>
>  * allocate a new memory location for the string.
>  * all locations which use this string will have to be updated to use the
>    offset to the string.
>  * mark the region RO when we are done.
>
> ### .bss sections
>
> Patching writable data is not suitable as it is unclear what should be done
> depending on the current state of data. As such it should not be attempted.
>
>
> ### Patching code which is in the stack.
>
> We should not patch the code which is on the stack. That can lead
> to corruption.
>
> ### Trampoline (e9 opcode)
>
> The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
> we are limited to up to 2GB of virtual address to place the new code
> from the old code. That should not be a problem since Xen hypervisor has
> a very small footprint.
>
> However if we need - we can always add two trampolines. One at the 2GB
> limit that calls the next trampoline.
>
> ### Time rendezvous code instead of stop_machine for patching
>
> The hypervisor's time rendezvous code runs synchronously across all CPUs
> every second. Using the stop_machine to patch can stall the time rendezvous
> code and result in NMI.
As such having the patching be done at the tail > of rendezvous code should avoid this problem. > > ### Security > > Only the privileged domain should be allowed to do this operation. > > . >