Re: [RFC v2] xSplice design


From: Lars Kurth <lars.kurth@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"msw@amazon.com" <msw@amazon.com>,
	"aliguori@amazon.com" <aliguori@amazon.com>,
	Antony Messerli <amesserl@rackspace.com>,
	Rick Harris <rick.harris@rackspace.com>,
	Paul Voccio <paul.voccio@rackspace.com>,
	Steven Wilson <steven.wilson@rackspace.com>,
	Major Hayden <major.hayden@rackspace.com>,
	Josh Kearney <josh.kearney@rackspace.com>,
	"jinsong.liu@alibaba-inc.com" <jinsong.liu@alibaba-inc.com>,
	"xiantao.zxt@alibaba-inc.com" <xiantao.zxt@alibaba-inc.com>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	Daniel Kiper <daniel.kiper@oracle.com>,
	Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	"bob.liu@oracle.com" <bob.liu@oracle.com>,
	"hanweidong@huawei.com" <hanweidong@huawei.com>,
	"peter.huangpeng@huawei.com" <peter.huangpeng@huawei.com>,
	"fanhenglong@huawei.com" <fanhenglong@huawei.com>,
	liuyingdong@huawei.com
Cc: "konrad@darnok.org" <konrad@darnok.org>
Subject: Re: [RFC v2] xSplice design
Date: Tue, 19 May 2015 19:13:03 +0000
Message-ID: <D180D933.1C0D0%lars.kurth@citrix.com>
In-Reply-To: <20150515194440.GA24313@l.oracle.com>


Adding Don Slutz as he requested to be added
Lars

On 15/05/2015 20:44, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
wrote:

>Hey!
>
>During the Xen Hacka^H^H^H^HProject Summit? we chatted about live-patching
>the hypervisor. We sketched out how it could be done, and brainstormed
>some of the problems.
>
>I took that and wrote a design - which is very much RFC. The design is
>laid out in two sections - the format of the ELF payload - and then the
>hypercalls to act on it.
>
>Hypercall preemption has caused a couple of XSAs so I've baked the need
>for that in the design so we hopefully won't have an XSA for this code.
>
>There are two big *TODO*s in the design which I had hoped to get done
>before sending this out - however I am going on vacation for two weeks
>so I figured it would be better to send this off for folks to mull now
>than to have it languish.
>
>Please feel free to add more folks on the CC list.
>
>Enjoy!
>
>
># xSplice Design v1 (EXTERNAL RFC v2)
>
>## Rationale
>
>A mechanism is required to binarily patch the running hypervisor with new
>opcodes that have come about primarily due to security updates.
>
>This document describes the design of the API that would allow us to
>upload binary patches to the hypervisor.
>
>## Glossary
>
> * splice - patch in the binary code with new opcodes
> * trampoline - a jump to a new instruction.
> * payload - telemetries of the old code along with binary blob of the new
>   function (if needed).
> * reloc - telemetries contained in the payload to construct proper
>   trampoline.
>
>## Multiple ways to patch
>
>The mechanism needs to be flexible enough to patch the hypervisor in
>multiple ways and be as simple as possible. The compiled code is contiguous
>in memory with no gaps - so we have no luxury of 'moving' existing code and
>must either insert a trampoline to the new code to be executed - or only
>modify the code in-place if there is sufficient space. The placement of new
>code has to be done by the hypervisor and the virtual address for the new
>code is allocated dynamically.
>
>This implies that the hypervisor must compute the new offsets when splicing
>in the new trampoline code. Where the trampoline is added (inside the
>function we are patching or just the callers?) is also important.
>
>To lessen the amount of code in the hypervisor, the consumer of the API is
>responsible for identifying which mechanism to employ and how many locations
>to patch. Combinations of modifying code in-place, adding trampolines, etc.
>have to be supported. The API should allow reading and writing any memory
>within the hypervisor virtual address space.
>
>We must also have a mechanism to query what has been applied and a
>mechanism to revert it if needed.
>
>We must also have a mechanism to provide: a copy of the old code - so that
>the hypervisor can verify it against the code in memory; the new code; and
>the location to patch, given as the symbol name of the function to be
>patched, as an offset from the symbol, or as a virtual address.
>
>The complications that this design will encounter are explained later
>in this document.
>
>## Patching code
>
>The first mechanism to patch that comes to mind is in-place replacement,
>that is, replacing the affected code with new code. Unfortunately the x86
>ISA has variable-length instructions, which limits how much space we have
>available to replace the instructions.
>
>The second mechanism is to replace the call or jump to the old function
>with the address of the new function.
>
>A third mechanism is to add a jump to the new function at the start of the
>old function.
>
>### Example of trampoline and in-place splicing
>
>As an example we will assume the hypervisor does not have XSA-132 (see
>*domctl/sysctl: don't leak hypervisor stack to toolstacks*
>4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
>the hypervisor with it. The original code looks like this:
>
><pre>
>   48 89 e0                  mov    %rsp,%rax
>   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
></pre>
>
>while the new patched hypervisor would be:
>
><pre>
>   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
>   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
>   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
>   48 89 e0                  mov    %rsp,%rax
>   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
></pre>
>
>This is inside `arch_do_domctl`. This change adds 24 extra bytes of code
>(three 8-byte `movq` instructions), which alters all the offsets inside the
>function. To alter these offsets and add the extra 24 bytes of code, we
>might not have enough space in .text to squeeze this in.
>
>As such we could simplify this problem by only patching the site
>which calls arch_do_domctl:
>
><pre>
><do_domctl>:  
> e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>
></pre>
>
>with a new address for where the new `arch_do_domctl` would be (this
>area would be allocated dynamically).
>
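To make the call-site rewrite concrete: an `e8` (`callq`) instruction encodes a signed 32-bit displacement relative to the address of the next instruction, so retargeting it means computing `new_target - (site + 5)` and checking that it fits. The sketch below is illustrative only; `call_target` and `retarget_call` are hypothetical helpers, and the call-site address used in the usage note (0xffff82d080104a69) is derived from the objdump bytes above (`0xffff82d08015fbb9 - 0x5b14b - 5`) rather than taken from a real build.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_REL32_OPCODE 0xe8
#define CALL_REL32_LEN    5

/* Read the little-endian rel32 operand that follows the opcode byte. */
static int32_t read_rel32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)v;
}

/* Absolute target of the call located at virtual address 'site'. */
static uint64_t call_target(const uint8_t insn[CALL_REL32_LEN], uint64_t site)
{
    return site + CALL_REL32_LEN + (int64_t)read_rel32(insn + 1);
}

/*
 * Rewrite the call at 'site' so it calls 'new_target', but only if it
 * currently calls 'old_target' (verify the old code before patching).
 * Returns 0 on success, -1 on mismatch or if new_target is out of reach.
 */
static int retarget_call(uint8_t insn[CALL_REL32_LEN], uint64_t site,
                         uint64_t old_target, uint64_t new_target)
{
    int64_t disp;
    uint32_t u;

    if (insn[0] != CALL_REL32_OPCODE || call_target(insn, site) != old_target)
        return -1;

    /* Displacement is relative to the end of the 5-byte instruction. */
    disp = (int64_t)(new_target - (site + CALL_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    u = (uint32_t)disp;
    for (int i = 0; i < 4; i++)
        insn[1 + i] = (uint8_t)(u >> (8 * i));
    return 0;
}
```

For example, with the bytes `e8 4b b1 05 00` at 0xffff82d080104a69, `call_target` yields 0xffff82d08015fbb9 (`arch_do_domctl`), and `retarget_call` refuses to patch a site that no longer calls the expected old function.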
>Astute readers will wonder what we need to do if we were to patch
>`do_domctl` - which is not called directly by the hypervisor but on behalf
>of the guests via the `compat_hypercall_table` and `hypercall_table`.
>Patching the offset in `hypercall_table` for `do_domctl`
>(ffff82d080103079 <do_domctl>:)
><pre>
>
> ffff82d08024d490:   79 30
> ffff82d08024d492:   10 80 d0 82 ff ff
>
></pre>
>with the new address where the new `do_domctl` is possible. The other place
>where it is used is in `hvm_hypercall64_table` which would need to be
>patched in a similar way. This would require an in-place splicing of the new
>virtual address of `do_domctl`.
>
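Each slot in `hypercall_table` is a little-endian 64-bit pointer (the bytes `79 30 10 80 d0 82 ff ff` above decode to 0xffff82d080103079), so the in-place splice boils down to checking the old value and storing the new one. A minimal sketch, assuming the caller has already made the `.rodata` page writable; `splice_table_slot` is a hypothetical name and the new address in the usage checks is made up. A real implementation would presumably want a single aligned 8-byte store so that other CPUs never observe a half-written pointer; the byte loop here is only for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 64-bit value, as stored in hypercall_table. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Encode a 64-bit value in little-endian byte order. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/*
 * Replace the function pointer in 'slot' with 'new_fn', but only if it still
 * points at 'old_fn'. Returns 0 on success, -1 if the slot holds something
 * else. The page is assumed to have been made writable beforehand.
 */
static int splice_table_slot(uint8_t slot[8], uint64_t old_fn, uint64_t new_fn)
{
    if (load_le64(slot) != old_fn)
        return -1;
    store_le64(slot, new_fn);
    return 0;
}
```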
>In summary this example patched the callers of the affected function by
> * allocating memory for the new code to live in,
> * changing the virtual address of all the functions which called the old
>   code (computing the new offset, patching the callq with a new callq),
> * changing the function pointer tables with the new virtual address of
>   the function (splicing in the new virtual address). Since this table
>   resides in the .rodata section we would need to temporarily change the
>   page table permissions during this part.
>
>
>However this approach has a severe drawback - the safety checks, which have
>to make sure the function is not on the stack, must also check every
>caller. For some patches there could be such a large number of callers that
>we would never be able to apply the update.
>
>### Example of different trampoline patching.
>
>An alternative mechanism exists where we can insert a trampoline in the
>existing function to be patched to jump directly to the new code. This
>lessens the number of locations to be patched to one, but it puts pressure
>on the CPU branching logic (I-cache, but it is just one unconditional jump).
>
>For this example we will assume that the hypervisor has not been compiled
>with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
>for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure in
>the `xen_version` hypercall. This function is not called **anywhere** in the
>hypervisor (it is called by the guest) but is referenced in the
>`compat_hypercall_table` and `hypercall_table` (and indirectly called from
>there). Patching the offset in `hypercall_table` for the old
>`do_xen_version` (ffff82d080112f9e <do_xen_version>)
>
><pre>
> ffff82d08024b270 <hypercall_table>
> ...  
> ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff
>
></pre>
>with the new address where the new `do_xen_version` is possible. The other
>place where it is used is in `hvm_hypercall64_table` which would need
>to be patched in a similar way. This would require an in-place splicing
>of the new virtual address of `do_xen_version`.
>
>An alternative solution would be to insert a trampoline in the old
>`do_xen_version` function to directly jump to the new `do_xen_version`.
>
><pre>
> ffff82d080112f9e <do_xen_version>:
> ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
> ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi
> ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 <do_xen_version+0x534>
></pre>
>
>with:
>
><pre>
> ffff82d080112f9e <do_xen_version>:
> ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]
></pre>
>
>which would lessen the amount of patching to just one location.
>
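The `XX YY ZZ QQ` placeholder is the little-endian rel32 displacement from the end of the 5-byte `jmpq` to the new function. A minimal sketch of building that trampoline; `make_jmp_trampoline` is a hypothetical helper, and the new function address in the usage note (0xffff82d080400000) is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_OPCODE 0xe9
#define JMP_REL32_LEN    5

/*
 * Fill 'buf' with "jmpq new_fn" for an instruction placed at 'site'.
 * Returns 0 on success, or -1 if new_fn is not reachable with a signed
 * 32-bit displacement.
 */
static int make_jmp_trampoline(uint8_t buf[JMP_REL32_LEN], uint64_t site,
                               uint64_t new_fn)
{
    /* Displacement is relative to the address of the next instruction. */
    int64_t disp = (int64_t)(new_fn - (site + JMP_REL32_LEN));
    uint32_t u;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    u = (uint32_t)disp;
    buf[0] = JMP_REL32_OPCODE;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(u >> (8 * i));
    return 0;
}
```

With `site` = 0xffff82d080112f9e (the old `do_xen_version`) and the new code at 0xffff82d080400000, the displacement is 0x2ed05d, giving `e9 5d d0 2e 00`.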
>In summary this example patched the affected function to jump to the new
>replacement function, which required:
> * allocating memory for the new code to live in,
> * inserting a trampoline with the new offset in the old function to point
>   to the new function,
> * optionally, inserting in the old function a trampoline jump to a
>   function providing a BUG_ON to catch errant code.
>
>The disadvantage of this is that the unconditional jump incurs a small
>I-cache penalty. However, the simplicity of the patching and of the safety
>checks makes this a worthwhile option.
>
>### Security
>
>With this method we can re-write the hypervisor - and as such we **MUST** be
>diligent in only allowing certain guests to perform this operation.
>
>Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
>of the payload to be certain it came from a trusted source.
>
>As such the hypercall **MUST** support an XSM policy to limit which guests
>are allowed to use it. If the system is booted with signature checking, the
>signature checking will be enforced.
>
>## Payload format
>
>The payload **MUST** contain enough data to allow us to apply the update
>and also safely reverse it. As such we **MUST** know:
>
> * What the old code is expected to be. We **MUST** verify it against the
>   runtime code.
> * The locations in memory to be patched. These can be determined
>   dynamically via symbols or via virtual addresses.
> * The new code to be used.
> * Signature to verify the payload.
>
>The payload could be constructed using a custom binary format, but that
>has severe disadvantages, as such a format:
>
> * might need to change, and we would need a mechanism to accommodate that,
> * has to be platform agnostic,
> * has to be easily constructed using existing tools.
>
>As such having the payload in an ELF file is the sensible way. The various
>structures (and data) would be carried in ELF sections, each with its own
>name and definition. The prefix for the ELF section name would always be:
>*.xsplice_*
>
>Note that every structure has padding. This is added so that the hypervisor
>can re-use those fields as it sees fit.
>
>The *.xsplice_* sections are:
>
> * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be
>   referenced during the update. This can contain the symbols (functions)
>   that will be patched, or the list of symbols (functions) to be checked
>   pre-patching which may not be on the stack.
>
> * `.xsplice_reloc` and `.xsplice_reloc_howto`. How to properly construct
>   trampolines for a patch. We can have multiple locations for which we need
>   to insert a trampoline for a payload and each location might require a
>   different way of handling it. This would naturally reference the `.text`
>   section and its proper offset. The `.xsplice_reloc` is not directly
>   concerned with patches but rather is an ELF relocation - describing the
>   target of a relocation and how that is performed. They are also used
>   where the new code references the running code.
>
> * `.xsplice_sections`. The safety data for the old code and new code.
>   This contains an array of symbols (pointing to `.xsplice_symbols` and
>   `.text`) which are to be used during safety and dependency checking.
>
>
> * `.xsplice_patches`: The description of the new functions to be patched
>   in (size, type, pointer to code, etc.).
>
> * `.xsplice_change`. The structure that ties all of this together and
>   defines the payload.
>
>Additionally the ELF file would contain:
>
> * `.text` section for the new and old code (function).
> * `.rela.text` relocation data for the `.text` (both new and old).
> * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as
>   offsets to the `.text`, `.xsplice_symbols`, or `.xsplice_reloc` section).
> * `.bss` section for the new code (function).
> * `.data` and `.data.read_mostly` sections for the new and old code
>   (function).
> * `.rodata` section for the new and old code (function).
>
>In short the *.xsplice_* sections represent various structures and the
>ELF provides the mechanism to glue it all together when loaded in memory.
>
>Note that a lot of these ideas are borrowed from kSplice which is
>available at: https://github.com/jirislaby/ksplice
>
>For ELF understanding the best starting point is the OSDev Wiki
>(http://wiki.osdev.org/ELF). Furthermore the ELF specification is
>at http://www.skyfree.org/linux/references/ELF_Format.pdf and
>at Oracle's web site:
>http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc
>
>### ASCII art of the ELF structures
>
>*TODO*: Include an ASCII art of how the sections are tied together.
>
>### xsplice_symbols
>
>The section contains an array of structures, each of which outlines the
>name of the symbol to be patched (or checked against). The structure is as
>follows:
>
><pre>
>struct xsplice_symbol {
>    const char *name; /* The ELF name of the symbol. */
>    const char *label; /* A unique xSplice name for the symbol. */
>    uint8_t pad[16]; /* Must be zero. */
>};  
></pre>
>The structures may be in the section in any order and in any amount
>(duplicate entries are permitted).
>
>Both `name` and `label` would be pointing to entries in `.xsplice_str`.
>
>The `label` is used for diagnostic purposes - such as including the
>name and the offset.
>
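When a relocation or section refers to an `xsplice_symbol` without a known address, the loader has to resolve `name` against the running hypervisor. A minimal sketch, assuming a hypothetical flat `hv_symtab` array standing in for the hypervisor's symbol table (populated with addresses from the objdump excerpts earlier in this document); `resolve_symbol` and `resolve_all` are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xsplice_symbol {
    const char *name;  /* The ELF name of the symbol. */
    const char *label; /* A unique xSplice name for the symbol. */
    uint8_t pad[16];   /* Must be zero. */
};

/* Hypothetical stand-in for the hypervisor's symbol table. */
struct sym_entry {
    const char *name;
    uint64_t addr;
};

static const struct sym_entry hv_symtab[] = {
    { "do_domctl",      0xffff82d080103079ULL },
    { "do_xen_version", 0xffff82d080112f9eULL },
    { "arch_do_domctl", 0xffff82d08015fbb9ULL },
};

/* Returns the address of 'name', or 0 if the symbol is unknown. */
static uint64_t resolve_symbol(const char *name)
{
    for (size_t i = 0; i < sizeof(hv_symtab) / sizeof(hv_symtab[0]); i++)
        if (strcmp(hv_symtab[i].name, name) == 0)
            return hv_symtab[i].addr;
    return 0;
}

/* Resolve every entry into 'out'; fail on unknown names or non-zero pad. */
static int resolve_all(const struct xsplice_symbol *syms, size_t n,
                       uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < sizeof(syms[i].pad); j++)
            if (syms[i].pad[j])
                return -1;
        out[i] = resolve_symbol(syms[i].name);
        if (!out[i])
            return -1;
    }
    return 0;
}
```

Duplicate entries simply resolve to the same address, which is consistent with duplicates being permitted.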
>### xsplice_reloc and xsplice_reloc_howto
>
>The section contains an array of a structure that outlines the different
>locations (and howto) for which a trampoline is to be inserted.
>
>The howto defines the change in detail. It contains the type, whether the
>relocation is relative, the size of the relocation, a bitmask for which
>parts of the instruction or data are to be replaced, the amount the final
>relocation is shifted by (to drop unwanted data), and whether the
>replacement should be interpreted as a signed value.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */
>#define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */
>#define XSPLICE_HOWTO_RELOC_DATA    2 /* __DATE__ type change. */
>#define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type change. */
>#define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced. */
>#define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */
>#define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */
>
>#define XSPLICE_HOWTO_FLAG_PC_REL    0x00000001 /* Is PC relative. */
>#define XSPLICE_HOWTO_FLAG_SIGN      0x00000002 /* Should the new value be
>                                                   treated as signed. */
>
>struct xsplice_reloc_howto {
>    uint32_t    type; /* XSPLICE_HOWTO_* */
>    uint32_t    flag; /* XSPLICE_HOWTO_FLAG_* */
>    uint32_t    size; /* Size, in bytes, of the item to be relocated. */
>    uint32_t    r_shift; /* The value the final relocation is shifted
>                            right by; used to drop unwanted data from the
>                            relocation. */
>    uint64_t    mask; /* Bitmask for which parts of the instruction or
>                         data are replaced with the relocated value. */
>    uint8_t     pad[8]; /* Must be zero. */
>};
>
></pre>
>
>This structure is used in:
>
><pre>
>struct xsplice_reloc {
>    uint64_t addr; /* The address of the relocation (if known). */
>    struct xsplice_symbol *symbol; /* Symbol for this relocation. */
>    struct xsplice_reloc_howto *howto; /* Pointer to the above structure. */
>    int64_t isns_added; /* ELF addend resulting from quirks of the
>                           instruction one of whose operands is the
>                           relocation. For example, this is -4 on x86
>                           pc-relative jumps. */
>    int64_t isns_target; /* Rest of the ELF addend. This is equal to the
>                            offset against the symbol that the relocation
>                            refers to. */
>    uint8_t pad[8];  /* Must be zero. */
>};
></pre>
>
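To illustrate how the addend fields combine: for an x86 PC-relative field the value written is `S + A - P`, where `S` is the resolved symbol address, `A` is the full ELF addend (`isns_added + isns_target`) and `P` is the address of the field being patched. For the `callq` in `do_domctl` shown earlier, `P` is one byte past the `e8` opcode and `isns_added` is -4, which gives the same displacement as `target - (site + 5)`. A minimal sketch that only handles 4-byte fields; `struct howto` is a trimmed-down, hypothetical copy of `xsplice_reloc_howto`, `apply_reloc` is illustrative, and the call-site address in the checks is derived from the objdump bytes rather than a real build:

```c
#include <assert.h>
#include <stdint.h>

#define XSPLICE_HOWTO_FLAG_PC_REL 0x00000001
#define XSPLICE_HOWTO_FLAG_SIGN   0x00000002

/* Only the howto fields this sketch needs. */
struct howto {
    uint32_t flag;
    uint32_t size;    /* Only 4-byte fields are handled here. */
    uint32_t r_shift;
    uint64_t mask;
};

/*
 * Apply one relocation to the field at 'loc', whose runtime address is 'P'.
 * S = resolved symbol address, A = isns_added + isns_target.
 * Returns 0 on success, -1 on an unsupported size or on overflow.
 */
static int apply_reloc(uint8_t *loc, uint64_t P, const struct howto *h,
                       uint64_t S, int64_t A)
{
    uint64_t val = S + (uint64_t)A;
    uint32_t old, upd;

    if (h->size != 4)
        return -1;
    if (h->flag & XSPLICE_HOWTO_FLAG_PC_REL)
        val -= P;

    if (h->flag & XSPLICE_HOWTO_FLAG_SIGN) {
        /* Arithmetic shift of the signed value (as on common compilers). */
        int64_t s = (int64_t)val >> h->r_shift;

        if (s < INT32_MIN || s > INT32_MAX)
            return -1;
        val = (uint64_t)s;
    } else {
        val >>= h->r_shift;
        if (val > UINT32_MAX)
            return -1;
    }

    /* Merge the relocated value into the bits selected by 'mask'. */
    old = (uint32_t)loc[0] | (uint32_t)loc[1] << 8 |
          (uint32_t)loc[2] << 16 | (uint32_t)loc[3] << 24;
    upd = (old & ~(uint32_t)h->mask) | ((uint32_t)val & (uint32_t)h->mask);
    for (int i = 0; i < 4; i++)
        loc[i] = (uint8_t)(upd >> (8 * i));
    return 0;
}
```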
>### xsplice_sections
>
>The structure defined in this section is used to verify that it is safe
>to update with the new changes. It can contain safety data on the old code
>and what kind of matching we are to expect.
>
>It can also contain safety data on what to check when about to patch, that
>is, whether any of the addresses (either provided or resolved when the
>payload is loaded by referencing the symbols) contain in memory what we
>expect them to.
>
>As such the flags can be or-ed together:
>
><pre>
>#define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */
>#define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */
>#define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .data */
>#define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */
>#define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has
>                                                      .altinstructions. */
>#define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */
>#define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */
>#define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the
>                                                    stack. */
>
>struct xsplice_section {
>    struct xsplice_symbol *symbol; /* The symbol associated with this
>                                      change. */
>    uint64_t address; /* The address of the section (if known). */
>    uint64_t size; /* The size of the section. */
>    uint64_t flags; /* Various XSPLICE_SECTION_* flags. */
>    uint8_t pad[16]; /* Must be zero. */
>};
>
></pre>
>
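As a sketch of how `XSPLICE_SECTION_MATCH_EXACT` and `XSPLICE_SECTION_NO_STACKCHECK` might drive the pre-patch check: compare the runtime bytes against the expected old code, then, unless told otherwise, ask a stack walker whether the range is in use. The `check_section` helper, the `old_bytes` argument and the `on_stack_fn` callback (with its two trivial stand-ins) are hypothetical; the structure above does not specify where the expected bytes come from. The usage checks use the pre-XSA-132 bytes shown earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XSPLICE_SECTION_MATCH_EXACT   0x00000400
#define XSPLICE_SECTION_NO_STACKCHECK 0x00000800

/* Hypothetical hook: is any return address on any stack in [addr, addr+size)? */
typedef bool (*on_stack_fn)(uint64_t addr, uint64_t size);

/* Trivial stand-ins for a real stack walker, for illustration only. */
static bool never_on_stack(uint64_t addr, uint64_t size)
{
    (void)addr; (void)size;
    return false;
}

static bool always_on_stack(uint64_t addr, uint64_t size)
{
    (void)addr; (void)size;
    return true;
}

/*
 * Returns 0 if the section may be patched, -EINVAL if the runtime bytes
 * differ from the expected old code, -EBUSY if the code is on a stack.
 */
static int check_section(uint64_t addr, uint64_t size, uint64_t flags,
                         const uint8_t *runtime, const uint8_t *old_bytes,
                         on_stack_fn on_stack)
{
    if ((flags & XSPLICE_SECTION_MATCH_EXACT) &&
        memcmp(runtime, old_bytes, size) != 0)
        return -EINVAL;
    if (!(flags & XSPLICE_SECTION_NO_STACKCHECK) && on_stack(addr, size))
        return -EBUSY;
    return 0;
}
```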
>### xsplice_patches
>
>Within this section we have an array of a structure defining the new code
>(patch).
>
>This structure consists of a pointer to the new code (which in ELF ends up
>pointing to an offset in the `.text` or `.data` section); the type of patch:
>inline - either text or data, or requesting a trampoline; and the size of
>the patch.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_PATCH_INLINE_TEXT   0
>#define XSPLICE_PATCH_INLINE_DATA   1
>#define XSPLICE_PATCH_RELOC_TEXT    2
>
>struct xsplice_patch {
>    uint32_t type; /* XSPLICE_PATCH_* */
>    uint32_t size; /* Size of patch. */
>    uint64_t addr; /* The address of the new code (or data). */
>    void *content; /* The bytes to be installed. */
>    uint8_t pad[16]; /* Must be zero. */
>};
>
></pre>
>
>### xsplice_code
>
>The structure embedded within this section ties it all together.
>It has the name of the patch, and pointers to all the above
>mentioned structures (the start and end addresses).
>
>The structure is as follows:
>
><pre>
>struct xsplice_code {
>    const char *name; /* A sensible name for the patch. Up to 40
>                         characters. */
>    struct xsplice_reloc *relocs, *relocs_end; /* How to patch it */
>    struct xsplice_section *sections, *sections_end; /* Safety data */
>    struct xsplice_patch *patches, *patches_end; /* Patch code & data */
>    uint8_t pad[32]; /* Must be zero. */
>};
></pre>
>
>There should only be one such structure in the section.
>
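Since every structure carries padding that **MUST** be zero and `xsplice_code` describes each array as a start/end pair, a loader can sanity-check the arrays before acting on them. A minimal sketch for the patch array; `validate_patches` is a hypothetical helper, and rejecting a zero `size` or a NULL `content` is an assumption rather than a stated rule:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define XSPLICE_PATCH_INLINE_TEXT 0
#define XSPLICE_PATCH_INLINE_DATA 1
#define XSPLICE_PATCH_RELOC_TEXT  2

struct xsplice_patch {
    uint32_t type;   /* XSPLICE_PATCH_* */
    uint32_t size;   /* Size of patch. */
    uint64_t addr;   /* The address of the new code (or data). */
    void *content;   /* The bytes to be installed. */
    uint8_t pad[16]; /* Must be zero. */
};

/* Check every entry in [start, end): known type, content present, zero pad. */
static int validate_patches(const struct xsplice_patch *start,
                            const struct xsplice_patch *end)
{
    if (end < start)
        return -EINVAL;
    for (const struct xsplice_patch *p = start; p != end; p++) {
        if (p->type > XSPLICE_PATCH_RELOC_TEXT || !p->size || !p->content)
            return -EINVAL;
        for (size_t i = 0; i < sizeof(p->pad); i++)
            if (p->pad[i])
                return -EINVAL;
    }
    return 0;
}
```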
>### Example
>
>*TODO*: Include an objdump of how the ELF would look like for the XSA
>mentioned earlier.
>
>## Signature checking requirements.
>
>The signature checking requires that the layout of the data in memory
>**MUST** be the same for the signature to be verified. This means that the
>payload data layout in ELF format **MUST** match what the hypervisor would
>be expecting, such that it can properly do signature verification.
>
>The signature is computed over the whole payload as laid out contiguously
>in memory. The signature is to be appended at the end of the ELF payload,
>prefixed with the string "~Module signature appended~\n", followed by a
>signature header and then by the signature, key identifier, and signer's
>name.
>
>Specifically the signature header would be:
>
><pre>
>#define PKEY_ALGO_DSA       0
>#define PKEY_ALGO_RSA       1
>
>#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
>#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
>
>#define HASH_ALGO_MD4          0
>#define HASH_ALGO_MD5          1
>#define HASH_ALGO_SHA1         2
>#define HASH_ALGO_RIPE_MD_160  3
>#define HASH_ALGO_SHA256       4
>#define HASH_ALGO_SHA384       5
>#define HASH_ALGO_SHA512       6
>#define HASH_ALGO_SHA224       7
>#define HASH_ALGO_RIPE_MD_128  8
>#define HASH_ALGO_RIPE_MD_256  9
>#define HASH_ALGO_RIPE_MD_320 10
>#define HASH_ALGO_WP_256      11
>#define HASH_ALGO_WP_384      12
>#define HASH_ALGO_WP_512      13
>#define HASH_ALGO_TGR_128     14
>#define HASH_ALGO_TGR_160     15
>#define HASH_ALGO_TGR_192     16
>
>
>struct elf_payload_signature {
>	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */
>	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */
>	u8	id_type;	/* Key identifier type PKEY_ID*. */
>	u8	signer_len;	/* Length of signer's name */
>	u8	key_id_len;	/* Length of key identifier */
>	u8	__pad[3];  
>	__be32	sig_len;	/* Length of signature data */
>};
>
></pre>
>(Note that this has been borrowed from Linux module signature code.).
>
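A minimal sketch of locating the pieces of the signature trailer, assuming the byte layout exactly as described above (ELF image, then the magic string, then the 12-byte header, then signature, key identifier and signer's name) and that the loader already knows where the ELF image ends. `parse_sig_trailer` and `struct sig_view` are hypothetical, and no cryptographic verification is shown:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_MAGIC     "~Module signature appended~\n"
#define SIG_MAGIC_LEN (sizeof(SIG_MAGIC) - 1)
/* algo, hash, id_type, signer_len, key_id_len, __pad[3], __be32 sig_len */
#define SIG_HDR_LEN   12

struct sig_view {
    const uint8_t *sig, *key_id, *signer;
    uint32_t sig_len;
    uint8_t algo, hash, key_id_len, signer_len;
};

/*
 * Locate the signature pieces that follow an ELF image of 'elf_len' bytes
 * in a buffer of 'len' bytes. Returns 0 and fills 'v', or -EINVAL.
 */
static int parse_sig_trailer(const uint8_t *buf, size_t len, size_t elf_len,
                             struct sig_view *v)
{
    const uint8_t *h;
    size_t need;

    if (elf_len > len || len - elf_len < SIG_MAGIC_LEN + SIG_HDR_LEN ||
        memcmp(buf + elf_len, SIG_MAGIC, SIG_MAGIC_LEN) != 0)
        return -EINVAL;

    h = buf + elf_len + SIG_MAGIC_LEN;
    v->algo = h[0];
    v->hash = h[1];
    v->signer_len = h[3];
    v->key_id_len = h[4];
    /* sig_len is big-endian (__be32) at offset 8 of the header. */
    v->sig_len = (uint32_t)h[8] << 24 | (uint32_t)h[9] << 16 |
                 (uint32_t)h[10] << 8 | h[11];

    /* The trailer must account for every remaining byte. */
    need = elf_len + SIG_MAGIC_LEN + SIG_HDR_LEN +
           (size_t)v->sig_len + v->key_id_len + v->signer_len;
    if (need != len)
        return -EINVAL;

    v->sig = h + SIG_HDR_LEN;
    v->key_id = v->sig + v->sig_len;
    v->signer = v->key_id + v->key_id_len;
    return 0;
}
```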
>
>## Hypercalls
>
>We will employ the sub-operations of the system management hypercall
>(sysctl). There are to be four sub-operations:
>
> * upload the payloads.
> * list the summaries of uploaded payloads and their state.
> * get a particular payload summary and its state.
> * command to apply, delete, or revert the payload.
>
>The patching is asynchronous, therefore the caller is responsible for
>verifying that it has been applied properly by retrieving the summary of it
>and verifying that there are no error codes associated with the payload.
>
>We **MUST** make it asynchronous due to the nature of patching: it requires
>every physical CPU to be in lock-step with each other. The patching
>mechanism, while an implementation detail, is not a short operation and as
>such the design **MUST** assume it will be a long-running operation.
>
>Furthermore it is possible to have multiple different payloads for the same
>function. As such a unique id has to be visible to allow proper
>manipulation.
>
>The hypercall is part of the `xen_sysctl`. The top level structure contains
>one uint32_t to determine the sub-operations:
>
><pre>
>struct xen_sysctl_xsplice_op {
>    uint32_t cmd;
>    union {
>        ... see below ...
>    } u;
>};
>
></pre>
>while the rest of the hypercall-specific structures are part of this
>structure.
>
>
>### XEN_SYSCTL_XSPLICE_UPLOAD (0)
>
>Upload a payload to the hypervisor. The payload is verified and if there
>are any issues the proper return code will be returned. The payload is
>not applied at this time - that is controlled by
>*XEN_SYSCTL_XSPLICE_ACTION*.
>
>The caller provides:
>
> * `id` unique id.
> * `size` the size of the ELF payload.
> * `payload` the virtual address of where the ELF payload is.
>
>The return value is zero if the payload was successfully uploaded and the
>signature was verified. Otherwise an -EXX error code is returned.
>Duplicate `id`s are not supported.
>
>The `payload` is the ELF payload as mentioned in the `Payload format`
>section.
>
>The structure is as follows:
>
><pre>
>struct xen_sysctl_xsplice_upload {
>    char id[40];  /* IN, name of the patch. */
>    uint64_t size; /* IN, size of the ELF file. */
>    XEN_GUEST_HANDLE_64(uint8) payload; /* ELF file. */
>}; 
></pre>
>
>### XEN_SYSCTL_XSPLICE_GET (1)
>
>Retrieve a summary of a specific payload. The caller provides:
>
> * `id` the unique id.
> * `status` *MUST* be set to zero.
> * `rc` *MUST* be set to zero.
>
>The `summary` structure contains a summary of the payload, which includes:
>
> * `id` the unique id.
> * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the
>      **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED* (2) the ELF payload safety checks passed.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also
>      reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult
>      `rc` for details.
> * `rc` - its error state if any.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_STATUS_LOADED    0
>#define XSPLICE_STATUS_PROGRESS  1
>#define XSPLICE_STATUS_CHECKED   2
>#define XSPLICE_STATUS_APPLIED   3
>#define XSPLICE_STATUS_REVERTED  4
>#define XSPLICE_STATUS_IN_ERROR  5
>
>struct xen_sysctl_xsplice_summary {
>    char id[40];  /* IN/OUT, name of the patch. */
>    uint32_t status;   /* OUT */
>    int32_t rc;  /* OUT */
>}; 
></pre>
>
>### XEN_SYSCTL_XSPLICE_LIST (2)
>
>Retrieve an array of abbreviated summaries of payloads that are loaded in
>the hypervisor.
>
>The caller provides:
>
> * `idx` index iterator. Initially it *MUST* be zero.
> * `count` the max number of entries to populate.
> * `summary` virtual address of where to write payload summaries.
>
>The hypercall returns zero on success and updates the `idx` (index)
>iterator with the number of payloads returned, `count` to the number of
>remaining payloads, and `summary` with a number of payload summaries.
>
>If the hypercall returns E2BIG the `count` is too big and should be
>lowered.
>
>Note that due to the asynchronous nature of hypercalls, payloads might
>have been added or removed between calls, making this information stale.
>It is the responsibility of the domain to provide proper accounting.
>
>The `summary` structure contains a summary of the payload, which includes:
>
> * `id` unique id.
> * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the
>      **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED* (2) the payload `old` and `addr` match with
>      the hypervisor.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also
>      reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult
>      `rc` for details.
> * `rc` - its error state if any.
>
>The structure is as follows:
>
><pre>
>struct xen_sysctl_xsplice_list {
>    uint32_t idx;  /* IN/OUT */
>    uint32_t count;  /* IN/OUT */
>    XEN_GUEST_HANDLE_64(xen_sysctl_xsplice_summary) summary;  /* OUT */
>};  
>
>struct xen_sysctl_xsplice_summary {
>    char id[40];  /* OUT, name of the patch. */
>    uint32_t status;   /* OUT */
>    int32_t rc;  /* OUT */
>};  
>
></pre>
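
A caller that wants every summary would loop until `count` comes back as zero. The sketch below pairs that loop with an in-memory mock of the hypercall, assuming `idx` advances by the number of entries written on each call; `mock_xsplice_list`, `list_all` and the payload ids are hypothetical, the `E2BIG` case is not modelled, and `out` must have room for all payloads:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xen_sysctl_xsplice_summary {
    char id[40];     /* OUT, name of the patch. */
    uint32_t status; /* OUT */
    int32_t rc;      /* OUT */
};

/* Hypothetical in-memory stand-in for the hypervisor's list of payloads. */
static const char *const loaded_ids[] = { "xsa-125", "xsa-132", "p3", "p4", "p5" };
#define NR_LOADED (sizeof(loaded_ids) / sizeof(loaded_ids[0]))

/*
 * Mock of XEN_SYSCTL_XSPLICE_LIST: 'idx' advances by the number of entries
 * written and 'count' comes back as the number still remaining.
 */
static int mock_xsplice_list(uint32_t *idx, uint32_t *count,
                             struct xen_sysctl_xsplice_summary *summary)
{
    uint32_t n = 0;

    if (*idx > NR_LOADED)
        return -EINVAL;
    while (n < *count && *idx + n < NR_LOADED) {
        memset(&summary[n], 0, sizeof(summary[n]));
        strncpy(summary[n].id, loaded_ids[*idx + n], sizeof(summary[n].id) - 1);
        n++;
    }
    *idx += n;
    *count = (uint32_t)(NR_LOADED - *idx);
    return 0;
}

/* Caller side: fetch all summaries in batches; returns how many were read. */
static int list_all(struct xen_sysctl_xsplice_summary *out, uint32_t batch)
{
    uint32_t idx = 0, count;
    int rc;

    if (batch == 0)
        return -EINVAL;
    do {
        count = batch;
        rc = mock_xsplice_list(&idx, &count, out + idx);
        if (rc)
            return rc;
    } while (count > 0);
    return (int)idx;
}
```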
>### XEN_SYSCTL_XSPLICE_ACTION (3)
>
>Perform an operation on the payload structure referenced by the `id`
>field. The operation request is asynchronous and the status should be
>retrieved by using either the **XEN_SYSCTL_XSPLICE_GET** or the
>**XEN_SYSCTL_XSPLICE_LIST** hypercall.
>
>The caller provides:
>
> * `id` the unique id.
> * `cmd` the command requested:
>  1. *XSPLICE_ACTION_CHECK* (0) check that the payload will apply properly.
>  2. *XSPLICE_ACTION_UNLOAD* (1) unload the payload.
>  3. *XSPLICE_ACTION_REVERT* (2) revert the payload.
>  4. *XSPLICE_ACTION_APPLY* (3) apply the payload.
>
>
>The return value will be zero unless the provided fields are incorrect.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_ACTION_CHECK  0
>#define XSPLICE_ACTION_UNLOAD 1
>#define XSPLICE_ACTION_REVERT 2
>#define XSPLICE_ACTION_APPLY  3
>
>struct xen_sysctl_xsplice_action {
>    char id[40];  /* IN, name of the patch. */
>    uint32_t cmd; /* IN */
>};  
>
></pre>
>
>## Sequence of events.
>
>The normal sequence of events is to:
>
> 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors
>    *STOP* here.
> 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_LOADED* go to the
>    next step.
> 3. *XEN_SYSCTL_XSPLICE_ACTION* with the *XSPLICE_ACTION_CHECK* command to
>    verify that the payload can be successfully applied.
> 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_CHECKED* go to the
>    next step.
> 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the
>    patch.
> 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_APPLIED* exit with
>    success.
>
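The sequence above can be sketched as a small driver that polls until a payload leaves *XSPLICE_STATUS_PROGRESS*. The hypercalls are replaced by a hypothetical in-memory mock (`mock_upload`, `mock_action`, `mock_get`) that reports progress for a couple of polls before completing; a real tool would presumably sleep between polls rather than spin:

```c
#include <assert.h>
#include <errno.h>

#define XSPLICE_STATUS_LOADED    0
#define XSPLICE_STATUS_PROGRESS  1
#define XSPLICE_STATUS_CHECKED   2
#define XSPLICE_STATUS_APPLIED   3
#define XSPLICE_STATUS_IN_ERROR  5

#define XSPLICE_ACTION_CHECK 0
#define XSPLICE_ACTION_APPLY 3

/* Hypothetical mock of one payload's state inside the hypervisor. */
static struct {
    unsigned int status, pending, polls;
    int rc;
} mock;

static int mock_upload(void)
{
    mock.status = XSPLICE_STATUS_LOADED;
    return 0;
}

/* Each action reports PROGRESS for a couple of polls before completing. */
static int mock_action(unsigned int cmd)
{
    mock.pending = (cmd == XSPLICE_ACTION_CHECK) ? XSPLICE_STATUS_CHECKED
                                                 : XSPLICE_STATUS_APPLIED;
    mock.status = XSPLICE_STATUS_PROGRESS;
    mock.polls = 2;
    return 0;
}

static int mock_get(unsigned int *status, int *rc)
{
    if (mock.status == XSPLICE_STATUS_PROGRESS) {
        if (mock.polls == 0)
            mock.status = mock.pending;
        else
            mock.polls--;
    }
    *status = mock.status;
    *rc = mock.rc;
    return 0;
}

/* Poll until the payload leaves PROGRESS; succeed only on 'want'. */
static int wait_for(unsigned int want)
{
    unsigned int status;
    int rc;

    do {
        if (mock_get(&status, &rc))
            return -EIO;
    } while (status == XSPLICE_STATUS_PROGRESS);
    return status == want ? 0 : (rc ? rc : -EINVAL);
}

/* Upload, check, then apply, waiting for each step to complete. */
static int upload_check_apply(void)
{
    int rc;

    if ((rc = mock_upload()) || (rc = wait_for(XSPLICE_STATUS_LOADED)))
        return rc;
    if ((rc = mock_action(XSPLICE_ACTION_CHECK)) ||
        (rc = wait_for(XSPLICE_STATUS_CHECKED)))
        return rc;
    if ((rc = mock_action(XSPLICE_ACTION_APPLY)) ||
        (rc = wait_for(XSPLICE_STATUS_APPLIED)))
        return rc;
    return 0;
}
```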
> 
>## Addendum
>
>Implementation quirks should not be discussed in a design document.
>
>However these observations can provide aid when developing against this
>document.
>
>
>### Alternative assembler
>
>Alternative assembler is a mechanism to use different instructions
>depending on what the CPU supports. This is done by providing multiple
>streams of code that can be patched in - or if the CPU does not support it -
>padded with `nop` operations. The alternative assembler macros cause the
>compiler to expand the code to place the most generic code in place and to
>emit a special ELF .section header to tag this location. During run-time
>the hypervisor can leave the areas alone or patch them with better suited
>opcodes.
>
>As we might be patching the alternative assembler sections as well - by
>providing new better suited op-codes or perhaps with nops - we need to
>also re-run the alternative assembler patching after we have done our
>patching.
>
>Also when we are doing safety checks the code we are checking might be
>utilizing alternative assembler. As such we should relax our checks to
>accommodate that.
>
>### .rodata sections
>
>The patching might require strings to be updated as well. As such we must
>also be able to patch the strings as needed. This sounds simple - but the
>compiler has a habit of coalescing strings that are the same - which means
>if we alter the strings in-place, other users will be inadvertently
>affected as well.
>
>This is also where pointers to functions live - and we may need to patch
>these as well.
>
>To guard against that we must be prepared to do patching similar to
>trampoline patching or in-line depending on the flavour. If we can
>do in-line patching we would need to:
>
> * alter `.rodata` to be writeable.
> * inline patch.
> * alter `.rodata` to be read-only.
>
>If we are doing trampoline patching we would need to:
>
> * allocate a new memory location for the string.
> * update all locations which use this string to use the new location of
>   the string.
> * mark the region RO when we are done.
>
>### .bss sections
>
>Patching writable data is not suitable as it is unclear what should be done
>depending on the current state of data. As such it should not be attempted.
>
>
>### Patching code which is on the stack.
>
>We should not patch the code which is on the stack. That can lead
>to corruption.
>
>### Trampoline (e9 opcode)
>
>The e9 opcode used for jmpq takes a 32-bit signed displacement, relative
>to the end of the 5-byte instruction. That means the new code must be
>placed within ±2GB of virtual address space of the old code. That should
>not be a problem since the Xen hypervisor has a very small footprint.
>
>However, if needed, we can always chain two trampolines: one within the
>2GB limit that jumps to the next trampoline.
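A sketch of the reachability decision behind such chaining, assuming the caller has already found a slot `mid` for the intermediate trampoline. `rel32_reachable()` and `trampoline_hops()` are hypothetical names. With two `e9` hops, the new code may be up to roughly 4GB away from the old code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5   /* e9 opcode + 4-byte displacement */

/* True if a 5-byte "jmp rel32" placed at `src` can reach `dst`. */
static bool rel32_reachable(uint64_t src, uint64_t dst)
{
    int64_t disp = (int64_t)(dst - (src + JMP_REL32_LEN));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Return 1 if one jump from `old` reaches `new_code`, 2 if an intermediate
 * trampoline at `mid` bridges the gap, or 0 if neither works. */
static int trampoline_hops(uint64_t old, uint64_t new_code, uint64_t mid)
{
    assert(JMP_REL32_LEN == 5);
    if (rel32_reachable(old, new_code))
        return 1;
    if (rel32_reachable(old, mid) && rel32_reachable(mid, new_code))
        return 2;
    return 0;
}
```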
>
>### Time rendezvous code instead of stop_machine for patching
>
>The hypervisor's time rendezvous code runs synchronously across all CPUs
>every second. Using stop_machine to patch can stall the time rendezvous
>code and result in an NMI. As such, doing the patching at the tail of the
>rendezvous code should avoid this problem.
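A rough userspace model of patching at the tail of a rendezvous, with POSIX threads standing in for CPUs and C11 atomics for the synchronisation. `struct rendezvous`, `rendezvous_tail()` and the demo helpers are hypothetical and do not reflect Xen's time rendezvous code. The last CPU to arrive applies the patch while the others spin, so no other CPU proceeds until the patch is complete. This is a one-shot barrier; resetting it for the next round is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical one-shot rendezvous; threads stand in for CPUs. */
struct rendezvous {
    atomic_int arrived;      /* CPUs that have reached the tail */
    atomic_int patch_done;   /* set once the patch has been applied */
    int nr_cpus;             /* CPUs expected to take part */
    void (*apply_patch)(void);
};

/* Called by every CPU at the tail of its rendezvous work.  The last CPU to
 * arrive applies the patch; the others spin until it has finished. */
static void rendezvous_tail(struct rendezvous *r)
{
    assert(r != NULL && r->apply_patch != NULL);
    if (atomic_fetch_add(&r->arrived, 1) + 1 == r->nr_cpus) {
        r->apply_patch();
        atomic_store(&r->patch_done, 1);   /* release the waiting CPUs */
    } else {
        while (!atomic_load(&r->patch_done))
            ;                              /* spin until patching is done */
    }
}

/* Demo helpers: a stand-in patch that only counts how often it runs, and
 * a pthread entry point that plays the role of one CPU. */
static atomic_int patches_applied;
static void demo_patch(void) { atomic_fetch_add(&patches_applied, 1); }
static void *cpu_thread(void *arg) { rendezvous_tail(arg); return NULL; }
```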
>
>### Security
>
>Only the privileged domain should be allowed to do this operation.
