From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: Malcolm Crossley <malcolm.crossley@citrix.com>,
xen-devel <xen-devel@lists.xenproject.org>,
Jan Beulich <JBeulich@suse.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Paul Durrant <Paul.Durrant@citrix.com>,
Kevin Tian <kevin.tian@intel.com>,
"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
David Vrabel <david.vrabel@citrix.com>
Subject: Re: [RFC] Xen PV IOMMU interface draft B
Date: Wed, 17 Jun 2015 20:48:44 +0800 [thread overview]
Message-ID: <55816CAC.7090104@linux.intel.com> (raw)
In-Reply-To: <557B0C35.4080907@citrix.com>
Hi Malcolm,
Thank you very much for accommodate our XenGT requirement in your
design. Following are some XenGT related questions. :)
On 6/13/2015 12:43 AM, Malcolm Crossley wrote:
> Hi All,
>
> Here is a design for allowing guests to control the IOMMU. This
> allows for the guest GFN mapping to be programmed into the IOMMU and
> avoid using the SWIOTLB bounce buffer technique in the Linux kernel
> (except for legacy 32 bit DMA IO devices).
>
> Draft B has been expanded to include Bus Address mapping/lookup for Mediated
> pass-through emulators.
>
> The pandoc markdown format of the document is provided below to allow
> for easier inline comments:
>
> % Xen PV IOMMU interface
> % Malcolm Crossley <<malcolm.crossley@citrix.com>>
> Paul Durrant <<paul.durrant@citrix.com>>
> % Draft B
>
> Introduction
> ============
>
> Revision History
> ----------------
>
> --------------------------------------------------------------------
> Version Date Changes
> ------- ----------- ----------------------------------------------
> Draft A 10 Apr 2014 Initial draft.
>
> Draft B 12 Jun 2015 Second draft.
> --------------------------------------------------------------------
>
> Background
> ==========
>
> Linux kernel SWIOTLB
> --------------------
>
> Xen PV guests use a Pseudophysical Frame Number(PFN) address space which is
> decoupled from the host Machine Frame Number(MFN) address space.
>
> PV guest hardware drivers are only aware of the PFN address space only and
> assume that if PFN addresses are contiguous then the hardware addresses would
> be contiguous as well. The decoupling between PFN and MFN address spaces means
> PFN and MFN addresses may not be contiguous across page boundaries and thus a
> buffer allocated in GFN address space which spans a page boundary may not be
> contiguous in MFN address space.
>
> PV hardware drivers cannot tolerate this behaviour and so a special
> "bounce buffer" region is used to hide this issue from the drivers.
>
> A bounce buffer region is a special part of the PFN address space which has
> been made to be contiguous in both PFN and MFN address spaces. When a driver
> requests a buffer which spans a page boundary be made available for hardware
> to read the core operating system code copies the buffer into a temporarily
> reserved part of the bounce buffer region and then returns the MFN address of
> the reserved part of the bounce buffer region back to the driver itself. The
> driver then instructs the hardware to read the copy of the buffer in the
> bounce buffer. Similarly if the driver requests a buffer is made available
> for hardware to write to the first a region of the bounce buffer is reserved
> and then after the hardware completes writing then the reserved region of
> bounce buffer is copied to the originally allocated buffer.
>
> The overheard of memory copies to/from the bounce buffer region is high
> and damages performance. Furthermore, there is a risk the fixed size
> bounce buffer region will become exhausted and it will not be possible to
> return an hardware address back to the driver. The Linux kernel drivers do not
> tolerate this failure and so the kernel is forced to crash, as an
> uncorrectable error has occurred.
>
> Input/Output Memory Management Units (IOMMU) allow for an inbound address
> mapping to be created from the I/O Bus address space (typically PCI) to
> the machine frame number address space. IOMMU's typically use a page table
> mechanism to manage the mappings and therefore can create mappings of page size
> granularity or larger.
>
> The I/O Bus address space will be referred to as the Bus Frame Number (BFN)
> address space for the rest of this document.
>
>
> Mediated Pass-through Emulators
> -------------------------------
>
> Mediated Pass-through emulators allow guest domains to interact with
> hardware devices via emulator mediation. The emulator runs in a domain separate
> to the guest domain and it is used to enforce security of guest access to the
> hardware devices and isolation of different guests accessing the same hardware
> device.
>
> The emulator requires a mechanism to map guest address's to a bus address that
> the hardware devices can access.
>
>
> Clarification of GFN and BFN fields for different guest types
> -------------------------------------------------------------
> Guest Frame Numbers (GFN) definition varies depending on the guest type.
>
> Diagram below details the memory accesses originating from CPU, per guest type:
>
> HVM guest PV guest
>
> (VA) (VA)
> | |
> MMU MMU
> | |
> (GFN) |
> | | (GFN)
> HAP a.k.a EPT/NPT |
> | |
> (MFN) (MFN)
> | |
> RAM RAM
>
> For PV guests GFN is equal to MFN for a single page but not for a contiguous
> range of pages.
>
> Bus Frame Numbers (BFN) refer to the address presented on the physical bus
> before being translated by the IOMMU.
>
> Diagram below details memory accesses originating from physical device.
>
> Physical Device
> |
> (BFN)
> |
> IOMMU-PT
> |
> (MFN)
> |
> RAM
>
>
>
> Purpose
> =======
>
> 1. Allow Xen guests to create/modify/destroy IOMMU mappings for
> hardware devices that the PV guests has access to. This enables the PV guest to
> program a bus address space mapping which matches it's GFN mapping. Once a 1-1
> mapping of PFN to bus address space is created then a bounce buffer
> region is not required for the IO devices connected to the IOMMU.
>
> 2. Allow for Xen guests to lookup/create/modify/destroy IOMMU mappings for
> guest memory of domains the calling Xen guest has sufficient privilege over.
> This enables domains to provide mediated hardware acceleration to other
> guest domains.
>
>
> Xen Architecture
> ================
>
> The Xen architecture consists of a new hypercall interface and changes to the
> grant map interface.
>
> The existing IOMMU mappings setup at domain creation time will be preserved so
> that PV domains unaware of this feature will continue to function with no
> changes required.
>
> Memory ballooning will be supported by taking an additional reference on the
> MFN backing the GFN for each successful IOMMU mapping created.
>
> An M2B tracking structure will be used to ensure all reference's to a MFN can
> be located easily.
>
> Xen PV IOMMU hypercall interface
> --------------------------------
> A two argument hypercall interface (do_iommu_op).
>
> ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
>
> First argument, guest handle pointer to array of `struct pv_iommu_op`
> Second argument, unsigned integer count of `struct pv_iommu_op` elements in array.
>
> Definition of struct pv_iommu_op:
>
> struct pv_iommu_op {
>
> uint16_t subop_id;
> uint16_t flags;
> int32_t status;
>
> union {
> struct {
> uint64_t bfn;
> uint64_t gfn;
> } map_page;
>
> struct {
> uint64_t bfn;
> } unmap_page;
>
> struct {
> uint64_t bfn;
> uint64_t gfn;
> uint16_t domid;
> ioservid_t ioserver;
> } map_foreign_page;
>
> struct {
> uint64_t bfn;
> uint64_t gfn;
> uint16_t domid;
> ioservid_t ioserver;
> } lookup_foreign_page;
>
> struct {
> uint64_t bfn;
> ioservid_t ioserver;
> } unmap_foreign_page;
> } u;
> };
>
> Definition of PV IOMMU subops:
>
> #define IOMMUOP_query_caps 1
> #define IOMMUOP_map_page 2
> #define IOMMUOP_unmap_page 3
> #define IOMMUOP_map_foreign_page 4
> #define IOMMUOP_lookup_foreign_page 5
> #define IOMMUOP_unmap_foreign_page 6
>
>
> Design considerations for hypercall op
> -------------------------------------------
> IOMMU map/unmap operations can be slow and can involve flushing the IOMMU TLB
> to ensure the IO device uses the updated mappings.
>
> The op has been designed to take an array of operations and a count as
> parameters. This allows for easily implemented hypercall continuations to be
> used and allows for batches of IOMMU operations to be submitted before flushing
> the IOMMU TLB.
>
> The subop_id to be used for a particular element is encoded into the element
> itself. This allows for map and unmap operations to be performed in one hypercall
> and for the IOMMU TLB flushing optimisations to be still applied.
>
> The hypercall will ensure that the required IOMMU TLB flushes are applied before
> returning to guest via either hypercall completion or a hypercall continuation.
>
> IOMMUOP_query_caps
> ------------------
>
> This subop queries the runtime capabilities of the PV-IOMMU interface for the
> specific called domain. This subop uses `struct pv_iommu_op` directly.
>
> ------------------------------------------------------------------------------
> Field Purpose
> ----- ---------------------------------------------------------------
> `flags` [out] This field details the IOMMUOP capabilities.
>
> `status` [out] Status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> ------------------------------------------------------------------------------
> Name Bit Definition
> ---- ------ ----------------------------------
> IOMMU_QUERY_map_cap 0 IOMMUOP_map_page or IOMMUOP_map_foreign
> can be used for this domain
>
> IOMMU_QUERY_map_all_gfns 1 IOMMUOP_map_page subop can map any MFN
> not used by Xen
>
> Reserved for future use 2-9 n/a
>
> IOMMU_page_order 10-15 Returns maximum possible page order for
> all other IOMMUOP subops
> ------------------------------------------------------------------------------
>
> Defined values for query_caps subop status field:
>
> Value Reason
> ------ ----------------------------------------------------------
> 0 subop successfully returned
>
> IOMMUOP_map_page
> ----------------------
> This subop uses `struct map_page` part of the `struct pv_iommu_op`.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> allowed to map all GFN's except for Xen owned MFN's else the hardware
> domain will only be allowed to map GFN's which it owns.
>
> If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> allowed to map all GFN's without taking a reference to the MFN backing the GFN
> by setting the IOMMU_MAP_OP_no_ref_cnt flag.
>
> Every successful pv_iommu_op will result in an additional page reference being
> taken on the MFN backing the GFN except for the condition detailed above.
>
> If the map_op flags indicate a writeable mapping is required then a writeable
> page type reference will be taken otherwise a standard page reference will be
> taken.
>
> All the following conditions are required to be true for PV IOMMU map
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. If hardware_domain and the following Xen IOMMU options are
> NOT enabled: dom0-passthrough
>
> This subop usage of the "struct pv_iommu_op" and ``struct map_page` fields
> are detailed below:
>
> ------------------------------------------------------------------------------
> Field Purpose
> ----- ---------------------------------------------------------------
> `bfn` [in] Bus address frame number(BFN) to be mapped to specified gfn
> below
>
> `gfn` [in] Guest address frame number for DOMID_SELF
>
> `flags` [in] Flags for signalling type of IOMMU mapping to be created,
> Flags can be combined.
>
> `status` [out] Mapping status of this op, op specific values listed below
> ------------------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name Bit Definition
> ---- ----- ----------------------------------
> IOMMU_OP_readable 0 Create readable IOMMU mapping
> IOMMU_OP_writeable 1 Create writeable IOMMU mapping
> IOMMU_MAP_OP_no_ref_cnt 2 IOMMU mapping does not take a reference to
> MFN backing BFN mapping
> Reserved for future use 3-9 n/a
> IOMMU_page_order 10-15 Page order to be used for both gfn and bfn
>
> Defined values for map_page subop status field:
>
> Value Reason
> ------ ----------------------------------------------------------------------
> 0 subop successfully returned
> -EIO IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM GFN could not be mapped because the GFN belongs to Xen.
> -EPERM Domain is not a domain and GFN does not belong to domain
> -EPERM Domain is a hardware domain, IOMMU dom-strict mode is enabled and
> GFN does not belong to domain
> -EACCES BFN address conflicts with RMRR regions for device's attached to
> DOMID_SELF
> -ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
>
> IOMMUOP_unmap_page
> ------------------
> This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
>
> The subop usage of the "struct pv_iommu_op" and ``struct unmap_page` fields
> are detailed below:
>
> --------------------------------------------------------------------
> Field Purpose
> ----- -----------------------------------------------------
> `bfn` [in] Bus address frame number to be unmapped in DOMID_SELF
>
> `flags` [in] Flags for signalling page order of unmap operation
>
> `status` [out] Mapping status of this unmap operation, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name Bit Definition
> ---- ----- ----------------------------------
> Reserved for future use 0-9 n/a
> IOMMU_page_order 10-15 Page order to be used for bfn
>
>
> Defined values for unmap_page subop status field:
>
> Error code Reason
> ---------- ------------------------------------------------------------
> 0 subop successfully returned
> -EIO IOMMU unit returned error when attempting to unmap BFN.
> -ENOSPC Page order is too large for either BFN address or IOMMU unit
> ------------------------------------------------------------------------
>
>
> IOMMUOP_map_foreign_page
> ----------------
> This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
>
> It is not valid to use domid representing the calling domain.
>
> The hypercall will only succeed if calling domain has sufficient privilege over
> the specified domid
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
>
> If there IOMMU support then the specified BFN is returned for the GFN + domid
> combination
>
> The M2B mechanism is a MFN to (BFN,domid,ioserver) tuple.
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> All the following conditions are required to be true for PV IOMMU map_foreign
> subop to succeed:
>
> 1. IOMMU detected and supported by Xen
> 2. The domain has IOMMU controlled hardware allocated to it
> 3. The domain is a hardware_domain and the following Xen IOMMU options are
> NOT enabled: dom0-passthrough
What if the IOMMU is enabled, and runs in the default mode, which 1:1
maps all memories except owned by Xen?
>
>
> This subop usage of the "struct pv_iommu_op" and ``struct map_foreign_page`
> fields are detailed below:
>
> --------------------------------------------------------------------
> Field Purpose
> ----- -----------------------------------------------------
> `domid` [in] The domain ID for which the gfn field applies
>
> `ioserver` [in] IOREQ server id associated with mapping
>
> `bfn` [in] Bus address frame number for gfn address
>
> `gfn` [in] Guest address frame number
>
> `flags` [in] Details the status of the BFN mapping
>
> `status` [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name Bit Definition
> ---- ----- ----------------------------------
> IOMMUOP_readable 0 BFN IOMMU mapping is readable
> IOMMUOP_writeable 1 BFN IOMMU mapping is writeable
> IOMMUOP_swap_mfn 2 BFN IOMMU mapping can be safely
> swapped to scratch page
> Reserved for future use 3-9 Reserved flag bits should be 0
> IOMMU_page_order 10-15 Returns maximum possible page order for
> all other IOMMUOP subops
>
> Defined values for map_foreign_page subop status field:
>
> Error code Reason
> ---------- ------------------------------------------------------------
> 0 subop successfully returned
> -EIO IOMMU unit returned error when attempting to map BFN to GFN.
> -EPERM Calling domain does not have sufficient privilege over domid
> -EPERM GFN could not be mapped because the GFN belongs to Xen.
> -EPERM domid maps to DOMID_SELF
> -EACCES BFN address conflicts with RMRR regions for device's attached to
> DOMID_SELF
> -ENODEV Provided ioserver id is not valid
> -ENXIO Provided domid id is not valid
> -ENXIO Provided GFN address is not valid
> -ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
>
> IOMMU_lookup_foreign_page
> ----------------
> This subop uses `struct lookup_foreign_page` part of the `struct pv_iommu_op`.
>
> If the BFN is specified as an input and parameter and there is no IOMMU support
> for the calling domain then an error will be returned.
>
> It is the calling domain responsibility to ensure there are no conflicts
>
> The hypercall will only succeed if calling domain has sufficient privilege over
> the specified domid
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
Similarly, what if the IOMMU is enabled, and runs in the default mode,
which 1:1 maps all memories except owned by Xen? Will a MFN be returned?
Or should we take the query/map ops instead of the lookup op for this
situation?
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry.
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> This subop usage of the "struct pv_iommu_op" and ``struct lookup_foreign_page`
> fields are detailed below:
>
> --------------------------------------------------------------------
> Field Purpose
> ----- -----------------------------------------------------
> `domid` [in] The domain ID for which the gfn field applies
>
> `ioserver` [in] IOREQ server id associated with mapping
>
> `bfn` [out] Bus address frame number for gfn address
>
> `gfn` [in] Guest address frame number
>
> `flags` [out] Details the status of the BFN mapping
>
> `status` [out] status of this subop, 0 indicates success
> --------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name Bit Definition
> ---- ----- ----------------------------------
> IOMMUOP_readable 0 Returned BFN IOMMU mapping is readable
> IOMMUOP_writeable 1 Returned BFN IOMMU mapping is writeable
> Reserved for future use 2-9 Reserved flag bits should be 0
> IOMMU_page_order 10-15 Returns maximum possible page order for
> all other IOMMUOP subops
>
> Defined values for lookup_foreign_page subop status field:
>
> Error code Reason
> ---------- ------------------------------------------------------------
> 0 subop successfully returned
> -EPERM Calling domain does not have sufficient privilege over domid
> -ENOENT There is no available BFN for provided GFN + domid combination
> -ENODEV Provided ioserver id is not valid
> -ENXIO Provided domid id is not valid
> -ENXIO Provided GFN address is not valid
>
>
> IOMMUOP_unmap_foreign_page
> ----------------
> This subop uses `struct unmap_foreign_page` part of the `struct pv_iommu_op`.
>
> If there is no IOMMU support then the MFN is returned in the BFN field (that is
> the only valid bus address for the GFN + domid combination).
>
> If there is IOMMU support then the specified BFN is returned for the GFN + domid
> combination
>
> Each successful subop will add to the M2B if there was not an existing identical
> M2B entry. The
>
> Every new M2B entry will take a reference to the MFN backing the GFN.
>
> This subop usage of the "struct pv_iommu_op" and ``struct unmap_foreign_page` fields
> are detailed below:
>
> -----------------------------------------------------------------------
> Field Purpose
> ----- --------------------------------------------------------
> `ioserver` [in] IOREQ server id associated with mapping
>
> `bfn` [in] Bus address frame number for gfn address
>
> `flags` [out] Flags for signalling page order of unmap operation
>
> `status` [out] status of this subop, 0 indicates success
> -----------------------------------------------------------------------
>
> Defined bits for flags field:
>
> Name Bit Definition
> ---- ----- ----------------------------------
> Reserved for future use 0-9 n/a
> IOMMU_page_order 10-15 Page order to be used for bfn
>
> Defined values for unmap_foreign_page subop status field:
>
> Error code Reason
> ---------- ------------------------------------------------------------
> 0 subop successfully returned
> -ENOENT There is no mapped BFN + ioserver id combination to unmap
>
>
> IOMMUOP_*_foreign_page interactions with guest domain ballooning
> ================================================================
>
> Guest domains can balloon out a set of GFN mappings at any time and render the
> BFN to GFN mapping invalid.
>
> When a BFN to GFN mapping becomes invalid, Xen will issue a buffered IO request
> of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
> BFN address in the data field. If the buffered IO request ring is full then a
> standard (synchronous) IO request of type IOREQ_TYPE_INVALIDATE will be issued
> to the affected IOREQ server the with just invalidated BFN address in the data
> field.
>
> The BFN mappings cannot be simply unmapped at the point of the balloon hypercall
> otherwise a malicious guest could specifically balloon out an in use GFN address
> in use by an emulator and trigger IOMMU faults for the domains with BFN
> mappings.
>
> For hosts with no IOMMU support: The affected emulator(s) must specifically
> issue a IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so that
> the references to the underlying MFN are removed and the MFN can be freed back
> to the Xen memory allocator.
I do not quite understand this. With no IOMMU support, these BFNs are
supplied by hypervisor. So why not let hypervisor do this unmap and
notify the calling domain?
>
> For hosts with IOMMU support:
> If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page then the affected affected emulator(s) must
> specifically issue a IOMMUOP_unmap_foreign_page subop for the now invalid BFN
> address so that the references to the underlying MFN are removed.
>
> If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN then
> the BFN mapping will be swapped to point at a scratch MFN page and all BFN
> references to the invalid MFN will be removed by Xen after the BFN mapping has
> been updated to point at the scratch MFN page.
>
> The rationale for swapping the BFN mapping to point at scratch pages is to
> enable guest domains to balloon quickly without requiring hypercall(s) from
> emulators.
>
> Not all BFN mappings can be swapped without potentially causing problems for the
> hardware itself (command rings etc.) so the IOMMUOP_swap_mfn flag is used to
> allow per BFN control of Xen ballooning behaviour.
>
>
> PV IOMMU interactions with self ballooning
> ==========================================
>
> The guest should clear any IOMMU mappings it has of it's own pages before
> releasing a page back to Xen. It will need to add IOMMU mappings after
> repopulating a page with the populate_physmap hypercall.
>
> This requires that IOMMU mappings get a writeable page type reference count and
> that guests clear any IOMMU mappings before pinning page table pages.
>
>
> Security Implications of allowing domain IOMMU control
> ===============================================================
>
> Xen currently allows IO devices attached to hardware domain to have direct
> access to the all of the MFN address space (except Xen hypervisor memory regions),
> provided the Xen IOMMU option dom0-strict is not enabled.
>
> The PV IOMMU feature provides the same level of access to MFN address space
> and the feature is not enabled when the Xen IOMMU option dom0-strict is
> enabled. Therefore security is not degraded by the PV IOMMU feature.
>
> Domains with physical device(s) assigned which are not hardware domains are only
> allowed to map their own GFNs or GFNs for domain(s) they have privilege over.
>
>
> PV IOMMU interactions with grant map/unmap operations
> =====================================================
>
> Grant map operations return a Physical device accessible address (BFN) if the
> GNTMAP_device_map flag is set. This operation currently returns the MFN for PV
> guests which may conflict with the BFN address space the guest uses if PV IOMMU
> map support is available to the guest.
>
> This design proposes to allow the calling domain to control the BFN address that
> a grant map operation uses.
>
> This can be achieved by specifying that the dev_bus_addr in the
> gnttab_map_grant_ref structure is used an input parameter instead of the
> output parameter it is currently.
>
> Only PAGE_SIZE aligned addresses are allowed for dev_bus_addr input parameter.
>
> The revised structure is shown below for convenience.
>
> struct gnttab_map_grant_ref {
> /* IN parameters. */
> uint64_t host_addr;
> uint32_t flags; /* GNTMAP_* */
> grant_ref_t ref;
> domid_t dom;
> /* OUT parameters. */
> int16_t status; /* => enum grant_status */
> grant_handle_t handle;
> /* IN/OUT parameters */
> uint64_t dev_bus_addr;
> };
>
>
> The grant map operation would then behave similarly to the IOMMUOP_map_page
> subop for the creation of the IOMMU mapping.
>
> The grant unmap operation would then behave similarly to the IOMMUOP_unmap_page
> subop for the removal of the IOMMU mapping.
>
> A new grantmap flag would be used to indicate the domain is requesting the
> dev_bus_addr field is used an input parameter.
>
>
> #define _GNTMAP_request_bfn_map (6)
> #define GNTMAP_request_bfn_map (1<<_GNTMAP_request_bfn_map)
>
>
>
> Linux kernel architecture
> =========================
>
> The Linux kernel will use the PV-IOMMU hypercalls to map it's PFN address
> space into the IOMMU. It will map the PFN's to the IOMMU address space using
> a 1:1 mapping, it does this by programming a BFN to GFN mapping which matches
> the PFN to GFN mapping.
>
> The native SWIOTLB will be used to handle device's which cannot DMA to all of
> the kernel's PFN address space.
>
> An interface shall be provided for emulator usage of IOMMUOP_*_foreign_page
> subops which will allow the Linux kernel to centrally manage that domains BFN
> resource and ensure there are no unexpected conflicts.
>
>
> Emulator usage of PV IOMMU interface
> ====================================
>
> Emulators which require bus address mapping of guest RAM must first determine if
> it's possible for the domain to control the bus addresses themselves.
>
> A IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If this
> flag is set then the emulator may specify the BFN address it wishes guest RAM to
> be mapped to via the IOMMUOP_map_foreign_page subop. If the flag is not set
> then the emulator must use BFN addresses supplied by the Xen via the
> IOMMUOP_lookup_foreign_page.
>
> Operating systems which use the IOMMUOP_map_page subop are expected to provide a
> common interface for emulators
According to our previous internal discussions, my understanding about
the usage is this:
1> PV IOMMU has an interface in dom0's kernel to do the query/map/lookup
all at once, which also includes the BFN allocation algorithm.
2> When XenGT emulator tries to construct a shadow PTE, we can just call
your interface, which returns a BFN whatever.
However, the above description seems the XenGT device model need to do
the query/lookup/map by itself?
Besides, could you please give a more detailed information about this
'common interface'? :)
Thanks
Yu
>
> Emulators should unmap unused GFN mappings as often as possible using
> IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> quickly and efficiently.
>
> Emulators should conform to the ballooning behaviour described section
> "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that guest
> domains are able to effectively balloon out and in memory.
>
> Emulators must unmap any active BFN mappings when they shutdown.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
next prev parent reply other threads:[~2015-06-17 12:53 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-12 16:43 [RFC] Xen PV IOMMU interface draft B Malcolm Crossley
2015-06-16 13:19 ` Jan Beulich
2015-06-16 14:47 ` Malcolm Crossley
2015-06-16 15:56 ` Jan Beulich
2015-06-17 12:48 ` Yu, Zhang [this message]
2015-06-17 13:34 ` Jan Beulich
2015-06-17 13:44 ` Malcolm Crossley
2015-06-26 10:23 ` Xen PV IOMMU interface draft C Malcolm Crossley
2015-06-26 11:03 ` Ian Campbell
2015-06-29 14:40 ` Konrad Rzeszutek Wilk
2015-06-29 14:52 ` Ian Campbell
2015-06-29 15:05 ` Malcolm Crossley
2015-06-29 15:24 ` David Vrabel
2015-06-29 15:36 ` Ian Campbell
2015-07-10 19:32 ` Konrad Rzeszutek Wilk
2016-02-10 10:09 ` Xen PV IOMMU interface draft D Malcolm Crossley
2016-02-18 8:21 ` Tian, Kevin
2016-02-23 16:17 ` Jan Beulich
2016-02-23 16:22 ` Malcolm Crossley
2016-03-02 6:54 ` Tian, Kevin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55816CAC.7090104@linux.intel.com \
--to=yu.c.zhang@linux.intel.com \
--cc=JBeulich@suse.com \
--cc=Paul.Durrant@citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=david.vrabel@citrix.com \
--cc=kevin.tian@intel.com \
--cc=konrad.wilk@oracle.com \
--cc=malcolm.crossley@citrix.com \
--cc=xen-devel@lists.xenproject.org \
--cc=zhiyuan.lv@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).