From: Malcolm Crossley
Subject: Re: [RFC] Dom0 PV IOMMU control design (draft A)
Date: Mon, 14 Apr 2014 13:12:07 +0100
To: Konrad Rzeszutek Wilk
Cc: xen-devel

On 11/04/14 18:50, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 11, 2014 at 06:28:43PM +0100, Malcolm Crossley wrote:
>> Hi,
>>
>> Here is a design for allowing Dom0 PV guests to control the IOMMU.
>> This allows the Dom0 GPFN mapping to be programmed into the IOMMU
>> and avoids using the SWIOTLB bounce buffer technique in the Linux
>> kernel (except for legacy 32 bit DMA IO devices).
>>
>> ...
>>
>> Design considerations for hypercall subops
>> ------------------------------------------
>> IOMMU map/unmap operations can be slow and can involve flushing the
>> IOMMU TLB to ensure the IO device uses the updated mappings.
>>
>> The subops have been designed to take an array of operations and a
>> count as parameters. This allows easily implemented hypercall
>> continuations to be used and allows batches of IOMMU operations to
>> be submitted before flushing the IOMMU TLB.
>>
>>
>> IOMMUOP_map_page
>> ----------------
>> First argument, pointer to array of `struct iommu_map_op`
>> Second argument, integer count of `struct iommu_map_op` elements in
>> array

> Could this be 'unsigned integer' count?

Yes, will change for draft B.

> Is there a limit? Can I do 31415 of them? Can I do it for the whole
> memory space of the guest?

There is no current limit; the hypercall will be implemented with
continuations to prevent denial of service attacks.

>> This subop will attempt to IOMMU map each element in the `struct
>> iommu_map_op` array and record the mapping status back into the
>> array itself. If a mapping fault occurs then the hypercall will
>> return with -EFAULT.
>>
>> This subop will inspect the MFN address being mapped in each
>> iommu_map_op to ensure it does not belong to the Xen hypervisor
>> itself. If the MFN does belong to the Xen hypervisor the subop will
>> return -EPERM in the status field for that particular iommu_map_op.

> Is it OK if the MFN belongs to another guest?

It is OK for the MFN to belong to another guest because only Dom0 is
performing the mapping. This is to allow grant mapped pages to have a
Dom0 BFN mapping.

>> The IOMMU TLB will only be flushed when the hypercall completes or
>> a hypercall continuation is created.
>>
>>     struct iommu_map_op {
>>         uint64_t bfn;

> bus_frame ?

Yes, or you could say, Bus Address with 4k page size granularity.

>>         uint64_t mfn;
>>         uint32_t flags;
>>         int32_t status;
>>     };
>>
>> ------------------------------------------------------------------------------
>> Field       Purpose
>> -----       ------------------------------------------------------------------
>> `bfn`       [in] Bus address frame number to be mapped to the
>>             specified mfn below

> Huh? Isn't this out? If not, isn't bfn == mfn for dom0?
> How would dom0 know the bus address? That usually is something only
> the IOMMU knows.

The idea is that Domain 0 sets up BFN to MFN mappings which are the
same as the GPFN to MFN mappings. This cannot be done from Xen's M2P
mappings because they are not up to date for PV guests.

>> `mfn`       [in] Machine address frame number

> We still need to do a bit of PFN -> MFN -> hypercall -> GFN and
> program that in the PCIe devices right?

Yes, a GPFN to MFN lookup will be required, but this is simply
consulting the Guest P2M. BTW, we are programming the IOMMU, not the
PCIe devices themselves.

>> `flags`     [in] Flags for signalling type of IOMMU mapping to be
>>             created
>>
>> `status`    [out] Mapping status of this map operation, 0 indicates
>>             success
>> ------------------------------------------------------------------------------
>>
>>
>> Defined bits for flags field
>> ------------------------------------------------------------------------
>> Name                     Bit   Definition
>> ----                     ----  ----------------------------------
>> IOMMU_MAP_OP_readable    0     Create readable IOMMU mapping
>> IOMMU_MAP_OP_writeable   1     Create writeable IOMMU mapping

> And is it OK to use both?

Yes, and typically both would be used. I'm just allowing for read
only mappings and write only mappings to be created.

>> Reserved for future use  2-31  n/a
>> ------------------------------------------------------------------------
>>
>> Additional error codes specific to this hypercall:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> EPERM       PV IOMMU mode not enabled or calling domain is not
>>             domain 0

> And -EFAULT

I was considering that EFAULT would be a standard error code; do you
want it to be explicitly listed?

> and what about success? Do you return 0 or the number of ops that
> were successful?

Return 0 if all ops were successful; otherwise return the number of
failed operations as a positive number. I leave it up to the caller
to determine which operations failed by iterating the input array.
I'll add this to draft B.
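
To make the intended calling convention concrete, here is a rough
sketch of a dom0 caller. The HYPERVISOR_iommu_op wrapper and the
subop number are placeholders I've made up for illustration; only the
struct layout, flag bits and return semantics are from the draft:

    #include <stdint.h>

    /* Placeholder subop number, not part of the draft. */
    #define IOMMUOP_map_page        1

    /* Flag bits 0 and 1 as defined in the draft. */
    #define IOMMU_MAP_OP_readable   (1u << 0)
    #define IOMMU_MAP_OP_writeable  (1u << 1)

    struct iommu_map_op {
        uint64_t bfn;     /* [in]  bus frame number (4k granularity) */
        uint64_t mfn;     /* [in]  machine frame number */
        uint32_t flags;   /* [in]  IOMMU_MAP_OP_* bits */
        int32_t  status;  /* [out] 0 on success, e.g. -EPERM for a
                           *       Xen-owned MFN */
    };

    /* Hypothetical hypercall wrapper: subop, op array, unsigned count. */
    extern int HYPERVISOR_iommu_op(unsigned int subop,
                                   struct iommu_map_op *ops,
                                   unsigned int count);

    /* Map a batch and report the first per-op failure.  Returns 0 if
     * all ops succeeded, a negative errno (e.g. -EFAULT) if the
     * hypercall itself faulted, or the first failed op's status. */
    static int iommu_map_batch(struct iommu_map_op *ops,
                               unsigned int count)
    {
        int rc = HYPERVISOR_iommu_op(IOMMUOP_map_page, ops, count);
        unsigned int i;

        if (rc <= 0)
            return rc;  /* 0: all mapped; < 0: hypercall fault */

        /* rc ops failed: scan the array to see which ones and why. */
        for (i = 0; i < count; i++)
            if (ops[i].status < 0)
                return ops[i].status;
        return rc;
    }

A caller building the BFN == GPFN mapping would fill bfn with the
GPFN, fill mfn from its P2M lookup, and set both flag bits for an
ordinary DMA buffer.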

>> ------------------------------------------------------------------------
>>
>> IOMMUOP_unmap_page
>> ------------------
>> First argument, pointer to array of `struct iommu_map_op`
>> Second argument, integer count of `struct iommu_map_op` elements in
>> array

> Um, 'unsigned integer' count?

Yes, will change for draft B.

>> This subop will attempt to unmap each element in the `struct
>> iommu_map_op` array and record the mapping status back into the
>> array itself. If an unmapping fault occurs then the hypercall will
>> stop processing the array and return with -EFAULT.
>>
>> The IOMMU TLB will only be flushed when the hypercall completes or
>> a hypercall continuation is created.
>>
>>     struct iommu_map_op {
>>         uint64_t bfn;
>>         uint64_t mfn;
>>         uint32_t flags;
>>         int32_t status;
>>     };
>>
>> --------------------------------------------------------------------
>> Field       Purpose
>> -----       --------------------------------------------------------
>> `bfn`       [in] Bus address frame number to be unmapped

> I presume this is gathered from the 'map' call?

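Since only bfn and status matter for the unmap subop, the matching
caller sketch is short (reusing the same made-up wrapper and struct
from the earlier sketch, with another placeholder subop number):

    /* Placeholder subop number, not part of the draft. */
    #define IOMMUOP_unmap_page  2

    /* Unmap a single bfn: mfn and flags are ignored by this subop. */
    static int iommu_unmap_one(uint64_t bfn)
    {
        struct iommu_map_op op = { .bfn = bfn };
        int rc = HYPERVISOR_iommu_op(IOMMUOP_unmap_page, &op, 1);

        if (rc < 0)
            return rc;              /* hypercall fault, e.g. -EFAULT */
        return rc ? op.status : 0;  /* per-op status on failure */
    }
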
It does not need to be gathered; Domain 0 is responsible for its own
BFN mappings and there is no auditing of the BFN addresses
themselves.

>> `mfn`       [in] This field is ignored for unmap subop
>>
>> `flags`     [in] This field is ignored for unmap subop
>>
>> `status`    [out] Mapping status of this unmap operation, 0
>>             indicates success
>> --------------------------------------------------------------------
>>
>> Additional error codes specific to this hypercall:
>>
>> Error code  Reason
>> ----------  ------------------------------------------------------------
>> EPERM       PV IOMMU mode not enabled or calling domain is not
>>             domain 0

> EFAULT too

I was considering that EFAULT would be a standard error code; do you
want it to be explicitly listed?

>> ------------------------------------------------------------------------
>>
>>
>> Conditions for which PV IOMMU hypercalls succeed
>> ------------------------------------------------
>> All the following conditions are required to be true for PV IOMMU
>> hypercalls to succeed:
>>
>> 1. IOMMU detected and supported by Xen
>> 2. The following Xen IOMMU options are NOT enabled:
>>    dom0-passthrough, dom0-strict
>> 3. Domain 0 is making the hypercall
>>
>>
>> Security Implications of allowing Domain 0 IOMMU control
>> ========================================================
>>
>> Xen currently allows IO devices attached to Domain 0 to have direct
>> access to all of the MFN address space (except Xen hypervisor
>> memory regions), provided the Xen IOMMU option dom0-strict is not
>> enabled.
>>
>> The PV IOMMU feature provides the same level of access to the MFN
>> address space, and the feature is not enabled when the Xen IOMMU
>> option dom0-strict is enabled. Therefore security is not affected
>> by the PV IOMMU feature.

Thanks for your comments.