From: Lan Tianyu <tianyu.lan@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
xuquan8@huawei.com, Stefano Stabellini <sstabellini@kernel.org>,
Andrew Cooper <andrew.cooper3@citrix.com>,
"ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>,
Kevin Tian <kevin.tian@intel.com>,
Jun Nakajima <jun.nakajima@intel.com>,
"anthony.perard@citrix.com" <anthony.perard@citrix.com>,
xen-devel <xen-devel@lists.xenproject.org>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: Xen virtual IOMMU high level design doc
Date: Wed, 31 Aug 2016 16:39:24 +0800
Message-ID: <57C697BC.3090506@intel.com>
In-Reply-To: <57BEEE8F0200007800108EEE@prv-mh.provo.novell.com>
Hi Jan:
Sorry for the late response, and thanks a lot for your comments.
On 2016-08-25 19:11, Jan Beulich wrote:
>>>> On 17.08.16 at 14:05, <tianyu.lan@intel.com> wrote:
>> 1 Motivation for Xen vIOMMU
>> ===============================================================================
>> 1.1 Enable more than 255 vcpu support
>> HPC virtualization requires support for more than 255 vcpus in a single
>> VM to meet parallel computing requirements. Supporting more than 255
>> vcpus requires the interrupt remapping capability of the vIOMMU to
>> deliver interrupts to vcpus numbered above 255; otherwise a Linux guest
>> with >255 vcpus fails to boot.
>
> I continue to question this as a valid motivation at this point in
> time, for the reasons Andrew has been explaining.
If we want to support Linux guests with >255 vcpus, interrupt remapping
is necessary. The Linux commit introducing x2apic and IR mode states that
IR is a prerequisite for enabling x2apic mode in the CPU:
https://lwn.net/Articles/289881/
So far, we are not sure how other OSes behave. We may check Windows guest
behavior on KVM later, although there is currently still a bug when
running a Windows guest with the IR function on KVM.
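To make the limit concrete: without interrupt remapping, an MSI is written in the compatibility format, where the destination APIC ID occupies only address bits 19:12. The sketch below uses hypothetical helper names; the bit layout follows the compatibility MSI address format described in the Intel SDM and VT-d specification. It shows that an APIC ID above 255 simply cannot be encoded, and that masking it to 8 bits would target a different CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Compatibility-format MSI address (no interrupt remapping):
 *   bits 31:20 = 0xFEE (fixed)
 *   bits 19:12 = destination APIC ID (only 8 bits wide)
 *   bit  4     = 0 (compatibility format; 1 would mean remappable)
 * Hypothetical helpers, for illustration only. */
#define MSI_ADDR_BASE       0xFEE00000u
#define MSI_ADDR_DEST_SHIFT 12
#define MSI_ADDR_DEST_MASK  0xFFu

/* Returns 0 and fills *addr if apic_id fits in the 8-bit field, -1 otherwise. */
static int msi_compat_encode(uint32_t apic_id, uint32_t *addr)
{
    if (apic_id > MSI_ADDR_DEST_MASK)
        return -1;              /* cannot be expressed without interrupt remapping */
    *addr = MSI_ADDR_BASE | (apic_id << MSI_ADDR_DEST_SHIFT);
    return 0;
}

/* Extracts the 8-bit destination APIC ID from a compatibility-format address. */
static uint32_t msi_compat_dest(uint32_t addr)
{
    return (addr >> MSI_ADDR_DEST_SHIFT) & MSI_ADDR_DEST_MASK;
}
```

msi_compat_encode(256, &addr) fails, which is exactly the case interrupt remapping addresses: the remappable format replaces the destination field with an index into the interrupt remapping table, and the table entry carries a 32-bit x2APIC destination ID.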
>
>> 2. Xen vIOMMU Architecture
>> ================================================================================
>>
>> * vIOMMU will be inside Xen hypervisor for the following reasons:
>> 1) Avoid round trips between Qemu and Xen hypervisor
>> 2) Ease of integration with the rest of the hypervisor
>> 3) HVMlite/PVH doesn't use Qemu
>> * Dummy xen-vIOMMU in Qemu as a wrapper of the new hypercall to create
>> /destroy vIOMMU in hypervisor and deal with virtual PCI device's 2nd
>> level translation.
>
> How does the create/destroy part of this match up with 3) right
> ahead of it?
The create/destroy hypercalls will work for both HVM and HVMlite. For
HVMlite, we assume the toolstack (e.g. libxl) can call the new hypercalls
to create or destroy the virtual IOMMU in the hypervisor without Qemu.
>
>> 3 Xen hypervisor
>> ==========================================================================
>>
>> 3.1 New hypercall XEN_SYSCTL_viommu_op
>> 1) Definition of "struct xen_sysctl_viommu_op" as new hypercall parameter.
>>
>> struct xen_sysctl_viommu_op {
>> u32 cmd;
>> u32 domid;
>> union {
>> struct {
>> u32 capabilities;
>> } query_capabilities;
>> struct {
>> u32 capabilities;
>> u64 base_address;
>> } create_iommu;
>> struct {
>> u8 bus;
>> u8 devfn;
>
> Please can we avoid introducing any new interfaces without segment/
> domain value, even if for now it'll be always zero?
Sure. Will add a segment field.
>
>> u64 iova;
>> u64 translated_addr;
>> u64 addr_mask; /* Translation page size */
>> IOMMUAccessFlags permission;
>> } 2th_level_translation;
>
> I suppose "translated_addr" is an output here, but for the following
> fields this already isn't clear. Please add IN and OUT annotations for
> clarity.
>
> Also, may I suggest to name this "l2_translation"? (But there are
> other implementation specific things to be considered here, which
> I guess don't belong into a design doc discussion.)
How about this?
struct {
/* IN parameters. */
u16 segment;
u8 bus;
u8 devfn;
u64 iova;
/* OUT parameters. */
u64 translated_addr;
u64 addr_mask; /* Translation page size */
IOMMUAccessFlags permission;
} l2_translation;
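For reference, a minimal sketch of how the whole parameter structure might look with these comments folded in, written so that it actually compiles. Assumptions: u32/u64 are spelled as <stdint.h> types, the segment is widened to 16 bits to match the width of PCI segment group numbers, the IN/OUT annotations on query_capabilities and create_iommu are our reading of the design, and padding/layout is not a finalized ABI. Renaming the member also matters for compilation: 2th_level_translation is not a valid C identifier because identifiers cannot start with a digit.

```c
#include <assert.h>
#include <stdint.h>

/* Must be declared before it is used inside the union below. */
typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
} IOMMUAccessFlags;

struct xen_sysctl_viommu_op {
    uint32_t cmd;                     /* IN: XEN_SYSCTL_viommu_* subop */
    uint32_t domid;                   /* IN: target domain */
    union {                           /* anonymous union (C11) */
        struct {
            uint32_t capabilities;    /* OUT: supported capabilities */
        } query_capabilities;
        struct {
            uint32_t capabilities;    /* IN: capabilities to enable */
            uint64_t base_address;    /* IN: vIOMMU register base */
        } create_iommu;
        struct {
            /* IN parameters. */
            uint16_t segment;
            uint8_t  bus;
            uint8_t  devfn;
            uint64_t iova;
            /* OUT parameters. */
            uint64_t translated_addr;
            uint64_t addr_mask;       /* Translation page size */
            IOMMUAccessFlags permission;
        } l2_translation;
    };
};
```

Since the size of an enum is compiler-dependent, it may be worth making permission a fixed-width integer in the final public header.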
>
>> };
>>
>> typedef enum {
>> IOMMU_NONE = 0,
>> IOMMU_RO = 1,
>> IOMMU_WO = 2,
>> IOMMU_RW = 3,
>> } IOMMUAccessFlags;
>>
>>
>> Definition of VIOMMU subops:
>> #define XEN_SYSCTL_viommu_query_capability 0
>> #define XEN_SYSCTL_viommu_create 1
>> #define XEN_SYSCTL_viommu_destroy 2
>> #define XEN_SYSCTL_viommu_dma_translation_for_vpdev 3
>>
>> Definition of VIOMMU capabilities
>> #define XEN_VIOMMU_CAPABILITY_1nd_level_translation (1 << 0)
>> #define XEN_VIOMMU_CAPABILITY_2nd_level_translation (1 << 1)
>
> l1 and l2 respectively again, please.
Will update.
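A sketch of how a caller might use the renamed capability bits and the access flags. The l1/l2 macro names are hypothetical spellings following the review comment, and both helpers are illustrative. Because section 3.5 notes that the Linux Intel IOMMU driver will not load without 2nd level translation, the capability helper declines to create a vIOMMU that lacks it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical l1/l2 spellings of the capability bits. */
#define XEN_VIOMMU_CAPABILITY_l1_translation  (1u << 0)
#define XEN_VIOMMU_CAPABILITY_l2_translation  (1u << 1)

typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,   /* == IOMMU_RO | IOMMU_WO */
} IOMMUAccessFlags;

/* Choose the capabilities to pass to XEN_SYSCTL_viommu_create from those
 * reported by XEN_SYSCTL_viommu_query_capability.  Returns 0 (create
 * nothing) when l2 translation is unavailable, since a Linux guest's
 * Intel IOMMU driver would refuse such a vIOMMU anyway. */
static uint32_t viommu_choose_caps(uint32_t queried, uint32_t wanted)
{
    uint32_t caps = queried & wanted;

    if (!(caps & XEN_VIOMMU_CAPABILITY_l2_translation))
        return 0;
    return caps;
}

/* Check a DMA access against the permission returned by l2 translation. */
static bool viommu_access_ok(IOMMUAccessFlags perm, bool is_write)
{
    return (perm & (is_write ? IOMMU_WO : IOMMU_RO)) != 0;
}
```

Encoding IOMMU_RW as the bitwise OR of IOMMU_RO and IOMMU_WO is what lets the permission check be a single mask test.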
>
>> 3.3 Interrupt remapping
>> Interrupts from virtual devices and physical devices will be delivered
>> to vlapic from vIOAPIC and vMSI. It needs to add interrupt remapping
>> hooks in vmsi_deliver() and ioapic_deliver() to find the target vlapic
>> according to the interrupt remapping table. The following diagram shows the logic.
>
> Missing diagram or stale sentence?
Sorry, it's a stale sentence. The diagram has been moved to section 2.2,
Interrupt remapping overview.
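The lookup such a hook would perform for an MSI can be sketched as follows. vmsi_remap() and struct virte are hypothetical names. The address decoding follows the remappable MSI format in the Intel VT-d specification, while the interrupt remapping table entry is reduced to the three fields the delivery path needs (real IRTEs are 128 bits wide).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified virtual IRTE: only what a delivery hook needs. */
struct virte {
    bool     present;
    uint8_t  vector;
    uint32_t dest_id;   /* 32-bit x2APIC destination ID */
};

/* Remappable-format MSI address (VT-d):
 *   bit  4     = 1 (interrupt format: remappable)
 *   bit  3     = SHV (subhandle valid)
 *   bit  2     = handle[15]
 *   bits 19:5  = handle[14:0]
 * If SHV is set, data[15:0] is added to the handle to form the index.
 * Returns 0 and fills dest_id/vector on success, -1 otherwise. */
static int vmsi_remap(const struct virte *irt, size_t irt_entries,
                      uint32_t addr, uint32_t data,
                      uint32_t *dest_id, uint8_t *vector)
{
    uint32_t index;

    if (!(addr & (1u << 4)))
        return -1;                      /* compatibility format: not remapped */

    index = ((addr >> 5) & 0x7FFFu) | (((addr >> 2) & 1u) << 15);
    if (addr & (1u << 3))
        index += data & 0xFFFFu;        /* add subhandle */

    if (index >= irt_entries || !irt[index].present)
        return -1;                      /* a real vIOMMU would record a fault */

    *dest_id = irt[index].dest_id;
    *vector  = irt[index].vector;
    return 0;
}
```

The returned dest_id is what allows a vcpu with an ID above 255 to be targeted, since it is taken from the table entry rather than from the 8-bit field of the MSI address.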
>
>> 3.5 Implementation consideration
>> Linux Intel IOMMU driver will fail to be loaded without 2nd level
>> translation support even if interrupt remapping and 1st level
>> translation are available. This means 2nd level translation needs to
>> be enabled before the other functions.
>
> Is there a reason for this? I.e. do they unconditionally need that
> functionality?
Yes, the Linux Intel IOMMU driver unconditionally needs l2 translation.
While initializing the IOMMU data structure, the driver checks whether
the SAGAW (Supported Adjusted Guest Address Widths) field reports a valid
width, and returns an error if it does not.
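The check can be modelled roughly as below. This is a simplified sketch modelled on the Linux logic as we understand it (cap_sagaw() and iommu_calculate_agaw() in the intel-iommu driver), not a copy of it, and the function names here are illustrative. It shows why a vIOMMU whose capability register advertises no SAGAW bits, i.e. no usable l2 page-table format, is rejected: the AGAW calculation returns -1.

```c
#include <assert.h>
#include <stdint.h>

/* VT-d Capability Register: SAGAW occupies bits 12:8.
 *   SAGAW bit 1 -> 39-bit AGAW (3-level page table)
 *   SAGAW bit 2 -> 48-bit AGAW (4-level page table)
 *   SAGAW bit 3 -> 57-bit AGAW (5-level page table) */
#define CAP_SAGAW(cap)  ((unsigned)(((cap) >> 8) & 0x1f))

/* Convert an address width to an AGAW index, rounding up: 39 -> 1, 48 -> 2, 57 -> 3. */
static int width_to_agaw(int width)
{
    return (width - 30 + 8) / 9;
}

/* Pick the largest supported AGAW not exceeding max_width, or -1 if SAGAW
 * advertises nothing usable (the case in which the driver gives up). */
static int calculate_agaw(uint64_t cap, int max_width)
{
    unsigned sagaw = CAP_SAGAW(cap);
    int agaw;

    for (agaw = width_to_agaw(max_width); agaw >= 0; agaw--)
        if (sagaw & (1u << agaw))
            break;
    return agaw;    /* -1 when the loop finds no supported width */
}
```

For the vIOMMU this means the emulated capability register must set at least one SAGAW bit, backed by working l2 translation, before a Linux guest will use interrupt remapping at all.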
--
Best regards
Tianyu Lan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel