public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* Patches for vtd pci passthrough
@ 2008-06-17 19:05 benami
  2008-06-17 19:05 ` [PATCH] KVM: PCIPT: fix compilation errors benami
  0 siblings, 1 reply; 22+ messages in thread
From: benami @ 2008-06-17 19:05 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm

Amit,

I gathered the last few patches that Randy sent and that I had for the
vtd branch of the PCI passthrough tree.
With these patches passthrough with VT-d seems to work.
We tested this with an e1000 NIC.

Comments appreciated.

Thanks,
Ben



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] KVM: PCIPT: fix compilation errors
  2008-06-17 19:05 Patches for vtd pci passthrough benami
@ 2008-06-17 19:05 ` benami
  2008-06-17 19:05   ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap benami
  0 siblings, 1 reply; 22+ messages in thread
From: benami @ 2008-06-17 19:05 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm, Ben-Ami Yassour

From: Han Weidong <weidong.han@intel.com>

This patch applies to the PCIPT vtd branch

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/x86.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3c5b11..89d9cda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -307,9 +307,10 @@ static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
 	}
 
 	if (kvm_intel_iommu_found()) {
-		r = kvm_iommu_map_guest(kvm, &pci_pt_dev);
+		r = kvm_iommu_map_guest(kvm, pci_pt_dev);
 		if (r)
-			goto out;
+			goto out_free;
+	}
 
 	write_lock_irqsave(&kvm_pci_pt_lock, flags);
 
@@ -1968,7 +1969,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_pci_pt_dev(kvm, &pci_pt_dev);
 		if (r)
 			goto out;
-		}
 		break;
 	}
 	case KVM_GET_PIT: {
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-17 19:05 ` [PATCH] KVM: PCIPT: fix compilation errors benami
@ 2008-06-17 19:05   ` benami
  2008-06-17 19:05     ` [PATCH] KVM: PCIPT: free device structure on vm destroy benami
  2008-06-17 21:29     ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap Anthony Liguori
  0 siblings, 2 replies; 22+ messages in thread
From: benami @ 2008-06-17 19:05 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm, Ben-Ami Yassour

From: Han Weidong <weidong.han@intel.com>

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/vtd.c        |    2 +-
 drivers/pci/intel-iommu.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index c2d9192..be775cd 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -157,7 +157,7 @@ int kvm_iommu_unmap_guest(struct kvm *kvm)
 	struct kvm_pci_pt_dev_list *entry;
 	struct pci_dev *pdev = NULL;
 
- 	if (kvm->arch.domain)
+	if (!kvm->arch.domain)
 		return 0;
 
 	list_for_each_entry(entry, &kvm->arch.pci_pt_dev_head, list) {
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index d927a13..930874f 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2483,7 +2483,7 @@ EXPORT_SYMBOL_GPL(kvm_intel_iommu_page_mapping);
 
 void kvm_intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
 {
-	kvm_intel_iommu_detach_dev(domain, bus, devfn);
+	detach_domain_for_dev(domain, bus, devfn);
 }
 EXPORT_SYMBOL_GPL(kvm_intel_iommu_detach_dev);
 
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] KVM: PCIPT: free device structure on vm destroy
  2008-06-17 19:05   ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap benami
@ 2008-06-17 19:05     ` benami
  2008-06-17 19:05       ` [PATCH] KVM: PCIPT: VT-d: fix context mapping benami
  2008-06-17 21:29     ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap Anthony Liguori
  1 sibling, 1 reply; 22+ messages in thread
From: benami @ 2008-06-17 19:05 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm, Ben-Ami Yassour

From: Han Weidong <weidong.han@intel.com>

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 arch/x86/kvm/x86.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 89d9cda..896ae5c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -358,6 +358,7 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
 			free_irq(pci_pt_dev->pt_dev.host.irq, kvm);
 
 		list_del(&pci_pt_dev->list);
+		kfree(pci_pt_dev);
 	}
 	write_unlock_irqrestore(&kvm_pci_pt_lock, flags);
 
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-17 19:05     ` [PATCH] KVM: PCIPT: free device structure on vm destroy benami
@ 2008-06-17 19:05       ` benami
  2008-06-17 19:25         ` KVM: PCIPT: temporary fix for pio (userspace part) benami
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: benami @ 2008-06-17 19:05 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm, Ben-Ami Yassour

From: Ben-Ami Yassour <benami@il.ibm.com>

When changing the VT-d context mapping, according to the spec, it is required
to first set the context to not present, flush and only then apply the
new context.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 drivers/pci/intel-iommu.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 930874f..dcdfa97 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -56,6 +56,7 @@
 
 
 static void flush_unmaps_timeout(unsigned long data);
+static void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
 
 DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
 
@@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	if (!context)
 		return -ENOMEM;
 	spin_lock_irqsave(&iommu->lock, flags);
+	
+	if (context_present(*context) &&
+	    (context_domain_id(*context) == domain->id) &&
+	    (context_address_width(*context) == domain->agaw) &&
+	    (context_address_root(*context) == virt_to_phys(domain->pgd)) &&
+	    (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL) &&
+	    (!context_fault_disable(*context))) {
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		return 0;
+	}
+
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	detach_domain_for_dev(domain, bus, devfn);
 
+	spin_lock_irqsave(&iommu->lock, flags);
+	
 	context_clear_entry(*context);
 	context_set_domain_id(*context, domain->id);
 	context_set_address_width(*context, domain->agaw);
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* KVM: PCIPT: temporary fix for pio (userspace part)
  2008-06-17 19:05       ` [PATCH] KVM: PCIPT: VT-d: fix context mapping benami
@ 2008-06-17 19:25         ` benami
  2008-06-17 19:25           ` [PATCH] " benami
  2008-06-18 20:30         ` [PATCH] KVM: PCIPT: VT-d: fix context mapping Muli Ben-Yehuda
  2008-06-19  8:59         ` Han, Weidong
  2 siblings, 1 reply; 22+ messages in thread
From: benami @ 2008-06-17 19:25 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm

PIO for passthrough does not work properly since the permissions are not
set, it is not enough to call iopl at init time.

This temporary patch calls iopl for every pio request.
A better solution would be have a per thread iopl.
An alternative approach would be to use sysfs for pio.
In the mean time, if you want to use the vtd branch, then you probably
need this patch.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] KVM: PCIPT: temporary fix for pio (userspace part)
  2008-06-17 19:25         ` KVM: PCIPT: temporary fix for pio (userspace part) benami
@ 2008-06-17 19:25           ` benami
  0 siblings, 0 replies; 22+ messages in thread
From: benami @ 2008-06-17 19:25 UTC (permalink / raw)
  To: amit.shah; +Cc: weidong.han, muli, raharper, kvm, Ben-Ami Yassour

From: Ben-Ami Yassour <benami@il.ibm.com>

PIO for passthrough does not work properly since the permissions are not
set, it is not enough to call iopl at init time.

This temporary patch calls iopl for every pio request.
A better solution would be have a per thread iopl.
An alternative approach would be to use sysfs for pio.
In the mean time, if you want to use the vtd branch, then you probably
need this patch.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
---
 qemu/hw/pci-passthrough.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/pci-passthrough.c b/qemu/hw/pci-passthrough.c
index 2d916a2..b091a85 100644
--- a/qemu/hw/pci-passthrough.c
+++ b/qemu/hw/pci-passthrough.c
@@ -117,6 +117,7 @@ static void pt_ioport_write##suffix(void *opaque, uint32_t addr, uint32_t value)
 			r_pio, (int)r_access->e_physbase,		\
 			(unsigned long)r_access->r_virtbase, value);	\
 	}								\
+	iopl(3);                                                        \
 	out##suffix(value, r_pio);					\
 }
 
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-17 19:05   ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap benami
  2008-06-17 19:05     ` [PATCH] KVM: PCIPT: free device structure on vm destroy benami
@ 2008-06-17 21:29     ` Anthony Liguori
  2008-06-18 12:06       ` Ben-Ami Yassour
  2008-06-20 18:59       ` Avi Kivity
  1 sibling, 2 replies; 22+ messages in thread
From: Anthony Liguori @ 2008-06-17 21:29 UTC (permalink / raw)
  To: benami; +Cc: amit.shah, weidong.han, muli, raharper, kvm

I think the current VT-d code needs some reworking.

We should build the table as the shadow page table gets built.  We 
should suppress iotlb flushes unless the table is actually being updated.

Regards,

Anthony Liguori

benami@il.ibm.com wrote:
> From: Han Weidong <weidong.han@intel.com>
>
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
>  arch/x86/kvm/vtd.c        |    2 +-
>  drivers/pci/intel-iommu.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> index c2d9192..be775cd 100644
> --- a/arch/x86/kvm/vtd.c
> +++ b/arch/x86/kvm/vtd.c
> @@ -157,7 +157,7 @@ int kvm_iommu_unmap_guest(struct kvm *kvm)
>  	struct kvm_pci_pt_dev_list *entry;
>  	struct pci_dev *pdev = NULL;
>  
> - 	if (kvm->arch.domain)
> +	if (!kvm->arch.domain)
>  		return 0;
>  
>  	list_for_each_entry(entry, &kvm->arch.pci_pt_dev_head, list) {
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index d927a13..930874f 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -2483,7 +2483,7 @@ EXPORT_SYMBOL_GPL(kvm_intel_iommu_page_mapping);
>  
>  void kvm_intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
>  {
> -	kvm_intel_iommu_detach_dev(domain, bus, devfn);
> +	detach_domain_for_dev(domain, bus, devfn);
>  }
>  EXPORT_SYMBOL_GPL(kvm_intel_iommu_detach_dev);
>  
>   


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-17 21:29     ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap Anthony Liguori
@ 2008-06-18 12:06       ` Ben-Ami Yassour
  2008-06-18 20:48         ` Anthony Liguori
  2008-06-20 18:59       ` Avi Kivity
  1 sibling, 1 reply; 22+ messages in thread
From: Ben-Ami Yassour @ 2008-06-18 12:06 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: amit.shah, weidong.han, Muli Ben-Yehuda, raharper, kvm

On Tue, 2008-06-17 at 16:29 -0500, Anthony Liguori wrote:
> I think the current VT-d code needs some reworking.
> 
> We should build the table as the shadow page table gets built.  We 
> should suppress iotlb flushes unless the table is actually being updated.
> 

I'm not sure what you mean.
The current implementation of vtd for passthrough is a direct map, which
means that we map the entire guest memory (and pin it).
In this case there are no iotlb flushes after the first initialization.

Obviously, pinning the entire guest is not desirable since we waste a
lot of memory resources, but this is the approach that we currently
have. Do you find it good enough for a merge with the main KVM tree, and
optimize later?

When you mentioned building a table as the shadow page table, did you
mean that we should map the IOMMU on demand?
I'm not sure how we can do that... the guest can send a guest physical
address to the device for DMA, even without generating a page-fault on
the host for that address... which implies that the host must pin the
entire guest memory in advance. agree?

The only way I can think of avoiding that is PVDMA with VT-d, which
means that there is a hyper call for each DMA request, but this is a
different solution, cause it only applies to PV guests.

Do you see a way to avoid mapping (and pinning) the entire guest memory
for fully virtual guests (and without parsing every transaction between
the guest and the device to figure out the DMA addresses)?

Regards,
Ben

> Regards,
> 
> Anthony Liguori
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-17 19:05       ` [PATCH] KVM: PCIPT: VT-d: fix context mapping benami
  2008-06-17 19:25         ` KVM: PCIPT: temporary fix for pio (userspace part) benami
@ 2008-06-18 20:30         ` Muli Ben-Yehuda
  2008-06-19  8:59         ` Han, Weidong
  2 siblings, 0 replies; 22+ messages in thread
From: Muli Ben-Yehuda @ 2008-06-18 20:30 UTC (permalink / raw)
  To: Ben-Ami Yassour1; +Cc: amit.shah, weidong.han, raharper, kvm

On Tue, Jun 17, 2008 at 10:05:26PM +0300, Ben-Ami Yassour1 wrote:

> From: Ben-Ami Yassour <benami@il.ibm.com>
> 
> When changing the VT-d context mapping, according to the spec, it is
> required to first set the context to not present, flush and only
> then apply the new context.
> 
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
>  1 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 930874f..dcdfa97 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -56,6 +56,7 @@
>  
>  
>  static void flush_unmaps_timeout(unsigned long data);
> +static void detach_domain_for_dev(struct dmar_domain *domain, u8 bus, u8 devfn);
>  
>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
>  
> @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
>  	if (!context)
>  		return -ENOMEM;
>  	spin_lock_irqsave(&iommu->lock, flags);
> +	
> +	if (context_present(*context) &&
> +	    (context_domain_id(*context) == domain->id) &&
> +	    (context_address_width(*context) == domain->agaw) &&
> +	    (context_address_root(*context) == virt_to_phys(domain->pgd)) &&
> +	    (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL) &&
> +	    (!context_fault_disable(*context))) {
> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		return 0;
> +	}

Can we wrap this in a descriptively named function, which will also
and release the lock? I think it will make the flow more obvious here.

Cheers,
Muli

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-18 12:06       ` Ben-Ami Yassour
@ 2008-06-18 20:48         ` Anthony Liguori
  2008-06-18 21:23           ` Muli Ben-Yehuda
  0 siblings, 1 reply; 22+ messages in thread
From: Anthony Liguori @ 2008-06-18 20:48 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: amit.shah, weidong.han, Muli Ben-Yehuda, raharper, kvm

Ben-Ami Yassour wrote:
> On Tue, 2008-06-17 at 16:29 -0500, Anthony Liguori wrote:
>   
>> I think the current VT-d code needs some reworking.
>>
>> We should build the table as the shadow page table gets built.  We 
>> should suppress iotlb flushes unless the table is actually being updated.
>>
>>     
>
> I'm not sure what you mean.
> The current implementation of vtd for passthrough is a direct map, which
> means that we map the entire guest memory (and pin it).
> In this case there are no iotlb flushes after the first initialization.
>   

Right.  But this is not ideal.  Instead of pinning up-front, it would 
make more sense IMHO to build the VT-d table as the shadow page table 
gets faulted in.  In certain circumstances, this will result in 
extraneous updates (because a GPA=>HPA mapping is already present) and 
that's where we should eliminate iotlb flushes.

For now, we should basically do this for all of physical memory but we 
should have the right infrastructure such that we can be more clever 
once we have a PVDMA API.

> Obviously, pinning the entire guest is not desirable since we waste a
> lot of memory resources, but this is the approach that we currently
> have. Do you find it good enough for a merge with the main KVM tree, and
> optimize later?
>   

No, it's not safe.  What happens mmap(MAP_FIXED) into phys_ram_base?  We 
need to use MMU notifiers to handle such events and appropriately flush 
the iotlb. 

> When you mentioned building a table as the shadow page table, did you
> mean that we should map the IOMMU on demand?
>   

Yes, but in the absence of a PV guest, there's a very special case where 
we pre-fault the entire table.

> I'm not sure how we can do that... the guest can send a guest physical
> address to the device for DMA, even without generating a page-fault on
> the host for that address... which implies that the host must pin the
> entire guest memory in advance. agree?
>   

See above.  Ideally we would wait until the first PCI config space 
access for a device before special casing the guest.  Otherwise, there's 
no way to allow a DMA-aware guest to avoid pinning up front.

> The only way I can think of avoiding that is PVDMA with VT-d, which
> means that there is a hyper call for each DMA request, but this is a
> different solution, cause it only applies to PV guests.
>   

It doesn't strictly require a hypercall, but yes, that's the general 
solution.

> Do you see a way to avoid mapping (and pinning) the entire guest memory
> for fully virtual guests (and without parsing every transaction between
> the guest and the device to figure out the DMA addresses)?
>   

The key is to support both cases with the same infrastructure.  The 
unmodified guest should just be a special case.

Regards,

Anthony Liguori

> Regards,
> Ben
>
>   
>> Regards,
>>
>> Anthony Liguori
>>
>>     
>
>
>   


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-18 20:48         ` Anthony Liguori
@ 2008-06-18 21:23           ` Muli Ben-Yehuda
  2008-06-18 21:41             ` Anthony Liguori
  0 siblings, 1 reply; 22+ messages in thread
From: Muli Ben-Yehuda @ 2008-06-18 21:23 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Ben-Ami Yassour1, amit.shah, weidong.han, raharper, kvm

On Wed, Jun 18, 2008 at 03:48:33PM -0500, Anthony Liguori wrote:

> Right.  But this is not ideal.  Instead of pinning up-front, it
> would make more sense IMHO to build the VT-d table as the shadow
> page table gets faulted in.  In certain circumstances, this will
> result in extraneous updates (because a GPA=>HPA mapping is already
> present) and that's where we should eliminate iotlb flushes.

As Ben wrote, we can't do this and must fault everything in up-front
(assuming no PVDMA API). Assume we don't do this: it is valid for the
guest to program the device with a GPA that does not yet have a
corresponding HPA (because the guest did not write or read to/from it
and thus we haven't yet faulted in a frame for it). Then, once the
device DMA's to it, the DMA will be stopped incorrectly.

>> Obviously, pinning the entire guest is not desirable since we waste
>> a lot of memory resources, but this is the approach that we
>> currently have. Do you find it good enough for a merge with the
>> main KVM tree, and optimize later?
>
> No, it's not safe.  What happens mmap(MAP_FIXED) into phys_ram_base?
> We need to use MMU notifiers to handle such events and appropriately
> flush the iotlb.

Could you elaborate on what you mean here and what is not safe? Our
current approach is to just fault in all of guest memory---are you
concerned about a case where some of the guest frames get replaced by
other frames because of the mmap()? 

I'd like to stress that we are shooting at the moment for the simplest
possible solution that is good enough, so that we'll be able to
finally merge this into the tree...

>> I'm not sure how we can do that... the guest can send a guest
>> physical address to the device for DMA, even without generating a
>> page-fault on the host for that address... which implies that the
>> host must pin the entire guest memory in advance. agree?
>
> See above.  Ideally we would wait until the first PCI config space
> access for a device before special casing the guest.  Otherwise,
> there's no way to allow a DMA-aware guest to avoid pinning up front.

Err, if the user gave the guest pass-through access to a PCI device,
presumably it is because the guest will use it... What do we win by
delaying the inevitable?

Cheers,
Muli

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-18 21:23           ` Muli Ben-Yehuda
@ 2008-06-18 21:41             ` Anthony Liguori
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Liguori @ 2008-06-18 21:41 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: Ben-Ami Yassour1, amit.shah, weidong.han, raharper, kvm

Muli Ben-Yehuda wrote:
> On Wed, Jun 18, 2008 at 03:48:33PM -0500, Anthony Liguori wrote:
>
>   
>> Right.  But this is not ideal.  Instead of pinning up-front, it
>> would make more sense IMHO to build the VT-d table as the shadow
>> page table gets faulted in.  In certain circumstances, this will
>> result in extraneous updates (because a GPA=>HPA mapping is already
>> present) and that's where we should eliminate iotlb flushes.
>>     
>
> As Ben wrote, we can't do this and must fault everything in up-front
> (assuming no PVDMA API). Assume we don't do this: it is valid for the
> guest to program the device with a GPA that does not yet have a
> corresponding HPA (because the guest did not write or read to/from it
> and thus we haven't yet faulted in a frame for it). Then, once the
> device DMA's to it, the DMA will be stopped incorrectly.
>   

As I've said, the lack of PVDMA API is a special case.  The key is to 
use the same internal infrastructure.

>>> Obviously, pinning the entire guest is not desirable since we waste
>>> a lot of memory resources, but this is the approach that we
>>> currently have. Do you find it good enough for a merge with the
>>> main KVM tree, and optimize later?
>>>       
>> No, it's not safe.  What happens mmap(MAP_FIXED) into phys_ram_base?
>> We need to use MMU notifiers to handle such events and appropriately
>> flush the iotlb.
>>     
>
> Could you elaborate on what you mean here and what is not safe? Our
> current approach is to just fault in all of guest memory---are you
> concerned about a case where some of the guest frames get replaced by
> other frames because of the mmap()? 
>   

Because the guest is now accessing memory that is not guest memory.  
When mmu-notifiers forcefully change a mapping, we need to react to it.

> I'd like to stress that we are shooting at the moment for the simplest
> possible solution that is good enough, so that we'll be able to
> finally merge this into the tree...
>   

I don't think what I'm suggesting is more code than the current 
implementation and it fits more cleanly into KVM.

>>> I'm not sure how we can do that... the guest can send a guest
>>> physical address to the device for DMA, even without generating a
>>> page-fault on the host for that address... which implies that the
>>> host must pin the entire guest memory in advance. agree?
>>>       
>> See above.  Ideally we would wait until the first PCI config space
>> access for a device before special casing the guest.  Otherwise,
>> there's no way to allow a DMA-aware guest to avoid pinning up front.
>>     
>
> Err, if the user gave the guest pass-through access to a PCI device,
> presumably it is because the guest will use it... What do we win by
> delaying the inevitable?
>   

s/DMA-aware/PVDMA-aware/

You do not know if a guest is PVDMA-aware until the guest tells you 
so.   If you pin all of memory before the guest starts running, you may 
not have needed to allocate all of that memory.  As we move to 
cooperative memory management between the host and guest, I expect the 
normal circumstance will be to launch a guest with far more memory than 
it needs relying on the fact that the guest will not touch that memory.  
Pinning memory unconditionally defeats this.

In terms of merging, I don't think it's going to be reasonable to merge 
for 2.6.27 so there's not much of an argument for not doing it correctly.

Regards,

Anthony Liguori

> Cheers,
> Muli
>   


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-17 19:05       ` [PATCH] KVM: PCIPT: VT-d: fix context mapping benami
  2008-06-17 19:25         ` KVM: PCIPT: temporary fix for pio (userspace part) benami
  2008-06-18 20:30         ` [PATCH] KVM: PCIPT: VT-d: fix context mapping Muli Ben-Yehuda
@ 2008-06-19  8:59         ` Han, Weidong
  2008-06-19 12:28           ` Ben-Ami Yassour
  2 siblings, 1 reply; 22+ messages in thread
From: Han, Weidong @ 2008-06-19  8:59 UTC (permalink / raw)
  To: benami, amit.shah; +Cc: muli, raharper, kvm

benami@il.ibm.com wrote:
> From: Ben-Ami Yassour <benami@il.ibm.com>
> 
> When changing the VT-d context mapping, according to the spec, it is
> required 
> to first set the context to not present, flush and only then apply the
> new context.
> 
> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> ---
>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
>  1 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 930874f..dcdfa97 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -56,6 +56,7 @@
> 
> 
>  static void flush_unmaps_timeout(unsigned long data);
> +static void detach_domain_for_dev(struct dmar_domain *domain, u8
> bus, u8 devfn); 
> 
>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
> 
> @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct
>  	dmar_domain *domain, if (!context)
>  		return -ENOMEM;
>  	spin_lock_irqsave(&iommu->lock, flags);
> +
> +	if (context_present(*context) &&
> +	    (context_domain_id(*context) == domain->id) &&
> +	    (context_address_width(*context) == domain->agaw) &&
> +	    (context_address_root(*context) ==
virt_to_phys(domain->pgd)) &&
> +	    (context_translation_type(*context) ==
CONTEXT_TT_MULTI_LEVEL)
> && +	    (!context_fault_disable(*context))) {
> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		return 0;
> +	}

Only need to check context_present(*context) according to VT-d spec,
which says "software must not modify fields other than the Present (P)
field of currently present root-entries or context-entries."

Randy (Weidong)

> +
> +	spin_unlock_irqrestore(&iommu->lock, flags);
> +
> +	detach_domain_for_dev(domain, bus, devfn);
> 
> +	spin_lock_irqsave(&iommu->lock, flags);
> +
>  	context_clear_entry(*context);
>  	context_set_domain_id(*context, domain->id);
>  	context_set_address_width(*context, domain->agaw);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-19  8:59         ` Han, Weidong
@ 2008-06-19 12:28           ` Ben-Ami Yassour
  2008-06-19 14:18             ` Han, Weidong
  0 siblings, 1 reply; 22+ messages in thread
From: Ben-Ami Yassour @ 2008-06-19 12:28 UTC (permalink / raw)
  To: Han, Weidong; +Cc: amit.shah, Muli Ben-Yehuda, raharper, kvm

On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
> benami@il.ibm.com wrote:
> > From: Ben-Ami Yassour <benami@il.ibm.com>
> > 
> > When changing the VT-d context mapping, according to the spec, it is
> > required 
> > to first set the context to not present, flush and only then apply the
> > new context.
> > 
> > Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> > ---
> >  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
> >  1 files changed, 17 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index 930874f..dcdfa97 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -56,6 +56,7 @@
> > 
> > 
> >  static void flush_unmaps_timeout(unsigned long data);
> > +static void detach_domain_for_dev(struct dmar_domain *domain, u8
> > bus, u8 devfn); 
> > 
> >  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
> > 
> > @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct
> >  	dmar_domain *domain, if (!context)
> >  		return -ENOMEM;
> >  	spin_lock_irqsave(&iommu->lock, flags);
> > +
> > +	if (context_present(*context) &&
> > +	    (context_domain_id(*context) == domain->id) &&
> > +	    (context_address_width(*context) == domain->agaw) &&
> > +	    (context_address_root(*context) ==
> virt_to_phys(domain->pgd)) &&
> > +	    (context_translation_type(*context) ==
> CONTEXT_TT_MULTI_LEVEL)
> > && +	    (!context_fault_disable(*context))) {
> > +		spin_unlock_irqrestore(&iommu->lock, flags);
> > +		return 0;
> > +	}
> 
> Only need to check context_present(*context) according to VT-d spec,
> which says "software must not modify fields other than the Present (P)
> field of currently present root-entries or context-entries."
> 
> Randy (Weidong)

The logic here is that, if no change is made to the context then just
return ok (0). Otherwise, according to the spec, we need to first
invalidate the context, flush it, and only then apply the changes to the
context.

The other option is to make sure that before every call to this function
the context is invalidated, but disabling it inside the function seems
safer. do you agree with that?

Regards,
Ben

> 
> > +
> > +	spin_unlock_irqrestore(&iommu->lock, flags);
> > +
> > +	detach_domain_for_dev(domain, bus, devfn);
> > 
> > +	spin_lock_irqsave(&iommu->lock, flags);
> > +
> >  	context_clear_entry(*context);
> >  	context_set_domain_id(*context, domain->id);
> >  	context_set_address_width(*context, domain->agaw);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-19 12:28           ` Ben-Ami Yassour
@ 2008-06-19 14:18             ` Han, Weidong
  2008-06-19 17:44               ` Ben-Ami Yassour1
  0 siblings, 1 reply; 22+ messages in thread
From: Han, Weidong @ 2008-06-19 14:18 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: amit.shah, Muli Ben-Yehuda, raharper, kvm

Ben-Ami Yassour wrote:
> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
>> benami@il.ibm.com wrote:
>>> From: Ben-Ami Yassour <benami@il.ibm.com>
>>> 
>>> When changing the VT-d context mapping, according to the spec, it
>>> is required to first set the context to not present, flush and only
>>> then apply the new context. 
>>> 
>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>> ---
>>>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
>>>  1 files changed, 17 insertions(+), 0 deletions(-)
>>> 
>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>>> index 930874f..dcdfa97 100644 --- a/drivers/pci/intel-iommu.c
>>> +++ b/drivers/pci/intel-iommu.c
>>> @@ -56,6 +56,7 @@
>>> 
>>> 
>>>  static void flush_unmaps_timeout(unsigned long data);
>>> +static void detach_domain_for_dev(struct dmar_domain *domain, u8
>>> bus, u8 devfn); 
>>> 
>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
>>> 
>>> @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct
>>>  	dmar_domain *domain, if (!context)
>>>  		return -ENOMEM;
>>>  	spin_lock_irqsave(&iommu->lock, flags);
>>> +
>>> +	if (context_present(*context) &&
>>> +	    (context_domain_id(*context) == domain->id) &&
>>> +	    (context_address_width(*context) == domain->agaw) &&
>>> +	    (context_address_root(*context) == virt_to_phys(domain->pgd))
>>> && +	    (context_translation_type(*context) ==
>>> CONTEXT_TT_MULTI_LEVEL) && +	   
>>> (!context_fault_disable(*context))) {
>>> +		spin_unlock_irqrestore(&iommu->lock, flags); +		return 0;
>>> +	}
>> 
>> Only need to check context_present(*context) according to VT-d spec,
>> which says "software must not modify fields other than the Present
>> (P) field of currently present root-entries or context-entries."
>> 
>> Randy (Weidong)
> 
> The logic here is that, if no change is made to the context then just
> return ok (0). Otherwise, according to the spec, we need to first
> invalidate the context, flush it, and only then apply the changes to
> the context.
> 
> The other option is to make sure that before every call to this
> function the context is invalidated, but disabling it inside the
> function seems safer. do you agree with that?
> 

After a device can be assigned to guest with VT-d, it needs a context unmap function. When a device is assigned to a guest, map context for it, while when it is detached from a guest, should unmap its context. With the context unmap function, I think we needn't to implement your logic in domain_context_mapping_one(), instead just check its present. What's your opinion?

Randy (Weidong)

> 
>> 
>>> +
>>> +	spin_unlock_irqrestore(&iommu->lock, flags);
>>> +
>>> +	detach_domain_for_dev(domain, bus, devfn);
>>> 
>>> +	spin_lock_irqsave(&iommu->lock, flags);
>>> +
>>>  	context_clear_entry(*context);
>>>  	context_set_domain_id(*context, domain->id);
>>>  	context_set_address_width(*context, domain->agaw);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-19 14:18             ` Han, Weidong
@ 2008-06-19 17:44               ` Ben-Ami Yassour1
  2008-06-20  6:23                 ` Han, Weidong
  0 siblings, 1 reply; 22+ messages in thread
From: Ben-Ami Yassour1 @ 2008-06-19 17:44 UTC (permalink / raw)
  To: Han, Weidong; +Cc: amit.shah, kvm, Muli Ben-Yehuda, raharper



"Han, Weidong" <weidong.han@intel.com> wrote on 19/06/2008 17:18:00:

> Ben-Ami Yassour wrote:
> > On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
> >> benami@il.ibm.com wrote:
> >>> From: Ben-Ami Yassour <benami@il.ibm.com>
> >>>
> >>> When changing the VT-d context mapping, according to the spec, it
> >>> is required to first set the context to not present, flush and only
> >>> then apply the new context.
> >>>
> >>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> >>> ---
> >>>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
> >>>  1 files changed, 17 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> >>> index 930874f..dcdfa97 100644 --- a/drivers/pci/intel-iommu.c
> >>> +++ b/drivers/pci/intel-iommu.c
> >>> @@ -56,6 +56,7 @@
> >>>
> >>>
> >>>  static void flush_unmaps_timeout(unsigned long data);
> >>> +static void detach_domain_for_dev(struct dmar_domain *domain, u8
> >>> bus, u8 devfn);
> >>>
> >>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
> >>>
> >>> @@ -1264,7 +1265,23 @@ static int domain_context_mapping_one(struct
> >>>     dmar_domain *domain, if (!context)
> >>>        return -ENOMEM;
> >>>     spin_lock_irqsave(&iommu->lock, flags);
> >>> +
> >>> +   if (context_present(*context) &&
> >>> +       (context_domain_id(*context) == domain->id) &&
> >>> +       (context_address_width(*context) == domain->agaw) &&
> >>> +       (context_address_root(*context) == virt_to_phys(domain->pgd))
> >>> && +       (context_translation_type(*context) ==
> >>> CONTEXT_TT_MULTI_LEVEL) && +
> >>> (!context_fault_disable(*context))) {
> >>> +      spin_unlock_irqrestore(&iommu->lock, flags); +      return 0;
> >>> +   }
> >>
> >> Only need to check context_present(*context) according to VT-d spec,
> >> which says "software must not modify fields other than the Present
> >> (P) field of currently present root-entries or context-entries."
> >>
> >> Randy (Weidong)
> >
> > The logic here is that, if no change is made to the context then just
> > return ok (0). Otherwise, according to the spec, we need to first
> > invalidate the context, flush it, and only then apply the changes to
> > the context.
> >
> > The other option is to make sure that before every call to this
> > function the context is invalidated, but disabling it inside the
> > function seems safer. do you agree with that?
> >
>
> After a device can be assigned to guest with VT-d, it needs a
> context unmap function. When a device is assigned to a guest, map
> context for it, while when it is detached from a guest, should unmap
> its context. With the context unmap function, I think we needn't to
> implement your logic in domain_context_mapping_one(), instead just
> check its present. What's your opinion?

Sure, that's fine, this is the other option I mentioned.
But we need to add the context unmap function.
Something like:
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index be775cd..4b96fbb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -109,11 +109,17 @@ found:
                kvm_iommu_unmap_memslots(kvm);
                return -EFAULT;
        }
+
+       kvm_intel_iommu_detach_dev(kvm->arch.domain,
+                                  pdev->bus->number, pdev->devfn);
+
        kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
        return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);

Agree?

Thanks,
Ben


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-19 17:44               ` Ben-Ami Yassour1
@ 2008-06-20  6:23                 ` Han, Weidong
  2008-06-30 15:32                   ` Ben-Ami Yassour
  0 siblings, 1 reply; 22+ messages in thread
From: Han, Weidong @ 2008-06-20  6:23 UTC (permalink / raw)
  To: Ben-Ami Yassour1; +Cc: amit.shah, kvm, Muli Ben-Yehuda, raharper

Ben-Ami Yassour1 wrote:
> "Han, Weidong" <weidong.han@intel.com> wrote on 19/06/2008 17:18:00:
> 
>> Ben-Ami Yassour wrote:
>>> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
>>>> benami@il.ibm.com wrote:
>>>>> From: Ben-Ami Yassour <benami@il.ibm.com>
>>>>> 
>>>>> When changing the VT-d context mapping, according to the spec, it
>>>>> is required to first set the context to not present, flush and
>>>>> only then apply the new context. 
>>>>> 
>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>>>> ---
>>>>>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
>>>>>  1 files changed, 17 insertions(+), 0 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>>>>> index 930874f..dcdfa97 100644 --- a/drivers/pci/intel-iommu.c
>>>>> +++ b/drivers/pci/intel-iommu.c
>>>>> @@ -56,6 +56,7 @@
>>>>> 
>>>>> 
>>>>>  static void flush_unmaps_timeout(unsigned long data);
>>>>> +static void detach_domain_for_dev(struct dmar_domain *domain, u8
>>>>> bus, u8 devfn); 
>>>>> 
>>>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
>>>>> 
>>>>> @@ -1264,7 +1265,23 @@ static int
>>>>>     domain_context_mapping_one(struct dmar_domain *domain, if
>>>>>        (!context) return -ENOMEM;
>>>>>     spin_lock_irqsave(&iommu->lock, flags);
>>>>> +
>>>>> +   if (context_present(*context) &&
>>>>> +       (context_domain_id(*context) == domain->id) &&
>>>>> +       (context_address_width(*context) == domain->agaw) &&
>>>>> +       (context_address_root(*context) ==
>>>>> virt_to_phys(domain->pgd)) && +      
>>>>> (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL) &&
>>>>> + (!context_fault_disable(*context))) {
>>>>> +      spin_unlock_irqrestore(&iommu->lock, flags); +      return
>>>>> 0; +   }
>>>> 
>>>> Only need to check context_present(*context) according to VT-d
>>>> spec, which says "software must not modify fields other than the
>>>> Present (P) field of currently present root-entries or
>>>> context-entries." 
>>>> 
>>>> Randy (Weidong)
>>> 
>>> The logic here is that, if no change is made to the context then
>>> just return ok (0). Otherwise, according to the spec, we need to
>>> first invalidate the context, flush it, and only then apply the
>>> changes to the context. 
>>> 
>>> The other option is to make sure that before every call to this
>>> function the context is invalidated, but disabling it inside the
>>> function seems safer. do you agree with that?
>>> 
>> 
>> After a device can be assigned to guest with VT-d, it needs a
>> context unmap function. When a device is assigned to a guest, map
>> context for it, while when it is detached from a guest, should unmap
>> its context. With the context unmap function, I think we needn't to
>> implement your logic in domain_context_mapping_one(), instead just
>> check its present. What's your opinion?
> 
> Sure, that's fine, this is the other option I mentioned.
> But we need to add the context unmap function.
> Something like:
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> index be775cd..4b96fbb 100644
> --- a/arch/x86/kvm/vtd.c
> +++ b/arch/x86/kvm/vtd.c
> @@ -109,11 +109,17 @@ found:
>                 kvm_iommu_unmap_memslots(kvm);
>                 return -EFAULT;
>         }
> +
> +       kvm_intel_iommu_detach_dev(kvm->arch.domain,
> +                                  pdev->bus->number, pdev->devfn);
> +
>         kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
>         return 0;
>  }
>  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
> 
> Agree?
> 

I don't think it's necessary to add kvm_intel_iommu_detach_dev() here.
If a device can be assigned to a guest, it should not be used by other
guest (assuming no hotplug support). And also
kvm_intel_iommu_detach_dev() is already called in
kvm_iommu_unmap_guest(). Normally context won't be present here.
Otherwise, there should be some wrong. I attach a patch to change it
back to original kernel VT-d code. I think it's correct and clean.

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 930874f..04517a2 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -1264,8 +1264,11 @@ static int domain_context_mapping_one(struct
dmar_domain *domain,
 	if (!context)
 		return -ENOMEM;
 	spin_lock_irqsave(&iommu->lock, flags);
+	if (context_present(*context)) {
+		spin_unlock_irqrestore(&iommu->lock, flags);
+		return 0;
+	}
 
-	context_clear_entry(*context);
 	context_set_domain_id(*context, domain->id);
 	context_set_address_width(*context, domain->agaw);
 	context_set_address_root(*context, virt_to_phys(domain->pgd));


Randy (Weidong)



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-17 21:29     ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap Anthony Liguori
  2008-06-18 12:06       ` Ben-Ami Yassour
@ 2008-06-20 18:59       ` Avi Kivity
  2008-06-20 19:28         ` Anthony Liguori
  1 sibling, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2008-06-20 18:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: benami, amit.shah, weidong.han, muli, raharper, kvm

Anthony Liguori wrote:
> I think the current VT-d code needs some reworking.
>
> We should build the table as the shadow page table gets built.  We 
> should suppress iotlb flushes unless the table is actually being updated.
>

We can't, since we need the iommu tables populated before we issue any dma.

Perhaps we want something like MAP_POPULATE for shadow, which would then 
affect the iommu tables.  Userspace would then do:

mlock()
... set up device assignment
ioctl(..., KVM_POPULATE)


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
  2008-06-20 18:59       ` Avi Kivity
@ 2008-06-20 19:28         ` Anthony Liguori
  0 siblings, 0 replies; 22+ messages in thread
From: Anthony Liguori @ 2008-06-20 19:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: benami, amit.shah, weidong.han, muli, raharper, kvm

Avi Kivity wrote:
> Anthony Liguori wrote:
>> I think the current VT-d code needs some reworking.
>>
>> We should build the table as the shadow page table gets built.  We 
>> should suppress iotlb flushes unless the table is actually being 
>> updated.
>>
>
> We can't, since we need the iommu tables populated before we issue any 
> dma.

Yes, as I've mentioned, the lack of a DMA window notification API can be 
handled as a special case.

> Perhaps we want something like MAP_POPULATE for shadow, which would 
> then affect the iommu tables.  Userspace would then do:
>
> mlock()
> ... set up device assignment
> ioctl(..., KVM_POPULATE)

Exactly like this :-)

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-20  6:23                 ` Han, Weidong
@ 2008-06-30 15:32                   ` Ben-Ami Yassour
  2008-07-01  2:22                     ` Han, Weidong
  0 siblings, 1 reply; 22+ messages in thread
From: Ben-Ami Yassour @ 2008-06-30 15:32 UTC (permalink / raw)
  To: Han, Weidong; +Cc: amit.shah, kvm, Muli Ben-Yehuda, raharper

On Fri, 2008-06-20 at 14:23 +0800, Han, Weidong wrote:
> Ben-Ami Yassour1 wrote:
> > "Han, Weidong" <weidong.han@intel.com> wrote on 19/06/2008 17:18:00:
> > 
> >> Ben-Ami Yassour wrote:
> >>> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
> >>>> benami@il.ibm.com wrote:
> >>>>> From: Ben-Ami Yassour <benami@il.ibm.com>
> >>>>> 
> >>>>> When changing the VT-d context mapping, according to the spec, it
> >>>>> is required to first set the context to not present, flush and
> >>>>> only then apply the new context. 
> >>>>> 
> >>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
> >>>>> ---
> >>>>>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
> >>>>>  1 files changed, 17 insertions(+), 0 deletions(-)
> >>>>> 
> >>>>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> >>>>> index 930874f..dcdfa97 100644 --- a/drivers/pci/intel-iommu.c
> >>>>> +++ b/drivers/pci/intel-iommu.c
> >>>>> @@ -56,6 +56,7 @@
> >>>>> 
> >>>>> 
> >>>>>  static void flush_unmaps_timeout(unsigned long data);
> >>>>> +static void detach_domain_for_dev(struct dmar_domain *domain, u8
> >>>>> bus, u8 devfn); 
> >>>>> 
> >>>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
> >>>>> 
> >>>>> @@ -1264,7 +1265,23 @@ static int
> >>>>>     domain_context_mapping_one(struct dmar_domain *domain, if
> >>>>>        (!context) return -ENOMEM;
> >>>>>     spin_lock_irqsave(&iommu->lock, flags);
> >>>>> +
> >>>>> +   if (context_present(*context) &&
> >>>>> +       (context_domain_id(*context) == domain->id) &&
> >>>>> +       (context_address_width(*context) == domain->agaw) &&
> >>>>> +       (context_address_root(*context) ==
> >>>>> virt_to_phys(domain->pgd)) && +      
> >>>>> (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL) &&
> >>>>> + (!context_fault_disable(*context))) {
> >>>>> +      spin_unlock_irqrestore(&iommu->lock, flags); +      return
> >>>>> 0; +   }
> >>>> 
> >>>> Only need to check context_present(*context) according to VT-d
> >>>> spec, which says "software must not modify fields other than the
> >>>> Present (P) field of currently present root-entries or
> >>>> context-entries." 
> >>>> 
> >>>> Randy (Weidong)
> >>> 
> >>> The logic here is that, if no change is made to the context then
> >>> just return ok (0). Otherwise, according to the spec, we need to
> >>> first invalidate the context, flush it, and only then apply the
> >>> changes to the context. 
> >>> 
> >>> The other option is to make sure that before every call to this
> >>> function the context is invalidated, but disabling it inside the
> >>> function seems safer. do you agree with that?
> >>> 
> >> 
> >> After a device can be assigned to guest with VT-d, it needs a
> >> context unmap function. When a device is assigned to a guest, map
> >> context for it, while when it is detached from a guest, should unmap
> >> its context. With the context unmap function, I think we needn't to
> >> implement your logic in domain_context_mapping_one(), instead just
> >> check its present. What's your opinion?
> > 
> > Sure, that's fine, this is the other option I mentioned.
> > But we need to add the context unmap function.
> > Something like:
> > diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> > index be775cd..4b96fbb 100644
> > --- a/arch/x86/kvm/vtd.c
> > +++ b/arch/x86/kvm/vtd.c
> > @@ -109,11 +109,17 @@ found:
> >                 kvm_iommu_unmap_memslots(kvm);
> >                 return -EFAULT;
> >         }
> > +
> > +       kvm_intel_iommu_detach_dev(kvm->arch.domain,
> > +                                  pdev->bus->number, pdev->devfn);
> > +
> >         kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
> >         return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
> > 
> > Agree?
> > 
> 
> I don't think it's necessary to add kvm_intel_iommu_detach_dev() here.
> If a device can be assigned to a guest, it should not be used by other
> guest (assuming no hotplug support). And also
> kvm_intel_iommu_detach_dev() is already called in
> kvm_iommu_unmap_guest(). Normally context won't be present here.
> Otherwise, there should be some wrong. I attach a patch to change it
> back to original kernel VT-d code. I think it's correct and clean.

The problem is with the following scenario:
1. load the host NIC driver
2. unload the host NIC driver
3. start kvm with passthrough for that NIC

In this case the context is not cleaned, and we get VT-d failures.
I agree with changing the VT-d driver code back to original.
But we do need:

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index be775cd..4b96fbb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -109,11 +109,17 @@ found:
                kvm_iommu_unmap_memslots(kvm);
                return -EFAULT;
        }
+
+       kvm_intel_iommu_detach_dev(kvm->arch.domain,
+                                  pdev->bus->number, pdev->devfn);
+
        kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
        return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);

Otherwise kvm_intel_iommu_context_mapping will exit on the context_present(*context) check.
This means that the context is still going to point to the iova of the host NIC.
If you do not agree, please explain what code path clears the context written for the host NIC driver in this scenario
(note that empirically I see that my assumption is correct).

Thanks,
Ben

> 
> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> index 930874f..04517a2 100644
> --- a/drivers/pci/intel-iommu.c
> +++ b/drivers/pci/intel-iommu.c
> @@ -1264,8 +1264,11 @@ static int domain_context_mapping_one(struct
> dmar_domain *domain,
>  	if (!context)
>  		return -ENOMEM;
>  	spin_lock_irqsave(&iommu->lock, flags);
> +	if (context_present(*context)) {
> +		spin_unlock_irqrestore(&iommu->lock, flags);
> +		return 0;
> +	}
>  
> -	context_clear_entry(*context);
>  	context_set_domain_id(*context, domain->id);
>  	context_set_address_width(*context, domain->agaw);
>  	context_set_address_root(*context, virt_to_phys(domain->pgd));
> 
> 
> Randy (Weidong)
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH] KVM: PCIPT: VT-d: fix context mapping
  2008-06-30 15:32                   ` Ben-Ami Yassour
@ 2008-07-01  2:22                     ` Han, Weidong
  0 siblings, 0 replies; 22+ messages in thread
From: Han, Weidong @ 2008-07-01  2:22 UTC (permalink / raw)
  To: Ben-Ami Yassour; +Cc: amit.shah, kvm, Muli Ben-Yehuda, raharper

Ben-Ami Yassour wrote:
> On Fri, 2008-06-20 at 14:23 +0800, Han, Weidong wrote:
>> Ben-Ami Yassour1 wrote:
>>> "Han, Weidong" <weidong.han@intel.com> wrote on 19/06/2008 17:18:00:
>>> 
>>>> Ben-Ami Yassour wrote:
>>>>> On Thu, 2008-06-19 at 16:59 +0800, Han, Weidong wrote:
>>>>>> benami@il.ibm.com wrote:
>>>>>>> From: Ben-Ami Yassour <benami@il.ibm.com>
>>>>>>> 
>>>>>>> When changing the VT-d context mapping, according to the spec,
>>>>>>> it is required to first set the context to not present, flush
>>>>>>> and only then apply the new context.
>>>>>>> 
>>>>>>> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
>>>>>>> ---
>>>>>>>  drivers/pci/intel-iommu.c |   17 +++++++++++++++++
>>>>>>>  1 files changed, 17 insertions(+), 0 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/pci/intel-iommu.c
>>>>>>> b/drivers/pci/intel-iommu.c index 930874f..dcdfa97 100644 ---
>>>>>>> a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c
>>>>>>> @@ -56,6 +56,7 @@
>>>>>>> 
>>>>>>> 
>>>>>>>  static void flush_unmaps_timeout(unsigned long data);
>>>>>>> +static void detach_domain_for_dev(struct dmar_domain *domain,
>>>>>>> u8 bus, u8 devfn); 
>>>>>>> 
>>>>>>>  DEFINE_TIMER(unmap_timer,  flush_unmaps_timeout, 0, 0);
>>>>>>> 
>>>>>>> @@ -1264,7 +1265,23 @@ static int
>>>>>>>     domain_context_mapping_one(struct dmar_domain *domain, if
>>>>>>>        (!context) return -ENOMEM;
>>>>>>>     spin_lock_irqsave(&iommu->lock, flags);
>>>>>>> +
>>>>>>> +   if (context_present(*context) &&
>>>>>>> +       (context_domain_id(*context) == domain->id) &&
>>>>>>> +       (context_address_width(*context) == domain->agaw) &&
>>>>>>> +       (context_address_root(*context) ==
>>>>>>> virt_to_phys(domain->pgd)) && +
>>>>>>> (context_translation_type(*context) == CONTEXT_TT_MULTI_LEVEL)
>>>>>>> && + (!context_fault_disable(*context))) {
>>>>>>> +      spin_unlock_irqrestore(&iommu->lock, flags); +     
>>>>>>> return 0; +   }
>>>>>> 
>>>>>> Only need to check context_present(*context) according to VT-d
>>>>>> spec, which says "software must not modify fields other than the
>>>>>> Present (P) field of currently present root-entries or
>>>>>> context-entries." 
>>>>>> 
>>>>>> Randy (Weidong)
>>>>> 
>>>>> The logic here is that, if no change is made to the context then
>>>>> just return ok (0). Otherwise, according to the spec, we need to
>>>>> first invalidate the context, flush it, and only then apply the
>>>>> changes to the context. 
>>>>> 
>>>>> The other option is to make sure that before every call to this
>>>>> function the context is invalidated, but disabling it inside the
>>>>> function seems safer. do you agree with that?
>>>>> 
>>>> 
>>>> After a device can be assigned to guest with VT-d, it needs a
>>>> context unmap function. When a device is assigned to a guest, map
>>>> context for it, while when it is detached from a guest, should
>>>> unmap its context. With the context unmap function, I think we
>>>> needn't to implement your logic in domain_context_mapping_one(),
>>>> instead just check its present. What's your opinion?
>>> 
>>> Sure, that's fine, this is the other option I mentioned.
>>> But we need to add the context unmap function.
>>> Something like:
>>> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
>>> index be775cd..4b96fbb 100644
>>> --- a/arch/x86/kvm/vtd.c
>>> +++ b/arch/x86/kvm/vtd.c
>>> @@ -109,11 +109,17 @@ found:
>>>                 kvm_iommu_unmap_memslots(kvm);
>>>                 return -EFAULT;
>>>         }
>>> +
>>> +       kvm_intel_iommu_detach_dev(kvm->arch.domain,
>>> +                                  pdev->bus->number, pdev->devfn);
>>>         + kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev); 
>>>  return 0; }
>>>  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
>>> 
>>> Agree?
>>> 
>> 
>> I don't think it's necessary to add kvm_intel_iommu_detach_dev()
>> here. If a device can be assigned to a guest, it should not be used
>> by other guest (assuming no hotplug support). And also
>> kvm_intel_iommu_detach_dev() is already called in
>> kvm_iommu_unmap_guest(). Normally context won't be present here.
>> Otherwise, there should be some wrong. I attach a patch to change it
>> back to original kernel VT-d code. I think it's correct and clean.
> 
> The problem is with the following scenario:
> 1. load the host NIC driver
> 2. unload the host NIC driver
> 3. start kvm with passthrough for that NIC
> 
> In this case the context is not cleaned, and we get VT-d failures.
> I agree with changing the VT-d driver code back to original.
> But we do need:
> 
> diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
> index be775cd..4b96fbb 100644
> --- a/arch/x86/kvm/vtd.c
> +++ b/arch/x86/kvm/vtd.c
> @@ -109,11 +109,17 @@ found:
>                 kvm_iommu_unmap_memslots(kvm);
>                 return -EFAULT;
>         }
> +
> +       kvm_intel_iommu_detach_dev(kvm->arch.domain,
> +                                  pdev->bus->number, pdev->devfn);
> +
>         kvm_intel_iommu_context_mapping(kvm->arch.domain, pdev);
>         return 0;
>  }
>  EXPORT_SYMBOL_GPL(kvm_iommu_map_guest);
> 
> Otherwise kvm_intel_iommu_context_mapping will exit on the
> context_present(*context) check. 
> This means that the context is still going to point to the iova of
> the host NIC. 
> If you do not agree, please explain what code path clears the context
> written for the host NIC driver in this scenario (note that
> empirically I see that my assumption is correct). 
> 

Yes, in this load/unload case, kvm_intel_iommu_detach_dev() is necessary, because the device will be mapped in host VT-d page table when its driver is loaded, but it is not unmapped when the driver is unloaded.

Randy (Weidong)

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-07-01  2:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-17 19:05 Patches for vtd pci passthrough benami
2008-06-17 19:05 ` [PATCH] KVM: PCIPT: fix compilation errors benami
2008-06-17 19:05   ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap benami
2008-06-17 19:05     ` [PATCH] KVM: PCIPT: free device structure on vm destroy benami
2008-06-17 19:05       ` [PATCH] KVM: PCIPT: VT-d: fix context mapping benami
2008-06-17 19:25         ` KVM: PCIPT: temporary fix for pio (userspace part) benami
2008-06-17 19:25           ` [PATCH] " benami
2008-06-18 20:30         ` [PATCH] KVM: PCIPT: VT-d: fix context mapping Muli Ben-Yehuda
2008-06-19  8:59         ` Han, Weidong
2008-06-19 12:28           ` Ben-Ami Yassour
2008-06-19 14:18             ` Han, Weidong
2008-06-19 17:44               ` Ben-Ami Yassour1
2008-06-20  6:23                 ` Han, Weidong
2008-06-30 15:32                   ` Ben-Ami Yassour
2008-07-01  2:22                     ` Han, Weidong
2008-06-17 21:29     ` [PATCH] KVM: PCIPT: VT-d: fix guest unmap Anthony Liguori
2008-06-18 12:06       ` Ben-Ami Yassour
2008-06-18 20:48         ` Anthony Liguori
2008-06-18 21:23           ` Muli Ben-Yehuda
2008-06-18 21:41             ` Anthony Liguori
2008-06-20 18:59       ` Avi Kivity
2008-06-20 19:28         ` Anthony Liguori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox