xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] VT-d flush issue
@ 2015-12-12 13:21 Quan Xu
  2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw)
  To: jbeulich, kevin.tian
  Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, Quan Xu, keir

This patches are based on Kevin Tian's previous discussion 'Revisit VT-d asynchronous flush issue'.
Fix current timeout concern and also allow limited ATS support in a light way:

1. Reduce spin timeout to 1ms, which can be boot-time changed with 'iommu_qi_timeout_ms'.
   For example:
           multiboot /boot/xen.gz ats=1 iommu_qi_timeout_ms=100

2. Fix vt-d flush timeout issue.

    If IOTLB/Context/IETC flush is timeout, we should think all devices under this IOMMU cannot function correctly.
    So for each device under this IOMMU we'll mark it as unassignable and kill the domain owning the device.

    If Device-TLB flush is timeout, we'll mark the target ATS device as unassignable and kill the domain owning
    this device.

    If impacted domain is hardware domain, just throw out a warning. It's an open here whether we want to kill
    hardware domain (or directly panic hypervisor). Comments are welcomed.

    Device marked as unassignable will be disallowed to be further assigned to any domain.

--

 * DMAR_OPERATION_TIMEOUT should be also chopped down to a low number of milliseconds.
   As Kevin Tian mentioned in 'Revisit VT-d asynchronous flush issue', We also confirmed with hardware team
   that 1ms is large enough for IOMMU internal flush. So I can change DMAR_OPERATION_TIMEOUT from 1000 ms to 1 ms.

   IOMMU_WAIT_OP() is only for VT-d registers read/write, and there is also a panic. We need a further discussion
   whether or how to remove this panic in next patch set. 


 * Kevin Tian did basic functional review.

--Changes in v3:
   1. Once found you can break the for loop immediately since BDF is unique.

   2. Separate invalidate_timeout(struct iommu *iommu, int type, u16 did, u16 seg, u8 bus, u8 devfn)
      into invalidate_timeout(struct iommu *iommu) and device_tlb_invalidate_timeout(struct iommu *iommu,
      u16 did, u16 seg, u8 bus, u8 devfn).
      invalidate_timeout() is for iotlb/iec/context flush error.
      device_tlb_invalidate_timeout() is for Device-TLB flush error. 
      then ignore these INVALID_* parameters.

   3. Change macros(mark_pdev_unassignable/IS_PDEV_UNASSIGNABLE) into static inline functions.


Quan Xu (2):
  VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  VT-d: Fix vt-d flush timeout issue.

 xen/drivers/passthrough/vtd/extern.h  |  5 ++
 xen/drivers/passthrough/vtd/iommu.c   |  6 +++
 xen/drivers/passthrough/vtd/qinval.c  | 89 +++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/x86/ats.c | 16 +++++++
 xen/include/xen/pci.h                 | 11 +++++
 5 files changed, 123 insertions(+), 4 deletions(-)

-- 
1.8.1.2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu
@ 2015-12-12 13:21 ` Quan Xu
  2015-12-14  9:14   ` Jan Beulich
  2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu
  2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan
  2 siblings, 1 reply; 50+ messages in thread
From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw)
  To: jbeulich, kevin.tian
  Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, Quan Xu, keir

Signed-off-by: Quan Xu <quan.xu@intel.com>
---
 xen/drivers/passthrough/vtd/qinval.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index b81b0bd..990baf2 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -28,6 +28,11 @@
 #include "vtd.h"
 #include "extern.h"
 
+static int __read_mostly iommu_qi_timeout_ms = 1;
+integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms);
+
+#define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1))
+
 static void print_qi_regs(struct iommu *iommu)
 {
     u64 val;
@@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu *iommu,
         start_time = NOW();
         while ( poll_slot != QINVAL_STAT_DONE )
         {
-            if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
+            if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) )
             {
                 print_qi_regs(iommu);
-                panic("queue invalidate wait descriptor was not executed");
+                dprintk(XENLOG_WARNING VTDPREFIX,
+                        "Queue invalidate wait descriptor was timeout.\n");
+                return -ETIMEDOUT;
             }
             cpu_relax();
         }
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu
  2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu
@ 2015-12-12 13:21 ` Quan Xu
  2015-12-14  9:28   ` Jan Beulich
  2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan
  2 siblings, 1 reply; 50+ messages in thread
From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw)
  To: jbeulich, kevin.tian
  Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, Quan Xu, keir

If IOTLB/Context/IETC flush is timeout, we should think
all devices under this IOMMU cannot function correctly.
So for each device under this IOMMU we'll mark it as
unassignable and kill the domain owning the device.

If Device-TLB flush is timeout, we'll mark the target
ATS device as unassignable and kill the domain owning
this device.

If impacted domain is hardware domain, just throw out
a warning. It's an open here whether we want to kill
hardware domain (or directly panic hypervisor). Comments
are welcomed.

Device marked as unassignable will be disallowed to be
further assigned to any domain.

Signed-off-by: Quan Xu <quan.xu@intel.com>
---
 xen/drivers/passthrough/vtd/extern.h  |  5 +++
 xen/drivers/passthrough/vtd/iommu.c   |  6 +++
 xen/drivers/passthrough/vtd/qinval.c  | 78 ++++++++++++++++++++++++++++++++++-
 xen/drivers/passthrough/vtd/x86/ats.c | 16 +++++++
 xen/include/xen/pci.h                 | 11 +++++
 5 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 8acf889..96c1a28 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -62,6 +62,11 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
 int qinval_device_iotlb(struct iommu *iommu,
                         u32 max_invs_pend, u16 sid, u16 size, u64 addr);
 
+void invalidate_timeout(struct iommu *iommu);
+void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
+                                   u16 seg, u8 bus, u8 devfn);
+int invalidate_sync(struct iommu *iommu);
+
 unsigned int get_cache_line_size(void);
 void cacheline_flush(char *);
 void flush_all_cache(void);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index dd13865..f9a5d66 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;
 
+    if ( is_pdev_unassignable(pdev) )
+        return -EACCES;
+
     ret = domain_context_mapping(pdev->domain, devfn, pdev);
     if ( ret )
     {
@@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    if ( is_pdev_unassignable(pdev) )
+        return -EACCES;
+
     seg = pdev->seg;
     bus = pdev->bus;
     /*
diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index 990baf2..c514f99 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -27,12 +27,58 @@
 #include "dmar.h"
 #include "vtd.h"
 #include "extern.h"
+#include "../ats.h"
 
 static int __read_mostly iommu_qi_timeout_ms = 1;
 integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms);
 
 #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1))
 
+void invalidate_timeout(struct iommu *iommu)
+{
+    struct domain *d;
+    unsigned long nr_dom, i;
+    struct pci_dev *pdev;
+
+    nr_dom = cap_ndoms(iommu->cap);
+    i = find_first_bit(iommu->domid_bitmap, nr_dom);
+    while ( i < nr_dom ) {
+        d = rcu_lock_domain_by_id(iommu->domid_map[i]);
+        ASSERT(d);
+
+        /* Mark the devices as unassignable. */
+        for_each_pdev(d, pdev)
+            mark_pdev_unassignable(pdev);
+        if ( !is_hardware_domain(d) )
+            domain_kill(d);
+
+        rcu_unlock_domain(d);
+        i = find_next_bit(iommu->domid_bitmap, nr_dom, i + 1);
+    }
+}
+
+void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
+                                   u16 seg, u8 bus, u8 devfn)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+        ASSERT(d);
+        for_each_pdev(d, pdev)
+            if ( (pdev->seg == seg) &&
+                 (pdev->bus == bus) &&
+                 (pdev->devfn == devfn) )
+            {
+                mark_pdev_unassignable(pdev);
+                break;
+            }
+
+        if ( !is_hardware_domain(d) )
+            domain_kill(d);
+        rcu_unlock_domain(d);
+}
+
 static void print_qi_regs(struct iommu *iommu)
 {
     u64 val;
@@ -187,7 +233,7 @@ static int queue_invalidate_wait(struct iommu *iommu,
     return -EOPNOTSUPP;
 }
 
-static int invalidate_sync(struct iommu *iommu)
+int invalidate_sync(struct iommu *iommu)
 {
     struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu);
 
@@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx)
 
     queue_invalidate_iec(iommu, granu, im, iidx);
     ret = invalidate_sync(iommu);
+
+    if ( ret == -ETIMEDOUT )
+    {
+        invalidate_timeout(iommu);
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "IEC flush timeout.\n");
+        return ret;
+    }
     /*
      * reading vt-d architecture register will ensure
      * draining happens in implementation independent way.
@@ -308,6 +362,13 @@ static int flush_context_qi(
         queue_invalidate_context(iommu, did, sid, fm,
                                  type >> DMA_CCMD_INVL_GRANU_OFFSET);
         ret = invalidate_sync(iommu);
+        if ( ret == -ETIMEDOUT )
+        {
+            invalidate_timeout(iommu);
+            dprintk(XENLOG_WARNING  VTDPREFIX,
+                    "Context flush timeout.\n");
+            return ret;
+        }
     }
     return ret;
 }
@@ -349,9 +410,22 @@ static int flush_iotlb_qi(
         queue_invalidate_iotlb(iommu,
                                type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr,
                                dw, did, size_order, 0, addr);
+
+        /*
+         * Synchronize with hardware for invalidation request descriptors
+         * submitted before Device-TLB invalidate descriptor.
+         */
+        rc = invalidate_sync(iommu);
+        if ( rc == -ETIMEDOUT )
+        {
+            invalidate_timeout(iommu);
+            dprintk(XENLOG_WARNING VTDPREFIX, "IOTLB flush timeout.\n");
+            return rc;
+        }
+
         if ( flush_dev_iotlb )
             ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type);
-        rc = invalidate_sync(iommu);
+
         if ( !ret )
             ret = rc;
     }
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 7c797f6..973dde7 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -156,6 +156,22 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did,
 
             rc = qinval_device_iotlb(iommu, pdev->ats_queue_depth,
                                      sid, sbit, addr);
+
+            /*
+             * Synchronize with hardware for Device-TLB invalidate
+             * descriptor.
+             */
+            rc = invalidate_sync(iommu);
+
+            if ( rc == -ETIMEDOUT )
+            {
+                device_tlb_invalidate_timeout(iommu, did, pdev->seg, pdev->bus,
+                                              pdev->devfn);
+                dprintk(XENLOG_WARNING VTDPREFIX,
+                        "Device-TLB flush timeout.\n");
+                return rc;
+            }
+
             break;
         default:
             dprintk(XENLOG_WARNING VTDPREFIX, "invalid vt-d flush type\n");
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..21a0af6 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -41,6 +41,7 @@
 struct pci_dev_info {
     bool_t is_extfn;
     bool_t is_virtfn;
+    bool_t is_unassignable;
     struct {
         u8 bus;
         u8 devfn;
@@ -88,6 +89,16 @@ struct pci_dev {
 #define for_each_pdev(domain, pdev) \
     list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list)
 
+static inline void mark_pdev_unassignable(struct pci_dev *pdev)
+{
+    pdev->info.is_unassignable = 1;
+}
+
+static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev)
+{
+    return pdev->info.is_unassignable;
+}
+
 /*
  * The pcidevs_lock protect alldevs_list, and the assignment for the 
  * devices, it also sync the access to the msi capability that is not
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu
@ 2015-12-14  9:14   ` Jan Beulich
  2015-12-15  8:35     ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-14  9:14 UTC (permalink / raw)
  To: Quan Xu
  Cc: kevin.tian, feng.wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, keir

>>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu *iommu,
>          start_time = NOW();
>          while ( poll_slot != QINVAL_STAT_DONE )
>          {
> -            if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
> +            if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) )
>              {
>                  print_qi_regs(iommu);
> -                panic("queue invalidate wait descriptor was not executed");
> +                dprintk(XENLOG_WARNING VTDPREFIX,
> +                        "Queue invalidate wait descriptor was timeout.\n");
> +                return -ETIMEDOUT;
>              }

Without the v2 discussion even having finished, and without you
having taken care of v2 comments here, I don't see much value in
this v3.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu
@ 2015-12-14  9:28   ` Jan Beulich
  2015-12-15  8:15     ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-14  9:28 UTC (permalink / raw)
  To: Quan Xu
  Cc: kevin.tian, feng.wu, george.dunlap, andrew.cooper3, tim,
	xen-devel, jun.nakajima, keir

>>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>      if ( !pdev->domain )
>          return -EINVAL;
>  
> +    if ( is_pdev_unassignable(pdev) )
> +        return -EACCES;

Is this case possible at all (i.e. a newly added device being
unassignable)?

> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -27,12 +27,58 @@
>  #include "dmar.h"
>  #include "vtd.h"
>  #include "extern.h"
> +#include "../ats.h"
>  
>  static int __read_mostly iommu_qi_timeout_ms = 1;
>  integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms);
>  
>  #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1))
>  
> +void invalidate_timeout(struct iommu *iommu)
> +{
> +    struct domain *d;
> +    unsigned long nr_dom, i;
> +    struct pci_dev *pdev;
> +
> +    nr_dom = cap_ndoms(iommu->cap);
> +    i = find_first_bit(iommu->domid_bitmap, nr_dom);
> +    while ( i < nr_dom ) {
> +        d = rcu_lock_domain_by_id(iommu->domid_map[i]);
> +        ASSERT(d);
> +
> +        /* Mark the devices as unassignable. */
> +        for_each_pdev(d, pdev)
> +            mark_pdev_unassignable(pdev);
> +        if ( !is_hardware_domain(d) )
> +            domain_kill(d);

DYM domain_crash() here?

> +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
> +                                   u16 seg, u8 bus, u8 devfn)
> +{
> +    struct domain *d;
> +    struct pci_dev *pdev;
> +
> +        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> +        ASSERT(d);
> +        for_each_pdev(d, pdev)
> +            if ( (pdev->seg == seg) &&
> +                 (pdev->bus == bus) &&
> +                 (pdev->devfn == devfn) )
> +            {
> +                mark_pdev_unassignable(pdev);
> +                break;
> +            }
> +
> +        if ( !is_hardware_domain(d) )
> +            domain_kill(d);
> +        rcu_unlock_domain(d);
> +}

Except for the variable declarations, indentation is broken for the
entire function.

> @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx)
>  
>      queue_invalidate_iec(iommu, granu, im, iidx);
>      ret = invalidate_sync(iommu);
> +
> +    if ( ret == -ETIMEDOUT )
> +    {
> +        invalidate_timeout(iommu);
> +        dprintk(XENLOG_WARNING VTDPREFIX,
> +                "IEC flush timeout.\n");
> +        return ret;
> +    }
>      /*

Considering the recurring pattern, wouldn't it be better for
invalidate_sync() to invoke invalidate_timeout() (at once making
sure no current or future caller misses the need to do so)?

Also please insert the blank line at the end of your additions, and
no trailing full stops in log messages please.

> @@ -88,6 +89,16 @@ struct pci_dev {
>  #define for_each_pdev(domain, pdev) \
>      list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list)
>  
> +static inline void mark_pdev_unassignable(struct pci_dev *pdev)
> +{
> +    pdev->info.is_unassignable = 1;
> +}
> +
> +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev)
> +{
> +    return pdev->info.is_unassignable;
> +}

Are you aware that we already have a mechanism to prevent
assignment (via pci_{ro,hide}_device())? I think at the very least
this check should consider both variants. Whether fully using the
existing mechanism for your purpose is feasible I can't immediately
tell (since the ownership change may be problematic at the points
where you want the flagging to happen).

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-14  9:28   ` Jan Beulich
@ 2015-12-15  8:15     ` Xu, Quan
  2015-12-15  9:16       ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15  8:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

>On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct
> pci_dev *pdev)
> >      if ( !pdev->domain )
> >          return -EINVAL;
> >
> > +    if ( is_pdev_unassignable(pdev) )
> > +        return -EACCES;
> 
> Is this case possible at all (i.e. a newly added device being unassignable)?
> 

IMO, it is possible, and I have checked and printed the invoke flow when Xen init.
iommu_setup() / iommu_hwdom_init() are called before pci_add_device() when Xen init.
If it flushes error in iommu_setup() / iommu_hwdom_init(), it will mark device as unassignable.
So it is possible. right?

> > --- a/xen/drivers/passthrough/vtd/qinval.c
> > +++ b/xen/drivers/passthrough/vtd/qinval.c
> > @@ -27,12 +27,58 @@
> >  #include "dmar.h"
> >  #include "vtd.h"
> >  #include "extern.h"
> > +#include "../ats.h"
> >
> >  static int __read_mostly iommu_qi_timeout_ms = 1;
> > integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms);
> >
> >  #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1))
> >
> > +void invalidate_timeout(struct iommu *iommu) {
> > +    struct domain *d;
> > +    unsigned long nr_dom, i;
> > +    struct pci_dev *pdev;
> > +
> > +    nr_dom = cap_ndoms(iommu->cap);
> > +    i = find_first_bit(iommu->domid_bitmap, nr_dom);
> > +    while ( i < nr_dom ) {
> > +        d = rcu_lock_domain_by_id(iommu->domid_map[i]);
> > +        ASSERT(d);
> > +
> > +        /* Mark the devices as unassignable. */
> > +        for_each_pdev(d, pdev)
> > +            mark_pdev_unassignable(pdev);
> > +        if ( !is_hardware_domain(d) )
> > +            domain_kill(d);
> 
> DYM domain_crash() here?
> 

Agreed.


> > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
> > +                                   u16 seg, u8 bus, u8 devfn) {
> > +    struct domain *d;
> > +    struct pci_dev *pdev;
> > +
> > +        d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > +        ASSERT(d);
> > +        for_each_pdev(d, pdev)
> > +            if ( (pdev->seg == seg) &&
> > +                 (pdev->bus == bus) &&
> > +                 (pdev->devfn == devfn) )
> > +            {
> > +                mark_pdev_unassignable(pdev);
> > +                break;
> > +            }
> > +
> > +        if ( !is_hardware_domain(d) )
> > +            domain_kill(d);
> > +        rcu_unlock_domain(d);
> > +}
> 
> Except for the variable declarations, indentation is broken for the entire
> function.
> 

I will modify it in v4.

> > @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu,
> > u8 granu, u8 im, u16 iidx)
> >
> >      queue_invalidate_iec(iommu, granu, im, iidx);
> >      ret = invalidate_sync(iommu);
> > +
> > +    if ( ret == -ETIMEDOUT )
> > +    {
> > +        invalidate_timeout(iommu);
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "IEC flush timeout.\n");
> > +        return ret;
> > +    }
> >      /*
> 
> Considering the recurring pattern, wouldn't it be better for
> invalidate_sync() to invoke invalidate_timeout() (at once making sure no current
> or future caller misses the need to do so)?
> 

Now invalidate_sync() is a common function for iec/iotlb/context/device-tlb invalidation.
It is different to deal with the flush error.
For device-tlb, it needs some parameters of device, such device seg/bus/devfn.
But iec/iotlb/context don't need these parameters.

Could I introduce device_tlb_invalidate_sync(struct iommu *iommu, u16 did, u16 seg, u8 bus, u8 devfn) for device-tlb flush?
invalidate_sync(struct iommu *iommu) is for iec/iotlb/context.


> Also please insert the blank line at the end of your additions, and no trailing full
> stops in log messages please.
> 

I will modify it in v4.


> > @@ -88,6 +89,16 @@ struct pci_dev {
> >  #define for_each_pdev(domain, pdev) \
> >      list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list)
> >
> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
> > +    pdev->info.is_unassignable = 1;
> > +}
> > +
> > +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev)
> > +{
> > +    return pdev->info.is_unassignable; }
> 
> Are you aware that we already have a mechanism to prevent assignment (via
> pci_{ro,hide}_device())? I think at the very least this check should consider both
> variants. Whether fully using the existing mechanism for your purpose is
> feasible I can't immediately tell (since the ownership change may be
> problematic at the points where you want the flagging to happen).

pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
for this case, IMO I think a flag is a better solution.


-Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  2015-12-14  9:14   ` Jan Beulich
@ 2015-12-15  8:35     ` Xu, Quan
  2015-12-15  9:18       ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15  8:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote:
> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu
> *iommu,
> >          start_time = NOW();
> >          while ( poll_slot != QINVAL_STAT_DONE )
> >          {
> > -            if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
> > +            if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) )
> >              {
> >                  print_qi_regs(iommu);
> > -                panic("queue invalidate wait descriptor was not
> executed");
> > +                dprintk(XENLOG_WARNING VTDPREFIX,
> > +                        "Queue invalidate wait descriptor was
> timeout.\n");
> > +                return -ETIMEDOUT;
> >              }
> 
> Without the v2 discussion even having finished, and without you having taken
> care of v2 comments here, I don't see much value in this v3.
> 

I should check for _any_ kind_of_error and all_the_way_up_the_call_trees in [patch 1/2].

1. For _any_ kind_of_error,
now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the return code, if it is _not_ZERO.
The callers need to also either pass it on or deal with it. (if the return code is _not_ZERO, it should check whether it is '-ETIMEDOUT' in [patch 2/2])

2. all_the_way_up_the_call_trees

.i.e
enable_intremap()
|--iommu_flush_iec_global()
  |--__iommu_flush_iec()
    |--invalidate_sync()

In enable_intremap(), if iommu_flush_iec_global()  is failed, enable_intremap() should deal with this error and then return.

right? 
Jan, thanks for your patience.

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15  8:15     ` Xu, Quan
@ 2015-12-15  9:16       ` Jan Beulich
  2015-12-15 10:24         ` Xu, Quan
                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-15  9:16 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
>> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
>> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
>> > --- a/xen/drivers/passthrough/vtd/iommu.c
>> > +++ b/xen/drivers/passthrough/vtd/iommu.c
>> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct
>> pci_dev *pdev)
>> >      if ( !pdev->domain )
>> >          return -EINVAL;
>> >
>> > +    if ( is_pdev_unassignable(pdev) )
>> > +        return -EACCES;
>> 
>> Is this case possible at all (i.e. a newly added device being unassignable)?
>> 
> 
> IMO, it is possible, and I have checked and printed the invoke flow when Xen init.
> iommu_setup() / iommu_hwdom_init() are called before pci_add_device() when Xen init.
> If it flushes error in iommu_setup() / iommu_hwdom_init(), it will mark 
> device as unassignable.

Please be more precise: I can't see how e.g. iommu_setup() would
manage to time out on invalidation on a particular device. I.e.
please provide an exact scenario where the flag could get set
before the device gets reported through iommu_add_device().

Furthermore I think that boot time invalidations shouldn't lead to
any devices getting marked un-assignable. Instead such would
probably better result in the IOMMU to be turned of altogether.

>> > @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu,
>> > u8 granu, u8 im, u16 iidx)
>> >
>> >      queue_invalidate_iec(iommu, granu, im, iidx);
>> >      ret = invalidate_sync(iommu);
>> > +
>> > +    if ( ret == -ETIMEDOUT )
>> > +    {
>> > +        invalidate_timeout(iommu);
>> > +        dprintk(XENLOG_WARNING VTDPREFIX,
>> > +                "IEC flush timeout.\n");
>> > +        return ret;
>> > +    }
>> >      /*
>> 
>> Considering the recurring pattern, wouldn't it be better for
>> invalidate_sync() to invoke invalidate_timeout() (at once making sure no current
>> or future caller misses the need to do so)?
> 
> Now invalidate_sync() is a common function for iec/iotlb/context/device-tlb 
> invalidation.
> It is different to deal with the flush error.
> For device-tlb, it needs some parameters of device, such device 
> seg/bus/devfn.
> But iec/iotlb/context don't need these parameters.

Ah, I see, there's indeed one case where the handling is different.
Never mind then.

>> > @@ -88,6 +89,16 @@ struct pci_dev {
>> >  #define for_each_pdev(domain, pdev) \
>> >      list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list)
>> >
>> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
>> > +    pdev->info.is_unassignable = 1;
>> > +}
>> > +
>> > +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev)
>> > +{
>> > +    return pdev->info.is_unassignable; }
>> 
>> Are you aware that we already have a mechanism to prevent assignment (via
>> pci_{ro,hide}_device())? I think at the very least this check should 
> consider both
>> variants. Whether fully using the existing mechanism for your purpose is
>> feasible I can't immediately tell (since the ownership change may be
>> problematic at the points where you want the flagging to happen).
> 
> pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
> for this case, IMO I think a flag is a better solution.

I can't really judge on this without know details of the crash. As said,
I'd prefer to have just a single mechanism, and would accept a second
one only with proper justification (i.e. more than "is dangerous" or
"makes xen crash" - after all that may also be a result of the functions
not being used as necessary).

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  2015-12-15  8:35     ` Xu, Quan
@ 2015-12-15  9:18       ` Jan Beulich
  2015-12-15 10:10         ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-15  9:18 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 15.12.15 at 09:35, <quan.xu@intel.com> wrote:
>>  On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote:
>> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
>> > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu
>> *iommu,
>> >          start_time = NOW();
>> >          while ( poll_slot != QINVAL_STAT_DONE )
>> >          {
>> > -            if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
>> > +            if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) )
>> >              {
>> >                  print_qi_regs(iommu);
>> > -                panic("queue invalidate wait descriptor was not
>> executed");
>> > +                dprintk(XENLOG_WARNING VTDPREFIX,
>> > +                        "Queue invalidate wait descriptor was
>> timeout.\n");
>> > +                return -ETIMEDOUT;
>> >              }
>> 
>> Without the v2 discussion even having finished, and without you having taken
>> care of v2 comments here, I don't see much value in this v3.
>> 
> 
> I should check for _any_ kind_of_error and all_the_way_up_the_call_trees in 
> [patch 1/2].

As said, perhaps better in patch 1/3.

> 1. For _any_ kind_of_error,
> now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the return 
> code, if it is _not_ZERO.
> The callers need to also either pass it on or deal with it. (if the return 
> code is _not_ZERO, it should check whether it is '-ETIMEDOUT' in [patch 2/2])
> 
> 2. all_the_way_up_the_call_trees
> 
> .i.e
> enable_intremap()
> |--iommu_flush_iec_global()
>   |--__iommu_flush_iec()
>     |--invalidate_sync()
> 
> In enable_intremap(), if iommu_flush_iec_global()  is failed, 
> enable_intremap() should deal with this error and then return.

Yes.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  2015-12-15  9:18       ` Jan Beulich
@ 2015-12-15 10:10         ` Xu, Quan
  0 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 10:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

>On 15.12.2015 at 5:18pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 09:35, <quan.xu@intel.com> wrote:
> >>  On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote:
> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> >> > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu
> >> *iommu,
> >> >          start_time = NOW();
> >> >          while ( poll_slot != QINVAL_STAT_DONE )
> >> >          {
> >> > -            if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
> >> > +            if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) )
> >> >              {
> >> >                  print_qi_regs(iommu);
> >> > -                panic("queue invalidate wait descriptor was not
> >> executed");
> >> > +                dprintk(XENLOG_WARNING VTDPREFIX,
> >> > +                        "Queue invalidate wait descriptor was
> >> timeout.\n");
> >> > +                return -ETIMEDOUT;
> >> >              }
> >>
> >> Without the v2 discussion even having finished, and without you
> >> having taken care of v2 comments here, I don't see much value in this v3.
> >>
> >
> > I should check for _any_ kind_of_error and
> > all_the_way_up_the_call_trees in [patch 1/2].
> 
> As said, perhaps better in patch 1/3.

Agreed.

> 
> > 1. For _any_ kind_of_error,
> > now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the
> > return code, if it is _not_ZERO.
> > The callers need to also either pass it on or deal with it. (if the
> > return code is _not_ZERO, it should check whether it is '-ETIMEDOUT'
> > in [patch 2/2])
> >
> > 2. all_the_way_up_the_call_trees
> >
> > .i.e
> > enable_intremap()
> > |--iommu_flush_iec_global()
> >   |--__iommu_flush_iec()
> >     |--invalidate_sync()
> >
> > In enable_intremap(), if iommu_flush_iec_global()  is failed,
> > enable_intremap() should deal with this error and then return.
> 
> Yes.

Good..

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15  9:16       ` Jan Beulich
@ 2015-12-15 10:24         ` Xu, Quan
  2015-12-15 12:32           ` Jan Beulich
  2015-12-15 10:58         ` Xu, Quan
  2015-12-15 12:23         ` Xu, Quan
  2 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 10:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn,
> >> > struct
> >> pci_dev *pdev)
> >> >      if ( !pdev->domain )
> >> >          return -EINVAL;
> >> >
> >> > +    if ( is_pdev_unassignable(pdev) )
> >> > +        return -EACCES;
> >>
> >> Is this case possible at all (i.e. a newly added device being unassignable)?
> >>
> >
> > IMO, it is possible, and I have checked and printed the invoke flow when Xen
> init.
> > iommu_setup() / iommu_hwdom_init() are called before pci_add_device()
> when Xen init.
> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will
> > mark device as unassignable.
> 
> Please be more precise: I can't see how e.g. iommu_setup() would manage to
> time out on invalidation on a particular device. I.e.
> please provide an exact scenario where the flag could get set before the device
> gets reported through iommu_add_device().
> 
> Furthermore I think that boot time invalidations shouldn't lead to any devices
> getting marked un-assignable. Instead such would probably better result in the
> IOMMU to be turned of altogether.


...
construct_dom0()
|--iommu_hwdom_init()
  |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86]
    |--iommu_flush_all()
    ...


If it is failed in iommu_flush_all(), it will mask all of dom0's devices as unassignable.
And all of that called before pci_add_device() ..   right?


> 
> >> > @@ -88,6 +89,16 @@ struct pci_dev {  #define for_each_pdev(domain,
> >> > pdev) \
> >> >      list_for_each_entry(pdev, &(domain->arch.pdev_list),
> >> > domain_list)
> >> >
> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
> >> > +    pdev->info.is_unassignable = 1; }
> >> > +
> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev
> >> > +*pdev) {
> >> > +    return pdev->info.is_unassignable; }
> >>
> >> Are you aware that we already have a mechanism to prevent assignment
> >> (via pci_{ro,hide}_device())? I think at the very least this check
> >> should
> > consider both
> >> variants. Whether fully using the existing mechanism for your purpose
> >> is feasible I can't immediately tell (since the ownership change may
> >> be problematic at the points where you want the flagging to happen).
> >
> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
> > for this case, IMO I think a flag is a better solution.
> 
> I can't really judge on this without know details of the crash. As said, I'd prefer
> to have just a single mechanism, and would accept a second one only with
> proper justification (i.e. more than "is dangerous" or "makes xen crash" - after
> all that may also be a result of the functions not being used as necessary).
> 


I need do some more investigation and try a second one only with proper justification.


Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15  9:16       ` Jan Beulich
  2015-12-15 10:24         ` Xu, Quan
@ 2015-12-15 10:58         ` Xu, Quan
  2015-12-15 12:23         ` Xu, Quan
  2 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 10:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, 
> >> > struct
> >> pci_dev *pdev)
> >> >      if ( !pdev->domain )
> >> >          return -EINVAL;
> >> >
> >> > +    if ( is_pdev_unassignable(pdev) )
> >> > +        return -EACCES;
> >>
> >> Is this case possible at all (i.e. a newly added device being unassignable)?
> >>
> >
> > IMO, it is possible, and I have checked and printed the invoke flow 
> > when Xen
> init.
> > iommu_setup() / iommu_hwdom_init() are called before 
> > pci_add_device()
> when Xen init.
> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will 
> > mark device as unassignable.
> 
> Please be more precise: I can't see how e.g. iommu_setup() would 
> manage to time out on invalidation on a particular device. I.e.
> please provide an exact scenario where the flag could get set before 
> the device gets reported through iommu_add_device().
> 
> Furthermore I think that boot time invalidations shouldn't lead to any 
> devices getting marked un-assignable. Instead such would probably 
> better result in the IOMMU to be turned of altogether.


...
construct_dom0()
|--iommu_hwdom_init()
  |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86]
    |--iommu_flush_all()
    ...


If it is failed in iommu_flush_all(), it will mask all of dom0's devices as unassignable.
And all of that called before pci_add_device() ..   right?


> 
> >> > @@ -88,6 +89,16 @@ struct pci_dev {  #define 
> >> > for_each_pdev(domain,
> >> > pdev) \
> >> >      list_for_each_entry(pdev, &(domain->arch.pdev_list),
> >> > domain_list)
> >> >
> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
> >> > +    pdev->info.is_unassignable = 1; }
> >> > +
> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev
> >> > +*pdev) {
> >> > +    return pdev->info.is_unassignable; }
> >>
> >> Are you aware that we already have a mechanism to prevent 
> >> assignment (via pci_{ro,hide}_device())? I think at the very least 
> >> this check should
> > consider both
> >> variants. Whether fully using the existing mechanism for your 
> >> purpose is feasible I can't immediately tell (since the ownership 
> >> change may be problematic at the points where you want the flagging to happen).
> >
> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
> > for this case, IMO I think a flag is a better solution.
> 
> I can't really judge on this without know details of the crash. As 
> said, I'd prefer to have just a single mechanism, and would accept a 
> second one only with proper justification (i.e. more than "is 
> dangerous" or "makes xen crash" - after all that may also be a result of the functions not being used as necessary).
> 


I need do some more investigation and try a second one only with proper justification.


Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15  9:16       ` Jan Beulich
  2015-12-15 10:24         ` Xu, Quan
  2015-12-15 10:58         ` Xu, Quan
@ 2015-12-15 12:23         ` Xu, Quan
  2015-12-15 12:29           ` Jan Beulich
  2 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 12:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org


> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:


> >> > @@ -88,6 +89,16 @@ struct pci_dev {  #define for_each_pdev(domain,
> >> > pdev) \
> >> >      list_for_each_entry(pdev, &(domain->arch.pdev_list),
> >> > domain_list)
> >> >
> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
> >> > +    pdev->info.is_unassignable = 1; }
> >> > +
> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev
> >> > +*pdev) {
> >> > +    return pdev->info.is_unassignable; }
> >>
> >> Are you aware that we already have a mechanism to prevent assignment
> >> (via pci_{ro,hide}_device())? I think at the very least this check
> >> should
> > consider both
> >> variants. Whether fully using the existing mechanism for your purpose
> >> is feasible I can't immediately tell (since the ownership change may
> >> be problematic at the points where you want the flagging to happen).
> >
> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
> > for this case, IMO I think a flag is a better solution.
> 
> I can't really judge on this without know details of the crash. As said, I'd prefer
> to have just a single mechanism, and would accept a second one only with
> proper justification (i.e. more than "is dangerous" or "makes xen crash" - after
> all that may also be a result of the functions not being used as necessary).
> 

Jan, one question, do pci_{ro,hide}_device() work to hide pci devices?

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15 12:23         ` Xu, Quan
@ 2015-12-15 12:29           ` Jan Beulich
  2015-12-15 13:31             ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-15 12:29 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 15.12.15 at 13:23, <quan.xu@intel.com> wrote:
>> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
>> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
>> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
>> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
>> >> > @@ -88,6 +89,16 @@ struct pci_dev {  #define for_each_pdev(domain,
>> >> > pdev) \
>> >> >      list_for_each_entry(pdev, &(domain->arch.pdev_list),
>> >> > domain_list)
>> >> >
>> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
>> >> > +    pdev->info.is_unassignable = 1; }
>> >> > +
>> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev
>> >> > +*pdev) {
>> >> > +    return pdev->info.is_unassignable; }
>> >>
>> >> Are you aware that we already have a mechanism to prevent assignment
>> >> (via pci_{ro,hide}_device())? I think at the very least this check
>> >> should
>> > consider both
>> >> variants. Whether fully using the existing mechanism for your purpose
>> >> is feasible I can't immediately tell (since the ownership change may
>> >> be problematic at the points where you want the flagging to happen).
>> >
>> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
>> > for this case, IMO I think a flag is a better solution.
>> 
>> I can't really judge on this without know details of the crash. As said, I'd prefer
>> to have just a single mechanism, and would accept a second one only with
>> proper justification (i.e. more than "is dangerous" or "makes xen crash" - after
>> all that may also be a result of the functions not being used as necessary).
> 
> Jan, one question, do pci_{ro,hide}_device() work to hide pci devices?

Not sure I understand the question - the place they're used in they
do work, yes.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15 10:24         ` Xu, Quan
@ 2015-12-15 12:32           ` Jan Beulich
  2015-12-15 13:01             ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-15 12:32 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 15.12.15 at 12:42,  wrote:
>>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote:
> >>  On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
> >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn,
> >> >> > struct
> >> >> pci_dev *pdev)
> >> >> >      if ( !pdev->domain )
> >> >> >          return -EINVAL;
> >> >> >
> >> >> > +    if ( is_pdev_unassignable(pdev) )
> >> >> > +        return -EACCES;
> >> >>
> >> >> Is this case possible at all (i.e. a newly added device being 
> > unassignable)?
> >> >>
> >> >
> >> > IMO, it is possible, and I have checked and printed the invoke flow when 
> > Xen
> >> init.
> >> > iommu_setup() / iommu_hwdom_init() are called before pci_add_device()
> >> when Xen init.
> >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will
> >> > mark device as unassignable.
> >> 
> >> Please be more precise: I can't see how e.g. iommu_setup() would manage to
> >> time out on invalidation on a particular device. I.e.
> >> please provide an exact scenario where the flag could get set before the 
> > device
> >> gets reported through iommu_add_device().
> >> 
> >> Furthermore I think that boot time invalidations shouldn't lead to any 
> > devices
> >> getting marked un-assignable. Instead such would probably better result in 
> > the
> >> IOMMU to be turned of altogether.
> > 
> > 
> > ...
> > construct_dom0()
> > |--iommu_hwdom_init()
> >   |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86]
> >     |--iommu_flush_all()
> >     ...
> > 
> > 
> > If it is failed in iommu_flush_all(), it will mask all of dom0's devices as 
> > unassignable.
> > And all of that called before pci_add_device() ..   right?
> 
> Agreed. But as said, I don't think that's the right action in this specific
> case.

Having thought about it some more over lunch, I think leaving
the conditional in place is going to be fine, and much less of a
change than trying to unroll IOMMU setup at that point.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15 12:32           ` Jan Beulich
@ 2015-12-15 13:01             ` Xu, Quan
  2015-12-15 13:26               ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 13:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.2015 at 8:32pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 12:42,  wrote:
> >>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote:
> > >>  On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> > >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8
> > >> >> > devfn, struct
> > >> >> pci_dev *pdev)
> > >> >> >      if ( !pdev->domain )
> > >> >> >          return -EINVAL;
> > >> >> >
> > >> >> > +    if ( is_pdev_unassignable(pdev) )
> > >> >> > +        return -EACCES;
> > >> >>
> > >> >> Is this case possible at all (i.e. a newly added device being
> > > unassignable)?
> > >> >>
> > >> >
> > >> > IMO, it is possible, and I have checked and printed the invoke
> > >> > flow when
> > > Xen
> > >> init.
> > >> > iommu_setup() / iommu_hwdom_init() are called before
> > >> > pci_add_device()
> > >> when Xen init.
> > >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it
> > >> > will mark device as unassignable.
> > >>
> > >> Please be more precise: I can't see how e.g. iommu_setup() would
> > >> manage to time out on invalidation on a particular device. I.e.
> > >> please provide an exact scenario where the flag could get set
> > >> before the
> > > device
> > >> gets reported through iommu_add_device().
> > >>
> > >> Furthermore I think that boot time invalidations shouldn't lead to
> > >> any
> > > devices
> > >> getting marked un-assignable. Instead such would probably better
> > >> result in
> > > the
> > >> IOMMU to be turned of altogether.
> > >
> > >
> > > ...
> > > construct_dom0()
> > > |--iommu_hwdom_init()
> > >   |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for
> x86]
> > >     |--iommu_flush_all()
> > >     ...
> > >
> > >
> > > If it is failed in iommu_flush_all(), it will mask all of dom0's
> > > devices as unassignable.
> > > And all of that called before pci_add_device() ..   right?
> >
> > Agreed. But as said, I don't think that's the right action in this
> > specific case.
> 
> Having thought about it some more over lunch, I think leaving the conditional in
> place is going to be fine, and much less of a change than trying to unroll
> IOMMU setup at that point.
> 
Thanks Jan!
Do you mean that I can ignore it in intel_iommu_add_device()?

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15 13:01             ` Xu, Quan
@ 2015-12-15 13:26               ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-15 13:26 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 15.12.15 at 14:01, <quan.xu@intel.com> wrote:
>>  On 15.12.2015 at 8:32pm, <JBeulich@suse.com> wrote:
>> >>> On 15.12.15 at 12:42,  wrote:
>> >>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote:
>> > >>  On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
>> > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
>> > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
>> > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
>> > >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c
>> > >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c
>> > >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8
>> > >> >> > devfn, struct
>> > >> >> pci_dev *pdev)
>> > >> >> >      if ( !pdev->domain )
>> > >> >> >          return -EINVAL;
>> > >> >> >
>> > >> >> > +    if ( is_pdev_unassignable(pdev) )
>> > >> >> > +        return -EACCES;
>> > >> >>
>> > >> >> Is this case possible at all (i.e. a newly added device being
>> > > unassignable)?
>> > >> >>
>> > >> >
>> > >> > IMO, it is possible, and I have checked and printed the invoke
>> > >> > flow when
>> > > Xen
>> > >> init.
>> > >> > iommu_setup() / iommu_hwdom_init() are called before
>> > >> > pci_add_device()
>> > >> when Xen init.
>> > >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it
>> > >> > will mark device as unassignable.
>> > >>
>> > >> Please be more precise: I can't see how e.g. iommu_setup() would
>> > >> manage to time out on invalidation on a particular device. I.e.
>> > >> please provide an exact scenario where the flag could get set
>> > >> before the
>> > > device
>> > >> gets reported through iommu_add_device().
>> > >>
>> > >> Furthermore I think that boot time invalidations shouldn't lead to
>> > >> any
>> > > devices
>> > >> getting marked un-assignable. Instead such would probably better
>> > >> result in
>> > > the
>> > >> IOMMU to be turned of altogether.
>> > >
>> > >
>> > > ...
>> > > construct_dom0()
>> > > |--iommu_hwdom_init()
>> > >   |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for
>> x86]
>> > >     |--iommu_flush_all()
>> > >     ...
>> > >
>> > >
>> > > If it is failed in iommu_flush_all(), it will mask all of dom0's
>> > > devices as unassignable.
>> > > And all of that called before pci_add_device() ..   right?
>> >
>> > Agreed. But as said, I don't think that's the right action in this
>> > specific case.
>> 
>> Having thought about it some more over lunch, I think leaving the 
> conditional in
>> place is going to be fine, and much less of a change than trying to unroll
>> IOMMU setup at that point.
>> 
> Thanks Jan!
> Do you mean that I can ignore it in intel_iommu_add_device()?

Ignore? Just keep the code the way it was (still seen above).

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-15 12:29           ` Jan Beulich
@ 2015-12-15 13:31             ` Xu, Quan
       [not found]               ` <5670271402000078000BFA35@prv-mh.provo.novell.com>
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 13:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.2015 at 8:30pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 13:23, <quan.xu@intel.com> wrote:
> >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote:
> >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote:
> >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote:
> >> >> > @@ -88,6 +89,16 @@ struct pci_dev {  #define
> >> >> > for_each_pdev(domain,
> >> >> > pdev) \
> >> >> >      list_for_each_entry(pdev, &(domain->arch.pdev_list),
> >> >> > domain_list)
> >> >> >
> >> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) {
> >> >> > +    pdev->info.is_unassignable = 1; }
> >> >> > +
> >> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev
> >> >> > +*pdev) {
> >> >> > +    return pdev->info.is_unassignable; }
> >> >>
> >> >> Are you aware that we already have a mechanism to prevent
> >> >> assignment (via pci_{ro,hide}_device())? I think at the very least
> >> >> this check should
> >> > consider both
> >> >> variants. Whether fully using the existing mechanism for your
> >> >> purpose is feasible I can't immediately tell (since the ownership
> >> >> change may be problematic at the points where you want the flagging to
> happen).
> >> >
> >> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it.
> >> > for this case, IMO I think a flag is a better solution.
> >>
> >> I can't really judge on this without know details of the crash. As
> >> said, I'd prefer to have just a single mechanism, and would accept a
> >> second one only with proper justification (i.e. more than "is
> >> dangerous" or "makes xen crash" - after all that may also be a result of the
> functions not being used as necessary).
> >
> > Jan, one question, do pci_{ro,hide}_device() work to hide pci devices?
> 
> Not sure I understand the question - the place they're used in they do work, yes.


Copy from pci_hide_device(), which is actually add device to dom_xen and
add pdev->domain_list to dom_xen->arch.pdev_list.

Quite similar, a second one only with proper justification, I can reassign
Device form _domain to dom_xen directly. The below is the how to deal
With device_tlb ( is it acceptable? ). It is working to hide device.





+void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
+                                   u16 seg, u8 bus, u8 devfn)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+    struct hvm_iommu *hd;
+    int rc;
+
+    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+    ASSERT(d);
+    for_each_pdev(d, pdev)
+        if ( (pdev->seg == seg) &&
+             (pdev->bus == bus) &&
+             (pdev->devfn == devfn) )
+        {
+            if ( pdev->domain )
+            {
+                hd = domain_hvm_iommu(d);
+                rc =  hd->platform_ops->reassign_device(d,
+                      dom_xen, devfn, pdev);
+                if ( rc )
+                    continue;
+            }
+            break;
+        }
+
+    if ( !is_hardware_domain(d) )
+        domain_crash(d);
+    rcu_unlock_domain(d);
+}



Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
       [not found]               ` <5670271402000078000BFA35@prv-mh.provo.novell.com>
@ 2015-12-15 14:49                 ` Xu, Quan
  2015-12-16  3:51                 ` Xu, Quan
  1 sibling, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-15 14:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.2015 at 9:44pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 14:31, <quan.xu@intel.com> wrote:
> > Copy from pci_hide_device(), which is actually add device to dom_xen
> > and add pdev->domain_list to dom_xen->arch.pdev_list.
> >
> > Quite similar, a second one only with proper justification, I can
> > reassign Device form _domain to dom_xen directly. The below is the how
> > to deal With device_tlb ( is it acceptable? ). It is working to hide device.
> 
> Looks reasonable, but ...
> 
> > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
> > +                                   u16 seg, u8 bus, u8 devfn) {
> > +    struct domain *d;
> > +    struct pci_dev *pdev;
> > +    struct hvm_iommu *hd;
> > +    int rc;
> > +
> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > +    ASSERT(d);
> > +    for_each_pdev(d, pdev)
> > +        if ( (pdev->seg == seg) &&
> > +             (pdev->bus == bus) &&
> > +             (pdev->devfn == devfn) )
> > +        {
> > +            if ( pdev->domain )
> > +            {
> > +                hd = domain_hvm_iommu(d);
> > +                rc =  hd->platform_ops->reassign_device(d,
> > +                      dom_xen, devfn, pdev);
> 
> ... doesn't this have the potential of generating further flushes? You clearly need
> to be certain not to recurse here.

Good consideration. I should continue to enhance it.
Thanks Jan...

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
       [not found]               ` <5670271402000078000BFA35@prv-mh.provo.novell.com>
  2015-12-15 14:49                 ` Xu, Quan
@ 2015-12-16  3:51                 ` Xu, Quan
  2015-12-16  8:08                   ` Jan Beulich
  1 sibling, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-16  3:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 15.12.15 at 9:44pm, <JBeulich@suse.com> wrote:
> >>> On 15.12.15 at 14:31, <quan.xu@intel.com> wrote:
> > Copy from pci_hide_device(), which is actually add device to dom_xen
> > and add pdev->domain_list to dom_xen->arch.pdev_list.
> >
> > Quite similar, a second one only with proper justification, I can
> > reassign Device form _domain to dom_xen directly. The below is the how
> > to deal With device_tlb ( is it acceptable? ). It is working to hide device.
> 
> Looks reasonable, but ...
> 
> > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
> > +                                   u16 seg, u8 bus, u8 devfn) {
> > +    struct domain *d;
> > +    struct pci_dev *pdev;
> > +    struct hvm_iommu *hd;
> > +    int rc;
> > +
> > +    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
> > +    ASSERT(d);
> > +    for_each_pdev(d, pdev)
> > +        if ( (pdev->seg == seg) &&
> > +             (pdev->bus == bus) &&
> > +             (pdev->devfn == devfn) )
> > +        {
> > +            if ( pdev->domain )
> > +            {
> > +                hd = domain_hvm_iommu(d);
> > +                rc =  hd->platform_ops->reassign_device(d,
> > +                      dom_xen, devfn, pdev);
> 
> ... doesn't this have the potential of generating further flushes? You clearly need
> to be certain not to recurse here.


Jan, The following code is for how to hide a device. I think it is feasible with proper justification.
Have a look first. Thanks.




--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
+void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did,
+                                   u16 seg, u8 bus, u8 devfn)
+{
+    struct domain *d;
+    struct pci_dev *pdev;
+
+    d = rcu_lock_domain_by_id(iommu->domid_map[did]);
+    ASSERT(d);
+    for_each_pdev(d, pdev)
+        if ( (pdev->seg == seg) &&
+             (pdev->bus == bus) &&
+             (pdev->devfn == devfn) )
+        {
+            if ( iommu_hide_device(pdev) )
+            {
+                printk(XENLOG_ERR
+                       "IOMMU hide device %04x:%02x:%02x error.",
+                       seg, bus, devfn);
+                break;
+            }
+
+            break;
+        }
+
+    if ( !is_hardware_domain(d) )
+        domain_crash(d);
+    rcu_unlock_domain(d);
+}

--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
 int iommu_remove_device(struct pci_dev *pdev);
+int iommu_hide_device(struct pci_dev *pdev);


--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev)
     return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
 }

+int iommu_hide_device(struct pci_dev *pdev)
+{
+    if ( !pdev || !pdev->domain )
+        return -EINVAL;
+
+    spin_lock(&pcidevs_lock);
+    pdev->domain = dom_xen;
+    list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
+    spin_unlock(&pcidevs_lock);
+
+    return 0;
+}
+

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
     if ( !pdev->domain )
         return -EINVAL;

+    if ( pdev->domain == dom_xen )
+        return -EACCES;
+
     ret = domain_context_mapping(pdev->domain, devfn, pdev);
     if ( ret )
     {
@@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;

+    if ( pdev->domain == dom_xen )
+        return -EACCES;
+
     seg = pdev->seg;
     bus = pdev->bus;




Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-16  3:51                 ` Xu, Quan
@ 2015-12-16  8:08                   ` Jan Beulich
  2015-12-17 11:43                     ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-16  8:08 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev)
>      return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev));
>  }
> 
> +int iommu_hide_device(struct pci_dev *pdev)
> +{
> +    if ( !pdev || !pdev->domain )
> +        return -EINVAL;
> +
> +    spin_lock(&pcidevs_lock);
> +    pdev->domain = dom_xen;
> +    list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
> +    spin_unlock(&pcidevs_lock);
> +
> +    return 0;
> +}

This is effectively a copy of pci_hide_device(), and is misnamed (since
it takes a PCI device as argument). I do not see why you shouldn't be
able to use pci_hide_device() after removing its __init annotation or
a suitably named wrapper around _pci_hide_device(). Not specifically
that the way you do this right now you corrupt the original owning
domain's PCI device list - you need to remove the device from that
list before adding it to dom_xen's (which then will naturally entail
clearing ->domain, at once satisfying _pci_hide_device()'s early
check, which is there for the very reason of ensuring not to corrupt
any list).

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
>      if ( !pdev->domain )
>          return -EINVAL;
> 
> +    if ( pdev->domain == dom_xen )
> +        return -EACCES;

I'm not sure about the need for this check, ...

> @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device(
>      if ( list_empty(&acpi_drhd_units) )
>          return -ENODEV;
> 
> +    if ( pdev->domain == dom_xen )
> +        return -EACCES;

... and I clearly don't see the need for this one. Please explain,
keeping in mind that generic IOMMU code should be enforcing things
like this (and at least in the assign case should already be doing so).

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-16  8:08                   ` Jan Beulich
@ 2015-12-17 11:43                     ` Xu, Quan
  2015-12-17 12:10                       ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-17 11:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 16.12.2015 at 4:08pm, <JBeulich@suse.com> wrote:
> >>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev)
> >      return hd->platform_ops->remove_device(pdev->devfn,
> > pci_to_dev(pdev));  }
> >
> > +int iommu_hide_device(struct pci_dev *pdev) {
> > +    if ( !pdev || !pdev->domain )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&pcidevs_lock);
> > +    pdev->domain = dom_xen;
> > +    list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
> > +    spin_unlock(&pcidevs_lock);
> > +
> > +    return 0;
> > +}
> 
> This is effectively a copy of pci_hide_device(), and is misnamed (since it takes a
> PCI device as argument). I do not see why you shouldn't be able to use
> pci_hide_device() after removing its __init annotation or a suitably named
> wrapper around _pci_hide_device(). Not specifically that the way you do this
> right now you corrupt the original owning domain's PCI device list - you need to
> remove the device from that list before adding it to dom_xen's (which then will
> naturally entail clearing ->domain, at once satisfying _pci_hide_device()'s early
> check, which is there for the very reason of ensuring not to corrupt any list).
>

You are correct.
As the _pci_hide_device()'s early check, I didn't use it. 
Could I remove the device from that list before adding it to dom_xen's, and reuse
pci_hide_device() as below?


+            list_del(&pdev->domain_list);
+            pdev->domain = NULL;
+            if ( pci_hide_device(bus, devfn) )
+            {
+                printk(XENLOG_ERR
+                       "IOMMU hide device %04x:%02x:%02x error.",
+                       seg, bus, devfn);
+                break;
+            }



> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct
> pci_dev *pdev)
> >      if ( !pdev->domain )
> >          return -EINVAL;
> >
> > +    if ( pdev->domain == dom_xen )
> > +        return -EACCES;
> 
> I'm not sure about the need for this check, ...
> 
> > @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device(
> >      if ( list_empty(&acpi_drhd_units) )
> >          return -ENODEV;
> >
> > +    if ( pdev->domain == dom_xen )
> > +        return -EACCES;
> 
> ... and I clearly don't see the need for this one. Please explain, keeping in mind
> that generic IOMMU code should be enforcing things like this (and at least in the
> assign case should already be doing so).

I have verified it. the above 2 checks are excessive protection. I can remove it.

((Jan, thanks very much! Sometime it is not just for technology help.))
Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-17 11:43                     ` Xu, Quan
@ 2015-12-17 12:10                       ` Jan Beulich
  2015-12-17 12:55                         ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-17 12:10 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Jun Nakajima, keir@xen.org

>>> On 17.12.15 at 12:43, <quan.xu@intel.com> wrote:
>>  On 16.12.2015 at 4:08pm, <JBeulich@suse.com> wrote:
>> >>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote:
>> > --- a/xen/drivers/passthrough/pci.c
>> > +++ b/xen/drivers/passthrough/pci.c
>> > @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev)
>> >      return hd->platform_ops->remove_device(pdev->devfn,
>> > pci_to_dev(pdev));  }
>> >
>> > +int iommu_hide_device(struct pci_dev *pdev) {
>> > +    if ( !pdev || !pdev->domain )
>> > +        return -EINVAL;
>> > +
>> > +    spin_lock(&pcidevs_lock);
>> > +    pdev->domain = dom_xen;
>> > +    list_add(&pdev->domain_list, &dom_xen->arch.pdev_list);
>> > +    spin_unlock(&pcidevs_lock);
>> > +
>> > +    return 0;
>> > +}
>> 
>> This is effectively a copy of pci_hide_device(), and is misnamed (since it 
> takes a
>> PCI device as argument). I do not see why you shouldn't be able to use
>> pci_hide_device() after removing its __init annotation or a suitably named
>> wrapper around _pci_hide_device(). Not specifically that the way you do this
>> right now you corrupt the original owning domain's PCI device list - you need 
> to
>> remove the device from that list before adding it to dom_xen's (which then 
> will
>> naturally entail clearing ->domain, at once satisfying _pci_hide_device()'s 
> early
>> check, which is there for the very reason of ensuring not to corrupt any 
> list).
>>
> 
> You are correct.
> As the _pci_hide_device()'s early check, I didn't use it. 
> Could I remove the device from that list before adding it to dom_xen's, and 
> reuse pci_hide_device() as below?

That's what I was trying to suggest. Just that you should list_del()
only when pdev->domain is not NULL.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue.
  2015-12-17 12:10                       ` Jan Beulich
@ 2015-12-17 12:55                         ` Xu, Quan
  0 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-17 12:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com,
	andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
	Nakajima, Jun, keir@xen.org

> On 17.12.2015 at 8:10pm, <JBeulich@suse.com> wrote:
> >>> On 17.12.15 at 12:43, <quan.xu@intel.com> wrote:


> > You are correct.
> > As the _pci_hide_device()'s early check, I didn't use it.
> > Could I remove the device from that list before adding it to
> > dom_xen's, and reuse pci_hide_device() as below?
> 
> That's what I was trying to suggest. Just that you should list_del() only when
> pdev->domain is not NULL.
> 

ok. Now I think we are on the same page. I am getting started to v4.
Thanks.


Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu
  2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu
  2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu
@ 2015-12-19 14:01 ` Xu, Quan
  2 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-19 14:01 UTC (permalink / raw)
  To: jbeulich@suse.com, Tian, Kevin, keir@xen.org, tim@xen.org,
	george.dunlap@eu.citrix.com, Nakajima, Jun,
	andrew.cooper3@citrix.com, Wu, Feng, xen-devel@lists.xen.org

>On 12.12.2015 at 9:22pm, <quan.xu@intel.com> wrote:
> This patches are based on Kevin Tian's previous discussion 'Revisit VT-d
> asynchronous flush issue'.
> Fix current timeout concern and also allow limited ATS support in a light way:


> 2. Fix vt-d flush timeout issue.
> 
>     If IOTLB/Context/IETC flush is timeout, we should think all devices under
> this IOMMU cannot function correctly.
>     So for each device under this IOMMU we'll mark it as unassignable and kill
> the domain owning the device.
> 

Any comment for this point?
Instead of panic, is it enough?

-Quan



>     If Device-TLB flush is timeout, we'll mark the target ATS device as
> unassignable and kill the domain owning
>     this device.
> 
>     If impacted domain is hardware domain, just throw out a warning. It's an
> open here whether we want to kill
>     hardware domain (or directly panic hypervisor). Comments are welcomed.
> 
>     Device marked as unassignable will be disallowed to be further assigned to
> any domain.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
@ 2015-12-20 13:57 Xu, Quan
  2015-12-20 15:51 ` Andrew Cooper
  2015-12-21 11:46 ` Jan Beulich
  0 siblings, 2 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-20 13:57 UTC (permalink / raw)
  To: 'jbeulich@suse.com', Tian, Kevin, 'keir@xen.org',
	'tim@xen.org', 'george.dunlap@eu.citrix.com',
	Nakajima, Jun, 'andrew.cooper3@citrix.com', Wu, Feng,
	'xen-devel@lists.xen.org'
  Cc: Xu, Quan


>On 12.12.2015 at 9:22pm, <quan.xu@intel.com> wrote:
> This patches are based on Kevin Tian's previous discussion 'Revisit 
>VT-d  asynchronous flush issue'.
> Fix current timeout concern and also allow limited ATS support in a light way:


> 2. Fix vt-d flush timeout issue.
> 
>     If IOTLB/Context/IETC flush is timeout, we should think all 
> devices under this IOMMU cannot function correctly.
>     So for each device under this IOMMU we'll mark it as unassignable 
> and kill the domain owning the device.
> 

Hi, 

Through research and investigation, when IEC/Iotlb/Context are flush error(VT-d is bug),
IMO it is unavoidable to panic. The following are some reasons:

1. The below is the general platform topology, illustrated by VT-d spec.
VT-d is a key component of the platform infrastructure in virtualization usage,
providing DMA/Intr remapping capabilities.
If such a key component VT-d is bug, it can't provide reliability for recording
and reporting of DMA/Intr error to VMM that may otherwise corrupt memory or
impact VM isolation.


       Processor  ... Processor
       ---------      ---------
                  ^
                  |

             North Bridge
            --------------      <---> DRAM
           DMA/Intr Remapping

                ^^^^
                |||| PCIe Devices
                vvvv



2. If VT-d is bug, does the hardware_domain continue to work with PCIe Devices / DRAM well with DMA remapping error?
   I think it is no. furthermore, i think VMM can NOT run a normal HVM domain without device-passthrough.

3. There are so many reasons for IEC/iotlb/Conetxt flush, .i.e. msi/ept... update.
   It distributed across the VMM source code, it is challenge to make sure callers actually honor errors
   and check all the way up the call trees. it looks like rewriting VMM source code.

4. Much more detail, some flush errors are very tricky. .i.e. how to deal with msi free with IEC flush error,
   restore or ignore it?


Welcome your comments and correct me if i am wrong. thanks.

-Quan



>     If Device-TLB flush is timeout, we'll mark the target ATS device 
> as unassignable and kill the domain owning
>     this device.
> 
>     If impacted domain is hardware domain, just throw out a warning. 
> It's an open here whether we want to kill
>     hardware domain (or directly panic hypervisor). Comments are welcomed.
> 
>     Device marked as unassignable will be disallowed to be further 
> assigned to any domain.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-20 13:57 Xu, Quan
@ 2015-12-20 15:51 ` Andrew Cooper
  2015-12-21 11:46 ` Jan Beulich
  1 sibling, 0 replies; 50+ messages in thread
From: Andrew Cooper @ 2015-12-20 15:51 UTC (permalink / raw)
  To: Xu, Quan, 'jbeulich@suse.com', Tian, Kevin,
	'keir@xen.org', 'tim@xen.org',
	'george.dunlap@eu.citrix.com', Nakajima, Jun, Wu, Feng,
	'xen-devel@lists.xen.org'

On 20/12/15 13:57, Xu, Quan wrote:
>> On 12.12.2015 at 9:22pm, <quan.xu@intel.com> wrote:
>> This patches are based on Kevin Tian's previous discussion 'Revisit 
>> VT-d  asynchronous flush issue'.
>> Fix current timeout concern and also allow limited ATS support in a light way:
>
>> 2. Fix vt-d flush timeout issue.
>>
>>     If IOTLB/Context/IETC flush is timeout, we should think all 
>> devices under this IOMMU cannot function correctly.
>>     So for each device under this IOMMU we'll mark it as unassignable 
>> and kill the domain owning the device.
>>
> Hi, 
>
> Through research and investigation, when IEC/Iotlb/Context are flush error(VT-d is bug),
> IMO it is unavoidable to panic. The following are some reasons:
>
> 1. The below is the general platform topology, illustrated by VT-d spec.
> VT-d is a key component of the platform infrastructure in virtualization usage,
> providing DMA/Intr remapping capabilities.
> If such a key component VT-d is bug, it can't provide reliability for recording
> and reporting of DMA/Intr error to VMM that may otherwise corrupt memory or
> impact VM isolation.
>
>
>        Processor  ... Processor
>        ---------      ---------
>                   ^
>                   |
>
>              North Bridge
>             --------------      <---> DRAM
>            DMA/Intr Remapping
>
>                 ^^^^
>                 |||| PCIe Devices
>                 vvvv
>
>
>
> 2. If VT-d is bug, does the hardware_domain continue to work with PCIe Devices / DRAM well with DMA remapping error?
>    I think it is no. furthermore, i think VMM can NOT run a normal HVM domain without device-passthrough.
>
> 3. There are so many reasons for IEC/iotlb/Conetxt flush, .i.e. msi/ept... update.
>    It distributed across the VMM source code, it is challenge to make sure callers actually honor errors
>    and check all the way up the call trees. it looks like rewriting VMM source code.
>
> 4. Much more detail, some flush errors are very tricky. .i.e. how to deal with msi free with IEC flush error,
>    restore or ignore it?
>
>
> Welcome your comments and correct me if i am wrong. thanks.
>
> -Quan

I believe everywhere you say "is bug", you mean "is buggy".

I agree that, if the remapping engine itself is buggy, Xen can't be
certain about anything else functioning correctly.

However, there are many errors the remapping engine could generate which
are not because it itself is buggy.  These could be because of a
downstream device misbehaving, or because the remapping engine was
incorrectly programmed.  These cases should not be able to directly
cause a crash.

In the case of a buggy device, it doesn't matter if it, and all its
resources, are left in stuck state; it can be ignored and the rest of
the host can try to continue normally.

For point 3 specifically.  The code is indeed currently broken, and
needs fixing, one way or another.  It is sad that we are still suffering
from lots of very poor quality code submitted in the past.

Even in the case that we discover that a remapping engine itself is
buggy, it is likely not to be the only remapping engine in the system. 
If it can be safely disabled, the host has a good chance of being able
to continue sensibly.

~Andrew

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-20 13:57 Xu, Quan
  2015-12-20 15:51 ` Andrew Cooper
@ 2015-12-21 11:46 ` Jan Beulich
  2015-12-21 12:28   ` Xu, Quan
  2015-12-21 12:44   ` Xu, Quan
  1 sibling, 2 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-21 11:46 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> 2. If VT-d is bug, does the hardware_domain continue to work with PCIe 
> Devices / DRAM well with DMA remapping error?
>    I think it is no. furthermore, i think VMM can NOT run a normal HVM 
> domain without device-passthrough.

In addition to what Andrew said - VT-d is effectively not in use for
domains without PT device. Impacting all such domains by crashing
the hypervisor just because (in the extreme case) a single domain
with PT devices exhibited a flush issue is a no-go imo.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 11:46 ` Jan Beulich
@ 2015-12-21 12:28   ` Xu, Quan
  2015-12-21 12:50     ` Jan Beulich
  2015-12-21 12:44   ` Xu, Quan
  1 sibling, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 12:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> > 2. If VT-d is bug, does the hardware_domain continue to work with PCIe
> > Devices / DRAM well with DMA remapping error?
> >    I think it is no. furthermore, i think VMM can NOT run a normal HVM
> > domain without device-passthrough.
> 
> In addition to what Andrew said - VT-d is effectively not in use for domains
> without PT device.

IMO, When VT-d is enabled, but is not working correct. These PCI-e devices (Disks/NICs..) DMA/Interrupt behaviors are not predictable. 
Assumed that, VT-d is effectively not in use for domains without PT device, while at least the virtualization infrastructure is not trusted.
I think it is also not secure to run PV domains.

> Impacting all such domains by crashing the hypervisor just
> because (in the extreme case) a single domain with PT devices exhibited a flush
> issue is a no-go imo.
> 

IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain behavior, it is a Hypervisor and infrastructure issue.
ATS device's Device-TLB flush is a single domain issue.
Back to our original goal, my patch set is for ATS flush issue. right?

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 11:46 ` Jan Beulich
  2015-12-21 12:28   ` Xu, Quan
@ 2015-12-21 12:44   ` Xu, Quan
  1 sibling, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 12:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Xu, Quan,
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> > 2. If VT-d is bug, does the hardware_domain continue to work with 
> > PCIe Devices / DRAM well with DMA remapping error?
> >    I think it is no. furthermore, i think VMM can NOT run a normal 
> > HVM domain without device-passthrough.
> 
> In addition to what Andrew said - VT-d is effectively not in use for 
> domains without PT device.

IMO, When VT-d is enabled, but is not working correct. These PCI-e devices (Disks/NICs..) DMA/Interrupt behaviors are not predictable. 
Assumed that, VT-d is effectively not in use for domains without PT device, while at least the virtualization infrastructure is not trusted.
I think it is also not secure to run PV domains.

> Impacting all such domains by crashing the hypervisor just because (in 
> the extreme case) a single domain with PT devices exhibited a flush 
> issue is a no-go imo.
> 

IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain behavior, it is a Hypervisor and infrastructure issue.
ATS device's Device-TLB flush is a single domain issue.
Back to our original goal, my patch set is for ATS flush issue. right?

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 12:28   ` Xu, Quan
@ 2015-12-21 12:50     ` Jan Beulich
  2015-12-21 13:08       ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-21 12:50 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
>> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
>> > 2. If VT-d is bug, does the hardware_domain continue to work with PCIe
>> > Devices / DRAM well with DMA remapping error?
>> >    I think it is no. furthermore, i think VMM can NOT run a normal HVM
>> > domain without device-passthrough.
>> 
>> In addition to what Andrew said - VT-d is effectively not in use for domains
>> without PT device.
> 
> IMO, When VT-d is enabled, but is not working correct. These PCI-e devices 
> (Disks/NICs..) DMA/Interrupt behaviors are not predictable. 
> Assumed that, VT-d is effectively not in use for domains without PT device, 
> while at least the virtualization infrastructure is not trusted.
> I think it is also not secure to run PV domains.
> 
>> Impacting all such domains by crashing the hypervisor just
>> because (in the extreme case) a single domain with PT devices exhibited a flush
>> issue is a no-go imo.
>> 
> 
> IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain behavior, 
> it is a Hypervisor and infrastructure issue.
> ATS device's Device-TLB flush is a single domain issue.
> Back to our original goal, my patch set is for ATS flush issue. right?

You mean you don't like this entailing clean up of other code? I'm
sorry, but I'm afraid you won't get away without - perhaps the
VT-d maintainers could help here, but in the end you have to face
that it was mainly Intel people who introduced the code which now
needs fixing up, so I consider it not exactly unfair for you (as a
company) to do this work.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 12:50     ` Jan Beulich
@ 2015-12-21 13:08       ` Xu, Quan
  2015-12-21 13:22         ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 13:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Xu, Quan,
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> >> > 2. If VT-d is bug, does the hardware_domain continue to work with
> >> > PCIe Devices / DRAM well with DMA remapping error?
> >> >    I think it is no. furthermore, i think VMM can NOT run a normal
> >> > HVM domain without device-passthrough.
> >>
> >> In addition to what Andrew said - VT-d is effectively not in use for
> >> domains without PT device.
> >
> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
> > devices
> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> > Assumed that, VT-d is effectively not in use for domains without PT
> > device, while at least the virtualization infrastructure is not trusted.
> > I think it is also not secure to run PV domains.
> >
> >> Impacting all such domains by crashing the hypervisor just because
> >> (in the extreme case) a single domain with PT devices exhibited a
> >> flush issue is a no-go imo.
> >>
> >
> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
> > behavior, it is a Hypervisor and infrastructure issue.
> > ATS device's Device-TLB flush is a single domain issue.
> > Back to our original goal, my patch set is for ATS flush issue. right?
> 
> You mean you don't like this entailing clean up of other code? 

 Jan, for ARM/AMD, I really have no knowledge to fix it. and I have no
 ARM/AMD hardware to verify it. if I need to fix these common part of INTEL/ARM/AMD, I think I need to make
 Xen compile correct and not to destroy the logic.

> I'm sorry, but I'm
> afraid you won't get away without - perhaps the VT-d maintainers could help
> here, but in the end you have to face that it was mainly Intel people who
> introduced the code which now needs fixing up, so I consider it not exactly
> unfair for you (as a
> company) to do this work.
> 

Furthermore, I found out that
     if IEC/Iotlb/Context flush error, then panic.
     Else if device-tlb flush error, we'll hide the target ATS device and kill the domain owning this ATS device. If impacted domain is hardware domain, just throw out a warning.

     Then, it is fine to _not_check all the way up the device-tlb flush call trees( maybe it is our next topic of discussion). Then it will not hurt the ARM/AMD platform.

               Quan 














> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 13:08       ` Xu, Quan
@ 2015-12-21 13:22         ` Jan Beulich
  2015-12-21 13:35           ` Xu, Quan
                             ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-21 13:22 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
>>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
>> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
>> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
>> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
>> >> > 2. If VT-d is bug, does the hardware_domain continue to work with
>> >> > PCIe Devices / DRAM well with DMA remapping error?
>> >> >    I think it is no. furthermore, i think VMM can NOT run a normal
>> >> > HVM domain without device-passthrough.
>> >>
>> >> In addition to what Andrew said - VT-d is effectively not in use for
>> >> domains without PT device.
>> >
>> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
>> > devices
>> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
>> > Assumed that, VT-d is effectively not in use for domains without PT
>> > device, while at least the virtualization infrastructure is not trusted.
>> > I think it is also not secure to run PV domains.
>> >
>> >> Impacting all such domains by crashing the hypervisor just because
>> >> (in the extreme case) a single domain with PT devices exhibited a
>> >> flush issue is a no-go imo.
>> >>
>> >
>> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
>> > behavior, it is a Hypervisor and infrastructure issue.
>> > ATS device's Device-TLB flush is a single domain issue.
>> > Back to our original goal, my patch set is for ATS flush issue. right?
>> 
>> You mean you don't like this entailing clean up of other code? 
> 
>  Jan, for ARM/AMD, I really have no knowledge to fix it. and I have no
>  ARM/AMD hardware to verify it. if I need to fix these common part of 
> INTEL/ARM/AMD, I think I need to make
>  Xen compile correct and not to destroy the logic.

You indeed aren't expected to fix AMD or ARM code, but it may be
necessary to adjust that code to make error propagation work.

>> I'm sorry, but I'm
>> afraid you won't get away without - perhaps the VT-d maintainers could help
>> here, but in the end you have to face that it was mainly Intel people who
>> introduced the code which now needs fixing up, so I consider it not exactly
>> unfair for you (as a
>> company) to do this work.
>> 
> 
> Furthermore, I found out that
>      if IEC/Iotlb/Context flush error, then panic.
>      Else if device-tlb flush error, we'll hide the target ATS device and 
> kill the domain owning this ATS device. If impacted domain is hardware 
> domain, just throw out a warning.
> 
>      Then, it is fine to _not_check all the way up the device-tlb flush call 
> trees( maybe it is our next topic of discussion).

I don't follow - this sounds more or less like the model you've been
following in past versions, yet it was that which prompted the
request to properly propagate errors.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 13:22         ` Jan Beulich
@ 2015-12-21 13:35           ` Xu, Quan
  2015-12-21 14:08           ` Xu, Quan
  2015-12-22  7:40           ` Wu, Feng
  2 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 13:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> >> >> > 2. If VT-d is bug, does the hardware_domain continue to work
> >> >> > with PCIe Devices / DRAM well with DMA remapping error?
> >> >> >    I think it is no. furthermore, i think VMM can NOT run a
> >> >> > normal HVM domain without device-passthrough.
> >> >>
> >> >> In addition to what Andrew said - VT-d is effectively not in use
> >> >> for domains without PT device.
> >> >
> >> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
> >> > devices
> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> >> > Assumed that, VT-d is effectively not in use for domains without PT
> >> > device, while at least the virtualization infrastructure is not trusted.
> >> > I think it is also not secure to run PV domains.
> >> >
> >> >> Impacting all such domains by crashing the hypervisor just because
> >> >> (in the extreme case) a single domain with PT devices exhibited a
> >> >> flush issue is a no-go imo.
> >> >>
> >> >
> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
> >> > behavior, it is a Hypervisor and infrastructure issue.
> >> > ATS device's Device-TLB flush is a single domain issue.
> >> > Back to our original goal, my patch set is for ATS flush issue. right?
> >>
> >> You mean you don't like this entailing clean up of other code?
> >
> >  Jan, for ARM/AMD, I really have no knowledge to fix it. and I have no
> > ARM/AMD hardware to verify it. if I need to fix these common part of
> > INTEL/ARM/AMD, I think I need to make  Xen compile correct and not to
> > destroy the logic.
> 
> You indeed aren't expected to fix AMD or ARM code, but it may be necessary to
> adjust that code to make error propagation work.
> 
> >> I'm sorry, but I'm
> >> afraid you won't get away without - perhaps the VT-d maintainers
> >> could help here, but in the end you have to face that it was mainly
> >> Intel people who introduced the code which now needs fixing up, so I
> >> consider it not exactly unfair for you (as a
> >> company) to do this work.
> >>
> >
> > Furthermore, I found out that
> >      if IEC/Iotlb/Context flush error, then panic.
> >      Else if device-tlb flush error, we'll hide the target ATS device
> > and kill the domain owning this ATS device. If impacted domain is
> > hardware domain, just throw out a warning.
> >
> >      Then, it is fine to _not_check all the way up the device-tlb
> > flush call trees( maybe it is our next topic of discussion).
> 
> I don't follow - this sounds more or less like the model you've been following in
> past versions, yet it was that which prompted the request to properly propagate
> errors.
> 
Jan,
Maybe we can discuss the big picture first on how to deal with iec/iotlb/context and Device-TLB flush error.
Then we can discuss it in detail. We can ignore some point of the way up the device-tlb flush call trees. Such as 

   iommu_hwdom_init()
   *|--hd->platform_ops->map_page(d, gfn, mfn, mapping);


And more, if we are on same page, I am glad to write patch for all of vt-d issue, including IOMMU_WAIT_OP issue .etc..

-Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 13:22         ` Jan Beulich
  2015-12-21 13:35           ` Xu, Quan
@ 2015-12-21 14:08           ` Xu, Quan
  2015-12-21 14:16             ` Jan Beulich
  2015-12-22  7:40           ` Wu, Feng
  2 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 14:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:

1. 
> >> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
> >> > devices
> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> >> > Assumed that, VT-d is effectively not in use for domains without PT
> >> > device, while at least the virtualization infrastructure is not trusted.
> >> > I think it is also not secure to run PV domains.
> >> >

2. 
> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
> >> > behavior, it is a Hypervisor and infrastructure issue.
> >> > ATS device's Device-TLB flush is a single domain issue.
> >> > Back to our original goal, my patch set is for ATS flush issue. right?
> >>

One quick question, 
Jan, do you agreed the above 2 descriptions?

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 14:08           ` Xu, Quan
@ 2015-12-21 14:16             ` Jan Beulich
  2015-12-21 14:31               ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-21 14:16 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 21.12.15 at 15:08, <quan.xu@intel.com> wrote:
>>  On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
>> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
>> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
>> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
>> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
>> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> 
> 1. 
>> >> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
>> >> > devices
>> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
>> >> > Assumed that, VT-d is effectively not in use for domains without PT
>> >> > device, while at least the virtualization infrastructure is not trusted.
> 
> 2. 
>> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
>> >> > behavior, it is a Hypervisor and infrastructure issue.
>> >> > ATS device's Device-TLB flush is a single domain issue.
> 
> One quick question, 
> Jan, do you agreed the above 2 descriptions?

I agree, but (see also Andrew's earlier reply) don't take them as an
excuse to crash Xen upon flush failures. Please accept that the
general policy has to be to handle errors with as narrow an impact
as possible.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 14:16             ` Jan Beulich
@ 2015-12-21 14:31               ` Xu, Quan
  2015-12-21 14:52                 ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 14:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 21.12.2015 at 10:16pm, <JBeulich@suse.com> wrote:
> >>> On 21.12.15 at 15:08, <quan.xu@intel.com> wrote:
> >>  On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
> >> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
> >> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> >> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> >
> > 1.
> >> >> > IMO, When VT-d is enabled, but is not working correct. These
> >> >> > PCI-e devices
> >> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> >> >> > Assumed that, VT-d is effectively not in use for domains without
> >> >> > PT device, while at least the virtualization infrastructure is not trusted.
> >
> > 2.
> >> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single
> >> >> > domain behavior, it is a Hypervisor and infrastructure issue.
> >> >> > ATS device's Device-TLB flush is a single domain issue.
> >
> > One quick question,
> > Jan, do you agreed the above 2 descriptions?
> 
> I agree, but (see also Andrew's earlier reply) don't take them as an excuse to
> crash Xen upon flush failures. Please accept that the general policy has to be to
> handle errors with as narrow an impact as possible.
>

That maybe the gap between us. It is really an issue that require to crash Xen.

I think we are on the same page for Device-TLB flush issue.
Could you share your idea how to deal with VT-d (IEC/Context/Iotlb) flush issue?  Thanks.

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 14:31               ` Xu, Quan
@ 2015-12-21 14:52                 ` Jan Beulich
  2015-12-21 15:15                   ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-21 14:52 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 21.12.15 at 15:31, <quan.xu@intel.com> wrote:
>>  On 21.12.2015 at 10:16pm, <JBeulich@suse.com> wrote:
>> >>> On 21.12.15 at 15:08, <quan.xu@intel.com> wrote:
>> >>  On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
>> >> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
>> >> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
>> >> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
>> >> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
>> >> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
>> >
>> > 1.
>> >> >> > IMO, When VT-d is enabled, but is not working correct. These
>> >> >> > PCI-e devices
>> >> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
>> >> >> > Assumed that, VT-d is effectively not in use for domains without
>> >> >> > PT device, while at least the virtualization infrastructure is not 
> trusted.
>> >
>> > 2.
>> >> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single
>> >> >> > domain behavior, it is a Hypervisor and infrastructure issue.
>> >> >> > ATS device's Device-TLB flush is a single domain issue.
>> >
>> > One quick question,
>> > Jan, do you agreed the above 2 descriptions?
>> 
>> I agree, but (see also Andrew's earlier reply) don't take them as an excuse to
>> crash Xen upon flush failures. Please accept that the general policy has to be to
>> handle errors with as narrow an impact as possible.
>>
> 
> That maybe the gap between us. It is really an issue that require to crash 
> Xen.
> 
> I think we are on the same page for Device-TLB flush issue.
> Could you share your idea how to deal with VT-d (IEC/Context/Iotlb) flush 
> issue?  Thanks.

I think this was sufficiently outlined before, by both Andrew and me.
I'm sorry, but I'm not willing to start over with the discussion.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 14:52                 ` Jan Beulich
@ 2015-12-21 15:15                   ` Xu, Quan
  0 siblings, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-21 15:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

>On 21.12.2015 at 10:53pm, <JBeulich@suse.com> wrote:
> >>> On 21.12.15 at 15:31, <quan.xu@intel.com> wrote:
> >>  On 21.12.2015 at 10:16pm, <JBeulich@suse.com> wrote:
> >> >>> On 21.12.15 at 15:08, <quan.xu@intel.com> wrote:
> >> >>  On 21.12.2015 at 9:23pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
> >> >> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >> >> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> >> >> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> >> >
> >> > 1.
> >> >> >> > IMO, When VT-d is enabled, but is not working correct. These
> >> >> >> > PCI-e devices
> >> >> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> >> >> >> > Assumed that, VT-d is effectively not in use for domains
> >> >> >> > without PT device, while at least the virtualization
> >> >> >> > infrastructure is not
> > trusted.
> >> >
> >> > 2.
> >> >> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single
> >> >> >> > domain behavior, it is a Hypervisor and infrastructure issue.
> >> >> >> > ATS device's Device-TLB flush is a single domain issue.
> >> >
> >> > One quick question,
> >> > Jan, do you agreed the above 2 descriptions?
> >>
> >> I agree, but (see also Andrew's earlier reply) don't take them as an
> >> excuse to crash Xen upon flush failures. Please accept that the
> >> general policy has to be to handle errors with as narrow an impact as
> possible.
> >>
> >
> > That maybe the gap between us. It is really an issue that require to
> > crash Xen.
> >
> > I think we are on the same page for Device-TLB flush issue.
> > Could you share your idea how to deal with VT-d (IEC/Context/Iotlb)
> > flush issue?  Thanks.
> 
> I think this was sufficiently outlined before, by both Andrew and me.
> I'm sorry, but I'm not willing to start over with the discussion.
> 

Jan, Never mind.. I understand that.
But could you forward the result to me? when you are available. 
I am really not in the previous thread and I can't find in in my Xen maillist archive. Thanks.

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-21 13:22         ` Jan Beulich
  2015-12-21 13:35           ` Xu, Quan
  2015-12-21 14:08           ` Xu, Quan
@ 2015-12-22  7:40           ` Wu, Feng
  2015-12-22  8:01             ` Jan Beulich
  2 siblings, 1 reply; 50+ messages in thread
From: Wu, Feng @ 2015-12-22  7:40 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: Tian, Kevin, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, December 21, 2015 9:23 PM
> To: Xu, Quan <quan.xu@intel.com>
> Cc: 'andrew.cooper3@citrix.com' <andrew.cooper3@citrix.com>;
> 'george.dunlap@eu.citrix.com' <george.dunlap@eu.citrix.com>; Wu, Feng
> <feng.wu@intel.com>; Nakajima, Jun <jun.nakajima@intel.com>; Tian, Kevin
> <kevin.tian@intel.com>; 'xen-devel@lists.xen.org' <xen-devel@lists.xen.org>;
> 'keir@xen.org' <keir@xen.org>; 'tim@xen.org' <tim@xen.org>
> Subject: RE: [Xen-devel] [PATCH v3 0/2] VT-d flush issue
> 
> >>> On 21.12.15 at 14:08, <quan.xu@intel.com> wrote:
> >>  On 21.12.2015 at 8:50pm, <JBeulich@suse.com> wrote:
> >> >>> On 21.12.15 at 13:28, <quan.xu@intel.com> wrote:
> >> > On 21.12.2015 at 7:47pm, <JBeulich@suse.com> wrote:
> >> >> >>> On 20.12.15 at 14:57, <quan.xu@intel.com> wrote:
> >> >> > 2. If VT-d is bug, does the hardware_domain continue to work with
> >> >> > PCIe Devices / DRAM well with DMA remapping error?
> >> >> >    I think it is no. furthermore, i think VMM can NOT run a normal
> >> >> > HVM domain without device-passthrough.
> >> >>
> >> >> In addition to what Andrew said - VT-d is effectively not in use for
> >> >> domains without PT device.
> >> >
> >> > IMO, When VT-d is enabled, but is not working correct. These PCI-e
> >> > devices
> >> > (Disks/NICs..) DMA/Interrupt behaviors are not predictable.
> >> > Assumed that, VT-d is effectively not in use for domains without PT
> >> > device, while at least the virtualization infrastructure is not trusted.
> >> > I think it is also not secure to run PV domains.
> >> >
> >> >> Impacting all such domains by crashing the hypervisor just because
> >> >> (in the extreme case) a single domain with PT devices exhibited a
> >> >> flush issue is a no-go imo.
> >> >>
> >> >
> >> > IMO, a VT-d (IEC/Context/Iotlb) flush issue is not a single domain
> >> > behavior, it is a Hypervisor and infrastructure issue.
> >> > ATS device's Device-TLB flush is a single domain issue.
> >> > Back to our original goal, my patch set is for ATS flush issue. right?
> >>
> >> You mean you don't like this entailing clean up of other code?
> >
> >  Jan, for ARM/AMD, I really have no knowledge to fix it. and I have no
> >  ARM/AMD hardware to verify it. if I need to fix these common part of
> > INTEL/ARM/AMD, I think I need to make
> >  Xen compile correct and not to destroy the logic.
> 
> You indeed aren't expected to fix AMD or ARM code, but it may be
> necessary to adjust that code to make error propagation work.
> 
> >> I'm sorry, but I'm
> >> afraid you won't get away without - perhaps the VT-d maintainers could help
> >> here, but in the end you have to face that it was mainly Intel people who
> >> introduced the code which now needs fixing up, so I consider it not exactly
> >> unfair for you (as a
> >> company) to do this work.
> >>
> >
> > Furthermore, I found out that
> >      if IEC/Iotlb/Context flush error, then panic.
> >      Else if device-tlb flush error, we'll hide the target ATS device and
> > kill the domain owning this ATS device. If impacted domain is hardware
> > domain, just throw out a warning.
> >
> >      Then, it is fine to _not_check all the way up the device-tlb flush call
> > trees( maybe it is our next topic of discussion).
> 
> I don't follow - this sounds more or less like the model you've been
> following in past versions, yet it was that which prompted the
> request to properly propagate errors.

Maybe, there are still some misunderstanding about your expectation.
Let me summarize it here.

After Quan's patch-set, there are two types of error code:
- -EOPNOTSUPP
Now we only support and use software way to synchronize the invalidation,
if someone calls queue_invalidate_wait() and passes sw with 0, then
-EOPNOTSUPP is returned (Though this cannot happen in real world, since 
queue_invalidate_wait() is called only in one place and 1 is passed in to 'sw').
So I am not sure what should we do for this return value, if we really get
that return value, it means the flush is not actually executed, so the iommu
state is incorrect, the data is inconsistent. Do you think what should we do
for this case?

- -ETIMEDOUT
For this case, Quan has elaborate a lot, IIUIC, the main gap between you
and Quan is you think the error code should be propagated to the up caller,
while in Quan's implementation, he deals with this error in invalidate_timeout()
and device_tlb_invalidate_timeout(), hence no need to propagated it to
up called, since the handling policy will crash the domain, so we don't think
propagated error code is needed. Even we propagate it, the up caller still
doesn't need to do anything for it.

Quan, please correct me if I am wrong.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  7:40           ` Wu, Feng
@ 2015-12-22  8:01             ` Jan Beulich
  2015-12-22  8:10               ` Wu, Feng
  2015-12-22  8:10               ` Xu, Quan
  0 siblings, 2 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-22  8:01 UTC (permalink / raw)
  To: Feng Wu, Quan Xu
  Cc: Kevin Tian, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima

>>> On 22.12.15 at 08:40, <feng.wu@intel.com> wrote:
> Maybe, there are still some misunderstanding about your expectation.
> Let me summarize it here.
> 
> After Quan's patch-set, there are two types of error code:
> - -EOPNOTSUPP
> Now we only support and use software way to synchronize the invalidation,
> if someone calls queue_invalidate_wait() and passes sw with 0, then
> -EOPNOTSUPP is returned (Though this cannot happen in real world, since 
> queue_invalidate_wait() is called only in one place and 1 is passed in to 'sw').
> So I am not sure what should we do for this return value, if we really get
> that return value, it means the flush is not actually executed, so the iommu
> state is incorrect, the data is inconsistent. Do you think what should we do
> for this case?

Since seeing this error would be a software bug, BUG() or ASSERT()
are fine to handle this specific case, if need be.

> - -ETIMEDOUT
> For this case, Quan has elaborate a lot, IIUIC, the main gap between you
> and Quan is you think the error code should be propagated to the up caller,
> while in Quan's implementation, he deals with this error in 
> invalidate_timeout()
> and device_tlb_invalidate_timeout(), hence no need to propagated it to
> up called, since the handling policy will crash the domain, so we don't think
> propagated error code is needed. Even we propagate it, the up caller still
> doesn't need to do anything for it.

"Handling" an error by e.g. domain_crash() doesn't mean you don't
need to also modify (or at the very least inspect) callers: They may
continue doing things _assuming_ success. Of course you don't
need to domain_crash() at all layers. But errors from lower layers
should, at least in most ordinary cases, lead to higher layers bailing
instead of continuing.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:01             ` Jan Beulich
@ 2015-12-22  8:10               ` Wu, Feng
  2015-12-22  8:20                 ` Jan Beulich
  2015-12-22  8:10               ` Xu, Quan
  1 sibling, 1 reply; 50+ messages in thread
From: Wu, Feng @ 2015-12-22  8:10 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: Tian, Kevin, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, December 22, 2015 4:01 PM
> To: Wu, Feng <feng.wu@intel.com>; Xu, Quan <quan.xu@intel.com>
> Cc: 'andrew.cooper3@citrix.com' <andrew.cooper3@citrix.com>;
> 'george.dunlap@eu.citrix.com' <george.dunlap@eu.citrix.com>; Nakajima, Jun
> <jun.nakajima@intel.com>; Tian, Kevin <kevin.tian@intel.com>; 'xen-
> devel@lists.xen.org' <xen-devel@lists.xen.org>; 'keir@xen.org' <keir@xen.org>;
> 'tim@xen.org' <tim@xen.org>
> Subject: RE: [Xen-devel] [PATCH v3 0/2] VT-d flush issue
> 
> >>> On 22.12.15 at 08:40, <feng.wu@intel.com> wrote:
> > Maybe, there are still some misunderstanding about your expectation.
> > Let me summarize it here.
> >
> > After Quan's patch-set, there are two types of error code:
> > - -EOPNOTSUPP
> > Now we only support and use software way to synchronize the invalidation,
> > if someone calls queue_invalidate_wait() and passes sw with 0, then
> > -EOPNOTSUPP is returned (Though this cannot happen in real world, since
> > queue_invalidate_wait() is called only in one place and 1 is passed in to 'sw').
> > So I am not sure what should we do for this return value, if we really get
> > that return value, it means the flush is not actually executed, so the iommu
> > state is incorrect, the data is inconsistent. Do you think what should we do
> > for this case?
> 
> Since seeing this error would be a software bug, BUG() or ASSERT()
> are fine to handle this specific case, if need be.
> 
> > - -ETIMEDOUT
> > For this case, Quan has elaborate a lot, IIUIC, the main gap between you
> > and Quan is you think the error code should be propagated to the up caller,
> > while in Quan's implementation, he deals with this error in
> > invalidate_timeout()
> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
> > up called, since the handling policy will crash the domain, so we don't think
> > propagated error code is needed. Even we propagate it, the up caller still
> > doesn't need to do anything for it.
> 
> "Handling" an error by e.g. domain_crash() doesn't mean you don't
> need to also modify (or at the very least inspect) callers: They may
> continue doing things _assuming_ success. Of course you don't
> need to domain_crash() at all layers. But errors from lower layers
> should, at least in most ordinary cases, lead to higher layers bailing
> instead of continuing.

So there are two questions:
1. Just confirmed with Quan, for IEC/context/iotlb flush timeout, the policy
will be Xen panic. Do you agree with this policy? If you do, we don't need to
pass the error code to the caller, right? If you don't, we may need more
discussion about how the handle this case first before anything else.
2. For Device iotlb flush, the current policy is to crash the domain, in this case
we need to check case by case whether the caller need to handle something
if timeout is encountered, right? If needed, we need do something there, if
not needed, we can just let it be.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:01             ` Jan Beulich
  2015-12-22  8:10               ` Wu, Feng
@ 2015-12-22  8:10               ` Xu, Quan
  2015-12-22  8:27                 ` Jan Beulich
  1 sibling, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-22  8:10 UTC (permalink / raw)
  To: Jan Beulich, Wu, Feng
  Cc: Tian, Kevin, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun

>On 22.12.2015 at 4:01pm <JBeulich@suse.com> wrote:
> >>> On 22.12.15 at 08:40, <feng.wu@intel.com> wrote:
> > Maybe, there are still some misunderstanding about your expectation.
> > Let me summarize it here.
> >
> > After Quan's patch-set, there are two types of error code:
> > - -EOPNOTSUPP
> > Now we only support and use software way to synchronize the
> > invalidation, if someone calls queue_invalidate_wait() and passes sw
> > with 0, then -EOPNOTSUPP is returned (Though this cannot happen in
> > real world, since
> > queue_invalidate_wait() is called only in one place and 1 is passed in to 'sw').
> > So I am not sure what should we do for this return value, if we really
> > get that return value, it means the flush is not actually executed, so
> > the iommu state is incorrect, the data is inconsistent. Do you think
> > what should we do for this case?
> 
> Since seeing this error would be a software bug, BUG() or ASSERT() are fine to
> handle this specific case, if need be.
> 
> > - -ETIMEDOUT
> > For this case, Quan has elaborate a lot, IIUIC, the main gap between
> > you and Quan is you think the error code should be propagated to the
> > up caller, while in Quan's implementation, he deals with this error in
> > invalidate_timeout()
> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
> > up called, since the handling policy will crash the domain, so we
> > don't think propagated error code is needed. Even we propagate it, the
> > up caller still doesn't need to do anything for it.
> 
> "Handling" an error by e.g. domain_crash() doesn't mean you don't need to also
> modify (or at the very least inspect) callers: They may continue doing things
> _assuming_ success. Of course you don't need to domain_crash() at all layers.
> But errors from lower layers should, at least in most ordinary cases, lead to
> higher layers bailing instead of continuing.
> 

For Device-TLB flush error, I think we need to propagated error code.
For IEC/iotlb/context flush error, if panic is acceptable, we can ignore the propagated error code. BTW, it is very challenge / tricky to handle all
Of error, and some error is unrecoverable. As mentioned, it looks like rewriting Xen hypervisor.

For -EOPNOTSUPP, we can print warning message. If it supports interrupt method, we can return 0 in queue_invalidate_wait().

Feng, thanks for your update!

-Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:10               ` Wu, Feng
@ 2015-12-22  8:20                 ` Jan Beulich
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-22  8:20 UTC (permalink / raw)
  To: Feng Wu, Quan Xu
  Cc: Kevin Tian, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima

>>> On 22.12.15 at 09:10, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, December 22, 2015 4:01 PM
>> To: Wu, Feng <feng.wu@intel.com>; Xu, Quan <quan.xu@intel.com>
>> Cc: 'andrew.cooper3@citrix.com' <andrew.cooper3@citrix.com>;
>> 'george.dunlap@eu.citrix.com' <george.dunlap@eu.citrix.com>; Nakajima, Jun
>> <jun.nakajima@intel.com>; Tian, Kevin <kevin.tian@intel.com>; 'xen-
>> devel@lists.xen.org' <xen-devel@lists.xen.org>; 'keir@xen.org' <keir@xen.org>;
>> 'tim@xen.org' <tim@xen.org>
>> Subject: RE: [Xen-devel] [PATCH v3 0/2] VT-d flush issue
>> 
>> >>> On 22.12.15 at 08:40, <feng.wu@intel.com> wrote:
>> > Maybe, there are still some misunderstanding about your expectation.
>> > Let me summarize it here.
>> >
>> > After Quan's patch-set, there are two types of error code:
>> > - -EOPNOTSUPP
>> > Now we only support and use software way to synchronize the invalidation,
>> > if someone calls queue_invalidate_wait() and passes sw with 0, then
>> > -EOPNOTSUPP is returned (Though this cannot happen in real world, since
>> > queue_invalidate_wait() is called only in one place and 1 is passed in to 
> 'sw').
>> > So I am not sure what should we do for this return value, if we really get
>> > that return value, it means the flush is not actually executed, so the 
> iommu
>> > state is incorrect, the data is inconsistent. Do you think what should we 
> do
>> > for this case?
>> 
>> Since seeing this error would be a software bug, BUG() or ASSERT()
>> are fine to handle this specific case, if need be.
>> 
>> > - -ETIMEDOUT
>> > For this case, Quan has elaborate a lot, IIUIC, the main gap between you
>> > and Quan is you think the error code should be propagated to the up caller,
>> > while in Quan's implementation, he deals with this error in
>> > invalidate_timeout()
>> > and device_tlb_invalidate_timeout(), hence no need to propagated it to
>> > up called, since the handling policy will crash the domain, so we don't 
> think
>> > propagated error code is needed. Even we propagate it, the up caller still
>> > doesn't need to do anything for it.
>> 
>> "Handling" an error by e.g. domain_crash() doesn't mean you don't
>> need to also modify (or at the very least inspect) callers: They may
>> continue doing things _assuming_ success. Of course you don't
>> need to domain_crash() at all layers. But errors from lower layers
>> should, at least in most ordinary cases, lead to higher layers bailing
>> instead of continuing.
> 
> So there are two questions:
> 1. Just confirmed with Quan, for IEC/context/iotlb flush timeout, the policy
> will be Xen panic. Do you agree with this policy? If you do, we don't need 
> to
> pass the error code to the caller, right? If you don't, we may need more
> discussion about how the handle this case first before anything else.

Just to repeat several earlier statements: BUG() and panic() ought
to be avoided wherever possible (i.e. including here).

> 2. For Device iotlb flush, the current policy is to crash the domain, in 
> this case
> we need to check case by case whether the caller need to handle something
> if timeout is encountered, right? If needed, we need do something there, if
> not needed, we can just let it be.

"Just let it be" is wrong (and is what led to the bad state the code is
in right now in this regard). I gave advice for the direction before:
Add __must_check. This will make the compiler point out to you
where changes are needed.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:10               ` Xu, Quan
@ 2015-12-22  8:27                 ` Jan Beulich
  2015-12-22  8:43                   ` Xu, Quan
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Beulich @ 2015-12-22  8:27 UTC (permalink / raw)
  To: Feng Wu, Quan Xu
  Cc: Kevin Tian, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima

>>> On 22.12.15 at 09:10, <quan.xu@intel.com> wrote:
> For Device-TLB flush error, I think we need to propagated error code.
> For IEC/iotlb/context flush error, if panic is acceptable, we can ignore the 
> propagated error code. BTW, it is very challenge / tricky to handle all
> Of error, and some error is unrecoverable. As mentioned, it looks like 
> rewriting Xen hypervisor.

We're moving in circles. In particular I don't believe this last sentence.
Re-writing many parts of the hypervisor would be required is you
were to return the error to the guest (which technically isn't possible
in many case afaict). Having crashed the guest, you don't need to be
concerned about unrolling previous (partially completed) operations,
so I don't think it's this much of a rewrite.

And just to be clear - I hope you recall that the current approach
taken to the flush issue is already a compromise far away from
where we initially meant the code to move, and it was always
clear that the bad error handling state the code is in is going to
get in the way of this being a simple fix. It's sad that the people
originally having introduced that code can't be held responsible
for fixing this up, but that's a situation we find ourselves in all
the time.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:27                 ` Jan Beulich
@ 2015-12-22  8:43                   ` Xu, Quan
  2015-12-22  9:08                     ` Jan Beulich
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-22  8:43 UTC (permalink / raw)
  To: Jan Beulich, Wu, Feng
  Cc: Tian, Kevin, 'keir@xen.org',
	'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun

> On December 22, 2015 4:27pm, <JBeulich@suse.com> wrote:
> >>> On 22.12.15 at 09:10, <quan.xu@intel.com> wrote:
> > For Device-TLB flush error, I think we need to propagated error code.
> > For IEC/iotlb/context flush error, if panic is acceptable, we can
> > ignore the propagated error code. BTW, it is very challenge / tricky
> > to handle all Of error, and some error is unrecoverable. As mentioned,
> > it looks like rewriting Xen hypervisor.
> 
> We're moving in circles. In particular I don't believe this last sentence.
> Re-writing many parts of the hypervisor would be required is you were to return
> the error to the guest (which technically isn't possible in many case afaict).
> Having crashed the guest, you don't need to be concerned about unrolling
> previous (partially completed) operations, so I don't think it's this much of a
> rewrite.
> 
> And just to be clear - I hope you recall that the current approach taken to the
> flush issue is already a compromise far away from where we initially meant the
> code to move, and it was always clear that the bad error handling state the
> code is in is going to get in the way of this being a simple fix. It's sad that the
> people originally having introduced that code can't be held responsible for fixing
> this up, but that's a situation we find ourselves in all the time.
> 
> Jan

Jan,
Let's finish our discussion. I accept your idea. But I need to separate it into 3 patch set (It is complicated for me, sometime it makes me crash.):
   Patch set 1:  Device-TLB/iotlb flush error. (send out this week)
   Patch set 2:  context flush error. (need 2 ~ 3 weeks)
   Patch set 3:  iec flush error. (need 3 ~ 4 weeks)

If it is acceptable, we can discuss in detail one by one..

Thanks
-Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  8:43                   ` Xu, Quan
@ 2015-12-22  9:08                     ` Jan Beulich
  2015-12-22  9:18                       ` Xu, Quan
  2015-12-22 10:26                       ` Xu, Quan
  0 siblings, 2 replies; 50+ messages in thread
From: Jan Beulich @ 2015-12-22  9:08 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Jun Nakajima,
	'keir@xen.org'

>>> On 22.12.15 at 09:43, <quan.xu@intel.com> wrote:
> Let's finish our discussion. I accept your idea. But I need to separate it 
> into 3 patch set (It is complicated for me, sometime it makes me crash.):
>    Patch set 1:  Device-TLB/iotlb flush error. (send out this week)
>    Patch set 2:  context flush error. (need 2 ~ 3 weeks)
>    Patch set 3:  iec flush error. (need 3 ~ 4 weeks)
> 
> If it is acceptable, we can discuss in detail one by one..

Splitting up is of course acceptable.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  9:08                     ` Jan Beulich
@ 2015-12-22  9:18                       ` Xu, Quan
  2015-12-22 10:26                       ` Xu, Quan
  1 sibling, 0 replies; 50+ messages in thread
From: Xu, Quan @ 2015-12-22  9:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 22.12.2015 at 5:09pm, <JBeulich@suse.com> wrote:
> >>> On 22.12.15 at 09:43, <quan.xu@intel.com> wrote:
> > Let's finish our discussion. I accept your idea. But I need to
> > separate it into 3 patch set (It is complicated for me, sometime it makes me
> crash.):
> >    Patch set 1:  Device-TLB/iotlb flush error. (send out this week)
> >    Patch set 2:  context flush error. (need 2 ~ 3 weeks)
> >    Patch set 3:  iec flush error. (need 3 ~ 4 weeks)
> >
> > If it is acceptable, we can discuss in detail one by one..
> 
> Splitting up is of course acceptable.
> 

Thanks Jan. I am getting started to Patch set 1. 

Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22  9:08                     ` Jan Beulich
  2015-12-22  9:18                       ` Xu, Quan
@ 2015-12-22 10:26                       ` Xu, Quan
  2015-12-23  6:21                         ` Tian, Kevin
  1 sibling, 1 reply; 50+ messages in thread
From: Xu, Quan @ 2015-12-22 10:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun,
	'keir@xen.org'

> On 22.12.2015 at 5:09pm, <JBeulich@suse.com> wrote:
> >>> On 22.12.15 at 09:43, <quan.xu@intel.com> wrote:
> > Let's finish our discussion. I accept your idea. But I need to
> > separate it into 3 patch set (It is complicated for me, sometime it makes me
> crash.):
> >    Patch set 1:  Device-TLB/iotlb flush error. (send out this week)
> >    Patch set 2:  context flush error. (need 2 ~ 3 weeks)
> >    Patch set 3:  iec flush error. (need 3 ~ 4 weeks)
> >
> > If it is acceptable, we can discuss in detail one by one..
> 
> Splitting up is of course acceptable.

Jan,
   Just update the combination,

   Patch set 1:  Device-TLB flush error. (send out this week)
   Patch set 2:  context/iotlb flush error. (need 2 ~ 3 weeks)
   Patch set 3:  iec flush error. (need 3 ~ 4 weeks)

-Quan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/2] VT-d flush issue
  2015-12-22 10:26                       ` Xu, Quan
@ 2015-12-23  6:21                         ` Tian, Kevin
  0 siblings, 0 replies; 50+ messages in thread
From: Tian, Kevin @ 2015-12-23  6:21 UTC (permalink / raw)
  To: Xu, Quan, Jan Beulich
  Cc: 'keir@xen.org', 'george.dunlap@eu.citrix.com',
	'andrew.cooper3@citrix.com', 'tim@xen.org',
	'xen-devel@lists.xen.org', Nakajima, Jun, Wu, Feng

> From: Xu, Quan
> Sent: Tuesday, December 22, 2015 6:26 PM
> 
> > On 22.12.2015 at 5:09pm, <JBeulich@suse.com> wrote:
> > >>> On 22.12.15 at 09:43, <quan.xu@intel.com> wrote:
> > > Let's finish our discussion. I accept your idea. But I need to
> > > separate it into 3 patch set (It is complicated for me, sometime it makes me
> > crash.):
> > >    Patch set 1:  Device-TLB/iotlb flush error. (send out this week)
> > >    Patch set 2:  context flush error. (need 2 ~ 3 weeks)
> > >    Patch set 3:  iec flush error. (need 3 ~ 4 weeks)
> > >
> > > If it is acceptable, we can discuss in detail one by one..
> >
> > Splitting up is of course acceptable.
> 
> Jan,
>    Just update the combination,
> 
>    Patch set 1:  Device-TLB flush error. (send out this week)
>    Patch set 2:  context/iotlb flush error. (need 2 ~ 3 weeks)
>    Patch set 3:  iec flush error. (need 3 ~ 4 weeks)
> 
> -Quan

Sorry being late to this discussion. Thanks Jan/Andrew's comments
for overall gap in VT-d error handling. Intel is definitely committed 
to continue improve current code quality, but let's do things one-by-one.

Above is a good split. Let's start from there.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-12-23  6:21 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu
2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu
2015-12-14  9:14   ` Jan Beulich
2015-12-15  8:35     ` Xu, Quan
2015-12-15  9:18       ` Jan Beulich
2015-12-15 10:10         ` Xu, Quan
2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu
2015-12-14  9:28   ` Jan Beulich
2015-12-15  8:15     ` Xu, Quan
2015-12-15  9:16       ` Jan Beulich
2015-12-15 10:24         ` Xu, Quan
2015-12-15 12:32           ` Jan Beulich
2015-12-15 13:01             ` Xu, Quan
2015-12-15 13:26               ` Jan Beulich
2015-12-15 10:58         ` Xu, Quan
2015-12-15 12:23         ` Xu, Quan
2015-12-15 12:29           ` Jan Beulich
2015-12-15 13:31             ` Xu, Quan
     [not found]               ` <5670271402000078000BFA35@prv-mh.provo.novell.com>
2015-12-15 14:49                 ` Xu, Quan
2015-12-16  3:51                 ` Xu, Quan
2015-12-16  8:08                   ` Jan Beulich
2015-12-17 11:43                     ` Xu, Quan
2015-12-17 12:10                       ` Jan Beulich
2015-12-17 12:55                         ` Xu, Quan
2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan
  -- strict thread matches above, loose matches on Subject: below --
2015-12-20 13:57 Xu, Quan
2015-12-20 15:51 ` Andrew Cooper
2015-12-21 11:46 ` Jan Beulich
2015-12-21 12:28   ` Xu, Quan
2015-12-21 12:50     ` Jan Beulich
2015-12-21 13:08       ` Xu, Quan
2015-12-21 13:22         ` Jan Beulich
2015-12-21 13:35           ` Xu, Quan
2015-12-21 14:08           ` Xu, Quan
2015-12-21 14:16             ` Jan Beulich
2015-12-21 14:31               ` Xu, Quan
2015-12-21 14:52                 ` Jan Beulich
2015-12-21 15:15                   ` Xu, Quan
2015-12-22  7:40           ` Wu, Feng
2015-12-22  8:01             ` Jan Beulich
2015-12-22  8:10               ` Wu, Feng
2015-12-22  8:20                 ` Jan Beulich
2015-12-22  8:10               ` Xu, Quan
2015-12-22  8:27                 ` Jan Beulich
2015-12-22  8:43                   ` Xu, Quan
2015-12-22  9:08                     ` Jan Beulich
2015-12-22  9:18                       ` Xu, Quan
2015-12-22 10:26                       ` Xu, Quan
2015-12-23  6:21                         ` Tian, Kevin
2015-12-21 12:44   ` Xu, Quan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).