* [PATCH v3 0/2] VT-d flush issue @ 2015-12-12 13:21 Quan Xu 2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw) To: jbeulich, kevin.tian Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim, xen-devel, jun.nakajima, Quan Xu, keir This patches are based on Kevin Tian's previous discussion 'Revisit VT-d asynchronous flush issue'. Fix current timeout concern and also allow limited ATS support in a light way: 1. Reduce spin timeout to 1ms, which can be boot-time changed with 'iommu_qi_timeout_ms'. For example: multiboot /boot/xen.gz ats=1 iommu_qi_timeout_ms=100 2. Fix vt-d flush timeout issue. If IOTLB/Context/IETC flush is timeout, we should think all devices under this IOMMU cannot function correctly. So for each device under this IOMMU we'll mark it as unassignable and kill the domain owning the device. If Device-TLB flush is timeout, we'll mark the target ATS device as unassignable and kill the domain owning this device. If impacted domain is hardware domain, just throw out a warning. It's an open here whether we want to kill hardware domain (or directly panic hypervisor). Comments are welcomed. Device marked as unassignable will be disallowed to be further assigned to any domain. -- * DMAR_OPERATION_TIMEOUT should be also chopped down to a low number of milliseconds. As Kevin Tian mentioned in 'Revisit VT-d asynchronous flush issue', We also confirmed with hardware team that 1ms is large enough for IOMMU internal flush. So I can change DMAR_OPERATION_TIMEOUT from 1000 ms to 1 ms. IOMMU_WAIT_OP() is only for VT-d registers read/write, and there is also a panic. We need a further discussion whether or how to remove this panic in next patch set. * Kevin Tian did basic functional review. --Changes in v3: 1. Once found you can break the for loop immediately since BDF is unique. 2. Separate invalidate_timeout(struct iommu *iommu, int type, u16 did, u16 seg, u8 bus, u8 devfn) into invalidate_timeout(struct iommu *iommu) and device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, u16 seg, u8 bus, u8 devfn). invalidate_timeout() is for iotlb/iec/context flush error. device_tlb_invalidate_timeout() is for Device-TLB flush error. then ignore these INVALID_* parameters. 3. Change macros(mark_pdev_unassignable/IS_PDEV_UNASSIGNABLE) into static inline functions. Quan Xu (2): VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. VT-d: Fix vt-d flush timeout issue. xen/drivers/passthrough/vtd/extern.h | 5 ++ xen/drivers/passthrough/vtd/iommu.c | 6 +++ xen/drivers/passthrough/vtd/qinval.c | 89 +++++++++++++++++++++++++++++++++-- xen/drivers/passthrough/vtd/x86/ats.c | 16 +++++++ xen/include/xen/pci.h | 11 +++++ 5 files changed, 123 insertions(+), 4 deletions(-) -- 1.8.1.2 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. 2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu @ 2015-12-12 13:21 ` Quan Xu 2015-12-14 9:14 ` Jan Beulich 2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu 2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan 2 siblings, 1 reply; 25+ messages in thread From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw) To: jbeulich, kevin.tian Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim, xen-devel, jun.nakajima, Quan Xu, keir Signed-off-by: Quan Xu <quan.xu@intel.com> --- xen/drivers/passthrough/vtd/qinval.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c index b81b0bd..990baf2 100644 --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c @@ -28,6 +28,11 @@ #include "vtd.h" #include "extern.h" +static int __read_mostly iommu_qi_timeout_ms = 1; +integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms); + +#define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1)) + static void print_qi_regs(struct iommu *iommu) { u64 val; @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu *iommu, start_time = NOW(); while ( poll_slot != QINVAL_STAT_DONE ) { - if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) ) + if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) ) { print_qi_regs(iommu); - panic("queue invalidate wait descriptor was not executed"); + dprintk(XENLOG_WARNING VTDPREFIX, + "Queue invalidate wait descriptor was timeout.\n"); + return -ETIMEDOUT; } cpu_relax(); } -- 1.8.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. 2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu @ 2015-12-14 9:14 ` Jan Beulich 2015-12-15 8:35 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-14 9:14 UTC (permalink / raw) To: Quan Xu Cc: kevin.tian, feng.wu, george.dunlap, andrew.cooper3, tim, xen-devel, jun.nakajima, keir >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu *iommu, > start_time = NOW(); > while ( poll_slot != QINVAL_STAT_DONE ) > { > - if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) ) > + if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) ) > { > print_qi_regs(iommu); > - panic("queue invalidate wait descriptor was not executed"); > + dprintk(XENLOG_WARNING VTDPREFIX, > + "Queue invalidate wait descriptor was timeout.\n"); > + return -ETIMEDOUT; > } Without the v2 discussion even having finished, and without you having taken care of v2 comments here, I don't see much value in this v3. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. 2015-12-14 9:14 ` Jan Beulich @ 2015-12-15 8:35 ` Xu, Quan 2015-12-15 9:18 ` Jan Beulich 0 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 8:35 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote: > >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu > *iommu, > > start_time = NOW(); > > while ( poll_slot != QINVAL_STAT_DONE ) > > { > > - if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) ) > > + if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) ) > > { > > print_qi_regs(iommu); > > - panic("queue invalidate wait descriptor was not > executed"); > > + dprintk(XENLOG_WARNING VTDPREFIX, > > + "Queue invalidate wait descriptor was > timeout.\n"); > > + return -ETIMEDOUT; > > } > > Without the v2 discussion even having finished, and without you having taken > care of v2 comments here, I don't see much value in this v3. > I should check for _any_ kind_of_error and all_the_way_up_the_call_trees in [patch 1/2]. 1. For _any_ kind_of_error, now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the return code, if it is _not_ZERO. The callers need to also either pass it on or deal with it. (if the return code is _not_ZERO, it should check whether it is '-ETIMEDOUT' in [patch 2/2]) 2. all_the_way_up_the_call_trees .i.e enable_intremap() |--iommu_flush_iec_global() |--__iommu_flush_iec() |--invalidate_sync() In enable_intremap(), if iommu_flush_iec_global() is failed, enable_intremap() should deal with this error and then return. right? Jan, thanks for your patience. Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. 2015-12-15 8:35 ` Xu, Quan @ 2015-12-15 9:18 ` Jan Beulich 2015-12-15 10:10 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-15 9:18 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 15.12.15 at 09:35, <quan.xu@intel.com> wrote: >> On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote: >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: >> > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu >> *iommu, >> > start_time = NOW(); >> > while ( poll_slot != QINVAL_STAT_DONE ) >> > { >> > - if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) ) >> > + if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) ) >> > { >> > print_qi_regs(iommu); >> > - panic("queue invalidate wait descriptor was not >> executed"); >> > + dprintk(XENLOG_WARNING VTDPREFIX, >> > + "Queue invalidate wait descriptor was >> timeout.\n"); >> > + return -ETIMEDOUT; >> > } >> >> Without the v2 discussion even having finished, and without you having taken >> care of v2 comments here, I don't see much value in this v3. >> > > I should check for _any_ kind_of_error and all_the_way_up_the_call_trees in > [patch 1/2]. As said, perhaps better in patch 1/3. > 1. For _any_ kind_of_error, > now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the return > code, if it is _not_ZERO. > The callers need to also either pass it on or deal with it. (if the return > code is _not_ZERO, it should check whether it is '-ETIMEDOUT' in [patch 2/2]) > > 2. all_the_way_up_the_call_trees > > .i.e > enable_intremap() > |--iommu_flush_iec_global() > |--__iommu_flush_iec() > |--invalidate_sync() > > In enable_intremap(), if iommu_flush_iec_global() is failed, > enable_intremap() should deal with this error and then return. Yes. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed. 2015-12-15 9:18 ` Jan Beulich @ 2015-12-15 10:10 ` Xu, Quan 0 siblings, 0 replies; 25+ messages in thread From: Xu, Quan @ 2015-12-15 10:10 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org >On 15.12.2015 at 5:18pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 09:35, <quan.xu@intel.com> wrote: > >> On 14.12.2015 at 5:14pm, <JBeulich@suse.com> wrote: > >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> > @@ -167,10 +172,12 @@ static int queue_invalidate_wait(struct iommu > >> *iommu, > >> > start_time = NOW(); > >> > while ( poll_slot != QINVAL_STAT_DONE ) > >> > { > >> > - if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) ) > >> > + if ( NOW() > (start_time + IOMMU_QI_TIMEOUT) ) > >> > { > >> > print_qi_regs(iommu); > >> > - panic("queue invalidate wait descriptor was not > >> executed"); > >> > + dprintk(XENLOG_WARNING VTDPREFIX, > >> > + "Queue invalidate wait descriptor was > >> timeout.\n"); > >> > + return -ETIMEDOUT; > >> > } > >> > >> Without the v2 discussion even having finished, and without you > >> having taken care of v2 comments here, I don't see much value in this v3. > >> > > > > I should check for _any_ kind_of_error and > > all_the_way_up_the_call_trees in [patch 1/2]. > > As said, perhaps better in patch 1/3. Agreed. > > > 1. For _any_ kind_of_error, > > now it includes '-ETIMEDOUT' and '-EOPNOTSUPP'. I should check the > > return code, if it is _not_ZERO. > > The callers need to also either pass it on or deal with it. (if the > > return code is _not_ZERO, it should check whether it is '-ETIMEDOUT' > > in [patch 2/2]) > > > > 2. all_the_way_up_the_call_trees > > > > .i.e > > enable_intremap() > > |--iommu_flush_iec_global() > > |--__iommu_flush_iec() > > |--invalidate_sync() > > > > In enable_intremap(), if iommu_flush_iec_global() is failed, > > enable_intremap() should deal with this error and then return. > > Yes. Good.. Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu 2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu @ 2015-12-12 13:21 ` Quan Xu 2015-12-14 9:28 ` Jan Beulich 2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan 2 siblings, 1 reply; 25+ messages in thread From: Quan Xu @ 2015-12-12 13:21 UTC (permalink / raw) To: jbeulich, kevin.tian Cc: feng.wu, eddie.dong, george.dunlap, andrew.cooper3, tim, xen-devel, jun.nakajima, Quan Xu, keir If IOTLB/Context/IETC flush is timeout, we should think all devices under this IOMMU cannot function correctly. So for each device under this IOMMU we'll mark it as unassignable and kill the domain owning the device. If Device-TLB flush is timeout, we'll mark the target ATS device as unassignable and kill the domain owning this device. If impacted domain is hardware domain, just throw out a warning. It's an open here whether we want to kill hardware domain (or directly panic hypervisor). Comments are welcomed. Device marked as unassignable will be disallowed to be further assigned to any domain. Signed-off-by: Quan Xu <quan.xu@intel.com> --- xen/drivers/passthrough/vtd/extern.h | 5 +++ xen/drivers/passthrough/vtd/iommu.c | 6 +++ xen/drivers/passthrough/vtd/qinval.c | 78 ++++++++++++++++++++++++++++++++++- xen/drivers/passthrough/vtd/x86/ats.c | 16 +++++++ xen/include/xen/pci.h | 11 +++++ 5 files changed, 114 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h index 8acf889..96c1a28 100644 --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -62,6 +62,11 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did, int qinval_device_iotlb(struct iommu *iommu, u32 max_invs_pend, u16 sid, u16 size, u64 addr); +void invalidate_timeout(struct iommu *iommu); +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, + u16 seg, u8 bus, u8 devfn); +int invalidate_sync(struct iommu *iommu); + unsigned int get_cache_line_size(void); void cacheline_flush(char *); void flush_all_cache(void); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index dd13865..f9a5d66 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; + if ( is_pdev_unassignable(pdev) ) + return -EACCES; + ret = domain_context_mapping(pdev->domain, devfn, pdev); if ( ret ) { @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device( if ( list_empty(&acpi_drhd_units) ) return -ENODEV; + if ( is_pdev_unassignable(pdev) ) + return -EACCES; + seg = pdev->seg; bus = pdev->bus; /* diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c index 990baf2..c514f99 100644 --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c @@ -27,12 +27,58 @@ #include "dmar.h" #include "vtd.h" #include "extern.h" +#include "../ats.h" static int __read_mostly iommu_qi_timeout_ms = 1; integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms); #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1)) +void invalidate_timeout(struct iommu *iommu) +{ + struct domain *d; + unsigned long nr_dom, i; + struct pci_dev *pdev; + + nr_dom = cap_ndoms(iommu->cap); + i = find_first_bit(iommu->domid_bitmap, nr_dom); + while ( i < nr_dom ) { + d = rcu_lock_domain_by_id(iommu->domid_map[i]); + ASSERT(d); + + /* Mark the devices as unassignable. */ + for_each_pdev(d, pdev) + mark_pdev_unassignable(pdev); + if ( !is_hardware_domain(d) ) + domain_kill(d); + + rcu_unlock_domain(d); + i = find_next_bit(iommu->domid_bitmap, nr_dom, i + 1); + } +} + +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, + u16 seg, u8 bus, u8 devfn) +{ + struct domain *d; + struct pci_dev *pdev; + + d = rcu_lock_domain_by_id(iommu->domid_map[did]); + ASSERT(d); + for_each_pdev(d, pdev) + if ( (pdev->seg == seg) && + (pdev->bus == bus) && + (pdev->devfn == devfn) ) + { + mark_pdev_unassignable(pdev); + break; + } + + if ( !is_hardware_domain(d) ) + domain_kill(d); + rcu_unlock_domain(d); +} + static void print_qi_regs(struct iommu *iommu) { u64 val; @@ -187,7 +233,7 @@ static int queue_invalidate_wait(struct iommu *iommu, return -EOPNOTSUPP; } -static int invalidate_sync(struct iommu *iommu) +int invalidate_sync(struct iommu *iommu) { struct qi_ctrl *qi_ctrl = iommu_qi_ctrl(iommu); @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx) queue_invalidate_iec(iommu, granu, im, iidx); ret = invalidate_sync(iommu); + + if ( ret == -ETIMEDOUT ) + { + invalidate_timeout(iommu); + dprintk(XENLOG_WARNING VTDPREFIX, + "IEC flush timeout.\n"); + return ret; + } /* * reading vt-d architecture register will ensure * draining happens in implementation independent way. @@ -308,6 +362,13 @@ static int flush_context_qi( queue_invalidate_context(iommu, did, sid, fm, type >> DMA_CCMD_INVL_GRANU_OFFSET); ret = invalidate_sync(iommu); + if ( ret == -ETIMEDOUT ) + { + invalidate_timeout(iommu); + dprintk(XENLOG_WARNING VTDPREFIX, + "Context flush timeout.\n"); + return ret; + } } return ret; } @@ -349,9 +410,22 @@ static int flush_iotlb_qi( queue_invalidate_iotlb(iommu, type >> DMA_TLB_FLUSH_GRANU_OFFSET, dr, dw, did, size_order, 0, addr); + + /* + * Synchronize with hardware for invalidation request descriptors + * submitted before Device-TLB invalidate descriptor. + */ + rc = invalidate_sync(iommu); + if ( rc == -ETIMEDOUT ) + { + invalidate_timeout(iommu); + dprintk(XENLOG_WARNING VTDPREFIX, "IOTLB flush timeout.\n"); + return rc; + } + if ( flush_dev_iotlb ) ret = dev_invalidate_iotlb(iommu, did, addr, size_order, type); - rc = invalidate_sync(iommu); + if ( !ret ) ret = rc; } diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c index 7c797f6..973dde7 100644 --- a/xen/drivers/passthrough/vtd/x86/ats.c +++ b/xen/drivers/passthrough/vtd/x86/ats.c @@ -156,6 +156,22 @@ int dev_invalidate_iotlb(struct iommu *iommu, u16 did, rc = qinval_device_iotlb(iommu, pdev->ats_queue_depth, sid, sbit, addr); + + /* + * Synchronize with hardware for Device-TLB invalidate + * descriptor. + */ + rc = invalidate_sync(iommu); + + if ( rc == -ETIMEDOUT ) + { + device_tlb_invalidate_timeout(iommu, did, pdev->seg, pdev->bus, + pdev->devfn); + dprintk(XENLOG_WARNING VTDPREFIX, + "Device-TLB flush timeout.\n"); + return rc; + } + break; default: dprintk(XENLOG_WARNING VTDPREFIX, "invalid vt-d flush type\n"); diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index a5aef55..21a0af6 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -41,6 +41,7 @@ struct pci_dev_info { bool_t is_extfn; bool_t is_virtfn; + bool_t is_unassignable; struct { u8 bus; u8 devfn; @@ -88,6 +89,16 @@ struct pci_dev { #define for_each_pdev(domain, pdev) \ list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list) +static inline void mark_pdev_unassignable(struct pci_dev *pdev) +{ + pdev->info.is_unassignable = 1; +} + +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev) +{ + return pdev->info.is_unassignable; +} + /* * The pcidevs_lock protect alldevs_list, and the assignment for the * devices, it also sync the access to the msi capability that is not -- 1.8.1.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu @ 2015-12-14 9:28 ` Jan Beulich 2015-12-15 8:15 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-14 9:28 UTC (permalink / raw) To: Quan Xu Cc: kevin.tian, feng.wu, george.dunlap, andrew.cooper3, tim, xen-devel, jun.nakajima, keir >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) > if ( !pdev->domain ) > return -EINVAL; > > + if ( is_pdev_unassignable(pdev) ) > + return -EACCES; Is this case possible at all (i.e. a newly added device being unassignable)? > --- a/xen/drivers/passthrough/vtd/qinval.c > +++ b/xen/drivers/passthrough/vtd/qinval.c > @@ -27,12 +27,58 @@ > #include "dmar.h" > #include "vtd.h" > #include "extern.h" > +#include "../ats.h" > > static int __read_mostly iommu_qi_timeout_ms = 1; > integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms); > > #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1)) > > +void invalidate_timeout(struct iommu *iommu) > +{ > + struct domain *d; > + unsigned long nr_dom, i; > + struct pci_dev *pdev; > + > + nr_dom = cap_ndoms(iommu->cap); > + i = find_first_bit(iommu->domid_bitmap, nr_dom); > + while ( i < nr_dom ) { > + d = rcu_lock_domain_by_id(iommu->domid_map[i]); > + ASSERT(d); > + > + /* Mark the devices as unassignable. */ > + for_each_pdev(d, pdev) > + mark_pdev_unassignable(pdev); > + if ( !is_hardware_domain(d) ) > + domain_kill(d); DYM domain_crash() here? > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, > + u16 seg, u8 bus, u8 devfn) > +{ > + struct domain *d; > + struct pci_dev *pdev; > + > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > + ASSERT(d); > + for_each_pdev(d, pdev) > + if ( (pdev->seg == seg) && > + (pdev->bus == bus) && > + (pdev->devfn == devfn) ) > + { > + mark_pdev_unassignable(pdev); > + break; > + } > + > + if ( !is_hardware_domain(d) ) > + domain_kill(d); > + rcu_unlock_domain(d); > +} Except for the variable declarations, indentation is broken for the entire function. > @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, u8 granu, u8 im, u16 iidx) > > queue_invalidate_iec(iommu, granu, im, iidx); > ret = invalidate_sync(iommu); > + > + if ( ret == -ETIMEDOUT ) > + { > + invalidate_timeout(iommu); > + dprintk(XENLOG_WARNING VTDPREFIX, > + "IEC flush timeout.\n"); > + return ret; > + } > /* Considering the recurring pattern, wouldn't it be better for invalidate_sync() to invoke invalidate_timeout() (at once making sure no current or future caller misses the need to do so)? Also please insert the blank line at the end of your additions, and no trailing full stops in log messages please. > @@ -88,6 +89,16 @@ struct pci_dev { > #define for_each_pdev(domain, pdev) \ > list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list) > > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) > +{ > + pdev->info.is_unassignable = 1; > +} > + > +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev) > +{ > + return pdev->info.is_unassignable; > +} Are you aware that we already have a mechanism to prevent assignment (via pci_{ro,hide}_device())? I think at the very least this check should consider both variants. Whether fully using the existing mechanism for your purpose is feasible I can't immediately tell (since the ownership change may be problematic at the points where you want the flagging to happen). Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-14 9:28 ` Jan Beulich @ 2015-12-15 8:15 ` Xu, Quan 2015-12-15 9:16 ` Jan Beulich 0 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 8:15 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org >On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > > --- a/xen/drivers/passthrough/vtd/iommu.c > > +++ b/xen/drivers/passthrough/vtd/iommu.c > > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct > pci_dev *pdev) > > if ( !pdev->domain ) > > return -EINVAL; > > > > + if ( is_pdev_unassignable(pdev) ) > > + return -EACCES; > > Is this case possible at all (i.e. a newly added device being unassignable)? > IMO, it is possible, and I have checked and printed the invoke flow when Xen init. iommu_setup() / iommu_hwdom_init() are called before pci_add_device() when Xen init. If it flushes error in iommu_setup() / iommu_hwdom_init(), it will mark device as unassignable. So it is possible. right? > > --- a/xen/drivers/passthrough/vtd/qinval.c > > +++ b/xen/drivers/passthrough/vtd/qinval.c > > @@ -27,12 +27,58 @@ > > #include "dmar.h" > > #include "vtd.h" > > #include "extern.h" > > +#include "../ats.h" > > > > static int __read_mostly iommu_qi_timeout_ms = 1; > > integer_param("iommu_qi_timeout_ms", iommu_qi_timeout_ms); > > > > #define IOMMU_QI_TIMEOUT (iommu_qi_timeout_ms * MILLISECS(1)) > > > > +void invalidate_timeout(struct iommu *iommu) { > > + struct domain *d; > > + unsigned long nr_dom, i; > > + struct pci_dev *pdev; > > + > > + nr_dom = cap_ndoms(iommu->cap); > > + i = find_first_bit(iommu->domid_bitmap, nr_dom); > > + while ( i < nr_dom ) { > > + d = rcu_lock_domain_by_id(iommu->domid_map[i]); > > + ASSERT(d); > > + > > + /* Mark the devices as unassignable. */ > > + for_each_pdev(d, pdev) > > + mark_pdev_unassignable(pdev); > > + if ( !is_hardware_domain(d) ) > > + domain_kill(d); > > DYM domain_crash() here? > Agreed. > > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, > > + u16 seg, u8 bus, u8 devfn) { > > + struct domain *d; > > + struct pci_dev *pdev; > > + > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > > + ASSERT(d); > > + for_each_pdev(d, pdev) > > + if ( (pdev->seg == seg) && > > + (pdev->bus == bus) && > > + (pdev->devfn == devfn) ) > > + { > > + mark_pdev_unassignable(pdev); > > + break; > > + } > > + > > + if ( !is_hardware_domain(d) ) > > + domain_kill(d); > > + rcu_unlock_domain(d); > > +} > > Except for the variable declarations, indentation is broken for the entire > function. > I will modify it in v4. > > @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, > > u8 granu, u8 im, u16 iidx) > > > > queue_invalidate_iec(iommu, granu, im, iidx); > > ret = invalidate_sync(iommu); > > + > > + if ( ret == -ETIMEDOUT ) > > + { > > + invalidate_timeout(iommu); > > + dprintk(XENLOG_WARNING VTDPREFIX, > > + "IEC flush timeout.\n"); > > + return ret; > > + } > > /* > > Considering the recurring pattern, wouldn't it be better for > invalidate_sync() to invoke invalidate_timeout() (at once making sure no current > or future caller misses the need to do so)? > Now invalidate_sync() is a common function for iec/iotlb/context/device-tlb invalidation. It is different to deal with the flush error. For device-tlb, it needs some parameters of device, such device seg/bus/devfn. But iec/iotlb/context don't need these parameters. Could I introduce device_tlb_invalidate_sync(struct iommu *iommu, u16 did, u16 seg, u8 bus, u8 devfn) for device-tlb flush? invalidate_sync(struct iommu *iommu) is for iec/iotlb/context. > Also please insert the blank line at the end of your additions, and no trailing full > stops in log messages please. > I will modify it in v4. > > @@ -88,6 +89,16 @@ struct pci_dev { > > #define for_each_pdev(domain, pdev) \ > > list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list) > > > > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { > > + pdev->info.is_unassignable = 1; > > +} > > + > > +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev) > > +{ > > + return pdev->info.is_unassignable; } > > Are you aware that we already have a mechanism to prevent assignment (via > pci_{ro,hide}_device())? I think at the very least this check should consider both > variants. Whether fully using the existing mechanism for your purpose is > feasible I can't immediately tell (since the ownership change may be > problematic at the points where you want the flagging to happen). pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. for this case, IMO I think a flag is a better solution. -Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 8:15 ` Xu, Quan @ 2015-12-15 9:16 ` Jan Beulich 2015-12-15 10:24 ` Xu, Quan ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Jan Beulich @ 2015-12-15 9:16 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: >> > --- a/xen/drivers/passthrough/vtd/iommu.c >> > +++ b/xen/drivers/passthrough/vtd/iommu.c >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct >> pci_dev *pdev) >> > if ( !pdev->domain ) >> > return -EINVAL; >> > >> > + if ( is_pdev_unassignable(pdev) ) >> > + return -EACCES; >> >> Is this case possible at all (i.e. a newly added device being unassignable)? >> > > IMO, it is possible, and I have checked and printed the invoke flow when Xen init. > iommu_setup() / iommu_hwdom_init() are called before pci_add_device() when Xen init. > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will mark > device as unassignable. Please be more precise: I can't see how e.g. iommu_setup() would manage to time out on invalidation on a particular device. I.e. please provide an exact scenario where the flag could get set before the device gets reported through iommu_add_device(). Furthermore I think that boot time invalidations shouldn't lead to any devices getting marked un-assignable. Instead such would probably better result in the IOMMU to be turned of altogether. >> > @@ -262,6 +308,14 @@ static int __iommu_flush_iec(struct iommu *iommu, >> > u8 granu, u8 im, u16 iidx) >> > >> > queue_invalidate_iec(iommu, granu, im, iidx); >> > ret = invalidate_sync(iommu); >> > + >> > + if ( ret == -ETIMEDOUT ) >> > + { >> > + invalidate_timeout(iommu); >> > + dprintk(XENLOG_WARNING VTDPREFIX, >> > + "IEC flush timeout.\n"); >> > + return ret; >> > + } >> > /* >> >> Considering the recurring pattern, wouldn't it be better for >> invalidate_sync() to invoke invalidate_timeout() (at once making sure no current >> or future caller misses the need to do so)? > > Now invalidate_sync() is a common function for iec/iotlb/context/device-tlb > invalidation. > It is different to deal with the flush error. > For device-tlb, it needs some parameters of device, such device > seg/bus/devfn. > But iec/iotlb/context don't need these parameters. Ah, I see, there's indeed one case where the handling is different. Never mind then. >> > @@ -88,6 +89,16 @@ struct pci_dev { >> > #define for_each_pdev(domain, pdev) \ >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list) >> > >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { >> > + pdev->info.is_unassignable = 1; >> > +} >> > + >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev *pdev) >> > +{ >> > + return pdev->info.is_unassignable; } >> >> Are you aware that we already have a mechanism to prevent assignment (via >> pci_{ro,hide}_device())? I think at the very least this check should > consider both >> variants. Whether fully using the existing mechanism for your purpose is >> feasible I can't immediately tell (since the ownership change may be >> problematic at the points where you want the flagging to happen). > > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. > for this case, IMO I think a flag is a better solution. I can't really judge on this without know details of the crash. As said, I'd prefer to have just a single mechanism, and would accept a second one only with proper justification (i.e. more than "is dangerous" or "makes xen crash" - after all that may also be a result of the functions not being used as necessary). Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 9:16 ` Jan Beulich @ 2015-12-15 10:24 ` Xu, Quan 2015-12-15 12:32 ` Jan Beulich 2015-12-15 10:58 ` Xu, Quan 2015-12-15 12:23 ` Xu, Quan 2 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 10:24 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> > --- a/xen/drivers/passthrough/vtd/iommu.c > >> > +++ b/xen/drivers/passthrough/vtd/iommu.c > >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, > >> > struct > >> pci_dev *pdev) > >> > if ( !pdev->domain ) > >> > return -EINVAL; > >> > > >> > + if ( is_pdev_unassignable(pdev) ) > >> > + return -EACCES; > >> > >> Is this case possible at all (i.e. a newly added device being unassignable)? > >> > > > > IMO, it is possible, and I have checked and printed the invoke flow when Xen > init. > > iommu_setup() / iommu_hwdom_init() are called before pci_add_device() > when Xen init. > > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will > > mark device as unassignable. > > Please be more precise: I can't see how e.g. iommu_setup() would manage to > time out on invalidation on a particular device. I.e. > please provide an exact scenario where the flag could get set before the device > gets reported through iommu_add_device(). > > Furthermore I think that boot time invalidations shouldn't lead to any devices > getting marked un-assignable. Instead such would probably better result in the > IOMMU to be turned of altogether. ... construct_dom0() |--iommu_hwdom_init() |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86] |--iommu_flush_all() ... If it is failed in iommu_flush_all(), it will mask all of dom0's devices as unassignable. And all of that called before pci_add_device() .. right? > > >> > @@ -88,6 +89,16 @@ struct pci_dev { #define for_each_pdev(domain, > >> > pdev) \ > >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), > >> > domain_list) > >> > > >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { > >> > + pdev->info.is_unassignable = 1; } > >> > + > >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev > >> > +*pdev) { > >> > + return pdev->info.is_unassignable; } > >> > >> Are you aware that we already have a mechanism to prevent assignment > >> (via pci_{ro,hide}_device())? I think at the very least this check > >> should > > consider both > >> variants. Whether fully using the existing mechanism for your purpose > >> is feasible I can't immediately tell (since the ownership change may > >> be problematic at the points where you want the flagging to happen). > > > > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. > > for this case, IMO I think a flag is a better solution. > > I can't really judge on this without know details of the crash. As said, I'd prefer > to have just a single mechanism, and would accept a second one only with > proper justification (i.e. more than "is dangerous" or "makes xen crash" - after > all that may also be a result of the functions not being used as necessary). > I need do some more investigation and try a second one only with proper justification. Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 10:24 ` Xu, Quan @ 2015-12-15 12:32 ` Jan Beulich 2015-12-15 13:01 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-15 12:32 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 15.12.15 at 12:42, wrote: >>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote: > >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c > >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c > >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, > >> >> > struct > >> >> pci_dev *pdev) > >> >> > if ( !pdev->domain ) > >> >> > return -EINVAL; > >> >> > > >> >> > + if ( is_pdev_unassignable(pdev) ) > >> >> > + return -EACCES; > >> >> > >> >> Is this case possible at all (i.e. a newly added device being > > unassignable)? > >> >> > >> > > >> > IMO, it is possible, and I have checked and printed the invoke flow when > > Xen > >> init. > >> > iommu_setup() / iommu_hwdom_init() are called before pci_add_device() > >> when Xen init. > >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will > >> > mark device as unassignable. > >> > >> Please be more precise: I can't see how e.g. iommu_setup() would manage to > >> time out on invalidation on a particular device. I.e. > >> please provide an exact scenario where the flag could get set before the > > device > >> gets reported through iommu_add_device(). > >> > >> Furthermore I think that boot time invalidations shouldn't lead to any > > devices > >> getting marked un-assignable. Instead such would probably better result in > > the > >> IOMMU to be turned of altogether. > > > > > > ... > > construct_dom0() > > |--iommu_hwdom_init() > > |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86] > > |--iommu_flush_all() > > ... > > > > > > If it is failed in iommu_flush_all(), it will mask all of dom0's devices as > > unassignable. > > And all of that called before pci_add_device() .. right? > > Agreed. But as said, I don't think that's the right action in this specific > case. Having thought about it some more over lunch, I think leaving the conditional in place is going to be fine, and much less of a change than trying to unroll IOMMU setup at that point. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 12:32 ` Jan Beulich @ 2015-12-15 13:01 ` Xu, Quan 2015-12-15 13:26 ` Jan Beulich 0 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 13:01 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 8:32pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 12:42, wrote: > >>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote: > > >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > > >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c > > >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c > > >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 > > >> >> > devfn, struct > > >> >> pci_dev *pdev) > > >> >> > if ( !pdev->domain ) > > >> >> > return -EINVAL; > > >> >> > > > >> >> > + if ( is_pdev_unassignable(pdev) ) > > >> >> > + return -EACCES; > > >> >> > > >> >> Is this case possible at all (i.e. a newly added device being > > > unassignable)? > > >> >> > > >> > > > >> > IMO, it is possible, and I have checked and printed the invoke > > >> > flow when > > > Xen > > >> init. > > >> > iommu_setup() / iommu_hwdom_init() are called before > > >> > pci_add_device() > > >> when Xen init. > > >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it > > >> > will mark device as unassignable. > > >> > > >> Please be more precise: I can't see how e.g. iommu_setup() would > > >> manage to time out on invalidation on a particular device. I.e. > > >> please provide an exact scenario where the flag could get set > > >> before the > > > device > > >> gets reported through iommu_add_device(). > > >> > > >> Furthermore I think that boot time invalidations shouldn't lead to > > >> any > > > devices > > >> getting marked un-assignable. Instead such would probably better > > >> result in > > > the > > >> IOMMU to be turned of altogether. > > > > > > > > > ... > > > construct_dom0() > > > |--iommu_hwdom_init() > > > |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for > x86] > > > |--iommu_flush_all() > > > ... > > > > > > > > > If it is failed in iommu_flush_all(), it will mask all of dom0's > > > devices as unassignable. > > > And all of that called before pci_add_device() .. right? > > > > Agreed. But as said, I don't think that's the right action in this > > specific case. > > Having thought about it some more over lunch, I think leaving the conditional in > place is going to be fine, and much less of a change than trying to unroll > IOMMU setup at that point. > Thanks Jan! Do you mean that I can ignore it in intel_iommu_add_device()? Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 13:01 ` Xu, Quan @ 2015-12-15 13:26 ` Jan Beulich 0 siblings, 0 replies; 25+ messages in thread From: Jan Beulich @ 2015-12-15 13:26 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 15.12.15 at 14:01, <quan.xu@intel.com> wrote: >> On 15.12.2015 at 8:32pm, <JBeulich@suse.com> wrote: >> >>> On 15.12.15 at 12:42, wrote: >> >>>> On 15.12.15 at 11:24, <quan.xu@intel.com> wrote: >> > >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: >> > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: >> > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: >> > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: >> > >> >> > --- a/xen/drivers/passthrough/vtd/iommu.c >> > >> >> > +++ b/xen/drivers/passthrough/vtd/iommu.c >> > >> >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 >> > >> >> > devfn, struct >> > >> >> pci_dev *pdev) >> > >> >> > if ( !pdev->domain ) >> > >> >> > return -EINVAL; >> > >> >> > >> > >> >> > + if ( is_pdev_unassignable(pdev) ) >> > >> >> > + return -EACCES; >> > >> >> >> > >> >> Is this case possible at all (i.e. a newly added device being >> > > unassignable)? >> > >> >> >> > >> > >> > >> > IMO, it is possible, and I have checked and printed the invoke >> > >> > flow when >> > > Xen >> > >> init. >> > >> > iommu_setup() / iommu_hwdom_init() are called before >> > >> > pci_add_device() >> > >> when Xen init. >> > >> > If it flushes error in iommu_setup() / iommu_hwdom_init(), it >> > >> > will mark device as unassignable. >> > >> >> > >> Please be more precise: I can't see how e.g. iommu_setup() would >> > >> manage to time out on invalidation on a particular device. I.e. >> > >> please provide an exact scenario where the flag could get set >> > >> before the >> > > device >> > >> gets reported through iommu_add_device(). >> > >> >> > >> Furthermore I think that boot time invalidations shouldn't lead to >> > >> any >> > > devices >> > >> getting marked un-assignable. Instead such would probably better >> > >> result in >> > > the >> > >> IOMMU to be turned of altogether. >> > > >> > > >> > > ... >> > > construct_dom0() >> > > |--iommu_hwdom_init() >> > > |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for >> x86] >> > > |--iommu_flush_all() >> > > ... >> > > >> > > >> > > If it is failed in iommu_flush_all(), it will mask all of dom0's >> > > devices as unassignable. >> > > And all of that called before pci_add_device() .. right? >> > >> > Agreed. But as said, I don't think that's the right action in this >> > specific case. >> >> Having thought about it some more over lunch, I think leaving the > conditional in >> place is going to be fine, and much less of a change than trying to unroll >> IOMMU setup at that point. >> > Thanks Jan! > Do you mean that I can ignore it in intel_iommu_add_device()? Ignore? Just keep the code the way it was (still seen above). Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 9:16 ` Jan Beulich 2015-12-15 10:24 ` Xu, Quan @ 2015-12-15 10:58 ` Xu, Quan 2015-12-15 12:23 ` Xu, Quan 2 siblings, 0 replies; 25+ messages in thread From: Xu, Quan @ 2015-12-15 10:58 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> > --- a/xen/drivers/passthrough/vtd/iommu.c > >> > +++ b/xen/drivers/passthrough/vtd/iommu.c > >> > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, > >> > struct > >> pci_dev *pdev) > >> > if ( !pdev->domain ) > >> > return -EINVAL; > >> > > >> > + if ( is_pdev_unassignable(pdev) ) > >> > + return -EACCES; > >> > >> Is this case possible at all (i.e. a newly added device being unassignable)? > >> > > > > IMO, it is possible, and I have checked and printed the invoke flow > > when Xen > init. > > iommu_setup() / iommu_hwdom_init() are called before > > pci_add_device() > when Xen init. > > If it flushes error in iommu_setup() / iommu_hwdom_init(), it will > > mark device as unassignable. > > Please be more precise: I can't see how e.g. iommu_setup() would > manage to time out on invalidation on a particular device. I.e. > please provide an exact scenario where the flag could get set before > the device gets reported through iommu_add_device(). > > Furthermore I think that boot time invalidations shouldn't lead to any > devices getting marked un-assignable. Instead such would probably > better result in the IOMMU to be turned of altogether. ... construct_dom0() |--iommu_hwdom_init() |--hd->platform_ops->hwdom_init(d) [intel_iommu_hwdom_init() for x86] |--iommu_flush_all() ... If it is failed in iommu_flush_all(), it will mask all of dom0's devices as unassignable. And all of that called before pci_add_device() .. right? > > >> > @@ -88,6 +89,16 @@ struct pci_dev { #define > >> > for_each_pdev(domain, > >> > pdev) \ > >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), > >> > domain_list) > >> > > >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { > >> > + pdev->info.is_unassignable = 1; } > >> > + > >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev > >> > +*pdev) { > >> > + return pdev->info.is_unassignable; } > >> > >> Are you aware that we already have a mechanism to prevent > >> assignment (via pci_{ro,hide}_device())? I think at the very least > >> this check should > > consider both > >> variants. Whether fully using the existing mechanism for your > >> purpose is feasible I can't immediately tell (since the ownership > >> change may be problematic at the points where you want the flagging to happen). > > > > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. > > for this case, IMO I think a flag is a better solution. > > I can't really judge on this without know details of the crash. As > said, I'd prefer to have just a single mechanism, and would accept a > second one only with proper justification (i.e. more than "is > dangerous" or "makes xen crash" - after all that may also be a result of the functions not being used as necessary). > I need do some more investigation and try a second one only with proper justification. Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 9:16 ` Jan Beulich 2015-12-15 10:24 ` Xu, Quan 2015-12-15 10:58 ` Xu, Quan @ 2015-12-15 12:23 ` Xu, Quan 2015-12-15 12:29 ` Jan Beulich 2 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 12:23 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> > @@ -88,6 +89,16 @@ struct pci_dev { #define for_each_pdev(domain, > >> > pdev) \ > >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), > >> > domain_list) > >> > > >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { > >> > + pdev->info.is_unassignable = 1; } > >> > + > >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev > >> > +*pdev) { > >> > + return pdev->info.is_unassignable; } > >> > >> Are you aware that we already have a mechanism to prevent assignment > >> (via pci_{ro,hide}_device())? I think at the very least this check > >> should > > consider both > >> variants. Whether fully using the existing mechanism for your purpose > >> is feasible I can't immediately tell (since the ownership change may > >> be problematic at the points where you want the flagging to happen). > > > > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. > > for this case, IMO I think a flag is a better solution. > > I can't really judge on this without know details of the crash. As said, I'd prefer > to have just a single mechanism, and would accept a second one only with > proper justification (i.e. more than "is dangerous" or "makes xen crash" - after > all that may also be a result of the functions not being used as necessary). > Jan, one question, do pci_{ro,hide}_device() work to hide pci devices? Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 12:23 ` Xu, Quan @ 2015-12-15 12:29 ` Jan Beulich 2015-12-15 13:31 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-15 12:29 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 15.12.15 at 13:23, <quan.xu@intel.com> wrote: >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: >> >> > @@ -88,6 +89,16 @@ struct pci_dev { #define for_each_pdev(domain, >> >> > pdev) \ >> >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), >> >> > domain_list) >> >> > >> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { >> >> > + pdev->info.is_unassignable = 1; } >> >> > + >> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev >> >> > +*pdev) { >> >> > + return pdev->info.is_unassignable; } >> >> >> >> Are you aware that we already have a mechanism to prevent assignment >> >> (via pci_{ro,hide}_device())? I think at the very least this check >> >> should >> > consider both >> >> variants. Whether fully using the existing mechanism for your purpose >> >> is feasible I can't immediately tell (since the ownership change may >> >> be problematic at the points where you want the flagging to happen). >> > >> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. >> > for this case, IMO I think a flag is a better solution. >> >> I can't really judge on this without know details of the crash. As said, I'd prefer >> to have just a single mechanism, and would accept a second one only with >> proper justification (i.e. more than "is dangerous" or "makes xen crash" - after >> all that may also be a result of the functions not being used as necessary). > > Jan, one question, do pci_{ro,hide}_device() work to hide pci devices? Not sure I understand the question - the place they're used in they do work, yes. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-15 12:29 ` Jan Beulich @ 2015-12-15 13:31 ` Xu, Quan [not found] ` <5670271402000078000BFA35@prv-mh.provo.novell.com> 0 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-15 13:31 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 8:30pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 13:23, <quan.xu@intel.com> wrote: > >> On 15.12.2015 at 5:17pm, <JBeulich@suse.com> wrote: > >> >>> On 15.12.15 at 09:15, <quan.xu@intel.com> wrote: > >> >> On 14.12.2015 at 5:28pm, <JBeulich@suse.com> wrote: > >> >> >>> On 12.12.15 at 14:21, <quan.xu@intel.com> wrote: > >> >> > @@ -88,6 +89,16 @@ struct pci_dev { #define > >> >> > for_each_pdev(domain, > >> >> > pdev) \ > >> >> > list_for_each_entry(pdev, &(domain->arch.pdev_list), > >> >> > domain_list) > >> >> > > >> >> > +static inline void mark_pdev_unassignable(struct pci_dev *pdev) { > >> >> > + pdev->info.is_unassignable = 1; } > >> >> > + > >> >> > +static inline bool_t is_pdev_unassignable(const struct pci_dev > >> >> > +*pdev) { > >> >> > + return pdev->info.is_unassignable; } > >> >> > >> >> Are you aware that we already have a mechanism to prevent > >> >> assignment (via pci_{ro,hide}_device())? I think at the very least > >> >> this check should > >> > consider both > >> >> variants. Whether fully using the existing mechanism for your > >> >> purpose is feasible I can't immediately tell (since the ownership > >> >> change may be problematic at the points where you want the flagging to > happen). > >> > > >> > pci_{ro,hide}_device() is dangerous, and it makes xen crash when I tried it. > >> > for this case, IMO I think a flag is a better solution. > >> > >> I can't really judge on this without know details of the crash. As > >> said, I'd prefer to have just a single mechanism, and would accept a > >> second one only with proper justification (i.e. more than "is > >> dangerous" or "makes xen crash" - after all that may also be a result of the > functions not being used as necessary). > > > > Jan, one question, do pci_{ro,hide}_device() work to hide pci devices? > > Not sure I understand the question - the place they're used in they do work, yes. Copy from pci_hide_device(), which is actually add device to dom_xen and add pdev->domain_list to dom_xen->arch.pdev_list. Quite similar, a second one only with proper justification, I can reassign Device form _domain to dom_xen directly. The below is the how to deal With device_tlb ( is it acceptable? ). It is working to hide device. +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, + u16 seg, u8 bus, u8 devfn) +{ + struct domain *d; + struct pci_dev *pdev; + struct hvm_iommu *hd; + int rc; + + d = rcu_lock_domain_by_id(iommu->domid_map[did]); + ASSERT(d); + for_each_pdev(d, pdev) + if ( (pdev->seg == seg) && + (pdev->bus == bus) && + (pdev->devfn == devfn) ) + { + if ( pdev->domain ) + { + hd = domain_hvm_iommu(d); + rc = hd->platform_ops->reassign_device(d, + dom_xen, devfn, pdev); + if ( rc ) + continue; + } + break; + } + + if ( !is_hardware_domain(d) ) + domain_crash(d); + rcu_unlock_domain(d); +} Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <5670271402000078000BFA35@prv-mh.provo.novell.com>]
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. [not found] ` <5670271402000078000BFA35@prv-mh.provo.novell.com> @ 2015-12-15 14:49 ` Xu, Quan 2015-12-16 3:51 ` Xu, Quan 1 sibling, 0 replies; 25+ messages in thread From: Xu, Quan @ 2015-12-15 14:49 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.2015 at 9:44pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 14:31, <quan.xu@intel.com> wrote: > > Copy from pci_hide_device(), which is actually add device to dom_xen > > and add pdev->domain_list to dom_xen->arch.pdev_list. > > > > Quite similar, a second one only with proper justification, I can > > reassign Device form _domain to dom_xen directly. The below is the how > > to deal With device_tlb ( is it acceptable? ). It is working to hide device. > > Looks reasonable, but ... > > > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, > > + u16 seg, u8 bus, u8 devfn) { > > + struct domain *d; > > + struct pci_dev *pdev; > > + struct hvm_iommu *hd; > > + int rc; > > + > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > > + ASSERT(d); > > + for_each_pdev(d, pdev) > > + if ( (pdev->seg == seg) && > > + (pdev->bus == bus) && > > + (pdev->devfn == devfn) ) > > + { > > + if ( pdev->domain ) > > + { > > + hd = domain_hvm_iommu(d); > > + rc = hd->platform_ops->reassign_device(d, > > + dom_xen, devfn, pdev); > > ... doesn't this have the potential of generating further flushes? You clearly need > to be certain not to recurse here. Good consideration. I should continue to enhance it. Thanks Jan... Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. [not found] ` <5670271402000078000BFA35@prv-mh.provo.novell.com> 2015-12-15 14:49 ` Xu, Quan @ 2015-12-16 3:51 ` Xu, Quan 2015-12-16 8:08 ` Jan Beulich 1 sibling, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-16 3:51 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 15.12.15 at 9:44pm, <JBeulich@suse.com> wrote: > >>> On 15.12.15 at 14:31, <quan.xu@intel.com> wrote: > > Copy from pci_hide_device(), which is actually add device to dom_xen > > and add pdev->domain_list to dom_xen->arch.pdev_list. > > > > Quite similar, a second one only with proper justification, I can > > reassign Device form _domain to dom_xen directly. The below is the how > > to deal With device_tlb ( is it acceptable? ). It is working to hide device. > > Looks reasonable, but ... > > > +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, > > + u16 seg, u8 bus, u8 devfn) { > > + struct domain *d; > > + struct pci_dev *pdev; > > + struct hvm_iommu *hd; > > + int rc; > > + > > + d = rcu_lock_domain_by_id(iommu->domid_map[did]); > > + ASSERT(d); > > + for_each_pdev(d, pdev) > > + if ( (pdev->seg == seg) && > > + (pdev->bus == bus) && > > + (pdev->devfn == devfn) ) > > + { > > + if ( pdev->domain ) > > + { > > + hd = domain_hvm_iommu(d); > > + rc = hd->platform_ops->reassign_device(d, > > + dom_xen, devfn, pdev); > > ... doesn't this have the potential of generating further flushes? You clearly need > to be certain not to recurse here. Jan, The following code is for how to hide a device. I think it is feasible with proper justification. Have a look first. Thanks. --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c +void device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, + u16 seg, u8 bus, u8 devfn) +{ + struct domain *d; + struct pci_dev *pdev; + + d = rcu_lock_domain_by_id(iommu->domid_map[did]); + ASSERT(d); + for_each_pdev(d, pdev) + if ( (pdev->seg == seg) && + (pdev->bus == bus) && + (pdev->devfn == devfn) ) + { + if ( iommu_hide_device(pdev) ) + { + printk(XENLOG_ERR + "IOMMU hide device %04x:%02x:%02x error.", + seg, bus, devfn); + break; + } + + break; + } + + if ( !is_hardware_domain(d) ) + domain_crash(d); + rcu_unlock_domain(d); +} --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h int iommu_remove_device(struct pci_dev *pdev); +int iommu_hide_device(struct pci_dev *pdev); --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev) return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev)); } +int iommu_hide_device(struct pci_dev *pdev) +{ + if ( !pdev || !pdev->domain ) + return -EINVAL; + + spin_lock(&pcidevs_lock); + pdev->domain = dom_xen; + list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); + spin_unlock(&pcidevs_lock); + + return 0; +} + --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; + if ( pdev->domain == dom_xen ) + return -EACCES; + ret = domain_context_mapping(pdev->domain, devfn, pdev); if ( ret ) { @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device( if ( list_empty(&acpi_drhd_units) ) return -ENODEV; + if ( pdev->domain == dom_xen ) + return -EACCES; + seg = pdev->seg; bus = pdev->bus; Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-16 3:51 ` Xu, Quan @ 2015-12-16 8:08 ` Jan Beulich 2015-12-17 11:43 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-16 8:08 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote: > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev) > return hd->platform_ops->remove_device(pdev->devfn, pci_to_dev(pdev)); > } > > +int iommu_hide_device(struct pci_dev *pdev) > +{ > + if ( !pdev || !pdev->domain ) > + return -EINVAL; > + > + spin_lock(&pcidevs_lock); > + pdev->domain = dom_xen; > + list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); > + spin_unlock(&pcidevs_lock); > + > + return 0; > +} This is effectively a copy of pci_hide_device(), and is misnamed (since it takes a PCI device as argument). I do not see why you shouldn't be able to use pci_hide_device() after removing its __init annotation or a suitably named wrapper around _pci_hide_device(). Not specifically that the way you do this right now you corrupt the original owning domain's PCI device list - you need to remove the device from that list before adding it to dom_xen's (which then will naturally entail clearing ->domain, at once satisfying _pci_hide_device()'s early check, which is there for the very reason of ensuring not to corrupt any list). > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) > if ( !pdev->domain ) > return -EINVAL; > > + if ( pdev->domain == dom_xen ) > + return -EACCES; I'm not sure about the need for this check, ... > @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device( > if ( list_empty(&acpi_drhd_units) ) > return -ENODEV; > > + if ( pdev->domain == dom_xen ) > + return -EACCES; ... and I clearly don't see the need for this one. Please explain, keeping in mind that generic IOMMU code should be enforcing things like this (and at least in the assign case should already be doing so). Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-16 8:08 ` Jan Beulich @ 2015-12-17 11:43 ` Xu, Quan 2015-12-17 12:10 ` Jan Beulich 0 siblings, 1 reply; 25+ messages in thread From: Xu, Quan @ 2015-12-17 11:43 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 16.12.2015 at 4:08pm, <JBeulich@suse.com> wrote: > >>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote: > > --- a/xen/drivers/passthrough/pci.c > > +++ b/xen/drivers/passthrough/pci.c > > @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev) > > return hd->platform_ops->remove_device(pdev->devfn, > > pci_to_dev(pdev)); } > > > > +int iommu_hide_device(struct pci_dev *pdev) { > > + if ( !pdev || !pdev->domain ) > > + return -EINVAL; > > + > > + spin_lock(&pcidevs_lock); > > + pdev->domain = dom_xen; > > + list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); > > + spin_unlock(&pcidevs_lock); > > + > > + return 0; > > +} > > This is effectively a copy of pci_hide_device(), and is misnamed (since it takes a > PCI device as argument). I do not see why you shouldn't be able to use > pci_hide_device() after removing its __init annotation or a suitably named > wrapper around _pci_hide_device(). Not specifically that the way you do this > right now you corrupt the original owning domain's PCI device list - you need to > remove the device from that list before adding it to dom_xen's (which then will > naturally entail clearing ->domain, at once satisfying _pci_hide_device()'s early > check, which is there for the very reason of ensuring not to corrupt any list). > You are correct. As the _pci_hide_device()'s early check, I didn't use it. Could I remove the device from that list before adding it to dom_xen's, and reuse pci_hide_device() as below? + list_del(&pdev->domain_list); + pdev->domain = NULL; + if ( pci_hide_device(bus, devfn) ) + { + printk(XENLOG_ERR + "IOMMU hide device %04x:%02x:%02x error.", + seg, bus, devfn); + break; + } > > --- a/xen/drivers/passthrough/vtd/iommu.c > > +++ b/xen/drivers/passthrough/vtd/iommu.c > > @@ -1890,6 +1890,9 @@ static int intel_iommu_add_device(u8 devfn, struct > pci_dev *pdev) > > if ( !pdev->domain ) > > return -EINVAL; > > > > + if ( pdev->domain == dom_xen ) > > + return -EACCES; > > I'm not sure about the need for this check, ... > > > @@ -2301,6 +2304,9 @@ static int intel_iommu_assign_device( > > if ( list_empty(&acpi_drhd_units) ) > > return -ENODEV; > > > > + if ( pdev->domain == dom_xen ) > > + return -EACCES; > > ... and I clearly don't see the need for this one. Please explain, keeping in mind > that generic IOMMU code should be enforcing things like this (and at least in the > assign case should already be doing so). I have verified it. the above 2 checks are excessive protection. I can remove it. ((Jan, thanks very much! Sometime it is not just for technology help.)) Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-17 11:43 ` Xu, Quan @ 2015-12-17 12:10 ` Jan Beulich 2015-12-17 12:55 ` Xu, Quan 0 siblings, 1 reply; 25+ messages in thread From: Jan Beulich @ 2015-12-17 12:10 UTC (permalink / raw) To: Quan Xu Cc: Kevin Tian, Feng Wu, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Jun Nakajima, keir@xen.org >>> On 17.12.15 at 12:43, <quan.xu@intel.com> wrote: >> On 16.12.2015 at 4:08pm, <JBeulich@suse.com> wrote: >> >>> On 16.12.15 at 04:51, <quan.xu@intel.com> wrote: >> > --- a/xen/drivers/passthrough/pci.c >> > +++ b/xen/drivers/passthrough/pci.c >> > @@ -1318,6 +1318,25 @@ int iommu_remove_device(struct pci_dev *pdev) >> > return hd->platform_ops->remove_device(pdev->devfn, >> > pci_to_dev(pdev)); } >> > >> > +int iommu_hide_device(struct pci_dev *pdev) { >> > + if ( !pdev || !pdev->domain ) >> > + return -EINVAL; >> > + >> > + spin_lock(&pcidevs_lock); >> > + pdev->domain = dom_xen; >> > + list_add(&pdev->domain_list, &dom_xen->arch.pdev_list); >> > + spin_unlock(&pcidevs_lock); >> > + >> > + return 0; >> > +} >> >> This is effectively a copy of pci_hide_device(), and is misnamed (since it > takes a >> PCI device as argument). I do not see why you shouldn't be able to use >> pci_hide_device() after removing its __init annotation or a suitably named >> wrapper around _pci_hide_device(). Not specifically that the way you do this >> right now you corrupt the original owning domain's PCI device list - you need > to >> remove the device from that list before adding it to dom_xen's (which then > will >> naturally entail clearing ->domain, at once satisfying _pci_hide_device()'s > early >> check, which is there for the very reason of ensuring not to corrupt any > list). >> > > You are correct. > As the _pci_hide_device()'s early check, I didn't use it. > Could I remove the device from that list before adding it to dom_xen's, and > reuse pci_hide_device() as below? That's what I was trying to suggest. Just that you should list_del() only when pdev->domain is not NULL. Jan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue. 2015-12-17 12:10 ` Jan Beulich @ 2015-12-17 12:55 ` Xu, Quan 0 siblings, 0 replies; 25+ messages in thread From: Xu, Quan @ 2015-12-17 12:55 UTC (permalink / raw) To: Jan Beulich Cc: Tian, Kevin, Wu, Feng, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, Nakajima, Jun, keir@xen.org > On 17.12.2015 at 8:10pm, <JBeulich@suse.com> wrote: > >>> On 17.12.15 at 12:43, <quan.xu@intel.com> wrote: > > You are correct. > > As the _pci_hide_device()'s early check, I didn't use it. > > Could I remove the device from that list before adding it to > > dom_xen's, and reuse pci_hide_device() as below? > > That's what I was trying to suggest. Just that you should list_del() only when > pdev->domain is not NULL. > ok. Now I think we are on the same page. I am getting started to v4. Thanks. Quan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] VT-d flush issue 2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu 2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu 2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu @ 2015-12-19 14:01 ` Xu, Quan 2 siblings, 0 replies; 25+ messages in thread From: Xu, Quan @ 2015-12-19 14:01 UTC (permalink / raw) To: jbeulich@suse.com, Tian, Kevin, keir@xen.org, tim@xen.org, george.dunlap@eu.citrix.com, Nakajima, Jun, andrew.cooper3@citrix.com, Wu, Feng, xen-devel@lists.xen.org >On 12.12.2015 at 9:22pm, <quan.xu@intel.com> wrote: > This patches are based on Kevin Tian's previous discussion 'Revisit VT-d > asynchronous flush issue'. > Fix current timeout concern and also allow limited ATS support in a light way: > 2. Fix vt-d flush timeout issue. > > If IOTLB/Context/IETC flush is timeout, we should think all devices under > this IOMMU cannot function correctly. > So for each device under this IOMMU we'll mark it as unassignable and kill > the domain owning the device. > Any comment for this point? Instead of panic, is it enough? -Quan > If Device-TLB flush is timeout, we'll mark the target ATS device as > unassignable and kill the domain owning > this device. > > If impacted domain is hardware domain, just throw out a warning. It's an > open here whether we want to kill > hardware domain (or directly panic hypervisor). Comments are welcomed. > > Device marked as unassignable will be disallowed to be further assigned to > any domain. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2015-12-19 14:01 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-12-12 13:21 [PATCH v3 0/2] VT-d flush issue Quan Xu 2015-12-12 13:21 ` [PATCH v3 1/2] VT-d: Reduce spin timeout to 1ms, which can be boot-time changed Quan Xu 2015-12-14 9:14 ` Jan Beulich 2015-12-15 8:35 ` Xu, Quan 2015-12-15 9:18 ` Jan Beulich 2015-12-15 10:10 ` Xu, Quan 2015-12-12 13:21 ` [PATCH v3 2/2] VT-d: Fix vt-d flush timeout issue Quan Xu 2015-12-14 9:28 ` Jan Beulich 2015-12-15 8:15 ` Xu, Quan 2015-12-15 9:16 ` Jan Beulich 2015-12-15 10:24 ` Xu, Quan 2015-12-15 12:32 ` Jan Beulich 2015-12-15 13:01 ` Xu, Quan 2015-12-15 13:26 ` Jan Beulich 2015-12-15 10:58 ` Xu, Quan 2015-12-15 12:23 ` Xu, Quan 2015-12-15 12:29 ` Jan Beulich 2015-12-15 13:31 ` Xu, Quan [not found] ` <5670271402000078000BFA35@prv-mh.provo.novell.com> 2015-12-15 14:49 ` Xu, Quan 2015-12-16 3:51 ` Xu, Quan 2015-12-16 8:08 ` Jan Beulich 2015-12-17 11:43 ` Xu, Quan 2015-12-17 12:10 ` Jan Beulich 2015-12-17 12:55 ` Xu, Quan 2015-12-19 14:01 ` [PATCH v3 0/2] VT-d flush issue Xu, Quan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).