* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-19 15:07 [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit Joby Poriyath
@ 2013-07-19 15:23 ` Konrad Rzeszutek Wilk
2013-07-19 16:07 ` Joby Poriyath
2013-07-19 16:38 ` Malcolm Crossley
2013-07-23 10:54 ` Joby Poriyath
2 siblings, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-19 15:23 UTC (permalink / raw)
To: Joby Poriyath, zhenzhong.duan
Cc: keir, Ian.Campbell, andrew.cooper3, xen-devel, JBeulich,
malcolm.crossley
On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
> Guest needs the ability to enable and disable MSI-X interrupts
> by setting the MSI-X control bit. Currently, a write to MSI-X
> mask bit by the guest is silently ignored.
>
> A likely scenario is where we have a 82599 SR-IOV nic passed
> through to a guest. From the guest if you do
>
> ifconfig <ETH_DEV> down
> ifconfig <ETH_DEV> up
>
> the interrupts remain masked. The the mask bit for the VF is
> being set by the PF performing a reset (at the request of the VF).
> However, interrupts are enabled by VF driver by clearing the mask
> bit by writing directly to BAR3 region containing the MSI-X table.
>
> From dom0, we can verify that
> interrupts are being masked using 'xl debug-keys M'.
>
> Initially, guest was allowed to modify MSI-X bit.
> Later this behaviour was changed.
> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
>
> Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
> ---
> xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 36de312..97d9f93 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -169,6 +169,7 @@ struct msixtbl_entry
> uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> } gentries[MAX_MSIX_ACC_ENTRIES];
> struct rcu_head rcu;
> + struct pirq *pirq;
> };
>
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> void *virt;
> unsigned int nr_entry, index;
> int r = X86EMUL_UNHANDLEABLE;
> + unsigned long flags;
> + struct irq_desc *desc;
> + unsigned long orig;
>
> if ( len != 4 || (address & 3) )
> return r;
> @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> if ( !virt )
> goto out;
>
> - /* Do not allow the mask bit to be changed. */
> -#if 0 /* XXX
> - * As the mask bit is the only defined bit in the word, and as the
> - * host MSI-X code doesn't preserve the other bits anyway, doing
> - * this is pointless. So for now just discard the write (also
> - * saving us from having to determine the matching irq_desc).
> - */
> - spin_lock_irqsave(&desc->lock, flags);
> + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> + if ( !desc )
> + goto out;
> +
> + /* The mask bit is the only defined bit in the word. But we
> + * ought to preserve the reserved bits. Clearing the reserved
> + * bits can result in undefined behaviour (see PCI Local Bus
> + * Specification revision 2.3).
So.. if we do it won't that be potentially dangerous?
> + */
> orig = readl(virt);
> - val &= ~PCI_MSIX_VECTOR_BITMASK;
> - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> + val &= PCI_MSIX_VECTOR_BITMASK;
> + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> writel(val, virt);
> spin_unlock_irqrestore(&desc->lock, flags);
> -#endif
>
> r = X86EMUL_OKAY;
> out:
> @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> static void add_msixtbl_entry(struct domain *d,
> struct pci_dev *pdev,
> uint64_t gtable,
> - struct msixtbl_entry *entry)
> + struct msixtbl_entry *entry,
> + struct pirq *pirq)
> {
> u32 len;
>
> @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> entry->table_len = len;
> entry->pdev = pdev;
> entry->gtable = (unsigned long) gtable;
> + entry->pirq = pirq;
>
> list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> }
> @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
>
> entry = new_entry;
> new_entry = NULL;
> - add_msixtbl_entry(d, pdev, gtable, entry);
> + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
>
> found:
> atomic_inc(&entry->refcnt);
> --
> 1.7.10.4
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-19 15:23 ` Konrad Rzeszutek Wilk
@ 2013-07-19 16:07 ` Joby Poriyath
0 siblings, 0 replies; 20+ messages in thread
From: Joby Poriyath @ 2013-07-19 16:07 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: keir, Ian.Campbell, andrew.cooper3, zhenzhong.duan, xen-devel,
JBeulich, malcolm.crossley
On Fri, Jul 19, 2013 at 11:23:26AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
> > Guest needs the ability to enable and disable MSI-X interrupts
> > by setting the MSI-X control bit. Currently, a write to MSI-X
> > mask bit by the guest is silently ignored.
> >
> > A likely scenario is where we have a 82599 SR-IOV nic passed
> > through to a guest. From the guest if you do
> >
> > ifconfig <ETH_DEV> down
> > ifconfig <ETH_DEV> up
> >
> > the interrupts remain masked. The the mask bit for the VF is
> > being set by the PF performing a reset (at the request of the VF).
> > However, interrupts are enabled by VF driver by clearing the mask
> > bit by writing directly to BAR3 region containing the MSI-X table.
> >
> > From dom0, we can verify that
> > interrupts are being masked using 'xl debug-keys M'.
> >
> > Initially, guest was allowed to modify MSI-X bit.
> > Later this behaviour was changed.
> > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
> >
> > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
> > ---
> > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> > 1 file changed, 19 insertions(+), 13 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 36de312..97d9f93 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -169,6 +169,7 @@ struct msixtbl_entry
> > uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> > } gentries[MAX_MSIX_ACC_ENTRIES];
> > struct rcu_head rcu;
> > + struct pirq *pirq;
> > };
> >
> > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > void *virt;
> > unsigned int nr_entry, index;
> > int r = X86EMUL_UNHANDLEABLE;
> > + unsigned long flags;
> > + struct irq_desc *desc;
> > + unsigned long orig;
> >
> > if ( len != 4 || (address & 3) )
> > return r;
> > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > if ( !virt )
> > goto out;
> >
> > - /* Do not allow the mask bit to be changed. */
> > -#if 0 /* XXX
> > - * As the mask bit is the only defined bit in the word, and as the
> > - * host MSI-X code doesn't preserve the other bits anyway, doing
> > - * this is pointless. So for now just discard the write (also
> > - * saving us from having to determine the matching irq_desc).
> > - */
> > - spin_lock_irqsave(&desc->lock, flags);
> > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> > + if ( !desc )
> > + goto out;
> > +
> > + /* The mask bit is the only defined bit in the word. But we
> > + * ought to preserve the reserved bits. Clearing the reserved
> > + * bits can result in undefined behaviour (see PCI Local Bus
> > + * Specification revision 2.3).
>
> So.. if we do it won't that be potentially dangerous?
Yes it can be. That's why we are reading the value from MSI-X vector
control register and only changing 0th bit and retaining the rest.
> > + */
> > orig = readl(virt);
> > - val &= ~PCI_MSIX_VECTOR_BITMASK;
> > - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> > + val &= PCI_MSIX_VECTOR_BITMASK;
> > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> > writel(val, virt);
> > spin_unlock_irqrestore(&desc->lock, flags);
> > -#endif
> >
> > r = X86EMUL_OKAY;
> > out:
> > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> > static void add_msixtbl_entry(struct domain *d,
> > struct pci_dev *pdev,
> > uint64_t gtable,
> > - struct msixtbl_entry *entry)
> > + struct msixtbl_entry *entry,
> > + struct pirq *pirq)
> > {
> > u32 len;
> >
> > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> > entry->table_len = len;
> > entry->pdev = pdev;
> > entry->gtable = (unsigned long) gtable;
> > + entry->pirq = pirq;
> >
> > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> > }
> > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
> >
> > entry = new_entry;
> > new_entry = NULL;
> > - add_msixtbl_entry(d, pdev, gtable, entry);
> > + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
> >
> > found:
> > atomic_inc(&entry->refcnt);
> > --
> > 1.7.10.4
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-19 15:07 [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit Joby Poriyath
2013-07-19 15:23 ` Konrad Rzeszutek Wilk
@ 2013-07-19 16:38 ` Malcolm Crossley
2013-07-23 10:54 ` Joby Poriyath
2 siblings, 0 replies; 20+ messages in thread
From: Malcolm Crossley @ 2013-07-19 16:38 UTC (permalink / raw)
To: Joby Poriyath; +Cc: andrew.cooper3, keir, Ian.Campbell, JBeulich, xen-devel
On 19/07/13 16:07, Joby Poriyath wrote:
> Guest needs the ability to enable and disable MSI-X interrupts
> by setting the MSI-X control bit. Currently, a write to MSI-X
> mask bit by the guest is silently ignored.
>
> A likely scenario is where we have a 82599 SR-IOV nic passed
> through to a guest. From the guest if you do
>
> ifconfig <ETH_DEV> down
> ifconfig <ETH_DEV> up
>
> the interrupts remain masked. The the mask bit for the VF is
> being set by the PF performing a reset (at the request of the VF).
> However, interrupts are enabled by VF driver by clearing the mask
> bit by writing directly to BAR3 region containing the MSI-X table.
>
> From dom0, we can verify that
> interrupts are being masked using 'xl debug-keys M'.
>
> Initially, guest was allowed to modify MSI-X bit.
> Later this behaviour was changed.
> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
>
> Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
Reviewed-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> ---
> xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 36de312..97d9f93 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -169,6 +169,7 @@ struct msixtbl_entry
> uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> } gentries[MAX_MSIX_ACC_ENTRIES];
> struct rcu_head rcu;
> + struct pirq *pirq;
> };
>
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> void *virt;
> unsigned int nr_entry, index;
> int r = X86EMUL_UNHANDLEABLE;
> + unsigned long flags;
> + struct irq_desc *desc;
> + unsigned long orig;
>
> if ( len != 4 || (address & 3) )
> return r;
> @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> if ( !virt )
> goto out;
>
> - /* Do not allow the mask bit to be changed. */
> -#if 0 /* XXX
> - * As the mask bit is the only defined bit in the word, and as the
> - * host MSI-X code doesn't preserve the other bits anyway, doing
> - * this is pointless. So for now just discard the write (also
> - * saving us from having to determine the matching irq_desc).
> - */
> - spin_lock_irqsave(&desc->lock, flags);
> + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> + if ( !desc )
> + goto out;
> +
> + /* The mask bit is the only defined bit in the word. But we
> + * ought to preserve the reserved bits. Clearing the reserved
> + * bits can result in undefined behaviour (see PCI Local Bus
> + * Specification revision 2.3).
> + */
> orig = readl(virt);
> - val &= ~PCI_MSIX_VECTOR_BITMASK;
> - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> + val &= PCI_MSIX_VECTOR_BITMASK;
> + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> writel(val, virt);
> spin_unlock_irqrestore(&desc->lock, flags);
> -#endif
>
> r = X86EMUL_OKAY;
> out:
> @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> static void add_msixtbl_entry(struct domain *d,
> struct pci_dev *pdev,
> uint64_t gtable,
> - struct msixtbl_entry *entry)
> + struct msixtbl_entry *entry,
> + struct pirq *pirq)
> {
> u32 len;
>
> @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> entry->table_len = len;
> entry->pdev = pdev;
> entry->gtable = (unsigned long) gtable;
> + entry->pirq = pirq;
>
> list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> }
> @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
>
> entry = new_entry;
> new_entry = NULL;
> - add_msixtbl_entry(d, pdev, gtable, entry);
> + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
>
> found:
> atomic_inc(&entry->refcnt);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-19 15:07 [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit Joby Poriyath
2013-07-19 15:23 ` Konrad Rzeszutek Wilk
2013-07-19 16:38 ` Malcolm Crossley
@ 2013-07-23 10:54 ` Joby Poriyath
2013-07-23 13:21 ` Andrew Cooper
2013-07-23 13:28 ` Konrad Rzeszutek Wilk
2 siblings, 2 replies; 20+ messages in thread
From: Joby Poriyath @ 2013-07-23 10:54 UTC (permalink / raw)
To: xen-devel; +Cc: andrew.cooper3, malcolm.crossley, keir, Ian.Campbell, JBeulich
Ping...
Could you kindly review this?
Many thanks,
Joby
On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
> Guest needs the ability to enable and disable MSI-X interrupts
> by setting the MSI-X control bit. Currently, a write to MSI-X
> mask bit by the guest is silently ignored.
>
> A likely scenario is where we have a 82599 SR-IOV nic passed
> through to a guest. From the guest if you do
>
> ifconfig <ETH_DEV> down
> ifconfig <ETH_DEV> up
>
> the interrupts remain masked. The the mask bit for the VF is
> being set by the PF performing a reset (at the request of the VF).
> However, interrupts are enabled by VF driver by clearing the mask
> bit by writing directly to BAR3 region containing the MSI-X table.
>
> From dom0, we can verify that
> interrupts are being masked using 'xl debug-keys M'.
>
> Initially, guest was allowed to modify MSI-X bit.
> Later this behaviour was changed.
> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
>
> Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
> ---
> xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 36de312..97d9f93 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -169,6 +169,7 @@ struct msixtbl_entry
> uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> } gentries[MAX_MSIX_ACC_ENTRIES];
> struct rcu_head rcu;
> + struct pirq *pirq;
> };
>
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> void *virt;
> unsigned int nr_entry, index;
> int r = X86EMUL_UNHANDLEABLE;
> + unsigned long flags;
> + struct irq_desc *desc;
> + unsigned long orig;
>
> if ( len != 4 || (address & 3) )
> return r;
> @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> if ( !virt )
> goto out;
>
> - /* Do not allow the mask bit to be changed. */
> -#if 0 /* XXX
> - * As the mask bit is the only defined bit in the word, and as the
> - * host MSI-X code doesn't preserve the other bits anyway, doing
> - * this is pointless. So for now just discard the write (also
> - * saving us from having to determine the matching irq_desc).
> - */
> - spin_lock_irqsave(&desc->lock, flags);
> + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> + if ( !desc )
> + goto out;
> +
> + /* The mask bit is the only defined bit in the word. But we
> + * ought to preserve the reserved bits. Clearing the reserved
> + * bits can result in undefined behaviour (see PCI Local Bus
> + * Specification revision 2.3).
> + */
> orig = readl(virt);
> - val &= ~PCI_MSIX_VECTOR_BITMASK;
> - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> + val &= PCI_MSIX_VECTOR_BITMASK;
> + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> writel(val, virt);
> spin_unlock_irqrestore(&desc->lock, flags);
> -#endif
>
> r = X86EMUL_OKAY;
> out:
> @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> static void add_msixtbl_entry(struct domain *d,
> struct pci_dev *pdev,
> uint64_t gtable,
> - struct msixtbl_entry *entry)
> + struct msixtbl_entry *entry,
> + struct pirq *pirq)
> {
> u32 len;
>
> @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> entry->table_len = len;
> entry->pdev = pdev;
> entry->gtable = (unsigned long) gtable;
> + entry->pirq = pirq;
>
> list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> }
> @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
>
> entry = new_entry;
> new_entry = NULL;
> - add_msixtbl_entry(d, pdev, gtable, entry);
> + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
>
> found:
> atomic_inc(&entry->refcnt);
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-23 10:54 ` Joby Poriyath
@ 2013-07-23 13:21 ` Andrew Cooper
2013-07-23 13:28 ` Konrad Rzeszutek Wilk
1 sibling, 0 replies; 20+ messages in thread
From: Andrew Cooper @ 2013-07-23 13:21 UTC (permalink / raw)
To: Joby Poriyath; +Cc: malcolm.crossley, keir, Ian.Campbell, JBeulich, xen-devel
On 23/07/13 11:54, Joby Poriyath wrote:
> Ping...
>
> Could you kindly review this?
>
> Many thanks,
> Joby
>
> On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
>> Guest needs the ability to enable and disable MSI-X interrupts
>> by setting the MSI-X control bit. Currently, a write to MSI-X
>> mask bit by the guest is silently ignored.
>>
>> A likely scenario is where we have a 82599 SR-IOV nic passed
>> through to a guest. From the guest if you do
>>
>> ifconfig <ETH_DEV> down
>> ifconfig <ETH_DEV> up
>>
>> the interrupts remain masked. The the mask bit for the VF is
>> being set by the PF performing a reset (at the request of the VF).
>> However, interrupts are enabled by VF driver by clearing the mask
>> bit by writing directly to BAR3 region containing the MSI-X table.
>>
>> From dom0, we can verify that
>> interrupts are being masked using 'xl debug-keys M'.
>>
>> Initially, guest was allowed to modify MSI-X bit.
>> Later this behaviour was changed.
>> See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
>>
>> Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
This should be backported to all stable versions (perhaps including 4.1)
as the buggy changeset 74c213c50 was backported in the name of
security/general fixes.
~Andrew
>> ---
>> xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
>> 1 file changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 36de312..97d9f93 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -169,6 +169,7 @@ struct msixtbl_entry
>> uint32_t msi_ad[3]; /* Shadow of address low, high and data */
>> } gentries[MAX_MSIX_ACC_ENTRIES];
>> struct rcu_head rcu;
>> + struct pirq *pirq;
>> };
>>
>> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
>> @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>> void *virt;
>> unsigned int nr_entry, index;
>> int r = X86EMUL_UNHANDLEABLE;
>> + unsigned long flags;
>> + struct irq_desc *desc;
>> + unsigned long orig;
>>
>> if ( len != 4 || (address & 3) )
>> return r;
>> @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
>> if ( !virt )
>> goto out;
>>
>> - /* Do not allow the mask bit to be changed. */
>> -#if 0 /* XXX
>> - * As the mask bit is the only defined bit in the word, and as the
>> - * host MSI-X code doesn't preserve the other bits anyway, doing
>> - * this is pointless. So for now just discard the write (also
>> - * saving us from having to determine the matching irq_desc).
>> - */
>> - spin_lock_irqsave(&desc->lock, flags);
>> + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
>> + if ( !desc )
>> + goto out;
>> +
>> + /* The mask bit is the only defined bit in the word. But we
>> + * ought to preserve the reserved bits. Clearing the reserved
>> + * bits can result in undefined behaviour (see PCI Local Bus
>> + * Specification revision 2.3).
>> + */
>> orig = readl(virt);
>> - val &= ~PCI_MSIX_VECTOR_BITMASK;
>> - val |= orig & PCI_MSIX_VECTOR_BITMASK;
>> + val &= PCI_MSIX_VECTOR_BITMASK;
>> + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
>> writel(val, virt);
>> spin_unlock_irqrestore(&desc->lock, flags);
>> -#endif
>>
>> r = X86EMUL_OKAY;
>> out:
>> @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
>> static void add_msixtbl_entry(struct domain *d,
>> struct pci_dev *pdev,
>> uint64_t gtable,
>> - struct msixtbl_entry *entry)
>> + struct msixtbl_entry *entry,
>> + struct pirq *pirq)
>> {
>> u32 len;
>>
>> @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
>> entry->table_len = len;
>> entry->pdev = pdev;
>> entry->gtable = (unsigned long) gtable;
>> + entry->pirq = pirq;
>>
>> list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
>> }
>> @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
>>
>> entry = new_entry;
>> new_entry = NULL;
>> - add_msixtbl_entry(d, pdev, gtable, entry);
>> + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
>>
>> found:
>> atomic_inc(&entry->refcnt);
>> --
>> 1.7.10.4
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-23 10:54 ` Joby Poriyath
2013-07-23 13:21 ` Andrew Cooper
@ 2013-07-23 13:28 ` Konrad Rzeszutek Wilk
2013-07-23 17:59 ` Joby Poriyath
1 sibling, 1 reply; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-23 13:28 UTC (permalink / raw)
To: Joby Poriyath
Cc: keir, Ian.Campbell, andrew.cooper3, xen-devel, JBeulich,
malcolm.crossley
On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
> Ping...
>
> Could you kindly review this?
I believe Malcom reviewed and sent you an email with his response.
Are you waiting for Andrew to look at it as well? Or Jan? Jan is
out on vacation for a couple of weeks.
Anyhow if it is Andrew that you are asking to look over it I would
suggest you put his name on the 'To' in the email in case he is
using filters to stick non-To emails in some 'lower priority' mailbox.
>
> Many thanks,
> Joby
>
> On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
> > Guest needs the ability to enable and disable MSI-X interrupts
> > by setting the MSI-X control bit. Currently, a write to MSI-X
> > mask bit by the guest is silently ignored.
> >
> > A likely scenario is where we have a 82599 SR-IOV nic passed
> > through to a guest. From the guest if you do
> >
> > ifconfig <ETH_DEV> down
> > ifconfig <ETH_DEV> up
> >
> > the interrupts remain masked. The the mask bit for the VF is
> > being set by the PF performing a reset (at the request of the VF).
> > However, interrupts are enabled by VF driver by clearing the mask
> > bit by writing directly to BAR3 region containing the MSI-X table.
> >
> > From dom0, we can verify that
> > interrupts are being masked using 'xl debug-keys M'.
> >
> > Initially, guest was allowed to modify MSI-X bit.
> > Later this behaviour was changed.
> > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
> >
> > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
> > ---
> > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> > 1 file changed, 19 insertions(+), 13 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 36de312..97d9f93 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -169,6 +169,7 @@ struct msixtbl_entry
> > uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> > } gentries[MAX_MSIX_ACC_ENTRIES];
> > struct rcu_head rcu;
> > + struct pirq *pirq;
> > };
> >
> > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > void *virt;
> > unsigned int nr_entry, index;
> > int r = X86EMUL_UNHANDLEABLE;
> > + unsigned long flags;
> > + struct irq_desc *desc;
> > + unsigned long orig;
> >
> > if ( len != 4 || (address & 3) )
> > return r;
> > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > if ( !virt )
> > goto out;
> >
> > - /* Do not allow the mask bit to be changed. */
> > -#if 0 /* XXX
> > - * As the mask bit is the only defined bit in the word, and as the
> > - * host MSI-X code doesn't preserve the other bits anyway, doing
> > - * this is pointless. So for now just discard the write (also
> > - * saving us from having to determine the matching irq_desc).
> > - */
> > - spin_lock_irqsave(&desc->lock, flags);
> > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> > + if ( !desc )
> > + goto out;
> > +
> > + /* The mask bit is the only defined bit in the word. But we
> > + * ought to preserve the reserved bits. Clearing the reserved
> > + * bits can result in undefined behaviour (see PCI Local Bus
> > + * Specification revision 2.3).
> > + */
> > orig = readl(virt);
> > - val &= ~PCI_MSIX_VECTOR_BITMASK;
> > - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> > + val &= PCI_MSIX_VECTOR_BITMASK;
> > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> > writel(val, virt);
> > spin_unlock_irqrestore(&desc->lock, flags);
> > -#endif
> >
> > r = X86EMUL_OKAY;
> > out:
> > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> > static void add_msixtbl_entry(struct domain *d,
> > struct pci_dev *pdev,
> > uint64_t gtable,
> > - struct msixtbl_entry *entry)
> > + struct msixtbl_entry *entry,
> > + struct pirq *pirq)
> > {
> > u32 len;
> >
> > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> > entry->table_len = len;
> > entry->pdev = pdev;
> > entry->gtable = (unsigned long) gtable;
> > + entry->pirq = pirq;
> >
> > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> > }
> > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
> >
> > entry = new_entry;
> > new_entry = NULL;
> > - add_msixtbl_entry(d, pdev, gtable, entry);
> > + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
> >
> > found:
> > atomic_inc(&entry->refcnt);
> > --
> > 1.7.10.4
> >
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-23 13:28 ` Konrad Rzeszutek Wilk
@ 2013-07-23 17:59 ` Joby Poriyath
2013-08-05 10:44 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: Joby Poriyath @ 2013-07-23 17:59 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: keir, Ian.Campbell, andrew.cooper3, xen-devel, JBeulich,
malcolm.crossley
Hi Konrad,
On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
> > Ping...
> >
> > Could you kindly review this?
>
> I believe Malcom reviewed and sent you an email with his response.
Andrew and Malcolm has reviewed the code.
>
> Are you waiting for Andrew to look at it as well? Or Jan? Jan is
> out on vacation for a couple of weeks.
I was hoping that Jan would also review this, since this patch
partly reverts his code.
This patch makes an assumption that, for a passed through device,
xen needs to mask the MSI-X control bit only when it does vcpu
migration. Otherwise xen leaves it in the state guest has set it.
>
> Anyhow if it is Andrew that you are asking to look over it I would
> suggest you put his name on the 'To' in the email in case he is
> using filters to stick non-To emails in some 'lower priority' mailbox.
>
> >
> > Many thanks,
> > Joby
> >
Thanks,
Joby
> > On Fri, Jul 19, 2013 at 04:07:37PM +0100, Joby Poriyath wrote:
> > > Guest needs the ability to enable and disable MSI-X interrupts
> > > by setting the MSI-X control bit. Currently, a write to MSI-X
> > > mask bit by the guest is silently ignored.
> > >
> > > A likely scenario is where we have a 82599 SR-IOV nic passed
> > > through to a guest. From the guest if you do
> > >
> > > ifconfig <ETH_DEV> down
> > > ifconfig <ETH_DEV> up
> > >
> > > the interrupts remain masked. The the mask bit for the VF is
> > > being set by the PF performing a reset (at the request of the VF).
> > > However, interrupts are enabled by VF driver by clearing the mask
> > > bit by writing directly to BAR3 region containing the MSI-X table.
> > >
> > > From dom0, we can verify that
> > > interrupts are being masked using 'xl debug-keys M'.
> > >
> > > Initially, guest was allowed to modify MSI-X bit.
> > > Later this behaviour was changed.
> > > See changeset 74c213c506afcd74a8556dd092995fd4dc38b225.
> > >
> > > Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
> > > ---
> > > xen/arch/x86/hvm/vmsi.c | 32 +++++++++++++++++++-------------
> > > 1 file changed, 19 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > > index 36de312..97d9f93 100644
> > > --- a/xen/arch/x86/hvm/vmsi.c
> > > +++ b/xen/arch/x86/hvm/vmsi.c
> > > @@ -169,6 +169,7 @@ struct msixtbl_entry
> > > uint32_t msi_ad[3]; /* Shadow of address low, high and data */
> > > } gentries[MAX_MSIX_ACC_ENTRIES];
> > > struct rcu_head rcu;
> > > + struct pirq *pirq;
> > > };
> > >
> > > static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> > > @@ -254,6 +255,9 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > > void *virt;
> > > unsigned int nr_entry, index;
> > > int r = X86EMUL_UNHANDLEABLE;
> > > + unsigned long flags;
> > > + struct irq_desc *desc;
> > > + unsigned long orig;
> > >
> > > if ( len != 4 || (address & 3) )
> > > return r;
> > > @@ -283,20 +287,20 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
> > > if ( !virt )
> > > goto out;
> > >
> > > - /* Do not allow the mask bit to be changed. */
> > > -#if 0 /* XXX
> > > - * As the mask bit is the only defined bit in the word, and as the
> > > - * host MSI-X code doesn't preserve the other bits anyway, doing
> > > - * this is pointless. So for now just discard the write (also
> > > - * saving us from having to determine the matching irq_desc).
> > > - */
> > > - spin_lock_irqsave(&desc->lock, flags);
> > > + desc = pirq_spin_lock_irq_desc(entry->pirq, &flags);
> > > + if ( !desc )
> > > + goto out;
> > > +
> > > + /* The mask bit is the only defined bit in the word. But we
> > > + * ought to preserve the reserved bits. Clearing the reserved
> > > + * bits can result in undefined behaviour (see PCI Local Bus
> > > + * Specification revision 2.3).
> > > + */
> > > orig = readl(virt);
> > > - val &= ~PCI_MSIX_VECTOR_BITMASK;
> > > - val |= orig & PCI_MSIX_VECTOR_BITMASK;
> > > + val &= PCI_MSIX_VECTOR_BITMASK;
> > > + val |= ( orig & ~PCI_MSIX_VECTOR_BITMASK );
> > > writel(val, virt);
> > > spin_unlock_irqrestore(&desc->lock, flags);
> > > -#endif
> > >
> > > r = X86EMUL_OKAY;
> > > out:
> > > @@ -328,7 +332,8 @@ const struct hvm_mmio_handler msixtbl_mmio_handler = {
> > > static void add_msixtbl_entry(struct domain *d,
> > > struct pci_dev *pdev,
> > > uint64_t gtable,
> > > - struct msixtbl_entry *entry)
> > > + struct msixtbl_entry *entry,
> > > + struct pirq *pirq)
> > > {
> > > u32 len;
> > >
> > > @@ -342,6 +347,7 @@ static void add_msixtbl_entry(struct domain *d,
> > > entry->table_len = len;
> > > entry->pdev = pdev;
> > > entry->gtable = (unsigned long) gtable;
> > > + entry->pirq = pirq;
> > >
> > > list_add_rcu(&entry->list, &d->arch.hvm_domain.msixtbl_list);
> > > }
> > > @@ -404,7 +410,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
> > >
> > > entry = new_entry;
> > > new_entry = NULL;
> > > - add_msixtbl_entry(d, pdev, gtable, entry);
> > > + add_msixtbl_entry(d, pdev, gtable, entry, pirq);
> > >
> > > found:
> > > atomic_inc(&entry->refcnt);
> > > --
> > > 1.7.10.4
> > >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-07-23 17:59 ` Joby Poriyath
@ 2013-08-05 10:44 ` Jan Beulich
2013-08-05 11:01 ` Andrew Cooper
0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2013-08-05 10:44 UTC (permalink / raw)
To: Joby Poriyath
Cc: keir, Ian.Campbell, andrew.cooper3, xen-devel, malcolm.crossley
>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
>> > Ping...
>> >
>> > Could you kindly review this?
>>
>> I believe Malcom reviewed and sent you an email with his response.
>
> Andrew and Malcolm has reviewed the code.
>
>>
>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is
>> out on vacation for a couple of weeks.
>
> I was hoping that Jan would also review this, since this patch
> partly reverts his code.
>
> This patch makes an assumption that, for a passed through device,
> xen needs to mask the MSI-X control bit only when it does vcpu
> migration. Otherwise xen leaves it in the state guest has set it.
Exactly. And hence the patch is wrong, re-introducing a security
issue. If you need to give the guest control over the _virtual_
mask bit, you need to add proper emulation / masking of the
delivery of the virtual interrupt (i.e. the physical mask bit should
always be under the control of Xen, but interrupts should not be
delivered to the guest when the virtual mask bit is set).
In any event - if the solution was as simple as what you propose,
I would have implemented it right away rather than leaving the
code known broken (which in fact was partly based on the
observation that KVM/QEMU - at least at that time - got away
without properly emulating the guest attempts to control this bit
as well).
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-05 10:44 ` Jan Beulich
@ 2013-08-05 11:01 ` Andrew Cooper
2013-08-05 11:49 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2013-08-05 11:01 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
On 05/08/13 11:44, Jan Beulich wrote:
>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote:
>> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
>>>> Ping...
>>>>
>>>> Could you kindly review this?
>>> I believe Malcom reviewed and sent you an email with his response.
>> Andrew and Malcolm has reviewed the code.
>>
>>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is
>>> out on vacation for a couple of weeks.
>> I was hoping that Jan would also review this, since this patch
>> partly reverts his code.
>>
>> This patch makes an assumption that, for a passed through device,
>> xen needs to mask the MSI-X control bit only when it does vcpu
>> migration. Otherwise xen leaves it in the state guest has set it.
> Exactly. And hence the patch is wrong,
Unfortunately it is not.
The whole problem is that the VF can (and does) request that the PF
resets itself (using an out-of-band mailbox system in the hardware
itself), causing the mask bit to change (i.e. get set) without Xen's
knowledge.
At this point, it is only the guest which can rectify the situation and
clear the mask bit.
I would agree that the solution is not perfect, but SRIOV virtual
functions rely on the ability to clear the mask bit, whether or not it
is a security issue. This means that your patch to disable writing to
the mask bit has caused a functional regression in all SRIOV operations
in Xen.
I would say that the less bad alternative is to reintroduce the ability
to change the mask bit, and fix the regression, then work out how to fix
the security aspect.
~Andrew
> re-introducing a security
> issue. If you need to give the guest control over the _virtual_
> mask bit, you need to add proper emulation / masking of the
> delivery of the virtual interrupt (i.e. the physical mask bit should
> always be under the control of Xen, but interrupts should not be
> delivered to the guest when the virtual mask bit is set).
>
> In any event - if the solution was as simple as what you propose,
> I would have implemented it right away rather than leaving the
> code known broken (which in fact was partly based on the
> observation that KVM/QEMU - at least at that time - got away
> without properly emulating the guest attempts to control this bit
> as well).
>
> Jan
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-05 11:01 ` Andrew Cooper
@ 2013-08-05 11:49 ` Jan Beulich
2013-08-05 16:03 ` Andrew Cooper
0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2013-08-05 11:49 UTC (permalink / raw)
To: Andrew Cooper
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
>>> On 05.08.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/08/13 11:44, Jan Beulich wrote:
>>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote:
>>> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
>>>>> Ping...
>>>>>
>>>>> Could you kindly review this?
>>>> I believe Malcom reviewed and sent you an email with his response.
>>> Andrew and Malcolm has reviewed the code.
>>>
>>>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is
>>>> out on vacation for a couple of weeks.
>>> I was hoping that Jan would also review this, since this patch
>>> partly reverts his code.
>>>
>>> This patch makes an assumption that, for a passed through device,
>>> xen needs to mask the MSI-X control bit only when it does vcpu
>>> migration. Otherwise xen leaves it in the state guest has set it.
>> Exactly. And hence the patch is wrong,
>
> Unfortunately it is not.
>
> The whole problem is that the VF can (and does) request that the PF
> resets itself (using an out-of-band mailbox system in the hardware
> itself), causing the mask bit to change (i.e. get set) without Xen's
> knowledge.
The at the very least such a code change should be done for VFs
only. But I'd object to that too.
> At this point, it is only the guest which can rectify the situation and
> clear the mask bit.
>
> I would agree that the solution is not perfect, but SRIOV virtual
> functions rely on the ability to clear the mask bit, whether or not it
> is a security issue. This means that your patch to disable writing to
> the mask bit has caused a functional regression in all SRIOV operations
> in Xen.
Only in the special case that was described in the patch header
afaict (or else I'm sure we would have known much earlier).
> I would say that the less bad alternative is to reintroduce the ability
> to change the mask bit, and fix the regression, then work out how to fix
> the security aspect.
No, my position is pretty clear here: No re-introduction of security
issues when fixing secondary problems.
The more that the immediate (but only partial) solution would be quite
clear: If the guest wants to clear the flag and Xen thinks the flag is
clear, it could certainly allow the write. The security aspect here is
when the guest wants to clear the flag while Xen expects it to be set.
But a proper solution would of course be preferred. That may involve
notifying Xen of the reset from the PF driver.
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-05 11:49 ` Jan Beulich
@ 2013-08-05 16:03 ` Andrew Cooper
2013-08-06 8:29 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2013-08-05 16:03 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
On 05/08/13 12:49, Jan Beulich wrote:
>>>> On 05.08.13 at 13:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 05/08/13 11:44, Jan Beulich wrote:
>>>>>> On 23.07.13 at 19:59, Joby Poriyath <joby.poriyath@citrix.com> wrote:
>>>> On Tue, Jul 23, 2013 at 09:28:42AM -0400, Konrad Rzeszutek Wilk wrote:
>>>>> On Tue, Jul 23, 2013 at 11:54:41AM +0100, Joby Poriyath wrote:
>>>>>> Ping...
>>>>>>
>>>>>> Could you kindly review this?
>>>>> I believe Malcom reviewed and sent you an email with his response.
>>>> Andrew and Malcolm has reviewed the code.
>>>>
>>>>> Are you waiting for Andrew to look at it as well? Or Jan? Jan is
>>>>> out on vacation for a couple of weeks.
>>>> I was hoping that Jan would also review this, since this patch
>>>> partly reverts his code.
>>>>
>>>> This patch makes an assumption that, for a passed through device,
>>>> xen needs to mask the MSI-X control bit only when it does vcpu
>>>> migration. Otherwise xen leaves it in the state guest has set it.
>>> Exactly. And hence the patch is wrong,
>> Unfortunately it is not.
>>
>> The whole problem is that the VF can (and does) request that the PF
>> resets itself (using an out-of-band mailbox system in the hardware
>> itself), causing the mask bit to change (i.e. get set) without Xen's
>> knowledge.
> The at the very least such a code change should be done for VFs
> only. But I'd object to that too.
>
>> At this point, it is only the guest which can rectify the situation and
>> clear the mask bit.
>>
>> I would agree that the solution is not perfect, but SRIOV virtual
>> functions rely on the ability to clear the mask bit, whether or not it
>> is a security issue. This means that your patch to disable writing to
>> the mask bit has caused a functional regression in all SRIOV operations
>> in Xen.
> Only in the special case that was described in the patch header
> afaict (or else I'm sure we would have known much earlier).
No - this unconditionally breaks any attempt to use an SRIOV virtual
function by a VF-aware driver. This includes FreeBSD and SLES VMs, and
results in a complete loss of interrupts, because the mask bit has been
set without Xen's knowledge, and nothing ever clears it, as the guests
attempt to clear it gets intercepted and discarded.
This was discovered as a regression from XenServer 6.1 to 6.2, and was
identified specifically this change which was backported into Xen 4.1.5
>
>> I would say that the less bad alternative is to reintroduce the ability
>> to change the mask bit, and fix the regression, then work out how to fix
>> the security aspect.
> No, my position is pretty clear here: No re-introduction of security
> issues when fixing secondary problems.
Then we have no option but to state that SRIOV is completely broken
under Xen (including all stable trees) until we have a fix.
>
> The more that the immediate (but only partial) solution would be quite
> clear: If the guest wants to clear the flag and Xen thinks the flag is
> clear, it could certainly allow the write. The security aspect here is
> when the guest wants to clear the flag while Xen expects it to be set.
>
> But a proper solution would of course be preferred. That may involve
> notifying Xen of the reset from the PF driver.
>
> Jan
>
That is making the assumption that the PF driver is even aware that the
reset has occurred. If the PF driver can get at the reset information,
I still cant see Xen specific hooks being accepted upstream.
Fundamentally, given that only the VM is in a position to actually know
about the state of the mask bit (As we cant possibly have Xen polling
for the state), then the guest has to have access to the mask bit.
~Andrew
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-05 16:03 ` Andrew Cooper
@ 2013-08-06 8:29 ` Jan Beulich
2013-08-06 9:52 ` Andrew Cooper
2013-08-13 17:08 ` Joby Poriyath
0 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2013-08-06 8:29 UTC (permalink / raw)
To: Andrew Cooper
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/08/13 12:49, Jan Beulich wrote:
>> But a proper solution would of course be preferred. That may involve
>> notifying Xen of the reset from the PF driver.
>
> That is making the assumption that the PF driver is even aware that the
> reset has occurred. If the PF driver can get at the reset information,
> I still cant see Xen specific hooks being accepted upstream.
You sort of contradict your earlier statement here that "the VF
requests the PF to reset itself" (where you probably meant "it",
as the VF is hardly allowed to cause a reset of the PF): Such an
operation should imo be visible to the PF driver.
And then, even if the reset is being done, shouldn't the interrupt
setup occur _after_ the reset? Which would make Xen clear the
flag...
> Fundamentally, given that only the VM is in a position to actually know
> about the state of the mask bit (As we cant possibly have Xen polling
> for the state), then the guest has to have access to the mask bit.
No, that continues to not be an option as long as Xen uses the mask
bit itself. Once again, the obvious immediate solution would appear
to be to carry out the write with the OR of both the guest intended
and the hypervisor required mask bits, thus allowing the bit to get
cleared immediately if Xen doesn't need it to be set (and the clearing
would happen subsequently if Xen needs the flag to be set at the
point of the guest write). This requires associating back the table
entry to the IRQ in question (if any), in order to look at
irq_desc->msi_desc->msi_attrib.masked.
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 8:29 ` Jan Beulich
@ 2013-08-06 9:52 ` Andrew Cooper
2013-08-06 10:17 ` Jan Beulich
2013-08-06 10:17 ` Pasi Kärkkäinen
2013-08-13 17:08 ` Joby Poriyath
1 sibling, 2 replies; 20+ messages in thread
From: Andrew Cooper @ 2013-08-06 9:52 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
On 06/08/13 09:29, Jan Beulich wrote:
>>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 05/08/13 12:49, Jan Beulich wrote:
>>> But a proper solution would of course be preferred. That may involve
>>> notifying Xen of the reset from the PF driver.
>> That is making the assumption that the PF driver is even aware that the
>> reset has occurred. If the PF driver can get at the reset information,
>> I still cant see Xen specific hooks being accepted upstream.
> You sort of contradict your earlier statement here that "the VF
> requests the PF to reset itself" (where you probably meant "it",
> as the VF is hardly allowed to cause a reset of the PF): Such an
> operation should imo be visible to the PF driver.
>
> And then, even if the reset is being done, shouldn't the interrupt
> setup occur _after_ the reset? Which would make Xen clear the
> flag...
The observed behaviour is this: On startup, the VF driver issues this
backchannel reset of itself (the VF) which, amongst other things, sets
the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but
on further consideration, relying on this behaviour might not be a good
idea. The VF is then expected to re-enable the interrupt at its
convenience.
There are no obvious hooks in the PF driver to receive notification of
these backchannel resets.
~Andrew
>
>> Fundamentally, given that only the VM is in a position to actually know
>> about the state of the mask bit (As we cant possibly have Xen polling
>> for the state), then the guest has to have access to the mask bit.
> No, that continues to not be an option as long as Xen uses the mask
> bit itself. Once again, the obvious immediate solution would appear
> to be to carry out the write with the OR of both the guest intended
> and the hypervisor required mask bits, thus allowing the bit to get
> cleared immediately if Xen doesn't need it to be set (and the clearing
> would happen subsequently if Xen needs the flag to be set at the
> point of the guest write). This requires associating back the table
> entry to the IRQ in question (if any), in order to look at
> irq_desc->msi_desc->msi_attrib.masked.
>
> Jan
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 9:52 ` Andrew Cooper
@ 2013-08-06 10:17 ` Jan Beulich
2013-08-06 10:17 ` Pasi Kärkkäinen
1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2013-08-06 10:17 UTC (permalink / raw)
To: Andrew Cooper
Cc: keir, Ian.Campbell, xen-devel, Joby Poriyath, malcolm.crossley
>>> On 06.08.13 at 11:52, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 06/08/13 09:29, Jan Beulich wrote:
>>>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 05/08/13 12:49, Jan Beulich wrote:
>>>> But a proper solution would of course be preferred. That may involve
>>>> notifying Xen of the reset from the PF driver.
>>> That is making the assumption that the PF driver is even aware that the
>>> reset has occurred. If the PF driver can get at the reset information,
>>> I still cant see Xen specific hooks being accepted upstream.
>> You sort of contradict your earlier statement here that "the VF
>> requests the PF to reset itself" (where you probably meant "it",
>> as the VF is hardly allowed to cause a reset of the PF): Such an
>> operation should imo be visible to the PF driver.
>>
>> And then, even if the reset is being done, shouldn't the interrupt
>> setup occur _after_ the reset? Which would make Xen clear the
>> flag...
>
> The observed behaviour is this: On startup, the VF driver issues this
> backchannel reset of itself (the VF) which, amongst other things, sets
> the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but
> on further consideration, relying on this behaviour might not be a good
> idea. The VF is then expected to re-enable the interrupt at its
> convenience.
Re-enable? Not set up? That would at once deal with your valid
concern regarding MSI addr/data.
This still looks to me like a device/driver specific bug rather than a
general SR-IOV issue...
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 9:52 ` Andrew Cooper
2013-08-06 10:17 ` Jan Beulich
@ 2013-08-06 10:17 ` Pasi Kärkkäinen
2013-08-06 13:11 ` Pasi Kärkkäinen
1 sibling, 1 reply; 20+ messages in thread
From: Pasi Kärkkäinen @ 2013-08-06 10:17 UTC (permalink / raw)
To: Andrew Cooper
Cc: keir, Ian.Campbell, xen-devel, Jan Beulich, Joby Poriyath,
malcolm.crossley
On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote:
> On 06/08/13 09:29, Jan Beulich wrote:
> >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> On 05/08/13 12:49, Jan Beulich wrote:
> >>> But a proper solution would of course be preferred. That may involve
> >>> notifying Xen of the reset from the PF driver.
> >> That is making the assumption that the PF driver is even aware that the
> >> reset has occurred. If the PF driver can get at the reset information,
> >> I still cant see Xen specific hooks being accepted upstream.
> > You sort of contradict your earlier statement here that "the VF
> > requests the PF to reset itself" (where you probably meant "it",
> > as the VF is hardly allowed to cause a reset of the PF): Such an
> > operation should imo be visible to the PF driver.
> >
> > And then, even if the reset is being done, shouldn't the interrupt
> > setup occur _after_ the reset? Which would make Xen clear the
> > flag...
>
> The observed behaviour is this: On startup, the VF driver issues this
> backchannel reset of itself (the VF) which, amongst other things, sets
> the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but
> on further consideration, relying on this behaviour might not be a good
> idea. The VF is then expected to re-enable the interrupt at its
> convenience.
>
> There are no obvious hooks in the PF driver to receive notification of
> these backchannel resets.
>
Doesn't the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself?
-- Pasi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 10:17 ` Pasi Kärkkäinen
@ 2013-08-06 13:11 ` Pasi Kärkkäinen
2013-08-13 17:37 ` Joby Poriyath
0 siblings, 1 reply; 20+ messages in thread
From: Pasi Kärkkäinen @ 2013-08-06 13:11 UTC (permalink / raw)
To: Andrew Cooper
Cc: keir, Ian.Campbell, xen-devel, Jan Beulich, Joby Poriyath,
malcolm.crossley
On Tue, Aug 06, 2013 at 01:17:35PM +0300, Pasi Kärkkäinen wrote:
> On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote:
> > On 06/08/13 09:29, Jan Beulich wrote:
> > >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >> On 05/08/13 12:49, Jan Beulich wrote:
> > >>> But a proper solution would of course be preferred. That may involve
> > >>> notifying Xen of the reset from the PF driver.
> > >> That is making the assumption that the PF driver is even aware that the
> > >> reset has occurred. If the PF driver can get at the reset information,
> > >> I still cant see Xen specific hooks being accepted upstream.
> > > You sort of contradict your earlier statement here that "the VF
> > > requests the PF to reset itself" (where you probably meant "it",
> > > as the VF is hardly allowed to cause a reset of the PF): Such an
> > > operation should imo be visible to the PF driver.
> > >
> > > And then, even if the reset is being done, shouldn't the interrupt
> > > setup occur _after_ the reset? Which would make Xen clear the
> > > flag...
> >
> > The observed behaviour is this: On startup, the VF driver issues this
> > backchannel reset of itself (the VF) which, amongst other things, sets
> > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but
> > on further consideration, relying on this behaviour might not be a good
> > idea. The VF is then expected to re-enable the interrupt at its
> > convenience.
> >
> > There are no obvious hooks in the PF driver to receive notification of
> > these backchannel resets.
> >
>
> Doesn't the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself?
>
At least I've seen such messages from the Intel igxbe PF driver.. or did I misunderstand something?
-- Pasi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 13:11 ` Pasi Kärkkäinen
@ 2013-08-13 17:37 ` Joby Poriyath
2013-08-14 10:11 ` Jan Beulich
0 siblings, 1 reply; 20+ messages in thread
From: Joby Poriyath @ 2013-08-13 17:37 UTC (permalink / raw)
To: Pasi Kärkkäinen
Cc: Andrew Cooper, malcolm.crossley, Jan Beulich, xen-devel
On Tue, Aug 06, 2013 at 04:11:56PM +0300, Pasi Kärkkäinen wrote:
> On Tue, Aug 06, 2013 at 01:17:35PM +0300, Pasi Kärkkäinen wrote:
> > On Tue, Aug 06, 2013 at 10:52:58AM +0100, Andrew Cooper wrote:
> > > On 06/08/13 09:29, Jan Beulich wrote:
> > > >>>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > > >> On 05/08/13 12:49, Jan Beulich wrote:
> > > >>> But a proper solution would of course be preferred. That may involve
> > > >>> notifying Xen of the reset from the PF driver.
> > > >> That is making the assumption that the PF driver is even aware that the
> > > >> reset has occurred. If the PF driver can get at the reset information,
> > > >> I still cant see Xen specific hooks being accepted upstream.
> > > > You sort of contradict your earlier statement here that "the VF
> > > > requests the PF to reset itself" (where you probably meant "it",
> > > > as the VF is hardly allowed to cause a reset of the PF): Such an
> > > > operation should imo be visible to the PF driver.
> > > >
> > > > And then, even if the reset is being done, shouldn't the interrupt
> > > > setup occur _after_ the reset? Which would make Xen clear the
> > > > flag...
> > >
> > > The observed behaviour is this: On startup, the VF driver issues this
> > > backchannel reset of itself (the VF) which, amongst other things, sets
> > > the mask bit. It leaves the MSI/MSI-X addr and data fields alone, but
> > > on further consideration, relying on this behaviour might not be a good
> > > idea. The VF is then expected to re-enable the interrupt at its
> > > convenience.
> > >
> > > There are no obvious hooks in the PF driver to receive notification of
> > > these backchannel resets.
> > >
> >
> > Doesn't the PF driver log some entries to dom0 kernel dmesg when the VF in domU resets itself?
> >
>
> At least I've seen such messages from the Intel igxbe PF driver.. or did I misunderstand something?
>
Looking at the ixgbevf code in kernel (tip), reset function is
ixgbevf_reset_hw_vf.
The reset happens with
IXGBE_WRITE_REG(hw, IXGBE_VFCTRL, IXGBE_CTRL_RST);
This will in turn mask the MSI-X vector. The controller does this automagically.
<snip> from intel 82599 controller spec
MSI–X Vector Control — MSIXTVCTRL (BAR3: 0xC + 0x10*n, n=0...255; RW)
Bits Default Type
0 1b RW
Description
-----------
Mask Bit.
When this bit is set, the function is prohibited from sending a message using this
MSI-X table entry. However, any other MSI-X table entries programmed with the same
vector are still capable of sending an equivalent message unless they are also
masked. This bit’s state after reset is 1b (entry is masked).
This bit is read/write.
<snip>
After the reset, VF sends a message to PF informing PF about the reset.
Guest was allowed to modify the MSI-X control bit to reduce the load on dom0.
Otherwise this would have to be emulated by qemu-dm. I found a paper
which explains this in detail (http://zhenxiao.com/read/SR-IOV.pdf).
> -- Pasi
>
Joby
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-13 17:37 ` Joby Poriyath
@ 2013-08-14 10:11 ` Jan Beulich
0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2013-08-14 10:11 UTC (permalink / raw)
To: Joby Poriyath, pasik; +Cc: Andrew Cooper, malcolm.crossley, xen-devel
>>> On 13.08.13 at 19:37, Joby Poriyath <joby.poriyath@citrix.com> wrote:
> Guest was allowed to modify the MSI-X control bit to reduce the load on
> dom0.
> Otherwise this would have to be emulated by qemu-dm. I found a paper
> which explains this in detail (http://zhenxiao.com/read/SR-IOV.pdf).
Without going through it in full, I wasn't able to find any mention
of the guest being granted to the MSI-X mask bit (searching for
all of "MSI-X", "mask", and "bit" individually). Which would anyway
be more consistent with MSI, where the mask bit in PCI config
space, and hence the guest can't be allowed direct access in any
case.
Jan
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] interrupts: allow guest to set and clear MSI-X mask bit
2013-08-06 8:29 ` Jan Beulich
2013-08-06 9:52 ` Andrew Cooper
@ 2013-08-13 17:08 ` Joby Poriyath
1 sibling, 0 replies; 20+ messages in thread
From: Joby Poriyath @ 2013-08-13 17:08 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, Andrew Cooper, xen-devel, malcolm.crossley
Hi Jan,
On Tue, Aug 06, 2013 at 09:29:22AM +0100, Jan Beulich wrote:
> >>> On 05.08.13 at 18:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > On 05/08/13 12:49, Jan Beulich wrote:
> >> But a proper solution would of course be preferred. That may involve
> >> notifying Xen of the reset from the PF driver.
> >
> > That is making the assumption that the PF driver is even aware that the
> > reset has occurred. If the PF driver can get at the reset information,
> > I still cant see Xen specific hooks being accepted upstream.
>
> You sort of contradict your earlier statement here that "the VF
> requests the PF to reset itself" (where you probably meant "it",
> as the VF is hardly allowed to cause a reset of the PF): Such an
> operation should imo be visible to the PF driver.
>
> And then, even if the reset is being done, shouldn't the interrupt
> setup occur _after_ the reset? Which would make Xen clear the
> flag...
>
> > Fundamentally, given that only the VM is in a position to actually know
> > about the state of the mask bit (As we cant possibly have Xen polling
> > for the state), then the guest has to have access to the mask bit.
>
> No, that continues to not be an option as long as Xen uses the mask
> bit itself. Once again, the obvious immediate solution would appear
> to be to carry out the write with the OR of both the guest intended
> and the hypervisor required mask bits, thus allowing the bit to get
> cleared immediately if Xen doesn't need it to be set (and the clearing
> would happen subsequently if Xen needs the flag to be set at the
> point of the guest write). This requires associating back the table
> entry to the IRQ in question (if any), in order to look at
> irq_desc->msi_desc->msi_attrib.masked.
Thanks for the suggestion.
The reason why guest was allowed to modify MSIX control bit, was to reduce
the load on qemu-dm (see http://zhenxiao.com/read/SR-IOV.pdf).
I'll post the updated patch.
>
> Jan
>
Thanks,
Joby
^ permalink raw reply [flat|nested] 20+ messages in thread