* [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
@ 2018-11-15 1:10 Chao Gao
2018-11-15 10:40 ` Roger Pau Monné
2018-12-11 17:03 ` Jan Beulich
0 siblings, 2 replies; 12+ messages in thread
From: Chao Gao @ 2018-11-15 1:10 UTC (permalink / raw)
To: xen-devel
Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich,
Chao Gao
I find some pass-thru devices don't work any more across guest
reboot. Assigning it to another domain also meets the same issue. And
the only way to make it work again is un-binding and binding it to
pciback. Someone reported this issue one year ago [1].
If the device's driver doesn't disable MSI-X during shutdown or qemu is
killed/crashed before the domain shutdown, this domain's pirq won't be
unmapped. Then xen will unmap all pirq. But pciback has already disabled
meory decoding before xen unmapping pirq. Then when Xen is disabling a
MSI of the device, it has to sets the host_maskall flag and maskall bit
to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
this process is:
->arch_domain_destroy
->free_domain_pirqs
->unmap_domain_pirq (if pirq isn't unmap by qemu)
->pirq_guest_force_unbind
->__pirq_guest_unbind
->mask_msi_irq(=desc->handler->disable())
->the warning in msi_set_mask_bit()
The host_maskall bit will prevent guests from clearing the maskall bit
even the device is assigned to another guest later. Guests cannot
receive interrupts from this device.
To fix this, host_maskall flag is cleared when all MSIs of a device are freed.
It is definitely safely to clear it because no msi is actually set up
for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
rather than 0 to avoid warnings missing for Dom0.
[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
xen/arch/x86/msi.c | 18 ++++++++++++++++++
xen/drivers/passthrough/pci.c | 1 +
2 files changed, 19 insertions(+)
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 5567990..cd570bc 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -637,6 +637,7 @@ int msi_free_irq(struct msi_desc *entry)
{
unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
? entry->msi.nvec : 1;
+ struct pci_dev *pdev = entry->dev;
while ( nr-- )
{
@@ -654,6 +655,23 @@ int msi_free_irq(struct msi_desc *entry)
list_del(&entry->list);
xfree(entry);
+
+ /*
+ * In some cases, the 'host_maskall' is set for safety. Here clear
+ * 'host_maskall' if all MSIs are freed for a msi-x capable device.
+ * Without it, the device's msix keeps disabled even been reassigned
+ * to another domain.
+ */
+ if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
+ {
+ if ( pdev->msix->host_maskall )
+ printk(XENLOG_G_WARNING
+ "Resetting msix status of %04x:%02x:%02x.%u\n",
+ pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+ PCI_FUNC(pdev->devfn));
+ pdev->msix->host_maskall = false;
+ pdev->msix->warned = DOMID_INVALID;
+ }
return 0;
}
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e5b9602..d0ae03d 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -327,6 +327,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
return NULL;
}
spin_lock_init(&msix->table_lock);
+ msix->warned = DOMID_INVALID;
pdev->msix = msix;
}
--
1.8.3.1
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-15 1:10 [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
@ 2018-11-15 10:40 ` Roger Pau Monné
2018-11-16 7:53 ` Chao Gao
2018-12-11 17:03 ` Jan Beulich
1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2018-11-15 10:40 UTC (permalink / raw)
To: Chao Gao; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper
On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].
How does unbinding and binding to pciback fix the issue? Is this due
to Dom0 using some PV or Dom0 only hypercall that can reset the
host_maskall state?
>
> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> killed/crashed before the domain shutdown, this domain's pirq won't be
> unmapped. Then xen will unmap all pirq. But pciback has already disabled
> meory decoding before xen unmapping pirq. Then when Xen is disabling a
> MSI of the device, it has to sets the host_maskall flag and maskall bit
> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> this process is:
> ->arch_domain_destroy
> ->free_domain_pirqs
> ->unmap_domain_pirq (if pirq isn't unmap by qemu)
> ->pirq_guest_force_unbind
> ->__pirq_guest_unbind
> ->mask_msi_irq(=desc->handler->disable())
> ->the warning in msi_set_mask_bit()
>
> The host_maskall bit will prevent guests from clearing the maskall bit
> even the device is assigned to another guest later. Guests cannot
> receive interrupts from this device.
>
>
> To fix this, host_maskall flag is cleared when all MSIs of a device are freed.
> It is definitely safely to clear it because no msi is actually set up
> for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
> rather than 0 to avoid warnings missing for Dom0.
>
> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> xen/arch/x86/msi.c | 18 ++++++++++++++++++
> xen/drivers/passthrough/pci.c | 1 +
> 2 files changed, 19 insertions(+)
>
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 5567990..cd570bc 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -637,6 +637,7 @@ int msi_free_irq(struct msi_desc *entry)
> {
> unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> ? entry->msi.nvec : 1;
> + struct pci_dev *pdev = entry->dev;
>
> while ( nr-- )
> {
> @@ -654,6 +655,23 @@ int msi_free_irq(struct msi_desc *entry)
>
> list_del(&entry->list);
> xfree(entry);
> +
> + /*
> + * In some cases, the 'host_maskall' is set for safety. Here clear
> + * 'host_maskall' if all MSIs are freed for a msi-x capable device.
> + * Without it, the device's msix keeps disabled even been reassigned
"... even after being reassigned ..."
I think it's clearer.
> + * to another domain.
> + */
> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
> + {
> + if ( pdev->msix->host_maskall )
> + printk(XENLOG_G_WARNING
> + "Resetting msix status of %04x:%02x:%02x.%u\n",
> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn));
> + pdev->msix->host_maskall = false;
> + pdev->msix->warned = DOMID_INVALID;
> + }
> return 0;
IMO this looks quite similar to a msi_reset_state function or at least
the start of it.
Maybe it should be in a separate helper that sets all the fields to
their initial values, with the expectation that Dom0 will perform the
device reset?
The underlying problem here AFAICT is that the Xen internal device
state is not the same between device assignments.
In any case there should be at least a note here pointing out that Xen
expects the hardware domain to perform a device reset, so the Xen
internal state actually matches the device state before trying to
assign the device to another guest.
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-15 10:40 ` Roger Pau Monné
@ 2018-11-16 7:53 ` Chao Gao
2018-11-16 9:35 ` Roger Pau Monné
0 siblings, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-11-16 7:53 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper
On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
>On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>
>How does unbinding and binding to pciback fix the issue? Is this due
>to Dom0 using some PV or Dom0 only hypercall that can reset the
>host_maskall state?
I think it is msix_capability_init() that clear host_maskall. And this
function is called by PHYSDEVOP_prepare_msix sub-hypercall.
>
>>
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen will unmap all pirq. But pciback has already disabled
>> meory decoding before xen unmapping pirq. Then when Xen is disabling a
>> MSI of the device, it has to sets the host_maskall flag and maskall bit
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> this process is:
>> ->arch_domain_destroy
>> ->free_domain_pirqs
>> ->unmap_domain_pirq (if pirq isn't unmap by qemu)
>> ->pirq_guest_force_unbind
>> ->__pirq_guest_unbind
>> ->mask_msi_irq(=desc->handler->disable())
>> ->the warning in msi_set_mask_bit()
>>
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Guests cannot
>> receive interrupts from this device.
>>
>>
>> To fix this, host_maskall flag is cleared when all MSIs of a device are freed.
>> It is definitely safely to clear it because no msi is actually set up
>> for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
>> rather than 0 to avoid warnings missing for Dom0.
>>
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> xen/arch/x86/msi.c | 18 ++++++++++++++++++
>> xen/drivers/passthrough/pci.c | 1 +
>> 2 files changed, 19 insertions(+)
>>
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 5567990..cd570bc 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -637,6 +637,7 @@ int msi_free_irq(struct msi_desc *entry)
>> {
>> unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> ? entry->msi.nvec : 1;
>> + struct pci_dev *pdev = entry->dev;
>>
>> while ( nr-- )
>> {
>> @@ -654,6 +655,23 @@ int msi_free_irq(struct msi_desc *entry)
>>
>> list_del(&entry->list);
>> xfree(entry);
>> +
>> + /*
>> + * In some cases, the 'host_maskall' is set for safety. Here clear
>> + * 'host_maskall' if all MSIs are freed for a msi-x capable device.
>> + * Without it, the device's msix keeps disabled even been reassigned
>
>"... even after being reassigned ..."
>
>I think it's clearer.
>
>> + * to another domain.
>> + */
>> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
>> + {
>> + if ( pdev->msix->host_maskall )
>> + printk(XENLOG_G_WARNING
>> + "Resetting msix status of %04x:%02x:%02x.%u\n",
>> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> + PCI_FUNC(pdev->devfn));
>> + pdev->msix->host_maskall = false;
>> + pdev->msix->warned = DOMID_INVALID;
>> + }
>> return 0;
>
>IMO this looks quite similar to a msi_reset_state function or at least
>the start of it.
>
>Maybe it should be in a separate helper that sets all the fields to
>their initial values,
Will do.
>with the expectation that Dom0 will perform the
>device reset?
Dom0 resets devices when (de-)assignment by echo 1 to
/sys/bus/pci/devices/<sbdf>/reset.
>
>The underlying problem here AFAICT is that the Xen internal device
>state is not the same between device assignments.
Yes, it is accurate.
>
>In any case there should be at least a note here pointing out that Xen
>expects the hardware domain to perform a device reset, so the Xen
>internal state actually matches the device state before trying to
>assign the device to another guest.
Sounds good. This issue is that Xen tries to mask msi (when unmapping pirq)
after memory decoding is disabled by pciback. If pciback can unmap all the
pirq-s related a given device before disabling memory decoding, Xen won't meet
this issue. Currently, pciback doesn't maintain the pirq information; it
isn't able to do this.
Anyway, this fix patch can serve as a sanity check.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-16 7:53 ` Chao Gao
@ 2018-11-16 9:35 ` Roger Pau Monné
2018-11-16 9:59 ` Jan Beulich
0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2018-11-16 9:35 UTC (permalink / raw)
To: Chao Gao; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper
On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
> >> I find some pass-thru devices don't work any more across guest
> >> reboot. Assigning it to another domain also meets the same issue. And
> >> the only way to make it work again is un-binding and binding it to
> >> pciback. Someone reported this issue one year ago [1].
> >
> >How does unbinding and binding to pciback fix the issue? Is this due
> >to Dom0 using some PV or Dom0 only hypercall that can reset the
> >host_maskall state?
>
> I think it is msix_capability_init() that clear host_maskall. And this
> function is called by PHYSDEVOP_prepare_msix sub-hypercall.
>
> >
> >>
> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> >> killed/crashed before the domain shutdown, this domain's pirq won't be
> >> unmapped. Then xen will unmap all pirq. But pciback has already disabled
> >> meory decoding before xen unmapping pirq. Then when Xen is disabling a
> >> MSI of the device, it has to sets the host_maskall flag and maskall bit
> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> >> this process is:
> >> ->arch_domain_destroy
> >> ->free_domain_pirqs
> >> ->unmap_domain_pirq (if pirq isn't unmap by qemu)
> >> ->pirq_guest_force_unbind
> >> ->__pirq_guest_unbind
> >> ->mask_msi_irq(=desc->handler->disable())
> >> ->the warning in msi_set_mask_bit()
> >>
> >> The host_maskall bit will prevent guests from clearing the maskall bit
> >> even the device is assigned to another guest later. Guests cannot
> >> receive interrupts from this device.
> >>
> >>
> >> To fix this, host_maskall flag is cleared when all MSIs of a device are freed.
> >> It is definitely safely to clear it because no msi is actually set up
> >> for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
> >> rather than 0 to avoid warnings missing for Dom0.
> >>
> >> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
> >>
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >> ---
> >> xen/arch/x86/msi.c | 18 ++++++++++++++++++
> >> xen/drivers/passthrough/pci.c | 1 +
> >> 2 files changed, 19 insertions(+)
> >>
> >> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> >> index 5567990..cd570bc 100644
> >> --- a/xen/arch/x86/msi.c
> >> +++ b/xen/arch/x86/msi.c
> >> @@ -637,6 +637,7 @@ int msi_free_irq(struct msi_desc *entry)
> >> {
> >> unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> >> ? entry->msi.nvec : 1;
> >> + struct pci_dev *pdev = entry->dev;
> >>
> >> while ( nr-- )
> >> {
> >> @@ -654,6 +655,23 @@ int msi_free_irq(struct msi_desc *entry)
> >>
> >> list_del(&entry->list);
> >> xfree(entry);
> >> +
> >> + /*
> >> + * In some cases, the 'host_maskall' is set for safety. Here clear
> >> + * 'host_maskall' if all MSIs are freed for a msi-x capable device.
> >> + * Without it, the device's msix keeps disabled even been reassigned
> >
> >"... even after being reassigned ..."
> >
> >I think it's clearer.
> >
> >> + * to another domain.
> >> + */
> >> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
> >> + {
> >> + if ( pdev->msix->host_maskall )
> >> + printk(XENLOG_G_WARNING
> >> + "Resetting msix status of %04x:%02x:%02x.%u\n",
> >> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> + PCI_FUNC(pdev->devfn));
> >> + pdev->msix->host_maskall = false;
> >> + pdev->msix->warned = DOMID_INVALID;
AFAICT a guest could trigger this message multiple times by forcing a
PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
console since this is not rate limited. Since I think a guest can
manage to reach this code path while running, clearing warned is not
correct.
Also, if a guest can manage to trigger this path during it's runtime,
could it also hit the issue of getting host_maskall set and not being
able to clear it?
> >> + }
> >> return 0;
> >
> >IMO this looks quite similar to a msi_reset_state function or at least
> >the start of it.
> >
> >Maybe it should be in a separate helper that sets all the fields to
> >their initial values,
>
> Will do.
>
> >with the expectation that Dom0 will perform the
> >device reset?
>
> Dom0 resets devices when (de-)assignment by echo 1 to
> /sys/bus/pci/devices/<sbdf>/reset.
>
> >
> >The underlying problem here AFAICT is that the Xen internal device
> >state is not the same between device assignments.
>
> Yes, it is accurate.
>
> >
> >In any case there should be at least a note here pointing out that Xen
> >expects the hardware domain to perform a device reset, so the Xen
> >internal state actually matches the device state before trying to
> >assign the device to another guest.
>
> Sounds good. This issue is that Xen tries to mask msi (when unmapping pirq)
> after memory decoding is disabled by pciback. If pciback can unmap all the
> pirq-s related a given device before disabling memory decoding, Xen won't meet
> this issue. Currently, pciback doesn't maintain the pirq information; it
> isn't able to do this.
I would like to hear Jan's opinion on this, but I think it might be
helpful to introduce a new hypercall Dom0 (ie: toolstack) can use to
signal Xen a PCI device has been reset, so that Xen can safely reset
the device state to the initial one. This would be simpler if Xen was
the one performing the device reset.
Once a device has been assigned it would become 'dirty' and would
require such reset before it could be assigned again to a domain.
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-16 9:35 ` Roger Pau Monné
@ 2018-11-16 9:59 ` Jan Beulich
2018-11-16 14:30 ` Roger Pau Monné
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-11-16 9:59 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel, Chao Gao
>>> On 16.11.18 at 10:35, <roger.pau@citrix.com> wrote:
> On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
>> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
>> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
>> >> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
>> >> + {
>> >> + if ( pdev->msix->host_maskall )
>> >> + printk(XENLOG_G_WARNING
>> >> + "Resetting msix status of %04x:%02x:%02x.%u\n",
>> >> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> + PCI_FUNC(pdev->devfn));
>> >> + pdev->msix->host_maskall = false;
>> >> + pdev->msix->warned = DOMID_INVALID;
>
> AFAICT a guest could trigger this message multiple times by forcing a
> PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
> console since this is not rate limited. Since I think a guest can
> manage to reach this code path while running, clearing warned is not
> correct.
Did you overlook the _G_ infix? That guarantees rate limiting, unless
the admin specified a non-default "guest_loglvl=".
> Also, if a guest can manage to trigger this path during it's runtime,
> could it also hit the issue of getting host_maskall set and not being
> able to clear it?
But _can_ a guest trigger this path? So far I didn't think it can.
>> >In any case there should be at least a note here pointing out that Xen
>> >expects the hardware domain to perform a device reset, so the Xen
>> >internal state actually matches the device state before trying to
>> >assign the device to another guest.
>>
>> Sounds good. This issue is that Xen tries to mask msi (when unmapping pirq)
>> after memory decoding is disabled by pciback. If pciback can unmap all the
>> pirq-s related a given device before disabling memory decoding, Xen won't meet
>> this issue. Currently, pciback doesn't maintain the pirq information; it
>> isn't able to do this.
>
> I would like to hear Jan's opinion on this, but I think it might be
> helpful to introduce a new hypercall Dom0 (ie: toolstack) can use to
> signal Xen a PCI device has been reset, so that Xen can safely reset
> the device state to the initial one. This would be simpler if Xen was
> the one performing the device reset.
Such a notification might be helpful, if it can't be expressed via the
existing PHYSDEVOP_{prepare,release}_msix. For the moment I can't
see though why these two would be insufficient.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-16 9:59 ` Jan Beulich
@ 2018-11-16 14:30 ` Roger Pau Monné
2018-11-19 12:23 ` Jan Beulich
2018-11-19 12:45 ` Chao Gao
0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monné @ 2018-11-16 14:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Chao Gao
On Fri, Nov 16, 2018 at 02:59:41AM -0700, Jan Beulich wrote:
> >>> On 16.11.18 at 10:35, <roger.pau@citrix.com> wrote:
> > On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
> >> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
> >> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
> >> >> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
> >> >> + {
> >> >> + if ( pdev->msix->host_maskall )
> >> >> + printk(XENLOG_G_WARNING
> >> >> + "Resetting msix status of %04x:%02x:%02x.%u\n",
> >> >> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> >> + PCI_FUNC(pdev->devfn));
> >> >> + pdev->msix->host_maskall = false;
> >> >> + pdev->msix->warned = DOMID_INVALID;
> >
> > AFAICT a guest could trigger this message multiple times by forcing a
> > PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
> > console since this is not rate limited. Since I think a guest can
> > manage to reach this code path while running, clearing warned is not
> > correct.
>
> Did you overlook the _G_ infix? That guarantees rate limiting, unless
> the admin specified a non-default "guest_loglvl=".
Right, I tend to use the gprintk variant and I've indeed overlooked
the _G_.
> > Also, if a guest can manage to trigger this path during it's runtime,
> > could it also hit the issue of getting host_maskall set and not being
> > able to clear it?
>
> But _can_ a guest trigger this path? So far I didn't think it can.
AFAICT (and I might have missed something) a guest can trigger the
execution of unmap_domain_pirq which ends up calling msi_free_irq by
enabling and then disabling MSIX after having setup some vectors. This
is the trace from QEMU and Xen:
xen_pt_msixctrl_reg_write
xen_pt_msix_disable
msi_msix_disable
xc_physdev_unmap_pirq
-> PHYSDEVOP_unmap_pirq hypercall
physdev_unmap_pirq
unmap_domain_pirq
msi_free_irq
Given this I would only clean host_maskall in msi_free_irq if the
domain is being destroyed (d->is_shutting_down), or even better I
would consider using something like PHYSDEVOP_prepare_msix in order to
reset Xen's internal MSI state after device reset.
> >> >In any case there should be at least a note here pointing out that Xen
> >> >expects the hardware domain to perform a device reset, so the Xen
> >> >internal state actually matches the device state before trying to
> >> >assign the device to another guest.
> >>
> >> Sounds good. This issue is that Xen tries to mask msi (when unmapping pirq)
> >> after memory decoding is disabled by pciback. If pciback can unmap all the
> >> pirq-s related a given device before disabling memory decoding, Xen won't meet
> >> this issue. Currently, pciback doesn't maintain the pirq information; it
> >> isn't able to do this.
> >
> > I would like to hear Jan's opinion on this, but I think it might be
> > helpful to introduce a new hypercall Dom0 (ie: toolstack) can use to
> > signal Xen a PCI device has been reset, so that Xen can safely reset
> > the device state to the initial one. This would be simpler if Xen was
> > the one performing the device reset.
>
> Such a notification might be helpful, if it can't be expressed via the
> existing PHYSDEVOP_{prepare,release}_msix. For the moment I can't
> see though why these two would be insufficient.
I think using PHYSDEVOP_{prepare,release}_msix should be enough, since
it will reset host_maskall by calling msix_capability_init.
Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-16 14:30 ` Roger Pau Monné
@ 2018-11-19 12:23 ` Jan Beulich
2018-11-19 12:45 ` Chao Gao
1 sibling, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2018-11-19 12:23 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel, Chao Gao
>>> On 16.11.18 at 15:30, <roger.pau@citrix.com> wrote:
> On Fri, Nov 16, 2018 at 02:59:41AM -0700, Jan Beulich wrote:
>> >>> On 16.11.18 at 10:35, <roger.pau@citrix.com> wrote:
>> > On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
>> >> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
>> >> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
>> >> >> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
>> >> >> + {
>> >> >> + if ( pdev->msix->host_maskall )
>> >> >> + printk(XENLOG_G_WARNING
>> >> >> + "Resetting msix status of %04x:%02x:%02x.%u\n",
>> >> >> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> >> + PCI_FUNC(pdev->devfn));
>> >> >> + pdev->msix->host_maskall = false;
>> >> >> + pdev->msix->warned = DOMID_INVALID;
>> >
>> > AFAICT a guest could trigger this message multiple times by forcing a
>> > PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
>> > console since this is not rate limited. Since I think a guest can
>> > manage to reach this code path while running, clearing warned is not
>> > correct.
>>
>> Did you overlook the _G_ infix? That guarantees rate limiting, unless
>> the admin specified a non-default "guest_loglvl=".
>
> Right, I tend to use the gprintk variant and I've indeed overlooked
> the _G_.
>
>> > Also, if a guest can manage to trigger this path during it's runtime,
>> > could it also hit the issue of getting host_maskall set and not being
>> > able to clear it?
>>
>> But _can_ a guest trigger this path? So far I didn't think it can.
>
> AFAICT (and I might have missed something) a guest can trigger the
> execution of unmap_domain_pirq which ends up calling msi_free_irq by
> enabling and then disabling MSIX after having setup some vectors. This
> is the trace from QEMU and Xen:
>
> xen_pt_msixctrl_reg_write
> xen_pt_msix_disable
> msi_msix_disable
> xc_physdev_unmap_pirq
> -> PHYSDEVOP_unmap_pirq hypercall
> physdev_unmap_pirq
> unmap_domain_pirq
> msi_free_irq
>
> Given this I would only clean host_maskall in msi_free_irq if the
> domain is being destroyed (d->is_shutting_down), or even better I
> would consider using something like PHYSDEVOP_prepare_msix in order to
> reset Xen's internal MSI state after device reset.
Oh, right - so far I had wrongly assumed it's msi_free_irqs() and its
call to pci_disable_msi() / __pci_disable_msix() only which may set
host_maskall permanently. msi_free_irq() would indeed result in
shutdown_msi_irq() to be called, which then would have the same
effect. But wait - judging from qemu's .emu_mask member for the
command register the guest can't turn off the physical memory
decode bit.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-16 14:30 ` Roger Pau Monné
2018-11-19 12:23 ` Jan Beulich
@ 2018-11-19 12:45 ` Chao Gao
2018-11-19 12:55 ` Jan Beulich
1 sibling, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-11-19 12:45 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, xen-devel
On Fri, Nov 16, 2018 at 03:30:11PM +0100, Roger Pau Monné wrote:
>On Fri, Nov 16, 2018 at 02:59:41AM -0700, Jan Beulich wrote:
>> >>> On 16.11.18 at 10:35, <roger.pau@citrix.com> wrote:
>> > On Fri, Nov 16, 2018 at 03:53:50PM +0800, Chao Gao wrote:
>> >> On Thu, Nov 15, 2018 at 11:40:39AM +0100, Roger Pau Monné wrote:
>> >> >On Thu, Nov 15, 2018 at 09:10:26AM +0800, Chao Gao wrote:
>> >> >> + if ( pdev && list_empty(&pdev->msi_list) && pdev->msix )
>> >> >> + {
>> >> >> + if ( pdev->msix->host_maskall )
>> >> >> + printk(XENLOG_G_WARNING
>> >> >> + "Resetting msix status of %04x:%02x:%02x.%u\n",
>> >> >> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> >> + PCI_FUNC(pdev->devfn));
>> >> >> + pdev->msix->host_maskall = false;
>> >> >> + pdev->msix->warned = DOMID_INVALID;
>> >
>> > AFAICT a guest could trigger this message multiple times by forcing a
>> > PIRQ map/unmap of all the vectors in MSIX, thus likely flooding the
>> > console since this is not rate limited. Since I think a guest can
>> > manage to reach this code path while running, clearing warned is not
>> > correct.
>>
>> Did you overlook the _G_ infix? That guarantees rate limiting, unless
>> the admin specified a non-default "guest_loglvl=".
>
>Right, I tend to use the gprintk variant and I've indeed overlooked
>the _G_.
>
>> > Also, if a guest can manage to trigger this path during it's runtime,
>> > could it also hit the issue of getting host_maskall set and not being
>> > able to clear it?
>>
>> But _can_ a guest trigger this path? So far I didn't think it can.
>
>AFAICT (and I might have missed something) a guest can trigger the
>execution of unmap_domain_pirq which ends up calling msi_free_irq by
>enabling and then disabling MSIX after having setup some vectors. This
>is the trace from QEMU and Xen:
>
>xen_pt_msixctrl_reg_write
> xen_pt_msix_disable
> msi_msix_disable
> xc_physdev_unmap_pirq
> -> PHYSDEVOP_unmap_pirq hypercall
> physdev_unmap_pirq
> unmap_domain_pirq
> msi_free_irq
>
>Given this I would only clean host_maskall in msi_free_irq if the
>domain is being destroyed (d->is_shutting_down),
Considering hot-unplug case, it isn't a good idea. Although qemu always
disables msi-x when hot-unplug a device, but it can be compromised.
>or even better I
>would consider using something like PHYSDEVOP_prepare_msix in order to
>reset Xen's internal MSI state after device reset.
It might be a clean solution. But to me, current code is complicated enough.
Extending what the two sub-hypercall is doing and wrapping device reset with
these two sub-hypercall should be very careful. One obvious error is
pci_prepare_msix() will return -EBUSY if 'msix->used_entries' isn't 0 or 1.
To make it work, we also rely on qemu to disable msix then Xen will decrease
the used_entries.
Another solution came to my mind:
The intention of Xen setting 'host_maskall' is to mask a single vector. How
about converting the host_maskall to mask all vectors when Xen tries to init
the first vector in msix_capability_init()? Actually, on hardware, all
vector's mask bit is already set when pciback is performing device reset. So
it won't break anything. With commit 69d99d1b223, even a guest has cleared
some vectors' mask bit before the convertion, it won't be an issue.
Do you think this solution is theoretically correct?
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-19 12:45 ` Chao Gao
@ 2018-11-19 12:55 ` Jan Beulich
2018-11-19 13:20 ` Chao Gao
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-11-19 12:55 UTC (permalink / raw)
To: Roger Pau Monne, Chao Gao; +Cc: Andrew Cooper, Wei Liu, xen-devel
>>> On 19.11.18 at 13:45, <chao.gao@intel.com> wrote:
> Another solution came to my mind:
>
> The intention of Xen setting 'host_maskall' is to mask a single vector. How
> about converting the host_maskall to mask all vectors when Xen tries to init
> the first vector in msix_capability_init()? Actually, on hardware, all
> vector's mask bit is already set when pciback is performing device reset. So
> it won't break anything. With commit 69d99d1b223, even a guest has cleared
> some vectors' mask bit before the convertion, it won't be an issue.
But we already mask vectors in msix_capability_init(). I thought we had
already settled that if we came through this function before assigning
the device to a (new or same/restarting) guest, all ought to be fine:
When we come here for the first entry, we also turn off host_maskall.
That's why we've been suggesting to go through a release/prepare
cycle.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-19 12:55 ` Jan Beulich
@ 2018-11-19 13:20 ` Chao Gao
0 siblings, 0 replies; 12+ messages in thread
From: Chao Gao @ 2018-11-19 13:20 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne
On Mon, Nov 19, 2018 at 05:55:08AM -0700, Jan Beulich wrote:
>>>> On 19.11.18 at 13:45, <chao.gao@intel.com> wrote:
>> Another solution came to my mind:
>>
>> The intention of Xen setting 'host_maskall' is to mask a single vector. How
>> about converting the host_maskall to mask all vectors when Xen tries to init
>> the first vector in msix_capability_init()? Actually, on hardware, all
>> vector's mask bit is already set when pciback is performing device reset. So
>> it won't break anything. With commit 69d99d1b223, even a guest has cleared
>> some vectors' mask bit before the convertion, it won't be an issue.
>
>But we already mask vectors in msix_capability_init(). I thought we had
>already settled that if we came through this function before assigning
>the device to a (new or same/restarting) guest, all ought to be fine:
>When we come here for the first entry, we also turn off host_maskall.
Yes, it is reasonable.
>That's why we've been suggesting to go through a release/prepare
>cycle.
Will follow your suggestion.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-11-15 1:10 [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2018-11-15 10:40 ` Roger Pau Monné
@ 2018-12-11 17:03 ` Jan Beulich
2018-12-12 4:20 ` Chao Gao
1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-12-11 17:03 UTC (permalink / raw)
To: Chao Gao; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne
>>> On 15.11.18 at 02:10, <chao.gao@intel.com> wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].
>
> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> killed/crashed before the domain shutdown, this domain's pirq won't be
> unmapped. Then xen will unmap all pirq. But pciback has already disabled
> meory decoding before xen unmapping pirq. Then when Xen is disabling a
> MSI of the device, it has to sets the host_maskall flag and maskall bit
> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> this process is:
> ->arch_domain_destroy
> ->free_domain_pirqs
> ->unmap_domain_pirq (if pirq isn't unmap by qemu)
> ->pirq_guest_force_unbind
> ->__pirq_guest_unbind
> ->mask_msi_irq(=desc->handler->disable())
> ->the warning in msi_set_mask_bit()
>
> The host_maskall bit will prevent guests from clearing the maskall bit
> even the device is assigned to another guest later. Guests cannot
> receive interrupts from this device.
>
> To fix this, host_maskall flag is cleared when all MSIs of a device are
> freed.
> It is definitely safely to clear it because no msi is actually set up
> for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
> rather than 0 to avoid warnings missing for Dom0.
>
> [1]:
> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
So I take it this patch has become obsolete with the xen-pciback
change you've posted a few days ago?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot
2018-12-11 17:03 ` Jan Beulich
@ 2018-12-12 4:20 ` Chao Gao
0 siblings, 0 replies; 12+ messages in thread
From: Chao Gao @ 2018-12-12 4:20 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne
On Tue, Dec 11, 2018 at 10:03:10AM -0700, Jan Beulich wrote:
>>>> On 15.11.18 at 02:10, <chao.gao@intel.com> wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>>
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen will unmap all pirq. But pciback has already disabled
>> meory decoding before xen unmapping pirq. Then when Xen is disabling a
>> MSI of the device, it has to sets the host_maskall flag and maskall bit
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> this process is:
>> ->arch_domain_destroy
>> ->free_domain_pirqs
>> ->unmap_domain_pirq (if pirq isn't unmap by qemu)
>> ->pirq_guest_force_unbind
>> ->__pirq_guest_unbind
>> ->mask_msi_irq(=desc->handler->disable())
>> ->the warning in msi_set_mask_bit()
>>
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Guests cannot
>> receive interrupts from this device.
>>
>> To fix this, host_maskall flag is cleared when all MSIs of a device are
>> freed.
>> It is definitely safely to clear it because no msi is actually set up
>> for this device. Also, 'msix->warned' is initialized to DOMID_INVALID
>> rather than 0 to avoid warnings missing for Dom0.
>>
>> [1]:
>> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>So I take it this patch has become obsolete with the xen-pciback
>change you've posted a few days ago?
Yes, you are right.
Thanks
Chao
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2018-12-12 4:16 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-15 1:10 [PATCH v2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2018-11-15 10:40 ` Roger Pau Monné
2018-11-16 7:53 ` Chao Gao
2018-11-16 9:35 ` Roger Pau Monné
2018-11-16 9:59 ` Jan Beulich
2018-11-16 14:30 ` Roger Pau Monné
2018-11-19 12:23 ` Jan Beulich
2018-11-19 12:45 ` Chao Gao
2018-11-19 12:55 ` Jan Beulich
2018-11-19 13:20 ` Chao Gao
2018-12-11 17:03 ` Jan Beulich
2018-12-12 4:20 ` Chao Gao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).