Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: marcel.apfelbaum@gmail.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com
Subject: Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD
Date: Tue, 9 Apr 2019 10:52:00 -0400	[thread overview]
Message-ID: <20190409102526-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1554819296-14960-1-git-send-email-ann.zhuangyanying@huawei.com>

On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> 
> Recently I tested the performance of NVMe SSD passthrough and found that interrupts
> were aggregated on vcpu0(or the first vcpu of each numa) by /proc/interrupts,when
> GuestOS was upgraded to sles12sp3 (or redhat7.6). But /proc/irq/X/smp_affinity_list
> shows that the interrupt is spread out, such as 0-10, 11-21,.... and so on.
> This problem cannot be resolved by "echo X > /proc/irq/X/smp_affinity_list", because
> the NVMe SSD interrupt is requested by the API pci_alloc_irq_vectors(), so the
> interrupt has the IRQD_AFFINITY_MANAGED flag.
> 
> GuestOS sles12sp3 backport "automatic interrupt affinity for MSI/MSI-X capable devices",
> but the implementation of __setup_irq has no corresponding modification. It is still
> irq_startup(), then setup_affinity(), that is sending an affinity message when the
> interrupt is unmasked. The bare metal configuration is successful, but qemu will
> not trigger the msix update, and the affinity configuration fails.
> The affinity is configured by /proc/irq/X/smp_affinity_list, implemented at
> apic_ack_edge(), the bitmap is stored in pending_mask,
> mask->__pci_write_msi_msg()->unmask,
> and the timing is guaranteed, and the configuration takes effect.
> 
> The GuestOS linux-master incorporates the "genirq/cpuhotplug: Enforce affinity
> setting on startup of managed irqs" to ensure that the affinity is first issued
> and then __irq_startup(), for the managerred interrupt. So configuration is
> successful.
> 
> It now looks like sles12sp3 (up to sles15sp1, linux-4.12.x), redhat7.6
> (3.10.0-957.10.1) does not have backport the patch yet.
> "if (is_masked == was_masked) return;" can it be removed at qemu?
> What is the reason for this check?

The reason is simple:

The PCI spec says:

Software must not modify the Address or Data fields of an entry while it is unmasked.

It's a guest bug then?

> 
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
>  hw/pci/msix.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e33641..e1ff533 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
>  {
>      bool is_masked = msix_is_masked(dev, vector);
>  
> -    if (is_masked == was_masked) {
> -        return;
> -    }
> -
>      msix_fire_vector_notifier(dev, vector, is_masked);
>  
>      if (!is_masked && msix_is_pending(dev, vector)) {
> -- 
> 1.8.3.1
>

WARNING: multiple messages have this Message-ID (diff)

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: arei.gonglei@huawei.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD
Date: Tue, 9 Apr 2019 10:52:00 -0400	[thread overview]
Message-ID: <20190409102526-mutt-send-email-mst@kernel.org> (raw)
Message-ID: <20190409145200.cdAOkcCq1LwFh5d-SxbJxqf8R4P-5ciGodCx1gT0g9Q@z> (raw)
In-Reply-To: <1554819296-14960-1-git-send-email-ann.zhuangyanying@huawei.com>

On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> 
> Recently I tested the performance of NVMe SSD passthrough and found that interrupts
> were aggregated on vcpu0(or the first vcpu of each numa) by /proc/interrupts,when
> GuestOS was upgraded to sles12sp3 (or redhat7.6). But /proc/irq/X/smp_affinity_list
> shows that the interrupt is spread out, such as 0-10, 11-21,.... and so on.
> This problem cannot be resolved by "echo X > /proc/irq/X/smp_affinity_list", because
> the NVMe SSD interrupt is requested by the API pci_alloc_irq_vectors(), so the
> interrupt has the IRQD_AFFINITY_MANAGED flag.
> 
> GuestOS sles12sp3 backport "automatic interrupt affinity for MSI/MSI-X capable devices",
> but the implementation of __setup_irq has no corresponding modification. It is still
> irq_startup(), then setup_affinity(), that is sending an affinity message when the
> interrupt is unmasked. The bare metal configuration is successful, but qemu will
> not trigger the msix update, and the affinity configuration fails.
> The affinity is configured by /proc/irq/X/smp_affinity_list, implemented at
> apic_ack_edge(), the bitmap is stored in pending_mask,
> mask->__pci_write_msi_msg()->unmask,
> and the timing is guaranteed, and the configuration takes effect.
> 
> The GuestOS linux-master incorporates the "genirq/cpuhotplug: Enforce affinity
> setting on startup of managed irqs" to ensure that the affinity is first issued
> and then __irq_startup(), for the managerred interrupt. So configuration is
> successful.
> 
> It now looks like sles12sp3 (up to sles15sp1, linux-4.12.x), redhat7.6
> (3.10.0-957.10.1) does not have backport the patch yet.
> "if (is_masked == was_masked) return;" can it be removed at qemu?
> What is the reason for this check?

The reason is simple:

The PCI spec says:

Software must not modify the Address or Data fields of an entry while it is unmasked.

It's a guest bug then?

> 
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
>  hw/pci/msix.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e33641..e1ff533 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
>  {
>      bool is_masked = msix_is_masked(dev, vector);
>  
> -    if (is_masked == was_masked) {
> -        return;
> -    }
> -
>      msix_fire_vector_notifier(dev, vector, is_masked);
>  
>      if (!is_masked && msix_is_pending(dev, vector)) {
> -- 
> 1.8.3.1
>

next prev parent reply	other threads:[~2019-04-09 14:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-09 14:14 [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD Zhuangyanying
2019-04-09 14:14 ` Zhuangyanying
2019-04-09 14:52 ` Michael S. Tsirkin [this message]
2019-04-09 14:52   ` Michael S. Tsirkin
2019-04-09 15:04 ` Michael S. Tsirkin
2019-04-09 15:04   ` Michael S. Tsirkin
2019-04-10  7:14   ` Zhuangyanying
2019-04-10  7:14     ` Zhuangyanying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190409102526-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=ann.zhuangyanying@huawei.com \
    --cc=arei.gonglei@huawei.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).