From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: marcel.apfelbaum@gmail.com, qemu-devel@nongnu.org,
arei.gonglei@huawei.com
Subject: Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD
Date: Tue, 9 Apr 2019 10:52:00 -0400 [thread overview]
Message-ID: <20190409102526-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1554819296-14960-1-git-send-email-ann.zhuangyanying@huawei.com>
On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
>
> Recently I tested the performance of NVMe SSD passthrough and found that interrupts
> were aggregated on vcpu0(or the first vcpu of each numa) by /proc/interrupts,when
> GuestOS was upgraded to sles12sp3 (or redhat7.6). But /proc/irq/X/smp_affinity_list
> shows that the interrupt is spread out, such as 0-10, 11-21,.... and so on.
> This problem cannot be resolved by "echo X > /proc/irq/X/smp_affinity_list", because
> the NVMe SSD interrupt is requested by the API pci_alloc_irq_vectors(), so the
> interrupt has the IRQD_AFFINITY_MANAGED flag.
>
> GuestOS sles12sp3 backport "automatic interrupt affinity for MSI/MSI-X capable devices",
> but the implementation of __setup_irq has no corresponding modification. It is still
> irq_startup(), then setup_affinity(), that is sending an affinity message when the
> interrupt is unmasked. The bare metal configuration is successful, but qemu will
> not trigger the msix update, and the affinity configuration fails.
> The affinity is configured by /proc/irq/X/smp_affinity_list, implemented at
> apic_ack_edge(), the bitmap is stored in pending_mask,
> mask->__pci_write_msi_msg()->unmask,
> and the timing is guaranteed, and the configuration takes effect.
>
> The GuestOS linux-master incorporates the "genirq/cpuhotplug: Enforce affinity
> setting on startup of managed irqs" to ensure that the affinity is first issued
> and then __irq_startup(), for the managerred interrupt. So configuration is
> successful.
>
> It now looks like sles12sp3 (up to sles15sp1, linux-4.12.x), redhat7.6
> (3.10.0-957.10.1) does not have backport the patch yet.
> "if (is_masked == was_masked) return;" can it be removed at qemu?
> What is the reason for this check?
The reason is simple:
The PCI spec says:
Software must not modify the Address or Data fields of an entry while it is unmasked.
It's a guest bug then?
>
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
> hw/pci/msix.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e33641..e1ff533 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
> {
> bool is_masked = msix_is_masked(dev, vector);
>
> - if (is_masked == was_masked) {
> - return;
> - }
> -
> msix_fire_vector_notifier(dev, vector, is_masked);
>
> if (!is_masked && msix_is_pending(dev, vector)) {
> --
> 1.8.3.1
>
WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: arei.gonglei@huawei.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD
Date: Tue, 9 Apr 2019 10:52:00 -0400 [thread overview]
Message-ID: <20190409102526-mutt-send-email-mst@kernel.org> (raw)
Message-ID: <20190409145200.cdAOkcCq1LwFh5d-SxbJxqf8R4P-5ciGodCx1gT0g9Q@z> (raw)
In-Reply-To: <1554819296-14960-1-git-send-email-ann.zhuangyanying@huawei.com>
On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
>
> Recently I tested the performance of NVMe SSD passthrough and found that interrupts
> were aggregated on vcpu0(or the first vcpu of each numa) by /proc/interrupts,when
> GuestOS was upgraded to sles12sp3 (or redhat7.6). But /proc/irq/X/smp_affinity_list
> shows that the interrupt is spread out, such as 0-10, 11-21,.... and so on.
> This problem cannot be resolved by "echo X > /proc/irq/X/smp_affinity_list", because
> the NVMe SSD interrupt is requested by the API pci_alloc_irq_vectors(), so the
> interrupt has the IRQD_AFFINITY_MANAGED flag.
>
> GuestOS sles12sp3 backport "automatic interrupt affinity for MSI/MSI-X capable devices",
> but the implementation of __setup_irq has no corresponding modification. It is still
> irq_startup(), then setup_affinity(), that is sending an affinity message when the
> interrupt is unmasked. The bare metal configuration is successful, but qemu will
> not trigger the msix update, and the affinity configuration fails.
> The affinity is configured by /proc/irq/X/smp_affinity_list, implemented at
> apic_ack_edge(), the bitmap is stored in pending_mask,
> mask->__pci_write_msi_msg()->unmask,
> and the timing is guaranteed, and the configuration takes effect.
>
> The GuestOS linux-master incorporates the "genirq/cpuhotplug: Enforce affinity
> setting on startup of managed irqs" to ensure that the affinity is first issued
> and then __irq_startup(), for the managerred interrupt. So configuration is
> successful.
>
> It now looks like sles12sp3 (up to sles15sp1, linux-4.12.x), redhat7.6
> (3.10.0-957.10.1) does not have backport the patch yet.
> "if (is_masked == was_masked) return;" can it be removed at qemu?
> What is the reason for this check?
The reason is simple:
The PCI spec says:
Software must not modify the Address or Data fields of an entry while it is unmasked.
It's a guest bug then?
>
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
> hw/pci/msix.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e33641..e1ff533 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
> {
> bool is_masked = msix_is_masked(dev, vector);
>
> - if (is_masked == was_masked) {
> - return;
> - }
> -
> msix_fire_vector_notifier(dev, vector, is_masked);
>
> if (!is_masked && msix_is_pending(dev, vector)) {
> --
> 1.8.3.1
>
next prev parent reply other threads:[~2019-04-09 14:52 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-09 14:14 [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD Zhuangyanying
2019-04-09 14:14 ` Zhuangyanying
2019-04-09 14:52 ` Michael S. Tsirkin [this message]
2019-04-09 14:52 ` Michael S. Tsirkin
2019-04-09 15:04 ` Michael S. Tsirkin
2019-04-09 15:04 ` Michael S. Tsirkin
2019-04-10 7:14 ` Zhuangyanying
2019-04-10 7:14 ` Zhuangyanying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190409102526-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=ann.zhuangyanying@huawei.com \
--cc=arei.gonglei@huawei.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.