From: "Michael S. Tsirkin" <mst@redhat.com>
To: Zhuangyanying <ann.zhuangyanying@huawei.com>
Cc: marcel.apfelbaum@gmail.com, qemu-devel@nongnu.org,
arei.gonglei@huawei.com
Subject: Re: [Qemu-devel] [PATCH] msix: fix interrupt aggregation problem at the passthrough of NVMe SSD
Date: Tue, 9 Apr 2019 10:52:00 -0400 [thread overview]
Message-ID: <20190409102526-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1554819296-14960-1-git-send-email-ann.zhuangyanying@huawei.com>
On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> From: Zhuang Yanying <ann.zhuangyanying@huawei.com>
>
> Recently I tested the performance of NVMe SSD passthrough and found,
> via /proc/interrupts, that interrupts were aggregated on vcpu0 (or on
> the first vcpu of each NUMA node) after the guest OS was upgraded to
> sles12sp3 (or redhat7.6). Yet /proc/irq/X/smp_affinity_list shows the
> interrupts spread out, e.g. 0-10, 11-21, and so on.
> The problem cannot be fixed with "echo X > /proc/irq/X/smp_affinity_list",
> because the NVMe driver requests its interrupts via the
> pci_alloc_irq_vectors() API, so they carry the IRQD_AFFINITY_MANAGED
> flag and user-space affinity changes are rejected.
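>
> For illustration, a minimal sketch of such an allocation, assuming a
> typical Linux PCI driver (the demo_* names are made up):
>
>     #include <linux/interrupt.h>
>     #include <linux/pci.h>
>
>     /* Vectors allocated with PCI_IRQ_AFFINITY are spread across the
>      * online CPUs by the kernel and flagged IRQD_AFFINITY_MANAGED,
>      * so later writes to /proc/irq/X/smp_affinity are rejected. */
>     static irqreturn_t demo_irq_handler(int irq, void *data)
>     {
>         return IRQ_HANDLED;
>     }
>
>     static int demo_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
>     {
>         int nvecs, i, ret;
>
>         nvecs = pci_alloc_irq_vectors(pdev, 1, nr_queues,
>                                       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
>         if (nvecs < 0)
>             return nvecs;
>
>         for (i = 0; i < nvecs; i++) {
>             ret = request_irq(pci_irq_vector(pdev, i), demo_irq_handler,
>                               0, "demo", pdev);
>             if (ret)
>                 return ret;
>         }
>         return 0;
>     }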
>
> sles12sp3 backports "automatic interrupt affinity for MSI/MSI-X capable
> devices", but its __setup_irq() was not modified accordingly: it still
> runs irq_startup() first and setup_affinity() afterwards, i.e. the
> affinity message is written while the interrupt is unmasked. On bare
> metal this still takes effect, but QEMU does not act on an MSI-X message
> written to an unmasked vector, so the affinity configuration fails.
> Affinity set through /proc/irq/X/smp_affinity_list, by contrast, is
> applied in apic_ack_edge(): the new bitmap is stored in pending_mask and
> later written as mask -> __pci_write_msi_msg() -> unmask, so the
> ordering is guaranteed and the configuration takes effect.
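>
> As an aside, the device-visible sequence of that working path can be
> sketched as follows (illustrative only, not the literal kernel code;
> demo_retarget_vector is a made-up name):
>
>     #include <linux/irq.h>
>     #include <linux/msi.h>
>
>     /* Masking the vector, rewriting Address/Data, then unmasking
>      * produces a mask-state transition that QEMU's
>      * msix_handle_mask_update() observes, so the new target is
>      * picked up when the vector is unmasked. */
>     static void demo_retarget_vector(struct irq_data *d, struct msi_msg *msg)
>     {
>         pci_msi_mask_irq(d);                                /* set mask bit   */
>         __pci_write_msi_msg(irq_data_get_msi_desc(d), msg); /* new Addr/Data  */
>         pci_msi_unmask_irq(d);                              /* clear mask bit */
>     }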
>
> Mainline Linux incorporates "genirq/cpuhotplug: Enforce affinity
> setting on startup of managed irqs", which ensures that for managed
> interrupts the affinity is written first and __irq_startup() runs
> afterwards, so there the configuration succeeds.
>
> It now looks like neither sles12sp3 (up to sles15sp1, linux-4.12.x) nor
> redhat7.6 (3.10.0-957.10.1) has backported that patch yet.
> Could the "if (is_masked == was_masked) return;" check be removed from
> QEMU? What is the reason for this check?
The reason is simple. The PCI spec says:

    Software must not modify the Address or Data fields of an entry
    while it is unmasked.

It's a guest bug, then?
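
For reference, the per-entry update sequence the spec expects, sketched
at the register level (PCI_MSIX_ENTRY_* are the kernel-internal offsets
from drivers/pci/msi.h; "entry" and "ctrl" stand for the ioremapped
table slot and its current vector-control word):

    /* 1. mask the vector, 2. update Address/Data, 3. unmask.
     * A guest that rewrites the message while the vector is unmasked
     * violates the spec, which is why QEMU only reacts to writes
     * that change the mask state. */
    writel(ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT,
           entry + PCI_MSIX_ENTRY_VECTOR_CTRL);
    writel(msg->address_lo, entry + PCI_MSIX_ENTRY_LOWER_ADDR);
    writel(msg->address_hi, entry + PCI_MSIX_ENTRY_UPPER_ADDR);
    writel(msg->data,       entry + PCI_MSIX_ENTRY_DATA);
    writel(ctrl & ~PCI_MSIX_ENTRY_CTRL_MASKBIT,
           entry + PCI_MSIX_ENTRY_VECTOR_CTRL);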
>
> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
> ---
> hw/pci/msix.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e33641..e1ff533 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
> {
> bool is_masked = msix_is_masked(dev, vector);
>
> - if (is_masked == was_masked) {
> - return;
> - }
> -
> msix_fire_vector_notifier(dev, vector, is_masked);
>
> if (!is_masked && msix_is_pending(dev, vector)) {
> --
> 1.8.3.1
>