From: Dexuan Cui <decui@microsoft.com>
To: quic_jhugo@quicinc.com, wei.liu@kernel.org, kys@microsoft.com,
haiyangz@microsoft.com, sthemmin@microsoft.com,
lpieralisi@kernel.org, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, mikelley@microsoft.com,
robh@kernel.org, kw@linux.com, helgaas@kernel.org,
alex.williamson@redhat.com, boqun.feng@gmail.com,
Boqun.Feng@microsoft.com
Cc: Dexuan Cui <decui@microsoft.com>,
Carl Vanderlip <quic_carlv@quicinc.com>
Subject: [PATCH] PCI: hv: Only reuse existing IRTE allocation for Multi-MSI
Date: Wed, 3 Aug 2022 19:51:04 -0700
Message-ID: <20220804025104.15673-1-decui@microsoft.com>
Jeffrey's 4 recent patches added Multi-MSI support to the pci-hyperv driver.
Unfortunately, one of them, commit b4b77778ecc5, causes a regression in a
fio test on the Azure VM SKU Standard L64s v2 (64 AMD vCPUs, 8 NVMe drives).
When fio runs against all 8 NVMe drives with a low io-depth (e.g., 2 or 4),
it runs fine. With a high io-depth (e.g., 256), queue-29 of each NVMe drive
suddenly stops receiving interrupts, and the NVMe core code has to abort the
queue after a timeout of 30 seconds. queue-29 then receives interrupts again
for several seconds before they stop once more, and this pattern repeats:
[ 223.891249] nvme nvme2: I/O 320 QID 29 timeout, aborting
[ 223.896231] nvme nvme0: I/O 320 QID 29 timeout, aborting
[ 223.898340] nvme nvme4: I/O 832 QID 29 timeout, aborting
[ 259.471309] nvme nvme2: I/O 320 QID 29 timeout, aborting
[ 259.476493] nvme nvme0: I/O 321 QID 29 timeout, aborting
[ 259.482967] nvme nvme0: I/O 322 QID 29 timeout, aborting
Other symptoms: commit b4b77778ecc5 also reduces the throughput of the NVMe
drives, and the kernel prints soft lockup messages from time to time while
the fio test is running.
Commit b4b77778ecc5 itself looks good, and at the moment it's unclear where
the issue is. While the issue is being investigated, restore the old behavior
in hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
single-MSI and MSI-X. This is a stopgap for the above NVMe issue.
Fixes: b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: Carl Vanderlip <quic_carlv@quicinc.com>
---
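Note for reviewers: the reuse decision described above can be modeled
outside the kernel roughly as follows. This is a minimal userspace sketch,
not driver code. struct msi_info, is_multi_msi() and should_reuse() are
hypothetical names introduced only for illustration; only the predicate
itself mirrors what the patch computes from msi_desc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two msi_desc fields the patch reads. */
struct msi_info {
	bool is_msix;           /* models msi_desc->pci.msi_attrib.is_msix */
	unsigned int nvec_used; /* models msi_desc->nvec_used */
};

/* Same predicate as the patch: plain MSI with more than one vector. */
static bool is_multi_msi(const struct msi_info *m)
{
	return !m->is_msix && m->nvec_used > 1;
}

/*
 * Reuse a previous allocation only if one exists (chip_data is set)
 * and the interrupt is Multi-MSI. Single-MSI and MSI-X always take
 * the "compose a new message" path again.
 */
static bool should_reuse(const void *chip_data, const struct msi_info *m)
{
	return chip_data != NULL && is_multi_msi(m);
}
```

With this model, a device using MSI-X with 8 vectors or single-MSI never
reuses the previous allocation, while a plain MSI device with 4 vectors
does once chip_data is populated.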
drivers/pci/controller/pci-hyperv.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index db814f7b93ba..65d0dab25deb 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1701,6 +1701,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct compose_comp_ctxt comp;
 	struct tran_int_desc *int_desc;
 	struct msi_desc *msi_desc;
+	bool multi_msi;
 	u8 vector, vector_count;
 	struct {
 		struct pci_packet pci_pkt;
@@ -1714,8 +1715,16 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	u32 size;
 	int ret;
 
-	/* Reuse the previous allocation */
-	if (data->chip_data) {
+	msi_desc = irq_data_get_msi_desc(data);
+	multi_msi = !msi_desc->pci.msi_attrib.is_msix &&
+		    msi_desc->nvec_used > 1;
+	/*
+	 * Reuse the previous allocation for Multi-MSI. This is required for
+	 * Multi-MSI and is optional for single-MSI and MSI-X. Note: for now,
+	 * don't reuse the previous allocation for MSI-X because this causes
+	 * unreliable interrupt delivery for some NVMe devices.
+	 */
+	if (data->chip_data && multi_msi) {
 		int_desc = data->chip_data;
 		msg->address_hi = int_desc->address >> 32;
 		msg->address_lo = int_desc->address & 0xffffffff;
@@ -1723,7 +1732,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 		return;
 	}
 
-	msi_desc = irq_data_get_msi_desc(data);
 	pdev = msi_desc_to_pci_dev(msi_desc);
 	dest = irq_data_get_effective_affinity_mask(data);
 	pbus = pdev->bus;
@@ -1733,11 +1741,18 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	if (!hpdev)
 		goto return_null_message;
 
+	/* Free any previous message that might have already been composed. */
+	if (data->chip_data && !multi_msi) {
+		int_desc = data->chip_data;
+		data->chip_data = NULL;
+		hv_int_desc_free(hpdev, int_desc);
+	}
+
 	int_desc = kzalloc(sizeof(*int_desc), GFP_ATOMIC);
 	if (!int_desc)
 		goto drop_reference;
 
-	if (!msi_desc->pci.msi_attrib.is_msix && msi_desc->nvec_used > 1) {
+	if (multi_msi) {
 		/*
 		 * If this is not the first MSI of Multi MSI, we already have
 		 * a mapping. Can exit early.
--
2.25.1
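For readers following the hunks above, the resulting flow (reuse for
Multi-MSI, otherwise free the old descriptor and compose a new one) can be
sketched in userspace as below. This is a simplified model under stated
assumptions, not the driver's implementation: struct int_desc_model,
struct msg_model, fill_msg() and compose_model() are hypothetical names,
calloc()/free() stand in for kzalloc()/hv_int_desc_free(), and the
host_address argument stands in for whatever address the host would return.
The early-exit path for non-first Multi-MSI vectors is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct tran_int_desc. */
struct int_desc_model {
	uint64_t address;
};

/* Hypothetical, simplified stand-in for struct msi_msg. */
struct msg_model {
	uint32_t address_hi;
	uint32_t address_lo;
};

/* Split the 64-bit address the same way the reuse path does. */
static void fill_msg(struct msg_model *msg, const struct int_desc_model *d)
{
	msg->address_hi = (uint32_t)(d->address >> 32);
	msg->address_lo = (uint32_t)(d->address & 0xffffffff);
}

/*
 * *slot plays the role of data->chip_data.
 * Returns 1 if the existing descriptor was reused, 0 if a new one was
 * installed, and -1 if the allocation failed.
 */
static int compose_model(struct int_desc_model **slot, bool multi_msi,
			 uint64_t host_address, struct msg_model *msg)
{
	struct int_desc_model *d;

	/* Multi-MSI: reuse the previous allocation. */
	if (*slot && multi_msi) {
		fill_msg(msg, *slot);
		return 1;
	}

	/* Single-MSI and MSI-X: free any previously composed message. */
	if (*slot && !multi_msi) {
		d = *slot;
		*slot = NULL;
		free(d);
	}

	d = calloc(1, sizeof(*d));
	if (!d)
		return -1;

	d->address = host_address; /* assumed: value supplied by the host */
	*slot = d;
	fill_msg(msg, d);
	return 0;
}
```

In this model, calling compose_model() twice with multi_msi == false
replaces the descriptor each time, which corresponds to the old behavior
the patch restores for single-MSI and MSI-X.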