From: korantwork@gmail.com
To: helgaas@kernel.org, nirmal.patel@linux.intel.com,
	kbusch@kernel.org, jonathan.derrick@linux.dev,
	lpieralisi@kernel.org
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Xinghui Li <korantli@tencent.com>
Subject: [PATCH v5] PCI: vmd: Add the module param to adjust MSI mode
Date: Thu, 20 Apr 2023 15:09:14 +0800
Message-ID: <20230420070914.1383918-1-korantwork@gmail.com>
From: Xinghui Li <korantli@tencent.com>
Previously, the VMD MSI mode could only be changed by editing the
vmd_ids table. This patch adds a module parameter as a second way to
select the MSI mode, so users can tune VMD for their I/O scenario
without rebuilding the driver.
- "disable_msi_bypass=0 or other values":
Under normal circumstances, we recommend enabling the VMD MSI-X bypass
feature, which improves interrupt handling performance by avoiding
the VMD MSI-X domain interrupt handler.
- "disable_msi_bypass=1":
Use this when multiple NVMe devices are attached to the same PCIe
node and the workload has a high volume of 4K random I/O. It mitigates
the excessive pressure that the many interrupts from the NVMe drives put
on the PCIe node, which improves I/O performance. For example, in an FIO
4K random test with 4 NVMe (Gen4) drives attached to the same PCIe port:
- Enable bypass: read: IOPS=562k, BW=2197MiB/s, io=644GiB
- Disable bypass: read: IOPS=1144k, BW=4470MiB/s, io=1310GiB
Not all devices support VMD MSI-X bypass, so this parameter only
applies to devices that support bypass and already have it enabled,
such as VMD_28C0. It does not affect the MSI-X working mode in a
guest.
Signed-off-by: Xinghui Li <korantli@tencent.com>
---
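For illustration only, the effect of the parameter on the feature flags
can be modeled in user space. This is a minimal sketch, not the driver
code: the SKETCH_* bit values and the sketch_* helpers are hypothetical
names, and the real vmd_enable_domain() may weigh conditions that this
patch does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for driver feature bits; the real values live in vmd.c. */
#define SKETCH_FEAT_OTHER                 (1UL << 0)
#define SKETCH_FEAT_CAN_BYPASS_MSI_REMAP  (1UL << 4)

/*
 * Models vmd_config_msi_bypass_param(): only the value 1 clears the
 * bypass bit. Clearing a bit that is not set is a no-op, so devices
 * without bypass support are left unchanged.
 */
static unsigned long sketch_apply_param(unsigned long features,
					int disable_msi_bypass)
{
	if (disable_msi_bypass == 1)
		features &= ~SKETCH_FEAT_CAN_BYPASS_MSI_REMAP;
	return features;
}

/*
 * Simplified mode selection (assumption): bypass is used when the
 * bypass bit survives, otherwise MSI-X remapping is used.
 */
static bool sketch_uses_bypass(unsigned long features)
{
	return (features & SKETCH_FEAT_CAN_BYPASS_MSI_REMAP) != 0;
}
```

With these helpers, a bypass-capable device keeps bypass when the
parameter is 0 or 2 and falls back to remapping when it is 1. A device
without the bypass bit stays in remapping mode whatever the value.
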
drivers/pci/controller/vmd.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 990630ec57c6..8ee673810cbf 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -34,6 +34,20 @@
#define MB2_SHADOW_OFFSET 0x2000
#define MB2_SHADOW_SIZE 16
+/*
+ * The disable_msi_bypass module parameter provides an alternative way to
+ * adjust the MSI mode when loading vmd.ko. It only applies to devices that
+ * both support and have enabled bypass, such as VMD_28C0, and it does not
+ * affect the MSI-X mode in a guest.
+ *
+ * 1: disable MSI-X bypass
+ * other values: do not disable MSI-X bypass
+ */
+static int disable_msi_bypass;
+module_param(disable_msi_bypass, int, 0444);
+MODULE_PARM_DESC(disable_msi_bypass, "Whether to disable MSI-X bypass function.\n"
+ "\t\t Only effective on the device supporting bypass, such as 28C0.");
+
enum vmd_features {
/*
* Device may contain registers which hint the physical location of the
@@ -875,6 +889,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
return ret;
vmd_set_msi_remapping(vmd, true);
+ dev_info(&vmd->dev->dev, "init vmd with remapping MSI-X\n");
ret = vmd_create_irq_domain(vmd);
if (ret)
@@ -887,6 +902,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
} else {
vmd_set_msi_remapping(vmd, false);
+ dev_info(&vmd->dev->dev, "init vmd with bypass MSI-X\n");
}
pci_add_resource(&resources, &vmd->resources[0]);
@@ -955,6 +971,17 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
return 0;
}
+static void vmd_config_msi_bypass_param(unsigned long *features)
+{
+ /*
+ * Not every VMD device supports and enables bypass MSI-X.
+ * Make sure current device has the bypass flag set.
+ */
+ if (disable_msi_bypass == 1 &&
+ *features & VMD_FEAT_CAN_BYPASS_MSI_REMAP)
+ *features &= ~(VMD_FEAT_CAN_BYPASS_MSI_REMAP);
+}
+
static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
unsigned long features = (unsigned long) id->driver_data;
@@ -984,6 +1011,8 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
if (err < 0)
goto out_release_instance;
+ vmd_config_msi_bypass_param(&features);
+
vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0);
if (!vmd->cfgbar) {
err = -ENOMEM;
--
2.31.1