linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting
@ 2025-08-13 23:28 Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent() Inochi Amaoto
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-13 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Inochi Amaoto, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe, Chen Wang
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li

When using NVMe on SG2044, the NVMe driver always complains "I/O tag XXX
(XXX) QID XX timeout, completion polled", which is caused by the
broken handler of the sg2042-msi driver.

As the PLIC driver can only set affinity when enabling an interrupt,
the sg2042-msi driver did not handle affinity setting properly and
enabled the irq in an unexpected execution path.

Add irq_startup()/irq_shutdown() support to the PCI template domain,
then set irq_chip_[startup/shutdown]_parent() as the irq_startup()/
irq_shutdown() callbacks of the sg2042-msi driver, so that the irq can
be started properly.

Changes from v1:
1. patch 1: Fix the comment format problem, and structure the changelog.
2. patch 2: Improve the comment title and body, add a description of
            the fact that the PLIC is used as parent chip.
3. patch 2: Remove __always_inline for cond_[shutdown/startup]_parent().
4. patch 3: Update the alignment of the "XXX_MSI_FLAGS_XXX" macros.
5. patch 4: State that the added flag is used for the negotiation
            between the MSI controller driver and the PCIe device driver,
            and can only be used when both of them support this flag.

Inochi Amaoto (4):
  genirq: Add irq_chip_(startup/shutdown)_parent()
  PCI/MSI: Add startup/shutdown for per device domains
  irqchip/sg2042-msi: Fix broken affinity setting
  irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044

 drivers/irqchip/irq-sg2042-msi.c | 19 +++++++++---
 drivers/pci/msi/irqdomain.c      | 52 ++++++++++++++++++++++++++++++++
 include/linux/irq.h              |  2 ++
 include/linux/msi.h              |  2 ++
 kernel/irq/chip.c                | 37 +++++++++++++++++++++++
 5 files changed, 107 insertions(+), 5 deletions(-)

--
2.50.1



* [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent()
  2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
@ 2025-08-13 23:28 ` Inochi Amaoto
  2025-08-23 19:28   ` [tip: irq/core] " tip-bot2 for Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-13 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Inochi Amaoto, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe, Chen Wang
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li

As the MSI controller on SG2044 uses PLIC as the underlying interrupt
controller, it needs to call irq_enable() and irq_disable() to
start up/shut down irqs. Otherwise, the MSI interrupt cannot be started
up correctly and will not respond to any incoming interrupt.

Introduce the helpers irq_chip_startup_parent() and
irq_chip_shutdown_parent() to allow the interrupt controller to call
the irq_startup() or irq_shutdown() callback of the parent interrupt
chip. In case irq_startup() or irq_shutdown() is not implemented for
the parent interrupt chip, the helpers fall back to
irq_chip_enable_parent() or irq_chip_disable_parent().
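
The startup fallback described above can be sketched as a small userspace
model. This is not the kernel code: the struct layouts are cut down to
the fields the helper touches, and the demo_*/model_* names, the
started/enabled/unmasked bookkeeping fields, and the three demo chips
are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct irq_data;

/* Cut-down stand-in for struct irq_chip: only the callbacks used here. */
struct irq_chip {
	unsigned int (*irq_startup)(struct irq_data *data);
	void (*irq_enable)(struct irq_data *data);
	void (*irq_unmask)(struct irq_data *data);
};

/* Cut-down stand-in for struct irq_data plus hypothetical bookkeeping. */
struct irq_data {
	const struct irq_chip *chip;
	struct irq_data *parent_data;
	int started;	/* set by demo_startup() */
	int enabled;	/* set by demo_enable() */
	int unmasked;	/* set by demo_unmask() */
};

static unsigned int demo_startup(struct irq_data *d) { d->started = 1; return 0; }
static void demo_enable(struct irq_data *d) { d->enabled = 1; }
static void demo_unmask(struct irq_data *d) { d->unmasked = 1; }

/* Hypothetical parent chips implementing different sets of callbacks. */
const struct irq_chip demo_chip_startup = {
	.irq_startup = demo_startup,
	.irq_enable = demo_enable,
};
const struct irq_chip demo_chip_enable = {
	.irq_enable = demo_enable,
	.irq_unmask = demo_unmask,
};
const struct irq_chip demo_chip_unmask = {
	.irq_unmask = demo_unmask,
};

/* Models irq_chip_enable_parent(): irq_enable() if set, else irq_unmask(). */
void model_enable_parent(struct irq_data *data)
{
	struct irq_data *parent = data->parent_data;

	if (parent->chip->irq_enable)
		parent->chip->irq_enable(parent);
	else
		parent->chip->irq_unmask(parent);
}

/* Models irq_chip_startup_parent(): irq_startup() if set, else enable. */
unsigned int model_startup_parent(struct irq_data *data)
{
	struct irq_data *parent = data->parent_data;

	assert(parent && parent->chip);
	if (parent->chip->irq_startup)
		return parent->chip->irq_startup(parent);

	model_enable_parent(data);
	return 0;
}
```

In this model a parent that provides irq_startup() never has its
irq_enable() called by the helper, so the parent chip stays in full
control of what "startup" means for it.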

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
---
 include/linux/irq.h |  2 ++
 kernel/irq/chip.c   | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1d6b606a81ef..890e1371f5d4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -669,6 +669,8 @@ extern int irq_chip_set_parent_state(struct irq_data *data,
 extern int irq_chip_get_parent_state(struct irq_data *data,
 				     enum irqchip_irq_state which,
 				     bool *state);
+extern void irq_chip_shutdown_parent(struct irq_data *data);
+extern unsigned int irq_chip_startup_parent(struct irq_data *data);
 extern void irq_chip_enable_parent(struct irq_data *data);
 extern void irq_chip_disable_parent(struct irq_data *data);
 extern void irq_chip_ack_parent(struct irq_data *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 0d0276378c70..3ffa0d80ddd1 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1259,6 +1259,43 @@ int irq_chip_get_parent_state(struct irq_data *data,
 }
 EXPORT_SYMBOL_GPL(irq_chip_get_parent_state);
 
+/**
+ * irq_chip_shutdown_parent - Shutdown the parent interrupt
+ * @data:	Pointer to interrupt specific data
+ *
+ * Invokes the irq_shutdown() callback of the parent if available or falls
+ * back to irq_chip_disable_parent().
+ */
+void irq_chip_shutdown_parent(struct irq_data *data)
+{
+	struct irq_data *parent = data->parent_data;
+
+	if (parent->chip->irq_shutdown)
+		parent->chip->irq_shutdown(parent);
+	else
+		irq_chip_disable_parent(data);
+}
+EXPORT_SYMBOL_GPL(irq_chip_shutdown_parent);
+
+/**
+ * irq_chip_startup_parent - Startup the parent interrupt
+ * @data:	Pointer to interrupt specific data
+ *
+ * Invokes the irq_startup() callback of the parent if available or falls
+ * back to irq_chip_enable_parent().
+ */
+unsigned int irq_chip_startup_parent(struct irq_data *data)
+{
+	struct irq_data *parent = data->parent_data;
+
+	if (parent->chip->irq_startup)
+		return parent->chip->irq_startup(parent);
+
+	irq_chip_enable_parent(data);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_chip_startup_parent);
+
 /**
  * irq_chip_enable_parent - Enable the parent interrupt (defaults to unmask if
  * NULL)
-- 
2.50.1



* [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent() Inochi Amaoto
@ 2025-08-13 23:28 ` Inochi Amaoto
  2025-08-20 20:54   ` Bjorn Helgaas
                     ` (3 more replies)
  2025-08-13 23:28 ` [PATCH v2 3/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-13 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Inochi Amaoto, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe, Chen Wang
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li

As the RISC-V PLIC cannot apply an affinity setting without calling
irq_enable(), the interrupt becomes unavailable when the PLIC is used
as the underlying IRQ chip for an MSI controller.

Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
these start up and shut down the parent as well, which allows the
IRQ on the parent chip to be enabled if the IRQ is not enabled
when allocating. This is necessary for MSI controllers which
use the PLIC as the underlying IRQ chip.
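
The flag-gated behaviour and the ordering of the calls can be sketched
as a userspace model. It only records the order of operations; the
demo_* names, the event enum, and the log buffer are hypothetical, and
the flag macro merely reuses the bit position that this patch assigns
to MSI_FLAG_PCI_MSI_STARTUP_PARENT.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit position as MSI_FLAG_PCI_MSI_STARTUP_PARENT in this patch. */
#define DEMO_FLAG_STARTUP_PARENT	(1u << 10)

#define DEMO_LOG_MAX			8

/* Hypothetical events, recorded in the order they would be issued. */
enum demo_event {
	EV_PARENT_STARTUP = 1,
	EV_DEVICE_UNMASK,
	EV_DEVICE_MASK,
	EV_PARENT_SHUTDOWN,
};

struct demo_irq {
	unsigned int domain_flags;	/* stands in for info->flags */
	enum demo_event log[DEMO_LOG_MAX];
	size_t nlog;
};

static void demo_log(struct demo_irq *irq, enum demo_event ev)
{
	assert(irq->nlog < DEMO_LOG_MAX);
	irq->log[irq->nlog++] = ev;
}

/* Startup: bring up the parent first (if requested), then unmask the device. */
void demo_irq_startup(struct demo_irq *irq)
{
	if (irq->domain_flags & DEMO_FLAG_STARTUP_PARENT)
		demo_log(irq, EV_PARENT_STARTUP);
	demo_log(irq, EV_DEVICE_UNMASK);
}

/* Shutdown: mask the device first, then shut down the parent (if requested). */
void demo_irq_shutdown(struct demo_irq *irq)
{
	demo_log(irq, EV_DEVICE_MASK);
	if (irq->domain_flags & DEMO_FLAG_STARTUP_PARENT)
		demo_log(irq, EV_PARENT_SHUTDOWN);
}
```

Without the flag, the model degenerates to a plain device unmask/mask,
which mirrors how the new callbacks leave the parent untouched for
controllers that do not set the flag.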

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
---
 drivers/pci/msi/irqdomain.c | 52 +++++++++++++++++++++++++++++++++++++
 include/linux/msi.h         |  2 ++
 2 files changed, 54 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index 0938ef7ebabf..e0a800f918e8 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -148,6 +148,23 @@ static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *d
 	arg->hwirq = desc->msi_index;
 }
 
+static void cond_shutdown_parent(struct irq_data *data)
+{
+	struct msi_domain_info *info = data->domain->host_data;
+
+	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
+		irq_chip_shutdown_parent(data);
+}
+
+static unsigned int cond_startup_parent(struct irq_data *data)
+{
+	struct msi_domain_info *info = data->domain->host_data;
+
+	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
+		return irq_chip_startup_parent(data);
+	return 0;
+}
+
 static __always_inline void cond_mask_parent(struct irq_data *data)
 {
 	struct msi_domain_info *info = data->domain->host_data;
@@ -164,6 +181,23 @@ static __always_inline void cond_unmask_parent(struct irq_data *data)
 		irq_chip_unmask_parent(data);
 }
 
+static void pci_irq_shutdown_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_mask(desc, BIT(data->irq - desc->irq));
+	cond_shutdown_parent(data);
+}
+
+static unsigned int pci_irq_startup_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	unsigned int ret = cond_startup_parent(data);
+
+	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
+	return ret;
+}
+
 static void pci_irq_mask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
@@ -194,6 +228,8 @@ static void pci_irq_unmask_msi(struct irq_data *data)
 static const struct msi_domain_template pci_msi_template = {
 	.chip = {
 		.name			= "PCI-MSI",
+		.irq_startup		= pci_irq_startup_msi,
+		.irq_shutdown		= pci_irq_shutdown_msi,
 		.irq_mask		= pci_irq_mask_msi,
 		.irq_unmask		= pci_irq_unmask_msi,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
@@ -210,6 +246,20 @@ static const struct msi_domain_template pci_msi_template = {
 	},
 };
 
+static void pci_irq_shutdown_msix(struct irq_data *data)
+{
+	pci_msix_mask(irq_data_get_msi_desc(data));
+	cond_shutdown_parent(data);
+}
+
+static unsigned int pci_irq_startup_msix(struct irq_data *data)
+{
+	unsigned int ret = cond_startup_parent(data);
+
+	pci_msix_unmask(irq_data_get_msi_desc(data));
+	return ret;
+}
+
 static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
@@ -234,6 +284,8 @@ EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
 static const struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
+		.irq_startup		= pci_irq_startup_msix,
+		.irq_shutdown		= pci_irq_shutdown_msix,
 		.irq_mask		= pci_irq_mask_msix,
 		.irq_unmask		= pci_irq_unmask_msix,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
diff --git a/include/linux/msi.h b/include/linux/msi.h
index e5e86a8529fb..3111ba95fbde 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -568,6 +568,8 @@ enum {
 	MSI_FLAG_PARENT_PM_DEV		= (1 << 8),
 	/* Support for parent mask/unmask */
 	MSI_FLAG_PCI_MSI_MASK_PARENT	= (1 << 9),
+	/* Support for parent startup/shutdown */
+	MSI_FLAG_PCI_MSI_STARTUP_PARENT	= (1 << 10),
 
 	/* Mask for the generic functionality */
 	MSI_GENERIC_FLAGS_MASK		= GENMASK(15, 0),
-- 
2.50.1



* [PATCH v2 3/4] irqchip/sg2042-msi: Fix broken affinity setting
  2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent() Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
@ 2025-08-13 23:28 ` Inochi Amaoto
  2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
  2025-08-13 23:28 ` [PATCH v2 4/4] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044 Inochi Amaoto
  2025-08-21  6:38 ` [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Chen Wang
  4 siblings, 1 reply; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-13 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Inochi Amaoto, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe, Chen Wang
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li, Han Gao

When using NVMe on SG2044, the NVMe driver always complains "I/O tag XXX
(XXX) QID XX timeout, completion polled", which is caused by the
broken handler of the sg2042-msi driver.

As the PLIC driver can only set affinity when enabling an interrupt,
the sg2042-msi driver did not handle affinity setting properly and
enabled the irq in an unexpected execution path.

Since the PCI template domain now supports irq_startup()/irq_shutdown(),
set irq_chip_[startup/shutdown]_parent() as the irq_startup() and
irq_shutdown() callbacks, so that the irq can be started properly.

Fixes: e96b93a97c90 ("irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller")
Reported-by: Han Gao <rabenda.cn@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
---
 drivers/irqchip/irq-sg2042-msi.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-sg2042-msi.c b/drivers/irqchip/irq-sg2042-msi.c
index bcfddc51bc6a..2fd4d94f9bd7 100644
--- a/drivers/irqchip/irq-sg2042-msi.c
+++ b/drivers/irqchip/irq-sg2042-msi.c
@@ -85,6 +85,8 @@ static void sg2042_msi_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *m
 
 static const struct irq_chip sg2042_msi_middle_irq_chip = {
 	.name			= "SG2042 MSI",
+	.irq_startup		= irq_chip_startup_parent,
+	.irq_shutdown		= irq_chip_shutdown_parent,
 	.irq_ack		= sg2042_msi_irq_ack,
 	.irq_mask		= irq_chip_mask_parent,
 	.irq_unmask		= irq_chip_unmask_parent,
@@ -114,6 +116,8 @@ static void sg2044_msi_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *m
 
 static struct irq_chip sg2044_msi_middle_irq_chip = {
 	.name			= "SG2044 MSI",
+	.irq_startup		= irq_chip_startup_parent,
+	.irq_shutdown		= irq_chip_shutdown_parent,
 	.irq_ack		= sg2044_msi_irq_ack,
 	.irq_mask		= irq_chip_mask_parent,
 	.irq_unmask		= irq_chip_unmask_parent,
@@ -185,8 +189,10 @@ static const struct irq_domain_ops sg204x_msi_middle_domain_ops = {
 	.select	= msi_lib_irq_domain_select,
 };
 
-#define SG2042_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |	\
-				   MSI_FLAG_USE_DEF_CHIP_OPS)
+#define SG2042_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |		\
+				   MSI_FLAG_USE_DEF_CHIP_OPS |		\
+				   MSI_FLAG_PCI_MSI_MASK_PARENT |	\
+				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
 #define SG2042_MSI_FLAGS_SUPPORTED MSI_GENERIC_FLAGS_MASK
 
@@ -200,10 +206,12 @@ static const struct msi_parent_ops sg2042_msi_parent_ops = {
 	.init_dev_msi_info	= msi_lib_init_dev_msi_info,
 };
 
-#define SG2044_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |	\
-				   MSI_FLAG_USE_DEF_CHIP_OPS)
+#define SG2044_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |		\
+				   MSI_FLAG_USE_DEF_CHIP_OPS |		\
+				   MSI_FLAG_PCI_MSI_MASK_PARENT |	\
+				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
-#define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |	\
+#define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |		\
 				    MSI_FLAG_PCI_MSIX)
 
 static const struct msi_parent_ops sg2044_msi_parent_ops = {
-- 
2.50.1



* [PATCH v2 4/4] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044
  2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
                   ` (2 preceding siblings ...)
  2025-08-13 23:28 ` [PATCH v2 3/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
@ 2025-08-13 23:28 ` Inochi Amaoto
  2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
  2025-08-21  6:38 ` [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Chen Wang
  4 siblings, 1 reply; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-13 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Inochi Amaoto, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe, Chen Wang
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li

The MSI controller on SG2044 has the ability to allocate multiple
PCI MSI interrupts, so the PCIe controller driver can use this
feature if it also supports multiple PCI MSI interrupts.

Add the MSI_FLAG_MULTI_PCI_MSI flag to the supported_flags of the
SG2044 msi_parent_ops so that the PCIe controller driver can use
this feature if it supports it as well.
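
The "both sides must support it" rule can be sketched as a simple mask
operation. This is a simplified userspace model and an assumption about
the shape of the negotiation, not a copy of the kernel code: the demo_*
names are hypothetical, and the bit positions chosen for the multi-MSI
and MSI-X flags are illustrative only.

```c
#include <assert.h>

/* Illustrative bit values; only the generic mask width follows the patch. */
#define DEMO_GENERIC_FLAGS_MASK		0xffffu		/* GENMASK(15, 0) */
#define DEMO_FLAG_MULTI_PCI_MSI		(1u << 16)	/* assumed position */
#define DEMO_FLAG_PCI_MSIX		(1u << 17)	/* assumed position */

/* Supported masks as described for the two controllers in this series. */
#define DEMO_SG2042_SUPPORTED		DEMO_GENERIC_FLAGS_MASK
#define DEMO_SG2044_SUPPORTED		(DEMO_GENERIC_FLAGS_MASK |	\
					 DEMO_FLAG_MULTI_PCI_MSI |	\
					 DEMO_FLAG_PCI_MSIX)

/*
 * Keep only the requested flags that the parent supports, then add the
 * flags that the parent requires unconditionally.
 */
unsigned int demo_negotiate(unsigned int requested, unsigned int supported,
			    unsigned int required)
{
	return (requested & supported) | required;
}

/* Returns non-zero if the single-bit @flag survived the negotiation. */
int demo_has_flag(unsigned int flags, unsigned int flag)
{
	assert(flag && !(flag & (flag - 1)));	/* exactly one bit set */
	return (flags & flag) != 0;
}
```

A requester asking for multi-MSI keeps the flag only when the parent
mask contains it, and a requester that never asks for it does not gain
it just because the parent advertises support.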

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
---
 drivers/irqchip/irq-sg2042-msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/irq-sg2042-msi.c b/drivers/irqchip/irq-sg2042-msi.c
index 2fd4d94f9bd7..3b13dbbfdb51 100644
--- a/drivers/irqchip/irq-sg2042-msi.c
+++ b/drivers/irqchip/irq-sg2042-msi.c
@@ -212,6 +212,7 @@ static const struct msi_parent_ops sg2042_msi_parent_ops = {
 				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
 #define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |		\
+				    MSI_FLAG_MULTI_PCI_MSI |		\
 				    MSI_FLAG_PCI_MSIX)
 
 static const struct msi_parent_ops sg2044_msi_parent_ops = {
-- 
2.50.1



* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
@ 2025-08-20 20:54   ` Bjorn Helgaas
  2025-08-23 19:08     ` Thomas Gleixner
  2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2025-08-20 20:54 UTC (permalink / raw)
  To: Inochi Amaoto
  Cc: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li

On Thu, Aug 14, 2025 at 07:28:32AM +0800, Inochi Amaoto wrote:
> As the RISC-V PLIC can not apply affinity setting without calling
> irq_enable(), it will make the interrupt unavailble when using as
> an underlying IRQ chip for MSI controller.

s/unavailble/unavailable/ (mentioned previously)

> Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
> MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
> these startup and shutdown the parent as well, which allows the
> irq on the parent chip to be enabled if the irq is not enabled
> when allocating. This is necessary for the MSI controllers which
> use PLIC as underlying IRQ chip.

s/irq/IRQ/ a couple times above

> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Inochi Amaoto <inochiama@gmail.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thomas, I assume you'll merge this series; let me know if not.

> ---
>  drivers/pci/msi/irqdomain.c | 52 +++++++++++++++++++++++++++++++++++++
>  include/linux/msi.h         |  2 ++
>  2 files changed, 54 insertions(+)
> 
> diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> index 0938ef7ebabf..e0a800f918e8 100644
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -148,6 +148,23 @@ static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *d
>  	arg->hwirq = desc->msi_index;
>  }
>  
> +static void cond_shutdown_parent(struct irq_data *data)
> +{
> +	struct msi_domain_info *info = data->domain->host_data;
> +
> +	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
> +		irq_chip_shutdown_parent(data);
> +}
> +
> +static unsigned int cond_startup_parent(struct irq_data *data)
> +{
> +	struct msi_domain_info *info = data->domain->host_data;
> +
> +	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
> +		return irq_chip_startup_parent(data);
> +	return 0;
> +}
> +
>  static __always_inline void cond_mask_parent(struct irq_data *data)
>  {
>  	struct msi_domain_info *info = data->domain->host_data;
> @@ -164,6 +181,23 @@ static __always_inline void cond_unmask_parent(struct irq_data *data)
>  		irq_chip_unmask_parent(data);
>  }
>  
> +static void pci_irq_shutdown_msi(struct irq_data *data)
> +{
> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
> +
> +	pci_msi_mask(desc, BIT(data->irq - desc->irq));
> +	cond_shutdown_parent(data);
> +}
> +
> +static unsigned int pci_irq_startup_msi(struct irq_data *data)
> +{
> +	struct msi_desc *desc = irq_data_get_msi_desc(data);
> +	unsigned int ret = cond_startup_parent(data);
> +
> +	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
> +	return ret;
> +}
> +
>  static void pci_irq_mask_msi(struct irq_data *data)
>  {
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
> @@ -194,6 +228,8 @@ static void pci_irq_unmask_msi(struct irq_data *data)
>  static const struct msi_domain_template pci_msi_template = {
>  	.chip = {
>  		.name			= "PCI-MSI",
> +		.irq_startup		= pci_irq_startup_msi,
> +		.irq_shutdown		= pci_irq_shutdown_msi,
>  		.irq_mask		= pci_irq_mask_msi,
>  		.irq_unmask		= pci_irq_unmask_msi,
>  		.irq_write_msi_msg	= pci_msi_domain_write_msg,
> @@ -210,6 +246,20 @@ static const struct msi_domain_template pci_msi_template = {
>  	},
>  };
>  
> +static void pci_irq_shutdown_msix(struct irq_data *data)
> +{
> +	pci_msix_mask(irq_data_get_msi_desc(data));
> +	cond_shutdown_parent(data);
> +}
> +
> +static unsigned int pci_irq_startup_msix(struct irq_data *data)
> +{
> +	unsigned int ret = cond_startup_parent(data);
> +
> +	pci_msix_unmask(irq_data_get_msi_desc(data));
> +	return ret;
> +}
> +
>  static void pci_irq_mask_msix(struct irq_data *data)
>  {
>  	pci_msix_mask(irq_data_get_msi_desc(data));
> @@ -234,6 +284,8 @@ EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
>  static const struct msi_domain_template pci_msix_template = {
>  	.chip = {
>  		.name			= "PCI-MSIX",
> +		.irq_startup		= pci_irq_startup_msix,
> +		.irq_shutdown		= pci_irq_shutdown_msix,
>  		.irq_mask		= pci_irq_mask_msix,
>  		.irq_unmask		= pci_irq_unmask_msix,
>  		.irq_write_msi_msg	= pci_msi_domain_write_msg,
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index e5e86a8529fb..3111ba95fbde 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -568,6 +568,8 @@ enum {
>  	MSI_FLAG_PARENT_PM_DEV		= (1 << 8),
>  	/* Support for parent mask/unmask */
>  	MSI_FLAG_PCI_MSI_MASK_PARENT	= (1 << 9),
> +	/* Support for parent startup/shutdown */
> +	MSI_FLAG_PCI_MSI_STARTUP_PARENT	= (1 << 10),
>  
>  	/* Mask for the generic functionality */
>  	MSI_GENERIC_FLAGS_MASK		= GENMASK(15, 0),
> -- 
> 2.50.1
> 


* Re: [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting
  2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
                   ` (3 preceding siblings ...)
  2025-08-13 23:28 ` [PATCH v2 4/4] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044 Inochi Amaoto
@ 2025-08-21  6:38 ` Chen Wang
  4 siblings, 0 replies; 24+ messages in thread
From: Chen Wang @ 2025-08-21  6:38 UTC (permalink / raw)
  To: Inochi Amaoto, Thomas Gleixner, Bjorn Helgaas, Marc Zyngier,
	Lorenzo Pieralisi, Shradha Gupta, Haiyang Zhang, Jonathan Cameron,
	Juergen Gross, Nicolin Chen, Jason Gunthorpe
  Cc: linux-kernel, linux-pci, Yixun Lan, Longbin Li


On 8/14/2025 7:28 AM, Inochi Amaoto wrote:
> When using NVME on SG2044, the NVME always complains "I/O tag XXX
> (XXX) QID XX timeout, completion polled", which is caused by the
> broken handler of the sg2042-msi driver.
>
> As PLIC driver can only setting affinity when enabling, the sg2042-msi
> does not properly handled affinity setting previously and enable irq in
> an unexpected executing path.
>
> Add irq_startup/irq_shutdown support to the PCI template domain,
> then set irq_chip_[startup/shutdown]_parent for irq_startup/
> irq_shutdown of the sg2042-msi driver. So the irq can be started
> properly.
>
> Change from v1:
> 1. patch 1: Fix comment format problem, and structure the changelog.
> 2. patch 2: Improve the comment title and body, add describtion about
>              the fact the PLIC is used as parent chip.
> 3. patch 2: Remove __always_inline for cond_[shutdown/startup]_parent().
> 4. patch 3: Update the align of the "XXX_MSI_FLAGS_XXX" macro.
> 5. patch 4: Claim the fact that the added flag is used by the negotiation
>              of MSI controller driver and PCIe device driver, and can be
> 	    only used when both of them support this flag.
>
> Inochi Amaoto (4):
>    genirq: Add irq_chip_(startup/shutdown)_parent()
>    PCI/MSI: Add startup/shutdown for per device domains
>    irqchip/sg2042-msi: Fix broken affinity setting
>    irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044
>
>   drivers/irqchip/irq-sg2042-msi.c | 19 +++++++++---
>   drivers/pci/msi/irqdomain.c      | 52 ++++++++++++++++++++++++++++++++
>   include/linux/irq.h              |  2 ++
>   include/linux/msi.h              |  2 ++
>   kernel/irq/chip.c                | 37 +++++++++++++++++++++++
>   5 files changed, 107 insertions(+), 5 deletions(-)
>
> --
> 2.50.1
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>

It is recommended to add the following email link to the relevant
discussion in the commit message for quick reference when there is a
revision or merge:

https://lore.kernel.org/lkml/20250722224513.22125-1-inochiama@gmail.com/

Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox


* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-20 20:54   ` Bjorn Helgaas
@ 2025-08-23 19:08     ` Thomas Gleixner
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2025-08-23 19:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Inochi Amaoto
  Cc: Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi, Shradha Gupta,
	Haiyang Zhang, Jonathan Cameron, Juergen Gross, Nicolin Chen,
	Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci, Yixun Lan,
	Longbin Li

On Wed, Aug 20 2025 at 15:54, Bjorn Helgaas wrote:
> On Thu, Aug 14, 2025 at 07:28:32AM +0800, Inochi Amaoto wrote:
>> As the RISC-V PLIC can not apply affinity setting without calling
>> irq_enable(), it will make the interrupt unavailble when using as
>> an underlying IRQ chip for MSI controller.
>
> s/unavailble/unavailable/ (mentioned previously)
>
>> Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
>> MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
>> these startup and shutdown the parent as well, which allows the
>> irq on the parent chip to be enabled if the irq is not enabled
>> when allocating. This is necessary for the MSI controllers which
>> use PLIC as underlying IRQ chip.
>
> s/irq/IRQ/ a couple times above
>
>> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Thomas, I assume you'll merge this series; let me know if not.

I'll pick it up and fixup the wording as I go.


* [tip: irq/core] genirq: Add irq_chip_(startup/shutdown)_parent()
  2025-08-13 23:28 ` [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent() Inochi Amaoto
@ 2025-08-23 19:28   ` tip-bot2 for Inochi Amaoto
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Inochi Amaoto @ 2025-08-23 19:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Inochi Amaoto, Chen Wang, x86, linux-kernel, maz

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     7a721a2fee2bce01af26699a87739db8ca8ea3c8
Gitweb:        https://git.kernel.org/tip/7a721a2fee2bce01af26699a87739db8ca8ea3c8
Author:        Inochi Amaoto <inochiama@gmail.com>
AuthorDate:    Thu, 14 Aug 2025 07:28:31 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 23 Aug 2025 21:20:25 +02:00

genirq: Add irq_chip_(startup/shutdown)_parent()

As the MSI controller on SG2044 uses PLIC as the underlying interrupt
controller, it needs to call irq_enable() and irq_disable() to
start up/shut down interrupts. Otherwise, the MSI interrupt cannot be
started up correctly and will not respond to any incoming interrupt.

Introduce irq_chip_startup_parent() and irq_chip_shutdown_parent() to allow
the interrupt controller to call the irq_startup()/irq_shutdown() callbacks
of the parent interrupt chip.

In case the irq_startup()/irq_shutdown() callbacks are not implemented for
the parent interrupt chip, this will fall back to irq_chip_enable_parent()
or irq_chip_disable_parent().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250813232835.43458-2-inochiama@gmail.com
Link: https://lore.kernel.org/lkml/20250722224513.22125-1-inochiama@gmail.com/
---
 include/linux/irq.h |  2 ++
 kernel/irq/chip.c   | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index c9bcdbf..c67e76f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -669,6 +669,8 @@ extern int irq_chip_set_parent_state(struct irq_data *data,
 extern int irq_chip_get_parent_state(struct irq_data *data,
 				     enum irqchip_irq_state which,
 				     bool *state);
+extern void irq_chip_shutdown_parent(struct irq_data *data);
+extern unsigned int irq_chip_startup_parent(struct irq_data *data);
 extern void irq_chip_enable_parent(struct irq_data *data);
 extern void irq_chip_disable_parent(struct irq_data *data);
 extern void irq_chip_ack_parent(struct irq_data *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 0d02763..3ffa0d8 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1260,6 +1260,43 @@ int irq_chip_get_parent_state(struct irq_data *data,
 EXPORT_SYMBOL_GPL(irq_chip_get_parent_state);
 
 /**
+ * irq_chip_shutdown_parent - Shutdown the parent interrupt
+ * @data:	Pointer to interrupt specific data
+ *
+ * Invokes the irq_shutdown() callback of the parent if available or falls
+ * back to irq_chip_disable_parent().
+ */
+void irq_chip_shutdown_parent(struct irq_data *data)
+{
+	struct irq_data *parent = data->parent_data;
+
+	if (parent->chip->irq_shutdown)
+		parent->chip->irq_shutdown(parent);
+	else
+		irq_chip_disable_parent(data);
+}
+EXPORT_SYMBOL_GPL(irq_chip_shutdown_parent);
+
+/**
+ * irq_chip_startup_parent - Startup the parent interrupt
+ * @data:	Pointer to interrupt specific data
+ *
+ * Invokes the irq_startup() callback of the parent if available or falls
+ * back to irq_chip_enable_parent().
+ */
+unsigned int irq_chip_startup_parent(struct irq_data *data)
+{
+	struct irq_data *parent = data->parent_data;
+
+	if (parent->chip->irq_startup)
+		return parent->chip->irq_startup(parent);
+
+	irq_chip_enable_parent(data);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_chip_startup_parent);
+
+/**
  * irq_chip_enable_parent - Enable the parent interrupt (defaults to unmask if
  * NULL)
  * @data:	Pointer to interrupt specific data


* [tip: irq/drivers] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044
  2025-08-13 23:28 ` [PATCH v2 4/4] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044 Inochi Amaoto
@ 2025-08-23 19:30   ` tip-bot2 for Inochi Amaoto
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Inochi Amaoto @ 2025-08-23 19:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Inochi Amaoto, Thomas Gleixner, Chen Wang, x86, linux-kernel

The following commit has been merged into the irq/drivers branch of tip:

Commit-ID:     7ee4a5a2ec3748facfb4ca96e4cce6cabbdecab2
Gitweb:        https://git.kernel.org/tip/7ee4a5a2ec3748facfb4ca96e4cce6cabbdecab2
Author:        Inochi Amaoto <inochiama@gmail.com>
AuthorDate:    Thu, 14 Aug 2025 07:28:34 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 23 Aug 2025 21:21:13 +02:00

irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044

The MSI controller on SG2044 has the ability to allocate multiple PCI MSI
interrupts. So the PCIe controller driver can use this feature if the
hardware supports multiple PCI MSI interrupts.

Add the MSI_FLAG_MULTI_PCI_MSI flag to the supported_flags of SG2044
msi_parent_ops to enable this functionality.

Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250813232835.43458-5-inochiama@gmail.com

---
 drivers/irqchip/irq-sg2042-msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/irq-sg2042-msi.c b/drivers/irqchip/irq-sg2042-msi.c
index 2fd4d94..3b13dbb 100644
--- a/drivers/irqchip/irq-sg2042-msi.c
+++ b/drivers/irqchip/irq-sg2042-msi.c
@@ -212,6 +212,7 @@ static const struct msi_parent_ops sg2042_msi_parent_ops = {
 				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
 #define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |		\
+				    MSI_FLAG_MULTI_PCI_MSI |		\
 				    MSI_FLAG_PCI_MSIX)
 
 static const struct msi_parent_ops sg2044_msi_parent_ops = {

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip: irq/drivers] irqchip/sg2042-msi: Fix broken affinity setting
  2025-08-13 23:28 ` [PATCH v2 3/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
@ 2025-08-23 19:30   ` tip-bot2 for Inochi Amaoto
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Inochi Amaoto @ 2025-08-23 19:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Han Gao, Thomas Gleixner, Inochi Amaoto, Chen Wang, x86,
	linux-kernel

The following commit has been merged into the irq/drivers branch of tip:

Commit-ID:     9d8c41816bac518b4824f83b346ae30a1be83f68
Gitweb:        https://git.kernel.org/tip/9d8c41816bac518b4824f83b346ae30a1be83f68
Author:        Inochi Amaoto <inochiama@gmail.com>
AuthorDate:    Thu, 14 Aug 2025 07:28:33 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 23 Aug 2025 21:21:13 +02:00

irqchip/sg2042-msi: Fix broken affinity setting

When using NVMe on SG2044, the NVMe driver always complains about "I/O tag
XXX (XXX) QID XX timeout, completion polled", which is caused by the broken
affinity setting mechanism of the sg2042-msi driver.

The PLIC driver can only set the affinity when the interrupt is enabled, but
the sg2042-msi driver invokes the affinity setter in disabled state, which
causes the change to be lost.

Cure this by implementing the irq_startup()/irq_shutdown() callbacks, which
allow the underlying PLIC interrupt to be started up (enabled) first.

Fixes: e96b93a97c90 ("irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller")
Reported-by: Han Gao <rabenda.cn@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://lore.kernel.org/all/20250813232835.43458-4-inochiama@gmail.com

---
 drivers/irqchip/irq-sg2042-msi.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-sg2042-msi.c b/drivers/irqchip/irq-sg2042-msi.c
index bcfddc5..2fd4d94 100644
--- a/drivers/irqchip/irq-sg2042-msi.c
+++ b/drivers/irqchip/irq-sg2042-msi.c
@@ -85,6 +85,8 @@ static void sg2042_msi_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *m
 
 static const struct irq_chip sg2042_msi_middle_irq_chip = {
 	.name			= "SG2042 MSI",
+	.irq_startup		= irq_chip_startup_parent,
+	.irq_shutdown		= irq_chip_shutdown_parent,
 	.irq_ack		= sg2042_msi_irq_ack,
 	.irq_mask		= irq_chip_mask_parent,
 	.irq_unmask		= irq_chip_unmask_parent,
@@ -114,6 +116,8 @@ static void sg2044_msi_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *m
 
 static struct irq_chip sg2044_msi_middle_irq_chip = {
 	.name			= "SG2044 MSI",
+	.irq_startup		= irq_chip_startup_parent,
+	.irq_shutdown		= irq_chip_shutdown_parent,
 	.irq_ack		= sg2044_msi_irq_ack,
 	.irq_mask		= irq_chip_mask_parent,
 	.irq_unmask		= irq_chip_unmask_parent,
@@ -185,8 +189,10 @@ static const struct irq_domain_ops sg204x_msi_middle_domain_ops = {
 	.select	= msi_lib_irq_domain_select,
 };
 
-#define SG2042_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |	\
-				   MSI_FLAG_USE_DEF_CHIP_OPS)
+#define SG2042_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |		\
+				   MSI_FLAG_USE_DEF_CHIP_OPS |		\
+				   MSI_FLAG_PCI_MSI_MASK_PARENT |	\
+				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
 #define SG2042_MSI_FLAGS_SUPPORTED MSI_GENERIC_FLAGS_MASK
 
@@ -200,10 +206,12 @@ static const struct msi_parent_ops sg2042_msi_parent_ops = {
 	.init_dev_msi_info	= msi_lib_init_dev_msi_info,
 };
 
-#define SG2044_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |	\
-				   MSI_FLAG_USE_DEF_CHIP_OPS)
+#define SG2044_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS |		\
+				   MSI_FLAG_USE_DEF_CHIP_OPS |		\
+				   MSI_FLAG_PCI_MSI_MASK_PARENT |	\
+				   MSI_FLAG_PCI_MSI_STARTUP_PARENT)
 
-#define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |	\
+#define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |		\
 				    MSI_FLAG_PCI_MSIX)
 
 static const struct msi_parent_ops sg2044_msi_parent_ops = {

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [tip: irq/drivers] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
  2025-08-20 20:54   ` Bjorn Helgaas
@ 2025-08-23 19:30   ` tip-bot2 for Inochi Amaoto
  2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
  2025-08-27  9:39   ` Wei Fang
  3 siblings, 0 replies; 24+ messages in thread
From: tip-bot2 for Inochi Amaoto @ 2025-08-23 19:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Inochi Amaoto, Chen Wang, Bjorn Helgaas, x86,
	linux-kernel

The following commit has been merged into the irq/drivers branch of tip:

Commit-ID:     54f45a30c0d0153d2be091ba2d683ab6db6d1d5b
Gitweb:        https://git.kernel.org/tip/54f45a30c0d0153d2be091ba2d683ab6db6d1d5b
Author:        Inochi Amaoto <inochiama@gmail.com>
AuthorDate:    Thu, 14 Aug 2025 07:28:32 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 23 Aug 2025 21:21:13 +02:00

PCI/MSI: Add startup/shutdown for per device domains

As the RISC-V PLIC cannot apply affinity settings without invoking
irq_enable(), the interrupt becomes unavailable when the PLIC is used as
the underlying interrupt chip for the MSI controller.

Implement the irq_startup() and irq_shutdown() callbacks for the PCI MSI
and MSI-X templates.

For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT, the parent startup
and shutdown functions are invoked. That allows the interrupt on the parent
chip to be enabled if the interrupt has not been enabled during
allocation. This is necessary for MSI controllers which use PLIC as
underlying parent interrupt chip.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250813232835.43458-3-inochiama@gmail.com

---
 drivers/pci/msi/irqdomain.c | 52 +++++++++++++++++++++++++++++++++++++
 include/linux/msi.h         |  2 ++
 2 files changed, 54 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index 0938ef7..e0a800f 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -148,6 +148,23 @@ static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *d
 	arg->hwirq = desc->msi_index;
 }
 
+static void cond_shutdown_parent(struct irq_data *data)
+{
+	struct msi_domain_info *info = data->domain->host_data;
+
+	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
+		irq_chip_shutdown_parent(data);
+}
+
+static unsigned int cond_startup_parent(struct irq_data *data)
+{
+	struct msi_domain_info *info = data->domain->host_data;
+
+	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
+		return irq_chip_startup_parent(data);
+	return 0;
+}
+
 static __always_inline void cond_mask_parent(struct irq_data *data)
 {
 	struct msi_domain_info *info = data->domain->host_data;
@@ -164,6 +181,23 @@ static __always_inline void cond_unmask_parent(struct irq_data *data)
 		irq_chip_unmask_parent(data);
 }
 
+static void pci_irq_shutdown_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_mask(desc, BIT(data->irq - desc->irq));
+	cond_shutdown_parent(data);
+}
+
+static unsigned int pci_irq_startup_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	unsigned int ret = cond_startup_parent(data);
+
+	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
+	return ret;
+}
+
 static void pci_irq_mask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
@@ -194,6 +228,8 @@ static void pci_irq_unmask_msi(struct irq_data *data)
 static const struct msi_domain_template pci_msi_template = {
 	.chip = {
 		.name			= "PCI-MSI",
+		.irq_startup		= pci_irq_startup_msi,
+		.irq_shutdown		= pci_irq_shutdown_msi,
 		.irq_mask		= pci_irq_mask_msi,
 		.irq_unmask		= pci_irq_unmask_msi,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
@@ -210,6 +246,20 @@ static const struct msi_domain_template pci_msi_template = {
 	},
 };
 
+static void pci_irq_shutdown_msix(struct irq_data *data)
+{
+	pci_msix_mask(irq_data_get_msi_desc(data));
+	cond_shutdown_parent(data);
+}
+
+static unsigned int pci_irq_startup_msix(struct irq_data *data)
+{
+	unsigned int ret = cond_startup_parent(data);
+
+	pci_msix_unmask(irq_data_get_msi_desc(data));
+	return ret;
+}
+
 static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
@@ -234,6 +284,8 @@ EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
 static const struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
+		.irq_startup		= pci_irq_startup_msix,
+		.irq_shutdown		= pci_irq_shutdown_msix,
 		.irq_mask		= pci_irq_mask_msix,
 		.irq_unmask		= pci_irq_unmask_msix,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
diff --git a/include/linux/msi.h b/include/linux/msi.h
index e5e86a8..3111ba9 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -568,6 +568,8 @@ enum {
 	MSI_FLAG_PARENT_PM_DEV		= (1 << 8),
 	/* Support for parent mask/unmask */
 	MSI_FLAG_PCI_MSI_MASK_PARENT	= (1 << 9),
+	/* Support for parent startup/shutdown */
+	MSI_FLAG_PCI_MSI_STARTUP_PARENT	= (1 << 10),
 
 	/* Mask for the generic functionality */
 	MSI_GENERIC_FLAGS_MASK		= GENMASK(15, 0),
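
The new flag takes bit 10, which lies inside the generic range
GENMASK(15, 0). A supported-flags mask defined as MSI_GENERIC_FLAGS_MASK,
as in the sg2042-msi patch of this series, therefore already covers the new
bit numerically. The following is a minimal user-space sketch of that
arithmetic only. The MODEL_* macros and model_flag_is_generic() are
hypothetical names, and the macros are 32-bit re-implementations rather
than the kernel's BIT()/GENMASK().

```c
#include <assert.h>
#include <stdint.h>

/* User-space equivalents of BIT()/GENMASK() for 32-bit values (model only). */
#define MODEL_BIT(n)		(UINT32_C(1) << (n))
#define MODEL_GENMASK(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* Bit positions copied from the include/linux/msi.h hunk above. */
#define MODEL_MASK_PARENT	MODEL_BIT(9)
#define MODEL_STARTUP_PARENT	MODEL_BIT(10)
#define MODEL_GENERIC_MASK	MODEL_GENMASK(15, 0)

/* Returns 1 if every bit of @flag lies inside the generic flag range. */
static int model_flag_is_generic(uint32_t flag)
{
	assert(flag != 0);
	return (flag & MODEL_GENERIC_MASK) == flag;
}
```

A flag at bit 16 or above would fall outside the generic range and would
not be covered by the generic mask alone.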

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
  2025-08-20 20:54   ` Bjorn Helgaas
  2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
@ 2025-08-26 19:45   ` Anders Roxell
  2025-08-26 22:09     ` Nathan Chancellor
                       ` (2 more replies)
  2025-08-27  9:39   ` Wei Fang
  3 siblings, 3 replies; 24+ messages in thread
From: Anders Roxell @ 2025-08-26 19:45 UTC (permalink / raw)
  To: Inochi Amaoto, regressions, linux-next
  Cc: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On 2025-08-14 07:28, Inochi Amaoto wrote:
> As the RISC-V PLIC can not apply affinity setting without calling
> irq_enable(), it will make the interrupt unavailble when using as
> an underlying IRQ chip for MSI controller.
> 
> Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
> MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
> these startup and shutdown the parent as well, which allows the
> irq on the parent chip to be enabled if the irq is not enabled
> when allocating. This is necessary for the MSI controllers which
> use PLIC as underlying IRQ chip.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Inochi Amaoto <inochiama@gmail.com>

Regressions were found while booting Linux next-20250826 on qemu-arm64 and
qemu-armv7, as shown in the following kernel log.

Bisection identified this commit as the cause of the regression.

Regression Analysis:
- New regression? Yes
- Reproducible? Yes

First seen on the next-20250826
Good: next-20250825
Bad: next-20250826

Test regression: next-20250826 gcc-13 boot failed on qemu-arm64 and
qemu-armv7.

Expected behavior: System should boot normally and virtio block devices
should be detected and initialized immediately.

Actual behavior: System hangs for ~30 seconds during virtio block device
initialization before showing scheduler deadline replenish errors and
failing to complete boot.

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

[...]
<6>[    1.369038] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
<6>[    1.420097] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
<6>[    1.450858] msm_serial: driver initialized
<6>[    1.454489] SuperH (H)SCI(F) driver initialized
<6>[    1.456056] STM32 USART driver initialized
<6>[    1.513325] loop: module loaded
<6>[    1.515744] virtio_blk virtio0: 2/0/0 default/read/poll queues
<5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical blocks (2.76 GB/2.57 GiB)
<4>[   29.761219] sched: DL replenish lagged too much
[here it hangs]


Reverting this commit restores normal boot behavior.


qemu-arm64
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/log

qemu-armv7
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663615/suite/boot/test/gcc-13-lkftconfig/log

## Source
* Git tree:
* https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: d0630b758e593506126e8eda6c3d56097d1847c5
* Git describe: next-20250826
* Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826
* Architectures: arm64
* Toolchains: gcc-13
* Kconfigs: gcc-13-lkftconfig


## Build
* Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/history/
* Test link: https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/31oo1cMOi0uSNKYApi80iQahbLi
* Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/
* Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/config

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
@ 2025-08-26 22:09     ` Nathan Chancellor
  2025-08-27 10:33       ` Mark Brown
  2025-08-26 22:33     ` Inochi Amaoto
  2025-08-27  9:44     ` Inochi Amaoto
  2 siblings, 1 reply; 24+ messages in thread
From: Nathan Chancellor @ 2025-08-26 22:09 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Inochi Amaoto, regressions, linux-next, Thomas Gleixner,
	Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi, Shradha Gupta,
	Haiyang Zhang, Jonathan Cameron, Juergen Gross, Nicolin Chen,
	Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci, Yixun Lan,
	Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On Tue, Aug 26, 2025 at 09:45:48PM +0200, Anders Roxell wrote:
> Regressions found while booting the Linux next-20250826 on the
> qemu-arm64, qemu-armv7 due to following kernel log.
> 
> Bisection identified this commit as the cause of the regression.
> 
> Regression Analysis:
> - New regression? Yes
> - Reproducible? Yes
> 
> First seen on the next-20250826
> Good: next-20250825
> Bad: next-20250826
> 
> Test regression: next-20250826 gcc-13 boot failed on qemu-arm64 and
> qemu-armv7.
> 
> Expected behavior: System should boot normally and virtio block devices
> should be detected and initialized immediately.
> 
> Actual behavior: System hangs for ~30 seconds during virtio block device
> initialization before showing scheduler deadline replenish errors and
> failing to complete boot.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> [...]
> <6>[    1.369038] virtio-pci 0000:00:01.0: enabling device (0000 ->
> 0003)
> <6>[    1.420097] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> enabled
> <6>[    1.450858] msm_serial: driver initialized
> <6>[    1.454489] SuperH (H)SCI(F) driver initialized
> <6>[    1.456056] STM32 USART driver initialized
> <6>[    1.513325] loop: module loaded
> <6>[    1.515744] virtio_blk virtio0: 2/0/0 default/read/poll queues
> <5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical
> blocks (2.76 GB/2.57 GiB)
> <4>[   29.761219] sched: DL replenish lagged too much
> [here it hangs]

FWIW, I am also seeing this on real arm64 hardware (an LX2160A board and
an Ampere Altra one), but with my NVMe drives failing to be recognized.
Somewhat ironically, I am seeing the message from the cover letter
repeating.

  nvme nvme0: I/O tag 8 (1008) QID 0 timeout, completion polled
  [  125.810062] dracut-initqueue[640]: Timed out while waiting for udev queue to empty.
  nvme nvme0: I/O tag 9 (1009) QID 0 timeout, completion polled

I am happy to test patches or provide information.

Cheers,
Nathan

# bad: [d0630b758e593506126e8eda6c3d56097d1847c5] Add linux-next specific files for 20250826
# good: [b6add54ba61890450fa54fd9327d10fdfd653439] Merge tag 'pinctrl-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect start 'd0630b758e593506126e8eda6c3d56097d1847c5' 'b6add54ba61890450fa54fd9327d10fdfd653439'
# good: [968d16786392f6e047329f5eff66acc131636019] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
git bisect good 968d16786392f6e047329f5eff66acc131636019
# good: [042e9f528d5362c499b5d8e2716cf6f64ca53add] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect good 042e9f528d5362c499b5d8e2716cf6f64ca53add
# bad: [beebb75399dc36e7c244db0a08426053b4581ecc] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git
git bisect bad beebb75399dc36e7c244db0a08426053b4581ecc
# good: [62df8fb299358a45a915381de09025cf5e6a4a8f] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
git bisect good 62df8fb299358a45a915381de09025cf5e6a4a8f
# bad: [1e6d2dcb13c8d94b56de1eff60235ca90587046b] Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect bad 1e6d2dcb13c8d94b56de1eff60235ca90587046b
# bad: [a0daa9e939dbcd7767090151771d94ade75a4fd5] Merge branch into tip/master: 'x86/build'
git bisect bad a0daa9e939dbcd7767090151771d94ade75a4fd5
# bad: [d147a3db0dfa15c8e460f007128bd0fe2e1b877f] Merge branch into tip/master: 'perf/core'
git bisect bad d147a3db0dfa15c8e460f007128bd0fe2e1b877f
# good: [be5697d7136525a91e7f30fdca2e7de737d9a8ed] Merge branch into tip/master: 'irq/core'
git bisect good be5697d7136525a91e7f30fdca2e7de737d9a8ed
# good: [5d299897f1e36025400ca84fd36c15925a383b03] perf: Split out the RB allocation
git bisect good 5d299897f1e36025400ca84fd36c15925a383b03
# bad: [7fb83eb664e9b3a0438dd28859e9f0fd49d4c165] irqchip/loongson-eiointc: Route interrupt parsed from bios table
git bisect bad 7fb83eb664e9b3a0438dd28859e9f0fd49d4c165
# bad: [7ee4a5a2ec3748facfb4ca96e4cce6cabbdecab2] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044
git bisect bad 7ee4a5a2ec3748facfb4ca96e4cce6cabbdecab2
# bad: [9d8c41816bac518b4824f83b346ae30a1be83f68] irqchip/sg2042-msi: Fix broken affinity setting
git bisect bad 9d8c41816bac518b4824f83b346ae30a1be83f68
# bad: [54f45a30c0d0153d2be091ba2d683ab6db6d1d5b] PCI/MSI: Add startup/shutdown for per device domains
git bisect bad 54f45a30c0d0153d2be091ba2d683ab6db6d1d5b
# first bad commit: [54f45a30c0d0153d2be091ba2d683ab6db6d1d5b] PCI/MSI: Add startup/shutdown for per device domains

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
  2025-08-26 22:09     ` Nathan Chancellor
@ 2025-08-26 22:33     ` Inochi Amaoto
  2025-08-26 23:28       ` Inochi Amaoto
  2025-08-27  9:44     ` Inochi Amaoto
  2 siblings, 1 reply; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-26 22:33 UTC (permalink / raw)
  To: Anders Roxell, Inochi Amaoto, regressions, linux-next
  Cc: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On Tue, Aug 26, 2025 at 09:45:48PM +0200, Anders Roxell wrote:
> On 2025-08-14 07:28, Inochi Amaoto wrote:
> > As the RISC-V PLIC can not apply affinity setting without calling
> > irq_enable(), it will make the interrupt unavailble when using as
> > an underlying IRQ chip for MSI controller.
> > 
> > Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
> > MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
> > these startup and shutdown the parent as well, which allows the
> > irq on the parent chip to be enabled if the irq is not enabled
> > when allocating. This is necessary for the MSI controllers which
> > use PLIC as underlying IRQ chip.
> > 
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
> 
> Regressions found while booting the Linux next-20250826 on the
> qemu-arm64, qemu-armv7 due to following kernel log.
> 
> Bisection identified this commit as the cause of the regression.
> 
> Regression Analysis:
> - New regression? Yes
> - Reproducible? Yes
> 
> First seen on the next-20250826
> Good: next-20250825
> Bad: next-20250826
> 
> Test regression: next-20250826 gcc-13 boot failed on qemu-arm64 and
> qemu-armv7.
> 
> Expected behavior: System should boot normally and virtio block devices
> should be detected and initialized immediately.
> 
> Actual behavior: System hangs for ~30 seconds during virtio block device
> initialization before showing scheduler deadline replenish errors and
> failing to complete boot.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> [...]
> <6>[    1.369038] virtio-pci 0000:00:01.0: enabling device (0000 ->
> 0003)
> <6>[    1.420097] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> enabled
> <6>[    1.450858] msm_serial: driver initialized
> <6>[    1.454489] SuperH (H)SCI(F) driver initialized
> <6>[    1.456056] STM32 USART driver initialized
> <6>[    1.513325] loop: module loaded
> <6>[    1.515744] virtio_blk virtio0: 2/0/0 default/read/poll queues
> <5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical
> blocks (2.76 GB/2.57 GiB)
> <4>[   29.761219] sched: DL replenish lagged too much
> [here it hangs]
> 
> 
> Reverting this commit restores normal boot behavior.
> 
> 
> qemu-arm64
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/log
> 
> qemu-armv7
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663615/suite/boot/test/gcc-13-lkftconfig/log
> 
> ## Source
> * Git tree:
> * https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> * Git sha: d0630b758e593506126e8eda6c3d56097d1847c5
> * Git describe: next-20250826
> * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826
> * Architectures: arm64
> * Toolchains: gcc-13
> * Kconfigs: gcc-13-lkftconfig
> 
> 
> ## Build
> * Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/history/
> * Test link: https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/31oo1cMOi0uSNKYApi80iQahbLi
> * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/
> * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/config
> 

Is there a link where I can get the QEMU command-line arguments, so that
I can reproduce it locally?

Regards,
Inochi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 22:33     ` Inochi Amaoto
@ 2025-08-26 23:28       ` Inochi Amaoto
  2025-08-27  0:47         ` Nathan Chancellor
  0 siblings, 1 reply; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-26 23:28 UTC (permalink / raw)
  To: Anders Roxell, Nathan Chancellor, Inochi Amaoto, regressions,
	linux-next
  Cc: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On Wed, Aug 27, 2025 at 06:33:44AM +0800, Inochi Amaoto wrote:
> On Tue, Aug 26, 2025 at 09:45:48PM +0200, Anders Roxell wrote:
> > On 2025-08-14 07:28, Inochi Amaoto wrote:
> > > As the RISC-V PLIC can not apply affinity setting without calling
> > > irq_enable(), it will make the interrupt unavailble when using as
> > > an underlying IRQ chip for MSI controller.
> > > 
> > > Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
> > > MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
> > > these startup and shutdown the parent as well, which allows the
> > > irq on the parent chip to be enabled if the irq is not enabled
> > > when allocating. This is necessary for the MSI controllers which
> > > use PLIC as underlying IRQ chip.
> > > 
> > > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
> > 
> > Regressions found while booting the Linux next-20250826 on the
> > qemu-arm64, qemu-armv7 due to following kernel log.
> > 
> > Bisection identified this commit as the cause of the regression.
> > 
> > Regression Analysis:
> > - New regression? Yes
> > - Reproducible? Yes
> > 
> > First seen on the next-20250826
> > Good: next-20250825
> > Bad: next-20250826
> > 
> > Test regression: next-20250826 gcc-13 boot failed on qemu-arm64 and
> > qemu-armv7.
> > 
> > Expected behavior: System should boot normally and virtio block devices
> > should be detected and initialized immediately.
> > 
> > Actual behavior: System hangs for ~30 seconds during virtio block device
> > initialization before showing scheduler deadline replenish errors and
> > failing to complete boot.
> > 
> > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > 
> > [...]
> > <6>[    1.369038] virtio-pci 0000:00:01.0: enabling device (0000 ->
> > 0003)
> > <6>[    1.420097] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> > enabled
> > <6>[    1.450858] msm_serial: driver initialized
> > <6>[    1.454489] SuperH (H)SCI(F) driver initialized
> > <6>[    1.456056] STM32 USART driver initialized
> > <6>[    1.513325] loop: module loaded
> > <6>[    1.515744] virtio_blk virtio0: 2/0/0 default/read/poll queues
> > <5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical
> > blocks (2.76 GB/2.57 GiB)
> > <4>[   29.761219] sched: DL replenish lagged too much
> > [here it hangs]
> > 
> > 
> > Reverting this commit restores normal boot behavior.
> > 
> > 
> > qemu-arm64
> >  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/log
> > 
> > qemu-armv7
> >  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663615/suite/boot/test/gcc-13-lkftconfig/log
> > 
> > ## Source
> > * Git tree:
> > * https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> > * Git sha: d0630b758e593506126e8eda6c3d56097d1847c5
> > * Git describe: next-20250826
> > * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826
> > * Architectures: arm64
> > * Toolchains: gcc-13
> > * Kconfigs: gcc-13-lkftconfig
> > 
> > 
> > ## Build
> > * Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/history/
> > * Test link: https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/31oo1cMOi0uSNKYApi80iQahbLi
> > * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/
> > * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/config
> > 
> 
> Is there a link for me to get the command line args for qemu? So I can
> reproduce it locally.
> 

OK, I guess I know why: I have missed one condition for startup.

Could you test the following patch? If it works, I will send it as
a fix.

---
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index e0a800f918e8..b11b7f63f0d6 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -154,6 +154,8 @@ static void cond_shutdown_parent(struct irq_data *data)
 
 	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
 		irq_chip_shutdown_parent(data);
+	else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
+		irq_chip_mask_parent(data);
 }
 
 static unsigned int cond_startup_parent(struct irq_data *data)
@@ -162,6 +164,9 @@ static unsigned int cond_startup_parent(struct irq_data *data)
 
 	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
 		return irq_chip_startup_parent(data);
+	else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
+		irq_chip_unmask_parent(data);
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 23:28       ` Inochi Amaoto
@ 2025-08-27  0:47         ` Nathan Chancellor
  2025-08-27  8:17           ` Naresh Kamboju
  0 siblings, 1 reply; 24+ messages in thread
From: Nathan Chancellor @ 2025-08-27  0:47 UTC (permalink / raw)
  To: Inochi Amaoto
  Cc: Anders Roxell, regressions, linux-next, Thomas Gleixner,
	Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi, Shradha Gupta,
	Haiyang Zhang, Jonathan Cameron, Juergen Gross, Nicolin Chen,
	Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci, Yixun Lan,
	Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On Wed, Aug 27, 2025 at 07:28:46AM +0800, Inochi Amaoto wrote:
> OK, I guess I know why: I have missed one condition for startup.
> 
> Could you test the following patch? If worked, I will send it as
> a fix.

Yes, that appears to resolve the issue on one system. I cannot test the
other at the moment since it is under load.

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
> diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> index e0a800f918e8..b11b7f63f0d6 100644
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -154,6 +154,8 @@ static void cond_shutdown_parent(struct irq_data *data)
>  
>  	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
>  		irq_chip_shutdown_parent(data);
> +	else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> +		irq_chip_mask_parent(data);
>  }
>  
>  static unsigned int cond_startup_parent(struct irq_data *data)
> @@ -162,6 +164,9 @@ static unsigned int cond_startup_parent(struct irq_data *data)
>  
>  	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
>  		return irq_chip_startup_parent(data);
> +	else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> +		irq_chip_unmask_parent(data);
> +
>  	return 0;
>  }
>  


* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-27  0:47         ` Nathan Chancellor
@ 2025-08-27  8:17           ` Naresh Kamboju
  2025-08-27  9:45             ` Inochi Amaoto
  0 siblings, 1 reply; 24+ messages in thread
From: Naresh Kamboju @ 2025-08-27  8:17 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Inochi Amaoto, Anders Roxell, regressions, linux-next,
	Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, benjamin.copeland

On Wed, 27 Aug 2025 at 06:17, Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Wed, Aug 27, 2025 at 07:28:46AM +0800, Inochi Amaoto wrote:
> > OK, I guess I know why: I have missed one condition for startup.
> >
> > Could you test the following patch? If it works, I will send it as
> > a fix.
>
> Yes, that appears to resolve the issue on one system. I cannot test the
> other at the moment since it is under load.

I built on top of the Linux next-20250826 tag; the qemu-arm64 boot test
and the LTP smoke test both pass.

>
> Tested-by: Nathan Chancellor <nathan@kernel.org>

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

>
> > ---
> > diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> > index e0a800f918e8..b11b7f63f0d6 100644
> > --- a/drivers/pci/msi/irqdomain.c
> > +++ b/drivers/pci/msi/irqdomain.c
> > @@ -154,6 +154,8 @@ static void cond_shutdown_parent(struct irq_data *data)
> >
> >       if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
> >               irq_chip_shutdown_parent(data);
> > +     else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> > +             irq_chip_mask_parent(data);
> >  }
> >
> >  static unsigned int cond_startup_parent(struct irq_data *data)
> > @@ -162,6 +164,9 @@ static unsigned int cond_startup_parent(struct irq_data *data)
> >
> >       if (unlikely(info->flags & MSI_FLAG_PCI_MSI_STARTUP_PARENT))
> >               return irq_chip_startup_parent(data);
> > +     else if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> > +             irq_chip_unmask_parent(data);
> > +
> >       return 0;
> >  }
> >


* [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
                     ` (2 preceding siblings ...)
  2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
@ 2025-08-27  9:39   ` Wei Fang
  2025-08-27 10:14     ` Chen Wang
  2025-08-27 10:14     ` Inochi Amaoto
  3 siblings, 2 replies; 24+ messages in thread
From: Wei Fang @ 2025-08-27  9:39 UTC (permalink / raw)
  To: inochiama
  Cc: Jonathan.Cameron, bhelgaas, dlan, haiyangz, jgg, jgross,
	linux-kernel, linux-pci, looong.bin, lpieralisi, maz, nicolinc,
	shradhagupta, tglx, unicorn_wang

We found that the ENETC network port on our i.MX95 platform (arm64)
does not work with the latest linux-next tree. From what I observed,
the MSI-X interrupt counts reported by "cat /proc/interrupts" are
all 0.

root@imx95evk:~# cat /proc/interrupts | grep eth0
123:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
124:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
125:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
126:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
127:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
128:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5


So I reverted this patch, and the MSI-X interrupts returned to normal.

root@imx95evk:~# cat /proc/interrupts | grep eth0
123:       4365          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
124:          0        194          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
125:          0          0        227          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
126:          0          0          0        219          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
127:          0          0          0          0        176          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
128:          0          0          0          0          0        233 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5

It looks like this patch causes the issue, but I am not familiar with
the PCI MSI driver, so please help investigate. Thanks.
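
The "all 0" check above can be automated. The following is a rough sketch, assuming the /proc/interrupts line layout shown in this message; `demo_irq_total()` is a hypothetical helper name, not an existing API. It sums the per-CPU counters on one line and stops at the first column that does not start with a digit, so it would misparse a chip name that begins with a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line of the form
 * "123:   0   0 ... CHIP-NAME   hwirq Edge   name".
 * Returns 0 when the line has no "N:" prefix.
 */
static unsigned long long demo_irq_total(const char *line)
{
	const char *p = strchr(line, ':');
	unsigned long long total = 0;

	if (!p)
		return 0;
	p++;	/* skip the ':' after the IRQ number */

	for (;;) {
		char *end;
		unsigned long long v;

		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;	/* reached the chip name column (or end of line) */

		v = strtoull(p, &end, 10);
		/* A counter column must be followed by whitespace or end of line. */
		if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
			break;

		total += v;
		p = end;
	}

	return total;
}
```

A total of 0 for every queue of a device that is passing traffic would match the failing case reported here; a nonzero total matches the output seen after the revert.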



* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
  2025-08-26 22:09     ` Nathan Chancellor
  2025-08-26 22:33     ` Inochi Amaoto
@ 2025-08-27  9:44     ` Inochi Amaoto
  2 siblings, 0 replies; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-27  9:44 UTC (permalink / raw)
  To: Anders Roxell, Inochi Amaoto, regressions, linux-next
  Cc: Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland

On Tue, Aug 26, 2025 at 09:45:48PM +0200, Anders Roxell wrote:
> On 2025-08-14 07:28, Inochi Amaoto wrote:
> > As the RISC-V PLIC cannot apply an affinity setting without calling
> > irq_enable(), the interrupt becomes unavailable when the PLIC is used
> > as the underlying IRQ chip for an MSI controller.
> > 
> > Implement .irq_startup() and .irq_shutdown() for the PCI MSI and
> > MSI-X templates. For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT,
> > these start up and shut down the parent as well, which allows the
> > irq on the parent chip to be enabled if the irq is not enabled
> > when allocating. This is necessary for the MSI controllers which
> > use the PLIC as the underlying IRQ chip.
> > 
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
> 
> Regressions were found while booting Linux next-20250826 on
> qemu-arm64 and qemu-armv7; see the kernel log below.
> 
> Bisection identified this commit as the cause of the regression.
> 
> Regression Analysis:
> - New regression? Yes
> - Reproducible? Yes
> 
> First seen on the next-20250826
> Good: next-20250825
> Bad: next-20250826
> 
> Test regression: next-20250826 gcc-13 boot failed on qemu-arm64 and
> qemu-armv7.
> 
> Expected behavior: System should boot normally and virtio block devices
> should be detected and initialized immediately.
> 
> Actual behavior: System hangs for ~30 seconds during virtio block device
> initialization before showing scheduler deadline replenish errors and
> failing to complete boot.
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> [...]
> <6>[    1.369038] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
> <6>[    1.420097] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> <6>[    1.450858] msm_serial: driver initialized
> <6>[    1.454489] SuperH (H)SCI(F) driver initialized
> <6>[    1.456056] STM32 USART driver initialized
> <6>[    1.513325] loop: module loaded
> <6>[    1.515744] virtio_blk virtio0: 2/0/0 default/read/poll queues
> <5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical blocks (2.76 GB/2.57 GiB)
> <4>[   29.761219] sched: DL replenish lagged too much
> [here it hangs]
> 
> 
> Reverting this commit restores normal boot behavior.
> 
> 
> qemu-arm64
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/log
> 
> qemu-armv7
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663615/suite/boot/test/gcc-13-lkftconfig/log
> 
> ## Source
> * Git tree:
> * https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> * Git sha: d0630b758e593506126e8eda6c3d56097d1847c5
> * Git describe: next-20250826
> * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826
> * Architectures: arm64
> * Toolchains: gcc-13
> * Kconfigs: gcc-13-lkftconfig
> 
> 
> ## Build
> * Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250826/testrun/29663822/suite/boot/test/gcc-13-lkftconfig/history/
> * Test link: https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/31oo1cMOi0uSNKYApi80iQahbLi
> * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/
> * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/31onzS5UmJVvvZucEhtB1veoJA1/config
> 
> --
> Linaro LKFT
> https://lkft.linaro.org

Fix patch is here:

https://lore.kernel.org/all/20250827062911.203106-1-inochiama@gmail.com/

Regards,
Inochi


* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-27  8:17           ` Naresh Kamboju
@ 2025-08-27  9:45             ` Inochi Amaoto
  0 siblings, 0 replies; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-27  9:45 UTC (permalink / raw)
  To: Naresh Kamboju, Nathan Chancellor
  Cc: Inochi Amaoto, Anders Roxell, regressions, linux-next,
	Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, benjamin.copeland

On Wed, Aug 27, 2025 at 01:47:14PM +0530, Naresh Kamboju wrote:
> On Wed, 27 Aug 2025 at 06:17, Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > On Wed, Aug 27, 2025 at 07:28:46AM +0800, Inochi Amaoto wrote:
> > > OK, I guess I know why: I have missed one condition for startup.
> > >
> > > Could you test the following patch? If it works, I will send it as
> > > a fix.
> >
> > Yes, that appears to resolve the issue on one system. I cannot test the
> > other at the moment since it is under load.
> 
> I built on top of the Linux next-20250826 tag; the qemu-arm64 boot test
> and the LTP smoke test both pass.
> 
> >
> > Tested-by: Nathan Chancellor <nathan@kernel.org>
> 
> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 

Thanks for your tag. Could you resend it in reply to the patch at the
following URL? I have sent a fix patch there. Thanks.

https://lore.kernel.org/all/20250827062911.203106-1-inochiama@gmail.com/

Regards,
Inochi


* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-27  9:39   ` Wei Fang
@ 2025-08-27 10:14     ` Chen Wang
  2025-08-27 10:14     ` Inochi Amaoto
  1 sibling, 0 replies; 24+ messages in thread
From: Chen Wang @ 2025-08-27 10:14 UTC (permalink / raw)
  To: Wei Fang, inochiama
  Cc: Jonathan.Cameron, bhelgaas, dlan, haiyangz, jgg, jgross,
	linux-kernel, linux-pci, looong.bin, lpieralisi, maz, nicolinc,
	shradhagupta, tglx


On 8/27/2025 5:39 PM, Wei Fang wrote:
> We found that the ENETC network port on our i.MX95 platform (arm64)
> does not work with the latest linux-next tree. From what I observed,
> the MSI-X interrupt counts reported by "cat /proc/interrupts" are
> all 0.
>
> root@imx95evk:~# cat /proc/interrupts | grep eth0
> 123:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
> 124:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
> 125:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
> 126:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
> 127:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
> 128:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5
>
>
> So I reverted this patch, and the MSI-X interrupts returned to normal.
>
> root@imx95evk:~# cat /proc/interrupts | grep eth0
> 123:       4365          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
> 124:          0        194          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
> 125:          0          0        227          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
> 126:          0          0          0        219          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
> 127:          0          0          0          0        176          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
> 128:          0          0          0          0          0        233 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5
>
> It looks like this patch causes the issue, but I am not familiar with
> the PCI MSI driver, so please help investigate. Thanks.

Others have reported a similar issue; please check the whole mail thread.
Inochi has provided a fix patch that you can try.

Thanks,

Chen



* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-27  9:39   ` Wei Fang
  2025-08-27 10:14     ` Chen Wang
@ 2025-08-27 10:14     ` Inochi Amaoto
  1 sibling, 0 replies; 24+ messages in thread
From: Inochi Amaoto @ 2025-08-27 10:14 UTC (permalink / raw)
  To: Wei Fang, inochiama
  Cc: Jonathan.Cameron, bhelgaas, dlan, haiyangz, jgg, jgross,
	linux-kernel, linux-pci, looong.bin, lpieralisi, maz, nicolinc,
	shradhagupta, tglx, unicorn_wang

On Wed, Aug 27, 2025 at 05:39:11PM +0800, Wei Fang wrote:
> We found that the ENETC network port on our i.MX95 platform (arm64)
> does not work with the latest linux-next tree. From what I observed,
> the MSI-X interrupt counts reported by "cat /proc/interrupts" are
> all 0.
> 
> root@imx95evk:~# cat /proc/interrupts | grep eth0
> 123:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
> 124:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
> 125:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
> 126:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
> 127:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
> 128:          0          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5
> 
> 
> So I reverted this patch, and the MSI-X interrupts returned to normal.
> 
> root@imx95evk:~# cat /proc/interrupts | grep eth0
> 123:       4365          0          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   1 Edge      eth0-rxtx0
> 124:          0        194          0          0          0          0 ITS-PCI-MSIX-0002:00:00.0   2 Edge      eth0-rxtx1
> 125:          0          0        227          0          0          0 ITS-PCI-MSIX-0002:00:00.0   3 Edge      eth0-rxtx2
> 126:          0          0          0        219          0          0 ITS-PCI-MSIX-0002:00:00.0   4 Edge      eth0-rxtx3
> 127:          0          0          0          0        176          0 ITS-PCI-MSIX-0002:00:00.0   5 Edge      eth0-rxtx4
> 128:          0          0          0          0          0        233 ITS-PCI-MSIX-0002:00:00.0   6 Edge      eth0-rxtx5
> 
> It looks like this patch causes the issue, but I am not familiar with
> the PCI MSI driver, so please help investigate. Thanks.
> 

Can you try the following patch?

https://lore.kernel.org/all/20250827062911.203106-1-inochiama@gmail.com/

Regards,
Inochi


* Re: [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains
  2025-08-26 22:09     ` Nathan Chancellor
@ 2025-08-27 10:33       ` Mark Brown
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Brown @ 2025-08-27 10:33 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Anders Roxell, Inochi Amaoto, regressions, linux-next,
	Thomas Gleixner, Bjorn Helgaas, Marc Zyngier, Lorenzo Pieralisi,
	Shradha Gupta, Haiyang Zhang, Jonathan Cameron, Juergen Gross,
	Nicolin Chen, Jason Gunthorpe, Chen Wang, linux-kernel, linux-pci,
	Yixun Lan, Longbin Li, arnd, dan.carpenter, naresh.kamboju,
	benjamin.copeland


On Tue, Aug 26, 2025 at 03:09:59PM -0700, Nathan Chancellor wrote:
> On Tue, Aug 26, 2025 at 09:45:48PM +0200, Anders Roxell wrote:

> > <5>[    1.527859] virtio_blk virtio0: [vda] 5397504 512-byte logical blocks (2.76 GB/2.57 GiB)
> > <4>[   29.761219] sched: DL replenish lagged too much
> > [here it hangs]

> FWIW, I am also seeing this on real arm64 hardware (an LX2160A board and
> an Ampere Altra one) but with my NVMe drives failing to be recognized.
> In somewhat ironic fashion, I am seeing the message from the cover
> letter repeating.

>   nvme nvme0: I/O tag 8 (1008) QID 0 timeout, completion polled
>   [  125.810062] dracut-initqueue[640]: Timed out while waiting for udev queue to empty.
>   nvme nvme0: I/O tag 9 (1009) QID 0 timeout, completion polled

> I am happy to test patches or provide information.

Same here, it's breaking at least Orion O6.



end of thread, other threads:[~2025-08-27 10:33 UTC | newest]

Thread overview: 24+ messages
2025-08-13 23:28 [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
2025-08-13 23:28 ` [PATCH v2 1/4] genirq: Add irq_chip_(startup/shutdown)_parent() Inochi Amaoto
2025-08-23 19:28   ` [tip: irq/core] " tip-bot2 for Inochi Amaoto
2025-08-13 23:28 ` [PATCH v2 2/4] PCI/MSI: Add startup/shutdown for per device domains Inochi Amaoto
2025-08-20 20:54   ` Bjorn Helgaas
2025-08-23 19:08     ` Thomas Gleixner
2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
2025-08-26 19:45   ` [PATCH v2 2/4] " Anders Roxell
2025-08-26 22:09     ` Nathan Chancellor
2025-08-27 10:33       ` Mark Brown
2025-08-26 22:33     ` Inochi Amaoto
2025-08-26 23:28       ` Inochi Amaoto
2025-08-27  0:47         ` Nathan Chancellor
2025-08-27  8:17           ` Naresh Kamboju
2025-08-27  9:45             ` Inochi Amaoto
2025-08-27  9:44     ` Inochi Amaoto
2025-08-27  9:39   ` Wei Fang
2025-08-27 10:14     ` Chen Wang
2025-08-27 10:14     ` Inochi Amaoto
2025-08-13 23:28 ` [PATCH v2 3/4] irqchip/sg2042-msi: Fix broken affinity setting Inochi Amaoto
2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
2025-08-13 23:28 ` [PATCH v2 4/4] irqchip/sg2042-msi: Set MSI_FLAG_MULTI_PCI_MSI flags for SG2044 Inochi Amaoto
2025-08-23 19:30   ` [tip: irq/drivers] " tip-bot2 for Inochi Amaoto
2025-08-21  6:38 ` [PATCH v2 0/4] irqchip/sg2042-msi: Fix broken affinity setting Chen Wang
