[PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
@ 2025-02-04 17:52 Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Tony Nguyen
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Tony Nguyen @ 2025-02-04 17:52 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Tony Nguyen, wander, rostedt, clrkwllms, bigeasy, jgarzik, yuma,
	linux-rt-devel

Wander Lairson Costa says:

This is the second attempt at fixing the behavior of igb_msix_other()
for PREEMPT_RT. The previous attempt [1] was reverted [2] following
concerns raised by Sebastian [3].

The initial approach proposed converting vfs_lock to a raw_spinlock,
a minor change intended to make it safe. However, it became evident
that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
which is unsafe in interrupt context on PREEMPT_RT systems.

To address this, the solution involves splitting igb_msg_task()
into two parts:

    * One part invoked from the IRQ context.
    * Another part called from the threaded interrupt handler.

To accommodate this, vfs_lock has been restructured into a double
lock: a spinlock_t and a raw_spinlock_t. In the revised design:

    * igb_disable_sriov() locks both spinlocks.
    * Each part of igb_msg_task() locks the appropriate spinlock for
    its execution context.

It is worth noting that the double lock mechanism is only active under
PREEMPT_RT. For non-PREEMPT_RT builds, the additional raw_spinlock_t
field is omitted.

If the extra raw_spinlock_t field can be tolerated under
!PREEMPT_RT (even though it remains unused), we can eliminate the
need for #ifdefs and simplify the code structure.

[1] https://lore.kernel.org/all/20240920185918.616302-2-wander@redhat.com/
[2] https://lore.kernel.org/all/20241104124050.22290-1-wander@redhat.com/
[3] https://lore.kernel.org/all/20241104110708.gFyxRFlC@linutronix.de/
---
IWL: https://lore.kernel.org/intel-wired-lan/20241204114229.21452-1-wander@redhat.com/

The following are changes since commit 4241a702e0d0c2ca9364cfac08dbf134264962de:
  rxrpc: Fix the rxrpc_connection attend queue handling
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 1GbE

Wander Lairson Costa (4):
  igb: narrow scope of vfs_lock in SR-IOV cleanup
  igb: introduce raw vfs_lock to igb_adapter
  igb: split igb_msg_task()
  igb: fix igb_msix_other() handling for PREEMPT_RT

 drivers/net/ethernet/intel/igb/igb.h      |   4 +
 drivers/net/ethernet/intel/igb/igb_main.c | 160 +++++++++++++++++++---
 2 files changed, 148 insertions(+), 16 deletions(-)

-- 
2.47.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup
  2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
@ 2025-02-04 17:52 ` Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 2/4] igb: introduce raw vfs_lock to igb_adapter Tony Nguyen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Tony Nguyen @ 2025-02-04 17:52 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Wander Lairson Costa, anthony.l.nguyen, rostedt, clrkwllms,
	bigeasy, jgarzik, yuma, linux-rt-devel, Rafal Romanowski

From: Wander Lairson Costa <wander@redhat.com>

The adapter->vfs_lock currently protects critical sections shared between
igb_disable_sriov() and igb_msg_task(). Since igb_msg_task() - which is
invoked solely by the igb_msix_other() ISR-only proceeds when
adapter->vfs_allocated_count > 0, we can reduce the lock scope further.

By moving the assignment adapter->vfs_allocated_count = 0 to the start of the
cleanup code in igb_disable_sriov(), we can restrict the spinlock protection
solely to this assignment. This change removes kfree() calls from within the
locked section, simplifying lock management.

Once kfree() is outside the vfs_lock scope, it becomes possible to safely
convert vfs_lock to a raw_spin_lock.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d368b753a467..77571f6fdbfd 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3708,12 +3708,12 @@ static int igb_disable_sriov(struct pci_dev *pdev, bool reinit)
 			msleep(500);
 		}
 		spin_lock_irqsave(&adapter->vfs_lock, flags);
+		adapter->vfs_allocated_count = 0;
+		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 		kfree(adapter->vf_mac_list);
 		adapter->vf_mac_list = NULL;
 		kfree(adapter->vf_data);
 		adapter->vf_data = NULL;
-		adapter->vfs_allocated_count = 0;
-		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 		wr32(E1000_IOVCTL, E1000_IOVCTL_REUSE_VFQ);
 		wrfl();
 		msleep(100);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 2/4] igb: introduce raw vfs_lock to igb_adapter
  2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Tony Nguyen
@ 2025-02-04 17:52 ` Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 3/4] igb: split igb_msg_task() Tony Nguyen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Tony Nguyen @ 2025-02-04 17:52 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Wander Lairson Costa, anthony.l.nguyen, rostedt, clrkwllms,
	bigeasy, jgarzik, yuma, linux-rt-devel, Clark Williams,
	Rafal Romanowski

From: Wander Lairson Costa <wander@redhat.com>

This change adds a raw_spinlock for the vfs_lock to the
igb_adapter structure, enabling its use in both interrupt and
preemptible contexts. This is essential for upcoming modifications
to split igb_msg_task() into interrupt-safe and preemptible-safe
parts.

The motivation for this change stems from the need to modify
igb_msix_other() to run in interrupt context under PREEMPT_RT.
Currently, igb_msg_task() contains a code path that invokes
kcalloc() with the GFP_ATOMIC flag. However, on PREEMPT_RT,
GFP_ATOMIC is not honored, making it unsafe to call allocation
functions in interrupt context. By introducing this raw spinlock,
we can safely acquire the lock in both contexts, paving the way for
the necessary restructuring of igb_msg_task().

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Suggested-by: Clark Williams <williams@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |  4 ++
 drivers/net/ethernet/intel/igb/igb_main.c | 51 ++++++++++++++++++++---
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 02f340280d20..30a188c5710e 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -673,6 +673,10 @@ struct igb_adapter {
 	struct vf_mac_filter *vf_mac_list;
 	/* lock for VF resources */
 	spinlock_t vfs_lock;
+#ifdef CONFIG_PREEMPT_RT
+	/* Used to lock VFS in interrupt context under PREEMPT_RT */
+	raw_spinlock_t raw_vfs_lock;
+#endif
 };
 
 /* flags controlling PTP/1588 function */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 77571f6fdbfd..4e75c88f6214 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3657,6 +3657,47 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return err;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
+{
+	spin_lock_init(&adapter->vfs_lock);
+	raw_spin_lock_init(&adapter->raw_vfs_lock);
+}
+
+static __always_inline void vfs_lock_irqsave(struct igb_adapter *adapter,
+					     unsigned long *flags)
+{
+	/*
+	 * Remember that under PREEMPT_RT spin_lock_irqsave
+	 * ignores the flags parameter
+	 */
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+	raw_spin_lock_irqsave(&adapter->raw_vfs_lock, *flags);
+}
+
+static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter,
+						  unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+#else
+static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
+{
+	spin_lock_init(&adapter->vfs_lock);
+}
+
+static __always_inline void vfs_lock_irqsave(struct igb_adapter *adapter, unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter, unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+#endif
+
 #ifdef CONFIG_PCI_IOV
 static int igb_sriov_reinit(struct pci_dev *dev)
 {
@@ -3707,9 +3748,9 @@ static int igb_disable_sriov(struct pci_dev *pdev, bool reinit)
 			pci_disable_sriov(pdev);
 			msleep(500);
 		}
-		spin_lock_irqsave(&adapter->vfs_lock, flags);
+		vfs_lock_irqsave(adapter, &flags);
 		adapter->vfs_allocated_count = 0;
-		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+		vfs_unlock_irqrestore(adapter, flags);
 		kfree(adapter->vf_mac_list);
 		adapter->vf_mac_list = NULL;
 		kfree(adapter->vf_data);
@@ -4042,7 +4083,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
 	spin_lock_init(&adapter->stats64_lock);
 
 	/* init spinlock to avoid concurrency of VF resources */
-	spin_lock_init(&adapter->vfs_lock);
+	vfs_lock_init(adapter);
 #ifdef CONFIG_PCI_IOV
 	switch (hw->mac.type) {
 	case e1000_82576:
@@ -8078,7 +8119,7 @@ static void igb_msg_task(struct igb_adapter *adapter)
 	unsigned long flags;
 	u32 vf;
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
+	vfs_lock_irqsave(adapter, &flags);
 	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
 		/* process any reset requests */
 		if (!igb_check_for_rst(hw, vf))
@@ -8092,7 +8133,7 @@ static void igb_msg_task(struct igb_adapter *adapter)
 		if (!igb_check_for_ack(hw, vf))
 			igb_rcv_ack_from_vf(adapter, vf);
 	}
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+	vfs_unlock_irqrestore(adapter, flags);
 }
 
 /**
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 3/4] igb: split igb_msg_task()
  2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 2/4] igb: introduce raw vfs_lock to igb_adapter Tony Nguyen
@ 2025-02-04 17:52 ` Tony Nguyen
  2025-02-04 17:52 ` [PATCH net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
  2025-02-05  9:48 ` [PATCH net 0/4][pull request] " Sebastian Andrzej Siewior
  4 siblings, 0 replies; 25+ messages in thread
From: Tony Nguyen @ 2025-02-04 17:52 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Wander Lairson Costa, anthony.l.nguyen, rostedt, clrkwllms,
	bigeasy, jgarzik, yuma, linux-rt-devel, Rafal Romanowski

From: Wander Lairson Costa <wander@redhat.com>

From the perspective of PREEMPT_RT, igb_msg_task() invokes functions
that are a mix of IRQ-safe and non-IRQ-safe operations.

To address this, we separate igb_msg_task() into distinct IRQ-safe and
preemptible-safe components. This is a preparatory step for upcoming
commits, where the igb_msix_other interrupt handler will be split into
IRQ and threaded handlers, each invoking the appropriate part of the
newly divided igb_msg_task().

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 88 +++++++++++++++++++++--
 1 file changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4e75c88f6214..6d590192c27f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -146,6 +146,8 @@ static int igb_vlan_rx_kill_vid(struct net_device *, __be16, u16);
 static void igb_restore_vlan(struct igb_adapter *);
 static void igb_rar_set_index(struct igb_adapter *, u32);
 static void igb_ping_all_vfs(struct igb_adapter *);
+static void igb_msg_task_irq_safe(struct igb_adapter *adapter);
+static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter);
 static void igb_msg_task(struct igb_adapter *);
 static void igb_vmm_control(struct igb_adapter *);
 static int igb_set_vf_mac(struct igb_adapter *, int, unsigned char *);
@@ -3681,6 +3683,30 @@ static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter,
 	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
 	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 }
+
+static __always_inline void vfs_spin_lock_irqsave(struct igb_adapter *adapter,
+						  unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_spin_unlock_irqrestore(struct igb_adapter *adapter,
+						       unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+
+static __always_inline void vfs_raw_spin_lock_irqsave(struct igb_adapter *adapter,
+						      unsigned long *flags)
+{
+	raw_spin_lock_irqsave(&adapter->raw_vfs_lock, *flags);
+}
+
+static __always_inline void vfs_raw_spin_unlock_irqrestore(struct igb_adapter *adapter,
+							   unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
+}
 #else
 static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
 {
@@ -3696,6 +3722,30 @@ static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter, u
 {
 	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 }
+
+static __always_inline void vfs_spin_lock_irqsave(struct igb_adapter *adapter,
+						  unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_spin_unlock_irqrestore(struct igb_adapter *adapter,
+						       unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+
+static __always_inline void vfs_raw_spin_lock_irqsave(struct igb_adapter *adapter,
+						      unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_raw_spin_unlock_irqrestore(struct igb_adapter *adapter,
+							   unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
 #endif
 
 #ifdef CONFIG_PCI_IOV
@@ -8113,27 +8163,51 @@ static void igb_rcv_msg_from_vf(struct igb_adapter *adapter, u32 vf)
 	igb_unlock_mbx(hw, vf);
 }
 
-static void igb_msg_task(struct igb_adapter *adapter)
+/*
+ * Note: the split of irq and preempible safe parts of igb_msg_task()
+ * only makes sense under PREEMPT_RT.
+ * The root cause of igb_rcv_msg_from_vf() is not IRQ safe is because
+ * it calls kcalloc with GFP_ATOMIC, but GFP_ATOMIC is not IRQ safe
+ * in PREEMPT_RT.
+ */
+static void igb_msg_task_irq_safe(struct igb_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	unsigned long flags;
 	u32 vf;
 
-	vfs_lock_irqsave(adapter, &flags);
+	vfs_raw_spin_lock_irqsave(adapter, &flags);
 	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
 		/* process any reset requests */
 		if (!igb_check_for_rst(hw, vf))
 			igb_vf_reset_event(adapter, vf);
 
-		/* process any messages pending */
-		if (!igb_check_for_msg(hw, vf))
-			igb_rcv_msg_from_vf(adapter, vf);
-
 		/* process any acks */
 		if (!igb_check_for_ack(hw, vf))
 			igb_rcv_ack_from_vf(adapter, vf);
 	}
-	vfs_unlock_irqrestore(adapter, flags);
+	vfs_raw_spin_unlock_irqrestore(adapter, flags);
+}
+
+static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	unsigned long flags;
+	u32 vf;
+
+	vfs_spin_lock_irqsave(adapter, &flags);
+	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
+		/* process any messages pending */
+		if (!igb_check_for_msg(hw, vf))
+			igb_rcv_msg_from_vf(adapter, vf);
+	}
+	vfs_spin_unlock_irqrestore(adapter, flags);
+}
+
+static __always_inline void igb_msg_task(struct igb_adapter *adapter)
+{
+	igb_msg_task_irq_safe(adapter);
+	igb_msg_task_preemptible_safe(adapter);
 }
 
 /**
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
                   ` (2 preceding siblings ...)
  2025-02-04 17:52 ` [PATCH net 3/4] igb: split igb_msg_task() Tony Nguyen
@ 2025-02-04 17:52 ` Tony Nguyen
  2025-02-05  9:48 ` [PATCH net 0/4][pull request] " Sebastian Andrzej Siewior
  4 siblings, 0 replies; 25+ messages in thread
From: Tony Nguyen @ 2025-02-04 17:52 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Wander Lairson Costa, anthony.l.nguyen, rostedt, clrkwllms,
	bigeasy, jgarzik, yuma, linux-rt-devel, Rafal Romanowski

From: Wander Lairson Costa <wander@redhat.com>

During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.

The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx.

Commit 338c4d3902fe ("igb: Disable threaded IRQ for igb_msix_other")
forced the ISR to run in a non-threaded context. However, Sebastian
observed that some functions called within the ISR acquire locks that
may sleep.

In the previous two patches, we managed to make igb_msg_mask() safe to
call from an interrupt context.

In this commit, we move most of the ISR handling to an interrupt
context, leaving non IRQ safe code to be called from the thread
context under PREEMPT_RT.

Reproducer:

ipaddr_vlan=3
nic_test=ens14f0
vf=${nic_test}v0 # The main testing steps:
while true; do
    ip link set ${nic_test} mtu 1500
    ip link set ${vf} mtu 1500
    ip link set $vf up
    # 3. set vlan and ip for VF
    ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
    ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
    ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
    # 4. check the link state for VF and PF
    ip link show ${nic_test}
    if ! ip link show $vf | grep 'state UP'; then
        echo 'Error found'
        break
    fi
    ip link set $vf down
done

You can also reproduce it more reliably by setting nr_cpus=1 in the
kernel command line.

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Yuying Ma <yuma@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 35 ++++++++++++++++-------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 6d590192c27f..3cc85584d9ce 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -128,6 +128,7 @@ static void igb_set_uta(struct igb_adapter *adapter, bool set);
 static irqreturn_t igb_intr(int irq, void *);
 static irqreturn_t igb_intr_msi(int irq, void *);
 static irqreturn_t igb_msix_other(int irq, void *);
+static irqreturn_t igb_msix_other_threaded(int irq, void *);
 static irqreturn_t igb_msix_ring(int irq, void *);
 #ifdef CONFIG_IGB_DCA
 static void igb_update_dca(struct igb_q_vector *);
@@ -148,7 +149,6 @@ static void igb_rar_set_index(struct igb_adapter *, u32);
 static void igb_ping_all_vfs(struct igb_adapter *);
 static void igb_msg_task_irq_safe(struct igb_adapter *adapter);
 static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter);
-static void igb_msg_task(struct igb_adapter *);
 static void igb_vmm_control(struct igb_adapter *);
 static int igb_set_vf_mac(struct igb_adapter *, int, unsigned char *);
 static void igb_flush_mac_table(struct igb_adapter *);
@@ -914,8 +914,9 @@ static int igb_request_msix(struct igb_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int i, err = 0, vector = 0, free_vector = 0;
 
-	err = request_irq(adapter->msix_entries[vector].vector,
-			  igb_msix_other, 0, netdev->name, adapter);
+	err = request_threaded_irq(adapter->msix_entries[vector].vector,
+				   igb_msix_other, igb_msix_other_threaded,
+				   IRQF_NO_THREAD, netdev->name, adapter);
 	if (err)
 		goto err_out;
 
@@ -7156,9 +7157,27 @@ static irqreturn_t igb_msix_other(int irq, void *data)
 		igb_check_wvbr(adapter);
 	}
 
-	/* Check for a mailbox event */
+	/* Check for a mailbox event (interrupt safe part) */
 	if (icr & E1000_ICR_VMMB)
-		igb_msg_task(adapter);
+		igb_msg_task_irq_safe(adapter);
+
+	adapter->test_icr = icr;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		return igb_msix_other_threaded(irq, data);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t igb_msix_other_threaded(int irq, void *data)
+{
+	struct igb_adapter *adapter = data;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 icr = adapter->test_icr;
+
+	/* Check for a mailbox event (preempible safe part) */
+	if (icr & E1000_ICR_VMMB)
+		igb_msg_task_preemptible_safe(adapter);
 
 	if (icr & E1000_ICR_LSC) {
 		hw->mac.get_link_status = 1;
@@ -8204,12 +8223,6 @@ static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter)
 	vfs_spin_unlock_irqrestore(adapter, flags);
 }
 
-static __always_inline void igb_msg_task(struct igb_adapter *adapter)
-{
-	igb_msg_task_irq_safe(adapter);
-	igb_msg_task_preemptible_safe(adapter);
-}
-
 /**
  *  igb_set_uta - Set unicast filter table address
  *  @adapter: board private structure
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
                   ` (3 preceding siblings ...)
  2025-02-04 17:52 ` [PATCH net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
@ 2025-02-05  9:48 ` Sebastian Andrzej Siewior
  2025-02-05 20:08   ` Wander Lairson Costa
  4 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-05  9:48 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, wander,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> Wander Lairson Costa says:
> 
> This is the second attempt at fixing the behavior of igb_msix_other()
> for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> concerns raised by Sebastian [3].

I still prefer a solution where we don't have the ifdef in the driver. I
was presented two traces but I didn't get why it works in once case but
not in the other. Maybe it was too obvious.
In the mean time:

igb_msg_task_irq_safe()
-> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
-> igb_vf_reset_event()
  -> igb_vf_reset()
    -> igb_set_rx_mode()
      -> igb_write_mc_addr_list()
         -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?

By explicitly disabling preemption or using a raw_spinlock_t you need to
pay attention not to do anything that might lead to unbounded loops
(like iterating over many lists, polling on a bit for ages, …) and
paying attention that the whole API underneath that it is not doing that
is allowed to.

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-05  9:48 ` [PATCH net 0/4][pull request] " Sebastian Andrzej Siewior
@ 2025-02-05 20:08   ` Wander Lairson Costa
  2025-02-06 11:59     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-05 20:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Wed, Feb 05, 2025 at 10:48:18AM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> > Wander Lairson Costa says:
> > 
> > This is the second attempt at fixing the behavior of igb_msix_other()
> > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > concerns raised by Sebastian [3].
> 
> I still prefer a solution where we don't have the ifdef in the driver. I
> was presented two traces but I didn't get why it works in once case but
> not in the other. Maybe it was too obvious.

Copying the traces here for further explanation. Both cases are for
PREEMPT_RT.

Failure case:

kworker/-86      0...1    85.381866: function:                   igbvf_reset
kworker/-86      0...2    85.381866: function:                      e1000_reset_hw_vf
kworker/-86      0...2    85.381867: function:                         e1000_check_for_rst_vf
kworker/-86      0...2    85.381868: function:                         e1000_write_posted_mbx
kworker/-86      0...2    85.381868: function:                            e1000_write_mbx_vf
kworker/-86      0...2    85.381870: function:                            e1000_check_for_ack_vf // repeats for 2000 lines
...
kworker/-86      0.N.2    86.393782: function:                         e1000_read_posted_mbx
kworker/-86      0.N.2    86.398606: function:                      e1000_init_hw_vf
kworker/-86      0.N.2    86.398606: function:                         e1000_rar_set_vf
kworker/-86      0.N.2    86.398606: function:                            e1000_write_posted_mbx
irq/65-e-1287    0d..1    86.398609: function:             igb_msix_other
irq/65-e-1287    0d..1    86.398609: function:                igb_rd32
irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst
irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst_pf
irq/65-e-1287    0d..2    86.398610: function:                   igb_rd32
irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg
irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg_pf
irq/65-e-1287    0d..2    86.398611: function:                   igb_rd32
irq/65-e-1287    0d..2    86.398612: function:                igb_rcv_msg_from_vf
irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx
irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx_pf
irq/65-e-1287    0d..2    86.398612: function:                      igb_obtain_mbx_lock_pf
irq/65-e-1287    0d..2    86.398612: function:                         igb_rd32

In the above trace, observe that the ISR igb_msix_other() is only
scheduled after e1000_write_posted_mbx() fails due to a timeout.
The interrupt handler should run during the looping calls to
e1000_check_for_ack_vf(), but it is not scheduled because
preemption is disabled.

Note that e1000_check_for_ack_vf() is called 2000 times before
it finally times out.

Sucessful case:

      ip-5603    0...1  1884.710747: function:             igbvf_reset
      ip-5603    0...2  1884.710754: function:                e1000_reset_hw_vf
      ip-5603    0...2  1884.710755: function:                   e1000_check_for_rst_vf
      ip-5603    0...2  1884.710756: function:                   e1000_write_posted_mbx
      ip-5603    0...2  1884.710756: function:                      e1000_write_mbx_vf
      ip-5603    0...2  1884.710758: function:                      e1000_check_for_ack_vf
      ip-5603    0d.h2  1884.710760: function:             igb_msix_other
      ip-5603    0d.h2  1884.710760: function:                igb_rd32
      ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst
      ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst_pf
      ip-5603    0d.h3  1884.710761: function:                   igb_rd32
      ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg
      ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg_pf
      ip-5603    0d.h3  1884.710762: function:                   igb_rd32
      ip-5603    0d.h3  1884.710763: function:                igb_rcv_msg_from_vf
      ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx
      ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx_pf
      ip-5603    0d.h3  1884.710763: function:                      igb_obtain_mbx_lock_pf
      ip-5603    0d.h3  1884.710763: function:                         igb_rd32

Since we forced the interrupt context for igb_msix_other(), it now
runs immediately while the driver checks for an acknowledgment via
e1000_check_for_ack_vf().

> In the mean time:
> 
> igb_msg_task_irq_safe()
> -> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
> -> igb_vf_reset_event()
>   -> igb_vf_reset()
>     -> igb_set_rx_mode()
>       -> igb_write_mc_addr_list()
>          -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?

Perhaps the solution is to preallocate this buffer, if possible.
Doing so would significantly simplify the patch. However, this
would require knowing when the multicast (mc) count changes.
I attempted to identify this but have not succeeded so far.

> 
> By explicitly disabling preemption or using a raw_spinlock_t you need to
> pay attention not to do anything that might lead to unbounded loops
> (like iterating over many lists, polling on a bit for ages, …) and
> paying attention that the whole API underneath that it is not doing that
> is allowed to.

I unsure if I understood what you are trying to say.

> 
> Sebastian
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-05 20:08   ` Wander Lairson Costa
@ 2025-02-06 11:59     ` Sebastian Andrzej Siewior
  2025-02-06 20:42       ` Wander Lairson Costa
  2025-02-12 11:56       ` Wander Lairson Costa
  0 siblings, 2 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-06 11:59 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-05 17:08:35 [-0300], Wander Lairson Costa wrote:
> On Wed, Feb 05, 2025 at 10:48:18AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> > > Wander Lairson Costa says:
> > > 
> > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > concerns raised by Sebastian [3].
> > 
> > I still prefer a solution where we don't have the ifdef in the driver. I
> > was presented two traces but I didn't get why it works in once case but
> > not in the other. Maybe it was too obvious.
> 
> Copying the traces here for further explanation. Both cases are for
> PREEMPT_RT.
> 
> Failure case:
> 
> kworker/-86      0...1    85.381866: function:                   igbvf_reset
> kworker/-86      0...2    85.381866: function:                      e1000_reset_hw_vf
> kworker/-86      0...2    85.381867: function:                         e1000_check_for_rst_vf
> kworker/-86      0...2    85.381868: function:                         e1000_write_posted_mbx
> kworker/-86      0...2    85.381868: function:                            e1000_write_mbx_vf
> kworker/-86      0...2    85.381870: function:                            e1000_check_for_ack_vf // repeats for 2000 lines

So it repeats because it waits for the bit. It waits for the interrupts.

> ...
> kworker/-86      0.N.2    86.393782: function:                         e1000_read_posted_mbx

Is this 2 the migrate-disable or preempt-disable counter? Because you
should get preempted based on that N.

> kworker/-86      0.N.2    86.398606: function:                      e1000_init_hw_vf
> kworker/-86      0.N.2    86.398606: function:                         e1000_rar_set_vf
> kworker/-86      0.N.2    86.398606: function:                            e1000_write_posted_mbx
> irq/65-e-1287    0d..1    86.398609: function:             igb_msix_other

So the kworker leaves and immediately the interrupt gets on the CPU.

> irq/65-e-1287    0d..1    86.398609: function:                igb_rd32
> irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst
> irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst_pf
> irq/65-e-1287    0d..2    86.398610: function:                   igb_rd32
> irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg
> irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg_pf
> irq/65-e-1287    0d..2    86.398611: function:                   igb_rd32
> irq/65-e-1287    0d..2    86.398612: function:                igb_rcv_msg_from_vf
> irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx
> irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx_pf
> irq/65-e-1287    0d..2    86.398612: function:                      igb_obtain_mbx_lock_pf
> irq/65-e-1287    0d..2    86.398612: function:                         igb_rd32
> 
> In the above trace, observe that the ISR igb_msix_other() is only
> scheduled after e1000_write_posted_mbx() fails due to a timeout.
> The interrupt handler should run during the looping calls to
> e1000_check_for_ack_vf(), but it is not scheduled because
> preemption is disabled.

What disables preemption? On PREEMPT_RT the spin_lock() does not disable
preemption. You shouldn't spin that long. When was interrupt scheduled.
_Why_ is the interrupt delayed that long.

> Note that e1000_check_for_ack_vf() is called 2000 times before
> it finally times out.

Exactly.

> Sucessful case:
> 
>       ip-5603    0...1  1884.710747: function:             igbvf_reset
>       ip-5603    0...2  1884.710754: function:                e1000_reset_hw_vf
>       ip-5603    0...2  1884.710755: function:                   e1000_check_for_rst_vf
>       ip-5603    0...2  1884.710756: function:                   e1000_write_posted_mbx
>       ip-5603    0...2  1884.710756: function:                      e1000_write_mbx_vf
>       ip-5603    0...2  1884.710758: function:                      e1000_check_for_ack_vf
>       ip-5603    0d.h2  1884.710760: function:             igb_msix_other
>       ip-5603    0d.h2  1884.710760: function:                igb_rd32
>       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst
>       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst_pf
>       ip-5603    0d.h3  1884.710761: function:                   igb_rd32
>       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg
>       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg_pf
>       ip-5603    0d.h3  1884.710762: function:                   igb_rd32
>       ip-5603    0d.h3  1884.710763: function:                igb_rcv_msg_from_vf
>       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx
>       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx_pf
>       ip-5603    0d.h3  1884.710763: function:                      igb_obtain_mbx_lock_pf
>       ip-5603    0d.h3  1884.710763: function:                         igb_rd32
> 
> Since we forced the interrupt context for igb_msix_other(), it now
> runs immediately while the driver checks for an acknowledgment via
> e1000_check_for_ack_vf().

Is this still RT or non-RT? I'm asking because igbvf_reset() is invoked
in ip's context and not in a worker. Also igb_msix_other() runs with a
'h' so it is not threaded.

I have a theory of my own, mind testing 

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d368b753a4675..6fe37b8001c36 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int i, err = 0, vector = 0, free_vector = 0;
 
-	err = request_irq(adapter->msix_entries[vector].vector,
+	err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
 			  igb_msix_other, 0, netdev->name, adapter);
 	if (err)
 		goto err_out;

just to see if it solves the problem?

> 
> > In the mean time:
> > 
> > igb_msg_task_irq_safe()
> > -> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
> > -> igb_vf_reset_event()
> >   -> igb_vf_reset()
> >     -> igb_set_rx_mode()
> >       -> igb_write_mc_addr_list()
> >          -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?
> 
> Perhaps the solution is to preallocate this buffer, if possible.
> Doing so would significantly simplify the patch. However, this
> would require knowing when the multicast (mc) count changes.
> I attempted to identify this but have not succeeded so far.
> 
> > 
> > By explicitly disabling preemption or using a raw_spinlock_t you need to
> > pay attention not to do anything that might lead to unbounded loops
> > (like iterating over many lists, polling on a bit for ages, …) and
> > paying attention that the whole API underneath that it is not doing that
> > is allowed to.
> 
> I unsure if I understood what you are trying to say.

The moment you start disabling preemption/ use raw_spin_lock_t you need
to start about everything underneath/ everything within this section.
While if you keep using spinlock_t you don't have to worry *that* much
and worry if *this* will break PREEMPT_RT. Not to worry whether or not
it is okay to allocate memory or call this function because it might
break RT.
OR if netdev_mc_count() returns 1 you loop once later and this costs you
1us. If it returns 100, you loop 100 times and it costs how much
additional time?

Sebastian

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-06 11:59     ` Sebastian Andrzej Siewior
@ 2025-02-06 20:42       ` Wander Lairson Costa
  2025-02-12 11:56       ` Wander Lairson Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-06 20:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Thu, Feb 6, 2025 at 8:59 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-02-05 17:08:35 [-0300], Wander Lairson Costa wrote:
> > On Wed, Feb 05, 2025 at 10:48:18AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> > > > Wander Lairson Costa says:
> > > >
> > > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > > concerns raised by Sebastian [3].
> > >
> > > I still prefer a solution where we don't have the ifdef in the driver. I
> > > was presented two traces but I didn't get why it works in once case but
> > > not in the other. Maybe it was too obvious.
> >
> > Copying the traces here for further explanation. Both cases are for
> > PREEMPT_RT.
> >
> > Failure case:
> >
> > kworker/-86      0...1    85.381866: function:                   igbvf_reset
> > kworker/-86      0...2    85.381866: function:                      e1000_reset_hw_vf
> > kworker/-86      0...2    85.381867: function:                         e1000_check_for_rst_vf
> > kworker/-86      0...2    85.381868: function:                         e1000_write_posted_mbx
> > kworker/-86      0...2    85.381868: function:                            e1000_write_mbx_vf
> > kworker/-86      0...2    85.381870: function:                            e1000_check_for_ack_vf // repeats for 2000 lines
>
> So it repeats because it waits for the bit. It waits for the interrupts.
>

Exactly. Although I am not familiar with the hardware, I understand
the interrupt sets the bit it is waiting for.

> > ...
> > kworker/-86      0.N.2    86.393782: function:                         e1000_read_posted_mbx
>
> Is this 2 the migrate-disable or preempt-disable counter? Because you
> should get preempted based on that N.
>
> > kworker/-86      0.N.2    86.398606: function:                      e1000_init_hw_vf
> > kworker/-86      0.N.2    86.398606: function:                         e1000_rar_set_vf
> > kworker/-86      0.N.2    86.398606: function:                            e1000_write_posted_mbx
> > irq/65-e-1287    0d..1    86.398609: function:             igb_msix_other
>
> So the kworker leaves and immediately the interrupt gets on the CPU.
>
> > irq/65-e-1287    0d..1    86.398609: function:                igb_rd32
> > irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst
> > irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst_pf
> > irq/65-e-1287    0d..2    86.398610: function:                   igb_rd32
> > irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg
> > irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg_pf
> > irq/65-e-1287    0d..2    86.398611: function:                   igb_rd32
> > irq/65-e-1287    0d..2    86.398612: function:                igb_rcv_msg_from_vf
> > irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx
> > irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx_pf
> > irq/65-e-1287    0d..2    86.398612: function:                      igb_obtain_mbx_lock_pf
> > irq/65-e-1287    0d..2    86.398612: function:                         igb_rd32
> >
> > In the above trace, observe that the ISR igb_msix_other() is only
> > scheduled after e1000_write_posted_mbx() fails due to a timeout.
> > The interrupt handler should run during the looping calls to
> > e1000_check_for_ack_vf(), but it is not scheduled because
> > preemption is disabled.
>
> What disables preemption? On PREEMPT_RT the spin_lock() does not disable
> preemption. You shouldn't spin that long. When was interrupt scheduled.
> _Why_ is the interrupt delayed that long.
>

In RT, it is a thread. To simplify the process of simulating the bug,
I booted the kernel with the parameter nr_cpus=1. Since preemption is
disabled in this configuration, the ISR (Interrupt Service Routine) is
not scheduled immediately and is deferred until later.

From what I recall, I traced the issue starting from the netapi and
ksoftirqd components. To ensure accuracy, I will set up a machine and
verify the behavior again.

> > Note that e1000_check_for_ack_vf() is called 2000 times before
> > it finally times out.
>
> Exactly.
>
> > Sucessful case:
> >
> >       ip-5603    0...1  1884.710747: function:             igbvf_reset
> >       ip-5603    0...2  1884.710754: function:                e1000_reset_hw_vf
> >       ip-5603    0...2  1884.710755: function:                   e1000_check_for_rst_vf
> >       ip-5603    0...2  1884.710756: function:                   e1000_write_posted_mbx
> >       ip-5603    0...2  1884.710756: function:                      e1000_write_mbx_vf
> >       ip-5603    0...2  1884.710758: function:                      e1000_check_for_ack_vf
> >       ip-5603    0d.h2  1884.710760: function:             igb_msix_other
> >       ip-5603    0d.h2  1884.710760: function:                igb_rd32
> >       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst
> >       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst_pf
> >       ip-5603    0d.h3  1884.710761: function:                   igb_rd32
> >       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg
> >       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg_pf
> >       ip-5603    0d.h3  1884.710762: function:                   igb_rd32
> >       ip-5603    0d.h3  1884.710763: function:                igb_rcv_msg_from_vf
> >       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx
> >       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx_pf
> >       ip-5603    0d.h3  1884.710763: function:                      igb_obtain_mbx_lock_pf
> >       ip-5603    0d.h3  1884.710763: function:                         igb_rd32
> >
> > Since we forced the interrupt context for igb_msix_other(), it now
> > runs immediately while the driver checks for an acknowledgment via
> > e1000_check_for_ack_vf().
>
> Is this still RT or non-RT? I'm asking because igbvf_reset() is invoked
> in ip's context and not in a worker. Also igb_msix_other() runs with a
> 'h' so it is not threaded.
>

This is RT with this patch series applied.

> I have a theory of my own, mind testing
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index d368b753a4675..6fe37b8001c36 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
>         struct net_device *netdev = adapter->netdev;
>         int i, err = 0, vector = 0, free_vector = 0;
>
> -       err = request_irq(adapter->msix_entries[vector].vector,
> +       err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
>                           igb_msix_other, 0, netdev->name, adapter);
>         if (err)
>                 goto err_out;
>
> just to see if it solves the problem?
>

Sure. I will do it and report the results back.

> >
> > > In the mean time:
> > >
> > > igb_msg_task_irq_safe()
> > > -> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
> > > -> igb_vf_reset_event()
> > >   -> igb_vf_reset()
> > >     -> igb_set_rx_mode()
> > >       -> igb_write_mc_addr_list()
> > >          -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?
> >
> > Perhaps the solution is to preallocate this buffer, if possible.
> > Doing so would significantly simplify the patch. However, this
> > would require knowing when the multicast (mc) count changes.
> > I attempted to identify this but have not succeeded so far.
> >
> > >
> > > By explicitly disabling preemption or using a raw_spinlock_t you need to
> > > pay attention not to do anything that might lead to unbounded loops
> > > (like iterating over many lists, polling on a bit for ages, …) and
> > > paying attention that the whole API underneath that it is not doing that
> > > is allowed to.
> >
> > I unsure if I understood what you are trying to say.
>
> The moment you start disabling preemption/ use raw_spin_lock_t you need
> to start about everything underneath/ everything within this section.
> While if you keep using spinlock_t you don't have to worry *that* much
> and worry if *this* will break PREEMPT_RT. Not to worry whether or not
> it is okay to allocate memory or call this function because it might
> break RT.
> OR if netdev_mc_count() returns 1 you loop once later and this costs you
> 1us. If it returns 100, you loop 100 times and it costs how much
> additional time?
>
Understood now. Thanks.

> Sebastian
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-06 11:59     ` Sebastian Andrzej Siewior
  2025-02-06 20:42       ` Wander Lairson Costa
@ 2025-02-12 11:56       ` Wander Lairson Costa
  2025-02-12 15:11         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-12 11:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Thu, Feb 06, 2025 at 12:59:14PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-05 17:08:35 [-0300], Wander Lairson Costa wrote:
> > On Wed, Feb 05, 2025 at 10:48:18AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-02-04 09:52:36 [-0800], Tony Nguyen wrote:
> > > > Wander Lairson Costa says:
> > > > 
> > > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > > concerns raised by Sebastian [3].
> > > 
> > > I still prefer a solution where we don't have the ifdef in the driver. I
> > > was presented two traces but I didn't get why it works in once case but
> > > not in the other. Maybe it was too obvious.
> > 
> > Copying the traces here for further explanation. Both cases are for
> > PREEMPT_RT.
> > 
> > Failure case:
> > 
> > kworker/-86      0...1    85.381866: function:                   igbvf_reset
> > kworker/-86      0...2    85.381866: function:                      e1000_reset_hw_vf
> > kworker/-86      0...2    85.381867: function:                         e1000_check_for_rst_vf
> > kworker/-86      0...2    85.381868: function:                         e1000_write_posted_mbx
> > kworker/-86      0...2    85.381868: function:                            e1000_write_mbx_vf
> > kworker/-86      0...2    85.381870: function:                            e1000_check_for_ack_vf // repeats for 2000 lines
> 
> So it repeats because it waits for the bit. It waits for the interrupts.
> 
> > ...
> > kworker/-86      0.N.2    86.393782: function:                         e1000_read_posted_mbx
> 
> Is this 2 the migrate-disable or preempt-disable counter? Because you
> should get preempted based on that N.
> 
> > kworker/-86      0.N.2    86.398606: function:                      e1000_init_hw_vf
> > kworker/-86      0.N.2    86.398606: function:                         e1000_rar_set_vf
> > kworker/-86      0.N.2    86.398606: function:                            e1000_write_posted_mbx
> > irq/65-e-1287    0d..1    86.398609: function:             igb_msix_other
> 
> So the kworker leaves and immediately the interrupt gets on the CPU.
> 
> > irq/65-e-1287    0d..1    86.398609: function:                igb_rd32
> > irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst
> > irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst_pf
> > irq/65-e-1287    0d..2    86.398610: function:                   igb_rd32
> > irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg
> > irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg_pf
> > irq/65-e-1287    0d..2    86.398611: function:                   igb_rd32
> > irq/65-e-1287    0d..2    86.398612: function:                igb_rcv_msg_from_vf
> > irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx
> > irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx_pf
> > irq/65-e-1287    0d..2    86.398612: function:                      igb_obtain_mbx_lock_pf
> > irq/65-e-1287    0d..2    86.398612: function:                         igb_rd32
> > 
> > In the above trace, observe that the ISR igb_msix_other() is only
> > scheduled after e1000_write_posted_mbx() fails due to a timeout.
> > The interrupt handler should run during the looping calls to
> > e1000_check_for_ack_vf(), but it is not scheduled because
> > preemption is disabled.
> 
> What disables preemption? On PREEMPT_RT the spin_lock() does not disable
> preemption. You shouldn't spin that long. When was interrupt scheduled.
> _Why_ is the interrupt delayed that long.
> 
When I was using trace-cmd report -l, it omitted some fields, one of
them is preempt-lazy-depth (which was something new to me), and it seems
this is what affects interrupts. It comes from here [1]. I had the logs,
but the machine went under maintenance  before I could save them. Once
it comes back, I can grab them and post here.

[1] https://elixir.bootlin.com/linux/v6.13.2/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522

> > Note that e1000_check_for_ack_vf() is called 2000 times before
> > it finally times out.
> 
> Exactly.
> 
> > Sucessful case:
> > 
> >       ip-5603    0...1  1884.710747: function:             igbvf_reset
> >       ip-5603    0...2  1884.710754: function:                e1000_reset_hw_vf
> >       ip-5603    0...2  1884.710755: function:                   e1000_check_for_rst_vf
> >       ip-5603    0...2  1884.710756: function:                   e1000_write_posted_mbx
> >       ip-5603    0...2  1884.710756: function:                      e1000_write_mbx_vf
> >       ip-5603    0...2  1884.710758: function:                      e1000_check_for_ack_vf
> >       ip-5603    0d.h2  1884.710760: function:             igb_msix_other
> >       ip-5603    0d.h2  1884.710760: function:                igb_rd32
> >       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst
> >       ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst_pf
> >       ip-5603    0d.h3  1884.710761: function:                   igb_rd32
> >       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg
> >       ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg_pf
> >       ip-5603    0d.h3  1884.710762: function:                   igb_rd32
> >       ip-5603    0d.h3  1884.710763: function:                igb_rcv_msg_from_vf
> >       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx
> >       ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx_pf
> >       ip-5603    0d.h3  1884.710763: function:                      igb_obtain_mbx_lock_pf
> >       ip-5603    0d.h3  1884.710763: function:                         igb_rd32
> > 
> > Since we forced the interrupt context for igb_msix_other(), it now
> > runs immediately while the driver checks for an acknowledgment via
> > e1000_check_for_ack_vf().
> 
> Is this still RT or non-RT? I'm asking because igbvf_reset() is invoked
> in ip's context and not in a worker. Also igb_msix_other() runs with a
> 'h' so it is not threaded.
> 
> I have a theory of my own, mind testing 
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index d368b753a4675..6fe37b8001c36 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
>  	struct net_device *netdev = adapter->netdev;
>  	int i, err = 0, vector = 0, free_vector = 0;
>  
> -	err = request_irq(adapter->msix_entries[vector].vector,
> +	err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
>  			  igb_msix_other, 0, netdev->name, adapter);
>  	if (err)
>  		goto err_out;
> 
> just to see if it solves the problem?
> 
I have two test cases:

1) Boot the machine with nr_cpus=1. The driver reports "PF still
resetting" message continuously. This issue is gone.

2) Run the following script:

    ipaddr_vlan=3
    nic_test=ens14f0
    vf=${nic_test}v0 # The main testing steps:
    while true; do
        ip link set ${nic_test} mtu 1500
        ip link set ${vf} mtu 1500
        ip link set $vf up
        # 3. set vlan and ip for VF
        ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
        ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
        ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
        # 4. check the link state for VF and PF
        ip link show ${nic_test}
        if ! ip link show $vf | grep 'state UP'; then
            echo 'Error found'
            break
        fi
        ip link set $vf down
    done

This one eventually fails. It is the first time that one works and the
other fails. So far, it has been all or nothing. I didn't have time yet to
investigate why this happens.

> > 
> > > In the mean time:
> > > 
> > > igb_msg_task_irq_safe()
> > > -> vfs_raw_spin_lock_irqsave() // raw_spinlock_t
> > > -> igb_vf_reset_event()
> > >   -> igb_vf_reset()
> > >     -> igb_set_rx_mode()
> > >       -> igb_write_mc_addr_list()
> > >          -> mta_list = kcalloc(netdev_mc_count(netdev), 6, GFP_ATOMIC); // kaboom?
> > 
> > Perhaps the solution is to preallocate this buffer, if possible.
> > Doing so would significantly simplify the patch. However, this
> > would require knowing when the multicast (mc) count changes.
> > I attempted to identify this but have not succeeded so far.
> > 
> > > 
> > > By explicitly disabling preemption or using a raw_spinlock_t you need to
> > > pay attention not to do anything that might lead to unbounded loops
> > > (like iterating over many lists, polling on a bit for ages, …) and
> > > paying attention that the whole API underneath that it is not doing that
> > > is allowed to.
> > 
> > I unsure if I understood what you are trying to say.
> 
> The moment you start disabling preemption/ use raw_spin_lock_t you need
> to start about everything underneath/ everything within this section.
> While if you keep using spinlock_t you don't have to worry *that* much
> and worry if *this* will break PREEMPT_RT. Not to worry whether or not
> it is okay to allocate memory or call this function because it might
> break RT.
> OR if netdev_mc_count() returns 1 you loop once later and this costs you
> 1us. If it returns 100, you loop 100 times and it costs how much
> additional time?
> 
> Sebastian
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-12 11:56       ` Wander Lairson Costa
@ 2025-02-12 15:11         ` Sebastian Andrzej Siewior
  2025-02-12 15:21           ` Wander Lairson Costa
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-12 15:11 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-12 08:56:47 [-0300], Wander Lairson Costa wrote:
> > What disables preemption? On PREEMPT_RT the spin_lock() does not disable
> > preemption. You shouldn't spin that long. When was interrupt scheduled.
> > _Why_ is the interrupt delayed that long.
> > 
> When I was using trace-cmd report -l, it omitted some fields, one of
> them is preempt-lazy-depth (which was something new to me), and it seems
> this is what affects interrupts. It comes from here [1]. I had the logs,
> but the machine went under maintenance  before I could save them. Once
> it comes back, I can grab them and post here.
> 
> [1] https://elixir.bootlin.com/linux/v6.13.2/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522

If you do send patches against mainline please test against mainline. As
of today we have preempt-disable and migrate-disable depth. We don't
do lazy-depth anymore, we just have a bit now (which is [lLbB]).
The referenced line will only disable migration, not preemption.

It is important to understand what exactly is going on.

> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index d368b753a4675..6fe37b8001c36 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
> >  	struct net_device *netdev = adapter->netdev;
> >  	int i, err = 0, vector = 0, free_vector = 0;
> >  
> > -	err = request_irq(adapter->msix_entries[vector].vector,
> > +	err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
> >  			  igb_msix_other, 0, netdev->name, adapter);
> >  	if (err)
> >  		goto err_out;
> > 
> > just to see if it solves the problem?
> > 
> I have two test cases:
> 
> 1) Boot the machine with nr_cpus=1. The driver reports "PF still
> resetting" message continuously. This issue is gone.

good.

> 2) Run the following script:
> 
>     ipaddr_vlan=3
>     nic_test=ens14f0
>     vf=${nic_test}v0 # The main testing steps:
>     while true; do
>         ip link set ${nic_test} mtu 1500
>         ip link set ${vf} mtu 1500
>         ip link set $vf up
>         # 3. set vlan and ip for VF
>         ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
>         ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
>         ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
>         # 4. check the link state for VF and PF
>         ip link show ${nic_test}
>         if ! ip link show $vf | grep 'state UP'; then
>             echo 'Error found'
>             break
>         fi
>         ip link set $vf down
>     done
> 
> This one eventually fails. It is the first time that one works and the
> other fails. So far, it has been all or nothing. I didn't have time yet to
> investigate why this happens.

"eventually fails". Does this mean it passes the first few iterations
but then it times out? In that case it might be something else

I managed to find a "Intel Corporation I350 Gigabit Network Connection
(rev 01)" and I end up in a warning if I start the script (without the
while true):

|8021q: adding VLAN 0 to HW filter on device eno0v0
|igbvf 0000:08:10.0: Vlan id 0 is not added
|igb 0000:07:00.0: Setting VLAN 3, QOS 0x0 on VF 0
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|------------[ cut here ]------------
|WARNING: CPU: 25 PID: 3013 at drivers/net/ethernet/intel/igbvf/netdev.c:1777 igbvf_close+0x111/0x120
…
|CPU: 25 UID: 0 PID: 3013 Comm: ip Not tainted 6.14.0-rc1-rt1+ #186 PREEMPT_RT+LAZY 39474a76e7562bb76173f4b98cf194301d39bf7f
|igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
|---[ end trace 0000000000000000 ]---
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
|8021q: adding VLAN 0 to HW filter on device eno0v0
|igb 0000:07:00.0: VF 0 attempted to override administratively set VLAN tag
|Reload the VF driver to resume operations
|igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
|igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter

and the state is down ('Error found' is printed). But if I do it
manually, line by line, then it all passes without the warning and the
state of the VF device is up.

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-12 15:11         ` Sebastian Andrzej Siewior
@ 2025-02-12 15:21           ` Wander Lairson Costa
  2025-02-12 15:29             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-12 15:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Wed, Feb 12, 2025 at 12:11 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-02-12 08:56:47 [-0300], Wander Lairson Costa wrote:
> > > What disables preemption? On PREEMPT_RT the spin_lock() does not disable
> > > preemption. You shouldn't spin that long. When was interrupt scheduled.
> > > _Why_ is the interrupt delayed that long.
> > >
> > When I was using trace-cmd report -l, it omitted some fields, one of
> > them is preempt-lazy-depth (which was something new to me), and it seems
> > this is what affects interrupts. It comes from here [1]. I had the logs,
> > but the machine went under maintenance  before I could save them. Once
> > it comes back, I can grab them and post here.
> >
> > [1] https://elixir.bootlin.com/linux/v6.13.2/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522
>
> If you do send patches against mainline please test against mainline. As
> of today we have preempt-disable and migrate-disable depth. We don't
> do lazy-depth anymore, we just have a bit now (which is [lLbB]).
> The referenced line will only disable migration, not preemption.
>
> It is important to understand what exactly is going on.
>

I forgot to mention. For this specific test, I had to test with an
older kernel because of an ongoing
issue with this machine when running recent kernels (this is why the
machine went to maintenance, btw).
I will reproduce it again. For patches, I did test them against the
latest mainline before submitting.

> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index d368b753a4675..6fe37b8001c36 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -912,7 +912,7 @@ static int igb_request_msix(struct igb_adapter *adapter)
> > >     struct net_device *netdev = adapter->netdev;
> > >     int i, err = 0, vector = 0, free_vector = 0;
> > >
> > > -   err = request_irq(adapter->msix_entries[vector].vector,
> > > +   err = request_threaded_irq(adapter->msix_entries[vector].vector, NULL,
> > >                       igb_msix_other, 0, netdev->name, adapter);
> > >     if (err)
> > >             goto err_out;
> > >
> > > just to see if it solves the problem?
> > >
> > I have two test cases:
> >
> > 1) Boot the machine with nr_cpus=1. The driver reports "PF still
> > resetting" message continuously. This issue is gone.
>
> good.
>
> > 2) Run the following script:
> >
> >     ipaddr_vlan=3
> >     nic_test=ens14f0
> >     vf=${nic_test}v0 # The main testing steps:
> >     while true; do
> >         ip link set ${nic_test} mtu 1500
> >         ip link set ${vf} mtu 1500
> >         ip link set $vf up
> >         # 3. set vlan and ip for VF
> >         ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
> >         ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
> >         ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
> >         # 4. check the link state for VF and PF
> >         ip link show ${nic_test}
> >         if ! ip link show $vf | grep 'state UP'; then
> >             echo 'Error found'
> >             break
> >         fi
> >         ip link set $vf down
> >     done
> >
> > This one eventually fails. It is the first time that one works and the
> > other fails. So far, it has been all or nothing. I didn't have time yet to
> > investigate why this happens.
>
> "eventually fails". Does this mean it passes the first few iterations
> but then it times out? In that case it might be something else
>
Yes. Indeed, might be due something else. I will perform further investigation
when I get the machine back.

> I managed to find a "Intel Corporation I350 Gigabit Network Connection
> (rev 01)" and I end up in a warning if I start the script (without the
> while true):
>
> |8021q: adding VLAN 0 to HW filter on device eno0v0
> |igbvf 0000:08:10.0: Vlan id 0 is not added
> |igb 0000:07:00.0: Setting VLAN 3, QOS 0x0 on VF 0
> |igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
> |------------[ cut here ]------------
> |WARNING: CPU: 25 PID: 3013 at drivers/net/ethernet/intel/igbvf/netdev.c:1777 igbvf_close+0x111/0x120
> …
> |CPU: 25 UID: 0 PID: 3013 Comm: ip Not tainted 6.14.0-rc1-rt1+ #186 PREEMPT_RT+LAZY 39474a76e7562bb76173f4b98cf194301d39bf7f
> |igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
> |---[ end trace 0000000000000000 ]---
> |igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
> |igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
> |igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
> |8021q: adding VLAN 0 to HW filter on device eno0v0
> |igb 0000:07:00.0: VF 0 attempted to override administratively set VLAN tag
> |Reload the VF driver to resume operations
> |igbvf 0000:08:10.0: Link is Up 1000 Mbps Full Duplex
> |igb 0000:07:00.0: VF 0 attempted to set invalid MAC filter
>
> and the state is down ('Error found' is printed). But if I do it
> manually, line by line, then it all passes without the warning and the
> state of the VF device is up.
>
> Sebastian
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-12 15:21           ` Wander Lairson Costa
@ 2025-02-12 15:29             ` Sebastian Andrzej Siewior
  2025-02-18 14:50               ` Wander Lairson Costa
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-12 15:29 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-12 12:21:04 [-0300], Wander Lairson Costa wrote:
> > "eventually fails". Does this mean it passes the first few iterations
> > but then it times out? In that case it might be something else
> >
> Yes. Indeed, might be due something else. I will perform further investigation
> when I get the machine back.

Okay. Then I consider this series not going to be applied, I have an
idea what is happening and I wait until you get back.

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-12 15:29             ` Sebastian Andrzej Siewior
@ 2025-02-18 14:50               ` Wander Lairson Costa
  2025-02-19 15:29                 ` Steven Rostedt
  2025-02-19 16:35                 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-18 14:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Wed, Feb 12, 2025 at 04:29:25PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-12 12:21:04 [-0300], Wander Lairson Costa wrote:
> > > "eventually fails". Does this mean it passes the first few iterations
> > > but then it times out? In that case it might be something else
> > >
> > Yes. Indeed, might be due something else. I will perform further investigation
> > when I get the machine back.
> 
> Okay. Then I consider this series not going to be applied, I have an
> idea what is happening and I wait until you get back.
> 

Sorry it took so long. After a day fighting the machine I could boot an
upstream kernel on it and generate the logs.

These logs are for the test case of booting the kernel with nr_cpus=1:

     kworker/0:0-8       [000] d..2.  2120.708145: process_one_work <-worker_thread
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reset_task <-process_one_work
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reinit_locked <-process_one_work
     kworker/0:0-8       [000] ...1.  2120.708145: igbvf_down <-igbvf_reinit_locked
     kworker/0:0-8       [000] ...1.  2120.718619: igbvf_update_stats <-igbvf_down
     kworker/0:0-8       [000] ...1.  2120.718619: igbvf_reset <-igbvf_down
     kworker/0:0-8       [000] b..13  2120.718620: e1000_reset_hw_vf <-igbvf_reset
     kworker/0:0-8       [000] b..13  2120.718620: e1000_check_for_rst_vf <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2120.718621: e1000_write_posted_mbx <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2120.718621: e1000_write_mbx_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] b..13  2120.718624: e1000_check_for_ack_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] D.h.3  2120.718626: irq_handler_entry: irq=63 name=ens14f0
     kworker/0:0-8       [000] b..13  2120.719133: e1000_check_for_ack_vf <-e1000_write_posted_mbx
	[...] repeats e1000_check_for_ack_vf for 2000 lines
     kworker/0:0-8       [000] b..13  2120.719634: e1000_check_for_ack_vf <-e1000_write_posted_mbx
     kworker/0:0-8       [000] b..13  2121.730639: e1000_read_posted_mbx <-e1000_reset_hw_vf
     kworker/0:0-8       [000] b..13  2121.730643: e1000_init_hw_vf <-igbvf_reset
     kworker/0:0-8       [000] b..13  2121.730643: e1000_rar_set_vf <-e1000_init_hw_vf
     kworker/0:0-8       [000] b..13  2121.730643: e1000_write_posted_mbx <-e1000_rar_set_vf
     kworker/0:0-8       [000] D.Zf2  2121.730645: igbvf_reset_L14: (igbvf_reset+0x62/0x120 [igbvf])
     kworker/0:0-8       [000] .N...  2121.730649: igbvf_reset_L16: (igbvf_reset+0x7b/0x120 [igbvf])
  irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
  irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
  irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
  irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other

I created two custom probes inside igbvf_reset:

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko -L igbvf_reset
<igbvf_reset@/home/test/kernel-ark/drivers/net/ethernet/intel/igbvf/netdev.c:0>
      0  static void igbvf_reset(struct igbvf_adapter *adapter)
         {
      2         struct e1000_mac_info *mac = &adapter->hw.mac;
                struct net_device *netdev = adapter->netdev;
                struct e1000_hw *hw = &adapter->hw;
         
                spin_lock_bh(&hw->mbx_lock);
         
                /* Allow time for pending master requests to run */
      9         if (mac->ops.reset_hw(hw))
     10                 dev_info(&adapter->pdev->dev, "PF still resetting\n");
         
     12         mac->ops.init_hw(hw);
         
     14         spin_unlock_bh(&hw->mbx_lock);
         
     16         if (is_valid_ether_addr(adapter->hw.mac.addr)) {
     17                 eth_hw_addr_set(netdev, adapter->hw.mac.addr);
     18                 memcpy(netdev->perm_addr, adapter->hw.mac.addr,
                               netdev->addr_len);
                }
         
     22         adapter->last_reset = jiffies;
         }
         
         int igbvf_up(struct igbvf_adapter *adapter)

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko igbvf_reset:14
Added new event:
  probe:igbvf_reset_L14 (on igbvf_reset:14 in igbvf)

You can now use it in all perf tools, such as:

        perf record -e probe:igbvf_reset_L14 -aR sleep 1

$ perf probe -m /lib/modules/6.14.0-rc3+/kernel/drivers/net/ethernet/intel/igbvf/igbvf.ko igbvf_reset:16
Added new event:
  probe:igbvf_reset_L16 (on igbvf_reset:16 in igbvf)

They are intended to monitor the effect of spin_unlock_bh().

This is the trace-cmd command line I ran:

$ trace-cmd start -p function -l 'e1000*' -l 'igb*' -l process_one_work -e irq:irq_handler_entry -e probe
  plugin 'function'

The threaded interrupt handler is called right after (during?)
spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
field there.

I am currently working on something else that has a higher priority, so
I don't have time right now to go deeper on that. But feel free to ask
me for any test or trace you may need.

> Sebastian
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-18 14:50               ` Wander Lairson Costa
@ 2025-02-19 15:29                 ` Steven Rostedt
  2025-02-19 15:30                   ` Steven Rostedt
                                     ` (2 more replies)
  2025-02-19 16:35                 ` Sebastian Andrzej Siewior
  1 sibling, 3 replies; 25+ messages in thread
From: Steven Rostedt @ 2025-02-19 15:29 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Sebastian Andrzej Siewior, Tony Nguyen, davem, kuba, pabeni,
	edumazet, andrew+netdev, netdev, clrkwllms, jgarzik, yuma,
	linux-rt-devel

On Tue, 18 Feb 2025 11:50:55 -0300
Wander Lairson Costa <wander@redhat.com> wrote:

>      kworker/0:0-8       [000] b..13  2121.730643: e1000_init_hw_vf <-igbvf_reset
>      kworker/0:0-8       [000] b..13  2121.730643: e1000_rar_set_vf <-e1000_init_hw_vf
>      kworker/0:0-8       [000] b..13  2121.730643: e1000_write_posted_mbx <-e1000_rar_set_vf
>      kworker/0:0-8       [000] D.Zf2  2121.730645: igbvf_reset_L14: (igbvf_reset+0x62/0x120 [igbvf])
>      kworker/0:0-8       [000] .N...  2121.730649: igbvf_reset_L16: (igbvf_reset+0x7b/0x120 [igbvf])
>   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
>   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
>   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
>   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other
> 

> This is the trace-cmd command line I ran:
> 
> $ trace-cmd start -p function -l 'e1000*' -l 'igb*' -l process_one_work -e irq:irq_handler_entry -e probe
>   plugin 'function'
> 
> The threaded interrupt handler is called right after (during?)
> spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
> field there.

The preempt count is hex, so 'f' means 15. But that that latency field looks corrupted.

After adding the probes by perf, what's the content of the format files for them?

That would likely be in /sys/kernel/tracing/events/probe/*/format

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-19 15:29                 ` Steven Rostedt
@ 2025-02-19 15:30                   ` Steven Rostedt
  2025-02-20 10:52                     ` Wander Lairson Costa
  2025-02-19 16:41                   ` Sebastian Andrzej Siewior
  2025-02-20 10:50                   ` Wander Lairson Costa
  2 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2025-02-19 15:30 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Sebastian Andrzej Siewior, Tony Nguyen, davem, kuba, pabeni,
	edumazet, andrew+netdev, netdev, clrkwllms, jgarzik, yuma,
	linux-rt-devel

On Wed, 19 Feb 2025 10:29:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> After adding the probes by perf, what's the content of the format files for them?

Also, what's in:

  /sys/kernel/tracing/dynamic_events

 ?

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-18 14:50               ` Wander Lairson Costa
  2025-02-19 15:29                 ` Steven Rostedt
@ 2025-02-19 16:35                 ` Sebastian Andrzej Siewior
  2025-02-20 11:35                   ` Wander Lairson Costa
  1 sibling, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-19 16:35 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-18 11:50:55 [-0300], Wander Lairson Costa wrote:
> These logs are for the test case of booting the kernel with nr_cpus=1:
> 
>      kworker/0:0-8       [000] d..2.  2120.708145: process_one_work <-worker_thread
>      kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reset_task <-process_one_work

This looks like someone broke the function tracer because the preemtion
level should be 0 here not 1. So we would have to substract one… This
does remind me of something else…

…
>      kworker/0:0-8       [000] b..13  2120.718620: e1000_reset_hw_vf <-igbvf_reset
…
>      kworker/0:0-8       [000] D.h.3  2120.718626: irq_handler_entry: irq=63 name=ens14f0
^ the interrupt.
…
>      kworker/0:0-8       [000] b..13  2120.719133: e1000_check_for_ack_vf <-e1000_write_posted_mbx
>   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
>   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
>   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
>   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other

The threaded interrupt is postponed due to the BH-off section. I am
working on lifting that restriction. Therefore it gets on CPU right
after kworker's bh-enable.

…
> The threaded interrupt handler is called right after (during?)
> spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
> field there.

The hardware interrupt handler gets there while worker is in the wait
loop. The threaded interrupt handler gets postponed until after the last
spin_unlock_bh(). The BH part is the important part.
With that log, I expect the same hold-off part with threaded interrupts
and the same BH-off synchronisation.

> I am currently working on something else that has a higher priority, so
> I don't have time right now to go deeper on that. But feel free to ask
> me for any test or trace you may need.

I would need to check if it is safe to explicitly request the threaded
handler but this is what I would suggest. It works around the issue for
threaded interrupts and PREEMPT_RT as its user.
You confirmed that it works, right?

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-19 15:29                 ` Steven Rostedt
  2025-02-19 15:30                   ` Steven Rostedt
@ 2025-02-19 16:41                   ` Sebastian Andrzej Siewior
  2025-02-20 10:50                   ` Wander Lairson Costa
  2 siblings, 0 replies; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-19 16:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Wander Lairson Costa, Tony Nguyen, davem, kuba, pabeni, edumazet,
	andrew+netdev, netdev, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-19 10:29:16 [-0500], Steven Rostedt wrote:
> On Tue, 18 Feb 2025 11:50:55 -0300
> Wander Lairson Costa <wander@redhat.com> wrote:
> 
> >      kworker/0:0-8       [000] b..13  2121.730643: e1000_write_posted_mbx <-e1000_rar_set_vf
> >      kworker/0:0-8       [000] D.Zf2  2121.730645: igbvf_reset_L14: (igbvf_reset+0x62/0x120 [igbvf])
> >      kworker/0:0-8       [000] .N...  2121.730649: igbvf_reset_L16: (igbvf_reset+0x7b/0x120 [igbvf])
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
> 
> The preempt count is hex, so 'f' means 15. But that that latency field looks corrupted.

It is high but it kind of makes sense. We cap it at 15 so it might be
higher. But then we would have nesting but why? What confuses me a bit
is the Z because this would indicate NMI.
Also the entry after is a N and nothing else. I would expect a
sched_switch right after unlock so there should be no further entry from
kworker which must run at RT priority because it is boosted by
irq/63-ens14f0-1112.

> Thanks,
> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-19 15:29                 ` Steven Rostedt
  2025-02-19 15:30                   ` Steven Rostedt
  2025-02-19 16:41                   ` Sebastian Andrzej Siewior
@ 2025-02-20 10:50                   ` Wander Lairson Costa
  2 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-20 10:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Tony Nguyen, davem, kuba, pabeni,
	edumazet, andrew+netdev, netdev, clrkwllms, jgarzik, yuma,
	linux-rt-devel

On Wed, Feb 19, 2025 at 10:29:16AM -0500, Steven Rostedt wrote:
> On Tue, 18 Feb 2025 11:50:55 -0300
> Wander Lairson Costa <wander@redhat.com> wrote:
> 
> >      kworker/0:0-8       [000] b..13  2121.730643: e1000_init_hw_vf <-igbvf_reset
> >      kworker/0:0-8       [000] b..13  2121.730643: e1000_rar_set_vf <-e1000_init_hw_vf
> >      kworker/0:0-8       [000] b..13  2121.730643: e1000_write_posted_mbx <-e1000_rar_set_vf
> >      kworker/0:0-8       [000] D.Zf2  2121.730645: igbvf_reset_L14: (igbvf_reset+0x62/0x120 [igbvf])
> >      kworker/0:0-8       [000] .N...  2121.730649: igbvf_reset_L16: (igbvf_reset+0x7b/0x120 [igbvf])
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other
> > 
> 
> > This is the trace-cmd command line I ran:
> > 
> > $ trace-cmd start -p function -l 'e1000*' -l 'igb*' -l process_one_work -e irq:irq_handler_entry -e probe
> >   plugin 'function'
> > 
> > The threaded interrupt handler is called right after (during?)
> > spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
> > field there.
> 
> The preempt count is hex, so 'f' means 15. But that that latency field looks corrupted.
> 
> After adding the probes by perf, what's the content of the format files for them?
> 
> That would likely be in /sys/kernel/tracing/events/probe/*/format
> 

$ cd /sys/kernel/tracing/events/probe/
$ cat igbvf_reset_L1*/format
name: igbvf_reset_L14
ID: 2610
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long __probe_ip; offset:8;       size:8; signed:0;

print fmt: "(%lx)", REC->__probe_ip
name: igbvf_reset_L16
ID: 2611
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long __probe_ip; offset:8;       size:8; signed:0;

print fmt: "(%lx)", REC->__probe_ip

> Thanks,
> 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-19 15:30                   ` Steven Rostedt
@ 2025-02-20 10:52                     ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-20 10:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Tony Nguyen, davem, kuba, pabeni,
	edumazet, andrew+netdev, netdev, clrkwllms, jgarzik, yuma,
	linux-rt-devel

On Wed, Feb 19, 2025 at 10:30:06AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2025 10:29:16 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > After adding the probes by perf, what's the content of the format files for them?
> 
> Also, what's in:
> 
>   /sys/kernel/tracing/dynamic_events
> 
>  ?

$ cat /sys/kernel/tracing/dynamic_events
p:probe/igbvf_reset_L14 igbvf:igbvf_reset+98
p:probe/igbvf_reset_L16 igbvf:igbvf_reset+123

> 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-19 16:35                 ` Sebastian Andrzej Siewior
@ 2025-02-20 11:35                   ` Wander Lairson Costa
  2025-02-20 11:38                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Wander Lairson Costa @ 2025-02-20 11:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Wed, Feb 19, 2025 at 05:35:54PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-18 11:50:55 [-0300], Wander Lairson Costa wrote:
> > These logs are for the test case of booting the kernel with nr_cpus=1:
> > 
> >      kworker/0:0-8       [000] d..2.  2120.708145: process_one_work <-worker_thread
> >      kworker/0:0-8       [000] ...1.  2120.708145: igbvf_reset_task <-process_one_work
> 
> This looks like someone broke the function tracer because the preemtion
> level should be 0 here not 1. So we would have to substract one… This
> does remind me of something else…
> 

That fooled for quite a while. That's why I claimed the preemption was
disabled at beginning.

> …
> >      kworker/0:0-8       [000] b..13  2120.718620: e1000_reset_hw_vf <-igbvf_reset
> …
> >      kworker/0:0-8       [000] D.h.3  2120.718626: irq_handler_entry: irq=63 name=ens14f0
> ^ the interrupt.
> …
> >      kworker/0:0-8       [000] b..13  2120.719133: e1000_check_for_ack_vf <-e1000_write_posted_mbx
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_msix_other <-irq_thread_fn
> >   irq/63-ens14f0-1112    [000] b..12  2121.730652: igb_rd32 <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst <-igb_msix_other
> >   irq/63-ens14f0-1112    [000] b..13  2121.730653: igb_check_for_rst_pf <-igb_msix_other
> 
> The threaded interrupt is postponed due to the BH-off section. I am
> working on lifting that restriction. Therefore it gets on CPU right
> after kworker's bh-enable.
> 
> …
> > The threaded interrupt handler is called right after (during?)
> > spin_unlock_bh(). I wonder what the 'f' means in the preempt-count
> > field there.
> 
> The hardware interrupt handler gets there while worker is in the wait
> loop. The threaded interrupt handler gets postponed until after the last
> spin_unlock_bh(). The BH part is the important part.
> With that log, I expect the same hold-off part with threaded interrupts
> and the same BH-off synchronisation.
> 
> > I am currently working on something else that has a higher priority, so
> > I don't have time right now to go deeper on that. But feel free to ask
> > me for any test or trace you may need.
> 
> I would need to check if it is safe to explicitly request the threaded
> handler but this is what I would suggest. It works around the issue for
> threaded interrupts and PREEMPT_RT as its user.
> You confirmed that it works, right?
> 

Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.

> Sebastian
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-20 11:35                   ` Wander Lairson Costa
@ 2025-02-20 11:38                     ` Sebastian Andrzej Siewior
  2025-03-14 12:12                       ` Wander Lairson Costa
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-02-20 11:38 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-02-20 08:35:17 [-0300], Wander Lairson Costa wrote:
> > You confirmed that it works, right?
> > 
> 
> Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.

I mean just request_threaded_irq() as suggested in
	https://lore.kernel.org/all/20250206115914.VfzGTwD8@linutronix.de/

Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-02-20 11:38                     ` Sebastian Andrzej Siewior
@ 2025-03-14 12:12                       ` Wander Lairson Costa
  2025-03-14 13:16                         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 25+ messages in thread
From: Wander Lairson Costa @ 2025-03-14 12:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Thu, Feb 20, 2025 at 8:39 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-02-20 08:35:17 [-0300], Wander Lairson Costa wrote:
> > > You confirmed that it works, right?
> > >
> >
> > Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.
>
> I mean just request_threaded_irq() as suggested in
>         https://lore.kernel.org/all/20250206115914.VfzGTwD8@linutronix.de/
>

I forgot to answer, sorry. The answer is yes.

> Sebastian
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-03-14 12:12                       ` Wander Lairson Costa
@ 2025-03-14 13:16                         ` Sebastian Andrzej Siewior
  2025-03-14 14:08                           ` Wander Lairson Costa
  0 siblings, 1 reply; 25+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 13:16 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On 2025-03-14 09:12:20 [-0300], Wander Lairson Costa wrote:
> On Thu, Feb 20, 2025 at 8:39 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2025-02-20 08:35:17 [-0300], Wander Lairson Costa wrote:
> > > > You confirmed that it works, right?
> > > >
> > >
> > > Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.
> >
> > I mean just request_threaded_irq() as suggested in
> >         https://lore.kernel.org/all/20250206115914.VfzGTwD8@linutronix.de/
> >
> 
> I forgot to answer, sorry. The answer is yes.

Are you going to post this a patch or want me doing it?
 
Sebastian

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-03-14 13:16                         ` Sebastian Andrzej Siewior
@ 2025-03-14 14:08                           ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-03-14 14:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	rostedt, clrkwllms, jgarzik, yuma, linux-rt-devel

On Fri, Mar 14, 2025 at 10:16 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-03-14 09:12:20 [-0300], Wander Lairson Costa wrote:
> > On Thu, Feb 20, 2025 at 8:39 AM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2025-02-20 08:35:17 [-0300], Wander Lairson Costa wrote:
> > > > > You confirmed that it works, right?
> > > > >
> > > >
> > > > Do you mean that earlier test removing IRQF_COND_ONESHOT? If so, yes.
> > >
> > > I mean just request_threaded_irq() as suggested in
> > >         https://lore.kernel.org/all/20250206115914.VfzGTwD8@linutronix.de/
> > >
> >
> > I forgot to answer, sorry. The answer is yes.
>
> Are you going to post this a patch or want me doing it?
>

Please don't hesitate to post it, you are the true author and deserve
the credit.

> Sebastian
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-03-14 14:08 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-04 17:52 [PATCH net 0/4][pull request] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
2025-02-04 17:52 ` [PATCH net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Tony Nguyen
2025-02-04 17:52 ` [PATCH net 2/4] igb: introduce raw vfs_lock to igb_adapter Tony Nguyen
2025-02-04 17:52 ` [PATCH net 3/4] igb: split igb_msg_task() Tony Nguyen
2025-02-04 17:52 ` [PATCH net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Tony Nguyen
2025-02-05  9:48 ` [PATCH net 0/4][pull request] " Sebastian Andrzej Siewior
2025-02-05 20:08   ` Wander Lairson Costa
2025-02-06 11:59     ` Sebastian Andrzej Siewior
2025-02-06 20:42       ` Wander Lairson Costa
2025-02-12 11:56       ` Wander Lairson Costa
2025-02-12 15:11         ` Sebastian Andrzej Siewior
2025-02-12 15:21           ` Wander Lairson Costa
2025-02-12 15:29             ` Sebastian Andrzej Siewior
2025-02-18 14:50               ` Wander Lairson Costa
2025-02-19 15:29                 ` Steven Rostedt
2025-02-19 15:30                   ` Steven Rostedt
2025-02-20 10:52                     ` Wander Lairson Costa
2025-02-19 16:41                   ` Sebastian Andrzej Siewior
2025-02-20 10:50                   ` Wander Lairson Costa
2025-02-19 16:35                 ` Sebastian Andrzej Siewior
2025-02-20 11:35                   ` Wander Lairson Costa
2025-02-20 11:38                     ` Sebastian Andrzej Siewior
2025-03-14 12:12                       ` Wander Lairson Costa
2025-03-14 13:16                         ` Sebastian Andrzej Siewior
2025-03-14 14:08                           ` Wander Lairson Costa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).