netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
@ 2024-12-04 11:42 Wander Lairson Costa
  2024-12-04 11:42 ` [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Wander Lairson Costa
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-04 11:42 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jeff Garzik, Auke Kok, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Wander Lairson Costa

This is the second attempt at fixing the behavior of igb_msix_other()
for PREEMPT_RT. The previous attempt [1] was reverted [2] following
concerns raised by Sebastian [3].

The initial approach proposed converting vfs_lock to a raw_spinlock,
a minor change intended to make it safe. However, it became evident
that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
which is unsafe in interrupt context on PREEMPT_RT systems.

To address this, the solution involves splitting igb_msg_task()
into two parts:

    * One part invoked from the IRQ context.
    * Another part called from the threaded interrupt handler.

To accommodate this, vfs_lock has been restructured into a double
lock: a spinlock_t and a raw_spinlock_t. In the revised design:

    * igb_disable_sriov() locks both spinlocks.
    * Each part of igb_msg_task() locks the appropriate spinlock for
    its execution context.

It is worth noting that the double lock mechanism is only active under
PREEMPT_RT. For non-PREEMPT_RT builds, the additional raw_spinlock_t
field is ommited.

If the extra raw_spinlock_t field can be tolerated under
!PREEMPT_RT (even though it remains unused), we can eliminate the
need for #ifdefs and simplify the code structure.

I will be on vacation from December 7th to Christmas and will address
review comments upon my return.

If possible, I kindly request the Intel team to perform smoke tests
on both stock and realtime kernels to catch any potential issues with
this patch series.

Cheers,
Wander

[1] https://lore.kernel.org/all/20240920185918.616302-2-wander@redhat.com/
[2] https://lore.kernel.org/all/20241104124050.22290-1-wander@redhat.com/
[3] https://lore.kernel.org/all/20241104110708.gFyxRFlC@linutronix.de/


Wander Lairson Costa (4):
  igb: narrow scope of vfs_lock in SR-IOV cleanup
  igb: introduce raw vfs_lock to igb_adapter
  igb: split igb_msg_task()
  igb: fix igb_msix_other() handling for PREEMPT_RT

 drivers/net/ethernet/intel/igb/igb.h      |   4 +
 drivers/net/ethernet/intel/igb/igb_main.c | 160 +++++++++++++++++++---
 2 files changed, 148 insertions(+), 16 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
@ 2024-12-04 11:42 ` Wander Lairson Costa
  2025-01-07 10:06   ` [Intel-wired-lan] " Romanowski, Rafal
  2024-12-04 11:42 ` [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter Wander Lairson Costa
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-04 11:42 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Auke Kok, Jeff Garzik, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Wander Lairson Costa

The adapter->vfs_lock currently protects critical sections shared between
igb_disable_sriov() and igb_msg_task(). Since igb_msg_task() — which is
invoked solely by the igb_msix_other() ISR—only proceeds when
adapter->vfs_allocated_count > 0, we can reduce the lock scope further.

By moving the assignment adapter->vfs_allocated_count = 0 to the start of the
cleanup code in igb_disable_sriov(), we can restrict the spinlock protection
solely to this assignment. This change removes kfree() calls from within the
locked section, simplifying lock management.

Once kfree() is outside the vfs_lock scope, it becomes possible to safely
convert vfs_lock to a raw_spin_lock.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 08578980b6518..4ca25660e876e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3708,12 +3708,12 @@ static int igb_disable_sriov(struct pci_dev *pdev, bool reinit)
 			msleep(500);
 		}
 		spin_lock_irqsave(&adapter->vfs_lock, flags);
+		adapter->vfs_allocated_count = 0;
+		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 		kfree(adapter->vf_mac_list);
 		adapter->vf_mac_list = NULL;
 		kfree(adapter->vf_data);
 		adapter->vf_data = NULL;
-		adapter->vfs_allocated_count = 0;
-		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 		wr32(E1000_IOVCTL, E1000_IOVCTL_REUSE_VFQ);
 		wrfl();
 		msleep(100);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
  2024-12-04 11:42 ` [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Wander Lairson Costa
@ 2024-12-04 11:42 ` Wander Lairson Costa
  2025-01-07 10:08   ` [Intel-wired-lan] " Romanowski, Rafal
  2024-12-04 11:42 ` [PATCH iwl-net 3/4] igb: split igb_msg_task() Wander Lairson Costa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-04 11:42 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jeff Garzik, Auke Kok, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Wander Lairson Costa, Clark Williams

This change adds a raw_spinlock for the vfs_lock to the
igb_adapter structure, enabling its use in both interrupt and
preemptible contexts. This is essential for upcoming modifications
to split igb_msg_task() into interrupt-safe and preemptible-safe
parts.

The motivation for this change stems from the need to modify
igb_msix_other() to run in interrupt context under PREEMPT_RT.
Currently, igb_msg_task() contains a code path that invokes
kcalloc() with the GFP_ATOMIC flag. However, on PREEMPT_RT,
GFP_ATOMIC is not honored, making it unsafe to call allocation
functions in interrupt context. By introducing this raw spinlock,
we can safely acquire the lock in both contexts, paving the way for
the necessary restructuring of igb_msg_task().

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Suggested-by: Clark Williams <williams@redhat.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |  4 ++
 drivers/net/ethernet/intel/igb/igb_main.c | 51 ++++++++++++++++++++---
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 3c2dc7bdebb50..d50c22f09d0f8 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -666,6 +666,10 @@ struct igb_adapter {
 	struct vf_mac_filter *vf_mac_list;
 	/* lock for VF resources */
 	spinlock_t vfs_lock;
+#ifdef CONFIG_PREEMPT_RT
+	/* Used to lock VFS in interrupt context under PREEMPT_RT */
+	raw_spinlock_t raw_vfs_lock;
+#endif
 };
 
 /* flags controlling PTP/1588 function */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4ca25660e876e..9b4235ec226df 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3657,6 +3657,47 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return err;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
+{
+	spin_lock_init(&adapter->vfs_lock);
+	raw_spin_lock_init(&adapter->raw_vfs_lock);
+}
+
+static __always_inline void vfs_lock_irqsave(struct igb_adapter *adapter,
+					     unsigned long *flags)
+{
+	/*
+	 * Remember that under PREEMPT_RT spin_lock_irqsave
+	 * ignores the flags parameter
+	 */
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+	raw_spin_lock_irqsave(&adapter->raw_vfs_lock, *flags);
+}
+
+static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter,
+						  unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+#else
+static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
+{
+	spin_lock_init(&adapter->vfs_lock);
+}
+
+static __always_inline void vfs_lock_irqsave(struct igb_adapter *adapter, unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter, unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+#endif
+
 #ifdef CONFIG_PCI_IOV
 static int igb_sriov_reinit(struct pci_dev *dev)
 {
@@ -3707,9 +3748,9 @@ static int igb_disable_sriov(struct pci_dev *pdev, bool reinit)
 			pci_disable_sriov(pdev);
 			msleep(500);
 		}
-		spin_lock_irqsave(&adapter->vfs_lock, flags);
+		vfs_lock_irqsave(adapter, &flags);
 		adapter->vfs_allocated_count = 0;
-		spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+		vfs_unlock_irqrestore(adapter, flags);
 		kfree(adapter->vf_mac_list);
 		adapter->vf_mac_list = NULL;
 		kfree(adapter->vf_data);
@@ -4042,7 +4083,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
 	spin_lock_init(&adapter->stats64_lock);
 
 	/* init spinlock to avoid concurrency of VF resources */
-	spin_lock_init(&adapter->vfs_lock);
+	vfs_lock_init(adapter);
 #ifdef CONFIG_PCI_IOV
 	switch (hw->mac.type) {
 	case e1000_82576:
@@ -8035,7 +8076,7 @@ static void igb_msg_task(struct igb_adapter *adapter)
 	unsigned long flags;
 	u32 vf;
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
+	vfs_lock_irqsave(adapter, &flags);
 	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
 		/* process any reset requests */
 		if (!igb_check_for_rst(hw, vf))
@@ -8049,7 +8090,7 @@ static void igb_msg_task(struct igb_adapter *adapter)
 		if (!igb_check_for_ack(hw, vf))
 			igb_rcv_ack_from_vf(adapter, vf);
 	}
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+	vfs_unlock_irqrestore(adapter, flags);
 }
 
 /**
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH iwl-net 3/4] igb: split igb_msg_task()
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
  2024-12-04 11:42 ` [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Wander Lairson Costa
  2024-12-04 11:42 ` [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter Wander Lairson Costa
@ 2024-12-04 11:42 ` Wander Lairson Costa
  2025-01-07 10:09   ` [Intel-wired-lan] " Romanowski, Rafal
  2024-12-04 11:42 ` [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-04 11:42 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Auke Kok, Jeff Garzik, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Wander Lairson Costa

From the perspective of PREEMPT_RT, igb_msg_task() invokes functions
that are a mix of IRQ-safe and non-IRQ-safe operations.

To address this, we separate igb_msg_task() into distinct IRQ-safe and
preemptible-safe components. This is a preparatory step for upcoming
commits, where the igb_msix_other interrupt handler will be split into
IRQ and threaded handlers, each invoking the appropriate part of the
newly divided igb_msg_task().

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 88 +++++++++++++++++++++--
 1 file changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9b4235ec226df..5828831fd29c2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -149,6 +149,8 @@ static int igb_vlan_rx_kill_vid(struct net_device *, __be16, u16);
 static void igb_restore_vlan(struct igb_adapter *);
 static void igb_rar_set_index(struct igb_adapter *, u32);
 static void igb_ping_all_vfs(struct igb_adapter *);
+static void igb_msg_task_irq_safe(struct igb_adapter *adapter);
+static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter);
 static void igb_msg_task(struct igb_adapter *);
 static void igb_vmm_control(struct igb_adapter *);
 static int igb_set_vf_mac(struct igb_adapter *, int, unsigned char *);
@@ -3681,6 +3683,30 @@ static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter,
 	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
 	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 }
+
+static __always_inline void vfs_spin_lock_irqsave(struct igb_adapter *adapter,
+						  unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_spin_unlock_irqrestore(struct igb_adapter *adapter,
+						       unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+
+static __always_inline void vfs_raw_spin_lock_irqsave(struct igb_adapter *adapter,
+						      unsigned long *flags)
+{
+	raw_spin_lock_irqsave(&adapter->raw_vfs_lock, *flags);
+}
+
+static __always_inline void vfs_raw_spin_unlock_irqrestore(struct igb_adapter *adapter,
+							   unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&adapter->raw_vfs_lock, flags);
+}
 #else
 static __always_inline void vfs_lock_init(struct igb_adapter *adapter)
 {
@@ -3696,6 +3722,30 @@ static __always_inline void vfs_unlock_irqrestore(struct igb_adapter *adapter, u
 {
 	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
 }
+
+static __always_inline void vfs_spin_lock_irqsave(struct igb_adapter *adapter,
+						  unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_spin_unlock_irqrestore(struct igb_adapter *adapter,
+						       unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
+
+static __always_inline void vfs_raw_spin_lock_irqsave(struct igb_adapter *adapter,
+						      unsigned long *flags)
+{
+	spin_lock_irqsave(&adapter->vfs_lock, *flags);
+}
+
+static __always_inline void vfs_raw_spin_unlock_irqrestore(struct igb_adapter *adapter,
+							   unsigned long flags)
+{
+	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+}
 #endif
 
 #ifdef CONFIG_PCI_IOV
@@ -8070,27 +8120,51 @@ static void igb_rcv_msg_from_vf(struct igb_adapter *adapter, u32 vf)
 	igb_unlock_mbx(hw, vf);
 }
 
-static void igb_msg_task(struct igb_adapter *adapter)
+/*
+ * Note: the split of irq and preempible safe parts of igb_msg_task()
+ * only makes sense under PREEMPT_RT.
+ * The root cause of igb_rcv_msg_from_vf() is not IRQ safe is because
+ * it calls kcalloc with GFP_ATOMIC, but GFP_ATOMIC is not IRQ safe
+ * in PREEMPT_RT.
+ */
+static void igb_msg_task_irq_safe(struct igb_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	unsigned long flags;
 	u32 vf;
 
-	vfs_lock_irqsave(adapter, &flags);
+	vfs_raw_spin_lock_irqsave(adapter, &flags);
 	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
 		/* process any reset requests */
 		if (!igb_check_for_rst(hw, vf))
 			igb_vf_reset_event(adapter, vf);
 
-		/* process any messages pending */
-		if (!igb_check_for_msg(hw, vf))
-			igb_rcv_msg_from_vf(adapter, vf);
-
 		/* process any acks */
 		if (!igb_check_for_ack(hw, vf))
 			igb_rcv_ack_from_vf(adapter, vf);
 	}
-	vfs_unlock_irqrestore(adapter, flags);
+	vfs_raw_spin_unlock_irqrestore(adapter, flags);
+}
+
+static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	unsigned long flags;
+	u32 vf;
+
+	vfs_spin_lock_irqsave(adapter, &flags);
+	for (vf = 0; vf < adapter->vfs_allocated_count; vf++) {
+		/* process any messages pending */
+		if (!igb_check_for_msg(hw, vf))
+			igb_rcv_msg_from_vf(adapter, vf);
+	}
+	vfs_spin_unlock_irqrestore(adapter, flags);
+}
+
+static __always_inline void igb_msg_task(struct igb_adapter *adapter)
+{
+	igb_msg_task_irq_safe(adapter);
+	igb_msg_task_preemptible_safe(adapter);
 }
 
 /**
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
                   ` (2 preceding siblings ...)
  2024-12-04 11:42 ` [PATCH iwl-net 3/4] igb: split igb_msg_task() Wander Lairson Costa
@ 2024-12-04 11:42 ` Wander Lairson Costa
  2025-01-07 10:10   ` [Intel-wired-lan] " Romanowski, Rafal
  2024-12-26 13:24 ` [PATCH iwl-net 0/4] " Wander Lairson Costa
  2025-01-07 13:51 ` Sebastian Andrzej Siewior
  5 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-04 11:42 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Auke Kok, Jeff Garzik, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Wander Lairson Costa, Yuying Ma

During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.

The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx.

Commit 338c4d3902fe ("igb: Disable threaded IRQ for igb_msix_other")
forced the ISR to run in a non-threaded context. However, Sebastian
observed that some functions called within the ISR acquire locks that
may sleep.

In the previous two patches, we managed to make igb_msg_mask() safe to
call from an interrupt context.

In this commit, we move most of the ISR handling to an interrupt
context, leaving non IRQ safe code to be called from the thread
context under PREEMPT_RT.

Reproducer:

ipaddr_vlan=3
nic_test=ens14f0
vf=${nic_test}v0 # The main testing steps:
while true; do
    ip link set ${nic_test} mtu 1500
    ip link set ${vf} mtu 1500
    ip link set $vf up
    # 3. set vlan and ip for VF
    ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
    ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
    ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
    # 4. check the link state for VF and PF
    ip link show ${nic_test}
    if ! ip link show $vf | grep 'state UP'; then
        echo 'Error found'
        break
    fi
    ip link set $vf down
done

You can also reproduce it more reliably by setting nr_cpus=1 in the
kernel command line.

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Yuying Ma <yuma@redhat.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 35 ++++++++++++++++-------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5828831fd29c2..b2894cebe2c9e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -131,6 +131,7 @@ static void igb_set_uta(struct igb_adapter *adapter, bool set);
 static irqreturn_t igb_intr(int irq, void *);
 static irqreturn_t igb_intr_msi(int irq, void *);
 static irqreturn_t igb_msix_other(int irq, void *);
+static irqreturn_t igb_msix_other_threaded(int irq, void *);
 static irqreturn_t igb_msix_ring(int irq, void *);
 #ifdef CONFIG_IGB_DCA
 static void igb_update_dca(struct igb_q_vector *);
@@ -151,7 +152,6 @@ static void igb_rar_set_index(struct igb_adapter *, u32);
 static void igb_ping_all_vfs(struct igb_adapter *);
 static void igb_msg_task_irq_safe(struct igb_adapter *adapter);
 static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter);
-static void igb_msg_task(struct igb_adapter *);
 static void igb_vmm_control(struct igb_adapter *);
 static int igb_set_vf_mac(struct igb_adapter *, int, unsigned char *);
 static void igb_flush_mac_table(struct igb_adapter *);
@@ -908,8 +908,9 @@ static int igb_request_msix(struct igb_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int i, err = 0, vector = 0, free_vector = 0;
 
-	err = request_irq(adapter->msix_entries[vector].vector,
-			  igb_msix_other, 0, netdev->name, adapter);
+	err = request_threaded_irq(adapter->msix_entries[vector].vector,
+				   igb_msix_other, igb_msix_other_threaded,
+				   IRQF_NO_THREAD, netdev->name, adapter);
 	if (err)
 		goto err_out;
 
@@ -7113,9 +7114,27 @@ static irqreturn_t igb_msix_other(int irq, void *data)
 		igb_check_wvbr(adapter);
 	}
 
-	/* Check for a mailbox event */
+	/* Check for a mailbox event (interrupt safe part) */
 	if (icr & E1000_ICR_VMMB)
-		igb_msg_task(adapter);
+		igb_msg_task_irq_safe(adapter);
+
+	adapter->test_icr = icr;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		return igb_msix_other_threaded(irq, data);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t igb_msix_other_threaded(int irq, void *data)
+{
+	struct igb_adapter *adapter = data;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 icr = adapter->test_icr;
+
+	/* Check for a mailbox event (preempible safe part) */
+	if (icr & E1000_ICR_VMMB)
+		igb_msg_task_preemptible_safe(adapter);
 
 	if (icr & E1000_ICR_LSC) {
 		hw->mac.get_link_status = 1;
@@ -8161,12 +8180,6 @@ static void igb_msg_task_preemptible_safe(struct igb_adapter *adapter)
 	vfs_spin_unlock_irqrestore(adapter, flags);
 }
 
-static __always_inline void igb_msg_task(struct igb_adapter *adapter)
-{
-	igb_msg_task_irq_safe(adapter);
-	igb_msg_task_preemptible_safe(adapter);
-}
-
 /**
  *  igb_set_uta - Set unicast filter table address
  *  @adapter: board private structure
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
                   ` (3 preceding siblings ...)
  2024-12-04 11:42 ` [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
@ 2024-12-26 13:24 ` Wander Lairson Costa
  2025-01-07 13:51 ` Sebastian Andrzej Siewior
  5 siblings, 0 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2024-12-26 13:24 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jeff Garzik, Auke Kok, moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On Wed, Dec 4, 2024 at 8:43 AM Wander Lairson Costa <wander@redhat.com> wrote:
>
> This is the second attempt at fixing the behavior of igb_msix_other()
> for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> concerns raised by Sebastian [3].
>
> The initial approach proposed converting vfs_lock to a raw_spinlock,
> a minor change intended to make it safe. However, it became evident
> that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> which is unsafe in interrupt context on PREEMPT_RT systems.
>
> To address this, the solution involves splitting igb_msg_task()
> into two parts:
>
>     * One part invoked from the IRQ context.
>     * Another part called from the threaded interrupt handler.
>
> To accommodate this, vfs_lock has been restructured into a double
> lock: a spinlock_t and a raw_spinlock_t. In the revised design:
>
>     * igb_disable_sriov() locks both spinlocks.
>     * Each part of igb_msg_task() locks the appropriate spinlock for
>     its execution context.
>
> It is worth noting that the double lock mechanism is only active under
> PREEMPT_RT. For non-PREEMPT_RT builds, the additional raw_spinlock_t
> field is ommited.
>
> If the extra raw_spinlock_t field can be tolerated under
> !PREEMPT_RT (even though it remains unused), we can eliminate the
> need for #ifdefs and simplify the code structure.
>
> I will be on vacation from December 7th to Christmas and will address
> review comments upon my return.
>
> If possible, I kindly request the Intel team to perform smoke tests
> on both stock and realtime kernels to catch any potential issues with
> this patch series.
>
> Cheers,
> Wander
>
> [1] https://lore.kernel.org/all/20240920185918.616302-2-wander@redhat.com/
> [2] https://lore.kernel.org/all/20241104124050.22290-1-wander@redhat.com/
> [3] https://lore.kernel.org/all/20241104110708.gFyxRFlC@linutronix.de/
>
>
> Wander Lairson Costa (4):
>   igb: narrow scope of vfs_lock in SR-IOV cleanup
>   igb: introduce raw vfs_lock to igb_adapter
>   igb: split igb_msg_task()
>   igb: fix igb_msix_other() handling for PREEMPT_RT
>
>  drivers/net/ethernet/intel/igb/igb.h      |   4 +
>  drivers/net/ethernet/intel/igb/igb_main.c | 160 +++++++++++++++++++---
>  2 files changed, 148 insertions(+), 16 deletions(-)
>
> --
> 2.47.0
>

I had requested Red Hat Network QA to run regression tests on this,
and they recently reported that no issues were found.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup
  2024-12-04 11:42 ` [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Wander Lairson Costa
@ 2025-01-07 10:06   ` Romanowski, Rafal
  0 siblings, 0 replies; 16+ messages in thread
From: Romanowski, Rafal @ 2025-01-07 10:06 UTC (permalink / raw)
  To: Wander Lairson Costa, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Auke Kok, Jeff Garzik,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Wander Lairson Costa
> Sent: Wednesday, December 4, 2024 12:42 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Clark
> Williams <clrkwllms@kernel.org>; Steven Rostedt <rostedt@goodmis.org>; Auke
> Kok <auke-jan.h.kok@intel.com>; Jeff Garzik <jgarzik@redhat.com>; moderated
> list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open
> list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-
> kernel@vger.kernel.org>; open list:Real-time Linux
> (PREEMPT_RT):Keyword:PREEMPT_RT <linux-rt-devel@lists.linux.dev>
> Cc: Wander Lairson Costa <wander@redhat.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-
> IOV cleanup
> 
> The adapter->vfs_lock currently protects critical sections shared between
> igb_disable_sriov() and igb_msg_task(). Since igb_msg_task() — which is invoked
> solely by the igb_msix_other() ISR—only proceeds when
> adapter->vfs_allocated_count > 0, we can reduce the lock scope further.
> 
> By moving the assignment adapter->vfs_allocated_count = 0 to the start of the
> cleanup code in igb_disable_sriov(), we can restrict the spinlock protection solely
> to this assignment. This change removes kfree() calls from within the locked
> section, simplifying lock management.
> 
> Once kfree() is outside the vfs_lock scope, it becomes possible to safely convert
> vfs_lock to a raw_spin_lock.
> 
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 08578980b6518..4ca25660e876e 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3708,12 +3708,12 @@ static int igb_disable_sriov(struct pci_dev *pdev,

Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter
  2024-12-04 11:42 ` [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter Wander Lairson Costa
@ 2025-01-07 10:08   ` Romanowski, Rafal
  0 siblings, 0 replies; 16+ messages in thread
From: Romanowski, Rafal @ 2025-01-07 10:08 UTC (permalink / raw)
  To: Wander Lairson Costa, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Clark Williams

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Wander Lairson Costa
> Sent: Wednesday, December 4, 2024 12:42 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Clark
> Williams <clrkwllms@kernel.org>; Steven Rostedt <rostedt@goodmis.org>; Jeff
> Garzik <jgarzik@redhat.com>; Auke Kok <auke-jan.h.kok@intel.com>; moderated
> list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open
> list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-
> kernel@vger.kernel.org>; open list:Real-time Linux
> (PREEMPT_RT):Keyword:PREEMPT_RT <linux-rt-devel@lists.linux.dev>
> Cc: Wander Lairson Costa <wander@redhat.com>; Clark Williams
> <williams@redhat.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to
> igb_adapter
> 
> This change adds a raw_spinlock for the vfs_lock to the igb_adapter structure,
> enabling its use in both interrupt and preemptible contexts. This is essential for
> upcoming modifications to split igb_msg_task() into interrupt-safe and
> preemptible-safe parts.
> 
> The motivation for this change stems from the need to modify
> igb_msix_other() to run in interrupt context under PREEMPT_RT.
> Currently, igb_msg_task() contains a code path that invokes
> kcalloc() with the GFP_ATOMIC flag. However, on PREEMPT_RT, GFP_ATOMIC is
> not honored, making it unsafe to call allocation functions in interrupt context. By
> introducing this raw spinlock, we can safely acquire the lock in both contexts,
> paving the way for the necessary restructuring of igb_msg_task().
> 
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>
> Suggested-by: Clark Williams <williams@redhat.com>
> ---
>  drivers/net/ethernet/intel/igb/igb.h      |  4 ++
>  drivers/net/ethernet/intel/igb/igb_main.c | 51 ++++++++++++++++++++---
>  2 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb.h
> b/drivers/net/ethernet/intel/igb/igb.h
> index 3c2dc7bdebb50..d50c22f09d0f8 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -666,6 +666,10 @@ struct igb_adapter {


Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net 3/4] igb: split igb_msg_task()
  2024-12-04 11:42 ` [PATCH iwl-net 3/4] igb: split igb_msg_task() Wander Lairson Costa
@ 2025-01-07 10:09   ` Romanowski, Rafal
  0 siblings, 0 replies; 16+ messages in thread
From: Romanowski, Rafal @ 2025-01-07 10:09 UTC (permalink / raw)
  To: Wander Lairson Costa, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Auke Kok, Jeff Garzik,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Wander Lairson Costa
> Sent: Wednesday, December 4, 2024 12:42 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Clark
> Williams <clrkwllms@kernel.org>; Steven Rostedt <rostedt@goodmis.org>; Auke
> Kok <auke-jan.h.kok@intel.com>; Jeff Garzik <jgarzik@redhat.com>; moderated
> list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open
> list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-
> kernel@vger.kernel.org>; open list:Real-time Linux
> (PREEMPT_RT):Keyword:PREEMPT_RT <linux-rt-devel@lists.linux.dev>
> Cc: Wander Lairson Costa <wander@redhat.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net 3/4] igb: split igb_msg_task()
> 
> From the perspective of PREEMPT_RT, igb_msg_task() invokes functions that are
> a mix of IRQ-safe and non-IRQ-safe operations.
> 
> To address this, we separate igb_msg_task() into distinct IRQ-safe and
> preemptible-safe components. This is a preparatory step for upcoming commits,
> where the igb_msix_other interrupt handler will be split into IRQ and threaded
> handlers, each invoking the appropriate part of the newly divided igb_msg_task().
> 
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 88 +++++++++++++++++++++--
>  1 file changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 9b4235ec226df..5828831fd29c2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -149,6 +149,8 @@ static int igb_vlan_rx_kill_vid(struct net_device *,


Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2024-12-04 11:42 ` [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
@ 2025-01-07 10:10   ` Romanowski, Rafal
  0 siblings, 0 replies; 16+ messages in thread
From: Romanowski, Rafal @ 2025-01-07 10:10 UTC (permalink / raw)
  To: Wander Lairson Costa, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Auke Kok, Jeff Garzik,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT
  Cc: Yuying Ma

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Wander Lairson Costa
> Sent: Wednesday, December 4, 2024 12:42 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>; Clark
> Williams <clrkwllms@kernel.org>; Steven Rostedt <rostedt@goodmis.org>; Auke
> Kok <auke-jan.h.kok@intel.com>; Jeff Garzik <jgarzik@redhat.com>; moderated
> list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open
> list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-
> kernel@vger.kernel.org>; open list:Real-time Linux
> (PREEMPT_RT):Keyword:PREEMPT_RT <linux-rt-devel@lists.linux.dev>
> Cc: Wander Lairson Costa <wander@redhat.com>; Yuying Ma
> <yuma@redhat.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling
> for PREEMPT_RT
> 
> During testing of SR-IOV, Red Hat QE encountered an issue where the ip link up
> command intermittently fails for the igbvf interfaces when using the PREEMPT_RT
> variant. Investigation revealed that e1000_write_posted_mbx returns an error
> due to the lack of an ACK from e1000_poll_for_ack.
> 
> The underlying issue arises from the fact that IRQs are threaded by default under
> PREEMPT_RT. While the exact hardware details are not available, it appears that
> the IRQ handled by igb_msix_other must be processed before
> e1000_poll_for_ack times out. However, e1000_write_posted_mbx is called with
> preemption disabled, leading to a scenario where the IRQ is serviced only after
> the failure of e1000_write_posted_mbx.
> 
> Commit 338c4d3902fe ("igb: Disable threaded IRQ for igb_msix_other") forced
> the ISR to run in a non-threaded context. However, Sebastian observed that some
> functions called within the ISR acquire locks that may sleep.
> 
> In the previous two patches, we managed to make igb_msg_mask() safe to call
> from an interrupt context.
> 
> In this commit, we move most of the ISR handling to an interrupt context, leaving
> non IRQ safe code to be called from the thread context under PREEMPT_RT.
> 
> Reproducer:
> 
> ipaddr_vlan=3
> nic_test=ens14f0
> vf=${nic_test}v0 # The main testing steps:
> while true; do
>     ip link set ${nic_test} mtu 1500
>     ip link set ${vf} mtu 1500
>     ip link set $vf up
>     # 3. set vlan and ip for VF
>     ip link set ${nic_test} vf 0 vlan ${ipaddr_vlan}
>     ip addr add 172.30.${ipaddr_vlan}.1/24 dev ${vf}
>     ip addr add 2021:db8:${ipaddr_vlan}::1/64 dev ${vf}
>     # 4. check the link state for VF and PF
>     ip link show ${nic_test}
>     if ! ip link show $vf | grep 'state UP'; then
>         echo 'Error found'
>         break
>     fi
>     ip link set $vf down
> done
> 
> You can also reproduce it more reliably by setting nr_cpus=1 in the kernel
> command line.
> 
> Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>
> Reported-by: Yuying Ma <yuma@redhat.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 35 ++++++++++++++++-------
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 5828831fd29c2..b2894cebe2c9e 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -131,6 +131,7 @@ static void igb_set_uta(struct igb_adapter *adapter, bool

Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
                   ` (4 preceding siblings ...)
  2024-12-26 13:24 ` [PATCH iwl-net 0/4] " Wander Lairson Costa
@ 2025-01-07 13:51 ` Sebastian Andrzej Siewior
  2025-01-07 18:52   ` Wander Lairson Costa
  5 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-07 13:51 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> This is the second attempt at fixing the behavior of igb_msix_other()
> for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> concerns raised by Sebastian [3].
> 
> The initial approach proposed converting vfs_lock to a raw_spinlock,
> a minor change intended to make it safe. However, it became evident
> that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> which is unsafe in interrupt context on PREEMPT_RT systems.
> 
> To address this, the solution involves splitting igb_msg_task()
> into two parts:
> 
>     * One part invoked from the IRQ context.
>     * Another part called from the threaded interrupt handler.
> 
> To accommodate this, vfs_lock has been restructured into a double
> lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> 
>     * igb_disable_sriov() locks both spinlocks.
>     * Each part of igb_msg_task() locks the appropriate spinlock for
>     its execution context.

- Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
  with threadirqs? And if this is PREEMPT_RT only, why?

- What causes the failure? I see you reworked into two parts to behave
  similar to what happens without threaded interrupts. There is still no
  explanation for it. Is there a timing limit or was there another
  register operation which removed the mailbox message?

> Cheers,
> Wander

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-01-07 13:51 ` Sebastian Andrzej Siewior
@ 2025-01-07 18:52   ` Wander Lairson Costa
  2025-01-08 10:25     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2025-01-07 18:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> > This is the second attempt at fixing the behavior of igb_msix_other()
> > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > concerns raised by Sebastian [3].
> > 
> > The initial approach proposed converting vfs_lock to a raw_spinlock,
> > a minor change intended to make it safe. However, it became evident
> > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> > which is unsafe in interrupt context on PREEMPT_RT systems.
> > 
> > To address this, the solution involves splitting igb_msg_task()
> > into two parts:
> > 
> >     * One part invoked from the IRQ context.
> >     * Another part called from the threaded interrupt handler.
> > 
> > To accommodate this, vfs_lock has been restructured into a double
> > lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> > 
> >     * igb_disable_sriov() locks both spinlocks.
> >     * Each part of igb_msg_task() locks the appropriate spinlock for
> >     its execution context.
> 
> - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
>   with threadirqs? And if this is PREEMPT_RT only, why?

PREEMPT systems configured to use threadirqs should be affected as well,
although I never tested with this configuration. Honestly, until now I wasn't
aware of the possibility of a non PREEMPT_RT kernel with threaded IRQs by default.

> 
> - What causes the failure? I see you reworked into two parts to behave
>   similar to what happens without threaded interrupts. There is still no
>   explanation for it. Is there a timing limit or was there another
>   register operation which removed the mailbox message?
> 

I explained the root cause of the issue in the last commit. Maybe I should
have added the explanation to the cover letter as well.  Anyway, here is a
partial verbatim copy of it:

"During testing of SR-IOV, Red Hat QE encountered an issue where the
ip link up command intermittently fails for the igbvf interfaces when
using the PREEMPT_RT variant. Investigation revealed that
e1000_write_posted_mbx returns an error due to the lack of an ACK
from e1000_poll_for_ack.

The underlying issue arises from the fact that IRQs are threaded by
default under PREEMPT_RT. While the exact hardware details are not
available, it appears that the IRQ handled by igb_msix_other must
be processed before e1000_poll_for_ack times out. However,
e1000_write_posted_mbx is called with preemption disabled, leading
to a scenario where the IRQ is serviced only after the failure of
e1000_write_posted_mbx."

The call chain from igb_msg_task():

igb_msg_task
	igb_rcv_msg_from_vf
		igb_set_vf_multicasts
			igb_set_rx_mode
				igb_write_mc_addr_list
					kmalloc

Cannot happen from interrupt context under PREEMPT_RT. So this part of
the interrupt handler is deferred to a threaded IRQ handler.

> > Cheers,
> > Wander
> 
> Sebastian
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-01-07 18:52   ` Wander Lairson Costa
@ 2025-01-08 10:25     ` Sebastian Andrzej Siewior
  2025-01-09 16:46       ` Wander Lairson Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-08 10:25 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On 2025-01-07 15:52:47 [-0300], Wander Lairson Costa wrote:
> On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > concerns raised by Sebastian [3].
> > > 
> > > The initial approach proposed converting vfs_lock to a raw_spinlock,
> > > a minor change intended to make it safe. However, it became evident
> > > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> > > which is unsafe in interrupt context on PREEMPT_RT systems.
> > > 
> > > To address this, the solution involves splitting igb_msg_task()
> > > into two parts:
> > > 
> > >     * One part invoked from the IRQ context.
> > >     * Another part called from the threaded interrupt handler.
> > > 
> > > To accommodate this, vfs_lock has been restructured into a double
> > > lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> > > 
> > >     * igb_disable_sriov() locks both spinlocks.
> > >     * Each part of igb_msg_task() locks the appropriate spinlock for
> > >     its execution context.
> > 
> > - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
> >   with threadirqs? And if this is PREEMPT_RT only, why?
> 
> PREEMPT systems configured to use threadirqs should be affected as well,
> although I never tested with this configuration. Honestly, until now I wasn't
> aware of the possibility of a non PREEMPT_RT kernel with threaded IRQs by default.

If the issue is indeed the use of threaded interrupts then the fix
should not be limited to be PREEMPT_RT only.

> > - What causes the failure? I see you reworked into two parts to behave
> >   similar to what happens without threaded interrupts. There is still no
> >   explanation for it. Is there a timing limit or was there another
> >   register operation which removed the mailbox message?
> > 
> 
> I explained the root cause of the issue in the last commit. Maybe I should
> have added the explanation to the cover letter as well.  Anyway, here is a
> partial verbatim copy of it:
> 
> "During testing of SR-IOV, Red Hat QE encountered an issue where the
> ip link up command intermittently fails for the igbvf interfaces when
> using the PREEMPT_RT variant. Investigation revealed that
> e1000_write_posted_mbx returns an error due to the lack of an ACK
> from e1000_poll_for_ack.

That ACK would have come if it would poll longer?

> The underlying issue arises from the fact that IRQs are threaded by
> default under PREEMPT_RT. While the exact hardware details are not
> available, it appears that the IRQ handled by igb_msix_other must
> be processed before e1000_poll_for_ack times out. However,
> e1000_write_posted_mbx is called with preemption disabled, leading
> to a scenario where the IRQ is serviced only after the failure of
> e1000_write_posted_mbx."

Where is this disabled preemption coming from? This should be one of the
ops.write_posted() calls, right? I've been looking around and don't see
anything obvious.
Couldn't you wait for an event instead of polling?

> The call chain from igb_msg_task():
> 
> igb_msg_task
> 	igb_rcv_msg_from_vf
> 		igb_set_vf_multicasts
> 			igb_set_rx_mode
> 				igb_write_mc_addr_list
> 					kmalloc
> 
> Cannot happen from interrupt context under PREEMPT_RT. So this part of
> the interrupt handler is deferred to a threaded IRQ handler.
>
> > > Cheers,
> > > Wander

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-01-08 10:25     ` Sebastian Andrzej Siewior
@ 2025-01-09 16:46       ` Wander Lairson Costa
  2025-01-09 17:45         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Wander Lairson Costa @ 2025-01-09 16:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On Wed, Jan 8, 2025 at 7:25 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2025-01-07 15:52:47 [-0300], Wander Lairson Costa wrote:
> > On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote:
> > > > This is the second attempt at fixing the behavior of igb_msix_other()
> > > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following
> > > > concerns raised by Sebastian [3].
> > > >
> > > > The initial approach proposed converting vfs_lock to a raw_spinlock,
> > > > a minor change intended to make it safe. However, it became evident
> > > > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC,
> > > > which is unsafe in interrupt context on PREEMPT_RT systems.
> > > >
> > > > To address this, the solution involves splitting igb_msg_task()
> > > > into two parts:
> > > >
> > > >     * One part invoked from the IRQ context.
> > > >     * Another part called from the threaded interrupt handler.
> > > >
> > > > To accommodate this, vfs_lock has been restructured into a double
> > > > lock: a spinlock_t and a raw_spinlock_t. In the revised design:
> > > >
> > > >     * igb_disable_sriov() locks both spinlocks.
> > > >     * Each part of igb_msg_task() locks the appropriate spinlock for
> > > >     its execution context.
> > >
> > > - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems
> > >   with threadirqs? And if this is PREEMPT_RT only, why?
> >
> > PREEMPT systems configured to use threadirqs should be affected as well,
> > although I never tested with this configuration. Honestly, until now I wasn't
> > aware of the possibility of a non PREEMPT_RT kernel with threaded IRQs by default.
>
> If the issue is indeed the use of threaded interrupts then the fix
> should not be limited to be PREEMPT_RT only.
>
Although I was not aware of this scenario, the patch should work for it as well,
as I am forcing it to run in interrupt context. I will test it to confirm.

> > > - What causes the failure? I see you reworked into two parts to behave
> > >   similar to what happens without threaded interrupts. There is still no
> > >   explanation for it. Is there a timing limit or was there another
> > >   register operation which removed the mailbox message?
> > >
> >
> > I explained the root cause of the issue in the last commit. Maybe I should
> > have added the explanation to the cover letter as well.  Anyway, here is a
> > partial verbatim copy of it:
> >
> > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > ip link up command intermittently fails for the igbvf interfaces when
> > using the PREEMPT_RT variant. Investigation revealed that
> > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > from e1000_poll_for_ack.
>
> That ACK would have come if it would poll longer?
>
No, the service wouldn't be serviced while polling.

> > The underlying issue arises from the fact that IRQs are threaded by
> > default under PREEMPT_RT. While the exact hardware details are not
> > available, it appears that the IRQ handled by igb_msix_other must
> > be processed before e1000_poll_for_ack times out. However,
> > e1000_write_posted_mbx is called with preemption disabled, leading
> > to a scenario where the IRQ is serviced only after the failure of
> > e1000_write_posted_mbx."
>
> Where is this disabled preemption coming from? This should be one of the
> ops.write_posted() calls, right? I've been looking around and don't see
> anything obvious.

I don't remember if I found the answer by looking at the code or by
looking at the ftrace flags.
I am currently on sick leave with covid. I can check it when I come back.

> Couldn't you wait for an event instead of polling?
>
> > The call chain from igb_msg_task():
> >
> > igb_msg_task
> >       igb_rcv_msg_from_vf
> >               igb_set_vf_multicasts
> >                       igb_set_rx_mode
> >                               igb_write_mc_addr_list
> >                                       kmalloc
> >
> > Cannot happen from interrupt context under PREEMPT_RT. So this part of
> > the interrupt handler is deferred to a threaded IRQ handler.
> >
> > > > Cheers,
> > > > Wander
>
> Sebastian
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-01-09 16:46       ` Wander Lairson Costa
@ 2025-01-09 17:45         ` Sebastian Andrzej Siewior
  2025-01-17 13:19           ` Wander Lairson Costa
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-01-09 17:45 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > If the issue is indeed the use of threaded interrupts then the fix
> > should not be limited to be PREEMPT_RT only.
> >
> Although I was not aware of this scenario, the patch should work for it as well,
> as I am forcing it to run in interrupt context. I will test it to confirm.

If I remember correctly there were "ifdef preempt_rt" things in it.

> > > > - What causes the failure? I see you reworked into two parts to behave
> > > >   similar to what happens without threaded interrupts. There is still no
> > > >   explanation for it. Is there a timing limit or was there another
> > > >   register operation which removed the mailbox message?
> > > >
> > >
> > > I explained the root cause of the issue in the last commit. Maybe I should
> > > have added the explanation to the cover letter as well.  Anyway, here is a
> > > partial verbatim copy of it:
> > >
> > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > ip link up command intermittently fails for the igbvf interfaces when
> > > using the PREEMPT_RT variant. Investigation revealed that
> > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > from e1000_poll_for_ack.
> >
> > That ACK would have come if it would poll longer?
> >
> No, the service wouldn't be serviced while polling.

Hmm. 

> > > The underlying issue arises from the fact that IRQs are threaded by
> > > default under PREEMPT_RT. While the exact hardware details are not
> > > available, it appears that the IRQ handled by igb_msix_other must
> > > be processed before e1000_poll_for_ack times out. However,
> > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > to a scenario where the IRQ is serviced only after the failure of
> > > e1000_write_posted_mbx."
> >
> > Where is this disabled preemption coming from? This should be one of the
> > ops.write_posted() calls, right? I've been looking around and don't see
> > anything obvious.
> 
> I don't remember if I found the answer by looking at the code or by
> looking at the ftrace flags.
> I am currently on sick leave with covid. I can check it when I come back.

Don't worry, get better first. I'm kind of off myself. I'm not sure if I
have the hardware needed to setup so I can look at it…

Sebastian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT
  2025-01-09 17:45         ` Sebastian Andrzej Siewior
@ 2025-01-17 13:19           ` Wander Lairson Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Wander Lairson Costa @ 2025-01-17 13:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Clark Williams,
	Steven Rostedt, Jeff Garzik, Auke Kok,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list,
	open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT

On Thu, Jan 09, 2025 at 06:45:12PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-01-09 13:46:47 [-0300], Wander Lairson Costa wrote:
> > > If the issue is indeed the use of threaded interrupts then the fix
> > > should not be limited to be PREEMPT_RT only.
> > >
> > Although I was not aware of this scenario, the patch should work for it as well,
> > as I am forcing it to run in interrupt context. I will test it to confirm.

I tested with the stock kernel with threadirqs and the problem does show up.
Applying the patches the issue is gone.

> 
> If I remember correctly there were "ifdef preempt_rt" things in it.

That exists only to handle the case in which part in which the ISR needs to
partially run in thread context (because the piece of code calling kmalloc),
so I need an sleeping lock for that. For non-PREEMPT_RT, we don't have this
constrain.

> 
> > > > > - What causes the failure? I see you reworked into two parts to behave
> > > > >   similar to what happens without threaded interrupts. There is still no
> > > > >   explanation for it. Is there a timing limit or was there another
> > > > >   register operation which removed the mailbox message?
> > > > >
> > > >
> > > > I explained the root cause of the issue in the last commit. Maybe I should
> > > > have added the explanation to the cover letter as well.  Anyway, here is a
> > > > partial verbatim copy of it:
> > > >
> > > > "During testing of SR-IOV, Red Hat QE encountered an issue where the
> > > > ip link up command intermittently fails for the igbvf interfaces when
> > > > using the PREEMPT_RT variant. Investigation revealed that
> > > > e1000_write_posted_mbx returns an error due to the lack of an ACK
> > > > from e1000_poll_for_ack.
> > >
> > > That ACK would have come if it would poll longer?

No, the poll happens while preemption is disabled.

> > >
> > No, the service wouldn't be serviced while polling.

s/service/interrupt/. Since we can't sleep at this context, there is
no way to wait for an event.

> 
> Hmm. 


> 
> > > > The underlying issue arises from the fact that IRQs are threaded by
> > > > default under PREEMPT_RT. While the exact hardware details are not
> > > > available, it appears that the IRQ handled by igb_msix_other must
> > > > be processed before e1000_poll_for_ack times out. However,
> > > > e1000_write_posted_mbx is called with preemption disabled, leading
> > > > to a scenario where the IRQ is serviced only after the failure of
> > > > e1000_write_posted_mbx."
> > >
> > > Where is this disabled preemption coming from? This should be one of the
> > > ops.write_posted() calls, right? I've been looking around and don't see
> > > anything obvious.
> > 
> > I don't remember if I found the answer by looking at the code or by
> > looking at the ftrace flags.
> > I am currently on sick leave with covid. I can check it when I come back.
> 
> Don't worry, get better first. I'm kind of off myself. I'm not sure if I
> have the hardware needed to setup so I can look at it…
> 

The reason of why you didn't find is because the interrupt in the igb
driver is triggered inside the igbvf code. igbvf_reset() calls
spin_lock_bh() [1], although in the cases I found it was already called
with preemption disabled from process_one_work() (workqueue) and netlink_sendmsg().

Here is an ftrace log for the failure case:

kworker/-86      0...1    85.381866: function:                   igbvf_reset
kworker/-86      0...2    85.381866: function:                      e1000_reset_hw_vf
kworker/-86      0...2    85.381867: function:                         e1000_check_for_rst_vf
kworker/-86      0...2    85.381868: function:                         e1000_write_posted_mbx
kworker/-86      0...2    85.381868: function:                            e1000_write_mbx_vf
kworker/-86      0...2    85.381870: function:                            e1000_check_for_ack_vf // repeats for 2000 lines
...
kworker/-86      0.N.2    86.393782: function:                         e1000_read_posted_mbx
kworker/-86      0.N.2    86.398606: function:                      e1000_init_hw_vf
kworker/-86      0.N.2    86.398606: function:                         e1000_rar_set_vf
kworker/-86      0.N.2    86.398606: function:                            e1000_write_posted_mbx
irq/65-e-1287    0d..1    86.398609: function:             igb_msix_other
irq/65-e-1287    0d..1    86.398609: function:                igb_rd32
irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst
irq/65-e-1287    0d..2    86.398610: function:                igb_check_for_rst_pf
irq/65-e-1287    0d..2    86.398610: function:                   igb_rd32
irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg
irq/65-e-1287    0d..2    86.398611: function:                igb_check_for_msg_pf
irq/65-e-1287    0d..2    86.398611: function:                   igb_rd32
irq/65-e-1287    0d..2    86.398612: function:                igb_rcv_msg_from_vf
irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx
irq/65-e-1287    0d..2    86.398612: function:                   igb_read_mbx_pf
irq/65-e-1287    0d..2    86.398612: function:                      igb_obtain_mbx_lock_pf
irq/65-e-1287    0d..2    86.398612: function:                         igb_rd32

Notice the interrupt handler only executes after e1000_write_posted()
returns. And here it is for the sucessful case:

      ip-5603    0...1  1884.710747: function:             igbvf_reset
      ip-5603    0...2  1884.710754: function:                e1000_reset_hw_vf
      ip-5603    0...2  1884.710755: function:                   e1000_check_for_rst_vf
      ip-5603    0...2  1884.710756: function:                   e1000_write_posted_mbx
      ip-5603    0...2  1884.710756: function:                      e1000_write_mbx_vf
      ip-5603    0...2  1884.710758: function:                      e1000_check_for_ack_vf
      ip-5603    0d.h2  1884.710760: function:             igb_msix_other
      ip-5603    0d.h2  1884.710760: function:                igb_rd32
      ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst
      ip-5603    0d.h3  1884.710761: function:                igb_check_for_rst_pf
      ip-5603    0d.h3  1884.710761: function:                   igb_rd32
      ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg
      ip-5603    0d.h3  1884.710762: function:                igb_check_for_msg_pf
      ip-5603    0d.h3  1884.710762: function:                   igb_rd32
      ip-5603    0d.h3  1884.710763: function:                igb_rcv_msg_from_vf
      ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx
      ip-5603    0d.h3  1884.710763: function:                   igb_read_mbx_pf
      ip-5603    0d.h3  1884.710763: function:                      igb_obtain_mbx_lock_pf
      ip-5603    0d.h3  1884.710763: function:                         igb_rd32

The ISR executes immediately fater e1000_write_mbx_vf().

[1] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/net/ethernet/intel/igbvf/netdev.c#L1522

> Sebastian
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-01-17 13:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-04 11:42 [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
2024-12-04 11:42 ` [PATCH iwl-net 1/4] igb: narrow scope of vfs_lock in SR-IOV cleanup Wander Lairson Costa
2025-01-07 10:06   ` [Intel-wired-lan] " Romanowski, Rafal
2024-12-04 11:42 ` [PATCH iwl-net 2/4] igb: introduce raw vfs_lock to igb_adapter Wander Lairson Costa
2025-01-07 10:08   ` [Intel-wired-lan] " Romanowski, Rafal
2024-12-04 11:42 ` [PATCH iwl-net 3/4] igb: split igb_msg_task() Wander Lairson Costa
2025-01-07 10:09   ` [Intel-wired-lan] " Romanowski, Rafal
2024-12-04 11:42 ` [PATCH iwl-net 4/4] igb: fix igb_msix_other() handling for PREEMPT_RT Wander Lairson Costa
2025-01-07 10:10   ` [Intel-wired-lan] " Romanowski, Rafal
2024-12-26 13:24 ` [PATCH iwl-net 0/4] " Wander Lairson Costa
2025-01-07 13:51 ` Sebastian Andrzej Siewior
2025-01-07 18:52   ` Wander Lairson Costa
2025-01-08 10:25     ` Sebastian Andrzej Siewior
2025-01-09 16:46       ` Wander Lairson Costa
2025-01-09 17:45         ` Sebastian Andrzej Siewior
2025-01-17 13:19           ` Wander Lairson Costa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).