From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Vivek Behera <vivek.behera@siemens.com>
Cc: <aleksandr.loktionov@intel.com>, <jacob.e.keller@intel.com>,
<anthony.l.nguyen@intel.com>, <przemyslaw.kitszel@intel.com>,
<sriram.yagnaraman@ericsson.com>, <kurt@linutronix.de>,
<intel-wired-lan@lists.osuosl.org>
Subject: Re: [Intel-wired-lan] [PATCH iwl-net v4] igb: Fix trigger of incorrect irq in igb_xsk_wakeup
Date: Fri, 9 Jan 2026 01:14:56 +0100 [thread overview]
Message-ID: <aWBIgOaRG50IuJsU@boxer> (raw)
In-Reply-To: <20251222115747.230521-1-vivek.behera@siemens.com>
On Mon, Dec 22, 2025 at 12:57:47PM +0100, Vivek Behera wrote:
> The current implementation in the igb_xsk_wakeup expects the Rx and Tx queues
> to share the same irq. This would lead to triggering of incorrect irq
> in split irq configuration.
> This patch addresses this issue which could impact environments
> with 2 active cpu cores
> or when the number of queues is reduced to 2 or less
>
> cat /proc/interrupts | grep eno2
> 167: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
> 0-edge eno2
> 168: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
> 1-edge eno2-rx-0
> 169: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
> 2-edge eno2-rx-1
> 170: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
> 3-edge eno2-tx-0
> 171: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
> 4-edge eno2-tx-1
>
> Furthermore it uses the flags input argument to trigger either rx, tx or
> both rx and tx irqs as specified in the ndo_xsk_wakeup api documentation
>
> Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and helpers")
> Signed-off-by: Vivek Behera <vivek.behera@siemens.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> v1: https://lore.kernel.org/intel-wired-lan/20251212131454.124116-1-vivek.behera@siemens.com/
> v2: https://lore.kernel.org/intel-wired-lan/20251215115416.410619-1-vivek.behera@siemens.com/
> v3: https://lore.kernel.org/intel-wired-lan/20251220114936.140473-1-vivek.behera@siemens.com/
>
> changelog:
> v1
> - Inital description of the Bug and fixes made in the patch
>
> v1 -> v2
> - Handling of RX and TX Wakeup in igc_xsk_wakeup for a split IRQ configuration
> - Review suggestions by Aleksander: Modified sequence to complete all
> error checks for rx and tx before updating napi states and triggering irqs
> - Corrected trigger of TX and RX interrupts over E1000_ICS (non msix use case)
> - Added define for Tx interrupt trigger bit mask for E1000_ICS
>
> v2 -> v3
> - Included applicable feedback and suggestions from igc patch
> - Fixed logic in updating eics value when both TX and RX need wakeup
>
> v3 -> v4
> - Added comments to explain trigerring of both TX and RX with active queue pairs
> - Fixed check of xsk pools in if statement
> ---
> .../net/ethernet/intel/igb/e1000_defines.h | 1 +
> drivers/net/ethernet/intel/igb/igb_xsk.c | 90 +++++++++++++++++--
> 2 files changed, 83 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
> index fa028928482f..9357564a2d58 100644
> --- a/drivers/net/ethernet/intel/igb/e1000_defines.h
> +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
> @@ -443,6 +443,7 @@
> #define E1000_ICS_LSC E1000_ICR_LSC /* Link Status Change */
> #define E1000_ICS_RXDMT0 E1000_ICR_RXDMT0 /* rx desc min. threshold */
> #define E1000_ICS_DRSTA E1000_ICR_DRSTA /* Device Reset Aserted */
> +#define E1000_ICS_TXDW E1000_ICR_TXDW /* Transmit desc written back */
>
> /* Extended Interrupt Cause Set */
> /* E1000_EITR_CNT_IGNR is only for 82576 and newer */
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 30ce5fbb5b77..1d21674c0f33 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -529,6 +529,7 @@ int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> struct igb_adapter *adapter = netdev_priv(dev);
> struct e1000_hw *hw = &adapter->hw;
> struct igb_ring *ring;
> + struct igb_q_vector *q_vector;
> u32 eics = 0;
>
> if (test_bit(__IGB_DOWN, &adapter->state))
> @@ -536,14 +537,82 @@ int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
>
> if (!igb_xdp_is_enabled(adapter))
> return -EINVAL;
> -
> - if (qid >= adapter->num_tx_queues)
> + /* Check if queue_id is valid. Tx and Rx queue numbers are always same */
> + if (qid >= adapter->num_rx_queues)
> return -EINVAL;
>
> - ring = adapter->tx_ring[qid];
> -
> - if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> - return -ENETDOWN;
> + if ((flags & XDP_WAKEUP_RX) && (flags & XDP_WAKEUP_TX)) {
> + /* If both TX and RX need to be woken up check if queue pairs are active */
> + if (adapter->flags & IGB_FLAG_QUEUE_PAIRS) {
> + /* In queue-pair mode, rx_ring and tx_ring share the same q_vector,
> + * so a single IRQ trigger will wake both RX and TX processing
> + */
> + ring = adapter->rx_ring[qid];
> + } else {
> + /* Two irqs for Rx AND Tx need to be triggered */
> + struct napi_struct *rx_napi;
> + struct napi_struct *tx_napi;
> + bool trigger_irq_tx = false;
> + bool trigger_irq_rx = false;
> + u32 eics_tx = 0;
> + u32 eics_rx = 0;
> + /* IRQ trigger preparation for Rx */
> + ring = adapter->rx_ring[qid];
> + if (!READ_ONCE(ring->xsk_pool))
> + return -ENXIO;
> + q_vector = ring->q_vector;
> + rx_napi = &q_vector->napi;
> + /* Extend the BIT mask for eics */
> + eics_rx = ring->q_vector->eims_value;
> +
> + /* IRQ trigger preparation for Tx */
> + ring = adapter->tx_ring[qid];
> + if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> + return -ENETDOWN;
> +
> + if (!READ_ONCE(ring->xsk_pool))
> + return -ENXIO;
> + q_vector = ring->q_vector;
> + tx_napi = &q_vector->napi;
> + /* Extend the BIT mask for eics */
> + eics_tx = ring->q_vector->eims_value;
> +
> + /* Check and update napi states for rx and tx */
> + if (!napi_if_scheduled_mark_missed(rx_napi)) {
> + trigger_irq_rx = true;
> + eics |= eics_rx;
> + }
> + if (!napi_if_scheduled_mark_missed(tx_napi)) {
> + trigger_irq_tx = true;
> + eics |= eics_tx;
> + }
> + /* Now we trigger the required irqs for Rx and Tx */
> + if ((trigger_irq_rx) || (trigger_irq_tx)) {
> + if (adapter->flags & IGB_FLAG_HAS_MSIX) {
> + wr32(E1000_EICS, eics);
> + } else {
> + if ((trigger_irq_rx) && (trigger_irq_tx))
> + wr32(E1000_ICS, E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
> + else if (trigger_irq_rx)
> + wr32(E1000_ICS, E1000_ICS_RXDMT0);
> + else
> + wr32(E1000_ICS, E1000_ICS_TXDW);
> + }
> + }
> + return 0;
> + }
> + } else if (flags & XDP_WAKEUP_TX) {
> + /* Get the ring params from Tx */
> + ring = adapter->tx_ring[qid];
> + if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> + return -ENETDOWN;
> + } else if (flags & XDP_WAKEUP_RX) {
> + /* Get the ring params from Rx */
> + ring = adapter->rx_ring[qid];
> + } else {
> + /* Invalid Flags */
> + return -EINVAL;
> + }
This is too complicated IMHO. Wouldn't something like this work:
- if wakeup rx, pick napi from rx ring's q_vector
* napi_if_scheduled_mark_missed() logic that exists in igc_xsk_wakeup()
repeat for tx; if IGB_FLAG_QUEUE_PAIRS then the branch of second
napi_if_scheduled_mark_missed() call would not be executed as we had
previously marked the missed bit in napi state;
>
> if (!READ_ONCE(ring->xsk_pool))
> return -EINVAL;
> @@ -551,10 +620,15 @@ int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
> if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
> /* Cause software interrupt */
> if (adapter->flags & IGB_FLAG_HAS_MSIX) {
> - eics |= ring->q_vector->eims_value;
> + eics = ring->q_vector->eims_value;
> wr32(E1000_EICS, eics);
> } else {
> - wr32(E1000_ICS, E1000_ICS_RXDMT0);
> + if ((flags & XDP_WAKEUP_RX) && (flags & XDP_WAKEUP_TX))
> + wr32(E1000_ICS, E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
> + else if (flags & XDP_WAKEUP_RX)
> + wr32(E1000_ICS, E1000_ICS_RXDMT0);
> + else
> + wr32(E1000_ICS, E1000_ICS_TXDW);
> }
> }
>
> --
> 2.34.1
>
next prev parent reply other threads:[~2026-01-09 0:15 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-22 11:57 [Intel-wired-lan] [PATCH iwl-net v4] igb: Fix trigger of incorrect irq in igb_xsk_wakeup Vivek Behera via Intel-wired-lan
2026-01-09 0:14 ` Maciej Fijalkowski [this message]
2026-01-11 14:27 ` Behera, VIVEK via Intel-wired-lan
2026-01-12 11:38 ` Loktionov, Aleksandr
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWBIgOaRG50IuJsU@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=aleksandr.loktionov@intel.com \
--cc=anthony.l.nguyen@intel.com \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=jacob.e.keller@intel.com \
--cc=kurt@linutronix.de \
--cc=przemyslaw.kitszel@intel.com \
--cc=sriram.yagnaraman@ericsson.com \
--cc=vivek.behera@siemens.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox