All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: <intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
	<anthony.l.nguyen@intel.com>, <magnus.karlsson@intel.com>,
	<tobias.boehm@hetzner-cloud.de>,
	<marcus.wichelmann@hetzner-cloud.de>,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Subject: Re: [Intel-wired-lan] [PATCH v2 iwl-net] ixgbe: fix ndo_xdp_xmit() workloads
Date: Thu, 7 Aug 2025 14:13:35 +0200	[thread overview]
Message-ID: <aJSYb70vIpUEm4n0@boxer> (raw)
In-Reply-To: <171ae15c-5c13-48b5-b5a6-e61e6f259681@molgen.mpg.de>

On Wed, Aug 06, 2025 at 07:05:19PM +0200, Paul Menzel wrote:
> Dear Maciej,
> 
> 
> Thank you for your patch.
> 
> 
> Am 06.08.25 um 18:58 schrieb Maciej Fijalkowski:
> > Currently ixgbe driver checks periodically in its watchdog subtask if
> > there is anything to be transmitted (considering both Tx and XDP rings)
> > under state of carrier not being 'ok'. Such event is interpreted as Tx
> > hang and therefore results in interface reset.
> 
> For people grepping through commit messages, add some more details how the
> hang manifests?

Hi Paul,

I didn't want to repeat too much of things here that are included under
the link where original report took place (see lore link from Closes:
tag).

I know this adds a level of indirection for future reader, but I assumed
lore link will not be gone and it is safe to rely on it.

> 
> > This is currently problematic for ndo_xdp_xmit() as it is allowed to
> > produce descriptors when interface is going through reset or its carrier
> > is turned off.
> > 
> > Furthermore, XDP rings should not really be objects of Tx hang
> > detection. This mechanism is rather a matter of ndo_tx_timeout() being
> > called from dev_watchdog against Tx rings exposed to networking stack.
> > 
> > Taking into account issues described above, let us have a two fold fix -
> > do not respect XDP rings in local ixgbe watchdog and do not produce Tx
> > descriptors in ndo_xdp_xmit callback when there is some problem with
> > carrier currently. For now, keep the Tx hang checks in clean Tx irq
> > routine, but adjust it to not execute for XDP rings.
> 
> Do you have a reproducer for this?

Again, the original report has it. xdp-trafficgen was used to trigger this
problem.

I am not sure if it's worth re-spinning, especially that Marcus and Tobias
might be angry at me that it still didn't make it to mainline:P

> 
> > Cc: Tobias Böhm <tobias.boehm@hetzner-cloud.de>
> > Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
> > Closes: https://lore.kernel.org/netdev/eca1880f-253a-4955-afe6-732d7c6926ee@hetzner-cloud.de/
> > Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
> > Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action")
> > Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> > Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> > v1->v2:
> > * collect tags
> > * fix typos (Dawid)
> > ---
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 ++++++-------------
> >   1 file changed, 11 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index 03d31e5b131d..7c0db3b3ee8e 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -967,10 +967,6 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
> >   	for (i = 0; i < adapter->num_tx_queues; i++)
> >   		clear_bit(__IXGBE_HANG_CHECK_ARMED,
> >   			  &adapter->tx_ring[i]->state);
> > -
> > -	for (i = 0; i < adapter->num_xdp_queues; i++)
> > -		clear_bit(__IXGBE_HANG_CHECK_ARMED,
> > -			  &adapter->xdp_ring[i]->state);
> >   }
> >   static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
> > @@ -1264,10 +1260,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
> >   				   total_bytes);
> >   	adapter->tx_ipsec += total_ipsec;
> > +	if (ring_is_xdp(tx_ring))
> > +		return !!budget;
> > +
> >   	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
> >   		/* schedule immediate reset if we believe we hung */
> >   		struct ixgbe_hw *hw = &adapter->hw;
> > -		e_err(drv, "Detected Tx Unit Hang %s\n"
> > +		e_err(drv, "Detected Tx Unit Hang\n"
> >   			"  Tx Queue             <%d>\n"
> >   			"  TDH, TDT             <%x>, <%x>\n"
> >   			"  next_to_use          <%x>\n"
> > @@ -1275,16 +1274,14 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
> >   			"tx_buffer_info[next_to_clean]\n"
> >   			"  time_stamp           <%lx>\n"
> >   			"  jiffies              <%lx>\n",
> > -			ring_is_xdp(tx_ring) ? "(XDP)" : "",
> >   			tx_ring->queue_index,
> >   			IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
> >   			IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
> >   			tx_ring->next_to_use, i,
> >   			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
> > -		if (!ring_is_xdp(tx_ring))
> > -			netif_stop_subqueue(tx_ring->netdev,
> > -					    tx_ring->queue_index);
> > +		netif_stop_subqueue(tx_ring->netdev,
> > +				    tx_ring->queue_index);
> >   		e_info(probe,
> >   		       "tx hang %d detected on queue %d, resetting adapter\n",
> > @@ -1297,9 +1294,6 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
> >   		return true;
> >   	}
> > -	if (ring_is_xdp(tx_ring))
> > -		return !!budget;
> > -
> >   #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
> >   	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
> >   	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
> > @@ -7796,12 +7790,9 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
> >   		return;
> >   	/* Force detection of hung controller */
> > -	if (netif_carrier_ok(adapter->netdev)) {
> > +	if (netif_carrier_ok(adapter->netdev))
> >   		for (i = 0; i < adapter->num_tx_queues; i++)
> >   			set_check_for_tx_hang(adapter->tx_ring[i]);
> > -		for (i = 0; i < adapter->num_xdp_queues; i++)
> > -			set_check_for_tx_hang(adapter->xdp_ring[i]);
> > -	}
> >   	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
> >   		/*
> > @@ -8016,13 +8007,6 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
> >   			return true;
> >   	}
> > -	for (i = 0; i < adapter->num_xdp_queues; i++) {
> > -		struct ixgbe_ring *ring = adapter->xdp_ring[i];
> > -
> > -		if (ring->next_to_use != ring->next_to_clean)
> > -			return true;
> > -	}
> > -
> >   	return false;
> >   }
> > @@ -10825,6 +10809,10 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
> >   	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
> >   		return -ENETDOWN;
> > +	if (!netif_carrier_ok(adapter->netdev) ||
> > +	    !netif_running(adapter->netdev))
> > +		return -ENETDOWN;
> > +
> 
> I am no expert here, but should the commit be split into two?

fixing producer on one commit and consumer on other means that first
commit would still contain a broken driver, which would be not a real
*fix*. you can think of ixgbe_xdp_xmit() as a producer of descriptors and
ixgbe_clean_tx_irq() as consumer (in reality HW is the consumer, but i
hope this analogy makes some sense to you).

> 
> >   	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> >   		return -EINVAL;
> 
> 
> Kind regards,
> 
> Paul

      reply	other threads:[~2025-08-07 12:13 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 16:58 [Intel-wired-lan] [PATCH v2 iwl-net] ixgbe: fix ndo_xdp_xmit() workloads Maciej Fijalkowski
2025-08-06 16:58 ` Maciej Fijalkowski
2025-08-06 17:05 ` [Intel-wired-lan] " Paul Menzel
2025-08-07 12:13   ` Maciej Fijalkowski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJSYb70vIpUEm4n0@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=magnus.karlsson@intel.com \
    --cc=marcus.wichelmann@hetzner-cloud.de \
    --cc=netdev@vger.kernel.org \
    --cc=pmenzel@molgen.mpg.de \
    --cc=tobias.boehm@hetzner-cloud.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.