From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maciej Fijalkowski
Date: Fri, 28 Jan 2022 16:31:13 +0100
Subject: [Intel-wired-lan] ixgbe driver link down causes 100% load in ksoftirqd/x
In-Reply-To:
References:
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: intel-wired-lan@osuosl.org
List-ID:

On Thu, Jan 20, 2022 at 09:23:06AM +0000, Maurice Baijens (Ellips B.V.) wrote:
> Hello,
>
>
> I have an issue with the ixgbe driver and X550Tx network adapter.
> When I disconnect the network cable I end up with 100% load in ksoftirqd/x. I am running the adapter in
> xdp mode (XDP_FLAGS_DRV_MODE). Problem seen in linux kernel 5.15.x and also 5.16.0+ (head).

Hello,
a stupid question - why do you disconnect the cable when running traffic? :)
If you plug this back in then what happens?

>
> I traced the problem down to function ixgbe_xmit_zc in ixgbe_xsk.c:
>
> if (unlikely(!ixgbe_desc_unused(xdp_ring)) ||
>     !netif_carrier_ok(xdp_ring->netdev)) {
>         work_done = false;
>         break;
> }

This was done in commit c685c69fba71 ("ixgbe: don't do any AF_XDP
zero-copy transmit if netif is not OK") - it was addressing the transient
state when configuring the xsk pool on particular queue pair.

>
> This function is called from ixgbe_poll() function via ixgbe_clean_xdp_tx_irq(). It sets
> work_done to false if netif_carrier_ok() returns false (so if link is down). Because work_done
> is always false, ixgbe_poll keeps on polling forever.
>
> I made a fix by checking link in ixgbe_poll() function and if no link exiting polling mode:
>
> /* If all work not completed, return budget and keep polling */
> if ((!clean_complete) && netif_carrier_ok(adapter->netdev))
>         return budget;

Not sure about the correctness of this. Question is how should we act for
link down - should we say that we are done with processing or should we
wait until the link gets back?

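To make the two options in that question concrete, here is a rough
userspace model of the poll loop, not driver code. Everything prefixed
with fake_ is a hypothetical stand-in I made up for illustration (the
struct fields only mimic what netif_carrier_ok() and ixgbe_desc_unused()
report), and the poll return value is simplified to "budget means poll
me again, 0 means done". The link_down_is_unfinished flag selects whether
a link-down condition is reported as unfinished work.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ring/netdev state; not the real ixgbe structs. */
struct fake_ring {
	bool carrier_ok;      /* models what netif_carrier_ok() would return */
	unsigned int unused;  /* models what ixgbe_desc_unused() would return */
	unsigned int pending; /* descriptors waiting in the xsk TX queue */
};

/*
 * Models the shape of the zero-copy TX loop. When link_down_is_unfinished
 * is true, link down clears work_done (the behaviour reported in this
 * thread); when false, link down only stops the loop.
 */
static bool fake_xmit_zc(struct fake_ring *r, unsigned int budget,
			 bool link_down_is_unfinished)
{
	bool work_done = true;

	while (budget-- > 0) {
		if (r->unused == 0) {
			/* Ring is full: there really is work left over. */
			work_done = false;
			break;
		}

		if (!r->carrier_ok) {
			if (link_down_is_unfinished)
				work_done = false;
			break;
		}

		if (r->pending == 0)
			break;

		r->pending--;
		r->unused--;
	}

	return work_done;
}

/* Simplified poll contract: returning budget asks to be polled again. */
static int fake_poll(struct fake_ring *r, int budget,
		     bool link_down_is_unfinished)
{
	bool clean_complete = fake_xmit_zc(r, (unsigned int)budget,
					   link_down_is_unfinished);

	/* If all work not completed, return budget and keep polling */
	if (!clean_complete)
		return budget;

	return 0;
}

/* Counts how often the softirq would re-poll, capped so the model ends. */
static int count_repolls(bool carrier_ok, bool link_down_is_unfinished,
			 int cap)
{
	struct fake_ring r = {
		.carrier_ok = carrier_ok,
		.unused = 512,
		.pending = 4,
	};
	int budget = 64;
	int n = 0;

	while (n < cap && fake_poll(&r, budget, link_down_is_unfinished) == budget)
		n++;

	return n;
}
```

In this model, count_repolls(false, true, 1000) runs into the cap, which
is the userspace analogue of ksoftirqd spinning, while
count_repolls(false, false, 1000) stops right away. Whether "stop right
away" is the right semantic for the real driver is exactly the open
question.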
Instead of setting the work_done to false immediately for
!netif_carrier_ok(), I'd rather break out the checks that are currently
combined into the single statement, something like this:

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index b3fd8e5cd85b..6a5e9cf6b5da 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -390,12 +390,14 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 	u32 cmd_type;
 
 	while (budget-- > 0) {
-		if (unlikely(!ixgbe_desc_unused(xdp_ring)) ||
-		    !netif_carrier_ok(xdp_ring->netdev)) {
+		if (unlikely(!ixgbe_desc_unused(xdp_ring))) {
 			work_done = false;
 			break;
 		}
 
+		if (!netif_carrier_ok(xdp_ring->netdev))
+			break;
+
 		if (!xsk_tx_peek_desc(pool, &desc))
 			break;
 

>
> This is probably fine for our application as we only run in xdpdrv mode, however I am not sure this

By xdpdrv I would understand that you're running XDP in standard native
mode, however you refer to the AF_XDP Zero Copy implementation in the
driver. But I don't think it changes anything in this thread.

In the end I see some outstanding issues with ixgbe_xmit_zc(), so this
probably might need some attention.

Thanks!
Maciej

> is the correct way to fix this issue and the behaviour of the normal skb mode operation is
> also affected by my fix.
>
> So hopefully my observations are correct and someone here can fix the issue and push it upstream.
>
>
> Best regards,
> Maurice Baijens