From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-wr1-f41.google.com (mail-wr1-f41.google.com [209.85.221.41])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1E99449EB6
	for <linux-kernel@vger.kernel.org>; Tue,  5 May 2026 13:27:37 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.41
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1777987659; cv=none; b=up8olOOCXFCtpV7q7WIoDMwL1mh5a0NZqNaduoctPd8rVCsj4aM6F6Qmuxn13NOWosn9UFVoIyRqDXLVWGutUhyKg7dFaDFwWlXff3PAfY1ym7FAyCBNhd4nyfrUnKfIJzvShUYsKVDG8cE3nyKsL6cwbM1UW1pHlwlRqVK5WAQ=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1777987659; c=relaxed/simple;
	bh=YPthGmBagOjav013lu+wSW9xMVwN5Gu+hGSDUNENwvE=;
	h=From:Date:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=nRaGEIdeoxK1NLHW9i2cANl6BwqAZNnRUr4tBQRxspAvRnJPI6PTVZ2NP9F7u6VUJJ9NuCdR4C7VqMKUzXkNq+vuE/Rg+edouCKbLr1/NRnsW4ZzF0aLy8fPz7WgwE4t/XFjTrvf3qYBdjC/EMehF8qNZ6R8zg/sLX3aDVCuzq8=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b=EOznjxZj; arc=none smtp.client-ip=209.85.221.41
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b="EOznjxZj"
Received: by mail-wr1-f41.google.com with SMTP id ffacd0b85a97d-449e96a8a80so3118856f8f.3
        for <linux-kernel@vger.kernel.org>; Tue, 05 May 2026 06:27:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=suse.com; s=google; t=1777987656; x=1778592456; darn=vger.kernel.org;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:date:from:from:to:cc:subject:date:message-id:reply-to;
        bh=lraFz6oBhP7MmXTOPo6oBPAEVJFopwF4+HMSlIJwUjg=;
        b=EOznjxZjTkWhcFqAP+iKuD8RXYBKGEVi3PG5UCauKz+BYjcyytRhCnD+MSR6Zf3cDd
         Bfoz1AHDgwUiyKara4RCTNZMOoE03SkkIFS6ZeTzEAFwzkc3cjdXaxW3YQf+VYoiqv6K
         CURS9IdI9qbCyDpxFn74UM31m60vTGrYwTke+eUSuasjpTuhoiut4JEexFcMAVMG3r3L
         oAxPhxbc1qvPfx7Fo9D3XZ1CX4CDIiEEmVFYd5Q91boaLtsBTaU9b32SwtP+U6bJRATn
         ljIYdmBKJEcZ5yKA2+EYwOTorhcN0s6HH8uVHKDtJ7Y8jjzNva5vG4oHecnuM2N3RinO
         M7qw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20251104; t=1777987656; x=1778592456;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:date:from:x-gm-gg:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=lraFz6oBhP7MmXTOPo6oBPAEVJFopwF4+HMSlIJwUjg=;
        b=RBfmxR0b0IYx0han+H86cahUUNJvHjE26TOHsA8tolCAUEciE4g9Eetl6QyC7R4EvR
         TjRO7qJR3qaHdIrENVl3S0hOlKmJwiA8yTQJDyMRCq57n1ISrNLsB1HlGqwWXXcqoysp
         VhPdCkiwTf5wwY1jZrrgzFBalNmFdHV1J/Y7mk7m3nVeM5fb32f7AF9CAK3JqBA7ZGQ9
         l5m165BoMlFMMJsutNFUlwQvZEFWMXO6/nsjpsmNmRrnNRJ6Yr49FdLgFTIharvzd1p2
         ityV18pUU0oV2HxklomTjjqmakIHL57zAPfiCQ+MT9rsY/Ir1Ke0gnJQfnOH8xOcCNNl
         ZlNw==
X-Forwarded-Encrypted: i=1; AFNElJ/QjBoEdTxLR4oD1Xi9XGcVZiIk6qj8Lnl4sx2k2j0eWFRFEOSbTLIEywdGTsHOW9SuoRN2DmyJBJb6tLo=@vger.kernel.org
X-Gm-Message-State: AOJu0YwB7q6SOwYUJpMc986Sj3JEMG5h3Jf9Pu1EqPpKcazmXKpUydmK
	/XNDTXGxuVtIsE1NE/8cud5rGK734m2QwXQ5oLATE619ilMQunHMo77Ml4KGnR+Dljw=
X-Gm-Gg: AeBDietZzUA6BMjkOG4X2ne1DUOcAjhg1uAbGeahnS6x3grgzdKI5MVdwMYK10WUAZe
	mj3cswSy/SLfitx4/iS7k1fYiZqoZmi/mNIHCfRVCAbOu+wx3jIchCLUy0aPjzegD/oXi2KM7dI
	C2gMJJJlju4ycwqT8eSs6+Wn8jeZw85Hy3ans+gLYcSkfiW7h09DqnQkIO7wNpm2UNpQ9veOz+J
	xlz8ihtLKQfZcF4RQrsM6vEJhejfVy97OeIIIndef3fu8ehNq1UfOpqNjuZJlHOBdeuDx/4DaxO
	JGGomcbbe/Xp78TqqswauELXmlmgI/s3TFq3AAkHiUkldymGC4Co4hpFIshuwq1FBWPEqH36F8/
	rpr+QnP5yLIinr6mBCp4AilY2ooUTHeytIV2k73UazBAFN07ZsIXpksh9TUTBf4Aq6NaPH46+IM
	smo4tPN6uWf5Rf5gKSluG+H1iWPpB0e+LgKmZTQoocobCsnWo05pD9mODgiZMjFb9AhU2nFxcF5
	ItTvF9j0IQEdb58cw==
X-Received: by 2002:a05:6000:18a5:b0:449:c5e2:a8b7 with SMTP id ffacd0b85a97d-450060571bemr5745163f8f.30.1777987656056;
        Tue, 05 May 2026 06:27:36 -0700 (PDT)
Received: from localhost (host-79-47-155-212.retail.telecomitalia.it. [79.47.155.212])
        by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-45055960022sm4650340f8f.26.2026.05.05.06.27.35
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 05 May 2026 06:27:35 -0700 (PDT)
From: Andrea della Porta <andrea.porta@suse.com>
X-Google-Original-From: Andrea della Porta <aporta@suse.de>
Date: Tue, 5 May 2026 15:30:51 +0200
To: Lukasz Raczylo <lukasz@raczylo.com>
Cc: netdev@vger.kernel.org, Nicolas Ferre <nicolas.ferre@microchip.com>,
	Claudiu Beznea <claudiu.beznea@tuxon.dev>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-rpi-kernel@lists.infradead.org
Subject: Re: [RFC PATCH net-next 3/3] net: macb: add TX stall watchdog as
 defence-in-depth safety net
Message-ID: <afnxC-Lk5LELsm42@apocalypse>
References: <cover.1777064117.git.lukasz@raczylo.com>
 <c0469642f42ada85d91a8a685eb7c0e04cb99131.1777064117.git.lukasz@raczylo.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <c0469642f42ada85d91a8a685eb7c0e04cb99131.1777064117.git.lukasz@raczylo.com>

Hi Lukasz,

On 23:38 Fri 24 Apr     , Lukasz Raczylo wrote:
> Patches 1/3 and 2/3 address two candidate races that could lead
> to a TCOMP completion being missed on PCIe-attached macb
> instances.  This patch adds a defence-in-depth safety net, in
> case a further race remains that we have not identified.
> 
> The watchdog is a per-queue delayed_work that runs once per
> second.  It snapshots queue->tx_tail; if the ring is non-empty
> (queue->tx_head != queue->tx_tail) and tx_tail has not advanced
> since the previous tick, it calls macb_tx_restart().
> 
> No new recovery logic is introduced.  macb_tx_restart() already
> exists in this file, is correctly locked (tx_ptr_lock, bp->lock),
> and verifies that the hardware's TBQP is behind the driver's
> head index before re-asserting TSTART.  On a healthy ring it is
> a no-op at the hardware level; the watchdog only supplies the
> missing trigger.
> 
> On a healthy queue the per-tick cost is one spin_lock_irqsave()
> / spin_unlock_irqrestore() and one branch.  The delayed_work is
> only scheduled between macb_open() and macb_close(), and is
> cancelled synchronously on close.
> 
> Context for submission: on our 24-node Raspberry Pi 5 fleet,
> before this series, an out-of-band user-space watchdog
> (monitoring tx_packets from /sys/class/net/.../statistics and
> toggling the link down/up when it froze) was required to keep
> nodes usable.  We include this kernel-side watchdog as a cleaner
> in-kernel equivalent for any residual stall that patches 1 and
> 2 do not cover.  We are willing to drop this patch if the view
> is that 1 and 2 should stand alone.
> 
> Link: https://github.com/cilium/cilium/issues/43198
> Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
> Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
> ---
>  drivers/net/ethernet/cadence/macb.h      |  5 ++
>  drivers/net/ethernet/cadence/macb_main.c | 59 ++++++++++++++++++++++++
>  2 files changed, 64 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 2de56017e..9115f2b47 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1278,6 +1278,11 @@ struct macb_queue {
>  	dma_addr_t		tx_ring_dma;
>  	struct work_struct	tx_error_task;
>  	bool			txubr_pending;
> +
> +	/* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c */
> +	struct delayed_work	tx_stall_watchdog_work;
> +	unsigned int		tx_stall_last_tail;
> +
>  	struct napi_struct	napi_tx;
>  
>  	dma_addr_t		rx_ring_dma;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index ea231b1c5..ea2306ef7 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -2002,6 +2002,59 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
>  	return work_done;
>  }
>  
> +#define MACB_TX_STALL_INTERVAL_MS	1000
> +
> +/*
> + * TX stall watchdog.
> + *
> + * Defence-in-depth against lost TCOMP interrupts.  macb already has a
> + * recovery chain (tx_pending -> txubr_pending -> macb_tx_restart())
> + * that fires on TCOMP; if TCOMP itself is lost the TX ring stalls
> + * silently until something else kicks TSTART.  This watchdog runs
> + * once per second per queue, snapshots tx_tail, and calls
> + * macb_tx_restart() if the ring is non-empty and tx_tail has not
> + * advanced since the previous tick.
> + *
> + * macb_tx_restart() already checks the hardware's TBQP against the
> + * driver's head index before re-asserting TSTART, so on a healthy
> + * ring this is a no-op at the hardware level.  The watchdog only
> + * adds the missing trigger.
> + */
> +static void macb_tx_stall_watchdog(struct work_struct *work)
> +{
> +	struct macb_queue *queue = container_of(to_delayed_work(work),
> +						struct macb_queue,
> +						tx_stall_watchdog_work);
> +	struct macb *bp = queue->bp;
> +	unsigned int cur_tail, cur_head;
> +	bool stalled = false;
> +	unsigned long flags;
> +
> +	if (!netif_running(bp->dev))
> +		return;
> +
> +	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
> +	cur_tail = queue->tx_tail;
> +	cur_head = queue->tx_head;
> +	if (cur_head != cur_tail &&
> +	    cur_tail == queue->tx_stall_last_tail)
> +		stalled = true;
> +	else
> +		queue->tx_stall_last_tail = cur_tail;
> +	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
> +
> +	if (stalled) {
> +		netdev_warn_once(bp->dev,
> +				 "TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n",
> +				 (unsigned int)(queue - bp->queues),
> +				 cur_tail, cur_head);
> +		macb_tx_restart(queue);
> +	}
> +
> +	schedule_delayed_work(&queue->tx_stall_watchdog_work,
> +			      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
> +}
> +
>  static void macb_hresp_error_task(struct work_struct *work)
>  {
>  	struct macb *bp = from_work(bp, work, hresp_err_bh_work);
> @@ -3190,6 +3243,9 @@ static int macb_open(struct net_device *dev)
>  	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>  		napi_enable(&queue->napi_rx);
>  		napi_enable(&queue->napi_tx);
> +		queue->tx_stall_last_tail = queue->tx_tail;
> +		schedule_delayed_work(&queue->tx_stall_watchdog_work,
> +				      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
>  	}
>  
>  	macb_init_hw(bp);
> @@ -3240,6 +3296,7 @@ static int macb_close(struct net_device *dev)
>  	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
>  		napi_disable(&queue->napi_rx);
>  		napi_disable(&queue->napi_tx);
> +		cancel_delayed_work_sync(&queue->tx_stall_watchdog_work);
>  		netdev_tx_reset_queue(netdev_get_tx_queue(dev, q));
>  	}
>  
> @@ -4802,6 +4859,8 @@ static int macb_init_dflt(struct platform_device *pdev)
>  		}
>  
>  		INIT_WORK(&queue->tx_error_task, macb_tx_error_task);
> +		INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work,
> +				  macb_tx_stall_watchdog);
>  		q++;
>  	}
>  
> -- 
> 2.53.0
>

I've applied all three patches to v6.19.10, changing netdev_warn_once() in this one to
netdev_warn(), and the "TX stall" warning appears several times. So it seems that there
could be another cause escaping the filtering in the first two patches.
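
For reference, the per-tick check in the quoted watchdog boils down to something
like the user-space model below. This is only a rough sketch: struct txq_model and
txq_model_tick() are made-up names for illustration, and the locking, the
netif_running() test and the macb_tx_restart() call are all left out.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the per-queue state used by the
 * watchdog. Not the driver's struct macb_queue.
 */
struct txq_model {
	unsigned int tx_head;
	unsigned int tx_tail;
	unsigned int last_tail;	/* mirrors tx_stall_last_tail */
};

/*
 * One watchdog tick. Returns true when the ring is non-empty and
 * tx_tail has not moved since the previous tick. As in the quoted
 * patch, the snapshot is only refreshed on the non-stalled path.
 */
static bool txq_model_tick(struct txq_model *q)
{
	if (q->tx_head != q->tx_tail && q->tx_tail == q->last_tail)
		return true;

	q->last_tail = q->tx_tail;
	return false;
}
```

Since a stalled tick leaves the snapshot untouched, a stall that persists would be
reported once per tick until tx_tail moves again, which netdev_warn_once() hides
and netdev_warn() makes visible.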

Interestingly enough, running the same tests after substituting the entire macb driver with
the downstream version works fine.

Not sure how to interpret these results since they seem to be the opposite of yours.
More investigation is ongoing from my side.

Thanks,
Andrea