From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from metis.whiteo.stw.pengutronix.de (metis.whiteo.stw.pengutronix.de [185.203.201.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 62CB536E498 for ; Fri, 23 Jan 2026 09:34:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.203.201.7 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769160860; cv=none; b=oOFXyAMTEe91pHVacCB59H7C17yhgw2VOzOA07PZh5YxZDqR/zOLQgxtb6KSe3ZVcIelttozT27p7dfClwMrhqZWFfkfENY7BD2n8f3h4Ce8UOZ51aYgyrL6UygeyIyKtFy3oAbeoXRKdmFcJnwnDTQDgpdAZT7gfJp0rg3/m9A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769160860; c=relaxed/simple; bh=wksl7ixP/bS/PYXdddI8qlI5+uMJ2pVUKvzcj5RdSOI=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=kbwn2W30/fspcW9FXGTw+n/TzzBERkymnA8Fh8WNQkT2qhYQmyGM2U73xrhGxQ9AnzjHnh4MMJ774PwFyYk6lGXwlFBdBO2UbhMHqUczKYuTHIWZ+WUu9PJGbRmOdrnNkmvx+8ZfNZX+sVcBEuJZ96SglFIrA1ZxtJ/wjODaHdo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pengutronix.de; spf=pass smtp.mailfrom=pengutronix.de; arc=none smtp.client-ip=185.203.201.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pengutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pengutronix.de Received: from drehscheibe.grey.stw.pengutronix.de ([2a0a:edc0:0:c01:1d::a2]) by metis.whiteo.stw.pengutronix.de with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1vjDYM-0006eC-Uv; Fri, 23 Jan 2026 10:34:06 +0100 Received: from pty.whiteo.stw.pengutronix.de ([2a0a:edc0:2:b01:1d::c5]) by drehscheibe.grey.stw.pengutronix.de with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vjDYL-0024Jn-15; Fri, 23 Jan 2026 10:34:04 +0100 Received: from ore by pty.whiteo.stw.pengutronix.de with local (Exim 4.96) (envelope-from ) id 1vjDYK-00AtXa-1j; Fri, 23 Jan 2026 10:34:04 +0100 Date: Fri, 23 Jan 2026 10:34:04 +0100 From: Oleksij Rempel To: Mohsin Bashir Cc: netdev@vger.kernel.org, alexanderduyck@fb.com, alok.a.tiwari@oracle.com, andrew+netdev@lunn.ch, andrew@lunn.ch, chuck.lever@oracle.com, davem@davemloft.net, donald.hunter@gmail.com, edumazet@google.com, gal@nvidia.com, horms@kernel.org, idosch@nvidia.com, jacob.e.keller@intel.com, kernel-team@meta.com, kory.maincent@bootlin.com, kuba@kernel.org, lee@trager.us, pabeni@redhat.com, vadim.fedorenko@linux.dev, kernel@pengutronix.de Subject: Re: [PATCH net-next 0/3] net: ethtool: Track TX pause storm Message-ID: References: <20260122192158.428882-1-mohsin.bashr@gmail.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20260122192158.428882-1-mohsin.bashr@gmail.com> X-Sent-From: Pengutronix Hildesheim X-URL: http://www.pengutronix.de/ X-Accept-Language: de,en X-Accept-Content-Type: text/plain X-SA-Exim-Connect-IP: 2a0a:edc0:0:c01:1d::a2 X-SA-Exim-Mail-From: ore@pengutronix.de X-SA-Exim-Scanned: No (on metis.whiteo.stw.pengutronix.de); SAEximRunCond expanded to false X-PTX-Original-Recipient: netdev@vger.kernel.org Hi Mohsin, hi Jakub! On Thu, Jan 22, 2026 at 11:21:55AM -0800, Mohsin Bashir wrote: > With TX pause enabled, if a device cannot deliver received frames to > the stack (e.g., during a system hang), it may generate excessive pause > frames causing a pause storm. This series updates the uAPI to track TX > pause storm events as part of the pause stats (p1), adds pause storm > protection support for fbnic (p2), and leverages p1 to provide > observability into these events for fbnic (p3). > > Mohsin Bashir (3): > net: ethtool: Track pause storm events > eth: fbnic: Add protection against pause storm > eth: fbnic: Fetch TX pause storm stats > Very interesting series! Before the Christmas break, I started looking into a similar issue on different hardware (LAN78xx USB Ethernet), which has counters but no automatic pause storm mitigation in silicon. For reference, here is the RFC patch set I'm working on: https://lore.kernel.org/all/20260123090741.1566469-1-o.rempel@pengutronix.de/ In my implementation, I added the detection logic to the driver but exposed it via the devlink health interface. I have a few questions regarding the definitions and the architectural approach here. - Definition of "Pause Storm" It seems fbnic defines a storm as a continuous assertion of the pause signal (duration-based). The code defines FBNIC_RXB_PAUSE_STORM_UNIT_WR with a 10us granularity and a 500ms threshold. Does this hardware detector handle "flapping" (high rate of distinct pause frames) or is it strictly for a stuck-high condition (deadlock/hang)? If the RX queue drains slightly and fills up again every 100ms, does the internal timer reset? If so, a "stuttering" storm might bypass this protection. Except the primary goal of this feature is to prevent the link partner's TX watchdog from firing (saving the neighbor from a reset). - Policy and Tunability The current implementation enforces an auto-recovery policy that is invisible to userspace. By resetting the storm protection in fbnic_service_task (runs every 1s), you effectively enforce a ~50% duty cycle on a hung system. Since this logic decides when to stop flow control and start dropping packets, shouldn't these thresholds be configurable? I understand that in a complete OS crash, software cannot intervene, and we rely entirely on the hardware's 500ms safety breaker. However, for partial stalls (livelocks, soft lockups) where the service task might still run, the hardcoded "force normal" policy might not be ideal. Furthermore, having the ability to disable auto-recovery is critical for testing. If we want to develop/validate a driver for the link partner that detects incoming pause storms, we need a way to generate a sustained storm from this device without the safety breaker constantly resetting it. A "controllable reference" is very valuable here. In my LAN78xx work, I used the devlink reporter's .recover callback. This lets userspace control the policy (devlink health set ... recover true/false) - determining whether the driver should auto-reset the MAC/PHY or wait for manual intervention. Baking the "force normal" logic into the service task removes this control. Architecture: This feels very similar to the existing ndo_tx_timeout watchdog, but for the RX side. We are essentially trying to handle an "RX Timeout" or "RX Stall" where the device cannot push packets to the host. I assume the 500ms hardware threshold is specifically chosen to be well below the standard 5-second ndo_tx_timeout of link partners, preventing a hung node from causing watchdog resets on its neighbors. I've observed cases where triggering a full ndo_tx_timeout reset on a stalled device leaves the hardware in a partially reconfigured "zombie" state, unable to receive anything thereafter. Did you consider if we should approach this as a central networking mechanism (an "RX Watchdog")? I am debating this for my own driver: - Option A: Export needed metrics/events to userspace (Devlink/Ethtool) and let a userspace agent decide when to reset. - Option B: Handle it entirely in the kernel (like this patch), but potentially standardize the "RX Stall" behavior so we don't end up with different heuristics in every driver. Best Regards, Oleksij -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |