Netdev List

* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Jakub Kicinski @ 2026-06-25 15:29 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <aj1He9vNkRh+Ettf@boxer>

On Thu, 25 Jun 2026 17:21:31 +0200 Maciej Fijalkowski wrote:
> On Thu, Jun 25, 2026 at 07:36:36AM -0700, Jakub Kicinski wrote:
> > On Thu, 25 Jun 2026 12:35:12 +0200 Maciej Fijalkowski wrote:  
> > > On Wed, Jun 24, 2026 at 07:33:26PM -0700, Jakub Kicinski wrote:  
> > > I have not checked if this has been -net propagated already, but the rule
> > > of thumb on bpf side was that all selftests related effort goes to -next.
> > > Is it different on netdev side?  
> > 
> > We prefer -next too, but during the merge window net-next is closed.
> > 
> > What we definitely don't want is a -next patch with a Fixes tag.
> > So either net or drop the tag, please.  
> 
> I have verified that offending commit is present in net tree. Could you
> apply the v2 that I unfortunately sent already targeted at net-next, to
> net tree?

Will do. Hopefully it applies, because net-next hasn't been forwarded yet,
so it doesn't include Tushar's patches.

^ permalink raw reply

* Re: [PATCH net-next] Documentation: networking: Add a test plan for ethtool pause validation
From: Maxime Chevallier @ 2026-06-25 15:29 UTC (permalink / raw)
  To: Andrew Lunn, Jakub Kicinski
  Cc: davem, Eric Dumazet, Paolo Abeni, Simon Horman, Russell King,
	Heiner Kallweit, Jonathan Corbet, Shuah Khan, Oleksij Rempel,
	Vladimir Oltean, Florian Fainelli, thomas.petazzoni, netdev,
	linux-kernel, linux-doc
In-Reply-To: <5cb8e2b4-8eb6-4446-9b90-1cd4c7964cd9@lunn.ch>

Hi Andrew,

On 5/27/26 04:47, Andrew Lunn wrote:
> On Tue, May 26, 2026 at 05:24:47PM -0700, Jakub Kicinski wrote:
>> On Fri, 22 May 2026 19:51:06 +0200 Maxime Chevallier (Netdev
>> Foundation) wrote:
>>>  Documentation/networking/pause_test_plan.rst | 556 +++++++++++++++++++
>>
>> It'd be great to hear from others but IMHO in the current form this is
>> not suitable for Documentation/networking/ We can commit the "knowledge"
>> part but enumerating the test cases seems odd for Documentation/.
> 
> Sorry, not looked too deeply at the actual content yet.
> 
> What i was thinking was a python file, which sphinx can ingest to
> produce documentation, and place holders were code would be added to
> implement the actual test during the next phase.
> 
> This is how i've done testing in the past. I would be the evil one who
> thought up the tests and described them in detail using sphinx markup
> in a python test template file. After some review they got passed off
> to a python developer for implementation. And when they got run and
> failed, sometimes the feature developer, the test developer and myself
> got together to figure who made the error.
> 
> I'm not sure we even need sphinx. What i find important is that the
> test is documented. What kAPI calls should be made with what
> parameters. What results we are expected and why? So that when a test
> fails, a developer has the information they need to fix their
> code. The Why? is important, and often missing from the kernel tests.

This isn't Sphinx, but I've come up with something like this for a
test definition:


@ksft_ethtool_needs_supported_anyof([Pause, Asym_Pause])
def test_ethtool_pause_advertising(cfg, peer) -> None:
    """Pause advertisement

    Validate that changing pause params through the ETHTOOL_MSG_PAUSE command
    translates to a change in the advertised pause params, and that these
    parameters are correct w.r.t the supported pause params and requested pause
    params.
    
    This exercises the .set_pauseparam() ethtool op for MAC configuration,
    as well as the reconfiguration of the PHY's advertising and negotiation.

    On non-phylink MACs, the MAC should call phy_set_sym_pause() to update the
    PHY's advertising, and restart a negotiation with phy_start_aneg() if
    need be. Failure to do so will result in the wrong advertising parameters.

    On phylink-enabled MACs, phylink deals with the PHY reconfiguration provided
    the MAC driver calls phylink_ethtool_set_pauseparam().

    Failing this test likely means that the PHY driver is not correctly advertising
    pause settings, either due to the MAC not triggering a PHY reconfiguration,
    a misconfiguration of the advertising registers by the PHY, or by
    mis-handling the phydev->advertising bitfield in the PHY driver directly.
    
    The validation is made by looking at the advertised modes locally, as well as
    what the peer's 'lp_advertising' values report.

    cfg -- local device's interface configuration
    peer -- peer device handle
    """

    # Initial conditions:
    # - Local interface is admin UP, and reports lower-layer link UP
    # - Remote interface is admin UP, and reports lower-layer link UP
    #
    # Test 1
    # - SKIP if supported doesn't contain "Pause"
    # - run 'ethtool -A ethX rx on tx on autoneg on'
    # - FAIL if the return isn't 0
    # - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values does not contain
    #   "Pause" or contains "Asym_Pause"
    # - FAIL if peer's lp_advertising doesn't contain "Pause" or contains
    #   "Asym_Pause"
    # - Succeed otherwise
    #
    # Test 2
    # - SKIP if supported doesn't contain both "Pause" and "Asym_Pause"
    # - run 'ethtool -A ethX rx on tx on autoneg on'
    # - FAIL if the return isn't 0
    # - FAIL if ETHTOOL_A_LINKMODES_OURS's advertised values does not contain
    #   "Pause" or contains "Asym_Pause"
    # - FAIL if peer's lp_advertising doesn't contain "Pause" or contains
    #   "Asym_Pause"
    #
    # ...
   
The annotation defines the prerequisites in terms of locally supported
linkmodes, and the docstring contains information for developers to debug
their drivers. What I'm unsure about is the commented-out part below the
docstring: either one big function testing multiple adjacent scenarios,
or individual functions.

We could also use annotations to enumerate the various combinations of
modes to test.

That's just an extract of the full test suite for Pause, but before
writing the whole thing down I figure it's better to iterate on a single
test's design.

What do you think ?

Maxime

^ permalink raw reply

* Re: [PATCH net v4 2/2] net: phy: mdio-i2c: defer RollBall bridge probe to PHY discovery
From: Jakub Kicinski @ 2026-06-25 15:23 UTC (permalink / raw)
  To: Aleksander Jan Bajkowski
  Cc: Petr Wozniak, Russell King, Andrew Lunn, Heiner Kallweit,
	David S . Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel,
	linux-phy, Maxime Chevallier, Bjorn Mork, Marek Behun
In-Reply-To: <9f813a8e-8b9a-4708-b3b6-db4972adac35@wp.pl>

On Wed, 24 Jun 2026 23:44:19 +0200 Aleksander Jan Bajkowski wrote:
> > For genuine RollBall modules (e.g. FLYPRO SFP-10GT-CS-30M with Aquantia
> > AQR113C) the probe now runs after initialization is complete and
> > correctly returns 0, so PHY detection proceeds normally.  
> The FLPRO SFP module still fails to detect the PHY. It is necessary to
> increase `module_t_wait` to 20 seconds. Most likely, during this time
> the module loads the PHY firmware from SPI memory or from the
> microcontroller (rollball bridge) via MDIO. Same probably applies to
> most SFP modules with a PHY that load firmware at start-up (AQR113,
> RTL8261C etc.).

Just to clarify, is FLPRO a typo or a knock-off?
Do you want something to be changed here, or are you just flagging that
more follow-ups are needed if we want to cover more modules?

^ permalink raw reply

* RE: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Selvamani Rajagopal @ 2026-06-25 15:21 UTC (permalink / raw)
  To: Parthiban.Veerasooran@microchip.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, robh@kernel.org, krzk+dt@kernel.org,
	conor+dt@kernel.org, Piergiorgio Beruto
  Cc: andrew@lunn.ch, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Conor.Dooley@microchip.com,
	devicetree@vger.kernel.org
In-Reply-To: <f127837f-e08f-48e0-a3a9-906e1d61d6bb@microchip.com>

> -----Original Message-----
> From: Parthiban.Veerasooran@microchip.com <Parthiban.Veerasooran@microchip.com>
> Subject: Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
> 
> 
> With your above patches, I did a quick test (Test case 2) with two
> Microchip MAC-PHYs and faced a similar issue reported before. Sharing
> the dmesg crash log for your reference.

The root cause seems to be the same. When oa_tc6_update_rx_skb() is called,
tc6->rx_skb seems to be NULL, which may mean the controller is not getting
a start.

I have a theory. Look at the comment at line #933, quoted here. I am sure the
same could be true for the call to oa_tc6_prcs_rx_frame_end() at line #926 or
oa_tc6_prcs_ongoing_rx_frame() at line #950.
                /* After rx buffer overflow error received, there might be a
                 * possibility of getting an end valid of a previously
                 * incomplete rx frame along with the new rx frame start valid.
                 */

Either we change the following line in oa_tc6_update_rx_skb()
    if ((tc6->rx_skb->tail + length) > tc6->rx_skb->end) {
to
    if (tc6->rx_skb == NULL || (tc6->rx_skb->tail + length) > tc6->rx_skb->end) {

or we add a check
    if (tc6->rx_skb)
before calling the two functions mentioned above from their caller.

I could do that, but I have no way of verifying it. I am sure it will fix the
crash, but I would like to confirm whether traffic recovers.

> 
> [ 2863.182105] eth1: Receive buffer overflow error
> [ 2863.199905] eth1: Receive buffer overflow error
> [ 2867.669312] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000000000b8


^ permalink raw reply

* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Maciej Fijalkowski @ 2026-06-25 15:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <20260625073636.449a28c0@kernel.org>

On Thu, Jun 25, 2026 at 07:36:36AM -0700, Jakub Kicinski wrote:
> On Thu, 25 Jun 2026 12:35:12 +0200 Maciej Fijalkowski wrote:
> > On Wed, Jun 24, 2026 at 07:33:26PM -0700, Jakub Kicinski wrote:
> > > On Tue, 23 Jun 2026 11:10:08 +0200 Maciej Fijalkowski wrote:  
> > > > Subject: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test  
> > > 
> > > Do you want it in net? Either way - we'll need a rebase  
> > 
> > I have not checked if this has been -net propagated already, but the rule
> > of thumb on bpf side was that all selftests related effort goes to -next.
> > Is it different on netdev side?
> 
> We prefer -next too, but during the merge window net-next is closed.
> 
> What we definitely don't want is a -next patch with a Fixes tag.
> So either net or drop the tag, please.

I have verified that the offending commit is present in the net tree. Could
you apply the v2, which I unfortunately already sent targeted at net-next,
to the net tree?

^ permalink raw reply

* Re: [PATCH net v2] sctp: fix SCTP_RESET_STREAMS stream list length limit
From: Jakub Kicinski @ 2026-06-25 15:19 UTC (permalink / raw)
  To: Yousef Alhouseen
  Cc: Marcelo Ricardo Leitner, Xin Long, David S . Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, linux-sctp, netdev, linux-kernel
In-Reply-To: <20260625142354.2600-1-alhouseenyousef@gmail.com>

On Thu, 25 Jun 2026 16:23:54 +0200 Yousef Alhouseen wrote:
> Changes in v2:
> - Add Fixes and Acked-by tags from Xin Long.
> - v1: https://lore.kernel.org/r/20260624122213.4052-1-alhouseenyousef@gmail.com

You don't have to repost patches for networking just to add tags :/

^ permalink raw reply

* Re: [PATCH v2 net 2/3] net: udp_tunnel: convert state flags to atomic bitops
From: Eric Dumazet @ 2026-06-25 15:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, David S . Miller, Paolo Abeni, Simon Horman,
	Yue Sun, netdev, eric.dumazet
In-Reply-To: <20260625080854.06851faf@kernel.org>

On Thu, Jun 25, 2026 at 8:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 25 Jun 2026 06:59:37 +0000 Eric Dumazet wrote:
> > These flags can be modified concurrently from different contexts:
> > - RTNL-locked paths (like add_port/del_port) write to need_sync and
> >   work_pending.
>
> These should hold utn->lock. Not sure why udp_tunnel_nic_lock()
> is locking in the callers rather than directly in
> __udp_tunnel_nic_add_port() / __udp_tunnel_nic_del_port()..
>
> > - The RTNL-less reset path (reset_ntf, used by netdevsim) writes to
> >   need_sync and need_replay under utn->lock.
>
> I'd rather add asserts to confirm utn lock is held everywhere.
> This code is hard enough to follow as is, without having to
> think through potential concurrent accesses.

Ah ok, I will let you finish this, it seems I am wasting your time.

Thanks.

^ permalink raw reply

* Re: [PATCH net 1/4] net: turn the rx_mode work into a generic netdev_work facility
From: Jakub Kicinski @ 2026-06-25 15:17 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jv, sdf,
	dongchenchen2, idosch, n05ec, yuantan098, nb, aleksandr.loktionov,
	dtatulea
In-Reply-To: <CAAVpQUCbT1Q9BPTLrVCjpt2vcJiWsYKa0onJ_vwnq86L73m8mw@mail.gmail.com>

On Wed, 24 Jun 2026 22:55:06 -0700 Kuniyuki Iwashima wrote:
> Oh very nice !
> 
> I was drafting almost the same change for dev_set_rx_mode()
> in mcast path and some ipvlan changes.

Glad to hear! I wasn't 100% convinced by the added complexity
in the core :S

^ permalink raw reply

* Re: [PATCH] xsk: fix memory corruptions in net/core/xdp.c
From: Alexander Lobakin @ 2026-06-25 15:14 UTC (permalink / raw)
  To: Clement Lecigne
  Cc: edumazet, netdev, bpf, linux-kernel, kuba, sdf, horms,
	john.fastabend, ast, daniel
In-Reply-To: <20260624084130.2382335-1-clecigne@google.com>

From: Clement Lecigne <clecigne@google.com>
Date: Wed, 24 Jun 2026 08:41:28 +0000

> From: Clément Lecigne <clecigne@google.com>
> 
> Commit 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> introduced a vulnerability in the handling of XDP_PASS for AF_XDP zero-copy
> frames.
> 
> Note: Currently, this specific AF_XDP zero-copy conversion path is only
> reachable from the drivers/net/ethernet/intel/ice driver.

idpf uses this, too (every driver based on libeth_xdp in general,
currently these two).

> 
> When building an skb, xdp_build_skb_from_zc() uses the chunk size
> (xdp->frame_sz) for the allocation. However, napi_build_skb() automatically
> reserves space at the end of the allocation for the skb_shared_info
> structure. 
> 
> Most high performance UMEM applications use 4K chunks, where the
> corruption cannot happen. However, if the UMEM is configured with 2KB
> chunks (a very common configuration to maximize packet density in memory),
> a standard 1500 MTU packet will trigger the corruption because the required
> space exceeds the 2048 byte chunk size:
> 
> Headroom (256) + Packet (1514) + skb_shared_info (320) = 2090 bytes
> 
> Because 2090 bytes > 2048 bytes and __skb_put() does not perform bounds
> checking, the memcpy() writes past the available linear data area and
> corrupts the skb_shared_info structure. This can lead to arbitrary code
> execution if pointers like destructor_arg are overwritten.
> 
> Additionally, in xdp_copy_frags_from_zc(), the allocation size is set
> strictly to the fragment size (len), but the subsequent memcpy() uses
> LARGEST_ALIGN(len). This mismatch results in an out-of-bounds write of
> up to 7 bytes, which triggers KASAN warnings and is unsafe despite typical
> page pool allocator padding.
> 
> Fix the skb allocation in xdp_build_skb_from_zc() by dynamically
> calculating the exact truesize required: the sum of the headroom, the
> packet length, and the skb_shared_info overhead, properly aligned via
> SKB_DATA_ALIGN.
> 
> Fix the out-of-bounds write in xdp_copy_frags_from_zc() by rounding up
> the allocation request using LARGEST_ALIGN(len) to match the copy
> operation.
> 
> Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Clément Lecigne <clecigne@google.com>
> ---
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 9890a30584ba..f36d1fb875ab 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -699,7 +699,7 @@ static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,
>  	for (u32 i = 0; i < nr_frags; i++) {
>  		const skb_frag_t *frag = &xinfo->frags[i];
>  		u32 len = skb_frag_size(frag);
> -		u32 offset, truesize = len;
> +		u32 offset, truesize = LARGEST_ALIGN(len);

I think you need to re-sort this to keep RCT, now that the truesize
initialization is way longer than it was.

		const skb_frag_t *frag = &xinfo->frags[i];
		u32 offset, len = skb_frag_size(frag);
		u32 truesize = LARGEST_ALIGN(len);
		struct page *page;

>  		struct page *page;
>  
>  		page = page_pool_dev_alloc(pp, &offset, &truesize);

BTW usually LARGEST_ALIGN() aligns to 16, I've never seen a bigger one.
IIRC Page Pool never returns a truesize aligned to a smaller value. But
if you're really able to trigger this, it probably does?
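
To put numbers on the over-copy, the sketch below computes how far a copy of
LARGEST_ALIGN(len) bytes would run past a len-byte allocation. The alignment
is a parameter because its value is an assumption here: 16 matches the remark
above, while 8 would give the "up to 7 bytes" from the commit message. The
names align_up and overcopy are illustrative only.

```python
def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a (a power of two), like ALIGN()."""
    return (x + a - 1) & ~(a - 1)


def overcopy(length: int, align: int) -> int:
    """Bytes written past a 'length'-byte buffer by an aligned-size copy."""
    return align_up(length, align) - length


for align in (8, 16):
    worst = max(overcopy(n, align) for n in range(1, 4097))
    print(f"align={align}: worst-case over-copy {worst} bytes")
```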

> @@ -740,7 +740,9 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
>  {
>  	const struct xdp_rxq_info *rxq = xdp->rxq;
>  	u32 len = xdp->data_end - xdp->data_meta;
> -	u32 truesize = xdp->frame_sz;
> +	u32 headroom = xdp->data_meta - xdp->data_hard_start;
> +	u32 truesize = SKB_DATA_ALIGN(headroom + len) +
> +		       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

Ah, now I get it: xdp->frame_sz doesn't account for the shinfo for
single-buffer frames, only for multi-buffer ones. The fix looks correct,
but I'd use SKB_HEAD_ALIGN() since it does exactly what you're
open-coding here and sort the declarations:

{
	u32 hr = xdp->data_meta - xdp->data_hard_start;
	const struct xdp_rxq_info *rxq = xdp->rxq;
	u32 len = xdp->data_end - xdp->data_meta;
	u32 truesize = SKB_HEAD_ALIGN(hr + len);
	struct sk_buff *skb = NULL;
	struct page_pool *pp;
	int metalen;
	void *data;

	if (!IS_ENABLED(CONFIG_PAGE_POOL))
		return NULL;

	...
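
A quick way to sanity-check the sizes is the arithmetic sketch below. It
reuses the numbers from the commit message (256 bytes of headroom, a
1514-byte packet, 320 bytes of skb_shared_info, 2048-byte chunks) and
assumes a 64-byte cache line and that SKB_HEAD_ALIGN(X) expands to
SKB_DATA_ALIGN(X) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). The
helper names are made up for the example.

```python
CACHE_LINE = 64  # assumed SMP_CACHE_BYTES
SHINFO = 320     # sizeof(struct skb_shared_info), taken from the commit message


def data_align(x: int) -> int:
    """Model of SKB_DATA_ALIGN(): round up to the cache line size."""
    return (x + CACHE_LINE - 1) & ~(CACHE_LINE - 1)


def head_align(x: int) -> int:
    """Model of SKB_HEAD_ALIGN(): aligned data plus aligned shinfo."""
    return data_align(x) + data_align(SHINFO)


def usable(truesize: int) -> int:
    """Linear room left once the shinfo tail has been reserved."""
    return truesize - data_align(SHINFO)


headroom, pkt, chunk = 256, 1514, 2048
need = headroom + pkt
new_truesize = head_align(need)
print("old truesize", chunk, "usable", usable(chunk), "need", need)
print("new truesize", new_truesize, "usable", usable(new_truesize))
```

Under these assumptions the old truesize leaves less linear room than
headroom plus packet, while the SKB_HEAD_ALIGN()-based one leaves enough.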

>  	struct sk_buff *skb = NULL;
>  	struct page_pool *pp;
>  	int metalen;
> @@ -762,7 +764,7 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
>  	}
>  
>  	skb_mark_for_recycle(skb);
> -	skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
> +	skb_reserve(skb, headroom);
>  
>  	memcpy(__skb_put(skb, len), xdp->data_meta, LARGEST_ALIGN(len));

Thanks,
Olek

^ permalink raw reply

* [PATCH v4 net 3/3] i40e: keep q_vectors array in sync with channel count changes
From: Maciej Fijalkowski @ 2026-06-25 15:14 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
	jacob.e.keller, Maciej Fijalkowski
In-Reply-To: <20260625151431.1102838-1-maciej.fijalkowski@intel.com>

For the main VSI, i40e_set_num_rings_in_vsi() always derives
num_q_vectors from pf->num_lan_msix. At the same time, ethtool -L stores
the user requested channel count in vsi->req_queue_pairs and the queue
setup path uses that value for the effective number of queue pairs.

This leaves queue and vector counts out of sync after shrinking channel
count via ethtool -L. The active queue configuration is reduced, but the
VSI still keeps the full PF-sized q_vector topology.

That mismatch breaks reconfiguration flows which rely on vector/NAPI
state matching the effective channel configuration. In particular,
toggling /sys/class/net/<dev>/threaded after reducing the channel count
can hang, and later channel-count changes can fail because VSI reinit
does not rebuild q_vectors to match the new vector count.

Fix this by making the main VSI num_q_vectors follow the effective
requested channel count, capped by the available MSI-X vectors. Update
i40e_vsi_reinit_setup() to rebuild q_vectors during VSI reinit so the
vector topology is refreshed together with the ring arrays when channel
count changes.

Keep alloc_queue_pairs unchanged and based on pf->num_lan_qps so the VSI
retains its full queue capacity.
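
For illustration only, the intended vector count can be modelled as the small
function below. It mirrors the logic described above in plain Python; the
function name and the sample values are made up for the example and are not
taken from the driver.

```python
def num_q_vectors(req_queue_pairs: int, num_lan_qps: int,
                  num_lan_msix: int, msix_enabled: bool = True) -> int:
    """Model of the main VSI q_vector count after this change."""
    if not msix_enabled:
        return 1
    # A request of 0 means "no user override": fall back to the PF default.
    if req_queue_pairs:
        qps = min(req_queue_pairs, num_lan_qps)
    else:
        qps = num_lan_qps
    # clamp(qps, 1, num_lan_msix)
    return max(1, min(qps, num_lan_msix))


for req in (0, 4, 128):
    print(req, "->", num_q_vectors(req, num_lan_qps=64, num_lan_msix=32))
```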

Selftest napi_threaded.py was originally used when Jakub reported a hang
on /sys/class/net/<dev>/threaded toggle. In order to make it pass on
i40e, use persistent NAPI configuration for q_vector NAPIs so NAPI
identity and threaded settings survive q_vector reallocation across
channel-count changes. This is achieved by using netif_napi_add_config()
when configuring q_vectors.

$ export NETIF=ens259f1np1
$ sudo -E env PATH="$PATH" ./tools/testing/selftests/drivers/net/napi_threaded.py
TAP version 13
1..3
ok 1 napi_threaded.napi_init
ok 2 napi_threaded.change_num_queues
ok 3 napi_threaded.enable_dev_threaded_disable_napi_threaded
Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 60 +++++++++++++--------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4adc7b0fb2f4..c017217a1bc3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11406,10 +11406,14 @@ static void i40e_service_timer(struct timer_list *t)
 static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
 {
 	struct i40e_pf *pf = vsi->back;
+	u16 qps;
 
 	switch (vsi->type) {
 	case I40E_VSI_MAIN:
 		vsi->alloc_queue_pairs = pf->num_lan_qps;
+		qps = vsi->req_queue_pairs ?
+		      min(vsi->req_queue_pairs, pf->num_lan_qps) :
+		      pf->num_lan_qps;
 		if (!vsi->num_tx_desc)
 			vsi->num_tx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
 						 I40E_REQ_DESCRIPTOR_MULTIPLE);
@@ -11417,7 +11421,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
 			vsi->num_rx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
 						 I40E_REQ_DESCRIPTOR_MULTIPLE);
 		if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags))
-			vsi->num_q_vectors = pf->num_lan_msix;
+			vsi->num_q_vectors = clamp(qps, 1, pf->num_lan_msix);
 		else
 			vsi->num_q_vectors = 1;
 
@@ -11469,12 +11473,11 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
 /**
  * i40e_vsi_alloc_arrays - Allocate queue and vector pointer arrays for the vsi
  * @vsi: VSI pointer
- * @alloc_qvectors: a bool to specify if q_vectors need to be allocated.
  *
  * On error: returns error code (negative)
  * On success: returns 0
  **/
-static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
+static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi)
 {
 	struct i40e_ring **next_rings;
 	int size;
@@ -11493,19 +11496,18 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
 	}
 	vsi->rx_rings = next_rings;
 
-	if (alloc_qvectors) {
-		/* allocate memory for q_vector pointers */
-		size = sizeof(struct i40e_q_vector *) * vsi->num_q_vectors;
-		vsi->q_vectors = kzalloc(size, GFP_KERNEL);
-		if (!vsi->q_vectors) {
-			ret = -ENOMEM;
-			goto err_vectors;
-		}
+	/* allocate memory for q_vector pointers */
+	size = sizeof(struct i40e_q_vector *) * vsi->num_q_vectors;
+	vsi->q_vectors = kzalloc(size, GFP_KERNEL);
+	if (!vsi->q_vectors) {
+		ret = -ENOMEM;
+		goto err_vectors;
 	}
 	return ret;
 
 err_vectors:
 	kfree(vsi->tx_rings);
+	vsi->tx_rings = NULL;
 	return ret;
 }
 
@@ -11578,7 +11580,7 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 	if (ret)
 		goto err_rings;
 
-	ret = i40e_vsi_alloc_arrays(vsi, true);
+	ret = i40e_vsi_alloc_arrays(vsi);
 	if (ret)
 		goto err_rings;
 
@@ -11603,18 +11605,15 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 /**
  * i40e_vsi_free_arrays - Free queue and vector pointer arrays for the VSI
  * @vsi: VSI pointer
- * @free_qvectors: a bool to specify if q_vectors need to be freed.
  *
  * On error: returns error code (negative)
  * On success: returns 0
  **/
-static void i40e_vsi_free_arrays(struct i40e_vsi *vsi, bool free_qvectors)
+static void i40e_vsi_free_arrays(struct i40e_vsi *vsi)
 {
 	/* free the ring and vector containers */
-	if (free_qvectors) {
-		kfree(vsi->q_vectors);
-		vsi->q_vectors = NULL;
-	}
+	kfree(vsi->q_vectors);
+	vsi->q_vectors = NULL;
 	kfree(vsi->tx_rings);
 	vsi->tx_rings = NULL;
 	vsi->rx_rings = NULL;
@@ -11674,7 +11673,7 @@ static int i40e_vsi_clear(struct i40e_vsi *vsi)
 	i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
 
 	bitmap_free(vsi->af_xdp_zc_qps);
-	i40e_vsi_free_arrays(vsi, true);
+	i40e_vsi_free_arrays(vsi);
 	i40e_clear_rss_config_user(vsi);
 
 	pf->vsi[vsi->idx] = NULL;
@@ -12046,7 +12045,8 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
 	cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
 
 	if (vsi->netdev)
-		netif_napi_add(vsi->netdev, &q_vector->napi, i40e_napi_poll);
+		netif_napi_add_config(vsi->netdev, &q_vector->napi,
+				      i40e_napi_poll, v_idx);
 
 	/* tie q_vector and vsi together */
 	vsi->q_vectors[v_idx] = q_vector;
@@ -14267,12 +14267,26 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 
 	pf = vsi->back;
 
+	if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags)) {
+		i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
+		vsi->base_vector = 0;
+	}
+
 	i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
+	i40e_vsi_free_q_vectors(vsi);
 	i40e_vsi_clear_rings(vsi);
+	i40e_vsi_free_arrays(vsi);
 
-	i40e_vsi_free_arrays(vsi, false);
 	i40e_set_num_rings_in_vsi(vsi);
-	ret = i40e_vsi_alloc_arrays(vsi, false);
+	ret = i40e_vsi_alloc_arrays(vsi);
+	if (ret)
+		goto err_netdev;
+
+	/* Rebuild q_vectors during VSI reinit because the effective channel
+	 * count may change num_q_vectors. Keep vector topology aligned with the
+	 * queue configuration after ethtool's .set_channels() callback.
+	 */
+	ret = i40e_vsi_setup_vectors(vsi);
 	if (ret)
 		goto err_netdev;
 
@@ -14284,7 +14298,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 		dev_info(&pf->pdev->dev,
 			 "failed to get tracking for %d queues for VSI %d err %d\n",
 			 alloc_queue_pairs, vsi->seid, ret);
-		goto err_netdev;
+		goto err_rings;
 	}
 	vsi->base_queue = ret;
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH v4 net 2/3] i40e: fix potential UAF in i40e_vsi_setup()'s error path
From: Maciej Fijalkowski @ 2026-06-25 15:14 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
	jacob.e.keller, Maciej Fijalkowski
In-Reply-To: <20260625151431.1102838-1-maciej.fijalkowski@intel.com>

Sashiko pointed out an issue where the error path in i40e_vsi_reinit_setup()
released ring memory, but then, when freeing q_vectors, the rings mapped
to q_vectors were touched, which implies a regular use-after-free bug.

Apparently i40e_vsi_setup() has the same problem, so swap the allocation
and freeing order and fix the 13-year-old bug.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 471fa7f7b643..4adc7b0fb2f4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14460,14 +14460,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 		fallthrough;
 	case I40E_VSI_FDIR:
 		/* set up vectors and rings if needed */
-		ret = i40e_vsi_setup_vectors(vsi);
-		if (ret)
-			goto err_msix;
-
 		ret = i40e_alloc_rings(vsi);
 		if (ret)
 			goto err_rings;
 
+		ret = i40e_vsi_setup_vectors(vsi);
+		if (ret)
+			goto err_qvec;
+
 		/* map all of the rings to the q_vectors */
 		i40e_vsi_map_rings_to_vectors(vsi);
 
@@ -14487,10 +14487,10 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 	return vsi;
 
 err_config:
+	i40e_vsi_free_q_vectors(vsi);
+err_qvec:
 	i40e_vsi_clear_rings(vsi);
 err_rings:
-	i40e_vsi_free_q_vectors(vsi);
-err_msix:
 	if (vsi->netdev_registered) {
 		vsi->netdev_registered = false;
 		unregister_netdev(vsi->netdev);
-- 
2.43.0


^ permalink raw reply related

* [PATCH v4 net 1/3] i40e: unregister netdev before clearing VSI on reinit failure
From: Maciej Fijalkowski @ 2026-06-25 15:14 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
	jacob.e.keller, Maciej Fijalkowski
In-Reply-To: <20260625151431.1102838-1-maciej.fijalkowski@intel.com>

i40e_vsi_reinit_setup() tears down the existing VSI queue/ring backing
state before allocating replacement arrays and queue tracking. If one of
these early allocations fails, the function jumps directly to err_vsi
and calls i40e_vsi_clear().

For a registered netdev, this frees the VSI while
netdev_priv(netdev)->vsi can still point at it, leaving the registered
netdev with dangling private driver state.

Split the error path so failures after destructive reinit teardown first
unregister and free the netdev before clearing the VSI.

Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a04683004a56..471fa7f7b643 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14274,7 +14274,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 	i40e_set_num_rings_in_vsi(vsi);
 	ret = i40e_vsi_alloc_arrays(vsi, false);
 	if (ret)
-		goto err_vsi;
+		goto err_netdev;
 
 	alloc_queue_pairs = vsi->alloc_queue_pairs *
 			    (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
@@ -14284,7 +14284,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 		dev_info(&pf->pdev->dev,
 			 "failed to get tracking for %d queues for VSI %d err %d\n",
 			 alloc_queue_pairs, vsi->seid, ret);
-		goto err_vsi;
+		goto err_netdev;
 	}
 	vsi->base_queue = ret;
 
@@ -14309,6 +14309,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 
 err_rings:
 	i40e_vsi_free_q_vectors(vsi);
+err_netdev:
 	if (vsi->netdev_registered) {
 		vsi->netdev_registered = false;
 		unregister_netdev(vsi->netdev);
@@ -14318,7 +14319,6 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 	if (vsi->type == I40E_VSI_MAIN)
 		i40e_devlink_destroy_port(pf);
 	i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
-err_vsi:
 	i40e_vsi_clear(vsi);
 	return NULL;
 }
-- 
2.43.0


^ permalink raw reply related

* [PATCH v4 net 0/3] i40e: re-init and UAF fixes
From: Maciej Fijalkowski @ 2026-06-25 15:14 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
	jacob.e.keller, Maciej Fijalkowski

v4:
- add preceding patch that fixes a case when some of re-init allocations
  failed and we missed de-registering netdev at failure path
- pull out i40e_vsi_setup() changes onto separate patch
v3:
- address UAF when ring arrays were freed before q_vector's ring
  containers (Sashiko, Jacob)
- remove bool params from alloc/free array routines (Simon)
v2:
- NULL vsi->tx_rings in i40e_vsi_alloc_arrays() (Sashiko)

Maciej Fijalkowski (3):
  i40e: unregister netdev before clearing VSI on reinit failure
  i40e: fix potential UAF in i40e_vsi_setup()'s error path
  i40e: keep q_vectors array in sync with channel count changes

 drivers/net/ethernet/intel/i40e/i40e_main.c | 76 ++++++++++++---------
 1 file changed, 45 insertions(+), 31 deletions(-)

-- 
2.43.0


^ permalink raw reply

* Re: [PATCH v2 net 2/3] net: udp_tunnel: convert state flags to atomic bitops
From: Jakub Kicinski @ 2026-06-25 15:08 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, Simon Horman,
	Yue Sun, netdev, eric.dumazet
In-Reply-To: <20260625065938.654652-3-edumazet@google.com>

On Thu, 25 Jun 2026 06:59:37 +0000 Eric Dumazet wrote:
> These flags can be modified concurrently from different contexts:
> - RTNL-locked paths (like add_port/del_port) write to need_sync and
>   work_pending.

These should hold utn->lock. Not sure why udp_tunnel_nic_lock()
is taken in the callers rather than directly in
__udp_tunnel_nic_add_port() / __udp_tunnel_nic_del_port().

> - The RTNL-less reset path (reset_ntf, used by netdevsim) writes to
>   need_sync and need_replay under utn->lock.

I'd rather add asserts to confirm utn lock is held everywhere.
This code is hard enough to follow as is, without having to
think through potential concurrent accesses.

^ permalink raw reply

* [Regression] Broken MPLS routes with multiple nexthops
From: Anthony Doeraene @ 2026-06-25 15:07 UTC (permalink / raw)
  To: davem; +Cc: kuniyu, netdev

Hello all,

According to my experiments, ECMP with MPLS (i.e. an MPLS route with
multiple nexthops) is broken on the master branch of the kernel.

Whenever an MPLS route with multiple nexthops is added, ip route shows
the route as dead/linkdown, even if the nexthops are reachable.

Example to reproduce the error (tested with virtme-ng and the master
branch of the kernel):
```
modprobe mpls_iptunnel mpls_router
sysctl net.mpls.platform_labels=100000
ip link set dummy0 up
ip addr add fc00:1::1/112 dev dummy0
ip addr add fc00:2::1/112 dev dummy0
ip -M route add 16000 \
     nexthop via inet6 fc00:1::2 as 16001 \
     nexthop via inet6 fc00:2::2 as 16002

# Check the route
ip -M route
# Output:
#     16000 dead linkdown
#
# Route is not present, even if accepted
```

From git blame, it seems that commit
f0914b8436c589b7ab32c614d8d7868eb4ebd5bf broke the core logic for
building nexthops.

This commit modified the function `mpls_nh_build_multi` by setting
`rt->rt_nhn` to 0 at the start of the function. However, the
`change_nexthops` loop just below depends on `rt->rt_nhn` to know how
many nexthops it should build. As `rt->rt_nhn` is set to 0 just before,
**no nexthop is ever built**, leading to a dead route. Even if we remove
this modification, the commit also increments `rt->rt_nhn` at the end of
the loop (`rt->rt_nhn++`), so the loop always ends with an error as it
tries to construct more nexthops than were actually provided.
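
To make the two failure modes easier to see, here is a small userspace model, written under the assumption that the loop bound is read from the same counter that the function modifies. `demo_route` and the `demo_build_*` helpers are hypothetical names for illustration and are not the kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a route; not the kernel's struct mpls_route. */
struct demo_route {
	unsigned int nhn;	/* number of nexthops the loop should visit */
	unsigned int built;	/* nexthops actually built */
};

/* Zeroes the counter first, so the loop bounded by it never runs. */
static int demo_build_reset_first(struct demo_route *rt, unsigned int provided)
{
	unsigned int i;

	rt->nhn = 0;
	rt->built = 0;
	for (i = 0; i < rt->nhn; i++) {
		if (i >= provided)
			return -1;
		rt->built++;
	}
	return 0;	/* "succeeds" with zero nexthops built */
}

/* Keeps the counter but bumps it in the loop, so the bound keeps moving. */
static int demo_build_increment(struct demo_route *rt, unsigned int provided)
{
	unsigned int i;

	rt->built = 0;
	for (i = 0; i < rt->nhn; i++) {
		if (i >= provided)
			return -1;	/* ran past the supplied nexthops */
		rt->built++;
		rt->nhn++;
	}
	return 0;
}

/* Leaves the counter alone: builds exactly the requested nexthops. */
static int demo_build_fixed(struct demo_route *rt, unsigned int provided)
{
	unsigned int i;

	rt->built = 0;
	for (i = 0; i < rt->nhn; i++) {
		if (i >= provided)
			return -1;
		rt->built++;
	}
	return 0;
}
```

With two nexthops requested and supplied, the first variant builds none, the second fails once it runs out of supplied nexthops, and the third builds both.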

Commenting out these two lines fixes the issue and once again allows
creating MPLS routes with multiple nexthops:

```
modprobe mpls_iptunnel mpls_router
sysctl net.mpls.platform_labels=100000
ip link set dummy0 up
ip addr add fc00:1::1/112 dev dummy0
ip addr add fc00:2::1/112 dev dummy0
ip -M route add 16000 \
     nexthop via inet6 fc00:1::2 as 16001 \
     nexthop via inet6 fc00:2::2 as 16002

# Check the route
ip -M route
# Output:
#     16000
#            nexthop as to 16001 via inet6 fc00:1::2 dev dummy0
#            nexthop as to 16002 via inet6 fc00:2::2 dev dummy0
#
# Route is accepted and present !
```

Overall, I think it would be interesting to discuss what this patch was
trying to achieve, and how we can reconcile both use cases.

Best regards and looking forward to hearing from you,
Doeraene Anthony

^ permalink raw reply

* Re: [PATCH v2] mctp: serial: replace memset with zero-initialization
From: Manish Baing @ 2026-06-25 15:01 UTC (permalink / raw)
  To: David Laight
  Cc: jk, matt, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <CAJvdc_fTNPbJcgVv4kTe7VP9ibrzECHkad8Uyan8KrP-=WTyOQ@mail.gmail.com>

Hi David,
Just a quick follow-up on this thread.
Please let me know how you would like me to proceed with this patch.

Thanks & Regards ,
Manish

On Sun, Jun 7, 2026 at 1:20 AM Manish Baing <manishbaing2789@gmail.com> wrote:
>
> Hi David,
>
> Just a quick follow-up on this thread. When you have a moment, I would
> love to hear your thoughts on my previous email.
> Please let me know how you would like me to proceed with this patch.
>
> Thanks & Regards ,
>  Manish
>
> On Sun, May 31, 2026 at 3:06 PM Manish Baing <manishbaing2789@gmail.com> wrote:
> >
> > Hi David,
> >
> > I understand that this might just be unnecessary churn for the
> > networking subsystem, and I am perfectly fine dropping the patch if
> > that is the case.
> >
> > However, for my own learning, I would love to get your perspective on
> > this. I was carrying over a pattern from recent cleanups in the IIO
> > subsystem [1],
> > as empty brace initialization was noted as the preferred approach in
> > those discussions (including by Kees Cook [2]).
> >
> > Is the reasoning behind that recommendation just not applicable to
> > netdev,or is the policy here strictly against code churn unless it
> > fixes a tangible bug?
> >
> > Thanks for the guidance.
> >
> > [1]: https://lore.kernel.org/all/20250611-iio-zero-init-stack-with-instead-of-memset-v1-0-ebb2d0a24302@baylibre.com/
> > [2]: https://lore.kernel.org/linux-iio/202505090942.48EBF01B@keescook/
> >
> > Regards,
> > Manish
> >
> >
> > On Sun, May 31, 2026 at 2:42 PM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Sat, 30 May 2026 22:46:46 +0000
> > > Manish Baing <manishbaing2789@gmail.com> wrote:
> > >
> > > > Use empty brace initialization (= {}) instead of explicit memset()
> > > > to zero-initialize stack memory to simplify the code.
> > > >
> > > > No functional change.
> > >
> > > Isn't it also entirely pointless?
> > >
> > > -- David
> > >
> > > >
> > > > Signed-off-by: Manish Baing <manishbaing2789@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > > - Fixed a compilation error caused by a duplicate variable declaration caught
> > > >   by the kernel test robot.
> > > >
> > > >  drivers/net/mctp/mctp-serial.c | 3 +--
> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/mctp/mctp-serial.c b/drivers/net/mctp/mctp-serial.c
> > > > index 26c9a33fd636..df721ca4e07b 100644
> > > > --- a/drivers/net/mctp/mctp-serial.c
> > > > +++ b/drivers/net/mctp/mctp-serial.c
> > > > @@ -536,13 +536,12 @@ struct test_chunk_tx {
> > > >
> > > >  static void test_next_chunk_len(struct kunit *test)
> > > >  {
> > > > -     struct mctp_serial devx;
> > > > +     struct mctp_serial devx = { };
> > > >       struct mctp_serial *dev = &devx;
> > > >       int next;
> > > >
> > > >       const struct test_chunk_tx *params = test->param_value;
> > > >
> > > > -     memset(dev, 0x0, sizeof(*dev));
> > > >       memcpy(dev->txbuf, params->input, params->input_len);
> > > >       dev->txlen = params->input_len;
> > > >
> > >

^ permalink raw reply

* Re: Please backport bridge multicast exponential field encoding fix series to stable kernels
From: Ujjal Roy @ 2026-06-25 14:50 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, Ido Schimmel, David Ahern,
	Shuah Khan, Andy Roulin, Yong Wang, Petr Machata, stable, Greg KH,
	Greg Kroah-Hartman, Ujjal Roy, bridge, Kernel, Kernel,
	linux-kselftest
In-Reply-To: <20260625054005.0016.bridge-mcast@kernel.org>

On Thu, Jun 25, 2026 at 4:12 PM Sasha Levin <sashal@kernel.org> wrote:
>
> > Please backport the 5-patch bridge multicast exponential field
> > encoding series (726fa7da2d8c, 12cfb4ecc471, 95bfd196f0dc,
> > e51560f4220a, 529dbe762de0) to the stable kernels.
>
> I tried, but it doesn't apply to 7.1. Could you provide a backport please?
>
> --
> Thanks,
> Sasha

I will create patches on top of 7.1. But what about the other stable
releases? Do I have to create patches for every stable tree, and how
should I share them with you: via this email thread or through some other
process? I am new to backporting my changes to the stable trees.

^ permalink raw reply

* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Jakub Sitnicki @ 2026-06-25 14:47 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <20260623091008.1046547-1-maciej.fijalkowski@intel.com>

On Tue, Jun 23, 2026 at 11:10 AM +02, Maciej Fijalkowski wrote:
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
>
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
>
> Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
> and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
> address generation, preserving the BIDIRECTIONAL test's intent of using
> the proper logical UMEM half after the direction switch.
>
> Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/test_xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> index d8a1c0d40e5a..50a8dbacb63d 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> @@ -1169,8 +1169,8 @@ static int receive_pkts(struct test_spec *test)
>  static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
>  {
>  	u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
> +	struct xsk_umem_info *umem = ifobject->xsk_arr[0].umem_real;
>  	struct pkt_stream *pkt_stream = xsk->pkt_stream;
> -	struct xsk_umem_info *umem = xsk->umem;
>  	bool use_poll = ifobject->use_poll;
>  	struct pollfd fds = { };
>  	int ret;

IIUC, this works because umem_real happens to point at the shared/Tx UMEM
view (base_addr=0).

> @@ -1524,7 +1524,7 @@ static int thread_common_ops_tx(struct test_spec *test, struct ifobject *ifobjec
>  	umem_tx->base_addr = 0;
>  	umem_tx->next_buffer = 0;
>  
> -	ret = xsk_configure(test, ifobject, umem_tx, true);
> +	ret = xsk_configure(test, ifobject, umem_rx, true);
>  	if (ret)
>  		return ret;
>  	ifobject->xsk = &ifobject->xsk_arr[0];

And this bit works because thread_common_ops_tx is only invoked in
the shared-UMEM test case.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>

^ permalink raw reply

* Re: [PATCH v6 07/10] rust: configfs: use `LocalModule` for `THIS_MODULE`
From: Gary Guo @ 2026-06-25 14:40 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260624-fix-fops-owner-v6-7-5295e333cb3e@linux.dev>

On Wed Jun 24, 2026 at 4:00 PM BST, Alvin Sun wrote:
> Replace the `THIS_MODULE` static reference in the `configfs_attrs!`
> macro with `this_module::<LocalModule>()`, and update
> rnull to import `LocalModule` instead of `THIS_MODULE`, consistent
> with the move of `THIS_MODULE` into the `ModuleMetadata` trait.
>
> Assisted-by: opencode:glm-5.2
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>
> ---
>  drivers/block/rnull/configfs.rs | 6 ++----
>  rust/kernel/configfs.rs         | 8 +++++---
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/block/rnull/configfs.rs b/drivers/block/rnull/configfs.rs
> index c10a55fc58948..b2547ad1e5ddd 100644
> --- a/drivers/block/rnull/configfs.rs
> +++ b/drivers/block/rnull/configfs.rs
> @@ -1,9 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -use super::{
> -    NullBlkDevice,
> -    THIS_MODULE, //
> -};
> +use super::NullBlkDevice;
> +use crate::LocalModule;
>  use kernel::{
>      block::mq::gen_disk::{
>          GenDisk,
> diff --git a/rust/kernel/configfs.rs b/rust/kernel/configfs.rs
> index 2339c6467325d..c31d7882e216d 100644
> --- a/rust/kernel/configfs.rs
> +++ b/rust/kernel/configfs.rs
> @@ -875,7 +875,7 @@ fn as_ptr(&self) -> *const bindings::config_item_type {
>  ///                 configfs::Subsystem<Configuration>,
>  ///                 Configuration
>  ///                 >::new_with_child_ctor::<N,Child>(
> -///             &THIS_MODULE,
> +///             ::kernel::module::this_module::<crate::LocalModule>(),
>  ///             &CONFIGURATION_ATTRS
>  ///         );
>  ///
> @@ -1021,7 +1021,8 @@ macro_rules! configfs_attrs {
>  
>                      static [< $data:upper _TPE >] : $crate::configfs::ItemType<$container, $data>  =
>                          $crate::configfs::ItemType::<$container, $data>::new::<N>(
> -                            &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                            $crate::module::this_module::<LocalModule>(),

^ You only changed one single place. This is still plain `LocalModule`.

Best,
Gary

> +                            &[<$ data:upper _ATTRS >]
>                          );
>                  )?
>  
> @@ -1030,7 +1031,8 @@ macro_rules! configfs_attrs {
>                          $crate::configfs::ItemType<$container, $data>  =
>                              $crate::configfs::ItemType::<$container, $data>::
>                              new_with_child_ctor::<N, $child>(
> -                                &THIS_MODULE, &[<$ data:upper _ATTRS >]
> +                                $crate::module::this_module::<LocalModule>(),
> +                                &[<$ data:upper _ATTRS >]
>                              );
>                  )?
>  



^ permalink raw reply

* Re: [PATCH v6 10/10] rust: module: update MAINTAINERS to cover module.rs
From: Gary Guo @ 2026-06-25 14:39 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260624-fix-fops-owner-v6-10-5295e333cb3e@linux.dev>

On Wed Jun 24, 2026 at 4:00 PM BST, Alvin Sun wrote:
> Module types now live in `rust/kernel/module.rs` alongside
> `rust/kernel/module_param.rs`. Update the MODULE SUPPORT file pattern
> from `rust/kernel/module_param.rs` to `rust/kernel/module*.rs` so both
> files are covered.
>
> Assisted-by: opencode:glm-5.2

Did you actually use an LLM for this patch at all? :)

> Link: https://lore.kernel.org/rust-for-linux/8ea21b29-9baf-4926-a16f-7d21c5a1a1b8@suse.com
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

This patch should probably be squashed into the actual move, i.e. patch 1.

Best,
Gary

> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e035a3be797c4..74733de3e41ee 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17984,7 +17984,7 @@ F:	include/linux/module*.h
>  F:	kernel/module/
>  F:	lib/test_kmod.c
>  F:	lib/tests/module/
> -F:	rust/kernel/module_param.rs
> +F:	rust/kernel/module*.rs
>  F:	rust/macros/module.rs
>  F:	scripts/module*
>  F:	tools/testing/selftests/kmod/



^ permalink raw reply

* [PATCH net v3] tipc: fix out-of-bounds read in broadcast Gap ACK blocks
From: Samuel Page @ 2026-06-25 14:38 UTC (permalink / raw)
  To: Jon Maloy
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Tung Quang Nguyen, netdev, tipc-discussion,
	linux-kernel, Samuel Page

A broadcast PROTOCOL/STATE_MSG can carry a Gap ACK blocks record in its
data area. tipc_get_gap_ack_blks() only verifies that the record's len
field is self-consistent with its ugack_cnt/bgack_cnt counts
(sz == struct_size(p, gacks, ugack_cnt + bgack_cnt)); it does not check
that the record actually fits in the message data area, msg_data_sz().

The unicast caller tipc_link_proto_rcv() bounds it ("if (glen > dlen)
break;"), but the broadcast caller tipc_bcast_sync_rcv() discards the
returned size, so tipc_link_advance_transmq() copies the record off the
receive skb with an attacker-controlled count:

	this_ga = kmemdup(ga, struct_size(ga, gacks, ga->bgack_cnt),
			  GFP_ATOMIC);

A TIPC neighbour that negotiated TIPC_GAP_ACK_BLOCK triggers it with one
ordinary broadcast STATE_MSG (msg_bc_ack_invalid() clear), sized so its
data area is short, carrying a Gap ACK record with len = 0x400,
bgack_cnt = 0xff and ugack_cnt = 0. len then equals
struct_size(p, gacks, 255), so the consistency check passes and ga is
non-NULL; kmemdup() reads struct_size(ga, gacks, 255) = 1024 bytes out
of the much smaller skb:

  BUG: KASAN: slab-out-of-bounds in kmemdup_noprof+0x48/0x60
  Read of size 1024 at addr ffff0000c7030d38 by task poc864/69
  Call trace:
   kmemdup_noprof+0x48/0x60
   tipc_link_advance_transmq+0x86c/0xb80
   tipc_link_bc_ack_rcv+0x19c/0x1e0
   tipc_bcast_sync_rcv+0x1c4/0x2c4
   tipc_rcv+0x85c/0x1340
   tipc_l2_rcv_msg+0xac/0x104
  The buggy address belongs to the object at ffff0000c7030d00
   which belongs to the cache skbuff_small_head of size 704
  The buggy address is located 56 bytes inside of
   allocated 704-byte region [ffff0000c7030d00, ffff0000c7030fc0)

The copied-out bytes are subsequently consumed as gap/ack values, but
the read is already out of bounds at the kmemdup() regardless of how
they are used.
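
The difference between "self-consistent" and "fits in the data area" can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: the `demo_*` types only mirror the layout implied by the sizes quoted above (a 4-byte header followed by 4-byte entries), `len` is kept in host byte order for simplicity, and `demo_record_ok()` is a hypothetical helper, not a TIPC function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only: 4-byte entries after a 4-byte header. */
struct demo_gap_ack {
	uint16_t ack;
	uint16_t gap;
};

struct demo_gap_ack_blks {
	uint16_t len;
	uint8_t ugack_cnt;
	uint8_t bgack_cnt;
	struct demo_gap_ack gacks[];
};

/* Userspace equivalent of struct_size(p, gacks, cnt) for this layout. */
static size_t demo_record_size(unsigned int cnt)
{
	return sizeof(struct demo_gap_ack_blks) +
	       (size_t)cnt * sizeof(struct demo_gap_ack);
}

/*
 * Accept a record only if its length field matches its counts AND the
 * record fits in the bytes actually received (data_sz).
 */
static bool demo_record_ok(uint16_t len, uint8_t ugack_cnt,
			   uint8_t bgack_cnt, size_t data_sz)
{
	size_t need = demo_record_size((unsigned int)ugack_cnt + bgack_cnt);

	if (len != need)
		return false;	/* counts and len disagree */
	return need <= data_sz;	/* the bound the broadcast path lacked */
}
```

With `len = 0x400`, `bgack_cnt = 0xff` and `ugack_cnt = 0`, the first test passes (4 + 255 * 4 = 1024), so only the second comparison rejects a record carried in a short data area.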

The unicast STATE path drops such a message: "if (glen > dlen) break;"
skips the rest of STATE_MSG handling and the skb is freed. Make the
broadcast path drop it too. tipc_bcast_sync_rcv() now bounds the record
against msg_data_sz() and, when it does not fit, reports it back through
tipc_node_bc_sync_rcv() to tipc_rcv() so the skb is discarded rather than
processed. ga is not cleared on this path: ga == NULL already means
"legacy peer without Selective ACK", a distinct legitimate state.

Fixes: d7626b5acff9 ("tipc: introduce Gap ACK blocks for broadcast link")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
v3, per review of v2 [2]:
 - reverse-xmas-tree order for the new 'glen' declaration in
   tipc_bcast_sync_rcv().
 - tipc_node_bc_sync_rcv() now checks the validity flag and returns
   immediately when the Gap ACK record is malformed, rather than relying on
   the (zero) return code to fall through.

v2, per review of v1 [1]:
 - v1 cleared 'ga' on an oversized Gap ACK record, which let the malformed
   STATE message be processed as a legacy (no Selective ACK) one rather than
   dropped.  v2 drops it instead, matching the unicast STATE path:
   tipc_bcast_sync_rcv() reports the bad record through a bool output
   parameter, propagated by tipc_node_bc_sync_rcv() to tipc_rcv(), which
   discards the skb.
 - v1 touched only net/tipc/bcast.c; v2 also touches net/tipc/{bcast.h,node.c}.

[1] https://lore.kernel.org/netdev/20260623135443.3662041-1-sam@bynar.io/
[2] https://lore.kernel.org/netdev/20260624135629.727262-1-sam@bynar.io/

For reference, an earlier thread proposed validating inside
tipc_get_gap_ack_blks():
  https://lore.kernel.org/netdev/1316452e465e9a96fce44ec15130a14f3872149f.1775809727.git.caoruide123@gmail.com/

 net/tipc/bcast.c | 22 ++++++++++++++--------
 net/tipc/bcast.h |  2 +-
 net/tipc/node.c  | 15 ++++++++++++---
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 76a1585d3f6b..10d1ec593084 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -497,12 +497,13 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
  */
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
 			struct tipc_msg *hdr,
-			struct sk_buff_head *retrq)
+			struct sk_buff_head *retrq, bool *valid)
 {
 	struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
 	struct tipc_gap_ack_blks *ga;
 	struct sk_buff_head xmitq;
 	int rc = 0;
+	u16 glen;
 
 	__skb_queue_head_init(&xmitq);
 
@@ -510,13 +511,18 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
 	if (msg_type(hdr) != STATE_MSG) {
 		tipc_link_bc_init_rcv(l, hdr);
 	} else if (!msg_bc_ack_invalid(hdr)) {
-		tipc_get_gap_ack_blks(&ga, l, hdr, false);
-		if (!sysctl_tipc_bc_retruni)
-			retrq = &xmitq;
-		rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
-					  msg_bc_gap(hdr), ga, &xmitq,
-					  retrq);
-		rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
+		glen = tipc_get_gap_ack_blks(&ga, l, hdr, false);
+		if (glen > msg_data_sz(hdr)) {
+			/* Malformed Gap ACK blocks; caller drops the msg */
+			*valid = false;
+		} else {
+			if (!sysctl_tipc_bc_retruni)
+				retrq = &xmitq;
+			rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
+						  msg_bc_gap(hdr), ga, &xmitq,
+						  retrq);
+			rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq);
+		}
 	}
 	tipc_bcast_unlock(net);
 
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 2d9352dc7b0e..55d17b5413e1 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -97,7 +97,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
 			struct tipc_msg *hdr);
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
 			struct tipc_msg *hdr,
-			struct sk_buff_head *retrq);
+			struct sk_buff_head *retrq, bool *valid);
 int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg,
 			struct tipc_link *bcl);
 int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 97aa970a0d83..8e4ef2630ae4 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1831,12 +1831,15 @@ static void tipc_node_mcast_rcv(struct tipc_node *n)
 }
 
 static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
-				  int bearer_id, struct sk_buff_head *xmitq)
+				  int bearer_id, struct sk_buff_head *xmitq,
+				  bool *valid)
 {
 	struct tipc_link *ucl;
 	int rc;
 
-	rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq);
+	rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq, valid);
+	if (!*valid)
+		return;
 
 	if (rc & TIPC_LINK_DOWN_EVT) {
 		tipc_node_reset_links(n);
@@ -2140,12 +2143,18 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 
 	/* Ensure broadcast reception is in synch with peer's send state */
 	if (unlikely(usr == LINK_PROTOCOL)) {
+		bool valid = true;
+
 		if (unlikely(skb_linearize(skb))) {
 			tipc_node_put(n);
 			goto discard;
 		}
 		hdr = buf_msg(skb);
-		tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq);
+		tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq, &valid);
+		if (!valid) {
+			tipc_node_put(n);
+			goto discard;
+		}
 	} else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) {
 		tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr);
 	}

base-commit: 02f144fbb4c86c360495d33debe307cb46a57f95
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v6 01/10] rust: module: move module types into `module.rs`
From: Gary Guo @ 2026-06-25 14:37 UTC (permalink / raw)
  To: Alvin Sun, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Daniel Almeida,
	Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar, Breno Leitao,
	Jens Axboe, Dave Ertman, Leon Romanovsky, Igor Korotin,
	FUJITA Tomonori, Bjorn Helgaas, Krzysztof Wilczyński,
	Arve Hjønnevåg, Todd Kjos, Christian Brauner,
	Carlos Llamas
  Cc: rust-for-linux, linux-modules, driver-core, dri-devel, nova-gpu,
	linux-kselftest, kunit-dev, linux-block, linux-kernel, netdev,
	linux-pci
In-Reply-To: <20260624-fix-fops-owner-v6-1-5295e333cb3e@linux.dev>

On Wed Jun 24, 2026 at 4:00 PM BST, Alvin Sun wrote:
> Move `Module`, `InPlaceModule`, `ModuleMetadata` and `ThisModule` from
> `lib.rs` into a new `rust/kernel/module.rs`. Re-export them from `lib.rs`
> to avoid tree-wide changes.
> 
> Switch six bus driver registrations from `module.0` to the public
> `ThisModule::as_ptr()` accessor, since the field is no longer visible
> outside the new `module` submodule.
> 
> No functional change.
> 
> Assisted-by: opencode:glm-5.2
> Acked-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Alvin Sun <alvin.sun@linux.dev>

Suggested-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/all/DJFIQPLOVO4T.1K8T0VZM30LDA@garyguo.net/
Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/auxiliary.rs |  2 +-
>  rust/kernel/i2c.rs       |  2 +-
>  rust/kernel/lib.rs       | 75 +++++-------------------------------------------
>  rust/kernel/module.rs    | 71 +++++++++++++++++++++++++++++++++++++++++++++
>  rust/kernel/net/phy.rs   |  6 +++-
>  rust/kernel/pci.rs       |  2 +-
>  rust/kernel/platform.rs  |  2 +-
>  rust/kernel/usb.rs       |  2 +-
>  8 files changed, 88 insertions(+), 74 deletions(-)


^ permalink raw reply

* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Jakub Kicinski @ 2026-06-25 14:36 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, pabeni, horms,
	tushar.vyavahare, kerneljasonxing
In-Reply-To: <aj0EYNt7Sr1dD96N@boxer>

On Thu, 25 Jun 2026 12:35:12 +0200 Maciej Fijalkowski wrote:
> On Wed, Jun 24, 2026 at 07:33:26PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Jun 2026 11:10:08 +0200 Maciej Fijalkowski wrote:  
> > > Subject: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test  
> > 
> > Do you want it in net? Either way - we'll need a rebase  
> 
> I have not checked if this has been -net propagated already, but the rule
> of thumb on bpf side was that all selftests related effort goes to -next.
> Is it different on netdev side?

We prefer -next too, but during the merge window net-next is closed.

What we definitely don't want is a -next patch with a Fixes tag.
So either net or drop the tag, please.

^ permalink raw reply

* Re: [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM)
From: Eric Dumazet @ 2026-06-25 14:34 UTC (permalink / raw)
  To: xietangxin
  Cc: Menglong Dong, davem, kuba, pabeni, jmaloy, menglong8.dong,
	kuniyu, horms, willemb, netdev, linux-kernel, linux-stable
In-Reply-To: <585bc5ca-8348-49e6-bbce-acbf6b99d912@yeah.net>

On Thu, Jun 25, 2026 at 6:22 AM xietangxin <xietangxin@yeah.net> wrote:
>
>
>
> On 6/8/2026 7:55 PM, Menglong Dong wrote:
> > On 2026/6/4 16:22 xietangxin <xietangxin@yeah.net> wrote:
> >> Hi all,
> >>
> >> We have observed a TCP connection deadlock on stable 6.6 under heavy stress testing.
> >>
> >> 1.Both Peer A and Peer B enter the ICSK_ACK_NOMEM branch in tcp_select_window().
> >> After commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
> >> Both peers freeze their rcv_nxt and set rcv_wnd = 0.
> >>
> >> 2.Prior to freezing, both sides had already sent out flight data.
> >> Since both sides are dropping incoming data packets due to OOM, rcv_nxt stops advancing,
> >> but the peer's seq of subsequent packets continues to grow.
> >>
> >> 3.When Peer A receives Peer B's Zero Window ACK,
> >> the packet's seq is far ahead of Peer A's frozen rcv_nxt.
> >> Both peers drop each other's packet, also no Zero Window Probes are triggered
> >> because snd_wnd is never updated to 0.
> >>
> >
> > Hi,
> >
> > The problem you addressed is already fixed in this commit:
> > 0e24d17bd966 ("tcp: implement RFC 7323 window retraction receiver requirements"),
> > which hasn't been picked to the 6.6 branch.
> >
> > That patch doesn't have the Fix tag, so I'm not sure if it will be picked
> > to the 6.6 branch. Just CC the linux-stable :)
> >
> > Thanks!
> > Menglong Dong
> >
> >>
> >> Simplified Packet Trace:
> >>
> >> Assume Peer A's rcv_nxt = 1000, and Peer B's rcv_nxt = 5000 initially.
> >>
> >> Time  Dir      Type        Seq   Ack   Win  Len  Status
> >> ------------------------------------------------------------------------
> >> T1:   B -> A   [PSH, ACK]  1000  5000  3000 100  (A hits OOM, rcv_nxt=1000)
> >> T2:   B -> A   [ACK]       1100  5000  3000 200  (Dropped due to A's OOM)
> >> T3:   B -> A   [PSH, ACK]  1300  5000  3000 200  (Dropped due to A's OOM)
> >>
> >> T4:   A -> B   [PSH, ACK]  5000  1000  3000 100  (B hits OOM, rcv_nxt=5000)
> >> T5:   A -> B   [ACK]       5100  1000  3000 200  (Dropped due to B's OOM)
> >> T6:   A -> B   [PSH, ACK]  5300  1000  3000 200  (Dropped due to B's OOM)
> >>
> >> -- Both sides are now in OOM. B's Seq is 1500; A's Seq is 5500 --
> >>
> >> T7:   B -> A   [ZeroWin]   1500  5000  0    0    (Dropped: Seq 1500 != 1000)
> >> T8:   A -> B   [ZeroWin]   5500  1000  0    0    (Dropped: Seq 5500 != 5000)
> >> T9:   A -> B   [WinUpdate] 5500  1000  20   0    (Dropped: Seq 5500 != 5000)
> >>
> >> Should we relax the sequence check in tcp_sequence() for zero window ACK?
> >>
> >> Any feedback or guidance would be greatly appreciated.
> >>
> >> --
> >> Best regards,
> >> Tangxin Xie
> >>
> >>
> >>
> >
> >
> >
>
> Hi,
>
> We observed a throughput regression (dropping from ~1GB/s to 100MB/s)
> in our test environment after commit 0e24d17bd966
> ("tcp: implement RFC 7323 window retraction receiver requirements").
>

Could you provide instructions on how you/we can deterministically
reproduce this issue?

> When rcv_buf comes under memory pressure, tcp_clamp_window() is triggered
> and rcv_ssthresh is strictly capped to 2 * advmss.
> Subsequently, even after the user completely consumes the data and releases
> a massive amount of free_space, tcp_select_window() is still heavily
> suppressed by the clamped rcv_ssthresh. As a result, the receiver advertises
> an extremely small window (Win=23) to the peer.
>
> The sender cannot transmit any new data segments, until the sender's RTO timer
> expires and triggers a slow-start recovery. This 200ms silence window slashes
> our bandwidth by 90%.
>
>
> No.   Time           Source       Destination  Info
> -----------------------------------------------------------------------------------------------
> 1045  08:16:06.8005  192.168.1.9  192.168.1.10  [TCP ZeroWindow] 57334 -> 6666 [PSH, ACK] Win=0
> 1052  08:16:06.8013  192.168.1.9  192.168.1.10  [TCP Window Update] 57334 -> 6666 [ACK] Win=23
> 1055  08:16:06.8036  192.168.1.10  192.168.1.9  6666 -> 57334 [ACK] Seq=2999704568 Ack=2416286095
> =========================== 200ms  SILENCE (RTO WAITING) ===================================
> 1088  08:16:07.0056  192.168.1.10  192.168.1.9  [TCP Retransmission] 6666 -> 57334 Len=1448
> 1090  08:16:07.0060  192.168.1.10  192.168.1.9  [TCP Retransmission] Len=2896
>
> --
> Best regards,
> Tangxin Xie
>

^ permalink raw reply

* Re: [PATCH net 1/2] net: dsa: mxl862xx: avoid unaligned 16-bit access in api_wrap
From: Jakub Kicinski @ 2026-06-25 14:31 UTC (permalink / raw)
  To: David Laight
  Cc: Daniel Golle, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260625084459.7393409f@pumpkin>

On Thu, 25 Jun 2026 08:44:59 +0100 David Laight wrote:
> > struct mxl862xx_mac_table_clear {
> > 	u8 type;
> > 	u8 port_id;
> > } __packed;  
> 
> Does that one need an aligned(2) ?

Right, I meant that if we need to remember to do that instead of
depending on natural alignment - chances are someone will forget
while adding a new register, and the bug will be back.
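
To illustrate the alignment point, here is a small sketch assuming GCC or Clang attribute syntax. The `demo_*` names are hypothetical and only mirror the shape of the two-byte struct quoted in this thread. A struct made only of `u8` members has an alignment of 1, so nothing guarantees that a 16-bit access to it is aligned unless `aligned(2)` is requested explicitly.

```c
#include <stdint.h>

/* Same shape as the quoted struct: two bytes, packed, alignment 1. */
struct demo_clear_packed {
	uint8_t type;
	uint8_t port_id;
} __attribute__((packed));

/* Identical layout, but with 2-byte alignment requested explicitly. */
struct demo_clear_aligned {
	uint8_t type;
	uint8_t port_id;
} __attribute__((packed, aligned(2)));

/* Report the alignment the compiler assigns to each variant. */
static unsigned int demo_align_packed(void)
{
	return (unsigned int)_Alignof(struct demo_clear_packed);
}

static unsigned int demo_align_aligned(void)
{
	return (unsigned int)_Alignof(struct demo_clear_aligned);
}
```

Both structs are two bytes long; only the second is guaranteed to start on an even address, and that guarantee is the annotation someone would have to remember for each new register struct.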

> > So I guess the "just don't pack" will have some corner cases, too.  

^ permalink raw reply
