Netdev List


* Re: [Intel-wired-lan] [PATCH v2] dpf: fix UAF and double free in idpf_plug_vport_aux_dev() error path
From: Guangshuo Li @ 2026-04-15  1:47 UTC
  To: Jacob Keller
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Hay,
	Tatyana Nikolova, Madhu Chittim, intel-wired-lan, netdev,
	linux-kernel, Greg Kroah-Hartman, stable
In-Reply-To: <5da15f31-e9af-4f8d-82fd-eac29a6d98f6@intel.com>

Hi Jacob,

Thanks for reviewing.

On Wed, 15 Apr 2026 at 05:03, Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
> This doesn't look right. The commit message analysis seems to match this
> fix from Greg KH:
>
> https://lore.kernel.org/intel-wired-lan/2026041432-tapestry-condition-22ff@gregkh/
>
> But the changes do not make any sense to me. It looks like a poorly done
> AI-generated "fix" which is not correct. Greg's version does look like
> it properly resolves this.
>
> > v2:
> >   - note that the issue was identified by my static analysis tool
> >   - and confirmed by manual review
> >
>
> What even is this change log?? I see that version was sent and everyone
> else was sane enough to just silently reject or ignore the v1...
>
> >  drivers/net/ethernet/intel/idpf/idpf_idc.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > index 6dad0593f7f2..2a18907643fc 100644
> > --- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > +++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> > @@ -59,6 +59,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >       char name[IDPF_IDC_MAX_ADEV_NAME_LEN];
> >       struct auxiliary_device *adev;
> >       int ret;
> > +     int adev_id;
> >
>
> You create a local variable here...
>
> >       iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
> >       if (!iadev)
> > @@ -74,11 +75,14 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >               goto err_ida_alloc;
> >       }
> >       adev->id = ret;
> > +     adev->id = adev_id;
>
> adev_id is never initialized, so you assign a random garbage
> uninitialized value. This is obviously wrong and will lead to worse
> errors than the failed cleanup.
>
> I'm rejecting this patch in favor of the clearly appropriate fix from Greg.
>
> >       adev->dev.release = idpf_vport_adev_release;
> >       adev->dev.parent = &cdev_info->pdev->dev;
> >       sprintf(name, "%04x.rdma.vdev", cdev_info->pdev->vendor);
> >       adev->name = name;
> >
> > +     /* iadev is owned by the auxiliary device */
> > +     iadev = NULL;
> >       ret = auxiliary_device_init(adev);
> >       if (ret)
> >               goto err_aux_dev_init;
> > @@ -92,7 +96,7 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
> >  err_aux_dev_add:
> >       auxiliary_device_uninit(adev);
> >  err_aux_dev_init:
> > -     ida_free(&idpf_idc_ida, adev->id);
> > +     ida_free(&idpf_idc_ida, adev_id);
> >  err_ida_alloc:
> >       vdev_info->adev = NULL;
> >       kfree(iadev);
>

You are right that the v2 patch as sent is incomplete. That was my
mistake when preparing/sending v2: it accidentally dropped the
"adev_id = ret;" assignment, which made that version incorrect.

For reference, the original v1 patch is here:

https://lkml.org/lkml/2026/3/21/421

In v1, adev_id was assigned from ret before use, so I believe that
particular uninitialized-variable issue was introduced in the v2
posting.

Sorry for the confusion caused by the broken v2 posting.
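
As a rough illustration of the pattern under discussion, here is a toy
userspace sketch. It is not the idpf code: toy_dev, toy_id_free(),
toy_dev_release() and toy_plug() are made-up names standing in for the
auxiliary device wrapper, ida_free(), the release callback and
idpf_plug_vport_aux_dev(). The only point is that the ID is copied into a
local before anything can release the object, so the error path never
reads from freed memory.

```c
#include <stdlib.h>

/* Toy stand-in for the object that embeds the auxiliary device. */
struct toy_dev {
	int id;
};

/* Records the last ID handed back, standing in for ida_free(). */
static int toy_last_freed_id = -1;

static void toy_id_free(int id)
{
	toy_last_freed_id = id;
}

/* Stands in for the release callback, which frees the object. */
static void toy_dev_release(struct toy_dev *dev)
{
	free(dev);
}

/*
 * allocated_id plays the role of the ida_alloc() return value;
 * fail_add forces the error path. Returns 0 on success.
 */
static int toy_plug(int allocated_id, int fail_add)
{
	struct toy_dev *dev;
	int dev_id;

	dev = malloc(sizeof(*dev));
	if (!dev)
		return -1;

	dev_id = allocated_id;	/* the assignment that v2 lost */
	dev->id = dev_id;

	if (fail_add) {
		toy_dev_release(dev);	/* dev must not be touched after this */
		toy_id_free(dev_id);	/* local copy, not dev->id */
		return -2;
	}

	toy_dev_release(dev);	/* toy only: nothing keeps the device around */
	return 0;
}
```

Calling toy_plug(7, 1) takes the error path and hands ID 7 back through
the local copy; without the "dev_id = allocated_id" line the value passed
to toy_id_free() would be indeterminate.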

Thanks,
Guangshuo


* Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
From: fengchengwen @ 2026-04-15  1:47 UTC
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Bjorn Helgaas, linux-rdma, linux-pci, netdev,
	dri-devel, Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
In-Reply-To: <20260414151125.GF2577880@ziepe.ca>

On 4/14/2026 11:11 PM, Jason Gunthorpe wrote:
> On Tue, Apr 14, 2026 at 10:46:00PM +0800, fengchengwen wrote:
>>    We have a real platform requirement:
>>
>>      * 1. Devices in TPH Device-Specific Mode with no standard ST table
>>      * 2. Steering Tags must be obtained from ACPI _DSM (kernel-only)
>>      * 3. Devices are fully managed by userspace drivers (VFIO/UIO)
>>      * 4. Userspace must program STs into vendor-specific registers
> 
> No, this is nonsensical too.
> 
> If you want to control the steering tags for MMIO BAR memory exposed
> by VFIO then the DMABUF mechanism Keith & co has been working on is
> the correct approach.
> 
> If the VFIO user needs to control steering tags for the device it is
> directly controlling then it must do that through VFIO ioctls.
> 
> Nobody messes around with other devices under the covers of the
> operating kernel driver. Stop proposing that.

Understood.

For VFIO-passed devices that are fully under userspace control,
we will implement the TPH Steering Tag query interface
exclusively through VFIO ioctls, not sysfs.

This will allow userspace to query per-CPU Steering Tags
from platform firmware (ACPI _DSM) for the VFIO device,
which is fully under its control.

Thanks

> 
> Jason



* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Jiayuan Chen @ 2026-04-15  1:47 UTC
  To: mkf, bpf
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Ahern, netdev, linux-doc, linux-kernel
In-Reply-To: <42c1fed84a84519c2432163aa46f587f2d624fef.camel@163.com>


On 4/14/26 11:37 PM, mkf wrote:
> On Tue, 2026-04-14 at 18:57 +0800, Jiayuan Chen wrote:


[...]

> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -475,12 +475,21 @@ struct tcp_sock {
>   	u8	bpf_sock_ops_cb_flags;  /* Control calling BPF programs
>   					 * values defined in uapi/linux/tcp.h
>   					 */
> -	u8	bpf_chg_cc_inprogress:1; /* In the middle of
> +	u8	bpf_chg_cc_inprogress:1, /* In the middle of
>   					  * bpf_setsockopt(TCP_CONGESTION),
>   					  * it is to avoid the bpf_tcp_cc->init()
>   					  * to recur itself by calling
>   					  * bpf_setsockopt(TCP_CONGESTION, "itself").
>   					  */
> +		bpf_hdr_opt_len_cb_inprogress:1; /* It is set before invoking the
> +						  * callback so that a nested
> +						  * bpf_setsockopt(TCP_NODELAY) or
> +						  * bpf_setsockopt(TCP_CORK) cannot
> +						  * trigger tcp_push_pending_frames(),
> +						  * which would call tcp_current_mss()
> +						  * -> bpf_skops_hdr_opt_len(), causing
> +						  * infinite recursion.
> +						  */
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
>   #else
>   #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..518699429a7a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5483,6 +5483,10 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
>   	if (sk->sk_protocol != IPPROTO_TCP)
>   		return -EINVAL;
>   
> +	if ((optname == TCP_NODELAY || optname == TCP_CORK) &&
> +	    tcp_sk(sk)->bpf_hdr_opt_len_cb_inprogress)
> +		return -EBUSY;
> +
> TCP_CORK is not supported in sol_tcp_sockopt(), which returns -EINVAL for
> it by default. Putting the check here would also prevent us from calling
> getsockopt(TCP_NODELAY) below.
>
>>   	switch (optname) {
>>   	case TCP_NODELAY:
>>   	case TCP_MAXSEG:
>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>> index dafb63b923d0..fb06c464ac16 100644
>> --- a/net/ipv4/tcp_minisocks.c
>> +++ b/net/ipv4/tcp_minisocks.c
>> @@ -663,6 +663,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
>>   	RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
>>   
>>   	newtp->bpf_chg_cc_inprogress = 0;
>> +	newtp->bpf_hdr_opt_len_cb_inprogress = 0;
>>   	tcp_bpf_clone(sk, newsk);
>>   
>>   	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 326b58ff1118..c9654e690e1a 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -475,6 +475,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   				  unsigned int *remaining)
>>   {
>>   	struct bpf_sock_ops_kern sock_ops;
>> +	struct tcp_sock *tp = tcp_sk(sk);
>>   	int err;
>>   
>>   	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
>> @@ -519,7 +520,9 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
>>   	if (skb)
>>   		bpf_skops_init_skb(&sock_ops, skb, 0);
>>   
>> +	tp->bpf_hdr_opt_len_cb_inprogress = 1;
> We check BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG before calling
> BPF_CGROUP_RUN_PROG_SOCK_OPS_SK; could this flag be used for the same
> purpose, so that we don't need to add an extra field?
>
> 	if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
> 					   BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)) ||
> 	    !*remaining)
> 		return;


Hi Martin, I saw your patch. Your solution is better, please ignore mine :)
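
For reference, the guard described in the quoted patch comment (set a flag
before invoking the callback, refuse the nested sockopt while it is set)
can be sketched in plain C as below. This is a toy model under stated
assumptions, not kernel code and not the alternative fix referred to
above: toy_sock, toy_callback(), toy_hdr_opt_len() and toy_setsockopt()
are hypothetical names standing in for tcp_sock, the BPF program,
bpf_skops_hdr_opt_len() and the sockopt path.

```c
#include <errno.h>

struct toy_sock {
	unsigned int cb_inprogress:1;	/* set while the callback runs */
	int nested_ret;			/* what the nested call returned */
};

static int toy_setsockopt(struct toy_sock *sk);

/* Stands in for a BPF program that calls setsockopt from the callback. */
static void toy_callback(struct toy_sock *sk)
{
	sk->nested_ret = toy_setsockopt(sk);
}

/* Stands in for bpf_skops_hdr_opt_len(): flag up, run callback, flag down. */
static void toy_hdr_opt_len(struct toy_sock *sk)
{
	sk->cb_inprogress = 1;
	toy_callback(sk);
	sk->cb_inprogress = 0;
}

/* Stands in for the sockopt path that ends up pushing pending frames. */
static int toy_setsockopt(struct toy_sock *sk)
{
	if (sk->cb_inprogress)
		return -EBUSY;	/* refuse instead of recursing */

	toy_hdr_opt_len(sk);	/* would recurse without bound if unguarded */
	return 0;
}
```

The outer toy_setsockopt() call succeeds, while the call made from inside
the callback sees the flag and gets -EBUSY, which is what breaks the
cycle.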





* [ANN] netdev development stats for 7.1
From: Jakub Kicinski @ 2026-04-15  1:26 UTC
  To: netdev

Hi!

Intro
-----

As is tradition here are the development statistics based on mailing
list traffic on netdev@vger.

These stats are somewhat like LWN stats: https://lwn.net/Articles/1004998/
but more focused on mailing list participation. And by participation
we mean reviewing code more than producing patches.

In particular "review score" tries to capture the balance between
reviewing other people's code vs posting patches. It's roughly
number of patches reviewed minus number of patches posted. 
Those who post more than they review will have a negative score.

Previous 3 reports:
 - for 6.18: https://lore.kernel.org/20251002171032.75263b18@kernel.org 
 - for 6.19: https://lore.kernel.org/20251202175548.6b5eb80e@kernel.org
 - for 7.0:  https://lore.kernel.org/20260212124208.187e53ae@kernel.org

General stats
-------------

This has been subjectively a pretty crazy release. Last week especially.
It's bleakly reassuring to see that the numbers confirm how we feel.

Let us use 6.18 as a point of reference, since the last release of the year
is usually the biggest one. 7.1 had the same linux-next size as 6.18.
The core networking maintainers committed slightly more changes than 
in the 6.18 cycle (1531 / 24 a day / +1.6%). The number of messages 
on the list was dramatically higher (318 msg a day / +21.0%), and so
was the number of people we've interacted with (874 / +12.0%).
The number of people may be slightly under-counted; we noticed that
some authors of semi-automated fixes share an email address(!?)

The tenure histograms confirm that we are dealing with a lot of
newcomers:

  Time since poster's first commit in 6.18
  no commit |  76 | **************************************************
   0- 3mo   |  33 | *********************
   3- 6mo   |  18 | ***********
  6mo-1yr   |  30 | *******************

  Time since poster's first commit in 7.1
  no commit | 107 | **************************************************
   0- 3mo   |  61 | ****************************
   3- 6mo   |  15 | *******
  6mo-1yr   |  29 | *************

In other words, the number of authors increased by 81, and the number of
people with less than 3mo since their first commit increased by 59. This is
not surprising, but newcomers require a lot more hand holding. And something
tells me the churn of newcomers will only go up.

The review coverage continues to drop, and is now the lowest recorded
(42.9% of changes being reviewed by someone from a different company
than the author). At the same time patches are reposted more often,
with average number of revisions going up by 10%.

AI reviews
----------

In the previous cycle we introduced a netdev AI review bot which
was using Chris Mason's review prompts. This cycle saw the introduction
of Sashiko, which _seems_ much better at spotting bugs, but most of the
bugs it finds are unrelated to the submission. Our bot intentionally
tried to exclude complaining about existing problems. Sashiko also
"asks questions" about potential issues which it is unsure are in
fact a problem. This may be fine during review in development, but
upstream it means that maintainers are now spending around 50% of their
time trying to disprove AI reviews. Last but not least, because
the reviews are public immediately, we have people reacting to them,
spamming the list and often incorporating incorrect feedback.

None of this is meant as a criticism of the tools. We are lucky
to have in the community people willing to invest their time
to build such tools, and companies willing to sponsor the LLM tokens.
That said, the combination of the extra work AI tools put on maintainers
and the ease with which newcomers can produce plausible but incorrect code
is pushing us beyond our limits. Especially when the plausible looking code
is "fixing bugs" in 20 year old code which none of the current maintainers
have any interest in or, frankly, sense of responsibility for.

One more thing to note before I end this rant. The LLMs are expensive
and/or capacity constrained. While a lot of the issues could be
addressed by LLMs doing more research, the current prompts already
eat our entire budgets. Real engineering work is required to make
the LLMs more efficient by building tools and MCP endpoints around
the LLMs. It is hard to find time to do this work when we average
150 patches sent to the list on any working day.

I'd like us to gather up during the next bi-weekly call slot and
discuss some ideas on how we can survive the changes.

Testing
-------

Percentage of changes to selftests stubbornly remains at around 10% of
all commits. Here are the top contributors:

Contributions to selftests:
   1 [ 34] Jakub Kicinski
   2 [ 10] Ioana Ciornei
   3 [  7] Aleksei Oladko
   4 [  7] Simon Baatz
   5 [  6] Jiayuan Chen
   6 [  6] Dimitri Daskalakis
   7 [  6] Bobby Eshleman
   8 [  5] David Wei
   9 [  5] Allison Henderson
  10 [  4] Maciej Fijalkowski

Good news on the HW testing side, we now have machines with 4 NICs in
our labs (all the 25G+ NICs our supplier offered ;)) Broadcom BCM57508,
nVidia CX7, Intel X710, Intel E830. We have caught a number of issues
with them already.

Matrix of the tests vs NICs: https://netdev.bots.linux.dev/devices.html

Developer rankings
------------------

Top reviewers (cs):                  Top reviewers (msg):                
   1 (   ) [48] Jakub Kicinski          1 (   ) [112] Jakub Kicinski     
   2 (   ) [31] Simon Horman            2 ( +1) [ 50] Simon Horman       
   3 (   ) [13] Andrew Lunn             3 ( -1) [ 34] Andrew Lunn        
   4 (   ) [11] Paolo Abeni             4 (   ) [ 22] Paolo Abeni        
   5 (   ) [10] Eric Dumazet            5 ( +1) [ 21] Eric Dumazet       
   6 (+13) [ 7] Kuniyuki Iwashima       6 ( +1) [ 19] Russell King       
   7 (   ) [ 7] Russell King            7 ( -2) [ 15] Aleksandr Loktionov
   8 ( -2) [ 5] Aleksandr Loktionov     8 ( +7) [ 13] Kuniyuki Iwashima  
   9 ( -1) [ 4] Willem de Bruijn        9 ( -1) [ 11] Willem de Bruijn   
  10 ( +5) [ 3] Krzysztof Kozlowski    10 ( +4) [  9] Krzysztof Kozlowski
  11 (***) [ 3] Joe Damato             11 ( -2) [  9] Vladimir Oltean    
  12 ( +5) [ 3] Florian Westphal       12 (+33) [  8] Sabrina Dubroca    
  13 (***) [ 3] Pablo Neira Ayuso      13 ( +9) [  7] Ido Schimmel       
  14 (   ) [ 3] Paul Menzel            14 (***) [  6] Joe Damato         
  15 ( -3) [ 3] Maxime Chevallier      15 (+21) [  6] Conor Dooley       

Lots of familiar names among top reviewers. Kuniyuki returned after a
short absence, reviewing core networking, sockets, UNIX, TCP etc.
Joe reviewed various patches with no easily discernible theme (which
is perfectly fine :)). Sabrina reviews / maintains all things crypto
(ipsec, macsec, tls) which is of huge help. Ido is reliably helping
with IP / routing and bridge reviews. Pablo and Florian focus on
netfilter but there's quite a bit of cross posting. Thank you all!

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [10] Eric Dumazet            1 (   ) [37] Russell King        
   2 ( +1) [ 5] Jakub Kicinski          2 ( +3) [23] Eric Dumazet        
   3 (***) [ 4] Jiayuan Chen            3 (***) [22] Jeff Layton         
   4 (***) [ 4] Aleksandr Loktionov     4 (***) [22] Kuniyuki Iwashima   
   5 ( -1) [ 4] Russell King            5 ( +2) [21] Tariq Toukan        
   6 ( +3) [ 3] Lorenzo Bianconi        6 (+21) [20] Vladimir Oltean     
   7 ( -1) [ 3] Tariq Toukan            7 ( +6) [17] Jakub Kicinski      
   8 (***) [ 2] Qingfang Deng           8 (+15) [16] Xuan Zhuo           
   9 ( +3) [ 2] Kuniyuki Iwashima       9 (+11) [15] Florian Westphal    
  10 (***) [ 2] Fernando Fernandez M.  10 ( +8) [15] Tony Nguyen         

Jiayuan Chen provided quite a few (quality) fixes across the stack.
Aleksandr cross posts Intel driver submissions quite a bit.
Russell continued to clean up stmmac, AKA the Augean stables.
Lorenzo works on airoha, Qingfang on PPP and Fernando removed
the support for IPv6=m among other things.

Jeff cross posts NFS patches, bringing Meta's reviewer score down,
much to my chagrin. Don't tell him I said this :)

Top scores (positive):               Top scores (negative):              
   1 (   ) [769] Jakub Kicinski         1 (***) [84] Jeff Layton         
   2 ( +1) [440] Simon Horman           2 (+11) [67] Tariq Toukan        
   3 ( -1) [227] Andrew Lunn            3 (+22) [57] Xuan Zhuo           
   4 (   ) [170] Paolo Abeni            4 (+42) [41] Bhargava Chenna Marreddy
   5 ( +4) [ 73] Eric Dumazet           5 (***) [38] Larysa Zaremba      
   6 (   ) [ 65] Willem de Bruijn       6 (+15) [38] Illusion Wang       
   7 ( +1) [ 58] Krzysztof Kozlowski    7 ( -6) [37] Ratheesh Kannoth    
   8 ( +7) [ 39] David Ahern            8 ( +8) [37] Tony Nguyen         
   9 (+10) [ 38] Ido Schimmel           9 (***) [36] Satish Kharat       
  10 ( -5) [ 37] Aleksandr Loktionov   10 ( -5) [36] Wei Fang            

A number of people on the "negative review score" side are there because
they are struggling to get new drivers in because of the depth of the
AI reviews.

Company rankings
----------------

Note on company rankings - because of the volume of patches I now
completely depend on a UI which ranks submissions on various
"readiness" metrics. One of them is the company review score.
This is to say that having a negative review score will now
impact review latency by up to 2 days.

Top reviewers (cs):                  Top reviewers (msg):                
   1 (   ) [55] Meta                    1 (   ) [135] Meta               
   2 (   ) [47] RedHat                  2 (   ) [104] RedHat             
   3 ( +2) [16] Google                  3 ( +1) [ 45] Google             
   4 ( -1) [16] Intel                   4 ( -1) [ 45] Intel              
   5 ( -1) [13] Andrew Lunn             5 (   ) [ 34] Andrew Lunn        
   6 (   ) [11] nVidia                  6 (   ) [ 30] nVidia             
   7 (   ) [ 9] Oracle                  7 (   ) [ 26] Oracle          

Top authors (cs):                    Top authors (msg):                  
   1 (   ) [16] Google                  1 ( +1) [102] Meta               
   2 (   ) [14] RedHat                  2 ( -1) [ 73] RedHat             
   3 (   ) [12] Meta                    3 ( +1) [ 68] Google             
   4 ( +1) [11] Intel                   4 ( +4) [ 67] Intel              
   5 ( +2) [ 7] Oracle                  5 (   ) [ 50] Oracle             
   6 ( -2) [ 6] nVidia                  6 ( -3) [ 47] nVidia             
   7 (+12) [ 5] Microsoft               7 ( -1) [ 43] NXP                 

Top scores (positive):               Top scores (negative):              
   1 (   ) [556] Meta                   1 (+16) [112] NXP                
   2 (   ) [496] RedHat                 2 (***) [ 68] Microsoft          
   3 (   ) [227] Andrew Lunn            3 (+15) [ 59] Alibaba            
   4 ( +5) [ 88] Linaro                 4 (***) [ 45] Microchip          
   5 ( +3) [ 35] Linux Foundation       5 (+46) [ 45] Shopee             
   6 ( -1) [ 32] Google                 6 (***) [ 41] Qualcomm           
   7 (   ) [ 32] Max-Planck             7 ( -6) [ 38] Huawei            
-- 
Code: https://github.com/kuba-moo/ml-stat
Raw output: https://netdev.bots.linux.dev/ml-stats/stats-7.1


* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15  1:19 UTC
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad5LlXzeQ8j14Mjg@shell.armlinux.org.uk>

Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
survives iperf3 -c -R to the imx6.

Dumping the registers and comparing, and then forcing the RQS and TQS
values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
*256 = 36864 bytes) respectively seems to solve the problem. Under
net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes).
Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
all four of the MTL receive operation mode registers, but only the
first MTL transmit operation mode register. However, DMA channels 1-3
aren't initialised.

net-next derives them from:

        unsigned int tqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_tx_chan_op_mode() and

        unsigned int rqs = fifosz / 256 - 1;

where fifosz is passed in to dwmac4_dma_rx_chan_op_mode().
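
To make the arithmetic concrete, the two directions of the conversion can
be written out as below. This is only a sketch of the calculation quoted
in this mail, and the helper names qs_from_bytes() and bytes_from_qs() are
made up for illustration; they are not functions in the driver.

```c
/* Field value for a queue of fifosz bytes: units of 256 bytes, minus one. */
static unsigned int qs_from_bytes(unsigned int fifosz)
{
	return fifosz / 256 - 1;
}

/* Inverse: number of bytes described by an RQS/TQS field value. */
static unsigned int bytes_from_qs(unsigned int qs)
{
	return (qs + 1) * 256;
}
```

With these, bytes_from_qs(0x23) is 9216, bytes_from_qs(0x8f) is 36864 and
qs_from_bytes(65536) is 0xff, matching the register values above.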

Now, according to the DMA capabilities:

        Number of Additional RX channel: 4
        Number of Additional TX channel: 4
        Number of Additional RX queues: 4
        Number of Additional TX queues: 4
        TX Fifo Size: 65536
        RX Fifo Size: 65536

However:

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             4
TX:             4
Other:          0
Combined:       0
Current hardware settings:
RX:             1
TX:             1
Other:          0
Combined:       0

So, we end up allocating the entire 64K of the tx and rx FIFO to one
queue in net-next.

Looking back at 5.10, I don't see any code that would account for these
values being programmed for TQS and RQS, it looks like the calculations
are basically the same as we have today.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH v4] net/mlx5: Fix OOB access and stack information leak in PTP event handling
From: Prathamesh Deshpande @ 2026-04-15  1:05 UTC
  To: Leon Romanovsky
  Cc: Carolina Jubran, Saeed Mahameed, Richard Cochran, Tariq Toukan,
	Mark Bloch, netdev, linux-rdma, linux-kernel
In-Reply-To: <20260413144610.GJ21470@unreal>

On Mon, Apr 13, 2026 at 05:46:10PM +0300, Leon Romanovsky wrote:
> On Sun, Apr 12, 2026 at 01:04:10AM +0100, Prathamesh Deshpande wrote:
> > In mlx5_pps_event(), several critical issues were identified:
> > 
> > 1. The 'pin' index from the hardware event was used without bounds
> >    checking to index 'pin_config' and 'pps_info->start'. Check against
> >    MAX_PIN_NUM to prevent out-of-bounds access.
> 
> You were told more than once that this is impossible.
> 
> <...>
> 
> > +	if (WARN_ON_ONCE(pin >= MAX_PIN_NUM))
> > +		return NOTIFY_OK;
> 
> Let's not add useless checks in fast path.

Hi Leon,

Thanks for the feedback. I've addressed this in v5 by dropping the 
redundant pin bounds and pin_config checks to keep the fast path clean, 
focusing strictly on the stack leak and NULL clock guard fixes.

Thanks,
Prathamesh


* [PATCH net v1] net/mlx5: Fix HCA caps leak on notifier init failure
From: Prathamesh Deshpande @ 2026-04-15  0:49 UTC
  To: Saeed Mahameed, Leon Romanovsky, Carolina Jubran
  Cc: Cosmin Ratiu, Tariq Toukan, Jakub Kicinski, netdev, linux-rdma,
	linux-kernel, Prathamesh Deshpande

mlx5_mdev_init() allocates HCA caps via mlx5_hca_caps_alloc() before
calling mlx5_notifiers_init(). If notifier initialization fails, the
error path jumps to err_hca_caps and skips mlx5_hca_caps_free(), leaking
allocated caps.

Add a dedicated unwind label for notifier-init failure that frees HCA
caps before continuing the existing cleanup sequence.

Fixes: b6b03097f982 ("net/mlx5: Initialize events outside devlink lock")
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 3f73d9b1115d..fab80c79ff07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1907,7 +1907,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 
 	err = mlx5_notifiers_init(dev);
 	if (err)
-		goto err_hca_caps;
+		goto err_notifiers_init;
 
 	/* The conjunction of sw_vhca_id with sw_owner_id will be a global
 	 * unique id per function which uses mlx5_core.
@@ -1923,6 +1923,8 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 
 	return 0;
 
+err_notifiers_init:
+	mlx5_hca_caps_free(dev);
 err_hca_caps:
 	mlx5_adev_cleanup(dev);
 err_adev_init:
-- 
2.43.0
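
The unwind ordering this patch relies on can be modelled in a few lines of
standalone C. This is a toy sketch, not mlx5 code: toy_init(),
toy_record() and the TOY_* step names are hypothetical, and the two
failure flags merely simulate mlx5_hca_caps_alloc() or
mlx5_notifiers_init() failing. It records which cleanup steps run so the
fall-through between labels is visible.

```c
enum { TOY_ADEV_CLEANUP = 1, TOY_CAPS_FREE = 2 };

static int toy_trace[8];	/* cleanup steps in the order they ran */
static int toy_trace_len;

static void toy_record(int step)
{
	toy_trace[toy_trace_len++] = step;
}

static int toy_init(int fail_caps, int fail_notifiers)
{
	int err;

	toy_trace_len = 0;

	/* Simulated caps allocation; nothing to free if it fails. */
	err = fail_caps ? -1 : 0;
	if (err)
		goto err_hca_caps;

	/* Simulated notifier init; caps exist by now and must be freed. */
	err = fail_notifiers ? -2 : 0;
	if (err)
		goto err_notifiers_init;

	return 0;

err_notifiers_init:
	toy_record(TOY_CAPS_FREE);
err_hca_caps:
	toy_record(TOY_ADEV_CLEANUP);
	return err;
}
```

A notifier failure runs both steps in reverse order of setup, while a caps
allocation failure only runs the earlier cleanup, which is the behaviour
the new label is meant to provide.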



* Re: DMA issues with the SKGE drivers
From: Benoît Dufour @ 2026-04-15  0:27 UTC
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <c3130329-a7ea-406c-9ac1-2fa5d9d3a8bc@lunn.ch>

Yes, I guess I'd be able to, but only after testing it.
I'm currently waiting for a new cooler that would support the 110 W TDP of
my Opteron 180 X2.
The previous coolers I tested were definitely not good enough for cooling
that CPU.
Just running MemTest86+ for too long made the CPU overheat.
Even the Arctic Cooler Freezer 7 CO wasn't good enough.

On 14/04/2026 at 20:12, Andrew Lunn wrote:
> On Tue, Apr 14, 2026 at 07:23:17PM -0400, Benoît Dufour wrote:
>> In 2024, I reported a bug about the SKGE driver, you can see it here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219270
>>
>> Basically, the problem is that the Marvell 88E8001 on my ASUS A8V motherboard
>> can only work with 32bit DMA, and if trying to use 64bit DMA, the NIC won't
>> work at all and after some time, the operating system will become completely
>> unresponsive (on screen tty will stop refreshing, keyboard and mouse input will
>> stop working too).
>>
>> The fix is quite easy:
>>
>> At the very end of the SKGE driver source code, the ASUS A8V motherboard (as
>> well as many other boards like the ASUS A8V Deluxe) should be added to the list
>> of 32bit DMA boards:
>> https://github.com/torvalds/linux/blob/508fed6795411f5ab277fd1edc0d7adca4946f23/drivers/net/ethernet/marvell/skge.c#L4150
> Hi Benoît
>
> Could you submit a patch adding the needed entry for your board?
>
> 	Andrew
>
>
-- 
Benoît Dufour

Unfortunately still a student in Computer Science



* Re: DMA issues with the SKGE drivers
From: Andrew Lunn @ 2026-04-15  0:12 UTC
  To: Benoît Dufour; +Cc: netdev
In-Reply-To: <9df653d6-d7f8-4b36-87de-65daf28635dd@mail.com>

On Tue, Apr 14, 2026 at 07:23:17PM -0400, Benoît Dufour wrote:
> In 2024, I reported a bug about the SKGE driver, you can see it here:
> https://bugzilla.kernel.org/show_bug.cgi?id=219270
> 
> Basically, the problem is that the Marvell 88E8001 on my ASUS A8V motherboard
> can only work with 32bit DMA, and if trying to use 64bit DMA, the NIC won't
> work at all and after some time, the operating system will become completely
> unresponsive (on screen tty will stop refreshing, keyboard and mouse input will
> stop working too).
> 
> The fix is quite easy:
> 
> At the very end of the SKGE driver source code, the ASUS A8V motherboard (as
> well as many other boards like the ASUS A8V Deluxe) should be added to the list
> of 32bit DMA boards:
> https://github.com/torvalds/linux/blob/508fed6795411f5ab277fd1edc0d7adca4946f23/drivers/net/ethernet/marvell/skge.c#L4150

Hi Benoît 

Could you submit a patch adding the needed entry for your board?

	Andrew




* Re: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
From: kernel test robot @ 2026-04-15  0:04 UTC
  To: Dudu Lu, netdev; +Cc: oe-kbuild-all, stefanha, sgarzare, mst, jasowang, Dudu Lu
In-Reply-To: <20260413085243.73200-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on net/main net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/vsock-virtio-fix-accept-queue-count-leak-on-transport-mismatch-in-recv_listen/20260414-233232
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20260413085243.73200-1-phx0fer%40gmail.com
patch subject: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
config: arc-randconfig-001-20260415 (https://download.01.org/0day-ci/archive/20260415/202604150741.iQBI3cGE-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 13.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150741.iQBI3cGE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150741.iQBI3cGE-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> net/vmw_vsock/virtio_transport_common.c:1:9: warning: data definition has no type or storage class
       1 |         sk_acceptq_added(sk);
         |         ^~~~~~~~~~~~~~~~
>> net/vmw_vsock/virtio_transport_common.c:1:9: error: type defaults to 'int' in declaration of 'sk_acceptq_added' [-Werror=implicit-int]
>> net/vmw_vsock/virtio_transport_common.c:1:9: warning: parameter names (without types) in function declaration
   In file included from include/linux/virtio_vsock.h:7,
                    from net/vmw_vsock/virtio_transport_common.c:15:
>> include/net/sock.h:1080:20: error: conflicting types for 'sk_acceptq_added'; have 'void(struct sock *)'
    1080 | static inline void sk_acceptq_added(struct sock *sk)
         |                    ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:9: note: previous declaration of 'sk_acceptq_added' with type 'int()'
       1 |         sk_acceptq_added(sk);
         |         ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +1 net/vmw_vsock/virtio_transport_common.c

   > 1		sk_acceptq_added(sk);
     2	// SPDX-License-Identifier: GPL-2.0-only
     3	/*
     4	 * common code for virtio vsock
     5	 *
     6	 * Copyright (C) 2013-2015 Red Hat, Inc.
     7	 * Author: Asias He <asias@redhat.com>
     8	 *         Stefan Hajnoczi <stefanha@redhat.com>
     9	 */
    10	#include <linux/spinlock.h>
    11	#include <linux/module.h>
    12	#include <linux/sched/signal.h>
    13	#include <linux/ctype.h>
    14	#include <linux/list.h>
    15	#include <linux/virtio_vsock.h>
    16	#include <uapi/linux/vsockmon.h>
    17	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH net-next v9 04/10] net: phy: Create SFP phy_port before registering upstream
From: Andrew Lunn @ 2026-04-14 23:46 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-5-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:48PM +0200, Maxime Chevallier wrote:
> When dealing with PHY-driven SFP, we create a phy_port representing the
> SFP bus when we know we have such a bus.

I'm missing the big picture here.

Do we have three different things represented in the topology:

SFP bus -> SFP cage -> SFP module

	Andrew

* [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Dexuan Cui @ 2026-04-14 23:43 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, sgarzare, davem, edumazet,
	kuba, pabeni, horms, niuxuewei.nxw, linux-hyperv, virtualization,
	netdev, linux-kernel
  Cc: stable, Ben Hillis, Mitchell Levy

Commit f0c5827d07cb ("hv_sock: Return the readable bytes in
hvs_stream_has_data()") causes a regression for the FIN packet: the
final read syscall gets an error rather than returning 0 (EOF).

Ideally, we would want to fix hvs_channel_readable_payload() so that it
could return 0 in the FIN scenario, but it's not good for the hv_sock
driver to use the VMBus ringbuffer's cached priv_read_index, which is
internal data in the VMBus driver.

Fix the regression in hv_sock by returning 0 rather than -EIO.

Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
Cc: stable@vger.kernel.org
Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
Reported-by: Mitchell Levy <levymitchell0@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 net/vmw_vsock/hyperv_transport.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 069386a74557..63d3549125be 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -703,8 +703,22 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 	switch (hvs_channel_readable_payload(hvs->chan)) {
 	case 1:
 		need_refill = !hvs->recv_desc;
-		if (!need_refill)
-			return -EIO;
+		if (!need_refill) {
+			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
+			 * be NULL unless it points to the 0-byte-payload FIN
+			 * packet: see hvs_update_recv_data().
+			 *
+			 * Here all the payload has been dequeued, but
+			 * hvs_channel_readable_payload() still returns 1,
+			 * because the VMBus ringbuffer's read_index is not
+			 * updated for the FIN packet: hvs_stream_dequeue() ->
+			 * hv_pkt_iter_next() updates the cached priv_read_index
+			 * but has no opportunity to update the read_index in
+			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
+			 * 0 for the FIN packet, so it won't get dequeued.
+			 */
+			return 0;
+		}
 
 		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
 		if (!hvs->recv_desc)
-- 
2.49.0


* Re: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
From: kernel test robot @ 2026-04-14 23:40 UTC (permalink / raw)
  To: Dudu Lu, netdev; +Cc: oe-kbuild-all, stefanha, sgarzare, mst, jasowang, Dudu Lu
In-Reply-To: <20260413085243.73200-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on mst-vhost/linux-next]
[also build test ERROR on net/main net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/vsock-virtio-fix-accept-queue-count-leak-on-transport-mismatch-in-recv_listen/20260414-233232
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20260413085243.73200-1-phx0fer%40gmail.com
patch subject: [PATCH] vsock/virtio: fix accept queue count leak on transport mismatch in recv_listen
config: sparc-randconfig-001-20260415 (https://download.01.org/0day-ci/archive/20260415/202604150747.6LyaJckM-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604150747.6LyaJckM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604150747.6LyaJckM-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/vmw_vsock/virtio_transport_common.c:1:2: warning: data definition has no type or storage class
     sk_acceptq_added(sk);
     ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:2: error: type defaults to 'int' in declaration of 'sk_acceptq_added' [-Werror=implicit-int]
   net/vmw_vsock/virtio_transport_common.c:1:2: warning: parameter names (without types) in function declaration
   In file included from include/linux/virtio_vsock.h:7,
                    from net/vmw_vsock/virtio_transport_common.c:15:
>> include/net/sock.h:1080:20: error: conflicting types for 'sk_acceptq_added'
    static inline void sk_acceptq_added(struct sock *sk)
                       ^~~~~~~~~~~~~~~~
   net/vmw_vsock/virtio_transport_common.c:1:2: note: previous declaration of 'sk_acceptq_added' was here
     sk_acceptq_added(sk);
     ^~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/sk_acceptq_added +1080 include/net/sock.h

^1da177e4c3f415 Linus Torvalds 2005-04-16  1079  
^1da177e4c3f415 Linus Torvalds 2005-04-16 @1080  static inline void sk_acceptq_added(struct sock *sk)
^1da177e4c3f415 Linus Torvalds 2005-04-16  1081  {
288efe8606b62d0 Eric Dumazet   2019-11-05  1082  	WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog + 1);
^1da177e4c3f415 Linus Torvalds 2005-04-16  1083  }
^1da177e4c3f415 Linus Torvalds 2005-04-16  1084  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH net-next v9 03/10] net: phylink: Register a phy_port for MAC-driven SFP busses
From: Andrew Lunn @ 2026-04-14 23:38 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-4-maxime.chevallier@bootlin.com>

> This phy_port represents the SFP cage itself, and not the module

> +static int phylink_create_sfp_port(struct phylink *pl)

I'm thinking about naming here. If this represents the cage, why not
call this phylink_create_sfp_cage_port()? I assume at some point there
is going to be something for the module, and it seems like the naming
is going to be confusing.

> +{
> +	struct phy_port *port;
> +	int ret = 0;
> +
> +	if (!pl->netdev || !pl->sfp_bus)
> +		return 0;
> +
> +	port = phy_port_alloc();
> +	if (!port)
> +		return -ENOMEM;
> +
> +	port->is_sfp = true;
> +	port->is_mii = true;
> +	port->active = true;

If this is a cage, not a module, does is_sfp = true make sense?

And what does an active cage mean?

	Andrew

* [PATCH net v3 4/4] nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_recv_dm() reads skb->data[2] (the DM reason byte) without
first verifying that skb->len is at least LLCP_HEADER_SIZE + 1.  A DM
PDU carrying only the 2-byte LLCP header from a rogue peer therefore
triggers a 1-byte OOB read.

Add the minimum-length guard at function entry, matching the pattern
used by nfc_llcp_recv_snl() and nfc_llcp_recv_agf().

Reachable from any NFC peer within ~4 cm once an LLCP link is up.

Fixes: d646960f7986 ("NFC: Add LLCP sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index efe228f96..6baf2fc6b 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1237,6 +1237,11 @@ static void nfc_llcp_recv_dm(struct nfc_llcp_local *local,
 	struct sock *sk;
 	u8 dsap, ssap, reason;

+	if (skb->len < LLCP_HEADER_SIZE + 1) {
+		pr_err("Malformed DM PDU\n");
+		return;
+	}
+
 	dsap = nfc_llcp_dsap(skb);
 	ssap = nfc_llcp_ssap(skb);
 	reason = skb->data[2];
-- 
2.51.0

* [PATCH net v3 3/4] nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_recv_snl() has four problems when handling a hostile peer:

 1. nfc_llcp_dsap()/nfc_llcp_ssap() dereference skb->data[0..1] without
    verifying skb->len; a 0- or 1-byte frame leads to an OOB read.
    Additionally tlv_len = skb->len - LLCP_HEADER_SIZE wraps when
    skb->len < 2, causing the following loop to run far past the
    buffer.

 2. The per-iteration loop guard `offset < tlv_len` only proves one
    byte is available, but the body reads tlv[0] and tlv[1].

 3. The peer-supplied `length` field is used to advance `tlv` without
    being checked against the remaining array space.

 4. The SDREQ handler previously only required length >= 1 but reads
    both tid (tlv[2]) and the first byte of service_name (tlv[3], via
    the pr_debug("%.16s") print and the service_name_len = length - 1
    string usage), so length >= 2 is required.

Fix: reject frames smaller than LLCP_HEADER_SIZE up front; add TLV
header and TLV value guards at the top of each iteration; bump the
SDREQ minimum length to 2.
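
For illustration only (not part of the patch): a minimal userspace
sketch of the wrap described in item 1 and the SDREQ minimum from item
4.  The function names are hypothetical, and skb->len is modelled as a
plain unsigned int; LLCP_HEADER_SIZE is taken as 2, matching the
"2-byte LLCP header" wording elsewhere in this series.

```c
#define LLCP_HEADER_SIZE 2

/* Unguarded: unsigned subtraction wraps when the frame is shorter
 * than the header, so the "TLV length" becomes huge. */
static unsigned int snl_tlv_len_unchecked(unsigned int skb_len)
{
	return skb_len - LLCP_HEADER_SIZE;
}

/* Guarded: reject short frames before subtracting.
 * Returns 0 and stores the TLV length, or -1 for a malformed frame. */
static int snl_tlv_len_checked(unsigned int skb_len, unsigned int *tlv_len)
{
	if (skb_len < LLCP_HEADER_SIZE)
		return -1;

	*tlv_len = skb_len - LLCP_HEADER_SIZE;
	return 0;
}

/* An SDREQ value holds the tid byte (tlv[2]) plus at least one byte of
 * service name (tlv[3]), hence the minimum length of 2. */
static int sdreq_length_ok(unsigned char length)
{
	return length >= 2;
}
```

With a 1-byte frame the unguarded helper returns the largest unsigned
int value, which is what lets the loop run far past the buffer.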

Reachable from any NFC peer within ~4 cm once an LLCP link is up.

Fixes: 7a06f0ee2823 ("NFC: llcp: Service Name Lookup implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 366d75663..efe228f96 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -1282,6 +1282,11 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	size_t sdres_tlvs_len;
 	HLIST_HEAD(nl_sdres_list);
 
+	if (skb->len < LLCP_HEADER_SIZE) {
+		pr_err("Malformed SNL PDU\n");
+		return;
+	}
+
 	dsap = nfc_llcp_dsap(skb);
 	ssap = nfc_llcp_ssap(skb);
 
@@ -1298,11 +1303,17 @@ static void nfc_llcp_recv_snl(struct nfc_llcp_local *local,
 	sdres_tlvs_len = 0;
 
 	while (offset < tlv_len) {
+		if (tlv_len - offset < 2)
+			break;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_len - offset - 2 < length)
+			break;
 
 		switch (type) {
 		case LLCP_TLV_SDREQ:
+			if (length < 2)
+				break;
 			tid = tlv[2];
 			service_name = (char *) &tlv[3];
 			service_name_len = length - 1;
-- 
2.51.0


* [PATCH net v3 2/4] nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv() walk TLV
arrays whose length and content come from a peer-supplied frame.  The
parsing loop has three weaknesses:

 1. `offset` is declared u8 while `tlv_array_len` is u16.  In
    parse_connection_tlv() the TLV array can reach ~2173 bytes (MIUX
    up to 0x7FF), so 128 zero-length TLVs wrap `offset` back to 0 and
    the loop never terminates while `tlv` advances past the buffer.

 2. The guard `offset < tlv_array_len` only proves one byte is
    available, but the body reads tlv[0] (type) and tlv[1] (length).
    When one byte remains, tlv[1] is out of bounds.

 3. `length` is read from peer data and used to advance `tlv` without
    being checked against the remaining array space.  A crafted length
    walks `tlv` past the buffer; the next iteration reads tlv[0]/tlv[1]
    from adjacent memory.

The llcp_tlv8() and llcp_tlv16() accessors additionally read tlv[2]
and tlv[2..3]; a zero-length TLV makes those reads out of bounds.

Fix: promote `offset` to u16; add two per-iteration guards, one for
the TLV header and one for the TLV value; require length >= 1 for all
TLVs before the type dispatch and length >= 2 for the llcp_tlv16()
accessors (MIUX, WKS).  Return -EINVAL on malformed input.
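
For illustration only (not part of the patch): a userspace sketch of
the guarded walk described above, plus the u8 wrap from item 1.  Both
helper names are hypothetical, and the per-type handling is reduced to
counting TLVs; -1 stands in for -EINVAL.

```c
#include <stdint.h>

/* Walk a TLV array with a 16-bit offset and both per-iteration guards.
 * Returns the number of TLVs seen, or -1 on malformed input. */
static int tlv_walk_checked(const uint8_t *tlv, uint16_t tlv_array_len)
{
	uint16_t offset = 0;
	int count = 0;

	while (offset < tlv_array_len) {
		uint8_t length;

		/* Guard 1: type and length bytes must both be present. */
		if (tlv_array_len - offset < 2)
			return -1;
		length = tlv[1];

		/* Guard 2: the value must fit in what remains. */
		if (tlv_array_len - offset - 2 < length)
			return -1;

		/* Accessors read at least tlv[2], so reject empty values. */
		if (length < 1)
			return -1;

		count++;
		offset += length + 2;
		tlv += length + 2;
	}

	return count;
}

/* Models the old u8 offset: each zero-length TLV advances it by 2. */
static uint8_t u8_offset_after(unsigned int n_zero_len_tlvs)
{
	uint8_t offset = 0;

	while (n_zero_len_tlvs--)
		offset += 2;

	return offset;
}
```

Because guard 2 bounds offset + length + 2 by tlv_array_len, the 16-bit
offset cannot wrap in the checked walk, whereas the u8 model returns to
0 after 128 zero-length TLVs.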

Reached on ATR_RES (parse_gb_tlv) and on CONNECT/CC PDUs before a
connection is established (parse_connection_tlv).  Both are
triggerable from any NFC peer within ~4 cm, without authentication.

Reported-by: Simon Horman <horms@kernel.org>
Fixes: d646960f7986 ("NFC: Add LLCP sockets")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 291f26fac..b6dcfb2d1 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -193,7 +193,8 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 			  const u8 *tlv_array, u16 tlv_array_len)
 {
 	const u8 *tlv = tlv_array;
-	u8 type, length, offset = 0;
+	u8 type, length;
+	u16 offset = 0;
 
 	pr_debug("TLV array length %d\n", tlv_array_len);
 
@@ -201,8 +202,14 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 		return -ENODEV;
 
 	while (offset < tlv_array_len) {
+		if (tlv_array_len - offset < 2)
+			return -EINVAL;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_array_len - offset - 2 < length)
+			return -EINVAL;
+		if (length < 1)
+			return -EINVAL;
 
 		pr_debug("type 0x%x length %d\n", type, length);
 
@@ -211,9 +218,13 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
 			local->remote_version = llcp_tlv_version(tlv);
 			break;
 		case LLCP_TLV_MIUX:
+			if (length < 2)
+				return -EINVAL;
 			local->remote_miu = llcp_tlv_miux(tlv) + 128;
 			break;
 		case LLCP_TLV_WKS:
+			if (length < 2)
+				return -EINVAL;
 			local->remote_wks = llcp_tlv_wks(tlv);
 			break;
 		case LLCP_TLV_LTO:
@@ -243,7 +254,8 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
 				  const u8 *tlv_array, u16 tlv_array_len)
 {
 	const u8 *tlv = tlv_array;
-	u8 type, length, offset = 0;
+	u8 type, length;
+	u16 offset = 0;
 
 	pr_debug("TLV array length %d\n", tlv_array_len);
 
@@ -251,13 +263,21 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
 		return -ENOTCONN;
 
 	while (offset < tlv_array_len) {
+		if (tlv_array_len - offset < 2)
+			return -EINVAL;
 		type = tlv[0];
 		length = tlv[1];
+		if (tlv_array_len - offset - 2 < length)
+			return -EINVAL;
+		if (length < 1)
+			return -EINVAL;
 
 		pr_debug("type 0x%x length %d\n", type, length);
 
 		switch (type) {
 		case LLCP_TLV_MIUX:
+			if (length < 2)
+				return -EINVAL;
 			sock->remote_miu = llcp_tlv_miux(tlv) + 128;
 			break;
 		case LLCP_TLV_RW:
-- 
2.51.0


* [PATCH net v3 1/4] nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

nci_store_general_bytes_nfc_dep() computes the General Bytes length by
subtracting a fixed header offset from the peer-supplied atr_res_len
(POLL) or atr_req_len (LISTEN) field:

    ndev->remote_gb_len = min_t(__u8,
        atr_res_len - NFC_ATR_RES_GT_OFFSET,   /* offset = 15 */
        NFC_ATR_RES_GB_MAXSIZE);

Both length fields are __u8.  When a malicious NFC-DEP peer sends an
ATR_RES/ATR_REQ whose length is smaller than the fixed offset (< 15
or < 14 respectively), the subtraction wraps:

    atr_res_len = 0  ->  (u8)(0 - 15) = 241
    min_t(__u8, 241, NFC_ATR_RES_GB_MAXSIZE=47) = 47

The subsequent memcpy then reads 47 bytes beyond the valid activation
parameter data into ndev->remote_gb[].  This buffer is later fed to
nfc_llcp_parse_gb_tlv() as a TLV array.
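
For illustration only (not part of the patch): a userspace sketch of
the arithmetic above.  The helper names are hypothetical; the two
constants are the values quoted in this message, and -1 stands in for
the NCI error status.

```c
#include <stdint.h>

#define ATR_RES_GT_OFFSET	15	/* fixed header offset quoted above */
#define ATR_RES_GB_MAXSIZE	47	/* clamp quoted above */

/* Models min_t(__u8, atr_res_len - offset, maxsize) with no guard:
 * the difference is truncated to 8 bits before it is clamped. */
static uint8_t gb_len_unchecked(uint8_t atr_res_len)
{
	uint8_t diff = (uint8_t)(atr_res_len - ATR_RES_GT_OFFSET);

	return diff < ATR_RES_GB_MAXSIZE ? diff : ATR_RES_GB_MAXSIZE;
}

/* Guarded variant: refuse lengths below the fixed offset.
 * Returns the General Bytes length, or -1 for a malformed frame. */
static int gb_len_checked(uint8_t atr_res_len)
{
	if (atr_res_len < ATR_RES_GT_OFFSET)
		return -1;

	return gb_len_unchecked(atr_res_len);
}
```

gb_len_unchecked(0) yields 47, the full clamp, even though the frame
carries no General Bytes at all; gb_len_checked(0) rejects it instead.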

Reject the frame with NCI_STATUS_RF_PROTOCOL_ERROR when the length is
below the required offset, and propagate the error out of
nci_rf_intf_activated_ntf_packet() instead of silently accepting the
malformed packet.

Reachable from any NFC peer within ~4 cm during RF activation, prior
to any pairing.

Fixes: c4fbb6515709 ("NFC: NCI: Add NFC-DEP support to NCI data exchange")
Cc: stable@vger.kernel.org
Signed-off-by: Lekë Hapçiu <framemain@outlook.com>
---
 net/nfc/nci/ntf.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index c96512bb8..eb8c6e5a1 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -631,6 +631,9 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
 	switch (ntf->activation_rf_tech_and_mode) {
 	case NCI_NFC_A_PASSIVE_POLL_MODE:
 	case NCI_NFC_F_PASSIVE_POLL_MODE:
+		if (ntf->activation_params.poll_nfc_dep.atr_res_len <
+		    NFC_ATR_RES_GT_OFFSET)
+			return NCI_STATUS_RF_PROTOCOL_ERROR;
 		ndev->remote_gb_len = min_t(__u8,
 			(ntf->activation_params.poll_nfc_dep.atr_res_len
 						- NFC_ATR_RES_GT_OFFSET),
@@ -643,6 +646,9 @@ static int nci_store_general_bytes_nfc_dep(struct nci_dev *ndev,
 
 	case NCI_NFC_A_PASSIVE_LISTEN_MODE:
 	case NCI_NFC_F_PASSIVE_LISTEN_MODE:
+		if (ntf->activation_params.listen_nfc_dep.atr_req_len <
+		    NFC_ATR_REQ_GT_OFFSET)
+			return NCI_STATUS_RF_PROTOCOL_ERROR;
 		ndev->remote_gb_len = min_t(__u8,
 			(ntf->activation_params.listen_nfc_dep.atr_req_len
 						- NFC_ATR_REQ_GT_OFFSET),
@@ -842,8 +848,10 @@ static int nci_rf_intf_activated_ntf_packet(struct nci_dev *ndev,
 		/* store general bytes to be reported later in dep_link_up */
 		if (ntf.rf_interface == NCI_RF_INTERFACE_NFC_DEP) {
 			err = nci_store_general_bytes_nfc_dep(ndev, &ntf);
-			if (err != NCI_STATUS_OK)
+			if (err != NCI_STATUS_OK) {
 				pr_err("unable to store general bytes\n");
+				return -EINVAL;
+			}
 		}
 
 		/* store ATS to be reported later in nci_activate_target */
-- 
2.51.0


* [PATCH net v3 0/4] nfc: fix multiple parsing vulnerabilities reachable from RF
From: Lekë Hapçiu @ 2026-04-14 23:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, linux-kernel, stable,
	Lekë Hapçiu

From: Lekë Hapçiu <framemain@outlook.com>

This series fixes four RF-reachable parsing vulnerabilities in the NFC
stack.  All four are triggerable from an NFC peer within ~4 cm of the
victim, before any pairing or authentication.

Patch 1 fixes a u8 underflow in nci_store_general_bytes_nfc_dep() where
a short ATR_RES/ATR_REQ causes (atr_res_len - NFC_ATR_RES_GT_OFFSET) to
wrap in u8 arithmetic, producing a bogus remote_gb_len that copies up
to 47 bytes beyond the valid activation parameter data.

Patch 2 hardens nfc_llcp_parse_gb_tlv() and
nfc_llcp_parse_connection_tlv().  The loop guard does not prove that
two header bytes can be read, and the peer-controlled `length` field
is used to advance `tlv` without bounds checking.  An 8-bit `offset`
against a 16-bit `tlv_array_len` compounds the issue in
parse_connection_tlv() where the TLV array can exceed 255 bytes.

Patch 3 fixes nfc_llcp_recv_snl().  The SNL handler accesses skb->data
before verifying skb->len, and its inner TLV loop has the same two
weaknesses as patch 2.  SDREQ handling additionally requires
length >= 2 because both tid (tlv[2]) and the start of service_name
(tlv[3]) are read.

Patch 4 fixes nfc_llcp_recv_dm() which reads skb->data[2] (the DM
reason byte) without checking skb->len >= 3.

Changes in v3:
 - Restore the u8 -> u16 `offset` promotion in patch 2.  v2 split this
   into a separate v1 patch and did not re-send it; v3 combines the
   promotion and the bounds checks in a single patch (Paolo Abeni).
 - Return NCI_STATUS_RF_PROTOCOL_ERROR from
   nci_store_general_bytes_nfc_dep() and propagate the failure out of
   nci_rf_intf_activated_ntf_packet() as -EINVAL rather than silently
   accepting the malformed packet (Paolo Abeni).
 - Drop the style-only paren removal in patch 1 (Paolo Abeni).
 - Condense commit message in patch 2 (Paolo Abeni).
 - Consolidate the length >= 1 checks before the switch in patch 2,
   keeping length >= 2 only for the llcp_tlv16() accessors (Paolo Abeni).
 - Tighten SDREQ length check from >=1 to >=2 in patch 3; the handler
   reads both tlv[2] and tlv[3] (Sashiko).
 - Add patch 4 for nfc_llcp_recv_dm().
 - Send as a fresh thread rather than In-Reply-To v2 (Paolo Abeni).

Lekë Hapçiu (4):
  nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
  nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
  nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
  nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm

 net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
 net/nfc/llcp_core.c     | 16 ++++++++++++++++
 net/nfc/nci/ntf.c       | 10 +++++++++-
 3 files changed, 47 insertions(+), 3 deletions(-)

-- 
2.51.0

* Re: [PATCH net-next v9 02/10] net: phy: phy_link_topology: Track ports in phy_link_topology
From: Andrew Lunn @ 2026-04-14 23:31 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-3-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:46PM +0200, Maxime Chevallier wrote:
> phy_port is aimed at representing the various physical interfaces of a
> net_device. They can be controlled by various components in the link,
> such as the Ethernet PHY, the Ethernet MAC, and SFP module, etc.
> 
> Let's therefore make sure we keep track of all the ports connected to
> a netdev in phy_link_topology. The only ports added for now are
> phy-driven ports.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* Re: [PATCH net-next v9 01/10] net: phy: phy_link_topology: Add a helper for opportunistic alloc
From: Andrew Lunn @ 2026-04-14 23:28 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-2-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:45PM +0200, Maxime Chevallier wrote:
> The phy_link_topology structure stores information about the PHY-related
> components connected to a net_device. It is opportunistically allocated,
> when we add the first item to the topology, as this is not relevant for
> all kinds of net_devices.
> 
> In preparation for the addition of phy_port tracking in the topology,
> let's make a dedicated helper for that allocation sequence.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Anton Danilov @ 2026-04-14 23:28 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad7MsSJOuUU6EGwS@dau-home-pc>

Add a selftest to verify that FOU-encapsulated packets addressed to a
multicast destination are correctly resubmitted to the inner protocol
handler (GRE) via the UDP multicast delivery path.

The test creates two network namespaces connected by a veth pair. The
receiver namespace has a FOU/GRETAP tunnel with a multicast remote
address (239.0.0.1). The sender crafts GRE-over-UDP packets and sends
them to the multicast address.

The early demux optimization (net.ipv4.ip_early_demux) is disabled on
the receiver to force packets through __udp4_lib_mcast_deliver(),
which is the code path that was previously broken.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 150 ++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584026..9b2a573e4af2 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -38,6 +38,7 @@ TEST_PROGS := \
 	fib_rule_tests.sh \
 	fib_tests.sh \
 	fin_ack_lat.sh \
+	fou_mcast_encap.sh \
 	fq_band_pktlimit.sh \
 	gre_gso.sh \
 	gre_ipv6_lladdr.sh \
diff --git a/tools/testing/selftests/net/fou_mcast_encap.sh b/tools/testing/selftests/net/fou_mcast_encap.sh
new file mode 100755
index 000000000000..d4737d674862
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.sh
@@ -0,0 +1,150 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that UDP encapsulation (FOU) correctly handles packet resubmit
+# when packets are delivered via the multicast UDP delivery path.
+#
+# When a FOU-encapsulated packet arrives with a multicast destination IP,
+# __udp4_lib_mcast_deliver() must resubmit it to the inner protocol
+# handler (e.g., GRE) rather than consuming it. This test verifies that
+# by creating a FOU/GRETAP tunnel with a multicast remote address, sending
+# encapsulated packets, and checking that they are correctly decapsulated.
+#
+# The early demux optimization can mask this issue by routing packets via
+# the unicast path (udp_unicast_rcv_skb), so we disable it to force
+# packets through __udp4_lib_mcast_deliver().
+
+source lib.sh
+
+NSENDER=""
+NRECV=""
+
+cleanup() {
+	cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup() {
+	setup_ns NSENDER NRECV
+
+	ip link add veth_s type veth peer name veth_r
+	ip link set veth_s netns "$NSENDER"
+	ip link set veth_r netns "$NRECV"
+
+	ip -n "$NSENDER" addr add 10.0.0.1/24 dev veth_s
+	ip -n "$NSENDER" link set veth_s up
+
+	ip -n "$NRECV" addr add 10.0.0.2/24 dev veth_r
+	ip -n "$NRECV" link set veth_r up
+
+	# Disable early demux to force multicast delivery path
+	ip netns exec "$NRECV" sysctl -wq net.ipv4.ip_early_demux=0
+
+	# Join multicast group on receiver
+	ip -n "$NRECV" addr add 239.0.0.1/32 dev veth_r autojoin
+
+	# Multicast routes
+	ip -n "$NRECV" route add 239.0.0.0/8 dev veth_r
+	ip -n "$NSENDER" route add 239.0.0.0/8 dev veth_s
+
+	# FOU listener
+	ip netns exec "$NRECV" ip fou add port 4797 ipproto 47
+
+	# GRETAP with multicast remote - this triggers __udp4_lib_mcast_deliver
+	ip -n "$NRECV" link add eoudp0 type gretap \
+		remote 239.0.0.1 local 10.0.0.2 \
+		encap fou encap-sport 4797 encap-dport 4797 \
+		key 239.0.0.1
+	ip -n "$NRECV" link set eoudp0 up
+	ip -n "$NRECV" addr add 192.168.99.2/24 dev eoudp0
+}
+
+send_fou_gre_packets() {
+	local count=$1
+
+	ip netns exec "$NSENDER" python3 -c "
+import socket, struct
+
+# GRE header: key flag set, proto=0x6558 (transparent ethernet bridging)
+gre_key = socket.inet_aton('239.0.0.1')
+gre_hdr = struct.pack('!HH', 0x2000, 0x6558) + gre_key
+
+# Inner Ethernet frame
+dst_mac = b'\xff\xff\xff\xff\xff\xff'
+src_mac = b'\x02\x00\x00\x00\x00\x01'
+eth_hdr = dst_mac + src_mac + struct.pack('!H', 0x0800)
+
+# Inner IP: 192.168.99.1 -> 192.168.99.2, ICMP echo
+inner_ip_src = socket.inet_aton('192.168.99.1')
+inner_ip_dst = socket.inet_aton('192.168.99.2')
+
+# ICMP echo request
+icmp_payload = b'TESTFOU!' * 4
+icmp_hdr = struct.pack('!BBHHH', 8, 0, 0, 0x1234, 1) + icmp_payload
+csum = 0
+for i in range(0, len(icmp_hdr), 2):
+    if i + 1 < len(icmp_hdr):
+        csum += (icmp_hdr[i] << 8) + icmp_hdr[i+1]
+    else:
+        csum += icmp_hdr[i] << 8
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+icmp_hdr = struct.pack('!BBHHH', 8, 0, csum, 0x1234, 1) + icmp_payload
+
+# Inner IP header
+ip_len = 20 + len(icmp_hdr)
+ip_hdr = struct.pack('!BBHHHBBH', 0x45, 0, ip_len, 0x1234, 0, 64, 1, 0)
+ip_hdr += inner_ip_src + inner_ip_dst
+csum = 0
+for i in range(0, 20, 2):
+    csum += (ip_hdr[i] << 8) + ip_hdr[i+1]
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+ip_hdr = ip_hdr[:10] + struct.pack('!H', csum) + ip_hdr[12:]
+
+payload = gre_hdr + eth_hdr + ip_hdr + icmp_hdr
+
+sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+for _ in range($count):
+    sock.sendto(payload, ('239.0.0.1', 4797))
+sock.close()
+"
+}
+
+get_rx_packets() {
+	ip -n "$NRECV" -s link show eoudp0 | awk '/RX:/{getline; print $2}'
+}
+
+test_fou_mcast_encap() {
+	local count=100
+	local rx_before
+	local rx_after
+	local rx_delta
+
+	rx_before=$(get_rx_packets)
+	send_fou_gre_packets $count
+	sleep 1
+	rx_after=$(get_rx_packets)
+
+	rx_delta=$((rx_after - rx_before))
+
+	if [ "$rx_delta" -ge "$count" ]; then
+		echo "PASS: received $rx_delta/$count packets via multicast FOU/GRETAP"
+		return "$ksft_pass"
+	elif [ "$rx_delta" -gt 0 ]; then
+		echo "FAIL: only $rx_delta/$count packets received (partial delivery)"
+		return "$ksft_fail"
+	else
+		echo "FAIL: 0/$count packets received (multicast encap resubmit broken)"
+		return "$ksft_fail"
+	fi
+}
+
+echo "TEST: FOU/GRETAP multicast encapsulation resubmit"
+
+setup
+test_fou_mcast_encap
+exit $?
-- 
2.47.3


* [RFC PATCH net-next 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Anton Danilov @ 2026-04-14 23:27 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad7MsSJOuUU6EGwS@dau-home-pc>

When a UDP encapsulation socket (e.g., FOU) receives a multicast
packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
incorrectly call consume_skb() when udp_queue_rcv_skb() returns a
positive value. A positive return value from udp_queue_rcv_skb()
indicates that the encap_rcv handler (e.g., fou_udp_recv) has
consumed the UDP header and wants the packet to be resubmitted to
the IP protocol handler for further processing (e.g., as a GRE
packet).

The unicast path in udp_unicast_rcv_skb() handles this correctly by
returning -ret, which propagates up to ip_protocol_deliver_rcu() for
resubmission. The GSO path in udp_queue_rcv_skb() also handles this
correctly by calling ip_protocol_deliver_rcu() directly. However, the
multicast path destroys the packet via consume_skb() instead of
resubmitting it, causing silent packet loss.

This bug affects any UDP encapsulation (FOU, GUE) combined with
multicast destination addresses. In practice, it causes ~50% packet
loss on FOU/GRETAP tunnels configured with multicast remote addresses,
with the exact ratio depending on the early demux cache hit rate
(packets that hit early demux take the unicast path and are handled
correctly).
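
The dependence on the early demux hit rate can be illustrated with a
small userspace sketch. It assumes, as described above, that every
packet that hits early demux is delivered and every packet that misses
it is lost. The helper name and the permille parameter are hypothetical
and exist only for illustration; this is not kernel code.

```c
/*
 * Hypothetical model, not kernel code. hit_permille is the share of
 * packets (in 1/1000 units) that hit early demux and therefore take the
 * unicast path. Assumption: every miss goes through the multicast path
 * and is lost before the fix.
 */
static unsigned int model_delivered(unsigned int sent,
				    unsigned int hit_permille)
{
	if (hit_permille > 1000)
		hit_permille = 1000;

	/* Widen before multiplying so large packet counts cannot overflow. */
	return (unsigned int)(((unsigned long long)sent * hit_permille) / 1000);
}
```

Under this model a hit rate of 500 permille gives the ~50% loss quoted
above, and a hit rate of 0 (early demux disabled) loses every packet.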

Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
return value is positive, matching the behavior of the GSO path.
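
As a rough illustration of the return-value contract, the following
userspace sketch models the multicast path before and after the change.
All names are hypothetical stand-ins rather than kernel symbols, and the
error case (negative return) is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_fate {
	MODEL_QUEUED,      /* the socket took the packet */
	MODEL_DROPPED,     /* freed without reaching the inner protocol */
	MODEL_RESUBMITTED, /* handed back to the IP layer as protocol N */
};

struct model_result {
	enum model_fate fate;
	int proto; /* inner protocol number, valid only for MODEL_RESUBMITTED */
};

/*
 * Stand-in for udp_queue_rcv_skb(): a positive value asks the caller to
 * resubmit the packet as that protocol number, 0 means it was queued.
 */
static int model_queue_rcv(bool encap_socket, int inner_proto)
{
	return encap_socket ? inner_proto : 0;
}

/* Old multicast behaviour: a positive return ends in consume_skb(). */
static struct model_result model_mcast_old(bool encap_socket, int inner_proto)
{
	struct model_result res = { MODEL_QUEUED, 0 };

	if (model_queue_rcv(encap_socket, inner_proto) > 0)
		res.fate = MODEL_DROPPED;
	return res;
}

/* New multicast behaviour: a positive return is resubmitted as protocol ret. */
static struct model_result model_mcast_new(bool encap_socket, int inner_proto)
{
	struct model_result res = { MODEL_QUEUED, 0 };
	int ret = model_queue_rcv(encap_socket, inner_proto);

	if (ret > 0) {
		res.fate = MODEL_RESUBMITTED;
		res.proto = ret;
	}
	return res;
}
```

With an encap socket and inner protocol 47 (GRE), the old model reports
MODEL_DROPPED while the new one reports MODEL_RESUBMITTED with proto 47.
Plain (non-encap) sockets behave the same in both.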

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 net/ipv4/udp.c | 13 +++++++++----
 net/ipv6/udp.c | 13 +++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e9e2ce9522ef..8c2d4367cba2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			__UDP_INC_STATS(net, UDP_MIB_INERRORS);
 			continue;
 		}
-		if (udp_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udp_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						ret);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udp_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udp_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						ret);
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15e032194ecc..f74935d9f7d7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			continue;
 		}

-		if (udpv6_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udpv6_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						 ret, true);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udpv6_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udpv6_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						 ret, true);
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
-- 
2.47.3


* [RFC PATCH net-next 0/2] udp: fix FOU/GUE over multicast
From: Anton Danilov @ 2026-04-14 23:24 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest

UDP encapsulation (FOU, GUE) has never worked correctly with multicast
destination addresses. When a FOU-encapsulated packet arrives at a
multicast address, it enters __udp4_lib_mcast_deliver() (or
__udp6_lib_mcast_deliver() for IPv6), which calls consume_skb() on
packets that need resubmission to the inner protocol handler, silently
dropping them instead.

The unicast delivery path and the GSO segmentation path both handle
this correctly, but the multicast path was never updated to support
UDP encapsulation resubmit.

This causes silent packet loss for FOU/GRETAP tunnels configured with
multicast remote addresses. The loss ratio depends on the early demux
cache hit rate: packets that hit early demux bypass the multicast path
and are delivered correctly, which masks the issue.

Reproducing the issue:

  ip netns add ns_a && ip netns add ns_b
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns ns_a && ip link set veth1 netns ns_b

  ip -n ns_a addr add 10.0.0.1/24 dev veth0 && ip -n ns_a link set veth0 up
  ip -n ns_b addr add 10.0.0.2/24 dev veth1 && ip -n ns_b link set veth1 up

  # Disable early demux to expose the bug (otherwise it's partially masked)
  ip netns exec ns_b sysctl -w net.ipv4.ip_early_demux=0

  # Join multicast group
  ip -n ns_b addr add 239.0.0.1/32 dev veth1 autojoin

  # FOU + GRETAP with multicast remote
  ip netns exec ns_b ip fou add port 4797 ipproto 47
  ip -n ns_b link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.2 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_b link set eoudp0 up

  # Send FOU/GRE packets to 239.0.0.1:4797 from ns_a
  # -> without this fix: 0 packets received on eoudp0
  # -> with this fix: all packets received on eoudp0
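
The send step above is left open. One possible way to build the payload
is sketched below: with FOU the UDP payload is the GRE packet itself, so
a sender needs an 8-byte GRE header (K bit set, protocol type 0x6558 for
Transparent Ethernet Bridging) followed by the inner Ethernet frame. The
helper name and macros are hypothetical, and the sketch assumes the
dotted-quad key 239.0.0.1 goes on the wire as the bytes ef 00 00 01.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_GRE_FLAG_KEY 0x2000u /* K bit: a 32-bit key follows (RFC 2890) */
#define MODEL_ETH_P_TEB    0x6558u /* Transparent Ethernet Bridging */
#define MODEL_GRE_HDR_LEN  8u      /* flags + protocol type + key */

/*
 * Hypothetical helper: write a keyed GRE header in network byte order
 * into out and return the number of bytes written. The caller appends
 * the inner Ethernet frame and sends the result as a UDP datagram.
 */
static size_t model_build_gre_header(uint8_t out[MODEL_GRE_HDR_LEN],
				     uint32_t key)
{
	out[0] = (uint8_t)(MODEL_GRE_FLAG_KEY >> 8);
	out[1] = (uint8_t)(MODEL_GRE_FLAG_KEY & 0xff);
	out[2] = (uint8_t)(MODEL_ETH_P_TEB >> 8);
	out[3] = (uint8_t)(MODEL_ETH_P_TEB & 0xff);
	out[4] = (uint8_t)(key >> 24);
	out[5] = (uint8_t)(key >> 16);
	out[6] = (uint8_t)(key >> 8);
	out[7] = (uint8_t)key;
	return MODEL_GRE_HDR_LEN;
}
```

For the setup above the key would be passed as 0xEF000001, and the
resulting buffer sent from ns_a to 239.0.0.1 port 4797 over an ordinary
UDP socket.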

AI assistance (Claude, claude-opus-4-6) was used during root cause
analysis of the kernel source code (tracing the call chain from
udp_queue_rcv_skb through encap_rcv to ip_protocol_deliver_rcu,
comparing unicast/GSO/multicast paths) and during patch and selftest
authoring. The fix approach was identified by observing that the
unicast path (udp_unicast_rcv_skb) and the GSO path
(udp_queue_rcv_skb) both already handle encap resubmit correctly,
while the multicast path did not.

Anton Danilov (2):
  udp: fix encapsulation packet resubmit in multicast deliver
  selftests: net: add FOU multicast encapsulation resubmit test

 net/ipv4/udp.c                                |  13 +-
 net/ipv6/udp.c                                |  13 +-
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/fou_mcast_encap.sh  | 150 ++++++++++++++++++
 4 files changed, 169 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

-- 
2.47.3


* Re: [PATCH net-next v9 01/10] net: phy: phy_link_topology: Add a helper for opportunistic alloc
From: Andrew Lunn @ 2026-04-14 23:23 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Jakub Kicinski, Eric Dumazet, Paolo Abeni, Russell King,
	Heiner Kallweit, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau
In-Reply-To: <20260403123755.175742-2-maxime.chevallier@bootlin.com>

On Fri, Apr 03, 2026 at 02:37:45PM +0200, Maxime Chevallier wrote:
> The phy_link_topology structure stores information about the PHY-related
> components connected to a net_device. It is opportunistically allocated,
> when we add the first item to the topology, as this is not relevant for
> all kinds of net_devices.
> 
> In preparation for the addition of phy_port tracking in the topology,
> let's make a dedicated helper for that allocation sequence.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

