Netdev List


* Re: [PATCH bpf-next 2/2] selftests/bpf: Cover small conntrack opts error writes
From: bot+bpf-ci @ 2026-06-16  6:19 UTC (permalink / raw)
  To: chenyy23, bpf, netfilter-devel
  Cc: chenyy23, pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
	andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <c4c898dd23181b676ebf6b6b4d9c54f51bb69c75.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

> selftests/bpf: Cover small conntrack opts error writes
>
> Add a conntrack kfunc regression check for opts__sz values that do not
> cover opts->error. The BPF program initializes opts->error with a guard
> value, calls the lookup and allocation kfuncs with opts__sz set to
> sizeof(opts->netns_id), and verifies that the guard is still intact
> after the kfunc returns NULL.
>
> Without the conntrack wrapper guard, the kfunc error path overwrites
> that guard with -EINVAL even though the verifier checked only the first
> four bytes of the options object.
>
> Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>

This is a regression test for the out-of-bounds opts->error write that
is fixed in the prior series commit 86b813c37ee5 ("bpf: Guard conntrack
opts error writes"), but it carries no Fixes: tags.

A regression test usually carries the same Fixes: tags as the fix it
covers, so it gets backported to stable alongside that fix. The two new
test cases map onto the two commits that introduced the kfuncs: the
small-opts lookup case covers the lookup kfunc and the small-opts alloc
case covers the alloc kfunc.

Should this include:

  Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
  Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27597459087


* Re: [BUG] kernel BUG in team driver: buffer overflow in team_add_slave()
From: Yeswanth Krishna @ 2026-06-16  6:08 UTC (permalink / raw)
  To: netdev, venkat88; +Cc: linux-kernel, linuxppc-dev
In-Reply-To: <a08b0a7f-089f-4428-9360-9edbec3a5453@linux.ibm.com>


> Please add below reported-by tag:
> Reported-by: Yeswanth Krishna Tellakula <yeswanth@linux.ibm.com>
>
>


* Re: [PATCH v3 0/3] net: stmmac: L3/L4 filter bug fixes
From: Maxime Chevallier @ 2026-06-16  6:06 UTC (permalink / raw)
  To: muhammad.nazim.amirul.nazle.asmade, netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
	Jose.Abreu, linux-kernel
In-Reply-To: <20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

Hi Nazim,

On 6/16/26 06:26, muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
> From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
> 
> This series fixes three bugs in the stmmac L3/L4 TC flower filter
> implementation for the XGMAC2 core. All three patches target net.
> 
> The L3/L4 filter match count statistics patch (originally patch 4/4)
> has been split out and will be sent separately against net-next per
> Andrew Lunn's review of v1.
> 
> Patch 1 fixes a register corruption bug in the L4 filter port configuration.
> The XGMAC_L4_ADDR register holds both source and destination port match
> values in a single register. The original code overwrites the entire register
> when setting either field, silently erasing the other. This is fixed by
> using a read-modify-write sequence.
> 
> Patch 2 fixes the basic flow match parser to properly reject unsupported
> offload requests with -EOPNOTSUPP instead of silently accepting them.
> Unsupported cases include partial protocol masks, non-IPv4 network proto,
> and non-TCP/UDP transport proto. Extack messages are now included so users
> know exactly which part of the match is unsupported. The -EOPNOTSUPP is
> also now returned directly instead of using break, which was silently
> discarding the error on FLOW_CLS_REPLACE operations.
> 
> Patch 3 fixes a stale action bug on filter deletion. When a filter entry
> with a drop action is deleted, the action field was not reset, causing
> it to persist and potentially affect subsequent filter configurations.
> 
> All three patches fix the original L3/L4 filter implementation introduced in
> 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower").
> 
> Changes in v3:
> - Patch 2: add extack messages to each -EOPNOTSUPP return (Jakub Kicinski)
> - Patch 2: return -EOPNOTSUPP directly instead of break to avoid silently
>   reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)
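
As an aside on patch 1 above, here is a minimal sketch of the read-modify-write pattern the cover letter describes. It assumes, purely for illustration, that the source port sits in bits 15:0 and the destination port in bits 31:16 of one 32-bit register; the mask names and helper functions are hypothetical and are not taken from the stmmac driver.

```c
#include <stdint.h>

/* Assumed layout for illustration only: source port in bits 15:0,
 * destination port in bits 31:16 of a single 32-bit register.
 */
#define L4_SRC_MASK	0x0000ffffu
#define L4_DST_MASK	0xffff0000u
#define L4_DST_SHIFT	16

/* Buggy pattern: writing one field replaces the whole register,
 * so a previously programmed destination port is lost.
 */
static uint32_t set_src_port_overwrite(uint32_t reg, uint16_t port)
{
	(void)reg;
	return (uint32_t)port;
}

/* Read-modify-write: clear only the source field, keep the rest. */
static uint32_t set_src_port_rmw(uint32_t reg, uint16_t port)
{
	reg &= ~L4_SRC_MASK;
	reg |= (uint32_t)port & L4_SRC_MASK;
	return reg;
}

/* Read-modify-write: clear only the destination field, keep the rest. */
static uint32_t set_dst_port_rmw(uint32_t reg, uint16_t port)
{
	reg &= ~L4_DST_MASK;
	reg |= ((uint32_t)port << L4_DST_SHIFT) & L4_DST_MASK;
	return reg;
}
```

In a driver the `reg` argument would come from a register read and the return value would be written back; the sketch keeps the value in a plain variable so the field handling can be checked in isolation.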

Please take a look at this page prior to reposting :

https://netdev.bots.linux.dev/net-next.html

There's also an announcement made on the netdev@ list when net-next
opens/closes.

You can't submit new series that target net-next during the merge window;
this revision will have to wait 2 weeks.

Maxime



* RE: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned folio
From: Kasireddy, Vivek @ 2026-06-16  6:04 UTC (permalink / raw)
  To: Bobby Eshleman, Donald Hunter, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Andrew Lunn,
	Gerd Hoffmann, Sumit Semwal, Christian König, Shuah Khan,
	Jason Gunthorpe
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
	sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
	almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
	dw@davidwei.uk, Bobby Eshleman
In-Reply-To: <20260611-tcpdm-large-niovs-v2-2-ee2bf15e7523@meta.com>

Adding Jason to this discussion.

Hi Bobby,

> Subject: [PATCH net-next v2 2/4] udmabuf: emit one sg entry per pinned
> folio
> 
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> underlying folio was larger.
> 
> Instead, walk folios[] and emit one sg entry per folio. When folios

We have recently merged a patch from Jason (it will make it into 7.2) that
replaced sg_set_folio() with sg_alloc_table_from_pages() in the udmabuf driver:
https://gitlab.freedesktop.org/drm/tip/-/commit/5bf888673e0dda5a53220fa0c4956271a46c353c

Your patch relies on sg_set_folio(). The core argument against using it in
udmabuf is that it doesn't work well with offsets > PAGE_SIZE, resulting in
a malformed scatterlist. I'm not sure whether this can be fixed easily.

> represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> page. Normal PAGE_SIZE sg tables are unchanged.
> 
> This is helpful for importers like net/core/devmem that expect dmabuf sg
IMO, udmabuf needs to detect whether importers can handle segments that
are > PAGE_SIZE and set the entries appropriately. Please look into how the
GPU drivers and other dmabuf exporters/importers handle this situation, so
that we can adopt best practices to address this issue.

Thanks,
Vivek

> entries to be size and length aligned. Prior to this patch udmabuf
> handed over one PAGE_SIZE sg entry per page, so devmem only saw
> PAGE_SIZE chunks regardless of the underlying folio size.
> 
> dma_map_sgtable() does not always merge contiguous pages for us, so we
> do this internally before exporting.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
>  drivers/dma-buf/udmabuf.c | 52 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 47 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 94b8ecb892bb..9b751dd98b12 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -141,26 +141,68 @@ static void vunmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
>  	vm_unmap_ram(map->vaddr, ubuf->pagecount);
>  }
> 
> +/* Return the number of contiguous pages backed by the folio at @i.
> + * A udmabuf may map only part of a folio, or reference the same folio
> + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
> + */
> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
> +{
> +	struct folio *f = ubuf->folios[i];
> +	pgoff_t j;
> +
> +	for (j = 1; i + j < ubuf->pagecount; j++) {
> +		if (ubuf->folios[i + j] != f)
> +			break;
> +		/* Same folio, but not a sequential offset within it. */
> +		if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * PAGE_SIZE)
> +			break;
> +	}
> +	return j;
> +}
> +
> +/* Count the contiguous folio runs in @ubuf, one sg entry per run.
> + *
> + * Coalescing folios into a single sg entry up front lets importers actually
> + * see large chunks. We can't rely on dma_map_sgtable() to do this for us as
> + * the dma_map_direct() path preserves the input scatterlist lengths verbatim.
> + */
> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
> +{
> +	unsigned int nents = 0;
> +	pgoff_t i;
> +
> +	for (i = 0; i < ubuf->pagecount; i += udmabuf_folio_nr_pages(ubuf, i))
> +		nents++;
> +	return nents;
> +}
> +
>  static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
>  				     enum dma_data_direction direction)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> -	struct sg_table *sg;
>  	struct scatterlist *sgl;
> -	unsigned int i = 0;
> +	struct sg_table *sg;
> +	pgoff_t i, run;
> +	unsigned int nents;
>  	int ret;
> 
> +	nents = udmabuf_sg_nents(ubuf);
> +
>  	sg = kzalloc_obj(*sg);
>  	if (!sg)
>  		return ERR_PTR(-ENOMEM);
> 
> -	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> +	ret = sg_alloc_table(sg, nents, GFP_KERNEL);
>  	if (ret < 0)
>  		goto err_alloc;
> 
> -	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> -		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> +	sgl = sg->sgl;
> +	for (i = 0; i < ubuf->pagecount; i += run) {
> +		run = udmabuf_folio_nr_pages(ubuf, i);
> +		sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
>  			     ubuf->offsets[i]);
> +		sgl = sg_next(sgl);
> +	}
> 
>  	ret = dma_map_sgtable(dev, sg, direction, 0);
>  	if (ret < 0)
> 
> --
> 2.53.0-Meta



* Re: [PATCH 0/18] pull request (net-next): ipsec-next 2026-06-12
From: Antony Antony @ 2026-06-16  5:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Steffen Klassert, Antony Antony, David Miller, Herbert Xu, netdev
In-Reply-To: <20260613131552.2562d433@kernel.org>

On Sat, Jun 13, 2026 at 01:15:52PM -0700, Jakub Kicinski wrote:
> On Fri, 12 Jun 2026 09:46:16 +0200 Steffen Klassert wrote:
> > 3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
> >    allows migrating individual IPsec SAs independently of
> >    their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
> >    to policy+SA migration, lacks SPI for unique SA identification,
> >    and cannot express reqid changes or migrate Transport mode
> >    selectors. The new interface identifies the SA via SPI and mark,
> >    supports reqid changes, address family changes, encap removal,
> >    and uses an atomic create+install flow under x->lock to prevent
> >    SN/IV reuse during AEAD SA migration.
> >    From Antony Antony.
> 
> Hi! There are some Sashiko comments here, please follow up:
> 
> https://sashiko.dev/#/patchset/20260612074725.1760473-8-steffen.klassert@secunet.com
> 

Thanks Jakub. I have fixes and am testing them now; I will send them soon.

The comments didn't click until I realized xfrm_user_state_lookup() only
keys on mark.v & mark.m, so distinct (v, m) pairs collapse to the same
masked value. A lookup key of {0, 0} matches a source SA with mark
{0, 0xffffff} (both mask to 0), but reusing {0, 0} as the migrated mark 
turns "match only mark 0x00" into "match all traffic".

The fix is to copy the mark from the old SA rather than from the old_mark
passed along. This also pointed to more issues.
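
To make the collapse concrete, here is a small sketch. It assumes a mark is a (v, m) pair, that the lookup key is v & m as described above, and that an SA's mark matches traffic when (mark & m) == v; struct mark_pair and both helpers are illustrative names, not the xfrm code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative (value, mask) pair; a stand-in, not the kernel type. */
struct mark_pair {
	uint32_t v;
	uint32_t m;
};

/* The lookup keys only on the masked value. */
static uint32_t mark_lookup_key(struct mark_pair k)
{
	return k.v & k.m;
}

/* Assumed matching rule: traffic matches the SA when the traffic
 * mark, masked with the SA's mask, equals the SA's value.
 */
static bool sa_mark_matches(struct mark_pair sa, uint32_t pkt_mark)
{
	return (pkt_mark & sa.m) == sa.v;
}
```

With these helpers, {0, 0} and {0, 0xffffff} produce the same lookup key, yet an SA carrying {0, 0xffffff} rejects a packet mark of 0x1 while an SA carrying {0, 0} accepts every mark. That is why the migrated SA has to inherit the mark from the old SA and not from the lookup key.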

-antony



* Re: [PATCH net v2] xfrm: Fix dev use-after-free in xfrm async resumption
From: Steffen Klassert @ 2026-06-16  6:01 UTC (permalink / raw)
  To: Dong Chenchen
  Cc: herbert, davem, edumazet, kuba, pabeni, horms, tpluszz77, idosch,
	netdev, zhangchangzhong, xuchunxiao3
In-Reply-To: <20260609092117.1362316-1-dongchenchen2@huawei.com>

On Tue, Jun 09, 2026 at 05:21:17PM +0800, Dong Chenchen wrote:
> xfrm async resumption hold skb->dev refcnt until after transport_finish.
> However, xfrm_rcv_cb may modify skb->dev to tunnel dev without taking
> device reference, such as vti_rcv_cb. The subsequent async resumption
> will decrement the tunnel device's reference count, which lead to uaf
> of tunnel dev and refcnt leak of orig dev as below:
> 
> unregister_netdevice: waiting for vti1 to become free. Usage count = -2
> 
> Stash the original skb->dev to fix refcnt imbalance. The new skb->dev set
> by xfrm_rcv_cb can race with device teardown. Extend rcu protection over
> xfrm_rcv_cb and transport_finish to prevent races.
> 
> Fixes: 1c428b038400 ("xfrm: hold dev ref until after transport_finish NF_HOOK")
> Reported-by: Xu Chunxiao <xuchunxiao3@huawei.com>
> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>

Applied to the ipsec tree, thanks Dong!


* Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Parthiban.Veerasooran @ 2026-06-16  6:01 UTC (permalink / raw)
  To: Selvamani.Rajagopal, andrew+netdev, davem, edumazet, kuba, pabeni,
	robh, krzk+dt, conor+dt, pier.beruto
  Cc: andrew, netdev, linux-kernel, Conor.Dooley, devicetree
In-Reply-To: <20260611-level-trigger-v5-1-4533a9e85ce2@onsemi.com>

Hi Selvamani,

I did a quick test by connecting a Mikroe LAN8651 Click to a Raspberry Pi
4; my feedback is below. Please let me know if you need any further
details.

Test case 1: Single LAN8651 instance on RPI4

Setup:

RPI4 #1 + LAN8651 (IP: 192.168.10.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.10.102)

Commands:

iperf3 -s -p 5001 <--- iperf3 -c 192.168.10.101 -u -b 9.4M -i 1 -t 0 -p 5001

Result:

No issues observed.

Test case 2: Two LAN8651 instances on the same RPI4

Setup:

RPI4 #1 + LAN8651 (IP: 192.168.10.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.10.102)
RPI4 #1 + LAN8651 (IP: 192.168.20.101) <--- RPI4 #2 + EVB-LAN8670-USB 
(IP: 192.168.20.102)

Result:

Initially it worked, with continuous "Receive buffer overflow" errors.
This is expected, as both USB devices transmit at full speed while the RPI4
can handle the traffic only up to a maximum SPI frequency of 15 MHz.
Eventually, the system crashed. The crash message was captured in the
dmesg log:

[ 8276.676335] net_ratelimit: 2448 callbacks suppressed
[ 8276.676341] eth1: Receive buffer overflow error
[ 8276.676349] eth2: Receive buffer overflow error
[ 8276.680025] eth2: Receive buffer overflow error
[ 8276.680033] eth1: Receive buffer overflow error
[ 8276.683701] eth2: Receive buffer overflow error
[ 8276.683710] eth1: Receive buffer overflow error
[ 8276.687378] eth2: Receive buffer overflow error
[ 8276.687387] eth1: Receive buffer overflow error
[ 8276.691055] eth2: Receive buffer overflow error
[ 8276.691064] eth1: Receive buffer overflow error
[ 8281.662600] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
[ 8281.670936] Mem abort info:
[ 8281.673747]   ESR = 0x0000000096000005
[ 8281.677544]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 8281.682917]   SET = 0, FnV = 0
[ 8281.685997]   EA = 0, S1PTW = 0
[ 8281.689173]   FSC = 0x05: level 1 translation fault
[ 8281.694109] Data abort info:
[ 8281.697017]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 8281.702571]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 8281.707680]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 8281.713056] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000040a1d000
[ 8281.719578] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 8281.728391] Internal error: Oops: 0000000096000005 [#1]  SMP
[ 8281.734115] Modules linked in: sch_fq lan865x_t1s(O) microchip_t1s(O) snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm algif_hash aes_neon_bs algif_skcipher af_alg bnep binfmt_misc brcmfmac_cyw brcmfmac vc4 hci_uart brcmutil btbcm bluetooth v3d snd_soc_hdmi_codec bcm2835_isp cfg80211 rpi_hevc_dec bcm2835_codec(C) drm_exec bcm2835_v4l2(C) drm_display_helper ecdh_generic cec ecc bcm2835_mmal_vchiq gpu_sched videobuf2_vmalloc vc_sm_cma v4l2_mem2mem drm_dma_helper rfkill crc_ccitt drm_client_lib drm_shmem_helper videobuf2_dma_contig videobuf2_memops drm_kms_helper videobuf2_v4l2 snd_soc_core videodev raspberrypi_hwmon snd_bcm2835(C) snd_compress i2c_brcmstb snd_pcm_dmaengine snd_pcm videobuf2_common snd_timer mc raspberrypi_gpiomem spi_bcm2835 snd gpio_fan nvmem_rmem sch_fq_codel i2c_dev zram lz4_compress drm fuse drm_panel_orientation_quirks backlight nfnetlink
[ 8281.811847] CPU: 3 UID: 0 PID: 1759 Comm: irq/59-spi0.0 Tainted: G       C O        7.1.0-rc7-v8+ #1 PREEMPT
[ 8281.822067] Tainted: [C]=CRAP, [O]=OOT_MODULE
[ 8281.826473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 8281.832377] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8281.839427] pc : skb_put+0x14/0x80
[ 8281.842864] lr : oa_tc6_macphy_threaded_irq+0x428/0x880 [lan865x_t1s]
[ 8281.849386] sp : ffffffc083c4bd40
[ 8281.852735] x29: ffffffc083c4bd40 x28: 000000002020003e x27: ffffffe59e5609c8
[ 8281.859962] x26: ffffff8103d55080 x25: 0000000000000001 x24: ffffff8040566880
[ 8281.867187] x23: 0000000000000001 x22: 0000000000000000 x21: 0000000000000000
[ 8281.874414] x20: 000000003e002020 x19: ffffff80405668a0 x18: 00000000000b6748
[ 8281.881641] x17: ffffff9b64502000 x16: ffffffe59f08cef0 x15: 1ae8add2c0a08935
[ 8281.888867] x14: b154d86008ee08e7 x13: b66a1ae8add2c0a0 x12: 8935b154d86008ee
[ 8281.896093] x11: 00000000000000c0 x10: 0000000000001ae0 x9 : ffffffe590bb7918
[ 8281.903320] x8 : ffffff80493c1b40 x7 : 0000000000000004 x6 : ffffffffffffffff
[ 8281.910547] x5 : ffffffe59fb9d000 x4 : 0000000000000004 x3 : 0000000000000000
[ 8281.917773] x2 : 0000000000000000 x1 : 0000000000000040 x0 : 0000000000000000
[ 8281.925000] Call trace:
[ 8281.927468]  skb_put+0x14/0x80 (P)
[ 8281.930905]  oa_tc6_macphy_threaded_irq+0x428/0x880 [lan865x_t1s]
[ 8281.937073]  irq_thread_fn+0x34/0xc0
[ 8281.940686]  irq_thread+0x1a8/0x308
[ 8281.944212]  kthread+0x138/0x150
[ 8281.947472]  ret_from_fork+0x10/0x20
[ 8281.951089] Code: d503201f d503233f a9bf7bfd 910003fd (b9407406)
[ 8281.957258] ---[ end trace 0000000000000000 ]---
[ 8281.961969] genirq: exiting task "irq/59-spi0.0" (1759) is an active IRQ thread (irq 59)
[ 8282.080140] irq 59: nobody cared (try booting with the "irqpoll" option)
[ 8282.086344] CPU: 0 UID: 0 PID: 15 Comm: rcu_preempt Tainted: G      D   C O        7.1.0-rc7-v8+ #1 PREEMPT
[ 8282.086352] Tainted: [D]=DIE, [C]=CRAP, [O]=OOT_MODULE
[ 8282.086354] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 8282.086357] Call trace:
[ 8282.086359]  show_stack+0x20/0x38 (C)
[ 8282.086372]  dump_stack_lvl+0x60/0x80
[ 8282.086378]  dump_stack+0x18/0x24
[ 8282.086382]  __report_bad_irq+0x54/0xf0
[ 8282.086388]  note_interrupt+0x344/0x398
[ 8282.086393]  handle_irq_event+0xa4/0x110
[ 8282.086397]  handle_level_irq+0xe0/0x178
[ 8282.086401]  handle_irq_desc+0x3c/0x68
[ 8282.086407]  generic_handle_domain_irq+0x20/0x40
[ 8282.086413]  bcm2835_gpio_irq_handle_bank+0x180/0x1c8
[ 8282.086420]  bcm2835_gpio_irq_handler+0x88/0x188
[ 8282.086424]  handle_irq_desc+0x3c/0x68
[ 8282.086430]  generic_handle_domain_irq+0x20/0x40
[ 8282.086435]  gic_handle_irq+0x4c/0xe0
[ 8282.086438]  call_on_irq_stack+0x30/0x88
[ 8282.086444]  do_interrupt_handler+0x88/0x98
[ 8282.086447]  el1_interrupt+0x3c/0x60
[ 8282.086452]  el1h_64_irq_handler+0x18/0x30
[ 8282.086457]  el1h_64_irq+0x6c/0x70
[ 8282.086460]  _raw_spin_unlock_irq+0x10/0x60 (P)
[ 8282.086465]  rcu_gp_kthread+0x2f0/0x310
[ 8282.086471]  kthread+0x138/0x150
[ 8282.086476]  ret_from_fork+0x10/0x20
[ 8282.086481] handlers:
[ 8282.170193] lan8650 spi0.1: SPI transfer timed out
[ 8282.170591] [<0000000094f492f8>] oa_tc6_macphy_isr [lan865x_t1s]
[ 8282.174845] spi_master spi0: failed to transfer one message from queue
[ 8282.178435]  threaded [<000000008b769ba3>] oa_tc6_macphy_threaded_irq [lan865x_t1s]
[ 8282.178443] spi_master spi0: noqueue transfer failed

[ 8282.182578] Disabling IRQ #59
[ 8282.238458] lan8650 spi0.1 eth2: SPI data transfer failed: -110
[ 8282.244490] lan8650 spi0.1: Device interrupt disabled to avoid interrupt storm

Best regards,
Parthiban V

On 12/06/26 3:25 am, Selvamani Rajagopal via B4 Relay wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> 
> According OPEN Alliance 10BASET1x MAC-PHY Serial Interface
> specification, interrupt is active low, level triggered.
> 
> Code used edge triggered interrupt which has the risk of losing an
> interrupt on instances like when interrupt is disabled. Level
> triggered interrupt won't be deasserted unless handler runs and
> clear the interrupting conditions.
> 
> Interrupt handler mechanism is changed to threaded irq from
> interrupt handler and kernel thread waiting on work queue.
> Threaded irq mechanism is best suited for level triggered interrupt
> as it disables the interrupt until handler is run in thread level,
> while giving us an ability to have interrupt context handler to
> signal the threaded irq handler.
> 
> Introduced a logic to disable the device interrupt on error. Error
> could be due in data chunk's header and footer or SPI interface itself.
> This will avoid having repeated interrupts, in case the driver couldn't
> recover from the error condition with the available recovery mechanism.
> 
> Fixes: 2c6ce5354453 ("net: ethernet: oa_tc6: implement mac-phy interrupt")
> Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>


* Re: [PATCH net v2] net: af_key: initialize alg_key_len for IPComp states
From: Steffen Klassert @ 2026-06-16  6:00 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Zijing Yin, Herbert Xu, David S . Miller, Eric Dumazet,
	Paolo Abeni, Ido Schimmel, Simon Horman, netdev, linux-kernel,
	stable
In-Reply-To: <aibn3tkGc3Iz1r5n@krikkit>

On Mon, Jun 08, 2026 at 06:03:42PM +0200, Sabrina Dubroca wrote:
> note: fixes for IPsec should go to the "ipsec" tree, not net
> 
> 2026-06-08, 07:44:41 -0700, Zijing Yin wrote:
> > pfkey_msg2xfrm_state() handles the IPComp (SADB_X_SATYPE_IPCOMP) case by
> > allocating x->calg and copying only the algorithm name:
> > 
> > 	x->calg = kmalloc_obj(*x->calg);
> > 	if (!x->calg) {
> > 		err = -ENOMEM;
> > 		goto out;
> > 	}
> > 	strcpy(x->calg->alg_name, a->name);
> > 	x->props.calgo = sa->sadb_sa_encrypt;
> > 
> > Unlike the authentication (x->aalg) and encryption (x->ealg) branches of
> > the same function, the compression branch never initializes
> > calg->alg_key_len.  IPComp carries no key and the allocation only
> > reserves sizeof(struct xfrm_algo) (i.e. no room for a key), so the field
> > is left containing uninitialized slab data.
> > 
> > calg->alg_key_len is later used as a length by xfrm_algo_clone() when an
> > IPComp state is cloned during XFRM_MSG_MIGRATE:
> 
> The patch looks correct, but do we want to start fixing random bugs in
> code that we're trying to get rid of and that nobody actually uses?
> 
> If we do, then:
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

As long as we have the code in the repo, we do.

Applied, thanks everyone!


* Re: [PATCH ipsec] xfrm: use compat translator only for u64 alignment mismatch
From: Steffen Klassert @ 2026-06-16  5:58 UTC (permalink / raw)
  To: Pradhan, Sanman
  Cc: netdev@vger.kernel.org, herbert@gondor.apana.org.au,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, 0x7f454c46@gmail.com,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Sanman Pradhan
In-Reply-To: <20260607164726.1544435-1-sanman.pradhan@hpe.com>

On Sun, Jun 07, 2026 at 04:47:34PM +0000, Pradhan, Sanman wrote:
> From: Sanman Pradhan <psanman@juniper.net>
> 
> The XFRM compat layer (CONFIG_XFRM_USER_COMPAT) translates 32-bit xfrm
> netlink and setsockopt messages into the native 64-bit layout. It is
> only needed on architectures where the 32-bit and 64-bit ABIs disagree
> on u64 alignment, which the kernel encodes as COMPAT_FOR_U64_ALIGNMENT.
> 
> That symbol is defined only by arch/x86. XFRM_USER_COMPAT depends on it,
> so the translator can never be built on any other architecture,
> including arm64, which still provides a 32-bit compat ABI (CONFIG_COMPAT)
> for AArch32 EL0 userspace. On arm64 the AArch32 EABI already aligns u64
> to 8 bytes, identical to the AArch64 ABI, so no translation is required
> and the native code path is correct for 32-bit tasks.
> 
> However, xfrm_user_rcv_msg() and xfrm_user_policy() gate on
> in_compat_syscall() alone and then call xfrm_get_translator(), which
> returns NULL when no translator is registered. On arm64 that is always
> the case, so every xfrm netlink message and the XFRM_POLICY setsockopt
> issued by a 32-bit task returns -EOPNOTSUPP. A 32-bit userspace process
> on arm64 (and on any other arch with CONFIG_COMPAT but without
> COMPAT_FOR_U64_ALIGNMENT) therefore cannot configure XFRM state or
> policy through the XFRM_USER netlink API, and cannot use the XFRM_POLICY
> setsockopt path, because both fail before reaching the native parser.
> 
> The translator series replaced the blanket compat rejection with a
> translator lookup. That made the path usable on x86 when the translator
> is available, but left architectures that cannot build the translator
> permanently rejected even when their compat layout already matches the
> native layout. Let those architectures use the native parser instead.
> 
> Gate the translator requirement on COMPAT_FOR_U64_ALIGNMENT instead of
> on in_compat_syscall() alone. Gating on the ABI property rather than on
> CONFIG_XFRM_USER_COMPAT is deliberate: on x86 with IA32_EMULATION=y but
> XFRM_USER_COMPAT=n, a 32-bit task must still be rejected rather than
> routed through the native parser, which would misread genuinely
> 4-byte-aligned x86-32 messages. COMPAT_FOR_U64_ALIGNMENT is the ABI
> property that makes the XFRM translator mandatory.
> 
> Only the receive/input direction needs the guard. The send, dump and
> notification paths already call the translator as "if (xtr) { ... }"
> with no error on NULL, so on arches without a translator they no-op and
> the kernel emits native 64-bit-layout messages, which is what an AArch32
> task expects.
> 
> Tested on Juniper SRX hardware: with the fix, 32-bit IPsec userspace
> netlink and XFRM_POLICY setsockopt operations that previously failed
> with -EOPNOTSUPP now succeed; x86 behaviour is unchanged by inspection.
> 
> Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
> Fixes: 96392ee5a13b ("xfrm/compat: Translate 32-bit user_policy from sockptr")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sanman Pradhan <psanman@juniper.net>

Patch applied, thanks a lot!


* Re: [PATCH net-next v3 1/2] net: dsa: realtek: rtl8365mb: add SGMII support for RTL8367S
From: Maxime Chevallier @ 2026-06-16  5:55 UTC (permalink / raw)
  To: Johan Alvarado, linusw, alsi, andrew, olteanv, davem, edumazet,
	kuba, pabeni, netdev
  Cc: linux, namiltd, luizluca, linux-kernel
In-Reply-To: <0100019ecd045b3f-a7bfbb9d-6659-45c6-8fca-9cdce637092c-000000@email.amazonses.com>

Hi Johan,

On 6/15/26 22:41, Johan Alvarado wrote:
>> This comment implies that you could deal with SGMII aneg at some point.
>> [...] makes me wonder if this whole SGMII/2500BaseX series should be
>> represented as a PCS phylink driver.
> 
> Hi Maxime,
> 
> You're right, and I'll convert the SerDes path to a phylink_pcs for v4.
> It splits the MAC and SerDes layers cleanly, drops the "ext then sds"
> branches in mac_link_up/down, and makes future in-band aneg an additive
> change instead of a rewrite.

Great!

> 
> One point I'd like to confirm on scope: I can only test the forced-link
> path on my MR80X (fixed-link / conventional PHY), and I have no setup to
> exercise SGMII in-band autonegotiation. My plan is to do the PCS refactor
> keeping the link forced (outband / no in-band AN), and leave actual
> in-band aneg support for a follow-up once I have hardware to validate it.
> Does limiting v4 to the forced path sound acceptable, or would you prefer
> in-band aneg implemented up front? I'd rather not add a code path I can't
> test.

That's fine by me, it's usually better to have something smaller but fully
tested :)

> 
> I'll also reword the misleading "disable in-band aneg" comment.
> 
> net-next being closed until the 29th gives me time to do this properly,
> so v4 will carry the PCS conversion, retested on the MR80X v2.20.

Thanks,

Maxime

> 
> Best regards,
> Johan



* Re: [PATCH stable 6.6.y v3 1/4] bpf: Track equal scalars history on per-instruction level
From: Shung-Hsi Yu @ 2026-06-16  5:51 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: bpf, netdev, linux-kernel, ast, daniel, john.fastabend, andrii,
	martin.lau, song, yonghong.song, kpsingh, haoluo, jolsa,
	menglong8.dong, eddyz87, stable, mykolal, tamird, Hao Sun
In-Reply-To: <7f27d335fa6280d5eb04e7b27a7e3d7e7ac1d641.1781194510.git.jt26wzz@gmail.com>

On Mon, Jun 15, 2026 at 12:58:38AM +0800, Zhenzhong Wu wrote:
[...]
> +/* For all R being scalar registers or spilled scalar registers
> + * in verifier state, save R in linked_regs if R->id == id.
> + * If there are too many Rs sharing same id, reset id for leftover Rs.
> + */
> +static void collect_linked_regs(struct bpf_verifier_state *vstate, u32 id,
> +				struct linked_regs *linked_regs)
> +{
> +	struct bpf_func_state *func;
>  	struct bpf_reg_state *reg;
> +	int i, j;
>  
> -	bpf_for_each_reg_in_vstate(vstate, state, reg, ({
> -		if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) {
> +	for (i = vstate->curframe; i >= 0; i--) {
> +		func = vstate->frame[i];
> +		for (j = 0; j < BPF_REG_FP; j++) {
> +			reg = &func->regs[j];
> +			__collect_linked_regs(linked_regs, reg, id, i, j, true);
> +		}
> +		for (j = 0; j < func->allocated_stack / BPF_REG_SIZE; j++) {
> +			if (!is_spilled_reg(&func->stack[j]))
> +				continue;
> +			reg = &func->stack[j].spilled_ptr;
> +			__collect_linked_regs(linked_regs, reg, id, i, j, false);
> +		}
> +	}
> +
> +	if (linked_regs->cnt == 1)
> +		linked_regs->cnt = 0;

This part seems new: it is not in the original commit, nor in bpf-next.
Can you add some explanation (in the notes before your Signed-off-by) of
why it is needed?

> +}
[...]
> @@ -14704,6 +14899,21 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>  		return 0;
>  	}
>  
> +	/* Push scalar registers sharing same ID to jump history,
> +	 * do this before creating 'other_branch', so that both
> +	 * 'this_branch' and 'other_branch' share this history
> +	 * if parent state is created.
> +	 */
> +	if (BPF_SRC(insn->code) == BPF_X && src_reg->type == SCALAR_VALUE && src_reg->id)
> +		collect_linked_regs(this_branch, src_reg->id, &linked_regs);
> +	if (dst_reg->type == SCALAR_VALUE && dst_reg->id)
> +		collect_linked_regs(this_branch, dst_reg->id, &linked_regs);
> +	if (linked_regs.cnt > 0) {

Same here: the original commit and bpf-next have the '> 1' condition,
whereas yours has '> 0'. Can you also add some explanation for this
part?

> +		err = push_jmp_history(env, this_branch, 0, linked_regs_pack(&linked_regs));
> +		if (err)
> +			return err;
> +	}
> +
...


* [PATCH bpf-next 1/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

The conntrack lookup and allocation kfuncs take an opts pointer
together with an opts__sz argument. The verifier checks only the memory
range described by opts__sz, but the wrappers unconditionally write
opts->error whenever the internal lookup or allocation helper returns an
error.

For an invalid size smaller than the end of opts->error, that write can
land outside the verifier-checked range. Keep returning NULL for invalid
arguments, but only report the error through opts->error when the
supplied size includes the field.

This preserves error reporting for the supported 12-byte and 16-byte
layouts, and for other invalid sizes that still include opts->error.

Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF")
Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT")
Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 net/netfilter/nf_conntrack_bpf.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 40c261cd0af38..3c182024ec509 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -65,6 +65,11 @@ enum {
 	NF_BPF_CT_OPTS_SZ = 16,
 };
 
+static bool bpf_ct_opts_has_error(u32 opts_len)
+{
+	return opts_len >= offsetofend(struct bpf_ct_opts, error);
+}
+
 static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple,
 				 u32 tuple_len, u8 protonum, u8 dir,
 				 struct nf_conntrack_tuple *tuple)
@@ -298,7 +303,8 @@ bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 	nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz,
 				       opts, opts__sz, 10);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 
@@ -332,7 +338,8 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple,
 	caller_net = dev_net(ctx->rxq->dev);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 	return nfct;
@@ -364,7 +371,8 @@ bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 	net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 
@@ -398,7 +406,8 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple,
 	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
 	nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz);
 	if (IS_ERR(nfct)) {
-		opts->error = PTR_ERR(nfct);
+		if (bpf_ct_opts_has_error(opts__sz))
+			opts->error = PTR_ERR(nfct);
 		return NULL;
 	}
 	return nfct;
-- 
2.34.1


^ permalink raw reply related

* [PATCH bpf-next 2/2] selftests/bpf: Cover small conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn>

Add a conntrack kfunc regression check for opts__sz values that do not
cover opts->error. The BPF program initializes opts->error with a guard
value, calls the lookup and allocation kfuncs with opts__sz set to
sizeof(opts->netns_id), and verifies that the guard is still intact
after the kfunc returns NULL.

Without the conntrack wrapper guard, the kfunc error path overwrites
that guard with -EINVAL even though the verifier checked only the first
four bytes of the options object.

Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn>
---
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index b33dba4b126e2..14d4c1793aed5 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -5,6 +5,8 @@
 #include "test_bpf_nf.skel.h"
 #include "test_bpf_nf_fail.skel.h"
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 static char log_buf[1024 * 1024];
 
 struct {
@@ -119,6 +121,10 @@ static void test_bpf_nf_ct(int mode)
 	ASSERT_EQ(skel->bss->test_einval_reserved_new, -EINVAL, "Test EINVAL for reserved in new struct not set to 0");
 	ASSERT_EQ(skel->bss->test_einval_netns_id, -EINVAL, "Test EINVAL for netns_id < -1");
 	ASSERT_EQ(skel->bss->test_einval_len_opts, -EINVAL, "Test EINVAL for len__opts != NF_BPF_CT_OPTS_SZ");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_lookup, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for lookup opts__sz before error field");
+	ASSERT_EQ(skel->bss->test_einval_len_opts_small_alloc, CT_OPTS_ERROR_GUARD,
+		  "Test no error write for alloc opts__sz before error field");
 	ASSERT_EQ(skel->bss->test_eproto_l4proto, -EPROTO, "Test EPROTO for l4proto != TCP or UDP");
 	ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id");
 	ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup");
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
index 076fbf03a1268..df43649ecb785 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
@@ -10,6 +10,8 @@
 #define EINVAL 22
 #define ENOENT 2
 
+#define CT_OPTS_ERROR_GUARD 0x12345678
+
 #define NF_CT_ZONE_DIR_ORIG (1 << IP_CT_DIR_ORIGINAL)
 #define NF_CT_ZONE_DIR_REPL (1 << IP_CT_DIR_REPLY)
 
@@ -19,6 +21,8 @@ int test_einval_reserved = 0;
 int test_einval_reserved_new = 0;
 int test_einval_netns_id = 0;
 int test_einval_len_opts = 0;
+int test_einval_len_opts_small_lookup = 0;
+int test_einval_len_opts_small_alloc = 0;
 int test_eproto_l4proto = 0;
 int test_enonet_netns_id = 0;
 int test_enoent_lookup = 0;
@@ -124,6 +128,28 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
 	else
 		test_einval_len_opts = opts_def.error;
 
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def.netns_id));
+	if (ct) {
+		bpf_ct_release(ct);
+		test_einval_len_opts_small_lookup = -EINVAL;
+	} else {
+		test_einval_len_opts_small_lookup = opts_def.error;
+	}
+
+	opts_def.error = CT_OPTS_ERROR_GUARD;
+	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		      sizeof(opts_def.netns_id));
+	if (ct) {
+		ct = bpf_ct_insert_entry(ct);
+		if (ct)
+			bpf_ct_release(ct);
+		test_einval_len_opts_small_alloc = -EINVAL;
+	} else {
+		test_einval_len_opts_small_alloc = opts_def.error;
+	}
+
 	opts_def.l4proto = IPPROTO_ICMP;
 	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
 		       sizeof(opts_def));
-- 
2.34.1


^ permalink raw reply related

* [PATCH bpf-next 0/2] bpf: Guard conntrack opts error writes
From: Yiyang Chen @ 2026-06-16  5:42 UTC (permalink / raw)
  To: bpf, netfilter-devel
  Cc: Yiyang Chen, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, andrii, eddyz87, ast, daniel, memxor, martin.lau, song,
	yonghong.song, jolsa, emil, shuah, kartikey406, coreteam, netdev,
	linux-kernel, linux-kselftest

The conntrack lookup/allocation kfuncs expose an opts/opts__sz pair.
The verifier checks the caller-provided opts__sz range, but the wrappers
currently write opts->error after internal errors even when opts__sz is too
small to include that field.

Patch 1 writes opts->error only when opts__sz includes it.
Patch 2 adds a bpf_nf regression check that keeps a guard in opts->error
while passing opts__sz covering only netns_id.

The regression check follows the existing bpf_nf test shape.  Before the
fix, the guard is overwritten with -EINVAL even though opts__sz covers only
the first four bytes of the options object.  After the fix, the kfunc still
returns NULL for the invalid size, but the guard remains intact.
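
For readers who want to see the size check in isolation, here is a
minimal userspace sketch of the guard described above. It is not the
kernel code: struct ct_opts_model is a hypothetical stand-in modeled on
struct bpf_ct_opts (only the positions of netns_id and error matter),
offsetofend() is re-defined locally, and the model_* helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical 16-byte layout (assumption): netns_id occupies bytes
 * 0..3 and error occupies bytes 4..7, as the cover letter implies.
 */
struct ct_opts_model {
	int32_t netns_id;
	int32_t error;
	uint8_t l4proto;
	uint8_t dir;
	uint16_t ct_zone_id;
	uint8_t ct_zone_dir;
	uint8_t reserved[3];
};

/* True only if the caller-supplied size covers the whole error field. */
static bool model_opts_has_error(uint32_t opts_len)
{
	return opts_len >= offsetofend(struct ct_opts_model, error);
}

/* Model of the wrapper error path: the caller always gets NULL back,
 * but the error code is stored only when the size includes the field.
 */
static void model_report_error(struct ct_opts_model *opts,
			       uint32_t opts_len, int err)
{
	assert(opts != NULL);
	if (model_opts_has_error(opts_len))
		opts->error = err;
}
```

With this model, a size of sizeof(opts.netns_id) leaves a guard value
stored in error untouched, while sizes of 8, 12, or 16 bytes still get
the error code written, which matches the behavior the series aims for.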

Validation, rebased and tested on bpf-next master e4287bf34f97
("selftests/bpf: Work around llvm stack overflow in crypto progs"):

  git diff --check origin/master..HEAD: OK
  scripts/checkpatch.pl --strict on 1/2 and 2/2: OK
  make O=/root/ebpf-verifier-bug-detection/kernel-build/bpf-next \
    net/netfilter/nf_conntrack_bpf.o: OK
  git am of exported 1/2 and 2/2 on a fresh worktree at base: OK
  range-diff between branch commits and git-am result: equivalent

The local direct clang build of test_bpf_nf.c is blocked by the local
kernel BTF/config: this environment's generated vmlinux.h lacks
struct nf_conn.mark, which is used by pre-existing test_bpf_nf.c code.
The changed kernel object and generated patch application were validated.

Yiyang Chen (2):
  bpf: Guard conntrack opts error writes
  selftests/bpf: Cover small conntrack opts error writes

 net/netfilter/nf_conntrack_bpf.c              | 17 +++++++++---
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  6 +++++
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 26 +++++++++++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)

base-commit: e4287bf34f97a88c7d9322f5bde828724c073a6b
-- 
2.34.1

^ permalink raw reply

* Re: [PATCH] swiotlb: avoid double copy with swiotlb on tx socket
From: kernel test robot @ 2026-06-16  5:31 UTC (permalink / raw)
  To: Luigi Rizzo, rizzo.unipi, m.szyprowski, robin.murphy, willemb,
	kuniyu, davem, edumazet, kuba, pabeni
  Cc: llvm, oe-kbuild-all, gregkh, rafael, akpm, david, netdev,
	linux-mm, iommu, driver-core, linux-kernel
In-Reply-To: <20260615234220.3946885-1-lrizzo@google.com>

Hi Luigi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v7.1 next-20260615]
[cannot apply to driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luigi-Rizzo/swiotlb-avoid-double-copy-with-swiotlb-on-tx-socket/20260616-074655
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260615234220.3946885-1-lrizzo%40google.com
patch subject: [PATCH] swiotlb: avoid double copy with swiotlb on tx socket
config: powerpc-pmac32_defconfig (https://download.01.org/0day-ci/archive/20260616/202606161322.zGyw68Qa-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project e19d1f51a2c80b63cd8ca95bcc757b7077112808)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260616/202606161322.zGyw68Qa-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606161322.zGyw68Qa-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: net/core/sock.c:3215 function parameter 'sk' not described in '__skb_page_frag_refill'
>> Warning: net/core/sock.c:3215 expecting prototype for skb_page_frag_refill(). Prototype was for __skb_page_frag_refill() instead

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH][net-next] net/mlx5: Remove broken and unused mlx5_query_mtppse()
From: Gal Pressman @ 2026-06-16  5:28 UTC (permalink / raw)
  To: lirongqing, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-rdma, linux-kernel
In-Reply-To: <20260615140406.1828-1-lirongqing@baidu.com>

On 15/06/2026 17:04, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> mlx5_query_mtppse() reads the Event Trigger Pin (MTPPSE) register but
> reads the returned arm and mode values from the input buffer 'in'
> instead of the output buffer 'out', so it always returns the values
> that were written rather than the actual hardware state, making the
> query useless.
> 
> The function has no in-tree callers. Remove it rather than fix it.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

Reviewed-by: Gal Pressman <gal@nvidia.com>

^ permalink raw reply

* Re: [PATCH stable 6.6.y v3 0/4] bpf: linked scalar precision fixes
From: Shung-Hsi Yu @ 2026-06-16  5:22 UTC (permalink / raw)
  To: Zhenzhong Wu
  Cc: Sasha Levin, Paul Chaignon, bpf, netdev, linux-kernel, ast,
	daniel, john.fastabend, andrii, martin.lau, song, yonghong.song,
	kpsingh, haoluo, jolsa, menglong8.dong, eddyz87, stable, mykolal,
	tamird
In-Reply-To: <ajCB9jXBzPyaDNSQ@mail.gmail.com>

On Tue, Jun 16, 2026 at 12:51:34AM +0200, Paul Chaignon wrote:
> On Mon, Jun 15, 2026 at 12:58:37AM +0800, Zhenzhong Wu wrote:
> > Hi,
> > 
> > This v3 targets 6.6.y and changes the backport strategy based on review
> > feedback on v2.
> 
> [...]
> 
> > Relevant QEMU selftest results on 6.6.y with this backport:
> > 
> >   verifier_scalar_ids passed all 18 subtests, including the newly
> >   backported linked-scalar precision tests and the related
> >   check_ids_in_regsafe tests.
> 
> The first patch in this backport series is actually breaking the
> "precise: test 1" selftest from test_verifier. You can see the full
> error at [1]. I haven't yet checked if it's the test or the backport
> that needs to be adjusted.

I had a quick look, and I believe it is the test that needs to be
adjusted to include r9 in the precise register set.

So unless Sasha has another preference, I suggest Zhenzhong send a v4,
with changes to tools/testing/selftests/bpf/verifier/precise.c
(adding "r9" to the expected verifier output) merged into "bpf:
Track equal scalars history on per-instruction level".

---

The program under test is:

  00: BPF_MOV64_IMM(BPF_REG_0, 1),
  01: BPF_LD_MAP_FD(BPF_REG_6, 0),
  03: BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
  04: BPF_MOV64_REG(BPF_REG_2, BPF_REG_FP),
  05: BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  06: BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0),
  07: BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
  08: BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
  09: BPF_EXIT_INSN(),

  10: BPF_MOV64_REG(BPF_REG_9, BPF_REG_0),

  11: BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
  12: BPF_MOV64_REG(BPF_REG_2, BPF_REG_FP),
  13: BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  14: BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
  15: BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
  16: BPF_EXIT_INSN(),

  17: BPF_MOV64_REG(BPF_REG_8, BPF_REG_0),

  18: BPF_ALU64_REG(BPF_SUB, BPF_REG_9, BPF_REG_8), /* map_value_ptr -= map_value_ptr */
  19: BPF_MOV64_REG(BPF_REG_2, BPF_REG_9),
  20: BPF_JMP_IMM(BPF_JLT, BPF_REG_2, 8, 1),
  21: BPF_EXIT_INSN(),

  22: BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 1), /* R2=scalar(umin=1, umax=8) */
  23: BPF_MOV64_REG(BPF_REG_1, BPF_REG_FP),
  24: BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
  25: BPF_MOV64_IMM(BPF_REG_3, 0),
  26: BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel),
  27: BPF_EXIT_INSN(),

The test expects the following lines in the verifier log, emitted
during the backtracking that starts at instruction 26 (call
bpf_probe_read_kernel#113):

  mark_precise: frame0: regs=r2 stack= before 20: (a5) if r2 < 0x8 goto pc+1
  mark_precise: frame0: parent state regs=r2 stack=: ...
  mark_precise: frame0: last_idx 19 first_idx 10 ...

But after applying the patchset, we now get an additional register r9 in
the precise set:

  mark_precise: frame0: regs=r2 stack= before 20: (a5) if r2 < 0x8 goto pc+1
  mark_precise: frame0: parent state regs=r2,r9 stack=: ....
  mark_precise: frame0: last_idx 19 first_idx 10 ...

The additional r9 in the precise set actually seems correct: r2 and r9
share the same scalar ID at instruction 20 (before the link is broken
at instruction 22), and hence at that point both registers should be
marked as precise.
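
To make the reasoning concrete, here is a toy model of the idea, not
the verifier implementation: struct model_reg, MODEL_NR_REGS and
model_precise_mask() are hypothetical names, and the model ignores
stack slots and call frames. It only shows that marking one scalar
precise pulls in every other scalar carrying the same non-zero ID.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_REGS 10	/* r0..r9 in this toy model */

/* Hypothetical, simplified register state: id == 0 means "not linked". */
struct model_reg {
	int is_scalar;
	uint32_t id;
};

/* Return a bitmask of the registers that become precise when register
 * 'target' is marked precise: the target itself plus every scalar
 * register that shares its non-zero id.
 */
static uint32_t model_precise_mask(const struct model_reg *regs, int target)
{
	uint32_t mask;
	int i;

	assert(target >= 0 && target < MODEL_NR_REGS);
	mask = 1u << target;

	/* An unlinked register drags nobody else along. */
	if (!regs[target].is_scalar || !regs[target].id)
		return mask;

	for (i = 0; i < MODEL_NR_REGS; i++)
		if (regs[i].is_scalar && regs[i].id == regs[target].id)
			mask |= 1u << i;

	return mask;
}
```

In the state at instruction 20, r2 and r9 both carry id=3, so asking
for r2 yields a mask with bits 2 and 9 set, i.e. "regs=r2,r9". Once r2
loses its ID (as after r2 += 1 on 6.6.y), the same query yields r2
alone.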

---

Upstream, the test's expected verifier log already includes r9, hence
no failure there, but that simply comes from the fact that r2 and r9
maintain a link even after instruction 22 (r2 += 1).

  commit 98d7ca374ba4b39e7535613d40e159f09ca14da2
  Author: Alexei Starovoitov <ast@kernel.org>
  Date:   Wed Jun 12 18:38:13 2024 -0700
  
      bpf: Track delta between "linked" registers.
  ...
  --- a/tools/testing/selftests/bpf/verifier/precise.c
  +++ b/tools/testing/selftests/bpf/verifier/precise.c
  @@ -39,12 +39,12 @@
   	.result = VERBOSE_ACCEPT,
   	.errstr =
   	"mark_precise: frame0: last_idx 26 first_idx 20\
  -	mark_precise: frame0: regs=r2 stack= before 25\
  -	mark_precise: frame0: regs=r2 stack= before 24\
  -	mark_precise: frame0: regs=r2 stack= before 23\
  -	mark_precise: frame0: regs=r2 stack= before 22\
  -	mark_precise: frame0: regs=r2 stack= before 20\
  -	mark_precise: frame0: parent state regs=r2 stack=:\
  +	mark_precise: frame0: regs=r2,r9 stack= before 25\
  +	mark_precise: frame0: regs=r2,r9 stack= before 24\
  +	mark_precise: frame0: regs=r2,r9 stack= before 23\
  +	mark_precise: frame0: regs=r2,r9 stack= before 22\
  +	mark_precise: frame0: regs=r2,r9 stack= before 20\
  +	mark_precise: frame0: parent state regs=r2,r9 stack=:\
   	mark_precise: frame0: last_idx 19 first_idx 10\
   	mark_precise: frame0: regs=r2,r9 stack= before 19\
   	mark_precise: frame0: regs=r9 stack= before 18\
  ...

---

Full test log below

  #492/p precise: test 1 FAIL
  Unexpected verifier log!
  EXP: mark_precise: frame0: parent state regs=r2 stack=:
  RES:
  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (b7) r0 = 1                        ; R0_w=1
  1: (18) r6 = 0xffff9eb644619000       ; R6_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  3: (bf) r1 = r6                       ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R6_w=map_ptr(off=0,ks=4,vs=48,imm=0)
  4: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
  5: (07) r2 += -8                      ; R2_w=fp-8
  6: (7a) *(u64 *)(r10 -8) = 0          ; R10=fp0 fp-8_w=00000000
  7: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
  8: (55) if r0 != 0x0 goto pc+1        ; R0_w=0
  9: (95) exit
  
  from 8 to 10: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=0000mmmm
  10: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=0000mmmm
  10: (bf) r9 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R9_w=map_value(off=0,ks=4,vs=48,imm=0)
  11: (bf) r1 = r6                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R6=map_ptr(off=0,ks=4,vs=48,imm=0)
  12: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
  13: (07) r2 += -8                     ; R2_w=fp-8
  14: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=2,off=0,ks=4,vs=48,imm=0)
  15: (55) if r0 != 0x0 goto pc+1       ; R0_w=0
  16: (95) exit
  
  from 15 to 17: R0_w=map_value(off=0,ks=4,vs=48,imm=0) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R9_w=map_value(off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=0000mmmm
  17: R0_w=map_value(off=0,ks=4,vs=48,imm=0) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R9_w=map_value(off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8=0000mmmm
  17: (bf) r8 = r0                      ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R8_w=map_value(off=0,ks=4,vs=48,imm=0)
  18: (1f) r9 -= r8                     ; R8_w=map_value(off=0,ks=4,vs=48,imm=0) R9_w=scalar()
  19: (bf) r2 = r9                      ; R2=scalar(id=3) R9=scalar(id=3)
  20: (a5) if r2 < 0x8 goto pc+1        ; R2=scalar(id=3,umin=8)
  21: (95) exit
  
  from 20 to 22: R0=map_value(off=0,ks=4,vs=48,imm=0) R2=scalar(id=3,umax=7,var_off=(0x0; 0x7)) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R8=map_value(off=0,ks=4,vs=48,imm=0) R9=scalar(id=3,umax=7,var_off=(0x0; 0x7)) R10=fp0 fp-8=0000mmmm
  22: R0=map_value(off=0,ks=4,vs=48,imm=0) R2=scalar(id=3,umax=7,var_off=(0x0; 0x7)) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R8=map_value(off=0,ks=4,vs=48,imm=0) R9=scalar(id=3,umax=7,var_off=(0x0; 0x7)) R10=fp0 fp-8=0000mmmm
  22: (07) r2 += 1                      ; R2_w=scalar(umin=1,umax=8,var_off=(0x0; 0xf))
  23: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0
  24: (07) r1 += -8                     ; R1_w=fp-8
  25: (b7) r3 = 0                       ; R3_w=0
  26: (85) call bpf_probe_read_kernel#113
  mark_precise: frame0: last_idx 26 first_idx 20 subseq_idx -1
  mark_precise: frame0: regs=r2 stack= before 25: (b7) r3 = 0
  mark_precise: frame0: regs=r2 stack= before 24: (07) r1 += -8
  mark_precise: frame0: regs=r2 stack= before 23: (bf) r1 = r10
  mark_precise: frame0: regs=r2 stack= before 22: (07) r2 += 1
  mark_precise: frame0: regs=r2 stack= before 20: (a5) if r2 < 0x8 goto pc+1
  mark_precise: frame0: parent state regs=r2,r9 stack=:  R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R2_rw=Pscalar(id=3) R6=map_ptr(off=0,ks=4,vs=48,imm=0) R8_w=map_value(off=0,ks=4,vs=48,imm=0) R9_w=Pscalar(id=3) R10=fp0 fp-8_r=0000mmmm
  mark_precise: frame0: last_idx 19 first_idx 10 subseq_idx 20
  mark_precise: frame0: regs=r2,r9 stack= before 19: (bf) r2 = r9
  mark_precise: frame0: regs=r9 stack= before 18: (1f) r9 -= r8
  mark_precise: frame0: regs=r8,r9 stack= before 17: (bf) r8 = r0
  mark_precise: frame0: regs=r0,r9 stack= before 15: (55) if r0 != 0x0 goto pc+1
  mark_precise: frame0: regs=r0,r9 stack= before 14: (85) call bpf_map_lookup_elem#1
  mark_precise: frame0: regs=r9 stack= before 13: (07) r2 += -8
  mark_precise: frame0: regs=r9 stack= before 12: (bf) r2 = r10
  mark_precise: frame0: regs=r9 stack= before 11: (bf) r1 = r6
  mark_precise: frame0: regs=r9 stack= before 10: (bf) r9 = r0
  mark_precise: frame0: parent state regs= stack=:  R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_rw=map_ptr(off=0,ks=4,vs=48,imm=0) R10=fp0 fp-8_rw=0000mmmm
  27: R0_w=scalar()
  27: (95) exit
  processed 27 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1

[...]

^ permalink raw reply

* Re: [PATCH net v4] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-16  5:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260616003903-mutt-send-email-mst@kernel.org>

On Mon, Jun 15, 2026 at 9:40 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 15, 2026 at 09:28:37PM -0700, Xiang Mei wrote:
> > receive_big() bounds the device-announced length by
> > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > add_recvbuf_big() sets sg[1] to start at offset
> > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > check allows for the common hdr_len == 12 case.
> >
> > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > then walks one frag past the page chain, storing a NULL page->private
> > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > write past the static frag array and a NULL frag handed up the rx path.
> >
> > Bound len by the size add_recvbuf_big() actually advertised.
> >
> > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> > ---
> > v4: use easy to understand math to compute the max_len
> > v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag
>
> I still feel 2/2 is good defence in depth but it can be
> pursued separately.
Thanks, Michael. I'll leave 2/2 out of this series.
Appreciate the review.

Xiang
>
> > v2: add additiona check as 2/2
> >
> >  drivers/net/virtio_net.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..8f4562316aaa 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1999,15 +1999,16 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >                                  struct virtnet_rq_stats *stats)
> >  {
> >       struct page *page = buf;
> > +     unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
> > +                             sizeof(struct padded_vnet_hdr) + vi->hdr_len;
> >       struct sk_buff *skb;
> >
> >       /* Make sure that len does not exceed the size allocated in
> >        * add_recvbuf_big.
> >        */
> > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > +     if (unlikely(len > max_len)) {
> >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > -                      dev->name, len,
> > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > +                      dev->name, len, max_len);
> >               goto err;
> >       }
> >
> > --
> > 2.43.0
>

^ permalink raw reply

* Re: [PATCH net v4] virtio-net: fix len check in receive_big()
From: Michael S. Tsirkin @ 2026-06-16  4:39 UTC (permalink / raw)
  To: Xiang Mei
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260616042837.2249468-1-xmei5@asu.edu>

On Mon, Jun 15, 2026 at 09:28:37PM -0700, Xiang Mei wrote:
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
> 
> A malicious virtio backend can announce a len in that gap.  page_to_skb()
> then walks one frag past the page chain, storing a NULL page->private
> into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> write past the static frag array and a NULL frag handed up the rx path.
> 
> Bound len by the size add_recvbuf_big() actually advertised.
> 
> Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>
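
As a sanity check of the arithmetic in the commit message, the old and
new bounds can be compared in a small standalone sketch. The constants
are assumptions for illustration only: a 4096-byte page and a 32-byte
struct padded_vnet_hdr (the latter chosen because it reproduces the
20-byte gap quoted for hdr_len == 12). The model_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define MODEL_PAGE_SIZE		4096UL
#define MODEL_PADDED_HDR	32UL	/* assumed sizeof(struct padded_vnet_hdr) */

/* Bound used before the fix: whole pages, ignoring the header padding. */
static unsigned long model_old_bound(unsigned long num_skbfrags)
{
	return (num_skbfrags + 1) * MODEL_PAGE_SIZE;
}

/* Bound in the v4 form: pages allocated, minus the padded header area,
 * plus the header bytes that are actually exposed to the device.
 */
static unsigned long model_new_bound(unsigned long num_skbfrags,
				     unsigned long hdr_len)
{
	assert(hdr_len <= MODEL_PADDED_HDR);
	return (num_skbfrags + 1) * MODEL_PAGE_SIZE -
	       MODEL_PADDED_HDR + hdr_len;
}

/* The same bound written the v3 way, term by term. */
static unsigned long model_new_bound_v3(unsigned long num_skbfrags,
					unsigned long hdr_len)
{
	assert(hdr_len <= MODEL_PADDED_HDR);
	return hdr_len + (MODEL_PAGE_SIZE - MODEL_PADDED_HDR) +
	       num_skbfrags * MODEL_PAGE_SIZE;
}
```

Under these assumptions the v3 and v4 expressions agree for any frag
count, and the old bound exceeds the new one by
MODEL_PADDED_HDR - hdr_len, which is 20 bytes when hdr_len is 12.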

> ---
> v4: use easy to understand math to compute the max_len
> v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag

I still feel 2/2 is good defence in depth but it can be
pursued separately.

> v2: add additiona check as 2/2
> 
>  drivers/net/virtio_net.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..8f4562316aaa 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1999,15 +1999,16 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  				   struct virtnet_rq_stats *stats)
>  {
>  	struct page *page = buf;
> +	unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
> +				sizeof(struct padded_vnet_hdr) + vi->hdr_len;
>  	struct sk_buff *skb;
>  
>  	/* Make sure that len does not exceed the size allocated in
>  	 * add_recvbuf_big.
>  	 */
> -	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> +	if (unlikely(len > max_len)) {
>  		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> -			 dev->name, len,
> -			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> +			 dev->name, len, max_len);
>  		goto err;
>  	}
>  
> -- 
> 2.43.0


^ permalink raw reply

* Re: [PATCH net v3] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-16  4:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260614152904-mutt-send-email-mst@kernel.org>

On Sun, Jun 14, 2026 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sat, Jun 13, 2026 at 01:15:02PM -0700, Xiang Mei wrote:
> > On Wed, Jun 10, 2026 at 10:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jun 10, 2026 at 07:46:16PM -0700, Xiang Mei wrote:
> > > > receive_big() bounds the device-announced length by
> > > > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > > > add_recvbuf_big() sets sg[1] to start at offset
> > > > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > > > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > > > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > > > check allows for the common hdr_len == 12 case.
> > > >
> > > > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > > > then walks one frag past the page chain, storing a NULL page->private
> > > > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > > > write past the static frag array and a NULL frag handed up the rx path.
> > > >
> > > > Bound len by the size add_recvbuf_big() actually advertised.
> > > >
> > > > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > > > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > > > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >
> > > Thanks for the patch! Something small to improve:
> > >
> > > > ---
> > > > v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag
> > > >
> > > >  drivers/net/virtio_net.c | 8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index f4adcfee7a80..afe73eda1491 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> > > >                                  struct virtnet_rq_stats *stats)
> > > >  {
> > > >       struct page *page = buf;
> > > > +     unsigned long max_len;
> > >
> > > Assignment can happen here?
> > >
> > > >       struct sk_buff *skb;
> > > >
> > > >       /* Make sure that len does not exceed the size allocated in
> > > >        * add_recvbuf_big.
> > > >        */
> > > > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > > > +     max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > > > +               vi->big_packets_num_skbfrags * PAGE_SIZE;
> > >
> > > Took me a while to figure out what is going on, but I finally
> > > understand:
> > >
> > >
> > > Reducing
> > > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE
> > >
> > > (what we allocated)
> > >
> > > by sizeof(struct padded_vnet_hdr) - vi->hdr_len
> > >
> > >
> > > right?
> > >
> > > So clearer as:
> > >
> > >
> > >         unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
> > >         sizeof(struct padded_vnet_hdr) + vi->hdr_len;
> > >
> > Right, that's the same value. Yours reads better!
> >
> > I'll fold this into the next respin. One thing I'd like to settle
> > first: David suggested storing this in a vi field computed once at the
> > probe (it's a per-device constant) and just comparing len against it
> > on the datapath, instead of re-deriving it in receive_big() each time.
> > I'll wait for his take on that and send a single v4 that covers both.
> >
> > Xiang
>
> I don't mind.
Thanks, Michael,

V4 has been sent.

Xiang
>
> > >
> > >
> > >
> > > > +     if (unlikely(len > max_len)) {
> > > >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > > > -                      dev->name, len,
> > > > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > > > +                      dev->name, len, max_len);
> > > >               goto err;
> > > >       }
> > > >
> > > > --
> > > > 2.43.0
> > >
>

^ permalink raw reply

* Re: [PATCH v2 2/3] net: stmmac: fix l3l4 filter rejecting unsupported offload requests
From: Nazle Asmade, Muhammad Nazim Amirul @ 2026-06-16  4:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	rmk+kernel@armlinux.org.uk, maxime.chevallier@bootlin.com,
	Jose.Abreu@synopsys.com, linux-kernel@vger.kernel.org
In-Reply-To: <20260609174318.3e34e62b@kernel.org>

On 10/6/2026 8:43 am, Jakub Kicinski wrote:
> On Fri,  5 Jun 2026 02:01:13 -0700
> muhammad.nazim.amirul.nazle.asmade@altera.com wrote:
>> +    /* Both network proto and transport proto not present in the key */
>> +    if (!match.mask || !(match.mask->n_proto || match.mask->ip_proto))
>> +            return -EOPNOTSUPP;
>> +
>> +    /* If the proto is present in the key and is not full mask */
>> +    if ((match.mask->n_proto && match.mask->n_proto != ETHER_TYPE_FULL_MASK) ||
>> +        (match.mask->ip_proto && match.mask->ip_proto != IP_PROTO_FULL_MASK))
>> +            return -EOPNOTSUPP;
>> +
>> +    /* Network proto is present in the key and is not IPv4 */
>> +    if (match.mask->n_proto && match.key->n_proto != cpu_to_be16(ETH_P_IP))
>> +            return -EOPNOTSUPP;
>> +
>> +    /* Transport proto is present in the key and is not TCP or UDP */
>> +    if (match.mask->ip_proto &&
>> +        match.key->ip_proto != IPPROTO_TCP &&
>> +        match.key->ip_proto != IPPROTO_UDP)
>> +            return -EOPNOTSUPP;
>
> Please add extack messages to let user know which part of the match is
> unsupported. Extack pointer is somewhere inside struct flow_cls_offload
>
> FWIW Sashiko points out a bunch of other potential issues, not sure if
> they matter
> https://sashiko.dev/#/patchset/20260605090114.16028-2-muhammad.nazim.amirul.nazle.asmade@altera.com
Hi Jakub,
Thank you for the review! After going through the Sashiko review, I found
that some of its points are valid and relevant. It was quite helpful! Here
are the changes in v3 of patch 2/3:
1. Added extack messages to all four -EOPNOTSUPP return paths in
    tc_add_basic_flow() so users know exactly which part of the match
    is unsupported.
2. Fixed a bug in v2: using break when -EOPNOTSUPP was returned
    silently discarded the error on FLOW_CLS_REPLACE - entry->in_use
    is already true on replace, so tc_add_flow() would fall through
    and return 0 (success) for an unsupported rule. Fixed by returning
    ret directly instead.
Patches 1/3 and 3/3 are unchanged.

Posted patch set with the updates above.

https://lore.kernel.org/all/20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com/


BR,
Nazim

^ permalink raw reply
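
The second item in the reply above (break versus return on -EOPNOTSUPP)
can be reproduced in a small user-space sketch. Everything here is a
stand-in: the sketch_* names are hypothetical, the parser table is reduced
to a plain function-pointer array, and the -EINVAL returned when the entry
is not in use is an assumption about what tc_add_flow() does after the
loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's filter entry. */
struct sketch_entry {
	bool in_use;
};

typedef int (*sketch_parser_fn)(struct sketch_entry *entry);

/* A parser that always rejects the rule as unsupported. */
static inline int sketch_parser_unsupported(struct sketch_entry *entry)
{
	(void)entry;
	return -EOPNOTSUPP;
}

/* v2 shape: leave the loop on -EOPNOTSUPP, then judge by entry->in_use. */
static inline int sketch_add_flow_break(struct sketch_entry *entry,
					const sketch_parser_fn *parsers,
					size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = parsers[i](entry);

		if (!ret)
			entry->in_use = true;
		else if (ret == -EOPNOTSUPP)
			break;
	}

	/* Assumption: the caller only looks at in_use from here on. */
	if (!entry->in_use)
		return -EINVAL;
	return 0;
}

/* v3 shape: hand -EOPNOTSUPP straight back to the caller. */
static inline int sketch_add_flow_return(struct sketch_entry *entry,
					 const sketch_parser_fn *parsers,
					 size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = parsers[i](entry);

		if (!ret)
			entry->in_use = true;
		else if (ret == -EOPNOTSUPP)
			return ret;
	}

	if (!entry->in_use)
		return -EINVAL;
	return 0;
}
```

With an entry whose in_use is already true (the replace case), the break
variant reports success for an unsupported rule, while the return variant
reports -EOPNOTSUPP.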

* [PATCH net v4] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-16  4:28 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, eperezma
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei

receive_big() bounds the device-announced length by
(big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
add_recvbuf_big() sets sg[1] to start at offset
sizeof(struct padded_vnet_hdr) into the first page, so the chain
actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
check allows for the common hdr_len == 12 case.

A malicious virtio backend can announce a len in that gap.  page_to_skb()
then walks one frag past the page chain, storing a NULL page->private
into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
write past the static frag array and a NULL frag handed up the rx path.

Bound len by the size add_recvbuf_big() actually advertised.

Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v4: compute max_len with simpler, easier-to-read arithmetic
v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag
v2: add additional check as 2/2

 drivers/net/virtio_net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..8f4562316aaa 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1999,15 +1999,16 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
+	unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
+				sizeof(struct padded_vnet_hdr) + vi->hdr_len;
 	struct sk_buff *skb;

 	/* Make sure that len does not exceed the size allocated in
 	 * add_recvbuf_big.
 	 */
-	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
+	if (unlikely(len > max_len)) {
 		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
-			 dev->name, len,
-			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
+			 dev->name, len, max_len);
 		goto err;
 	}

-- 
2.43.0

^ permalink raw reply related
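
The arithmetic behind the v4 patch above can be checked outside the
kernel. The sketch below is illustrative only: the 4096-byte page size and
the 32-byte padded header size are assumptions (the latter chosen because
it matches the "20 bytes less" figure the commit message gives for
hdr_len == 12), and the sketch_* / SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the kernel supplies the real ones. */
#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4 KiB pages */
#define SKETCH_PADDED_HDR_SIZE	32UL	/* assumption: sizeof(struct padded_vnet_hdr) */

/* Old bound: every page in the chain is counted in full. */
static inline unsigned long sketch_old_bound(unsigned long num_skbfrags)
{
	return (num_skbfrags + 1) * SKETCH_PAGE_SIZE;
}

/* v3 form: header, plus the rest of the first page, plus the frag pages. */
static inline unsigned long sketch_max_len_v3(unsigned long num_skbfrags,
					      unsigned long hdr_len)
{
	return hdr_len + (SKETCH_PAGE_SIZE - SKETCH_PADDED_HDR_SIZE) +
	       num_skbfrags * SKETCH_PAGE_SIZE;
}

/* v4 form: the full allocation minus the unused part of the header pad. */
static inline unsigned long sketch_max_len_v4(unsigned long num_skbfrags,
					      unsigned long hdr_len)
{
	return (num_skbfrags + 1) * SKETCH_PAGE_SIZE -
	       SKETCH_PADDED_HDR_SIZE + hdr_len;
}
```

Under these assumptions the v3 and v4 forms give the same value for any
input, and for hdr_len == 12 both sit 20 bytes below the old bound, which
is the gap the commit message describes.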

* [PATCH v2 3/3] net: stmmac: reset residual action in L3L4 filters on delete
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-16  4:26 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
	maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

When deleting an L3/L4 flower filter entry, the action field is not
reset. If a filter was previously configured with a drop action, that
action may persist and affect subsequent filter configurations
unintentionally.

Clear the action field when the filter entry is deleted.

Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index 869f84756ca5..4f9758eeb86f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -653,6 +653,7 @@ static int tc_del_flow(struct stmmac_priv *priv,
 	entry->in_use = false;
 	entry->cookie = 0;
 	entry->is_l4 = false;
+	entry->action = 0;
 	return ret;
 }
 
-- 
2.43.7


^ permalink raw reply related
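
A minimal sketch of how a stale action could leak from a deleted entry
into a later one, as described in the 3/3 commit message above. It assumes
that the action parser only ever ORs the drop flag in and never clears it;
that behaviour, the flag value, and all sketch_* names are assumptions for
illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ACTION_DROP	0x1u	/* hypothetical flag value */

/* Hypothetical stand-in for the driver's filter entry. */
struct sketch_flow_entry {
	uint64_t cookie;
	bool in_use;
	bool is_l4;
	uint32_t action;
};

/* Assumed parser behaviour: sets the drop flag, never clears it. */
static inline void sketch_parse_actions(struct sketch_flow_entry *e, bool drop)
{
	if (drop)
		e->action |= SKETCH_ACTION_DROP;
}

/* Delete path; reset_action selects the patched behaviour. */
static inline void sketch_del_flow(struct sketch_flow_entry *e,
				   bool reset_action)
{
	e->in_use = false;
	e->cookie = 0;
	e->is_l4 = false;
	if (reset_action)
		e->action = 0;
}
```

Reusing an entry for a non-drop rule after deleting a drop rule leaves the
drop flag set unless the delete path clears action.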

* [PATCH v3 2/3] net: stmmac: fix l3l4 filter rejecting unsupported offload requests
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-16  4:26 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
	maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

The basic flow parser in tc_add_basic_flow() does not validate match
keys before proceeding. Unsupported offload configurations such as
partial protocol masks, non-IPv4 network proto, or non-TCP/UDP transport
proto are silently accepted instead of returning -EOPNOTSUPP.

Add validation to return -EOPNOTSUPP early for:
- No network or transport proto present in the key
- Partial protocol mask (only full mask supported)
- Network proto is not IPv4
- Transport proto is not TCP or UDP

Each rejection includes an extack message so the user knows which part
of the match is unsupported.

Also propagate -EOPNOTSUPP from tc_add_basic_flow() in tc_add_flow()
by returning it directly rather than using break. The break was silently
discarding the error for FLOW_CLS_REPLACE operations where entry->in_use
is already true, causing tc_add_flow() to return 0 (success) for
unsupported replace requests.

Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
Changes in v3:
- Add extack messages to each -EOPNOTSUPP return so users know which
  part of the match is unsupported (Jakub Kicinski)
- Return -EOPNOTSUPP directly instead of break to avoid silently
  reporting success on unsupported FLOW_CLS_REPLACE (Sashiko review)
- Patches 1/3 and 3/3 are unchanged from v2

Changes in v2:
- No changes

---
 .../net/ethernet/stmicro/stmmac/stmmac_tc.c   | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index d78652718599..14cabe76e53e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -446,6 +446,7 @@ static int tc_parse_flow_actions(struct stmmac_priv *priv,
 }
 
 #define ETHER_TYPE_FULL_MASK	cpu_to_be16(~0)
+#define IP_PROTO_FULL_MASK	0xFF
 
 static int tc_add_basic_flow(struct stmmac_priv *priv,
 			     struct flow_cls_offload *cls,
@@ -461,6 +462,33 @@ static int tc_add_basic_flow(struct stmmac_priv *priv,
 
 	flow_rule_match_basic(rule, &match);
 
+	/* Both network proto and transport proto not present in the key */
+	if (!match.mask || !(match.mask->n_proto || match.mask->ip_proto)) {
+		NL_SET_ERR_MSG_MOD(cls->common.extack,
+				   "filter must specify network or transport protocol");
+		return -EOPNOTSUPP;
+	}
+
+	/* If the proto is present in the key and is not full mask */
+	if ((match.mask->n_proto && match.mask->n_proto != ETHER_TYPE_FULL_MASK) ||
+	    (match.mask->ip_proto && match.mask->ip_proto != IP_PROTO_FULL_MASK)) {
+		NL_SET_ERR_MSG_MOD(cls->common.extack,
+				   "only full protocol mask is supported");
+		return -EOPNOTSUPP;
+	}
+
+	/* Network proto is present in the key and is not IPv4 */
+	if (match.mask->n_proto && match.key->n_proto != cpu_to_be16(ETH_P_IP)) {
+		NL_SET_ERR_MSG_MOD(cls->common.extack,
+				   "only IPv4 network protocol is supported");
+		return -EOPNOTSUPP;
+	}
+
+	/* Transport proto is present in the key and is not TCP or UDP */
+	if (match.mask->ip_proto &&
+	    match.key->ip_proto != IPPROTO_TCP &&
+	    match.key->ip_proto != IPPROTO_UDP) {
+		NL_SET_ERR_MSG_MOD(cls->common.extack,
+				   "only TCP and UDP transport protocols are supported");
+		return -EOPNOTSUPP;
+	}
+
 	entry->ip_proto = match.key->ip_proto;
 	return 0;
 }
@@ -598,11 +626,7 @@ static int tc_add_flow(struct stmmac_priv *priv,
 		ret = tc_flow_parsers[i].fn(priv, cls, entry);
 		if (!ret)
 			entry->in_use = true;
-		else if (ret == -EOPNOTSUPP)
-			/* The basic flow parser will return EOPNOTSUPP, if a
-			 * requested offload not fully supported by the hw. And
-			 * in that case fail early.
-			 */
-			break;
+		else if (ret == -EOPNOTSUPP)
+			return ret;
 	}
 
 	if (!entry->in_use)
-- 
2.43.7


^ permalink raw reply related
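
The four rejection rules in the v3 2/3 patch above can be exercised in
user space with a sketch like the following. The struct layout, the
host-byte-order n_proto field, the SKETCH_* constants, and the out-parameter
used in place of extack are all simplifications made up for this example;
only the order of the checks and the message strings follow the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_IP		0x0800	/* IPv4 EtherType */
#define SKETCH_IPPROTO_TCP	6
#define SKETCH_IPPROTO_UDP	17
#define SKETCH_ETHER_FULL_MASK	0xFFFF
#define SKETCH_IP_PROTO_FULL_MASK 0xFF

/* Hypothetical stand-in for the basic match key/mask (host byte order). */
struct sketch_basic {
	uint16_t n_proto;
	uint8_t ip_proto;
};

/* Returns 0 or -EOPNOTSUPP; *msg plays the role of the extack message. */
static inline int sketch_check_basic(const struct sketch_basic *key,
				     const struct sketch_basic *mask,
				     const char **msg)
{
	*msg = NULL;

	/* Neither network nor transport proto present in the key. */
	if (!mask || !(mask->n_proto || mask->ip_proto)) {
		*msg = "filter must specify network or transport protocol";
		return -EOPNOTSUPP;
	}

	/* A proto is present but not fully masked. */
	if ((mask->n_proto && mask->n_proto != SKETCH_ETHER_FULL_MASK) ||
	    (mask->ip_proto && mask->ip_proto != SKETCH_IP_PROTO_FULL_MASK)) {
		*msg = "only full protocol mask is supported";
		return -EOPNOTSUPP;
	}

	/* Network proto present and not IPv4. */
	if (mask->n_proto && key->n_proto != SKETCH_ETH_P_IP) {
		*msg = "only IPv4 network protocol is supported";
		return -EOPNOTSUPP;
	}

	/* Transport proto present and neither TCP nor UDP. */
	if (mask->ip_proto &&
	    key->ip_proto != SKETCH_IPPROTO_TCP &&
	    key->ip_proto != SKETCH_IPPROTO_UDP) {
		*msg = "only TCP and UDP transport protocols are supported";
		return -EOPNOTSUPP;
	}

	return 0;
}
```

Because the checks run in order, a rule that fails several of them reports
only the first failure, which is likely the behaviour a user would see
through extack as well.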

* [PATCH v2 1/3] net: stmmac: xgmac: fix l4 filter port overwrite on register update
From: muhammad.nazim.amirul.nazle.asmade @ 2026-06-16  4:26 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, rmk+kernel,
	maxime.chevallier, Jose.Abreu, linux-kernel
In-Reply-To: <20260616042655.7782-1-muhammad.nazim.amirul.nazle.asmade@altera.com>

From: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>

The XGMAC_L4_ADDR register holds both source and destination port
match values. The current implementation overwrites the entire register
when configuring either port, so setting one silently erases the other.

Fix this by reading the register first, then masking and updating only
the relevant field before writing back.

Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower")
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com>
---
 .../ethernet/stmicro/stmmac/dwxgmac2_core.c   | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index f02b434bbd50..52054f31376d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -1370,36 +1370,40 @@ static int dwxgmac2_config_l4_filter(struct mac_device_info *hw, u32 filter_no,
 		value &= ~XGMAC_L4PEN0;
 	}
 
-	value &= ~(XGMAC_L4SPM0 | XGMAC_L4SPIM0);
-	value &= ~(XGMAC_L4DPM0 | XGMAC_L4DPIM0);
 	if (sa) {
 		value |= XGMAC_L4SPM0;
 		if (inv)
 			value |= XGMAC_L4SPIM0;
+		else
+			value &= ~XGMAC_L4SPIM0;
 	} else {
 		value |= XGMAC_L4DPM0;
 		if (inv)
 			value |= XGMAC_L4DPIM0;
+		else
+			value &= ~XGMAC_L4DPIM0;
 	}
 
 	ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L3L4_CTRL, value);
 	if (ret)
 		return ret;
 
-	if (sa) {
-		value = FIELD_PREP(XGMAC_L4SP0, match);
+	ret = dwxgmac2_filter_read(hw, filter_no, XGMAC_L4_ADDR, &value);
+	if (ret)
+		return ret;
 
-		ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
-		if (ret)
-			return ret;
+	if (sa) {
+		value &= ~XGMAC_L4SP0;
+		value |= FIELD_PREP(XGMAC_L4SP0, match);
 	} else {
-		value = FIELD_PREP(XGMAC_L4DP0, match);
-
-		ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
-		if (ret)
-			return ret;
+		value &= ~XGMAC_L4DP0;
+		value |= FIELD_PREP(XGMAC_L4DP0, match);
 	}
 
+	ret = dwxgmac2_filter_write(hw, filter_no, XGMAC_L4_ADDR, value);
+	if (ret)
+		return ret;
+
 	if (!en)
 		return dwxgmac2_filter_write(hw, filter_no, XGMAC_L3L4_CTRL, 0);
 
-- 
2.43.7


^ permalink raw reply related
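
The read-modify-write change in the 1/3 patch above can be illustrated
with plain integers standing in for the register. The field layout used
here (source port in bits 15:0, destination port in bits 31:16) is an
assumption for the sketch, as are the sketch_* helpers; sketch_field_prep()
is a simplified user-space stand-in for the kernel's FIELD_PREP() and
expects a non-zero mask.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_L4SP0	0x0000ffffu	/* assumption: source port field */
#define SKETCH_L4DP0	0xffff0000u	/* assumption: destination port field */

/* Shift val up to the lowest set bit of mask, then clip it to mask. */
static inline uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	uint32_t shift = 0;

	while (!((mask >> shift) & 1u))
		shift++;
	return (val << shift) & mask;
}

/* Old shape: the new value replaces the whole register. */
static inline uint32_t sketch_set_port_overwrite(uint32_t reg, int sa,
						 uint16_t port)
{
	(void)reg;	/* previous contents are discarded */
	return sketch_field_prep(sa ? SKETCH_L4SP0 : SKETCH_L4DP0, port);
}

/* New shape: clear only the targeted field, then OR in the new value. */
static inline uint32_t sketch_set_port_rmw(uint32_t reg, int sa,
					   uint16_t port)
{
	uint32_t mask = sa ? SKETCH_L4SP0 : SKETCH_L4DP0;

	reg &= ~mask;
	reg |= sketch_field_prep(mask, port);
	return reg;
}
```

Setting a source port and then a destination port keeps both fields with
the read-modify-write helper, while the overwrite helper loses the source
port on the second call.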
