Netdev List

* Re: [net-next v1 2/3] net: motorcomm: phy: set drive strength in 8531s RGMII case
From: Andrew Lunn @ 2026-04-15 14:42 UTC (permalink / raw)
  To: Minda Chen
  Cc: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260415092654.64907-3-minda.chen@starfivetech.com>

On Wed, Apr 15, 2026 at 05:26:53PM +0800, Minda Chen wrote:
> Set RXD and RX CLK pin drive strength while in 8531s RGMII
> case.
> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> ---
>  drivers/net/phy/motorcomm.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
> index 35aff1519b4b..f3129419f7c9 100644
> --- a/drivers/net/phy/motorcomm.c
> +++ b/drivers/net/phy/motorcomm.c
> @@ -1714,6 +1714,11 @@ static int yt8521_config_init(struct phy_device *phydev)
>  		if (ret < 0)
>  			goto err_restore_page;
>  	}
> +
> +	if (phydev->drv->phy_id == PHY_ID_YT8531S &&
> +	    phydev->interface != PHY_INTERFACE_MODE_SGMII)
> +		ret = yt8531_set_ds(phydev, true);

phy_interface_is_rgmii().
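
For readers following along, the suggestion is to test for RGMII directly rather than for "anything that is not SGMII". Below is a minimal user-space sketch of such a predicate. The `demo_` enum and struct are hypothetical, reduced stand-ins for the kernel types, and the sketch assumes the four RGMII variants sit next to each other in the enum. In the driver itself the condition would presumably just call `phy_interface_is_rgmii(phydev)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's phy_interface_t. */
enum demo_phy_interface {
	DEMO_MODE_SGMII,
	DEMO_MODE_RGMII,
	DEMO_MODE_RGMII_ID,
	DEMO_MODE_RGMII_RXID,
	DEMO_MODE_RGMII_TXID,
	DEMO_MODE_QSGMII,
};

/* Hypothetical stand-in for struct phy_device. */
struct demo_phy_device {
	enum demo_phy_interface interface;
};

/* True for any of the four RGMII variants, false for every other mode. */
static bool demo_interface_is_rgmii(const struct demo_phy_device *phydev)
{
	assert(phydev != NULL);

	return phydev->interface >= DEMO_MODE_RGMII &&
	       phydev->interface <= DEMO_MODE_RGMII_TXID;
}
```

Unlike a `!= SGMII` comparison, a range check of this kind does not match unrelated modes such as QSGMII.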



* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Simon Horman @ 2026-04-15 14:47 UTC (permalink / raw)
  To: n05ec
  Cc: 'Simon Horman', netdev, andrew+netdev, davem, edumazet,
	kuba, pabeni, kees, yifanwucs, tomapufckgml, yuantan098, bird,
	ylong030
In-Reply-To: <b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
8021q: free cleared egress QoS mappings safely

This commit converts the 802.1Q egress priority mapping lists to use
RCU-protected pointers instead of keeping tombstones. It deletes mappings
when vlan_prio is cleared, deferring the freeing of removed nodes until
after an RCU grace period.

> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> index a000b1ef05206..bbe7cbd97939d 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -260,13 +260,10 @@ static int vlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
>  			goto nla_put_failure;
>  
>  		for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
> -			for (pm = vlan->egress_priority_map[i]; pm;
> -			     pm = pm->next) {
> -				if (!pm->vlan_qos)
> -					continue;
> -
> +			for (pm = rtnl_dereference(vlan->egress_priority_map[i]); pm;
> +			     pm = rtnl_dereference(pm->next)) {

Is it safe to use rtnl_dereference() for this traversal?

The vlan_fill_info() function implements the rtnl_link_ops->fill_info
callback, which is invoked by rtnl_dump_ifinfo() to service interface
netlink dumps. Modern netlink dumps can operate locklessly under
rcu_read_lock() without acquiring the RTNL lock.

Using rtnl_dereference() expands to rcu_dereference_protected(), which
might trigger a lockdep warning when the RTNL lock is not held.

Additionally, it omits the memory barriers needed for safe RCU reader
traversal. Without these barriers, concurrent updates from
vlan_dev_set_egress_priority() could cause the lockless reader to observe
stale or uninitialized list nodes.

Would using rcu_dereference_rtnl() or rcu_dereference() here be more
appropriate to ensure safe lockless traversal during netlink dumps?

>  				m.from = pm->priority;
> -				m.to   = (pm->vlan_qos >> 13) & 0x7;
> +				m.to   = (READ_ONCE(pm->vlan_qos) >> 13) & 0x7;
>  				if (nla_put(skb, IFLA_VLAN_QOS_MAPPING,
>  					    sizeof(m), &m))
>  					goto nla_put_failure;
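
The ordering concern raised above can be illustrated outside the kernel. The following is a user-space analogy only, using C11 atomics instead of the kernel RCU API: the writer initialises a node completely and then publishes it with a release store, and the reader walks the list with acquire loads. All `demo_` names are hypothetical, a single writer is assumed, and acquire ordering is used here as a conservative stand-in for the dependency ordering that `rcu_dereference()` provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a priority mapping node. */
struct demo_node {
	int priority;
	int vlan_qos;
	struct demo_node *_Atomic next;
};

static struct demo_node *_Atomic demo_head;

/* Writer (single writer assumed): fill in every field first, then make
 * the node reachable with a release store.
 */
static void demo_publish(struct demo_node *n, int priority, int vlan_qos)
{
	assert(n != NULL);

	n->priority = priority;
	n->vlan_qos = vlan_qos;
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&demo_head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&demo_head, n, memory_order_release);
}

/* Reader: acquire loads pair with the release store above, so a node
 * that is visible is also fully initialised. Returns 0 if not found.
 */
static int demo_lookup(int priority)
{
	struct demo_node *n;

	for (n = atomic_load_explicit(&demo_head, memory_order_acquire); n;
	     n = atomic_load_explicit(&n->next, memory_order_acquire)) {
		if (n->priority == priority)
			return n->vlan_qos;
	}
	return 0;
}
```

The sketch does not model node removal or grace periods; it only shows why the reader side needs an ordered load rather than a plain one when no lock is held.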


* Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: Andrew Lunn @ 2026-04-15 14:53 UTC (permalink / raw)
  To: wenzhaoliao
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <AN6AigCwKNz*0oYAjaS2aKqr.3.1776262694321.Hmail.2023000929@ruc.edu.cn>

> - paged register access is open-coded and does not robustly propagate or
>   restore errors;
> - several vendor sequences use magic page/register values with no documented
>   rationale in the driver;
> - there are unconditional resets and fixed `mdelay`/`msleep` delays without a
>   clear completion check or justification;
> - debugging uses raw `printk()` calls;
> - some helper return values are ignored, and `ret |= ...` is not a good fit
>   for mainline-style error handling;
> - the MMD / EEE handling looks narrowly special-cased and would need to be
>   re-checked against phylib conventions and proper documentation.

Nice, you spotted many of the issues in that code. That gives me a
better feeling, you have some understanding of Ethernet PHYs.

> At the same time, we should also be explicit that we do not currently have
> MAE0621A hardware in hand, nor sufficient public documentation to claim that
> it is already a well-grounded first target. Our current local setup is useful
> for Rust-for-Linux build/tooling validation and limited non-hardware checks,
> but not for real hardware-backed PHY validation.

My personal experience is that anything which is not tested is
broken. For a driver to be merged, it needs to be tested on real
hardware.

Can you get one of the amlogic boards? TrustOnX Player (TOX3)? Radxa
A5E? I've no idea how easy it is to get Mainline running on these
boards.

	Andrew


* Re:Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: wenzhaoliao @ 2026-04-15 15:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <f57bb151-bc84-4c67-ae43-8901b09de884@lunn.ch>

Hello Andrew,

Thank you, this is very helpful.

We agree with your point that for a driver to be merged, it needs to be tested
on real hardware. So we will not push the driver-RFC direction further without
getting hardware into the loop first.

We will now look into obtaining one of the suggested boards and check what it
takes to get a near-mainline or mainline kernel running on it. Based on that,
we will decide whether this is a realistic first target before investing more
time in the Rust driver work itself.

At the moment, Radxa A5E looks like the more concrete option from our side,
mainly because there seems to be visible public discussion around MAE0621A on
that board. But we are still checking the practical side of board availability
and software support.

If you have a preference between TrustOnX Player (TOX3) and Radxa A5E as a
first board to try, that would be very helpful. Otherwise, we will investigate
the A5E path first and come back once we have a clearer hardware/testing plan.

Thank you again for the guidance.

Best regards,
Liao Wenzhao

From: Andrew Lunn <andrew@lunn.ch>
Sent: 2026-04-15 22:53:17
To: wenzhaoliao <wenzhaoliao@ruc.edu.cn>
Cc: hkallweit1@gmail.com,fujita.tomonori@gmail.com,linux@armlinux.org.uk,tmgross@umich.edu,ojeda@kernel.org,netdev@vger.kernel.org,rust-for-linux@vger.kernel.org
Subject: Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
>> - paged register access is open-coded and does not robustly propagate or
>>   restore errors;
>> - several vendor sequences use magic page/register values with no documented
>>   rationale in the driver;
>> - there are unconditional resets and fixed `mdelay`/`msleep` delays without a
>>   clear completion check or justification;
>> - debugging uses raw `printk()` calls;
>> - some helper return values are ignored, and `ret |= ...` is not a good fit
>>   for mainline-style error handling;
>> - the MMD / EEE handling looks narrowly special-cased and would need to be
>>   re-checked against phylib conventions and proper documentation.
>
>Nice, you spotted many of the issues in that code. That gives me a
>better feeling, you have some understanding of Ethernet PHYs.
>
>> At the same time, we should also be explicit that we do not currently have
>> MAE0621A hardware in hand, nor sufficient public documentation to claim that
>> it is already a well-grounded first target. Our current local setup is useful
>> for Rust-for-Linux build/tooling validation and limited non-hardware checks,
>> but not for real hardware-backed PHY validation.
>
>My personal experience is that anything which is not tested is
>broken. For a driver to be merged, it needs to be tested on real
>hardware.
>
>Can you get one of the amlogic boards? TrustOnX Player (TOX3)? Radxa
>A5E? I've no idea how easy it is to get Mainline running on these
>boards.
>
>	Andrew
>


* [PATCH net v4 0/3] vsock/virtio: fix MSG_PEEK calculation on bytes to copy
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi

`virtio_transport_stream_do_peek()`, when calculating the number of bytes to
copy, did not account for the `offset` left behind by earlier partial reads.
This could cause an out-of-bounds read that leads to an -EFAULT.
More details in the commits.

Commit 1 introduces the fix
Commit 2 introduces some preliminary work for adding a test and fixes a
problem in existing tests.
Commit 3 introduces a test that checks for this bug to avoid future
regressions.

For disclosure: this bug was initially found by Claude Opus 4.6; I then analyzed
it and worked on the fix and the test.

Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
Changes in v4:
- Picked up RoB
- Increased sleep time from 10 us to 10 ms
- Minor changes to commit messages and comments as suggested by Stefano.
- Link to v3: https://lore.kernel.org/r/20260414-fix_peek-v3-0-e7daead49f83@redhat.com

Changes in v3:
- Addressed reviewers' comments
    - Dropped test client, reusing the one already existing
    - Minor changes: added comment, improved commit messages
    - Rebased to latest net-next
- Link to v2: https://lore.kernel.org/r/20260407-fix_peek-v2-0-2e2581dc8b7c@redhat.com

Changes in v2:
- Addressed reviewers' comments
    - Test now uses the recv_buf utils.
    - Removed unnecessary barrier
    - Checkpatch warnings.
- Added a new commit that allows recv_buf to be used with MSG_PEEK
- Picked up RoBs
- Link to v1: https://lore.kernel.org/r/20260402-fix_peek-v1-0-ad274fcef77b@redhat.com

---
Luigi Leonardi (3):
      vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
      vsock/test: fix MSG_PEEK handling in recv_buf()
      vsock/test: add MSG_PEEK after partial recv test

 net/vmw_vsock/virtio_transport_common.c |  5 ++--
 tools/testing/vsock/util.c              | 15 ++++++++++
 tools/testing/vsock/vsock_test.c        | 50 +++++++++++++++++++++++++--------
 3 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: 35c2c39832e569449b9192fa1afbbc4c66227af7
change-id: 20260401-fix_peek-6837b83469e3

Best regards,
-- 
Luigi Leonardi <leonardi@redhat.com>



* [PATCH net v4 1/3] vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

`virtio_transport_stream_do_peek()` does not account for the skb offset
when computing the number of bytes to copy.

This means that, after a partial recv() that advances the offset, a peek
requesting more bytes than are available in the sk_buff causes
`skb_copy_datagram_iter()` to go past the valid payload, resulting in
a -EFAULT.

The dequeue path already handles this correctly.
Apply the same logic to the peek path.

Fixes: 0df7cd3c13e4 ("vsock/virtio/vhost: read data from non-linear skb")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a152a9e208d0..b5015ab2ee1e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -545,9 +545,8 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
 	skb_queue_walk(&vvs->rx_queue, skb) {
 		size_t bytes;
 
-		bytes = len - total;
-		if (bytes > skb->len)
-			bytes = skb->len;
+		bytes = min_t(size_t, len - total,
+			      skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset);
 
 		spin_unlock_bh(&vvs->rx_lock);
 

-- 
2.53.0
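
The arithmetic behind the one-line fix can be checked in isolation. This is a minimal user-space sketch, not the kernel code: `demo_peek_bytes()` mirrors the `min_t()` expression from the patch with plain `size_t` arguments, and `demo_peek_bytes_old()` mirrors the removed lines for comparison. Both function names and the parameter layout are hypothetical; `offset` stands in for `VIRTIO_VSOCK_SKB_CB(skb)->offset`.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a peek may copy from one buffer, as computed after the fix.
 * len     - bytes requested by the caller
 * total   - bytes already copied from earlier buffers (total <= len)
 * skb_len - payload length of this buffer
 * offset  - bytes of this buffer already consumed (offset <= skb_len)
 */
static size_t demo_peek_bytes(size_t len, size_t total,
			      size_t skb_len, size_t offset)
{
	size_t wanted, avail;

	assert(total <= len);
	assert(offset <= skb_len);

	wanted = len - total;
	avail = skb_len - offset;

	return wanted < avail ? wanted : avail;
}

/* The pre-fix computation: clamps to skb_len and ignores the offset. */
static size_t demo_peek_bytes_old(size_t len, size_t total, size_t skb_len)
{
	size_t bytes;

	assert(total <= len);

	bytes = len - total;
	if (bytes > skb_len)
		bytes = skb_len;

	return bytes;
}
```

With a 128-byte buffer of which 1 byte has already been consumed, a 128-byte peek yields 128 under the old computation and 127 under the new one; the extra byte is the one that lies past the valid payload.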



* [PATCH net v4 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
`recv` until all requested bytes are available or an error occurs.

The problem is how it calculates the number of bytes read: MSG_PEEK
doesn't consume any bytes and will re-read the same bytes from the buffer
head, so summing the return value every time is wrong.

Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if more
bytes are requested than are available, the loop will never terminate,
because `recv` will never return EOF. For this reason, we need to compare
the number of bytes read with the number of bytes expected.

Add a check: if the MSG_PEEK flag is present, update the byte counter and
break out of the loop only after at least the expected number of bytes
have been received; otherwise, retry after a short delay to avoid
consuming too many CPU cycles.

This allows us to simplify the `test_stream_credit_update_test` by
reusing `recv_buf`, like some other tests already do.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/util.c       | 15 +++++++++++++++
 tools/testing/vsock/vsock_test.c | 13 +------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 1fe1338c79cd..fe316b02a590 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
 	}
 }

+#define RECV_PEEK_RETRY_USEC (10 * 1000)
+
 /* Receive bytes in a buffer and check the return value.
+ *
+ * When MSG_PEEK is set, recv() is retried until it returns at least
+ * expected_ret bytes. The function returns on error, EOF, or timeout
+ * as usual.
  *
  * expected_ret:
  *  <0 Negative errno (for testing errors)
@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
 		if (ret <= 0)
 			break;

+		if (flags & MSG_PEEK) {
+			if (ret >= expected_ret) {
+				nread = ret;
+				break;
+			}
+			timeout_usleep(RECV_PEEK_RETRY_USEC);
+			continue;
+		}
+
 		nread += ret;
 	} while (nread < len);
 	timeout_end();
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5bd20ccd9335..bdb0754965df 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1500,18 +1500,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
 	}

 	/* Wait until there will be 128KB of data in rx queue. */
-	while (1) {
-		ssize_t res;
-
-		res = recv(fd, buf, buf_size, MSG_PEEK);
-		if (res == buf_size)
-			break;
-
-		if (res <= 0) {
-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
-			exit(EXIT_FAILURE);
-		}
-	}
+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);

 	/* There is 128KB of data in the socket's rx queue, dequeue first
 	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.

-- 
2.53.0
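
The MSG_PEEK retry logic described in this patch can be sketched as a standalone helper. This is an illustration under stated assumptions, not the `recv_buf()` implementation: `demo_peek_at_least()` and `DEMO_PEEK_RETRY_USEC` are hypothetical names, and a simple retry cap replaces the test suite's timeout helpers.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PEEK_RETRY_USEC (10 * 1000)

/* Peek until at least `expected` bytes are visible, or give up.
 * Returns the last recv() result: >= expected on success, <= 0 on
 * error or EOF, or a short count once max_retries is exhausted.
 */
static ssize_t demo_peek_at_least(int fd, void *buf, size_t len,
				  size_t expected, int max_retries)
{
	ssize_t ret;

	assert(expected <= len);

	for (;;) {
		ret = recv(fd, buf, len, MSG_PEEK);
		if (ret <= 0)
			return ret;

		/* MSG_PEEK re-reads from the head of the queue, so ret is
		 * the amount visible so far and must not be accumulated.
		 */
		if ((size_t)ret >= expected)
			return ret;

		if (max_retries-- <= 0)
			return ret;

		/* Back off briefly instead of spinning on recv(). */
		usleep(DEMO_PEEK_RETRY_USEC);
	}
}
```

The key point is that the peek result replaces the running count rather than being added to it, which is what lets the loop terminate once enough data has arrived.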


* [PATCH net v4 3/3] vsock/test: add MSG_PEEK after partial recv test
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

Add a test that verifies MSG_PEEK works correctly after a partial
recv().

This tests for a bug that was present in
`virtio_transport_stream_do_peek()` when computing the number of bytes to
copy: after a partial read, the peek function did not take into account
the number of bytes that had already been read. Peeking the whole buffer
would therefore cause an out-of-bounds read, which resulted in -EFAULT.

This test does exactly this: do a partial recv on a buffer, then try to
peek the whole buffer content. The test re-uses
`test_stream_msg_peek_client()` to also cover this scenario.

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/vsock_test.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bdb0754965df..76be0e4a7f0e 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -346,6 +346,38 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
 	return test_msg_peek_server(opts, false);
 }
 
+static void test_stream_peek_after_recv_server(const struct test_opts *opts)
+{
+	unsigned char buf_normal[MSG_PEEK_BUF_LEN];
+	unsigned char buf_peek[MSG_PEEK_BUF_LEN];
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("SRVREADY");
+
+	/* Partial recv to advance offset within the skb */
+	recv_buf(fd, buf_normal, 1, 0, 1);
+
+	/* Peek with a buffer larger than the remaining data */
+	recv_buf(fd, buf_peek, sizeof(buf_peek), MSG_PEEK, sizeof(buf_peek) - 1);
+
+	/* Consume the remaining data */
+	recv_buf(fd, buf_normal, sizeof(buf_normal) - 1, 0, sizeof(buf_normal) - 1);
+
+	/* Compare full peek and normal read. */
+	if (memcmp(buf_peek, buf_normal, sizeof(buf_peek) - 1)) {
+		fprintf(stderr, "Full peek data mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	close(fd);
+}
+
 #define SOCK_BUF_SIZE (2 * 1024 * 1024)
 #define SOCK_BUF_SIZE_SMALL (64 * 1024)
 #define MAX_MSG_PAGES 4
@@ -2509,6 +2541,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_tx_credit_bounds_client,
 		.run_server = test_stream_tx_credit_bounds_server,
 	},
+	{
+		.name = "SOCK_STREAM MSG_PEEK after partial recv",
+		.run_client = test_stream_msg_peek_client,
+		.run_server = test_stream_peek_after_recv_server,
+	},
 	{},
 };
 

-- 
2.53.0



* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Simon Horman @ 2026-04-15 15:15 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, kees,
	yifanwucs, tomapufckgml, yuantan098, bird, ylong030
In-Reply-To: <b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu>

On Mon, Apr 13, 2026 at 05:07:20PM +0800, Ren Wei wrote:
> From: Longxuan Yu <ylong030@ucr.edu>
> 
> vlan_dev_set_egress_priority() leaves cleared egress priority mapping
> nodes in the hash until device teardown. Repeated set/clear cycles with
> distinct skb priorities therefore allocate an unbounded number of
> vlan_priority_tci_mapping objects and leak memory.
> 
> Delete mappings when vlan_prio is cleared instead of keeping
> tombstones. The TX fast path and reporting paths walk the lists without
> RTNL, so convert the egress mapping lists to RCU-protected pointers and
> defer freeing removed nodes until after a grace period.
> 
> Cc: stable@kernel.org
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
>  include/linux/if_vlan.h  | 23 +++++++++++--------
>  net/8021q/vlan_dev.c     | 48 +++++++++++++++++++++++-----------------
>  net/8021q/vlan_netlink.c |  9 +++-----
>  net/8021q/vlanproc.c     | 12 ++++++----
>  4 files changed, 53 insertions(+), 39 deletions(-)

There is a lot of change here. And I'd suggest splitting the patch up into
(at least) two patches:

1. Convert mappings to use RCU
2. Fix bug

As is, the bug fix itself is difficult to isolate amongst the other changes.

Also, the AI-generated review suggests that this bug was introduced by commit
b020cb488586 ("[VLAN]: Keep track of number of QoS mappings"). If so,
it would be appropriate to use that commit in the Fixes tag.

-- 
pw-bot: changes-requested


* [PATCH net v2 0/2] bnge fixes
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta

Hi,
 This series fixes two issues.

 Patch-1:
    Due to a wrong HWRM sequence, the driver does not get the correct
    information regarding resources and capabilities.
    The patch fixes the initial HWRM sequence.
 Patch-2:
    Remove the unsupported backing store type initialization, which is
    not supported in Thor Ultra devices.

Thanks,
Vikas

v1->v2: 
   Include Fixes tags.


Vikas Gupta (2):
  bnge: fix initial HWRM sequence
  bnge: remove unsupported backing store type

 .../net/ethernet/broadcom/bnge/bnge_core.c    | 39 ++++++++++---------
 .../net/ethernet/broadcom/bnge/bnge_rmem.c    | 16 --------
 2 files changed, 21 insertions(+), 34 deletions(-)

-- 
2.47.1



* [PATCH net v2 1/2] bnge: fix initial HWRM sequence
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260415151621.1104956-1-vikas.gupta@broadcom.com>

Firmware may not advertise correct resources if the backing store is not
enabled before resource information is queried.
Fix the initial sequence of HWRMs so that the driver gets capabilities
and resource information correctly.

Fixes: 3fa9e977a0cd ("bng_en: Initialize default configuration")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rahul Gupta <rahul-rg.gupta@broadcom.com>
---
 .../net/ethernet/broadcom/bnge/bnge_core.c    | 39 ++++++++++---------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_core.c b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
index b4090283df0f..9f6a33b912a6 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_core.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
@@ -73,25 +73,35 @@ static int bnge_func_qcaps(struct bnge_dev *bd)
 		return rc;
 	}
 
+	rc = bnge_alloc_ctx_mem(bd);
+	if (rc) {
+		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
+		goto err_free_ctx_mem;
+	}
+
 	rc = bnge_hwrm_func_resc_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "query resc caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_func_qcfg(bd);
 	if (rc) {
 		dev_err(bd->dev, "query config failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_vnic_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "vnic caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	return 0;
+
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
+	return rc;
 }
 
 static void bnge_fw_unregister_dev(struct bnge_dev *bd)
@@ -132,32 +142,25 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
 
 	bnge_hwrm_fw_set_time(bd);
 
-	rc =  bnge_hwrm_func_drv_rgtr(bd);
+	/* Get the resources and configuration from firmware */
+	rc = bnge_func_qcaps(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
 		return rc;
 	}
 
-	rc = bnge_alloc_ctx_mem(bd);
+	rc = bnge_hwrm_func_drv_rgtr(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
-		goto err_func_unrgtr;
-	}
-
-	/* Get the resources and configuration from firmware */
-	rc = bnge_func_qcaps(bd);
-	if (rc) {
-		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
-		rc = -ENODEV;
-		goto err_func_unrgtr;
+		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		goto err_free_ctx_mem;
 	}
 
 	bnge_set_dflt_rss_hash_type(bd);
 
 	return 0;
 
-err_func_unrgtr:
-	bnge_fw_unregister_dev(bd);
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
 	return rc;
 }
 
-- 
2.47.1



* [PATCH net v2 2/2] bnge: remove unsupported backing store type
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260415151621.1104956-1-vikas.gupta@broadcom.com>

The backing store type, BNGE_CTX_MRAV, is not applicable in Thor Ultra
devices. Remove it from the backing store configuration: the firmware
will not populate entities in this backing store type, which causes the
driver load to fail.

Fixes: 29c5b358f385 ("bng_en: Add backing store support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnge/bnge_rmem.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
index 94f15e08a88c..b066ee887a09 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
@@ -324,7 +324,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	u32 l2_qps, qp1_qps, max_qps;
 	u32 ena, entries_sp, entries;
 	u32 srqs, max_srqs, min;
-	u32 num_mr, num_ah;
 	u32 extra_srqs = 0;
 	u32 extra_qps = 0;
 	u32 fast_qpmd_qps;
@@ -390,21 +389,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	if (!bnge_is_roce_en(bd))
 		goto skip_rdma;
 
-	ctxm = &ctx->ctx_arr[BNGE_CTX_MRAV];
-	/* 128K extra is needed to accommodate static AH context
-	 * allocation by f/w.
-	 */
-	num_mr = min_t(u32, ctxm->max_entries / 2, 1024 * 256);
-	num_ah = min_t(u32, num_mr, 1024 * 128);
-	ctxm->split_entry_cnt = BNGE_CTX_MRAV_AV_SPLIT_ENTRY + 1;
-	if (!ctxm->mrav_av_entries || ctxm->mrav_av_entries > num_ah)
-		ctxm->mrav_av_entries = num_ah;
-
-	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, num_mr + num_ah, 2);
-	if (rc)
-		return rc;
-	ena |= FUNC_BACKING_STORE_CFG_REQ_ENABLES_MRAV;
-
 	ctxm = &ctx->ctx_arr[BNGE_CTX_TIM];
 	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, l2_qps + qp1_qps + extra_qps, 1);
 	if (rc)
-- 
2.47.1



* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Luiz Augusto von Dentz @ 2026-04-15 15:19 UTC (permalink / raw)
  To: Sasha Finkelstein
  Cc: Sven Peter, Janne Grunau, Neal Gompa, Marcel Holtmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-kernel, asahi, linux-arm-kernel,
	linux-bluetooth, netdev
In-Reply-To: <CAMT+MTQ6orj5tpiGL9hz8m2TGiBjA-9D_0e1iLt=_dXBFHcOgg@mail.gmail.com>

Hi Sasha,

On Wed, Apr 15, 2026 at 8:34 AM Sasha Finkelstein <fnkl.kernel@gmail.com> wrote:
>
> On Tue, 14 Apr 2026 at 16:00, Luiz Augusto von Dentz
> <luiz.dentz@gmail.com> wrote:
> > > +       if (sock)
> > > +               set_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
> >
> > This is more complicated than it needs to be. I'd just add a new
> > callback, `hdev->set_priority(handle, skb->priority)`, so the driver
> > is called whenever it needs to elevate a connection's priority, that
> > said there could be cases where a connection needs its priority set
> > momentarily to transmit A2DP, followed by OBEX packets that are best
> > effort. Therefore, `hci_conn` will probably need to track the priority
> > so it can detect when it needs changing on a per skb basis.
>
> I have tested per-skb priorities, and unfortunately, this does not work.
> If something tries to send a low-priority packet (for example - a volume
> adjustment), a priority drop causes the same kind of dropout that is
> caused by scans. It appears that the only way to make this hardware work
> is to set the entire hci connection as high priority for as long as it
> is being used to transmit audio.

Ok, then maybe we should decrease the priority, so it can only go up.
That said, in a multiple connection scenario, we cannot really tell
what should be prioritized if we cannot momentarily decrease the
priority.

-- 
Luiz Augusto von Dentz


* Re: Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: Andrew Lunn @ 2026-04-15 15:30 UTC (permalink / raw)
  To: wenzhaoliao
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <AF6AxgB-KF5*XoYukJCGZKoD.3.1776265264920.Hmail.2023000929@ruc.edu.cn>

> If you have a preference between TrustOnX Player (TOX3) and Radxa A5E as a
> first board to try, that would be very helpful. Otherwise, we will investigate
> the A5E path first and come back once we have a clearer hardware/testing plan.

Makes no difference to me. Use whatever is simplest for you.

If you want to do more than just convert the C driver to Rust, and add
new features, maybe see if the LEDs are controlled by the PHY? Is there
an interrupt output from the PHY to the SoC? Does the board support
WoL? These are all features which the Rust binding is missing, and
could be added, if you have a device to test them on, and the needed
register information.

      Andrew


* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Sasha Finkelstein @ 2026-04-15 15:31 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Sven Peter, Janne Grunau, Neal Gompa, Marcel Holtmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-kernel, asahi, linux-arm-kernel,
	linux-bluetooth, netdev
In-Reply-To: <CABBYNZLNR8hYS9jLLKeB=M9XVvtSFtf1wi4DmcJKBbQVvHTPaw@mail.gmail.com>

On Wed, 15 Apr 2026 at 17:19, Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
> Ok, then maybe we should decrease the priority, so it can only go up.
> That said, in a multiple connection scenario, we cannot really tell
> what should be prioritized if we cannot momentarily decrease the
> priority.

I believe that the priority is only per-connection and is not designed
to be used per-packet. On Android they change priority when an
A2DP stream starts or stops, by sending the commands from
userspace, and accept that other things using the same hci
connection will also have high priority.

^ permalink raw reply

* Re: [PATCH] net: sched: teql: fix use-after-free in teql_master_xmit
From: Simon Horman @ 2026-04-15 15:31 UTC (permalink / raw)
  To: hxzene
  Cc: 'Simon Horman', jhs, jiri, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel
In-Reply-To: <20260413094448.2263828-1-hxzene@gmail.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: sched: teql: fix use-after-free in teql_master_xmit

This patch attempts to fix a use-after-free in teql_master_xmit() by
saving master->slaves to a local head variable during teql_destroy(),
and by adding READ_ONCE()/WRITE_ONCE() to datapath accesses.

> diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
> index ec4039a201a2c..2e86397a5219c 100644
> --- a/net/sched/sch_teql.c
> +++ b/net/sched/sch_teql.c
> @@ -101,7 +101,7 @@ teql_dequeue(struct Qdisc *sch)
>  	if (skb == NULL) {
>  		struct net_device *m = qdisc_dev(q);
>  		if (m) {
> -			dat->m->slaves = sch;
> +			WRITE_ONCE(dat->m->slaves, sch);

Can this lockless publication write a soon-to-be-freed pointer to
master->slaves?

If teql_dequeue() concurrently executes this write while teql_destroy()
is unlinking sch and scheduling it for deletion via call_rcu(), sch could
be published globally just as it is being destroyed.

When the RCU grace period ends, sch would be freed but still globally
referenced. The WRITE_ONCE() prevents store-tearing but does not prevent
this race condition.

>  		}
>  	} else {
[ ... ]
> @@ -136,19 +136,23 @@ teql_destroy(struct Qdisc *sch)
>  	if (!master)
>  		return;
>  
> -	prev = master->slaves;
> +	prev = READ_ONCE(master->slaves);
>  	if (prev) {
> +		struct Qdisc *head = prev;
> +
>  		do {
>  			q = NEXT_SLAVE(prev);
>  			if (q == sch) {
>  				NEXT_SLAVE(prev) = NEXT_SLAVE(q);
> -				if (q == master->slaves) {
> -					master->slaves = NEXT_SLAVE(q);
> -					if (q == master->slaves) {
> +				if (q == head) {

Could changing this condition introduce a use-after-free if master->slaves
is concurrently advanced?

If the datapath (teql_master_xmit() or teql_dequeue()) advances
master->slaves to point to sch after head was snapshot, head will not
equal sch.

When teql_destroy() unlinks sch, the q == head check would evaluate
to false, meaning master->slaves is not updated. This would leave
master->slaves pointing to the unlinked sch, which will then be freed.
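
The scenario described above can be sketched with a small userspace
model. This is only an illustration, not the kernel code: struct node
stands in for a slave Qdisc, struct master for the teql master, and no
locking or RCU is modelled. The names unlink_with_snapshot() and
stale_pointer_after_advance() are hypothetical.

```c
#include <stddef.h>

/* Userspace model only: 'node' stands in for a slave Qdisc and
 * 'slaves' for master->slaves. No locking or RCU is modelled. */
struct node {
	struct node *next;
};

struct master {
	struct node *slaves;
};

/* Unlink 'victim' from the ring, comparing against a snapshot 'head'
 * taken earlier, as the patched teql_destroy() does. Returns 1 if
 * m->slaves still points at the unlinked node afterwards. */
static int unlink_with_snapshot(struct master *m, struct node *head,
				struct node *victim)
{
	struct node *prev = head;
	struct node *q;

	do {
		q = prev->next;
		if (q == victim) {
			prev->next = q->next;
			if (q == head) {
				/* Only reached when victim was the snapshot. */
				m->slaves = (q->next == q) ? NULL : q->next;
			}
			break;
		}
	} while ((prev = q) != head);

	return m->slaves == victim;
}

/* Ring a -> b -> a. The snapshot is taken while slaves == a, the
 * simulated datapath then advances slaves to b, and b is unlinked. */
static int stale_pointer_after_advance(void)
{
	struct node a, b;
	struct master m;
	struct node *head;

	a.next = &b;
	b.next = &a;
	m.slaves = &a;

	head = m.slaves;	/* snapshot, like READ_ONCE(master->slaves) */
	m.slaves = &b;		/* simulated datapath advance */

	return unlink_with_snapshot(&m, head, &b);
}
```

In this model stale_pointer_after_advance() returns 1: b has been
unlinked from the ring, but m.slaves was never updated because b was
not the snapshot.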

> +					WRITE_ONCE(master->slaves,
> +						   NEXT_SLAVE(q));
[ ... ]
> @@ -317,7 +321,7 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
>  				    netdev_start_xmit(skb, slave, slave_txq, false) ==
>  				    NETDEV_TX_OK) {
>  					__netif_tx_unlock(slave_txq);
> -					master->slaves = NEXT_SLAVE(q);
> +					WRITE_ONCE(master->slaves, NEXT_SLAVE(q));

Similarly to teql_dequeue() above, does this risk publishing a dying qdisc
globally?

If NEXT_SLAVE(q) is read before teql_destroy() unlinks it, but the
WRITE_ONCE() executes immediately after the unlink, it appears the
unlinked qdisc would be written to the global master->slaves pointer.

[ ... ]

While looking at the concurrency issues addressed by this patch, it appears
teql_qdisc_init() might also be affected by concurrent updates to
m->slaves:

net/sched/sch_teql.c:teql_qdisc_init() {
    ...
		q->next = NEXT_SLAVE(m->slaves);
		NEXT_SLAVE(m->slaves) = sch;
    ...
}

Because NEXT_SLAVE(m->slaves) evaluates m->slaves twice, if m->slaves
is concurrently advanced between these two statements, could this break
the list linkage and cause lost elements or infinite loops?
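
The double evaluation can be illustrated with a userspace model. This is
a sketch under assumptions, not the kernel code: the concurrent advance
is simulated by an explicit assignment between the two statements, and
insert_double_read(), insert_single_read() and demo_insert() are
hypothetical names.

```c
#include <stddef.h>

struct node {
	struct node *next;
};

/* Count the nodes on the ring reachable from 'start'. Gives up and
 * returns -1 if the walk does not come back within 'limit' steps. */
static int ring_len(const struct node *start, int limit)
{
	const struct node *p = start;
	int n = 0;

	do {
		p = p->next;
		n++;
	} while (p != start && n < limit);

	return p == start ? n : -1;
}

/* Mirrors the two statements quoted above: *slaves is evaluated twice,
 * and the simulated datapath moves it in between. */
static int insert_double_read(struct node **slaves, struct node *sch,
			      struct node *advance_to)
{
	sch->next = (*slaves)->next;	/* first evaluation */
	*slaves = advance_to;		/* simulated concurrent advance */
	(*slaves)->next = sch;		/* second evaluation */
	return ring_len(*slaves, 16);
}

/* Same insertion, but the pointer is read once into a local. */
static int insert_single_read(struct node **slaves, struct node *sch,
			      struct node *advance_to)
{
	struct node *cur = *slaves;

	sch->next = cur->next;
	*slaves = advance_to;		/* simulated concurrent advance */
	cur->next = sch;
	return ring_len(*slaves, 16);
}

/* Ring a -> b -> c -> a, with slaves advanced from a to c mid-insert. */
static int demo_insert(int single_read)
{
	struct node a, b, c, sch;
	struct node *slaves = &a;

	a.next = &b;
	b.next = &c;
	c.next = &a;

	return single_read ? insert_single_read(&slaves, &sch, &c)
			   : insert_double_read(&slaves, &sch, &c);
}
```

With the double read the ring seen from slaves has three nodes (a is no
longer reachable); with the single read it has four. Reading the pointer
once only keeps the linkage consistent in this model. It does not by
itself serialize against a concurrent unlink.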

^ permalink raw reply

* Re: [PATCH net v4] ipvs: fix MTU check for GSO packets in tunnel mode
From: Julian Anastasov @ 2026-04-15 15:35 UTC (permalink / raw)
  To: Yingnan Zhang
  Cc: pablo, coreteam, davem, edumazet, fw, horms, kuba, linux-kernel,
	lvs-devel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <tencent_7F7B107ECA750C095D05C19C3B723AFFA60A@qq.com>


	Hello,

On Wed, 15 Apr 2026, Yingnan Zhang wrote:

> Currently, IPVS skips MTU checks for GSO packets by excluding them with
> the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
> mode encapsulates GSO packets with IPIP headers.
> 
> The issue manifests in two ways:
> 
> 1. MTU violation after encapsulation:
>    When a GSO packet passes through IPVS tunnel mode, the original MTU
>    check is bypassed. After adding the IPIP tunnel header, the packet
>    size may exceed the outgoing interface MTU, leading to unexpected
>    fragmentation at the IP layer.
> 
> 2. Fragmentation with problematic IP IDs:
>    When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
>    is fragmented after encapsulation, each segment gets a sequentially
>    incremented IP ID (0, 1, 2, ...). This happens because:
> 
>    a) The GSO packet bypasses MTU check and gets encapsulated
>    b) At __ip_finish_output, the oversized GSO packet is split into
>       separate SKBs (one per segment), with IP IDs incrementing
>    c) Each SKB is then fragmented again based on the actual MTU
> 
>    This sequential IP ID allocation differs from the expected behavior
>    and can cause issues with fragment reassembly and packet tracking.
> 
> Fix this by properly validating GSO packets using
> skb_gso_validate_network_len(). This function correctly validates
> whether the GSO segments will fit within the MTU after segmentation. If
> validation fails, send an ICMP Fragmentation Needed message to enable
> proper PMTU discovery.
> 
> Fixes: 4cdd34084d53 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
> Signed-off-by: Yingnan Zhang <342144303@qq.com>

	Looks good to me for the nf tree, thanks!

Acked-by: Julian Anastasov <ja@ssi.bg>

> ---
> v4:
> - Introduce a new helper function ip_vs_exceeds_mtu() to improve readability (reviewer feedback)
> 
> v3: https://lore.kernel.org/netdev/tencent_73010FBD5FA1C05C3BC23A07A50B11CEC90A@qq.com/
> v2: https://lore.kernel.org/netdev/tencent_CA2C1C219C99D315086BE55E8654AF7E6009@qq.com/
> v1: https://lore.kernel.org/netdev/tencent_4A3E1C339C75D359093BE4F08648AFAA6009@qq.com/
> ---
> ---
>  net/netfilter/ipvs/ip_vs_xmit.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index 0fb5162992e5..64dfdf8b00c4 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -102,6 +102,18 @@ __ip_vs_dst_check(struct ip_vs_dest *dest)
>  	return dest_dst;
>  }
>  
> +/* Based on ip_exceeds_mtu(). */
> +static bool ip_vs_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
> +{
> +	if (skb->len <= mtu)
> +		return false;
> +
> +	if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
> +		return false;
> +
> +	return true;
> +}
> +
>  static inline bool
>  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  {
> @@ -112,7 +124,7 @@ __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  		if (IP6CB(skb)->frag_max_size > mtu)
>  			return true; /* largest fragment violate MTU */
>  	}
> -	else if (skb->len > mtu && !skb_is_gso(skb)) {
> +	else if (ip_vs_exceeds_mtu(skb, mtu)) {
>  		return true; /* Packet size violate MTU size */
>  	}
>  	return false;
> @@ -232,7 +244,7 @@ static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
>  			return true;
>  
>  		if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) &&
> -			     skb->len > mtu && !skb_is_gso(skb) &&
> +			     ip_vs_exceeds_mtu(skb, mtu) &&
>  			     !ip_vs_iph_icmp(ipvsh))) {
>  			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>  				  htonl(mtu));
> -- 
> 2.51.0.windows.1

Regards

--
Julian Anastasov <ja@ssi.bg>


^ permalink raw reply

* Re: [PATCH net v4 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Stefano Garzarella @ 2026-04-15 15:40 UTC (permalink / raw)
  To: Luigi Leonardi
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <20260415-fix_peek-v4-2-8207e872759e@redhat.com>

On Wed, Apr 15, 2026 at 05:09:29PM +0200, Luigi Leonardi wrote:
>`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
>`recv` until all requested bytes are available or an error occurs.
>
>The problem is how it calculates the number of bytes read: MSG_PEEK
>doesn't consume any bytes and will re-read the same bytes from the buffer
>head, so summing the return value every time is wrong.
>
>Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if more
>bytes are requested than are available, the loop will never terminate,
>because `recv` will never return EOF. For this reason, we need to compare
>the number of bytes read with the number of bytes expected.
>
>Add a check: if the MSG_PEEK flag is present, update the byte counter and
>break out of the loop only after at least the expected number of bytes
>have been received; otherwise, retry after a short delay to avoid
>consuming too many CPU cycles.
>
>This allows us to simplify the `test_stream_credit_update_test` by
>reusing `recv_buf`, like some other tests already do.
>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>---
> tools/testing/vsock/util.c       | 15 +++++++++++++++
> tools/testing/vsock/vsock_test.c | 13 +------------
> 2 files changed, 16 insertions(+), 12 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
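
A minimal sketch of the MSG_PEEK behaviour described in the commit
message, assuming an AF_UNIX stream socketpair instead of vsock so that
it runs anywhere. It is not the code from util.c: peek_exact(), the
retry limit and the 1 ms delay are hypothetical. The point is that each
peek re-reads from the head of the buffer, so the return value is
compared with the expected length rather than accumulated.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Peek until at least 'len' bytes are visible in the receive buffer.
 * Returns the peeked length, 0 if the peer closed with nothing
 * buffered, or -1 on error or when 'max_retries' is exhausted. */
static ssize_t peek_exact(int fd, void *buf, size_t len, int max_retries)
{
	for (;;) {
		ssize_t ret = recv(fd, buf, len, MSG_PEEK);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)
			return 0;

		/* MSG_PEEK consumes nothing: do not sum return values,
		 * compare the latest one with the expected length. */
		if ((size_t)ret >= len)
			return ret;

		if (max_retries-- <= 0)
			return -1;

		usleep(1000);	/* short delay to avoid spinning */
	}
}

/* Peek four bytes, then read them normally: both see the same data. */
static int demo_peek_then_read(void)
{
	int sv[2];
	char peeked[4] = { 0 };
	char consumed[4] = { 0 };
	int ok;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	ok = send(sv[0], "abcd", 4, 0) == 4 &&
	     peek_exact(sv[1], peeked, 4, 100) == 4 &&
	     recv(sv[1], consumed, 4, 0) == 4 &&
	     memcmp(peeked, consumed, 4) == 0;

	close(sv[0]);
	close(sv[1]);
	return ok;
}
```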


^ permalink raw reply

* Re: [PATCH net] net: ax25: fix integer overflow in ax25_rx_fragment()
From: Simon Horman @ 2026-04-15 15:56 UTC (permalink / raw)
  To: mashiro.chen
  Cc: 'Simon Horman', netdev, davem, edumazet, kuba, pabeni,
	jreuter, linux-hams, linux-kernel, stable
In-Reply-To: <20260408172521.281365-1-mashiro.chen@mailbox.org>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: ax25: fix integer overflow in ax25_rx_fragment()

This commit addresses an integer overflow in the ax25_cb fragmentation
reassembly accumulator. It adds a check to ensure the combined payload
of fragments does not exceed USHRT_MAX. If the limit is exceeded, the
fragment queue is purged to prevent allocating a tiny buffer that leads
to a panic.

> Every skb_put() call in the copy loop
> that follows writes far beyond the allocated headroom, corrupting
> the kernel heap.

Is this description accurate?

skb_put() explicitly performs strict bounds checking and triggers a
deterministic kernel panic via skb_over_panic() if exceeded, rather than
silently corrupting the heap.

> Fix mirrors the identical bug fixed in NET/ROM (nr_in.c):

This isn't a bug, but is this the correct file reference? The check for
USHRT_MAX actually appears to exist in X.25 (x25_in.c), not NET/ROM.

> diff --git a/net/ax25/ax25_in.c b/net/ax25/ax25_in.c
> index d75b3e9ed93de..68202c19b19e3 100644
> --- a/net/ax25/ax25_in.c
> +++ b/net/ax25/ax25_in.c
> @@ -41,6 +41,11 @@ static int ax25_rx_fragment(ax25_cb *ax25, struct sk_buff *skb)
>  				/* Enqueue fragment */
>  				ax25->fragno = *skb->data & AX25_SEG_REM;
>  				skb_pull(skb, 1);	/* skip fragno */
> +				if ((unsigned int)ax25->fraglen + skb->len > USHRT_MAX) {

Is there a race condition with ax25->fraglen here?

If fragments for the same connection arrive concurrently on different CPUs,
both could evaluate this bounds check as safe and then concurrently add
their lengths.

> +					skb_queue_purge(&ax25->frag_queue);
> +					ax25->fragno = 0;
> +					return 1;

Does this code leak the skb?

When returning 1 here, the caller might skip freeing the skb because it
assumes the packet was successfully consumed or queued.

Could we explicitly call kfree_skb(skb) before returning 1, or return 0 to
instruct the caller to free it?

> +				}
>  				ax25->fraglen += skb->len;
>  				skb_queue_tail(&ax25->frag_queue, skb);

[ ... ]

>  					/* Copy data from the fragments */
>  					while ((skbo = skb_dequeue(&ax25->frag_queue)) != NULL) {
>  						skb_copy_from_linear_data(skbo,
>  							  skb_put(skbn, skbo->len),
>  									  skbo->len);

Could this read past the end of the skbo->data buffer if the fragment is
non-linear?

If the skb has paged data, skbo->len exceeds the linearly mapped data size,
which might trigger a panic or leak adjacent memory into the reassembled
packet.

Should this use skb_copy_bits() instead?

[ ... ]

>  	} else {
>  		/* First fragment received */
>  		if (*skb->data & AX25_SEG_FIRST) {
>  			skb_queue_purge(&ax25->frag_queue);
>  			ax25->fragno = *skb->data & AX25_SEG_REM;
>  			skb_pull(skb, 1);		/* skip fragno */
>  			ax25->fraglen = skb->len;

Is there a possibility of truncating the length here?

Since ax25->fraglen is an unsigned short, if a first fragment arrives with
skb->len > USHRT_MAX, it might be silently truncated.

Subsequent continuation fragments will be validated against this artificially
small fraglen, bypassing the new USHRT_MAX check.

Should the bounds check be applied to the first fragment as well?
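
The truncation concern can be shown with plain C arithmetic. This is a
sketch only: store_fraglen() and would_overflow() are hypothetical
helpers modelling the assignment to an unsigned short field and the
check added by the patch. Whether such a first fragment can actually
reach this code is not established here.

```c
#include <limits.h>
#include <stdbool.h>

/* Models "ax25->fraglen = skb->len" with an unsigned short field. */
static unsigned short store_fraglen(unsigned int skb_len)
{
	return (unsigned short)skb_len;
}

/* The check from the patch, as applied to continuation fragments. */
static bool would_overflow(unsigned short fraglen, unsigned int skb_len)
{
	return (unsigned int)fraglen + skb_len > USHRT_MAX;
}

/* One possible guard for the first fragment, mirroring the same limit. */
static bool first_fragment_ok(unsigned int skb_len)
{
	return skb_len <= USHRT_MAX;
}
```

In this model a first fragment of USHRT_MAX + 11 bytes is stored as 10,
so a following 100-byte fragment passes would_overflow() even though
the real total is far beyond the limit.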

>  			skb_queue_tail(&ax25->frag_queue, skb);
>  			return 1;
>  		}
>  	}

^ permalink raw reply

* Re: [PATCH] rose: Fix rose_find_socket() returning without sock_hold()
From: kernel test robot @ 2026-04-15 16:12 UTC (permalink / raw)
  To: Dudu Lu, netdev
  Cc: llvm, oe-kbuild-all, davem, edumazet, kuba, pabeni, Dudu Lu
In-Reply-To: <20260413090420.79932-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]
[also build test ERROR on net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/rose-Fix-rose_find_socket-returning-without-sock_hold/20260414-194608
base:   net/main
patch link:    https://lore.kernel.org/r/20260413090420.79932-1-phx0fer%40gmail.com
patch subject: [PATCH] rose: Fix rose_find_socket() returning without sock_hold()
config: i386-randconfig-012-20260415 (https://download.01.org/0day-ci/archive/20260416/202604160039.PLn74vyE-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260416/202604160039.PLn74vyE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604160039.PLn74vyE-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/rose/af_rose.c:1:2: error: expected identifier or '('
       1 |         if (s)
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                         ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                                       ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     114 |                 return  (set1->sig[3] == set2->sig[3]) &&
         |                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:114:27: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     114 |                 return  (set1->sig[3] == set2->sig[3]) &&
         |                                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:115:5: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     115 |                         (set1->sig[2] == set2->sig[2]) &&
         |                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:115:21: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     115 |                         (set1->sig[2] == set2->sig[2]) &&
         |                                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:157:1: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     157 | _SIG_SET_BINOP(sigorsets, _sig_or)
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/signal.h:138:8: note: expanded from macro '_SIG_SET_BINOP'
     138 |                 a3 = a->sig[3]; a2 = a->sig[2];                         \
         |                      ^      ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:157:1: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     157 | _SIG_SET_BINOP(sigorsets, _sig_or)
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/signal.h:138:24: note: expanded from macro '_SIG_SET_BINOP'
     138 |                 a3 = a->sig[3]; a2 = a->sig[2];                         \
         |                                      ^      ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:157:1: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     157 | _SIG_SET_BINOP(sigorsets, _sig_or)
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/signal.h:139:8: note: expanded from macro '_SIG_SET_BINOP'
     139 |                 b3 = b->sig[3]; b2 = b->sig[2];                         \
         |                      ^      ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:157:1: warning: array index 2 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     157 | _SIG_SET_BINOP(sigorsets, _sig_or)
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/signal.h:139:24: note: expanded from macro '_SIG_SET_BINOP'
     139 |                 b3 = b->sig[3]; b2 = b->sig[2];                         \
         |                                      ^      ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from net/rose/af_rose.c:21:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:157:1: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
     157 | _SIG_SET_BINOP(sigorsets, _sig_or)
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/signal.h:140:3: note: expanded from macro '_SIG_SET_BINOP'


vim +1 net/rose/af_rose.c

   > 1		if (s)
     2			sock_hold(s);// SPDX-License-Identifier: GPL-2.0-or-later
     3	/*
     4	 *
     5	 * Copyright (C) Jonathan Naylor G4KLX (g4klx@g4klx.demon.co.uk)
     6	 * Copyright (C) Alan Cox GW4PTS (alan@lxorguk.ukuu.org.uk)
     7	 * Copyright (C) Terry Dawson VK2KTJ (terry@animats.net)
     8	 * Copyright (C) Tomi Manninen OH2BNS (oh2bns@sral.fi)
     9	 */
    10	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH net] ixgbevf: fix use-after-free in VEPA multicast source pruning
From: Simon Horman @ 2026-04-15 16:17 UTC (permalink / raw)
  To: Michael Bommarito
  Cc: intel-wired-lan, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, stable, linux-kernel
In-Reply-To: <20260413182427.298513-1-michael.bommarito@gmail.com>

On Mon, Apr 13, 2026 at 02:24:27PM -0400, Michael Bommarito wrote:
> ixgbevf_clean_rx_irq() prunes frames whose source MAC matches the VF's
> own address (VEPA multicast workaround) by freeing the skb and
> continuing to the next descriptor:
> 
>     dev_kfree_skb_irq(skb);
>     continue;
> 
> The skb pointer is declared outside the while loop and persists across
> iterations.  Because the continue skips the "skb = NULL" reset at the
> bottom of the loop, the next iteration enters the "else if (skb)" path
> and calls ixgbevf_add_rx_frag() on the freed skb, dereferencing
> skb_shinfo(skb)->nr_frags — a use-after-free in NAPI softirq context.
> 
> The sibling driver iavf already handles this correctly by nulling the
> pointer before continuing.  Apply the same pattern here.
> 
> I do not have ixgbevf hardware; the bug was found by static analysis
> (scan_drop_continue_loops.py + semgrep drop_continue_in_loop, multi-tool
> corroboration with the highest score in the scan).  The UAF was confirmed
> under KASAN by loading a test module that reproduces the exact code
> pattern (alloc skb, kfree_skb, then read skb_shinfo(skb)->nr_frags):
> 
>   BUG: KASAN: slab-use-after-free in ixgbevf_uaf_test_init+0x100/0x1000
>   Read of size 8 at addr 000000006163ae78 by task insmod/30
>   freed 208-byte region [000000006163adc0, 000000006163ae90)
> 
> QEMU emulates igb (82576) but not ixgbe (82599), and the igbvf VF
> driver does not include the VEPA source pruning path, so a full
> end-to-end reproduction with emulated hardware was not possible.
> 
> Fixes: bad17234ba70 ("ixgbevf: Change receive model to use double buffered page based receives")
> Cc: stable@vger.kernel.org
> Assisted-by: Claude:claude-opus-4-6
> Assisted-by: Codex:gpt-5-4
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>

Reviewed-by: Simon Horman <horms@kernel.org>
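
For reference, the drop-and-continue pattern from the quoted commit
message can be modelled in userspace without performing a real
use-after-free. This is an illustrative sketch, not the driver code:
struct buf_model, the 'freed' flag and count_stale_uses() are
hypothetical, and marking a buffer freed stands in for
dev_kfree_skb_irq().

```c
#include <stddef.h>

#define MAX_DESC 8

struct buf_model {
	int freed;
};

/* Walk 'n' descriptors. drop[i] != 0 means descriptor i is pruned.
 * Returns how many times the loop touched a buffer already freed.
 * 'fixed' selects whether the pointer is cleared before 'continue'. */
static int count_stale_uses(const int *drop, int n, int fixed)
{
	struct buf_model pool[MAX_DESC] = { { 0 } };
	struct buf_model *skb = NULL;	/* lives across iterations */
	int stale = 0;
	int i;

	for (i = 0; i < n && i < MAX_DESC; i++) {
		if (!skb)
			skb = &pool[i];	/* start a new buffer */
		else if (skb->freed)
			stale++;	/* would be the use-after-free */

		if (drop[i]) {
			skb->freed = 1;	/* stands in for the free */
			if (fixed)
				skb = NULL;
			continue;	/* skips the reset below */
		}

		skb = NULL;		/* normal end-of-frame reset */
	}

	return stale;
}
```

With drop = {1, 0, 0}, the unfixed variant reports one stale use on the
second descriptor and the fixed variant reports none.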

Sashiko flags a number of issues in the same function that
do not seem related to your patch.

I'd suggest looking over them if you are interested in
follow-up work in this area.

...

^ permalink raw reply

* Re: [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Jakub Kicinski @ 2026-04-15 16:18 UTC (permalink / raw)
  To: Anton Danilov
  Cc: Breno Leitao, netdev, willemdebruijn.kernel, davem, dsahern,
	edumazet, pabeni, horms, shuah, linux-kselftest
In-Reply-To: <ad9hkJXAnlv2ZUm6@gmail.com>

On Wed, 15 Apr 2026 03:25:59 -0700 Breno Leitao wrote:
> On Wed, Apr 15, 2026 at 02:28:06AM +0300, Anton Danilov wrote:
> > +send_fou_gre_packets() {
> > +	local count=$1
> > +
> > +	ip netns exec "$NSENDER" python3 -c "  
> 
> Having Python code embedded directly in the shell function makes this
> difficult to review and maintain. Could you extract the Python script to
> a separate file? This would simplify the code to just:
> 
> 	ip netns exec "$NSENDER" python3 my_python_script.py

Or just rewrite the whole thing in Python (no preference)

^ permalink raw reply

* Re: [PATCH] netfilter: xt_realm: fix null-ptr-deref in realm_mt()
From: Pablo Neira Ayuso @ 2026-04-15 16:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kito Xu (veritas501), phil, davem, edumazet, kuba, pabeni, horms,
	jengelh, kaber, netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <ad9d52dQWrS1H_ju@strlen.de>

On Wed, Apr 15, 2026 at 11:44:07AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Apr 15, 2026 at 11:02:15AM +0200, Florian Westphal wrote:
> > > Kito Xu (veritas501) <hxzene@gmail.com> wrote:
> > > > realm_mt() unconditionally dereferences skb_dst(skb) without a NULL
> > > > check. The xt_realm match registers with .family = NFPROTO_UNSPEC,
> > > > making it available to all netfilter protocol families. Through the
> > > > nftables compat layer (nft_compat), an unprivileged user inside a
> > > > user/net namespace can load this match into a bridge-family chain.
> > > 
> > > I do not think this bug is related to nft_compat.
> > > You can also use ebtables setsockopt api to request xt_realm, no?
> > > 
> > > > Fixes: ab4f21e6fb1c ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
> > > 
> > > Looks correct.  Alternatively we could revert the xt_realm.c change.
> > > But I don't have a strong opinion here, patch looks correct.
> > 
> > Maybe partial revert makes sense, since in ab4f21e6fb1c:
> > 
> > - xt_MARK: OK
> > - xt_NOTRACK: OK
> > - xt_comment: OK
> 
> Agree.
> 
> > - xt_mac: There is a better way to do this in bridge.
> 
> Right.
> 
> > - xt_owner, no sockets in bridge.
> 
> Output/postrouting maybe?
> 
> > - xt_physdev, which makes no sense in bridge, this is for br_netfilter
> >   only.
> 
> Agree.
> 
> > - xt_realm (as already mentioned).
> > That is, a partial revert of this patch for:
> > 
> > - xt_mac
> > - xt_owner
> > - xt_physdev
> > - xt_realm
> 
> I'm ok with that too.

For the record, this patch has been replaced by:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20260415113334.61008-1-pablo@netfilter.org/

^ permalink raw reply

* [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: syzbot ci @ 2026-04-15 16:22 UTC (permalink / raw)
  To: nogikh, hawk, linux-kernel, netdev, syzbot, syzkaller-bugs
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260415130533.849053-1-nogikh@google.com>

syzbot ci has tested the suggested fix patch on top of the following series:

[v2] veth: add Byte Queue Limits (BQL) support
https://lore.kernel.org/all/20260413094442.1376022-1-hawk@kernel.org

Patch: https://ci.syzbot.org/jobs/4a19c4e7-8505-49e5-b80f-6107406612b0/patch

The patch testing request could not be completed:
Testing failed due to an infrastructure error.
Testing results:
* [build 0] Build Patched: error

Full report is available here:
https://ci.syzbot.org/session/67022682-86d9-4483-a528-4d95990f8038

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

^ permalink raw reply

* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Eric Dumazet @ 2026-04-15 16:25 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, andrew+netdev, davem, kuba, pabeni, horms, kees,
	yifanwucs, tomapufckgml, yuantan098, bird, ylong030
In-Reply-To: <b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu>

On Mon, Apr 13, 2026 at 2:08 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
>
> From: Longxuan Yu <ylong030@ucr.edu>
>
> vlan_dev_set_egress_priority() leaves cleared egress priority mapping
> nodes in the hash until device teardown. Repeated set/clear cycles with
> distinct skb priorities therefore allocate an unbounded number of
> vlan_priority_tci_mapping objects and leak memory.
>
> Delete mappings when vlan_prio is cleared instead of keeping
> tombstones. The TX fast path and reporting paths walk the lists without
> RTNL, so convert the egress mapping lists to RCU-protected pointers and
> defer freeing removed nodes until after a grace period.
>
> Cc: stable@kernel.org
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
>  include/linux/if_vlan.h  | 23 +++++++++++--------
>  net/8021q/vlan_dev.c     | 48 +++++++++++++++++++++++-----------------
>  net/8021q/vlan_netlink.c |  9 +++-----
>  net/8021q/vlanproc.c     | 12 ++++++----

>
> @@ -604,11 +606,17 @@ void vlan_dev_free_egress_priority(const struct net_device *dev)
>         int i;
>
>         for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
> -               while ((pm = vlan->egress_priority_map[i]) != NULL) {
> -                       vlan->egress_priority_map[i] = pm->next;
> -                       kfree(pm);
> +               pm = rtnl_dereference(vlan->egress_priority_map[i]);
> +               RCU_INIT_POINTER(vlan->egress_priority_map[i], NULL);
> +               while (pm) {
> +                       struct vlan_priority_tci_mapping *next;
> +
> +                       next = rtnl_dereference(pm->next);
> +                       kfree_rcu_mightsleep(pm);

Please avoid kfree_rcu_mightsleep().

Instead, embed an rcu_head in the object.

> +                       pm = next;
>                 }
>         }
> +       vlan->nr_egress_mappings = 0;
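
The idea of embedding the callback head in the object can be sketched
with a userspace model. This is not the kernel API: struct cb_head
stands in for struct rcu_head, defer_free() for the call_rcu()/
kfree_rcu() family, and run_deferred() for the end of a grace period.
All names and the struct layout are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head: a link plus a callback. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

struct mapping_model {
	unsigned int priority;
	unsigned int vlan_qos;
	struct mapping_model *next;
	struct cb_head rcu;	/* embedded: deferring needs no allocation */
};

static struct cb_head *pending;
static int freed_count;

/* Queue a callback to run later. Never allocates, never sleeps. */
static void defer_free(struct cb_head *head,
		       void (*func)(struct cb_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Recover the enclosing object from the embedded head and free it. */
static void mapping_free_cb(struct cb_head *head)
{
	struct mapping_model *pm = (struct mapping_model *)
		((char *)head - offsetof(struct mapping_model, rcu));

	free(pm);
	freed_count++;
}

/* Stand-in for "grace period elapsed": run all queued callbacks. */
static void run_deferred(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Build a three-node list, queue every node, then run the callbacks.
 * Returns the number of nodes freed, or -1 on failure. */
static int demo_deferred_free(void)
{
	struct mapping_model *list = NULL;
	struct mapping_model *pm;
	int i;

	for (i = 0; i < 3; i++) {
		pm = calloc(1, sizeof(*pm));
		if (!pm)
			return -1;
		pm->next = list;
		list = pm;
	}

	while (list) {
		pm = list;
		list = pm->next;	/* read next before queueing */
		defer_free(&pm->rcu, mapping_free_cb);
	}

	if (freed_count != 0)
		return -1;	/* nothing is freed before the "grace period" */

	run_deferred();
	return freed_count;
}
```

Because the head lives inside each object, queueing the free cannot
fail and does not need to block, which is the property the headless
variant cannot guarantee.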

^ permalink raw reply
