Netdev List
* Re: [PATCH net-next v3 00/12] BIG TCP for UDP tunnels
From: Alice Mikityanska @ 2026-04-15 12:14 UTC
  To: Jakub Kicinski
  Cc: Alice Mikityanska, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov, Shuah Khan, Stanislav Fomichev, Andrew Lunn,
	Simon Horman, Florian Westphal, netdev
In-Reply-To: <20260413155552.5cd00bc0@kernel.org>

On Tue, 14 Apr 2026 at 01:55, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 10 Apr 2026 18:09:31 +0300 Alice Mikityanska wrote:
> > This series is a follow-up to "BIG TCP without HBH in IPv6", and it adds
> > support for BIG TCP IPv4/IPv6 workloads in vxlan and geneve. Now that
> > IPv6 BIG TCP doesn't require stripping the HBH in all various
> > combinations in tunneled traffic, adding BIG TCP becomes feasible.
>
> No longer applies, sorry :(

That's a pity :(. I see that the only conflict is because udplite
parts have been removed from net/netfilter/nf_conntrack_proto_udp.c,
so I just need to drop my change that touches udplite.

> We'll have to revisit after the merge window.

OK, I'll resubmit after the merge window. I'd appreciate it if I could
still collect review comments in the meantime.

> --
> pw-bot: cr


* Re: [PATCH net v3 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Stefano Garzarella @ 2026-04-15 11:54 UTC
  To: Luigi Leonardi
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <ad9uYrUjgCkW1D_k@sgarzare-redhat>

On Wed, Apr 15, 2026 at 01:31:11PM +0200, Stefano Garzarella wrote:
>On Tue, Apr 14, 2026 at 06:10:22PM +0200, Luigi Leonardi wrote:
>>`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
>>`recv` until all requested bytes are available or an error occurs.
>>
>>The problem is how it calculates the amount of bytes read: MSG_PEEK
>>doesn't consume any bytes, will re-read the same bytes from the buffer
>>head, so, summing the return value every time is wrong.
>>
>>Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if the
>>requested amount is more than the bytes available, the loop will never
>>terminate, because `recv` will never return EOF. For this reason we need
>>to compare the amount of read bytes with the number of bytes expected.
>>
>>Add a check, and if the MSG_PEEK flag is present, update the counter of
>>read bytes differently, and break if we read the expected amount.
>
>nit: "..., update the counter for bytes read only after all expected
>bytes have been read and break out of the loop; otherwise, try again
>after a short delay to avoid consuming too many CPU cycles."
>
>>
>>This allows us to simplify the `test_stream_credit_update_test`, by
>>reusing `recv_buf`, like some other tests already do.
>>
>>This also fixes callers that pass MSG_PEEK to recv_buf().
>
>nit: this is implicit from the first part of the description.
>
>>
>>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>>---
>>tools/testing/vsock/util.c       | 15 +++++++++++++++
>>tools/testing/vsock/vsock_test.c | 13 +------------
>>2 files changed, 16 insertions(+), 12 deletions(-)
>>
>>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>>index 1fe1338c79cd..2c9ee3210090 100644
>>--- a/tools/testing/vsock/util.c
>>+++ b/tools/testing/vsock/util.c
>>@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
>>	}
>>}
>>
>>+#define RECV_PEEK_RETRY_USEC 10
>
>10 usec IMO are a bit low, it could be the same order of the syscalls 
>involved in the loop, I'd go to some milliseconds like we do for 
>SEND_SLEEP_USEC.
>
>>+
>>/* Receive bytes in a buffer and check the return value.
>>+ *
>>+ * MSG_PEEK note: MSG_PEEK doesn't consume bytes from the buffer, so partial
>>+ * reads cannot be summed. Instead, the function retries until recv() returns
>>+ * exactly expected_ret bytes in a single call.
>
>I'd replace with something like this:
>
>   * When MSG_PEEK is set, recv() is retried until it returns exactly
>   * expected_ret bytes. The function returns on error, EOF, or timeout
>   * as usual.
>
>Thanks,
>Stefano
>
>> *
>> * expected_ret:
>> *  <0 Negative errno (for testing errors)
>>@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
>>		if (ret <= 0)
>>			break;
>>
>>+		if (flags & MSG_PEEK) {
>>+			if (ret == expected_ret) {

On second thought, I think it would be more appropriate to check for
`ret >= expected_ret` here, because all subsequent recv() will
definitely return more bytes, so there’s no point in continuing the
loop... and anyway, we’ll check the result later, so just that change
should be fine.

And of course I'd update the comment on top in this way:

    * When MSG_PEEK is set, recv() is retried until it returns at least
    * expected_ret bytes. The function returns on error, EOF, or timeout
    * as usual.

Thanks,
Stefano

>>+				nread = ret;
>>+				break;
>>+			}
>>+			timeout_usleep(RECV_PEEK_RETRY_USEC);
>>+			continue;
>>+		}
>>+
>>		nread += ret;
>>	} while (nread < len);
>>	timeout_end();
>>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>index 5bd20ccd9335..bdb0754965df 100644
>>--- a/tools/testing/vsock/vsock_test.c
>>+++ b/tools/testing/vsock/vsock_test.c
>>@@ -1500,18 +1500,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
>>	}
>>
>>	/* Wait until there will be 128KB of data in rx queue. */
>>-	while (1) {
>>-		ssize_t res;
>>-
>>-		res = recv(fd, buf, buf_size, MSG_PEEK);
>>-		if (res == buf_size)
>>-			break;
>>-
>>-		if (res <= 0) {
>>-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
>>-			exit(EXIT_FAILURE);
>>-		}
>>-	}
>>+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);
>>
>>	/* There is 128KB of data in the socket's rx queue, dequeue first
>>	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.
>>
>>-- 
>>2.53.0
>>
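Editorial sketch: the "at least expected_ret" behaviour suggested above can
be illustrated with a small standalone helper. This is not the recv_buf()
from tools/testing/vsock/util.c; peek_at_least(), PEEK_RETRY_USEC and
PEEK_MAX_RETRIES are hypothetical names, and the retry bound only stands in
for the timeout handling that the real test utilities provide.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PEEK_RETRY_USEC  1000 /* hypothetical delay between peeks */
#define PEEK_MAX_RETRIES 1000 /* hypothetical bound, stands in for a timeout */

/* Peek until a single recv() call reports at least `expected` bytes.
 * Returns the size of the last peek, 0 on EOF, or -1 on error or when
 * the retry bound is exhausted.
 */
static ssize_t peek_at_least(int fd, void *buf, size_t len, size_t expected)
{
	int i;

	for (i = 0; i < PEEK_MAX_RETRIES; i++) {
		ssize_t ret = recv(fd, buf, len, MSG_PEEK);

		if (ret <= 0)
			return ret;

		/* MSG_PEEK does not consume data, so every call re-reads
		 * from the head of the queue. The return values must be
		 * compared against the target, never summed.
		 */
		if ((size_t)ret >= expected)
			return ret;

		usleep(PEEK_RETRY_USEC);
	}

	return -1;
}
```

With a `>=` comparison the helper also stops when more data than expected is
already queued; the caller is still free to compare the returned size with
the exact value it expects.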



* Re: [PATCH net-next v2 5/5] selftests: net: add veth BQL stress test
From: Breno Leitao @ 2026-04-15 11:47 UTC
  To: hawk
  Cc: netdev, kernel-team, Jonas Köppeler, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Shuah Khan, linux-kernel, linux-kselftest
In-Reply-To: <20260413094442.1376022-6-hawk@kernel.org>

On Mon, Apr 13, 2026 at 11:44:38AM +0200, hawk@kernel.org wrote:
> From: Jesper Dangaard Brouer <hawk@kernel.org>
> 
> Add a selftest that exercises veth's BQL (Byte Queue Limits) code path
> under sustained UDP load. The test creates a veth pair with GRO enabled
> (activating the NAPI path and BQL), attaches a qdisc, optionally loads
> iptables rules in the consumer namespace to slow NAPI processing, and
> floods UDP packets for a configurable duration.
> 
> The test serves two purposes: benchmarking BQL's latency impact under
> configurable load (iptables rules, qdisc type and parameters), and
> detecting kernel BUG/Oops from DQL accounting mismatches. It monitors
> dmesg throughout the run and reports PASS/FAIL via kselftest (lib.sh).
> 
> Diagnostic output is printed every 5 seconds:
>   - BQL sysfs inflight/limit and watchdog tx_timeout counter
>   - qdisc stats: packets, drops, requeues, backlog, qlen, overlimits
>   - consumer PPS and NAPI-64 cycle time (shows fq_codel target impact)
>   - sink PPS (per-period delta), latency min/avg/max (stddev at exit)
>   - ping RTT to measure latency under load
> 
> Generating enough traffic to fill the 256-entry ptr_ring requires care:
> the UDP sendto() path charges each SKB to sk_wmem_alloc, and the SKB
> stays charged (via sock_wfree destructor) until the consumer NAPI thread
> finishes processing it -- including any iptables rules in the receive
> path. With the default sk_sndbuf (~208KB from wmem_default), only ~93
> packets can be in-flight before sendto(MSG_DONTWAIT) returns EAGAIN.
> Since 93 < 256 ring entries, the ring never fills and no backpressure
> occurs. The test raises wmem_max via sysctl and sets SO_SNDBUF=1MB on
> the flood socket to remove this bottleneck. An earlier multi-namespace
> routing approach avoided this limit because ip_forward creates new SKBs
> detached from the sender's socket.
> 
> The --bql-disable option (sets limit_min=1GB) enables A/B comparison.
> Typical results with --nrules 6000 --qdisc-opts 'target 2ms interval 20ms':
> 
>   fq_codel + BQL disabled:  ping RTT ~10.8ms, 15% loss, 400KB in ptr_ring
>   fq_codel + BQL enabled:   ping RTT ~0.6ms,   0% loss, 4KB in ptr_ring
> 
> Both cases show identical consumer speed (~20Kpps) and fq_codel drops
> (~255K), proving the improvement comes purely from where packets buffer.
> 
> BQL moves buffering from the ptr_ring into the qdisc, where AQM
> (fq_codel/CAKE) can act on it -- eliminating the "dark buffer" that
> hides congestion from the scheduler.
> 
> The --qdisc-replace mode cycles through sfq/pfifo/fq_codel/noqueue
> under active traffic to verify that stale BQL state (STACK_XOFF) is
> properly handled during live qdisc transitions.
> 
> A companion wrapper (veth_bql_test_virtme.sh) launches the test inside
> a virtme-ng VM, with .config validation to prevent silent stalls.
> 
> Usage:
>   sudo ./veth_bql_test.sh [--duration 300] [--nrules 100]
>                           [--qdisc sfq] [--qdisc-opts '...']
>                           [--bql-disable] [--normal-napi]
>                           [--qdisc-replace]
> 
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>

Tested-by: Breno Leitao <leitao@debian.org>

> diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
> index 2a390cae41bf..7b1f41421145 100644
> --- a/tools/testing/selftests/net/config
> +++ b/tools/testing/selftests/net/config
> @@ -97,6 +97,7 @@ CONFIG_NET_PKTGEN=m
>  CONFIG_NET_SCH_ETF=m
>  CONFIG_NET_SCH_FQ=m
>  CONFIG_NET_SCH_FQ_CODEL=m
> +CONFIG_NET_SCH_SFQ=m

nit: This breaks the alphabetical ordering of the config file.
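Editorial sketch: the send-buffer arithmetic in the quoted commit message
(about 93 packets in flight against a 256-entry ring) can be reproduced with
a toy model. The admission rule used here (a send is admitted while the
charged bytes are below the send buffer size) is a simplification, and the
per-packet charge of 2304 bytes is an assumption chosen for illustration; it
is not stated in the patch. max_inflight() is a hypothetical helper.

```c
#include <assert.h>

/* Count how many packets are admitted when each packet charges
 * `truesize` bytes and a send is allowed only while the charged total
 * is still below `sndbuf`. `truesize` must be non-zero.
 */
static unsigned int max_inflight(unsigned int sndbuf, unsigned int truesize)
{
	unsigned int charged = 0;
	unsigned int n = 0;

	while (charged < sndbuf) {
		charged += truesize;
		n++;
	}

	return n;
}
```

Under these assumptions, a 212992-byte (208KB) buffer admits fewer packets
than the ring has entries, while a 1MB buffer admits more than 256, which is
the condition the test needs in order to create backpressure.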


* Re: [PATCH net v3 3/3] vsock/test: add MSG_PEEK after partial recv test
From: Stefano Garzarella @ 2026-04-15 11:40 UTC
  To: Luigi Leonardi
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <20260414-fix_peek-v3-3-e7daead49f83@redhat.com>

On Tue, Apr 14, 2026 at 06:10:23PM +0200, Luigi Leonardi wrote:
>Add a test that verifies MSG_PEEK works correctly after a partial
>recv().
>
>This is to test a bug that was present in the
>`virtio_transport_stream_do_peek()` when computing the number of bytes to
>copy: After a partial read, the peek function didn't take into
>consideration the number of bytes that were already read. So peeking the
>whole buffer would cause an out-of-bounds read, that resulted in a -EFAULT.
>
>This test does exactly this: do a partial recv on a buffer, then try to
>peek the whole buffer content.

nit: I think it's better to also mention that we are re-using
test_stream_msg_peek_client() for this test.

>
>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>---
> tools/testing/vsock/vsock_test.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index bdb0754965df..ab387a13f0ae 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -346,6 +346,38 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
> 	return test_msg_peek_server(opts, false);
> }
>
>+static void test_stream_peek_after_recv_server(const struct test_opts *opts)
>+{
>+	unsigned char buf_normal[MSG_PEEK_BUF_LEN];
>+	unsigned char buf_peek[MSG_PEEK_BUF_LEN];
>+	int fd;
>+
>+	fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
>+	if (fd < 0) {
>+		perror("accept");
>+		exit(EXIT_FAILURE);
>+	}
>+
>+	control_writeln("SRVREADY");
>+
>+	/* Partial recv to advance offset within the skb */
>+	recv_buf(fd, buf_normal, 1, 0, 1);
>+
>+	/* Ask more bytes than available */

nit:	/* Peek with a buffer larger than the remaining data */

>+	recv_buf(fd, buf_peek, sizeof(buf_peek), MSG_PEEK, sizeof(buf_peek) - 1);
>+
>+	/* Recv rest of the data */

nit:	/* Consume the remaining data */

>+	recv_buf(fd, buf_normal, sizeof(buf_normal) - 1, 0, sizeof(buf_normal) - 1);
>+
>+	/* Compare full peek and normal read. */
>+	if (memcmp(buf_peek, buf_normal, sizeof(buf_peek) - 1)) {
>+		fprintf(stderr, "Full peek data mismatch\n");
>+		exit(EXIT_FAILURE);
>+	}
>+
>+	close(fd);
>+}
>+
> #define SOCK_BUF_SIZE (2 * 1024 * 1024)
> #define SOCK_BUF_SIZE_SMALL (64 * 1024)
> #define MAX_MSG_PAGES 4
>@@ -2509,6 +2541,11 @@ static struct test_case test_cases[] = {
> 		.run_client = test_stream_tx_credit_bounds_client,
> 		.run_server = test_stream_tx_credit_bounds_server,
> 	},
>+	{
>+		.name = "SOCK_STREAM MSG_PEEK after partial recv",
>+		.run_client = test_stream_msg_peek_client,
>+		.run_server = test_stream_peek_after_recv_server,

I left just minor comments, the test LGTM:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
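
Editorial sketch: the sequence exercised by the new test (partial recv, peek
with a buffer larger than the remaining data, consume the rest, compare) can
be tried on any stream socket. The helper below is a standalone illustration
and assumes an AF_UNIX socketpair as a stand-in for a vsock connection;
peek_after_partial_recv() and DEMO_LEN are hypothetical names, not part of
vsock_test.c.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_LEN 16 /* hypothetical payload size */

/* Send DEMO_LEN bytes on `tx`, then on `rx`: read one byte, peek the
 * rest, consume the rest, and compare the two views of the data.
 * Returns 0 when they match, -1 on any short read or mismatch.
 */
static int peek_after_partial_recv(int tx, int rx)
{
	unsigned char out[DEMO_LEN], peeked[DEMO_LEN], rest[DEMO_LEN];
	size_t i;

	for (i = 0; i < DEMO_LEN; i++)
		out[i] = (unsigned char)i;

	if (send(tx, out, DEMO_LEN, 0) != DEMO_LEN)
		return -1;

	/* Partial recv so that the read offset is no longer zero. */
	if (recv(rx, rest, 1, 0) != 1)
		return -1;

	/* Peek with a buffer larger than the remaining data. */
	if (recv(rx, peeked, sizeof(peeked), MSG_PEEK) != DEMO_LEN - 1)
		return -1;

	/* Consume the remaining data. */
	if (recv(rx, rest, DEMO_LEN - 1, 0) != DEMO_LEN - 1)
		return -1;

	return memcmp(peeked, rest, DEMO_LEN - 1) ? -1 : 0;
}
```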



* Re: [PATCH net-next v2 05/14] libie: add bookkeeping support for control queue messages
From: Larysa Zaremba @ 2026-04-15 11:40 UTC
  To: Paolo Abeni
  Cc: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <b559c877-7712-4ed7-adb4-d2b667e16e74@redhat.com>

On Thu, Apr 09, 2026 at 11:07:02AM +0200, Paolo Abeni wrote:
> On 4/3/26 9:49 PM, Tony Nguyen wrote:
> > +static bool
> > +libie_ctlq_xn_process_recv(struct libie_ctlq_xn_recv_params *params,
> > +			   struct libie_ctlq_msg *ctlq_msg)
> > +{
> > +	struct libie_ctlq_xn_manager *xnm = params->xnm;
> > +	struct libie_ctlq_xn *xn;
> > +	u16 msg_cookie, xn_index;
> > +	struct kvec *response;
> > +	int status;
> > +	u16 data;
> > +
> > +	data = ctlq_msg->sw_cookie;
> > +	xn_index = FIELD_GET(LIBIE_CTLQ_XN_INDEX_M, data);
> > +	msg_cookie = FIELD_GET(LIBIE_CTLQ_XN_COOKIE_M, data);
> > +	status = ctlq_msg->chnl_retval ? -EFAULT : 0;
> > +
> > +	xn = &xnm->ring[xn_index];
> > +	if (ctlq_msg->chnl_opcode != xn->virtchnl_opcode ||
> > +	    msg_cookie != xn->cookie)
> > +		return false;
> > +
> > +	spin_lock(&xn->xn_lock);
> 
> Sashiko says:
> 
> ---
> Because the cookie and opcode are checked before acquiring the lock, is
> it possible for the transaction to time out, be returned to the free
> list, and get reallocated for a new message before the lock is acquired?
> If that happens, could the old delayed response falsely complete the
> newly allocated transaction since the identifiers are not re-verified
> inside the lock?
> ---
> 

Yes, there is a race condition risk that is easy to fix.
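
Editorial sketch: the usual shape of such a fix is to compare the
identifiers only while holding the lock, so that a slot which was recycled in
the meantime cannot be completed by a stale response. The userspace example
below illustrates that pattern with a pthread mutex; struct xn_slot and
xn_complete() are hypothetical and this is not the actual libie change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transaction slot, loosely modelled on the quoted code. */
struct xn_slot {
	pthread_mutex_t lock;
	uint16_t cookie;
	uint32_t opcode;
	bool completed;
};

/* Complete the transaction only if cookie and opcode still match while
 * the lock is held. A slot that timed out and was reallocated before
 * the lock was taken carries new identifiers and is left untouched.
 */
static bool xn_complete(struct xn_slot *xn, uint16_t cookie, uint32_t opcode)
{
	bool match;

	pthread_mutex_lock(&xn->lock);
	match = xn->cookie == cookie && xn->opcode == opcode;
	if (match)
		xn->completed = true;
	pthread_mutex_unlock(&xn->lock);

	return match;
}
```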

> > +/**
> > + * libie_xn_check_async_timeout - Check for asynchronous message timeouts
> > + * @xnm: Xn transaction manager
> > + *
> > + * Call the corresponding callback to notify the caller about the timeout.
> > + */
> > +static void libie_xn_check_async_timeout(struct libie_ctlq_xn_manager *xnm)
> > +{
> > +	u32 idx;
> > +
> > +	for_each_clear_bit(idx, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {
> 
> Sashiko says:
> 
> ---
> This iterates over the bitmap without holding the lock. Concurrently,
> other paths modify this bitmap using non-atomic bitwise operations like
> __clear_bit() and __set_bit() under the lock. Will this cause torn reads
> or data races that might lead the timeout handler to skip valid
> transactions or examine invalid ones?
> ---
>

This should only produce false negatives, which is not a problem: the timeout
is much longer than the period at which libie_xn_check_async_timeout() is
called.

> 
> > +		params->ctlq_msg->sw_cookie = cookie;
> > +		params->ctlq_msg->send_mem = *dma_mem;
> > +		params->ctlq_msg->data_len = buf_len;
> > +		params->ctlq_msg->chnl_opcode = params->chnl_opcode;
> > +		ret = libie_ctlq_send(params->ctlq, params->ctlq_msg, 1);
> > +	}
> > +
> > +	if (ret && !libie_cp_can_send_onstack(buf_len))
> > +		libie_cp_unmap_dma_mem(dev, dma_mem);
> 
> Sashiko says:
> 
> ---
> When libie_ctlq_send() fails here, the DMA memory is unmapped and the
> buffer is freed by the caller. However, the software tracking ring at
> tx_msg[next_to_use] still contains the populated send_mem details and a
> non-zero data_len.
> 
> During driver teardown, libie_ctlq_xn_send_clean() is invoked with
> params->force = true, which processes the ring without checking the
> hardware completion bit. Could this cause the cleanup routine to process
> the failed slot again, resulting in a double-free and double-unmap?
> ---

Yes, I think that in trying to avoid unnecessary copying, I shot myself in the 
foot, will fix.

> 
> There are more remarks on the following patch, please have a look.
>

A few of the AI's comments will also result in fixes to stable.

> Also, it would be very helpful if you could help triaging such
> (overwhelming amount of) feedback on future submissions, explicitly
> commenting on the ML. Sashiko tends to be quite noise on device driver code.
> 
> Thanks,
> 
> Paolo
> 


* Re: [PATCH] macvlan: fix macvlan_get_size() not reserving space for IFLA_MACVLAN_BC_CUTOFF
From: Eric Dumazet @ 2026-04-15 11:37 UTC
  To: Dudu Lu; +Cc: netdev, andrew+netdev, davem, kuba, pabeni
In-Reply-To: <20260413085349.73977-1-phx0fer@gmail.com>

On Mon, Apr 13, 2026 at 1:53 AM Dudu Lu <phx0fer@gmail.com> wrote:
>
> macvlan_get_size() does not account for IFLA_MACVLAN_BC_CUTOFF, but
> macvlan_fill_info() conditionally includes it when port->bc_cutoff != 1.
> This causes nla_put_s32() to fail with -EMSGSIZE when the netlink skb
> runs out of space, triggering a WARN_ON in rtnetlink and preventing the
> interface from being dumped.
>
> The bug can be reproduced with:
>
>   ip link add macvlan0 link eth0 type macvlan mode bridge
>   ip link set macvlan0 type macvlan bc_cutoff 0

Was this generated by an LLM?

AFAIK, the iproute2 command would look like this:

 ip link set macvlan0 type macvlan bclim 0

>   ip -d link show macvlan0   # fails with -EMSGSIZE
>
> The bc_cutoff feature was added in commit 954d1fa1ac93 ("macvlan: Add
> netlink attribute for broadcast cutoff"), which added the nla_put_s32()
> call in macvlan_fill_info() but missed adding the corresponding
> nla_total_size(4) in macvlan_get_size(). A follow-up commit
> 55cef78c244d ("macvlan: add forgotten nla_policy for
> IFLA_MACVLAN_BC_CUTOFF") fixed the missing nla_policy entry but still
> did not fix the size calculation.
>
> Fixes: 954d1fa1ac93 ("macvlan: Add netlink attribute for broadcast cutoff")
> Signed-off-by: Dudu Lu <phx0fer@gmail.com>
> ---
>  drivers/net/macvlan.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index a71f058eceef..80f87599a503 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -1681,6 +1681,7 @@ static size_t macvlan_get_size(const struct net_device *dev)
>                 + macvlan_get_size_mac(vlan) /* IFLA_MACVLAN_MACADDR */
>                 + nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN */
>                 + nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN_USED */
> +               + nla_total_size(4) /* IFLA_MACVLAN_BC_CUTOFF */
>                 );
>  }
>

Note that skbs have more tailroom than requested because of kmalloc()
power-of-two rounding, so the bug does not show up in practice. I mention
this in case someone tries the repro and sees nothing wrong.

Reviewed-by: Eric Dumazet <edumazet@google.com>
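
Editorial sketch: the space that the added nla_total_size(4) term reserves
can be worked out by hand. The userspace code below re-implements the
arithmetic under the usual netlink attribute layout (a 4-byte attribute
header and 4-byte alignment); the demo_ and DEMO_ names are hypothetical
stand-ins and are not the kernel helpers themselves.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
	(((len) + DEMO_NLA_ALIGNTO - 1) & ~(size_t)(DEMO_NLA_ALIGNTO - 1))
/* Attribute header: 16-bit length plus 16-bit type. */
#define DEMO_NLA_HDRLEN DEMO_NLA_ALIGN((size_t)4)

/* Bytes one attribute occupies in the message: header plus payload,
 * rounded up to the alignment boundary.
 */
static size_t demo_nla_total_size(size_t payload)
{
	return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}
```

For a 4-byte payload such as the s32 cutoff value this gives 8 bytes, which
is the amount the size estimate was short by whenever the attribute was
emitted.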


* Re: [PATCH net v3 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Stefano Garzarella @ 2026-04-15 11:31 UTC
  To: Luigi Leonardi
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Arseniy Krasnov, kvm, virtualization,
	netdev, linux-kernel
In-Reply-To: <20260414-fix_peek-v3-2-e7daead49f83@redhat.com>

On Tue, Apr 14, 2026 at 06:10:22PM +0200, Luigi Leonardi wrote:
>`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
>`recv` until all requested bytes are available or an error occurs.
>
>The problem is how it calculates the amount of bytes read: MSG_PEEK
>doesn't consume any bytes, will re-read the same bytes from the buffer
>head, so, summing the return value every time is wrong.
>
>Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if the
>requested amount is more than the bytes available, the loop will never
>terminate, because `recv` will never return EOF. For this reason we need
>to compare the amount of read bytes with the number of bytes expected.
>
>Add a check, and if the MSG_PEEK flag is present, update the counter of
>read bytes differently, and break if we read the expected amount.

nit: "..., update the counter for bytes read only after all expected
bytes have been read and break out of the loop; otherwise, try again
after a short delay to avoid consuming too many CPU cycles."

>
>This allows us to simplify the `test_stream_credit_update_test`, by
>reusing `recv_buf`, like some other tests already do.
>
>This also fixes callers that pass MSG_PEEK to recv_buf().

nit: this is implicit from the first part of the description.

>
>Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
>---
> tools/testing/vsock/util.c       | 15 +++++++++++++++
> tools/testing/vsock/vsock_test.c | 13 +------------
> 2 files changed, 16 insertions(+), 12 deletions(-)
>
>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>index 1fe1338c79cd..2c9ee3210090 100644
>--- a/tools/testing/vsock/util.c
>+++ b/tools/testing/vsock/util.c
>@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
> 	}
> }
>
>+#define RECV_PEEK_RETRY_USEC 10

10 usec IMO are a bit low, it could be the same order of the syscalls 
involved in the loop, I'd go to some milliseconds like we do for 
SEND_SLEEP_USEC.

>+
> /* Receive bytes in a buffer and check the return value.
>+ *
>+ * MSG_PEEK note: MSG_PEEK doesn't consume bytes from the buffer, so partial
>+ * reads cannot be summed. Instead, the function retries until recv() returns
>+ * exactly expected_ret bytes in a single call.

I'd replace with something like this:

    * When MSG_PEEK is set, recv() is retried until it returns exactly
    * expected_ret bytes. The function returns on error, EOF, or timeout
    * as usual.

Thanks,
Stefano

>  *
>  * expected_ret:
>  *  <0 Negative errno (for testing errors)
>@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
> 		if (ret <= 0)
> 			break;
>
>+		if (flags & MSG_PEEK) {
>+			if (ret == expected_ret) {
>+				nread = ret;
>+				break;
>+			}
>+			timeout_usleep(RECV_PEEK_RETRY_USEC);
>+			continue;
>+		}
>+
> 		nread += ret;
> 	} while (nread < len);
> 	timeout_end();
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index 5bd20ccd9335..bdb0754965df 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -1500,18 +1500,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
> 	}
>
> 	/* Wait until there will be 128KB of data in rx queue. */
>-	while (1) {
>-		ssize_t res;
>-
>-		res = recv(fd, buf, buf_size, MSG_PEEK);
>-		if (res == buf_size)
>-			break;
>-
>-		if (res <= 0) {
>-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
>-			exit(EXIT_FAILURE);
>-		}
>-	}
>+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);
>
> 	/* There is 128KB of data in the socket's rx queue, dequeue first
> 	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.
>
>-- 
>2.53.0
>



* [PATCH iwl-next] ice: add SBQ posted writes with non-posted support for CGU
From: Przemyslaw Korba @ 2026-04-15 11:27 UTC
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, Przemyslaw Korba,
	Aleksandr Loktionov, Arkadiusz Kubalewski

From: Karol Kolacinski <karol.kolacinski@intel.com>

The sideband queue (SBQ) is a HW queue with a very short completion time. All
SBQ writes were posted by default, which means that the driver did not
have to wait for a completion from the neighbor device, because there was
none. This introduced unnecessary delays, and because only those delays
"ensured" that the command had "completed", it was also a potential race
condition.

Add the possibility to perform non-posted writes where it's necessary to
wait for completion, instead of relying on fake completion from the FW,
where only the delays are guarding the writes.

Flush the SBQ by reading address 0 from the PHY 0 before issuing SYNC
command to ensure that writes to all PHYs were completed and skip SBQ
message completion if it's posted.

To check whether the delays are gone, compare the time spent in
ice_sq_send_cmd before and after this change: posted writes should return
immediately after the wr32. One way to do that is to adjust the PHC time
with phc_ctl on an E830 device by less than 2 seconds, which uses this new
mechanism. Without it, the command below will fail.

Reproduction steps:
phc_ctl eth13 adj 1
phc_ctl[4478170.994]: adjusted clock by 1.000000 seconds

Check the trace timings for comparison:
echo ice_sbq_send_cmd > /sys/kernel/debug/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace

Tested on:
  - Intel E830 NIC (FW version 1.00)
  - Kernel 6.19.0+

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c  | 18 ++++--
 drivers/net/ethernet/intel/ice/ice_ptp_hw.c  | 64 ++++++++++++--------
 drivers/net/ethernet/intel/ice/ice_sbq_cmd.h |  5 +-
 3 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index f84990996530..2cd3d6d450a9 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1777,23 +1777,29 @@ int ice_sbq_rw_reg(struct ice_hw *hw, struct ice_sbq_msg_input *in, u16 flags)
 	msg.msg_addr_low = cpu_to_le16(in->msg_addr_low);
 	msg.msg_addr_high = cpu_to_le32(in->msg_addr_high);
 
-	if (in->opcode)
+	switch (in->opcode) {
+	case ice_sbq_msg_wr_p:
+	case ice_sbq_msg_wr_np:
 		msg.data = cpu_to_le32(in->data);
-	else
+		break;
+	case ice_sbq_msg_rd:
 		/* data read comes back in completion, so shorten the struct by
 		 * sizeof(msg.data)
 		 */
 		msg_len -= sizeof(msg.data);
+		break;
+	default:
+		return -EINVAL;
+	}
 
-	if (in->opcode == ice_sbq_msg_wr)
-		cd.posted = 1;
+	cd.posted = in->opcode == ice_sbq_msg_wr_p;
 
 	desc.flags = cpu_to_le16(flags);
 	desc.opcode = cpu_to_le16(ice_sbq_opc_neigh_dev_req);
 	desc.param0.cmd_len = cpu_to_le16(msg_len);
 	status = ice_sbq_send_cmd(hw, &desc, &msg, msg_len, &cd);
 
-	if (!status && !in->opcode)
+	if (!status && in->opcode == ice_sbq_msg_rd)
 		in->data = le32_to_cpu
 			(((struct ice_sbq_msg_cmpl *)&msg)->data);
 	return status;
@@ -6701,7 +6707,7 @@ int ice_write_cgu_reg(struct ice_hw *hw, u32 addr, u32 val)
 {
 	struct ice_sbq_msg_input cgu_msg = {
 		.dest_dev = ice_get_dest_cgu(hw),
-		.opcode = ice_sbq_msg_wr,
+		.opcode = ice_sbq_msg_wr_np,
 		.msg_addr_low = addr,
 		.data = val
 	};
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
index 690f9d874443..0f202d4dae7c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
@@ -368,6 +368,16 @@ void ice_ptp_src_cmd(struct ice_hw *hw, enum ice_ptp_tmr_cmd cmd)
 static void ice_ptp_exec_tmr_cmd(struct ice_hw *hw)
 {
 	struct ice_pf *pf = container_of(hw, struct ice_pf, hw);
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_rd,
+	};
+	int err;
+
+	/* Flush SBQ by reading address 0 on PHY 0 */
+	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
+	if (err)
+		dev_warn(ice_hw_to_dev(hw), "Failed to flush SBQ: %d\n", err);
 
 	if (!ice_is_primary(hw))
 		hw = ice_get_primary_hw(pf);
@@ -433,7 +443,7 @@ static int ice_write_phy_eth56g(struct ice_hw *hw, u8 port, u32 addr, u32 val)
 {
 	struct ice_sbq_msg_input msg = {
 		.dest_dev = ice_ptp_get_dest_dev_e825(hw, port),
-		.opcode = ice_sbq_msg_wr,
+		.opcode = ice_sbq_msg_wr_p,
 		.msg_addr_low = lower_16_bits(addr),
 		.msg_addr_high = upper_16_bits(addr),
 		.data = val
@@ -2358,11 +2368,12 @@ static bool ice_is_40b_phy_reg_e82x(u16 low_addr, u16 *high_addr)
 static int
 ice_read_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 offset, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_rd,
+	};
 	int err;
 
 	ice_fill_phy_msg_e82x(hw, &msg, port, offset);
-	msg.opcode = ice_sbq_msg_rd;
 
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
@@ -2435,12 +2446,13 @@ ice_read_64b_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 low_addr, u64 *val)
 static int
 ice_write_phy_reg_e82x(struct ice_hw *hw, u8 port, u16 offset, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_wr_p,
+		.data = val
+	};
 	int err;
 
 	ice_fill_phy_msg_e82x(hw, &msg, port, offset);
-	msg.opcode = ice_sbq_msg_wr;
-	msg.data = val;
 
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
@@ -2594,15 +2606,15 @@ static int ice_fill_quad_msg_e82x(struct ice_hw *hw,
 int
 ice_read_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_rd,
+	};
 	int err;
 
 	err = ice_fill_quad_msg_e82x(hw, &msg, quad, offset);
 	if (err)
 		return err;
 
-	msg.opcode = ice_sbq_msg_rd;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -2628,16 +2640,16 @@ ice_read_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 *val)
 int
 ice_write_quad_reg_e82x(struct ice_hw *hw, u8 quad, u16 offset, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.opcode = ice_sbq_msg_wr_p,
+		.data = val
+	};
 	int err;
 
 	err = ice_fill_quad_msg_e82x(hw, &msg, quad, offset);
 	if (err)
 		return err;
 
-	msg.opcode = ice_sbq_msg_wr;
-	msg.data = val;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -4275,14 +4287,14 @@ static void ice_ptp_init_phy_e82x(struct ice_ptp_hw *ptp)
  */
 static int ice_read_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 *val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_rd,
+		.msg_addr_low = lower_16_bits(addr),
+		.msg_addr_high = upper_16_bits(addr),
+	};
 	int err;
 
-	msg.msg_addr_low = lower_16_bits(addr);
-	msg.msg_addr_high = upper_16_bits(addr);
-	msg.opcode = ice_sbq_msg_rd;
-	msg.dest_dev = ice_sbq_dev_phy_0;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
@@ -4305,15 +4317,15 @@ static int ice_read_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 *val)
  */
 static int ice_write_phy_reg_e810(struct ice_hw *hw, u32 addr, u32 val)
 {
-	struct ice_sbq_msg_input msg = {0};
+	struct ice_sbq_msg_input msg = {
+		.dest_dev = ice_sbq_dev_phy_0,
+		.opcode = ice_sbq_msg_wr_p,
+		.msg_addr_low = lower_16_bits(addr),
+		.msg_addr_high = upper_16_bits(addr),
+		.data = val
+	};
 	int err;
 
-	msg.msg_addr_low = lower_16_bits(addr);
-	msg.msg_addr_high = upper_16_bits(addr);
-	msg.opcode = ice_sbq_msg_wr;
-	msg.dest_dev = ice_sbq_dev_phy_0;
-	msg.data = val;
-
 	err = ice_sbq_rw_reg(hw, &msg, LIBIE_AQ_FLAG_RD);
 	if (err) {
 		ice_debug(hw, ICE_DBG_PTP, "Failed to send message to PHY, err %d\n",
diff --git a/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h b/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
index 21bb861febbf..86a143ebf089 100644
--- a/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_sbq_cmd.h
@@ -54,8 +54,9 @@ enum ice_sbq_dev_id {
 };
 
 enum ice_sbq_msg_opcode {
-	ice_sbq_msg_rd	= 0x00,
-	ice_sbq_msg_wr	= 0x01
+	ice_sbq_msg_rd		= 0x00,
+	ice_sbq_msg_wr_p	= 0x01,
+	ice_sbq_msg_wr_np	= 0x02,
 };
 
 #define ICE_SBQ_MSG_FLAGS	0x40

base-commit: 0851f49814a8899a9769619b50baaeef59f9ece4
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] macvlan: fix macvlan_get_size() not reserving space for IFLA_MACVLAN_BC_CUTOFF
From: Vadim Fedorenko @ 2026-04-15 11:11 UTC (permalink / raw)
  To: Dudu Lu, netdev; +Cc: andrew+netdev, davem, edumazet, kuba, pabeni
In-Reply-To: <20260413085349.73977-1-phx0fer@gmail.com>

On 13/04/2026 09:53, Dudu Lu wrote:
> macvlan_get_size() does not account for IFLA_MACVLAN_BC_CUTOFF, but
> macvlan_fill_info() conditionally includes it when port->bc_cutoff != 1.
> This causes nla_put_s32() to fail with -EMSGSIZE when the netlink skb
> runs out of space, triggering a WARN_ON in rtnetlink and preventing the
> interface from being dumped.
> 
> The bug can be reproduced with:
> 
>    ip link add macvlan0 link eth0 type macvlan mode bridge
>    ip link set macvlan0 type macvlan bc_cutoff 0
>    ip -d link show macvlan0   # fails with -EMSGSIZE
> 
> The bc_cutoff feature was added in commit 954d1fa1ac93 ("macvlan: Add
> netlink attribute for broadcast cutoff"), which added the nla_put_s32()
> call in macvlan_fill_info() but missed adding the corresponding
> nla_total_size(4) in macvlan_get_size(). A follow-up commit
> 55cef78c244d ("macvlan: add forgotten nla_policy for
> IFLA_MACVLAN_BC_CUTOFF") fixed the missing nla_policy entry but still
> did not fix the size calculation.
> 
> Fixes: 954d1fa1ac93 ("macvlan: Add netlink attribute for broadcast cutoff")
> Signed-off-by: Dudu Lu <phx0fer@gmail.com>
> ---
>   drivers/net/macvlan.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index a71f058eceef..80f87599a503 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -1681,6 +1681,7 @@ static size_t macvlan_get_size(const struct net_device *dev)
>   		+ macvlan_get_size_mac(vlan) /* IFLA_MACVLAN_MACADDR */
>   		+ nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN */
>   		+ nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN_USED */
> +		+ nla_total_size(4) /* IFLA_MACVLAN_BC_CUTOFF */
>   		);
>   }

Please indicate the target tree in the subject prefix for future
submissions. Since this patch fixes a bug, it should go to the net tree.

Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
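
For readers following the size accounting, here is a minimal user-space
sketch of how the reserved size of a netlink attribute is computed. It is an
illustration, not the kernel code: it assumes the usual 4-byte attribute
header and 4-byte alignment, and every `sketch_` name is a hypothetical
stand-in for nla_total_size() and the get_size()/fill_info() pairing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants mirroring the netlink attribute layout. */
#define SKETCH_NLA_ALIGNTO 4
#define SKETCH_NLA_HDRLEN  4 /* attribute header, already 4-byte aligned */

/* Round len up to the next multiple of the attribute alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Bytes one attribute occupies: header + payload + trailing padding. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}

/*
 * Hypothetical helper: how many bytes the emitted attributes exceed the
 * reservation by, when every attribute carries a 4-byte payload.
 */
static size_t sketch_shortfall(size_t reserved_attrs, size_t emitted_attrs)
{
	size_t reserved = reserved_attrs * sketch_nla_total_size(4);
	size_t emitted = emitted_attrs * sketch_nla_total_size(4);

	return emitted > reserved ? emitted - reserved : 0;
}
```

Under these assumptions each 4-byte attribute costs 8 bytes, so a
fill_info() that emits one attribute more than get_size() reserved can
overrun the estimate by 8 bytes.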

^ permalink raw reply

* Re: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
From: kernel test robot @ 2026-04-15 11:09 UTC (permalink / raw)
  To: Pavitra Jha, pabeni
  Cc: llvm, oe-kbuild-all, w, chandrashekar.devegowda, linux-wwan,
	netdev, stable, Pavitra Jha
In-Reply-To: <20260414153201.1633720-1-jhapavitra98@gmail.com>

Hi Pavitra,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]
[also build test WARNING on net-next/main linus/master v7.0 next-20260414]
[cannot apply to horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavitra-Jha/net-wwan-t7xx-validate-port_count-against-message-length-in-t7xx_port_enum_msg_handler/20260415-014321
base:   net/main
patch link:    https://lore.kernel.org/r/20260414153201.1633720-1-jhapavitra98%40gmail.com
patch subject: [PATCH v2] net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
config: loongarch-randconfig-002-20260415 (https://download.01.org/0day-ci/archive/20260415/202604151900.1tnLdQi7-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604151900.1tnLdQi7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604151900.1tnLdQi7-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: drivers/net/wwan/t7xx/t7xx_port_ctrl_msg.c:127 function parameter 'msg_len' not described in 't7xx_port_enum_msg_handler'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [net-next v1 2/3] net: motorcomm: phy: set drive strength in 8531s RGMII case
From: Minda Chen @ 2026-04-15  9:26 UTC (permalink / raw)
  To: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
  Cc: linux-kernel, Minda Chen
In-Reply-To: <20260415092654.64907-1-minda.chen@starfivetech.com>

Set the RXD and RX CLK pin drive strength when the YT8531S is used
in RGMII mode.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 drivers/net/phy/motorcomm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
index 35aff1519b4b..f3129419f7c9 100644
--- a/drivers/net/phy/motorcomm.c
+++ b/drivers/net/phy/motorcomm.c
@@ -1714,6 +1714,11 @@ static int yt8521_config_init(struct phy_device *phydev)
 		if (ret < 0)
 			goto err_restore_page;
 	}
+
+	if (phydev->drv->phy_id == PHY_ID_YT8531S &&
+	    phydev->interface != PHY_INTERFACE_MODE_SGMII)
+		ret = yt8531_set_ds(phydev, true);
+
 err_restore_page:
 	return phy_restore_page(phydev, old_page, ret);
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v3 net] vsock: fix buffer size clamping order
From: Stefano Garzarella @ 2026-04-15 10:42 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: Norbert Szetei, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, virtualization, netdev, linux-kernel
In-Reply-To: <e965ff22-37b2-406d-b885-e2736e84c0f3@rbox.co>

On Tue, Apr 14, 2026 at 04:22:04PM +0200, Michal Luczaj wrote:
>On 4/9/26 18:34, Norbert Szetei wrote:
>> In vsock_update_buffer_size(), the buffer size was being clamped to the
>> maximum first, and then to the minimum. If a user sets a minimum buffer
>> size larger than the maximum, the minimum check overrides the maximum
>> check, inverting the constraint.
>>
>> This breaks the intended socket memory boundaries by allowing the
>> vsk->buffer_size to grow beyond the configured vsk->buffer_max_size.
>>
>> Fix this by checking the minimum first, and then the maximum. This
>> ensures the buffer size never exceeds the buffer_max_size.
>
>Something may be missing. After adding another setsockopt() call to your
>reproducer, I still see crashes.
>
>     SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MIN_SIZE, &min,
>                       sizeof(min)));
>+    SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE, &min,
>+                      sizeof(min)));
> }
>
>[*] Setting buffer_min_size to 0x400000000.
>[socket][0] sending...
>
>refcount_t: saturated; leaking memory.
>WARNING: lib/refcount.c:22 at refcount_warn_saturate+0x7d/0xb0, CPU#2:
>a.out/1478
>...
>refcount_t: underflow; use-after-free.
>WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x50/0xb0, CPU#12:
>kworker/12:0/80
>Workqueue: vsock-loopback vsock_loopback_work
>...
>

Yeah, I pointed out the same during the bug discussion
(https://lore.kernel.org/netdev/acuKUpZQq6z1DY_n@sgarzare-redhat/) and
suggested adding a sysctl or reusing net.core.wmem_max/rmem_max
(https://lore.kernel.org/netdev/adYKERRYwzMIhZAl@sgarzare-redhat/)

Thanks,
Stefano
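
For reference, the clamping-order issue discussed in this thread can be
sketched in plain C. This is a user-space illustration, not the vsock code:
the function names are hypothetical, and the 262144-byte maximum used in the
usage note is an assumed example value. Only the 0x400000000 minimum comes
from the reproducer output quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Old order: clamp to max first, then to min. A min above max wins. */
static uint64_t sketch_clamp_max_then_min(uint64_t val, uint64_t min,
					  uint64_t max)
{
	if (val > max)
		val = max;
	if (val < min)
		val = min;
	return val;
}

/* Proposed order: clamp to min first, then to max. The max always wins. */
static uint64_t sketch_clamp_min_then_max(uint64_t val, uint64_t min,
					  uint64_t max)
{
	if (val < min)
		val = min;
	if (val > max)
		val = max;
	return val;
}
```

With min = 0x400000000 and max = 262144, the old order yields 0x400000000
while the new order yields 262144. If max is also raised to 0x400000000, as
in the extended reproducer, both orders return 0x400000000, which is why a
separate upper bound was suggested.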


^ permalink raw reply

* Re: [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Stefano Garzarella @ 2026-04-15 10:38 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: kys, haiyangz, wei.liu, longli, davem, edumazet, kuba, pabeni,
	horms, niuxuewei.nxw, linux-hyperv, virtualization, netdev,
	linux-kernel, stable, Ben Hillis, Mitchell Levy
In-Reply-To: <20260414234316.711578-1-decui@microsoft.com>

On Tue, Apr 14, 2026 at 04:43:16PM -0700, Dexuan Cui wrote:
>Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
>and the final read syscall gets an error rather than 0.
>
>Ideally, we would want to fix hvs_channel_readable_payload() so that it
>could return 0 in the FIN scenario, but it's not good for the hv_sock
>driver to use the VMBus ringbuffer's cached priv_read_index, which is
>internal data in the VMBus driver.
>
>Fix the regression in hv_sock by returning 0 rather than -EIO.
>
>Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
>Cc: stable@vger.kernel.org
>Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
>Reported-by: Mitchell Levy <levymitchell0@gmail.com>
>Signed-off-by: Dexuan Cui <decui@microsoft.com>
>---
> net/vmw_vsock/hyperv_transport.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index 069386a74557..63d3549125be 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -703,8 +703,22 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
> 	switch (hvs_channel_readable_payload(hvs->chan)) {
> 	case 1:
> 		need_refill = !hvs->recv_desc;
>-		if (!need_refill)
>-			return -EIO;
>+		if (!need_refill) {

Can we drop `need_refill` entirely and just check `hvs->recv_desc` here?

Mainly because now the comment we are adding is confusing me about what 
`need_refill` means.

The rest LGTM.

Thanks,
Stefano

>+			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
>+			 * be NULL unless it points to the 0-byte-payload FIN
>+			 * packet: see hvs_update_recv_data().
>+			 *
>+			 * Here all the payload has been dequeued, but
>+			 * hvs_channel_readable_payload() still returns 1,
>+			 * because the VMBus ringbuffer's read_index is not
>+			 * updated for the FIN packet: hvs_stream_dequeue() ->
>+			 * hv_pkt_iter_next() updates the cached priv_read_index
>+			 * but has no opportunity to update the read_index in
>+			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
>+			 * 0 for the FIN packet, so it won't get dequeued.
>+			 */
>+			return 0;
>+		}
>
> 		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
> 		if (!hvs->recv_desc)
>-- 
>2.49.0
>
>


^ permalink raw reply

* Re: [PATCH] rose: Fix rose_find_socket() returning without sock_hold()
From: kernel test robot @ 2026-04-15 10:36 UTC (permalink / raw)
  To: Dudu Lu, netdev; +Cc: oe-kbuild-all, davem, edumazet, kuba, pabeni, Dudu Lu
In-Reply-To: <20260413090420.79932-1-phx0fer@gmail.com>

Hi Dudu,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]
[also build test ERROR on net-next/main linus/master horms-ipvs/master v7.0 next-20260414]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dudu-Lu/rose-Fix-rose_find_socket-returning-without-sock_hold/20260414-194608
base:   net/main
patch link:    https://lore.kernel.org/r/20260413090420.79932-1-phx0fer%40gmail.com
patch subject: [PATCH] rose: Fix rose_find_socket() returning without sock_hold()
config: i386-randconfig-141-20260415 (https://download.01.org/0day-ci/archive/20260415/202604151819.celyrwKo-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
smatch: v0.5.0-9007-gcf3ea02b
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260415/202604151819.celyrwKo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604151819.celyrwKo-lkp@intel.com/

All errors (new ones prefixed by >>):

>> net/rose/af_rose.c:1:9: error: expected identifier or '(' before 'if'
       1 |         if (s)
         |         ^~


vim +1 net/rose/af_rose.c

   > 1		if (s)
     2			sock_hold(s);// SPDX-License-Identifier: GPL-2.0-or-later
     3	/*
     4	 *
     5	 * Copyright (C) Jonathan Naylor G4KLX (g4klx@g4klx.demon.co.uk)
     6	 * Copyright (C) Alan Cox GW4PTS (alan@lxorguk.ukuu.org.uk)
     7	 * Copyright (C) Terry Dawson VK2KTJ (terry@animats.net)
     8	 * Copyright (C) Tomi Manninen OH2BNS (oh2bns@sral.fi)
     9	 */
    10	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v2] vsock/virtio: fix accept queue count leak on transport mismatch
From: Michael S. Tsirkin @ 2026-04-15 10:28 UTC (permalink / raw)
  To: Dudu Lu; +Cc: netdev, stefanha, sgarzare, jasowang
In-Reply-To: <20260413131409.19022-1-phx0fer@gmail.com>

On Mon, Apr 13, 2026 at 09:14:09PM +0800, Dudu Lu wrote:
> virtio_transport_recv_listen() calls sk_acceptq_added() before
> vsock_assign_transport(). If vsock_assign_transport() fails or
> selects a different transport, the error path returns without
> calling sk_acceptq_removed(), permanently incrementing
> sk_ack_backlog.
> 
> After approximately backlog+1 such failures, sk_acceptq_is_full()
> returns true, causing the listener to reject all new connections.
> 
> Fix by moving sk_acceptq_added() to after the transport validation,
> matching the pattern used by vmci_transport and hyperv_transport.
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> Signed-off-by: Dudu Lu <phx0fer@gmail.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  net/vmw_vsock/virtio_transport_common.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 8a9fb23c6e85..e01d983488e5 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1560,8 +1560,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  		return -ENOMEM;
>  	}
>  
> -	sk_acceptq_added(sk);
> -
>  	lock_sock_nested(child, SINGLE_DEPTH_NESTING);
>  
>  	child->sk_state = TCP_ESTABLISHED;
> @@ -1583,6 +1581,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  		return ret;
>  	}
>  
> +	sk_acceptq_added(sk);
>  	if (virtio_transport_space_update(child, skb))
>  		child->sk_write_space(child);
>  
> -- 
> 2.39.3 (Apple Git-145)


^ permalink raw reply

* Re: [PATCH v2] vsock/virtio: fix accept queue count leak on transport mismatch
From: Stefano Garzarella @ 2026-04-15 10:27 UTC (permalink / raw)
  To: Dudu Lu; +Cc: netdev, stefanha, mst, jasowang
In-Reply-To: <20260413131409.19022-1-phx0fer@gmail.com>

On Mon, Apr 13, 2026 at 09:14:09PM +0800, Dudu Lu wrote:
>virtio_transport_recv_listen() calls sk_acceptq_added() before
>vsock_assign_transport(). If vsock_assign_transport() fails or
>selects a different transport, the error path returns without
>calling sk_acceptq_removed(), permanently incrementing
>sk_ack_backlog.
>
>After approximately backlog+1 such failures, sk_acceptq_is_full()
>returns true, causing the listener to reject all new connections.
>
>Fix by moving sk_acceptq_added() to after the transport validation,
>matching the pattern used by vmci_transport and hyperv_transport.
>
>Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
>Signed-off-by: Dudu Lu <phx0fer@gmail.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
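
The counter leak described in the commit message can be illustrated with a
small user-space model. It is a sketch under stated assumptions, not the
virtio transport code: `struct sketch_listener` and both functions are
hypothetical, the `transport_ok` flag stands in for the outcome of
vsock_assign_transport(), and the fullness test assumes the
"backlog greater than maximum" comparison implied by "backlog+1 failures".

```c
#include <assert.h>

/* Hypothetical stand-in for the listener's accept queue accounting. */
struct sketch_listener {
	unsigned int ack_backlog;     /* models sk_ack_backlog */
	unsigned int max_ack_backlog; /* models the listen() backlog */
};

static int sketch_acceptq_is_full(const struct sketch_listener *l)
{
	return l->ack_backlog > l->max_ack_backlog;
}

/* Old ordering: count the child before the transport check. */
static int sketch_recv_listen_old(struct sketch_listener *l, int transport_ok)
{
	if (sketch_acceptq_is_full(l))
		return -1;	/* listener refuses the connection */
	l->ack_backlog++;	/* models sk_acceptq_added() */
	if (!transport_ok)
		return -1;	/* error path: counter is never decremented */
	return 0;
}

/* New ordering: count the child only after the transport check passed. */
static int sketch_recv_listen_new(struct sketch_listener *l, int transport_ok)
{
	if (sketch_acceptq_is_full(l))
		return -1;
	if (!transport_ok)
		return -1;	/* nothing was counted, nothing leaks */
	l->ack_backlog++;
	return 0;
}
```

With a backlog of 2, three failed transport assignments leave the old model
permanently full, while the new model still accepts the next valid
connection.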


^ permalink raw reply

* [PATCH] net: ipv4: igmp: add sysctl option to ignore inbound llm_reports
From: Steffen Trumtrar @ 2026-04-15 10:26 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, David Ahern
  Cc: netdev, linux-doc, linux-kernel, Steffen Trumtrar

Add a new sysctl option 'igmp_link_local_mcast_reports_drop' that allows
dropping inbound IGMP reports for link-local multicast groups in the
224.0.0.X range. This prevents the local system from processing IGMP
reports for link-local multicast groups, so that the kernel keeps
sending its own outbound IGMP reports.

Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
---
 Documentation/networking/ip-sysctl.rst                       | 12 ++++++++++++
 .../networking/net_cachelines/netns_ipv4_sysctl.rst          |  1 +
 include/net/netns/ipv4.h                                     |  1 +
 net/ipv4/af_inet.c                                           |  1 +
 net/ipv4/igmp.c                                              |  2 ++
 net/ipv4/sysctl_net_ipv4.c                                   |  7 +++++++
 6 files changed, 24 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 6921d8594b849..2da4cd6ac7202 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2306,6 +2306,18 @@ igmp_link_local_mcast_reports - BOOLEAN
 
 	Default TRUE
 
+igmp_link_local_mcast_reports_drop - BOOLEAN
+	Drop inbound IGMP reports for link local multicast groups in
+	the 224.0.0.X range. When enabled, IGMP membership reports for
+	link local multicast addresses are silently dropped without
+	processing.
+	When the kernel sees inbound IGMP reports, it stops sending its own
+	IGMP reports. With this option enabled, inbound reports are dropped
+	instead of processed, so the kernel keeps sending its own reports
+	even when IGMP reports from other hosts are seen on the network.
+
+	Default FALSE
+
 Alexey Kuznetsov.
 kuznet@ms2.inr.ac.ru
 
diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
index beaf1880a19bf..703afe2ba063b 100644
--- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
+++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
@@ -140,6 +140,7 @@ int                             sysctl_udp_rmem_min
 u8                              sysctl_fib_notify_on_flag_change
 u8                              sysctl_udp_l3mdev_accept
 u8                              sysctl_igmp_llm_reports
+u8                              sysctl_igmp_llm_reports_drop
 int                             sysctl_igmp_max_memberships
 int                             sysctl_igmp_max_msf
 int                             sysctl_igmp_qrv
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 8e971c7bf1646..1453f825ffd4d 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -258,6 +258,7 @@ struct netns_ipv4 {
 	u8 sysctl_igmp_llm_reports;
 	int sysctl_igmp_max_memberships;
 	int sysctl_igmp_max_msf;
+	u8 sysctl_igmp_llm_reports_drop;
 	int sysctl_igmp_qrv;
 
 	struct ping_group_range ping_group_range;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c7731e300a442..b8f96a5d8afdc 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1825,6 +1825,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_max_msf = 10;
 	/* IGMP reports for link-local multicast groups are enabled by default */
 	net->ipv4.sysctl_igmp_llm_reports = 1;
+	net->ipv4.sysctl_igmp_llm_reports_drop = 0;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
 	net->ipv4.sysctl_fib_notify_on_flag_change = 0;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index a674fb44ec25b..3a4932e4108bd 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -931,6 +931,8 @@ static bool igmp_heard_report(struct in_device *in_dev, __be32 group)
 	if (ipv4_is_local_multicast(group) &&
 	    !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports))
 		return false;
+	if (READ_ONCE(net->ipv4.sysctl_igmp_llm_reports_drop))
+		return true;
 
 	rcu_read_lock();
 	for_each_pmc_rcu(in_dev, im) {
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 5654cc9c8a0b9..24dde84d289e4 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -948,6 +948,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dou8vec_minmax,
 	},
+	{
+		.procname	= "igmp_link_local_mcast_reports_drop",
+		.data		= &init_net.ipv4.sysctl_igmp_llm_reports_drop,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+	},
 	{
 		.procname	= "igmp_max_memberships",
 		.data		= &init_net.ipv4.sysctl_igmp_max_memberships,

---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20260415-v7-0-topic-igmp-llm-drop-e4c13dbf17cc

Best regards,
--  
Steffen Trumtrar <s.trumtrar@pengutronix.de>
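
As a reading aid for the 224.0.0.X range mentioned above, the following
user-space sketch shows one way to test whether an address is a link-local
multicast group. It is illustrative only: the kernel helper
ipv4_is_local_multicast() works on a network-byte-order __be32, whereas this
sketch takes a host-byte-order value, and both `sketch_` functions are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* True if addr (IPv4, host byte order) lies in 224.0.0.0/24. */
static int sketch_is_local_multicast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/* Hypothetical helper: build a host-order address from dotted octets. */
static uint32_t sketch_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

For example, 224.0.0.1 matches the range, while 224.0.1.1 and
239.255.255.250 do not.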


^ permalink raw reply related

* Re: [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Breno Leitao @ 2026-04-15 10:25 UTC (permalink / raw)
  To: Anton Danilov
  Cc: netdev, willemdebruijn.kernel, davem, dsahern, edumazet, kuba,
	pabeni, horms, shuah, linux-kselftest
In-Reply-To: <ad7NhoxmrFJAsXe7@dau-home-pc>

On Wed, Apr 15, 2026 at 02:28:06AM +0300, Anton Danilov wrote:
> +send_fou_gre_packets() {
> +	local count=$1
> +
> +	ip netns exec "$NSENDER" python3 -c "

Having Python code embedded directly in the shell function makes this
difficult to review and maintain. Could you extract the Python script to
a separate file? This would simplify the code to just:

	ip netns exec "$NSENDER" python3 my_python_script.py

^ permalink raw reply

* [PATCH iwl-net] i40e: set supported_extts_flags for rising edge
From: Przemyslaw Korba @ 2026-04-15 10:25 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, Przemyslaw Korba,
	Arkadiusz Kubalewski, Aleksandr Loktionov

The i40e driver always supported only rising edge detection, so
advertise PTP_RISING_EDGE, and PTP_STRICT_FLAGS to ensure the
PTP core properly validates user requests.

Fixes: 7c571ac57d9d ("net: ptp: introduce .supported_extts_flags to ptp_clock_info")
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 7d07c389bb23..c4525bfab09c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -1344,6 +1344,8 @@ static int i40e_init_pin_config(struct i40e_pf *pf)
 	pf->ptp_caps.n_ext_ts = 2;
 	pf->ptp_caps.pps = 1;
 	pf->ptp_caps.n_per_out = 2;
+	pf->ptp_caps.supported_extts_flags = PTP_RISING_EDGE |
+					     PTP_STRICT_FLAGS;
 
 	pf->ptp_caps.pin_config = kzalloc_objs(*pf->ptp_caps.pin_config,
 					       pf->ptp_caps.n_pins);

base-commit: d4999456017dd09ff5f7a34e236c471560d8f8e4
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] gcov: use atomic counter updates to fix concurrent access crashes
From: Andrew Morton @ 2026-04-15 10:19 UTC (permalink / raw)
  To: Peter Oberparleiter
  Cc: Nathan Chancellor, Konstantin Khorenko, Mikhail Zaslonko,
	Nicolas Schier, Masahiro Yamada, Thomas Weißschuh,
	Arnd Bergmann, Steffen Klassert, Herbert Xu, linux-kbuild,
	linux-kernel, netdev, Pavel Tikhomirov, Vasileios Almpanis,
	Jakub Kicinski
In-Reply-To: <9fba075d-9388-483f-818e-6ee3b168f18d@linux.ibm.com>

On Thu, 9 Apr 2026 10:11:24 +0200 Peter Oberparleiter <oberpar@linux.ibm.com> wrote:

> > would be to defer it to 7.2 at this point in the development cycle so
> > that it can have most of a cycle to sit in -next.
> 
> Adding Andrew since he typically integrates GCOV patches via his tree,
> and for input on how to handle this patch.
> 
> To summarize the situation, this patch:
> - is only effective with GCC + GCOV profiling enabled
> - fixes a run-time crash
> - improves overall GCOV coverage data consistency
> - triggers a number of build errors due to side-effects on GCC constant
>   folding and therefore depends on the associated series [1] that fixes
>   these build-errors
> - has a non-zero chance to trigger additional build-time errors, e.g.
>   in similar macros guarded by arch/config symbols not covered by
>   current testing
> 
> Given the last point, I agree with Nathan that this patch would benefit
> from additional test coverage to minimize regression risks, e.g. via a
> cycle in -next.

Great, thanks for preempting lots of dumb akpm questions ;)

Agree, I'll stash this in the post-rc1 pile.

^ permalink raw reply

* RE: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from phy_resume() to __phy_resume()
From: Biju Das @ 2026-04-15 10:00 UTC (permalink / raw)
  To: Andrew Lunn, Russell King
  Cc: Heiner Kallweit, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Russell King, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven,
	Prabhakar Mahadev Lad, linux-renesas-soc@vger.kernel.org,
	Chris Paterson
In-Reply-To: <TY3PR01MB113462DBCD0E25184E2B1630186252@TY3PR01MB11346.jpnprd01.prod.outlook.com>

Hi Andrew/Russell,

> -----Original Message-----
> From: Biju Das <biju.das.jz@bp.renesas.com>
> Sent: 14 April 2026 19:43
> Subject: RE: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from phy_resume() to __phy_resume()
> 
> Hi Andrew,
> 
> > -----Original Message-----
> > From: Andrew Lunn <andrew@lunn.ch>
> > Sent: 14 April 2026 17:03
> > Subject: Re: [PATCH net-next v3 5/5] net: phy: Move phy_init_hw() from
> > phy_resume() to __phy_resume()
> >
> > On Sun, Apr 12, 2026 at 03:00:27PM +0100, Biju wrote:
> > > From: Biju Das <biju.das.jz@bp.renesas.com>
> > >
> > > Now that redundant locking has been removed from PHY driver
> > > callbacks,
> > > phy_init_hw() can be called with phydev->lock held.
> > >
> > > Many MAC drivers and the phylink framework resume the PHY via
> > > phy_start(), which invokes __phy_resume() directly without going
> > > through phy_resume(). Keeping phy_init_hw() in phy_resume() means it
> > > is not called in this path.
> > >
> > > Move phy_init_hw() into __phy_resume() so that PHY soft reset and
> > > re-initialisation happen unconditionally on every resume, regardless
> > > of which code path triggers it.
> >
> > I would change the order of these patches. First remove the redundant
> > locks. You can then put
> > phy_init_hw() into __phy_resume(), rather than first moving it into
> > phy_resume() and then __phy_resume().
> 
> Agreed.

One of my colleagues pointed out that this patch may break the driver at [1], but we don't have
the hardware to test this IP. According to him:

 "It does a phy_init_hw(), resetting the PHY, it applies some ethtool settings and then it calls
phy_start(). Since we are calling phy_init_hw() again during phy_start(), we might be undoing the
ethtool settings"

This kind of cleanup/fixing will be in the 7.2 cycle, right?

[1] https://elixir.bootlin.com/linux/v7.0/source/drivers/net/ethernet/marvell/mv643xx_eth.c#L2327

Cheers,
Biju



^ permalink raw reply

* rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: wenzhaoliao @ 2026-04-15  9:59 UTC (permalink / raw)
  To: andrew, hkallweit1, fujita.tomonori
  Cc: linux, tmgross, ojeda, netdev, rust-for-linux


Hello PHY and Rust maintainers,


I am a PhD student working on a C-to-Rust migration tool for systems code.
We would like to validate it in Linux with one concrete PHY target and would
like to confirm direction before posting a larger RFC series.


Scope of this intent:
- Initial target: MAE0621A (currently out-of-tree C driver).
- We do NOT intend to submit a duplicate Rust rewrite of an existing in-tree C PHY driver.
- Goal: evaluate a semi-automatic abstraction completion workflow:
  reuse existing Rust PHY abstractions where possible, and add only minimal missing abstractions.


Planned deliverables:
- A gap analysis between MAE0621A C callbacks and current rust/kernel/net/phy.rs coverage.
- A small RFC patch series with minimal abstraction additions (if needed).
- A MAE0621A Rust driver prototype on top of those abstractions for linux-next/rust-next evaluation.


Quality and process commitments:
- Full human review by submitters; we can explain all submitted code.
- Transparent disclosure of tool assistance in cover letters/changelogs.
- Hardware-backed test results and explicit limitations in each posting.


Questions:
1. Is MAE0621A an acceptable first target for this direction?
2. If MAE0621A is not suitable, could you recommend one or two better out-of-tree PHY drivers for a first Rust submission?
3. For review flow, do you prefer:
   (a) abstractions-first RFC, then driver, or
   (b) minimal abstractions + concrete driver in one RFC series?


If there are no objections, we plan to post an RFC 0/N in about 2 weeks.


Thanks for your guidance.


Best regards,
Liao Wenzhao
Renmin University of China


^ permalink raw reply

* [net-next v1 1/3] net: phy: motorcomm: Add yt8531_set_ds() mdio_locked bool parameter
From: Minda Chen @ 2026-04-15  9:26 UTC (permalink / raw)
  To: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev
  Cc: linux-kernel, Minda Chen
In-Reply-To: <20260415092654.64907-1-minda.chen@starfivetech.com>

yt8531_set_ds() currently accesses its registers through the helpers
that take the MDIO lock, and it is only called for the YT8531 PHY. The
newer YT8531S also supports RGMII and uses the same pin drive strength
settings as the YT8531, so it needs to call yt8531_set_ds() as well.
However, its config init function, yt8521_config_init(), already holds
the MDIO lock via phy_select_page().

Add a bool parameter, mdio_locked, so that yt8531_set_ds() uses the
unlocked ytphy helpers when the caller already holds the lock, as in
the YT8531S RGMII case.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 drivers/net/phy/motorcomm.c | 51 +++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
index 4d62f7b36212..35aff1519b4b 100644
--- a/drivers/net/phy/motorcomm.c
+++ b/drivers/net/phy/motorcomm.c
@@ -970,22 +970,26 @@ static const struct ytphy_ldo_vol_map yt8531_ldo_vol[] = {
 	{.vol = YT8531_LDO_VOL_3V3, .ds = 7, .cur = 6140},
 };
 
-static u32 yt8531_get_ldo_vol(struct phy_device *phydev)
+static u32 yt8531_get_ldo_vol(struct phy_device *phydev, bool mdio_locked)
 {
 	u32 val;
 
-	val = ytphy_read_ext_with_lock(phydev, YT8521_CHIP_CONFIG_REG);
+	if (mdio_locked)
+		val = ytphy_read_ext(phydev, YT8521_CHIP_CONFIG_REG);
+	else
+		val = ytphy_read_ext_with_lock(phydev, YT8521_CHIP_CONFIG_REG);
+
 	val = FIELD_GET(YT8531_RGMII_LDO_VOL_MASK, val);
 
 	return val <= YT8531_LDO_VOL_1V8 ? val : YT8531_LDO_VOL_1V8;
 }
 
-static int yt8531_get_ds_map(struct phy_device *phydev, u32 cur)
+static int yt8531_get_ds_map(struct phy_device *phydev, u32 cur, bool mdio_locked)
 {
 	u32 vol;
 	int i;
 
-	vol = yt8531_get_ldo_vol(phydev);
+	vol = yt8531_get_ldo_vol(phydev, mdio_locked);
 	for (i = 0; i < ARRAY_SIZE(yt8531_ldo_vol); i++) {
 		if (yt8531_ldo_vol[i].vol == vol && yt8531_ldo_vol[i].cur == cur)
 			return yt8531_ldo_vol[i].ds;
@@ -994,7 +998,7 @@ static int yt8531_get_ds_map(struct phy_device *phydev, u32 cur)
 	return -EINVAL;
 }
 
-static int yt8531_set_ds(struct phy_device *phydev)
+static int yt8531_set_ds(struct phy_device *phydev, bool mdio_locked)
 {
 	struct device_node *node = phydev->mdio.dev.of_node;
 	u32 ds_field_low, ds_field_hi, val;
@@ -1002,7 +1006,7 @@ static int yt8531_set_ds(struct phy_device *phydev)
 
 	/* set rgmii rx clk driver strength */
 	if (!of_property_read_u32(node, "motorcomm,rx-clk-drv-microamp", &val)) {
-		ds = yt8531_get_ds_map(phydev, val);
+		ds = yt8531_get_ds_map(phydev, val, mdio_locked);
 		if (ds < 0)
 			return dev_err_probe(&phydev->mdio.dev, ds,
 					     "No matching current value was found.\n");
@@ -1010,16 +1014,23 @@ static int yt8531_set_ds(struct phy_device *phydev)
 		ds = YT8531_RGMII_RX_DS_DEFAULT;
 	}
 
-	ret = ytphy_modify_ext_with_lock(phydev,
-					 YTPHY_PAD_DRIVE_STRENGTH_REG,
-					 YT8531_RGMII_RXC_DS_MASK,
-					 FIELD_PREP(YT8531_RGMII_RXC_DS_MASK, ds));
+	if (mdio_locked)
+		ret = ytphy_modify_ext(phydev,
+				       YTPHY_PAD_DRIVE_STRENGTH_REG,
+				       YT8531_RGMII_RXC_DS_MASK,
+				       FIELD_PREP(YT8531_RGMII_RXC_DS_MASK, ds));
+	else
+		ret = ytphy_modify_ext_with_lock(phydev,
+						 YTPHY_PAD_DRIVE_STRENGTH_REG,
+						 YT8531_RGMII_RXC_DS_MASK,
+						 FIELD_PREP(YT8531_RGMII_RXC_DS_MASK, ds));
+
 	if (ret < 0)
 		return ret;
 
 	/* set rgmii rx data driver strength */
 	if (!of_property_read_u32(node, "motorcomm,rx-data-drv-microamp", &val)) {
-		ds = yt8531_get_ds_map(phydev, val);
+		ds = yt8531_get_ds_map(phydev, val, mdio_locked);
 		if (ds < 0)
 			return dev_err_probe(&phydev->mdio.dev, ds,
 					     "No matching current value was found.\n");
@@ -1033,10 +1044,18 @@ static int yt8531_set_ds(struct phy_device *phydev)
 	ds_field_low = FIELD_GET(GENMASK(1, 0), ds);
 	ds_field_low = FIELD_PREP(YT8531_RGMII_RXD_DS_LOW_MASK, ds_field_low);
 
-	ret = ytphy_modify_ext_with_lock(phydev,
-					 YTPHY_PAD_DRIVE_STRENGTH_REG,
-					 YT8531_RGMII_RXD_DS_LOW_MASK | YT8531_RGMII_RXD_DS_HI_MASK,
-					 ds_field_low | ds_field_hi);
+	if (mdio_locked)
+		ret = ytphy_modify_ext(phydev,
+				       YTPHY_PAD_DRIVE_STRENGTH_REG,
+				       YT8531_RGMII_RXD_DS_LOW_MASK | YT8531_RGMII_RXD_DS_HI_MASK,
+				       ds_field_low | ds_field_hi);
+	else
+		ret = ytphy_modify_ext_with_lock(phydev,
+						 YTPHY_PAD_DRIVE_STRENGTH_REG,
+						 YT8531_RGMII_RXD_DS_LOW_MASK |
+						 YT8531_RGMII_RXD_DS_HI_MASK,
+						 ds_field_low | ds_field_hi);
+
 	if (ret < 0)
 		return ret;
 
@@ -1826,7 +1845,7 @@ static int yt8531_config_init(struct phy_device *phydev)
 			return ret;
 	}
 
-	ret = yt8531_set_ds(phydev);
+	ret = yt8531_set_ds(phydev, false);
 	if (ret < 0)
 		return ret;
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] netfilter: xt_realm: fix null-ptr-deref in realm_mt()
From: Pablo Neira Ayuso @ 2026-04-15  9:44 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Kito Xu (veritas501), phil, davem, edumazet, kuba, pabeni, horms,
	jengelh, kaber, netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <ad9aDziQEBR0h3U8@chamomile>

On Wed, Apr 15, 2026 at 11:27:43AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Apr 15, 2026 at 11:02:15AM +0200, Florian Westphal wrote:
> > Kito Xu (veritas501) <hxzene@gmail.com> wrote:
> > > realm_mt() unconditionally dereferences skb_dst(skb) without a NULL
> > > check. The xt_realm match registers with .family = NFPROTO_UNSPEC,
> > > making it available to all netfilter protocol families. Through the
> > > nftables compat layer (nft_compat), an unprivileged user inside a
> > > user/net namespace can load this match into a bridge-family chain.
> > 
> > I do not think this bug is related to nft_compat.
> > You can also use ebtables setsockopt api to request xt_realm, no?
> > 
> > > Fixes: ab4f21e6fb1c ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
> > 
> > Looks correct.  Alternatively we could revert the xt_realm.c change.
> > But I don't have a strong opinion here, patch looks correct.
> 
> Maybe partial revert makes sense, since in ab4f21e6fb1c:
> 
> - xt_MARK: OK
> - xt_NOTRACK: OK
> - xt_comment: OK
> - xt_mac: There is a better way to do this in bridge.
> - xt_owner, no sockets in bridge.
> - xt_physdev, which makes no sense in bridge, this is for br_netfilter
>   only.
> - xt_realm (as already mentioned).
> 
> That is, a partial revert of this patch for:
> 
> - xt_mac
> - xt_owner
> - xt_physdev
> - xt_realm

"this patch" refers to ab4f21e6fb1c.

^ permalink raw reply

* Re: [PATCH] netfilter: xt_realm: fix null-ptr-deref in realm_mt()
From: Florian Westphal @ 2026-04-15  9:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Kito Xu (veritas501), phil, davem, edumazet, kuba, pabeni, horms,
	jengelh, kaber, netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <ad9aDziQEBR0h3U8@chamomile>

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Apr 15, 2026 at 11:02:15AM +0200, Florian Westphal wrote:
> > Kito Xu (veritas501) <hxzene@gmail.com> wrote:
> > > realm_mt() unconditionally dereferences skb_dst(skb) without a NULL
> > > check. The xt_realm match registers with .family = NFPROTO_UNSPEC,
> > > making it available to all netfilter protocol families. Through the
> > > nftables compat layer (nft_compat), an unprivileged user inside a
> > > user/net namespace can load this match into a bridge-family chain.
> > 
> > I do not think this bug is related to nft_compat.
> > You can also use ebtables setsockopt api to request xt_realm, no?
> > 
> > > Fixes: ab4f21e6fb1c ("netfilter: xtables: use NFPROTO_UNSPEC in more extensions")
> > 
> > Looks correct.  Alternatively we could revert the xt_realm.c change.
> > But I don't have a strong opinion here, patch looks correct.
> 
> Maybe partial revert makes sense, since in ab4f21e6fb1c:
> 
> - xt_MARK: OK
> - xt_NOTRACK: OK
> - xt_comment: OK

Agree.

> - xt_mac: There is a better way to do this in bridge.

Right.

> - xt_owner, no sockets in bridge.

Output/postrouting maybe?

> - xt_physdev, which makes no sense in bridge, this is for br_netfilter
>   only.

Agree.

> - xt_realm (as already mentioned).
> That is, a partial revert of this patch for:
> 
> - xt_mac
> - xt_owner
> - xt_physdev
> - xt_realm

I'm ok with that too.
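
For reference, the missing check discussed at the top of the thread can
be sketched as a user-space model. This is not the posted patch:
model_dst, model_skb and model_realm_info are hypothetical, simplified
stand-ins for the kernel structures, and treating a packet without a dst
as a non-match (even with invert set) is an assumption about the
intended behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct dst_entry and struct sk_buff. */
struct model_dst {
	uint32_t tclassid;
};

struct model_skb {
	const struct model_dst *dst;	/* NULL when no route is attached */
};

struct model_realm_info {
	uint32_t id;
	uint32_t mask;
	bool invert;
};

/* Returns true on a realm match; a packet without a dst never matches. */
static bool model_realm_mt(const struct model_skb *skb,
			   const struct model_realm_info *info)
{
	const struct model_dst *dst = skb->dst;

	/* Guard the dereference: bridge-family packets may carry no dst. */
	if (dst == NULL)
		return false;

	return (info->id == (dst->tclassid & info->mask)) != info->invert;
}
```

With the partial revert the match would no longer be reachable from the
bridge family at all, so a guard like this would only serve as defence
in depth.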

^ permalink raw reply

