Netdev List


* Re: [PATCH net-next v4 07/10] enic: add MBOX PF handlers for VF register and capability
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-7-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
enic: add MBOX PF handlers for VF register and capability

Implement PF-side mailbox message processing for SR-IOV V2 admin channel
communication. It processes capability, register, and unregister requests
from VFs.

> diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
> index d144c86d9ef84..f5784624ebbdf 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_mbox.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
[ ... ]
> @@ -124,10 +124,168 @@ int enic_mbox_send_msg(struct enic *enic, u8 msg_type, u16 dst_vnic_id,
[ ... ]
> +static int enic_mbox_pf_handle_unregister(struct enic *enic, void *msg,
> +					  u16 vf_id, u64 msg_num)
> +{
> +	struct enic_mbox_vf_register_reply_msg reply = {};
> +	int err;
> +
> +	if (!enic->vf_state || vf_id >= enic->num_vfs) {
> +		netdev_warn(enic->netdev,
> +			    "MBOX: unregister from invalid VF %u\n", vf_id);
> +		return -EINVAL;
> +	}
> +
> +	reply.reply.ret_major = cpu_to_le16(0);
> +	err = enic_mbox_send_msg(enic, ENIC_MBOX_VF_UNREGISTER_REPLY, vf_id,
> +				 &reply, sizeof(reply));
> +	if (err)
> +		return err;
> +
> +	enic->vf_state[vf_id].registered = false;

If enic_mbox_send_msg() fails (for example, due to a full admin WQ or channel
timeout), the function returns early and the PF never sets the registered
state to false.

Since an unregister request indicates the VF driver is unloading or shutting
down, should the PF unconditionally clear the registration state regardless of
whether the acknowledgment successfully reaches the defunct VF?

If the local state is not cleared, could the PF assume the VF is still active
and continue attempting to send it unnecessary link state notifications?
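
A minimal user-space sketch of the ordering suggested above: clear the local
registration first, then report whatever the send routine returns. The names
vf_state, send_reply_fn and pf_handle_unregister are hypothetical stand-ins,
not the driver's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's per-VF state and send routine. */
struct vf_state {
	bool registered;
};

typedef int (*send_reply_fn)(unsigned int vf_id);

/* Clear the registration first, then report the send result to the caller. */
static int pf_handle_unregister(struct vf_state *vfs, size_t num_vfs,
				unsigned int vf_id, send_reply_fn send_reply)
{
	if (!vfs || vf_id >= num_vfs)
		return -1;

	vfs[vf_id].registered = false;	/* happens even if the reply is lost */
	return send_reply(vf_id);
}

/* Demo helper: a send routine that always fails, like a full admin WQ. */
static int send_always_fails(unsigned int vf_id)
{
	(void)vf_id;
	return -110;	/* numeric value of -ETIMEDOUT on Linux */
}
```

With this ordering, a failed acknowledgment still surfaces as an error to the
caller, but the PF no longer treats the VF as registered.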

[ ... ]
> +static void enic_mbox_pf_process_msg(struct enic *enic,
> +				     struct enic_mbox_hdr *hdr, void *payload)
> +{
> +	u16 vf_id = le16_to_cpu(hdr->src_vnic_id);
> +	u16 msg_len = le16_to_cpu(hdr->msg_len);
> +	int err = 0;
> +
> +	if (!enic->vf_state) {
> +		netdev_dbg(enic->netdev,
> +			   "MBOX: PF received msg but SRIOV not active\n");
> +		return;
> +	}
> +
> +	if (vf_id >= enic->num_vfs) {
> +		netdev_warn(enic->netdev,
> +			    "MBOX: PF received msg from invalid VF %u\n",
> +			    vf_id);
> +		return;
> +	}

Could a malicious or buggy guest VM use this untrusted VF input to cause a
host denial of service through log spamming?

Because this warning is not rate-limited, an untrusted VF could continuously
flood the PF's admin channel with malformed requests. This might exhaust host
CPU cycles and cause dmesg lock contention.

Should these logging calls in the message parsing path use
netdev_warn_ratelimited() instead?
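
To illustrate what rate limiting buys here, the following is a user-space
sketch of an interval/burst limiter. It is only a model of the idea, not the
kernel's implementation; log_ratelimit, log_ratelimit_ok and flood are
hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct log_ratelimit {
	uint64_t interval;
	unsigned int burst;
	uint64_t window_start;
	unsigned int printed;
	unsigned int suppressed;
};

/* Returns true if the caller may log at time `now`, false if suppressed. */
static bool log_ratelimit_ok(struct log_ratelimit *rl, uint64_t now)
{
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;	/* start a new window */
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Simulate a VF sending n bad messages, one per tick, starting at `start`. */
static unsigned int flood(struct log_ratelimit *rl, uint64_t start,
			  unsigned int n)
{
	unsigned int logged = 0;

	for (unsigned int i = 0; i < n; i++)
		if (log_ratelimit_ok(rl, start + i))
			logged++;
	return logged;
}
```

In this model a flood of 1000 malformed messages inside one window produces
only `burst` log lines; the rest are counted and dropped.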

[ ... ]
> @@ -142,10 +300,22 @@ static void enic_mbox_recv_handler(struct enic *enic, void *buf,
>  		return;
>  	}
>  
> +	msg_len = le16_to_cpu(hdr->msg_len);
> +	if (msg_len < sizeof(*hdr) || msg_len > len) {
> +		netdev_warn(enic->netdev,
> +			    "MBOX: invalid msg_len %u (buf len %u)\n",
> +			    msg_len, len);
> +		return;
> +	}

Similarly here, does this log also need rate limiting to protect the host
from being spammed by malformed headers sent from a VF?


* Re: [PATCH net-next v4 06/10] enic: add MBOX core send and receive for admin channel
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-6-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: enic: add MBOX core send and receive for admin channel

This commit implements the mailbox protocol engine used for PF-VF
communication over the admin channel.

> diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
> index 345d194c6eebe..c96268adc173f 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_admin.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
> @@ -19,6 +19,7 @@
>  #include "cq_enet_desc.h"
>  #include "wq_enet_desc.h"
>  #include "rq_enet_desc.h"
> +#include "enic_mbox.h"
>  
>  /* No-op: admin WQ buffers are freed inline after completion polling */
>  static void enic_admin_wq_buf_clean(struct vnic_wq *wq,
>  				    struct vnic_wq_buf *buf)
>  {
>  }

If an MBOX send times out, the timeout path intentionally leaves the buffer
and DMA mapping pending to prevent a hardware use-after-free. Because this
cleanup function is a no-op, are those pending descriptors and mappings
permanently leaked when the admin channel is later closed or the driver is
removed?

[ ... ]
> @@ -156,7 +157,26 @@ unsigned int enic_admin_rq_cq_service(struct enic *enic, unsigned int budget)
>  					buf->dma_addr, buf->len,
>  					DMA_FROM_DEVICE);
>  
> -		enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);
> +		if (enic->admin_rq_handler) {
> +			struct cq_enet_rq_desc *rq_desc = desc;
> +			u16 sender_vlan;

Does this sequence process the CQ completion unconditionally without checking
the hardware descriptor for errors?

Standard data paths typically verify flags like packet_error or fcs_ok before
accepting the packet. If corrupted administrative messages are received here,
will they be processed as valid?

> +
> +			/* Firmware sets the CQ VLAN field to identify the
> +			 * sender: 0 = PF, 1-based = VF index.  Overwrite
> +			 * the untrusted src_vnic_id in the MBOX header with
> +			 * the hardware-verified value.
> +			 */
> +			sender_vlan = le16_to_cpu(rq_desc->vlan);
> +			if (buf->len >= sizeof(struct enic_mbox_hdr)) {
> +				struct enic_mbox_hdr *hdr = buf->os_buf;
> +
> +				hdr->src_vnic_id = (sender_vlan == 0) ?
> +					cpu_to_le16(ENIC_MBOX_DST_PF) :
> +					cpu_to_le16(sender_vlan - 1);
> +			}
> +
> +			enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);
> +		}

Is buf->len the original static allocation size (2048 bytes) rather than the
actual number of bytes written by the hardware?

If the hardware wrote fewer bytes than the full buffer capacity, it looks
like passing buf->len will enqueue uninitialized heap memory. This would
also bypass the bounds checks in enic_mbox_recv_handler(), as the length
will appear to always be 2048 regardless of the actual packet size.
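
A sketch of the check this implies, assuming the completion descriptor
reports how many bytes the hardware wrote. The function admin_rx_len and the
16-byte MBOX_HDR_SIZE are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ADMIN_BUF_SIZE 2048u	/* buffer capacity quoted in the review */
#define MBOX_HDR_SIZE  16u	/* hypothetical header size for this sketch */

/*
 * Returns the number of bytes safe to hand to the message parser, or 0 if
 * the completion is unusable. bytes_written stands in for whatever length
 * field the completion descriptor reports.
 */
static size_t admin_rx_len(uint32_t bytes_written, size_t buf_capacity)
{
	if (bytes_written > buf_capacity)
		return 0;	/* hardware claims more than was posted */
	if (bytes_written < MBOX_HDR_SIZE)
		return 0;	/* too short to hold a header */
	return bytes_written;
}
```

Passing the returned length instead of the buffer capacity would let a
msg_len bounds check compare against the real message size.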

[ ... ]
> diff --git a/drivers/net/ethernet/cisco/enic/enic_mbox.c b/drivers/net/ethernet/cisco/enic/enic_mbox.c
> new file mode 100644
> index 0000000000000..d144c86d9ef84
> --- /dev/null
> +++ b/drivers/net/ethernet/cisco/enic/enic_mbox.c
[ ... ]
> +	vnic_wq_doorbell(wq);
> +
> +	timeout = jiffies + usecs_to_jiffies(ENIC_MBOX_POLL_TIMEOUT_US);
> +	err = -ETIMEDOUT;
> +	while (time_before(jiffies, timeout)) {
> +		if (enic_admin_wq_cq_service(enic)) {
> +			err = 0;
> +			break;
> +		}
> +		usleep_range(ENIC_MBOX_POLL_INTERVAL_US,
> +			     ENIC_MBOX_POLL_INTERVAL_US + 50);
> +	}

Could this loop falsely report a timeout if the thread experiences a long
scheduling delay during usleep_range()?

If the thread wakes up after the timeout has passed, it will exit the loop
immediately without checking one last time. If the hardware actually
completed the operation during the sleep window, the completion would be
missed, incorrectly disabling the channel.
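
The usual fix is one more completion check after the deadline has passed. A
user-space sketch with a simulated clock follows; poll_ctx and its fields are
hypothetical and only model the timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock and completion source, so the loop can run anywhere. */
struct poll_ctx {
	uint64_t now;		/* current time in ticks */
	uint64_t done_at;	/* tick at which the hardware completes */
	uint64_t sleep_ticks;	/* how long each sleep really takes */
};

static bool completion_ready(const struct poll_ctx *c)
{
	return c->now >= c->done_at;
}

/* Returns 0 on completion, -1 on timeout. */
static int poll_with_final_check(struct poll_ctx *c, uint64_t timeout_ticks)
{
	uint64_t deadline = c->now + timeout_ticks;

	while (c->now < deadline) {
		if (completion_ready(c))
			return 0;
		c->now += c->sleep_ticks;	/* may overshoot the deadline */
	}
	/* One last look: the work may have finished while we slept. */
	return completion_ready(c) ? 0 : -1;
}
```

With a sleep that overshoots the deadline, the final check still observes a
completion that arrived during the sleep, so only a genuinely missing
completion is reported as a timeout.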

[ ... ]


* Re: [PATCH net-next v4 05/10] enic: define MBOX message types and header structures
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-5-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
enic: define MBOX message types and header structures

This commit defines the mailbox protocol used for PF-VF communication over
the admin channel. It introduces the request/reply message types and the
structures representing the message headers and payloads.

> --- /dev/null
> +++ b/drivers/net/ethernet/cisco/enic/enic_mbox.h
> @@ -0,0 +1,75 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright 2025 Cisco Systems, Inc.  All rights reserved. */
> +
> +#ifndef _ENIC_MBOX_H_
> +#define _ENIC_MBOX_H_

This isn't a bug, but should this header include <linux/types.h> and
<linux/bits.h>? It uses fixed-width types like __le16, __le32, u8 and the
BIT() macro. Without these includes, the header might fail to compile when
it is included on its own.

[ ... ]

> +struct enic_mbox_generic_reply {
> +	__le16 ret_major;
> +	__le16 ret_minor;
> +};

[ ... ]

> +struct enic_mbox_vf_capability_reply_msg {
> +	struct enic_mbox_generic_reply reply;
> +	__le32 version;
> +	__le32 reserved[32];
> +};

Could this structure lead to misaligned memory accesses when it is overlaid
on a buffer that is only 2-byte aligned?

struct enic_mbox_generic_reply consists only of __le16 fields, giving it
a natural 2-byte alignment. struct enic_mbox_vf_capability_reply_msg
embeds this reply struct and subsequently adds __le32 version and
__le32 reserved[32], which require 4-byte alignment.

If the outer structure is cast from a buffer that only guarantees the inner
structure's 2-byte alignment, the 4-byte aligned fields could cause
misaligned memory accesses.

Should the wider 32-bit fields be split into smaller __le16 fields to
strictly preserve the inherited 2-byte alignment?
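
For reference, a user-space sketch of how a C compiler lays these out on
common ABIs where uint32_t is 4-byte aligned. The structs mirror the quoted
ones with stdint types substituted for __le16/__le32, and
aligned_for_capability_reply is a hypothetical helper. The outer struct takes
the strictest alignment of its members, so the concern only applies when it
is cast from a buffer with weaker alignment.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct generic_reply {
	uint16_t ret_major;
	uint16_t ret_minor;
};

struct capability_reply_msg {
	struct generic_reply reply;	/* 4 bytes, 2-byte aligned */
	uint32_t version;		/* lands at offset 4, no padding */
	uint32_t reserved[32];
};

/* True if p is suitably aligned to be accessed as the outer struct. */
static int aligned_for_capability_reply(const void *p)
{
	return ((uintptr_t)p % alignof(struct capability_reply_msg)) == 0;
}
```

A receive path could use a check like this, or copy the payload into a
properly typed local, before dereferencing the 32-bit fields.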


* Re: [PATCH net-next v4 04/10] enic: add admin CQ service with MSI-X interrupt and NAPI polling
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-4-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
enic: add admin CQ service with MSI-X interrupt and NAPI polling

Add completion queue service for the admin channel WQ and RQ, driven
by an MSI-X interrupt and NAPI polling.

> diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
> index a8fcd5f116d17..345d194c6eebe 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_admin.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
[ ... ]
> @@ -83,6 +85,207 @@ static void enic_admin_rq_drain(struct enic *enic)
>  	vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
>  }
>  
[ ... ]
> +unsigned int enic_admin_rq_cq_service(struct enic *enic, unsigned int budget)
> +{
> +	struct vnic_cq *cq = &enic->admin_cq[1];
> +	struct vnic_rq *rq = &enic->admin_rq;
> +	struct vnic_rq_buf *buf;
> +	unsigned int work = 0;
> +	void *desc;
> +
> +	desc = vnic_cq_to_clean(cq);
> +	while (work < budget &&
> +	       enic_admin_cq_color(desc, cq->ring.desc_size) !=
> +	       cq->last_color) {
> +		/* Ensure CQ descriptor fields are read after
> +		 * the color/valid check.
> +		 */
> +		rmb();
> +		buf = rq->to_clean;
> +
> +		dma_sync_single_for_cpu(&enic->pdev->dev,
> +					buf->dma_addr, buf->len,
> +					DMA_FROM_DEVICE);
> +
> +		enic_admin_msg_enqueue(enic, buf->os_buf, buf->len);

Does this code expose uninitialized kernel heap memory?

The driver reads the completion queue descriptor but does not decode it to
determine the actual number of bytes written by the hardware. Instead, it
passes the full buf->len (2048 bytes) to enic_admin_msg_enqueue().

Since enic_admin_rq_post_one() allocates the receive buffers using kmalloc()
without __GFP_ZERO, passing the full 2048 bytes copies uninitialized kernel
memory following the actual packet payload.

> +
> +		enic_admin_rq_buf_clean(rq, rq->to_clean);
> +		rq->to_clean = rq->to_clean->next;
> +		rq->ring.desc_avail++;
> +
> +		vnic_cq_inc_to_clean(cq);
> +		work++;
> +		desc = vnic_cq_to_clean(cq);
> +	}
> +
> +	enic_admin_rq_fill(enic, GFP_ATOMIC);

Can this lead to a permanent receive queue stall?

Every processed receive buffer is unconditionally unmapped and freed via
enic_admin_rq_buf_clean() above. If enic_admin_rq_fill() fails due to memory
pressure, it silently returns an error and no replacement buffers are posted.

Because the original buffers were destroyed rather than recycled, and there
is no deferred retry mechanism, the descriptors are permanently lost. Repeated
transient failures will empty the RQ and permanently stall the admin channel.
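
One way to avoid the stall is to record the shortfall so that a later pass
can retry. The sketch below models only the buffer accounting; rq_model,
alloc_fn and rq_refill are hypothetical names, and where the retry runs
(delayed work, the next poll) is left open.

```c
#include <stdbool.h>

/* Hypothetical model of the admin RQ: only the buffer counts matter here. */
struct rq_model {
	unsigned int capacity;	/* descriptors in the ring */
	unsigned int posted;	/* descriptors that currently own a buffer */
	bool retry_pending;	/* set when a later refill attempt is needed */
};

typedef bool (*alloc_fn)(void);	/* returns false under memory pressure */

/* Post buffers until the ring is full or allocation fails. */
static unsigned int rq_refill(struct rq_model *rq, alloc_fn try_alloc)
{
	while (rq->posted < rq->capacity && try_alloc())
		rq->posted++;

	/* Remember the shortfall instead of silently dropping it. */
	rq->retry_pending = rq->posted < rq->capacity;
	return rq->capacity - rq->posted;
}

/* Demo allocators: one models memory pressure, one always succeeds. */
static bool alloc_fails(void) { return false; }
static bool alloc_works(void) { return true; }
```

A caller that sees retry_pending set can schedule another refill rather than
waiting for a completion that will never arrive on an empty ring.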

[ ... ]
> @@ -128,23 +331,8 @@ static int enic_admin_alloc_resources(struct enic *enic)
>  	if (err)
>  		goto free_cq0;
>  
> -	/* PFs have dedicated SRIOV_INTR resources for admin channel.
> -	 * VFs lack SRIOV_INTR; use a regular INTR_CTRL slot instead.
> -	 */
> -	if (vnic_dev_get_res_count(enic->vdev, RES_TYPE_SRIOV_INTR) >= 1)
> -		err = vnic_intr_alloc_with_type(enic->vdev,
> -						&enic->admin_intr, 0,
> -						RES_TYPE_SRIOV_INTR);
> -	else
> -		err = vnic_intr_alloc(enic->vdev, &enic->admin_intr,
> -				      enic->intr_count);
> -	if (err)
> -		goto free_cq1;
> -
>  	return 0;

Will this removal break the admin channel on Physical Functions (PFs)?

Because PFs frequently have all their standard INTR_CTRL slots consumed by
regular networking queues, they rely on the dedicated SRIOV_INTR slot for
the admin channel.

Removing this fallback means enic_admin_setup_intr() might fail its bounds
check or vnic_intr_alloc() might fail, breaking initialization.

[ ... ]
> @@ -165,10 +353,32 @@ static void enic_admin_free_resources(struct enic *enic)
>  
>  static void enic_admin_init_resources(struct enic *enic)
>  {
> +	unsigned int intr_offset = enic->admin_intr_index;
> +
>  	vnic_wq_init(&enic->admin_wq, 0, 0, 0);
>  	vnic_rq_init(&enic->admin_rq, 1, 0, 0);
> -	vnic_cq_init(&enic->admin_cq[0], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
> -	vnic_cq_init(&enic->admin_cq[1], 0, 1, 0, 0, 1, 0, 1, 0, 0, 0);
> +	vnic_cq_init(&enic->admin_cq[0],
> +		     0 /* flow_control_enable */,
> +		     1 /* color_enable */,
> +		     0 /* cq_head */,
> +		     0 /* cq_tail */,
> +		     1 /* cq_tail_color */,
> +		     1 /* interrupt_enable */,

Could setting interrupt_enable to 1 on the transmit completion queue (admin_cq[0])
cause an interrupt storm?

When a transmit completion occurs, it asserts the admin MSI-X interrupt and
schedules the NAPI poll routine (enic_admin_napi_poll). However, the NAPI
poll routine only services the receive queue (admin_cq[1]) and never clears
the event for admin_cq[0].

Since NAPI unmasks the interrupt upon completion, the uncleared transmit
completion would immediately re-trigger the interrupt.

[ ... ]
> @@ -187,12 +397,24 @@ int enic_admin_channel_open(struct enic *enic)
>  		return err;
>  	}
>  
> +	err = enic_admin_setup_intr(enic);
> +	if (err) {
> +		netdev_err(enic->netdev,
> +			   "Admin channel requires MSI-X, SR-IOV unavailable: %d\n",
> +			   err);
> +		goto free_resources;
> +	}
> +
> +	spin_lock_init(&enic->admin_msg_lock);
> +	INIT_LIST_HEAD(&enic->admin_msg_list);
> +	INIT_WORK(&enic->admin_msg_work, enic_admin_msg_work_handler);

Can a race condition occur here?

enic_admin_setup_intr() calls request_irq(), which registers the interrupt
handler and enables the IRQ line. If an interrupt fires immediately upon
registration, the ISR will schedule NAPI, which calls enic_admin_rq_cq_service().

NAPI would then attempt to acquire the uninitialized spinlock, modify the
uninitialized list, and schedule the uninitialized workqueue. Should the data
structures be initialized before requesting the interrupt?

[ ... ]
>  	return 0;
>  
>  disable_queues:
> +	enic_admin_teardown_intr(enic);
>  	vnic_wq_disable(&enic->admin_wq);
>  	vnic_rq_disable(&enic->admin_rq);
>  	enic_admin_qp_type_set(enic, 0);
>  	enic_admin_rq_drain(enic);
> +free_resources:
>  	enic_admin_free_resources(enic);
>  	return err;
>  }

Does the disable_queues error path leak queued messages?

If initialization fails after enic_admin_setup_intr() is called (for example,
during enic_admin_rq_fill()), the code branches to disable_queues. By this time,
the MSI-X interrupt and NAPI have been enabled.

If an interrupt fired and scheduled NAPI, messages could be added to
admin_msg_list and admin_msg_work could be scheduled. The error path does not
call cancel_work_sync(&enic->admin_msg_work) or enic_admin_msg_drain(enic),
which are required to stop the deferred work.


* Re: [PATCH net-next v4 03/10] enic: add admin RQ buffer management
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-3-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
enic: add admin RQ buffer management

This commit adds buffer management for the enic driver's admin receive queue,
allocating DMA buffers on open and draining them on close.

> diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
> index d1abe6a50095d..a8fcd5f116d17 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_admin.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_admin.c
[ ... ]
> @@ -151,6 +212,7 @@ int enic_admin_channel_open(struct enic *enic)
>  	vnic_wq_disable(&enic->admin_wq);
>  	vnic_rq_disable(&enic->admin_rq);
>  	enic_admin_qp_type_set(enic, 0);
> +	enic_admin_rq_drain(enic);
>  	enic_admin_free_resources(enic);
>  	return err;
>  }

Could this sequence allow the hardware to DMA into freed memory?

Here and in enic_admin_channel_close() below, vnic_rq_disable() is called
before enic_admin_qp_type_set(enic, 0), meaning the firmware might still be
actively routing admin messages to the queue.

Additionally, the return value of vnic_rq_disable() is ignored. If the disable
operation fails or times out, the hardware queue will remain active.

Then enic_admin_rq_drain() unconditionally unmaps the DMA addresses and frees
the buffers. If the queue is still active, incoming messages could be written
directly into the freed memory.

Would it be safer to stop the source of the messages first, and verify the
queue is successfully disabled before freeing the buffers?
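
A sketch of that ordering as a small state model: stop routing, disable the
queue and check the result, and only then release buffers. The admin_hw
struct and both functions are hypothetical and do not correspond to driver
calls.

```c
#include <stdbool.h>

/* Hypothetical model of the admin RQ hardware and buffer state. */
struct admin_hw {
	bool routing_enabled;	/* firmware still sends admin messages */
	bool rq_enabled;	/* hardware queue may still DMA */
	bool disable_fails;	/* simulate a disable timeout */
	bool buffers_mapped;	/* DMA buffers still owned by the driver */
};

static int rq_disable(struct admin_hw *hw)
{
	if (hw->disable_fails)
		return -1;
	hw->rq_enabled = false;
	return 0;
}

static int admin_rq_teardown(struct admin_hw *hw)
{
	int err;

	hw->routing_enabled = false;	/* 1. stop the source of messages */

	err = rq_disable(hw);		/* 2. quiesce the queue */
	if (err)
		return err;		/* keep buffers: hardware may still DMA */

	hw->buffers_mapped = false;	/* 3. now safe to unmap and free */
	return 0;
}
```

If the disable step fails, the buffers stay mapped, which trades a leak for
not letting the device write into freed memory.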

[ ... ]
> @@ -166,7 +228,7 @@ void enic_admin_channel_close(struct enic *enic)
>  	enic_admin_qp_type_set(enic, 0);
>  
>  	vnic_wq_clean(&enic->admin_wq, enic_admin_wq_buf_clean);
> -	vnic_rq_clean(&enic->admin_rq, enic_admin_rq_buf_clean);
> +	enic_admin_rq_drain(enic);
>  	vnic_cq_clean(&enic->admin_cq[0]);
>  	vnic_cq_clean(&enic->admin_cq[1]);
>  	vnic_intr_clean(&enic->admin_intr);


* Re: [PATCH net-next v4 02/10] enic: add admin channel open and close for SR-IOV
From: Jakub Kicinski @ 2026-04-14  0:21 UTC
  To: devnull+satishkh.cisco.com
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel, 20260401-enic-sriov-v2-prep-v4-0-d5834b2ef1b9,
	satishkh
In-Reply-To: <20260411-enic-sriov-v2-admin-channel-v2-v4-2-f052326c2a57@cisco.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
enic: add admin channel open and close for SR-IOV

Introduce enic_admin_channel_open() and enic_admin_channel_close() to allocate
and initialize the admin WQ, RQ, two CQs, and one SR-IOV interrupt for the V2
SR-IOV PF-VF mailbox communication.

> diff --git a/drivers/net/ethernet/cisco/enic/enic_admin.c b/drivers/net/ethernet/cisco/enic/enic_admin.c
> new file mode 100644
> index 0000000000000..d1abe6a50095d
> --- /dev/null
> +++ b/drivers/net/ethernet/cisco/enic/enic_admin.c

[ ... ]

> +	err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[0], 0,
> +				      ENIC_ADMIN_DESC_COUNT,
> +				      sizeof(struct cq_desc),
> +				      RES_TYPE_ADMIN_CQ);
> +	if (err)
> +		goto free_rq;
> +
> +	err = vnic_cq_alloc_with_type(enic->vdev, &enic->admin_cq[1], 1,
> +				      ENIC_ADMIN_DESC_COUNT,
> +				      16 << enic->ext_cq,
                                      ^^^^^^^^^^^^^^^^^^

Does this code use the wrong descriptor size for the admin RQ's completion
queue?

The admin WQ's completion queue (admin_cq[0]) uses sizeof(struct cq_desc),
which is 16 bytes. If the admin channel handles PF-VF mailbox messages rather
than data packets with extended metadata, will firmware still write standard
16-byte completions here?

If so, won't allocating the ring with an extended size cause the host polling
loop to look for the color bit at the wrong offset (e.g., byte 31 instead of
byte 15), causing it to silently miss incoming admin messages?
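
The offset arithmetic behind this question can be sketched as follows,
assuming the color bit is the top bit of the descriptor's last byte (the bit
position is an assumption for this example; the byte offsets 15 and 31 come
from the text above). Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the color bit is the top bit of the descriptor's last byte. */
static size_t cq_color_offset(size_t desc_size)
{
	return desc_size - 1;
}

static unsigned int cq_color(const uint8_t *desc, size_t desc_size)
{
	return (desc[cq_color_offset(desc_size)] >> 7) & 1u;
}
```

If the device writes a 16-byte completion with the color set at byte 15 but
the host polls with a 32-byte stride, the host reads byte 31 and sees the old
color.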

> +				      RES_TYPE_ADMIN_CQ);
> +	if (err)
> +		goto free_cq0;
-- 
pw-bot: cr


* Re: [PATCH net-next] net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration
From: patchwork-bot+netdevbpf @ 2026-04-14  0:20 UTC
  To: Lorenzo Bianconi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, linux-arm-kernel,
	linux-mediatek, netdev
In-Reply-To: <20260412-airoha_fe_init_remove_mc_en_bit-v1-1-7b6a5a25a74d@kernel.org>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 12 Apr 2026 11:56:25 +0200 you wrote:
> PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration performed in
> airoha_fe_init() is used to duplicate multicast packets and send a copy
> to the CPU when the traffic is offloaded. This is necessary just if
> it is requested by the user. Disable multicast packets duplication by
> default.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net-next] net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration
    https://git.kernel.org/netdev/net-next/c/34e1a98ff2a8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next v05 1/6] hinic3: Add ethtool queue ops
From: Jakub Kicinski @ 2026-04-14  0:18 UTC
  To: Fan Gong
  Cc: Zhu Yikai, netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Andrew Lunn, Ioana Ciornei, Mohsin Bashir,
	linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <157d5cc6e757ffa77eee01dfdc3f2159dc97905f.1775711066.git.zhuyikai1@h-partners.com>

On Sat, 11 Apr 2026 11:36:59 +0800 Fan Gong wrote:
>   Implement following ethtool callback function:
> .get_ringparam
> .set_ringparam
> 
>   These callbacks allow users to utilize ethtool for detailed
> queue depth configuration and monitoring.

> +static int hinic3_check_ringparam_valid(struct net_device *netdev,
> +					const struct ethtool_ringparam *ring)
> +{
> +	if (ring->rx_jumbo_pending || ring->rx_mini_pending) {

Can the driver actually be called with non-zero values if the max is not set?

> +		netdev_err(netdev, "Unsupported rx_jumbo_pending/rx_mini_pending\n");
> +		return -EINVAL;
> +	}


> +	if (ring->tx_pending > HINIC3_MAX_TX_QUEUE_DEPTH ||
> +	    ring->tx_pending < HINIC3_MIN_QUEUE_DEPTH ||
> +	    ring->rx_pending > HINIC3_MAX_RX_QUEUE_DEPTH ||
> +	    ring->rx_pending < HINIC3_MIN_QUEUE_DEPTH) {

Similar question: do you need to check the upper bound?
The kernel should check the input against the max returned by .get_ringparam.

> +		netdev_err(netdev,

Please use extack for errors.

> +			   "Queue depth out of range tx[%d-%d] rx[%d-%d]\n",
> +			   HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_TX_QUEUE_DEPTH,
> +			   HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_RX_QUEUE_DEPTH);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int hinic3_set_ringparam(struct net_device *netdev,
> +				struct ethtool_ringparam *ring,
> +				struct kernel_ethtool_ringparam *kernel_ring,
> +				struct netlink_ext_ack *extack)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_dyna_txrxq_params q_params = {};
> +	u32 new_sq_depth, new_rq_depth;
> +	int err;
> +
> +	err = hinic3_check_ringparam_valid(netdev, ring);
> +	if (err)
> +		return err;
> +
> +	new_sq_depth = 1U << ilog2(ring->tx_pending);
> +	new_rq_depth = 1U << ilog2(ring->rx_pending);
> +	if (new_sq_depth == nic_dev->q_params.sq_depth &&
> +	    new_rq_depth == nic_dev->q_params.rq_depth)
> +		return 0;
> +
> +	if (new_sq_depth != ring->tx_pending)
> +		netdev_info(netdev, "Requested Tx depth trimmed to %d\n",
> +			    new_sq_depth);

Please use extack for warnings.

> +	if (new_rq_depth != ring->rx_pending)
> +		netdev_info(netdev, "Requested Rx depth trimmed to %d\n",
> +			    new_rq_depth);
> +
> +	netdev_info(netdev, "Change Tx/Rx ring depth from %u/%u to %u/%u\n",
> +		    nic_dev->q_params.sq_depth, nic_dev->q_params.rq_depth,
> +		    new_sq_depth, new_rq_depth);
> +
> +	if (!netif_running(netdev)) {
> +		hinic3_update_qp_depth(netdev, new_sq_depth, new_rq_depth);
> +	} else {
> +		q_params = nic_dev->q_params;
> +		q_params.sq_depth = new_sq_depth;
> +		q_params.rq_depth = new_rq_depth;
> +
> +		err = hinic3_change_channel_settings(netdev, &q_params);
> +		if (err) {
> +			netdev_err(netdev, "Failed to change channel settings\n");
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static const struct ethtool_ops hinic3_ethtool_ops = {
>  	.supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
>  					  ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
> @@ -417,6 +516,8 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
>  	.get_msglevel                   = hinic3_get_msglevel,
>  	.set_msglevel                   = hinic3_set_msglevel,
>  	.get_link                       = ethtool_op_get_link,
> +	.get_ringparam                  = hinic3_get_ringparam,
> +	.set_ringparam                  = hinic3_set_ringparam,
>  };

> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> index 0a888fe4c975..3b470978714a 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> @@ -179,6 +179,8 @@ static int hinic3_sw_init(struct net_device *netdev)
>  	int err;
>  
>  	mutex_init(&nic_dev->port_state_mutex);
> +	mutex_init(&nic_dev->channel_cfg_lock);

Why do you need this mutex?
Aren't all the places you take it under rtnl_lock anyway?

> +	spin_lock_init(&nic_dev->channel_res_lock);
>  
>  	nic_dev->q_params.sq_depth = HINIC3_SQ_DEPTH;
>  	nic_dev->q_params.rq_depth = HINIC3_RQ_DEPTH;
> @@ -314,6 +316,15 @@ static void hinic3_link_status_change(struct net_device *netdev,
>  				      bool link_status_up)
>  {
>  	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	unsigned long flags;
> +	bool valid;
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	valid = HINIC3_CHANNEL_RES_VALID(nic_dev);
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	if (!valid)

Why are you checking valid here? What if the state changes immediately
after unlocking?

> +		return;
>  
>  	if (link_status_up) {
>  		if (netif_carrier_ok(netdev))
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> index da73811641a9..d652a5ffdc2c 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> @@ -428,6 +428,85 @@ static void hinic3_vport_down(struct net_device *netdev)
>  	}
>  }
>  
> +int
> +hinic3_change_channel_settings(struct net_device *netdev,
> +			       struct hinic3_dyna_txrxq_params *trxq_params)
> +{
> +	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> +	struct hinic3_dyna_txrxq_params old_qp_params = {};
> +	struct hinic3_dyna_qp_params new_qp_params = {};
> +	struct hinic3_dyna_qp_params cur_qp_params = {};
> +	bool need_teardown = false;
> +	unsigned long flags;
> +	int err;
> +
> +	mutex_lock(&nic_dev->channel_cfg_lock);
> +
> +	hinic3_config_num_qps(netdev, trxq_params);
> +
> +	err = hinic3_alloc_channel_resources(netdev, &new_qp_params,
> +					     trxq_params);
> +	if (err) {
> +		netdev_err(netdev, "Failed to alloc channel resources\n");
> +		mutex_unlock(&nic_dev->channel_cfg_lock);
> +		return err;
> +	}
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
> +		need_teardown = true;
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	if (need_teardown) {
> +		hinic3_vport_down(netdev);
> +		hinic3_close_channel(netdev);
> +		hinic3_uninit_qps(nic_dev, &cur_qp_params);
> +		hinic3_free_channel_resources(netdev, &cur_qp_params,
> +					      &nic_dev->q_params);
> +	}
> +
> +	if (nic_dev->num_qp_irq > trxq_params->num_qps)
> +		hinic3_qp_irq_change(netdev, trxq_params->num_qps);
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	old_qp_params = nic_dev->q_params;
> +	nic_dev->q_params = *trxq_params;
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	hinic3_init_qps(nic_dev, &new_qp_params);
> +
> +	err = hinic3_open_channel(netdev);

This "open" function allocates Rx buffers, and fails if it couldn't get
even one. That's no good.

> +	if (err)
> +		goto err_uninit_qps;
> +
> +	err = hinic3_vport_up(netdev);
> +	if (err)
> +		goto err_close_channel;
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	mutex_unlock(&nic_dev->channel_cfg_lock);
> +
> +	return 0;
> +
> +err_close_channel:
> +	hinic3_close_channel(netdev);
> +err_uninit_qps:
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	nic_dev->q_params = old_qp_params;
> +	clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	hinic3_uninit_qps(nic_dev, &new_qp_params);
> +	hinic3_free_channel_resources(netdev, &new_qp_params, trxq_params);
> +
> +	mutex_unlock(&nic_dev->channel_cfg_lock);

AI says:

  Can this error path lead to memory corruption?
  If need_teardown was true, the old channel resources were freed earlier in
  the function. If hinic3_open_channel() or hinic3_vport_up() fails, the code
  jumps to err_uninit_qps and restores nic_dev->q_params = old_qp_params.
  However, it doesn't appear to re-allocate those old resources or mark the
  interface as down. Could a subsequent administrative teardown or network
  traffic dereference these freed pointers?

> +
>  static int hinic3_open(struct net_device *netdev)
>  {
>  	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> @@ -487,16 +566,33 @@ static int hinic3_close(struct net_device *netdev)
>  {
>  	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
>  	struct hinic3_dyna_qp_params qp_params;
> +	bool need_teardown = false;
> +	unsigned long flags;
>  
>  	if (!test_and_clear_bit(HINIC3_INTF_UP, &nic_dev->flags)) {
>  		netdev_dbg(netdev, "Netdev already close, do nothing\n");
>  		return 0;
>  	}
>  
> -	hinic3_vport_down(netdev);
> -	hinic3_close_channel(netdev);
> -	hinic3_uninit_qps(nic_dev, &qp_params);
> -	hinic3_free_channel_resources(netdev, &qp_params, &nic_dev->q_params);
> +	mutex_lock(&nic_dev->channel_cfg_lock);
> +
> +	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> +	if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
> +		need_teardown = true;
> +	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> +	if (need_teardown) {
> +		hinic3_vport_down(netdev);
> +		hinic3_close_channel(netdev);
> +		hinic3_uninit_qps(nic_dev, &qp_params);
> +		hinic3_free_channel_resources(netdev, &qp_params,
> +					      &nic_dev->q_params);
> +	}
> +
> +	hinic3_free_nicio_res(nic_dev);
> +	hinic3_destroy_num_qps(netdev);
> +
> +	mutex_unlock(&nic_dev->channel_cfg_lock);
>  
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> index 9502293ff710..55b280888ad8 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> @@ -10,6 +10,9 @@
>  #include "hinic3_hw_cfg.h"
>  #include "hinic3_hwdev.h"
>  #include "hinic3_mgmt_interface.h"
> +#include "hinic3_nic_io.h"
> +#include "hinic3_tx.h"
> +#include "hinic3_rx.h"
>  
>  #define HINIC3_VLAN_BITMAP_BYTE_SIZE(nic_dev)  (sizeof(*(nic_dev)->vlan_bitmap))
>  #define HINIC3_VLAN_BITMAP_SIZE(nic_dev)  \
> @@ -20,8 +23,13 @@ enum hinic3_flags {
>  	HINIC3_MAC_FILTER_CHANGED,
>  	HINIC3_RSS_ENABLE,
>  	HINIC3_UPDATE_MAC_FILTER,
> +	HINIC3_CHANGE_RES_INVALID,
>  };
>  
> +#define HINIC3_CHANNEL_RES_VALID(nic_dev) \
> +	(test_bit(HINIC3_INTF_UP, &(nic_dev)->flags) && \
> +	 !test_bit(HINIC3_CHANGE_RES_INVALID, &(nic_dev)->flags))

I don't get why you need to check both of these bits.
Can't there be one bit for "resources valid" ?
And it will only be set while device is up (of course) so no need to
also check UP (this way checking can be atomic without the spin lock).

>  enum hinic3_event_work_flags {
>  	HINIC3_EVENT_WORK_TX_TIMEOUT,
>  };
> @@ -129,6 +137,10 @@ struct hinic3_nic_dev {
>  	struct work_struct              rx_mode_work;
>  	/* lock for enable/disable port */
>  	struct mutex                    port_state_mutex;
> +	/* lock for channel configuration */
> +	struct mutex                    channel_cfg_lock;
> +	/* lock for channel resources */
> +	spinlock_t                      channel_res_lock;
>  
>  	struct list_head                uc_filter_list;
>  	struct list_head                mc_filter_list;
> @@ -143,6 +155,10 @@ struct hinic3_nic_dev {
>  
>  void hinic3_set_netdev_ops(struct net_device *netdev);
>  int hinic3_set_hw_features(struct net_device *netdev);
> +int
> +hinic3_change_channel_settings(struct net_device *netdev,
> +			       struct hinic3_dyna_txrxq_params *trxq_params);
> +
>  int hinic3_qps_irq_init(struct net_device *netdev);
>  void hinic3_qps_irq_uninit(struct net_device *netdev);
>  
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> index 12eefabcf1db..3791b9bc865b 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> @@ -14,6 +14,10 @@ struct hinic3_nic_dev;
>  #define HINIC3_RQ_WQEBB_SHIFT      3
>  #define HINIC3_SQ_WQEBB_SIZE       BIT(HINIC3_SQ_WQEBB_SHIFT)
>  
> +#define HINIC3_MAX_TX_QUEUE_DEPTH  65536
> +#define HINIC3_MAX_RX_QUEUE_DEPTH  16384
> +#define HINIC3_MIN_QUEUE_DEPTH     128
> +
>  /* ******************** RQ_CTRL ******************** */
>  enum hinic3_rq_wqe_type {
>  	HINIC3_NORMAL_RQ_WQE = 1,


^ permalink raw reply

* Re: [PATCH net v2] RDS: Fix memory leak in rds_rdma_extra_size()
From: Allison Henderson @ 2026-04-14  0:15 UTC (permalink / raw)
  To: Xiaobo Liu, David S. Miller
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-rdma, rds-devel, linux-kernel
In-Reply-To: <20260413070005.15272-1-cppcoffee@gmail.com>

On Mon, 2026-04-13 at 15:00 +0800, Xiaobo Liu wrote:
> Free iov->iov when copy_from_user() or page count validation fails
> in rds_rdma_extra_size().
> 
> This preserves the existing success path and avoids leaking the
> allocated iovec array on error.
> 
> Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com>

I think this looks good now.  Thanks Xiaobo.
Reviewed-by: Allison Henderson <achender@kernel.org>

> ---
>  net/rds/rdma.c | 28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index aa6465dc7..91a20c1e2 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -560,6 +560,7 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	struct rds_iovec *vec;
>  	struct rds_iovec __user *local_vec;
>  	int tot_pages = 0;
> +	int ret = 0;
>  	unsigned int nr_pages;
>  	unsigned int i;
>  
> @@ -578,16 +579,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  	vec = &iov->iov[0];
>  
>  	if (copy_from_user(vec, local_vec, args->nr_local *
> -			   sizeof(struct rds_iovec)))
> -		return -EFAULT;
> +			   sizeof(struct rds_iovec))) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
>  	iov->len = args->nr_local;
>  
>  	/* figure out the number of pages in the vector */
>  	for (i = 0; i < args->nr_local; i++, vec++) {
>  
>  		nr_pages = rds_pages_in_vec(vec);
> -		if (nr_pages == 0)
> -			return -EINVAL;
> +		if (nr_pages == 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  
>  		tot_pages += nr_pages;
>  
> @@ -595,11 +600,20 @@ int rds_rdma_extra_size(struct rds_rdma_args *args,
>  		 * nr_pages for one entry is limited to (UINT_MAX>>PAGE_SHIFT)+1,
>  		 * so tot_pages cannot overflow without first going negative.
>  		 */
> -		if (tot_pages < 0)
> -			return -EINVAL;
> +		if (tot_pages < 0) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
>  	}
>  
> -	return tot_pages * sizeof(struct scatterlist);
> +	ret = tot_pages * sizeof(struct scatterlist);
> +
> +out:
> +	if (ret < 0) {
> +		kfree(iov->iov);
> +		iov->iov = NULL;
> +	}
> +	return ret;
>  }
>  
>  /*


^ permalink raw reply

* Re: [PATCH net v3] netfilter: nft_set_rbtree: fix use count leak on transaction abort
From: Pablo Neira Ayuso @ 2026-04-14  0:11 UTC (permalink / raw)
  To: Marko Jevtic
  Cc: fw, netfilter-devel, phil, coreteam, davem, edumazet, kuba,
	pabeni, horms, netdev, linux-kernel
In-Reply-To: <20260412222801.34965-1-marko.jevtic@codereflect.io>

Hi,

On Mon, Apr 13, 2026 at 12:28:01AM +0200, Marko Jevtic wrote:
> nft_rbtree_abort() does not handle elements moved to the expired list
> by inline GC during __nft_rbtree_insert(). When inline GC encounters
> expired elements during overlap detection, it calls
> nft_rbtree_gc_elem_move() which deactivates element data (decrementing
> chain/object use counts), removes the element from the rbtree, and
> queues it for deferred freeing. On commit, these elements are freed
> via nft_rbtree_gc_queue(). On abort, however, the expired list is
> ignored entirely.
> 
> This leaves use counts permanently decremented after abort.

Yes, but that is expected?

Expired elements reside in priv->expired, these elements are already
deactivated, ie. removed from the rbtree and chain reference is
decremented.

From abort path, the deactivated element simply remains there until
there is a commit run that gets rid of it.

I can't make any sense of this bug report so far.

Why do you think there is a need to restore an expired element?

^ permalink raw reply

* Re: [PATCH net] sctp: fix missing encap_port propagation for GSO fragments
From: patchwork-bot+netdevbpf @ 2026-04-14  0:10 UTC (permalink / raw)
  To: Xin Long
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni, horms,
	marcelo.leitner
In-Reply-To: <ea65ed61b3598d8b4940f0170b9aa1762307e6c3.1776017631.git.lucien.xin@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 12 Apr 2026 14:13:51 -0400 you wrote:
> encap_port in SCTP_INPUT_CB(skb) is used by sctp_vtag_verify() for
> SCTP-over-UDP processing. In the GSO case, it is only set on the head
> skb, while fragment skbs leave it 0.
> 
> This results in fragment skbs seeing encap_port == 0, breaking
> SCTP-over-UDP connections.
> 
> [...]

Here is the summary with links:
  - [net] sctp: fix missing encap_port propagation for GSO fragments
    https://git.kernel.org/netdev/net/c/bf6f95ae3b8b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] sctp: disable BH before calling udp_tunnel_xmit_skb()
From: patchwork-bot+netdevbpf @ 2026-04-14  0:10 UTC (permalink / raw)
  To: Xin Long
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni, horms,
	marcelo.leitner, bestswngs
In-Reply-To: <c874a8548221dcd56ff03c65ba75a74e6cf99119.1776017727.git.lucien.xin@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 12 Apr 2026 14:15:27 -0400 you wrote:
> udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with
> BH disabled.  After commit 6f1a9140ecda ("add xmit recursion limit to
> tunnel xmit functions"), on the path:
> 
>   udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit()
> 
> dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU.
> 
> [...]

Here is the summary with links:
  - [net] sctp: disable BH before calling udp_tunnel_xmit_skb()
    https://git.kernel.org/netdev/net/c/2cd7e6971fc2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 0/3] net: airoha: Preliminary series to support multiple net_devices connected to the same GDM port
From: patchwork-bot+netdevbpf @ 2026-04-14  0:00 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, linux-arm-kernel,
	linux-mediatek, netdev, xuegang.lu
In-Reply-To: <20260412-airoha-multi-serdes-preliminary-patch-v1-0-08d5b670ca8f@kernel.org>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 12 Apr 2026 19:13:11 +0200 you wrote:
> EN7581 or AN7583 SoCs support connecting multiple external SerDes (e.g.
> Ethernet or USB SerDes) to GDM3 or GDM4 ports via a hw arbiter that
> manages the traffic in a TDM manner.
> This series introduces some preliminary changes necessary to introduce
> support for multiple net_devices connected to the same Frame Engine (FE)
> GDM port (GDM3 or GDM4).
> 
> [...]

Here is the summary with links:
  - [net-next,1/3] net: airoha: Rely on net_device pointer in airoha_dev_setup_tc_block signature
    https://git.kernel.org/netdev/net-next/c/360d745a5319
  - [net-next,2/3] net: airoha: Rely on net_device pointer in HTB callbacks
    https://git.kernel.org/netdev/net-next/c/8baf4bf72ef9
  - [net-next,3/3] net: airoha: Rely on net_device pointer in ETS callbacks
    https://git.kernel.org/netdev/net-next/c/ae32f80018f0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 2/5] selftests: ovpn: fail notification check on mismatch
From: Jakub Kicinski @ 2026-04-14  0:00 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: netdev, ralf, Sabrina Dubroca, Paolo Abeni, Andrew Lunn,
	David S. Miller, Eric Dumazet
In-Reply-To: <20260412221121.410365-3-antonio@openvpn.net>

On Mon, 13 Apr 2026 00:11:18 +0200 Antonio Quartulli wrote:
> compare_ntfs doesn't fail when expected and received notification
> streams diverge.
> 
> Fix this bug by tracking the diff exit status explicitly and return it
> to the caller so notification mismatches propagate as test failures.

Hm, this series nicely cleans up test_mark.sh failures
but test_tcp.sh now always fails on debug (slow) kernel
builds with:

# TAP version 13
# 1..12
# ok 1 setup network topology
# ok 2 run baseline data traffic
# ok 3 run LAN traffic behind peer1
# ok 4 run iperf throughput
# ok 5 run key rollout
# ok 6 query peers
# ok 7 query missing peer fails
# ok 8 peer lifecycle and key queries
# ok 9 delete peer while traffic
# ok 10 delete stale keys
# ok 11 check timeout behavior
# Checking notifications for peer 3... failed
# 1,9d0
# < {
# <   "name": "peer-del-ntf",
# <   "msg": {
# <     "peer": {
# <       "del-reason": "expired",
# <       "id": 12
# <     }
# <   }
# < }
# validate listener output for peer 3: command failed with rc=1: ovpn_compare_ntfs 3
# not ok 12 validate notification output
# # Totals: pass:11 fail:1 xfail:0 xpass:0 skip:0 error:0

Similar failure in test_symmetric_id_tcp.sh

Only the debug kernels tho, non-debug kernels seem to pass.
So probably some race / slowness.

More runs if you want to compare
https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-extra-dbg&test=test-tcp-sh
This series landed in net-next-2026-04-13--06-00

^ permalink raw reply

* Re: [PATCH net-next 5/5] selftests: ovpn: align command flow with TAP
From: Jakub Kicinski @ 2026-04-13 23:56 UTC (permalink / raw)
  To: antonio
  Cc: Jakub Kicinski, netdev, ralf, sd, pabeni, andrew+netdev, davem,
	edumazet
In-Reply-To: <20260412221121.410365-6-antonio@openvpn.net>

This is an AI-generated review of your patch. The human sending this
email says: "The second one looks legit, would you prefer to follow
up or respin?"
---
selftests: ovpn: align command flow with TAP

This commit restructures the ovpn selftests to use the TAP infrastructure,
splitting each test into stages with KTAP output, adding command wrappers,
and centralizing behavior control in common.sh.

> diff --git a/tools/testing/selftests/net/ovpn/common.sh b/tools/testing/selftests/net/ovpn/common.sh
> index 2b3f2e5c8cc9d..ec6fea37ceb39 100644
> --- a/tools/testing/selftests/net/ovpn/common.sh
> +++ b/tools/testing/selftests/net/ovpn/common.sh

[ ... ]

> @@ -152,8 +263,8 @@ ovpn_compare_ntfs() {
>  		received="${OVPN_TMP_JSONS[$1]}"
>  		diff_file=$(mktemp)
>  
> -		kill -TERM ${OVPN_LISTENER_PIDS[$1]} || true
> -		wait ${OVPN_LISTENER_PIDS[$1]} || true
> +		kill -TERM ${OVPN_LISTENER_PIDS[$1]} 2>/dev/null || true
> +		wait ${OVPN_LISTENER_PIDS[$1]} 2>/dev/null || true
>  		printf "Checking notifications for peer ${1}... "

Does this leave the reaped PID in OVPN_LISTENER_PIDS?

Later, ovpn_cleanup() calls ovpn_stop_listener() which iterates over the
array and issues another kill -TERM on the same PID. Since the process was
already reaped by the wait here, could the OS recycle the PID, potentially
causing the second kill to terminate an unrelated system process?
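
A minimal sketch of one way to avoid the stale PID, assuming
OVPN_LISTENER_PIDS is a plain bash indexed array. The helper name
stop_listener_once is hypothetical and does not appear in the patch:

```bash
# Hypothetical helper: stop a listener and forget its PID, so that a
# later cleanup pass cannot signal a PID the OS may have recycled.
declare -a OVPN_LISTENER_PIDS

stop_listener_once() {
	local idx="$1"
	local pid="${OVPN_LISTENER_PIDS[$idx]:-}"

	# Nothing recorded for this peer (or already stopped): do nothing.
	[ -n "$pid" ] || return 0

	kill -TERM "$pid" 2>/dev/null || true
	wait "$pid" 2>/dev/null || true

	# The process is reaped at this point, so drop the array entry.
	unset "OVPN_LISTENER_PIDS[$idx]"
}

# Usage: the second call finds no entry and sends no signal.
sleep 30 &
OVPN_LISTENER_PIDS[3]=$!
stop_listener_once 3
stop_listener_once 3
```

With the entry removed, a later loop over the array in the cleanup path
would simply skip this peer.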

[ ... ]

> +ovpn_cleanup_peer_ns() {
> +	local peer="$1"
> +
> +	if ! ip netns list | grep -qx "${peer}"; then
> +		return 0
> +	fi

Can this fail to match and permanently leak network namespaces?

When virtual interfaces are moved into namespaces during setup, ip netns list
often includes an internal ID in its output, such as:
peer0 (id: 0)

Because grep is using the -x flag to enforce an exact whole-line match,
it appears this will fail to match peer0 and return early, skipping the
subsequent namespace deletion.
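
The mismatch can be reproduced without touching real namespaces. The
sketch below feeds sample text shaped like ip netns list output to both
checks. The helper netns_listed is hypothetical (not from the patch) and
compares only the first whitespace-separated field:

```bash
# Hypothetical helper: succeed if a namespace named $1 appears in the
# `ip netns list`-style text on stdin, with or without an "(id: N)" suffix.
netns_listed() {
	awk -v ns="$1" '$1 == ns { found = 1 } END { exit !found }'
}

sample='peer0 (id: 0)
peer1'

# Whole-line match: "peer0 (id: 0)" is not equal to "peer0".
if printf '%s\n' "$sample" | grep -qx "peer0"; then
	echo "grep -x: peer0 found"
else
	echo "grep -x: peer0 missed"
fi

# First-field match: the "(id: 0)" suffix is ignored.
if printf '%s\n' "$sample" | netns_listed "peer0"; then
	echo "first field: peer0 found"
else
	echo "first field: peer0 missed"
fi
```

Comparing the first field exactly, rather than matching a prefix, keeps
"peer1" from matching a namespace such as "peer10".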

[ ... ]

> diff --git a/tools/testing/selftests/net/ovpn/test-close-socket.sh b/tools/testing/selftests/net/ovpn/test-close-socket.sh
> index 4f0367c60fdaf..e97affe727383 100755
> --- a/tools/testing/selftests/net/ovpn/test-close-socket.sh
> +++ b/tools/testing/selftests/net/ovpn/test-close-socket.sh
> @@ -9,37 +9,72 @@ set -e
>  
>  source ./common.sh

Is test-close-socket.sh missing the set -eE flag and the ovpn_stage_err
ERR trap?

The other test scripts were updated to use set -eE and register the trap,
which handles emitting the KTAP failure directive. If a command fails here,
will set -e cause the script to exit immediately and bypass the KTAP
failure reporting entirely?
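
To illustrate why the -E flag and the ERR trap matter together, here is a
small standalone sketch. It does not use ovpn_stage_err; the trap body is
a stand-in that prints a KTAP-style failure line. Each variant runs in
its own bash process so the caller's shell options do not interfere:

```bash
# Script body shared by both variants: a failing command inside a
# function, with an ERR trap that prints a KTAP-style failure line.
demo='
trap "echo \"not ok 1 stage failed\"" ERR
failing_step() { false; }
failing_step
echo "unreachable"
'

# With -E the ERR trap is inherited by shell functions.
with_E=$(bash -c "set -eE; $demo" 2>&1 || true)

# With plain -e the trap is not inherited, and the shell exits from
# inside the function before any trap at the top level can run.
without_E=$(bash -c "set -e; $demo" 2>&1 || true)

echo "set -eE output: '${with_E}'"
echo "set -e  output: '${without_E}'"
```

A script that keeps plain set -e and registers no trap at all would be in
the same position as the second variant: it exits on the first failure
without emitting a failure line of its own.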

^ permalink raw reply

* Re: [PATCH net-next v18 00/15] Begin upstreaming Homa transport protocol
From: John Ousterhout @ 2026-04-13 23:54 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, pabeni, edumazet, horms
In-Reply-To: <20260412134531.21341692@kernel.org>

On Sun, Apr 12, 2026 at 1:45 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 10 Apr 2026 13:02:54 -0700 John Ousterhout wrote:
> > This patch series begins the process of upstreaming the Homa transport
> > protocol. Homa is an alternative to TCP for use in datacenter
> > environments. It provides 10-100x reductions in tail latency for short
> > messages relative to TCP. Its benefits are greatest for mixed workloads
> > containing both short and long messages running under high network loads.
> > Homa is not API-compatible with TCP: it is connectionless and message-
> > oriented (but still reliable and flow-controlled). Homa's new API not
> > only contributes to its performance gains, but it also eliminates the
> > massive amount of connection state required by TCP for highly connected
> > datacenter workloads (Homa uses ~ 1 socket per application, whereas
> > TCP requires a separate socket for each peer).
>
> make coccicheck says:
>
> net/homa/homa_peer.c:213:21-22: WARNING opportunity for swap()

Fixed.

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: dsa: mxl862xx: add statistics support
From: patchwork-bot+netdevbpf @ 2026-04-13 23:50 UTC (permalink / raw)
  To: Daniel Golle
  Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, linux, netdev,
	linux-kernel, frankwu, chad, cezary.wilmanski, lxu, yweng, jverdu,
	ajayaraman, john
In-Reply-To: <cover.1775951347.git.daniel@makrotopia.org>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 12 Apr 2026 01:01:48 +0100 you wrote:
> Add per-port RMON statistics support for the MxL862xx DSA driver,
> covering hardware-specific ethtool -S counters, standard IEEE 802.3
> MAC/ctrl/pause statistics, and rtnl_link_stats64 via polled 64-bit
> accumulation.
> 
> Changes since v1:
>  * trim mxl862xx_mib[] to counters not covered elsewhere only
>  * remove histogram counters (moved to .get_rmon_stats)
>  * remove RMON error counters (moved to .get_rmon_stats)
>  * remove counters already in .get_eth_mac_stats
>  * remove counters already in .get_stats64
>  * add mxl862xx_rmon_ranges[] and mxl862xx_get_rmon_stats()
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/2] net: dsa: mxl862xx: add ethtool statistics support
    https://git.kernel.org/netdev/net-next/c/e6295d124644
  - [net-next,v2,2/2] net: dsa: mxl862xx: implement .get_stats64
    https://git.kernel.org/netdev/net-next/c/a21d33a5265f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net: phy: realtek: use LEDCR page number define on RTL8211F
From: patchwork-bot+netdevbpf @ 2026-04-13 23:50 UTC (permalink / raw)
  To: Aleksander Jan Bajkowski
  Cc: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni, daniel,
	vladimir.oltean, michael, daniel.braunwarth, ih, rmk+kernel,
	netdev, linux-kernel
In-Reply-To: <20260411105150.184577-1-olek2@wp.pl>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 11 Apr 2026 12:51:45 +0200 you wrote:
> Replace the magic number with an existing define for the LEDCR
> register page number on the RTL8211F.
> 
> Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
> ---
>  drivers/net/phy/realtek/realtek_main.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Here is the summary with links:
  - net: phy: realtek: use LEDCR page number define on RTL8211F
    https://git.kernel.org/netdev/net-next/c/43a2deae3661

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Jacob Keller @ 2026-04-13 23:43 UTC (permalink / raw)
  To: Petr Oros, netdev
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Aleksandr Loktionov,
	Nikolay Aleksandrov, Daniel Zahka, Paul Greenwalt, Dave Ertman,
	Michal Swiatkowski, intel-wired-lan, linux-kernel
In-Reply-To: <20260413191420.3524013-1-poros@redhat.com>

On 4/13/2026 12:14 PM, Petr Oros wrote:
> On certain E810 configurations where firmware supports Tx scheduler
> topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo()
> may need to apply a new 5-layer or 9-layer topology from the DDP
> package. If the AQ command to set the topology fails (e.g. due to
> invalid DDP data or firmware limitations), the global configuration
> lock must still be cleared via a CORER reset.
> 
> Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
> scheduler config fails") correctly fixed this by refactoring
> ice_cfg_tx_topo() to always trigger CORER after acquiring the global
> lock and re-initialize hardware via ice_init_hw() afterwards.
> 
> However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
> of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
> breaking the reinit path introduced by 86aae43f21cf. This creates an
> infinite recursive call chain:
> 
>   ice_init_hw()
>     ice_init_dev_hw()
>       ice_cfg_tx_topo()         # topology change needed
>         ice_deinit_hw()
>         ice_init_hw()           # reinit after CORER
>           ice_init_dev_hw()     # recurse
>             ice_cfg_tx_topo()
>               ...               # stack overflow
> 

Oof, ya that's not good. I guess this only happens if the topology needs
to change, so it wouldn't affect many systems where we had already
changed the topology beforehand on the old driver.

> Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
> it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
> caller, ice_cfg_tx_topo(), intentionally does not need ice_init_dev_hw()
> during its reinit, it only needs the core HW reinitialization. This
> breaks the recursion cleanly without adding flags or guards.
> 
> The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
> ice_deinit_dev() to the end of deinit paths") which fixed slow rmmod
> are preserved, only the init-side placement of ice_init_dev_hw() is
> reverted.
> 
> Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths")
> Signed-off-by: Petr Oros <poros@redhat.com>

The fix looks correct to me, and definitely the most elegant.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply

* Re: [PATCH v1 net] tcp: Don't set treq->req_usec_ts in cookie_tcp_reqsk_init().
From: patchwork-bot+netdevbpf @ 2026-04-13 23:10 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: edumazet, ncardwell, davem, kuba, pabeni, horms, kuni1840, netdev
In-Reply-To: <20260410235328.1773449-1-kuniyu@google.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 23:53:27 +0000 you wrote:
> Commit de5626b95e13 ("tcp: Factorise cookie-independent fields
> initialisation in cookie_v[46]_check().") miscategorised
> tcp_rsk(req)->req_usec_ts init to cookie_tcp_reqsk_init(),
> which is used by both BPF/non-BPF SYN cookie reqsk.
> 
> Rather, it should have been moved to cookie_tcp_reqsk_alloc() by
> commit 8e7bab6b9652 ("tcp: Factorise cookie-dependent fields
> initialisation in cookie_v[46]_check()") so that only non-BPF SYN
> cookie sets tcp_rsk(req)->req_usec_ts to false.
> 
> [...]

Here is the summary with links:
  - [v1,net] tcp: Don't set treq->req_usec_ts in cookie_tcp_reqsk_init().
    https://git.kernel.org/netdev/net/c/c058bbf05b11

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply

* Re: [PATCH net] bnge: return after auxiliary_device_uninit() in error path
From: patchwork-bot+netdevbpf @ 2026-04-13 23:10 UTC (permalink / raw)
  To: Greg KH
  Cc: netdev, linux-kernel, vikas.gupta, andrew+netdev, davem, edumazet,
	kuba, pabeni, stable
In-Reply-To: <2026041124-banshee-molecular-0f70@gregkh>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 11 Apr 2026 12:45:25 +0200 you wrote:
> When auxiliary_device_add() fails, the error block calls
> auxiliary_device_uninit() but does not return.  The uninit drops the
> last reference and synchronously runs bnge_aux_dev_release(), which sets
> bd->auxr_dev = NULL and frees the underlying object.  The subsequent
> bd->auxr_dev->net = bd->netdev then dereferences NULL, which is not a
> good thing to have happen when trying to clean up from an error.
> 
> [...]

Here is the summary with links:
  - [net] bnge: return after auxiliary_device_uninit() in error path
    https://git.kernel.org/netdev/net/c/8b0c25528cb6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net/sched: taprio: fix use-after-free in advance_sched() on schedule switch
From: Jakub Kicinski @ 2026-04-13 23:01 UTC (permalink / raw)
  To: vinicius.gomes
  Cc: Jakub Kicinski, vladimir.oltean, jhs, jiri, davem, edumazet,
	pabeni, horms, netdev, linux-kernel, qjx1298677004
In-Reply-To: <20260410-taprio-user-after-free-fix-net-v1-1-0931b1579988@intel.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/sched: taprio: fix use-after-free in advance_sched() on schedule switch

This commit updates advance_sched() to select the next entry from the new
oper schedule immediately after switch_schedules() returns. It relies on the
pre-calculated end_time of the new schedule's first entry instead of
overwriting it with the admin schedule's base time.

> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index f721c03514f60..0316f2dee06ac 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c

[ ... ]

> @@ -972,11 +972,12 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>  	}
>  
>  	if (should_change_schedules(admin, oper, end_time)) {
> -		/* Set things so the next time this runs, the new
> -		 * schedule runs.
> -		 */
> -		end_time = sched_base_time(admin);

Does removing this assignment introduce a regression where the new schedule
activates prematurely?

Since the new schedule's first entry has its end_time pre-calculated
as base_time + interval, if we activate it early (before its actual
base_time), does it stretch the entry backwards in time?

Could this cause its gate mask to apply during the time gap, violating the
strictly configured gate durations?

>  		switch_schedules(q, &admin, &oper);

Can this sequence cause a use-after-free regression due to the RCU grace
period starting too early?

Inside switch_schedules(), call_rcu() is invoked on the old oper
schedule. However, q->current_entry is not updated until later in
advance_sched() at the first_run label:

first_run:
	rcu_assign_pointer(q->current_entry, next);

If a concurrent reader calls rcu_dereference(q->current_entry) between
switch_schedules() and this assignment, can it acquire a pointer to the
old schedule after call_rcu() has already started? This would mean the
grace period will not wait for this new reader.

As a separate pre-existing issue, can switch_schedules() be called
concurrently by lockless readers?

If advance_sched() calls it under q->current_entry_lock, but a concurrent
reader like get_packet_txtime() evaluates the switch condition and calls
switch_schedules() without holding locks, can they race?

Could this lead to call_rcu() being called twice on the same old oper
object, potentially corrupting the RCU callback list?

> +		/* After changing schedules, the next entry is the first one
> +		 * in the new schedule, with a pre-calculated end_time.
> +		 */
> +		next = list_first_entry(&oper->entries, struct sched_entry, list);
> +		end_time = next->end_time;
>  	}
>  
>  	next->end_time = end_time;

^ permalink raw reply

* Re: [PATCH v2] udp: Force compute_score to always inline
From: patchwork-bot+netdevbpf @ 2026-04-13 23:00 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	kuniyu, horms, netdev, willemb
In-Reply-To: <20260410155936.654915-1-krisman@suse.de>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Apr 2026 11:59:36 -0400 you wrote:
> Back in 2024 I reported a 7-12% regression on an iperf3 UDP loopback
> throughput test that we traced to the extra overhead of calling
> compute_score in two places, introduced by commit f0ea27e7bfe1 ("udp:
> re-score reuseport groups when connected sockets are present").  At the
> time, I pointed out the overhead was caused by the multiple calls,
> associated with cpu-specific mitigations, and merged commit
> 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites") to
> jump back explicitly, to force the rescore call in a single place.
> 
> [...]

Here is the summary with links:
  - [v2] udp: Force compute_score to always inline
    https://git.kernel.org/netdev/net-next/c/b80a95ccf160

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()
From: Jakub Kicinski @ 2026-04-13 22:57 UTC (permalink / raw)
  To: lorenzo
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260410-airoha_qdma_cleanup_tx_queue-fix-net-v1-1-b7171c8f1e78@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()

This commit resets DMA TX descriptors in airoha_qdma_cleanup_tx_queue
and synchronizes TX_DMA_IDX to TX_CPU_IDX to notify the hardware that
the ring is empty.

> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 9285a68f435fe..963ab7b8d166b 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1044,13 +1044,17 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
[ ... ]
>  	spin_lock_bh(&q->lock);
>  	for (i = 0; i < q->ndesc; i++) {
> -		struct airoha_queue_entry *e = &q->entry[i];
> +		struct airoha_qdma_desc *desc = &q->desc[i];
>  
> +		e = &q->entry[i];

Can this result in a NULL pointer dereference?

If airoha_qdma_init_tx_queue() sets q->ndesc before allocating q->entry and
q->desc, and devm_kzalloc() fails, the error path jumps to
airoha_hw_cleanup().

Because q->ndesc > 0, this cleanup loop would execute and dereference
q->entry[i] and q->desc[i], which may not have been allocated.

[ ... ]
> +		/* Reset DMA descriptor */
> +		WRITE_ONCE(desc->ctrl, 0);
> +		WRITE_ONCE(desc->addr, 0);
> +		WRITE_ONCE(desc->data, 0);
> +		WRITE_ONCE(desc->msg0, 0);
> +		WRITE_ONCE(desc->msg1, 0);
> +		WRITE_ONCE(desc->msg2, 0);

Is it safe to modify these DMA descriptors and reset the ring indices below
without waiting for the hardware DMA engine to become idle?

In airoha_dev_stop(), the driver clears GLOBAL_CFG_TX_DMA_EN_MASK to request
the hardware DMA engine to stop, but it does not appear to poll
GLOBAL_CFG_TX_DMA_BUSY_MASK to ensure it has actually stopped.

Modifying the DMA descriptors and internal ring indices while the hardware
is still actively processing in-flight descriptors could potentially lead
to corrupted DMA fetches or IOMMU faults.

>  		q->queued--;
>  	}
> +
> +	e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
> +			     list);
> +	index = e - q->entry;
> +	/* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is
> +	 * empty.
> +	 */
> +	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
> +			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
> +	airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK,
> +			FIELD_PREP(TX_RING_DMA_IDX_MASK, index));

Could list_first_entry() be called on an empty list here?

If q->tx_list is empty (for example, if allocation failed in
airoha_qdma_init_tx_queue or if no entries had a valid dma_addr), calling
list_first_entry() returns an invalid pointer.

The subsequent pointer arithmetic would produce a garbage index value,
which is then written to the REG_TX_CPU_IDX and REG_TX_DMA_IDX registers.
-- 
pw-bot: cr

^ permalink raw reply

* Re: [PATCH net-next v3 00/12] BIG TCP for UDP tunnels
From: Jakub Kicinski @ 2026-04-13 22:55 UTC (permalink / raw)
  To: Alice Mikityanska
  Cc: Daniel Borkmann, David S. Miller, Eric Dumazet, Paolo Abeni,
	Xin Long, Willem de Bruijn, David Ahern, Nikolay Aleksandrov,
	Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260410150943.993350-1-alice.kernel@fastmail.im>

On Fri, 10 Apr 2026 18:09:31 +0300 Alice Mikityanska wrote:
> This series is a follow-up to "BIG TCP without HBH in IPv6", and it adds
> support for BIG TCP IPv4/IPv6 workloads in vxlan and geneve. Now that
> IPv6 BIG TCP doesn't require stripping the HBH in all various
> combinations in tunneled traffic, adding BIG TCP becomes feasible.

No longer applies, sorry :( 
We'll have to revisit after the merge window.
-- 
pw-bot: cr

^ permalink raw reply
