Netdev List


* Re: [PATCH] vhost/net: fix clear_user start address in VHOST_GET_FEATURES_ARRAY
From: Eugenio Perez Martin @ 2026-06-25 13:56 UTC
  To: rom.wang
  Cc: Michael S . Tsirkin, Jason Wang, Paolo Abeni, kvm, virtualization,
	netdev, linux-kernel, Yufeng Wang
In-Reply-To: <CAJaqyWcFm0A5ucL5TLP8+T8JNOiZyaL-_mb747_fKhH9Qm83ig@mail.gmail.com>

On Thu, Jun 25, 2026 at 3:48 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, May 26, 2026 at 10:04 AM rom.wang <r4o5m6e8o@163.com> wrote:
> >
> > From: Yufeng Wang <wangyufeng@kylinos.cn>
> >
> > The clear_user() call in VHOST_GET_FEATURES_ARRAY incorrectly starts
> > at argp, which is the beginning of the features array, overwriting the
> > data just written by copy_to_user(). It should start after the copied
> > elements at argp + copied * sizeof(u64) to only zero the trailing
> > unused space.
> >
> > Fixes: 333c515d1896 ("vhost-net: allow configuring extended features")
> > Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
> > ---
> >  drivers/vhost/net.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index db341c922673..70c578acf840 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -1777,7 +1777,8 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> >                         return -EFAULT;
> >
> >                 /* Zero the trailing space provided by user-space, if any */
> > -               if (clear_user(argp, size_mul(count - copied, sizeof(u64))))
> > +               if (clear_user(argp + copied * sizeof(u64),
> > +                              size_mul(count - copied, sizeof(u64))))
>
> The fix looks good to me, but why not use the size_mul() macro for the
> copied * sizeof(u64) multiplication?
>

Also, could you add a new switch to tools/virtio/vhost_net_test.c to
use the VHOST_GET_FEATURES_ARRAY and VHOST_SET_FEATURES_ARRAY instead
of VHOST_GET_FEATURES and VHOST_SET_FEATURES?

> >                         return -EFAULT;
> >                 return 0;
> >         case VHOST_SET_FEATURES_ARRAY:
> > --
> > 2.34.1
> >
> >
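A user-space sketch of the corrected arithmetic, written with an overflow-checked multiply (as suggested above) for both the start offset and the length. memcpy() and memset() stand in for copy_to_user() and clear_user(), and size_mul_sat() is a hypothetical helper assumed to mirror size_mul()'s saturate-to-SIZE_MAX behaviour; none of this is the vhost code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's size_mul():
 * multiply two sizes and saturate to SIZE_MAX instead of wrapping. */
static size_t size_mul_sat(size_t a, size_t b)
{
	size_t r;

	if (__builtin_mul_overflow(a, b, &r))
		return SIZE_MAX;
	return r;
}

/* Model of the fixed GET path: the user buffer has room for 'count'
 * 64-bit words, 'copied' of them receive feature data, and only the
 * trailing count - copied words are zeroed. */
static void get_features_array(uint64_t *user_buf, size_t count,
			       const uint64_t *features, size_t copied)
{
	unsigned char *argp = (unsigned char *)user_buf;

	assert(copied <= count);
	memcpy(argp, features, size_mul_sat(copied, sizeof(uint64_t)));
	/* Start clearing after the copied elements, not at argp. */
	memset(argp + size_mul_sat(copied, sizeof(uint64_t)), 0,
	       size_mul_sat(count - copied, sizeof(uint64_t)));
}
```

With count = 4 and copied = 2, the first two words keep the copied features and only words 2 and 3 are zeroed; clearing from argp instead would wipe the features that were just copied.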



* Re: [PATCH] vhost/net: fix clear_user start address in VHOST_GET_FEATURES_ARRAY
From: Eugenio Perez Martin @ 2026-06-25 13:48 UTC
  To: rom.wang
  Cc: Michael S . Tsirkin, Jason Wang, Paolo Abeni, kvm, virtualization,
	netdev, linux-kernel, Yufeng Wang
In-Reply-To: <20260526080336.61296-1-r4o5m6e8o@163.com>

On Tue, May 26, 2026 at 10:04 AM rom.wang <r4o5m6e8o@163.com> wrote:
>
> From: Yufeng Wang <wangyufeng@kylinos.cn>
>
> The clear_user() call in VHOST_GET_FEATURES_ARRAY incorrectly starts
> at argp, which is the beginning of the features array, overwriting the
> data just written by copy_to_user(). It should start after the copied
> elements at argp + copied * sizeof(u64) to only zero the trailing
> unused space.
>
> Fixes: 333c515d1896 ("vhost-net: allow configuring extended features")
> Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
> ---
>  drivers/vhost/net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index db341c922673..70c578acf840 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1777,7 +1777,8 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>                         return -EFAULT;
>
>                 /* Zero the trailing space provided by user-space, if any */
> -               if (clear_user(argp, size_mul(count - copied, sizeof(u64))))
> +               if (clear_user(argp + copied * sizeof(u64),
> +                              size_mul(count - copied, sizeof(u64))))

The fix looks good to me, but why not use the size_mul() macro for the
copied * sizeof(u64) multiplication?

>                         return -EFAULT;
>                 return 0;
>         case VHOST_SET_FEATURES_ARRAY:
> --
> 2.34.1
>
>



* Re: [PATCH v2 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: Jason Xing @ 2026-06-25 13:40 UTC
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare
In-Reply-To: <20260625115215.1101928-1-maciej.fijalkowski@intel.com>

On Thu, Jun 25, 2026 at 7:52 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
>
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
>
> Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
> and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
> address generation, preserving the BIDIRECTIONAL test's intent of using
> the proper logical UMEM half after the direction switch.
>
> Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
> Reviewed-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
> Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

Oh, you've already pushed a v2 patch, so again:

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks!


* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Jason Xing @ 2026-06-25 13:38 UTC
  To: Maciej Fijalkowski
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare
In-Reply-To: <20260623091008.1046547-1-maciej.fijalkowski@intel.com>

On Tue, Jun 23, 2026 at 5:10 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
>
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
>
> Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
> and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
> address generation, preserving the BIDIRECTIONAL test's intent of using
> the proper logical UMEM half after the direction switch.
>
> Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
> Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks!


* [PATCH v1 1/1] ocp: Add I2C control support for Adva TimeCard
From: Sagi Maimon @ 2026-06-25 13:38 UTC
  To: jonathan.lemon, vadim.fedorenko, richardcochran, andrew+netdev,
	davem, edumazet, kuba, pabeni
  Cc: linux-kernel, netdev, Sagi Maimon

- Load the i2c-dev module to expose /dev/i2c-N character devices
- Add sysfs-based I2C bus control for the Adva TimeCard model

Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
---
 drivers/ptp/ptp_ocp.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/ptp/ptp_ocp.c b/drivers/ptp/ptp_ocp.c
index 35e911f1ad78..1b4ccb4feca5 100644
--- a/drivers/ptp/ptp_ocp.c
+++ b/drivers/ptp/ptp_ocp.c
@@ -4224,6 +4224,34 @@ static const struct ocp_attr_group art_timecard_groups[] = {
 	{ },
 };
 
+static ssize_t
+i2c_bus_ctrl_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct ptp_ocp *bp = dev_get_drvdata(dev);
+
+	if (!bp->pps_select)
+		return -ENODEV;
+	return sysfs_emit(buf, "0x%08x\n",
+			  ioread32(&bp->pps_select->__pad1));
+}
+
+static ssize_t
+i2c_bus_ctrl_store(struct device *dev, struct device_attribute *attr,
+		   const char *buf, size_t count)
+{
+	struct ptp_ocp *bp = dev_get_drvdata(dev);
+	u32 val;
+
+	if (!bp->pps_select)
+		return -ENODEV;
+	if (kstrtou32(buf, 0, &val))
+		return -EINVAL;
+	iowrite32(val, &bp->pps_select->__pad1);
+	return count;
+}
+
+static DEVICE_ATTR_RW(i2c_bus_ctrl);
+
 static struct attribute *adva_timecard_attrs[] = {
 	&dev_attr_serialnum.attr,
 	&dev_attr_gnss_sync.attr,
@@ -4272,6 +4300,7 @@ static struct attribute *adva_timecard_x1_attrs[] = {
 	&dev_attr_ts_window_adjust.attr,
 	&dev_attr_utc_tai_offset.attr,
 	&dev_attr_tod_correction.attr,
+	&dev_attr_i2c_bus_ctrl.attr,
 	NULL,
 };
 
@@ -5235,6 +5264,7 @@ ptp_ocp_init(void)
 	const char *what;
 	int err;
 
+	request_module("i2c-dev");
 	ptp_ocp_debugfs_init();
 
 	what = "timecard class";
-- 
2.47.0
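A user-space sketch of driving the new attribute. The sysfs path is an assumption (the attribute is added to the timecard device, so it is expected somewhere like /sys/class/timecard/ocp0/i2c_bus_ctrl; check the actual path on the target), and the helper names are hypothetical. The value format follows the show handler's "0x%08x\n" output and the store handler's kstrtou32(buf, 0, ...) parsing; the meaning of the register bits is not described in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write val in the "0x%08x" form that i2c_bus_ctrl_show() emits; base-0
 * parsing in the store handler accepts the 0x prefix. */
static int i2c_bus_ctrl_write(const char *path, uint32_t val)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "0x%08x\n", (unsigned int)val) > 0;
	/* Buffered output reaches the file (or store handler) at fclose(). */
	if (fclose(f) != 0 || !ok)
		return -1;
	return 0;
}

/* Read the attribute back and parse it with base 0 (auto-detects 0x). */
static int i2c_bus_ctrl_read(const char *path, uint32_t *val)
{
	char buf[32];
	char *end;
	unsigned long v;
	FILE *f;

	assert(path != NULL && val != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	v = strtoul(buf, &end, 0);
	if (end == buf || v > UINT32_MAX)
		return -1;
	*val = (uint32_t)v;
	return 0;
}
```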



* Re: [PATCH net] mlxsw: spectrum_acl_erp: Fix const qualifier of delta_clear()
From: Petr Machata @ 2026-06-25 13:27 UTC
  To: Evgenii Burenchev
  Cc: stable, Greg Kroah-Hartman, idosch, petrm, andrew+netdev, davem,
	edumazet, kuba, pabeni, jiri, netdev, linux-kernel, lvc-project
In-Reply-To: <20260625114831.17386-1-evg28bur@yandex.ru>


Evgenii Burenchev <evg28bur@yandex.ru> writes:

> mlxsw_sp_acl_erp_delta_clear() takes 'const char *enc_key' but modifies
> the memory it points to. This is a logical error in the function
> declaration.
>
> The only caller passes a non-const buffer (aentry->ht_key.enc_key), so
> the const qualifier is misleading and unnecessary.
>
> Remove const from the enc_key parameter to match the actual usage.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
> Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>

Dunno how much this is net material: there's no bug to be fixed,
it's a source code cleanliness improvement. But the patch is correct.

Reviewed-by: Petr Machata <petrm@nvidia.com>
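A hypothetical helper (not the mlxsw function) illustrating the point of the patch: a function that stores through its key pointer needs a non-const parameter, and declaring it as const char * would turn the assignment below into a compile error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: clear the bits in 'mask' from byte 'idx' of the key.
 * The store through enc_key is why the parameter cannot be const. */
static void key_clear_bits(char *enc_key, size_t idx, unsigned char mask)
{
	assert(enc_key != NULL);
	enc_key[idx] = (char)(enc_key[idx] & ~mask);
}
```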


* [PATCH nf-next] netfilter: remove redundant null check before kvfree()
From: Subasri S @ 2026-06-25 13:31 UTC
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: netfilter-devel, coreteam, netdev, linux-kernel, Subasri S

kvfree() internally performs a NULL check on the pointer
handed to it and takes no action if it is NULL.
Hence there is no need to check the pointer before
handing it to kvfree().

Issue reported by the ifnullfree.cocci Coccinelle semantic
patch script.

Signed-off-by: Subasri S <subasris1210@gmail.com>
---
 net/netfilter/nft_set_rbtree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 018bbb6df4ce..efc25e788a1c 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -544,8 +544,7 @@ static int nft_array_intervals_alloc(struct nft_array *array, u32 max_intervals)
 	if (!intervals)
 		return -ENOMEM;
 
-	if (array->intervals)
-		kvfree(array->intervals);
+	kvfree(array->intervals);
 
 	array->intervals = intervals;
 	array->max_intervals = max_intervals;
-- 
2.43.0
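A user-space analogue of the simplified code, assuming free() as a stand-in for kvfree() (the C standard defines free(NULL) as a no-op, just as kvfree(NULL) is in the kernel). struct interval_array and intervals_alloc() are hypothetical; the real nft_array_intervals_alloc() may do more than the lines shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nft_array. */
struct interval_array {
	int *intervals;
	unsigned int max_intervals;
};

/* Replace the interval storage with a new zeroed array of 'max' slots.
 * The old pointer is freed unconditionally: free(NULL) does nothing. */
static int intervals_alloc(struct interval_array *array, unsigned int max)
{
	int *intervals;

	assert(array != NULL && max > 0);
	intervals = calloc(max, sizeof(*intervals));
	if (!intervals)
		return -ENOMEM;

	free(array->intervals);	/* safe even when array->intervals == NULL */
	array->intervals = intervals;
	array->max_intervals = max;
	return 0;
}
```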



* Re: [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM)
From: xietangxin @ 2026-06-25 13:22 UTC
  To: Menglong Dong
  Cc: edumazet, davem, kuba, pabeni, jmaloy, menglong8.dong, kuniyu,
	horms, willemb, netdev, linux-kernel, linux-stable
In-Reply-To: <g7DvITFWSiW9AoI49uyghw@linux.dev>



On 6/8/2026 7:55 PM, Menglong Dong wrote:
> On 2026/6/4 16:22 xietangxin <xietangxin@yeah.net> wrote:
>> Hi all,
>>
>> We have observed a TCP connection deadlock on stable 6.6 under heavy stress testing.
>>
>> 1. Both Peer A and Peer B enter the ICSK_ACK_NOMEM branch in tcp_select_window().
>> After commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
>> both peers freeze their rcv_nxt and set rcv_wnd = 0.
>>
>> 2. Before freezing, both sides already had data in flight.
>> Since both sides are dropping incoming data packets due to OOM, rcv_nxt stops advancing,
>> but the sequence numbers of the peer's subsequent packets keep growing.
>>
>> 3. When Peer A receives Peer B's Zero Window ACK,
>> the packet's seq is far ahead of Peer A's frozen rcv_nxt.
>> Both peers drop each other's packets, and no Zero Window Probes are triggered
>> because snd_wnd is never updated to 0.
>>
> 
> Hi,
> 
> The problem you addressed is already fixed in this commit:
> 0e24d17bd966 ("tcp: implement RFC 7323 window retraction receiver requirements"),
> which hasn't been picked to the 6.6 branch.
> 
> That patch doesn't have a Fixes tag, so I'm not sure if it will be picked
> up for the 6.6 branch. Just CC'ing linux-stable :)
> 
> Thanks!
> Menglong Dong
> 
>>
>> Simplified Packet Trace:
>>
>> Assume Peer A's rcv_nxt = 1000, and Peer B's rcv_nxt = 5000 initially.
>>
>> Time  Dir      Type        Seq   Ack   Win  Len  Status
>> ------------------------------------------------------------------------
>> T1:   B -> A   [PSH, ACK]  1000  5000  3000 100  (A hits OOM, rcv_nxt=1000)
>> T2:   B -> A   [ACK]       1100  5000  3000 200  (Dropped due to A's OOM)
>> T3:   B -> A   [PSH, ACK]  1300  5000  3000 200  (Dropped due to A's OOM)
>>
>> T4:   A -> B   [PSH, ACK]  5000  1000  3000 100  (B hits OOM, rcv_nxt=5000)
>> T5:   A -> B   [ACK]       5100  1000  3000 200  (Dropped due to B's OOM)
>> T6:   A -> B   [PSH, ACK]  5300  1000  3000 200  (Dropped due to B's OOM)
>>
>> -- Both sides are now in OOM. B's Seq is 1500; A's Seq is 5500 --
>>
>> T7:   B -> A   [ZeroWin]   1500  5000  0    0    (Dropped: Seq 1500 != 1000)
>> T8:   A -> B   [ZeroWin]   5500  1000  0    0    (Dropped: Seq 5500 != 5000)
>> T9:   A -> B   [WinUpdate] 5500  1000  20   0    (Dropped: Seq 5500 != 5000)
>>
>> Should we relax the sequence check in tcp_sequence() for zero window ACK?
>>
>> Any feedback or guidance would be greatly appreciated.
>>
>> -- 
>> Best regards,
>> Tangxin Xie
>>
>>
>>
> 
> 
> 
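The drops at T7 to T9 follow from the classic segment acceptability rule: with rcv_wnd = 0, a zero-length segment is acceptable only if its seq equals rcv_nxt. A simplified sketch of the RFC 793 test (the textbook rule, not Linux's tcp_sequence(), which differs in detail) reproduces them with the trace's numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2^32. */
static bool in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
	return !seq_before(seq, rcv_nxt) && seq_before(seq, rcv_nxt + rcv_wnd);
}

/* RFC 793 segment acceptability test: four cases on len and rcv_wnd. */
static bool segment_acceptable(uint32_t seq, uint32_t len,
			       uint32_t rcv_nxt, uint32_t rcv_wnd)
{
	assert(len <= 0x7fffffffu);	/* keep sequence arithmetic meaningful */
	if (len == 0)
		return rcv_wnd == 0 ? seq == rcv_nxt
				    : in_window(seq, rcv_nxt, rcv_wnd);
	if (rcv_wnd == 0)
		return false;
	/* Acceptable if the first or the last byte falls in the window. */
	return in_window(seq, rcv_nxt, rcv_wnd) ||
	       in_window(seq + len - 1, rcv_nxt, rcv_wnd);
}
```

With A's rcv_nxt frozen at 1000 and rcv_wnd = 0, B's zero-window ACK at seq 1500 fails the seq == rcv_nxt case; A's segments at seq 5500 fail the same way against B's rcv_nxt of 5000.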

Hi,

We observed a throughput regression (dropping from ~1GB/s to 100MB/s)
in our test environment after commit 0e24d17bd966
("tcp: implement RFC 7323 window retraction receiver requirements").

When pressure on the receive buffer (rcv_buf) triggers tcp_clamp_window(),
rcv_ssthresh is strictly capped to 2 * advmss.
Subsequently, even after the user completely consumes the data and releases
a large amount of free_space, tcp_select_window() is still heavily
suppressed by the clamped rcv_ssthresh. As a result, the receiver advertises
an extremely small window (Win=23) to the peer.

The sender cannot transmit any new data segments until its RTO timer
expires and triggers a slow-start recovery. This 200ms silence slashes
our bandwidth by 90%.


No.   Time           Source       Destination  Info
-----------------------------------------------------------------------------------------------
1045  08:16:06.8005  192.168.1.9  192.168.1.10  [TCP ZeroWindow] 57334 -> 6666 [PSH, ACK] Win=0
1052  08:16:06.8013  192.168.1.9  192.168.1.10  [TCP Window Update] 57334 -> 6666 [ACK] Win=23
1055  08:16:06.8036  192.168.1.10  192.168.1.9  6666 -> 57334 [ACK] Seq=2999704568 Ack=2416286095
=========================== 200ms  SILENCE (RTO WAITING) ===================================
1088  08:16:07.0056  192.168.1.10  192.168.1.9  [TCP Retransmission] 6666 -> 57334 Len=1448
1090  08:16:07.0060  192.168.1.10  192.168.1.9  [TCP Retransmission] Len=2896
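The reported Win=23 is consistent with this explanation under two assumptions not stated in the report: a receive window scale of 7 and an advmss of 1448 (the segment size seen in the capture). The simplified model below (not the kernel's __tcp_select_window()) caps the window at rcv_ssthresh, rounds it up to the scale unit and shifts it for the header field: 2 * 1448 = 2896 bytes rounds up to 2944 = 23 * 128.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the advertised window field. Assumptions: the
 * window is capped by rcv_ssthresh, rounded up to a multiple of the
 * window-scale unit, then right-shifted by rcv_wscale. */
static uint32_t advertised_win_field(uint32_t free_space, uint32_t rcv_ssthresh,
				     unsigned int rcv_wscale)
{
	uint32_t unit;
	uint32_t window;

	assert(rcv_wscale <= 14);	/* RFC 7323 caps the shift at 14 */
	unit = 1u << rcv_wscale;
	window = free_space < rcv_ssthresh ? free_space : rcv_ssthresh;
	window = (window + unit - 1) & ~(unit - 1);	/* round up */
	return window >> rcv_wscale;
}
```

This only shows that the numbers line up; it does not by itself confirm the root cause.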

-- 
Best regards,
Tangxin Xie



* Re: [PATCH net] nfc: nci: fix out-of-bounds write in nci_target_auto_activated()
From: Simon Horman @ 2026-06-25 12:57 UTC
  To: Sam P
  Cc: david, davem, edumazet, kuba, pabeni, oe-linux-nfc, netdev,
	linux-kernel, stable
In-Reply-To: <443e2ee1-e9c1-45ca-be57-0c67966ec7d9@bynar.io>

On Wed, Jun 24, 2026 at 12:33:21AM +0200, Sam P wrote:
> On 23/06/2026 19:21, Simon Horman wrote:
> > > diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
> > > index c96512bb86531..566ca839fa488 100644
> > > --- a/net/nfc/nci/ntf.c
> > > +++ b/net/nfc/nci/ntf.c
> > > @@ -603,6 +603,12 @@ static void nci_target_auto_activated(struct nci_dev *ndev,
> > >       struct nfc_target *target;
> > >       int rc;
> > > 
> > > +    /* This is a new target, check if we've enough room */
> > > +    if (ndev->n_targets == NCI_MAX_DISCOVERED_TARGETS) {
> > > +        pr_debug("not enough room, ignoring new target...\n");
> > > +        return;
> > 
> > [Severity: High]
> > Does this early return cause the state machine to stall?
> > 
> > Looking at nci_rf_intf_activated_ntf_packet(), the state is transitioned
> > to NCI_POLL_ACTIVE right before this function is called:
> > 
> > net/nfc/nci/ntf.c:nci_rf_intf_activated_ntf_packet() {
> >      ...
> >      atomic_set(&ndev->state, NCI_POLL_ACTIVE);
> >      if (err == NCI_STATUS_OK)
> >          nci_target_auto_activated(ndev, &ntf);
> >      ...
> > }
> > 
> > Because of the early return here, nfc_targets_found() is skipped. Does this
> > leave the state wedged in NCI_POLL_ACTIVE, permanently blocking future
> > attempts to stop polling since nci_stop_poll() requires the state to be
> > NCI_DISCOVERY?
> > 
> > > +    }
> > > +
> > >       target = &ndev->targets[ndev->n_targets];
> > 
> > [ ... ]
> 
> Thanks for sharing the review.
> 
> I don't believe the early return can stall the state machine, as
> nci_target_auto_activated() does not touch ndev->state.
> 
> nci_rf_intf_activated_ntf_packet() sets ndev->state = NCI_POLL_ACTIVE
> before it calls nci_target_auto_activated(), so the transition has
> already happened by the time the new check runs. Neither
> nci_target_auto_activated() nor nfc_targets_found() modifies ndev->state;
> nfc_targets_found() only clears the nfc_dev dev->polling flag under
> device_lock, which is independent of the NCI state machine that
> nci_stop_poll() tests. So ndev->state is NCI_POLL_ACTIVE after an
> auto-activation whether the target is appended or the guard returns
> early; the state behaviour is the same with or without this patch.
> 
> The check is the same n_targets == NCI_MAX_DISCOVERED_TARGETS guard the
> sibling nci_add_new_target() (the RF_DISCOVER_NTF path) has used since
> the Fixes: commit; it only drops the out-of-spec 11th+ target, which a
> conformant controller shouldn't produce. The device still returns to
> NCI_DISCOVERY/NCI_IDLE through the normal RF_DEACTIVATE path.

Thanks, I agree with your analysis.

> > [Severity: Critical]
> > This isn't a bug introduced by this patch, but can the ntf union copy
> > uninitialized stack memory into persistent device structures?
> > 
> > In net/nfc/nci/ntf.c:nci_rf_discover_ntf_packet(), the local ntf struct
> > is declared on the stack:
> > 
> >      struct nci_rf_discover_ntf ntf;
> > 
> > If the incoming packet has ntf.rf_tech_specific_params_len == 0, the
> > extraction routines are bypassed, leaving the rf_tech_specific_params union
> > uninitialized. The struct is then passed to nci_add_new_target():
> > 
> >      if (add_target == true)
> >          nci_add_new_target(ndev, &ntf);
> > 
> > This eventually calls nci_add_new_protocol(), which blindly reads from this
> > union:
> > 
> >      nfca_poll = (struct rf_tech_specific_params_nfca_poll *)params;
> >      target->sens_res = nfca_poll->sens_res;
> >      target->sel_res = nfca_poll->sel_res;
> >      target->nfcid1_len = nfca_poll->nfcid1_len;
> > 
> > Could this uninitialized stack garbage be copied into ndev->targets, and
> > later broadcasted to userspace via Netlink in nfc_targets_found()?
> 
> I managed to reproduce this issue locally with KMSAN. I can queue up a patch
> for it and submit it separately; ntf just needs to be zero-initialized.

Thanks, I agree this can be handled separately.
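A sketch of the zero-initialization fix, using a hypothetical cut-down stand-in for struct nci_rf_discover_ntf and an invented packet layout. Kernel code would usually spell the initializer "= {}"; "= {0}" is the portable C form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct nci_rf_discover_ntf. */
struct discover_ntf {
	unsigned char rf_tech_specific_params_len;
	union {
		struct {
			unsigned short sens_res;
			unsigned char sel_res;
			unsigned char nfcid1_len;
		} nfca_poll;
	} rf_tech_specific_params;
};

/* Hypothetical layout: [params_len][sens_res lo][sens_res hi][sel_res]
 * [nfcid1_len]. The union is filled only when params are present; the
 * zero initializer keeps it from holding stack garbage otherwise. */
static struct discover_ntf parse_ntf(const unsigned char *pkt, size_t len)
{
	struct discover_ntf ntf = {0};

	assert(pkt != NULL && len >= 1);
	ntf.rf_tech_specific_params_len = pkt[0];
	if (ntf.rf_tech_specific_params_len >= 4 && len >= 5) {
		ntf.rf_tech_specific_params.nfca_poll.sens_res =
			(unsigned short)(pkt[1] | (pkt[2] << 8));
		ntf.rf_tech_specific_params.nfca_poll.sel_res = pkt[3];
		ntf.rf_tech_specific_params.nfca_poll.nfcid1_len = pkt[4];
	}
	return ntf;
}
```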

> > [Severity: Critical]
> > This isn't a bug introduced by this patch, but does the general bytes length
> > calculation underflow?
> > 
> > In net/nfc/nci/ntf.c:nci_store_general_bytes_nfc_dep():
> > 
> >      ndev->remote_gb_len = min_t(__u8,
> >          (ntf->activation_params.listen_nfc_dep.atr_req_len -
> >           NFC_ATR_REQ_GT_OFFSET), NFC_ATR_REQ_GB_MAXSIZE);
> > 
> >      memcpy(ndev->remote_gb,
> >          (ntf->activation_params.listen_nfc_dep.atr_req +
> >           NFC_ATR_REQ_GT_OFFSET), ndev->remote_gb_len);
> > 
> > The length is computed by subtracting NFC_ATR_REQ_GT_OFFSET (14) from
> > atr_req_len. If atr_req_len is less than 14, this subtraction will
> > underflow.
> > 
> > Because of the min_t cast, this large underflowed value is bounded to
> > NFC_ATR_REQ_GB_MAXSIZE (48). The memcpy() then copies 48 bytes starting
> > from atr_req + 14 into ndev->remote_gb.
> > 
> > Since the source buffer is part of the ntf union allocated on the stack
> > in nci_rf_intf_activated_ntf_packet(), could this leak up to 48 bytes of
> > uninitialized kernel stack memory to userspace?
> 
> This one is already being fixed on the list:
> 
> [PATCH net v4 1/5] nfc: nci: fix u8 underflow in
> nci_store_general_bytes_nfc_dep  (Lekë Hapçiu)
> https://lore.kernel.org/all/20260424180151.3808557-2-snowwlake@icloud.com/

Ack.
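A minimal sketch of one way to compute the length without the u8 underflow, using the constant values quoted in the review (14 and 48). gb_len() is hypothetical; the fix posted on the list may be written differently.

```c
#include <assert.h>

/* Values quoted in the review for NFC_ATR_REQ_GT_OFFSET and
 * NFC_ATR_REQ_GB_MAXSIZE. */
#define ATR_REQ_GT_OFFSET 14u
#define ATR_REQ_GB_MAXSIZE 48u

/* General-bytes length: an ATR_REQ no longer than the offset has no
 * general bytes, instead of underflowing and being capped to 48. */
static unsigned char gb_len(unsigned char atr_req_len)
{
	unsigned int len;

	if (atr_req_len <= ATR_REQ_GT_OFFSET)
		return 0;
	len = atr_req_len - ATR_REQ_GT_OFFSET;
	assert(len <= 255u - ATR_REQ_GT_OFFSET);
	return (unsigned char)(len < ATR_REQ_GB_MAXSIZE ? len : ATR_REQ_GB_MAXSIZE);
}
```

For atr_req_len = 10, the unguarded u8 subtraction would yield 252, which min_t() then caps to 48; the guarded version returns 0.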

> > [Severity: High]
> > This isn't a bug introduced by this patch, but does the lack of skb->len
> > validation in NCI response handlers allow out-of-bounds reads?
> > 
> > In nci_core_init_rsp_packet_v1() in net/nfc/nci/rsp.c, fields are read
> > and arrays are copied from the payload:
> > 
> >      ndev->num_supported_rf_interfaces =
> >          rsp_1->num_supported_rf_interfaces;
> > 
> >      memcpy(ndev->supported_rf_interfaces, rsp_1->supported_rf_interfaces,
> >             ndev->num_supported_rf_interfaces);
> > 
> > Similarly in nci_core_init_rsp_packet_v2():
> > 
> >      while (rf_interface_idx < ndev->num_supported_rf_interfaces) {
> >          ndev->supported_rf_interfaces[rf_interface_idx++] =
> >              *supported_rf_interface++;
> >      }
> > 
> > Since the NCI control header is stripped via skb_pull(), leaving the
> > payload length in skb->len, what happens if an NFCC sends a short response
> > packet (e.g. plen = 1)? Could this cause the handlers to read adjacent
> > kernel slab memory?
> 
> I've since submitted a fix for this issue:
> 
> [PATCH net] nfc: nci: fix uninit-value in
> nci_core_init_rsp_packet()
> https://lore.kernel.org/netdev/20260623222402.175798-1-sam@bynar.io/

Ack.
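A sketch of the missing check: validate every count read from the payload against the bytes actually received before copying. The payload layout, MAX_RF_INTERFACES and parse_init_rsp() are hypothetical and much simpler than the real CORE_INIT_RSP handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the destination array. */
#define MAX_RF_INTERFACES 4

/* Hypothetical payload (control header already pulled):
 * [status][num_ifaces][iface 0] ... [iface num_ifaces - 1].
 * Every read is checked against len, the bytes actually received. */
static int parse_init_rsp(const unsigned char *payload, size_t len,
			  unsigned char *ifaces, size_t *n_ifaces)
{
	size_t n;

	assert(payload != NULL && ifaces != NULL && n_ifaces != NULL);
	if (len < 2)
		return -1;	/* too short for the fixed fields */
	n = payload[1];
	if (n > MAX_RF_INTERFACES || len - 2 < n)
		return -1;	/* count exceeds the destination or the payload */
	memcpy(ifaces, payload + 2, n);
	*n_ifaces = n;
	return 0;
}
```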

> > [Severity: High]
> > This isn't a bug introduced by this patch, but is there a missing boundary
> > validation when extracting activation parameters?
> > 
> > In net/nfc/nci/ntf.c:nci_extract_activation_params_iso_dep():
> > 
> >      nfca_poll->rats_res_len = min_t(__u8, *data++, NFC_ATS_MAXSIZE);
> >      if (nfca_poll->rats_res_len > 0) {
> >          memcpy(nfca_poll->rats_res, data, nfca_poll->rats_res_len);
> >      }
> > 
> > And in nci_extract_activation_params_nfc_dep():
> > 
> >      poll->atr_res_len = min_t(__u8, *data++, NFC_ATR_RES_MAXSIZE - 2);
> >      if (poll->atr_res_len > 0)
> >          memcpy(poll->atr_res, data, poll->atr_res_len);
> > 
> > These inner lengths are bounded by a maximum size like NFC_ATS_MAXSIZE, but
> > they are not validated against the remaining bytes in the SKB (skb->len).
> > 
> > If a malformed packet provides a large inner length but a short payload,
> > could this memcpy() read out of bounds into the device structure,
> > potentially leaking adjacent slab memory to userspace later?
> 
> This one is already being fixed on the list too:
> 
> [PATCH v3] nfc: nci: add data_len bound checks to activation
> parameter extractors  (Bryam Vargas)
> https://lore.kernel.org/all/20260612-b4-disp-6d52d8b0-v3-1-e26221f8826d@proton.me/

Ack.
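A sketch of bounding a one-byte-length-prefixed field against the remaining packet data as well as the destination size. extract_prefixed() and ATS_MAXSIZE are hypothetical stand-ins for the extractor and NFC_ATS_MAXSIZE; the posted patch may structure this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap standing in for NFC_ATS_MAXSIZE. */
#define ATS_MAXSIZE 20

/* Copy a length-prefixed field. The declared length is first checked
 * against the bytes remaining in the packet, then clamped to the
 * destination size. Returns bytes consumed, or 0 if malformed. */
static size_t extract_prefixed(const unsigned char *data, size_t data_len,
			       unsigned char *dst, unsigned char *dst_len)
{
	size_t declared, copy;

	assert(dst != NULL && dst_len != NULL);
	if (data == NULL || data_len < 1)
		return 0;
	declared = data[0];
	if (declared > data_len - 1)
		return 0;	/* declared length runs past the packet */
	copy = declared < ATS_MAXSIZE ? declared : ATS_MAXSIZE;
	memcpy(dst, data + 1, copy);
	*dst_len = (unsigned char)copy;
	return 1 + declared;
}
```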

> > [Severity: High]
> > This isn't a bug introduced by this patch, but is there a potential
> > priority inversion or timeout deadlock involving the device lock?
> > 
> > Functions like nfc_stop_poll() in net/nfc/core.c acquire the device lock:
> > 
> >      device_lock(&dev->dev);
> >      ...
> >      if (dev->ops->stop_poll)
> >          dev->ops->stop_poll(dev);
> > 
> > This eventually calls nci_request(), which synchronously waits for a
> > completion signaled by the nci_rx_work thread. However, if an NTF packet
> > is received first, nci_rx_work processes it and invokes
> > nfc_targets_found(), which also attempts to acquire the device lock:
> > 
> >      device_lock(&dev->dev);
> > 
> > Since the calling thread already holds the device lock, nci_rx_work blocks
> > indefinitely. Because the RX worker is blocked, it cannot process the
> > pending RSP, causing nci_request() to time out and fail. Could this
> > deadlock the RX thread?
> 
> No patch for this one, although I'm not sure how accurate it is.

With the above in mind, this now looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH v3 net] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Pengfei Zhang @ 2026-06-25 12:56 UTC
  To: idosch
  Cc: dsahern, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, Pengfei Zhang
In-Reply-To: <20260625122411.GA1175897@shredder>

Hi Ido,

Thank you for the review. Thanks also to Jakub Kicinski, Eric Dumazet
and Kuniyuki Iwashima for their feedback on the earlier versions.

I will prepare a patch to fix inet_dump_fib() using the same tb_id-based
resume logic and send it to net-next once it opens.

Thanks,
Pengfei


* Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Parthiban.Veerasooran @ 2026-06-25 12:36 UTC
  To: Selvamani.Rajagopal, andrew+netdev, davem, edumazet, kuba, pabeni,
	robh, krzk+dt, conor+dt, Pier.Beruto
  Cc: andrew, netdev, linux-kernel, Conor.Dooley, devicetree
In-Reply-To: <CYYPR02MB9828E1167750AEA090EC60CD83EE2@CYYPR02MB9828.namprd02.prod.outlook.com>

Hi Selvamani,

On 23/06/26 11:18 am, Selvamani Rajagopal wrote:
>> -----Original Message-----
>> From: Parthiban.Veerasooran@microchip.com <Parthiban.Veerasooran@microchip.com>
>> Subject: Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level
>> triggered.
>>
>>
>> I will find some time this week to test and share my feedback. In the
>> meantime, would it be possible for you to test using two instances (Test
>> Case 2)? I did not encounter many issues when testing with a single
>> instance.
>>
>> I believe that testing with two instances increases the likelihood of
>> reproducing the issue in your setup as well.
> 
> Parthiban,
> 
> Thanks.
> 
> Our EVB design allows only one board to be connected to one Raspberry Pi.
> So, I don't think I can have a setup like yours. We did test with three Raspberry Pi boards with
> multi-drop connection. Couldn't see your "NULL pointer" crash. Will keep trying though.
Thank you for the update. So it seems you can't connect two of your
MAC-PHYs to one RPi 4? An RPi 4 can support two SPI devices (MAC-PHYs).

https://patchwork.kernel.org/project/netdevbpf/list/?series=1114495&state=%2A&archive=both

https://patchwork.kernel.org/project/netdevbpf/patch/20260621-fix-race-condition-and-crash-v1-1-87e290d9357f@onsemi.com/

https://patchwork.kernel.org/project/netdevbpf/patch/20260621-fix-race-condition-and-crash-v1-2-87e290d9357f@onsemi.com/

With your above patches applied, I did a quick test (Test case 2) with two
Microchip MAC-PHYs and hit an issue similar to the one reported before.
Sharing the dmesg crash log for your reference.

[ 2863.182105] eth1: Receive buffer overflow error
[ 2863.199905] eth1: Receive buffer overflow error
[ 2867.669312] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b8
[ 2867.677658] Mem abort info:
[ 2867.680474]   ESR = 0x0000000096000005
[ 2867.684258]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 2867.689630]   SET = 0, FnV = 0
[ 2867.692717]   EA = 0, S1PTW = 0
[ 2867.695888]   FSC = 0x05: level 1 translation fault
[ 2867.700825] Data abort info:
[ 2867.703726]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 2867.709303]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 2867.714399]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 2867.719773] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000113c2e000
[ 2867.726296] [00000000000000b8] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 2867.735109] Internal error: Oops: 0000000096000005 [#1]  SMP
[ 2867.740830] Modules linked in: lan865x_t1s(O) microchip_t1s(O) sch_fq snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm algif_hash aes_neon_bs algif_skcipher af_alg bnep binfmt_misc brcmfmac_cyw brcmfmac hci_uart brcmutil btbcm bluetooth vc4 cfg80211 snd_soc_hdmi_codec drm_exec ecdh_generic ecc drm_display_helper cec rfkill bcm2835_codec(C) drm_dma_helper v3d rpi_hevc_dec drm_client_lib bcm2835_v4l2(C) gpu_sched drm_shmem_helper crc_ccitt bcm2835_isp snd_soc_core drm_kms_helper bcm2835_mmal_vchiq v4l2_mem2mem vc_sm_cma videobuf2_vmalloc videobuf2_dma_contig raspberrypi_hwmon videobuf2_memops snd_compress snd_bcm2835(C) videobuf2_v4l2 snd_pcm_dmaengine i2c_brcmstb snd_pcm snd_timer videodev videobuf2_common snd mc raspberrypi_gpiomem spi_bcm2835 gpio_fan nvmem_rmem sch_fq_codel i2c_dev zram lz4_compress drm fuse drm_panel_orientation_quirks backlight nfnetlink [last unloaded: microchip_t1s(O)]
[ 2867.821558] CPU: 3 UID: 0 PID: 2808 Comm: irq/59-spi0.0 Tainted: G       C O        7.1.0-rc7-v8+ #2 PREEMPT
[ 2867.831779] Tainted: [C]=CRAP, [O]=OOT_MODULE
[ 2867.836183] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 2867.842088] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2867.849138] pc : oa_tc6_update_rx_skb+0x2c/0xa8 [lan865x_t1s]
[ 2867.854955] lr : oa_tc6_macphy_threaded_irq+0x430/0x870 [lan865x_t1s]
[ 2867.861476] sp : ffffffc083dbbd20
[ 2867.864825] x29: ffffffc083dbbd20 x28: 000000003e002020 x27: ffffffed1cf609c8
[ 2867.872051] x26: 0000000000000000 x25: 0000000000000001 x24: ffffff8040796480
[ 2867.879277] x23: 000000002020003e x22: 0000000000000000 x21: 0000000000000040
[ 2867.886504] x20: ffffff804a479080 x19: ffffff8040796480 x18: 00000000000982f8
[ 2867.893731] x17: ffffff80482d6500 x16: ffffffed1d87b6b0 x15: ffffff8041a43c00
[ 2867.900957] x14: 0000000000000016 x13: 0000073d6a5d38dc x12: 00000000001d4ebe
[ 2867.908184] x11: 00000000000000c0 x10: 0000000000001ae0 x9 : ffffffecc9c959e8
[ 2867.915410] x8 : ffffff804f1e5a40 x7 : 0000000000000002 x6 : ffffffffffffffff
[ 2867.922636] x5 : ffffffed1e59d000 x4 : 0000000000000002 x3 : 0000000000000000
[ 2867.929863] x2 : 0000000000000040 x1 : ffffff804a479080 x0 : 0000000000000000
[ 2867.937090] Call trace:
[ 2867.939558]  oa_tc6_update_rx_skb+0x2c/0xa8 [lan865x_t1s] (P)
[ 2867.945375]  oa_tc6_macphy_threaded_irq+0x430/0x870 [lan865x_t1s]
[ 2867.951543]  irq_thread_fn+0x34/0xc0
[ 2867.955156]  irq_thread+0x1a8/0x308
[ 2867.958680]  kthread+0x138/0x150
[ 2867.961942]  ret_from_fork+0x10/0x20
[ 2867.965558] Code: aa0103f4 f90013f5 12001c55 f9403800 (29570403)
[ 2867.971727] ---[ end trace 0000000000000000 ]---
[ 2867.976443] genirq: exiting task "irq/59-spi0.0" (2808) is an active IRQ thread (irq 59)
[ 2868.094789] irq 59: nobody cared (try booting with the "irqpoll" option)
[ 2868.101000] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G      D  C O        7.1.0-rc7-v8+ #2 PREEMPT
[ 2868.101007] Tainted: [D]=DIE, [C]=CRAP, [O]=OOT_MODULE
[ 2868.101009] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 2868.101011] Call trace:
[ 2868.101013]  show_stack+0x20/0x38 (C)
[ 2868.101027]  dump_stack_lvl+0x60/0x80
[ 2868.101033]  dump_stack+0x18/0x24
[ 2868.101038]  __report_bad_irq+0x54/0xf0
[ 2868.101043]  note_interrupt+0x344/0x398
[ 2868.101048]  handle_irq_event+0xa4/0x110
[ 2868.101051]  handle_level_irq+0xe0/0x178
[ 2868.101056]  handle_irq_desc+0x3c/0x68
[ 2868.101061]  generic_handle_domain_irq+0x20/0x40
[ 2868.101067]  bcm2835_gpio_irq_handle_bank+0x180/0x1c8
[ 2868.101074]  bcm2835_gpio_irq_handler+0x88/0x188
[ 2868.101080]  handle_irq_desc+0x3c/0x68
[ 2868.101085]  generic_handle_domain_irq+0x20/0x40
[ 2868.101091]  gic_handle_irq+0x4c/0xe0
[ 2868.101094]  call_on_irq_stack+0x30/0x88
[ 2868.101099]  do_interrupt_handler+0x88/0x98
[ 2868.101102]  el1_interrupt+0x3c/0x60
[ 2868.101108]  el1h_64_irq_handler+0x18/0x30
[ 2868.101113]  el1h_64_irq+0x6c/0x70
[ 2868.101116]  default_idle_call+0x34/0x1a0 (P)
[ 2868.101123]  do_idle+0x260/0x2a0
[ 2868.101128]  cpu_startup_entry+0x3c/0x50
[ 2868.101132]  rest_init+0xe8/0xf0
[ 2868.101137]  start_kernel+0x7f4/0x800
[ 2868.101143]  __primary_switched+0x88/0x98
[ 2868.101149] handlers:
[ 2868.207750] lan8650 spi0.1: SPI transfer timed out
[ 2868.208070] [<0000000019361c17>] oa_tc6_macphy_isr [lan865x_t1s]
[ 2868.212048] spi_master spi0: failed to transfer one message from queue
[ 2868.215296]  threaded [<00000000a4e6f0fa>] oa_tc6_macphy_threaded_irq [lan865x_t1s]
[ 2868.219005] spi_master spi0: noqueue transfer failed
[ 2868.223053] Disabling IRQ #59
[ 2868.260162] lan8650 spi0.1 eth2: SPI data transfer failed: -110
[ 2868.266211] lan8650 spi0.1: Device interrupt disabled to avoid interrupt storm

Best regards,
Parthiban V
> 
> But I could see the assert in skb_put almost immediately.
> 
>>
>> Best regards,
>> Parthiban V
>>>
>>>>
> 


* Re: [PATCH v3 net] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Ido Schimmel @ 2026-06-25 12:24 UTC (permalink / raw)
  To: Pengfei Zhang
  Cc: dsahern, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, chenzhangqi, baohua
In-Reply-To: <20260625070517.965597-1-zhangfeionline@gmail.com>

On Thu, Jun 25, 2026 at 03:05:17PM +0800, Pengfei Zhang wrote:
> inet6_dump_fib() saves its progress in cb->args[1] as a positional
> index within the current hash chain.  Between batches, a concurrent
> fib6_new_table() can insert a new table at the chain head, shifting
> all existing entries.  The saved index then lands on a different
> table, causing fib6_dump_table() to set w->root to the wrong table
> while w->node still points into the previous one.
> fib6_walk_continue() dereferences w->node->parent (NULL) and panics:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000008
>   RIP: 0010:fib6_walk_continue+0x6e/0x170
>   Call Trace:
>    <TASK>
>    fib6_dump_table.isra.0+0xc5/0x240
>    inet6_dump_fib+0xf6/0x420
>    rtnl_dumpit+0x30/0xa0
>    netlink_dump+0x15b/0x460
>    netlink_recvmsg+0x1d6/0x2a0
>    ____sys_recvmsg+0x17a/0x190
> 
> Fix by storing tb->tb6_id in cb->args[1] instead of a positional
> index.  On resume, skip entries until the id matches; a concurrent
> head-insert can never match the saved id, so the walker always
> resumes on the correct table.
> 
> Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
> Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

You should have waited at least 24h between versions:

https://docs.kernel.org/process/maintainer-netdev.html

The same pattern exists in IPv4, but there we don't crash because the
per-table resume logic is different. Instead, it is possible that we
restart the dump from the wrong table and re-dump routes from the next
table in the chain.

I'm aware that netlink dumps do not guarantee consistency, but for
parity / robustness reasons I suggest to align IPv4 with IPv6 and use
the same tb_id-based resume logic there. Given we don't crash there,
target the IPv4 patch at net-next (currently closed, should open next
week).

Thanks
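The difference between resuming by position and resuming by table id, as described in the quoted commit message (and suggested for IPv4 above), can be modelled outside the kernel. This is a simplified userspace sketch, not the fib6 code: `struct table`, `resume_by_index()` and `resume_by_id()` are hypothetical names, and a plain singly linked list stands in for the per-bucket hash chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one routing table on a hash chain. */
struct table {
	uint32_t id;		/* unique table id, like tb6_id */
	struct table *next;	/* next table in the same chain */
};

/* Old scheme: resume at the saved position in the chain. A head insert
 * between batches shifts every position by one. */
static struct table *resume_by_index(struct table *head, long saved_idx)
{
	long i = 0;

	for (struct table *t = head; t; t = t->next, i++)
		if (i == saved_idx)
			return t;
	return NULL;
}

/* New scheme: skip entries until the saved id matches. A newly inserted
 * table has a different id, so it can never be mistaken for the saved one. */
static struct table *resume_by_id(struct table *head, uint32_t saved_id)
{
	for (struct table *t = head; t; t = t->next)
		if (t->id == saved_id)
			return t;
	return NULL;
}
```

With the chain `a -> b`, a dump that stopped at `b` saves either index 1 or id 20. After a new table `n` is inserted at the head, index 1 names `a`, while id 20 still finds `b`.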

* Re: [Intel-wired-lan] [PATCH net v7 2/4] i40e: skip unnecessary VF reset when setting trust
From: Przemek Kitszel @ 2026-06-25 12:17 UTC (permalink / raw)
  To: Simon Horman, jtornosm
  Cc: netdev, intel-wired-lan, aleksandr.loktionov, jacob.e.keller,
	anthony.l.nguyen, davem, edumazet, kuba, pabeni
In-Reply-To: <20260624163050.1145761-1-horms@kernel.org>

On 6/24/26 18:30, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> i40e: skip unnecessary VF reset when setting trust
> 
> This patch avoids the ~10 second VF reset delay on trust changes by
> flipping the I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit directly, and only
> resorting to a full reset when revoking trust from a VF that has
> advanced features (ADQ, cloud filters, UC/MC promisc) configured.
> 
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> index a26c3d47ec15..0cc434b26eb8 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>>   	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>>   	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>>   
>> -	i40e_vc_reset_vf(vf, true);
>> +	/* Reset only if revoking trust and VF has advanced features configured */
>> +	if (!setting &&
>> +	    (vf->adq_enabled || vf->num_cloud_filters > 0 ||
>> +	     test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
>> +	     test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {
> 
> [High]
> Does this revoke-trust heuristic miss MAC/VLAN filters that were only
> installable while the VF was trusted?

No.

> 
> A trusted VF is allowed to add many more MAC addresses than
> I40E_VC_MAX_MAC_ADDR_PER_VF (i40e_check_vf_permission() uses
> I40E_VC_MAX_MACVLAN_PER_TRUSTED_VF) and more than I40E_VC_MAX_VLAN_PER_VF
> VLANs.  These quotas are enforced only at add-time.
> 
> The previous always-reset path went through i40e_vc_reset_vf() ->
> i40e_cleanup_reset_vf() -> i40e_free_vf_res() -> i40e_vsi_release(),
> which destroyed the VSI and its mac_filter_hash, and reset
> vf->num_vlan = 0.

Current "reset only on revoke" does exactly the same.

> 
> The new fast path only flips the PRIVILEGE bit, leaving any

NOPE.
New "fast path" only *TESTS* the PRIVILEGE bit,
it does not "clear" it in the fast path.

This is just negated/wrong/inverted logic on AI side
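The reset condition in the quoted hunk reduces to a small predicate. The sketch below restates it in plain C. The struct `vf_state` is hypothetical: its fields stand in for `vf->adq_enabled`, `vf->num_cloud_filters` and the `I40E_VF_STATE_UC_PROMISC`/`I40E_VF_STATE_MC_PROMISC` bits. The sketch does not model the PRIVILEGE-bit handling discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the VF state tested in the quoted hunk. */
struct vf_state {
	bool adq_enabled;
	int num_cloud_filters;
	bool uc_promisc;	/* I40E_VF_STATE_UC_PROMISC set */
	bool mc_promisc;	/* I40E_VF_STATE_MC_PROMISC set */
};

/* Reset only when revoking trust (setting == false) and the VF has
 * advanced features configured; granting trust never resets here. */
static bool trust_change_needs_reset(const struct vf_state *vf, bool setting)
{
	if (setting)
		return false;
	return vf->adq_enabled || vf->num_cloud_filters > 0 ||
	       vf->uc_promisc || vf->mc_promisc;
}
```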



* [PATCH net v2] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-25 12:03 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Amerigo Wang
  Cc: netdev, linux-kernel, vlad.wing, asantostc, paulmck, kernel-team,
	stable, Pavan Chebbi, Breno Leitao

There is a use-after-free error on netpoll, which is clearly detected by
KASAN.

      BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
      Read of size 1 at addr ... by task kworker/9:1
      Workqueue: events queue_process
      Call Trace:
       skb_dequeue+0x1e/0xb0
       queue_process+0x2c/0x600
       process_scheduled_works+0x4b6/0x850
       worker_thread+0x414/0x5a0
      Allocated by task 242:
       __netpoll_setup+0x201/0x4a0
       netpoll_setup+0x249/0x550
       enabled_store+0x32f/0x380
      Freed by task 0:
       kfree+0x1b7/0x540
       rcu_core+0x3f8/0x7a0

The problem happens when there is a pending TX worker running in
parallel with the cleanup path.

This is what happens on netpoll shutdown path:

1) __netpoll_cleanup() is called
2) set dev->npinfo to NULL
3) call_rcu() with rcu_cleanup_netpoll_info()
  3.1) rcu_cleanup_netpoll_info() tries to cancel all workers with
       cancel_delayed_work(), but doesn't wait for the worker to finish
4) and kfree(npinfo);

Because 3.1) doesn't really cancel the work, as the comment says "we
can't call cancel_delayed_work_sync here, as we are in softirq", the TX
worker can run after 4).

TL;DR: queue_process() is not an RCU reader; it reaches npinfo through
the work item via container_of().

Use disable_delayed_work_sync() to ensure the worker is completely
stopped and prevent any future re-arming attempts. Once npinfo is set
to NULL, senders will bail out and not queue new work. The disable flag
ensures any in-flight re-arming attempts also fail silently.

In the future, we can do the cleanup inline here without needing the
npinfo->rcu rcu_head, but that is net-next material.

Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Remove the synchronize_rcu() and keep cancelling the tx_work
  before call_rcu(). (Jakub)
- Link to v1: https://lore.kernel.org/r/20260622-netpoll_rcu_fix-v1-1-15c3285e92e6@debian.org
---
 net/core/netpoll.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..96d5945e6a30f 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -633,14 +633,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
 			container_of(rcu_head, struct netpoll_info, rcu);
 
 	skb_queue_purge(&npinfo->txq);
-
-	/* we can't call cancel_delayed_work_sync here, as we are in softirq */
-	cancel_delayed_work(&npinfo->tx_work);
-
-	/* clean after last, unfinished work */
-	__skb_queue_purge(&npinfo->txq);
-	/* now cancel it again */
-	cancel_delayed_work(&npinfo->tx_work);
 	kfree(npinfo);
 }
 
@@ -664,6 +656,7 @@ static void __netpoll_cleanup(struct netpoll *np)
 			ops->ndo_netpoll_cleanup(np->dev);
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+		disable_delayed_work_sync(&npinfo->tx_work);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
 	}
 

---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a

Best regards,
-- 
Breno Leitao <leitao@debian.org>
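The fix relies on a specific ordering: refuse new work, wait for any in-flight worker, and only then free the object. That ordering can be modelled in userspace. The sketch below is an analogy that uses POSIX threads rather than the kernel workqueue API. The names `struct info`, `info_queue_work()`, `info_disable_work_sync()` and `info_cleanup()` are hypothetical, and a single pthread stands in for the delayed TX work item. In the actual patch, the kfree() still happens later in the RCU callback. The sketch only shows why the worker must be stopped before anything can free the memory it reaches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of npinfo and its TX work item. */
struct info {
	pthread_mutex_t lock;
	bool disabled;		/* once set, new work is refused */
	bool worker_running;	/* at most one outstanding worker here */
	pthread_t worker;
	int runs;
};

static void *info_worker(void *arg)
{
	struct info *ni = arg;	/* reaches ni directly, not under RCU */

	pthread_mutex_lock(&ni->lock);
	ni->runs++;
	pthread_mutex_unlock(&ni->lock);
	return NULL;
}

static struct info *info_alloc(void)
{
	struct info *ni = calloc(1, sizeof(*ni));

	if (ni)
		pthread_mutex_init(&ni->lock, NULL);
	return ni;
}

/* Queue work unless disabled; fails quietly, like re-arming a disabled item. */
static bool info_queue_work(struct info *ni)
{
	bool queued = false;

	pthread_mutex_lock(&ni->lock);
	if (!ni->disabled && !ni->worker_running &&
	    pthread_create(&ni->worker, NULL, info_worker, ni) == 0) {
		ni->worker_running = true;
		queued = true;
	}
	pthread_mutex_unlock(&ni->lock);
	return queued;
}

/* Analogue of disable_delayed_work_sync(): refuse new work, then wait
 * for any in-flight worker to finish. Safe to call more than once. */
static void info_disable_work_sync(struct info *ni)
{
	bool running;

	pthread_mutex_lock(&ni->lock);
	ni->disabled = true;
	running = ni->worker_running;
	pthread_mutex_unlock(&ni->lock);

	if (running) {
		pthread_join(ni->worker, NULL);
		pthread_mutex_lock(&ni->lock);
		ni->worker_running = false;
		pthread_mutex_unlock(&ni->lock);
	}
}

/* Safe teardown: free only once no worker can still reach ni. */
static void info_cleanup(struct info *ni)
{
	info_disable_work_sync(ni);
	pthread_mutex_destroy(&ni->lock);
	free(ni);
}
```

The buggy pattern is the equivalent of calling free() right after a non-waiting cancel. The worker may still be scheduled and dereference the freed object, which is the use-after-free KASAN reports above.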


* Re: [PATCH net 1/3] i40e: keep q_vectors array in sync with channel count changes
From: Maciej Fijalkowski @ 2026-06-25 11:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
	poros, arkadiusz.kubalewski, przemyslaw.kitszel, horms,
	aleksandr.loktionov, pmenzel, sx.rinitha
In-Reply-To: <20260605163934.547c7bdd@kernel.org>

On Fri, Jun 05, 2026 at 04:39:34PM -0700, Jakub Kicinski wrote:
> On Fri, 5 Jun 2026 11:01:19 -0700 Tony Nguyen wrote:
> > > Should the new err_lump label, and the existing err_vsi exits from the
> > > two allocation steps above, instead unwind through the err_rings block
> > > (unregister_netdev / free_netdev / i40e_devlink_destroy_port /
> > > i40e_aq_delete_element) the way i40e_vsi_setup()'s err_msix path does?
> > > 
> > > The pre-patch code had the same defective err_vsi target for the
> > > qp_pile and arrays paths, but the patch adds two new failure points
> > > (the unconditional q_vectors kzalloc and the new
> > > i40e_vsi_setup_vectors() call) that route into it during reset
> > > rebuild, where vsi->netdev is already registered.  
> >  
> > This does seem valid, but as mentioned by Sashiko the pre-patch code has 
> > the same target/issue. There's a recent submission [1], with changes 
> > requested, that should cover this. Did you want to take this now or wait 
> > and have it sent with this other one?
> 
> Hm. I convinced myself yesterday that the old code did _not_ 
> have the issue because it was passing false as the second arg to
> i40e_vsi_{alloc,free}_arrays() ? Good chance that I misread,
> it's tricky code. As much as I would love to apply this to prevent 
> the deadlock in NIPA - let's wait for the follow up. I'll pick up 
> the other two patches from this series off the list.

FWIW it was our beloved "pre-existing issue": alloc arrays could fail at
ring memory allocation and bail out without unregistering the netdev.

Regardless, I'm going to send a v4 with a preceding patch that should fix
this...

> 
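The question in the quoted review concerns goto-based unwinding: each failure exit should undo everything set up before it, in reverse order, so that no exit leaves the netdev registered. The sketch below is a generic illustration and is not the i40e code. The struct `vsi_ctx` and the function `vsi_setup_with_unwind()` are hypothetical, and each boolean stands for one resource that the real code would register or allocate.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state; each flag stands for one live resource. */
struct vsi_ctx {
	bool netdev_registered;
	bool arrays_allocated;
	bool vectors_set_up;
};

/* Each failure jumps to the label that undoes everything set up so far,
 * so every exit path unregisters the netdev. */
static int vsi_setup_with_unwind(struct vsi_ctx *c, bool fail_arrays,
				 bool fail_vectors)
{
	c->netdev_registered = true;	/* e.g. netdev registration */

	if (fail_arrays)
		goto err_netdev;
	c->arrays_allocated = true;	/* e.g. array allocation */

	if (fail_vectors)
		goto err_arrays;
	c->vectors_set_up = true;	/* e.g. vector setup */
	return 0;

err_arrays:
	c->arrays_allocated = false;	/* free the arrays */
err_netdev:
	c->netdev_registered = false;	/* e.g. unregister_netdev() */
	return -ENOMEM;
}
```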

* [PATCH v2 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: Maciej Fijalkowski @ 2026-06-25 11:52 UTC (permalink / raw)
  To: netdev
  Cc: bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	tushar.vyavahare, kerneljasonxing, Maciej Fijalkowski

The UMEM state refactor made __send_pkts() use xsk->umem for Tx
address generation. At the same time, the shared-UMEM Tx setup copies the
Rx UMEM state into a Tx-local state object and resets base_addr and
next_buffer before configuring the Tx socket.

Passing that Tx-local object to xsk_configure() makes xsk->umem point to
the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
the roles are switched: the same socket is then used for Rx validation, but
received descriptors from the other logical UMEM half are checked against
base_addr == 0. With the new UMEM bounds check, a valid address such as
base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
window.

Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
address generation, preserving the BIDIRECTIONAL test's intent of using
the proper logical UMEM half after the direction switch.

Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
Reviewed-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
v2:
- fix SoB line
- rebase
- add tags from Tushar
---
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..26437d4bdc8e 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -1169,8 +1169,8 @@ static int receive_pkts(struct test_spec *test)
 static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
 {
 	u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
+	struct xsk_umem_info *umem = ifobject->xsk_arr[0].umem_real;
 	struct pkt_stream *pkt_stream = xsk->pkt_stream;
-	struct xsk_umem_info *umem = xsk->umem;
 	bool use_poll = ifobject->use_poll;
 	struct pollfd fds = { };
 	int ret;
@@ -1521,7 +1521,7 @@ static int thread_common_ops_tx(struct test_spec *test, struct ifobject *ifobjec
 	umem_tx->base_addr = 0;
 	umem_tx->next_buffer = 0;

-	ret = xsk_configure(test, ifobject, umem_tx, true);
+	ret = xsk_configure(test, ifobject, umem_rx, true);
 	if (ret)
 		return ret;
 	ifobject->xsk = &ifobject->xsk_arr[0];
-- 
2.43.0
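The failure described in the commit message comes down to a bounds check evaluated against the wrong base. The minimal sketch below assumes a simplified UMEM view with only a base address and a size; the real `struct xsk_umem_info` and the check in test_xsk.c may differ. The names `struct umem_view` and `addr_in_view()` are hypothetical, and 256 is the value of `XDP_PACKET_HEADROOM`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified UMEM view: a window into a shared buffer. */
struct umem_view {
	uint64_t base_addr;	/* start of this socket's logical half */
	uint64_t size;		/* bytes in the window */
};

/* Rx-side check: addr must fall inside [base_addr, base_addr + size). */
static bool addr_in_view(const struct umem_view *v, uint64_t addr)
{
	return addr >= v->base_addr && addr - v->base_addr < v->size;
}
```

A descriptor at `base_addr + 256` in the upper half passes against the shared/Rx view. The same descriptor fails against the zero-based Tx-local copy, which is why the patch keeps `xsk->umem` pointing at the Rx view.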

* [PATCH net] mlxsw: spectrum_acl_erp: Fix const qualifier of delta_clear()
From: Evgenii Burenchev @ 2026-06-25 11:48 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Evgenii Burenchev, idosch, petrm, andrew+netdev, davem, edumazet,
	kuba, pabeni, jiri, netdev, linux-kernel, lvc-project

mlxsw_sp_acl_erp_delta_clear() takes 'const char *enc_key' but modifies
the memory it points to. This is a logical error in the function
declaration.

The only caller passes a non-const buffer (aentry->ht_key.enc_key), so
the const qualifier is misleading and unnecessary.

Remove const from the enc_key parameter to match the actual usage.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c  | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index cbb272a96359..0d0cd093b3c6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -1118,7 +1118,7 @@ u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
 }
 
 void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
-				  const char *enc_key)
+				  char *enc_key)
 {
 	u16 start = delta->start;
 	u8 mask = delta->mask;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
index 010204f73ea4..67cc7a5737dd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
@@ -245,7 +245,7 @@ u8 mlxsw_sp_acl_erp_delta_mask(const struct mlxsw_sp_acl_erp_delta *delta);
 u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
 				const char *enc_key);
 void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
-				  const char *enc_key);
+				  char *enc_key);
 
 struct mlxsw_sp_acl_erp_mask;
 
-- 
2.43.0
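The const qualifier has to go because a `const char *` parameter promises that the callee will not write through the pointer. Any function that clears bits in the key therefore needs `char *`. The minimal sketch below uses a hypothetical helper, `key_clear_bits()`. It is not the mlxsw implementation, whose bit layout is not shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clears the masked bits of one byte in a key.
 * It writes through 'key', so the parameter must be 'char *'. With
 * 'const char *' the store would not compile without casting const away,
 * and writing through such a cast is undefined if the object itself was
 * defined const. */
static void key_clear_bits(char *key, uint16_t offset, uint8_t mask)
{
	key[offset] = (char)(key[offset] & ~mask);
}
```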


* Re: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
From: kernel test robot @ 2026-06-25 11:47 UTC (permalink / raw)
  To: Ratheesh Kannoth, davem, gakula, linux-kernel, netdev, sgoutham
  Cc: llvm, oe-kbuild-all, andrew+netdev, edumazet, kuba, pabeni,
	Hariprasad Kelam, Ratheesh Kannoth
In-Reply-To: <20260625044621.2841831-1-rkannoth@marvell.com>

Hi Ratheesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]
[also build test WARNING on linus/master v7.1 next-20260623]
[cannot apply to linux-review/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260622-133621 horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260625-124846
base:   net/main
patch link:    https://lore.kernel.org/r/20260625044621.2841831-1-rkannoth%40marvell.com
patch subject: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 6cc609bb250b21b47fc7d394b4019101e9983597)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251954.vsXupLpQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1522:6: warning: variable 'pf' set but not used [-Wunused-but-set-variable]
    1522 |         int pf;
         |             ^
>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1696:24: warning: variable 'cgx' is uninitialized when used here [-Wuninitialized]
    1696 |                 cgxd = rvu_cgx_pdata(cgx, rvu);
         |                                      ^~~
   drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1521:8: note: initialize the variable 'cgx' to silence this warning
    1521 |         u8 cgx;
         |               ^
         |                = '\0'
   2 warnings generated.


vim +/pf +1522 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c

  1506	
  1507	int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
  1508					  struct nix_lf_alloc_req *req,
  1509					  struct nix_lf_alloc_rsp *rsp)
  1510	{
  1511		int nixlf, qints, hwctx_size, intf, rc = 0;
  1512		u16 bcast, mcast, promisc, ucast;
  1513		struct rvu_hwinfo *hw = rvu->hw;
  1514		u16 pcifunc = req->hdr.pcifunc;
  1515		bool rules_created = false;
  1516		struct rvu_block *block;
  1517		struct rvu_pfvf *pfvf;
  1518		u64 cfg, ctx_cfg;
  1519		struct cgx *cgxd;
  1520		int blkaddr;
  1521		u8 cgx;
> 1522		int pf;
  1523	
  1524		if (!req->rq_cnt || !req->sq_cnt || !req->cq_cnt)
  1525			return NIX_AF_ERR_PARAM;
  1526	
  1527		if (req->way_mask)
  1528			req->way_mask &= 0xFFFF;
  1529	
  1530		pfvf = rvu_get_pfvf(rvu, pcifunc);
  1531		blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
  1532		if (!pfvf->nixlf || blkaddr < 0)
  1533			return NIX_AF_ERR_AF_LF_INVALID;
  1534	
  1535		block = &hw->block[blkaddr];
  1536		nixlf = rvu_get_lf(rvu, block, pcifunc, 0);
  1537		if (nixlf < 0)
  1538			return NIX_AF_ERR_AF_LF_INVALID;
  1539	
  1540		/* Check if requested 'NIXLF <=> NPALF' mapping is valid */
  1541		if (req->npa_func) {
  1542			/* If default, use 'this' NIXLF's PFFUNC */
  1543			if (req->npa_func == RVU_DEFAULT_PF_FUNC)
  1544				req->npa_func = pcifunc;
  1545			if (!is_pffunc_map_valid(rvu, req->npa_func, BLKTYPE_NPA))
  1546				return NIX_AF_INVAL_NPA_PF_FUNC;
  1547		}
  1548	
  1549		/* Check if requested 'NIXLF <=> SSOLF' mapping is valid */
  1550		if (req->sso_func) {
  1551			/* If default, use 'this' NIXLF's PFFUNC */
  1552			if (req->sso_func == RVU_DEFAULT_PF_FUNC)
  1553				req->sso_func = pcifunc;
  1554			if (!is_pffunc_map_valid(rvu, req->sso_func, BLKTYPE_SSO))
  1555				return NIX_AF_INVAL_SSO_PF_FUNC;
  1556		}
  1557	
  1558		/* If RSS is being enabled, check if requested config is valid.
  1559		 * RSS table size should be power of two, otherwise
  1560		 * RSS_GRP::OFFSET + adder might go beyond that group or
  1561		 * won't be able to use entire table.
  1562		 */
  1563		if (req->rss_sz && (req->rss_sz > MAX_RSS_INDIR_TBL_SIZE ||
  1564				    !is_power_of_2(req->rss_sz)))
  1565			return NIX_AF_ERR_RSS_SIZE_INVALID;
  1566	
  1567		if (req->rss_sz &&
  1568		    (!req->rss_grps || req->rss_grps > MAX_RSS_GROUPS))
  1569			return NIX_AF_ERR_RSS_GRPS_INVALID;
  1570	
  1571		/* Reset this NIX LF */
  1572		rc = rvu_lf_reset(rvu, block, nixlf);
  1573		if (rc) {
  1574			dev_err(rvu->dev, "Failed to reset NIX%d LF%d\n",
  1575				block->addr - BLKADDR_NIX0, nixlf);
  1576			return NIX_AF_ERR_LF_RESET;
  1577		}
  1578	
  1579		ctx_cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST3);
  1580	
  1581		/* Alloc NIX RQ HW context memory and config the base */
  1582		hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
  1583		rc = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
  1584		if (rc)
  1585			goto free_mem;
  1586	
  1587		pfvf->rq_bmap = kcalloc(req->rq_cnt, sizeof(long), GFP_KERNEL);
  1588		if (!pfvf->rq_bmap) {
  1589			rc = -ENOMEM;
  1590			goto free_mem;
  1591		}
  1592	
  1593		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_BASE(nixlf),
  1594			    (u64)pfvf->rq_ctx->iova);
  1595	
  1596		/* Set caching and queue count in HW */
  1597		cfg = BIT_ULL(36) | (req->rq_cnt - 1) | req->way_mask << 20;
  1598		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_CFG(nixlf), cfg);
  1599	
  1600		/* Alloc NIX SQ HW context memory and config the base */
  1601		hwctx_size = 1UL << (ctx_cfg & 0xF);
  1602		rc = qmem_alloc(rvu->dev, &pfvf->sq_ctx, req->sq_cnt, hwctx_size);
  1603		if (rc)
  1604			goto free_mem;
  1605	
  1606		pfvf->sq_bmap = kcalloc(req->sq_cnt, sizeof(long), GFP_KERNEL);
  1607		if (!pfvf->sq_bmap) {
  1608			rc = -ENOMEM;
  1609			goto free_mem;
  1610		}
  1611	
  1612		rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_BASE(nixlf),
  1613			    (u64)pfvf->sq_ctx->iova);
  1614	
  1615		cfg = BIT_ULL(36) | (req->sq_cnt - 1) | req->way_mask << 20;
  1616		rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_CFG(nixlf), cfg);
  1617	
  1618		/* Alloc NIX CQ HW context memory and config the base */
  1619		hwctx_size = 1UL << ((ctx_cfg >> 8) & 0xF);
  1620		rc = qmem_alloc(rvu->dev, &pfvf->cq_ctx, req->cq_cnt, hwctx_size);
  1621		if (rc)
  1622			goto free_mem;
  1623	
  1624		pfvf->cq_bmap = kcalloc(req->cq_cnt, sizeof(long), GFP_KERNEL);
  1625		if (!pfvf->cq_bmap) {
  1626			rc = -ENOMEM;
  1627			goto free_mem;
  1628		}
  1629	
  1630		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_BASE(nixlf),
  1631			    (u64)pfvf->cq_ctx->iova);
  1632	
  1633		cfg = BIT_ULL(36) | (req->cq_cnt - 1) | req->way_mask << 20;
  1634		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_CFG(nixlf), cfg);
  1635	
  1636		/* Initialize receive side scaling (RSS) */
  1637		hwctx_size = 1UL << ((ctx_cfg >> 12) & 0xF);
  1638		rc = nixlf_rss_ctx_init(rvu, blkaddr, pfvf, nixlf, req->rss_sz,
  1639					req->rss_grps, hwctx_size, req->way_mask,
  1640					!!(req->flags & NIX_LF_RSS_TAG_LSB_AS_ADDER));
  1641		if (rc)
  1642			goto free_mem;
  1643	
  1644		/* Alloc memory for CQINT's HW contexts */
  1645		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1646		qints = (cfg >> 24) & 0xFFF;
  1647		hwctx_size = 1UL << ((ctx_cfg >> 24) & 0xF);
  1648		rc = qmem_alloc(rvu->dev, &pfvf->cq_ints_ctx, qints, hwctx_size);
  1649		if (rc)
  1650			goto free_mem;
  1651	
  1652		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_BASE(nixlf),
  1653			    (u64)pfvf->cq_ints_ctx->iova);
  1654	
  1655		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_CFG(nixlf),
  1656			    BIT_ULL(36) | req->way_mask << 20);
  1657	
  1658		/* Alloc memory for QINT's HW contexts */
  1659		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1660		qints = (cfg >> 12) & 0xFFF;
  1661		hwctx_size = 1UL << ((ctx_cfg >> 20) & 0xF);
  1662		rc = qmem_alloc(rvu->dev, &pfvf->nix_qints_ctx, qints, hwctx_size);
  1663		if (rc)
  1664			goto free_mem;
  1665	
  1666		rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_BASE(nixlf),
  1667			    (u64)pfvf->nix_qints_ctx->iova);
  1668		rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_CFG(nixlf),
  1669			    BIT_ULL(36) | req->way_mask << 20);
  1670	
  1671		/* Setup VLANX TPID's.
  1672		 * Use VLAN1 for 802.1Q
  1673		 * and VLAN0 for 802.1AD.
  1674		 */
  1675		cfg = (0x8100ULL << 16) | 0x88A8ULL;
  1676		rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG(nixlf), cfg);
  1677	
  1678		/* Enable LMTST for this NIX LF */
  1679		rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG2(nixlf), BIT_ULL(0));
  1680	
  1681		/* Set CQE/WQE size, NPA_PF_FUNC for SQBs and also SSO_PF_FUNC */
  1682		if (req->npa_func)
  1683			cfg = req->npa_func;
  1684		if (req->sso_func)
  1685			cfg |= (u64)req->sso_func << 16;
  1686	
  1687		cfg |= (u64)req->xqe_sz << 33;
  1688		rvu_write64(rvu, blkaddr, NIX_AF_LFX_CFG(nixlf), cfg);
  1689	
  1690		/* Config Rx pkt length, csum checks and apad  enable / disable */
  1691		rvu_write64(rvu, blkaddr, NIX_AF_LFX_RX_CFG(nixlf), req->rx_cfg);
  1692	
  1693		/* Configure pkind for TX parse config */
  1694		if (is_pf_cgxmapped(rvu, rvu_get_pf(rvu->pdev, pcifunc))) {
  1695			pf = rvu_get_pf(rvu->pdev, pcifunc);
> 1696			cgxd = rvu_cgx_pdata(cgx, rvu);
  1697	
  1698			mutex_lock(&cgxd->lock);
  1699			if (rvu_cgx_is_pkind_config_permitted(rvu, pcifunc)) {
  1700				cfg = NPC_TX_DEF_PKIND;
  1701				rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_PARSE_CFG(nixlf), cfg);
  1702			}
  1703			mutex_unlock(&cgxd->lock);
  1704		}
  1705	
  1706		if (is_rep_dev(rvu, pcifunc)) {
  1707			pfvf->tx_chan_base = RVU_SWITCH_LBK_CHAN;
  1708			pfvf->tx_chan_cnt = 1;
  1709			goto exit;
  1710		}
  1711	
  1712		intf = is_lbk_vf(rvu, pcifunc) ? NIX_INTF_TYPE_LBK : NIX_INTF_TYPE_CGX;
  1713		if (is_sdp_pfvf(rvu, pcifunc))
  1714			intf = NIX_INTF_TYPE_SDP;
  1715	
  1716		if (is_cn20k(rvu->pdev)) {
  1717			rc = npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &bcast, &mcast,
  1718							 &promisc, &ucast);
  1719			if (rc) {
  1720				rc = npc_cn20k_dft_rules_alloc(rvu, pcifunc);
  1721				if (rc)
  1722					goto free_mem;
  1723	
  1724				rules_created = true;
  1725			}
  1726		}
  1727	
  1728		rc = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
  1729					!!(req->flags & NIX_LF_LBK_BLK_SEL));
  1730		if (rc)
  1731			goto free_dft;
  1732	
  1733		/* Disable NPC entries as NIXLF's contexts are not initialized yet */
  1734		rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
  1735	
  1736		/* Configure RX VTAG Type 7 (strip) for vf vlan */
  1737		rvu_write64(rvu, blkaddr,
  1738			    NIX_AF_LFX_RX_VTAG_TYPEX(nixlf, NIX_AF_LFX_RX_VTAG_TYPE7),
  1739			    VTAGSIZE_T4 | VTAG_STRIP);
  1740	
  1741		goto exit;
  1742	
  1743	free_dft:
  1744		if (is_cn20k(rvu->pdev) && rules_created)
  1745			npc_cn20k_dft_rules_free(rvu, pcifunc);
  1746	
  1747	free_mem:
  1748		nix_ctx_free(rvu, pfvf);
  1749	
  1750	exit:
  1751		/* Set macaddr of this PF/VF */
  1752		ether_addr_copy(rsp->mac_addr, pfvf->mac_addr);
  1753	
  1754		/* set SQB size info */
  1755		cfg = rvu_read64(rvu, blkaddr, NIX_AF_SQ_CONST);
  1756		rsp->sqb_size = (cfg >> 34) & 0xFFFF;
  1757		rsp->rx_chan_base = pfvf->rx_chan_base;
  1758		rsp->tx_chan_base = pfvf->tx_chan_base;
  1759		rsp->rx_chan_cnt = pfvf->rx_chan_cnt;
  1760		rsp->tx_chan_cnt = pfvf->tx_chan_cnt;
  1761		rsp->lso_tsov4_idx = NIX_LSO_FORMAT_IDX_TSOV4;
  1762		rsp->lso_tsov6_idx = NIX_LSO_FORMAT_IDX_TSOV6;
  1763		/* Get HW supported stat count */
  1764		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1);
  1765		rsp->lf_rx_stats = ((cfg >> 32) & 0xFF);
  1766		rsp->lf_tx_stats = ((cfg >> 24) & 0xFF);
  1767		/* Get count of CQ IRQs and error IRQs supported per LF */
  1768		cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
  1769		rsp->qints = ((cfg >> 12) & 0xFFF);
  1770		rsp->cints = ((cfg >> 24) & 0xFFF);
  1771		rsp->cgx_links = hw->cgx_links;
  1772		rsp->lbk_links = hw->lbk_links;
  1773		rsp->sdp_links = hw->sdp_links;
  1774	
  1775		return rc;
  1776	}
  1777	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
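Both warnings point to the same fix: `pf` is computed but never used, and `cgx` is read before anything assigns it. One plausible fix is to derive the CGX id from the PF before calling `rvu_cgx_pdata()`. The userspace sketch below assumes that a PF-to-CGX/LMAC map entry packs the CGX id in the high nibble and the LMAC id in the low nibble, modelled on `rvu_get_cgx_lmac_id()` in the RVU AF driver; check the driver before relying on this. The helpers `get_cgx_lmac_id()` and `cgx_for_pf()` are hypothetical stand-ins, and the patch author may resolve the warnings differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing: CGX id in bits 7..4, LMAC id in bits 3..0. */
static void get_cgx_lmac_id(uint8_t map, uint8_t *cgx_id, uint8_t *lmac_id)
{
	*cgx_id = (map >> 4) & 0xF;
	*lmac_id = map & 0xF;
}

/* Derive 'cgx' from 'pf' before use, so 'pf' is actually read and
 * 'cgx' is always initialized. */
static uint8_t cgx_for_pf(const uint8_t *pf2cgxlmac_map, int pf)
{
	uint8_t cgx, lmac;

	get_cgx_lmac_id(pf2cgxlmac_map[pf], &cgx, &lmac);
	(void)lmac;		/* only the CGX id is needed here */
	return cgx;
}
```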

* Re: [PATCH 6.12.y] net: add missing ns_capable check for peer netns
From: Greg KH @ 2026-06-25 11:37 UTC (permalink / raw)
  To: Maximilian Heyne
  Cc: stable, Marc Kleine-Budde, Vincent Mailhol, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Daniel Borkmann, Nikolay Aleksandrov, Eric W. Biederman,
	linux-can, netdev, linux-kernel, bpf
In-Reply-To: <20260617-pats-coif-316245c6@mheyne-amazon>

On Wed, Jun 17, 2026 at 08:25:31AM +0000, Maximilian Heyne wrote:
> The upstream commit 7b735ef81286 ("rtnetlink: add missing
> netlink_ns_capable() check for peer netns") doesn't apply on older
> stable kernels due to refactoring. Therefore, this patch is an attempt
> to implement the same capability check just directly in the respective
> interface types.

Why can't we take the full series of patches instead?  Otherwise this is
going to be a pain over time for any other fixes/updates in this area,
right?

And if not, then we need acks from the maintainers here...

thanks,

greg k-h

* [PATCH iproute2-next v5] ip/bond: add lacp_strict support
From: Louis Scalbert @ 2026-06-25 11:42 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
	maheshb, jonas.gorski, horms, stephen, Louis Scalbert

lacp_strict defines the behavior of a LACP bonding interface
when no slaves are in Collecting_Distributing state while at least
'min_links' slaves have carrier.

In the default (off) mode, the bonding master remains up and a
single slave is selected for TX/RX, while traffic received on other
slaves is dropped. This preserves the existing behavior.

In lacp_strict mode, the bonding master reports carrier down in this
situation.

Link: https://lore.kernel.org/netdev/20260603150331.1919611-1-louis.scalbert@6wind.com/
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
 include/uapi/linux/if_link.h |  1 +
 ip/iplink_bond.c             | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 70aee114..d3a21fba 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1601,6 +1601,7 @@ enum {
 	IFLA_BOND_NS_IP6_TARGET,
 	IFLA_BOND_COUPLED_CONTROL,
 	IFLA_BOND_BROADCAST_NEIGH,
+	IFLA_BOND_LACP_STRICT,
 	__IFLA_BOND_MAX,
 };
 
diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index 714fe7bd..7e2e397a 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -87,6 +87,12 @@ static const char *lacp_rate_tbl[] = {
 	NULL,
 };
 
+static const char *lacp_strict_tbl[] = {
+	"off",
+	"on",
+	NULL,
+};
+
 static const char *ad_select_tbl[] = {
 	"stable",
 	"bandwidth",
@@ -155,6 +161,7 @@ static void print_explain(FILE *f)
 		"                [ ad_user_port_key PORTKEY ]\n"
 		"                [ ad_actor_sys_prio SYSPRIO ]\n"
 		"                [ ad_actor_system LLADDR ]\n"
+		"                [ lacp_strict LACP_STRICT ]\n"
 		"                [ arp_missed_max MISSED_MAX ]\n"
 		"\n"
 		"BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb\n"
@@ -168,6 +175,7 @@ static void print_explain(FILE *f)
 		"AD_SELECT := stable|bandwidth|count\n"
 		"COUPLED_CONTROL := off|on\n"
 		"BROADCAST_NEIGHBOR := off|on\n"
+		"LACP_STRICT := off|on\n"
 	);
 }
 
@@ -188,6 +196,7 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u32 packets_per_slave;
 	__u8 missed_max;
 	__u8 broadcast_neighbor;
+	__u8 lacp_strict;
 	unsigned int ifindex;
 	int ret;
 
@@ -417,6 +426,13 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 				return -1;
 			addattr_l(n, 1024, IFLA_BOND_AD_ACTOR_SYSTEM,
 				  abuf, len);
+		} else if (matches(*argv, "lacp_strict") == 0) {
+			NEXT_ARG();
+			lacp_strict = parse_on_off("lacp_strict", *argv, &ret);
+			if (ret)
+				return ret;
+			lacp_strict = get_index(lacp_strict_tbl, *argv);
+			addattr8(n, 1024, IFLA_BOND_LACP_STRICT, lacp_strict);
 		} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
 			NEXT_ARG();
 			if (get_u8(&tlb_dynamic_lb, *argv, 0)) {
@@ -642,6 +658,10 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 			   "all_slaves_active %u ",
 			   rta_getattr_u8(tb[IFLA_BOND_ALL_SLAVES_ACTIVE]));
 
+	if (tb[IFLA_BOND_LACP_STRICT])
+		print_on_off(PRINT_ANY, "lacp_strict", "lacp_strict %s ",
+			     rta_getattr_u8(tb[IFLA_BOND_LACP_STRICT]));
+
 	if (tb[IFLA_BOND_MIN_LINKS])
 		print_uint(PRINT_ANY,
 			   "min_links",
-- 
2.39.2


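The carrier rule described in the lacp_strict commit message can be modelled as a small pure function. This is a minimal sketch, not the bonding driver's code: `lacp_master_carrier()` and its parameters are hypothetical. Only the branch with no slave in Collecting_Distributing comes from the patch description. The other two branches are simplifying assumptions.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the 802.3ad bonding master's carrier.
 * Only the n_coll_dist == 0 branch reflects the lacp_strict semantics
 * from the commit message; the other branches are assumptions.
 */
static bool lacp_master_carrier(bool lacp_strict, unsigned int n_coll_dist,
				unsigned int n_carrier, unsigned int min_links)
{
	/* Assumption: fewer than min_links slaves with carrier means down. */
	if (n_carrier < min_links)
		return false;

	/* No slave in Collecting_Distributing, but enough have carrier. */
	if (n_coll_dist == 0)
		return !lacp_strict;	/* off: stay up on one slave; on: down */

	/* Assumption: at least one negotiated port keeps the master up. */
	return true;
}
```

With a kernel and an iproute2 that both carry this support, the option would presumably be toggled with something like `ip link set dev bond0 type bond lacp_strict on`, following the syntax the patch adds to print_explain().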

* Re: [PATCH net 3/4] vlan: defer real device state propagation to netdev_work
From: Nicolai Buchwitz @ 2026-06-25 11:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jv, sdf,
	dongchenchen2, idosch, n05ec, yuantan098, kuniyu,
	aleksandr.loktionov, dtatulea, syzbot+09da62a8b78959ceb8bb,
	syzbot+cb67c392b0b8f0fd0fc1, syzbot+9bb8bd77f3966641f298
In-Reply-To: <20260624182018.2445732-4-kuba@kernel.org>

On 24.6.2026 20:20, Jakub Kicinski wrote:
> vlan_device_event() generates nested UP/DOWN, MTU and feature
> change events. It executes an event for the VLAN device directly
> from the notifier - while the locks of the lower device are held.
> 
> This causes deadlocks, for example:
> 
>   bond    (3) bond_update_speed_duplex(vlan)
>     |           ^                v
>   vlan    (2) UP(vlan)    (4) vlan_ethtool_get_link_ksettings()
>     |           ^                v
>   dummy   (1) UP(dummy)   (5) __ethtool_get_link_ksettings()
> 
> The dummy device is ops locked, vlan creates a nested event (2),
> then bond wants to ask vlan for link state (3). bond uses the
> "I'm already holding the instance lock" flavor of API. But in
> this case the lock held refers to vlan itself. We hit vlan's
> link settings trampoline (4) and call __ethtool_get_link_ksettings()
> which tries to lock dummy. Deadlock. There's no clean way for us
> to tell the vlan_ethtool_get_link_ksettings() that the caller
> is already in lower device's critical section.
> 
> Defer the propagation to the per-netdev work facility instead:
> the notifier only schedules netdev_work_sched(vlandev, VLAN_WORK_*),
> and ndo_work (vlan_dev_work) applies the change later. Hopefully
> nobody expects the VLAN state changes to be instantaneous.
> 
> If someone does expect the changes to be instantaneous we will
> have to do the same thing Stan did for rx_mode and "strategically"
> place sync calls, to make sure such delayed works are executed
> after we drop the ops lock but before we drop rtnl_lock.
> 
> Stan suggests that if we need that down the line we may
> consider reshaping the mechanism into "async notifications".
> AFAICT only vlan does this sort of netdev open chaining,
> so as a first try I think that sticking the complexity into
> the vlan code makes sense.
> 
> One corner case is that we need to cancel the event if user
> explicitly changes the state before work could run. Consider
> the following operations with vlan0 on top of dummy0:
> 
>   ip link set dev dummy0 up    # queues work to up vlan0
>   ip link set dev vlan0 down   # user explicitly downs the vlan
>   ndo_work                     # acts on the stale event
> 
> Reported-by: syzbot+09da62a8b78959ceb8bb@syzkaller.appspotmail.com
> Reported-by: syzbot+cb67c392b0b8f0fd0fc1@syzkaller.appspotmail.com
> Reported-by: syzbot+9bb8bd77f3966641f298@syzkaller.appspotmail.com
> Fixes: 9f275c2e9020 ("net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---

> [...]

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>

Thanks
Nicolai

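The defer-and-cancel scheme in the quoted commit message can be sketched as a single-threaded userspace model. It is not the vlan code. `struct vdev`, the `VWORK_*` bits and the helper names are hypothetical stand-ins for `netdev_work_sched()`, `VLAN_WORK_*` and `vlan_dev_work()`, and all locking is omitted.

```c
#include <stdbool.h>

/* Hypothetical pending-work bits, standing in for VLAN_WORK_*. */
#define VWORK_UP	0x1u
#define VWORK_DOWN	0x2u

struct vdev {
	bool up;		/* current administrative state of the VLAN */
	unsigned int pending;	/* events queued by the lower-device notifier */
};

/* Notifier context: only record the event, never act on it directly. */
static void lower_dev_event(struct vdev *v, bool lower_up)
{
	v->pending &= ~(VWORK_UP | VWORK_DOWN);	/* latest event wins */
	v->pending |= lower_up ? VWORK_UP : VWORK_DOWN;
}

/* Explicit user action: apply it now and cancel any stale queued event. */
static void user_set_state(struct vdev *v, bool up)
{
	v->pending &= ~(VWORK_UP | VWORK_DOWN);
	v->up = up;
}

/* Deferred work: runs later, outside the lower device's critical section. */
static void vdev_work(struct vdev *v)
{
	unsigned int todo = v->pending;

	v->pending = 0;
	if (todo & VWORK_UP)
		v->up = true;
	else if (todo & VWORK_DOWN)
		v->up = false;
}
```

Replaying the sequence from the commit message (`lower_dev_event(&v, true)`, then `user_set_state(&v, false)`, then `vdev_work(&v)`) leaves `v.up` false. The explicit down cleared the queued UP event before the work ran.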

* [PATCH net] nfc: clear active_target when the target list is replaced
From: Yinhao Hu @ 2026-06-25 11:18 UTC (permalink / raw)
  To: netdev
  Cc: david, davem, edumazet, kuba, pabeni, horms, dzm91,
	hust-os-kernel-patches, Yinhao Hu

nfc_activate_target() and nfc_dep_link_up() cache dev->active_target as a
raw pointer into the dev->targets array. When a later poll reports new
targets, nfc_targets_found() frees and replaces dev->targets but does not
clear dev->active_target, so the cached pointer is left dangling into
freed memory. Any subsequent NFC core path that dereferences
dev->active_target->idx, such as nfc_deactivate_target() or
nfc_data_exchange(), then reads freed memory.

When nfc_targets_found() is about to free the current target array, clear
dev->active_target if it points into that array, and tear down the
associated active state (stop the presence-check timer, drop the DEP link
and reset the RF mode) as nfc_deactivate_target() does.

Fixes: 900994332675 ("NFC: Cache the core NFC active target pointer instead of its index")
Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
---
 net/nfc/core.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/net/nfc/core.c b/net/nfc/core.c
index a92a6566e6a0..950807906645 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -786,6 +786,21 @@ int nfc_targets_found(struct nfc_dev *dev,

 	dev->targets_generation++;

+	if (dev->active_target && dev->targets) {
+		for (i = 0; i < dev->n_targets; i++) {
+			if (dev->active_target != &dev->targets[i])
+				continue;
+
+			if (dev->ops->check_presence)
+				timer_delete_sync(&dev->check_pres_timer);
+
+			dev->active_target = NULL;
+			dev->dep_link_up = false;
+			dev->rf_mode = NFC_RF_NONE;
+			break;
+		}
+	}
+
 	kfree(dev->targets);
 	dev->targets = NULL;

-- 
2.43.0

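The NFC fix restores an invariant: a cached pointer must not outlive the array it points into. A small userspace model can show this. The types and the `replace_targets()` helper are hypothetical simplifications of `struct nfc_dev` and `nfc_targets_found()`. The presence-check timer, DEP link and RF mode teardown are reduced to a comment.

```c
#include <stdlib.h>
#include <string.h>

struct target {
	unsigned int idx;
};

struct dev {
	struct target *targets;		/* owned array, replaced on each poll */
	int n_targets;
	struct target *active_target;	/* raw pointer into targets[], or NULL */
};

/* Replace the target list; returns 0 on success, -1 on allocation failure. */
static int replace_targets(struct dev *dev, const struct target *found, int n)
{
	struct target *copy = NULL;
	int i;

	if (n > 0) {
		copy = malloc(n * sizeof(*copy));
		if (!copy)
			return -1;	/* old array kept, cached pointer still valid */
		memcpy(copy, found, n * sizeof(*copy));
	}

	/* Drop the cached pointer if it points into the array being freed. */
	for (i = 0; dev->active_target && i < dev->n_targets; i++) {
		if (dev->active_target == &dev->targets[i]) {
			/* The real fix also stops check_pres_timer, clears
			 * dep_link_up and resets rf_mode at this point. */
			dev->active_target = NULL;
			break;
		}
	}

	free(dev->targets);
	dev->targets = copy;
	dev->n_targets = n;
	return 0;
}
```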

* Re: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
From: Marcin Szycik @ 2026-06-25 11:07 UTC (permalink / raw)
  To: Florian Bezdeka, Kwapulinski, Piotr, Ding Meng, Nguyen, Anthony L,
	Kitszel, Przemyslaw, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	Kiszka, Jan
  Cc: intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, wq.wang@siemens.com
In-Reply-To: <d058b0fa9ad923514084a44f51c78ae8355c4ebb.camel@siemens.com>



On 24/06/2026 11:05, Florian Bezdeka via Intel-wired-lan wrote:
> On Tue, 2026-06-23 at 09:46 +0000, Kwapulinski, Piotr wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ding Meng via Intel-wired-lan
>>> Sent: Monday, June 22, 2026 6:13 AM
>>> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Kiszka, Jan <jan.kiszka@siemens.com>; Bezdeka, Florian <florian.bezdeka@siemens.com>
>>> Cc: intel-wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; meng.ding@siemens.com; wq.wang@siemens.com
>>> Subject: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
>>>
>>> When CONFIG_NET_RX_BUSY_POLL is deactivated, fetching RX HW timestamps from the NIC no longer works as expected.
>>>
>>> This occurs because disabling CONFIG_NET_RX_BUSY_POLL disables the SKB NAPI mapping in __skb_mark_napi_id(). Consequently, get_timestamp() fails to perform its driver lookup, and the igc driver's struct net_device_ops::ndo_get_tstamp is never invoked.
>>>
>>> Instead, get_timestamp() falls back to use shhwtstamps(skb)->hwtstamp, a field that the driver has not populated.
>>>
>>> Fix this by populating the hwtstamp field with the correct timestamp in the default timer when CONFIG_NET_RX_BUSY_POLL is disabled.
>>>
>>> Fixes: 069b142f5819 ("igc: Add support for PTP .getcyclesx64()")
>>> Co-developed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Ding Meng <meng.ding@siemens.com>
>>> ---
>>> drivers/net/ethernet/intel/igc/igc_main.c | 38 ++++++++++++++++-------
>>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>>> index 8ac16808023..1da8d7aa76d 100644
>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>> @@ -1992,7 +1992,26 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
>>> 	return skb;
>>> }
>>>
>>> -static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
>>> +static void igc_construct_skb_timestamps(struct igc_adapter *adapter,
>>> +					 struct sk_buff *skb,
>>> +					 struct igc_xdp_buff *ctx)
>>> +{
>>> +	if (!ctx->rx_ts)
>>> +		return;
>>> +#ifdef CONFIG_NET_RX_BUSY_POLL
>>> +	skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
>>> +	skb_hwtstamps(skb)->netdev_data = ctx->rx_ts; #else
>>> +	struct igc_inline_rx_tstamps *tstamps;
>> Please move at the top of the function and add:
> 
> That would trigger a "unused variable" warning in the
> CONFIG_NET_RX_BUSY_POLL case.

Put it under #ifndef CONFIG_NET_RX_BUSY_POLL. Variable declarations
need to be on top.

Thanks,
Marcin

> Btw: I was really confused that the #else statement moved to the end of
> the previous line. Might someone be using a wrongly configured mail
> client here?
> 
> Florian
> 
>> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
>>
>>> +
>>> +	tstamps = ctx->rx_ts;
>>> +	skb_hwtstamps(skb)->hwtstamp = igc_ptp_rx_pktstamp(adapter,
>>> +							   tstamps->timer0);
>>> +#endif
>>> +}
>>> +
> 
> [snip]


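Marcin suggests declaring the variable at the top of the function, but only in the configuration that uses it. A compilable userspace model can illustrate this. Everything here is a hypothetical stand-in. The structs mimic `struct sk_buff` and `struct igc_xdp_buff`, and the raw `timer0` value is copied where the driver would call `igc_ptp_rx_pktstamp()`. Leaving `CONFIG_NET_RX_BUSY_POLL` undefined selects the fallback path. Note that `#else` stays on its own line.

```c
#include <stdint.h>

struct inline_rx_tstamps {
	uint64_t timer0;		/* raw timestamp written by the NIC */
};

struct model_skb {
	uint64_t hwtstamp;		/* stands in for skb_hwtstamps(skb)->hwtstamp */
	void *netdev_data;		/* stands in for ->netdev_data */
	unsigned int tx_flags;
};

struct model_xdp_buff {
	void *rx_ts;			/* NULL when no timestamp was captured */
};

static void construct_skb_timestamps(struct model_skb *skb,
				     struct model_xdp_buff *ctx)
{
#ifndef CONFIG_NET_RX_BUSY_POLL
	/* Declared first, and only where it is used, so neither
	 * configuration warns about an unused variable. */
	struct inline_rx_tstamps *tstamps;
#endif

	if (!ctx->rx_ts)
		return;
#ifdef CONFIG_NET_RX_BUSY_POLL
	/* Leave the lookup to ndo_get_tstamp via the NAPI mapping. */
	skb->tx_flags |= 1u;		/* stands in for SKBTX_HW_TSTAMP_NETDEV */
	skb->netdev_data = ctx->rx_ts;
#else
	/* No NAPI mapping: fill in the timestamp directly. */
	tstamps = ctx->rx_ts;
	skb->hwtstamp = tstamps->timer0;	/* driver: igc_ptp_rx_pktstamp() */
#endif
}
```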

* [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

For normal live cgroup_skb paths, the skb should already be routed. The
exception is test runs via BPF_PROG_TEST_RUN, where packets are built by
bpf_prog_test_run_skb(). Those skbs lack a dst route, so icmp_send()
would silently fail by returning early.

This test exercises that path and checks that the kfunc returns
-ENETUNREACH.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bb532aa0d158..ffaf0fe1880b 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -169,6 +169,29 @@ static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
 	}
 }

+static void run_icmp_no_route_test(struct icmp_send *skel)
+{
+	struct ipv4_packet pkt = pkt_v4;
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt,
+		.data_size_in = sizeof(pkt),
+	);
+	int err;
+
+	pkt.iph.version = 4;
+	pkt.iph.daddr = inet_addr("127.0.0.1");
+	pkt.tcp.dest = htons(80);
+	skel->bss->server_port = 80;
+	skel->bss->unreach_type = ICMP_DEST_UNREACH;
+	skel->bss->unreach_code = ICMP_HOST_UNREACH;
+	skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.egress), &opts);
+	if (!ASSERT_OK(err, "test_run"))
+		return;
+	ASSERT_EQ(skel->data->kfunc_ret, -ENETUNREACH, "kfunc_ret_no_route");
+}
+
 void test_icmp_send_unreach_cgroup(void)
 {
 	struct icmp_send *skel;
@@ -193,6 +216,9 @@ void test_icmp_send_unreach_cgroup(void)
 	if (test__start_subtest("ipv6"))
 		run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);

+	if (test__start_subtest("no_route"))
+		run_icmp_no_route_test(skel);
+
 cleanup:
 	icmp_send__destroy(skel);
 	if (cgroup_fd >= 0)
--
2.34.1


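The behavior this test pins down can be modelled as a guard in front of the send path. This is a hypothetical sketch, not the kfunc's implementation. `struct pkt`, `has_dst` and `icmp_send_model()` are invented for illustration. Only one behavior comes from the commit message: a packet without a dst route must yield -ENETUNREACH rather than fail silently.

```c
#include <errno.h>
#include <stdbool.h>

struct pkt {
	bool has_dst;		/* stands in for an attached dst route */
};

static int sent;		/* counts ICMP errors that would have been emitted */

/* Hypothetical model: report the missing route instead of failing quietly. */
static int icmp_send_model(const struct pkt *p)
{
	if (!p->has_dst)
		return -ENETUNREACH;	/* e.g. skbs built by BPF_PROG_TEST_RUN */
	sent++;
	return 0;
}
```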

* [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
  To: bpf
  Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
	yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
	Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>

This test is similar to test_icmp_send_unreach_cgroup but checks the
recursion case: when the BPF program calling the kfunc is re-triggered
by the icmp_send() issued by that same kfunc, the nested kfunc call
stops early and returns -EBUSY.

The test attaches to the root cgroup to ensure the ICMP packet generated
by the kfunc re-triggers the BPF program.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
 .../bpf/prog_tests/icmp_send_kfunc.c          | 46 ++++++++++++++++
 tools/testing/selftests/bpf/progs/icmp_send.c | 55 +++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bbb3c3d4509c..bb532aa0d158 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <cgroup_helpers.h>
 #include <linux/errqueue.h>
 #include <poll.h>
+#include <unistd.h>
 #include "icmp_send.skel.h"

 #define TIMEOUT_MS 1000
@@ -10,6 +12,7 @@
 #define ICMP_DEST_UNREACH 3
 #define ICMPV6_DEST_UNREACH 1

+#define ICMP_HOST_UNREACH 1
 #define ICMP_FRAG_NEEDED 4
 #define NR_ICMP_UNREACH 15
 #define ICMPV6_REJECT_ROUTE 6
@@ -195,3 +198,46 @@ void test_icmp_send_unreach_cgroup(void)
 	if (cgroup_fd >= 0)
 		close(cgroup_fd);
 }
+
+void test_icmp_send_unreach_recursion(void)
+{
+	struct icmp_send *skel;
+	int cgroup_fd = -1;
+	int err;
+
+	err = setup_cgroup_environment();
+	if (!ASSERT_OK(err, "setup_cgroup_environment"))
+		return;
+
+	skel = icmp_send__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	cgroup_fd = get_root_cgroup();
+	if (!ASSERT_OK_FD(cgroup_fd, "get_root_cgroup"))
+		goto cleanup;
+
+	skel->data->target_pid = getpid();
+	skel->links.recursion =
+		bpf_program__attach_cgroup(skel->progs.recursion, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.recursion, "prog_attach_cgroup"))
+		goto cleanup;
+
+	trigger_prog_read_icmp_errqueue(skel, ICMP_HOST_UNREACH, AF_INET,
+					"127.0.0.1");
+
+	/*
+	 * Because of the recursion, the nested (second) call completes first
+	 * and stores -EBUSY at index 0, while the outer (first) call completes
+	 * last and stores 0 at index 1.
+	 */
+	ASSERT_EQ(skel->bss->rec_count, 2, "rec_count");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[0], -EBUSY, "kfunc_rets[0]");
+	ASSERT_EQ(skel->data->rec_kfunc_rets[1], 0, "kfunc_rets[1]");
+
+cleanup:
+	icmp_send__destroy(skel);
+	if (cgroup_fd >= 0)
+		close(cgroup_fd);
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6e1ba539eeb0..c642ccdf9fd5 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -12,6 +12,10 @@ __u16 server_port = 0;
 int unreach_type = 0;
 int unreach_code = 0;
 int kfunc_ret = -1;
+int target_pid = -1;
+
+unsigned int rec_count = 0;
+int rec_kfunc_rets[] = { -1, -1 };

 SEC("cgroup_skb/egress")
 int egress(struct __sk_buff *skb)
@@ -65,4 +69,55 @@ int egress(struct __sk_buff *skb)
 	return SK_DROP;
 }

+SEC("cgroup_skb/egress")
+int recursion(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+	struct icmphdr *icmph;
+	struct tcphdr *tcph;
+	struct iphdr *iph;
+	int ret;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return SK_PASS;
+
+	iph = data;
+	if ((void *)(iph + 1) > data_end || iph->version != 4)
+		return SK_PASS;
+
+	if (iph->daddr != bpf_htonl(SERVER_IP))
+		return SK_PASS;
+
+	if (iph->protocol == IPPROTO_TCP) {
+		tcph = (void *)iph + iph->ihl * 4;
+		if ((void *)(tcph + 1) > data_end ||
+		    tcph->dest != bpf_htons(server_port))
+			return SK_PASS;
+	} else if (iph->protocol == IPPROTO_ICMP) {
+		icmph = (void *)iph + iph->ihl * 4;
+		if ((void *)(icmph + 1) > data_end ||
+		    icmph->type != unreach_type || icmph->code != unreach_code)
+			return SK_PASS;
+	} else {
+		return SK_PASS;
+	}
+
+	/*
+	 * This call will provoke a recursion: the ICMP packet generated by the
+	 * kfunc will re-trigger this program since we are in the root cgroup to
+	 * which the kernel ICMP socket belongs. However, when re-entered, the
+	 * kfunc should return -EBUSY.
+	 */
+	ret = bpf_icmp_send(skb, unreach_type, unreach_code);
+	rec_kfunc_rets[rec_count & 1] = ret;
+	__sync_fetch_and_add(&rec_count, 1);
+
+	/* Let the first ICMP error message pass */
+	if (iph->protocol == IPPROTO_ICMP)
+		return SK_PASS;
+
+	return SK_DROP;
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1


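The ordering asserted in test_icmp_send_unreach_recursion() follows from the nesting: the inner invocation finishes before the outer one. The userspace model below reproduces it. The guard flag and function names are hypothetical, since the kfunc's actual recursion protection is not part of this patch. The model "send" re-enters the program once, as the kernel ICMP socket's egress does in the test.

```c
#include <errno.h>
#include <stdbool.h>

static bool in_send;			/* hypothetical recursion guard */
static unsigned int rec_count;
static int rec_kfunc_rets[2] = { -1, -1 };

static void prog(void);

/* Model of the kfunc: refuse to nest; otherwise "send" an ICMP error,
 * which re-triggers the egress program. */
static int icmp_send_model(void)
{
	if (in_send)
		return -EBUSY;
	in_send = true;
	prog();				/* the generated ICMP packet hits egress */
	in_send = false;
	return 0;
}

/* Model of the BPF program: same indexing as the selftest's program. */
static void prog(void)
{
	int ret = icmp_send_model();

	rec_kfunc_rets[rec_count & 1] = ret;
	rec_count++;
}
```

Calling `prog()` once records -EBUSY at index 0 (the nested call, which returns first) and 0 at index 1 (the outer call), leaving `rec_count` at 2. These are the values the selftest asserts.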
