* Re: [PATCH] vhost/net: fix clear_user start address in VHOST_GET_FEATURES_ARRAY
From: Eugenio Perez Martin @ 2026-06-25 13:56 UTC (permalink / raw)
To: rom.wang
Cc: Michael S . Tsirkin, Jason Wang, Paolo Abeni, kvm, virtualization,
netdev, linux-kernel, Yufeng Wang
In-Reply-To: <CAJaqyWcFm0A5ucL5TLP8+T8JNOiZyaL-_mb747_fKhH9Qm83ig@mail.gmail.com>
On Thu, Jun 25, 2026 at 3:48 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, May 26, 2026 at 10:04 AM rom.wang <r4o5m6e8o@163.com> wrote:
> >
> > From: Yufeng Wang <wangyufeng@kylinos.cn>
> >
> > The clear_user() call in VHOST_GET_FEATURES_ARRAY incorrectly starts
> > at argp, which is the beginning of the features array, overwriting the
> > data just written by copy_to_user(). It should start after the copied
> > elements at argp + copied * sizeof(u64) to only zero the trailing
> > unused space.
> >
> > Fixes: 333c515d1896 ("vhost-net: allow configuring extended features")
> > Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
> > ---
> > drivers/vhost/net.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index db341c922673..70c578acf840 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -1777,7 +1777,8 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> > return -EFAULT;
> >
> > /* Zero the trailing space provided by user-space, if any */
> > - if (clear_user(argp, size_mul(count - copied, sizeof(u64))))
> > + if (clear_user(argp + copied * sizeof(u64),
> > + size_mul(count - copied, sizeof(u64))))
>
> The fix looks good to me, but why not use the size_mul() helper for the
> copied * sizeof(u64) multiplication?
>
Also, could you add a new switch to tools/virtio/vhost_net_test.c to
use the VHOST_GET_FEATURES_ARRAY and VHOST_SET_FEATURES_ARRAY instead
of VHOST_GET_FEATURES and VHOST_SET_FEATURES?
> > return -EFAULT;
> > return 0;
> > case VHOST_SET_FEATURES_ARRAY:
> > --
> > 2.34.1
> >
> >
^ permalink raw reply
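The pointer arithmetic discussed in this patch can be modelled in userspace. The following is a minimal sketch, not the kernel code: memcpy() and memset() stand in for copy_to_user() and clear_user(), and the function names and the way copied is derived (the smaller of count and the number of available feature words) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of the tail handling in VHOST_GET_FEATURES_ARRAY.
 * dst stands in for the user buffer behind argp. Hypothetical helpers.
 */
void fill_features_fixed(uint8_t *dst, size_t count,
                         const uint64_t *features, size_t nfeatures)
{
    size_t copied = count < nfeatures ? count : nfeatures;

    assert(dst != NULL && features != NULL);

    /* Models copy_to_user(): write the first `copied` elements. */
    memcpy(dst, features, copied * sizeof(uint64_t));

    /* Models the fixed clear_user(): start after the copied elements. */
    memset(dst + copied * sizeof(uint64_t), 0,
           (count - copied) * sizeof(uint64_t));
}

void fill_features_buggy(uint8_t *dst, size_t count,
                         const uint64_t *features, size_t nfeatures)
{
    size_t copied = count < nfeatures ? count : nfeatures;

    assert(dst != NULL && features != NULL);

    memcpy(dst, features, copied * sizeof(uint64_t));

    /*
     * Models the original clear_user(): it starts at dst, so it
     * overwrites copied data and leaves the real tail untouched.
     */
    memset(dst, 0, (count - copied) * sizeof(uint64_t));
}
```

With count = 4 and two feature words, the fixed variant leaves the two words followed by two zeroed slots, while the buggy variant zeroes the two copied words and never touches the trailing slots.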
* Re: [PATCH] vhost/net: fix clear_user start address in VHOST_GET_FEATURES_ARRAY
From: Eugenio Perez Martin @ 2026-06-25 13:48 UTC (permalink / raw)
To: rom.wang
Cc: Michael S . Tsirkin, Jason Wang, Paolo Abeni, kvm, virtualization,
netdev, linux-kernel, Yufeng Wang
In-Reply-To: <20260526080336.61296-1-r4o5m6e8o@163.com>
On Tue, May 26, 2026 at 10:04 AM rom.wang <r4o5m6e8o@163.com> wrote:
>
> From: Yufeng Wang <wangyufeng@kylinos.cn>
>
> The clear_user() call in VHOST_GET_FEATURES_ARRAY incorrectly starts
> at argp, which is the beginning of the features array, overwriting the
> data just written by copy_to_user(). It should start after the copied
> elements at argp + copied * sizeof(u64) to only zero the trailing
> unused space.
>
> Fixes: 333c515d1896 ("vhost-net: allow configuring extended features")
> Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
> ---
> drivers/vhost/net.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index db341c922673..70c578acf840 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1777,7 +1777,8 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> return -EFAULT;
>
> /* Zero the trailing space provided by user-space, if any */
> - if (clear_user(argp, size_mul(count - copied, sizeof(u64))))
> + if (clear_user(argp + copied * sizeof(u64),
> + size_mul(count - copied, sizeof(u64))))
The fix looks good to me, but why not use the size_mul() helper for the
copied * sizeof(u64) multiplication?
> return -EFAULT;
> return 0;
> case VHOST_SET_FEATURES_ARRAY:
> --
> 2.34.1
>
>
^ permalink raw reply
* Re: [PATCH v2 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: Jason Xing @ 2026-06-25 13:40 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
tushar.vyavahare
In-Reply-To: <20260625115215.1101928-1-maciej.fijalkowski@intel.com>
On Thu, Jun 25, 2026 at 7:52 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
>
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
>
> Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
> and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
> address generation, preserving the BIDIRECTIONAL test's intent of using
> the proper logical UMEM half after the direction switch.
>
> Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
> Reviewed-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
> Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Oh, you've already pushed a v2 patch, so again:
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Thanks!
^ permalink raw reply
* Re: [PATCH net-next] selftests/xsk: preserve UMEM view in bidi test
From: Jason Xing @ 2026-06-25 13:38 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
tushar.vyavahare
In-Reply-To: <20260623091008.1046547-1-maciej.fijalkowski@intel.com>
On Tue, Jun 23, 2026 at 5:10 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> The UMEM state refactor made __send_pkts() use xsk->umem for Tx
> address generation. At the same time, the shared-UMEM Tx setup copies the
> Rx UMEM state into a Tx-local state object and resets base_addr and
> next_buffer before configuring the Tx socket.
>
> Passing that Tx-local object to xsk_configure() makes xsk->umem point to
> the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
> the roles are switched: the same socket is then used for Rx validation, but
> received descriptors from the other logical UMEM half are checked against
> base_addr == 0. With the new UMEM bounds check, a valid address such as
> base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
> window.
>
> Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
> and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
> address generation, preserving the BIDIRECTIONAL test's intent of using
> the proper logical UMEM half after the direction switch.
>
> Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Thanks!
^ permalink raw reply
* [PATCH v1 1/1] ocp: Add I2C control support for Adva TimeCard
From: Sagi Maimon @ 2026-06-25 13:38 UTC (permalink / raw)
To: jonathan.lemon, vadim.fedorenko, richardcochran, andrew+netdev,
davem, edumazet, kuba, pabeni
Cc: linux-kernel, netdev, Sagi Maimon
- Load i2c-dev module to expose /dev/i2c-N character devices
- Add sysfs-based I2C bus control for Adva TimeCard model
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
---
drivers/ptp/ptp_ocp.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/ptp/ptp_ocp.c b/drivers/ptp/ptp_ocp.c
index 35e911f1ad78..1b4ccb4feca5 100644
--- a/drivers/ptp/ptp_ocp.c
+++ b/drivers/ptp/ptp_ocp.c
@@ -4224,6 +4224,34 @@ static const struct ocp_attr_group art_timecard_groups[] = {
{ },
};
+static ssize_t
+i2c_bus_ctrl_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct ptp_ocp *bp = dev_get_drvdata(dev);
+
+ if (!bp->pps_select)
+ return -ENODEV;
+ return sysfs_emit(buf, "0x%08x\n",
+ ioread32(&bp->pps_select->__pad1));
+}
+
+static ssize_t
+i2c_bus_ctrl_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ptp_ocp *bp = dev_get_drvdata(dev);
+ u32 val;
+
+ if (!bp->pps_select)
+ return -ENODEV;
+ if (kstrtou32(buf, 0, &val))
+ return -EINVAL;
+ iowrite32(val, &bp->pps_select->__pad1);
+ return count;
+}
+
+static DEVICE_ATTR_RW(i2c_bus_ctrl);
+
static struct attribute *adva_timecard_attrs[] = {
&dev_attr_serialnum.attr,
&dev_attr_gnss_sync.attr,
@@ -4272,6 +4300,7 @@ static struct attribute *adva_timecard_x1_attrs[] = {
&dev_attr_ts_window_adjust.attr,
&dev_attr_utc_tai_offset.attr,
&dev_attr_tod_correction.attr,
+ &dev_attr_i2c_bus_ctrl.attr,
NULL,
};
@@ -5235,6 +5264,7 @@ ptp_ocp_init(void)
const char *what;
int err;
+ request_module("i2c-dev");
ptp_ocp_debugfs_init();
what = "timecard class";
--
2.47.0
^ permalink raw reply related
* Re: [PATCH net] mlxsw: spectrum_acl_erp: Fix const qualifier of delta_clear()
From: Petr Machata @ 2026-06-25 13:27 UTC (permalink / raw)
To: Evgenii Burenchev
Cc: stable, Greg Kroah-Hartman, idosch, petrm, andrew+netdev, davem,
edumazet, kuba, pabeni, jiri, netdev, linux-kernel, lvc-project
In-Reply-To: <20260625114831.17386-1-evg28bur@yandex.ru>
Evgenii Burenchev <evg28bur@yandex.ru> writes:
> mlxsw_sp_acl_erp_delta_clear() takes 'const char *enc_key' but modifies
> the memory it points to. This is a logical error in the function
> declaration.
>
> The only caller passes a non-const buffer (aentry->ht_key.enc_key), so
> the const qualifier is misleading and unnecessary.
>
> Remove const from the enc_key parameter to match the actual usage.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
> Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
I'm not sure how much this is net material: there is no bug to fix,
it's a source code cleanliness improvement. But the patch is correct.
Reviewed-by: Petr Machata <petrm@nvidia.com>
^ permalink raw reply
* [PATCH nf-next] netfilter: remove redundant null check before kvfree()
From: Subasri S @ 2026-06-25 13:31 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Cc: netfilter-devel, coreteam, netdev, linux-kernel, Subasri S
kvfree() performs its own NULL check on the pointer handed to it
and takes no action if the pointer is NULL.
Hence there is no need to check the pointer before handing it
to kvfree().
Issue reported by the ifnullfree.cocci Coccinelle semantic
patch script.
Signed-off-by: Subasri S <subasris1210@gmail.com>
---
net/netfilter/nft_set_rbtree.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 018bbb6df4ce..efc25e788a1c 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -544,8 +544,7 @@ static int nft_array_intervals_alloc(struct nft_array *array, u32 max_intervals)
if (!intervals)
return -ENOMEM;
- if (array->intervals)
- kvfree(array->intervals);
+ kvfree(array->intervals);
array->intervals = intervals;
array->max_intervals = max_intervals;
--
2.43.0
^ permalink raw reply related
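The same pattern exists in userspace C, where the standard defines free(NULL) as a no-op. The following is a small sketch of the simplified replace-the-buffer logic using calloc()/free() as stand-ins for the kernel allocators; the struct and function names are hypothetical and only loosely follow the hunk above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the array being resized in the patch. */
struct interval_array {
    int *intervals;
    unsigned int max_intervals;
};

/* Replace the backing buffer; returns 0 on success, -1 on failure. */
int interval_array_resize(struct interval_array *array,
                          unsigned int max_intervals)
{
    int *intervals;

    assert(array != NULL);

    intervals = calloc(max_intervals, sizeof(*intervals));
    if (!intervals)
        return -1;

    /* No NULL pre-check: free(NULL) does nothing, like kvfree(NULL). */
    free(array->intervals);

    array->intervals = intervals;
    array->max_intervals = max_intervals;
    return 0;
}
```

The first call on a zero-initialized struct passes NULL to free(), which is the case the removed check used to guard.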
* Re: [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM)
From: xietangxin @ 2026-06-25 13:22 UTC (permalink / raw)
To: Menglong Dong
Cc: edumazet, davem, kuba, pabeni, jmaloy, menglong8.dong, kuniyu,
horms, willemb, netdev, linux-kernel, linux-stable
In-Reply-To: <g7DvITFWSiW9AoI49uyghw@linux.dev>
On 6/8/2026 7:55 PM, Menglong Dong wrote:
> On 2026/6/4 16:22 xietangxin <xietangxin@yeah.net> wrote:
>> Hi all,
>>
>> We have observed a TCP connection deadlock on stable 6.6 under heavy stress testing.
>>
>> 1. Both Peer A and Peer B enter the ICSK_ACK_NOMEM branch in tcp_select_window().
>> After commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
>> both peers freeze their rcv_nxt and set rcv_wnd = 0.
>>
>> 2. Prior to freezing, both sides had already sent data that is still in flight.
>> Since both sides are dropping incoming data packets due to OOM, rcv_nxt stops advancing,
>> but the seq of the peer's subsequent packets continues to grow.
>>
>> 3. When Peer A receives Peer B's Zero Window ACK,
>> the packet's seq is far ahead of Peer A's frozen rcv_nxt.
>> Both peers drop each other's packets, and no Zero Window Probes are triggered
>> because snd_wnd is never updated to 0.
>>
>
> Hi,
>
> The problem you addressed is already fixed in this commit:
> 0e24d17bd966 ("tcp: implement RFC 7323 window retraction receiver requirements"),
> which hasn't been picked to the 6.6 branch.
>
> That patch doesn't have the Fix tag, so I'm not sure if it will be picked
> to the 6.6 branch. Just CC the linux-stable :)
>
> Thanks!
> Menglong Dong
>
>>
>> Simplified Packet Trace:
>>
>> Assume Peer A's rcv_nxt = 1000, and Peer B's rcv_nxt = 5000 initially.
>>
>> Time  Dir     Type         Seq   Ack   Win   Len  Status
>> ------------------------------------------------------------------------
>> T1:   B -> A  [PSH, ACK]   1000  5000  3000  100  (A hits OOM, rcv_nxt=1000)
>> T2:   B -> A  [ACK]        1100  5000  3000  200  (Dropped due to A's OOM)
>> T3:   B -> A  [PSH, ACK]   1300  5000  3000  200  (Dropped due to A's OOM)
>>
>> T4:   A -> B  [PSH, ACK]   5000  1000  3000  100  (B hits OOM, rcv_nxt=5000)
>> T5:   A -> B  [ACK]        5100  1000  3000  200  (Dropped due to B's OOM)
>> T6:   A -> B  [PSH, ACK]   5300  1000  3000  200  (Dropped due to B's OOM)
>>
>> -- Both sides are now in OOM. B's Seq is 1500; A's Seq is 5500 --
>>
>> T7:   B -> A  [ZeroWin]    1500  5000  0     0    (Dropped: Seq 1500 != 1000)
>> T8:   A -> B  [ZeroWin]    5500  1000  0     0    (Dropped: Seq 5500 != 5000)
>> T9:   A -> B  [WinUpdate]  5500  1000  20    0    (Dropped: Seq 5500 != 5000)
>>
>> Should we relax the sequence check in tcp_sequence() for zero window ACK?
>>
>> Any feedback or guidance would be greatly appreciated.
>>
>> --
>> Best regards,
>> Tangxin Xie
>>
>>
>>
>
>
>
Hi,
We observed a throughput regression (dropping from ~1GB/s to 100MB/s)
in our test environment after commit 0e24d17bd966
("tcp: implement RFC 7323 window retraction receiver requirements").
When rcv_buf comes under memory pressure, tcp_clamp_window() is triggered
and rcv_ssthresh is strictly capped to 2 * advmss.
Subsequently, even after the user completely consumes the data and releases
a massive amount of free_space, tcp_select_window() is still heavily
suppressed by the clamped rcv_ssthresh. As a result, the receiver advertises
an extremely small window (Win=23) to the peer.
The sender cannot transmit any new data segments until its RTO timer
expires and triggers a slow-start recovery. This 200ms silence slashes
our bandwidth by 90%.
No.   Time           Source        Destination   Info
-----------------------------------------------------------------------------------------------
1045  08:16:06.8005  192.168.1.9   192.168.1.10  [TCP ZeroWindow] 57334 -> 6666 [PSH, ACK] Win=0
1052  08:16:06.8013  192.168.1.9   192.168.1.10  [TCP Window Update] 57334 -> 6666 [ACK] Win=23
1055  08:16:06.8036  192.168.1.10  192.168.1.9   6666 -> 57334 [ACK] Seq=2999704568 Ack=2416286095
=========================== 200ms SILENCE (RTO WAITING) ===================================
1088  08:16:07.0056  192.168.1.10  192.168.1.9   [TCP Retransmission] 6666 -> 57334 Len=1448
1090  08:16:07.0060  192.168.1.10  192.168.1.9   [TCP Retransmission] Len=2896
--
Best regards,
Tangxin Xie
^ permalink raw reply
* Re: [PATCH net] nfc: nci: fix out-of-bounds write in nci_target_auto_activated()
From: Simon Horman @ 2026-06-25 12:57 UTC (permalink / raw)
To: Sam P
Cc: david, davem, edumazet, kuba, pabeni, oe-linux-nfc, netdev,
linux-kernel, stable
In-Reply-To: <443e2ee1-e9c1-45ca-be57-0c67966ec7d9@bynar.io>
On Wed, Jun 24, 2026 at 12:33:21AM +0200, Sam P wrote:
> On 23/06/2026 19:21, Simon Horman wrote:
> > > diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
> > > index c96512bb86531..566ca839fa488 100644
> > > --- a/net/nfc/nci/ntf.c
> > > +++ b/net/nfc/nci/ntf.c
> > > @@ -603,6 +603,12 @@ static void nci_target_auto_activated(struct nci_dev *ndev,
> > > struct nfc_target *target;
> > > int rc;
> > >
> > > + /* This is a new target, check if we've enough room */
> > > + if (ndev->n_targets == NCI_MAX_DISCOVERED_TARGETS) {
> > > + pr_debug("not enough room, ignoring new target...\n");
> > > + return;
> >
> > [Severity: High]
> > Does this early return cause the state machine to stall?
> >
> > Looking at nci_rf_intf_activated_ntf_packet(), the state is transitioned
> > to NCI_POLL_ACTIVE right before this function is called:
> >
> > net/nfc/nci/ntf.c:nci_rf_intf_activated_ntf_packet() {
> > ...
> > atomic_set(&ndev->state, NCI_POLL_ACTIVE);
> > if (err == NCI_STATUS_OK)
> > nci_target_auto_activated(ndev, &ntf);
> > ...
> > }
> >
> > Because of the early return here, nfc_targets_found() is skipped. Does this
> > leave the state wedged in NCI_POLL_ACTIVE, permanently blocking future
> > attempts to stop polling since nci_stop_poll() requires the state to be
> > NCI_DISCOVERY?
> >
> > > + }
> > > +
> > > target = &ndev->targets[ndev->n_targets];
> >
> > [ ... ]
>
> Thanks for sharing the review.
>
> I don't believe the early return can stall the state machine, as
> nci_target_auto_activated() does not touch ndev->state.
>
> nci_rf_intf_activated_ntf_packet() sets ndev->state = NCI_POLL_ACTIVE
> before it calls nci_target_auto_activated(), so the transition has
> already happened by the time the new check runs. Neither
> nci_target_auto_activated() nor nfc_targets_found() modifies ndev->state;
> nfc_targets_found() only clears the nfc_dev dev->polling flag under
> device_lock, which is independent of the NCI state machine that
> nci_stop_poll() tests. So ndev->state is NCI_POLL_ACTIVE after an
> auto-activation whether the target is appended or the guard returns
> early; the state behaviour is the same with or without this patch.
>
> The check is the same n_targets == NCI_MAX_DISCOVERED_TARGETS guard the
> sibling nci_add_new_target() (the RF_DISCOVER_NTF path) has used since
> the Fixes: commit; it only drops the out-of-spec 11th+ target, which a
> conformant controller shouldn't produce. The device still returns to
> NCI_DISCOVERY/NCI_IDLE through the normal RF_DEACTIVATE path.
Thanks, I agree with your analysis.
> > [Severity: Critical]
> > This isn't a bug introduced by this patch, but can the ntf union copy
> > uninitialized stack memory into persistent device structures?
> >
> > In net/nfc/nci/ntf.c:nci_rf_discover_ntf_packet(), the local ntf struct
> > is declared on the stack:
> >
> > struct nci_rf_discover_ntf ntf;
> >
> > If the incoming packet has ntf.rf_tech_specific_params_len == 0, the
> > extraction routines are bypassed, leaving the rf_tech_specific_params union
> > uninitialized. The struct is then passed to nci_add_new_target():
> >
> > if (add_target == true)
> > nci_add_new_target(ndev, &ntf);
> >
> > This eventually calls nci_add_new_protocol(), which blindly reads from this
> > union:
> >
> > nfca_poll = (struct rf_tech_specific_params_nfca_poll *)params;
> > target->sens_res = nfca_poll->sens_res;
> > target->sel_res = nfca_poll->sel_res;
> > target->nfcid1_len = nfca_poll->nfcid1_len;
> >
> > Could this uninitialized stack garbage be copied into ndev->targets, and
> > later broadcasted to userspace via Netlink in nfc_targets_found()?
>
> I managed to reproduce this issue locally with KMSAN. I can queue up a patch
> for it and submit it separately; ntf just needs to be zero-initialized.
Thanks, I agree this can be handled separately.
> > [Severity: Critical]
> > This isn't a bug introduced by this patch, but does the general bytes length
> > calculation underflow?
> >
> > In net/nfc/nci/ntf.c:nci_store_general_bytes_nfc_dep():
> >
> > ndev->remote_gb_len = min_t(__u8,
> > (ntf->activation_params.listen_nfc_dep.atr_req_len -
> > NFC_ATR_REQ_GT_OFFSET), NFC_ATR_REQ_GB_MAXSIZE);
> >
> > memcpy(ndev->remote_gb,
> > (ntf->activation_params.listen_nfc_dep.atr_req +
> > NFC_ATR_REQ_GT_OFFSET), ndev->remote_gb_len);
> >
> > The length is computed by subtracting NFC_ATR_REQ_GT_OFFSET (14) from
> > atr_req_len. If atr_req_len is less than 14, this subtraction will
> > underflow.
> >
> > Because of the min_t cast, this large underflowed value is bounded to
> > NFC_ATR_REQ_GB_MAXSIZE (48). The memcpy() then copies 48 bytes starting
> > from atr_req + 14 into ndev->remote_gb.
> >
> > Since the source buffer is part of the ntf union allocated on the stack
> > in nci_rf_intf_activated_ntf_packet(), could this leak up to 48 bytes of
> > uninitialized kernel stack memory to userspace?
>
> This one is already being fixed on the list:
>
> [PATCH net v4 1/5] nfc: nci: fix u8 underflow in
> nci_store_general_bytes_nfc_dep (Lekë Hapçiu)
> https://lore.kernel.org/all/20260424180151.3808557-2-snowwlake@icloud.com/
Ack.
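The arithmetic behind the quoted underflow can be checked in isolation. The sketch below mirrors the min_t(__u8, ...) expression using the constant values given in the review (offset 14, maximum 48); the function names are hypothetical, and the guarded variant is only one possible shape of a fix, not necessarily what the linked patch does.

```c
#include <assert.h>
#include <stdint.h>

#define ATR_REQ_GT_OFFSET  14   /* value quoted in the review */
#define ATR_REQ_GB_MAXSIZE 48   /* value quoted in the review */

/* Mirrors min_t(__u8, len - offset, max): operands are cast to u8 first. */
uint8_t gb_len_unchecked(uint8_t atr_req_len)
{
    uint8_t a = (uint8_t)(atr_req_len - ATR_REQ_GT_OFFSET);
    uint8_t b = (uint8_t)ATR_REQ_GB_MAXSIZE;

    return a < b ? a : b;
}

/* Hypothetical guarded variant: short requests carry no general bytes. */
uint8_t gb_len_checked(uint8_t atr_req_len)
{
    uint8_t len;

    if (atr_req_len < ATR_REQ_GT_OFFSET)
        return 0;

    len = gb_len_unchecked(atr_req_len);
    assert(len <= ATR_REQ_GB_MAXSIZE);
    return len;
}
```

For atr_req_len = 10 the subtraction gives -4, which becomes 252 after the cast to u8 and is then clamped to 48, so the unchecked variant returns the maximum length instead of zero.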
> > [Severity: High]
> > This isn't a bug introduced by this patch, but does the lack of skb->len
> > validation in NCI response handlers allow out-of-bounds reads?
> >
> > In nci_core_init_rsp_packet_v1() in net/nfc/nci/rsp.c, fields are read
> > and arrays are copied from the payload:
> >
> > ndev->num_supported_rf_interfaces =
> > rsp_1->num_supported_rf_interfaces;
> >
> > memcpy(ndev->supported_rf_interfaces, rsp_1->supported_rf_interfaces,
> > ndev->num_supported_rf_interfaces);
> >
> > Similarly in nci_core_init_rsp_packet_v2():
> >
> > while (rf_interface_idx < ndev->num_supported_rf_interfaces) {
> > ndev->supported_rf_interfaces[rf_interface_idx++] =
> > *supported_rf_interface++;
> > }
> >
> > Since the NCI control header is stripped via skb_pull(), leaving the
> > payload length in skb->len, what happens if an NFCC sends a short response
> > packet (e.g. plen = 1)? Could this cause the handlers to read adjacent
> > kernel slab memory?
>
> I've since submitted a fix for this issue:
>
> [PATCH net] nfc: nci: fix uninit-value in
> nci_core_init_rsp_packet()
> https://lore.kernel.org/netdev/20260623222402.175798-1-sam@bynar.io/
Ack.
> > [Severity: High]
> > This isn't a bug introduced by this patch, but is there a missing boundary
> > validation when extracting activation parameters?
> >
> > In net/nfc/nci/ntf.c:nci_extract_activation_params_iso_dep():
> >
> > nfca_poll->rats_res_len = min_t(__u8, *data++, NFC_ATS_MAXSIZE);
> > if (nfca_poll->rats_res_len > 0) {
> > memcpy(nfca_poll->rats_res, data, nfca_poll->rats_res_len);
> > }
> >
> > And in nci_extract_activation_params_nfc_dep():
> >
> > poll->atr_res_len = min_t(__u8, *data++, NFC_ATR_RES_MAXSIZE - 2);
> > if (poll->atr_res_len > 0)
> > memcpy(poll->atr_res, data, poll->atr_res_len);
> >
> > These inner lengths are bounded by a maximum size like NFC_ATS_MAXSIZE, but
> > they are not validated against the remaining bytes in the SKB (skb->len).
> >
> > If a malformed packet provides a large inner length but a short payload,
> > could this memcpy() read out of bounds into the device structure,
> > potentially leaking adjacent slab memory to userspace later?
>
> This one is already being fixed on the list too:
>
> [PATCH v3] nfc: nci: add data_len bound checks to activation
> parameter extractors (Bryam Vargas)
> https://lore.kernel.org/all/20260612-b4-disp-6d52d8b0-v3-1-e26221f8826d@proton.me/
Ack.
> > [Severity: High]
> > This isn't a bug introduced by this patch, but is there a potential
> > priority inversion or timeout deadlock involving the device lock?
> >
> > Functions like nfc_stop_poll() in net/nfc/core.c acquire the device lock:
> >
> > device_lock(&dev->dev);
> > ...
> > if (dev->ops->stop_poll)
> > dev->ops->stop_poll(dev);
> >
> > This eventually calls nci_request(), which synchronously waits for a
> > completion signaled by the nci_rx_work thread. However, if an NTF packet
> > is received first, nci_rx_work processes it and invokes
> > nfc_targets_found(), which also attempts to acquire the device lock:
> >
> > device_lock(&dev->dev);
> >
> > Since the calling thread already holds the device lock, nci_rx_work blocks
> > indefinitely. Because the RX worker is blocked, it cannot process the
> > pending RSP, causing nci_request() to time out and fail. Could this
> > deadlock the RX thread?
>
> No patch for this one, although I'm not sure how accurate it is.
With the above in mind, this now looks good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* Re: [PATCH v3 net] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Pengfei Zhang @ 2026-06-25 12:56 UTC (permalink / raw)
To: idosch
Cc: dsahern, davem, edumazet, kuba, pabeni, horms, netdev,
linux-kernel, Pengfei Zhang
In-Reply-To: <20260625122411.GA1175897@shredder>
Hi Ido,
Thank you for the review. Thanks also to Jakub Kicinski, Eric Dumazet
and Kuniyuki Iwashima for their feedback on the earlier versions.
I will prepare a patch to fix inet_dump_fib() using the same tb_id-based
resume logic and send it to net-next once it opens.
Thanks,
Pengfei
^ permalink raw reply
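The difference between resuming a dump by position and resuming it by table id can be shown with a toy chain. This is a minimal sketch and not the kernel implementation: the struct, the helpers, and the example ids are hypothetical, and it only models a new table being inserted at the chain head between two batches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a table linked into a hash chain. */
struct table {
    unsigned int id;
    struct table *next;
};

/* Insert at the chain head, as a concurrent table creation would. */
struct table *push_front(struct table *head, struct table *t)
{
    assert(t != NULL);
    t->next = head;
    return t;
}

/* Positional resume: skip `index` entries from the head. */
const struct table *resume_by_index(const struct table *head,
                                    unsigned int index)
{
    for (; head && index; index--)
        head = head->next;
    return head;
}

/* Id-based resume: find the table that was actually being dumped. */
const struct table *resume_by_id(const struct table *head, unsigned int id)
{
    for (; head; head = head->next)
        if (head->id == id)
            return head;
    return NULL;
}
```

If the dump stopped at index 1 and a new table is then pushed at the head, the positional lookup lands on the table that used to be at index 0, while the id-based lookup still finds the original table.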
* Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level triggered.
From: Parthiban.Veerasooran @ 2026-06-25 12:36 UTC (permalink / raw)
To: Selvamani.Rajagopal, andrew+netdev, davem, edumazet, kuba, pabeni,
robh, krzk+dt, conor+dt, Pier.Beruto
Cc: andrew, netdev, linux-kernel, Conor.Dooley, devicetree
In-Reply-To: <CYYPR02MB9828E1167750AEA090EC60CD83EE2@CYYPR02MB9828.namprd02.prod.outlook.com>
Hi Selvamani,
On 23/06/26 11:18 am, Selvamani Rajagopal wrote:
>> -----Original Message-----
>> From: Parthiban.Veerasooran@microchip.com <Parthiban.Veerasooran@microchip.com>
>> Subject: Re: [PATCH net v5 1/4] net: ethernet: oa_tc6: Interrupt is active low, level
>> triggered.
>>
>>
>> I will find some time this week to test and share my feedback. In the
>> meantime, would it be possible for you to test using two instances (Test
>> Case 2)? I did not encounter many issues when testing with a single
>> instance.
>>
>> I believe that testing with two instances increases the likelihood of
>> reproducing the issue in your setup as well.
>
> Parthiban,
>
> Thanks.
>
> Our EVB design allows only one board to be connected to one Raspberry Pi.
> So, I don't think I can have a setup like yours. We did test with three Raspberry Pi boards with
> multi-drop connection. Couldn't see your "NULL pointer" crash. Will keep trying though.
Thank you for the update. So it seems you can't connect two of your
MAC-PHYs to one RPi 4? An RPi 4 can support two SPI devices (MAC-PHYs).
https://patchwork.kernel.org/project/netdevbpf/list/?series=1114495&state=%2A&archive=both
https://patchwork.kernel.org/project/netdevbpf/patch/20260621-fix-race-condition-and-crash-v1-1-87e290d9357f@onsemi.com/
https://patchwork.kernel.org/project/netdevbpf/patch/20260621-fix-race-condition-and-crash-v1-2-87e290d9357f@onsemi.com/
With your above patches applied, I did a quick test (Test Case 2) with two
Microchip MAC-PHYs and hit an issue similar to the one reported before. I am
sharing the dmesg crash log for your reference.
[ 2863.182105] eth1: Receive buffer overflow error
[ 2863.199905] eth1: Receive buffer overflow error
[ 2867.669312] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b8
[ 2867.677658] Mem abort info:
[ 2867.680474] ESR = 0x0000000096000005
[ 2867.684258] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2867.689630] SET = 0, FnV = 0
[ 2867.692717] EA = 0, S1PTW = 0
[ 2867.695888] FSC = 0x05: level 1 translation fault
[ 2867.700825] Data abort info:
[ 2867.703726] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 2867.709303] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 2867.714399] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 2867.719773] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000113c2e000
[ 2867.726296] [00000000000000b8] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 2867.735109] Internal error: Oops: 0000000096000005 [#1] SMP
[ 2867.740830] Modules linked in: lan865x_t1s(O) microchip_t1s(O) sch_fq
snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm algif_hash
aes_neon_bs algif_skcipher af_alg bnep binfmt_misc brcmfmac_cyw brcmfmac
hci_uart brcmutil btbcm bluetooth vc4 cfg80211 snd_soc_hdmi_codec
drm_exec ecdh_generic ecc drm_display_helper cec rfkill bcm2835_codec(C)
drm_dma_helper v3d rpi_hevc_dec drm_client_lib bcm2835_v4l2(C) gpu_sched
drm_shmem_helper crc_ccitt bcm2835_isp snd_soc_core drm_kms_helper
bcm2835_mmal_vchiq v4l2_mem2mem vc_sm_cma videobuf2_vmalloc
videobuf2_dma_contig raspberrypi_hwmon videobuf2_memops snd_compress
snd_bcm2835(C) videobuf2_v4l2 snd_pcm_dmaengine i2c_brcmstb snd_pcm
snd_timer videodev videobuf2_common snd mc raspberrypi_gpiomem
spi_bcm2835 gpio_fan nvmem_rmem sch_fq_codel i2c_dev zram lz4_compress
drm fuse drm_panel_orientation_quirks backlight nfnetlink [last
unloaded: microchip_t1s(O)]
[ 2867.821558] CPU: 3 UID: 0 PID: 2808 Comm: irq/59-spi0.0 Tainted: G C O 7.1.0-rc7-v8+ #2 PREEMPT
[ 2867.831779] Tainted: [C]=CRAP, [O]=OOT_MODULE
[ 2867.836183] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 2867.842088] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2867.849138] pc : oa_tc6_update_rx_skb+0x2c/0xa8 [lan865x_t1s]
[ 2867.854955] lr : oa_tc6_macphy_threaded_irq+0x430/0x870 [lan865x_t1s]
[ 2867.861476] sp : ffffffc083dbbd20
[ 2867.864825] x29: ffffffc083dbbd20 x28: 000000003e002020 x27: ffffffed1cf609c8
[ 2867.872051] x26: 0000000000000000 x25: 0000000000000001 x24: ffffff8040796480
[ 2867.879277] x23: 000000002020003e x22: 0000000000000000 x21: 0000000000000040
[ 2867.886504] x20: ffffff804a479080 x19: ffffff8040796480 x18: 00000000000982f8
[ 2867.893731] x17: ffffff80482d6500 x16: ffffffed1d87b6b0 x15: ffffff8041a43c00
[ 2867.900957] x14: 0000000000000016 x13: 0000073d6a5d38dc x12: 00000000001d4ebe
[ 2867.908184] x11: 00000000000000c0 x10: 0000000000001ae0 x9 : ffffffecc9c959e8
[ 2867.915410] x8 : ffffff804f1e5a40 x7 : 0000000000000002 x6 : ffffffffffffffff
[ 2867.922636] x5 : ffffffed1e59d000 x4 : 0000000000000002 x3 : 0000000000000000
[ 2867.929863] x2 : 0000000000000040 x1 : ffffff804a479080 x0 : 0000000000000000
[ 2867.937090] Call trace:
[ 2867.939558] oa_tc6_update_rx_skb+0x2c/0xa8 [lan865x_t1s] (P)
[ 2867.945375] oa_tc6_macphy_threaded_irq+0x430/0x870 [lan865x_t1s]
[ 2867.951543] irq_thread_fn+0x34/0xc0
[ 2867.955156] irq_thread+0x1a8/0x308
[ 2867.958680] kthread+0x138/0x150
[ 2867.961942] ret_from_fork+0x10/0x20
[ 2867.965558] Code: aa0103f4 f90013f5 12001c55 f9403800 (29570403)
[ 2867.971727] ---[ end trace 0000000000000000 ]---
[ 2867.976443] genirq: exiting task "irq/59-spi0.0" (2808) is an active IRQ thread (irq 59)
[ 2868.094789] irq 59: nobody cared (try booting with the "irqpoll" option)
[ 2868.101000] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G D C O 7.1.0-rc7-v8+ #2 PREEMPT
[ 2868.101007] Tainted: [D]=DIE, [C]=CRAP, [O]=OOT_MODULE
[ 2868.101009] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 2868.101011] Call trace:
[ 2868.101013] show_stack+0x20/0x38 (C)
[ 2868.101027] dump_stack_lvl+0x60/0x80
[ 2868.101033] dump_stack+0x18/0x24
[ 2868.101038] __report_bad_irq+0x54/0xf0
[ 2868.101043] note_interrupt+0x344/0x398
[ 2868.101048] handle_irq_event+0xa4/0x110
[ 2868.101051] handle_level_irq+0xe0/0x178
[ 2868.101056] handle_irq_desc+0x3c/0x68
[ 2868.101061] generic_handle_domain_irq+0x20/0x40
[ 2868.101067] bcm2835_gpio_irq_handle_bank+0x180/0x1c8
[ 2868.101074] bcm2835_gpio_irq_handler+0x88/0x188
[ 2868.101080] handle_irq_desc+0x3c/0x68
[ 2868.101085] generic_handle_domain_irq+0x20/0x40
[ 2868.101091] gic_handle_irq+0x4c/0xe0
[ 2868.101094] call_on_irq_stack+0x30/0x88
[ 2868.101099] do_interrupt_handler+0x88/0x98
[ 2868.101102] el1_interrupt+0x3c/0x60
[ 2868.101108] el1h_64_irq_handler+0x18/0x30
[ 2868.101113] el1h_64_irq+0x6c/0x70
[ 2868.101116] default_idle_call+0x34/0x1a0 (P)
[ 2868.101123] do_idle+0x260/0x2a0
[ 2868.101128] cpu_startup_entry+0x3c/0x50
[ 2868.101132] rest_init+0xe8/0xf0
[ 2868.101137] start_kernel+0x7f4/0x800
[ 2868.101143] __primary_switched+0x88/0x98
[ 2868.101149] handlers:
[ 2868.207750] lan8650 spi0.1: SPI transfer timed out
[ 2868.208070] [<0000000019361c17>] oa_tc6_macphy_isr [lan865x_t1s]
[ 2868.212048] spi_master spi0: failed to transfer one message from queue
[ 2868.215296] threaded [<00000000a4e6f0fa>] oa_tc6_macphy_threaded_irq [lan865x_t1s]
[ 2868.219005] spi_master spi0: noqueue transfer failed
[ 2868.223053] Disabling IRQ #59
[ 2868.260162] lan8650 spi0.1 eth2: SPI data transfer failed: -110
[ 2868.266211] lan8650 spi0.1: Device interrupt disabled to avoid interrupt storm
Best regards,
Parthiban V
>
> But I could see assert in skb_put immediately quickly.
>
>>
>> Best regards,
>> Parthiban V
>>>
>>>>
>
^ permalink raw reply
* Re: [PATCH v3 net] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Ido Schimmel @ 2026-06-25 12:24 UTC (permalink / raw)
To: Pengfei Zhang
Cc: dsahern, davem, edumazet, kuba, pabeni, horms, netdev,
linux-kernel, chenzhangqi, baohua
In-Reply-To: <20260625070517.965597-1-zhangfeionline@gmail.com>
On Thu, Jun 25, 2026 at 03:05:17PM +0800, Pengfei Zhang wrote:
> inet6_dump_fib() saves its progress in cb->args[1] as a positional
> index within the current hash chain. Between batches, a concurrent
> fib6_new_table() can insert a new table at the chain head, shifting
> all existing entries. The saved index then lands on a different
> table, causing fib6_dump_table() to set w->root to the wrong table
> while w->node still points into the previous one.
> fib6_walk_continue() dereferences w->node->parent (NULL) and panics:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> RIP: 0010:fib6_walk_continue+0x6e/0x170
> Call Trace:
> <TASK>
> fib6_dump_table.isra.0+0xc5/0x240
> inet6_dump_fib+0xf6/0x420
> rtnl_dumpit+0x30/0xa0
> netlink_dump+0x15b/0x460
> netlink_recvmsg+0x1d6/0x2a0
> ____sys_recvmsg+0x17a/0x190
>
> Fix by storing tb->tb6_id in cb->args[1] instead of a positional
> index. On resume, skip entries until the id matches; a concurrent
> head-insert can never match the saved id, so the walker always
> resumes on the correct table.
>
> Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
> Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
You should have waited at least 24h between versions:
https://docs.kernel.org/process/maintainer-netdev.html
The same pattern exists in IPv4, but there we don't crash because the
per-table resume logic is different. Instead, it is possible that we
restart the dump from the wrong table and re-dump routes from the next
table in the chain.
I'm aware that netlink dumps do not guarantee consistency, but for
parity / robustness reasons I suggest aligning IPv4 with IPv6 and using
the same tb_id-based resume logic there. Given we don't crash there,
target the IPv4 patch at net-next (currently closed, should open next
week).
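To illustrate the difference between the two resume schemes, here is a
minimal userspace sketch. It is not the kernel code: struct table,
push_head(), resume_by_pos() and resume_by_id() are hypothetical names
that only model one hash chain with head insertion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one FIB table on a hash chain. */
struct table {
	unsigned int id;
	struct table *next;
};

/* Insert at the chain head, as a concurrent table creation would. */
static struct table *push_head(struct table *head, unsigned int id)
{
	struct table *t = malloc(sizeof(*t));

	assert(t); /* sketch only: no allocation-failure handling */
	t->id = id;
	t->next = head;
	return t;
}

/* Positional resume: skip 'pos' entries counted from the head. */
static struct table *resume_by_pos(struct table *head, unsigned int pos)
{
	while (head && pos--)
		head = head->next;
	return head;
}

/* Id-based resume: skip entries until the saved id matches. */
static struct table *resume_by_id(struct table *head, unsigned int id)
{
	while (head && head->id != id)
		head = head->next;
	return head;
}
```

With a chain 20 -> 10 and a dump that stopped on table 10 (position 1),
a head insert of table 30 makes resume_by_pos(head, 1) return table 20,
while resume_by_id(head, 10) still returns table 10.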
Thanks
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH net v7 2/4] i40e: skip unnecessary VF reset when setting trust
From: Przemek Kitszel @ 2026-06-25 12:17 UTC (permalink / raw)
To: Simon Horman, jtornosm
Cc: netdev, intel-wired-lan, aleksandr.loktionov, jacob.e.keller,
anthony.l.nguyen, davem, edumazet, kuba, pabeni
In-Reply-To: <20260624163050.1145761-1-horms@kernel.org>
On 6/24/26 18:30, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> i40e: skip unnecessary VF reset when setting trust
>
> This patch avoids the ~10 second VF reset delay on trust changes by
> flipping the I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit directly, and only
> resorting to a full reset when revoking trust from a VF that has
> advanced features (ADQ, cloud filters, UC/MC promisc) configured.
>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> index a26c3d47ec15..0cc434b26eb8 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
>> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>> set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>> pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>>
>> - i40e_vc_reset_vf(vf, true);
>> + /* Reset only if revoking trust and VF has advanced features configured */
>> + if (!setting &&
>> + (vf->adq_enabled || vf->num_cloud_filters > 0 ||
>> + test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
>> + test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {
>
> [High]
> Does this revoke-trust heuristic miss MAC/VLAN filters that were only
> installable while the VF was trusted?
No.
>
> A trusted VF is allowed to add many more MAC addresses than
> I40E_VC_MAX_MAC_ADDR_PER_VF (i40e_check_vf_permission() uses
> I40E_VC_MAX_MACVLAN_PER_TRUSTED_VF) and more than I40E_VC_MAX_VLAN_PER_VF
> VLANs. These quotas are enforced only at add-time.
>
> The previous always-reset path went through i40e_vc_reset_vf() ->
> i40e_cleanup_reset_vf() -> i40e_free_vf_res() -> i40e_vsi_release(),
> which destroyed the VSI and its mac_filter_hash, and reset
> vf->num_vlan = 0.
Current "reset only on revoke" does exactly the same.
>
> The new fast path only flips the PRIVILEGE bit, leaving any
NOPE.
The new "fast path" only *TESTS* the PRIVILEGE bit;
it does not "clear" it.
This is just inverted logic on the AI side.
^ permalink raw reply
* [PATCH net v2] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-25 12:03 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Amerigo Wang
Cc: netdev, linux-kernel, vlad.wing, asantostc, paulmck, kernel-team,
stable, Pavan Chebbi, Breno Leitao
There is a use-after-free error on netpoll, which is clearly detected by
KASAN.
BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
Read of size 1 at addr ... by task kworker/9:1
Workqueue: events queue_process
Call Trace:
skb_dequeue+0x1e/0xb0
queue_process+0x2c/0x600
process_scheduled_works+0x4b6/0x850
worker_thread+0x414/0x5a0
Allocated by task 242:
__netpoll_setup+0x201/0x4a0
netpoll_setup+0x249/0x550
enabled_store+0x32f/0x380
Freed by task 0:
kfree+0x1b7/0x540
rcu_core+0x3f8/0x7a0
The problem happens when there is a pending TX worker running in
parallel with the cleanup path.
This is what happens on the netpoll shutdown path:
1) __netpoll_cleanup() is called
2) set dev->npinfo to NULL
3) call_rcu() with rcu_cleanup_netpoll_info()
3.1) rcu_cleanup_netpoll_info() tries to cancel all workers with
cancel_delayed_work(), but doesn't wait for the worker to finish
4) and kfree(npinfo);
Because 3.1) doesn't really cancel the work, as the comment says "we
can't call cancel_delayed_work_sync here, as we are in softirq", the TX
worker can run after 4).
TL;DR: queue_process() is not an RCU reader; it reaches npinfo through
the work item via container_of().
Use disable_delayed_work_sync() to ensure the worker is completely
stopped and prevent any future re-arming attempts. Once npinfo is set
to NULL, senders will bail out and not queue new work. The disable flag
ensures any in-flight re-arming attempts also fail silently.
In the future, we can do the cleanup inline here without needing the
npinfo->rcu rcu_head, but that is net-next material.
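As a rough userspace illustration of the container_of() point above
(struct work_item, struct info and worker_get_info() are hypothetical
stand-ins, not the netpoll types), the worker derives the enclosing
object purely from the embedded work item:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for delayed_work and netpoll_info. */
struct work_item {
	int pending;
};

struct info {
	int txq_len;
	struct work_item tx_work;
};

/* The worker is only handed the embedded work item... */
static struct info *worker_get_info(struct work_item *work)
{
	/*
	 * ...and recovers the enclosing object by pointer arithmetic.
	 * No published pointer is consulted here, so clearing such a
	 * pointer elsewhere does not keep the worker away from the
	 * object. The object must outlive every run of the worker.
	 */
	return container_of(work, struct info, tx_work);
}
```

This is why waiting for an RCU grace period alone is not enough in such
a layout: the work has to be stopped before the object is freed.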
Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Remove the synchronize_rcu() and cancel the tx_work
before call_rcu(). (Jakub)
- Link to v1: https://lore.kernel.org/r/20260622-netpoll_rcu_fix-v1-1-15c3285e92e6@debian.org
---
net/core/netpoll.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..96d5945e6a30f 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -633,14 +633,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
container_of(rcu_head, struct netpoll_info, rcu);
skb_queue_purge(&npinfo->txq);
-
- /* we can't call cancel_delayed_work_sync here, as we are in softirq */
- cancel_delayed_work(&npinfo->tx_work);
-
- /* clean after last, unfinished work */
- __skb_queue_purge(&npinfo->txq);
- /* now cancel it again */
- cancel_delayed_work(&npinfo->tx_work);
kfree(npinfo);
}
@@ -664,6 +656,7 @@ static void __netpoll_cleanup(struct netpoll *np)
ops->ndo_netpoll_cleanup(np->dev);
RCU_INIT_POINTER(np->dev->npinfo, NULL);
+ disable_delayed_work_sync(&npinfo->tx_work);
call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
}
---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related
* Re: [PATCH net 1/3] i40e: keep q_vectors array in sync with channel count changes
From: Maciej Fijalkowski @ 2026-06-25 11:55 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
poros, arkadiusz.kubalewski, przemyslaw.kitszel, horms,
aleksandr.loktionov, pmenzel, sx.rinitha
In-Reply-To: <20260605163934.547c7bdd@kernel.org>
On Fri, Jun 05, 2026 at 04:39:34PM -0700, Jakub Kicinski wrote:
> On Fri, 5 Jun 2026 11:01:19 -0700 Tony Nguyen wrote:
> > > Should the new err_lump label, and the existing err_vsi exits from the
> > > two allocation steps above, instead unwind through the err_rings block
> > > (unregister_netdev / free_netdev / i40e_devlink_destroy_port /
> > > i40e_aq_delete_element) the way i40e_vsi_setup()'s err_msix path does?
> > >
> > > The pre-patch code had the same defective err_vsi target for the
> > > qp_pile and arrays paths, but the patch adds two new failure points
> > > (the unconditional q_vectors kzalloc and the new
> > > i40e_vsi_setup_vectors() call) that route into it during reset
> > > rebuild, where vsi->netdev is already registered.
> >
> > This does seem valid, but as mentioned by Sashiko the pre-patch code has
> > the same target/issue. There's a recent submission [1], with changes
> > requested, that should cover this. Did you want to take this now or wait
> > and have it sent with this other one?
>
> Hm. I convinced myself yesterday that the old code did _not_
> have the issue because it was pass false as the second arg to
> i40e_vsi_{alloc,free}_arrays() ? Good chance that I misread,
> it's tricky code. As much as I would love to apply this to prevent
> the deadlock in NIPA - let's wait for the follow up. I'll pick up
> the other two patches from this series off the list.
FWIW it was our beloved "pre-existing issue": alloc arrays could fail at
ring memory allocation and bail out without de-registering the netdev.
Regardless, I'm gonna send a v4 with a preceding patch that should fix
this...
>
^ permalink raw reply
* [PATCH v2 net-next] selftests/xsk: Preserve UMEM view in BIDIRECTIONAL test
From: Maciej Fijalkowski @ 2026-06-25 11:52 UTC (permalink / raw)
To: netdev
Cc: bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
tushar.vyavahare, kerneljasonxing, Maciej Fijalkowski
The UMEM state refactor made __send_pkts() use xsk->umem for Tx
address generation. At the same time, the shared-UMEM Tx setup copies the
Rx UMEM state into a Tx-local state object and resets base_addr and
next_buffer before configuring the Tx socket.
Passing that Tx-local object to xsk_configure() makes xsk->umem point to
the zero-based Tx allocator state. This breaks the BIDIRECTIONAL test once
the roles are switched: the same socket is then used for Rx validation, but
received descriptors from the other logical UMEM half are checked against
base_addr == 0. With the new UMEM bounds check, a valid address such as
base_addr + XDP_PACKET_HEADROOM is rejected as being outside the UMEM
window.
Keep xsk->umem as the shared/Rx UMEM view used for socket configuration
and Rx validation. Use the ifobject-local UMEM copy only for Tx descriptor
address generation, preserving the BIDIRECTIONAL test's intent of using
the proper logical UMEM half after the direction switch.
Fixes: b17631032769 ("selftests/xsk: Move UMEM state from ifobject to xsk_socket_info")
Reviewed-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
v2:
- fix SoB line
- rebase
- add tags from Tushar
---
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 72875071d4f1..26437d4bdc8e 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -1169,8 +1169,8 @@ static int receive_pkts(struct test_spec *test)
static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, bool timeout)
{
u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len;
+ struct xsk_umem_info *umem = ifobject->xsk_arr[0].umem_real;
struct pkt_stream *pkt_stream = xsk->pkt_stream;
- struct xsk_umem_info *umem = xsk->umem;
bool use_poll = ifobject->use_poll;
struct pollfd fds = { };
int ret;
@@ -1521,7 +1521,7 @@ static int thread_common_ops_tx(struct test_spec *test, struct ifobject *ifobjec
umem_tx->base_addr = 0;
umem_tx->next_buffer = 0;
- ret = xsk_configure(test, ifobject, umem_tx, true);
+ ret = xsk_configure(test, ifobject, umem_rx, true);
if (ret)
return ret;
ifobject->xsk = &ifobject->xsk_arr[0];
--
2.43.0
^ permalink raw reply related
* [PATCH net] mlxsw: spectrum_acl_erp: Fix const qualifier of delta_clear()
From: Evgenii Burenchev @ 2026-06-25 11:48 UTC (permalink / raw)
To: stable, Greg Kroah-Hartman
Cc: Evgenii Burenchev, idosch, petrm, andrew+netdev, davem, edumazet,
kuba, pabeni, jiri, netdev, linux-kernel, lvc-project
mlxsw_sp_acl_erp_delta_clear() takes 'const char *enc_key' but modifies
the memory it points to. This is a logical error in the function
declaration.
The only caller passes a non-const buffer (aentry->ht_key.enc_key), so
the const qualifier is misleading and unnecessary.
Remove const from the enc_key parameter to match the actual usage.
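A minimal sketch of the distinction, using simplified hypothetical types
rather than the driver code (it assumes the masked bits fit within a
single byte of the key):

```c
#include <stdint.h>

/* Hypothetical, simplified delta: a bit offset and an 8-bit mask. */
struct delta {
	uint16_t start;
	uint8_t mask;
};

/* Accessor: the key is only read, so a const pointer is accurate. */
static uint8_t delta_value(const struct delta *d, const char *key)
{
	unsigned char byte = (unsigned char)key[d->start / 8];

	return (byte >> (d->start % 8)) & d->mask;
}

/* Mutator: the key is written, so the pointer must not be const. */
static void delta_clear(const struct delta *d, char *key)
{
	unsigned char byte = (unsigned char)key[d->start / 8];

	byte &= (unsigned char)~(d->mask << (d->start % 8));
	key[d->start / 8] = (char)byte;
}
```

Declaring the second function with 'const char *key' would either fail
to compile at the store or require casting the qualifier away, which
hides the write from callers.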
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: c22291f7cf45 ("mlxsw: spectrum: acl: Implement delta for ERP")
Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 2 +-
drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index cbb272a96359..0d0cd093b3c6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -1118,7 +1118,7 @@ u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
}
void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
- const char *enc_key)
+ char *enc_key)
{
u16 start = delta->start;
u8 mask = delta->mask;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
index 010204f73ea4..67cc7a5737dd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.h
@@ -245,7 +245,7 @@ u8 mlxsw_sp_acl_erp_delta_mask(const struct mlxsw_sp_acl_erp_delta *delta);
u8 mlxsw_sp_acl_erp_delta_value(const struct mlxsw_sp_acl_erp_delta *delta,
const char *enc_key);
void mlxsw_sp_acl_erp_delta_clear(const struct mlxsw_sp_acl_erp_delta *delta,
- const char *enc_key);
+ char *enc_key);
struct mlxsw_sp_acl_erp_mask;
--
2.43.0
^ permalink raw reply related
* Re: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
From: kernel test robot @ 2026-06-25 11:47 UTC (permalink / raw)
To: Ratheesh Kannoth, davem, gakula, linux-kernel, netdev, sgoutham
Cc: llvm, oe-kbuild-all, andrew+netdev, edumazet, kuba, pabeni,
Hariprasad Kelam, Ratheesh Kannoth
In-Reply-To: <20260625044621.2841831-1-rkannoth@marvell.com>
Hi Ratheesh,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net/main]
[also build test WARNING on linus/master v7.1 next-20260623]
[cannot apply to linux-review/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260622-133621 horms-ipvs/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ratheesh-Kannoth/octeontx2-af-Block-VFs-from-clobbering-special-CGX-PKIND-state/20260625-124846
base: net/main
patch link: https://lore.kernel.org/r/20260625044621.2841831-1-rkannoth%40marvell.com
patch subject: [PATCH net v2] octeontx2-af: Block VFs from clobbering special CGX PKIND state
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 6cc609bb250b21b47fc7d394b4019101e9983597)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251954.vsXupLpQ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251954.vsXupLpQ-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1522:6: warning: variable 'pf' set but not used [-Wunused-but-set-variable]
1522 | int pf;
| ^
>> drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1696:24: warning: variable 'cgx' is uninitialized when used here [-Wuninitialized]
1696 | cgxd = rvu_cgx_pdata(cgx, rvu);
| ^~~
drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:1521:8: note: initialize the variable 'cgx' to silence this warning
1521 | u8 cgx;
| ^
| = '\0'
2 warnings generated.
vim +/pf +1522 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
1506
1507 int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
1508 struct nix_lf_alloc_req *req,
1509 struct nix_lf_alloc_rsp *rsp)
1510 {
1511 int nixlf, qints, hwctx_size, intf, rc = 0;
1512 u16 bcast, mcast, promisc, ucast;
1513 struct rvu_hwinfo *hw = rvu->hw;
1514 u16 pcifunc = req->hdr.pcifunc;
1515 bool rules_created = false;
1516 struct rvu_block *block;
1517 struct rvu_pfvf *pfvf;
1518 u64 cfg, ctx_cfg;
1519 struct cgx *cgxd;
1520 int blkaddr;
1521 u8 cgx;
> 1522 int pf;
1523
1524 if (!req->rq_cnt || !req->sq_cnt || !req->cq_cnt)
1525 return NIX_AF_ERR_PARAM;
1526
1527 if (req->way_mask)
1528 req->way_mask &= 0xFFFF;
1529
1530 pfvf = rvu_get_pfvf(rvu, pcifunc);
1531 blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NIX, pcifunc);
1532 if (!pfvf->nixlf || blkaddr < 0)
1533 return NIX_AF_ERR_AF_LF_INVALID;
1534
1535 block = &hw->block[blkaddr];
1536 nixlf = rvu_get_lf(rvu, block, pcifunc, 0);
1537 if (nixlf < 0)
1538 return NIX_AF_ERR_AF_LF_INVALID;
1539
1540 /* Check if requested 'NIXLF <=> NPALF' mapping is valid */
1541 if (req->npa_func) {
1542 /* If default, use 'this' NIXLF's PFFUNC */
1543 if (req->npa_func == RVU_DEFAULT_PF_FUNC)
1544 req->npa_func = pcifunc;
1545 if (!is_pffunc_map_valid(rvu, req->npa_func, BLKTYPE_NPA))
1546 return NIX_AF_INVAL_NPA_PF_FUNC;
1547 }
1548
1549 /* Check if requested 'NIXLF <=> SSOLF' mapping is valid */
1550 if (req->sso_func) {
1551 /* If default, use 'this' NIXLF's PFFUNC */
1552 if (req->sso_func == RVU_DEFAULT_PF_FUNC)
1553 req->sso_func = pcifunc;
1554 if (!is_pffunc_map_valid(rvu, req->sso_func, BLKTYPE_SSO))
1555 return NIX_AF_INVAL_SSO_PF_FUNC;
1556 }
1557
1558 /* If RSS is being enabled, check if requested config is valid.
1559 * RSS table size should be power of two, otherwise
1560 * RSS_GRP::OFFSET + adder might go beyond that group or
1561 * won't be able to use entire table.
1562 */
1563 if (req->rss_sz && (req->rss_sz > MAX_RSS_INDIR_TBL_SIZE ||
1564 !is_power_of_2(req->rss_sz)))
1565 return NIX_AF_ERR_RSS_SIZE_INVALID;
1566
1567 if (req->rss_sz &&
1568 (!req->rss_grps || req->rss_grps > MAX_RSS_GROUPS))
1569 return NIX_AF_ERR_RSS_GRPS_INVALID;
1570
1571 /* Reset this NIX LF */
1572 rc = rvu_lf_reset(rvu, block, nixlf);
1573 if (rc) {
1574 dev_err(rvu->dev, "Failed to reset NIX%d LF%d\n",
1575 block->addr - BLKADDR_NIX0, nixlf);
1576 return NIX_AF_ERR_LF_RESET;
1577 }
1578
1579 ctx_cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST3);
1580
1581 /* Alloc NIX RQ HW context memory and config the base */
1582 hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
1583 rc = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
1584 if (rc)
1585 goto free_mem;
1586
1587 pfvf->rq_bmap = kcalloc(req->rq_cnt, sizeof(long), GFP_KERNEL);
1588 if (!pfvf->rq_bmap) {
1589 rc = -ENOMEM;
1590 goto free_mem;
1591 }
1592
1593 rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_BASE(nixlf),
1594 (u64)pfvf->rq_ctx->iova);
1595
1596 /* Set caching and queue count in HW */
1597 cfg = BIT_ULL(36) | (req->rq_cnt - 1) | req->way_mask << 20;
1598 rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_CFG(nixlf), cfg);
1599
1600 /* Alloc NIX SQ HW context memory and config the base */
1601 hwctx_size = 1UL << (ctx_cfg & 0xF);
1602 rc = qmem_alloc(rvu->dev, &pfvf->sq_ctx, req->sq_cnt, hwctx_size);
1603 if (rc)
1604 goto free_mem;
1605
1606 pfvf->sq_bmap = kcalloc(req->sq_cnt, sizeof(long), GFP_KERNEL);
1607 if (!pfvf->sq_bmap) {
1608 rc = -ENOMEM;
1609 goto free_mem;
1610 }
1611
1612 rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_BASE(nixlf),
1613 (u64)pfvf->sq_ctx->iova);
1614
1615 cfg = BIT_ULL(36) | (req->sq_cnt - 1) | req->way_mask << 20;
1616 rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_CFG(nixlf), cfg);
1617
1618 /* Alloc NIX CQ HW context memory and config the base */
1619 hwctx_size = 1UL << ((ctx_cfg >> 8) & 0xF);
1620 rc = qmem_alloc(rvu->dev, &pfvf->cq_ctx, req->cq_cnt, hwctx_size);
1621 if (rc)
1622 goto free_mem;
1623
1624 pfvf->cq_bmap = kcalloc(req->cq_cnt, sizeof(long), GFP_KERNEL);
1625 if (!pfvf->cq_bmap) {
1626 rc = -ENOMEM;
1627 goto free_mem;
1628 }
1629
1630 rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_BASE(nixlf),
1631 (u64)pfvf->cq_ctx->iova);
1632
1633 cfg = BIT_ULL(36) | (req->cq_cnt - 1) | req->way_mask << 20;
1634 rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_CFG(nixlf), cfg);
1635
1636 /* Initialize receive side scaling (RSS) */
1637 hwctx_size = 1UL << ((ctx_cfg >> 12) & 0xF);
1638 rc = nixlf_rss_ctx_init(rvu, blkaddr, pfvf, nixlf, req->rss_sz,
1639 req->rss_grps, hwctx_size, req->way_mask,
1640 !!(req->flags & NIX_LF_RSS_TAG_LSB_AS_ADDER));
1641 if (rc)
1642 goto free_mem;
1643
1644 /* Alloc memory for CQINT's HW contexts */
1645 cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
1646 qints = (cfg >> 24) & 0xFFF;
1647 hwctx_size = 1UL << ((ctx_cfg >> 24) & 0xF);
1648 rc = qmem_alloc(rvu->dev, &pfvf->cq_ints_ctx, qints, hwctx_size);
1649 if (rc)
1650 goto free_mem;
1651
1652 rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_BASE(nixlf),
1653 (u64)pfvf->cq_ints_ctx->iova);
1654
1655 rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_CFG(nixlf),
1656 BIT_ULL(36) | req->way_mask << 20);
1657
1658 /* Alloc memory for QINT's HW contexts */
1659 cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
1660 qints = (cfg >> 12) & 0xFFF;
1661 hwctx_size = 1UL << ((ctx_cfg >> 20) & 0xF);
1662 rc = qmem_alloc(rvu->dev, &pfvf->nix_qints_ctx, qints, hwctx_size);
1663 if (rc)
1664 goto free_mem;
1665
1666 rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_BASE(nixlf),
1667 (u64)pfvf->nix_qints_ctx->iova);
1668 rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_CFG(nixlf),
1669 BIT_ULL(36) | req->way_mask << 20);
1670
1671 /* Setup VLANX TPID's.
1672 * Use VLAN1 for 802.1Q
1673 * and VLAN0 for 802.1AD.
1674 */
1675 cfg = (0x8100ULL << 16) | 0x88A8ULL;
1676 rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG(nixlf), cfg);
1677
1678 /* Enable LMTST for this NIX LF */
1679 rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_CFG2(nixlf), BIT_ULL(0));
1680
1681 /* Set CQE/WQE size, NPA_PF_FUNC for SQBs and also SSO_PF_FUNC */
1682 if (req->npa_func)
1683 cfg = req->npa_func;
1684 if (req->sso_func)
1685 cfg |= (u64)req->sso_func << 16;
1686
1687 cfg |= (u64)req->xqe_sz << 33;
1688 rvu_write64(rvu, blkaddr, NIX_AF_LFX_CFG(nixlf), cfg);
1689
1690 /* Config Rx pkt length, csum checks and apad enable / disable */
1691 rvu_write64(rvu, blkaddr, NIX_AF_LFX_RX_CFG(nixlf), req->rx_cfg);
1692
1693 /* Configure pkind for TX parse config */
1694 if (is_pf_cgxmapped(rvu, rvu_get_pf(rvu->pdev, pcifunc))) {
1695 pf = rvu_get_pf(rvu->pdev, pcifunc);
> 1696 cgxd = rvu_cgx_pdata(cgx, rvu);
1697
1698 mutex_lock(&cgxd->lock);
1699 if (rvu_cgx_is_pkind_config_permitted(rvu, pcifunc)) {
1700 cfg = NPC_TX_DEF_PKIND;
1701 rvu_write64(rvu, blkaddr, NIX_AF_LFX_TX_PARSE_CFG(nixlf), cfg);
1702 }
1703 mutex_unlock(&cgxd->lock);
1704 }
1705
1706 if (is_rep_dev(rvu, pcifunc)) {
1707 pfvf->tx_chan_base = RVU_SWITCH_LBK_CHAN;
1708 pfvf->tx_chan_cnt = 1;
1709 goto exit;
1710 }
1711
1712 intf = is_lbk_vf(rvu, pcifunc) ? NIX_INTF_TYPE_LBK : NIX_INTF_TYPE_CGX;
1713 if (is_sdp_pfvf(rvu, pcifunc))
1714 intf = NIX_INTF_TYPE_SDP;
1715
1716 if (is_cn20k(rvu->pdev)) {
1717 rc = npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &bcast, &mcast,
1718 &promisc, &ucast);
1719 if (rc) {
1720 rc = npc_cn20k_dft_rules_alloc(rvu, pcifunc);
1721 if (rc)
1722 goto free_mem;
1723
1724 rules_created = true;
1725 }
1726 }
1727
1728 rc = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
1729 !!(req->flags & NIX_LF_LBK_BLK_SEL));
1730 if (rc)
1731 goto free_dft;
1732
1733 /* Disable NPC entries as NIXLF's contexts are not initialized yet */
1734 rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
1735
1736 /* Configure RX VTAG Type 7 (strip) for vf vlan */
1737 rvu_write64(rvu, blkaddr,
1738 NIX_AF_LFX_RX_VTAG_TYPEX(nixlf, NIX_AF_LFX_RX_VTAG_TYPE7),
1739 VTAGSIZE_T4 | VTAG_STRIP);
1740
1741 goto exit;
1742
1743 free_dft:
1744 if (is_cn20k(rvu->pdev) && rules_created)
1745 npc_cn20k_dft_rules_free(rvu, pcifunc);
1746
1747 free_mem:
1748 nix_ctx_free(rvu, pfvf);
1749
1750 exit:
1751 /* Set macaddr of this PF/VF */
1752 ether_addr_copy(rsp->mac_addr, pfvf->mac_addr);
1753
1754 /* set SQB size info */
1755 cfg = rvu_read64(rvu, blkaddr, NIX_AF_SQ_CONST);
1756 rsp->sqb_size = (cfg >> 34) & 0xFFFF;
1757 rsp->rx_chan_base = pfvf->rx_chan_base;
1758 rsp->tx_chan_base = pfvf->tx_chan_base;
1759 rsp->rx_chan_cnt = pfvf->rx_chan_cnt;
1760 rsp->tx_chan_cnt = pfvf->tx_chan_cnt;
1761 rsp->lso_tsov4_idx = NIX_LSO_FORMAT_IDX_TSOV4;
1762 rsp->lso_tsov6_idx = NIX_LSO_FORMAT_IDX_TSOV6;
1763 /* Get HW supported stat count */
1764 cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1);
1765 rsp->lf_rx_stats = ((cfg >> 32) & 0xFF);
1766 rsp->lf_tx_stats = ((cfg >> 24) & 0xFF);
1767 /* Get count of CQ IRQs and error IRQs supported per LF */
1768 cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
1769 rsp->qints = ((cfg >> 12) & 0xFFF);
1770 rsp->cints = ((cfg >> 24) & 0xFFF);
1771 rsp->cgx_links = hw->cgx_links;
1772 rsp->lbk_links = hw->lbk_links;
1773 rsp->sdp_links = hw->sdp_links;
1774
1775 return rc;
1776 }
1777
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [PATCH 6.12.y] net: add missing ns_capable check for peer netns
From: Greg KH @ 2026-06-25 11:37 UTC (permalink / raw)
To: Maximilian Heyne
Cc: stable, Marc Kleine-Budde, Vincent Mailhol, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Daniel Borkmann, Nikolay Aleksandrov, Eric W. Biederman,
linux-can, netdev, linux-kernel, bpf
In-Reply-To: <20260617-pats-coif-316245c6@mheyne-amazon>
On Wed, Jun 17, 2026 at 08:25:31AM +0000, Maximilian Heyne wrote:
> The upstream commit 7b735ef81286 ("rtnetlink: add missing
> netlink_ns_capable() check for peer netns") doesn't apply on older
> stable kernels due to refactoring. Therefore, this patch is an attempt
> to implement the same capability check just directly in the respective
> interface types.
Why can't we take the full series of patches instead? Otherwise this is
going to be a pain over time for any other fixes/updates in this area,
right?
And if not, then we need acks from the maintainers here...
thanks,
greg k-h
^ permalink raw reply
* [PATCH iproute2-next v5] ip/bond: add lacp_strict support
From: Louis Scalbert @ 2026-06-25 11:42 UTC (permalink / raw)
To: netdev
Cc: andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy, shemminger,
maheshb, jonas.gorski, horms, stephen, Louis Scalbert
lacp_strict defines the behavior of a LACP bonding interface
when no slaves are in Collecting_Distributing state while at least
'min_links' slaves have carrier.
In the default (off) mode, the bonding master remains up and a
single slave is selected for TX/RX, while traffic received on other
slaves is dropped. This preserves the existing behavior.
In lacp_strict mode, the bonding master reports carrier down in this
situation.
Link: https://lore.kernel.org/netdev/20260603150331.1919611-1-louis.scalbert@6wind.com/
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
---
include/uapi/linux/if_link.h | 1 +
ip/iplink_bond.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 70aee114..d3a21fba 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1601,6 +1601,7 @@ enum {
IFLA_BOND_NS_IP6_TARGET,
IFLA_BOND_COUPLED_CONTROL,
IFLA_BOND_BROADCAST_NEIGH,
+ IFLA_BOND_LACP_STRICT,
__IFLA_BOND_MAX,
};
diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index 714fe7bd..7e2e397a 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -87,6 +87,12 @@ static const char *lacp_rate_tbl[] = {
NULL,
};
+static const char *lacp_strict_tbl[] = {
+ "off",
+ "on",
+ NULL,
+};
+
static const char *ad_select_tbl[] = {
"stable",
"bandwidth",
@@ -155,6 +161,7 @@ static void print_explain(FILE *f)
" [ ad_user_port_key PORTKEY ]\n"
" [ ad_actor_sys_prio SYSPRIO ]\n"
" [ ad_actor_system LLADDR ]\n"
+ " [ lacp_strict LACP_STRICT ]\n"
" [ arp_missed_max MISSED_MAX ]\n"
"\n"
"BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb\n"
@@ -168,6 +175,7 @@ static void print_explain(FILE *f)
"AD_SELECT := stable|bandwidth|count\n"
"COUPLED_CONTROL := off|on\n"
"BROADCAST_NEIGHBOR := off|on\n"
+ "LACP_STRICT := off|on\n"
);
}
@@ -188,6 +196,7 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
__u32 packets_per_slave;
__u8 missed_max;
__u8 broadcast_neighbor;
+ __u8 lacp_strict;
unsigned int ifindex;
int ret;
@@ -417,6 +426,13 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
return -1;
addattr_l(n, 1024, IFLA_BOND_AD_ACTOR_SYSTEM,
abuf, len);
+ } else if (matches(*argv, "lacp_strict") == 0) {
+ NEXT_ARG();
+ lacp_strict = parse_on_off("lacp_strict", *argv, &ret);
+ if (ret)
+ return ret;
+ lacp_strict = get_index(lacp_strict_tbl, *argv);
+ addattr8(n, 1024, IFLA_BOND_LACP_STRICT, lacp_strict);
} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
NEXT_ARG();
if (get_u8(&tlb_dynamic_lb, *argv, 0)) {
@@ -642,6 +658,10 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
"all_slaves_active %u ",
rta_getattr_u8(tb[IFLA_BOND_ALL_SLAVES_ACTIVE]));
+ if (tb[IFLA_BOND_LACP_STRICT])
+ print_on_off(PRINT_ANY, "lacp_strict", "lacp_strict %s ",
+ rta_getattr_u8(tb[IFLA_BOND_LACP_STRICT]));
+
if (tb[IFLA_BOND_MIN_LINKS])
print_uint(PRINT_ANY,
"min_links",
--
2.39.2
* Re: [PATCH net 3/4] vlan: defer real device state propagation to netdev_work
From: Nicolai Buchwitz @ 2026-06-25 11:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, jv, sdf,
dongchenchen2, idosch, n05ec, yuantan098, kuniyu,
aleksandr.loktionov, dtatulea, syzbot+09da62a8b78959ceb8bb,
syzbot+cb67c392b0b8f0fd0fc1, syzbot+9bb8bd77f3966641f298
In-Reply-To: <20260624182018.2445732-4-kuba@kernel.org>
On 24.6.2026 20:20, Jakub Kicinski wrote:
> vlan_device_event() generates nested UP/DOWN, MTU and feature
> change events. It executes an event for the VLAN device directly
> from the notifier - while the locks of the lower device are held.
>
> This causes deadlocks, for example:
>
> bond (3) bond_update_speed_duplex(vlan)
> | ^ v
> vlan (2) UP(vlan) (4) vlan_ethtool_get_link_ksettings()
> | ^ v
> dummy (1) UP(dummy) (5) __ethtool_get_link_ksettings()
>
> The dummy device is ops locked, vlan creates a nested event (2),
> then bond wants to ask vlan for link state (3). bond uses the
> "I'm already holding the instance lock" flavor of API. But in
> this case the lock held refers to vlan itself. We hit vlan's
> link settings trampoline (4) and call __ethtool_get_link_ksettings()
> which tries to lock dummy. Deadlock. There's no clean way for us
> to tell the vlan_ethtool_get_link_ksettings() that the caller
> is already in lower device's critical section.
>
> Defer the propagation to the per-netdev work facility instead:
> the notifier only schedules netdev_work_sched(vlandev, VLAN_WORK_*),
> and ndo_work (vlan_dev_work) applies the change later. Hopefully
> nobody expects the VLAN state changes to be instantaneous.
>
> If someone does expect the changes to be instantaneous we will
> have to do the same thing Stan did for rx_mode and "strategically"
> place sync calls, to make sure such delayed works are executed
> after we drop the ops lock but before we drop rtnl_lock.
>
> Stan suggests that if we need that down the line we may
> consider reshaping the mechanism into "async notifications".
> AFAICT only vlan does this sort of netdev open chaining,
> so as a first try I think that sticking the complexity into
> the vlan code makes sense.
>
> One corner case is that we need to cancel the event if user
> explicitly changes the state before work could run. Consider
> the following operations with vlan0 on top of dummy0:
>
> ip link set dev dummy0 up # queues work to up vlan0
> ip link set dev vlan0 down # user explicitly downs the vlan
> ndo_work # acts on the stale event
>
> Reported-by: syzbot+09da62a8b78959ceb8bb@syzkaller.appspotmail.com
> Reported-by: syzbot+cb67c392b0b8f0fd0fc1@syzkaller.appspotmail.com
> Reported-by: syzbot+9bb8bd77f3966641f298@syzkaller.appspotmail.com
> Fixes: 9f275c2e9020 ("net: ethtool: make sure
> __ethtool_get_link_ksettings() is ops-locked")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> [...]
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Thanks
Nicolai
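
The deferral and the stale-event corner case from the quoted commit message can be modeled in userspace. This is only a sketch under stated assumptions: `struct toy_vlan`, the `TOY_WORK_*` bits and all three functions are hypothetical names, the model is single-threaded with no locking, and the actual cancellation logic of the patch is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event bits, loosely modeled on the VLAN_WORK_* idea. */
enum {
	TOY_WORK_UP   = 1u << 0,
	TOY_WORK_DOWN = 1u << 1,
};

struct toy_vlan {
	bool up;               /* current administrative state */
	unsigned int pending;  /* deferred events not yet applied */
};

/* Notifier path: record the event, do not touch the device state here. */
void toy_notifier_lower_up(struct toy_vlan *v)
{
	v->pending &= ~TOY_WORK_DOWN;
	v->pending |= TOY_WORK_UP;
}

/* Explicit user request: apply immediately and cancel stale events. */
void toy_user_set_down(struct toy_vlan *v)
{
	v->up = false;
	v->pending &= ~(TOY_WORK_UP | TOY_WORK_DOWN);
}

/* Deferred work: runs later, outside the lower device's critical section. */
void toy_work(struct toy_vlan *v)
{
	unsigned int events = v->pending;

	v->pending = 0;
	if (events & TOY_WORK_UP)
		v->up = true;
	else if (events & TOY_WORK_DOWN)
		v->up = false;
}
```

Calling `toy_notifier_lower_up()`, then `toy_user_set_down()`, then `toy_work()` leaves the toy device down. That matches the dummy0/vlan0 sequence above, where the work item must not act on the stale event.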
* [PATCH net] nfc: clear active_target when the target list is replaced
From: Yinhao Hu @ 2026-06-25 11:18 UTC (permalink / raw)
To: netdev
Cc: david, davem, edumazet, kuba, pabeni, horms, dzm91,
hust-os-kernel-patches, Yinhao Hu
nfc_activate_target() and nfc_dep_link_up() cache dev->active_target as a
raw pointer into the dev->targets array. When a later poll reports new
targets, nfc_targets_found() frees and replaces dev->targets but does not
clear dev->active_target, so the cached pointer is left dangling into
freed memory. Any subsequent NFC core path that dereferences
dev->active_target->idx then reads the freed memory, e.g.
nfc_deactivate_target(), nfc_data_exchange().
When nfc_targets_found() is about to free the current target array, clear
dev->active_target if it points into that array, and tear down the
associated active state (stop the presence-check timer, drop the DEP link
and reset the RF mode) as nfc_deactivate_target() does.
Fixes: 900994332675 ("NFC: Cache the core NFC active target pointer instead of its index")
Signed-off-by: Yinhao Hu <dddddd@hust.edu.cn>
---
net/nfc/core.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/net/nfc/core.c b/net/nfc/core.c
index a92a6566e6a0..950807906645 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -786,6 +786,21 @@ int nfc_targets_found(struct nfc_dev *dev,
dev->targets_generation++;
+ if (dev->active_target && dev->targets) {
+ for (i = 0; i < dev->n_targets; i++) {
+ if (dev->active_target != &dev->targets[i])
+ continue;
+
+ if (dev->ops->check_presence)
+ timer_delete_sync(&dev->check_pres_timer);
+
+ dev->active_target = NULL;
+ dev->dep_link_up = false;
+ dev->rf_mode = NFC_RF_NONE;
+ break;
+ }
+ }
+
kfree(dev->targets);
dev->targets = NULL;
--
2.43.0
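
The invalidation pattern described in the commit message can be illustrated with a small userspace model. This is a minimal sketch, not the kernel code: `struct toy_dev`, `struct toy_target` and `toy_targets_replace()` are hypothetical stand-ins for `struct nfc_dev`, `struct nfc_target` and `nfc_targets_found()`, and the timer and RF mode handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins for struct nfc_target / struct nfc_dev. */
struct toy_target {
	unsigned int idx;
};

struct toy_dev {
	struct toy_target *targets;       /* heap array, replaced on each poll */
	size_t n_targets;
	struct toy_target *active_target; /* cached pointer into targets[] */
	bool dep_link_up;
};

/*
 * Replace the target array. If the cached active_target points into the
 * array that is about to be freed, drop it first so it cannot dangle.
 * Returns 0 on success, -1 on allocation failure.
 */
int toy_targets_replace(struct toy_dev *dev,
			const struct toy_target *found, size_t n)
{
	struct toy_target *copy = NULL;
	size_t i;

	if (n) {
		copy = malloc(n * sizeof(*copy));
		if (!copy)
			return -1;
		memcpy(copy, found, n * sizeof(*copy));
	}

	if (dev->active_target && dev->targets) {
		for (i = 0; i < dev->n_targets; i++) {
			if (dev->active_target != &dev->targets[i])
				continue;
			/* Tear down the state tied to the old target. */
			dev->active_target = NULL;
			dev->dep_link_up = false;
			break;
		}
	}

	free(dev->targets);
	dev->targets = copy;
	dev->n_targets = n;
	return 0;
}
```

In this model, a caller that activated `&dev.targets[1]` and then receives a new target list finds `active_target` reset to NULL instead of pointing into freed memory.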
* Re: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
From: Marcin Szycik @ 2026-06-25 11:07 UTC (permalink / raw)
To: Florian Bezdeka, Kwapulinski, Piotr, Ding Meng, Nguyen, Anthony L,
Kitszel, Przemyslaw, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
Kiszka, Jan
Cc: intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, wq.wang@siemens.com
In-Reply-To: <d058b0fa9ad923514084a44f51c78ae8355c4ebb.camel@siemens.com>
On 24/06/2026 11:05, Florian Bezdeka via Intel-wired-lan wrote:
> On Tue, 2026-06-23 at 09:46 +0000, Kwapulinski, Piotr wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ding Meng via Intel-wired-lan
>>> Sent: Monday, June 22, 2026 6:13 AM
>>> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Kiszka, Jan <jan.kiszka@siemens.com>; Bezdeka, Florian <florian.bezdeka@siemens.com>
>>> Cc: intel-wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org; meng.ding@siemens.com; wq.wang@siemens.com
>>> Subject: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
>>>
>>> When CONFIG_NET_RX_BUSY_POLL is deactivated, fetching RX HW timestamps from the NIC no longer works as expected.
>>>
>>> This occurs because disabling CONFIG_NET_RX_BUSY_POLL disables the SKB NAPI mapping in __skb_mark_napi_id(). Consequently, get_timestamp() fails to perform its driver lookup, and the igc driver's struct net_device_ops::ndo_get_tstamp is never invoked.
>>>
>>> Instead, get_timestamp() falls back to use shhwtstamps(skb)->hwtstamp, a field that the driver has not populated.
>>>
>>> Fix this by populating the hwtstamp field with the correct timestamp in the default timer when CONFIG_NET_RX_BUSY_POLL is disabled.
>>>
>>> Fixes: 069b142f5819 ("igc: Add support for PTP .getcyclesx64()")
>>> Co-developed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>>> Signed-off-by: Ding Meng <meng.ding@siemens.com>
>>> ---
>>> drivers/net/ethernet/intel/igc/igc_main.c | 38 ++++++++++++++++-------
>>> 1 file changed, 26 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>>> index 8ac16808023..1da8d7aa76d 100644
>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>> @@ -1992,7 +1992,26 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
>>> return skb;
>>> }
>>>
>>> -static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
>>> +static void igc_construct_skb_timestamps(struct igc_adapter *adapter,
>>> + struct sk_buff *skb,
>>> + struct igc_xdp_buff *ctx)
>>> +{
>>> + if (!ctx->rx_ts)
>>> + return;
>>> +#ifdef CONFIG_NET_RX_BUSY_POLL
>>> + skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
>>> + skb_hwtstamps(skb)->netdev_data = ctx->rx_ts; #else
>>> + struct igc_inline_rx_tstamps *tstamps;
>> Please move at the top of the function and add:
>
> That would trigger a "unused variable" warning in the
> CONFIG_NET_RX_BUSY_POLL case.
Put it under #ifndef CONFIG_NET_RX_BUSY_POLL. Variable declarations
need to be on top.
Thanks,
Marcin
> Btw: I was really confused that the #else statement moved to the end of
> the previous line. Might someone be using a wrongly configured mail
> client here?
>
> Florian
>
>> Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
>>
>>> +
>>> + tstamps = ctx->rx_ts;
>>> + skb_hwtstamps(skb)->hwtstamp = igc_ptp_rx_pktstamp(adapter,
>>> + tstamps->timer0);
>>> +#endif
>>> +}
>>> +
>
> [snip]
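
The suggestion to keep the declaration on top while guarding it with #ifndef could look roughly like the sketch below. It is a userspace illustration only: `TOY_BUSY_POLL` stands in for CONFIG_NET_RX_BUSY_POLL, and `struct toy_skb`, `struct toy_rx_tstamps` and `toy_construct_skb_timestamps()` are hypothetical names, not the igc driver's types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; define TOY_BUSY_POLL to model the option being on. */
struct toy_rx_tstamps {
	uint64_t timer0;
};

struct toy_skb {
	const void *netdev_data; /* raw pointer handed to a later lookup */
	uint64_t hwtstamp;       /* timestamp consumed by the fallback path */
};

void toy_construct_skb_timestamps(struct toy_skb *skb,
				  const struct toy_rx_tstamps *rx_ts)
{
#ifndef TOY_BUSY_POLL
	/* Declared at the top, but only in the build that uses it. */
	const struct toy_rx_tstamps *tstamps;
#endif

	if (!rx_ts)
		return;

#ifdef TOY_BUSY_POLL
	skb->netdev_data = rx_ts;
#else
	tstamps = rx_ts;
	skb->hwtstamp = tstamps->timer0;
#endif
}
```

With the guard in place, neither configuration sees an unused variable, and the declaration still precedes the first statement of the function.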
* [PATCH bpf-next v10 5/5] selftests/bpf: add bpf_icmp_send no route test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
To: bpf
Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>
For normal live cgroup_skb paths, the skb should already be routed. The
exception is a test run via BPF_PROG_TEST_RUN, where packets are created
by bpf_prog_test_run_skb. Those packets lack a dst route, so icmp_send
would quietly fail by returning early.
This test exercises that case and makes sure the kfunc returns -ENETUNREACH.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
.../bpf/prog_tests/icmp_send_kfunc.c | 26 +++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bb532aa0d158..ffaf0fe1880b 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -169,6 +169,29 @@ static void run_icmp_test(struct icmp_send *skel, int af, const char *ip,
}
}
+static void run_icmp_no_route_test(struct icmp_send *skel)
+{
+ struct ipv4_packet pkt = pkt_v4;
+ LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .data_in = &pkt,
+ .data_size_in = sizeof(pkt),
+ );
+ int err;
+
+ pkt.iph.version = 4;
+ pkt.iph.daddr = inet_addr("127.0.0.1");
+ pkt.tcp.dest = htons(80);
+ skel->bss->server_port = 80;
+ skel->bss->unreach_type = ICMP_DEST_UNREACH;
+ skel->bss->unreach_code = ICMP_HOST_UNREACH;
+ skel->data->kfunc_ret = KFUNC_RET_UNSET;
+
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.egress), &opts);
+ if (!ASSERT_OK(err, "test_run"))
+ return;
+ ASSERT_EQ(skel->data->kfunc_ret, -ENETUNREACH, "kfunc_ret_no_route");
+}
+
void test_icmp_send_unreach_cgroup(void)
{
struct icmp_send *skel;
@@ -193,6 +216,9 @@ void test_icmp_send_unreach_cgroup(void)
if (test__start_subtest("ipv6"))
run_icmp_test(skel, AF_INET6, "::1", ICMPV6_REJECT_ROUTE);
+ if (test__start_subtest("no_route"))
+ run_icmp_no_route_test(skel);
+
cleanup:
icmp_send__destroy(skel);
if (cgroup_fd >= 0)
--
2.34.1
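
The behavior this test checks for can be pictured with a tiny userspace model. It is a sketch under assumptions, not the kfunc itself: `struct toy_skb`, `struct toy_dst` and `toy_icmp_send()` are hypothetical, and the only property modeled is "no route attached means -ENETUNREACH instead of a silent early return".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a route entry and a packet buffer. */
struct toy_dst {
	int ifindex;
};

struct toy_skb {
	const struct toy_dst *dst; /* NULL for packets built by a test run */
};

/* Report the missing route to the caller rather than failing quietly. */
int toy_icmp_send(const struct toy_skb *skb)
{
	if (!skb->dst)
		return -ENETUNREACH;

	/* A routed packet would be answered here; the model just succeeds. */
	return 0;
}
```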
* [PATCH bpf-next v10 4/5] selftests/bpf: add bpf_icmp_send recursion test
From: Mahe Tardy @ 2026-06-25 11:03 UTC (permalink / raw)
To: bpf
Cc: andrii, ast, daniel, john.fastabend, jordan, martin.lau,
yonghong.song, emil, netdev, edumazet, kuba, pabeni, davem, horms,
Mahe Tardy
In-Reply-To: <20260625110321.28236-1-mahe.tardy@gmail.com>
This test is similar to test_icmp_send_unreach_cgroup but checks that,
in case of recursion, meaning that the BPF program calling the kfunc was
re-triggered by the icmp_send done by the kfunc, the kfunc will stop
early and return -EBUSY.
The test attaches to the root cgroup to ensure the ICMP packet generated
by the kfunc re-triggers the BPF program.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
---
.../bpf/prog_tests/icmp_send_kfunc.c | 46 ++++++++++++++++
tools/testing/selftests/bpf/progs/icmp_send.c | 55 +++++++++++++++++++
2 files changed, 101 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
index bbb3c3d4509c..bb532aa0d158 100644
--- a/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/icmp_send_kfunc.c
@@ -1,8 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>
+#include <cgroup_helpers.h>
#include <linux/errqueue.h>
#include <poll.h>
+#include <unistd.h>
#include "icmp_send.skel.h"
#define TIMEOUT_MS 1000
@@ -10,6 +12,7 @@
#define ICMP_DEST_UNREACH 3
#define ICMPV6_DEST_UNREACH 1
+#define ICMP_HOST_UNREACH 1
#define ICMP_FRAG_NEEDED 4
#define NR_ICMP_UNREACH 15
#define ICMPV6_REJECT_ROUTE 6
@@ -195,3 +198,46 @@ void test_icmp_send_unreach_cgroup(void)
if (cgroup_fd >= 0)
close(cgroup_fd);
}
+
+void test_icmp_send_unreach_recursion(void)
+{
+ struct icmp_send *skel;
+ int cgroup_fd = -1;
+ int err;
+
+ err = setup_cgroup_environment();
+ if (!ASSERT_OK(err, "setup_cgroup_environment"))
+ return;
+
+ skel = icmp_send__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ cgroup_fd = get_root_cgroup();
+ if (!ASSERT_OK_FD(cgroup_fd, "get_root_cgroup"))
+ goto cleanup;
+
+ skel->data->target_pid = getpid();
+ skel->links.recursion =
+ bpf_program__attach_cgroup(skel->progs.recursion, cgroup_fd);
+ if (!ASSERT_OK_PTR(skel->links.recursion, "prog_attach_cgroup"))
+ goto cleanup;
+
+ trigger_prog_read_icmp_errqueue(skel, ICMP_HOST_UNREACH, AF_INET,
+ "127.0.0.1");
+
+ /*
+ * Because of the recursion, the outer (first) call stores its result
+ * at index 1 since it returns second, and the nested (second) call
+ * stores its result at index 0 since it returns first.
+ */
+ ASSERT_EQ(skel->bss->rec_count, 2, "rec_count");
+ ASSERT_EQ(skel->data->rec_kfunc_rets[0], -EBUSY, "kfunc_rets[0]");
+ ASSERT_EQ(skel->data->rec_kfunc_rets[1], 0, "kfunc_rets[1]");
+
+cleanup:
+ icmp_send__destroy(skel);
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
+ cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/icmp_send.c b/tools/testing/selftests/bpf/progs/icmp_send.c
index 6e1ba539eeb0..c642ccdf9fd5 100644
--- a/tools/testing/selftests/bpf/progs/icmp_send.c
+++ b/tools/testing/selftests/bpf/progs/icmp_send.c
@@ -12,6 +12,10 @@ __u16 server_port = 0;
int unreach_type = 0;
int unreach_code = 0;
int kfunc_ret = -1;
+int target_pid = -1;
+
+unsigned int rec_count = 0;
+int rec_kfunc_rets[] = { -1, -1 };
SEC("cgroup_skb/egress")
int egress(struct __sk_buff *skb)
@@ -65,4 +69,55 @@ int egress(struct __sk_buff *skb)
return SK_DROP;
}
+SEC("cgroup_skb/egress")
+int recursion(struct __sk_buff *skb)
+{
+ void *data = (void *)(long)skb->data;
+ void *data_end = (void *)(long)skb->data_end;
+ struct icmphdr *icmph;
+ struct tcphdr *tcph;
+ struct iphdr *iph;
+ int ret;
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return SK_PASS;
+
+ iph = data;
+ if ((void *)(iph + 1) > data_end || iph->version != 4)
+ return SK_PASS;
+
+ if (iph->daddr != bpf_htonl(SERVER_IP))
+ return SK_PASS;
+
+ if (iph->protocol == IPPROTO_TCP) {
+ tcph = (void *)iph + iph->ihl * 4;
+ if ((void *)(tcph + 1) > data_end ||
+ tcph->dest != bpf_htons(server_port))
+ return SK_PASS;
+ } else if (iph->protocol == IPPROTO_ICMP) {
+ icmph = (void *)iph + iph->ihl * 4;
+ if ((void *)(icmph + 1) > data_end ||
+ icmph->type != unreach_type || icmph->code != unreach_code)
+ return SK_PASS;
+ } else {
+ return SK_PASS;
+ }
+
+ /*
+ * This call will provoke a recursion: the ICMP packet generated by the
+ * kfunc will re-trigger this program since we are attached to the root
+ * cgroup, to which the kernel ICMP socket belongs. However, when
+ * re-entered, the kfunc should return -EBUSY.
+ */
+ ret = bpf_icmp_send(skb, unreach_type, unreach_code);
+ rec_kfunc_rets[rec_count & 1] = ret;
+ __sync_fetch_and_add(&rec_count, 1);
+
+ /* Let the first ICMP error message pass */
+ if (iph->protocol == IPPROTO_ICMP)
+ return SK_PASS;
+
+ return SK_DROP;
+}
+
char LICENSE[] SEC("license") = "Dual BSD/GPL";
--
2.34.1
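
The ordering asserted by the test (index 0 holds -EBUSY, index 1 holds 0) follows from the nested call returning before the outer one. A userspace model of that ordering is sketched below. It assumes a simple re-entrancy flag, `toy_in_kfunc`, as a stand-in for whatever guard the kernel side uses, which is not part of this patch. `toy_prog()` and `toy_icmp_send()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model state; the first two names mirror the selftest globals. */
unsigned int rec_count;
int rec_kfunc_rets[2] = { -1, -1 };
bool toy_in_kfunc; /* assumed stand-in for the kernel's re-entrancy guard */

int toy_prog(void);

/* Stand-in for bpf_icmp_send(): sending re-triggers the program once. */
int toy_icmp_send(void)
{
	if (toy_in_kfunc)
		return -EBUSY; /* nested call bails out early */

	toy_in_kfunc = true;
	toy_prog();        /* the generated ICMP packet hits the hook again */
	toy_in_kfunc = false;
	return 0;
}

/* Stand-in for the "recursion" program: record the result after returning. */
int toy_prog(void)
{
	int ret = toy_icmp_send();

	rec_kfunc_rets[rec_count & 1] = ret;
	rec_count++;
	return ret;
}
```

A single call to `toy_prog()` runs the program twice: the nested run records -EBUSY first, at index 0, and the outer run records 0 afterwards, at index 1.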