* Re: [PATCH] rocker: Fix memory leak in ofdpa_port_fdb()
From: patchwork-bot+netdevbpf @ 2026-06-23 0:10 UTC
To: Ziran Zhang
Cc: jiri, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
linux-kernel
In-Reply-To: <20260616013245.7098-1-zhangcoder@yeah.net>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 16 Jun 2026 09:32:45 +0800 you wrote:
> In ofdpa_port_fdb(), the hash_del() only unlinks the node from
> hash table, but does not free it.
>
> Fix this by adding kfree(found) after the !found == removing check,
> where the pointer value is no longer needed.
>
> Found by Coccinelle kfree script.
>
> [...]
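The distinction the patch description draws (unlinking a node is not the same as freeing it) can be sketched in userspace C. This is a minimal model under stated assumptions: `struct fdb_entry`, `fdb_add()`, `fdb_unlink()`, `fdb_remove()` and the `live_entries` counter are hypothetical names for illustration, not the rocker driver's code, and `fdb_unlink()` only plays the role that hash_del() plays in the description.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an FDB hash entry; not the rocker structure. */
struct fdb_entry {
	int key;
	struct fdb_entry *next;
};

/* Counts live allocations so a leak is observable in this toy model. */
static int live_entries;

static struct fdb_entry *fdb_add(struct fdb_entry **bucket, int key)
{
	struct fdb_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->key = key;
	e->next = *bucket;
	*bucket = e;
	live_entries++;
	return e;
}

/* Unlink only, the analogue of hash_del(): the caller still owns the memory. */
static struct fdb_entry *fdb_unlink(struct fdb_entry **bucket, int key)
{
	for (struct fdb_entry **pp = bucket; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			struct fdb_entry *found = *pp;

			*pp = found->next;
			return found;
		}
	}
	return NULL;
}

/* Unlink, then free once the pointer value is no longer needed. */
static int fdb_remove(struct fdb_entry **bucket, int key)
{
	struct fdb_entry *found = fdb_unlink(bucket, key);
	int was_present = found != NULL;

	if (found) {
		free(found);
		live_entries--;
		assert(live_entries >= 0);
	}
	return was_present;
}
```

Calling only `fdb_unlink()` and dropping the returned pointer would leave `live_entries` above zero, which is the leak pattern the description refers to.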
Here is the summary with links:
- rocker: Fix memory leak in ofdpa_port_fdb()
https://git.kernel.org/netdev/net/c/53442aad1d57
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
From: Jakub Kicinski @ 2026-06-23 0:03 UTC
To: Shay Drori
Cc: Simon Horman, tariqt, edumazet, pabeni, andrew+netdev, davem,
saeedm, mbloch, leon, ohartoov, edwards, msanalla, phaddad, parav,
gbayer, kees, moshe, rongweil, jacob.e.keller, netdev, linux-rdma,
linux-kernel, gal
In-Reply-To: <e18662ac-413e-43f6-ac65-a4e15fd47bb7@nvidia.com>
On Mon, 22 Jun 2026 09:40:44 +0300 Shay Drori wrote:
> > [Severity: Critical]
> > Does skipping the remainder of this loop prevent the flow from being
> > unlinked from the doubly-linked list?
> >
> > By using continue here in mlx5e_tc_del_fdb_peers_flow(), we skip calling
> > mlx5e_tc_del_fdb_peer_flow(), which means flow->peer[i] is never
> > unlinked from the esw->offloads.peer_flows[original_i] list.
>
> mlx5_lag_get_dev_seq() returns an error only when a LAG member is
> removed from ldev.
> Before a LAG member is removed, esw->devcom is cleaned up, which invokes
> mlx5e_tc_clean_fdb_peer_flows() and removes all peer flows.
> Hence, no flow remains.
Thanks for responding. That said, the series no longer applies.
--
pw-bot: cr
* Re: [PATCH net-next 0/3] selftests/xsk: stabilize timeout test behavior
From: Jason Xing @ 2026-06-22 23:48 UTC
To: Jakub Kicinski
Cc: Maciej Fijalkowski, Tushar Vyavahare, netdev, magnus.karlsson,
stfomichev, kernelxing, davem, pabeni, ast, daniel,
tirthendu.sarkar, bpf
In-Reply-To: <20260622160706.0b4a27bf@kernel.org>
On Tue, Jun 23, 2026 at 7:07 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 17 Jun 2026 11:43:14 +0200 Maciej Fijalkowski wrote:
> > > On Tue, Jun 16, 2026 at 11:50 PM Tushar Vyavahare
> > > <tushar.vyavahare@intel.com> wrote:
> > > >
> > > > This series improves AF_XDP selftests by making timeout handling
> > > > explicit and fixing sources of non-determinism in xsk timeout tests.
> > > >
> > > > Patch 1 introduces test_spec::poll_tmout and removes implicit
> > > > dependence on RX UMEM setup state for timeout behavior.
> > > >
> > > > Patch 2 fixes thread harness sequencing by attaching XDP programs
> > > > before worker startup, removing signal-based termination, and using
> > > > barrier synchronization only for dual-thread runs.
> > > >
> > > > Patch 3 restores shared_umem after POLL_TXQ_FULL so test-local
> > > > configuration does not leak into subsequent cases on shared-netdev
> > > > runs.
> > > >
> > > > Together these changes make timeout handling easier to follow and
> > > > improve selftest stability, especially on real NIC runs.
> > >
> > > net-next is closed, but in the meantime I'll review the series ASAP.
> > >
> > > BTW, another thing about selftests I had in my mind is that are you
> > > planning to work on this [1]?
> >
> > This one is on me. I took your changes Jason and aligned ZC batching side
> > to this behavior, followed by xskxceiver adjustment. I am planning to send
> > this today EOD, however let's see how badly internal Sashiko will kick my
> > ass.
>
> Hi Maciej, do you want these applied? If they help make the tests less
> flaky I think that it's fine to take them during the merge window.
I'm not Maciej, but overall the series looks fine to me. Sorry
for the delay (I've been too busy to review patches during this
period).
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Thanks,
Jason
* Re: [PATCH net] net, bpf: check master for NULL in xdp_master_redirect()
From: Xiang Mei @ 2026-06-22 23:34 UTC
To: Jakub Kicinski
Cc: Jiayuan Chen, Daniel Borkmann, Martin KaFai Lau,
Jesper Dangaard Brouer, netdev, bpf, John Fastabend,
Stanislav Fomichev, Alexei Starovoitov, Jussi Maki, Paolo Abeni,
Weiming Shi, Ido Schimmel, David Ahern
In-Reply-To: <20260622155854.75977aac@kernel.org>
On Mon, Jun 22, 2026 at 3:58 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 21 Jun 2026 18:28:09 -0700 Xiang Mei wrote:
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 40037413dd4e..6037860d5283 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -4430,7 +4430,7 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> > > > struct net_device *master, *slave;
> > > >
> > > > master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> > > > - if (unlikely(!(master->flags & IFF_UP)))
> > > > + if (unlikely(!master || !(master->flags & IFF_UP)))
> > > > return XDP_ABORTED;
> > >
> > >
> > > I recall that when I previously modified this code, I removed the
> > > !master check
> > >
> > > because this is on the fastpath. However, since this is a triggerable bug,
> > > I think adding it here is fine.
> >
> > Thanks for the review. It's difficult to hit under normal conditions, but
> > the bug is real.
> > We have triggered this bug with a PoC plus GDB to pause one thread (no
> > other `cheating').
>
> Can you double-confirm that this triggers on current HEAD
> of linux/master ? I thought commit 2674d603a9e6 ("vrf: Fix a potential
> NPD when removing a port from a VRF") was supposed to prevent all the
> torn master fetches. Adding VRF folks to CC.
Yes.
We triggered the crash on 56abdaebbf0da304b860bed1f2b5a85f5a6a16a0,
which is the latest commit in net.git and already includes 2674d603a9e6.
The crash still reproduces:
```
[ 0.516445] BUG: kernel NULL pointer dereference, address: 00000000000000b0
[ 0.516448] bond1: (slave veth1): Releasing backup interface
[ 0.516732] #PF: supervisor read access in kernel mode
[ 0.516733] #PF: error_code(0x0000) - not-present page
[ 0.516734] PGD 102597067 P4D 102597067 PUD 102598067 PMD 0
[ 0.516736] Oops: Oops: 0000 [#1] SMP NOPTI
[ 0.517948] CPU: 0 UID: 0 PID: 133 Comm: exploit Not tainted 7.1.0+ #19 PREEMPTLAZY
[ 0.518320] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 0.518796] RIP: 0010:xdp_master_redirect+0x5f/0xc0
[ 0.519019] Code: 00 48 c7 43 10 00 00 00 00 48 c7 43 18 00 00 00 00 c7 43 20 00 00 00 00 89 43 38 48 8b 45 20 48 8b 38 e8 94 a0 fb ff 48 89 c7 0
[ 0.519795] RSP: 0018:ffffc90000003b98 EFLAGS: 00010246
[ 0.520028] RAX: 0000000000000000 RBX: ffffc90000003ee8 RCX: ffff88810268cd02
[ 0.520336] RDX: ffffffffc0000654 RSI: ffffc90000121060 RDI: 0000000000000000
[ 0.520657] RBP: ffffc90000003c18 R08: 0000000000000040 R09: ffffc90000121000
[ 0.521003] R10: 0000000000000001 R11: ffffc90000003ff8 R12: 000000000000000e
[ 0.521322] R13: 0000000000000008 R14: ffff88810268cd42 R15: 0000000000000003
[ 0.521617] FS: 00007d34bc6546c0(0000) GS:ffff888197ac5000(0000) knlGS:0000000000000000
[ 0.521964] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.522215] CR2: 00000000000000b0 CR3: 0000000102590003 CR4: 0000000000772ef0
[ 0.522513] PKRU: 55555554
[ 0.522632] Call Trace:
[ 0.522747] <IRQ>
[ 0.522841] bpf_prog_run_generic_xdp+0x39c/0x3b0
[ 0.523057] do_xdp_generic+0x1a0/0x350
[ 0.523221] __netif_receive_skb_core.constprop.0+0x5c6/0xce0
...
```
Thanks,
Xiang
* Re: [PATCH bpf-next v2 0/2] bpf: Guard conntrack opts error writes
From: patchwork-bot+netdevbpf @ 2026-06-22 23:10 UTC
To: Yiyang Chen
Cc: bpf, netfilter-devel, pablo, fw, phil, davem, edumazet, kuba,
pabeni, horms, andrii, eddyz87, ast, daniel, memxor, martin.lau,
song, yonghong.song, jolsa, emil, shuah, kartikey406, coreteam,
netdev, linux-kernel, linux-kselftest
In-Reply-To: <cover.1781765747.git.chenyy23@mails.tsinghua.edu.cn>
Hello:
This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Thu, 18 Jun 2026 10:18:42 +0000 you wrote:
> The conntrack lookup/allocation kfuncs expose an opts/opts__sz pair.
> The verifier checks the caller-provided opts__sz range, but the wrappers
> currently write opts->error after internal errors even when opts__sz is too
> small to include that field.
>
> Patch 1 writes opts->error only when opts__sz includes it, and uses a
> single helper to fold ERR_PTR returns into the kfunc ABI result while keeping
> the local nfct result variable in each wrapper.
> Patch 2 adds a bpf_nf regression check that keeps a guard in opts->error
> while passing opts__sz covering only netns_id.
>
> [...]
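The guard described for patch 1 (write the error field only when the caller-declared size covers it) can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct demo_opts`, `DEMO_OFFSETOFEND()` and `demo_set_error()` are hypothetical and only mirror the field order implied by the description, not the actual kfunc wrappers or the real opts layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options layout; only the field order mirrors the description. */
struct demo_opts {
	int netns_id;
	int error;
};

/* The sketch relies on error being placed after netns_id. */
static_assert(offsetof(struct demo_opts, error) >= sizeof(int),
	      "error must follow netns_id");

/* End offset of a member, similar in spirit to the kernel's offsetofend(). */
#define DEMO_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Write opts->error only when the caller-declared size covers that field. */
static void demo_set_error(struct demo_opts *opts, size_t opts_sz, int err)
{
	if (opts_sz >= DEMO_OFFSETOFEND(struct demo_opts, error))
		opts->error = err;
}
```

With a size that covers only `netns_id`, a guard value stored in `error` stays untouched, which is the kind of check the description attributes to patch 2.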
Here is the summary with links:
- [bpf-next,v2,1/2] bpf: Guard conntrack opts error writes
https://git.kernel.org/bpf/bpf/c/6f6183a39533
- [bpf-next,v2,2/2] selftests/bpf: Cover small conntrack opts error writes
https://git.kernel.org/bpf/bpf/c/38ba6d43af38
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next 0/3] selftests/xsk: stabilize timeout test behavior
From: Jakub Kicinski @ 2026-06-22 23:07 UTC
To: Maciej Fijalkowski
Cc: Jason Xing, Tushar Vyavahare, netdev, magnus.karlsson, stfomichev,
kernelxing, davem, pabeni, ast, daniel, tirthendu.sarkar, bpf
In-Reply-To: <ajJsMj0QMOF5I8qq@boxer>
On Wed, 17 Jun 2026 11:43:14 +0200 Maciej Fijalkowski wrote:
> > On Tue, Jun 16, 2026 at 11:50 PM Tushar Vyavahare
> > <tushar.vyavahare@intel.com> wrote:
> > >
> > > This series improves AF_XDP selftests by making timeout handling
> > > explicit and fixing sources of non-determinism in xsk timeout tests.
> > >
> > > Patch 1 introduces test_spec::poll_tmout and removes implicit
> > > dependence on RX UMEM setup state for timeout behavior.
> > >
> > > Patch 2 fixes thread harness sequencing by attaching XDP programs
> > > before worker startup, removing signal-based termination, and using
> > > barrier synchronization only for dual-thread runs.
> > >
> > > Patch 3 restores shared_umem after POLL_TXQ_FULL so test-local
> > > configuration does not leak into subsequent cases on shared-netdev
> > > runs.
> > >
> > > Together these changes make timeout handling easier to follow and
> > > improve selftest stability, especially on real NIC runs.
> >
> > net-next is closed, but in the meantime I'll review the series ASAP.
> >
> > BTW, another thing about selftests I had in my mind is that are you
> > planning to work on this [1]?
>
> This one is on me. I took your changes Jason and aligned ZC batching side
> to this behavior, followed by xskxceiver adjustment. I am planning to send
> this today EOD, however let's see how badly internal Sashiko will kick my
> ass.
Hi Maciej, do you want these applied? If they help make the tests less
flaky I think that it's fine to take them during the merge window.
* Re: [PATCH net] net, bpf: check master for NULL in xdp_master_redirect()
From: Jakub Kicinski @ 2026-06-22 22:58 UTC
To: Xiang Mei
Cc: Jiayuan Chen, Daniel Borkmann, Martin KaFai Lau,
Jesper Dangaard Brouer, netdev, bpf, John Fastabend,
Stanislav Fomichev, Alexei Starovoitov, Jussi Maki, Paolo Abeni,
Weiming Shi, Ido Schimmel, David Ahern
In-Reply-To: <CAPpSM+TRJaqSSUs6tBp_CBSVUJMiDZm01o5BK6aCNak8vuQi+w@mail.gmail.com>
On Sun, 21 Jun 2026 18:28:09 -0700 Xiang Mei wrote:
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 40037413dd4e..6037860d5283 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -4430,7 +4430,7 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> > > struct net_device *master, *slave;
> > >
> > > master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> > > - if (unlikely(!(master->flags & IFF_UP)))
> > > + if (unlikely(!master || !(master->flags & IFF_UP)))
> > > return XDP_ABORTED;
> >
> >
> > I recall that when I previously modified this code, I removed the
> > !master check
> >
> > because this is on the fastpath. However, since this is a triggerable bug,
> > I think adding it here is fine.
>
> Thanks for the review. It's difficult to hit under normal conditions, but
> the bug is real.
> We have triggered this bug with a PoC plus GDB to pause one thread (no
> other `cheating').
Can you double-confirm that this triggers on current HEAD
of linux/master ? I thought commit 2674d603a9e6 ("vrf: Fix a potential
NPD when removing a port from a VRF") was supposed to prevent all the
torn master fetches. Adding VRF folks to CC.
* Re: [PATCH net v2] sfc: Use acquire/release for irq_soft_enabled
From: Jakub Kicinski @ 2026-06-22 22:39 UTC
To: Gui-Dong Han
Cc: netdev, linux-net-drivers, ecree.xilinx, linux-kernel,
andrew+netdev, davem, edumazet, pabeni, horms, baijiaju1990
In-Reply-To: <20260618091618.3874171-1-hanguidong02@gmail.com>
On Thu, 18 Jun 2026 17:16:18 +0800 Gui-Dong Han wrote:
> Subject: [PATCH net v2] sfc: Use acquire/release for irq_soft_enabled
Sorry, I'm not applying driver memory barrier ordering changes of this
complexity because some static analyzer thinks it's wrong. Please don't
repost this unless someone from solarflare asks you to.
* Re: [PATCH net v2 2/2] net: ethernet: sunplus: spl2sw: fix multiple of_node refcount leaks in probe
From: Jakub Kicinski @ 2026-06-22 22:36 UTC
To: Shitalkumar Gandhi
Cc: Wells Lu, Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, netdev, linux-kernel, Shitalkumar Gandhi
In-Reply-To: <fe9cbe90adf4bb6fe5f2f7b7f267075e603d84cb.1781552725.git.shitalkumar.gandhi@cambiumnetworks.com>
On Tue, 16 Jun 2026 01:20:32 +0530 Shitalkumar Gandhi wrote:
> + struct device_node *eth_ports_np __free(device_node) = NULL;
Please don't use __free() in networking code
--
pw-bot: cr
* [PATCH net v3 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Octavian Purdila @ 2026-06-22 22:27 UTC
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila,
syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260622222757.2130402-1-tavip@google.com>
When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".
Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:
1. A write-only VMA (PROT_WRITE only) is partially populated by a
prior TUN write that failed with -EIO but still faulted in some
pages.
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
populated.
4. The second packet allocation fails because GUP fast pins the first page
but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
advanced: 64KB from first packet + 4KB from second), but the send loop
breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
(offset resets). It attempts to send the remaining bytes with zerocopy
but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
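The save-then-restore pattern described above can be sketched in userspace C. This is only a model under stated assumptions: `struct demo_iter`, `demo_iter_save()`, `demo_iter_restore()`, `demo_alloc()` and `demo_send_one()` are hypothetical stand-ins, and the real struct iov_iter and its state helpers are richer than this flat offset/count pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat iterator; the kernel's struct iov_iter is richer. */
struct demo_iter {
	size_t offset;	/* bytes already consumed */
	size_t count;	/* bytes remaining */
};

struct demo_iter_state {
	size_t offset;
	size_t count;
};

static void demo_iter_save(const struct demo_iter *it, struct demo_iter_state *st)
{
	st->offset = it->offset;
	st->count = it->count;
}

static void demo_iter_restore(struct demo_iter *it, const struct demo_iter_state *st)
{
	it->offset = st->offset;
	it->count = st->count;
}

/*
 * Models an allocation step that consumes `advance` bytes and then fails
 * when `fail` is set, like a partially successful page pin.
 */
static int demo_alloc(struct demo_iter *it, size_t advance, int fail)
{
	assert(advance <= it->count);
	it->offset += advance;
	it->count -= advance;
	return fail ? -1 : 0;
}

/* Returns bytes sent, or -1 with the iterator rewound to its prior state. */
static long demo_send_one(struct demo_iter *it, size_t len, size_t partial, int fail)
{
	struct demo_iter_state saved;

	demo_iter_save(it, &saved);
	if (demo_alloc(it, fail ? partial : len, fail) < 0) {
		demo_iter_restore(it, &saved);
		return -1;
	}
	return (long)len;
}
```

In this model, a failed second attempt that consumed 4KB leaves the iterator exactly where the first 64KB send left it, so the reported byte count and the iterator position stay in sync.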
Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Octavian Purdila <tavip@google.com>
---
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 09475007165b3..35fd4094d771d 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
+ struct iov_iter_state msg_iter_state;
struct virtio_vsock_sock *vvs;
struct ubuf_info *uarg = NULL;
u32 pkt_len = info->pkt_len;
@@ -368,8 +369,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
struct sk_buff *skb;
size_t skb_len;
+ /* Save iterator state in case allocation or transmission fails
+ * so we can restore it and retry.
+ */
+ if (info->msg)
+ iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
skb_len = min(max_skb_len, rest_len);
+ /* Note: virtio_transport_alloc_skb() can advance info->msg->msg_iter
+ * even if it fails (e.g. partial GUP success).
+ */
skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
uarg,
src_cid, src_port,
@@ -399,6 +409,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
break;
} while (rest_len);
+ if (info->msg && ret < 0)
+ iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
virtio_transport_put_credit(vvs, rest_len);
/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH net v3 1/2] iov_iter: export iov_iter_restore
From: Octavian Purdila @ 2026-06-22 22:27 UTC
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila
In-Reply-To: <20260622222757.2130402-1-tavip@google.com>
Export iov_iter_restore so that it can be used by modules.
This is needed by the virtio vsock transport (which can be built as a
module) to restore the msg_iter state when transmission fails.
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Octavian Purdila <tavip@google.com>
---
lib/iov_iter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 273919b161617..f5df63961fb24 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
i->__iov -= state->nr_segs - i->nr_segs;
i->nr_segs = state->nr_segs;
}
+EXPORT_SYMBOL_GPL(iov_iter_restore);
/*
* Extract a list of contiguous pages from an ITER_FOLIOQ iterator. This does
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH net v3 0/2] vsock/virtio: fix msg_iter desync on transmission failure
From: Octavian Purdila @ 2026-06-22 22:27 UTC
To: netdev
Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
virtualization, Xuan Zhuo, Jens Axboe, Octavian Purdila
This series fixes a msg_iter desync issue in the virtio vsock transport
that can lead to warnings and eventual -ENOMEM under specific failure
scenarios (e.g. partial GUP failure during MSG_ZEROCOPY transmission).
To fix this, we need to restore the msg_iter state on transmission failure.
However, since virtio vsock transport can be built as a module, we first
need to export iov_iter_restore.
Patch 1 exports iov_iter_restore.
Patch 2 implements the msg_iter restoration in virtio vsock.
Changes in v3:
- Use EXPORT_SYMBOL_GPL (Jens)
Changes in v2:
- Use iov_iter_save_state()/iov_iter_restore() (Stefano)
- Use a single restore point (Stefano)
- Reverse xmas tree (Stefano)
- Added comments in the code (Stefano)
v2: https://lore.kernel.org/all/20260613000953.467473-1-tavip@google.com/
v1: https://lore.kernel.org/all/20260609004809.1285028-1-tavip@google.com/
Octavian Purdila (2):
iov_iter: export iov_iter_restore
vsock/virtio: restore msg_iter on transmission failure
lib/iov_iter.c | 1 +
net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
2 files changed, 14 insertions(+)
--
2.55.0.rc0.799.gd6f94ed593-goog
* Re: [PATCH net 0/2] tcp: make TCP-AO lookups more predictable
From: Jakub Kicinski @ 2026-06-22 22:15 UTC
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, Simon Horman, Dmitry Safonov,
Neal Cardwell, Kuniyuki Iwashima, netdev, eric.dumazet
In-Reply-To: <20260622185248.1717846-1-edumazet@google.com>
On Mon, 22 Jun 2026 18:52:46 +0000 Eric Dumazet wrote:
> This series fixes a TCP-AO key lookup precedence bug.
>
> TCP-AO stores MKTs in an unsorted list and returns the first match. This
> allows newer, less-specific keys (wildcard VRF or shorter prefixes) to
> shadow older, more-specific keys if inserted later.
>
> Fix this by implementing sorted insertion in tcp_ao_link_mkt() based on
> key specificity (VRF binding, then prefix length). This keeps the RX
> lookup path fast while ensuring correctness.
>
> The second patch adds a selftest to verify this behavior.
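The idea of sorted insertion by specificity with a first-match lookup can be sketched as follows. This is a toy model, not the series' tcp_ao_link_mkt() change: `struct demo_key`, `demo_more_specific()`, `demo_link_key()`, `demo_match()` and `demo_lookup()` are hypothetical, VRF 0 is assumed to mean "wildcard", and addresses are plain IPv4 values in host byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key record; not the kernel's TCP-AO key structure. */
struct demo_key {
	int id;
	int vrf;		/* 0 means wildcard (any VRF) in this sketch */
	unsigned int addr;	/* IPv4 address in host byte order */
	int prefixlen;		/* 0..32, longer means more specific */
	struct demo_key *next;
};

/* Nonzero when a is strictly more specific than b: VRF binding, then prefix. */
static int demo_more_specific(const struct demo_key *a, const struct demo_key *b)
{
	int a_bound = a->vrf != 0;
	int b_bound = b->vrf != 0;

	if (a_bound != b_bound)
		return a_bound;
	return a->prefixlen > b->prefixlen;
}

/* Insert before the first strictly less specific entry; ties keep insertion order. */
static void demo_link_key(struct demo_key **head, struct demo_key *key)
{
	struct demo_key **pp = head;

	while (*pp && !demo_more_specific(key, *pp))
		pp = &(*pp)->next;
	key->next = *pp;
	*pp = key;
}

static int demo_match(const struct demo_key *k, int vrf, unsigned int addr)
{
	unsigned int mask;

	assert(k->prefixlen >= 0 && k->prefixlen <= 32);
	mask = k->prefixlen ? ~0u << (32 - k->prefixlen) : 0;
	if (k->vrf && k->vrf != vrf)
		return 0;
	return (addr & mask) == (k->addr & mask);
}

/* First-match lookup; on a sorted list the first match is the most specific. */
static int demo_lookup(const struct demo_key *head, int vrf, unsigned int addr)
{
	for (const struct demo_key *k = head; k; k = k->next)
		if (demo_match(k, vrf, addr))
			return k->id;
	return -1;
}
```

Because ordering is settled at insertion time, the lookup remains a single linear scan that stops at the first match, which is the property the cover letter highlights for the RX path.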
Unhappiness:
https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-extra/results/702701/29-key-management-ipv4/stdout
* [PATCH net 8/8] e1000e: Reconfigure PLL clock gate timeout and re-enable K1 on Meteor Lake
From: Tony Nguyen @ 2026-06-22 22:00 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Dima Ruinskiy, anthony.l.nguyen, jacob.e.keller, timo.teras,
Moriya Kadosh, Todd Brandt
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Dima Ruinskiy <dima.ruinskiy@intel.com>
Commit 3c7bf5af21960 ("e1000e: Introduce private flag to disable K1")
disabled K1 by default on Meteor Lake and newer systems due to packet
loss observed on various platforms. However, disabling K1 caused an
increase in power consumption.
To mitigate this, reconfigure the PLL clock gate value so that K1 can
remain enabled without incurring the additional power consumption.
Re-enable K1 by default, but keep the private flag to support disabling
it via ethtool. Additionally, introduce a DMI quirk table, so that K1 may
be disabled by default on known problematic systems. Currently, this
includes the Dell Pro 16 Plus, where the issue has been reported to persist
despite the changes to the PLL lock timeout.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220954
Link: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20250623/048860.html
Link: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20260330/054059.html
Signed-off-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Co-developed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Fixes: 3c7bf5af21960 ("e1000e: Introduce private flag to disable K1")
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 +++
drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index dea208db1be5..aa90e0ce8aca 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1594,6 +1594,9 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
phy_reg &= ~I217_PLL_CLOCK_GATE_MASK;
if (speed == SPEED_100 || speed == SPEED_10)
phy_reg |= 0x3E8;
+ else if (hw->mac.type == e1000_pch_mtp ||
+ hw->mac.type == e1000_pch_ptp)
+ phy_reg |= 0x1D5;
else
phy_reg |= 0xFA;
e1e_wphy_locked(hw, I217_PLL_CLOCK_GATE_REG, phy_reg);
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 808e5cddd6a9..844f31ab37ad 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -25,6 +25,7 @@
#include <linux/pm_runtime.h>
#include <linux/prefetch.h>
#include <linux/suspend.h>
+#include <linux/dmi.h>
#include "e1000.h"
#define CREATE_TRACE_POINTS
@@ -58,6 +59,17 @@ static const struct e1000_info *e1000_info_tbl[] = {
[board_pch_ptp] = &e1000_pch_ptp_info,
};
+static const struct dmi_system_id disable_k1_list[] = {
+ {
+ .ident = "Dell Pro 16 Plus PB16250",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Dell Pro 16 Plus PB16250"),
+ },
+ },
+ {}
+};
+
struct e1000_reg_info {
u32 ofs;
char *name;
@@ -7670,7 +7682,8 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
/* init PTP hardware clock */
e1000e_ptp_init(adapter);
- if (hw->mac.type >= e1000_pch_mtp)
+ /* disable K1 by default on known problematic systems */
+ if (hw->mac.type >= e1000_pch_mtp && dmi_check_system(disable_k1_list))
adapter->flags2 |= FLAG2_DISABLE_K1;
/* reset the hardware with the new settings */
--
2.47.1
* [PATCH net 7/8] i40e: Fix i40e_debug() to use struct i40e_hw argument
From: Tony Nguyen @ 2026-06-22 22:00 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Mohamed Khalfella, anthony.l.nguyen, Aleksandr Loktionov,
Paul Menzel, Alexander Nowlin
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Mohamed Khalfella <mkhalfella@purestorage.com>
The i40e_debug() macro takes struct i40e_hw *h as its first argument, but the
macro body uses hw instead of h. This has been working so far because hw
happens to be the name of the variable in the context where the macro is
expanded. Fix the macro to use the passed argument.
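The pitfall described above (a macro body that names a caller variable instead of its own parameter) can be reproduced in a few lines of standalone C. The names `struct demo_hw`, `demo_log()`, `DEMO_DEBUG_BUGGY()`, `DEMO_DEBUG_FIXED()` and `demo_run()` are hypothetical and exist only for this sketch; they are not i40e code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device handle, standing in for struct i40e_hw. */
struct demo_hw {
	unsigned int debug_mask;
	int id;
};

static int demo_last_logged_id = -1;

/* Records which handle was logged so the two macros can be compared. */
static void demo_log(const struct demo_hw *hw)
{
	assert(hw != NULL);
	demo_last_logged_id = hw->id;
}

/* Buggy form: the body names `hw`, so it silently captures a caller variable. */
#define DEMO_DEBUG_BUGGY(h, m)			\
	do {					\
		if ((m) & (h)->debug_mask)	\
			demo_log(hw);		\
	} while (0)

/* Fixed form: the body uses only the macro parameter. */
#define DEMO_DEBUG_FIXED(h, m)			\
	do {					\
		if ((m) & (h)->debug_mask)	\
			demo_log(h);		\
	} while (0)

/* Logs through `other` while an unrelated variable named `hw` is in scope. */
static int demo_run(int use_fixed)
{
	struct demo_hw first = { .debug_mask = 1, .id = 1 };
	struct demo_hw second = { .debug_mask = 1, .id = 2 };
	struct demo_hw *hw = &first;
	struct demo_hw *other = &second;

	(void)hw; /* only referenced by the buggy macro expansion */
	if (use_fixed)
		DEMO_DEBUG_FIXED(other, 1);
	else
		DEMO_DEBUG_BUGGY(other, 1);
	return demo_last_logged_id;
}
```

In this sketch the buggy macro logs the handle named `hw` even though `other` was passed, and it would fail to compile in any scope that has no `hw` variable at all.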
Fixes: 5dfd37c37a44 ("i40e: Split i40e_osdep.h")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_debug.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debug.h b/drivers/net/ethernet/intel/i40e/i40e_debug.h
index e9871dfb32bd..01fd70db9086 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debug.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_debug.h
@@ -42,7 +42,7 @@ struct device *i40e_hw_to_dev(struct i40e_hw *hw);
#define i40e_debug(h, m, s, ...) \
do { \
if (((m) & (h)->debug_mask)) \
- dev_info(i40e_hw_to_dev(hw), s, ##__VA_ARGS__); \
+ dev_info(i40e_hw_to_dev(h), s, ##__VA_ARGS__); \
} while (0)
#endif /* _I40E_DEBUG_H_ */
--
2.47.1
* [PATCH net 4/8] ice: call netif_keep_dst() once when entering switchdev mode
From: Tony Nguyen @ 2026-06-22 22:00 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Marcin Szycik, anthony.l.nguyen, Aleksandr Loktionov, Paul Menzel,
Patryk Holda
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Marcin Szycik <marcin.szycik@intel.com>
netif_keep_dst() only needs to be called once for the uplink VSI, not
once for each port representor. Move it from ice_eswitch_setup_repr()
to ice_eswitch_enable_switchdev().
Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Patryk Holda <patryk.holda@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_eswitch.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_eswitch.c b/drivers/net/ethernet/intel/ice/ice_eswitch.c
index 2e4f0969035f..c30e27bbfe6e 100644
--- a/drivers/net/ethernet/intel/ice/ice_eswitch.c
+++ b/drivers/net/ethernet/intel/ice/ice_eswitch.c
@@ -117,8 +117,6 @@ static int ice_eswitch_setup_repr(struct ice_pf *pf, struct ice_repr *repr)
if (!repr->dst)
return -ENOMEM;
- netif_keep_dst(uplink_vsi->netdev);
-
dst = repr->dst;
dst->u.port_info.port_id = vsi->vsi_num;
dst->u.port_info.lower_dev = uplink_vsi->netdev;
@@ -312,6 +310,8 @@ static int ice_eswitch_enable_switchdev(struct ice_pf *pf)
if (ice_eswitch_br_offloads_init(pf))
goto err_br_offloads;
+ netif_keep_dst(uplink_vsi->netdev);
+
pf->eswitch.is_running = true;
return 0;
--
2.47.1
* [PATCH net 6/8] ice: dpll: fix memory leak in ice_dpll_init_info error paths
From: Tony Nguyen @ 2026-06-22 22:00 UTC
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: ZhaoJinming, anthony.l.nguyen, arkadiusz.kubalewski,
grzegorz.nitka, jiri, vadim.fedorenko, Aleksandr Loktionov,
Rinitha S
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: ZhaoJinming <zhaojinming@uniontech.com>
Several error return paths in ice_dpll_init_info() directly return
without freeing previously allocated resources, causing memory leaks:
- When de->input_prio allocation fails, d->inputs is leaked
- When dp->input_prio allocation fails, d->inputs and de->input_prio
are leaked
- When ice_get_cgu_rclk_pin_info() fails, all previously allocated
inputs/outputs/input_prio are leaked
- When ice_dpll_init_pins_info(RCLK_INPUT) fails, same resources
are leaked
Fix this by jumping to the deinit_info label which properly calls
ice_dpll_deinit_info() to free all allocated resources.
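The fix pattern described above, routing every late error through one cleanup label instead of returning directly, can be sketched in userspace C. The names `struct demo_info`, `demo_alloc()`, `demo_deinit()`, `demo_init()` and the `fail_step` parameter are hypothetical; only the `deinit_info` label name is borrowed from the commit message, and the allocation counter exists purely to make leaks observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container; field names only echo the commit message. */
struct demo_info {
	int *inputs;
	int *input_prio_a;
	int *input_prio_b;
};

static int demo_live_allocs;

/* Allocator that fails on request, so error paths can be exercised. */
static int *demo_alloc(int fail)
{
	int *p = fail ? NULL : calloc(4, sizeof(*p));

	if (p)
		demo_live_allocs++;
	return p;
}

/* Frees whatever was allocated; safe on a partially initialised struct. */
static void demo_deinit(struct demo_info *d)
{
	int **slots[] = { &d->inputs, &d->input_prio_a, &d->input_prio_b };

	for (size_t i = 0; i < sizeof(slots) / sizeof(slots[0]); i++) {
		if (*slots[i]) {
			free(*slots[i]);
			*slots[i] = NULL;
			demo_live_allocs--;
		}
	}
	assert(demo_live_allocs >= 0);
}

/* fail_step selects which allocation fails (1..3); 0 means none. */
static int demo_init(struct demo_info *d, int fail_step)
{
	int ret = 0;

	d->inputs = d->input_prio_a = d->input_prio_b = NULL;

	d->inputs = demo_alloc(fail_step == 1);
	if (!d->inputs)
		return -1;	/* nothing allocated yet, a plain return is fine */

	d->input_prio_a = demo_alloc(fail_step == 2);
	if (!d->input_prio_a) {
		ret = -1;
		goto deinit_info;
	}

	d->input_prio_b = demo_alloc(fail_step == 3);
	if (!d->input_prio_b) {
		ret = -1;
		goto deinit_info;
	}

	return 0;

deinit_info:
	demo_deinit(d);
	return ret;
}
```

Replacing either `goto deinit_info` with a bare `return -1` would leave `demo_live_allocs` above zero after a failed init, which mirrors the leaks listed in the commit message.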
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_dpll.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 3876ee7255ac..30c3a4db7d61 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -4752,12 +4752,16 @@ static int ice_dpll_init_info(struct ice_pf *pf, bool cgu)
alloc_size = sizeof(*de->input_prio) * d->num_inputs;
de->input_prio = kzalloc(alloc_size, GFP_KERNEL);
- if (!de->input_prio)
- return -ENOMEM;
+ if (!de->input_prio) {
+ ret = -ENOMEM;
+ goto deinit_info;
+ }
dp->input_prio = kzalloc(alloc_size, GFP_KERNEL);
- if (!dp->input_prio)
- return -ENOMEM;
+ if (!dp->input_prio) {
+ ret = -ENOMEM;
+ goto deinit_info;
+ }
ret = ice_dpll_init_pins_info(pf, ICE_DPLL_PIN_TYPE_INPUT);
if (ret)
@@ -4782,12 +4786,12 @@ static int ice_dpll_init_info(struct ice_pf *pf, bool cgu)
ret = ice_get_cgu_rclk_pin_info(&pf->hw, &d->base_rclk_idx,
&pf->dplls.rclk.num_parents);
if (ret)
- return ret;
+ goto deinit_info;
for (i = 0; i < pf->dplls.rclk.num_parents; i++)
pf->dplls.rclk.parent_idx[i] = d->base_rclk_idx + i;
ret = ice_dpll_init_pins_info(pf, ICE_DPLL_PIN_TYPE_RCLK_INPUT);
if (ret)
- return ret;
+ goto deinit_info;
de->mode = DPLL_MODE_AUTOMATIC;
dp->mode = DPLL_MODE_AUTOMATIC;
--
2.47.1
^ permalink raw reply related
* [PATCH net 3/8] ice: fix ice_init_link() error return preventing probe
From: Tony Nguyen @ 2026-06-22 22:00 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Paul Greenwalt, anthony.l.nguyen, stable, Aleksandr Loktionov,
Simon Horman, Alexander Nowlin
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Paul Greenwalt <paul.greenwalt@intel.com>
ice_init_link() can return an error status from ice_update_link_info()
or ice_init_phy_user_cfg(), causing probe to fail.
An incorrect NVM update procedure can result in link/PHY errors, and
the recommended resolution is to update the NVM using the correct
procedure. If the driver fails probe due to link errors, the user
cannot update the NVM to recover. The link/PHY errors logged are
non-fatal: they are already annotated as 'not a fatal error if this
fails'.
Since none of the errors inside ice_init_link() should prevent probe
from completing, convert it to void and remove the error check in the
caller. All failures are already logged; callers have no meaningful
recovery path for link init errors.
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_main.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e2fbe111f849..e2fd2dab03e3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4789,16 +4789,14 @@ static void ice_init_wakeup(struct ice_pf *pf)
device_set_wakeup_enable(ice_pf_to_dev(pf), false);
}
-static int ice_init_link(struct ice_pf *pf)
+static void ice_init_link(struct ice_pf *pf)
{
struct device *dev = ice_pf_to_dev(pf);
int err;
err = ice_init_link_events(pf->hw.port_info);
- if (err) {
+ if (err)
dev_err(dev, "ice_init_link_events failed: %d\n", err);
- return err;
- }
/* not a fatal error if this fails */
err = ice_init_nvm_phy_type(pf->hw.port_info);
@@ -4838,8 +4836,6 @@ static int ice_init_link(struct ice_pf *pf)
} else {
set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
}
-
- return err;
}
static int ice_init_pf_sw(struct ice_pf *pf)
@@ -4982,13 +4978,11 @@ static int ice_init(struct ice_pf *pf)
ice_init_wakeup(pf);
- err = ice_init_link(pf);
- if (err)
- goto err_init_link;
+ ice_init_link(pf);
err = ice_send_version(pf);
if (err)
- goto err_init_link;
+ goto err_deinit_pf_sw;
ice_verify_cacheline_size(pf);
@@ -5007,7 +5001,7 @@ static int ice_init(struct ice_pf *pf)
return 0;
-err_init_link:
+err_deinit_pf_sw:
ice_deinit_pf_sw(pf);
err_init_pf_sw:
ice_dealloc_vsis(pf);
--
2.47.1
^ permalink raw reply related
* [PATCH net 5/8] ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
From: Tony Nguyen @ 2026-06-22 22:00 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: ZhaoJinming, anthony.l.nguyen, arkadiusz.kubalewski,
grzegorz.nitka, jiri, vadim.fedorenko, Aleksandr Loktionov,
Rinitha S
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: ZhaoJinming <zhaojinming@uniontech.com>
ice_dpll_deinit_info() calls kfree() on several pf->dplls fields
(inputs, outputs, eec.input_prio, pps.input_prio) but does not set
the pointers to NULL afterward. This leaves dangling pointers in the
pf->dplls structure.
While not currently exploitable through existing code paths, this is
unsafe because:
1. If ice_dpll_init_info() is called again after a deinit (e.g. during
driver recovery), and a subsequent allocation within init fails, the
error path will jump to deinit_info and call ice_dpll_deinit_info()
again. Since some pointers still hold the old freed addresses, this
would result in a double-free.
2. Any future code that checks these pointers before use or after free
would be unprotected against use-after-free.
Follow the common kernel convention of setting pointers to NULL after
kfree() so that:
- kfree(NULL) is a safe no-op, preventing double-free
- NULL checks on these pointers become meaningful
This is a preparatory fix for a subsequent patch that routes additional
error paths in ice_dpll_init_info() to the deinit_info label.
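
As a rough illustration of why the convention matters, here is a
user-space sketch with hypothetical names (demo_dplls, demo_setup(),
demo_release()); free() plays the role of kfree(). Because each pointer
is cleared right after it is freed, calling the release function twice
is harmless, whereas without the NULL assignments the second call would
be a double-free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pf->dplls fields named above. */
struct demo_dplls {
	int *inputs;
	int *outputs;
};

/* Allocate both arrays; returns 0 on success, -1 on failure. */
static int demo_setup(struct demo_dplls *d, size_t n)
{
	assert(n > 0);
	d->inputs = calloc(n, sizeof(*d->inputs));
	d->outputs = calloc(n, sizeof(*d->outputs));
	return (d->inputs && d->outputs) ? 0 : -1;
}

/* Idempotent release: a repeated call only ever sees NULL pointers. */
static void demo_release(struct demo_dplls *d)
{
	free(d->inputs);
	d->inputs = NULL;
	free(d->outputs);
	d->outputs = NULL;
}
```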
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_dpll.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 462c69cc11e1..3876ee7255ac 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -4645,9 +4645,13 @@ ice_dpll_init_pins_info(struct ice_pf *pf, enum ice_dpll_pin_type pin_type)
static void ice_dpll_deinit_info(struct ice_pf *pf)
{
kfree(pf->dplls.inputs);
+ pf->dplls.inputs = NULL;
kfree(pf->dplls.outputs);
+ pf->dplls.outputs = NULL;
kfree(pf->dplls.eec.input_prio);
+ pf->dplls.eec.input_prio = NULL;
kfree(pf->dplls.pps.input_prio);
+ pf->dplls.pps.input_prio = NULL;
}
/**
--
2.47.1
^ permalink raw reply related
* [PATCH net 2/8] ice: fix AQ error code comparison in ice_set_pauseparam()
From: Tony Nguyen @ 2026-06-22 22:00 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Lukasz Czapnik, anthony.l.nguyen, stephen, Aleksandr Loktionov,
Simon Horman, Rinitha S
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Lukasz Czapnik <lukasz.czapnik@intel.com>
Fix unreachable code: the conditionals in ice_set_pauseparam() used
the bitwise-AND operator suggesting aq_failures is a bitmap, but it
is actually an enum, making the third condition logically unreachable.
Replace the if-else ladder with a switch statement. Also move the
aq_failures initialization to the variable declaration and remove the
redundant zeroing from ice_set_fc().
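
The unreachable branch can be reproduced in isolation. The sketch below
uses a hypothetical enum (demo_fc_fail) whose values are assumed to be
sequential starting from 0, which is what makes the third test
unreachable; it is not copied from the driver headers.

```c
/* Hypothetical enum with sequential values 0, 1, 2, 3 (assumption). */
enum demo_fc_fail {
	DEMO_FAIL_NONE = 0,
	DEMO_FAIL_GET,		/* 1 */
	DEMO_FAIL_SET,		/* 2 */
	DEMO_FAIL_UPDATE,	/* 3 == GET | SET when read as a bitmap */
};

/* Old style: treats the enum as a bitmap. Returns the branch taken. */
static int demo_classify_bitwise(unsigned char failure)
{
	if (failure & DEMO_FAIL_GET)
		return DEMO_FAIL_GET;
	else if (failure & DEMO_FAIL_SET)
		return DEMO_FAIL_SET;
	else if (failure & DEMO_FAIL_UPDATE)
		return DEMO_FAIL_UPDATE;	/* never reached */
	return DEMO_FAIL_NONE;
}

/* New style: compares the value, so every case is reachable. */
static int demo_classify_switch(unsigned char failure)
{
	switch (failure) {
	case DEMO_FAIL_GET:
		return DEMO_FAIL_GET;
	case DEMO_FAIL_SET:
		return DEMO_FAIL_SET;
	case DEMO_FAIL_UPDATE:
		return DEMO_FAIL_UPDATE;
	}
	return DEMO_FAIL_NONE;
}
```

With these values, an UPDATE failure (3) has bit 0 set and is therefore
misreported by the bitwise version as a GET failure, while the switch
version reports it correctly.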
Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_common.c | 1 -
drivers/net/ethernet/intel/ice/ice_ethtool.c | 12 ++++++++----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 31e0de9e7f60..ef1ce106f81b 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -3882,7 +3882,6 @@ ice_set_fc(struct ice_port_info *pi, u8 *aq_failures, bool ena_auto_link_update)
if (!pi || !aq_failures)
return -EINVAL;
- *aq_failures = 0;
hw = pi->hw;
pcaps = kzalloc_obj(*pcaps);
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 236d293aba98..49371b065845 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3508,7 +3508,7 @@ ice_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
struct ice_vsi *vsi = np->vsi;
struct ice_hw *hw = &pf->hw;
struct ice_port_info *pi;
- u8 aq_failures;
+ u8 aq_failures = 0;
bool link_up;
u32 is_an;
int err;
@@ -3579,18 +3579,22 @@ ice_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
/* Set the FC mode and only restart AN if link is up */
err = ice_set_fc(pi, &aq_failures, link_up);
- if (aq_failures & ICE_SET_FC_AQ_FAIL_GET) {
+ switch (aq_failures) {
+ case ICE_SET_FC_AQ_FAIL_GET:
netdev_info(netdev, "Set fc failed on the get_phy_capabilities call with err %d aq_err %s\n",
err, libie_aq_str(hw->adminq.sq_last_status));
err = -EAGAIN;
- } else if (aq_failures & ICE_SET_FC_AQ_FAIL_SET) {
+ break;
+ case ICE_SET_FC_AQ_FAIL_SET:
netdev_info(netdev, "Set fc failed on the set_phy_config call with err %d aq_err %s\n",
err, libie_aq_str(hw->adminq.sq_last_status));
err = -EAGAIN;
- } else if (aq_failures & ICE_SET_FC_AQ_FAIL_UPDATE) {
+ break;
+ case ICE_SET_FC_AQ_FAIL_UPDATE:
netdev_info(netdev, "Set fc failed on the get_link_info call with err %d aq_err %s\n",
err, libie_aq_str(hw->adminq.sq_last_status));
err = -EAGAIN;
+ break;
}
return err;
--
2.47.1
^ permalink raw reply related
* [PATCH net 1/8] ice: fix FDIR CTRL VSI resource leak in ice_reset_all_vfs()
From: Tony Nguyen @ 2026-06-22 22:00 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
Cc: Dawid Osuchowski, anthony.l.nguyen, Aleksandr Loktionov,
Simon Horman, Rafal Romanowski
In-Reply-To: <20260622220059.2471844-1-anthony.l.nguyen@intel.com>
From: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Resetting all VFs causes resource leak on VFs with FDIR filters
enabled as CTRL VSIs are only invalidated and not freed. Fix by using
ice_vf_ctrl_vsi_release() instead of ice_vf_ctrl_invalidate_vsi() which
aligns behavior with the ice_reset_vf() function.
Reproduction:
echo 1 > /sys/class/net/$pf/device/sriov_numvfs
ethtool -N $vf flow-type ether proto 0x9000 action 0
echo 1 > /sys/class/net/$pf/device/reset
Fixes: da62c5ff9dcd ("ice: Add support for per VF ctrl VSI enabling")
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index b1f46707dcc0..27e4acb1620f 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -801,7 +801,7 @@ void ice_reset_all_vfs(struct ice_pf *pf)
* setup only when VF creates its first FDIR rule.
*/
if (vf->ctrl_vsi_idx != ICE_NO_VSI)
- ice_vf_ctrl_invalidate_vsi(vf);
+ ice_vf_ctrl_vsi_release(vf);
ice_vf_pre_vsi_rebuild(vf);
if (ice_vf_rebuild_vsi(vf)) {
--
2.47.1
^ permalink raw reply related
* [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-06-22 (ice, i40e, e1000e)
From: Tony Nguyen @ 2026-06-22 22:00 UTC (permalink / raw)
To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev; +Cc: Tony Nguyen
For ice:
Dawid changes call to release control VSI during reset to prevent
leaking it.
Lukasz fixes the flow control error check to compare the value rather
than treat it as a bitmap.
Paul makes link related errors non-fatal to probe to allow for recovery
in certain NVM update situations.
Marcin moves netif_keep_dst() to only be called once when entering
switchdev mode.
ZhaoJinming adds a cleanup path for ice_dpll_init_info() to prevent
memory leaks on error path.
For i40e:
Mohamed Khalfella corrects the argument used inside the macro body to
match the one passed to the macro.
For e1000e:
Dima resolves power state issues by adjusting value of PLL clock gate
and re-enabling K1; a quirk table is added to keep it off for known bad
systems.
The following are changes since commit 56abdaebbf0da304b860bed1f2b5a85f5a6a16a0:
Merge tag 'nf-26-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE
Dawid Osuchowski (1):
ice: fix FDIR CTRL VSI resource leak in ice_reset_all_vfs()
Dima Ruinskiy (1):
e1000e: Reconfigure PLL clock gate timeout and re-enable K1 on Meteor
Lake
Lukasz Czapnik (1):
ice: fix AQ error code comparison in ice_set_pauseparam()
Marcin Szycik (1):
ice: call netif_keep_dst() once when entering switchdev mode
Mohamed Khalfella (1):
i40e: Fix i40e_debug() to use struct i40e_hw argument
Paul Greenwalt (1):
ice: fix ice_init_link() error return preventing probe
ZhaoJinming (2):
ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
ice: dpll: fix memory leak in ice_dpll_init_info error paths
drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 +++
drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++++++++++-
drivers/net/ethernet/intel/i40e/i40e_debug.h | 2 +-
drivers/net/ethernet/intel/ice/ice_common.c | 1 -
drivers/net/ethernet/intel/ice/ice_dpll.c | 20 ++++++++++++++------
drivers/net/ethernet/intel/ice/ice_eswitch.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_ethtool.c | 12 ++++++++----
drivers/net/ethernet/intel/ice/ice_main.c | 16 +++++-----------
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 2 +-
9 files changed, 48 insertions(+), 27 deletions(-)
--
2.47.1
^ permalink raw reply
* Re: building ynl afaics requires updating the UAPI headers first
From: Jakub Kicinski @ 2026-06-22 21:56 UTC (permalink / raw)
To: Thorsten Leemhuis; +Cc: Donald Hunter, netdev, Riana Tauro
In-Reply-To: <f88cfe04-c817-4383-866b-530c5bc5bd95@leemhuis.info>
On Mon, 22 Jun 2026 18:33:29 +0200 Thorsten Leemhuis wrote:
> On 6/22/26 18:05, Jakub Kicinski wrote:
> > On Fri, 19 Jun 2026 09:28:47 +0200 Thorsten Leemhuis wrote:
> >> On 6/19/26 02:06, Jakub Kicinski wrote:
> >>> Can't repro for some reason, but we probably need something like
> >>> commit 46e9b0224475abc to add the explicit include rule.
> >>
> >> Thx for the pointer. So I guess you mean something like the below,
> >> which did the trick for me. Will submit this properly, unless
> >> someone points out something stupid in it.
> [...]
>
> No, because the funny thing is: now I fail to reproduce it myself. And I
> don't know why, as 24h earlier when you had written "Can't repro for
> some reason" I had once more checked that I could trigger this by
> downgrading Fedora's kernel-headers package to a version from some weeks
> ago. Not sure what changed since then.
>
> Want me to send it nevertheless?
Yes, let's get it in place, it can't hurt AFAIU.
^ permalink raw reply
* [RFC PATCH v3 1/4] timekeeping: Apply extrapolated ntp_error to clock snapshots
From: David Woodhouse @ 2026-06-22 20:36 UTC (permalink / raw)
To: Rodolfo Giometti, David Woodhouse, Richard Cochran, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
John Stultz, Thomas Gleixner, Stephen Boyd, Miroslav Lichvar,
linux-kernel, netdev, Alexander Gordeev
Cc: David Woodhouse
In-Reply-To: <20260622211822.1056437-1-dwmw2@infradead.org>
From: David Woodhouse <dwmw@amazon.co.uk>
The time reported in ::systime of a system_time_snapshot is known to be
slightly inaccurate because of the way that the reported realtime clock
sawtooths around the *intended* time series, limited by the integer mult
value used to calculate the inter-tick times, and designed to ensure
smoothness and monotonicity for its consumers.
It is particularly inaccurate in a tickless kernel, where ntp_err_mult
is not adjusted on each tick, allowing the reported clock to diverge
from the intended time for a large number of ticks before re-converging.
This appears to be the reason why CONFIG_NTP_PPS is not enabled on
tickless kernels — because at that scale of precision, the realtime
snapshot at the time of the pulse bears little relation to the time the
kernel *actually* believes it to be, thus introducing random errors into
the PPS phase correction.
It would be better for callers of get_device_system_crosststamp() and
ktime_get_snapshot_id() to receive the *accurate* time, not the
sanitized version provided to gettimeofday().
Compute the deviation in snapshot_ntp_error() and add it to the returned
::systime so the snapshot lands on the ideal line. It sums four terms in
ns << NTP_SCALE_SHIFT before converting to signed ns:
- tk->ntp_error, the deviation as of the last update;
- (cycle_delta * ntp_err_frac), the fractional-mult drift accrued
since then (cycle_delta is at most a tick on a tickful kernel, but
many ticks' worth under NO_HZ);
- (cycle_delta * ntp_err_mult), subtracting the applied +1 mult dither
over the same span;
- the sub-nanosecond fraction dropped when the read was truncated to
whole ns (low shift bits, exact despite the multiply overflowing).
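
The second term in the list above can be illustrated with a small
user-space sketch. It is a simplified model, not the kernel code:
demo_rate, demo_split_rate() and demo_extrapolate() are hypothetical
names, unsigned __int128 (a GCC/Clang extension) stands in for
mul_u64_u64_shr(), and the interval is assumed to fit in 32 bits so the
shifted remainder cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Integer mult plus the fraction it truncates, scaled by 2^32. */
struct demo_rate {
	uint32_t mult;
	uint64_t frac;
};

/* Split dividend / interval into an integer part and a 2^32 fraction. */
static struct demo_rate demo_split_rate(uint64_t dividend, uint64_t interval)
{
	struct demo_rate r;

	/* remainder < interval <= 2^32 - 1, so remainder << 32 fits in u64 */
	assert(interval > 0 && interval <= UINT32_MAX);

	r.mult = (uint32_t)(dividend / interval);
	r.frac = ((dividend - (uint64_t)r.mult * interval) << 32) / interval;
	return r;
}

/* Drift accrued over cycle_delta cycles: (cycle_delta * frac) >> 32. */
static uint64_t demo_extrapolate(uint64_t cycle_delta, uint64_t frac)
{
	return (uint64_t)(((unsigned __int128)cycle_delta * frac) >> 32);
}
```

For example, with a dividend of 10 and an interval of 4 cycles, the
integer mult is 2 and the truncated half unit per cycle is stored as
2^31. Extrapolating over 4 cycles gives a drift of 2, which is exactly
the remainder the integer mult dropped over one interval, and it scales
linearly for longer spans.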
The helper uses the timekeeper selected for the requested clock id, so
all NTP-disciplined clocks are corrected, including the AUX clocks (each
has its own NTP instance); only CLOCK_MONOTONIC_RAW is undisciplined and
gets no correction. The residual is then a single clocksource cycle, the
same bound as a tickful kernel.
Note that this *unconditionally* changes the ::systime returned by all
snapshot and cross timestamp consumers (PTP SYS_OFFSET_PRECISE/EXTENDED,
etc.): it is now the ideal NTP-disciplined time rather than the raw
accumulated clock.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Assisted-by: Kiro:claude-opus-4.8
---
include/linux/timekeeper_internal.h | 6 +++
kernel/time/timekeeping.c | 71 +++++++++++++++++++++++++++--
2 files changed, 73 insertions(+), 4 deletions(-)
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index fb37a736ec1c..bc05d52a5f96 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -97,6 +97,11 @@ struct tk_read_base {
* @ntp_error_shift: Shift conversion between clock shifted nano seconds and
* ntp shifted nano seconds.
* @ntp_err_mult: Multiplication factor for scaled math conversion
+ * @ntp_err_frac: Fractional part of the per-cycle NTP-ideal mult that the
+ * integer @mult truncates, as a fraction of 2^32 in
+ * clock-shifted nanoseconds per cycle. Used to
+ * extrapolate @ntp_error to an arbitrary cycle count in
+ * the lockless snapshot readers (ktime_get_snapshot_id).
* @cs_tick_adj: Per-second adjustment handed to NTP via ntp_clear()
* accounting for the difference between the nominal
* NTP interval and the real time taken by the
@@ -187,6 +192,7 @@ struct timekeeper {
s64 ntp_error;
u32 ntp_error_shift;
u32 ntp_err_mult;
+ u64 ntp_err_frac;
s64 cs_tick_adj;
u32 skip_second_overflow;
s64 skew_delta;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index eb94ec99d503..1c989e5ebe84 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -422,6 +422,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
tk->tkr_mono.mult = clock->mult;
tk->tkr_raw.mult = clock->mult;
tk->ntp_err_mult = 0;
+ tk->ntp_err_frac = 0;
tk->skip_second_overflow = 0;
tk->skew_delta = 0;
@@ -1226,6 +1227,51 @@ static inline u64 tk_clock_read_snapshot(const struct tk_read_base *tkr,
return clock->read(clock);
}
+/*
+ * snapshot_ntp_error - record how far a snapshot's ::systime is from the
+ * ideal NTP-disciplined time at @now, in signed nanoseconds, so a caller
+ * can land exactly on the ideal line by adding it to ::systime.
+ *
+ * The value is summed in ns << NTP_SCALE_SHIFT from four parts:
+ *
+ * - tk->ntp_error, the deviation accumulated as of the last timekeeping
+ * update (tkr_mono.cycle_last);
+ * - (cycle_delta * ntp_err_frac), the fractional-mult drift accrued over
+ * the cycles read since then -- at most a tick on a tickful kernel, but
+ * potentially many ticks' worth under NO_HZ;
+ * - (cycle_delta * ntp_err_mult), subtracting the applied +1 mult dither
+ * over the same span;
+ * - the sub-nanosecond fraction that ::systime dropped when the read was
+ * truncated to whole ns (the low @shift bits, exact even though the
+ * multiply overflows).
+ *
+ * CLOCK_MONOTONIC_RAW is not NTP-disciplined and carries no error. Every
+ * other clock id uses its own timekeeper @tk -- including the AUX clocks,
+ * which each have their own NTP instance.
+ */
+static s64 snapshot_ntp_error(const struct timekeeper *tk, clockid_t clock_id,
+ u64 now)
+{
+ u64 cycle_delta;
+ u32 nes;
+ s64 tmp, err;
+
+ if (clock_id == CLOCK_MONOTONIC_RAW)
+ return 0;
+
+ cycle_delta = (now - tk->tkr_mono.cycle_last) & tk->tkr_mono.mask;
+ nes = tk->ntp_error_shift;
+
+ err = tk->ntp_error;
+ err += ((s64)mul_u64_u64_shr(cycle_delta, tk->ntp_err_frac, 32) -
+ (s64)(cycle_delta * tk->ntp_err_mult)) << nes;
+
+ tmp = (s64)(cycle_delta * tk->tkr_mono.mult + tk->tkr_mono.xtime_nsec);
+ tmp &= (1ULL << tk->tkr_mono.shift) - 1;
+ err += tmp << nes;
+
+ return (err + (1LL << (NTP_SCALE_SHIFT - 1))) >> NTP_SCALE_SHIFT;
+}
/**
* ktime_get_snapshot_id - Simultaneously snapshot a given clock ID with
@@ -1238,6 +1284,7 @@ void ktime_get_snapshot_id(clockid_t clock_id, struct system_time_snapshot *syst
{
ktime_t base_raw, base_sys, offs_sys, *offs, offs_zero = 0;
u64 nsec_raw, nsec_sys, now;
+ s64 ntp_error;
struct timekeeper *tk;
struct tk_data *tkd;
unsigned int seq;
@@ -1300,10 +1347,12 @@ void ktime_get_snapshot_id(clockid_t clock_id, struct system_time_snapshot *syst
nsec_sys = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+
+ ntp_error = snapshot_ntp_error(tk, clock_id, now);
} while (read_seqcount_retry(&tkd->seq, seq));
systime_snapshot->cycles = now;
- systime_snapshot->systime = ktime_add_ns(base_sys, offs_sys + nsec_sys);
+ systime_snapshot->systime = ktime_add_ns(base_sys, offs_sys + nsec_sys) + ntp_error;
systime_snapshot->monoraw = ktime_add_ns(base_raw, nsec_raw);
/*
@@ -1552,6 +1601,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
unsigned int seq, clock_was_set_seq = 0;
ktime_t base_sys, base_raw, *offs;
u64 nsec_sys, nsec_raw;
+ s64 ntp_error;
u8 cs_was_changed_seq;
bool do_interp;
struct timekeeper *tk;
@@ -1617,9 +1667,10 @@ int get_device_system_crosststamp(int (*get_time_fn)
nsec_sys = timekeeping_cycles_to_ns(&tk->tkr_mono, cycles);
nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, cycles);
+ ntp_error = snapshot_ntp_error(tk, xtstamp->clock_id, cycles);
} while (read_seqcount_retry(&tkd->seq, seq));
- xtstamp->sys_systime = ktime_add_ns(base_sys, nsec_sys);
+ xtstamp->sys_systime = ktime_add_ns(base_sys, nsec_sys) + ntp_error;
xtstamp->sys_monoraw = ktime_add_ns(base_raw, nsec_raw);
/*
@@ -2447,6 +2498,7 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
{
u64 ntp_tl = ntp_tick_length(tk->id);
s64 skew = ntp_get_skew_delta(tk->id);
+ u64 dividend;
u32 mult;
/*
@@ -2465,8 +2517,19 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
* scale it back up to the full per-tick rate for the mult bias.
*/
skew *= NTP_INTERVAL_FREQ;
- mult = div64_u64((tk->ntp_tick + skew) >> tk->ntp_error_shift,
- tk->cycle_interval);
+ dividend = (tk->ntp_tick + skew) >> tk->ntp_error_shift;
+ mult = div64_u64(dividend, tk->cycle_interval);
+ /*
+ * Stash the fractional part of the per-cycle ideal mult that
+ * the integer @mult discards, scaled by 2^32, in clock-shifted
+ * ns per cycle. The lockless snapshot readers use it to
+ * extrapolate @ntp_error forward over the cycles accumulated
+ * since the last tick (which on a NO_HZ kernel may be many
+ * ticks' worth).
+ */
+ tk->ntp_err_frac = div64_u64((dividend - (u64)mult *
+ tk->cycle_interval) << 32,
+ tk->cycle_interval);
}
/*
--
2.54.0
^ permalink raw reply related
* [RFC PATCH v3 0/4] Add ntp_error to clock snapshot, enable NTP_PPS on tickless kernel
From: David Woodhouse @ 2026-06-22 20:36 UTC (permalink / raw)
To: Rodolfo Giometti, David Woodhouse, Richard Cochran, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
John Stultz, Thomas Gleixner, Stephen Boyd, Miroslav Lichvar,
linux-kernel, netdev, Alexander Gordeev
The real time reported by the kernel's timekeeping sawtooths around the
'ideal' time line that the kernel would like to report. Limited by the
integer arithmetic, the kernel varies the 'mult' factor by ±1 each tick
to achieve the correct rate on average over time. The reported time is
further sanitized to ensure continuity when mult changes, even part way
through a tick. The delta from the ideal to what is currently reported
is stored in tk->ntp_error. The sawtooth effect is more pronounced on
tickless kernels, as the 'mult' value does not get adjusted each tick
but only less frequently.
Both ktime_get_snapshot_id() and get_device_system_crosststamp() report
the same sanitized time, but *every* user of those functions (PTP, PPS,
KVM enlightenments) would be better served by the true ideal time.
Add a snapshot_ntp_error() helper and use it from both of those functions
to adjust the system time they report and return the more accurate
result.
With this change, CONFIG_NTP_PPS works correctly on a tickless kernel;
enable it. And change the non-CONFIG_NTP_PPS code path in pps_get_ts()
to use ktime_get_snapshot_id() too, for the more accurate data.
Tested with a hack to make vmclock simulate a 1PPS signal, although there
are now better options for that. But it's enough to show that even the
tickless kernel converges to ±1ns of the PPS signal and remains there
(tested with a periodic PTP_SYS_OFFSET_EXTENDED to compare with the
vmclock reference).
[ 0.653414] Run /init as init process
PPS: coarse-set CLOCK_REALTIME from vmclock
[ 1.655824] pps pps0: bound kernel consumer: edge=0x1
PPS: enabled, bound to hardpps, STA_PPSTIME|STA_PPSFREQ set
PRECISE: dev-sys(adj)=+5609ns OK
EXT[ 0] diff=+5608ns
EXT[ 1] diff=+5262ns
[ 2.917076] hardpps: PPSJITTER: jitter=5173, limit=0
EXT[ 2] diff=+4914ns
EXT[ 3] diff=+1197ns
EXT[ 4] diff=-297ns
EXT[ 5] diff=-103ns
EXT[ 6] diff=+0ns
EXT[ 7] diff=+0ns
EXT[ 8] diff=-1ns
EXT[ 9] diff=+0ns
EXT[10] diff=+0ns
David Woodhouse (4):
timekeeping: Apply extrapolated ntp_error to clock snapshots
pps: Drop the !NO_HZ_COMMON dependency from NTP_PPS
pps: Always use ktime_get_snapshot_id() for pps_get_ts()
[DO NOT MERGE] ptp: ptp_vmclock: Add simulated 1PPS support
drivers/pps/Kconfig | 3 -
drivers/ptp/ptp_vmclock.c | 196 ++++++++++++++++++++++++++++++++++--
include/linux/pps_kernel.h | 4 +-
include/linux/timekeeper_internal.h | 6 ++
kernel/time/timekeeping.c | 71 ++++++++++++-
5 files changed, 259 insertions(+), 21 deletions(-)
^ permalink raw reply