* Re: [PATCH net-next v3 09/12] net: airoha: Support multiple net_devices for a single FE GDM port
From: Lorenzo Bianconi @ 2026-04-10 10:19 UTC (permalink / raw)
To: Jakub Kicinski
Cc: andrew+netdev, davem, edumazet, pabeni, robh, krzk+dt, conor+dt,
ansuelsmth, benjamin.larsson, linux-arm-kernel, linux-mediatek,
netdev, devicetree
In-Reply-To: <20260410025724.1810007-1-kuba@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 8640 bytes --]
On Apr 09, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email says "For your consideration."
> ---
> net: airoha: Support multiple net_devices for a single FE GDM port
>
> This commit introduces support for mapping multiple net_device interfaces
> to a single Frame Engine (FE) GDM port via an external hardware arbiter.
> It updates the rx/tx paths, port allocation, and registration logic to
> handle the new net_device array.
>
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> [ ... ]
> > @@ -615,9 +617,8 @@ static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
> > u32 hash, reason, msg1 = le32_to_cpu(desc->msg1);
> > struct page *page = virt_to_head_page(e->buf);
> > u32 desc_ctrl = le32_to_cpu(desc->ctrl);
> > - struct airoha_gdm_port *port;
> > - struct net_device *netdev;
> > - int data_len, len, p;
> > + struct airoha_gdm_dev *dev;
> > + int data_len, len;
> >
> > if (!(desc_ctrl & QDMA_DESC_DONE_MASK))
> > break;
>
> While looking at this code, I noticed an issue that predates this patch.
> Can the lack of a memory barrier after checking the descriptor ownership bit
> cause stale reads?
>
> The driver checks QDMA_DESC_DONE_MASK to verify that the hardware has finished
> writing the descriptor. However, it reads desc->msg1 before this check and
> without a dma_rmb():
>
> u32 hash, reason, msg1 = le32_to_cpu(desc->msg1);
> struct page *page = virt_to_head_page(e->buf);
> u32 desc_ctrl = le32_to_cpu(desc->ctrl);
>
> if (!(desc_ctrl & QDMA_DESC_DONE_MASK))
> break;
>
> Since weakly ordered architectures can reorder memory reads, could the CPU
> prefetch msg1 before the DMA engine writes the new descriptor values, leading
> to routing the packet to the wrong gdm_dev?
This issue has been already fixed with the following commit:
commit 4ae0604a0673e11e2075b178387151fcad5111b5
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Apr 7 08:48:04 2026 +0200
net: airoha: Add dma_rmb() and READ_ONCE() in airoha_qdma_rx_process()
Add missing dma_rmb() in airoha_qdma_rx_process routine to make sure the
DMA read operations are completed when the NIC reports the processing on
the current descriptor is done. Moreover, add missing READ_ONCE() in
airoha_qdma_rx_process() for DMA descriptor control fields in order to
avoid any compiler reordering.
>
> [ ... ]
> > @@ -1764,9 +1765,7 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_dev *dev)
> > airoha_fe_clear(eth, REG_FE_VIP_PORT_EN, BIT(AIROHA_GDM2_IDX));
> > airoha_fe_clear(eth, REG_FE_IFC_PORT_EN, BIT(AIROHA_GDM2_IDX));
> >
> > - /* XXX: handle XSI_USB_PORT and XSI_PCE1_PORT */
> > - nbq = port->id == AIROHA_GDM3_IDX && airoha_is_7581(eth) ? 4 : 0;
> > - src_port = eth->soc->ops.get_src_port_id(port, nbq);
> > + src_port = eth->soc->ops.get_src_port_id(port, dev->nbq);
> > if (src_port < 0)
> > return src_port;
>
> Can configuring the loopback for one device break connectivity for other
> devices sharing the same port?
>
> When multiple net_devices (like PCIE0 and PCIE1 on GDM3) open and call
> airoha_set_gdm2_loopback(), they modify global hardware registers such as
> REG_FE_WAN_PORT.
>
> Since WAN0_MASK is unconditionally updated to track the current device's
> src_port:
>
> airoha_fe_rmw(eth, REG_FE_WAN_PORT,
> WAN1_EN_MASK | WAN1_MASK | WAN0_MASK,
> FIELD_PREP(WAN0_MASK, src_port));
>
> won't the last configured device clobber the hardware state for the others?
This issue is fixed with the following patch in the series:
commit 008995bd8f39c5fdf37a933fa3c962e7ef52e8c6
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Fri Apr 3 12:07:27 2026 +0200
net: airoha: Introduce WAN device flag
Introduce WAN flag to specify if a given device is used to transmit/receive
WAN or LAN traffic. Current codebase supports specifying LAN/WAN device
configuration in ndo_init() callback during device bootstrap.
Please note it is possible to specify multiple LAN devices but just a
single WAN one.
>
> [ ... ]
> > @@ -1987,7 +1986,8 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
> > }
> >
> > fport = airoha_get_fe_port(dev);
> > - msg1 = FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, fport) |
> > + msg1 = FIELD_PREP(QDMA_ETH_TXMSG_NBOQ_MASK, dev->nbq) |
> > + FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, fport) |
> > FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f);
> >
> > q = &qdma->q_tx[qid];
>
> Does sharing the hardware QDMA TX queues among multiple net_device instances
> introduce a permanent transmit stall?
>
> In airoha_dev_xmit(), if a shared hardware queue is full, the driver stops the
> netdev_queue of the specific netdev making the call:
>
> if (q->queued + nr_frags >= q->ndesc) {
> /* not enough space in the queue */
> netif_tx_stop_queue(txq);
> spin_unlock_bh(&q->lock);
> return NETDEV_TX_BUSY;
> }
>
> However, the TX completion handler (airoha_tx_irq_process()) only wakes the
> queue of the netdev attached to the completed packet. If an interface's TX
> queue is stopped but it has no pending packets in the hardware queue (because
> another interface filled it), will it ever get woken up?
This issue is orthogonal to this series and it will be fixed with a dedicated
patch.
>
> [ ... ]
> > @@ -2955,11 +2958,25 @@ static int airoha_alloc_gdm_device(struct airoha_eth *eth,
> > netdev->dev_addr);
> > }
> >
> > + /* Allowed nbq for EN7581 on GDM3 port are 4 and 5 for PCIE0
> > + * and PCIE1 respectively.
> > + */
> > + index = nbq;
> > + if (index && airoha_is_7581(eth) && port->id == AIROHA_GDM3_IDX)
> > + index -= 4;
> > +
> > + if (index >= ARRAY_SIZE(port->devs) || port->devs[index]) {
> > + dev_err(eth->dev, "invalid nbq id: %d\n", nbq);
> > + return -EINVAL;
> > + }
> > +
> > + netdev->dev.of_node = of_node_get(np);
> > dev = netdev_priv(netdev);
>
> Does this code leak the of_node reference?
>
> The node reference is explicitly incremented via of_node_get(np). The networking
> core does not automatically drop explicit of_node references on free_netdev().
> Without a matching of_node_put() in the error paths or the driver's teardown
> function (airoha_remove()), will this reference permanently leak when the device
> is destroyed?
ack, I will fix it in v4.
>
> [ ... ]
> > @@ -3003,7 +3021,43 @@ static int airoha_alloc_gdm_port(struct airoha_eth *eth,
> > if (err)
> > return err;
> >
> > - return airoha_alloc_gdm_device(eth, port, np);
> > + for_each_child_of_node(np, node) {
> > + /* Multiple external serdes connected to the FE GDM port via an
> > + * external arbiter.
> > + */
> > + const __be32 *nbq_ptr;
> > + int nbq;
> > +
> > + if (!of_device_is_compatible(node, "airoha,eth-port"))
> > + continue;
> > +
> > + d++;
> > + if (!of_device_is_available(node))
> > + continue;
> > +
> > + nbq_ptr = of_get_property(node, "reg", NULL);
> > + if (!nbq_ptr) {
> > + dev_err(eth->dev, "missing nbq id\n");
> > + of_node_put(node);
> > + return -EINVAL;
> > + }
> > +
> > + /* Verify the provided nbq parameter is valid */
> > + nbq = be32_to_cpup(nbq_ptr);
> > + err = eth->soc->ops.get_src_port_id(port, nbq);
> > + if (err < 0) {
> > + of_node_put(node);
> > + return err;
> > + }
> > +
> > + err = airoha_alloc_gdm_device(eth, port, nbq, node);
> > + if (err) {
> > + of_node_put(node);
> > + return err;
> > + }
> > + }
> > +
> > + return !d ? airoha_alloc_gdm_device(eth, port, 0, np) : 0;
> > }
>
> Does passing 0 for nbq in the fallback path break older device trees on the
> EN7581 SoC?
>
> Later, when the interface is brought up, airoha_set_gdm2_loopback() calls
> airoha_en7581_get_src_port_id(), which returns -EINVAL for GDM3 if nbq is
> not 4 or 5:
>
> switch (port->id) {
> case AIROHA_GDM3_IDX:
> /* 7581 SoC supports PCIe serdes on GDM3 port */
> if (nbq == 4)
> return HSGMII_LAN_7581_PCIE0_SRCPORT;
> if (nbq == 5)
> return HSGMII_LAN_7581_PCIE1_SRCPORT;
> break;
>
> If nbq is hardcoded to 0 for legacy DTs, won't this cause interface
> initialization to fail unconditionally for GDM3?
ack, I will fix it in v4.
Regards,
Lorenzo
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* [bug report] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path
From: Dan Carpenter @ 2026-04-10 10:16 UTC (permalink / raw)
To: Jiayuan Chen; +Cc: Simon Horman, netdev
Hello Jiayuan Chen,
Commit 81b84de32bb2 ("xfrm: fix ip_rt_bug race in icmp_route_lookup
reverse path") from Feb 6, 2026 (linux-next), leads to the following
Smatch static checker warning:
net/ipv4/icmp.c:587 icmp_route_lookup()
error: we previously assumed 'rt2' could be null (see line 576)
net/ipv4/icmp.c
491 static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
492 struct sk_buff *skb_in,
493 const struct iphdr *iph, __be32 saddr,
494 dscp_t dscp, u32 mark, int type,
495 int code, struct icmp_bxm *param)
496 {
497 struct net_device *route_lookup_dev;
498 struct dst_entry *dst, *dst2;
499 struct rtable *rt, *rt2;
500 struct flowi4 fl4_dec;
501 int err;
502
503 memset(fl4, 0, sizeof(*fl4));
504 fl4->daddr = (param->replyopts.opt.srr ?
505 param->replyopts.opt.faddr : iph->saddr);
506 fl4->saddr = saddr;
507 fl4->flowi4_mark = mark;
508 fl4->flowi4_uid = sock_net_uid(net, NULL);
509 fl4->flowi4_dscp = dscp;
510 fl4->flowi4_proto = IPPROTO_ICMP;
511 fl4->fl4_icmp_type = type;
512 fl4->fl4_icmp_code = code;
513 route_lookup_dev = icmp_get_route_lookup_dev(skb_in);
514 fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
515
516 security_skb_classify_flow(skb_in, flowi4_to_flowi_common(fl4));
517 rt = ip_route_output_key_hash(net, fl4, skb_in);
518 if (IS_ERR(rt))
519 return rt;
520
521 /* No need to clone since we're just using its address. */
522 rt2 = rt;
523
524 dst = xfrm_lookup(net, &rt->dst,
525 flowi4_to_flowi(fl4), NULL, 0);
526 rt = dst_rtable(dst);
527 if (!IS_ERR(dst)) {
528 if (rt != rt2)
529 return rt;
530 if (inet_addr_type_dev_table(net, route_lookup_dev,
531 fl4->daddr) == RTN_LOCAL)
532 return rt;
533 } else if (PTR_ERR(dst) == -EPERM) {
534 rt = NULL;
535 } else {
536 return rt;
537 }
538 err = xfrm_decode_session_reverse(net, skb_in, flowi4_to_flowi(&fl4_dec), AF_INET);
539 if (err)
540 goto relookup_failed;
541
542 if (inet_addr_type_dev_table(net, route_lookup_dev,
543 fl4_dec.saddr) == RTN_LOCAL) {
544 rt2 = __ip_route_output_key(net, &fl4_dec);
545 if (IS_ERR(rt2))
546 err = PTR_ERR(rt2);
547 } else {
548 struct flowi4 fl4_2 = {};
549 unsigned long orefdst;
550
551 fl4_2.daddr = fl4_dec.saddr;
552 rt2 = ip_route_output_key(net, &fl4_2);
553 if (IS_ERR(rt2)) {
554 err = PTR_ERR(rt2);
555 goto relookup_failed;
556 }
557 /* Ugh! */
558 orefdst = skb_dstref_steal(skb_in);
559 err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr,
560 dscp, rt2->dst.dev) ? -EINVAL : 0;
561
562 dst_release(&rt2->dst);
563 rt2 = skb_rtable(skb_in);
564 /* steal dst entry from skb_in, don't drop refcnt */
565 skb_dstref_steal(skb_in);
566 skb_dstref_restore(skb_in, orefdst);
567
568 /*
569 * At this point, fl4_dec.daddr should NOT be local (we
570 * checked fl4_dec.saddr above). However, a race condition
571 * may occur if the address is added to the interface
572 * concurrently. In that case, ip_route_input() returns a
573 * LOCAL route with dst.output=ip_rt_bug, which must not
574 * be used for output.
575 */
576 if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
^^^
Can rt2 really be NULL here?
577 net_warn_ratelimited("detected local route for %pI4 during ICMP sending, src %pI4\n",
578 &fl4_dec.daddr, &fl4_dec.saddr);
579 dst_release(&rt2->dst);
580 err = -EINVAL;
581 }
582 }
583
584 if (err)
585 goto relookup_failed;
586
--> 587 dst2 = xfrm_lookup(net, &rt2->dst, flowi4_to_flowi(&fl4_dec), NULL,
^^^^^^^^^
Because, if so, then we are screwed here.
588 XFRM_LOOKUP_ICMP);
589 rt2 = dst_rtable(dst2);
590 if (!IS_ERR(dst2)) {
591 dst_release(&rt->dst);
592 rt = rt2;
593 } else if (PTR_ERR(dst2) == -EPERM) {
594 if (rt)
595 dst_release(&rt->dst);
596 return rt2;
597 } else {
598 err = PTR_ERR(dst2);
599 goto relookup_failed;
600 }
601 return rt;
602
603 relookup_failed:
604 if (rt)
605 return rt;
606 return ERR_PTR(err);
607 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
^ permalink raw reply
* [bug report] ipv4: icmp: fix null-ptr-deref in icmp_build_probe()
From: Dan Carpenter @ 2026-04-10 10:16 UTC (permalink / raw)
To: Yiqi Sun, Fernando Fernandez Mancera; +Cc: Simon Horman, netdev
Hello Yiqi Sun,
Commit fde29fd93493 ("ipv4: icmp: fix null-ptr-deref in
icmp_build_probe()") from Apr 2, 2026 (linux-next), leads to the
following Smatch static checker warning:
net/ipv4/icmp.c:1351 icmp_build_probe()
warn: 'dev' is not an error pointer
net/ipv4/icmp.c
1341 #if IS_ENABLED(CONFIG_IPV6)
1342 case ICMP_AFI_IP6:
1343 if (iio->ident.addr.ctype3_hdr.addrlen != sizeof(struct in6_addr))
1344 goto send_mal_query;
1345 dev = ipv6_dev_find(net, &iio->ident.addr.ip_addr.ipv6_addr, dev);
1346
1347 /*
1348 * If IPv6 identifier lookup is unavailable, silently
1349 * discard the request instead of misreporting NO_IF.
1350 */
--> 1351 if (IS_ERR(dev))
1352 return false;
It looks like there were two patches that went in around the same
time. Commit fde29fd93493 ("ipv4: icmp: fix null-ptr-deref in
icmp_build_probe()") updated the checking for
ipv6_stub->ipv6_dev_find() but d98adfbdd5c0 ("ipv4: drop ipv6_stub usage
and use direct function calls") changed it to not return error pointers.
This IS_ERR() check can be removed.
1353
1354 dev_hold(dev);
1355 break;
1356 #endif
1357 default:
1358 goto send_mal_query;
1359 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
^ permalink raw reply
* [PATCH nf] netfilter: nf_tables: use RCU-safe list primitives for basechain hook list
From: Weiming Shi @ 2026-04-10 10:13 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Phil Sutter, Simon Horman, netfilter-devel, coreteam, netdev,
linux-kernel, Xiang Mei, Weiming Shi
NFT_MSG_GETCHAIN runs as an NFNL_CB_RCU callback, so chain dumps
traverse basechain->hook_list under rcu_read_lock() without holding
commit_mutex. Meanwhile, nft_delchain_hook() mutates that same live
hook_list with plain list_move() and list_splice(), and the commit/abort
paths splice hooks back with plain list_splice(). None of these are
RCU-safe list operations.
A concurrent GETCHAIN dump can observe partially updated list pointers,
follow them into stack-local or transaction-private list heads, and
crash when container_of() produces a bogus struct nft_hook pointer.
The PoC triggers this by racing GETCHAIN dumps against aborting DELCHAIN
hook updates, reachable from an unprivileged user namespace since all
capability checks use ns_capable() with CONFIG_NF_TABLES=y (default):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
RIP: 0010:strlen (lib/string.c:420 (discriminator 1))
Call Trace:
<TASK>
nf_tables_fill_chain_info (net/netfilter/nf_tables_api.c:1987 (discriminator 1) net/netfilter/nf_tables_api.c:1992 (discriminator 1) net/netfilter/nf_tables_api.c:2028 (discriminator 1) net/netfilter/nf_tables_api.c:2077 (discriminator 1))
nf_tables_dump_chains (net/netfilter/nf_tables_api.c:2173 (discriminator 1))
netlink_dump (net/netlink/af_netlink.c:2325 (discriminator 1))
__netlink_dump_start (net/netlink/af_netlink.c:2442)
nf_tables_getchain (net/netfilter/nf_tables_api.c:1314 net/netfilter/nf_tables_api.c:2212)
nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:290)
netlink_rcv_skb (net/netlink/af_netlink.c:2550)
nfnetlink_rcv (net/netfilter/nfnetlink.c:653)
netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344)
netlink_sendmsg (net/netlink/af_netlink.c:1894)
__sys_sendto (net/socket.c:727 net/socket.c:742 net/socket.c:2206)
__x64_sys_sendto (net/socket.c:2209)
</TASK>
Replace list_move() in nft_delchain_hook() with list_del_rcu() plus an
intermediate pointer array, followed by synchronize_rcu() before the
deleted hooks' list pointers are reused to link them into the
transaction's private list. In the error paths, put hooks back with
list_add_tail_rcu() which is safe for concurrent RCU readers (they
either continue to the original successor or see the list head and
terminate the walk).
Add nft_hook_list_splice_rcu() helper that splices entries from a
private list into a live RCU-protected list using individual
list_add_tail_rcu() calls instead of plain list_splice(). Use it in
the commit and abort paths for NEWCHAIN updates and DELCHAIN rollback.
Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
net/netfilter/nf_tables_api.c | 64 ++++++++++++++++++++++++++++++-----
1 file changed, 56 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 8c42247a176c7..62fcfefba7b0f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -391,6 +391,22 @@ static void nft_netdev_unregister_hooks(struct net *net,
}
}
+/* Splice hooks from a private list into a live (RCU-protected) hook list.
+ * Each entry is published individually via list_add_tail_rcu() so that
+ * concurrent RCU readers walking the destination list never observe torn
+ * list pointers.
+ */
+static void nft_hook_list_splice_rcu(struct list_head *from,
+ struct list_head *to)
+{
+ struct nft_hook *hook, *next;
+
+ list_for_each_entry_safe(hook, next, from, list) {
+ list_del(&hook->list);
+ list_add_tail_rcu(&hook->list, to);
+ }
+}
+
static int nf_tables_register_hook(struct net *net,
const struct nft_table *table,
struct nft_chain *chain)
@@ -3162,9 +3178,11 @@ static int nft_delchain_hook(struct nft_ctx *ctx,
const struct nlattr * const *nla = ctx->nla;
struct nft_chain_hook chain_hook = {};
struct nft_hook *this, *hook;
+ struct nft_hook **del_hooks;
LIST_HEAD(chain_del_list);
struct nft_trans *trans;
- int err;
+ int err, n = 0, i;
+ int max_hooks = 0;
if (ctx->table->flags & __NFT_TABLE_F_UPDATE)
return -EOPNOTSUPP;
@@ -3174,19 +3192,38 @@ static int nft_delchain_hook(struct nft_ctx *ctx,
if (err < 0)
return err;
+ list_for_each_entry(this, &chain_hook.list, list)
+ max_hooks++;
+
+ del_hooks = kcalloc(max_hooks, sizeof(*del_hooks), GFP_KERNEL);
+ if (!del_hooks) {
+ nft_chain_release_hook(&chain_hook);
+ return -ENOMEM;
+ }
+
list_for_each_entry(this, &chain_hook.list, list) {
hook = nft_hook_list_find(&basechain->hook_list, this);
if (!hook) {
err = -ENOENT;
goto err_chain_del_hook;
}
- list_move(&hook->list, &chain_del_list);
+ list_del_rcu(&hook->list);
+ del_hooks[n++] = hook;
}
+ /* Wait for any concurrent RCU readers (e.g. GETCHAIN dumps walking
+ * basechain->hook_list) to finish before modifying the removed hooks'
+ * list pointers to link them into the transaction's private list.
+ */
+ synchronize_rcu();
+
+ for (i = 0; i < n; i++)
+ list_add_tail(&del_hooks[i]->list, &chain_del_list);
+
trans = nft_trans_alloc_chain(ctx, NFT_MSG_DELCHAIN);
if (!trans) {
err = -ENOMEM;
- goto err_chain_del_hook;
+ goto err_chain_add_back;
}
nft_trans_basechain(trans) = basechain;
@@ -3194,13 +3231,24 @@ static int nft_delchain_hook(struct nft_ctx *ctx,
INIT_LIST_HEAD(&nft_trans_chain_hooks(trans));
list_splice(&chain_del_list, &nft_trans_chain_hooks(trans));
nft_chain_release_hook(&chain_hook);
+ kfree(del_hooks);
nft_trans_commit_list_add_tail(ctx->net, trans);
return 0;
+err_chain_add_back:
+ for (i = 0; i < n; i++)
+ list_add_tail_rcu(&del_hooks[i]->list, &basechain->hook_list);
+ kfree(del_hooks);
+ nft_chain_release_hook(&chain_hook);
+
+ return err;
+
err_chain_del_hook:
- list_splice(&chain_del_list, &basechain->hook_list);
+ for (i = 0; i < n; i++)
+ list_add_tail_rcu(&del_hooks[i]->list, &basechain->hook_list);
+ kfree(del_hooks);
nft_chain_release_hook(&chain_hook);
return err;
@@ -10912,8 +10960,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
nft_chain_commit_update(nft_trans_container_chain(trans));
nf_tables_chain_notify(&ctx, NFT_MSG_NEWCHAIN,
&nft_trans_chain_hooks(trans));
- list_splice(&nft_trans_chain_hooks(trans),
- &nft_trans_basechain(trans)->hook_list);
+ nft_hook_list_splice_rcu(&nft_trans_chain_hooks(trans),
+ &nft_trans_basechain(trans)->hook_list);
/* trans destroyed after rcu grace period */
} else {
nft_chain_commit_drop_policy(nft_trans_container_chain(trans));
@@ -11231,8 +11279,8 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
case NFT_MSG_DELCHAIN:
case NFT_MSG_DESTROYCHAIN:
if (nft_trans_chain_update(trans)) {
- list_splice(&nft_trans_chain_hooks(trans),
- &nft_trans_basechain(trans)->hook_list);
+ nft_hook_list_splice_rcu(&nft_trans_chain_hooks(trans),
+ &nft_trans_basechain(trans)->hook_list);
} else {
nft_use_inc_restore(&table->use);
nft_clear(trans->net, nft_trans_chain(trans));
--
2.43.0
^ permalink raw reply related
* [bug report] [NET]: Add Tehuti network driver.
From: Dan Carpenter @ 2026-04-10 10:13 UTC (permalink / raw)
To: Andy Gospodarek; +Cc: netdev
Hello Andy Gospodarek,
Commit 1a348ccc1047 ("[NET]: Add Tehuti network driver.") from Sep
17, 2007 (linux-next), leads to the following Smatch static checker
warning:
drivers/net/ethernet/tehuti/tehuti.c:2210 bdx_set_coalesce()
warn: no lower bound on 'rx_max_coal' rl='s32min-15'
drivers/net/ethernet/tehuti/tehuti.c
2179 static int bdx_set_coalesce(struct net_device *netdev,
2180 struct ethtool_coalesce *ecoal,
2181 struct kernel_ethtool_coalesce *kernel_coal,
2182 struct netlink_ext_ack *extack)
2183 {
2184 u32 rdintcm;
2185 u32 tdintcm;
2186 struct bdx_priv *priv = netdev_priv(netdev);
2187 int rx_coal;
2188 int tx_coal;
2189 int rx_max_coal;
2190 int tx_max_coal;
2191
2192 /* Check for valid input */
2193 rx_coal = ecoal->rx_coalesce_usecs / INT_COAL_MULT;
2194 tx_coal = ecoal->tx_coalesce_usecs / INT_COAL_MULT;
2195 rx_max_coal = ecoal->rx_max_coalesced_frames
2196 tx_max_coal = ecoal->tx_max_coalesced_frames;
2197
2198 /* Translate from packets to multiples of FIFO bytes */
2199 rx_max_coal =
2200 (((rx_max_coal * sizeof(struct rxf_desc)) + PCK_TH_MULT - 1)
2201 / PCK_TH_MULT);
2202 tx_max_coal =
2203 (((tx_max_coal * BDX_TXF_DESC_SZ) + PCK_TH_MULT - 1)
2204 / PCK_TH_MULT);
2205
2206 if ((rx_coal > 0x7FFF) || (tx_coal > 0x7FFF) ||
2207 (rx_max_coal > 0xF) || (tx_max_coal > 0xF))
Check rx_max_coal and tx_max_coal for negative values?
2208 return -EINVAL;
2209
--> 2210 rdintcm = INT_REG_VAL(rx_coal, GET_INT_COAL_RC(priv->rdintcm),
2211 GET_RXF_TH(priv->rdintcm), rx_max_coal);
^^^^^^^^^^^
2212 tdintcm = INT_REG_VAL(tx_coal, GET_INT_COAL_RC(priv->tdintcm), 0,
2213 tx_max_coal);
2214
2215 priv->rdintcm = rdintcm;
2216 priv->tdintcm = tdintcm;
2217
2218 WRITE_REG(priv, regRDINTCM0, rdintcm);
2219 WRITE_REG(priv, regTDINTCM0, tdintcm);
2220
2221 return 0;
2222 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
^ permalink raw reply
* [bug report] octeontx2-af: npc: cn20k: virtual index support
From: Dan Carpenter @ 2026-04-10 10:12 UTC (permalink / raw)
To: Ratheesh Kannoth; +Cc: netdev
Hello Ratheesh Kannoth,
Commit 645c6e3c1999 ("octeontx2-af: npc: cn20k: virtual index
support") from Feb 24, 2026 (linux-next), leads to the following
Smatch static checker warning:
drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c:3534 npc_defrag_alloc_free_slots()
warn: missing error code 'rc'
drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
3479 static int npc_defrag_alloc_free_slots(struct rvu *rvu,
3480 struct npc_defrag_node *f,
3481 int cnt, u16 *save)
3482 {
3483 int alloc_cnt1, alloc_cnt2;
3484 struct npc_subbank *sb;
3485 int rc, sb_off, i;
3486 bool deleted;
3487
3488 sb = &npc_priv.sb[f->idx];
3489
3490 alloc_cnt1 = 0;
3491 alloc_cnt2 = 0;
3492
3493 rc = __npc_subbank_alloc(rvu, sb,
3494 NPC_MCAM_KEY_X2, sb->b0b,
3495 sb->b0t,
3496 NPC_MCAM_LOWER_PRIO,
3497 false, cnt, save, cnt, true,
3498 &alloc_cnt1);
3499 if (alloc_cnt1 < cnt) {
3500 rc = __npc_subbank_alloc(rvu, sb,
3501 NPC_MCAM_KEY_X2, sb->b1b,
3502 sb->b1t,
3503 NPC_MCAM_LOWER_PRIO,
3504 false, cnt - alloc_cnt1,
3505 save + alloc_cnt1,
3506 cnt - alloc_cnt1,
3507 true, &alloc_cnt2);
3508 }
3509
3510 if (alloc_cnt1 + alloc_cnt2 != cnt) {
3511 dev_err(rvu->dev,
3512 "%s: Failed to alloc cnt=%u alloc_cnt1=%u alloc_cnt2=%u\n",
3513 __func__, cnt, alloc_cnt1, alloc_cnt2);
3514 goto fail_free_alloc;
3515 }
3516 return 0;
3517
3518 fail_free_alloc:
3519 for (i = 0; i < alloc_cnt1 + alloc_cnt2; i++) {
3520 rc = npc_mcam_idx_2_subbank_idx(rvu, save[i],
3521 &sb, &sb_off);
3522 if (rc) {
3523 dev_err(rvu->dev,
3524 "%s: Error to find subbank for mcam idx=%u\n",
3525 __func__, save[i]);
3526 break;
3527 }
3528
3529 deleted = __npc_subbank_free(rvu, sb, sb_off);
3530 if (!deleted) {
3531 dev_err(rvu->dev,
3532 "%s: Error to free mcam idx=%u\n",
3533 __func__, save[i]);
--> 3534 break;
Set an error code here?
3535 }
3536 }
3537
3538 return rc;
3539 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
^ permalink raw reply
* [bug report] octeontx2-af: npc: cn20k: add debugfs support
From: Dan Carpenter @ 2026-04-10 10:12 UTC (permalink / raw)
To: Ratheesh Kannoth; +Cc: netdev
Hello Ratheesh Kannoth,
Commit 528530dff56b ("octeontx2-af: npc: cn20k: add debugfs support")
from Feb 24, 2026 (linux-next), leads to the following Smatch static
checker warning:
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c:257 npc_cn20k_debugfs_init() warn: 'npc_dentry' is an error pointer or valid
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c:263 npc_cn20k_debugfs_init() warn: 'npc_dentry' is an error pointer or valid
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c:268 npc_cn20k_debugfs_init() warn: 'npc_dentry' is an error pointer or valid
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c:273 npc_cn20k_debugfs_init() warn: 'npc_dentry' is an error pointer or valid
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c:278 npc_cn20k_debugfs_init() warn: 'npc_dentry' is an error pointer or valid
drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
249 int npc_cn20k_debugfs_init(struct rvu *rvu)
250 {
251 struct npc_priv_t *npc_priv = npc_priv_get();
252 struct dentry *npc_dentry;
253
254 npc_dentry = debugfs_create_file("mcam_layout", 0444, rvu->rvu_dbg.npc,
255 npc_priv, &npc_mcam_layout_fops);
256
257 if (!npc_dentry)
258 return -EFAULT;
This error checking is wrong, but instead of fixing it, just delete it.
See my blog for details:
https://staticthinking.wordpress.com/2023/07/24/debugfs-functions-are-not-supposed-to-be-checked/
259
260 npc_dentry = debugfs_create_file("mcam_default", 0444, rvu->rvu_dbg.npc,
261 rvu, &npc_mcam_default_fops);
262
263 if (!npc_dentry)
264 return -EFAULT;
265
266 npc_dentry = debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
267 npc_priv, &npc_vidx2idx_map_fops);
268 if (!npc_dentry)
269 return -EFAULT;
270
271 npc_dentry = debugfs_create_file("idx2vidx", 0444, rvu->rvu_dbg.npc,
272 npc_priv, &npc_idx2vidx_map_fops);
273 if (!npc_dentry)
274 return -EFAULT;
275
276 npc_dentry = debugfs_create_file("defrag", 0444, rvu->rvu_dbg.npc,
277 npc_priv, &npc_defrag_fops);
--> 278 if (!npc_dentry)
279 return -EFAULT;
280
281 return 0;
282 }
This email is a free service from the Smatch-CI project [smatch.sf.net].
regards,
dan carpenter
^ permalink raw reply
* Re: [net-next PATCH v5 1/4] octeontx2-af: npa: cn20k: Add NPA Halo support
From: Subbaraya Sundeep @ 2026-04-10 10:11 UTC (permalink / raw)
To: Alexander Lobakin
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
bbhushan2, netdev, linux-kernel, Linu Cherian
In-Reply-To: <1dc56269-d6d5-48da-a4c2-0686ce4fd1f6@intel.com>
On 2026-04-10 at 15:06:56, Alexander Lobakin (aleksander.lobakin@intel.com) wrote:
> From: Subbaraya Sundeep <sbhatta@marvell.com>
> Date: Fri, 10 Apr 2026 15:05:36 +0530
>
> > On 2026-04-09 at 20:39:02, Alexander Lobakin (aleksander.lobakin@intel.com) wrote:
> >> From: Subbaraya Sundeep <sbhatta@marvell.com>
> >> Date: Thu, 9 Apr 2026 15:23:21 +0530
> >>
> >>> From: Linu Cherian <lcherian@marvell.com>
> >>>
> >>> CN20K silicon implements unified aura and pool context
> >>> type called Halo for better resource usage. Add support to
> >>> handle Halo context type operations.
> >>>
> >>> Signed-off-by: Linu Cherian <lcherian@marvell.com>
> >>> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
> >>
> >> [...]
> >>
> >>> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> >>> index 763f6cabd7c2..2364bafd329d 100644
> >>> --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> >>> +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> >>> @@ -377,4 +377,85 @@ struct npa_cn20k_pool_s {
> >>>
> >>> static_assert(sizeof(struct npa_cn20k_pool_s) == NIX_MAX_CTX_SIZE);
> >>>
> >>> +struct npa_cn20k_halo_s {
> >>> + u64 stack_base : 64;
> >>
> >> It's redundant to add : 64 to a 64-bit field.
> > Agreed. But this is for readability, it helps when checking HRM. For
> > instance HRM says [703:640] and we define as u64 reserved_640_703 : 64;
> > so that we do not have to count bits in mind.
> >> Moreover, on 32-bit systems, the compilers sometimes complain on
> >> bitfields > 32 bits.
> > This driver depends on 64BIT.
> >>
> >>> + u64 ena : 1;
> >>> + u64 nat_align : 1;
> >>> + u64 reserved_66_67 : 2;
> >>> + u64 stack_caching : 1;
> >>> + u64 reserved_69_71 : 3;
> >>> + u64 aura_drop_ena : 1;
> >>> + u64 reserved_73_79 : 7;
> >>> + u64 aura_drop : 8;
> >>> + u64 buf_offset : 12;
> >>> + u64 reserved_100_103 : 4;
> >>> + u64 buf_size : 12;
> >>> + u64 reserved_116_119 : 4;
> >>> + u64 ref_cnt_prof : 3;
> >>> + u64 reserved_123_127 : 5;
> >>> + u64 stack_max_pages : 32;
> >>> + u64 stack_pages : 32;
> >>> + u64 bp_0 : 7;
> >>> + u64 bp_1 : 7;
> >>> + u64 bp_2 : 7;
> >>> + u64 bp_3 : 7;
> >>> + u64 bp_4 : 7;
> >>> + u64 bp_5 : 7;
> >>> + u64 bp_6 : 7;
> >>> + u64 bp_7 : 7;
> >>> + u64 bp_ena_0 : 1;
> >>> + u64 bp_ena_1 : 1;
> >>> + u64 bp_ena_2 : 1;
> >>> + u64 bp_ena_3 : 1;
> >>> + u64 bp_ena_4 : 1;
> >>> + u64 bp_ena_5 : 1;
> >>> + u64 bp_ena_6 : 1;
> >>> + u64 bp_ena_7 : 1;
> >>> + u64 stack_offset : 4;
> >>> + u64 reserved_260_263 : 4;
> >>> + u64 shift : 6;
> >>> + u64 reserved_270_271 : 2;
> >>> + u64 avg_level : 8;
> >>> + u64 avg_con : 9;
> >>> + u64 fc_ena : 1;
> >>> + u64 fc_stype : 2;
> >>> + u64 fc_hyst_bits : 4;
> >>> + u64 fc_up_crossing : 1;
> >>> + u64 reserved_297_299 : 3;
> >>> + u64 update_time : 16;
> >>> + u64 reserved_316_319 : 4;
> >>> + u64 fc_addr : 64;
> >>> + u64 ptr_start : 64;
> >>> + u64 ptr_end : 64;
> >>> + u64 bpid_0 : 12;
> >>> + u64 reserved_524_535 : 12;
> >>> + u64 err_int : 8;
> >>> + u64 err_int_ena : 8;
> >>> + u64 thresh_int : 1;
> >>> + u64 thresh_int_ena : 1;
> >>> + u64 thresh_up : 1;
> >>> + u64 reserved_555 : 1;
> >>> + u64 thresh_qint_idx : 7;
> >>> + u64 reserved_563 : 1;
> >>> + u64 err_qint_idx : 7;
> >>> + u64 reserved_571_575 : 5;
> >>> + u64 thresh : 36;
> >>> + u64 reserved_612_615 : 4;
> >>> + u64 fc_msh_dst : 11;
> >>> + u64 reserved_627_630 : 4;
> >>> + u64 op_dpc_ena : 1;
> >>> + u64 op_dpc_set : 5;
> >>> + u64 reserved_637_637 : 1;
> >>> + u64 stream_ctx : 1;
> >>> + u64 unified_ctx : 1;
> >>> + u64 reserved_640_703 : 64;
> >>> + u64 reserved_704_767 : 64;
> >>> + u64 reserved_768_831 : 64;
> >>> + u64 reserved_832_895 : 64;
> >>> + u64 reserved_896_959 : 64;
> >>> + u64 reserved_960_1023 : 64;
> >>> +};
> >>> +
> >>> +static_assert(sizeof(struct npa_cn20k_halo_s) == NIX_MAX_CTX_SIZE);
> >>
> >> Now the main question:
> >>
> >> Is mailbox's Endianness fixed (LE/BE)? Or is it always the same as the
> >> host's ones (I doubt so)?
> >> If not, these need to be __le{8,16,32,64} (or __be if it's Big Endian)
> >> and you need to handle the conversions manually.
> >>
> > Yes endianness is LE and fixed. This is NOT a host side driver for an
> > endpoint card. This is driver for on chip PCI device of CN20K soc.
> > Hope I answered your question wrt host.
>
> But the mailbox is shared between the SoC and the host or HW or not? Is
In hardware it is just shared DDR region between two on chip devices and both
devices access shared region using their BARs.
> it possible that one client of the mailbox will have LE and the second
> will have BE?
No not possible.
Thanks,
Sundeep
>
> >
> > Thanks,
> > Sundeep
>
> Thanks,
> Olek
^ permalink raw reply
* [PATCH v3] selftests: vsock: avoid races creating Unix socket paths
From: Cao Ruichuang @ 2026-04-10 10:07 UTC (permalink / raw)
To: Stefano Garzarella, Shuah Khan
Cc: Stefano Garzarella, Simon Horman, Bobby Eshleman, virtualization,
netdev, linux-kselftest, linux-kernel
In-Reply-To: <20260410035237.59644-1-create0818@163.com>
vmtest.sh currently uses mktemp -u to precompute Unix socket paths for the
namespace bridge helpers. That only returns an unused pathname and leaves a
time-of-check/time-of-use window before socat binds or connects to it.
Create a private temporary directory with mktemp -d and place the
socket path inside it instead. This removes the pathname race while
keeping cleanup straightforward.
Signed-off-by: Cao Ruichuang <create0818@163.com>
---
v3:
- restore the missing patch description
- add Bobby Eshleman to Cc
tools/testing/selftests/vsock/vmtest.sh | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 86e338886b3..c345fa539d3 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -718,6 +718,7 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
local pids pid pidfile
local ns0 ns1 port
declare -a pids
+ local unixdir
local unixfile
ns0="global0"
ns1="global1"
@@ -736,7 +737,8 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
oops_before=$(vm_dmesg_oops_count "${ns0}")
warn_before=$(vm_dmesg_warn_count "${ns0}")
- unixfile=$(mktemp -u /tmp/XXXX.sock)
+ unixdir=$(mktemp -d /tmp/vsock_vmtest_XXXXXX)
+ unixfile="${unixdir}/sock"
ip netns exec "${ns1}" \
socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
UNIX-CONNECT:"${unixfile}" &
@@ -758,6 +760,8 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
terminate_pids "${pids[@]}"
terminate_pidfiles "${pidfile}"
+ rm "${unixfile}"
+ rmdir "${unixdir}"
if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
return "${KSFT_FAIL}"
@@ -814,6 +818,7 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
local ns0="global0"
local ns1="global1"
local port=12345
+ local unixdir
local unixfile
local dmesg_rc
local pidfile
@@ -826,7 +831,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
- unixfile=$(mktemp -u /tmp/XXXX.sock)
+ unixdir=$(mktemp -d /tmp/vsock_vmtest_XXXXXX)
+ unixfile="${unixdir}/sock"
ip netns exec "${ns0}" \
socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
@@ -845,7 +851,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
if ! vm_start "${pidfile}" "${ns0}"; then
log_host "failed to start vm (cid=${cid}, ns=${ns0})"
terminate_pids "${pids[@]}"
- rm -f "${unixfile}"
+ rm "${unixfile}"
+ rmdir "${unixdir}"
return "${KSFT_FAIL}"
fi
@@ -862,7 +869,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
terminate_pidfiles "${pidfile}"
terminate_pids "${pids[@]}"
- rm -f "${unixfile}"
+ rm "${unixfile}"
+ rmdir "${unixdir}"
if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
return "${KSFT_FAIL}"
--
2.39.5 (Apple Git-154)
^ permalink raw reply related
* Re: [PATCH v2] selftests: vsock: avoid races creating Unix socket paths
From: Cao Ruichuang @ 2026-04-10 10:05 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Stefano Garzarella, Simon Horman, Shuah Khan, Bobby Eshleman,
virtualization, netdev, linux-kselftest, linux-kernel
In-Reply-To: <adi1X_3wVUsDhRmR@sgarzare-redhat>
Sorry, that was my mistake.
I accidentally sent v2 without the patch description. I will resend it as v3
with the proper changelog and Cc Bobby as suggested.
Thanks for catching this.
^ permalink raw reply
* Re: [PATCH v2 1/2] mfd: nct6694: Switch to devm_mfd_add_devices() and drop IDA
From: Ming Yu @ 2026-04-10 9:59 UTC (permalink / raw)
To: Bartosz Golaszewski
Cc: tmyu0, linusw, linux, andi.shyti, lee, mkl, mailhol,
alexandre.belloni, wim, linux-kernel, linux-gpio, linux-i2c,
linux-can, netdev, linux-watchdog, linux-hwmon, linux-rtc,
linux-usb
In-Reply-To: <CAMRc=MeJL_po8HuBa4btVowR-e0i2FyzbDgNVo2u54iPKyuvWw@mail.gmail.com>
Hi Bart, all,
Thanks for the review.
Bartosz Golaszewski <brgl@kernel.org> 於 2026年4月8日週三 下午3:25寫道:
>
> On Wed, Apr 8, 2026 at 7:31 AM <a0282524688@gmail.com> wrote:
> >
> > From: Ming Yu <a0282524688@gmail.com>
> >
> > Currently, the nct6694 core driver uses mfd_add_hotplug_devices()
> > and an IDA to manage subdevice IDs.
> >
> > Switch the core implementation to use the managed
> > devm_mfd_add_devices() API, which simplifies the error handling and
> > device lifecycle management. Concurrently, drop the custom IDA
> > implementation and transition to using pdev->id.
> >
> > Signed-off-by: Ming Yu <a0282524688@gmail.com>
> > ---
>
> This does result in a nice code shrink but I'd split this commit into
> two: one switching to using MFD_CELL_BASIC() with hard-coded devices
> IDs and one completing the transition to devres.
>
You are right that this change is trying to do too much at once, and
splitting it as you suggested would make the series much cleaner.
After looking more closely at the ID handling and hotplug
implications, I realized that switching to devm_mfd_add_devices() and
dropping the IDA is not a good fit for this driver. The current
mfd_add_hotplug_devices() path uses PLATFORM_DEVID_AUTO, which gives
globally unique device IDs and avoids sysfs name collisions. If we
switch to devm_mfd_add_devices() with fixed IDs, multiple identical
NCT6694 devices can end up registering subdevices with the same
platform device names, which would break hotplug support when more
than one device is present.
So I think it is better not to pursue this direction further.
For the next revision, I will drop this part of the change and keep
the existing MFD core logic, including the IDA usage. The series will
focus on adding the nct6694-hif MFD driver only, and I will add the
IDA initialization there as needed.
Thanks again for the suggestion and review.
Best regards,
Ming
^ permalink raw reply
* Re: [bug report] octeontx2-af: npc: cn20k: Use common APIs
From: Ratheesh Kannoth @ 2026-04-10 9:52 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Suman Ghosh, netdev
In-Reply-To: <adiQJvuKlEhq2ILx@stanley.mountain>
On 2026-04-10 at 11:22:38, Dan Carpenter (error27@gmail.com) wrote:
> Hello Suman Ghosh,
>
> Commit 6d1e70282f76 ("octeontx2-af: npc: cn20k: Use common APIs")
> from Feb 24, 2026 (linux-next), leads to the following Smatch static
> checker warning:
>
> drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c:1065 npc_cn20k_config_mcam_entry()
> error: uninitialized symbol 'kw_type'.
>
> drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
> 1045 void npc_cn20k_config_mcam_entry(struct rvu *rvu, int blkaddr, int index,
> 1046 u8 intf, struct cn20k_mcam_entry *entry,
> 1047 bool enable, u8 hw_prio, u8 req_kw_type)
> 1048 {
> 1049 struct npc_mcam *mcam = &rvu->hw->mcam;
> 1050 int mcam_idx = index % mcam->banksize;
> 1051 int bank = index / mcam->banksize;
> 1052 int kw = 0;
> 1053 u8 kw_type;
> 1054
> 1055 /* Disable before mcam entry update */
> 1056 npc_cn20k_enable_mcam_entry(rvu, blkaddr, index, false);
> 1057
> 1058 npc_mcam_idx_2_key_type(rvu, index, &kw_type);
>
> No error checking?
ACK. will push
>
> 1059 /* CAM1 takes the comparison value and
> 1060 * CAM0 specifies match for a bit in key being '0' or '1' or 'dontcare'.
> 1061 * CAM1<n> = 0 & CAM0<n> = 1 => match if key<n> = 0
> 1062 * CAM1<n> = 1 & CAM0<n> = 0 => match if key<n> = 1
> 1063 * CAM1<n> = 0 & CAM0<n> = 0 => always match i.e dontcare.
> 1064 */
> --> 1065 if (kw_type == NPC_MCAM_KEY_X2) {
> ^^^^^^^
kw_type is set by npc_mcam_idx_2_key_type() call above.
> Uninitialized
>
^ permalink raw reply
* Re: [PATCH net] ice: fix netdev bring-up and bring-down in self-test
From: Alexander Lobakin @ 2026-04-10 9:42 UTC (permalink / raw)
To: Tony Nguyen, Jakub Kicinski
Cc: Aleksandr Loktionov, intel-wired-lan, netdev, Konstantin Ilichev,
Grzegorz Nitka
In-Reply-To: <c5f6796a-66de-40df-9f12-2bc7ff04be34@intel.com>
From: Anthony Nguyen <anthony.l.nguyen@intel.com>
Date: Wed, 8 Apr 2026 14:12:28 -0700
>
>
> On 3/27/2026 12:23 AM, Aleksandr Loktionov wrote:
>> From: Konstantin Ilichev <konstantin.ilichev@intel.com>
>>
>> When an offline self-test is initiated with ethtool -t, any ongoing
>> traffic could get stuck because ice_stop() and ice_open() are called
>> without letting the OS know about state transitions. In most cases
>> a write() system call would block.
>>
>> Fix this by calling dev_change_flags() to bring the netdev up and
>> down, which ensures ndo_open()/ndo_stop() are called and all watchers
>> are notified correctly.
>
> + Olek
>
> AI review reports:
>
> The ethtool core acquires the per-netdev mutex via netdev_lock_ops(dev)
> before invoking the driver's .self_test callback. dev_change_flags() is
> an exported API that explicitly re-acquires this exact same lock:
> net/core/dev_api.c:dev_change_flags() {
> ...
> netdev_lock_ops(dev);
> ret = netif_change_flags(dev, flags, extack);
> netdev_unlock_ops(dev);
> ...
> }
> Because dev->lock is a standard, non-recursive mutex, this will result
> in a hard deadlock for any driver that opts into request_ops_lock. While
> ice might not currently set this flag, introducing nested lock
> acquisitions of the same mutex guarantees a deadlock as the subsystem
> migrates toward per-netdev locking.
>
>
> With ice netdev lock changes in progress [1], this would soon become an
> issue.
Hmmm, seems like we'll need to either export netif_change_flags() or
introduce something like dev_change_flags_locked()...
>
> Thanks,
> Tony
>
> [1] https://lore.kernel.org/netdev/20260325200644.2528726-4-
> anthony.l.nguyen@intel.com/
>
>> Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
>> Cc: stable@vger.kernel.org
>> Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
>> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
>> Signed-off-by: Konstantin Ilichev <konstantin.ilichev@intel.com>
>> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>> ---
>>
>> drivers/net/ethernet/intel/ice/ice_ethtool.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/
>> net/ethernet/intel/ice/ice_ethtool.c
>> index 96d95af..2a4f06f 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
>> @@ -1416,7 +1416,7 @@ ice_self_test(struct net_device *netdev, struct
>> ethtool_test *eth_test,
>> /* If the device is online then take it offline */
>> if (if_running)
>> /* indicate we're in test mode */
>> - ice_stop(netdev);
>> + dev_change_flags(netdev, netdev->flags & ~IFF_UP, NULL);
>> data[ICE_ETH_TEST_LINK] = ice_link_test(netdev);
>> data[ICE_ETH_TEST_EEPROM] = ice_eeprom_test(netdev);
>> @@ -1434,10 +1434,12 @@ ice_self_test(struct net_device *netdev,
>> struct ethtool_test *eth_test,
>> clear_bit(ICE_TESTING, pf->state);
>> if (if_running) {
>> - int status = ice_open(netdev);
>> + int status = dev_change_flags(netdev,
>> + netdev->flags | IFF_UP,
>> + NULL);
BTW are you sure there's no other way to do an ifdown/ifup cycle other
than sneaking into the flags? Have you checked whether there's an
exported generic function which calls .ndo_open()?
>> if (status) {
>> - dev_err(dev, "Could not open device %s, err %d\n",
>> + dev_err(dev, "Could not bring up device %s, err %d\n",
>> pf->int_name, status);
>> }
>> }
Thanks,
Olek
^ permalink raw reply
* [PATCH net-next] ppp: tear down bridge before clearing pch->chan
From: Qingfang Deng @ 2026-04-10 9:38 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Qingfang Deng, Yue Haibing, Kuniyuki Iwashima,
Kees Cook, Sebastian Andrzej Siewior, linux-ppp, netdev,
linux-kernel
Cc: Tom Parkin, James Chapman, Guillaume Nault
As we previously did to ppp_disconnect_channel(), also move
ppp_unbridge_channels() before pch->chan is set to NULL in
ppp_unregister_channel().
ppp_unbridge_channels() calls synchronize_rcu(), so no concurrent RCU
readers in ppp_channel_bridge_input() can observe the channel after its
chan pointer is cleared.
This makes the !pchb->chan check in ppp_channel_bridge_input()
redundant and can be safely removed.
Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev>
---
drivers/net/ppp/ppp_generic.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index b097d1b38ac9..3a609d48a424 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2285,17 +2285,11 @@ static bool ppp_channel_bridge_input(struct channel *pch, struct sk_buff *skb)
goto out_rcu;
spin_lock_bh(&pchb->downl);
- if (!pchb->chan) {
- /* channel got unregistered */
- kfree_skb(skb);
- goto outl;
- }
skb_scrub_packet(skb, !net_eq(pch->chan_net, pchb->chan_net));
if (!pchb->chan->ops->start_xmit(pchb->chan, skb))
kfree_skb(skb);
-outl:
spin_unlock_bh(&pchb->downl);
out_rcu:
rcu_read_unlock();
@@ -2997,6 +2991,8 @@ ppp_unregister_channel(struct ppp_channel *chan)
* the channel's start_xmit or ioctl routine before we proceed.
*/
ppp_disconnect_channel(pch);
+ ppp_unbridge_channels(pch);
+
down_write(&pch->chan_sem);
spin_lock_bh(&pch->downl);
pch->chan = NULL;
@@ -3008,8 +3004,6 @@ ppp_unregister_channel(struct ppp_channel *chan)
list_del(&pch->list);
spin_unlock_bh(&pn->all_channels_lock);
- ppp_unbridge_channels(pch);
-
pch->file.dead = 1;
wake_up_interruptible(&pch->file.rwait);
--
2.43.0
^ permalink raw reply related
* Re: [net-next PATCH v5 1/4] octeontx2-af: npa: cn20k: Add NPA Halo support
From: Alexander Lobakin @ 2026-04-10 9:36 UTC (permalink / raw)
To: Subbaraya Sundeep
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
bbhushan2, netdev, linux-kernel, Linu Cherian
In-Reply-To: <20260410093536.GA1783667@kernel-ep2>
From: Subbaraya Sundeep <sbhatta@marvell.com>
Date: Fri, 10 Apr 2026 15:05:36 +0530
> On 2026-04-09 at 20:39:02, Alexander Lobakin (aleksander.lobakin@intel.com) wrote:
>> From: Subbaraya Sundeep <sbhatta@marvell.com>
>> Date: Thu, 9 Apr 2026 15:23:21 +0530
>>
>>> From: Linu Cherian <lcherian@marvell.com>
>>>
>>> CN20K silicon implements unified aura and pool context
>>> type called Halo for better resource usage. Add support to
>>> handle Halo context type operations.
>>>
>>> Signed-off-by: Linu Cherian <lcherian@marvell.com>
>>> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
>>
>> [...]
>>
>>> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
>>> index 763f6cabd7c2..2364bafd329d 100644
>>> --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
>>> +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
>>> @@ -377,4 +377,85 @@ struct npa_cn20k_pool_s {
>>>
>>> static_assert(sizeof(struct npa_cn20k_pool_s) == NIX_MAX_CTX_SIZE);
>>>
>>> +struct npa_cn20k_halo_s {
>>> + u64 stack_base : 64;
>>
>> It's redundant to add : 64 to a 64-bit field.
> Agreed. But this is for readability, it helps when checking HRM. For
> instance HRM says [703:640] and we define as u64 reserved_640_703 : 64;
> so that we do not have to count bits in mind.
>> Moreover, on 32-bit systems, the compilers sometimes complain on
>> bitfields > 32 bits.
> This driver depends on 64BIT.
>>
>>> + u64 ena : 1;
>>> + u64 nat_align : 1;
>>> + u64 reserved_66_67 : 2;
>>> + u64 stack_caching : 1;
>>> + u64 reserved_69_71 : 3;
>>> + u64 aura_drop_ena : 1;
>>> + u64 reserved_73_79 : 7;
>>> + u64 aura_drop : 8;
>>> + u64 buf_offset : 12;
>>> + u64 reserved_100_103 : 4;
>>> + u64 buf_size : 12;
>>> + u64 reserved_116_119 : 4;
>>> + u64 ref_cnt_prof : 3;
>>> + u64 reserved_123_127 : 5;
>>> + u64 stack_max_pages : 32;
>>> + u64 stack_pages : 32;
>>> + u64 bp_0 : 7;
>>> + u64 bp_1 : 7;
>>> + u64 bp_2 : 7;
>>> + u64 bp_3 : 7;
>>> + u64 bp_4 : 7;
>>> + u64 bp_5 : 7;
>>> + u64 bp_6 : 7;
>>> + u64 bp_7 : 7;
>>> + u64 bp_ena_0 : 1;
>>> + u64 bp_ena_1 : 1;
>>> + u64 bp_ena_2 : 1;
>>> + u64 bp_ena_3 : 1;
>>> + u64 bp_ena_4 : 1;
>>> + u64 bp_ena_5 : 1;
>>> + u64 bp_ena_6 : 1;
>>> + u64 bp_ena_7 : 1;
>>> + u64 stack_offset : 4;
>>> + u64 reserved_260_263 : 4;
>>> + u64 shift : 6;
>>> + u64 reserved_270_271 : 2;
>>> + u64 avg_level : 8;
>>> + u64 avg_con : 9;
>>> + u64 fc_ena : 1;
>>> + u64 fc_stype : 2;
>>> + u64 fc_hyst_bits : 4;
>>> + u64 fc_up_crossing : 1;
>>> + u64 reserved_297_299 : 3;
>>> + u64 update_time : 16;
>>> + u64 reserved_316_319 : 4;
>>> + u64 fc_addr : 64;
>>> + u64 ptr_start : 64;
>>> + u64 ptr_end : 64;
>>> + u64 bpid_0 : 12;
>>> + u64 reserved_524_535 : 12;
>>> + u64 err_int : 8;
>>> + u64 err_int_ena : 8;
>>> + u64 thresh_int : 1;
>>> + u64 thresh_int_ena : 1;
>>> + u64 thresh_up : 1;
>>> + u64 reserved_555 : 1;
>>> + u64 thresh_qint_idx : 7;
>>> + u64 reserved_563 : 1;
>>> + u64 err_qint_idx : 7;
>>> + u64 reserved_571_575 : 5;
>>> + u64 thresh : 36;
>>> + u64 reserved_612_615 : 4;
>>> + u64 fc_msh_dst : 11;
>>> + u64 reserved_627_630 : 4;
>>> + u64 op_dpc_ena : 1;
>>> + u64 op_dpc_set : 5;
>>> + u64 reserved_637_637 : 1;
>>> + u64 stream_ctx : 1;
>>> + u64 unified_ctx : 1;
>>> + u64 reserved_640_703 : 64;
>>> + u64 reserved_704_767 : 64;
>>> + u64 reserved_768_831 : 64;
>>> + u64 reserved_832_895 : 64;
>>> + u64 reserved_896_959 : 64;
>>> + u64 reserved_960_1023 : 64;
>>> +};
>>> +
>>> +static_assert(sizeof(struct npa_cn20k_halo_s) == NIX_MAX_CTX_SIZE);
>>
>> Now the main question:
>>
>> Is mailbox's Endianness fixed (LE/BE)? Or is it always the same as the
>> host's ones (I doubt so)?
>> If not, these need to be __le{8,16,32,64} (or __be if it's Big Endian)
>> and you need to handle the conversions manually.
>>
> Yes endianness is LE and fixed. This is NOT a host side driver for an
> endpoint card. This is driver for on chip PCI device of CN20K soc.
> Hope I answered your question wrt host.
But the mailbox is shared between the SoC and the host or HW or not? Is
it possible that one client of the mailbox will have LE and the second
will have BE?
>
> Thanks,
> Sundeep
Thanks,
Olek
^ permalink raw reply
* Re: [net-next PATCH v5 1/4] octeontx2-af: npa: cn20k: Add NPA Halo support
From: Subbaraya Sundeep @ 2026-04-10 9:35 UTC (permalink / raw)
To: Alexander Lobakin
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
bbhushan2, netdev, linux-kernel, Linu Cherian
In-Reply-To: <00fc6c1e-638a-4e80-b3c4-14e4dfa3651a@intel.com>
On 2026-04-09 at 20:39:02, Alexander Lobakin (aleksander.lobakin@intel.com) wrote:
> From: Subbaraya Sundeep <sbhatta@marvell.com>
> Date: Thu, 9 Apr 2026 15:23:21 +0530
>
> > From: Linu Cherian <lcherian@marvell.com>
> >
> > CN20K silicon implements unified aura and pool context
> > type called Halo for better resource usage. Add support to
> > handle Halo context type operations.
> >
> > Signed-off-by: Linu Cherian <lcherian@marvell.com>
> > Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
>
> [...]
>
> > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> > index 763f6cabd7c2..2364bafd329d 100644
> > --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> > +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
> > @@ -377,4 +377,85 @@ struct npa_cn20k_pool_s {
> >
> > static_assert(sizeof(struct npa_cn20k_pool_s) == NIX_MAX_CTX_SIZE);
> >
> > +struct npa_cn20k_halo_s {
> > + u64 stack_base : 64;
>
> It's redundant to add : 64 to a 64-bit field.
Agreed. But this is for readability, it helps when checking HRM. For
instance HRM says [703:640] and we define as u64 reserved_640_703 : 64;
so that we do not have to count bits in mind.
> Moreover, on 32-bit systems, the compilers sometimes complain on
> bitfields > 32 bits.
This driver depends on 64BIT.
>
> > + u64 ena : 1;
> > + u64 nat_align : 1;
> > + u64 reserved_66_67 : 2;
> > + u64 stack_caching : 1;
> > + u64 reserved_69_71 : 3;
> > + u64 aura_drop_ena : 1;
> > + u64 reserved_73_79 : 7;
> > + u64 aura_drop : 8;
> > + u64 buf_offset : 12;
> > + u64 reserved_100_103 : 4;
> > + u64 buf_size : 12;
> > + u64 reserved_116_119 : 4;
> > + u64 ref_cnt_prof : 3;
> > + u64 reserved_123_127 : 5;
> > + u64 stack_max_pages : 32;
> > + u64 stack_pages : 32;
> > + u64 bp_0 : 7;
> > + u64 bp_1 : 7;
> > + u64 bp_2 : 7;
> > + u64 bp_3 : 7;
> > + u64 bp_4 : 7;
> > + u64 bp_5 : 7;
> > + u64 bp_6 : 7;
> > + u64 bp_7 : 7;
> > + u64 bp_ena_0 : 1;
> > + u64 bp_ena_1 : 1;
> > + u64 bp_ena_2 : 1;
> > + u64 bp_ena_3 : 1;
> > + u64 bp_ena_4 : 1;
> > + u64 bp_ena_5 : 1;
> > + u64 bp_ena_6 : 1;
> > + u64 bp_ena_7 : 1;
> > + u64 stack_offset : 4;
> > + u64 reserved_260_263 : 4;
> > + u64 shift : 6;
> > + u64 reserved_270_271 : 2;
> > + u64 avg_level : 8;
> > + u64 avg_con : 9;
> > + u64 fc_ena : 1;
> > + u64 fc_stype : 2;
> > + u64 fc_hyst_bits : 4;
> > + u64 fc_up_crossing : 1;
> > + u64 reserved_297_299 : 3;
> > + u64 update_time : 16;
> > + u64 reserved_316_319 : 4;
> > + u64 fc_addr : 64;
> > + u64 ptr_start : 64;
> > + u64 ptr_end : 64;
> > + u64 bpid_0 : 12;
> > + u64 reserved_524_535 : 12;
> > + u64 err_int : 8;
> > + u64 err_int_ena : 8;
> > + u64 thresh_int : 1;
> > + u64 thresh_int_ena : 1;
> > + u64 thresh_up : 1;
> > + u64 reserved_555 : 1;
> > + u64 thresh_qint_idx : 7;
> > + u64 reserved_563 : 1;
> > + u64 err_qint_idx : 7;
> > + u64 reserved_571_575 : 5;
> > + u64 thresh : 36;
> > + u64 reserved_612_615 : 4;
> > + u64 fc_msh_dst : 11;
> > + u64 reserved_627_630 : 4;
> > + u64 op_dpc_ena : 1;
> > + u64 op_dpc_set : 5;
> > + u64 reserved_637_637 : 1;
> > + u64 stream_ctx : 1;
> > + u64 unified_ctx : 1;
> > + u64 reserved_640_703 : 64;
> > + u64 reserved_704_767 : 64;
> > + u64 reserved_768_831 : 64;
> > + u64 reserved_832_895 : 64;
> > + u64 reserved_896_959 : 64;
> > + u64 reserved_960_1023 : 64;
> > +};
> > +
> > +static_assert(sizeof(struct npa_cn20k_halo_s) == NIX_MAX_CTX_SIZE);
>
> Now the main question:
>
> Is mailbox's Endianness fixed (LE/BE)? Or is it always the same as the
> host's ones (I doubt so)?
> If not, these need to be __le{8,16,32,64} (or __be if it's Big Endian)
> and you need to handle the conversions manually.
>
Yes endianness is LE and fixed. This is NOT a host side driver for an
endpoint card. This is driver for on chip PCI device of CN20K soc.
Hope I answered your question wrt host.
Thanks,
Sundeep
> Thanks,
> Olek
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-net 4/4] ice: fix ice_ptp_read_tx_hwtstamp_status_eth56g
From: Petr Oros @ 2026-04-10 9:34 UTC (permalink / raw)
To: Jacob Keller, Anthony Nguyen, Intel Wired LAN, netdev
Cc: Aleksandr Loktionov, Timothy Miskell
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-4-b959da91a81f@intel.com>
On 4/8/26 20:46, Jacob Keller wrote:
> The ice_ptp_read_tx_hwtstamp_status_eth56g function calls
> ice_read_phy_eth56g with a PHY index. However the function actually expects
> a port index. This causes the function to read the wrong PHY_PTP_INT_STATUS
> registers, and effectively makes the status wrong for the second set of
> ports from 4 to 7.
>
> The ice_read_phy_eth56g function uses the provided port index to determine
> which PHY device to read. We could refactor the entire chain to take a PHY
> index, but this would impact many code sites. Instead, multiply the PHY
> index by the number of ports, so that we read from the first port of each
> PHY.
>
> Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> index 64ad5ed5c688..672218e5d1f9 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> @@ -2219,13 +2219,19 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status)
> *ts_status = 0;
>
> for (phy = 0; phy < params->num_phys; phy++) {
> + u8 port;
> int err;
>
> - err = ice_read_phy_eth56g(hw, phy, PHY_PTP_INT_STATUS, &status);
> + /* ice_read_phy_eth56g expects a port index, so use the first
> + * port of the PHY
> + */
> + port = phy * hw->ptp.ports_per_phy;
> +
> + err = ice_read_phy_eth56g(hw, port, PHY_PTP_INT_STATUS, &status);
> if (err)
> return err;
>
> - *ts_status |= (status & mask) << (phy * hw->ptp.ports_per_phy);
> + *ts_status |= (status & mask) << port;
> }
>
> ice_debug(hw, ICE_DBG_PTP, "PHY interrupt err: %x\n", *ts_status);
Reviewed-by: Petr Oros <poros@redhat.com>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-net 3/4] ice: fix ready bitmap check for non-E822 devices
From: Petr Oros @ 2026-04-10 9:34 UTC (permalink / raw)
To: Jacob Keller, Anthony Nguyen, Intel Wired LAN, netdev
Cc: Aleksandr Loktionov, Timothy Miskell
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-3-b959da91a81f@intel.com>
On 4/8/26 20:46, Jacob Keller wrote:
> The E800 hardware (apart from E810) has a ready bitmap for the PHY
> indicating which timestamp slots currently have an outstanding timestamp
> waiting to be read by software.
>
> This bitmap is checked in multiple places using the
> ice_get_phy_tx_tstamp_ready():
>
> * ice_ptp_process_tx_tstamp() calls it to determine which timestamps to
> attempt reading from the PHY
> * ice_ptp_tx_tstamps_pending() calls it in a loop at the end of the
> miscellaneous IRQ to check if new timestamps came in while the interrupt
> handler was executing.
> * ice_ptp_maybe_trigger_tx_interrupt() calls it in the auxiliary work task
> to trigger a software interrupt in the event that the hardware logic
> gets stuck.
>
> For E82X devices, multiple PHYs share the same block, and the parameter
> passed to the ready bitmap is a block number associated with the given
> port. For E825-C devices, the PHYs have their own independent blocks and do
> not share, so the parameter passed needs to be the port number. For E810
> devices, the ice_get_phy_tx_tstamp_ready() always returns all 1s regardless
> of what port, since this hardware does not have a ready bitmap. Finally,
> for E830 devices, each PF has its own ready bitmap accessible via register,
> and the block parameter is unused.
>
> The first call correctly uses the Tx timestamp tracker block parameter to
> check the appropriate timestamp block. This works because the tracker is
> setup correctly for each timestamp device type.
>
> The second two callers behave incorrectly for all device types other than
> the older E822 devices. They both iterate in a loop using
> ICE_GET_QUAD_NUM() which is a macro only used by E822 devices. This logic
> is incorrect for devices other than the E822 devices.
>
> For E810 the calls would always return true, causing E810 devices to always
> attempt to trigger a software interrupt even when they have no reason to.
> For E830, this results in duplicate work as the ready bitmap is checked
> once per number of quads. Finally, for E825-C, this results in the pending
> checks failing to detect timestamps on ports other than the first two.
>
> Fix this by introducing a new hardware API function to ice_ptp_hw.c,
> ice_check_phy_tx_tstamp_ready(). This function will check if any timestamps
> are available and returns a positive value if any timestamps are pending.
> For E810, the function always returns false, so that the re-trigger checks
> never happen. For E830, check the ready bitmap just once. For E82x
> hardware, check each quad. Finally, for E825-C, check every port.
>
> The interface function returns an integer to enable reporting of error code
> if the driver is unable read the ready bitmap. This enables callers to
> handle this case properly. The previous implementation assumed that
> timestamps are available if they failed to read the bitmap. This is
> problematic as it could lead to continuous software IRQ triggering if the
> PHY timestamp registers somehow become inaccessible.
>
> This change is especially important for E825-C devices, as the missing
> checks could leave a window open where a new timestamp could arrive while
> the existing timestamps aren't completed. As a result, the hardware
> threshold logic would not trigger a new interrupt. Without the check, the
> timestamp is left unhandled, and new timestamps will not cause an interrupt
> again until the timestamp is handled. Since both the interrupt check and
> the backup check in the auxiliary task do not function properly, the device
> may have Tx timestamps permanently stuck failing on a given port.
>
> The faulty checks originate from commit d938a8cca88a ("ice: Auxbus devices
> & driver for E822 TS") and commit 712e876371f8 ("ice: periodically kick Tx
> timestamp interrupt"), however at the time of the original coding, both
> functions only operated on E822 hardware. This is no longer the case, and
> hasn't been since the introduction of the ETH56G PHY model in commit
> 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
>
> Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 1 +
> drivers/net/ethernet/intel/ice/ice_ptp.c | 40 ++++------
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 117 ++++++++++++++++++++++++++++
> 3 files changed, 132 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> index 9d7acc7eb2ce..1b58b054f4a5 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> @@ -300,6 +300,7 @@ void ice_ptp_reset_ts_memory(struct ice_hw *hw);
> int ice_ptp_init_phc(struct ice_hw *hw);
> void ice_ptp_init_hw(struct ice_hw *hw);
> int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready);
> +int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw);
> int ice_ptp_one_port_cmd(struct ice_hw *hw, u8 configured_port,
> enum ice_ptp_tmr_cmd configured_cmd);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
> index ada42bcc4d0b..34906f972d17 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
> @@ -2718,7 +2718,7 @@ static bool ice_any_port_has_timestamps(struct ice_pf *pf)
> bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
> {
> struct ice_hw *hw = &pf->hw;
> - unsigned int i;
> + int ret;
>
> /* Check software indicator */
> switch (pf->ptp.tx_interrupt_mode) {
> @@ -2739,16 +2739,15 @@ bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
> }
>
> /* Check hardware indicator */
> - for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
> - u64 tstamp_ready = 0;
> - int err;
> -
> - err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
> - if (err || tstamp_ready)
> - return true;
> + ret = ice_check_phy_tx_tstamp_ready(hw);
> + if (ret < 0) {
> + dev_dbg(ice_pf_to_dev(pf), "Unable to read PHY Tx timestamp ready bitmap, err %d\n",
> + ret);
> + /* Stop triggering IRQs if we're unable to read PHY */
> + return false;
> }
>
> - return false;
> + return ret;
> }
>
> /**
> @@ -2832,8 +2831,7 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
> {
> struct device *dev = ice_pf_to_dev(pf);
> struct ice_hw *hw = &pf->hw;
> - bool trigger_oicr = false;
> - unsigned int i;
> + int ret;
>
> if (!pf->ptp.port.tx.has_ready_bitmap)
> return;
> @@ -2841,21 +2839,11 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
> if (!ice_pf_src_tmr_owned(pf))
> return;
>
> - for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
> - u64 tstamp_ready;
> - int err;
> -
> - err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
> - if (!err && tstamp_ready) {
> - trigger_oicr = true;
> - break;
> - }
> - }
> -
> - if (trigger_oicr) {
> - /* Trigger a software interrupt, to ensure this data
> - * gets processed.
> - */
> + ret = ice_check_phy_tx_tstamp_ready(hw);
> + if (ret < 0) {
> + dev_dbg(dev, "PTP periodic task unable to read PHY timestamp ready bitmap, err %d\n",
> + ret);
> + } else if (ret) {
> dev_dbg(dev, "PTP periodic task detected waiting timestamps. Triggering Tx timestamp interrupt now.\n");
>
> wr32(hw, PFINT_OICR, PFINT_OICR_TSYN_TX_M);
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> index 441b5f10e4bb..64ad5ed5c688 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> @@ -2168,6 +2168,35 @@ int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port)
> return 0;
> }
>
> +/**
> + * ice_check_phy_tx_tstamp_ready_eth56g - Check Tx memory status for all ports
> + * @hw: pointer to the HW struct
> + *
> + * Check the PHY_REG_TX_MEMORY_STATUS for all ports. A set bit indicates
> + * a waiting timestamp.
> + *
> + * Return: 1 if any port has at least one timestamp ready bit set,
> + * 0 otherwise, and a negative error code if unable to read the bitmap.
> + */
> +static int ice_check_phy_tx_tstamp_ready_eth56g(struct ice_hw *hw)
> +{
> + int port;
> +
> + for (port = 0; port < hw->ptp.num_lports; port++) {
> + u64 tstamp_ready;
> + int err;
> +
> + err = ice_get_phy_tx_tstamp_ready(hw, port, &tstamp_ready);
> + if (err)
> + return err;
> +
> + if (tstamp_ready)
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> /**
> * ice_ptp_read_tx_hwtstamp_status_eth56g - Get TX timestamp status
> * @hw: pointer to the HW struct
> @@ -4318,6 +4347,35 @@ ice_get_phy_tx_tstamp_ready_e82x(struct ice_hw *hw, u8 quad, u64 *tstamp_ready)
> return 0;
> }
>
> +/**
> + * ice_check_phy_tx_tstamp_ready_e82x - Check Tx memory status for all quads
> + * @hw: pointer to the HW struct
> + *
> + * Check the Q_REG_TX_MEMORY_STATUS for all quads. A set bit indicates
> + * a waiting timestamp.
> + *
> + * Return: 1 if any quad has at least one timestamp ready bit set,
> + * 0 otherwise, and a negative error value if unable to read the bitmap.
> + */
> +static int ice_check_phy_tx_tstamp_ready_e82x(struct ice_hw *hw)
> +{
> + int quad;
> +
> + for (quad = 0; quad < ICE_GET_QUAD_NUM(hw->ptp.num_lports); quad++) {
> + u64 tstamp_ready;
> + int err;
> +
> + err = ice_get_phy_tx_tstamp_ready(hw, quad, &tstamp_ready);
> + if (err)
> + return err;
> +
> + if (tstamp_ready)
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> /**
> * ice_phy_cfg_intr_e82x - Configure TX timestamp interrupt
> * @hw: pointer to the HW struct
> @@ -4870,6 +4928,23 @@ ice_get_phy_tx_tstamp_ready_e810(struct ice_hw *hw, u8 port, u64 *tstamp_ready)
> return 0;
> }
>
> +/**
> + * ice_check_phy_tx_tstamp_ready_e810 - Check Tx memory status register
> + * @hw: pointer to the HW struct
> + *
> + * The E810 devices do not have a Tx memory status register. Note this is
> + * intentionally different behavior from ice_get_phy_tx_tstamp_ready_e810
> + * which always says that all bits are ready. This function is called in cases
> + * where code will trigger interrupts if timestamps are waiting, and should
> + * not be called for E810 hardware.
> + *
> + * Return: 0.
> + */
> +static int ice_check_phy_tx_tstamp_ready_e810(struct ice_hw *hw)
> +{
> + return 0;
> +}
> +
> /* E810 SMA functions
> *
> * The following functions operate specifically on E810 hardware and are used
> @@ -5124,6 +5199,21 @@ static void ice_get_phy_tx_tstamp_ready_e830(const struct ice_hw *hw, u8 port,
> *tstamp_ready |= rd32(hw, E830_PRTMAC_TS_TX_MEM_VALID_L);
> }
>
> +/**
> + * ice_check_phy_tx_tstamp_ready_e830 - Check Tx memory status register
> + * @hw: pointer to the HW struct
> + *
> + * Return: 1 if the device has waiting timestamps, 0 otherwise.
> + */
> +static int ice_check_phy_tx_tstamp_ready_e830(struct ice_hw *hw)
> +{
> + u64 tstamp_ready;
> +
> + ice_get_phy_tx_tstamp_ready_e830(hw, 0, &tstamp_ready);
> +
> + return !!tstamp_ready;
> +}
> +
> /**
> * ice_ptp_init_phy_e830 - initialize PHY parameters
> * @ptp: pointer to the PTP HW struct
> @@ -5716,6 +5806,33 @@ int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready)
> }
> }
>
> +/**
> + * ice_check_phy_tx_tstamp_ready - Check PHY Tx timestamp memory status
> + * @hw: pointer to the HW struct
> + *
> + * Check the PHY for Tx timestamp memory status on all ports. If you need to
> + * see individual timestamp status for each index, use
> + * ice_get_phy_tx_tstamp_ready() instead.
> + *
> + * Return: 1 if any port has timestamps available, 0 if there are no timestamps
> + * available, and a negative error code on failure.
> + */
> +int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw)
> +{
> + switch (hw->mac_type) {
> + case ICE_MAC_E810:
> + return ice_check_phy_tx_tstamp_ready_e810(hw);
> + case ICE_MAC_E830:
> + return ice_check_phy_tx_tstamp_ready_e830(hw);
> + case ICE_MAC_GENERIC:
> + return ice_check_phy_tx_tstamp_ready_e82x(hw);
> + case ICE_MAC_GENERIC_3K_E825:
> + return ice_check_phy_tx_tstamp_ready_eth56g(hw);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> /**
> * ice_cgu_get_pin_desc_e823 - get pin description array
> * @hw: pointer to the hw struct
>
Reviewed-by: Petr Oros <poros@redhat.com>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-net 2/4] ice: perform PHY soft reset for E825C ports at initialization
From: Petr Oros @ 2026-04-10 9:33 UTC (permalink / raw)
To: Jacob Keller, Anthony Nguyen, Intel Wired LAN, netdev
Cc: Aleksandr Loktionov, Timothy Miskell
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-2-b959da91a81f@intel.com>
On 4/8/26 20:46, Jacob Keller wrote:
> From: Grzegorz Nitka <grzegorz.nitka@intel.com>
>
> In some cases the PHY timestamp block of the E825C can become stuck. This
> is known to occur if the software writes 0 to the Tx timestamp threshold,
> and with older versions of the ice driver the threshold configuration is
> buggy and can race in such that hardware briefly operates with a zero
> threshold enabled. There are no other known ways to trigger this behavior,
> but once it occurs, the hardware is not recovered by normal reset, a driver
> reload, or even a warm power cycle of the system. A cold power cycle is
> sufficient to recover hardware, but this is extremely invasive and can
> result in significant downtime on customer deployments.
>
> The PHY for each port has a timestamping block which has its own reset
> functionality accessible by programming the PHY_REG_GLOBAL register.
> Writing to the PHY_REG_GLOBAL_SOFT_RESET_BIT triggers the hardware to
> perform a complete reset of the timestamping block of the PHY. This
> includes clearing the timestamp status for the port, clearing all
> outstanding timestamps in the memory bank, and resetting the PHY timer.
>
> The new ice_ptp_phy_soft_reset_eth56g() function toggles the
> PHY_REG_GLOBAL soft reset bit with the required delays, ensuring the
> PHY is properly reinitialized without requiring a full device reset.
> The sequence clears the reset bit, asserts it, then clears it again,
> with short waits between transitions to allow hardware stabilization.
>
> Call this function in the new ice_ptp_init_phc_e825c(), implementing the
> E825C device specific variant of the ice_ptp_init_phc(). Note that if
> ice_ptp_init_phc() fails, PTP functionality may be disabled, but the driver
> will still load to allow basic functionality to continue.
>
> This causes the clock owning PF driver to perform a PHY soft reset for
> every port during initialization. This ensures the driver begins life in a
> known functional state regardless of how it was previously programmed.
>
> This ensures that we properly reconfigure the hardware after a device reset
> or when loading the driver, even if it was previously misconfigured with an
> out-of-date or modified driver.
>
> Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
> Signed-off-by: Timothy Miskell <timothy.miskell@intel.com>
> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 4 ++
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 90 ++++++++++++++++++++++++++++-
> 2 files changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> index 5896b346e579..9d7acc7eb2ce 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h
> @@ -374,6 +374,7 @@ int ice_stop_phy_timer_eth56g(struct ice_hw *hw, u8 port, bool soft_reset);
> int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port);
> int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold);
> int ice_phy_cfg_ptp_1step_eth56g(struct ice_hw *hw, u8 port);
> +int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port);
>
> #define ICE_ETH56G_NOMINAL_INCVAL 0x140000000ULL
> #define ICE_ETH56G_NOMINAL_PCS_REF_TUS 0x100000000ULL
> @@ -676,6 +677,9 @@ static inline u64 ice_get_base_incval(struct ice_hw *hw)
> #define ICE_P0_GNSS_PRSNT_N BIT(4)
>
> /* ETH56G PHY register addresses */
> +#define PHY_REG_GLOBAL 0x0
> +#define PHY_REG_GLOBAL_SOFT_RESET_M BIT(11)
> +
> /* Timestamp PHY incval registers */
> #define PHY_REG_TIMETUS_L 0x8
> #define PHY_REG_TIMETUS_U 0xC
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> index 67775beb9449..441b5f10e4bb 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> @@ -377,6 +377,31 @@ static void ice_ptp_cfg_sync_delay(const struct ice_hw *hw, u32 delay)
> * The following functions operate on devices with the ETH 56G PHY.
> */
>
> +/**
> + * ice_ptp_init_phc_e825c - Perform E825C specific PHC initialization
> + * @hw: pointer to HW struct
> + *
> + * Perform E825C-specific PTP hardware clock initialization steps.
> + *
> + * Return: 0 on success, or a negative error value on failure.
> + */
> +static int ice_ptp_init_phc_e825c(struct ice_hw *hw)
> +{
> + int err;
> +
> + /* Soft reset all ports, to ensure everything is at a clean state */
> + for (int port = 0; port < hw->ptp.num_lports; port++) {
> + err = ice_ptp_phy_soft_reset_eth56g(hw, port);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP, "Failed to soft reset port %d, err %d\n",
> + port, err);
> + return err;
> + }
> + }
> +
> + return 0;
> +}
> +
> /**
> * ice_ptp_get_dest_dev_e825 - get destination PHY for given port number
> * @hw: pointer to the HW struct
> @@ -2179,6 +2204,69 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status)
> return 0;
> }
>
> +/**
> + * ice_ptp_phy_soft_reset_eth56g - Perform a PHY soft reset on ETH56G
> + * @hw: pointer to the HW structure
> + * @port: PHY port number
> + *
> + * Trigger a soft reset of the ETH56G PHY by toggling the soft reset
> + * bit in the PHY global register. The reset sequence consists of:
> + * 1. Clearing the soft reset bit
> + * 2. Asserting the soft reset bit
> + * 3. Clearing the soft reset bit again
> + *
> + * Short delays are inserted between each step to allow the hardware
> + * to settle. This provides a controlled way to reinitialize the PHY
> + * without requiring a full device reset.
> + *
> + * Return: 0 on success, or a negative error code on failure when
> + * reading or writing the PHY register.
> + */
> +int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port)
> +{
> + u32 global_val;
> + int err;
> +
> + err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, &global_val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP, "Failed to read PHY_REG_GLOBAL for port %d, err %d\n",
> + port, err);
> + return err;
> + }
> +
> + global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M;
> + ice_debug(hw, ICE_DBG_PTP, "Clearing soft reset bit for port %d, val: 0x%x\n",
> + port, global_val);
> + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
> + port, err);
> + return err;
> + }
> +
> + usleep_range(5000, 6000);
> +
> + global_val |= PHY_REG_GLOBAL_SOFT_RESET_M;
> + ice_debug(hw, ICE_DBG_PTP, "Set soft reset bit for port %d, val: 0x%x\n",
> + port, global_val);
> + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
> + port, err);
> + return err;
> + }
> + usleep_range(5000, 6000);
> +
> + global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M;
> + ice_debug(hw, ICE_DBG_PTP, "Clear soft reset bit for port %d, val: 0x%x\n",
> + port, global_val);
> + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val);
> + if (err)
> + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n",
> + port, err);
> + return err;
> +}
> +
> /**
> * ice_get_phy_tx_tstamp_ready_eth56g - Read the Tx memory status register
> * @hw: pointer to the HW struct
> @@ -5591,7 +5679,7 @@ int ice_ptp_init_phc(struct ice_hw *hw)
> case ICE_MAC_GENERIC:
> return ice_ptp_init_phc_e82x(hw);
> case ICE_MAC_GENERIC_3K_E825:
> - return 0;
> + return ice_ptp_init_phc_e825c(hw);
> default:
> return -EOPNOTSUPP;
> }
>
Reviewed-by: Petr Oros <poros@redhat.com>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-net 1/4] ice: fix timestamp interrupt configuration for E825C
From: Petr Oros @ 2026-04-10 9:33 UTC (permalink / raw)
To: Jacob Keller, Anthony Nguyen, Intel Wired LAN, netdev
Cc: Aleksandr Loktionov, Timothy Miskell
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-1-b959da91a81f@intel.com>
On 4/8/26 20:46, Jacob Keller wrote:
> From: Grzegorz Nitka <grzegorz.nitka@intel.com>
>
> The E825C ice_phy_cfg_intr_eth56g() function is responsible for programming
> the PHY interrupt for a given port. This function writes to the
> PHY_REG_TS_INT_CONFIG register of the port. The register is responsible for
> configuring whether the port interrupt logic is enabled, as well as
> programming the threshold of waiting timestamps that will trigger an
> interrupt from this port.
>
> This threshold value must not be programmed to zero while the interrupt is
> enabled. Doing so puts the port in a misconfigured state where the PHY
> timestamp interrupt for the quad of connected ports will become stuck.
>
> This occurs, because a threshold of zero results in the timestamp interrupt
> status for the port becoming stuck high. The four ports in the connected
> quad have their timestamp status indicators muxed together. A new interrupt
> cannot be generated until the timestamp status indicators return low for
> all four ports.
>
> Normally, the timestamp status for a port will clear once there are fewer
> timestamps in that ports timestamp memory bank than the threshold. A
> threshold of zero makes this impossible, so the timestamp status for the
> port does not clear.
>
> The ice driver never intentionally programs the threshold to zero, indeed
> the driver always programs it to a value of 1, intending to get an
> interrupt immediately as soon as even a single packet is waiting for a
> timestamp.
>
> However, there is a subtle flaw in the programming logic in the
> ice_phy_cfg_intr_eth56g() function. Due to the way that the hardware
> handles enabling the PHY interrupt. If the threshold value is modified at
> the same time as the interrupt is enabled, the HW PHY state machine might
> enable the interrupt before the new threshold value is actually updated.
> This leaves a potential race condition caused by the hardware logic where
> a PHY timestamp interrupt might be triggered before the non-zero threshold
> is written, resulting in the PHY timestamp logic becoming stuck.
>
> Once the PHY timestamp status is stuck high, it will remain stuck even
> after attempting to reprogram the PHY block by changing its threshold or
> disabling the interrupt. Even a typical PF or CORE reset will not reset the
> particular block of the PHY that becomes stuck. Even a warm power cycle is
> not guaranteed to cause the PHY block to reset, and a cold power cycle is
> required.
>
> Prevent this by always writing the PHY_REG_TS_INT_CONFIG in two stages.
> First write the threshold value with the interrupt disabled, and only write
> the enable bit after the threshold has been programmed. When disabling the
> interrupt, leave the threshold unchanged. Additionally, re-read the
> register after writing it to guarantee that the write to the PHY has been
> flushed upon exit of the function.
>
> While we're modifying this function implementation, explicitly reject
> programming a threshold of 0 when enabling the interrupt. No caller does
> this today, but the consequences of doing so are significant. An explicit
> rejection in the code makes this clear.
>
> Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 36 +++++++++++++++++++++++++----
> 1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> index e3db252c3918..67775beb9449 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c
> @@ -1847,6 +1847,8 @@ static int ice_phy_cfg_mac_eth56g(struct ice_hw *hw, u8 port)
> * @ena: enable or disable interrupt
> * @threshold: interrupt threshold
> *
> + * The threshold cannot be 0 while the interrupt is enabled.
> + *
> * Configure TX timestamp interrupt for the specified port
> *
> * Return:
> @@ -1858,19 +1860,45 @@ int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold)
> int err;
> u32 val;
>
> + if (ena && !threshold)
> + return -EINVAL;
> +
> err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val);
> if (err)
> return err;
>
> + val &= ~PHY_TS_INT_CONFIG_ENA_M;
> if (ena) {
> - val |= PHY_TS_INT_CONFIG_ENA_M;
> val &= ~PHY_TS_INT_CONFIG_THRESHOLD_M;
> val |= FIELD_PREP(PHY_TS_INT_CONFIG_THRESHOLD_M, threshold);
> - } else {
> - val &= ~PHY_TS_INT_CONFIG_ENA_M;
> + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG,
> + val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP,
> + "Failed to update 'threshold' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
> + port, !!ena, threshold);
> + return err;
> + }
> + val |= PHY_TS_INT_CONFIG_ENA_M;
> }
>
> - return ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val);
> + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP,
> + "Failed to update 'ena' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
> + port, !!ena, threshold);
> + return err;
> + }
> +
> + err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val);
> + if (err) {
> + ice_debug(hw, ICE_DBG_PTP,
> + "Failed to read PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n",
> + port, !!ena, threshold);
> + return err;
> + }
> +
> + return 0;
> }
>
> /**
>
Reviewed-by: Petr Oros <poros@redhat.com>
^ permalink raw reply
* Re: [PATCH v2] selftests: vsock: avoid races creating Unix socket paths
From: Stefano Garzarella @ 2026-04-10 9:06 UTC (permalink / raw)
To: Cao Ruichuang, Bobby Eshleman
Cc: stefano.garzarella, shuah, virtualization, netdev,
linux-kselftest, linux-kernel, horms
In-Reply-To: <adi1X_3wVUsDhRmR@sgarzare-redhat>
On Fri, 10 Apr 2026 at 10:33, Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, Apr 10, 2026 at 11:52:37AM +0800, Cao Ruichuang wrote:
>
> No patch description at all?
Also, maybe better to CC Bobby.
Stefano
>
> >Signed-off-by: Cao Ruichuang <create0818@163.com>
> >---
> >v2:
> >- retitle the patch to describe the race being fixed
> >- replace rm -rf with explicit rm and rmdir cleanup
> >
> > tools/testing/selftests/vsock/vmtest.sh | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> >diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
> >index 86e338886b3..c345fa539d3 100755
> >--- a/tools/testing/selftests/vsock/vmtest.sh
> >+++ b/tools/testing/selftests/vsock/vmtest.sh
> >@@ -718,6 +718,7 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
> > local pids pid pidfile
> > local ns0 ns1 port
> > declare -a pids
> >+ local unixdir
> > local unixfile
> > ns0="global0"
> > ns1="global1"
> >@@ -736,7 +737,8 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
> > oops_before=$(vm_dmesg_oops_count "${ns0}")
> > warn_before=$(vm_dmesg_warn_count "${ns0}")
> >
> >- unixfile=$(mktemp -u /tmp/XXXX.sock)
> >+ unixdir=$(mktemp -d /tmp/vsock_vmtest_XXXXXX)
> >+ unixfile="${unixdir}/sock"
> > ip netns exec "${ns1}" \
> > socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
> > UNIX-CONNECT:"${unixfile}" &
> >@@ -758,6 +760,8 @@ test_ns_diff_global_host_connect_to_global_vm_ok() {
> >
> > terminate_pids "${pids[@]}"
> > terminate_pidfiles "${pidfile}"
> >+ rm "${unixfile}"
> >+ rmdir "${unixdir}"
> >
> > if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
> > return "${KSFT_FAIL}"
> >@@ -814,6 +818,7 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
> > local ns0="global0"
> > local ns1="global1"
> > local port=12345
> >+ local unixdir
> > local unixfile
> > local dmesg_rc
> > local pidfile
> >@@ -826,7 +831,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
> >
> > log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
> >
> >- unixfile=$(mktemp -u /tmp/XXXX.sock)
> >+ unixdir=$(mktemp -d /tmp/vsock_vmtest_XXXXXX)
> >+ unixfile="${unixdir}/sock"
> >
> > ip netns exec "${ns0}" \
> > socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
> >@@ -845,7 +851,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
> > if ! vm_start "${pidfile}" "${ns0}"; then
> > log_host "failed to start vm (cid=${cid}, ns=${ns0})"
> > terminate_pids "${pids[@]}"
> >- rm -f "${unixfile}"
> >+ rm "${unixfile}"
> >+ rmdir "${unixdir}"
> > return "${KSFT_FAIL}"
> > fi
> >
> >@@ -862,7 +869,8 @@ test_ns_diff_global_vm_connect_to_global_host_ok() {
> >
> > terminate_pidfiles "${pidfile}"
> > terminate_pids "${pids[@]}"
> >- rm -f "${unixfile}"
> >+ rm "${unixfile}"
> >+ rmdir "${unixdir}"
> >
> > if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
> > return "${KSFT_FAIL}"
> >--
> >2.39.5 (Apple Git-154)
> >
> >
^ permalink raw reply
* [PATCH net-next] net: lan743x: rename chip_rev to fpga_rev
From: Thangaraj Samynathan @ 2026-04-10 8:57 UTC (permalink / raw)
To: bryan.whitehead
Cc: UNGLinuxDriver, andrew+netdev, davem, edumazet, kuba, pabeni,
netdev, linux-kernel
The variable chip_rev stores the value read from the FPGA_REV
register and represents the FPGA revision. Rename it to fpga_rev
to better reflect its meaning.
No functional change intended.
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
---
drivers/net/ethernet/microchip/lan743x_main.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index b4cabde6625a..f3332417162e 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -36,7 +36,7 @@ static bool pci11x1x_is_a0(struct lan743x_adapter *adapter)
static void pci11x1x_strap_get_status(struct lan743x_adapter *adapter)
{
- u32 chip_rev;
+ u32 fpga_rev;
u32 cfg_load;
u32 hw_cfg;
u32 strap;
@@ -63,9 +63,9 @@ static void pci11x1x_strap_get_status(struct lan743x_adapter *adapter)
else
adapter->is_sgmii_en = false;
} else {
- chip_rev = lan743x_csr_read(adapter, FPGA_REV);
- if (chip_rev) {
- if (chip_rev & FPGA_SGMII_OP)
+ fpga_rev = lan743x_csr_read(adapter, FPGA_REV);
+ if (fpga_rev) {
+ if (fpga_rev & FPGA_SGMII_OP)
adapter->is_sgmii_en = true;
else
adapter->is_sgmii_en = false;
--
2.34.1
^ permalink raw reply related
* Re: [PATCH net v2] net: fix __this_cpu_add() in preemptible code in dev_xmit_recursion_inc/dec
From: Eric Dumazet @ 2026-04-10 8:55 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
Simon Horman, Weiming Shi, linux-kernel
In-Reply-To: <20260410020631.191786-1-jiayuan.chen@linux.dev>
On Thu, Apr 9, 2026 at 7:06 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> dev_xmit_recursion_{inc,dec}() use __this_cpu_{inc,dec}() which requires
> the caller to be non-preemptible in order to avoid cpu migration. However,
> some callers like SCTP's UDP encapsulation path invoke iptunnel_xmit()
> from process context without disabling BH or preemption:
>
> sctp_inet_connect -> __sctp_connect -> sctp_do_sm ->
> sctp_outq_flush -> sctp_packet_transmit -> sctp_v4_xmit ->
> udp_tunnel_xmit_skb -> iptunnel_xmit -> dev_xmit_recursion_inc
>
> This triggers the following warning on PREEMPT(full) kernels:
>
>
> All other callers of dev_xmit_recursion_{inc,dec}() are fine: those in
> net/core/dev.c and net/core/filter.c run under local_bh_disable(), and
> lwtunnel_input() asserts in_softirq() context. Currently only
> iptunnel_xmit() and ip6tunnel_xmit() can be reached from process
> context via the SCTP UDP encapsulation path.
>
> Fix this by adding guard(migrate)() at the top of iptunnel_xmit() and
> ip6tunnel_xmit() to ensure dev_recursion_level(), dev_xmit_recursion_inc()
> and dev_xmit_recursion_dec() all run on the same CPU.
>
> Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks!
^ permalink raw reply
* Re: [PATCH net 1/1] af_unix: Hold receive queue lock in ioctl(SIOCATMARK)
From: Eric Dumazet @ 2026-04-10 8:48 UTC (permalink / raw)
To: Ren Wei
Cc: netdev, kuniyu, davem, kuba, pabeni, horms, rao.shoaib, yifanwucs,
tomapufckgml, yuantan098, bird, enjou1224z, wangjiexun2025
In-Reply-To: <f6cbbc8da90e95584847b5ceb60aae830d1631c2.1775731983.git.wangjiexun2025@gmail.com>
On Thu, Apr 9, 2026 at 11:32 PM Ren Wei <n05ec@lzu.edu.cn> wrote:
>
> From: Jiexun Wang <wangjiexun2025@gmail.com>
>
> unix_ioctl() peeks at the receive queue and may check both the head skb
> and its successor while deciding whether SIOCATMARK should report the
> mark. However, u->iolock does not stabilize receive-queue element
> lifetime. Queue teardown paths can purge or splice the queue under
> sk->sk_receive_queue.lock and free the skb while unix_ioctl() still
> uses it.
>
> Take sk->sk_receive_queue.lock while inspecting the queue so the skb
> and next_skb stay alive for the whole decision.
>
> Fixes: 314001f0bf92 ("af_unix: Add OOB support")
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Tested-by: Ren Wei <enjou1224z@gmail.com>
> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
> net/unix/af_unix.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index b23c33df8b46..54f12d5cda37 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -3301,6 +3301,8 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
> int answ = 0;
>
> mutex_lock(&u->iolock);
> + /* The receive queue lock keeps skb and next_skb alive. */
> + spin_lock(&sk->sk_receive_queue.lock);
>
> skb = skb_peek(&sk->sk_receive_queue);
> if (skb) {
> @@ -3315,6 +3317,7 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
> answ = 1;
> }
>
> + spin_unlock(&sk->sk_receive_queue.lock);
> mutex_unlock(&u->iolock);
>
> err = put_user(answ, (int __user *)arg);
Ok, but I think we also should remove the READ_ONCE() now we have
proper locking.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5d4baac499af8d0a9cc36f652acf129a2e3619b2..1c3b80b25ef8b37d8e6553066fb7da20d81294a5
100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3301,10 +3301,12 @@ static int unix_ioctl(struct socket *sock,
unsigned int cmd, unsigned long arg)
int answ = 0;
mutex_lock(&u->iolock);
+ /* The receive queue lock keeps skb and
next_skb alive. */
+ spin_lock(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
if (skb) {
- struct sk_buff *oob_skb = READ_ONCE(u->oob_skb);
+ struct sk_buff *oob_skb = u->oob_skb;
struct sk_buff *next_skb;
next_skb = skb_peek_next(skb,
&sk->sk_receive_queue);
@@ -3314,7 +3316,7 @@ static int unix_ioctl(struct socket *sock,
unsigned int cmd, unsigned long arg)
(!oob_skb || next_skb == oob_skb)))
answ = 1;
}
-
+ spin_unlock(&sk->sk_receive_queue.lock);
mutex_unlock(&u->iolock);
err = put_user(answ, (int __user *)arg);
^ permalink raw reply
* Re: [PATCH RFC net-next 02/10] net: stmmac: rename dev_id to userver
From: Russell King (Oracle) @ 2026-04-10 8:39 UTC (permalink / raw)
To: Jitendra Vegiraju
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, Chen-Yu Tsai,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
linux-stm32, linux-sunxi, netdev, Paolo Abeni, Samuel Holland
In-Reply-To: <CAMdnO-+TK65AxjTsDd017Mhop+VC3Xf8jtfaTXYpE6wBNZOt4g@mail.gmail.com>
On Thu, Apr 09, 2026 at 04:07:42PM -0700, Jitendra Vegiraju wrote:
> Hi Russell,
>
> On Wed, Apr 8, 2026 at 2:27 AM Russell King (Oracle)
> <rmk+kernel@armlinux.org.uk> wrote:
> >
> > The Synopsys Databook and several implementation TRMs identify bits
> > 15:8 of the version register in dwmac v3.xx and v4.xx as "userver".
> > We even print its value with "User ID". Rather than using "dev_id",
> > use "userver" instead.
> >
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > drivers/net/ethernet/stmicro/stmmac/hwif.c | 18 +++++++++---------
> > 1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
> > index 3774af66db48..830ff816ab4f 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
> > @@ -15,7 +15,7 @@
> >
> > struct stmmac_version {
> > u8 snpsver;
> > - u8 dev_id;
> > + u8 userver;
> > };
> From the XGMAC databook that I have access to bits(15:8) identify the
> DEVID field of MAC_version register.
> The userver field is from bits(23:16) of the same register. This is a
> customer defined field (configured with coreConsultant).
> Currently stmmac doesn't care about bits(23:16).
Thanks for the additional information.
I don't have any XGMAC documentation, but this indicates that it differs
between XGMAC and previous cores - GMAC and GMAC4 cores, 15:8 are
documented as userver, and 31:16 are marked as reserved.
Note that the dev_info() also prints 15:8 as "User ID" not "Device ID".
To confirm, is the XGMAC version register at offset 0x20 ? Later GMAC
cores moved it to 0x110.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox