Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] net: wan: fsl_ucc_hdlc: free tx_skbuff in uhdlc_memclean
From: Jakub Kicinski @ 2026-05-05 23:47 UTC (permalink / raw)
  To: Holger Brunck
  Cc: Christophe Leroy (CS GROUP), netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, andrew+netdev@lunn.ch,
	qiang.zhao@nxp.com, horms@kernel.org
In-Reply-To: <AM0PR06MB103966EF99B706158C49CB3EAF73E2@AM0PR06MB10396.eurprd06.prod.outlook.com>

On Tue, 5 May 2026 08:33:34 +0000 Holger Brunck wrote:
> > I don't think you can just kfree() an skb like this.
> > 
> > I think you have to call dev_kfree_skb_any() instead.
> 
> yes you are right or at least dev_kfree_skb() as the error handling code in
> ucc_hdlc_tx does.

Please make it clear in the commit message how you discovered
the issue and how you tested your patches.

^ permalink raw reply

* [PATCH net v1] net/rds: reset op_nents when zerocopy page pin fails
From: Allison Henderson @ 2026-05-05 23:43 UTC (permalink / raw)
  To: netdev, pabeni, edumazet, kuba, horms, linux-rdma, achender

When iov_iter_get_pages2() fails in rds_message_zcopy_from_user(),
the pinned pages are released with put_page(), and
rm->data.op_mmp_znotifier is cleared.  But we fail to properly
clear rm->data.op_nents.

Later when rds_message_purge() is called from rds_sendmsg() the
cleanup loop iterates over the incorrectly non zero number of
op_nents and frees them again.

Fix this by properly resetting op_nents when it should be in
rds_message_zcopy_from_user().

Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
Signed-off-by: Allison Henderson <achender@kernel.org>
---
 net/rds/message.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rds/message.c b/net/rds/message.c
index 25fedcb3cd00..7feb0eb6537d 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -448,6 +448,7 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *

 			for (i = 0; i < rm->data.op_nents; i++)
 				put_page(sg_page(&rm->data.op_sg[i]));
+			rm->data.op_nents = 0;
 			mmp = &rm->data.op_mmp_znotifier->z_mmp;
 			mm_unaccount_pinned_pages(mmp);
 			ret = -EFAULT;
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH net-next 2/5] ionic: Report "link_down_events_phy" in ethtool statistics
From: Jakub Kicinski @ 2026-05-05 23:21 UTC (permalink / raw)
  To: Eric Joyner
  Cc: netdev, Brett Creeley, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni
In-Reply-To: <a1a9ebf6-fde7-40bf-ada4-7b032a5fdafb@amd.com>

On Tue, 5 May 2026 12:53:26 -0700 Eric Joyner wrote:
> > We have a standard stat for this:
> > 
> > struct ethtool_link_ext_stats {
> >         /* Custom Linux statistic for PHY level link down events.
> >          * In a simpler world it should be equal to netdev->carrier_down_count
> >          * unfortunately netdev also counts local reconfigurations which don't
> >          * actually take the physical link down, not to mention NC-SI which,
> >          * if present, keeps the link up regardless of host state.
> >          * This statistic counts when PHY _actually_ went down, or lost link.
> >          *
> >          * Note that we need u64 for ethtool_stats_init() and comparisons
> >          * to ETHTOOL_STAT_NOT_SET, but only u32 is exposed to the user.
> >          */
> >         u64 link_down_events;
> > };
> > 
> > 
> > IOW the definition of this stat is - ignoring asymetric link faults
> > this counter should match between link partners.  
> 
> So, this is a little awkward to talk about -- we are already filling out that
> field with a stat that's calculated by the driver; there's a task that monitors
> link transitions and increments it.
> 
> But, I'm not sure if that method exists because of older firmwares or cards not
> tracking the link_down_event count for us. So, I didn't want to just overwrite
> this stat with the value from firmware because then we'd lose the count on cards
> running firmware that doesn't do that counting for us.
> 
> So that's one reason why I put the stat in the generic ethtool stats output, and
> gave it the slightly different name "link_down_events_phy" to distinguish it
> (this also matches Mellanox's name for consistency) from this stat in the struct
> you posted. Though reading the doc comment you posted, this new stat really does
> belong there.

If the stat is currently filled in with something that does not work as
documented it should be fine to drop it. The counter is optional and
relatively recent. I think it's unlikely that any monitoring will have
a hard dependency on it.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] net: bcmasp: Divide init to allow partial bring up
From: Florian Fainelli @ 2026-05-05 23:18 UTC (permalink / raw)
  To: Justin Chen, netdev
  Cc: bcm-kernel-feedback-list, pabeni, kuba, edumazet, davem,
	andrew+netdev
In-Reply-To: <20260501185625.422361-2-justin.chen@broadcom.com>



On 5/1/2026 11:56 AM, Justin Chen wrote:
> To prepare for a partial bring up of the interface during resume,
> we break apart the bcmasp_netif_init() function into smaller chunks
> that can be called as necessary. Also consolidate some functions that
> do not need to be standalone.
> 
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
-- 
Florian


^ permalink raw reply

* Re: [PATCH] crypto: af_alg - Document the deprecation of AF_ALG
From: Andy Lutomirski @ 2026-05-05 23:17 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-doc, linux-api, linux-kernel,
	netdev, Linus Torvalds
In-Reply-To: <20260430011544.31823-1-ebiggers@kernel.org>

> On Apr 29, 2026, at 6:19 PM, Eric Biggers <ebiggers@kernel.org> wrote:
>
> AF_ALG is almost completely unnecessary, and it exposes a massive attack
> surface that hasn't been standing up to modern vulnerability discovery
> tools.  The latest one even has its own website, providing a small
> Python script that reliably roots most Linux distros: https://copy.fail/

How about adding a configuration option, defaulted on, that requires
capable(CAP_SYS_ADMIN) to create the socket (and maybe also to bind /
connect it).  And a sysctl to allow the administrator to override this
in the unlikely event that it’s needed.

IIRC cryptsetup used to and maybe even still does require these
sockets sometimes and this would let it keep working.  And there's all
the FIPS stuff downthread.


>
> This isn't sustainable, especially as LLMs have accelerated the rate the
> vulnerabilities are coming in.  The effort that is being put into this
> thing is vastly disproportional to the few programs that actually use
> it, and those programs would be better served by userspace code anyway.
>
> These issues have been noted in many mailing list discussions already.
> But until now they haven't been reflected in the documentation or
> kconfig menu itself, and the vulnerabilities are still coming in.
>
> Let's go ahead and document the deprecation.
>
> This isn't intended to change anything overnight.  After all, most Linux
> distros won't be able to disable the kconfig options quite yet, mainly
> because of iwd.  But this should create a bit more impetus for these
> userspace programs to be fixed, and the documentation update should also
> help prevent more users from appearing.
>
> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
> ---
>
> This patch is targeting crypto/master
>
> Documentation/crypto/userspace-if.rst | 82 ++++++++++++++++++++-------
> crypto/Kconfig                        | 69 ++++++++++++++++------
> 2 files changed, 113 insertions(+), 38 deletions(-)
>
> diff --git a/Documentation/crypto/userspace-if.rst b/Documentation/crypto/userspace-if.rst
> index 021759198fe7..c39f5c79a5b7 100644
> --- a/Documentation/crypto/userspace-if.rst
> +++ b/Documentation/crypto/userspace-if.rst
> @@ -2,30 +2,72 @@ User Space Interface
> ====================
>
> Introduction
> ------------
>
> -The concepts of the kernel crypto API visible to kernel space is fully
> -applicable to the user space interface as well. Therefore, the kernel
> -crypto API high level discussion for the in-kernel use cases applies
> -here as well.
> -
> -The major difference, however, is that user space can only act as a
> -consumer and never as a provider of a transformation or cipher
> -algorithm.
> -
> -The following covers the user space interface exported by the kernel
> -crypto API. A working example of this description is libkcapi that can
> -be obtained from [1]. That library can be used by user space
> -applications that require cryptographic services from the kernel.
> -
> -Some details of the in-kernel kernel crypto API aspects do not apply to
> -user space, however. This includes the difference between synchronous
> -and asynchronous invocations. The user space API call is fully
> -synchronous.
> -
> -[1] https://www.chronox.de/libkcapi/index.html
> +AF_ALG provides unprivileged userspace programs access to arbitrary hash,
> +symmetric cipher, AEAD, and RNG algorithms that are implemented in kernel-mode
> +code.
> +
> +AF_ALG is insecure and is deprecated. Originally added to the kernel in 2010,
> +most kernel developers now consider it to be a mistake.
> +
> +AF_ALG continues to be supported only for backwards compatibility. On systems
> +where no programs using AF_ALG remain, the support for it should be disabled by
> +disabling ``CONFIG_CRYPTO_USER_API_*``.
> +
> +Deprecation
> +-----------
> +
> +AF_ALG was originally intended to provide userspace programs access to crypto
> +accelerators that they wouldn't otherwise have access to.
> +
> +However, that capability turned out to not be useful on very many systems. More
> +significantly, the actual implementation exposes a vastly greater amount of
> +functionality than that. It actually provides access to all software algorithms.
> +
> +This includes arbitrary compositions of different algorithms created via a
> +complex template system, as well as algorithms that only make sense as internal
> +implementation details of other algorithms. It also includes full zero-copy
> +support, which is difficult for the kernel to implement securely.
> +
> +Ultimately, these algorithms are just math computations. They use the same
> +instructions that userspace programs already have access to, just accessed in a
> +much more convoluted and less efficient way.
> +
> +Indeed, userspace code is nearly always what is being used anyway. These same
> +algorithms are widely implemented in userspace crypto libraries.
> +
> +Meanwhile, AF_ALG hasn't been withstanding modern vulnerability discovery tools
> +such as syzbot and large language models. It receives a steady stream of CVEs.
> +Some of the examples include:
> +
> +- CVE-2026-31677
> +- CVE-2026-31431 (https://copy.fail)
> +- CVE-2025-38079
> +- CVE-2025-37808
> +- CVE-2024-26824
> +- CVE-2022-48781
> +- CVE-2019-8912
> +- CVE-2018-14619
> +- CVE-2017-18075
> +- CVE-2017-17806
> +- CVE-2017-17805
> +- CVE-2016-10147
> +- CVE-2015-8970
> +- CVE-2015-3331
> +- CVE-2014-9644
> +- CVE-2013-7421
> +- CVE-2011-4081
> +
> +It is recommended that, whenever possible, userspace programs be migrated to
> +userspace crypto code (which again, is what is normally used anyway) and
> +``CONFIG_CRYPTO_USER_API_*`` be disabled.  On systems that use SELinux, SELinux
> +can also be used to restrict the use of AF_ALG to trusted programs.
> +
> +The remainder of this documentation provides the historical documentation for
> +the deprecated AF_ALG interface.
>
> User Space API General Remarks
> ------------------------------
>
> The kernel crypto API is accessible from user space. Currently, the
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 103d1f58cb7c..6cd1c478d4be 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -1278,48 +1278,72 @@ config CRYPTO_DF80090A
>    tristate
>    select CRYPTO_AES
>    select CRYPTO_CTR
>
> endmenu
> -menu "Userspace interface"
> +menu "Userspace interface (deprecated)"
>
> config CRYPTO_USER_API
>    tristate
>
> config CRYPTO_USER_API_HASH
> -    tristate "Hash algorithms"
> +    tristate "Hash algorithms (deprecated)"
>    depends on NET
>    select CRYPTO_HASH
>    select CRYPTO_USER_API
>    help
> -      Enable the userspace interface for hash algorithms.
> +      Enable the AF_ALG userspace interface for hash algorithms.  This
> +      provides unprivileged userspace programs access to arbitrary hash
> +      algorithms implemented in the kernel's privileged execution context.
>
> -      See Documentation/crypto/userspace-if.rst and
> -      https://www.chronox.de/libkcapi/html/index.html
> +      This interface is deprecated and is supported only for backwards
> +      compatibility.  It regularly has vulnerabilities, and the capabilities
> +      it provides are redundant with userspace crypto libraries.
> +
> +      Enable this only if needed for support for a program that hasn't yet
> +      been converted to userspace crypto, for example iwd.
> +
> +      See also Documentation/crypto/userspace-if.rst
>
> config CRYPTO_USER_API_SKCIPHER
> -    tristate "Symmetric key cipher algorithms"
> +    tristate "Symmetric key cipher algorithms (deprecated)"
>    depends on NET
>    select CRYPTO_SKCIPHER
>    select CRYPTO_USER_API
>    help
> -      Enable the userspace interface for symmetric key cipher algorithms.
> +      Enable the AF_ALG userspace interface for symmetric key algorithms.
> +      This provides unprivileged userspace programs access to arbitrary
> +      symmetric key algorithms implemented in the kernel's privileged
> +      execution context.
> +
> +      This interface is deprecated and is supported only for backwards
> +      compatibility.  It regularly has vulnerabilities, and the capabilities
> +      it provides are redundant with userspace crypto libraries.
> +
> +      Enable this only if needed for support for a program that hasn't yet
> +      been converted to userspace crypto, for example iwd, or cryptsetup
> +      with certain algorithms.
>
> -      See Documentation/crypto/userspace-if.rst and
> -      https://www.chronox.de/libkcapi/html/index.html
> +      See also Documentation/crypto/userspace-if.rst
>
> config CRYPTO_USER_API_RNG
> -    tristate "RNG (random number generator) algorithms"
> +    tristate "Random number generation algorithms (deprecated)"
>    depends on NET
>    select CRYPTO_RNG
>    select CRYPTO_USER_API
>    help
> -      Enable the userspace interface for RNG (random number generator)
> -      algorithms.
> +      Enable the AF_ALG userspace interface for random number generation
> +      (RNG) algorithms.  This provides unprivileged userspace programs
> +      access to arbitrary RNG algorithms implemented in the kernel's
> +      privileged execution context.
>
> -      See Documentation/crypto/userspace-if.rst and
> -      https://www.chronox.de/libkcapi/html/index.html
> +      This interface is deprecated and is supported only for backwards
> +      compatibility.  It regularly has vulnerabilities, and the capabilities
> +      it provides are redundant with userspace crypto libraries as well as
> +      the normal kernel RNG (e.g., /dev/urandom and getrandom(2)).
> +
> +      See also Documentation/crypto/userspace-if.rst
>
> config CRYPTO_USER_API_RNG_CAVP
>    bool "Enable CAVP testing of DRBG"
>    depends on CRYPTO_USER_API_RNG && CRYPTO_DRBG
>    help
> @@ -1330,20 +1354,29 @@ config CRYPTO_USER_API_RNG_CAVP
>
>      This should only be enabled for CAVP testing. You should say
>      no unless you know what this is.
>
> config CRYPTO_USER_API_AEAD
> -    tristate "AEAD cipher algorithms"
> +    tristate "AEAD cipher algorithms (deprecated)"
>    depends on NET
>    select CRYPTO_AEAD
>    select CRYPTO_SKCIPHER
>    select CRYPTO_USER_API
>    help
> -      Enable the userspace interface for AEAD cipher algorithms.
> +      Enable the AF_ALG userspace interface for authenticated encryption
> +      with associated data (AEAD) algorithms.  This provides unprivileged
> +      userspace programs access to arbitrary AEAD algorithms implemented in
> +      the kernel's privileged execution context.
> +
> +      This interface is deprecated and is supported only for backwards
> +      compatibility.  It regularly has vulnerabilities, and the capabilities
> +      it provides are redundant with userspace crypto libraries.
> +
> +      Enable this only if needed for support for a program that hasn't yet
> +      been converted to userspace crypto, for example iwd.
>
> -      See Documentation/crypto/userspace-if.rst and
> -      https://www.chronox.de/libkcapi/html/index.html
> +      See also Documentation/crypto/userspace-if.rst
>
> config CRYPTO_USER_API_ENABLE_OBSOLETE
>    bool "Obsolete cryptographic algorithms"
>    depends on CRYPTO_USER_API
>    default y
>
> base-commit: 57b8e2d666a31fa201432d58f5fe3469a0dd83ba
> --
> 2.54.0
>
>

^ permalink raw reply

* Re: [PATCH] llc: fix coding style and memory barrier documentation
From: Jakub Kicinski @ 2026-05-05 23:16 UTC (permalink / raw)
  To: Lucas Poupeau; +Cc: davem, edumazet, pabeni, horms, netdev, linux-kernel
In-Reply-To: <20260505192549.74382-1-lucasp.linux@gmail.com>

On Tue,  5 May 2026 21:25:49 +0200 Lucas Poupeau wrote:
> Fix multiple style issues reported by checkpatch.pl to improve code
> maintainability and compliance with kernel standards.

Quoting documentation:

  Clean-up patches
  ~~~~~~~~~~~~~~~~
  
  Netdev discourages patches which perform simple clean-ups, which are not in
  the context of other work. For example:
  
  * Addressing ``checkpatch.pl``, and other trivial coding style warnings
  * Addressing :ref:`Local variable ordering<rcs>` issues
  * Conversions to device-managed APIs (``devm_`` helpers)
  
  This is because it is felt that the churn that such changes produce comes
  at a greater cost than the value of such clean-ups.
  
  Conversely, spelling and grammar fixes are not discouraged.
  
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#clean-up-patches

^ permalink raw reply

* [ANN] netdev foundation TSC meeting notes - May 5th
From: Jakub Kicinski @ 2026-05-05 23:07 UTC (permalink / raw)
  To: netdev

Present: Eric, Jakub, Johannes, Kuniyuki, Simon, Willem, Paolo

Logo funding approved: aim to use at Netdev 0x1A

Academic Research program
 - Academic Research Grant Program proposal drafted and shared with the group
 - We should not expect usable patches to be produced
 - Plan to set general guidance - Netdev focus - and some example topics

Netconf
 - On the first day of Netdev 0x1A. 13th July
 - No CFP due to shorter length and collocation with a relevant conference
 - Invitations to be sent within the next few weeks

AI Review
 - Claude-shiko running, and finding slightly different issues, does
   not report preexisting issues
 - Hoping to integrate codex once broadly available on Bedrock, as well
 - Once we have Claude + Codex the accuracy should be high enough to
   start emailing the reviews out automatically (and update the netdev
   "FAQ" as appropriate)

^ permalink raw reply

* Re: [PATCH batadv 0/8] batman-adv: follow up fixes
From: Yuan Tan @ 2026-05-05 23:02 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: Jakub Kicinski, Konstantin Ryabitsev, Paolo Abeni, netdev,
	linux-kernel, Ao Zhou, Haoze Xie, Jiexun Wang, Juefei Pu,
	Luxing Yin, Ruide Cao, Xin Liu, Yifan Wu, Joe Perches
In-Reply-To: <2313096.ZfL8zNpBrT@ripper>

On Tue, May 5, 2026 at 12:20 AM Sven Eckelmann <sven@narfation.org> wrote:
>
> On Tuesday, 5 May 2026 07:21:13 CEST Matthieu Baerts wrote:
> [...]
> > >>> Are you CCing netdev to get this reviewed by Sashiko?
> > >>> Please don't..
> > >>> We delegate code to sub-sub-systems to lower the patch volume :(
> > >>>
> > >>
> > >> Because of `b4 prep --auto-to-cc`. Will now manually remove you.
> > >
> > > To speed up the discussion: @Konstantin, is there a way in b4 to say "stop at
> > > the sub-sub-systems" when doing `b4 prep --auto-to-cc`? I am just trying to get the
> > > `b4` workflow somehow working with the netdev requirements.
> >
> > Maybe a new option could be added, but that seems difficult to guess
> > where to stop, and to which subsystems to apply this.
> >
> > Can you not simply omit using `b4 prep --auto-to-cc` when working
> > with "internal" patches?
>
> Yes, no, maybe :)
> I will for the moment ignore the .b4-config part and talk about it at the end
> of the mail.
>
> b4 is trying to (afaik) to have a good common work flow for kernel related
> projects (and more). Independent of my role (if I am the maintainer or just
> another contributor), it will nag before a send: "Hey, please run
> --auto-to-cc, --check, --check-deps before you submit this patch(set) - you
> know how embarrassing it is when you notice some obvious problem 2 seconds
> after the SMTP server accepted your mail."
>
> And I agree with this and also try to convince people to try b4 because I
> think it is really helpful. Or at least ask them to use
> `./scripts/get_maintainer.pl` and NOT send patches with the prefix "net" or
> "net-next" when it actually targets our tree. But as it turns out, these
> recommendation seem to have been wrong and I am sorry about this.
>
> And I know, b4 is a good tool but adding a bazillion options just for every
> special case doesn't make a lot of sense and might make it a worse tool. I was
> therefore more thinking about `scripts/get_maintainer.pl`  (see `b4.send-auto-
> cc-cmd`) which also called by b4 with various options to avoid adding too many
> people.
>
> I don't say that any of these tools need to change. I am guessing more that I
> have to adjust something (MAINTAINERS, ...) to avoid that people are sending
> batman-adv sub-sub-system patches directly to netdev. I am just not aware of
> what this should be. But it sounds to me like there is at least a need for it
> (from the netdev maintainers perspective).
>
> > On my side, that's what I'm doing. I added a .b4-config file with this
> > content, not to have to specify --set-prefix nor --to:
>
> Regarding the .b4-config - yes, this is helpful and I should add it to
> batctl.git. I was more thinking about the normal contributor to
> net/batman-adv/. Regardless of this person taking as base net/net-next.git or
> our repo.
>
> The fixes from Ren Wei (and associates) and some other people were sent with
> "net" in the prefix, were Cc'ing netdev and didn't seem to use our tree as
> base. This is of course not correct and they should have targeted our tree
> instead. I didn't complain because the fix was otherwise extremely helpful and
> I though that there was no harm done. As it looks now, I should have and I am
> sorry for not communicating this.
>
> And I am at the moment not sure how to fix this without overloading
> contributers with "when you are contributing to some sub-subsystem of netdev
> ..., but when you are contributing to ext4, other rules apply .... don't
> forget about i2c rules for patch submission, ...".
>
> But maybe I am just ignorant and this is already quite simple (and there are
> no special "netdev" rules) - I am just not aware of it. In this case, please
> point me in the right direction, just to avoid reproducing wrong
> recommendations to other people.
>
> Regards,
>         Sven

Hi Sven,

I apologize for the inconvenience this caused. This was my oversight;
I’m embarrassed to admit that I was unaware that batman-adv used its
own maintenance tree and had instructed my team to follow the standard
netdev process.

It is our responsibility as contributors to carefully follow rules. I
will update our internal guides immediately to ensure our future
patches adhere to the correct workflow.

Best,
Yuan

^ permalink raw reply

* [PATCH] igb: Remove redundant extern declarations
From: Md Shofiqul Islam @ 2026-05-05 23:01 UTC (permalink / raw)
  To: netdev
  Cc: anthony.l.nguyen, przemyslaw.kitszel, linux-kernel,
	Md Shofiqul Islam

igb_open() and igb_close() are already declared in igb.h which
is included by igb_main.c. Remove the duplicate extern declarations
from igb_main.c to avoid redundancy and fix checkpatch warnings.
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ce91dda00..e2691b5c0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -108,8 +108,6 @@ static void igb_free_all_rx_resources(struct igb_adapter *);
 static void igb_setup_mrqc(struct igb_adapter *);
 static void igb_init_queue_configuration(struct igb_adapter *adapter);
 static int igb_sw_init(struct igb_adapter *);
-int igb_open(struct net_device *);
-int igb_close(struct net_device *);
 static void igb_configure(struct igb_adapter *);
 static void igb_configure_tx(struct igb_adapter *);
 static void igb_configure_rx(struct igb_adapter *);
-- 
2.51.1


^ permalink raw reply related

* Re: [PATCH net-next v5 2/2] net: ti: icssg-prueth: Add ethtool ops for Frame Preemption MAC Merge
From: Jakub Kicinski @ 2026-05-05 23:01 UTC (permalink / raw)
  To: m-malladi
  Cc: Jakub Kicinski, vadim.fedorenko, haokexin, jacob.e.keller, horms,
	arnd, parvathi, afd, basharath, vladimir.oltean, rogerq,
	danishanwar, pabeni, edumazet, davem, andrew+netdev,
	linux-arm-kernel, netdev, linux-kernel, srk, vigneshr
In-Reply-To: <20260430111723.497113-3-m-malladi@ti.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: ti: icssg-prueth: Add ethtool ops for Frame Preemption MAC Merge

This patch adds driver support for viewing and changing the MAC Merge
sublayer parameters via the .get_mm(), .set_mm() and .get_mm_stats()
ethtool ops, and extends emac_get_stat_by_name() to return u64.

> diff --git a/drivers/net/ethernet/ti/icssg/icssg_ethtool.c b/drivers/net/ethernet/ti/icssg/icssg_ethtool.c
> index b715af21d23a..6a08c40adde5 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_ethtool.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_ethtool.c

[ ... ]

> @@ -294,6 +293,109 @@ static int emac_set_per_queue_coalesce(struct net_device *ndev, u32 queue,
>  	return 0;
>  }
>  
> +static int emac_get_mm(struct net_device *ndev, struct ethtool_mm_state *state)
> +{
> +	struct prueth_emac *emac = netdev_priv(ndev);
> +	struct prueth_qos_iet *iet = &emac->qos.iet;
> +	enum icssg_ietfpe_verify_states verify_status;
> +
> +	if (emac->is_sr1)
> +		return -EOPNOTSUPP;
> +
> +	mutex_lock(&iet->fpe_lock);
> +	state->tx_enabled = iet->fpe_enabled;
> +	state->tx_min_frag_size = iet->tx_min_frag_size - ETH_FCS_LEN;

Can emac_get_mm() return sensible values on first call?

Looking at icssg_qos_init() in icssg_qos.c:

    void icssg_qos_init(struct net_device *ndev)
    {
        struct prueth_emac *emac = netdev_priv(ndev);
        struct prueth_qos_iet *iet = &emac->qos.iet;

        iet->emac = emac;
        mutex_init(&iet->fpe_lock);
    }

Only emac and the mutex are initialized, so iet->tx_min_frag_size and
iet->verify_time_ms remain zero (netdev_priv() is zero-initialized).

On first invocation state->tx_min_frag_size becomes 0 - ETH_FCS_LEN,
which wraps to 0xFFFFFFFC and is then reported to userspace via
ETHTOOL_A_MM_TX_MIN_FRAG_SIZE.

There is a follow-on effect in ethnl_set_mm() (net/ethtool/mm.c) which
calls get_mm first and copies tx_min_frag_size and verify_time into
cfg via mm_state_to_cfg().  If the user does not pass
ETHTOOL_A_MM_TX_MIN_FRAG_SIZE, cfg.tx_min_frag_size stays at
0xFFFFFFFC.  emac_set_mm() then hands that value to
icssg_qos_validate_tx_min_frag_size(), where:

    min_frag_size += ETH_FCS_LEN;   /* 0xFFFFFFFC + 4 == 0 */
    if (min_frag_size < 64)
        return -EINVAL;  /* "tx_min_frag_size must be at least 64 bytes" */

so the first ethtool --set-mm invocation fails with a misleading
diagnostic unless the user specifies every tunable explicitly.

The verify_time_ms = 0 case behaves similarly and is rejected by
icssg_qos_validate_verify_time().

Should icssg_qos_init() seed iet->tx_min_frag_size (e.g. 64 including
FCS) and iet->verify_time_ms (e.g. 128) to sane defaults?

> +	state->tx_active = iet->fpe_active;
> +	state->verify_enabled = iet->mac_verify_configure;
> +	state->verify_time = iet->verify_time_ms;
> +	verify_status = iet->verify_status;
> +	mutex_unlock(&iet->fpe_lock);
> +
> +	state->pmac_enabled = true;
> +	/* 64Bytes is the minimum fragment size supported
> +	 * by the firmware. <64B leads to min frame errors
> +	 */
> +	state->rx_min_frag_size = 64;

Is the unit of rx_min_frag_size consistent with tx_min_frag_size here?

The 64 byte floor includes FCS, but the tx_min_frag_size reported just
above is iet->tx_min_frag_size - ETH_FCS_LEN, i.e. excluding FCS.

Peer drivers report rx_min_frag_size without FCS, for example
mscc/ocelot_mm.c and stmmac_ethtool.c use ETH_ZLEN (60), and
enetc_ethtool.c uses ethtool_mm_frag_size_add_to_min(lafs).

The inconsistency also means a user echoing the reported rx value back
as tx_min_frag_size would hit icssg_qos_validate_tx_min_frag_size()
because 64 + 4 = 68 is not a multiple of 64.

Should this be 64 - ETH_FCS_LEN (or ETH_ZLEN) to match the uAPI
convention?

[ ... ]

> +	return 0;
> +}
> +
> +static int emac_set_mm(struct net_device *ndev, struct ethtool_mm_cfg *cfg,
> +		       struct netlink_ext_ack *extack)
> +{
> +	struct prueth_emac *emac = netdev_priv(ndev);
> +	struct prueth_qos_iet *iet = &emac->qos.iet;
> +	int err;
> +
> +	if (emac->is_sr1)
> +		return -EOPNOTSUPP;
> +
> +	if (!cfg->pmac_enabled)
> +		NL_SET_ERR_MSG_MOD(extack, "preemptible MAC is always enabled");

Should this path also return an error?

The extack is set but control falls through, all remaining fields are
applied, icssg_config_ietfpe() runs, and the function returns 0.  A
caller doing "ethtool --set-mm ethN pmac-enabled off" sees success even
though emac_get_mm() hardcodes state->pmac_enabled = true and the
request was ignored.

Peer drivers (enetc, igc, ocelot, am65-cpsw) either honour
pmac_enabled or return -EOPNOTSUPP.  Would returning -EOPNOTSUPP (or
-EINVAL) here be more consistent with the ethtool convention that a
non-empty extack accompanies a non-zero return?

> +
> +	err = icssg_qos_validate_tx_min_frag_size(cfg->tx_min_frag_size, extack);
> +	if (err)
> +		return err;
> +
> +	err = icssg_qos_validate_verify_time(cfg->verify_time, extack);
> +	if (err)
> +		return err;
> +
> +	mutex_lock(&iet->fpe_lock);
> +	iet->verify_time_ms = cfg->verify_time;
> +	iet->tx_min_frag_size = cfg->tx_min_frag_size + ETH_FCS_LEN;
> +	iet->fpe_enabled = cfg->tx_enabled;
> +	iet->mac_verify_configure = cfg->verify_enabled;
> +	icssg_config_ietfpe(emac, cfg->tx_enabled);
> +	mutex_unlock(&iet->fpe_lock);
> +
> +	return err;
> +}

Can firmware-level failures in icssg_config_ietfpe() be surfaced to
userspace?

icssg_config_ietfpe() returns void, but it can fail in several ways:

    /* icssg_set_port_state() command failure */
    ret = icssg_set_port_state(emac, ... );
    if (ret) {
        netdev_err(emac->ndev, "TX preempt %s command failed\n", ...);
        ...
        return;
    }

    /* MAC Verify wait timeout */
    ret = icssg_iet_verify_wait(emac);
    if (ret) {
        netdev_err(emac->ndev, "MAC Verification failed with timeout\n");
        return;
    }

    /* Firmware failed to activate FPE */
    val = readb(config + PRE_EMPTION_ACTIVE_TX);
    if (val != 1) {
        netdev_err(emac->ndev, "Firmware fails to activate IET/FPE\n");
        return;
    }

Each case emits only a netdev_err().  emac_set_mm() then returns err,
which is 0 at that point, so "ethtool --set-mm" reports success even
when the hardware rejected the configuration or MAC Verify timed out.

Would it be reasonable to change icssg_config_ietfpe() to return int
and propagate the error (with an extack message) from emac_set_mm()?

[ ... ]

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] net: ti: icssg-prueth: Add Frame Preemption MAC Merge support
From: Jakub Kicinski @ 2026-05-05 23:01 UTC (permalink / raw)
  To: m-malladi
  Cc: Jakub Kicinski, vadim.fedorenko, haokexin, jacob.e.keller, horms,
	arnd, parvathi, afd, basharath, vladimir.oltean, rogerq,
	danishanwar, pabeni, edumazet, davem, andrew+netdev,
	linux-arm-kernel, netdev, linux-kernel, srk, vigneshr
In-Reply-To: <20260430111723.497113-2-m-malladi@ti.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: ti: icssg-prueth: Add Frame Preemption MAC Merge support

This patch adds QoS infrastructure for Frame Preemption (FPE) support to
the ICSSG Ethernet driver, introducing prueth_qos_iet state tracking,
icssg_config_ietfpe() for firmware IET configuration, MQPRIO qdisc
offload that maps traffic classes to express/preemptible queues, and
link-state driven verify state machine re-triggering.

A few questions on the code below.

> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> index 591be5c8056b..7657dc1015f0 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c

[ ... ]

> @@ -1686,6 +1689,8 @@ static int prueth_netdev_init(struct prueth *prueth,
>  
>  	INIT_DELAYED_WORK(&emac->stats_work, icssg_stats_work_handler);
>  
> +	icssg_qos_init(ndev);
> +
>  	ret = pruss_request_mem_region(prueth->pruss,
>  				       port == PRUETH_PORT_MII0 ?
>  				       PRUSS_MEM_DRAM0 : PRUSS_MEM_DRAM1,

Is the fpe_lock mutex lifecycle balanced on all error paths?

icssg_qos_init() calls mutex_init(&iet->fpe_lock) unconditionally here,
but mutex_destroy() is only invoked in the netdev_unregister label in
prueth_probe() (gated on registered_netdevs[i]) and in prueth_remove().

The remaining paths do not appear to destroy the mutex:

  - Every goto free / goto free_ndev within prueth_netdev_init() itself
    (pruss_request_mem_region failure, platform_get_irq_byname_optional
    failure, of_parse_phandle failure, of_get_phy_mode failure, the
    unsupported phy_if case) calls free_netdev() directly.

  - prueth_netdev_exit() (reached via the netdev_exit label and from
    prueth_remove) frees the netdev without destroying fpe_lock.

  - When register_netdev() succeeds for MAC0 but fails for MAC1 (or
    when prueth_netdev_init() fails for MAC1 after MAC0 succeeded),
    control flows through netdev_exit, which leaks the fpe_lock state
    for the not-yet-registered netdev.

Would it be cleaner to pair mutex_destroy() with free_netdev in
prueth_netdev_exit() rather than at the registered_netdev unwind level?

[ ... ]

> diff --git a/drivers/net/ethernet/ti/icssg/icssg_qos.c b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> new file mode 100644
> index 000000000000..fc4b0ebc7d68
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> @@ -0,0 +1,232 @@

[ ... ]

> +static void icssg_iet_set_preempt_mask(struct prueth_emac *emac)
> +{
> +	void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> +	struct prueth_qos_mqprio *p_mqprio = &emac->qos.mqprio;
> +	struct tc_mqprio_qopt *qopt = &p_mqprio->mqprio.qopt;
> +	u8 preemptible_tcs = p_mqprio->preemptible_tcs;
> +	struct prueth_qos_iet *iet = &emac->qos.iet;
> +	int prempt_mask = 0, i;
> +	u8 tc;
> +
> +	/* The preemptible traffic classes should only be committed to hardware
> +	 * once TX is active.
> +	 */
> +	if (!iet->fpe_active) {
> +		netdev_dbg(emac->ndev, "FPE not active, skipping preempt mask config\n");
> +		return;
> +	}
> +
> +	/* Configure the queues based on the preemptible tc map set by the user */
> +	for (tc = 0; tc < p_mqprio->mqprio.qopt.num_tc; tc++) {
> +		/* check if the tc is preemptive or not */
> +		if (preemptible_tcs & BIT(tc)) {
> +			for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> +				/* Set all the queues in this tc as preemptive queues */
> +				writeb(BIT(4), config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> +				prempt_mask &= ~BIT(i);
                                ^^^^^^^^^^^^^^^^^^^^^^^^^

Is the prempt_mask &= ~BIT(i) here dead code?  prempt_mask is
initialised to 0, so clearing a bit that is already zero has no effect.
The other branch sets bits via prempt_mask |= BIT(i), so it looks like
the clear statement could simply be removed.

> +			}
> +		} else {
> +			/* Set all the queues in this tc as express queues */
> +			for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> +				writeb(0, config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> +				prempt_mask |= BIT(i);
> +			}
> +		}
> +		netdev_set_tc_queue(emac->ndev, tc, qopt->count[tc], qopt->offset[tc]);
> +	}
> +	writeb(prempt_mask, config + EXPRESS_PRE_EMPTIVE_Q_MASK);
> +}

[ ... ]

> +/* Direct synchronous configuration of IET FPE.
> + * Caller must hold iet->fpe_lock.
> + */
> +void icssg_config_ietfpe(struct prueth_emac *emac, bool enable)
> +{
> +	void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> +	struct prueth_qos_iet *iet = &emac->qos.iet;
> +	int ret;
> +	u8 val;
> +
> +	/* return early if FPE is not active and need not be enabled */
> +	if (!iet->fpe_enabled && !iet->fpe_active)
> +		return;
> +
> +	if (!netif_running(emac->ndev)) {
> +		netdev_dbg(emac->ndev, "cannot change IET/FPE state when interface is down\n");
> +		return;
> +	}
> +
> +	/* Update FPE Tx enable bit (PRE_EMPTION_ENABLE_TX) if
> +	 * fpe_enabled is set to enable MM in Tx direction
> +	 */
> +	writeb(enable ? 1 : 0, config + PRE_EMPTION_ENABLE_TX);
> +
> +	/* If FPE is to be enabled, first configure MAC Verify state
> +	 * machine in firmware as firmware kicks the Verify process
> +	 * as soon as ICSSG_EMAC_PORT_PREMPT_TX_ENABLE command is
> +	 * received.
> +	 */
> +	if (enable && iet->mac_verify_configure) {
> +		writeb(1, config + PRE_EMPTION_ENABLE_VERIFY);
> +		writew(iet->tx_min_frag_size, config + PRE_EMPTION_ADD_FRAG_SIZE_LOCAL);
> +		writel(iet->verify_time_ms, config + PRE_EMPTION_VERIFY_TIME);
> +	} else {
> +		iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> +	}
> +
> +	/* Send command to enable FPE Tx side. Rx is always enabled */
> +	ret = icssg_set_port_state(emac,
> +				   enable ? ICSSG_EMAC_PORT_PREMPT_TX_ENABLE :
> +					    ICSSG_EMAC_PORT_PREMPT_TX_DISABLE);
> +	if (ret) {
> +		netdev_err(emac->ndev, "TX preempt %s command failed\n",
> +			   str_enable_disable(enable));
> +		writeb(0, config + PRE_EMPTION_ENABLE_VERIFY);
> +		iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> +		return;
        ^^^^^^^

Can this disable path leave the software and hardware state out of sync?

When called with enable=false, PRE_EMPTION_ENABLE_TX has already been
written to 0 above before the firmware command is issued.  If
icssg_set_port_state(ICSSG_EMAC_PORT_PREMPT_TX_DISABLE) then fails, the
function returns without updating iet->fpe_active, so fpe_active stays
true even though the TX-enable register now reads 0.

Later calls will then skip the !fpe_enabled && !fpe_active early return
and icssg_iet_set_preempt_mask() will keep writing preempt masks as if
TX were still active.

> +	}
> +
> +	if (enable && iet->mac_verify_configure) {
> +		ret = icssg_iet_verify_wait(emac);
> +		if (ret) {
> +			netdev_err(emac->ndev, "MAC Verification failed with timeout\n");
> +			return;
                        ^^^^^^^

On the verify-wait timeout path (and the PRE_EMPTION_ACTIVE_TX != 1
path below), the driver has already written PRE_EMPTION_ENABLE_TX=1,
PRE_EMPTION_ENABLE_VERIFY=1, PRE_EMPTION_ADD_FRAG_SIZE_LOCAL,
PRE_EMPTION_VERIFY_TIME, and issued the
ICSSG_EMAC_PORT_PREMPT_TX_ENABLE firmware command.  This error path
returns without reverting any of those writes, and iet->fpe_active
remains false.

A later user-driven disable will then hit the early return:

    if (!iet->fpe_enabled && !iet->fpe_active)
        return;

so the hardware stays configured for FPE while the driver believes it
is off.  Should these error paths also perform rollback like the
icssg_set_port_state() failure branch does?

> +		}
> +	} else if (enable) {
> +		/* Give firmware some time to update PRE_EMPTION_ACTIVE_TX state */
> +		usleep_range(100, 200);
> +	}
> +
> +	if (enable) {
> +		val = readb(config + PRE_EMPTION_ACTIVE_TX);
> +		if (val != 1) {
> +			netdev_err(emac->ndev,
> +				   "Firmware fails to activate IET/FPE\n");
> +			return;
> +		}
> +		iet->fpe_active = true;
> +	} else {
> +		iet->fpe_active = false;
> +	}
> +
> +	icssg_iet_set_preempt_mask(emac);

On the disable path (enable=false), fpe_active was just set to false,
but icssg_iet_set_preempt_mask() early-returns precisely on
!fpe_active:

    if (!iet->fpe_active) {
        netdev_dbg(emac->ndev, "FPE not active, skipping preempt mask config\n");
        return;
    }

So on a normal enable -> disable cycle, the EXPRESS_PRE_EMPTIVE_Q_MAP
per-queue entries and the EXPRESS_PRE_EMPTIVE_Q_MASK register keep the
preemptible markings that were written on the prior enable.  Also,
PRE_EMPTION_ENABLE_VERIFY is only zeroed on the error branch above; the
successful disable does not clear it.  Is a subsequent re-enable with a
different mqprio layout intended to see the stale markings from the
previous layout?

> +	netdev_err(emac->ndev, "IET FPE %s successfully\n",
        ^^^^^^^^^^

Is netdev_err the right severity for a success message?  Every link
up/down event while FPE is active goes through emac_adjust_link() ->
icssg_qos_link_state_update() -> icssg_config_ietfpe(), so this will
emit a KERN_ERR line on every link flap.  netdev_info() or
netdev_dbg() might be more appropriate.

> +		  str_enable_disable(iet->fpe_active));
> +}
> +EXPORT_SYMBOL_GPL(icssg_config_ietfpe);

Is EXPORT_SYMBOL_GPL(icssg_config_ietfpe) needed?  The only caller
outside this file (in the follow-up ethtool MAC Merge patch) lives in
icssg_ethtool.c, which is part of the same icssg-y module as
icssg_qos.c, so no export is required.

> +
> +void icssg_qos_init(struct net_device *ndev)
> +{
> +	struct prueth_emac *emac = netdev_priv(ndev);
> +	struct prueth_qos_iet *iet = &emac->qos.iet;
> +
> +	iet->emac = emac;
        ^^^^^^^^^^^^^^^

Is the iet->emac back-pointer used anywhere?  A grep across
drivers/net/ethernet/ti/icssg shows it is only written here; every
function in icssg_qos.c that needs emac either receives it as a
parameter or derives it from netdev_priv().  If it has no readers, the
field and this assignment can be removed.

> +	mutex_init(&iet->fpe_lock);
> +}

[ ... ]

> +static int emac_tc_setup_mqprio(struct net_device *ndev, void *type_data)
> +{
> +	struct tc_mqprio_qopt_offload *mqprio = type_data;
> +	struct prueth_emac *emac = netdev_priv(ndev);
> +	struct tc_mqprio_qopt *qopt = &mqprio->qopt;
> +	struct prueth_qos_mqprio *p_mqprio;
> +	u8 num_tc = mqprio->qopt.num_tc;
> +	int tc, offset, count;
> +
> +	p_mqprio = &emac->qos.mqprio;
> +
> +	if (!num_tc) {
> +		netdev_reset_tc(ndev);
> +		p_mqprio->preemptible_tcs = 0;
> +		p_mqprio->mqprio.qopt.num_tc = 0;
> +		goto reset_tcs;
> +	}

In the num_tc == 0 (qdisc deletion) path, only software bookkeeping is
reset here.  icssg_iet_set_preempt_mask() iterates tc = 0 .. num_tc, so
with num_tc=0 no writeb() is issued to the EXPRESS_PRE_EMPTIVE_Q_MAP
table and per-queue markings written by a prior configuration persist
in firmware memory.  Should the qdisc-deletion path clear the per-queue
map as well?

> +
> +	memcpy(&p_mqprio->mqprio, mqprio, sizeof(*mqprio));
> +	p_mqprio->preemptible_tcs = mqprio->preemptible_tcs;

Can these writes race against emac_adjust_link()?

struct prueth_qos_iet documents that p_mqprio is protected by fpe_lock,
and icssg_iet_set_preempt_mask() reads preemptible_tcs, qopt->num_tc,
qopt->offset[] and qopt->count[] under fpe_lock via
icssg_iet_change_preemptible_tcs().

However, emac_tc_setup_mqprio() writes p_mqprio here without taking
fpe_lock.  ndo_setup_tc runs under RTNL, but emac_adjust_link() is
invoked from the phylib state-machine workqueue which holds only
phydev->lock, not RTNL, so the two paths can run concurrently on
different CPUs.  While this memcpy is in flight, adjust_link can
acquire fpe_lock and observe a torn p_mqprio (for example new num_tc
with old offset/count arrays) and program incorrect
EXPRESS_PRE_EMPTIVE_Q_MAP / EXPRESS_PRE_EMPTIVE_Q_MASK values.

Should fpe_lock be held around the p_mqprio writes in
emac_tc_setup_mqprio(), rather than only around the HW-commit step?

Also, p_mqprio->preemptible_tcs is u8 while
tc_mqprio_qopt_offload::preemptible_tcs is unsigned long.  Any bits
above bit 7 are silently dropped with no error returned to userspace.
The current 4-queue hardware makes this unreachable in practice, but
should the driver either validate num_tc <= 8 via NL_SET_ERR_MSG_MOD or
widen the storage?

It also looks like mqprio->mode, mqprio->shaper, mqprio->flags, and the
min_rate[]/max_rate[] arrays are never inspected.  A user requesting
TC_MQPRIO_MODE_CHANNEL, a shaper, or rate limits will get 0 (success)
back from the kernel, but only the queue-to-TC mapping is applied.
Should the driver reject offload parameters it does not implement?

> +	netdev_set_num_tc(ndev, mqprio->qopt.num_tc);
> +
> +	for (tc = 0; tc < num_tc; tc++) {
> +		count = qopt->count[tc];
> +		offset = qopt->offset[tc];
> +		netdev_set_tc_queue(ndev, tc, count, offset);
> +	}
> +
> +reset_tcs:
> +	icssg_iet_change_preemptible_tcs(emac);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(icssg_qos_init);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is this EXPORT_SYMBOL_GPL placed at the intended function?  It follows
the closing brace of emac_tc_setup_mqprio() rather than
icssg_qos_init().  emac_tc_setup_mqprio is static so it cannot be
exported; the export of icssg_qos_init compiles regardless because the
macro references the name directly, but the placement reads like a
copy/paste slip.

> +
> +int icssg_qos_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,

[ ... ]

^ permalink raw reply

* Re: [PATCH net-next 14/15] ice: dpll: fix rclk pin state get and misplaced header macros
From: Jacob Keller @ 2026-05-05 22:38 UTC (permalink / raw)
  To: Ivan Vecera, Jakub Kicinski
  Cc: Kitszel, Przemyslaw, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Lobakin, Aleksander, Simon Horman,
	Michal Swiatkowski, Jagielski, Jedrzej, Loktionov, Aleksandr,
	Nitka, Grzegorz, Kubalewski, Arkadiusz, Nguyen, Anthony L,
	Wegrzyn, Stefan, Kwapulinski, Piotr, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <070a7a14-d27a-4831-b596-a806a9d93dbc@redhat.com>

On 5/5/2026 1:33 AM, Ivan Vecera wrote:
> 
> 
> On 5/4/26 8:38 PM, Keller, Jacob E wrote:
>>
>>
>>> -----Original Message-----
>>> From: Jakub Kicinski <kuba@kernel.org>
>>> Sent: Saturday, May 2, 2026 7:10 PM
>>> To: Keller, Jacob E <jacob.e.keller@intel.com>
>>> Cc: Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn
>>> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
>>> Dumazet <edumazet@google.com>; Paolo Abeni <pabeni@redhat.com>;
>>> Lobakin, Aleksander <aleksander.lobakin@intel.com>; Simon Horman
>>> <horms@kernel.org>; Michal Swiatkowski
>>> <michal.swiatkowski@linux.intel.com>; Jagielski, Jedrzej
>>> <jedrzej.jagielski@intel.com>; Loktionov, Aleksandr
>>> <aleksandr.loktionov@intel.com>; Nitka, Grzegorz
>>> <grzegorz.nitka@intel.com>; Vecera, Ivan <ivecera@redhat.com>;
>>> Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Nguyen,
>>> Anthony L
>>> <anthony.l.nguyen@intel.com>; Wegrzyn, Stefan
>>> <stefan.wegrzyn@intel.com>; Kwapulinski, Piotr
>>> <piotr.kwapulinski@intel.com>; netdev@vger.kernel.org; linux-
>>> kernel@vger.kernel.org
>>> Subject: Re: [PATCH net-next 14/15] ice: dpll: fix rclk pin state get
>>> and
>>> misplaced header macros
>>>
>>> On Thu, 30 Apr 2026 23:37:25 -0700 Jacob Keller wrote:
>>>> Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin
>>> discovery")
>>>
>>> Why are Fixes going to net-next?
>>
>> Hm. It was targeted at next in the IWL patchwork. I'm not sure why
>> Ivan chose to do that. I opted to include this in the series because
>> the patches for unmanaged DPLL support have conflicts otherwise due to
>> the placement of the header macros. I didn't consider that "net"
>> material since its relatively minor issue that I think only causes
>> issues if the ice_dpll.h header gets included twice which it doesn't
>> seem to currently.
> 
> This was because it fixed a problem recently merged to next and at that
> time it didn't make sense to target net branch. Originally it was
> submitted on 02/10 !
> 
> Thanks,
> Ivan
> 

Yea, that makes sense. Unfortunately The Intel Wired LAN backlog is
quite large right now.

I've submitted it in the latest round of net fixes (with the decision to
split the two fixes)

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH net-next v3 1/4] netfilter: conntrack: add shared port and uint parsers for helpers
From: Pablo Neira Ayuso @ 2026-05-05 22:33 UTC (permalink / raw)
  To: HACKE-RC
  Cc: Florian Westphal, Phil Sutter, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netfilter-devel,
	coreteam, netdev, linux-kernel
In-Reply-To: <20260503083220.630655-2-rc@rexion.ai>

On Sun, May 03, 2026 at 02:02:17PM +0530, HACKE-RC wrote:
> Add nf_ct_helper_parse_uint() for bounded unsigned integer parsing
> from an unterminated buffer, and nf_ct_helper_parse_port() which calls
> it with max=65535 and rejects port zero.  Both helpers are exported so
> conntrack protocol helpers can replace ad-hoc simple_strtoul() usage.
> 
> Signed-off-by: HACKE-RC <rc@rexion.ai>

You will need a "real name" here.

> ---
>  include/net/netfilter/nf_conntrack_helper.h |  5 +++
>  net/netfilter/nf_conntrack_helper.c         | 39 +++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
> index de2f956ab..ab145fcd9 100644
> --- a/include/net/netfilter/nf_conntrack_helper.h
> +++ b/include/net/netfilter/nf_conntrack_helper.h
> @@ -160,6 +160,11 @@ nf_ct_helper_expectfn_find_by_name(const char *name);
>  struct nf_ct_helper_expectfn *
>  nf_ct_helper_expectfn_find_by_symbol(const void *symbol);
>  
> +int nf_ct_helper_parse_uint(const char *cp, unsigned int len,
> +			    unsigned long max, unsigned long *val, char **endp);
> +int nf_ct_helper_parse_port(const char *cp, unsigned int len,
> +			    u16 *port, char **endp);
> +
>  extern struct hlist_head *nf_ct_helper_hash;
>  extern unsigned int nf_ct_helper_hsize;
>  
> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
> index a715304a5..f6229957c 100644
> --- a/net/netfilter/nf_conntrack_helper.c
> +++ b/net/netfilter/nf_conntrack_helper.c
> @@ -499,6 +499,45 @@ void nf_nat_helper_unregister(struct nf_conntrack_nat_helper *nat)
>  }
>  EXPORT_SYMBOL_GPL(nf_nat_helper_unregister);
>  
> +int nf_ct_helper_parse_uint(const char *cp, unsigned int len,
> +			    unsigned long max, unsigned long *val, char **endp)

In nf.git, there is a new function sip_strtouint() that can possibly
be moved to nf_conntrack_helper.c.

Thanks.

^ permalink raw reply

* Re: [PATCH net] tcp: tcp_child_process() related UAF
From: Kuniyuki Iwashima @ 2026-05-05 22:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Neal Cardwell, netdev, eric.dumazet, Damiano Melotti
In-Reply-To: <20260505153927.3435532-1-edumazet@google.com>

On Tue, May 5, 2026 at 8:39 AM Eric Dumazet <edumazet@google.com> wrote:
>
> tcp_child_process( .. child ...) currently calls sock_put(child).
>
> Unfortunately @child (named @nsk in callers) can be used after
> this point to send a RST packet.
>
> To fix this UAF, I remove the sock_put() from tcp_child_process()
> and let the callers handle this after it is safe.
>
> Remove @rsk variable in tcp_v4_do_rcv() and change tcp_v6_do_rcv()
> so that both functions look the same.
>
> Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
> Reported-by: Damiano Melotti <melotti@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply

* [PATCH 2/2 net v2] selftests: fib_tests: add temporary IPv6 address renewal test
From: Fernando Fernandez Mancera @ 2026-05-05 22:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kselftest, horms, pabeni, kuba, edumazet, davem, idosch,
	dsahern, Fernando Fernandez Mancera
In-Reply-To: <20260505222802.4296-1-fmancera@suse.de>

Add a test to check that temporary IPv6 address is regenerated properly
after the base prefix is deprecated and restored.

Fib6 temporary address renewal test
    TEST: IPv6 temporary address cleanly deprecated and regenerated     [ OK ]

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: adjusted the sleep so there is enough time for the issue to trigger,
added cleanup at the end
---
 tools/testing/selftests/net/fib_tests.sh | 59 +++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index af64f93bb2e1..8f10de0eb985 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -12,7 +12,7 @@ TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify \
        ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr \
        ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test \
        ipv4_mpath_list ipv6_mpath_list ipv4_mpath_balance ipv6_mpath_balance \
-       ipv4_mpath_balance_preferred fib6_ra_to_static"
+       ipv4_mpath_balance_preferred fib6_ra_to_static fib6_temp_addr_renewal"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
@@ -1611,6 +1611,62 @@ fib6_ra_to_static()
 	cleanup &> /dev/null
 }
 
+fib6_temp_addr_renewal() {
+	setup
+
+	echo
+	echo "Fib6 temporary address renewal test"
+	set -e
+
+	# ra6 is required for the test. (ipv6toolkit)
+	if [ ! -x "$(command -v ra6)" ]; then
+	    echo "SKIP: ra6 not found."
+	    set +e
+	    cleanup &> /dev/null
+	    return
+	fi
+
+	# Create a pair of veth devices to send a RA message from one
+	# device to another.
+	$IP link add veth1 type veth peer name veth2
+	$IP link set dev veth1 up
+	$IP link set dev veth2 up
+
+	# Make veth1 ready to receive RA messages.
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.accept_ra=2
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.use_tempaddr=2
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.temp_prefered_lft=15
+	$NS_EXEC sysctl -wq net.ipv6.conf.veth1.max_desync_factor=0
+
+	# Send a RA message with a prefix from veth2.
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#3600 -e
+	sleep 3
+
+	# Deprecate it
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#0 -e
+	sleep 3
+
+	# Restore it
+	$NS_EXEC ra6 -i veth2 -s fe80::1 -d ff02::1 -P 2001:12::/64\#LA\#3600\#3600 -e
+
+	ret=1
+	for i in $(seq 1 25); do
+		sleep 1
+		num_dep="$($IP -6 addr | grep -c "temporary deprecated" || true)"
+		num_tot="$($IP -6 addr | grep -c "temporary" || true)"
+
+		if [ "$num_dep" -eq 1 ] && [ "$num_tot" -ge 2 ]; then
+			ret=0
+			break
+		fi
+	done
+	log_test "$ret" 0 "IPv6 temporary address cleanly deprecated and regenerated"
+
+	set +e
+
+	cleanup &> /dev/null
+}
+
 # add route for a prefix, flushing any existing routes first
 # expected to be the first step of a test
 add_route()
@@ -3002,6 +3058,7 @@ do
 	ipv6_mpath_balance)		ipv6_mpath_balance_test;;
 	ipv4_mpath_balance_preferred)	ipv4_mpath_balance_preferred_test;;
 	fib6_ra_to_static)		fib6_ra_to_static;;
+	fib6_temp_addr_renewal)		fib6_temp_addr_renewal;;
 
 	help) echo "Test names: $TESTS"; exit 0;;
 	esac
-- 
2.53.0


^ permalink raw reply related

* [PATCH 1/2 net v2] ipv6: addrconf: fix temp address generation after prefix deprecation
From: Fernando Fernandez Mancera @ 2026-05-05 22:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kselftest, horms, pabeni, kuba, edumazet, davem, idosch,
	dsahern, Fernando Fernandez Mancera, Łukasz Stelmach

When a router temporarily deprecates an IPv6 prefix (either by sending a
Router Advertisement with Preferred Lifetime = 0 or by letting the
lifetime expire) and later restores it, the kernel permanently loses its
ability to generate temporary privacy addresses (RFC 8981) for that
prefix.

This happens because the address worker attempts to generate a
replacement temporary address when the current one nears expiration. As
the base prefix is deprecated already, the generation fails after
marking the temporary address already having spawned a replacement
(ifp->regen_count++).

When the router eventually restores the prefix, the temporary address
becomes active again. However, once it naturally expires, the address
worker sees this temporary address already tried to generate one and
skips the regeneration.

Fix this by verifying that the base prefix has sufficient preferred
lifetime remaining before attempting to generate a new temporary
address. In addition, make ipv6_create_tempaddr() return meaningful
error codes. This way, we can catch if a 0-lft RA arrived just after we
passed the verification mentioned above. If we don't have sufficient
preferred lifetime remaining, the worker will keep the next timer as it
is.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Łukasz Stelmach <steelman@post.pl>
Closes: https://lore.kernel.org/netdev/87340td30q.fsf%25steelman@post.pl/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: adjusted commit message, adjusted the implementation to cover all
race conditions
---
 net/ipv6/addrconf.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5476b6536eb7..23338747b429 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1379,7 +1379,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		write_unlock_bh(&idev->lock);
 		pr_info("%s: use_tempaddr is disabled\n", __func__);
 		in6_dev_put(idev);
-		ret = -1;
+		ret = -EOPNOTSUPP;
 		goto out;
 	}
 	spin_lock_bh(&ifp->lock);
@@ -1390,7 +1390,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		pr_warn("%s: regeneration time exceeded - disabled temporary address support\n",
 			__func__);
 		in6_dev_put(idev);
-		ret = -1;
+		ret = -EADDRNOTAVAIL;
 		goto out;
 	}
 	in6_ifa_hold(ifp);
@@ -1466,7 +1466,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, bool block)
 		    cfg.preferred_lft > if_public_preferred_lft) {
 			in6_ifa_put(ifp);
 			in6_dev_put(idev);
-			ret = -1;
+			ret = -EINVAL;
 			goto out;
 		}
 	}
@@ -4655,8 +4655,17 @@ static void addrconf_verify_rtnl(struct net *net)
 				/* This is a non-regenerated temporary addr. */
 
 				unsigned long regen_advance = ipv6_get_regen_advance(ifp->idev);
+				unsigned long pub_tstamp = READ_ONCE(ifp->ifpub->tstamp);
+				unsigned long pub_age = 0;
+				bool pub_expired = false;
+
+				if (time_after(now, pub_tstamp))
+					pub_age = (now - pub_tstamp) / HZ;
 
-				if (age + regen_advance >= ifp->prefered_lft) {
+				if (pub_age + regen_advance >= READ_ONCE(ifp->ifpub->prefered_lft))
+					pub_expired = true;
+
+				if (age + regen_advance >= ifp->prefered_lft && !pub_expired) {
 					struct inet6_ifaddr *ifpub = ifp->ifpub;
 					if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next))
 						next = ifp->tstamp + ifp->prefered_lft * HZ;
@@ -4670,12 +4679,18 @@ static void addrconf_verify_rtnl(struct net *net)
 					ifpub->regen_count = 0;
 					spin_unlock(&ifpub->lock);
 					rcu_read_unlock_bh();
-					ipv6_create_tempaddr(ifpub, true);
+
+					if (ipv6_create_tempaddr(ifpub, true) == -EINVAL) {
+						spin_lock_bh(&ifp->lock);
+						ifp->regen_count = 0;
+						spin_unlock_bh(&ifp->lock);
+					}
 					in6_ifa_put(ifpub);
 					in6_ifa_put(ifp);
 					rcu_read_lock_bh();
 					goto restart;
-				} else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next))
+				} else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next) &&
+					   !pub_expired)
 					next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ;
 			}
 
-- 
2.53.0


^ permalink raw reply related

* Re: rds: possible cross netns leak via RDS_INFO_* getsockopt
From: Allison Henderson @ 2026-05-05 22:07 UTC (permalink / raw)
  To: Xie Maoyi
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	rds-devel@oss.oracle.com
In-Reply-To: <TYZPR01MB6758F43459242F22946A8192DC3E2@TYZPR01MB6758.apcprd01.prod.exchangelabs.com>

On Tue, 2026-05-05 at 08:37 +0000, Xie Maoyi wrote:
> Hi all,
> 
> We are not sure whether what we observed is a real bug or
> intended behaviour. We would appreciate your view.
> 
> In net/rds/info.c, rds_info_getsockopt() dispatches to handlers
> registered in rds_info_funcs[]. Each handler reads a global list
> that is not pernet:
> 
>   rds_sock_info / rds6_sock_info        -> rds_sock_list
>   rds_tcp_tc_info / rds6_tcp_tc_info    -> rds_tcp_tc_list
>   rds_conn_info / rds6_conn_info        -> rds_conn_hash[]
> 
> None of those filter by the caller's netns. rds_info_getsockopt()
> also has no netns or capable() check. rds_create() has no
> capable() check either. So AF_RDS is reachable from an
> unprivileged user namespace.
> 
> Our reading is that an unprivileged caller in a fresh user_ns
> plus netns can read RDS state from init_net. We see this in
> practice on the latest net tree.
> 
> The fields that come back include:
> 
>   RDS_INFO_SOCKETS:     bound addr, port, sock inode of every
>                         RDS socket on the host
>   RDS_INFO_TCP_SOCKETS: peer addr, port, last_sent_nxt,
>                         last_expected_una, last_seen_una of
>                         every rds-tcp connection on the host
>   RDS_INFO_CONNECTIONS: peer addr, port, cp_next_tx_seq,
>                         cp_next_rx_seq of every RDS connection
> 
> A small reproducer is attached as poc_rds_info.c. With rds and
> rds_tcp loaded, the steps are:
> 
>   modprobe rds
>   modprobe rds_tcp
>   ./poc_rds_info
> 
> The PoC binds an AF_RDS socket in init_net to 127.0.0.1:4242 as
> root. It then enters a fresh user_ns plus netns and opens AF_RDS
> there. The attacker side reads RDS_INFO_SOCKETS and sees the
> init_net socket. A run log is attached as poc_verification.log.
> 
> We are not sure if this counts as a bug or is by design. The
> RDS_INFO_* interface looks diagnostic. It may be expected to be
> host wide. On the other hand, AF_RDS is reachable from an
> unprivileged user namespace, which is what surprised us.
> 
> Could you let us know whether you consider this worth fixing? If
> yes, we have a draft patch that gates rds_info_getsockopt() to
> init_net. We can send it once you confirm the direction.
> 
> Thanks for your time.
> 
> Maoyi Xie and Praveen Kakkolangara
> 
> Maoyi Xie
> Nanyang Technological University
> https://maoyixie.com/


Hi Xie,

Thanks for looking into this.  I think your findings are valid, diagnostic or debug tools shouldn't allow callers
visibility into another netns.  Note though that while the ib transport is limited to init_net the tcp transport is not
(see rds_set_transport()).  So one gate in rds_info_getsockopt would incorrectly filter netns that a tcp connection
might have legitimate visibility to. So the fix would need a filter in each of the three handlers you've identified,
where we can compare the netns of the socket to the netns of the entry (or c_net for connection paths), and only copy
info for relevant sockets instead of every entry in the respective global list/hash.

Allison

> ________________________________
> 
> CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents.
> Towards a sustainable earth: Print only when necessary. Thank you.


^ permalink raw reply

* Re: [net-next RFC PATCH v5 10/10] net: airoha: add phylink support for GDM2/3/4
From: Lorenzo Bianconi @ 2026-05-05 21:56 UTC (permalink / raw)
  To: Christian Marangi
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Heiner Kallweit, Russell King, Philipp Zabel, Nathan Chancellor,
	Nick Desaulniers, Bill Wendling, Justin Stitt, Daniel Golle,
	netdev, devicetree, linux-kernel, linux-arm-kernel,
	linux-mediatek, llvm
In-Reply-To: <20260505182713.27644-11-ansuelsmth@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9168 bytes --]

> Add phylink support for GDM2/3/4 port that require configuration of the
> PCS to make the external PHY or attached SFP cage work.
> 
> These needs to be defined in the GDM port node using the pcs-handle
> property.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Hi Christian,

just a couple of nits inline.

Regards,
Lorenzo

> ---
>  drivers/net/ethernet/airoha/Kconfig       |   1 +
>  drivers/net/ethernet/airoha/airoha_eth.c  | 144 +++++++++++++++++++++-
>  drivers/net/ethernet/airoha/airoha_eth.h  |   3 +
>  drivers/net/ethernet/airoha/airoha_regs.h |  12 ++
>  4 files changed, 159 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/airoha/Kconfig b/drivers/net/ethernet/airoha/Kconfig
> index ad3ce501e7a5..38dcc76e5998 100644
> --- a/drivers/net/ethernet/airoha/Kconfig
> +++ b/drivers/net/ethernet/airoha/Kconfig
> @@ -20,6 +20,7 @@ config NET_AIROHA
>  	depends on NET_DSA || !NET_DSA
>  	select NET_AIROHA_NPU
>  	select PAGE_POOL
> +	select PHYLINK
>  	help
>  	  This driver supports the gigabit ethernet MACs in the
>  	  Airoha SoC family.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 2bd79da70934..ad0328a25422 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -8,6 +8,7 @@
>  #include <linux/of_reserved_mem.h>
>  #include <linux/platform_device.h>
>  #include <linux/tcp.h>
> +#include <linux/pcs/pcs.h>
>  #include <linux/u64_stats_sync.h>
>  #include <net/dst_metadata.h>
>  #include <net/page_pool/helpers.h>
> @@ -71,6 +72,11 @@ static void airoha_qdma_irq_disable(struct airoha_irq_bank *irq_bank,
>  	airoha_qdma_set_irqmask(irq_bank, index, mask, 0);
>  }
>  
> +static bool airhoa_is_phy_external(struct airoha_gdm_port *port)
> +{
> +	return port->id != 1;

I guess you can use AIROHA_GDM1_IDX here.

> +}
> +
>  static void airoha_set_macaddr(struct airoha_gdm_port *port, const u8 *addr)
>  {
>  	struct airoha_eth *eth = port->qdma->eth;
> @@ -1644,6 +1650,17 @@ static int airoha_dev_open(struct net_device *dev)
>  	struct airoha_qdma *qdma = port->qdma;
>  	u32 pse_port = FE_PSE_PORT_PPE1;
>  
> +	if (airhoa_is_phy_external(port)) {
> +		err = phylink_of_phy_connect(port->phylink, dev->dev.of_node, 0);
> +		if (err) {
> +			netdev_err(dev, "%s: could not attach PHY: %d\n", __func__,
> +				   err);
> +			return err;
> +		}
> +
> +		phylink_start(port->phylink);
> +	}
> +
>  	netif_tx_start_all_queues(dev);
>  	err = airoha_set_vip_for_gdm_port(port, true);
>  	if (err)
> @@ -1707,6 +1724,11 @@ static int airoha_dev_stop(struct net_device *dev)
>  		}
>  	}
>  
> +	if (airhoa_is_phy_external(port)) {
> +		phylink_stop(port->phylink);
> +		phylink_disconnect_phy(port->phylink);
> +	}
> +
>  	return 0;
>  }
>  
> @@ -2883,6 +2905,115 @@ bool airoha_is_valid_gdm_port(struct airoha_eth *eth,
>  	return false;
>  }
>  
> +static void airoha_mac_link_up(struct phylink_config *config, struct phy_device *phy,
> +			       unsigned int mode, phy_interface_t interface,
> +			       int speed, int duplex, bool tx_pause, bool rx_pause)
> +{
> +	struct airoha_gdm_port *port = container_of(config, struct airoha_gdm_port,
> +						    phylink_config);
> +	struct airoha_qdma *qdma = port->qdma;
> +	struct airoha_eth *eth = qdma->eth;

since you do not need qdma pointer here, you can just do port->eth here.

> +	u32 frag_size_tx, frag_size_rx;
> +
> +	if (port->id != 4)

same here, AIROHA_GDM4_IDX

> +		return;
> +
> +	switch (speed) {
> +	case SPEED_10000:
> +	case SPEED_5000:
> +		frag_size_tx = 8;
> +		frag_size_rx = 8;
> +		break;
> +	case SPEED_2500:
> +		frag_size_tx = 2;
> +		frag_size_rx = 1;
> +		break;
> +	default:
> +		frag_size_tx = 1;
> +		frag_size_rx = 0;
> +	}
> +
> +	/* Configure TX/RX frag based on speed */
> +	airoha_fe_rmw(eth, REG_GDMA4_TMBI_FRAG,
> +		      GDMA4_SGMII0_TX_FRAG_SIZE_MASK,
> +		      FIELD_PREP(GDMA4_SGMII0_TX_FRAG_SIZE_MASK,
> +				 frag_size_tx));
> +
> +	airoha_fe_rmw(eth, REG_GDMA4_RMBI_FRAG,
> +		      GDMA4_SGMII0_RX_FRAG_SIZE_MASK,
> +		      FIELD_PREP(GDMA4_SGMII0_RX_FRAG_SIZE_MASK,
> +				 frag_size_rx));
> +}
> +
> +static const struct phylink_mac_ops airoha_phylink_ops = {
> +	.mac_link_up = airoha_mac_link_up,
> +};
> +
> +static int airoha_setup_phylink(struct net_device *dev)
> +{
> +	struct airoha_gdm_port *port = netdev_priv(dev);
> +	struct device_node *np = dev->dev.of_node;
> +	struct phylink_pcs **available_pcs;
> +	phy_interface_t phy_mode;
> +	struct phylink *phylink;
> +	unsigned int num_pcs;
> +	int err;
> +
> +	err = of_get_phy_mode(np, &phy_mode);
> +	if (err) {
> +		dev_err(&dev->dev, "incorrect phy-mode\n");
> +		return err;
> +	}
> +
> +	port->phylink_config.dev = &dev->dev;
> +	port->phylink_config.type = PHYLINK_NETDEV;
> +	port->phylink_config.mac_capabilities = MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
> +						MAC_10 | MAC_100 | MAC_1000 | MAC_2500FD |
> +						MAC_5000FD | MAC_10000FD;
> +
> +	err = fwnode_phylink_pcs_parse(dev_fwnode(&dev->dev), NULL, &num_pcs);
> +	if (err)
> +		return err;
> +
> +	available_pcs = kcalloc(num_pcs, sizeof(*available_pcs), GFP_KERNEL);
> +	if (!available_pcs)
> +		return -ENOMEM;
> +
> +	err = fwnode_phylink_pcs_parse(dev_fwnode(&dev->dev), available_pcs,
> +				       &num_pcs);
> +	if (err)
> +		goto out;
> +
> +	port->phylink_config.available_pcs = available_pcs;
> +	port->phylink_config.num_available_pcs = num_pcs;
> +
> +	__set_bit(PHY_INTERFACE_MODE_SGMII,
> +		  port->phylink_config.supported_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_1000BASEX,
> +		  port->phylink_config.supported_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_2500BASEX,
> +		  port->phylink_config.supported_interfaces);
> +	__set_bit(PHY_INTERFACE_MODE_USXGMII,
> +		  port->phylink_config.supported_interfaces);
> +
> +	phy_interface_copy(port->phylink_config.pcs_interfaces,
> +			   port->phylink_config.supported_interfaces);
> +
> +	phylink = phylink_create(&port->phylink_config,
> +				 of_fwnode_handle(np),
> +				 phy_mode, &airoha_phylink_ops);
> +	if (IS_ERR(phylink)) {
> +		err = PTR_ERR(phylink);
> +		goto out;
> +	}
> +
> +	port->phylink = phylink;
> +out:
> +	kfree(available_pcs);
> +
> +	return err;
> +}
> +
>  static int airoha_alloc_gdm_port(struct airoha_eth *eth,
>  				 struct device_node *np)
>  {
> @@ -2954,6 +3085,12 @@ static int airoha_alloc_gdm_port(struct airoha_eth *eth,
>  	port->id = id;
>  	eth->ports[p] = port;
>  
> +	if (airhoa_is_phy_external(port)) {
> +		err = airoha_setup_phylink(dev);

should it be in airoha_register_gdm_devices()?

> +		if (err)
> +			return err;
> +	}
> +
>  	return airoha_metadata_dst_alloc(port);
>  }
>  
> @@ -3081,8 +3218,11 @@ static int airoha_probe(struct platform_device *pdev)
>  		if (!port)
>  			continue;
>  
> -		if (port->dev->reg_state == NETREG_REGISTERED)
> +		if (port->dev->reg_state == NETREG_REGISTERED) {
> +			if (airhoa_is_phy_external(port))
> +				phylink_destroy(port->phylink);
>  			unregister_netdev(port->dev);
> +		}
>  		airoha_metadata_dst_free(port);
>  	}
>  	airoha_hw_cleanup(eth);
> @@ -3107,6 +3247,8 @@ static void airoha_remove(struct platform_device *pdev)
>  		if (!port)
>  			continue;
>  
> +		if (airhoa_is_phy_external(port))
> +			phylink_destroy(port->phylink);
>  		unregister_netdev(port->dev);
>  		airoha_metadata_dst_free(port);
>  	}
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index af29fc74165b..e5c70f1fa4f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -538,6 +538,9 @@ struct airoha_gdm_port {
>  	struct net_device *dev;
>  	int id;
>  
> +	struct phylink *phylink;
> +	struct phylink_config phylink_config;
> +
>  	struct airoha_hw_stats stats;
>  
>  	DECLARE_BITMAP(qos_sq_bmap, AIROHA_NUM_QOS_CHANNELS);
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..27f2583e143a 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -358,6 +358,18 @@
>  #define IP_FRAGMENT_PORT_MASK		GENMASK(8, 5)
>  #define IP_FRAGMENT_NBQ_MASK		GENMASK(4, 0)
>  
> +#define REG_GDMA4_TMBI_FRAG		0x2028
> +#define GDMA4_SGMII1_TX_WEIGHT_MASK	GENMASK(31, 26)
> +#define GDMA4_SGMII1_TX_FRAG_SIZE_MASK	GENMASK(25, 16)
> +#define GDMA4_SGMII0_TX_WEIGHT_MASK	GENMASK(15, 10)
> +#define GDMA4_SGMII0_TX_FRAG_SIZE_MASK	GENMASK(9, 0)
> +
> +#define REG_GDMA4_RMBI_FRAG		0x202c
> +#define GDMA4_SGMII1_RX_WEIGHT_MASK	GENMASK(31, 26)
> +#define GDMA4_SGMII1_RX_FRAG_SIZE_MASK	GENMASK(25, 16)
> +#define GDMA4_SGMII0_RX_WEIGHT_MASK	GENMASK(15, 10)
> +#define GDMA4_SGMII0_RX_FRAG_SIZE_MASK	GENMASK(9, 0)
> +
>  #define REG_MC_VLAN_EN			0x2100
>  #define MC_VLAN_EN_MASK			BIT(0)
>  
> -- 
> 2.53.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH net v2] net: ethernet: cortina: Drop half-assembled SKB
From: Linus Walleij @ 2026-05-05 21:52 UTC (permalink / raw)
  To: Hans Ulli Kroll, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Michał Mirosław,
	Li Xiasong
  Cc: netdev, Andreas Haarmann-Thiemann, Linus Walleij

From: Andreas Haarmann-Thiemann <eitschman@nebelreich.de>

In gmac_rx() (drivers/net/ethernet/cortina/gemini.c), when
gmac_get_queue_page() returns NULL for the second page of a multi-page
fragment, the driver logs an error and continues — but does not free the
partially assembled skb that was being assembled via napi_build_skb() /
napi_get_frags().

Free the in-progress partially assembled skb via napi_free_frags()
and increase the number of dropped frames appropriately
and assign the skb pointer NULL to make sure it is not lingering
around, matching the pattern already used elsewhere in the driver.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Andreas Haarmann-Thiemann <eitschman@nebelreich.de>
Signed-off-by: Linus Walleij <linusw@kernel.org>
---
Changes in v2:
- Fix up the commit message so it is clear what the patch is doing.
- Also increase the number of dropped frames as noted by Li Xiasong.
- Link to v1: https://lore.kernel.org/r/20260330-gemini-ethernet-fix-v1-1-18783a45d13a@kernel.org
---
 drivers/net/ethernet/cortina/gemini.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 4824232f4890..065cbbf52686 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1491,6 +1491,11 @@ static unsigned int gmac_rx(struct net_device *netdev, unsigned int budget)
 		gpage = gmac_get_queue_page(geth, port, mapping + PAGE_SIZE);
 		if (!gpage) {
 			dev_err(geth->dev, "could not find mapping\n");
+			if (skb) {
+				napi_free_frags(&port->napi);
+				port->stats.rx_dropped++;
+				skb = NULL;
+			}
 			continue;
 		}
 		page = gpage->page;

---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260330-gemini-ethernet-fix-604c28c53da1

Best regards,
--  
Linus Walleij <linusw@kernel.org>


^ permalink raw reply related

* Re: [PATCH net-next v5 2/3] gve: make nic clock reads thread safe
From: Jordan Rhee @ 2026-05-05 21:38 UTC (permalink / raw)
  To: Harshitha Ramamurthy
  Cc: netdev, joshwash, andrew+netdev, davem, edumazet, kuba, pabeni,
	richardcochran, jstultz, tglx, sboyd, willemb, nktgrg, jfraker,
	ziweixiao, maolson, thostet, alok.a.tiwari, pkaligineedi, horms,
	dwmw2, jacob.e.keller, yyd, linux-kernel
In-Reply-To: <20260429012819.3102675-3-hramamurthy@google.com>

On Tue, Apr 28, 2026 at 6:28 PM Harshitha Ramamurthy
<hramamurthy@google.com> wrote:
>
> From: Ankit Garg <nktgrg@google.com>
>
> Add a mutex to protect the shared DMA buffer that receives NIC
> timestamp reports. The NIC timestamp will be read from two different
> threads: the periodic worker and upcoming `gettimex64`.
>
> Move clock registration to the last step of initialization to ensure
> that all data needed by the clock module is initialized before
> the clock is exposed to usermode.
>
> Reviewed-by: Joshua Washington <joshwash@google.com>
> Signed-off-by: Ankit Garg <nktgrg@google.com>
> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> ---
> Changes in v3:
> - Reorder init/teardown to register PTP clock last, and simplify code
> - Move ptp-related members from gve_priv to gve_ptp
> - Only assign priv->ptp after ptp module is successfully initialized
> ---
>  drivers/net/ethernet/google/gve/gve.h         |  12 +-
>  drivers/net/ethernet/google/gve/gve_ethtool.c |   3 +-
>  drivers/net/ethernet/google/gve/gve_ptp.c     | 134 ++++++++----------
>  3 files changed, 63 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> index 1d66d3834f7e..7b69d0cfc0d5 100644
> --- a/drivers/net/ethernet/google/gve/gve.h
> +++ b/drivers/net/ethernet/google/gve/gve.h
> @@ -792,6 +792,9 @@ struct gve_ptp {
>         struct ptp_clock_info info;
>         struct ptp_clock *clock;
>         struct gve_priv *priv;
> +       struct mutex nic_ts_read_lock; /* Protects nic_ts_report */
> +       struct gve_nic_ts_report *nic_ts_report;
> +       dma_addr_t nic_ts_report_bus;
>  };
>
>  struct gve_priv {
> @@ -923,8 +926,6 @@ struct gve_priv {
>         bool nic_timestamp_supported;
>         struct gve_ptp *ptp;
>         struct kernel_hwtstamp_config ts_config;
> -       struct gve_nic_ts_report *nic_ts_report;
> -       dma_addr_t nic_ts_report_bus;
>         u64 last_sync_nic_counter; /* Clock counter from last NIC TS report */
>  };
>
> @@ -1201,7 +1202,7 @@ static inline bool gve_supports_xdp_xmit(struct gve_priv *priv)
>
>  static inline bool gve_is_clock_enabled(struct gve_priv *priv)
>  {
> -       return priv->nic_ts_report;
> +       return priv->ptp;
>  }
>
>  /* gqi napi handler defined in gve_main.c */
> @@ -1321,14 +1322,9 @@ int gve_flow_rules_reset(struct gve_priv *priv);
>  int gve_init_rss_config(struct gve_priv *priv, u16 num_queues);
>  /* PTP and timestamping */
>  #if IS_ENABLED(CONFIG_PTP_1588_CLOCK)
> -int gve_clock_nic_ts_read(struct gve_priv *priv);
>  int gve_init_clock(struct gve_priv *priv);
>  void gve_teardown_clock(struct gve_priv *priv);
>  #else /* CONFIG_PTP_1588_CLOCK */
> -static inline int gve_clock_nic_ts_read(struct gve_priv *priv)
> -{
> -       return -EOPNOTSUPP;
> -}
>
>  static inline int gve_init_clock(struct gve_priv *priv)
>  {
> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index dc2213b5ce24..4fd7e8a442c5 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -972,8 +972,7 @@ static int gve_get_ts_info(struct net_device *netdev,
>                 info->rx_filters |= BIT(HWTSTAMP_FILTER_NONE) |
>                                     BIT(HWTSTAMP_FILTER_ALL);
>
> -               if (priv->ptp)
> -                       info->phc_index = ptp_clock_index(priv->ptp->clock);
> +               info->phc_index = ptp_clock_index(priv->ptp->clock);
>         }
>
>         return 0;
> diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> index 06b1cf4a5efc..ad15f1209a83 100644
> --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> @@ -11,19 +11,20 @@
>  #define GVE_NIC_TS_SYNC_INTERVAL_MS 250
>
>  /* Read the nic timestamp from hardware via the admin queue. */
> -int gve_clock_nic_ts_read(struct gve_priv *priv)
> +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
>  {
> -       u64 nic_raw;
>         int err;
>
> -       err = gve_adminq_report_nic_ts(priv, priv->nic_ts_report_bus);
> +       mutex_lock(&ptp->nic_ts_read_lock);
> +       err = gve_adminq_report_nic_ts(ptp->priv, ptp->nic_ts_report_bus);
>         if (err)
> -               return err;
> +               goto out;
>
> -       nic_raw = be64_to_cpu(priv->nic_ts_report->nic_timestamp);
> -       WRITE_ONCE(priv->last_sync_nic_counter, nic_raw);
> +       *nic_raw = be64_to_cpu(ptp->nic_ts_report->nic_timestamp);
>
> -       return 0;
> +out:
> +       mutex_unlock(&ptp->nic_ts_read_lock);
> +       return err;
>  }
>
>  static int gve_ptp_gettimex64(struct ptp_clock_info *info,
> @@ -41,17 +42,21 @@ static int gve_ptp_settime64(struct ptp_clock_info *info,
>
>  static long gve_ptp_do_aux_work(struct ptp_clock_info *info)
>  {
> -       const struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> +       struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
>         struct gve_priv *priv = ptp->priv;
> +       u64 nic_raw;
>         int err;
>
>         if (gve_get_reset_in_progress(priv) || !gve_get_admin_queue_ok(priv))
>                 goto out;
>
> -       err = gve_clock_nic_ts_read(priv);
> -       if (err && net_ratelimit())
> -               dev_err(&priv->pdev->dev,
> -                       "%s read err %d\n", __func__, err);
> +       err = gve_clock_nic_ts_read(ptp, &nic_raw);
> +       if (err) {
> +               dev_err_ratelimited(&priv->pdev->dev, "%s read err %d\n",
> +                                   __func__, err);
> +               goto out;
> +       }
> +       WRITE_ONCE(priv->last_sync_nic_counter, nic_raw);

Sashiko says:
---
This problem wasn't introduced by this patch, but since the patch moves
the WRITE_ONCE() operation, does it cause a torn read on 32-bit
architectures?

The 64-bit variable last_sync_nic_counter is updated in process context (by
the PTP worker thread) using WRITE_ONCE() and read concurrently in NAPI
softirq context (gve_rx_get_hwtstamp()) using READ_ONCE().

On 32-bit architectures, READ_ONCE() and WRITE_ONCE() on 64-bit variables
do not guarantee atomicity and will compile into two separate 32-bit memory
accesses.

If a softirq interrupts the worker thread exactly between the two 32-bit
writes, could it read a corrupted 64-bit value combining halves of the old
and new timestamps?

Because the upper 32 bits wrap roughly every 4.3 seconds, a torn read will
cause a multi-second jump in the calculated packet timestamps, corrupting
PTP synchronization and packet captures.

Would it be better to use a seqcount, u64_stats_sync, or atomic64_t to
synchronize these accesses?
---

GVE only runs on x86_64 and ARM64, so this won't hit in the wild. It
might be good to fix from a code correctness perspective, but I
believe it's beyond the scope of this patch which is to introduce
thread safety to gve_clock_nic_ts_read().

>
>  out:
>         return msecs_to_jiffies(GVE_NIC_TS_SYNC_INTERVAL_MS);
> @@ -65,94 +70,71 @@ static const struct ptp_clock_info gve_ptp_caps = {
>         .do_aux_work    = gve_ptp_do_aux_work,
>  };
>
> -static int gve_ptp_init(struct gve_priv *priv)
> +int gve_init_clock(struct gve_priv *priv)
>  {
>         struct gve_ptp *ptp;
> +       u64 nic_raw;
>         int err;
>
> -       priv->ptp = kzalloc_obj(*priv->ptp);
> -       if (!priv->ptp)
> +       ptp = kzalloc_obj(*priv->ptp);
> +       if (!ptp)
>                 return -ENOMEM;
>
> -       ptp = priv->ptp;
>         ptp->info = gve_ptp_caps;
> -       ptp->clock = ptp_clock_register(&ptp->info, &priv->pdev->dev);
> -
> -       if (IS_ERR(ptp->clock)) {
> -               dev_err(&priv->pdev->dev, "PTP clock registration failed\n");
> -               err  = PTR_ERR(ptp->clock);
> -               goto free_ptp;
> -       }
> -
>         ptp->priv = priv;
> -       return 0;
> -
> -free_ptp:
> -       kfree(ptp);
> -       priv->ptp = NULL;
> -       return err;
> -}
> -
> -static void gve_ptp_release(struct gve_priv *priv)
> -{
> -       struct gve_ptp *ptp = priv->ptp;
> -
> -       if (!ptp)
> -               return;
> -
> -       if (ptp->clock)
> -               ptp_clock_unregister(ptp->clock);
> -
> -       kfree(ptp);
> -       priv->ptp = NULL;
> -}
> -
> -int gve_init_clock(struct gve_priv *priv)
> -{
> -       int err;
> -
> -       err = gve_ptp_init(priv);
> -       if (err)
> -               return err;
> -
> -       priv->nic_ts_report =
> +       mutex_init(&ptp->nic_ts_read_lock);
> +       ptp->nic_ts_report =
>                 dma_alloc_coherent(&priv->pdev->dev,
>                                    sizeof(struct gve_nic_ts_report),
> -                                  &priv->nic_ts_report_bus,
> -                                  GFP_KERNEL);
> -       if (!priv->nic_ts_report) {
> +                                  &ptp->nic_ts_report_bus, GFP_KERNEL);
> +       if (!ptp->nic_ts_report) {
>                 dev_err(&priv->pdev->dev, "%s dma alloc error\n", __func__);
>                 err = -ENOMEM;
> -               goto release_ptp;
> +               goto free_ptp;
>         }
> -       err = gve_clock_nic_ts_read(priv);
> +
> +       err = gve_clock_nic_ts_read(ptp, &nic_raw);
>         if (err) {
>                 dev_err(&priv->pdev->dev, "failed to read NIC clock %d\n", err);
> -               goto release_nic_ts_report;
> +               goto free_dma_mem;
>         }
> -       ptp_schedule_worker(priv->ptp->clock,
> +       WRITE_ONCE(priv->last_sync_nic_counter, nic_raw);
> +
> +       ptp->clock = ptp_clock_register(&ptp->info, &priv->pdev->dev);
> +       if (IS_ERR(ptp->clock)) {
> +               dev_err(&priv->pdev->dev, "PTP clock registration failed\n");
> +               err = PTR_ERR(ptp->clock);
> +               goto free_dma_mem;
> +       }
> +
> +       priv->ptp = ptp;

Sashiko says:
---
This problem wasn't introduced by this patch, but since the patch attempts
to fix a related publication race by moving the clock registration, could
this priv->ptp assignment be reordered before ptp->clock is fully
initialized?

If gve_get_ts_info() reads priv->ptp locklessly (for example, when called
from sock_timestamping_bind_phc()), could it see an uninitialized
ptp->clock because smp_store_release() wasn't used here?
---

The only place that priv->ptp is accessed is from
gve_is_clock_enabled(), which is called from the following places:
 - gve_get_ts_info() ethtool op
   - No risk of race with gve_init_clock() when called from probe()
because it runs *before* netdev is registered. The ethool operation
cannot be dispatched before the netdev is registered.
   - No risk of race with gve_init_clock() when called in reset
recovery path because rtnl and netdev locks are held across
gve_reset(), and ethtool operations also take the rtnl/netdev locks.
 - gve_set_ts_config() ndo op and gve_xdp_rx_timestamp()
   - No risk of race because gve_init_clock() runs before netdev is registered


> +       ptp_schedule_worker(ptp->clock,
>                             msecs_to_jiffies(GVE_NIC_TS_SYNC_INTERVAL_MS));
>
>         return 0;
>
> -release_nic_ts_report:
> -       dma_free_coherent(&priv->pdev->dev,
> -                         sizeof(struct gve_nic_ts_report),
> -                         priv->nic_ts_report, priv->nic_ts_report_bus);
> -       priv->nic_ts_report = NULL;
> -release_ptp:
> -       gve_ptp_release(priv);
> +free_dma_mem:
> +       dma_free_coherent(&priv->pdev->dev, sizeof(struct gve_nic_ts_report),
> +                         ptp->nic_ts_report, ptp->nic_ts_report_bus);
> +       ptp->nic_ts_report = NULL;
> +free_ptp:
> +       mutex_destroy(&ptp->nic_ts_read_lock);
> +       kfree(ptp);
>         return err;
>  }
>
>  void gve_teardown_clock(struct gve_priv *priv)
>  {
> -       gve_ptp_release(priv);
> +       struct gve_ptp *ptp = priv->ptp;
>
> -       if (priv->nic_ts_report) {
> -               dma_free_coherent(&priv->pdev->dev,
> -                                 sizeof(struct gve_nic_ts_report),
> -                                 priv->nic_ts_report, priv->nic_ts_report_bus);
> -               priv->nic_ts_report = NULL;
> -       }
> +       if (!ptp)
> +               return;
> +
> +       priv->ptp = NULL;
> +       ptp_clock_unregister(ptp->clock);
> +       dma_free_coherent(&priv->pdev->dev, sizeof(struct gve_nic_ts_report),
> +                         ptp->nic_ts_report, ptp->nic_ts_report_bus);
> +       ptp->nic_ts_report = NULL;
> +       mutex_destroy(&ptp->nic_ts_read_lock);
> +       kfree(ptp);

Sashiko says:
---
Can this synchronous kfree() cause a use-after-free or a NULL pointer
dereference?

If a concurrent device reset executes gve_teardown_clock(), it sets
priv->ptp = NULL and frees ptp synchronously without an RCU grace period
(such as using kfree_rcu() or synchronize_rcu()).

If gve_get_ts_info() reads priv->ptp locklessly, could it dereference
priv->ptp->clock while it is being freed, leading to a Time-Of-Check to
Time-Of-Use (TOCTOU) bug?
---

Reset cannot race against the gve_get_ts_info() ethtool op because of
the netdev/rtnl locks.

>  }
> --
> 2.54.0.545.g6539524ca2-goog
>

^ permalink raw reply

* RE: [PATCH v7 net-next 5/8] ice: introduce TXC DPLL device and TX ref clock pin framework for E825
From: Nitka, Grzegorz @ 2026-05-05 21:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, Vecera, Ivan, vadim.fedorenko@linux.dev,
	jiri@resnulli.us, edumazet@google.com, Kitszel, Przemyslaw,
	richardcochran@gmail.com, donald.hunter@gmail.com,
	linux-kernel@vger.kernel.org, Kubalewski, Arkadiusz,
	andrew+netdev@lunn.ch, intel-wired-lan@lists.osuosl.org,
	horms@kernel.org, Prathosh.Satish@microchip.com,
	Nguyen, Anthony L, pabeni@redhat.com, davem@davemloft.net
In-Reply-To: <20260502173312.3849706-1-kuba@kernel.org>



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, May 2, 2026 7:33 PM
> To: Nitka, Grzegorz <grzegorz.nitka@intel.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; netdev@vger.kernel.org; Vecera, Ivan
> <ivecera@redhat.com>; vadim.fedorenko@linux.dev; jiri@resnulli.us;
> edumazet@google.com; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>;
> richardcochran@gmail.com; donald.hunter@gmail.com; linux-
> kernel@vger.kernel.org; Kubalewski, Arkadiusz
> <arkadiusz.kubalewski@intel.com>; andrew+netdev@lunn.ch; intel-wired-
> lan@lists.osuosl.org; horms@kernel.org; Prathosh.Satish@microchip.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; pabeni@redhat.com;
> davem@davemloft.net
> Subject: Re: [PATCH v7 net-next 5/8] ice: introduce TXC DPLL device and TX ref
> clock pin framework for E825
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> ice: introduce TXC DPLL device and TX ref clock pin framework for E825
> 
> This patch adds a new TXC DPLL device of DPLL_TYPE_GENERIC for E825
> devices, together with a TXCLK pin type and two TX reference pins
> (EXT_EREF0 and a fwnode-described SYNCE pin). Pin state callbacks and
> TXC lock status are placeholders at this stage.
> 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c
> b/drivers/net/ethernet/intel/ice/ice_dpll.c
> > index 62f75701d652..d839b50187ba 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_dpll.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
> 
> [ ... ]
> 
> > @@ -3199,19 +3276,40 @@ static bool ice_dpll_is_fwnode_pin(struct
> ice_dpll_pin *pin)
> >  	return !IS_ERR_OR_NULL(pin->fwnode);
> >  }
> >
> > +static bool ice_dpll_fwnode_eq(const struct fwnode_handle *a,
> > +			       const struct fwnode_handle *b)
> > +{
> > +	return a && b && a == b;
> > +}
> > +
> >  static void ice_dpll_pin_notify_work(struct work_struct *work)
> >  {
> >  	struct ice_dpll_pin_work *w = container_of(work,
> >  						   struct ice_dpll_pin_work,
> >  						   work);
> >  	struct ice_dpll_pin *pin, *parent = w->pin;
> > +	bool is_tx_synce_parent = false;
> >  	struct ice_pf *pf = parent->pf;
> > +	bool is_rclk_parent = false;
> >  	int ret;
> >
> >  	wait_for_completion(&pf->dplls.dpll_init);
> >  	if (!test_bit(ICE_FLAG_DPLL, pf->flags))
> >  		goto out; /* DPLL initialization failed */
> >
> > +	/* Decide which parent we are handling, defensively checking FWNs */
> 
> [ ... ]
> 
> > +		} else if (is_tx_synce_parent) {
> > +			/* Register TX-CLK SYNCE pin directly to TXC DPLL */
> > +			pin = &pf->dplls.txclks[E825_EXT_SYNCE_PIN_IDX];
> > +			ret = dpll_pin_register(pf->dplls.txc.dpll, pin->pin,
> > +						&ice_dpll_txclk_ops, pin);
> > +			if (ret) {
> > +				dev_err(ice_pf_to_dev(pf),
> > +					"TX SYNCE pin register failed: %pe\n",
> > +					ERR_PTR(ret));
> > +				goto drop_parent_ref;
> > +			}
> >  		}
> >  		break;
> 
> [ ... ]
> 
> > @@ -3401,6 +3524,19 @@ ice_dpll_deinit_fwnode_pins(struct ice_pf *pf,
> struct ice_dpll_pin *pins,
> >  	destroy_workqueue(pf->dplls.wq);
> >  }
> >
> > +static int ice_dpll_deinit_txclk_pins(struct ice_pf *pf)
> > +{
> > +	struct ice_dpll_pin *synce_pin = &pf-
> >dplls.txclks[E825_EXT_SYNCE_PIN_IDX];
> > +	struct ice_dpll *dt = &pf->dplls.txc;
> > +
> > +	ice_dpll_unregister_pins(dt->dpll, pf->dplls.txclks,
> > +				 &ice_dpll_txclk_ops,
> > +				 ARRAY_SIZE(pf->dplls.txclks));
> > +	ice_dpll_release_pins(&pf->dplls.txclks[E825_EXT_EREF_PIN_IDX], 1);
> > +	ice_dpll_deinit_fwnode_pin(synce_pin);
> > +	return 0;
> > +}
> 
> Is there a teardown race between this function and a concurrent
> ice_dpll_pin_notify_work() that is about to register the SYNCE pin on
> the TXC DPLL?
> 
> The work's registration path runs:
> 
> 	parent->pin = fwnode_dpll_pin_find(parent->fwnode,
> 					   &parent->tracker);
> 	...
> 	} else if (is_tx_synce_parent) {
> 		pin = &pf->dplls.txclks[E825_EXT_SYNCE_PIN_IDX];
> 		ret = dpll_pin_register(pf->dplls.txc.dpll, pin->pin,
> 					&ice_dpll_txclk_ops, pin);
> 
> while ice_dpll_deinit_txclk_pins() first scans pf->dplls.txclks[] with
> ice_dpll_unregister_pins(), which skips any slot where pins[i].pin is
> still NULL:
> 
> 	for (i = 0; i < count; i++) {
> 		if (pins[i].hidden)
> 			continue;
> 		if (IS_ERR_OR_NULL(pins[i].pin))
> 			continue;
> 		dpll_pin_unregister(dpll, pins[i].pin, ops, &pins[i]);
> 	}
> 
> Only afterward does it call ice_dpll_deinit_fwnode_pin(synce_pin), which
> unregisters the notifier and flushes pf->dplls.wq.
> 
> If an external fwnode DPLL provider emits a DPLL_PIN_CREATED for
> clk_ref_synce between the two steps, can this sequence occur?
> 
>   deinit thread                    notifier work
>   -------------                    -------------
>   ice_dpll_unregister_pins(txclks)
>      sees txclks[1].pin == NULL
>      skips SYNCE slot
>                                    parent->pin = fwnode_dpll_pin_find()
>                                    dpll_pin_register(txc->dpll,
>                                        pin->pin, &ops, pin)
>                                      -> adds to txc->dpll->pin_refs
>                                         and pin->dpll_refs
>   ice_dpll_release_pins(EREF0, 1)
>   ice_dpll_deinit_fwnode_pin(synce_pin)
>      unregister_dpll_notifier()
>      flush_workqueue()   /* no work pending */
>      dpll_pin_put(synce_pin->pin, ...)
> 
> After this, dpll_pin_put() only drops the pin refcount; it does not call
> dpll_pin_unregister(). So the TXC dpll's pin_refs still contains the
> SYNCE entry when ice_dpll_deinit_dpll(pf, &pf->dplls.txc, false) later
> runs dpll_device_unregister() and dpll_device_put(). Does this trip
> WARN_ON_ONCE(!xa_empty(&dpll->pin_refs)) in __dpll_device_put() and
> then
> free the dpll_device while pin->dpll_refs still references it, leaving a
> use-after-free on any subsequent walk of pin->dpll_refs?
> 
> Would it be safer to call ice_dpll_deinit_fwnode_pin(synce_pin) first so
> the notifier is torn down and pf->dplls.wq drained before txclks[] is
> scanned, or to hold pf->dplls.lock across both the deinit scan and the
> work's parent->pin = ...; dpll_pin_register(...) sequence?
> 
> A similar structural race exists for the rclk path via
> dpll_pin_on_pin_register(), but this patch introduces a new instance on
> the direct-on-TXC SYNCE registration. The ordering in
> ice_dpll_deinit_txclk_pins() appears unchanged by later patches in the
> series, so the same race seems to persist beyond this commit.

This is a valid concern and to be fixed in the next iteration.
I'd go with stopping new notifier-driven work and then tear down DPLL
registration and references.

Thanks

Grzegorz

^ permalink raw reply

* Re: [PATCH net-next 1/5] dt-bindings: net: add onsemi's TS2500/NCN26010 10BASE-T1S MACPHY
From: Andrew Lunn @ 2026-05-05 21:25 UTC (permalink / raw)
  To: Piergiorgio Beruto
  Cc: Selvamani Rajagopal, Rob Herring, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, krzk+dt@kernel.org, conor+dt@kernel.org,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <DM8PR02MB8021DB41F30E52D3DED9E7329D3E2@DM8PR02MB8021.namprd02.prod.outlook.com>

> As far as I’m concerned, we can remove the 15 MHz lower limit, but I would
> leave some note somewhere if possible.

A comment in the device tree binding next to the maximum speed
requirement would make sense.

	Andrew

^ permalink raw reply

* Re: [net-next PATCH v2 8/8] net: dsa: realtek: rtl8365mb: add bridge port flags
From: Luiz Angelo Daros de Luca @ 2026-05-05 21:01 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Andrew Lunn, Vladimir Oltean, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Alvin Šipraga,
	Yury Norov, Rasmus Villemoes, Russell King, netdev, linux-kernel
In-Reply-To: <CAD++jLn3WSuQ0oGfPAX3P-P9qvKfDBO24McrLouhu9sKXM-62w@mail.gmail.com>

> Excellent again, thanks.

Thank you, Linus. The code does look nicer. I hope it will ease our
task of making these Realtek switches feature-complete.

Regards,

Luiz

^ permalink raw reply

* Re: [PATCH net] ipv6: fix potential UAF caused by ip6_forward_proxy_check()
From: David Ahern @ 2026-05-05 20:37 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Ido Schimmel, netdev, eric.dumazet, Damiano Melotti
In-Reply-To: <20260505130056.2927197-1-edumazet@google.com>

On 5/5/26 7:00 AM, Eric Dumazet wrote:
> ip6_forward_proxy_check() calls pskb_may_pull() which might re-allocate
> skb->head.
> 
> Reload ipv6_hdr() after the pskb_may_pull() call to avoid using
> the freed memory.
> 
> Fixes: e21e0b5f19ac ("[IPV6] NDISC: Handle NDP messages to proxied addresses.")
> Reported-by: Damiano Melotti <melotti@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv6/ip6_output.c | 3 +++
>  1 file changed, 3 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>

^ permalink raw reply

* Re: [PATCH] llc: fix coding style and memory barrier documentation
From: Andrew Lunn @ 2026-05-05 20:19 UTC (permalink / raw)
  To: Lucas Poupeau; +Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel
In-Reply-To: <20260505192549.74382-1-lucasp.linux@gmail.com>

On Tue, May 05, 2026 at 09:25:49PM +0200, Lucas Poupeau wrote:
> Fix multiple style issues reported by checkpatch.pl to improve code
> maintainability and compliance with kernel standards.
> 
> Changes included:
> - Add mandatory explanatory comments to memory barriers, specifying
>   the reasoning and pairing context.
> - Insert missing blank lines after variable declarations to improve
>   readability.
> - Relocate EXPORT_SYMBOL macros to immediately follow their respective
>   function definitions as per the modern kernel coding style.
> 
> Note: Some indentation warnings from checkpatch.pl persist in this
> commit. Despite several attempts to align them, the tool remains
> capricious regarding these specific blocks, and further manual
> refinement proved counterproductive.
> 
> Reported-by: checkpatch.pl
> Signed-off-by: Lucas Poupeau <lucasp.linux@gmail.com>

Please take a read of

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

The Subject line needs to indicate what tree it is for.

Please also break this patch up. One patch per type of fix. You are
aiming for patches which are obviously correct. Such patches are
generally small, and have good commit messages.

> -#if 0
> -#define dprintk(args...) printk(KERN_DEBUG args)
> -#else
> -#define dprintk(args...)
> -#endif

It is not obvious why this is correct. Is dprintk() not actually used
in this file? That would be something you would say in the commit
message.

> +    /*
> +     * This barrier ensures that any previous memory operations
> +     * are visible before the handler is executed.
> +     * Pairs with the smp_wmb() in [indiquez ici la fonction ou le fichier].
> +     */

Please do specify the function here. Otherwise the comment is useless.

	Andrew

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox