Netdev List


* Re: [PATCH net-next 0/2] Add selftests for ntuple (NFC) rules
From: patchwork-bot+netdevbpf @ 2026-04-10 22:40 UTC
  To: Dimitri Daskalakis
  Cc: davem, andrew+netdev, edumazet, kuba, pabeni, shuah, willemb,
	petrm, dw, carges, cjubran, daskald, netdev, linux-kselftest
In-Reply-To: <20260407164954.2977820-1-dimitri.daskalakis1@gmail.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Apr 2026 09:49:52 -0700 you wrote:
> From: Dimitri Daskalakis <daskald@meta.com>
> 
> Thoroughly testing a device's NFC implementation can be tedious. The more
> features a device supports, the more combinations to validate.
> 
> This series aims to ease that burden, validating the most common NFC rule
> combinations.
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] selftests: drv-net: Add ntuple (NFC) flow steering test
    https://git.kernel.org/netdev/net-next/c/18589df9344c
  - [net-next,2/2] selftests: drv-net: ntuple: Add dst-ip, src-port, dst-port fields
    https://git.kernel.org/netdev/net-next/c/a66374a3eb02

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Hugh Blemings @ 2026-04-10 22:25 UTC
  To: Kuniyuki Iwashima, kuba
  Cc: davem, edumazet, gregkh, horms, linux-hams, linux-kernel, netdev,
	pabeni, stable, workflows, yizhe
In-Reply-To: <20260410221220.1708137-1-kuniyu@google.com>


On 11/4/2026 08:11, Kuniyuki Iwashima wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Fri, 10 Apr 2026 14:54:48 -0700
>> On Fri, 10 Apr 2026 14:30:42 -0700 Jakub Kicinski wrote:
>>> On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
>>>> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
>>>>> Or for simplicity we could also be testing against skb_headlen()
>>>>> since we don't expect any legit non-linear frames here? Dunno.
>>>> I'll be glad to change this either way, your call.  Given that this is
>>>> an obsolete protocol that seems to only be a target for drive-by fuzzers
>>>> to attack, whatever the simplest thing to do to quiet them up I'll be
>>>> glad to implement.
>>>>
>>>> Or can we just delete this stuff entirely?  :)
>>> Yes.
>>>
>>> My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
>>> Create GH repos which provide them as OOT modules.
>>> Hopefully we can convince any existing users to switch to that.
>>>
>>> The only thing stopping me is the concern that this is just the softest
>>> target and the LLMs will find something else to focus on which we can't
>>> delete. I suspect any PCIe driver can be flooded with "aren't you
>>> trusting the HW to provide valid responses here?" bullshit.
>>>
>>> But hey, let's try. I'll post a patch nuking all of hamradio later
>>> today.
>> Well, either we "expunge" this code to OOT repos, or we mark it
>> as broken and tell everyone that we don't take security fixes
>> for anything that depends on BROKEN. I'd personally rather expunge.
> +1 for "expunge" to prevent LLM-based patch flood.
>
> IIRC, we did that recently for one driver only used by OpenWRT ?
>
>
If the main concern here is ongoing maintenance of these Ham Radio 
related protocols/drivers, can we pause for a moment on anything as 
dramatic as removing from the tree entirely ?

There is a good cohort of capable kernel folks that either are or were
ham radio operators who I believe, upon realising that things have got
to this point, will be happy to redouble efforts to ensure this code
is maintained and tested to a satisfactory standard.

Or, alternatively, as a technical community it may be that the Ham Radio 
interested folks conclude that out of tree or user space solutions are a 
better way forward as others have proposed.

Give us a few days, please, for the word to be put around that we need 
to pull ourselves together a bit as a technical group :)

Cheers/73
Hugh
VK3YYZ/AD5RV/Lapsed Kernel Maintainer... ;)


>> cc: workflows, we can't be the only ones still nursing Linux 2.2 code

-- 
I am slowly moving to hugh@blemings.id.au as my main email address.
If you're using hugh@blemings.org please update your address book accordingly.
Thank you :)



* Re: [PATCH iwl-net 3/4] ice: fix ready bitmap check for non-E822 devices
From: Jacob Keller @ 2026-04-10 22:32 UTC
  To: Anthony Nguyen, Intel Wired LAN, netdev, Aleksandr Loktionov,
	Petr Oros
  Cc: Grzegorz Nitka, Timothy Miskell, Aleksandr Loktionov
In-Reply-To: <20260408-jk-even-more-e825c-fixes-v1-3-b959da91a81f@intel.com>

On 4/8/2026 11:46 AM, Jacob Keller wrote:
> diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
> index ada42bcc4d0b..34906f972d17 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ptp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
> @@ -2718,7 +2718,7 @@ static bool ice_any_port_has_timestamps(struct ice_pf *pf)
>  bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
>  {
>  	struct ice_hw *hw = &pf->hw;
> -	unsigned int i;
> +	int ret;
>  
>  	/* Check software indicator */
>  	switch (pf->ptp.tx_interrupt_mode) {
> @@ -2739,16 +2739,15 @@ bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
>  	}
>  
>  	/* Check hardware indicator */
> -	for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
> -		u64 tstamp_ready = 0;
> -		int err;
> -
> -		err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
> -		if (err || tstamp_ready)
> -			return true;
> +	ret = ice_check_phy_tx_tstamp_ready(hw);
> +	if (ret < 0) {
> +		dev_dbg(ice_pf_to_dev(pf), "Unable to read PHY Tx timestamp ready bitmap, err %d\n",
> +			ret);
> +		/* Stop triggering IRQs if we're unable to read PHY */
> +		return false;
>  	}
>  
> -	return false;
> +	return ret;

I got some comments off-list from Aleks about this "conversion" from
integer to boolean here. The return from ice_check_phy_tx_tstamp_ready()
is 0 when there are no timestamps and 1 when there is at least one
timestamp available. The negative values are excluded in the check just
above this.

I guess I can re-write this to be an if/elif/else chain instead, but I
wonder what others on the list think here.

I understand it is slightly confusing to have a return which is 3-way
(error, 0 or 1), but other solutions seemed inelegant. Using a bool for
the check means we lose the nuance of the error code (and the distinct
dev_dbg message). It also doesn't let the caller decide whether it's more
important to continue on error or stop on error (which might be dependent
on each call flow).

Just doing "ret ? true : false" seems stupid since that is literally
what this will do given that we already excluded negative values in the
check above before returning.
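
To make the question concrete, here is a minimal standalone sketch of the
pattern. The names check_ready() and tstamps_pending() and the hw_state
argument are hypothetical stand-ins, not the ice driver functions; the only
point is that once negative values are filtered out, the implicit int-to-bool
conversion can see nothing but 0 or 1.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tri-state check:
 * negative errno on failure, 0 if nothing is ready, 1 if something is.
 */
int check_ready(int hw_state)
{
	if (hw_state < 0)
		return -EIO;	/* pretend the register read failed */

	return hw_state > 0;	/* collapse any positive count to 0/1 */
}

/* Caller that decides how to treat errors, then narrows to bool. */
bool tstamps_pending(int hw_state)
{
	int ret = check_ready(hw_state);

	if (ret < 0)
		return false;	/* this particular caller stops on error */

	return ret;		/* only 0 or 1 can reach this point */
}
```

A different caller could just as well treat the negative case as "keep
going", which is the flexibility a plain bool return would give up.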

> +/**
> + * ice_check_phy_tx_tstamp_ready_e810 - Check Tx memory status register
> + * @hw: pointer to the HW struct
> + *
> + * The E810 devices do not have a Tx memory status register. Note this is
> + * intentionally different behavior from ice_get_phy_tx_tstamp_ready_e810
> + * which always says that all bits are ready. This function is called in cases
> + * where code will trigger interrupts if timestamps are waiting, and should
> + * not be called for E810 hardware.
> + *
> + * Return: 0.
> + */
> +static int ice_check_phy_tx_tstamp_ready_e810(struct ice_hw *hw)
> +{
> +	return 0;
> +}
> +

I got comments off list about this "unused parameter" and empty
function. The E810 hardware doesn't have a ready bitmap in its
registers. Thus, this function doesn't really do anything. I'm
considering if it makes more sense to move this comment into the
hardware wrapper layer and drop this function (and thus its unused
parameter).
> +/**
> + * ice_check_phy_tx_tstamp_ready - Check PHY Tx timestamp memory status
> + * @hw: pointer to the HW struct
> + *
> + * Check the PHY for Tx timestamp memory status on all ports. If you need to
> + * see individual timestamp status for each index, use
> + * ice_get_phy_tx_tstamp_ready() instead.
> + *
> + * Return: 1 if any port has timestamps available, 0 if there are no timestamps
> + * available, and a negative error code on failure.
> + */

This type of return is somewhat confusing and rare. Typically functions
returning int either return 0 for success or a non-zero error code. In
this case there are two "success" states. C doesn't have anything like
the Error types from newer languages.

The closest example I have is functions that read from a socket or
buffer or file, which return negative error codes or the number of bytes
read/written.

I could change this API to return the total number of timestamps
available.. but every caller just cares about whether there is at least
one timestamp, so I opted to shrink it to 0/1 in order to allow eliding
unnecessary checks for devices with multiple ports.

Thoughts?

Thanks,
Jake


* Re: [PATCH net-next v1] net: add missing syncookie statistics for BPF custom syncookies
From: Kuniyuki Iwashima @ 2026-04-10 22:31 UTC
  To: Jiayuan Chen
  Cc: netdev, Eric Dumazet, Neal Cardwell, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern,
	linux-kernel, bpf
In-Reply-To: <20260409124129.361777-1-jiayuan.chen@linux.dev>

On Thu, Apr 9, 2026 at 5:41 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> 1. Replace IS_ENABLED(CONFIG_BPF) with CONFIG_BPF_SYSCALL for
>    cookie_bpf_ok() and cookie_bpf_check(). CONFIG_BPF is selected by
>    CONFIG_NET unconditionally, so IS_ENABLED(CONFIG_BPF) is always
>    true and provides no real guard. CONFIG_BPF_SYSCALL is the correct
>    config for BPF program functionality.
>
> 2. Remove the CONFIG_BPF_SYSCALL guard around struct bpf_tcp_req_attrs.
>    This struct is referenced by bpf_sk_assign_tcp_reqsk() in
>    net/core/filter.c which is compiled unconditionally, so wrapping
>    the definition in a config guard could cause build failures when
>    CONFIG_BPF_SYSCALL=n.
>
> 3. Fix mismatched declaration of cookie_bpf_check() between the
>    CONFIG_BPF_SYSCALL and stub paths: the real definition takes
>    'struct net *net' but the declaration in the header did not.
>    Add the net parameter to the declaration and all call sites.
>
> 4. Add missing LINUX_MIB_SYNCOOKIESRECV and LINUX_MIB_SYNCOOKIESFAILED
>    statistics in cookie_bpf_check(), so that BPF custom syncookie
>    validation is accounted for in SNMP counters just like the
>    non-BPF path.
>
> Compile-tested with CONFIG_BPF_SYSCALL=y and CONFIG_BPF_SYSCALL
> not set.

Can you add an SNMP test in tcp_custom_syncookie.c that checks
the delta between before connect_to_fd() and after accept() ?
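
As a rough sketch of what such a delta check needs, the helper below looks up
one counter in a header/value line pair laid out like /proc/net/netstat. The
function name netstat_lookup() is hypothetical, and reading the file itself
and wiring this into the selftest are left out; it only shows the
name-to-value matching.

```c
#include <stdlib.h>
#include <string.h>

/* Look up one counter in a header/value line pair laid out like
 * /proc/net/netstat ("TcpExt: NameA NameB ..." / "TcpExt: 1 2 ...").
 * Returns 0 and stores the value in *out, or -1 if key is not found.
 */
int netstat_lookup(const char *names, const char *values,
		   const char *key, unsigned long long *out)
{
	size_t klen = strlen(key);

	while (*names && *values) {
		size_t nlen, vlen;

		/* Skip separators, then measure the next token on each line. */
		names += strspn(names, " \n");
		values += strspn(values, " \n");
		nlen = strcspn(names, " \n");
		vlen = strcspn(values, " \n");
		if (!nlen || !vlen)
			break;

		if (nlen == klen && !strncmp(names, key, klen)) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}

		/* Advance both lines in lockstep to keep columns aligned. */
		names += nlen;
		values += vlen;
	}

	return -1;
}
```

A test would call this once before connect_to_fd() and once after accept()
for SyncookiesRecv (and SyncookiesFailed), then compare the two values.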


>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> No functional bug here — CONFIG_BPF is always enabled under
> CONFIG_NET, so the existing code compiles and works correctly.
> This is a cleanup and improvement, no backport needed.
> ---
>  include/net/tcp.h     |  7 +++----
>  net/ipv4/syncookies.c | 10 +++++++---
>  net/ipv6/syncookies.c |  2 +-
>  3 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 6156d1d068e1..570a8836c2ba 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -598,7 +598,6 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
>                                             struct tcp_options_received *tcp_opt,
>                                             int mss, u32 tsoff);
>
> -#if IS_ENABLED(CONFIG_BPF)
>  struct bpf_tcp_req_attrs {
>         u32 rcv_tsval;
>         u32 rcv_tsecr;
> @@ -612,7 +611,6 @@ struct bpf_tcp_req_attrs {
>         u8 usec_ts_ok;
>         u8 reserved[3];
>  };
> -#endif
>
>  #ifdef CONFIG_SYN_COOKIES
>
> @@ -715,13 +713,14 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
>                 dst_feature(dst, RTAX_FEATURE_ECN);
>  }
>
> -#if IS_ENABLED(CONFIG_BPF)
> +#ifdef CONFIG_BPF_SYSCALL
>  static inline bool cookie_bpf_ok(struct sk_buff *skb)
>  {
>         return skb->sk;
>  }
>
> -struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
> +struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> +                                     struct sk_buff *skb);
>  #else
>  static inline bool cookie_bpf_ok(struct sk_buff *skb)
>  {
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index f1474598d2c8..d685631438cb 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -295,8 +295,9 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
>         return 0;
>  }
>
> -#if IS_ENABLED(CONFIG_BPF)
> -struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
> +#ifdef CONFIG_BPF_SYSCALL
> +struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
> +                                     struct sk_buff *skb)
>  {
>         struct request_sock *req = inet_reqsk(skb->sk);
>
> @@ -306,6 +307,9 @@ struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
>         if (cookie_tcp_reqsk_init(sk, skb, req)) {
>                 reqsk_free(req);
>                 req = NULL;
> +               __NET_INC_STATS(net, LINUX_MIB_SYNCOOKIESFAILED);
> +       } else {
> +               __NET_INC_STATS(net, LINUX_MIB_SYNCOOKIESRECV);
>         }
>
>         return req;
> @@ -419,7 +423,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
>                 goto out;
>
>         if (cookie_bpf_ok(skb)) {
> -               req = cookie_bpf_check(sk, skb);
> +               req = cookie_bpf_check(net, sk, skb);
>         } else {
>                 req = cookie_tcp_check(net, sk, skb);
>                 if (IS_ERR(req))
> diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
> index 4f6f0d751d6c..111d7a41d957 100644
> --- a/net/ipv6/syncookies.c
> +++ b/net/ipv6/syncookies.c
> @@ -190,7 +190,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
>                 goto out;
>
>         if (cookie_bpf_ok(skb)) {
> -               req = cookie_bpf_check(sk, skb);
> +               req = cookie_bpf_check(net, sk, skb);
>         } else {
>                 req = cookie_tcp_check(net, sk, skb);
>                 if (IS_ERR(req))
> --
> 2.43.0
>


* Re: [PATCH v2] net: hamradio: 6pack: fix uninit-value in sixpack_receive_buf
From: patchwork-bot+netdevbpf @ 2026-04-10 22:30 UTC
  To: Mashiro Chen
  Cc: netdev, horms, davem, edumazet, kuba, pabeni, linux-hams, stable,
	syzbot+ecdb8c9878a81eb21e54
In-Reply-To: <20260407173101.107352-1-mashiro.chen@mailbox.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 Apr 2026 01:31:01 +0800 you wrote:
> sixpack_receive_buf() does not properly skip bytes with TTY error flags.
> The while loop iterates through the flags buffer but never advances the
> data pointer (cp), and passes the original count (including error bytes)
> to sixpack_decode(). This causes sixpack_decode() to process bytes that
> should have been skipped due to TTY errors.  The TTY layer does not
> guarantee that cp[i] holds a meaningful value when fp[i] is set, so
> passing those positions to sixpack_decode() results in KMSAN reporting
> an uninit-value read.
> 
> [...]

Here is the summary with links:
  - [v2] net: hamradio: 6pack: fix uninit-value in sixpack_receive_buf
    https://git.kernel.org/netdev/net/c/bf9a38803b26

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH 5/5] selftests: net: add rss_multiqueue test variant to iou-zcrx
From: David Wei @ 2026-04-10 22:26 UTC
  To: Juanlu Herrero, netdev
In-Reply-To: <20260408163816.2760-6-juanlu@fastmail.com>

On 2026-04-08 09:38, Juanlu Herrero wrote:
> Add multi-port support to the iou-zcrx test binary and a new
> rss_multiqueue Python test variant that exercises multi-queue zero-copy
> receive with per-port flow rule steering.
> 
> In multi-port mode, the server creates N listening sockets on
> consecutive ports (cfg_port, cfg_port+1, ...) and uses epoll to accept
> one connection per socket. Each client thread connects to its
> corresponding port. Per-port ntuple flow rules steer traffic to
> different NIC hardware queues, each with its own zcrx instance.
> 
> For single-thread mode (the default), behavior is unchanged: one socket
> on cfg_port, one thread, one queue.
> 
> Signed-off-by: Juanlu Herrero <juanlu@fastmail.com>
> ---
>   .../selftests/drivers/net/hw/iou-zcrx.c       | 81 ++++++++++++++-----
>   .../selftests/drivers/net/hw/iou-zcrx.py      | 45 ++++++++++-
>   2 files changed, 104 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c
> index 646682167bb0..1f33d7127185 100644
> --- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c
> +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c

Please make all changes in iou-zcrx.c in a single patch. Then patch 5
only changes the Python selftest.

[...]
> @@ -397,12 +410,36 @@ static void run_server(void)
>   	if (cfg_dry_run)
>   		goto join;
>   
> +	epfd = epoll_create1(0);
> +	if (epfd < 0)
> +		error(1, 0, "epoll_create1()");
> +
>   	for (i = 0; i < cfg_num_threads; i++) {
> -		ctxs[i].connfd = accept(fd, NULL, NULL);
> -		if (ctxs[i].connfd < 0)
> -			error(1, 0, "accept()");
> +		ev.events = EPOLLIN;
> +		ev.data.u32 = i;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0)
> +			error(1, 0, "epoll_ctl()");
>   	}
>   
> +	accepted = 0;
> +	while (accepted < cfg_num_threads) {

You're using epoll here but it is still accepting a fixed nr of
connections. The server should be able to accept an arbitrary nr of
connections, dispatching them to the server worker threads.

Also with multiple queues, connections must be dispatched according to
their NAPI IDs to the correct server workers.
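
A minimal sketch of the dispatch step, assuming each worker records the NAPI
ID of the queue it serves. struct worker_map, conn_napi_id() and
worker_for_napi_id() are hypothetical names, not existing helpers in
iou-zcrx.c; SO_INCOMING_NAPI_ID is the real socket option used to query the
ID of an accepted connection.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56	/* asm-generic value; some arches differ */
#endif

#define MAX_WORKERS 8

/* Hypothetical per-worker record: the NAPI ID of the queue it serves. */
struct worker_map {
	unsigned int napi_id[MAX_WORKERS];
	int nr_workers;
};

/* Ask the kernel which NAPI instance last delivered data for this socket.
 * Returns 0 if the ID is unknown or the query fails.
 */
unsigned int conn_napi_id(int connfd)
{
	unsigned int id = 0;
	socklen_t len = sizeof(id);

	if (getsockopt(connfd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &id, &len) < 0)
		return 0;

	return id;
}

/* Pick the worker whose queue matches; -1 means no worker owns that queue. */
int worker_for_napi_id(const struct worker_map *map, unsigned int napi_id)
{
	int i;

	for (i = 0; i < map->nr_workers; i++)
		if (map->napi_id[i] == napi_id)
			return i;

	return -1;
}
```

The accept loop would then call conn_napi_id() on each new connfd and hand
the connection to worker_for_napi_id(), instead of indexing by listening
socket.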

> +		nfds = epoll_wait(epfd, events, 64, 5000);
> +		if (nfds < 0)
> +			error(1, 0, "epoll_wait()");
> +		if (nfds == 0)
> +			error(1, 0, "epoll_wait() timeout");
> +
> +		for (i = 0; i < nfds; i++) {
> +			int idx = events[i].data.u32;
> +
> +			ctxs[idx].connfd = accept(fds[idx], NULL, NULL);
> +			if (ctxs[idx].connfd < 0)
> +				error(1, 0, "accept()");
> +			accepted++;
> +		}
> +	}
> +
> +	close(epfd);
>   	pthread_barrier_wait(&barrier);
>   
>   join:


* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Kuniyuki Iwashima @ 2026-04-10 22:11 UTC
  To: kuba
  Cc: davem, edumazet, gregkh, horms, linux-hams, linux-kernel, netdev,
	pabeni, stable, workflows, yizhe
In-Reply-To: <20260410145448.38253e3c@kernel.org>

From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 10 Apr 2026 14:54:48 -0700
> On Fri, 10 Apr 2026 14:30:42 -0700 Jakub Kicinski wrote:
> > On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
> > > On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:  
> > > > Or for simplicity we could also be testing against skb_headlen()
> > > > since we don't expect any legit non-linear frames here? Dunno.    
> > > 
> > > I'll be glad to change this either way, your call.  Given that this is
> > > an obsolete protocol that seems to only be a target for drive-by fuzzers
> > > to attack, whatever the simplest thing to do to quiet them up I'll be
> > > glad to implement.
> > > 
> > > Or can we just delete this stuff entirely?  :)  
> > 
> > Yes.
> > 
> > My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
> > Create GH repos which provide them as OOT modules.
> > Hopefully we can convince any existing users to switch to that.
> > 
> > The only thing stopping me is the concern that this is just the softest
> > target and the LLMs will find something else to focus on which we can't
> > delete. I suspect any PCIe driver can be flooded with "aren't you
> > trusting the HW to provide valid responses here?" bullshit.
> > 
> > But hey, let's try. I'll post a patch nuking all of hamradio later
> > today.
> 
> Well, either we "expunge" this code to OOT repos, or we mark it 
> as broken and tell everyone that we don't take security fixes
> for anything that depends on BROKEN. I'd personally rather expunge.

+1 for "expunge" to prevent LLM-based patch flood.

IIRC, we did that recently for one driver only used by OpenWRT ?


> 
> cc: workflows, we can't be the only ones still nursing Linux 2.2 code


* Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
From: Jacob Keller @ 2026-04-10 22:09 UTC
  To: Korba, Przemyslaw, Simon Horman
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Nguyen, Anthony L, Kitszel, Przemyslaw
In-Reply-To: <PH0PR11MB4904AB3C1045FA1A2F99A03194592@PH0PR11MB4904.namprd11.prod.outlook.com>

On 4/10/2026 6:20 AM, Korba, Przemyslaw wrote:
>> -----Original Message-----
>> From: Keller, Jacob E <jacob.e.keller@intel.com>
>> Sent: Thursday, March 26, 2026 1:15 AM
>> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>; Simon Horman <horms@kernel.org>
>> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
>> <przemyslaw.kitszel@intel.com>
>> Subject: Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
>>
>> On 3/13/2026 6:47 AM, Korba, Przemyslaw wrote:
>>>> -----Original Message-----
>>>> From: Simon Horman <horms@kernel.org>
>>>> Sent: Friday, March 13, 2026 2:35 PM
>>>> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>
>>>> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
>>>> <przemyslaw.kitszel@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>
>>>> Subject: Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
>> Yes, Simon is correct, but we do have to be certain that the driver
>> actually implements the facts correctly, i.e. that it will actually
>> honor the RISING or FALLING edge, before you actually add the flags to
>> the supported flags list.
>>
>> I don't see any mention of PTP_RISING_EDGE nor PTP_FALLING_EDGE in the
>> driver. Thus, I can't confirm which edge is actually timestamped.
>>
>> Thus I would NACK this patch until you can confirm whether the hardware
>> either a) timestamps one edge, in which case you should set only that
>> flag as allowed, b) timestamps both edges, in which case you should set
>> all flags and then explicitly reject the case where only one flag is
>> set, or c) can be configured based on which flag is set, in which case
>> you should set all the flags and then check the flags when programming
>> to enable the appropriate edge.
>>
>> This patch does none of these, and is therefor incorrect. Applying it
>> will "allow" the userspace to work but they will not get the strict
>> behavior of timestamping the desired edge, which completely negates the
>> point of the strict mode!
>>
>> As an example, look at the ice driver:
>>
>> #define GLTSYN_AUX_IN_0_EVNTLVL_RISING_EDGE     BIT(0)
>> #define GLTSYN_AUX_IN_0_EVNTLVL_FALLING_EDGE    BIT(1)
>>
>>                 /* set event level to requested edge */
>>                 if (rq->flags & PTP_FALLING_EDGE)
>>                         aux_reg |= GLTSYN_AUX_IN_0_EVNTLVL_FALLING_EDGE;
>>                 if (rq->flags & PTP_RISING_EDGE)
>>                         aux_reg |= GLTSYN_AUX_IN_0_EVNTLVL_RISING_EDGE;
>>
>>
>> It sets the appropriate register values to ensure the correct edges are
>> timestamped as requested.
>>
>> Thanks,
>> Jake
> 
> Hi, thank you for your review, and sorry for late response. 
> The original point of this patch was to fix the issue, where ts2phc fails due to not seeing supported flags
> (now when I think about it iwl-net would be a better place for this patch)
> I've read in our documentation FVL supports both rising, falling and both edges, 
> but in i40e_ptp_set_timestamp_mode we are hardcoding EVNTLVL register to Rising edge only. 
> Implementing other edges would require DCR, and I couldn't find anything like that. 
> I think for now setting the rising edge as a supported flag would be the way to go. Do you agree?

Agreed. I would propose the following path to resolving this:

a) as a net fix, set the flag to match what the driver actually enables.
If this is Rising edge only, then set the rising edge and strict flag.

Strictly, this is a bug introduced by commit 1050713026a0 ("i40e: add
support for PTP external synchronization clock"), because that commit
added support for EXTTS output without checking the flags. However, the
original issue was being too accepting, while the new issue is caused by
the flags being silently excluded because they aren't listed as
supported. Thus, use Fixes: 7c571ac57d9d ("net: ptp: introduce
.supported_extts_flags to ptp_clock_info")

Since you confirmed that the driver explicitly sets rising edge support,
you should set STRICT_FLAGS and RISING_EDGE in the
.supported_extts_flags field.

b) as a net-next improvement, add support for both flags and
conditionally set the register to program the desired edge. This might
take longer if we need to go through internal approval for a new
feature, but in my opinion this is small and obviously correct and
something we should support within the PTP ecosystem. Since the window
will be closing soon this part should wait until after it re-opens in 2
weeks.
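
For reference, a small standalone sketch of how a supported-flags mask
filters requests under option (a). The EX_* constants are local stand-ins
assumed to mirror the uapi bit layout in <linux/ptp_clock.h>, and
extts_request_ok() is a hypothetical helper, not the PTP core's actual
check.

```c
#include <stdbool.h>

/* Local stand-ins, assumed to mirror the bits in <linux/ptp_clock.h>;
 * redefined here only to keep the sketch self-contained.
 */
#define EX_PTP_ENABLE_FEATURE	(1u << 0)
#define EX_PTP_RISING_EDGE	(1u << 1)
#define EX_PTP_FALLING_EDGE	(1u << 2)
#define EX_PTP_STRICT_FLAGS	(1u << 3)

/* What option (a) would advertise: rising edge only, strict mode. */
#define EX_SUPPORTED_EXTTS_FLAGS (EX_PTP_RISING_EDGE | EX_PTP_STRICT_FLAGS)

/* Accept a request only if every flag it sets is either advertised
 * by the driver or is the generic enable bit.
 */
bool extts_request_ok(unsigned int req_flags, unsigned int supported)
{
	unsigned int allowed = supported | EX_PTP_ENABLE_FEATURE;

	return !(req_flags & ~allowed);
}
```

With this mask a strict rising-edge request passes, while a strict
falling-edge request is rejected instead of being silently timestamped on
the wrong edge.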


* [PATCH net-next] net: airoha: Wait for TX to complete in airoha_dev_stop()
From: Lorenzo Bianconi @ 2026-04-10 22:05 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: linux-arm-kernel, linux-mediatek, netdev, Lorenzo Bianconi

Wait for TX to complete in the airoha_dev_stop routine before stopping the
TX DMA and running the airoha_qdma_cleanup_tx_queue routine. Moreover,
start/stop the TX/RX NAPIs in the ndo_open()/ndo_stop() callbacks in order
to be sure the TX NAPIs have completed before stopping the TX DMA engine in
the airoha_dev_stop routine.
Please note this patch is based on commit b1c803d5c816 ("net: airoha: Rework
the code flow in airoha_remove() and in airoha_probe() error path"),
which is available only in the net-next tree at the moment.
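
The wait-then-stop ordering can be sketched outside the kernel as a bounded
poll on a busy bit. This is only an illustration of the idea, not the
read_poll_timeout() implementation: poll_until_idle(), fake_busy_reg() and
fake_delay() are hypothetical names, and the timeout is approximated as the
sum of the requested sleeps rather than measured wall-clock time.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t (*reg_read_fn)(void *ctx);
typedef void (*delay_fn)(unsigned long usecs);

/* Poll a register until busy_mask clears or roughly timeout_us has been
 * spent sleeping. sleep_us must be non-zero. Returns 0 or -ETIMEDOUT.
 */
int poll_until_idle(reg_read_fn rd, void *ctx, uint32_t busy_mask,
		    delay_fn delay, unsigned long sleep_us,
		    unsigned long timeout_us)
{
	unsigned long waited = 0;

	for (;;) {
		if (!(rd(ctx) & busy_mask))
			return 0;
		if (waited >= timeout_us)
			return -ETIMEDOUT;
		delay(sleep_us);
		waited += sleep_us;
	}
}

/* Hypothetical fake register: reports busy for the next *ctx reads. */
uint32_t fake_busy_reg(void *ctx)
{
	int *busy_reads_left = ctx;

	if (*busy_reads_left > 0) {
		(*busy_reads_left)--;
		return 0x1;
	}

	return 0;
}

/* Hypothetical delay hook that does not actually sleep. */
void fake_delay(unsigned long usecs)
{
	(void)usecs;
}
```

Only once such a poll returns 0 is it safe to stop the NAPIs and clear the
DMA enable bits, which is the ordering the diff below implements.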

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 44 +++++++++++++++++---------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 8e4b043af4bc..9e40c8f375c1 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1662,10 +1662,12 @@ static int airoha_dev_open(struct net_device *dev)
 		      FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
 		      FIELD_PREP(GDM_LONG_LEN_MASK, len));
 
-	airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG,
-			GLOBAL_CFG_TX_DMA_EN_MASK |
-			GLOBAL_CFG_RX_DMA_EN_MASK);
-	atomic_inc(&qdma->users);
+	if (!atomic_fetch_inc(&qdma->users)) {
+		airoha_qdma_set(qdma, REG_QDMA_GLOBAL_CFG,
+				GLOBAL_CFG_TX_DMA_EN_MASK |
+				GLOBAL_CFG_RX_DMA_EN_MASK);
+		airoha_qdma_start_napi(qdma);
+	}
 
 	if (port->id == AIROHA_GDM2_IDX &&
 	    airoha_ppe_is_enabled(qdma->eth, 1)) {
@@ -1684,18 +1686,26 @@ static int airoha_dev_stop(struct net_device *dev)
 	struct airoha_qdma *qdma = port->qdma;
 	int i, err;
 
-	netif_tx_disable(dev);
 	err = airoha_set_vip_for_gdm_port(port, false);
 	if (err)
 		return err;
 
-	for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
-		netdev_tx_reset_subqueue(dev, i);
-
 	airoha_set_gdm_port_fwd_cfg(qdma->eth, REG_GDM_FWD_CFG(port->id),
 				    FE_PSE_PORT_DROP);
 
+	netif_tx_disable(dev);
 	if (atomic_dec_and_test(&qdma->users)) {
+		u32 val;
+
+		/* Wait for TX to complete */
+		err = read_poll_timeout(airoha_qdma_rr, val,
+					!(val & GLOBAL_CFG_TX_DMA_BUSY_MASK),
+					USEC_PER_MSEC, 100 * USEC_PER_MSEC,
+					false, qdma, REG_QDMA_GLOBAL_CFG);
+		if (err)
+			return err;
+
+		airoha_qdma_stop_napi(qdma);
 		airoha_qdma_clear(qdma, REG_QDMA_GLOBAL_CFG,
 				  GLOBAL_CFG_TX_DMA_EN_MASK |
 				  GLOBAL_CFG_RX_DMA_EN_MASK);
@@ -1708,6 +1718,9 @@ static int airoha_dev_stop(struct net_device *dev)
 		}
 	}
 
+	for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++)
+		netdev_tx_reset_subqueue(dev, i);
+
 	return 0;
 }
 
@@ -3048,9 +3061,6 @@ static int airoha_probe(struct platform_device *pdev)
 	if (err)
 		goto error_netdev_free;
 
-	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_start_napi(&eth->qdma[i]);
-
 	for_each_child_of_node(pdev->dev.of_node, np) {
 		if (!of_device_is_compatible(np, "airoha,eth-mac"))
 			continue;
@@ -3061,20 +3071,17 @@ static int airoha_probe(struct platform_device *pdev)
 		err = airoha_alloc_gdm_port(eth, np);
 		if (err) {
 			of_node_put(np);
-			goto error_napi_stop;
+			goto error_netdev_unregister;
 		}
 	}
 
 	err = airoha_register_gdm_devices(eth);
 	if (err)
-		goto error_napi_stop;
+		goto error_netdev_unregister;
 
 	return 0;
 
-error_napi_stop:
-	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_stop_napi(&eth->qdma[i]);
-
+error_netdev_unregister:
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
 
@@ -3098,9 +3105,6 @@ static void airoha_remove(struct platform_device *pdev)
 	struct airoha_eth *eth = platform_get_drvdata(pdev);
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
-		airoha_qdma_stop_napi(&eth->qdma[i]);
-
 	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
 		struct airoha_gdm_port *port = eth->ports[i];
 

---
base-commit: 42f9b4c6ef19e71d2c7d9bfd3c5037d4fe434ad7
change-id: 20260410-airoha-fix-ndo_stop-ebbf3c724ae0

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>



* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Jakub Kicinski @ 2026-04-10 21:54 UTC
  To: Greg Kroah-Hartman
  Cc: Simon Horman, netdev, linux-kernel, David S. Miller, Eric Dumazet,
	Paolo Abeni, linux-hams, Yizhe Zhuang, stable, workflows
In-Reply-To: <20260410143042.1d4436de@kernel.org>

On Fri, 10 Apr 2026 14:30:42 -0700 Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
> > On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:  
> > > Or for simplicity we could also be testing against skb_headlen()
> > > since we don't expect any legit non-linear frames here? Dunno.    
> > 
> > I'll be glad to change this either way, your call.  Given that this is
> > an obsolete protocol that seems to only be a target for drive-by fuzzers
> > to attack, whatever the simplest thing to do to quiet them up I'll be
> > glad to implement.
> > 
> > Or can we just delete this stuff entirely?  :)  
> 
> Yes.
> 
> My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
> Create GH repos which provide them as OOT modules.
> Hopefully we can convince any existing users to switch to that.
> 
> The only thing stopping me is the concern that this is just the softest
> target and the LLMs will find something else to focus on which we can't
> delete. I suspect any PCIe driver can be flooded with "aren't you
> trusting the HW to provide valid responses here?" bullshit.
> 
> But hey, let's try. I'll post a patch nuking all of hamradio later
> today.

Well, either we "expunge" this code to OOT repos, or we mark it 
as broken and tell everyone that we don't take security fixes
for anything that depends on BROKEN. I'd personally rather expunge.

cc: workflows, we can't be the only ones still nursing Linux 2.2 code


* [PATCH v1 net-next] selftest: net: Use port outside of the default ip_local_ports in csum.c.
From: Kuniyuki Iwashima @ 2026-04-10 21:54 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, Mahesh Bandewar,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

csum.c binds a socket on a fixed port in init_net to test
the csum offload feature between two machines.

In our testbed, the test sometimes fails with -EADDRINUSE.

  bind r: Address already in use
  bind dgram 6: Address already in use

The fixed ports (33000, 33001, 34000) are all within the default
ip_local_port_range (32768 ~ 60999), and other processes may
happen to be using them.

Let's use ports outside of the default ip_local_port_range to
deflake the test.

  # grep -E "(13000|13001|13002)" /etc/services || echo no service
  no service
  # rpm -qf /etc/services
  setup-2.15.0-28.fc44.noarch

We could add an option to specify ports if needed.
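
As a rough illustration of the reasoning above (not part of the patch),
the sketch below checks whether a port falls inside the default range
quoted in this message. The helper port_in_default_local_range() and the
macro names are hypothetical; the bounds 32768 and 60999 are the defaults
cited above and may differ on a system where the sysctl has been changed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default bounds quoted in the commit message; a sysctl may override them. */
#define DEFAULT_LOCAL_PORT_MIN 32768
#define DEFAULT_LOCAL_PORT_MAX 60999

/* Hypothetical helper: true if the port lies in the default ephemeral
 * range, i.e. another process may have been handed it automatically.
 */
static bool port_in_default_local_range(uint16_t port)
{
	return port >= DEFAULT_LOCAL_PORT_MIN && port <= DEFAULT_LOCAL_PORT_MAX;
}
```

With these bounds the old ports (33000, 33001, 34000) fall inside the
range, while the new ones (13000, 13001, 13002) fall outside it.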

Suggested-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 tools/testing/selftests/net/lib/csum.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/lib/csum.c b/tools/testing/selftests/net/lib/csum.c
index e28884ce3ab3..4e044689bc37 100644
--- a/tools/testing/selftests/net/lib/csum.c
+++ b/tools/testing/selftests/net/lib/csum.c
@@ -105,9 +105,9 @@ static char *cfg_mac_src;
 static int cfg_proto = IPPROTO_UDP;
 static int cfg_payload_char = 'a';
 static int cfg_payload_len = 100;
-static uint16_t cfg_port_dst = 34000;
-static uint16_t cfg_port_src = 33000;
-static uint16_t cfg_port_src_encap = 33001;
+static uint16_t cfg_port_dst = 13000;
+static uint16_t cfg_port_src = 13001;
+static uint16_t cfg_port_src_encap = 13002;
 static unsigned int cfg_random_seed;
 static int cfg_rcvbuf = 1 << 22;	/* be able to queue large cfg_num_pkt */
 static bool cfg_send_pfpacket;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH net] net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()
From: Lorenzo Bianconi @ 2026-04-10 21:49 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: linux-arm-kernel, linux-mediatek, netdev, Lorenzo Bianconi

Similar to airoha_qdma_cleanup_rx_queue(), reset DMA TX descriptors in
the airoha_qdma_cleanup_tx_queue() routine. Moreover, reset TX_DMA_IDX to
TX_CPU_IDX to notify the NIC that the QDMA TX ring is empty.
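
A minimal sketch of the idea, not the driver's code: in a
producer/consumer descriptor ring, the ring is commonly treated as empty
when the two indices match, and an entry's index can be recovered by
pointer subtraction from the array base. The struct and function names
below are hypothetical and assume that convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a TX ring; not the airoha structures. */
struct demo_entry {
	int in_use;
};

struct demo_ring {
	struct demo_entry entry[8];
	uint16_t cpu_idx;	/* next slot the CPU would fill */
	uint16_t dma_idx;	/* next slot the hardware would consume */
};

/* Index of an entry, recovered by pointer subtraction from the array base. */
static uint16_t demo_entry_index(const struct demo_ring *r,
				 const struct demo_entry *e)
{
	return (uint16_t)(e - r->entry);
}

/* Point both indices at the same slot so the ring reads as empty. */
static void demo_ring_reset(struct demo_ring *r, const struct demo_entry *e)
{
	uint16_t index = demo_entry_index(r, e);

	r->cpu_idx = index;
	r->dma_idx = index;
}

static bool demo_ring_empty(const struct demo_ring *r)
{
	return r->cpu_idx == r->dma_idx;
}
```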

Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 9285a68f435f..963ab7b8d166 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1044,13 +1044,17 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
 
 static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 {
-	struct airoha_eth *eth = q->qdma->eth;
-	int i;
+	struct airoha_qdma *qdma = q->qdma;
+	struct airoha_eth *eth = qdma->eth;
+	int i, qid = q - &qdma->q_tx[0];
+	struct airoha_queue_entry *e;
+	u16 index;
 
 	spin_lock_bh(&q->lock);
 	for (i = 0; i < q->ndesc; i++) {
-		struct airoha_queue_entry *e = &q->entry[i];
+		struct airoha_qdma_desc *desc = &q->desc[i];
 
+		e = &q->entry[i];
 		if (!e->dma_addr)
 			continue;
 
@@ -1060,8 +1064,29 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 		e->dma_addr = 0;
 		e->skb = NULL;
 		list_add_tail(&e->list, &q->tx_list);
+
+		/* Reset DMA descriptor */
+		WRITE_ONCE(desc->ctrl, 0);
+		WRITE_ONCE(desc->addr, 0);
+		WRITE_ONCE(desc->data, 0);
+		WRITE_ONCE(desc->msg0, 0);
+		WRITE_ONCE(desc->msg1, 0);
+		WRITE_ONCE(desc->msg2, 0);
+
 		q->queued--;
 	}
+
+	e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
+			     list);
+	index = e - q->entry;
+	/* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is
+	 * empty.
+	 */
+	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
+			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
+	airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK,
+			FIELD_PREP(TX_RING_DMA_IDX_MASK, index));
+
 	spin_unlock_bh(&q->lock);
 }
 

---
base-commit: 12ff2a4aee6c86746623d5aed24389dbf6dffded
change-id: 20260410-airoha_qdma_cleanup_tx_queue-fix-net-93375f5ee80f

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>


^ permalink raw reply related

* Re: [PATCH next-next v2] net:mctp: fix setting mctp hdr ver reserved field cause  data dropped
From: Jakub Kicinski @ 2026-04-10 21:43 UTC (permalink / raw)
  To: wit_yuan; +Cc: netdev, Yuan Zhaoming
In-Reply-To: <20260410063334.7531-2-yuanzhaoming901030@126.com>

On Fri, 10 Apr 2026 14:33:34 +0800 wit_yuan wrote:
> From: Yuan Zhaoming <yuanzm2@lenovo.com>
> 
> from spec dsp0236_1.2.1.pdf page 26, the mctp header contains the
> RSVD(4bit) and Hdr version(4 bit).
> 
> mctp_pkttype_receive() calls mctp_hdr() and compares the whole mh->ver
> byte against MCTP_VER_MIN and MCTP_VER_MAX, so the reserved bits may be
> misinterpreted as part of the version.
> 
> one type hba card will set pcie vdm header Pad Len the same with RSVD
> bit, this will not work on mctp kernel solution.

We ask that:
 - new patches are not posted in reply to old ones
 - at least 24h pass before a new version is posted


> +	if (((mh->ver & MCTP_HDR_VER_MASK)) < MCTP_VER_MIN || (mh->ver & MCTP_HDR_VER_MASK) > MCTP_VER_MAX)
>  		goto out;

Please save the masked version to a variable and compare that extracted
variable. This line is unnecessarily long (max line length is 80 in
networking).
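
For illustration only, the requested shape might look like the sketch
below. The constant values are assumptions made for this example (a 4-bit
version field in the low nibble and a single accepted version), and
mctp_ver_ok() is a hypothetical helper, not a function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; check the real headers. */
#define MCTP_HDR_VER_MASK	0x0f
#define MCTP_VER_MIN		1
#define MCTP_VER_MAX		1

/* Hypothetical helper: extract the version once, then range-check it. */
static bool mctp_ver_ok(uint8_t ver_byte)
{
	uint8_t ver = ver_byte & MCTP_HDR_VER_MASK;

	return ver >= MCTP_VER_MIN && ver <= MCTP_VER_MAX;
}
```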
-- 
pw-bot: cr

^ permalink raw reply

* Re: [PATCH net] netrom: do some basic forms of validation on incoming frames
From: Jakub Kicinski @ 2026-04-10 21:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Simon Horman, netdev, linux-kernel, David S. Miller, Eric Dumazet,
	Paolo Abeni, linux-hams, Yizhe Zhuang, stable
In-Reply-To: <2026041026-excuse-slashing-c4ee@gregkh>

On Fri, 10 Apr 2026 07:24:36 +0200 Greg Kroah-Hartman wrote:
> On Thu, Apr 09, 2026 at 08:32:35PM -0700, Jakub Kicinski wrote:
> > On Thu, 9 Apr 2026 20:03:28 +0100 Simon Horman wrote:  
> > > I expect that checking skb->len isn't sufficient here
> > > and pskb_may_pull needs to be used to ensure that
> > > the data is also available in the linear section of the skb.  
> > 
> > Or for simplicity we could also be testing against skb_headlen()
> > since we don't expect any legit non-linear frames here? Dunno.  
> 
> I'll be glad to change this either way, your call.  Given that this is
> an obsolete protocol that seems to only be a target for drive-by fuzzers
> to attack, whatever the simplest thing to do to quiet them up I'll be
> glad to implement.
> 
> Or can we just delete this stuff entirely?  :)

Yes.

My thinking is to delete hamradio, nfc, atm, caif.. [more to come]
Create GH repos which provide them as OOT modules.
Hopefully we can convince any existing users to switch to that.

The only thing stopping me is the concern that this is just the softest
target and the LLMs will find something else to focus on which we can't
delete. I suspect any PCIe driver can be flooded with "aren't you
trusting the HW to provide valid responses here?" bullshit.

But hey, let's try. I'll post a patch nuking all of hamradio later
today.

^ permalink raw reply

* [PATCH v2 net-next 15/15] ip6mr: Replace RTNL with a dedicated mutex for MFC.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

ip6mr does not have rtnetlink interface for MFC unlike ipmr,
which uses dev_get_by_index_rcu() to set struct mfcctl.mfcc_parent.

ip6mr_mfc_add() and ip6mr_mfc_delete() are called under RTNL
from ip6_mroute_setsockopt() only.

There is no RTNL dependency, but ip6_mroute_setsockopt() reuses
RTNL just for mrt->mfc_hash and mrt->mfc_cache_list.

Let's replace RTNL with a new per-netns mutex.

Later, ip6mr_notifier_ops and ipmr_seq will be moved under
CONFIG_IPV6_MROUTE.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 include/net/netns/ipv6.h |  1 +
 net/ipv6/ip6mr.c         | 21 ++++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 499e4288170f..83ac9c82d7dc 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -112,6 +112,7 @@ struct netns_ipv6 {
 	struct list_head	mr6_tables;
 	struct fib_rules_ops	*mr6_rules_ops;
 #endif
+	struct mutex		mfc_mutex;
 #endif
 	atomic_t		dev_addr_genid;
 	atomic_t		fib6_sernum;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index a31e3b740581..67385de7befe 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1259,7 +1259,6 @@ static int ip6mr_mfc_delete(struct mr_table *mrt, struct mf6cctl *mfc,
 {
 	struct mfc6_cache *c;
 
-	/* The entries are added/deleted only under RTNL */
 	rcu_read_lock();
 	c = ip6mr_cache_find_parent(mrt, &mfc->mf6cc_origin.sin6_addr,
 				    &mfc->mf6cc_mcastgrp.sin6_addr, parent);
@@ -1349,6 +1348,8 @@ static int __net_init ip6mr_net_init(struct net *net)
 	LIST_HEAD(dev_kill_list);
 	int err;
 
+	mutex_init(&net->ipv6.mfc_mutex);
+
 	err = ip6mr_notifier_init(net);
 	if (err)
 		return err;
@@ -1477,7 +1478,6 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
 			ttls[i] = 1;
 	}
 
-	/* The entries are added/deleted only under RTNL */
 	rcu_read_lock();
 	c = ip6mr_cache_find_parent(mrt, &mfc->mf6cc_origin.sin6_addr,
 				    &mfc->mf6cc_mcastgrp.sin6_addr, parent);
@@ -1555,6 +1555,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
 static void mroute_clean_tables(struct mr_table *mrt, int flags,
 				struct list_head *dev_kill_list)
 {
+	struct net *net = read_pnet(&mrt->net);
 	struct mr_mfc *c, *tmp;
 	int i;
 
@@ -1571,18 +1572,21 @@ static void mroute_clean_tables(struct mr_table *mrt, int flags,
 
 	/* Wipe the cache */
 	if (flags & (MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC)) {
+		mutex_lock(&net->ipv6.mfc_mutex);
+
 		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
 			if (((c->mfc_flags & MFC_STATIC) && !(flags & MRT6_FLUSH_MFC_STATIC)) ||
 			    (!(c->mfc_flags & MFC_STATIC) && !(flags & MRT6_FLUSH_MFC)))
 				continue;
 			rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
 			list_del_rcu(&c->list);
-			call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
-						       FIB_EVENT_ENTRY_DEL,
+			call_ip6mr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
 						       (struct mfc6_cache *)c, mrt->id);
 			mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
 			mr_cache_put(c);
 		}
+
+		mutex_unlock(&net->ipv6.mfc_mutex);
 	}
 
 	if (flags & MRT6_FLUSH_MFC) {
@@ -1765,15 +1769,18 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			return -EFAULT;
 		if (parent == 0)
 			parent = mfc.mf6cc_parent;
-		rtnl_lock();
+
+		mutex_lock(&net->ipv6.mfc_mutex);
+
 		if (optname == MRT6_DEL_MFC || optname == MRT6_DEL_MFC_PROXY)
 			ret = ip6mr_mfc_delete(mrt, &mfc, parent);
 		else
 			ret = ip6mr_mfc_add(net, mrt, &mfc,
 					    sk ==
-					    rtnl_dereference(mrt->mroute_sk),
+					    rcu_access_pointer(mrt->mroute_sk),
 					    parent);
-		rtnl_unlock();
+
+		mutex_unlock(&net->ipv6.mfc_mutex);
 		return ret;
 
 	case MRT6_FLUSH:
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 14/15] ip6mr: Call fib_rules_unregister() without RTNL.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

fib_rules_unregister() removes ops from net->rules_ops under
spinlock, calls ops->delete() for each rule, and frees the ops.

ip6mr_rules_ops_template does not have ->delete(), and no
operation there requires RTNL.

Let's move fib_rules_unregister() from ip6mr_rules_exit_rtnl()
to ip6mr_net_exit().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 3b8867e150fe..a31e3b740581 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -259,6 +259,11 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	return err;
 }
 
+static void __net_exit ip6mr_rules_exit(struct net *net)
+{
+	fib_rules_unregister(net->ipv6.mr6_rules_ops);
+}
+
 static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
 					     struct list_head *dev_kill_list)
 {
@@ -268,8 +273,6 @@ static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
 		list_del_rcu(&mrt->list);
 		ip6mr_free_table(mrt, dev_kill_list);
 	}
-
-	fib_rules_unregister(net->ipv6.mr6_rules_ops);
 }
 
 static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -329,6 +332,10 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	return 0;
 }
 
+static void __net_exit ip6mr_rules_exit(struct net *net)
+{
+}
+
 static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
 					     struct list_head *dev_kill_list)
 {
@@ -1367,6 +1374,7 @@ static int __net_init ip6mr_net_init(struct net *net)
 	remove_proc_entry("ip6_mr_vif", net->proc_net);
 proc_vif_fail:
 	ip6mr_rules_exit_rtnl(net, &dev_kill_list);
+	ip6mr_rules_exit(net);
 #endif
 ip6mr_rules_fail:
 	ip6mr_notifier_exit(net);
@@ -1379,6 +1387,7 @@ static void __net_exit ip6mr_net_exit(struct net *net)
 	remove_proc_entry("ip6_mr_cache", net->proc_net);
 	remove_proc_entry("ip6_mr_vif", net->proc_net);
 #endif
+	ip6mr_rules_exit(net);
 	ip6mr_notifier_exit(net);
 }
 
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 13/15] ip6mr: Remove RTNL in ip6mr_rules_init() and ip6mr_net_init().
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

When ip6mr_free_table() is called from ip6mr_rules_init() or
ip6mr_net_init(), the netns is not yet published.

Thus, no device should have been registered, and
mroute_clean_tables() will not call mif6_delete(), so
unregister_netdevice_many() is unnecessary.

unregister_netdevice_many() does nothing if the list is empty,
but it requires RTNL due to the unconditional ASSERT_RTNL()
at the entry of unregister_netdevice_many_notify().

Let's remove unnecessary RTNL and ASSERT_RTNL() and instead
add WARN_ON_ONCE() in ip6mr_free_table().

Note that we use a local list for the new WARN_ON_ONCE() because
dev_kill_list passed from ip6mr_rules_exit_rtnl() may have some
devices when other ops->init() fails after ipmr during setup_net().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 860fce51819e..3b8867e150fe 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -253,10 +253,7 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	return 0;
 
 err2:
-	rtnl_lock();
 	ip6mr_free_table(mrt, &dev_kill_list);
-	unregister_netdevice_many(&dev_kill_list);
-	rtnl_unlock();
 err1:
 	fib_rules_unregister(ops);
 	return err;
@@ -267,7 +264,6 @@ static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
 {
 	struct mr_table *mrt, *next;
 
-	ASSERT_RTNL();
 	list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) {
 		list_del_rcu(&mrt->list);
 		ip6mr_free_table(mrt, dev_kill_list);
@@ -338,8 +334,6 @@ static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
 {
 	struct mr_table *mrt = rcu_dereference_protected(net->ipv6.mrt6, 1);
 
-	ASSERT_RTNL();
-
 	RCU_INIT_POINTER(net->ipv6.mrt6, NULL);
 	ip6mr_free_table(mrt, dev_kill_list);
 }
@@ -420,15 +414,19 @@ static void ip6mr_free_table(struct mr_table *mrt,
 			     struct list_head *dev_kill_list)
 {
 	struct net *net = read_pnet(&mrt->net);
+	LIST_HEAD(ip6mr_dev_kill_list);
 
 	WARN_ON_ONCE(!mr_can_free_table(net));
 
 	timer_shutdown_sync(&mrt->ipmr_expire_timer);
 	mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MIFS_STATIC |
 			    MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC,
-			    dev_kill_list);
+			    &ip6mr_dev_kill_list);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree_rcu(mrt, rcu);
+
+	WARN_ON_ONCE(!net_initialized(net) && !list_empty(&ip6mr_dev_kill_list));
+	list_splice(&ip6mr_dev_kill_list, dev_kill_list);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -1368,10 +1366,7 @@ static int __net_init ip6mr_net_init(struct net *net)
 proc_cache_fail:
 	remove_proc_entry("ip6_mr_vif", net->proc_net);
 proc_vif_fail:
-	rtnl_lock();
 	ip6mr_rules_exit_rtnl(net, &dev_kill_list);
-	unregister_netdevice_many(&dev_kill_list);
-	rtnl_unlock();
 #endif
 ip6mr_rules_fail:
 	ip6mr_notifier_exit(net);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 12/15] ip6mr: Convert ip6mr_net_exit_batch() to ->exit_rtnl().
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

ip6mr_net_ops uses ->exit_batch() to acquire RTNL only once
for dying network namespaces.

ip6mr does not depend on the ordering of ->exit_rtnl() and
->exit_batch() of other pernet_operations (unlike fib_net_ops).

Once ip6mr_free_table() is called and all devices are
queued for destruction in ->exit_rtnl(), later during
NETDEV_UNREGISTER, ip6mr_device_event() will not see anything
in vif table and just do nothing.

Let's convert ip6mr_net_exit_batch() to ->exit_rtnl().

Note that fib_rules_unregister() does not need RTNL and
we will remove RTNL and unregister_netdevice_many() in
ip6mr_rules_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index af11fd883831..860fce51819e 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -262,18 +262,17 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	return err;
 }
 
-static void __net_exit ip6mr_rules_exit(struct net *net)
+static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
+					     struct list_head *dev_kill_list)
 {
 	struct mr_table *mrt, *next;
-	LIST_HEAD(dev_kill_list);
 
 	ASSERT_RTNL();
 	list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) {
 		list_del_rcu(&mrt->list);
-		ip6mr_free_table(mrt, &dev_kill_list);
+		ip6mr_free_table(mrt, dev_kill_list);
 	}
 
-	unregister_netdevice_many(&dev_kill_list);
 	fib_rules_unregister(net->ipv6.mr6_rules_ops);
 }
 
@@ -334,16 +333,15 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit ip6mr_rules_exit(struct net *net)
+static void __net_exit ip6mr_rules_exit_rtnl(struct net *net,
+					     struct list_head *dev_kill_list)
 {
 	struct mr_table *mrt = rcu_dereference_protected(net->ipv6.mrt6, 1);
-	LIST_HEAD(dev_kill_list);
 
 	ASSERT_RTNL();
 
 	RCU_INIT_POINTER(net->ipv6.mrt6, NULL);
-	ip6mr_free_table(mrt, &dev_kill_list);
-	unregister_netdevice_many(&dev_kill_list);
+	ip6mr_free_table(mrt, dev_kill_list);
 }
 
 static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -1343,6 +1341,7 @@ static void __net_exit ip6mr_notifier_exit(struct net *net)
 /* Setup for IP multicast routing */
 static int __net_init ip6mr_net_init(struct net *net)
 {
+	LIST_HEAD(dev_kill_list);
 	int err;
 
 	err = ip6mr_notifier_init(net);
@@ -1370,7 +1369,8 @@ static int __net_init ip6mr_net_init(struct net *net)
 	remove_proc_entry("ip6_mr_vif", net->proc_net);
 proc_vif_fail:
 	rtnl_lock();
-	ip6mr_rules_exit(net);
+	ip6mr_rules_exit_rtnl(net, &dev_kill_list);
+	unregister_netdevice_many(&dev_kill_list);
 	rtnl_unlock();
 #endif
 ip6mr_rules_fail:
@@ -1387,20 +1387,16 @@ static void __net_exit ip6mr_net_exit(struct net *net)
 	ip6mr_notifier_exit(net);
 }
 
-static void __net_exit ip6mr_net_exit_batch(struct list_head *net_list)
+static void __net_exit ip6mr_net_exit_rtnl(struct net *net,
+					   struct list_head *dev_kill_list)
 {
-	struct net *net;
-
-	rtnl_lock();
-	list_for_each_entry(net, net_list, exit_list)
-		ip6mr_rules_exit(net);
-	rtnl_unlock();
+	ip6mr_rules_exit_rtnl(net, dev_kill_list);
 }
 
 static struct pernet_operations ip6mr_net_ops = {
 	.init = ip6mr_net_init,
 	.exit = ip6mr_net_exit,
-	.exit_batch = ip6mr_net_exit_batch,
+	.exit_rtnl = ip6mr_net_exit_rtnl,
 };
 
 static const struct rtnl_msg_handler ip6mr_rtnl_msg_handlers[] __initconst_or_module = {
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 11/15] ip6mr: Move unregister_netdevice_many() out of ip6mr_free_table().
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

This is a prep commit to convert ip6mr_net_exit_batch() to
->exit_rtnl().

Let's move unregister_netdevice_many() in ip6mr_free_table()
to its callers.

Now ip6mr_rules_exit() can batch all tables per netns.

Note that later we will remove RTNL and unregister_netdevice_many()
in ip6mr_rules_init().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index dd72eb346eb1..af11fd883831 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -85,7 +85,8 @@ static DEFINE_SPINLOCK(mfc_unres_lock);
 static struct kmem_cache *mrt_cachep __read_mostly;
 
 static struct mr_table *ip6mr_new_table(struct net *net, u32 id);
-static void ip6mr_free_table(struct mr_table *mrt);
+static void ip6mr_free_table(struct mr_table *mrt,
+			     struct list_head *dev_kill_list);
 
 static void ip6_mr_forward(struct net *net, struct mr_table *mrt,
 			   struct net_device *dev, struct sk_buff *skb,
@@ -228,6 +229,7 @@ static const struct fib_rules_ops __net_initconst ip6mr_rules_ops_template = {
 static int __net_init ip6mr_rules_init(struct net *net)
 {
 	struct fib_rules_ops *ops;
+	LIST_HEAD(dev_kill_list);
 	struct mr_table *mrt;
 	int err;
 
@@ -252,7 +254,8 @@ static int __net_init ip6mr_rules_init(struct net *net)
 
 err2:
 	rtnl_lock();
-	ip6mr_free_table(mrt);
+	ip6mr_free_table(mrt, &dev_kill_list);
+	unregister_netdevice_many(&dev_kill_list);
 	rtnl_unlock();
 err1:
 	fib_rules_unregister(ops);
@@ -262,12 +265,15 @@ static int __net_init ip6mr_rules_init(struct net *net)
 static void __net_exit ip6mr_rules_exit(struct net *net)
 {
 	struct mr_table *mrt, *next;
+	LIST_HEAD(dev_kill_list);
 
 	ASSERT_RTNL();
 	list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) {
 		list_del_rcu(&mrt->list);
-		ip6mr_free_table(mrt);
+		ip6mr_free_table(mrt, &dev_kill_list);
 	}
+
+	unregister_netdevice_many(&dev_kill_list);
 	fib_rules_unregister(net->ipv6.mr6_rules_ops);
 }
 
@@ -331,11 +337,13 @@ static int __net_init ip6mr_rules_init(struct net *net)
 static void __net_exit ip6mr_rules_exit(struct net *net)
 {
 	struct mr_table *mrt = rcu_dereference_protected(net->ipv6.mrt6, 1);
+	LIST_HEAD(dev_kill_list);
 
 	ASSERT_RTNL();
 
 	RCU_INIT_POINTER(net->ipv6.mrt6, NULL);
-	ip6mr_free_table(mrt);
+	ip6mr_free_table(mrt, &dev_kill_list);
+	unregister_netdevice_many(&dev_kill_list);
 }
 
 static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -410,18 +418,17 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id)
 			      ipmr_expire_process, ip6mr_new_table_set);
 }
 
-static void ip6mr_free_table(struct mr_table *mrt)
+static void ip6mr_free_table(struct mr_table *mrt,
+			     struct list_head *dev_kill_list)
 {
 	struct net *net = read_pnet(&mrt->net);
-	LIST_HEAD(dev_kill_list);
 
 	WARN_ON_ONCE(!mr_can_free_table(net));
 
 	timer_shutdown_sync(&mrt->ipmr_expire_timer);
 	mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MIFS_STATIC |
 			    MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC,
-			    &dev_kill_list);
-	unregister_netdevice_many(&dev_kill_list);
+			    dev_kill_list);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree_rcu(mrt, rcu);
 }
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 10/15] ip6mr: Move unregister_netdevice_many() out of mroute_clean_tables().
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

This is a prep commit to convert ip6mr_net_exit_batch() to
->exit_rtnl().

Let's move unregister_netdevice_many() in mroute_clean_tables()
to its callers.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index fdec7a541cf6..dd72eb346eb1 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -99,7 +99,8 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 			      struct netlink_ext_ack *extack);
 static int ip6mr_rtm_dumproute(struct sk_buff *skb,
 			       struct netlink_callback *cb);
-static void mroute_clean_tables(struct mr_table *mrt, int flags);
+static void mroute_clean_tables(struct mr_table *mrt, int flags,
+				struct list_head *dev_kill_list);
 static void ipmr_expire_process(struct timer_list *t);
 
 #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
@@ -412,12 +413,15 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id)
 static void ip6mr_free_table(struct mr_table *mrt)
 {
 	struct net *net = read_pnet(&mrt->net);
+	LIST_HEAD(dev_kill_list);
 
 	WARN_ON_ONCE(!mr_can_free_table(net));
 
 	timer_shutdown_sync(&mrt->ipmr_expire_timer);
 	mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MIFS_STATIC |
-				 MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC);
+			    MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC,
+			    &dev_kill_list);
+	unregister_netdevice_many(&dev_kill_list);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree_rcu(mrt, rcu);
 }
@@ -1541,10 +1545,10 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt, int flags)
+static void mroute_clean_tables(struct mr_table *mrt, int flags,
+				struct list_head *dev_kill_list)
 {
 	struct mr_mfc *c, *tmp;
-	LIST_HEAD(list);
 	int i;
 
 	/* Shut down all active vif entries */
@@ -1554,9 +1558,8 @@ static void mroute_clean_tables(struct mr_table *mrt, int flags)
 			     !(flags & MRT6_FLUSH_MIFS_STATIC)) ||
 			    (!(mrt->vif_table[i].flags & VIFF_STATIC) && !(flags & MRT6_FLUSH_MIFS)))
 				continue;
-			mif6_delete(mrt, i, 0, &list);
+			mif6_delete(mrt, i, 0, dev_kill_list);
 		}
-		unregister_netdevice_many(&list);
 	}
 
 	/* Wipe the cache */
@@ -1619,6 +1622,7 @@ int ip6mr_sk_done(struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 	struct ipv6_devconf *devconf;
+	LIST_HEAD(dev_kill_list);
 	struct mr_table *mrt;
 	int err = -EACCES;
 
@@ -1646,11 +1650,13 @@ int ip6mr_sk_done(struct sock *sk)
 						     NETCONFA_IFINDEX_ALL,
 						     net->ipv6.devconf_all);
 
-			mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MFC);
+			mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MFC,
+					    &dev_kill_list);
 			err = 0;
 			break;
 		}
 	}
+	unregister_netdevice_many(&dev_kill_list);
 	rtnl_unlock();
 
 	return err;
@@ -1765,14 +1771,17 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 
 	case MRT6_FLUSH:
 	{
+		LIST_HEAD(dev_kill_list);
 		int flags;
 
 		if (optlen != sizeof(flags))
 			return -EINVAL;
 		if (copy_from_sockptr(&flags, optval, sizeof(flags)))
 			return -EFAULT;
+
 		rtnl_lock();
-		mroute_clean_tables(mrt, flags);
+		mroute_clean_tables(mrt, flags, &dev_kill_list);
+		unregister_netdevice_many(&dev_kill_list);
 		rtnl_unlock();
 		return 0;
 	}
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 09/15] ip6mr: Free mr_table after RCU grace period.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

Since default_device_exit_batch() is called after ->exit_rtnl(),
idev->mc_ifc_work could finally call mroute6_is_socket() under RCU
while ->exit_rtnl() is running. [0]

With CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=n, ip6mr_fib_lookup() does
not check if net->ipv6.mrt6 is NULL.  If ip6mr_net_exit_batch()
set net->ipv6.mrt6 to NULL and freed it, the mrt->mroute_sk access
could result in null-ptr-deref or use-after-free.

Let's prepare for that situation by applying RCU rules to the ip6mr
table as well.

Link: https://lore.kernel.org/netdev/20260407184202.34cfe2d6@kernel.org/ #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 53 +++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2b04e52ec61c..fdec7a541cf6 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -136,16 +136,6 @@ static struct mr_table *__ip6mr_get_table(struct net *net, u32 id)
 	return NULL;
 }
 
-static struct mr_table *ip6mr_get_table(struct net *net, u32 id)
-{
-	struct mr_table *mrt;
-
-	rcu_read_lock();
-	mrt = __ip6mr_get_table(net, id);
-	rcu_read_unlock();
-	return mrt;
-}
-
 static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 			    struct mr_table **mrt)
 {
@@ -274,7 +264,7 @@ static void __net_exit ip6mr_rules_exit(struct net *net)
 
 	ASSERT_RTNL();
 	list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) {
-		list_del(&mrt->list);
+		list_del_rcu(&mrt->list);
 		ip6mr_free_table(mrt);
 	}
 	fib_rules_unregister(net->ipv6.mr6_rules_ops);
@@ -298,28 +288,30 @@ bool ip6mr_rule_default(const struct fib_rule *rule)
 }
 EXPORT_SYMBOL(ip6mr_rule_default);
 #else
-#define ip6mr_for_each_table(mrt, net) \
-	for (mrt = net->ipv6.mrt6; mrt; mrt = NULL)
-
 static struct mr_table *ip6mr_mr_table_iter(struct net *net,
 					    struct mr_table *mrt)
 {
 	if (!mrt)
-		return net->ipv6.mrt6;
+		return rcu_dereference(net->ipv6.mrt6);
 	return NULL;
 }
 
-static struct mr_table *ip6mr_get_table(struct net *net, u32 id)
+static struct mr_table *__ip6mr_get_table(struct net *net, u32 id)
 {
-	return net->ipv6.mrt6;
+	return rcu_dereference_check(net->ipv6.mrt6,
+				     lockdep_rtnl_is_held() ||
+				     !rcu_access_pointer(net->ipv6.mrt6));
 }
 
-#define __ip6mr_get_table ip6mr_get_table
+#define ip6mr_for_each_table(mrt, net)				\
+	for (mrt = __ip6mr_get_table(net, 0); mrt; mrt = NULL)
 
 static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 			    struct mr_table **mrt)
 {
-	*mrt = net->ipv6.mrt6;
+	*mrt = rcu_dereference(net->ipv6.mrt6);
+	if (!*mrt)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -330,15 +322,19 @@ static int __net_init ip6mr_rules_init(struct net *net)
 	mrt = ip6mr_new_table(net, RT6_TABLE_DFLT);
 	if (IS_ERR(mrt))
 		return PTR_ERR(mrt);
-	net->ipv6.mrt6 = mrt;
+
+	rcu_assign_pointer(net->ipv6.mrt6, mrt);
 	return 0;
 }
 
 static void __net_exit ip6mr_rules_exit(struct net *net)
 {
+	struct mr_table *mrt = rcu_dereference_protected(net->ipv6.mrt6, 1);
+
 	ASSERT_RTNL();
-	ip6mr_free_table(net->ipv6.mrt6);
-	net->ipv6.mrt6 = NULL;
+
+	RCU_INIT_POINTER(net->ipv6.mrt6, NULL);
+	ip6mr_free_table(mrt);
 }
 
 static int ip6mr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -353,6 +349,17 @@ static unsigned int ip6mr_rules_seq_read(const struct net *net)
 }
 #endif
 
+static struct mr_table *ip6mr_get_table(struct net *net, u32 id)
+{
+	struct mr_table *mrt;
+
+	rcu_read_lock();
+	mrt = __ip6mr_get_table(net, id);
+	rcu_read_unlock();
+
+	return mrt;
+}
+
 static int ip6mr_hash_cmp(struct rhashtable_compare_arg *arg,
 			  const void *ptr)
 {
@@ -412,7 +419,7 @@ static void ip6mr_free_table(struct mr_table *mrt)
 	mroute_clean_tables(mrt, MRT6_FLUSH_MIFS | MRT6_FLUSH_MIFS_STATIC |
 				 MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC);
 	rhltable_destroy(&mrt->mfc_hash);
-	kfree(mrt);
+	kfree_rcu(mrt, rcu);
 }
 
 #ifdef CONFIG_PROC_FS
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH v2 net-next 08/15] ipmr: Free mr_table after RCU grace period.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

With CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup()
does not check if net->ipv4.mrt is NULL.

Since default_device_exit_batch() is called after ->exit_rtnl(),
a device could receive IGMP packets and access net->ipv4.mrt
during/after ipmr_rules_exit_rtnl().

If ipmr_rules_exit_rtnl() had already cleared it and freed the
memory, the access would trigger null-ptr-deref or use-after-free.

Let's fix it by using RCU helpers and freeing mrt after an RCU grace
period.

Note that kfree_rcu() requires rcu_head to be placed at an offset
below 4K, and mr_table is already 3864 bytes without rcu_head, so
rcu_head is added at the beginning of the struct.
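
The offset constraint can be checked at build time. The following is a
minimal user-space sketch, not kernel code: struct rcu_head_model,
struct table_model, KFREE_RCU_MAX_OFFSET and rcu_offset_ok() are
hypothetical names invented for illustration, and the payload size is
simply the 3864 bytes quoted above.

```c
#include <stddef.h>

/* Assumption: mirrors the "<4K" limit mentioned in the commit message. */
#define KFREE_RCU_MAX_OFFSET 4096

/* Hypothetical stand-in for struct rcu_head. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

/* Hypothetical stand-in for struct mr_table with the head placed first. */
struct table_model {
	struct rcu_head_model rcu;
	unsigned char payload[3864];	/* size quoted in the commit message */
};

/* Fails the build if the head ever drifts past the limit. */
_Static_assert(offsetof(struct table_model, rcu) < KFREE_RCU_MAX_OFFSET,
	       "rcu_head must sit below the 4K offset limit");

/* Runtime form of the same check, convenient for tests. */
static int rcu_offset_ok(size_t offset)
{
	return offset < KFREE_RCU_MAX_OFFSET;
}
```

Placing the head first keeps its offset at zero no matter how much the
rest of the struct grows later.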

Fixes: b22b01867406 ("ipmr: Convert ipmr_net_exit_batch() to ->exit_rtnl().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 include/linux/mroute_base.h |  2 ++
 net/ipv4/ipmr.c             | 51 ++++++++++++++++++++-----------------
 2 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index cf3374580f74..db3f98cae4c9 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -226,6 +226,7 @@ struct mr_table_ops {
 
 /**
  * struct mr_table - a multicast routing table
+ * @rcu: used for table destruction
  * @list: entry within a list of multicast routing tables
  * @net: net where this table belongs
  * @ops: protocol specific operations
@@ -243,6 +244,7 @@ struct mr_table_ops {
  * @mroute_reg_vif_num: PIM-device vif index
  */
 struct mr_table {
+	struct rcu_head		rcu;
 	struct list_head	list;
 	possible_net_t		net;
 	struct mr_table_ops	ops;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index fa168513295d..3bf63f8ea606 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -151,16 +151,6 @@ static struct mr_table *__ipmr_get_table(struct net *net, u32 id)
 	return NULL;
 }
 
-static struct mr_table *ipmr_get_table(struct net *net, u32 id)
-{
-	struct mr_table *mrt;
-
-	rcu_read_lock();
-	mrt = __ipmr_get_table(net, id);
-	rcu_read_unlock();
-	return mrt;
-}
-
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
@@ -293,7 +283,7 @@ static void __net_exit ipmr_rules_exit_rtnl(struct net *net,
 	struct mr_table *mrt, *next;
 
 	list_for_each_entry_safe(mrt, next, &net->ipv4.mr_tables, list) {
-		list_del(&mrt->list);
+		list_del_rcu(&mrt->list);
 		ipmr_free_table(mrt, dev_kill_list);
 	}
 }
@@ -315,28 +305,30 @@ bool ipmr_rule_default(const struct fib_rule *rule)
 }
 EXPORT_SYMBOL(ipmr_rule_default);
 #else
-#define ipmr_for_each_table(mrt, net) \
-	for (mrt = net->ipv4.mrt; mrt; mrt = NULL)
-
 static struct mr_table *ipmr_mr_table_iter(struct net *net,
 					   struct mr_table *mrt)
 {
 	if (!mrt)
-		return net->ipv4.mrt;
+		return rcu_dereference(net->ipv4.mrt);
 	return NULL;
 }
 
-static struct mr_table *ipmr_get_table(struct net *net, u32 id)
+static struct mr_table *__ipmr_get_table(struct net *net, u32 id)
 {
-	return net->ipv4.mrt;
+	return rcu_dereference_check(net->ipv4.mrt,
+				     lockdep_rtnl_is_held() ||
+				     !rcu_access_pointer(net->ipv4.mrt));
 }
 
-#define __ipmr_get_table ipmr_get_table
+#define ipmr_for_each_table(mrt, net)				\
+	for (mrt = __ipmr_get_table(net, 0); mrt; mrt = NULL)
 
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
-	*mrt = net->ipv4.mrt;
+	*mrt = rcu_dereference(net->ipv4.mrt);
+	if (!*mrt)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -347,7 +339,8 @@ static int __net_init ipmr_rules_init(struct net *net)
 	mrt = ipmr_new_table(net, RT_TABLE_DEFAULT);
 	if (IS_ERR(mrt))
 		return PTR_ERR(mrt);
-	net->ipv4.mrt = mrt;
+
+	rcu_assign_pointer(net->ipv4.mrt, mrt);
 	return 0;
 }
 
@@ -358,9 +351,10 @@ static void __net_exit ipmr_rules_exit(struct net *net)
 static void __net_exit ipmr_rules_exit_rtnl(struct net *net,
 					    struct list_head *dev_kill_list)
 {
-	ipmr_free_table(net->ipv4.mrt, dev_kill_list);
+	struct mr_table *mrt = rcu_dereference_protected(net->ipv4.mrt, 1);
 
-	net->ipv4.mrt = NULL;
+	RCU_INIT_POINTER(net->ipv4.mrt, NULL);
+	ipmr_free_table(mrt, dev_kill_list);
 }
 
 static int ipmr_rules_dump(struct net *net, struct notifier_block *nb,
@@ -381,6 +375,17 @@ bool ipmr_rule_default(const struct fib_rule *rule)
 EXPORT_SYMBOL(ipmr_rule_default);
 #endif
 
+static struct mr_table *ipmr_get_table(struct net *net, u32 id)
+{
+	struct mr_table *mrt;
+
+	rcu_read_lock();
+	mrt = __ipmr_get_table(net, id);
+	rcu_read_unlock();
+
+	return mrt;
+}
+
 static inline int ipmr_hash_cmp(struct rhashtable_compare_arg *arg,
 				const void *ptr)
 {
@@ -446,7 +451,7 @@ static void ipmr_free_table(struct mr_table *mrt, struct list_head *dev_kill_lis
 			    MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC,
 			    &ipmr_dev_kill_list);
 	rhltable_destroy(&mrt->mfc_hash);
-	kfree(mrt);
+	kfree_rcu(mrt, rcu);
 
 	WARN_ON_ONCE(!net_initialized(net) && !list_empty(&ipmr_dev_kill_list));
 	list_splice(&ipmr_dev_kill_list, dev_kill_list);
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
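
The ordering this patch relies on (unpublish the pointer first, free
the object only after readers are done, and let readers cope with
NULL) can be sketched in user space. This is a single-threaded model
under stated assumptions, not the kernel implementation: C11 atomics
stand in for rcu_assign_pointer()/rcu_dereference(), an explicit list
stands in for kfree_rcu(), and every name below is hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mr_table. */
struct table_model {
	struct table_model *defer_next;	/* plays the role of rcu_head */
	int id;
};

static _Atomic(struct table_model *) published;	/* models net->ipv4.mrt */
static struct table_model *deferred;		/* frees awaiting a grace period */

/* Models rcu_assign_pointer(): make the table visible to readers. */
static void table_publish(struct table_model *t)
{
	atomic_store_explicit(&published, t, memory_order_release);
}

/* Models the patched lookup: a reader may observe NULL and must bail out. */
static int table_lookup(struct table_model **out)
{
	*out = atomic_load_explicit(&published, memory_order_acquire);
	if (!*out)
		return -EAGAIN;
	return 0;
}

/* Models the exit path: clear the pointer first, then defer the free. */
static void table_retire(void)
{
	struct table_model *t;

	t = atomic_exchange_explicit(&published, NULL, memory_order_acq_rel);
	if (!t)
		return;
	t->defer_next = deferred;
	deferred = t;
}

/* Models the end of a grace period: no reader can still hold a pointer. */
static int grace_period_end(void)
{
	int freed = 0;

	while (deferred) {
		struct table_model *t = deferred;

		deferred = t->defer_next;
		free(t);
		freed++;
	}
	return freed;
}
```

After table_retire(), a lookup returns -EAGAIN instead of handing out a
pointer that is about to be freed, and the memory itself stays valid
until grace_period_end() runs.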

* [PATCH v2 net-next 07/15] net: Remove rtnl_held of struct fib_dump_filter.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

Commit 22e36ea9f5d7 ("inet: allow ip_valid_fib_dump_req() to
be called with RTNL or RCU") introduced the rtnl_held field in
struct fib_dump_filter to switch __dev_get_by_index() and
dev_get_by_index_rcu() depending on the caller's context.

This field served as an interim measure while we were incrementally
converting all callers of ip_valid_fib_dump_req() to RCU.

Now that all users (IPv4, IPv6, ipmr, ip6mr, and MPLS) have
been converted to RCU, the field is no longer necessary.

Let's remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 include/net/ip_fib.h    |  1 -
 net/ipv4/fib_frontend.c | 19 ++++++-------------
 net/ipv4/ipmr.c         |  4 +---
 net/ipv6/ip6_fib.c      |  1 -
 net/ipv6/ip6mr.c        |  4 +---
 net/mpls/af_mpls.c      |  6 ++----
 6 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 318593743b6e..1142ffad7444 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -269,7 +269,6 @@ struct fib_dump_filter {
 	bool			filter_set;
 	bool			dump_routes;
 	bool			dump_exceptions;
-	bool			rtnl_held;
 	unsigned char		protocol;
 	unsigned char		rt_type;
 	unsigned int		flags;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 1dab44e13d3b..ceeb87b13b93 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -946,9 +946,6 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
 	struct rtmsg *rtm;
 	int err, i;
 
-	if (filter->rtnl_held)
-		ASSERT_RTNL();
-
 	rtm = nlmsg_payload(nlh, sizeof(*rtm));
 	if (!rtm) {
 		NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request");
@@ -992,10 +989,8 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
 			break;
 		case RTA_OIF:
 			ifindex = nla_get_u32(tb[i]);
-			if (filter->rtnl_held)
-				filter->dev = __dev_get_by_index(net, ifindex);
-			else
-				filter->dev = dev_get_by_index_rcu(net, ifindex);
+
+			filter->dev = dev_get_by_index_rcu(net, ifindex);
 			if (!filter->dev)
 				return -ENODEV;
 			break;
@@ -1017,18 +1012,16 @@ EXPORT_SYMBOL_GPL(ip_valid_fib_dump_req);
 
 static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	const struct nlmsghdr *nlh = cb->nlh;
+	struct net *net = sock_net(skb->sk);
 	struct fib_dump_filter filter = {
 		.dump_routes = true,
 		.dump_exceptions = true,
-		.rtnl_held = false,
 	};
-	const struct nlmsghdr *nlh = cb->nlh;
-	struct net *net = sock_net(skb->sk);
-	unsigned int h, s_h;
-	unsigned int e = 0, s_e;
-	struct fib_table *tb;
+	unsigned int e = 0, s_e, h, s_h;
 	struct hlist_head *head;
 	int dumped = 0, err = 0;
+	struct fib_table *tb;
 
 	rcu_read_lock();
 	if (cb->strict_check) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8a08d09b4c30..fa168513295d 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2767,9 +2767,7 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 
 static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	struct fib_dump_filter filter = {
-		.rtnl_held = false,
-	};
+	struct fib_dump_filter filter = {};
 	int err;
 
 	rcu_read_lock();
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index b897b3c5023b..fc95738ded76 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -633,7 +633,6 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct rt6_rtnl_dump_arg arg = {
 		.filter.dump_exceptions = true,
 		.filter.dump_routes = true,
-		.filter.rtnl_held = false,
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 9d02cd3b274c..2b04e52ec61c 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2747,9 +2747,7 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	const struct nlmsghdr *nlh = cb->nlh;
-	struct fib_dump_filter filter = {
-		.rtnl_held = false,
-	};
+	struct fib_dump_filter filter = {};
 	int err;
 
 	rcu_read_lock();
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 26340a7306b5..ca504d9626cf 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2221,12 +2221,10 @@ static bool mpls_rt_uses_dev(struct mpls_route *rt,
 
 static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct mpls_route __rcu **platform_label;
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	struct mpls_route __rcu **platform_label;
-	struct fib_dump_filter filter = {
-		.rtnl_held = false,
-	};
+	struct fib_dump_filter filter = {};
 	unsigned int flags = NLM_F_MULTI;
 	size_t platform_labels;
 	unsigned int index;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
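
Dropping the explicit ".rtnl_held = false" initializers does not change
how the remaining members start out, because members omitted from a
designated initializer are zero-initialized. A small user-space sketch
of that C rule follows; struct filter_model and make_dump_filter() are
hypothetical names, and the member list is only loosely modelled on
struct fib_dump_filter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct fib_dump_filter. */
struct filter_model {
	bool filter_set;
	bool dump_routes;
	bool dump_exceptions;
	unsigned char protocol;
	const void *dev;
};

/* Only two members are named; the rest are implicitly zeroed. */
static struct filter_model make_dump_filter(void)
{
	struct filter_model f = {
		.dump_routes = true,
		.dump_exceptions = true,
	};

	return f;
}
```

The "= {}" form used in the patch is a GNU C extension (standardized in
C23) that zeroes every member in the same way.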

* [PATCH v2 net-next 06/15] ip6mr: Convert ip6mr_rtm_dumproute() to RCU.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

ip6mr_rtm_dumproute() calls mr_table_dump() or mr_rtm_dumproute(),
and mr_rtm_dumproute() in turn calls mr_table_dump().

mr_table_dump() calls the passed function, _ip6mr_fill_mroute().

_ip6mr_fill_mroute() is a wrapper for ip6mr_fill_mroute() to cast
struct mr_mfc * to struct mfc6_cache *.

ip6mr_fill_mroute() can already be called safely under RCU.

Let's convert ip6mr_rtm_dumproute() to RCU.

Now there is no user of the rtnl_held field in struct
fib_dump_filter, and the next patch will remove it.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 0054db00fadf..9d02cd3b274c 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1389,7 +1389,7 @@ static const struct rtnl_msg_handler ip6mr_rtnl_msg_handlers[] __initconst_or_mo
 	{.owner = THIS_MODULE, .protocol = RTNL_FAMILY_IP6MR,
 	 .msgtype = RTM_GETROUTE,
 	 .doit = ip6mr_rtm_getroute, .dumpit = ip6mr_rtm_dumproute,
-	 .flags = RTNL_FLAG_DOIT_UNLOCKED},
+	 .flags = RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED},
 };
 
 int __init ip6_mr_init(void)
@@ -2748,15 +2748,17 @@ static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct fib_dump_filter filter = {
-		.rtnl_held = true,
+		.rtnl_held = false,
 	};
 	int err;
 
+	rcu_read_lock();
+
 	if (cb->strict_check) {
 		err = ip_valid_fib_dump_req(sock_net(skb->sk), nlh,
 					    &filter, cb);
 		if (err < 0)
-			return err;
+			goto unlock;
 	}
 
 	if (filter.table_id) {
@@ -2764,17 +2766,26 @@ static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
 
 		mrt = __ip6mr_get_table(sock_net(skb->sk), filter.table_id);
 		if (!mrt) {
-			if (rtnl_msg_family(cb->nlh) != RTNL_FAMILY_IP6MR)
-				return skb->len;
+			if (rtnl_msg_family(cb->nlh) != RTNL_FAMILY_IP6MR) {
+				err = skb->len;
+				goto unlock;
+			}
 
 			NL_SET_ERR_MSG_MOD(cb->extack, "MR table does not exist");
-			return -ENOENT;
+			err = -ENOENT;
+			goto unlock;
 		}
+
 		err = mr_table_dump(mrt, skb, cb, _ip6mr_fill_mroute,
 				    &mfc_unres_lock, &filter);
-		return skb->len ? : err;
+		err = skb->len ? : err;
+		goto unlock;
 	}
 
-	return mr_rtm_dumproute(skb, cb, ip6mr_mr_table_iter,
-				_ip6mr_fill_mroute, &mfc_unres_lock, &filter);
+	err = mr_rtm_dumproute(skb, cb, ip6mr_mr_table_iter,
+			       _ip6mr_fill_mroute, &mfc_unres_lock, &filter);
+unlock:
+	rcu_read_unlock();
+
+	return err;
 }
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
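
The control-flow change in this patch (every early return becomes a
jump to a single unlock label) can be modelled in user space to check
that the read-side section stays balanced on all paths. This is only a
sketch: the depth counter stands in for rcu_read_lock() nesting, and
dump_model() with its parameters is hypothetical, replacing the real
validation, table lookup and dump calls with plain integers.

```c
#include <errno.h>

static int read_depth;	/* models RCU read-side nesting */

static void model_read_lock(void)
{
	read_depth++;
}

static void model_read_unlock(void)
{
	read_depth--;
}

/*
 * Hypothetical parameters replacing the real calls:
 *   strict_err  - result of request validation (< 0 on failure)
 *   table_found - whether the requested table exists
 *   same_family - whether the request targets this family
 *   len         - bytes already written to the message
 *   dump_err    - result of the table dump
 */
static int dump_model(int strict_err, int table_found, int same_family,
		      int len, int dump_err)
{
	int err;

	model_read_lock();

	if (strict_err < 0) {
		err = strict_err;
		goto unlock;
	}

	if (!table_found) {
		/* Other families get a silent, empty dump. */
		err = same_family ? -ENOENT : len;
		goto unlock;
	}

	/* Report progress if anything was written, else the error. */
	err = len ? len : dump_err;
unlock:
	model_read_unlock();
	return err;
}
```

Whatever path is taken, read_depth returns to zero, which is the
property the single exit label is meant to guarantee.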

* [PATCH v2 net-next 05/15] ip6mr: Convert ip6mr_rtm_getroute() to RCU.
From: Kuniyuki Iwashima @ 2026-04-10 21:17 UTC (permalink / raw)
  To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
In-Reply-To: <20260410211726.1668756-1-kuniyu@google.com>

ip6mr_rtm_getroute() calls __ip6mr_get_table(), ip6mr_cache_find(),
and ip6mr_fill_mroute().

Once created, struct mr_table is not freed until netns dismantle,
so it's safe under RCU.

ip6mr_cache_find() iterates mrt->mfc_hash with rhl_for_each_entry_rcu().
struct mr_mfc is freed with call_rcu(), so this is also safe under
RCU.

ip6mr_fill_mroute() calls mr_fill_mroute(), which properly uses
RCU helpers.

Let's call them under RCU and register ip6mr_rtm_getroute() with
RTNL_FLAG_DOIT_UNLOCKED.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6mr.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 5356957bfe94..0054db00fadf 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1388,7 +1388,8 @@ static struct pernet_operations ip6mr_net_ops = {
 static const struct rtnl_msg_handler ip6mr_rtnl_msg_handlers[] __initconst_or_module = {
 	{.owner = THIS_MODULE, .protocol = RTNL_FAMILY_IP6MR,
 	 .msgtype = RTM_GETROUTE,
-	 .doit = ip6mr_rtm_getroute, .dumpit = ip6mr_rtm_dumproute},
+	 .doit = ip6mr_rtm_getroute, .dumpit = ip6mr_rtm_dumproute,
+	 .flags = RTNL_FLAG_DOIT_UNLOCKED},
 };
 
 int __init ip6_mr_init(void)
@@ -2712,6 +2713,8 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		grp = nla_get_in6_addr(tb[RTA_DST]);
 	tableid = nla_get_u32_default(tb[RTA_TABLE], 0);
 
+	rcu_read_lock();
+
 	mrt = __ip6mr_get_table(net, tableid ?: RT_TABLE_DEFAULT);
 	if (!mrt) {
 		NL_SET_ERR_MSG_MOD(extack, "MR table does not exist");
@@ -2719,10 +2722,7 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		goto err;
 	}
 
-	/* entries are added/deleted only under RTNL */
-	rcu_read_lock();
 	cache = ip6mr_cache_find(mrt, &src, &grp);
-	rcu_read_unlock();
 	if (!cache) {
 		NL_SET_ERR_MSG_MOD(extack, "MR cache entry not found");
 		err = -ENOENT;
@@ -2734,9 +2734,12 @@ static int ip6mr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		goto err;
 
+	rcu_read_unlock();
+
 	return rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
 
 err:
+	rcu_read_unlock();
 	kfree_skb(skb);
 	return err;
 }
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related
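
This patch widens the read-side section so that the table lookup, the
cache lookup and the message fill all run inside one rcu_read_lock()
region, with the unlock placed on both the success and the error path.
A user-space sketch of that shape follows. It is a model only: the
depth counter stands in for RCU, and struct cache_entry, cache_find()
and getroute_model() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int read_depth;	/* models RCU read-side nesting */

static void model_read_lock(void)
{
	read_depth++;
}

static void model_read_unlock(void)
{
	read_depth--;
}

/* Hypothetical stand-in for a multicast forwarding cache entry. */
struct cache_entry {
	int src;
	int grp;
	int parent;
};

static const struct cache_entry entries[] = {
	{ 1, 10, 3 },
	{ 2, 20, 5 },
};

/* Must run inside the read-side section, like an RCU list walk. */
static const struct cache_entry *cache_find(int src, int grp)
{
	size_t i;

	assert(read_depth > 0);
	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
		if (entries[i].src == src && entries[i].grp == grp)
			return &entries[i];
	}
	return NULL;
}

static int getroute_model(int src, int grp, int *parent_out)
{
	const struct cache_entry *c;
	int err;

	model_read_lock();

	c = cache_find(src, grp);
	if (!c) {
		err = -ENOENT;
		goto err;
	}

	/* Copy out what the reply needs while still protected. */
	*parent_out = c->parent;

	model_read_unlock();
	return 0;

err:
	model_read_unlock();
	return err;
}
```

The entry is only dereferenced between lock and unlock; once the
section ends, the caller works with the copied value alone.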
