Netdev List


* [PATCH net-next v13 10/10] Documentation: networking: Update the phy_port infrastructure description
From: Maxime Chevallier @ 2026-07-01 11:04 UTC
  To: davem, Andrew Lunn, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Russell King, Heiner Kallweit
  Cc: Maxime Chevallier, netdev, linux-kernel, thomas.petazzoni,
	Christophe Leroy, Herve Codina, Florian Fainelli, Vladimir Oltean,
	Köry Maincent, Marek Behún, Oleksij Rempel,
	Nicolò Veronese, Simon Horman, mwojtas, Romain Gantois,
	Daniel Golle, Dimitri Fedrau, Frank Wunderlich
In-Reply-To: <20260701110427.143945-1-maxime.chevallier@bootlin.com>

Now that SFP is properly supported by phy_port, add some details to the
documentation. Fix a typo along the way (driver -> driven).

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
 Documentation/networking/phy-port.rst | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/phy-port.rst b/Documentation/networking/phy-port.rst
index 6e28d9094bce..73ea06db0fd9 100644
--- a/Documentation/networking/phy-port.rst
+++ b/Documentation/networking/phy-port.rst
@@ -99,13 +99,29 @@ will eventually be able to report its own ksettings::
             (_____)-----| Port |
                         +------+
 
+SFP ports
+=========
+
+SFP interfaces involve 2 distinct components, each represented by
+a :c:type:`struct phy_port <phy_port>` instance:
+
+ - The SFP cage itself is a :c:type:`struct phy_port <phy_port>`. It's special
+   in that it's not an MDI interface, but rather a hot-pluggable MII.
+   The :c:type:`struct phy_port <phy_port>` associated with it lists the
+   MII interfaces that can be used on the cage.
+
+ - The SFP module, when inserted, will also be associated with a
+   :c:type:`struct phy_port <phy_port>`, which represents the various linkmodes
+   that it gives access to. The module's :c:type:`struct phy_port <phy_port>`
+   doesn't supersede the cage's port; it references the cage's port through
+   its :c:member:`upstream_port` field.
+
 Next steps
 ==========
 
-As of writing this documentation, only ports controlled by PHY devices are
-supported. The next steps will be to add the Netlink API to expose these
-to userspace and add support for raw ports (controlled by some firmware, and directly
-managed by the NIC driver).
+As of writing this documentation, the port's presence and information can only
+be queried, and it's not possible to change any of the port's settings or select
+which one should be used.
 
 Another parallel task is the introduction of a MII muxing framework to allow the
-control of non-PHY driver multi-port setups.
+control of non-PHY driven multi-port setups.
-- 
2.54.0
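The cage/module relationship described in the SFP section can be illustrated with a small userspace model. This is a minimal sketch only: `struct port_model`, `port_root()` and the `MII_*`/`LM_*` bit values are hypothetical names chosen for illustration and do not match the real `struct phy_port` layout; only the idea of a module port pointing at the cage port through `upstream_port` comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bitmask values, for illustration only. */
#define MII_SGMII      (1u << 0)
#define MII_1000BASEX  (1u << 1)
#define MII_2500BASEX  (1u << 2)
#define LM_1000BASET   (1u << 0)

/*
 * Hypothetical, simplified stand-in for struct phy_port: only the fields
 * needed to show the cage/module relationship.
 */
struct port_model {
	const char *name;
	bool is_mii;                      /* true for the cage: a hot-pluggable MII */
	unsigned int supported;           /* cage: MII modes; module: linkmodes */
	struct port_model *upstream_port; /* module -> cage; NULL for the cage */
};

/* Follow upstream_port links from a module's port back to its cage. */
static const struct port_model *port_root(const struct port_model *p)
{
	while (p->upstream_port)
		p = p->upstream_port;
	return p;
}
```

Both ports coexist: inserting a module adds a second port that references the cage, rather than replacing the cage's port.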



* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Kaitao Cheng @ 2026-07-01 11:07 UTC
  To: Jani Nikula, David Laight, Christian König,
	David Hildenbrand (Arm), Alexei Starovoitov
  Cc: Andrew Morton, Jens Axboe, Tejun Heo, Alexander Viro,
	Christian Brauner, Daniel Borkmann, Andrii Nakryiko,
	Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
	Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
	Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
	Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
	linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
	audit, bpf, netdev, dri-devel, linux-perf-users,
	linux-trace-kernel, kexec, live-patching, linux-modules,
	linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
	damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <734f66ca51485ee3ec9788c0eaaead681e00664b@intel.com>

On 2026/6/25 19:00, Jani Nikula wrote:
> On Thu, 25 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>> On 2026/6/24 22:23, David Laight wrote:
>>> On Wed, 24 Jun 2026 15:23:47 +0200
>>> Christian König <christian.koenig@amd.com> wrote:
>>>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>>>> On 2026/6/22 16:42, David Laight wrote:
>>>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>>>  
>>>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>>
>>>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>>>> remove the current entry.  Their API exposes the temporary cursor at
>>>>>>> every call site, even though most users only need it for the iterator
>>>>>>> implementation and never reference it in the loop body.
>>>>>>>
>>>>>>> Add *_mutable() variants for list and hlist iteration.  The new helpers
>>>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>>>> a unique internal cursor.  
>>>>>>
>>>>>> I'm not really sure 'mutable' means anything either.
>>>>>> It is possible to make it valid for the loop body (or even other threads)
>>>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>>>
>>>>>> It might be worth doing something that doesn't need the extra variable,
>>>>>> but there is little point doing all the churn just to rename things.
>>>>>>  
>>>>>>>
>>>>>>> This makes call sites that only mutate the list through the current entry
>>>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>>>> compatibility.
>>>>>>>
>>>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>> ---
>>>>>>>  include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>>>>  1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>>>> --- a/include/linux/list.h
>>>>>>> +++ b/include/linux/list.h
>>>>>>> @@ -7,6 +7,7 @@
>>>>>>>  #include <linux/stddef.h>
>>>>>>>  #include <linux/poison.h>
>>>>>>>  #include <linux/const.h>
>>>>>>> +#include <linux/args.h>
>>>>>>>  
>>>>>>>  #include <asm/barrier.h>
>>>>>>>  
>>>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>>>>  #define list_for_each_prev(pos, head) \
>>>>>>>  	for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>>>  
>>>>>>> -/**
>>>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>>>> - * @pos:	the &struct list_head to use as a loop cursor.
>>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>>> - * @head:	the head for your list.
>>>>>>> +/*
>>>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>>>>   */
>>>>>>>  #define list_for_each_safe(pos, n, head) \
>>>>>>>  	for (pos = (head)->next, n = pos->next; \
>>>>>>>  	     !list_is_head(pos, (head)); \
>>>>>>>  	     pos = n, n = pos->next)
>>>>>>>  
>>>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head)		\
>>>>>>> +	for (typeof(pos) tmp = (pos = (head)->next)->next;		\  
>>>>>>
>>>>>> Use auto
>>>>>>  
>>>>>>> +	     !list_is_head(pos, (head));				\
>>>>>>> +	     pos = tmp, tmp = pos->next)
>>>>>>> +
>>>>>>> +#define __list_for_each_mutable1(pos, head)				\
>>>>>>> +	__list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>>>> +
>>>>>>> +#define __list_for_each_mutable2(pos, next, head)			\
>>>>>>> +	list_for_each_safe(pos, next, head)
>>>>>>> +
>>>>>>>  /**
>>>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>>>>   * @pos:	the &struct list_head to use as a loop cursor.
>>>>>>> - * @n:		another &struct list_head to use as temporary storage
>>>>>>> - * @head:	the head for your list.
>>>>>>> + * @...:	either (head) or (next, head)
>>>>>>> + *
>>>>>>> + * next:	another &struct list_head to use as optional temporary storage.
>>>>>>> + *		The temporary cursor is internal unless explicitly supplied by
>>>>>>> + *		the caller.
>>>>>>> + * head:	the head for your list.
>>>>>>> + */
>>>>>>> +#define list_for_each_mutable(pos, ...)					\
>>>>>>> +	CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__))	\
>>>>>>> +		(pos, __VA_ARGS__)  
>>>>>>
>>>>>> The variable argument count logic really just slows down compilation.
>>>>>> Maybe there aren't enough copies of this code to make that significant.
>>>>>> But just because you can do it doesn't mean it is a good idea.
>>>>>> I'm also not sure it really adds anything to the readability.
>>>>>>
>>>>>> And, if you are going to make the middle argument optional, there is
>>>>>> no need to change the macro name.  
>>>>>
>>>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>>>> implementation approach. If we abandon that method, it means we will
>>>>> inevitably need to add some new macros. If mutable is not a good name,
>>>>> suggestions for better alternatives would be welcome; coming up with a
>>>>> suitable name is indeed rather tricky.  
>>>>
>>>> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>>>>
>>>> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
>>>
>>> IIRC currently you have a choice of either:
>>> 	define               Item that can't be deleted
>>> 	list_for_each()	     The current item.
>>> 	list_for_each_safe() The next item.
>>> There is also likely to be code that updates the variables to allow
>>> for other scenarios.
>>>
>>> Note that if you increase a reference count and release a lock, then list_for_each()
>>> is likely safer than list_for_each_safe() :-)
>>>
>>> list.h has 9 variants of the 'safe' loop.
>>> The bloat of another 9 is getting excessive.
>>>
>>> It has to be said that this is one of my least favourite type of list...
>>
>> Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
>> Andy Shevchenko, Alexei Starovoitov
>>
>> For ease of discussion, I need to summarize the currently possible
>> approaches and briefly describe their respective pros and cons,
>> using the list_for_each_entry* interfaces as examples.
>>
>> 1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
>> and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
>> would be used specifically for safe deletion scenarios that do not
>> need to expose the temporary cursor externally. The code can refer to
>> the v1 version.
>>
>> Pros: Does not depend on immediate per-subsystem adaptation and can be
>>       merged directly.
>> Cons: Requires adding a whole set of mutable interfaces, which makes the
>>       code somewhat redundant.
> 
> Seems fine, and the original _safe naming is ambiguous anyway.
> 
>> 2. Directly optimize away the temporary cursor in list_for_each_entry_safe
>> and define it inside the loop instead, changing the interface from four
>> arguments to three.
>>
>> Pros: Does not add redundant interfaces.
>> Cons: (1) Users need to manually update special cases that use the
>>       traversal variable of list_for_each_entry_safe; the new
>>       list_for_each_entry_safe would no longer apply there and would
>>       need to be open-coded.
>>       (2) Because the macro arguments change, all list_for_each_entry_safe
>>       callers would need to be modified and merged together, making it
>>       difficult to merge such a large amount of code at once.
> 
> This won't fly because there are literally thousands of
> list_for_each_entry_safe() users.
> 
>> 3. Use a variadic macro approach to optimize list_for_each_entry_safe,
>> so that it supports both three and four arguments.
>>
>> Pros: (1) Does not add redundant interfaces.
>>       (2) Does not depend on immediate per-subsystem adaptation and can
>>       be merged directly.
>> Cons: (1) Increases compile time.
>>       (2) Makes the interface harder for users to use.
> 
> Basically I'm against any variadic macro tricks where the optional
> argument is not the last argument. That's just way too surprising, and
> goes against common practice in just about all other languages.
> 
>> 4. Optimize list_for_each_entry by defining the temporary cursor internally,
>> making it compatible with the functionality of list_for_each_entry_safe.
>> The code can refer to the v2 version.
>>
>> Pros: (1) Does not add redundant interfaces.
>>       (2) The number of externally visible arguments of list_for_each_entry
>>       remains unchanged, still three.
>> Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
>>       into one, and list_for_each_entry_safe would gradually be deprecated.
>>       (2) Users need to manually update special cases that use the traversal
>>       variable of list_for_each_entry; the new list_for_each_entry would no
>>       longer apply there and would need to be open-coded. There are 15 such
>>       cases in total.
> 
> This sounds good to me, though I take it there's some code size increase
> and/or performance penalty?
> 
> Maybe the 15 cases are questionable anyway?
> 
>> 5. Use a variadic macro approach to optimize list_for_each_entry, so that
>> it supports both three and four arguments.
>>
>> Pros: (1) Does not add redundant interfaces.
>>       (2) Does not depend on immediate per-subsystem adaptation and can be
>>       merged directly.
>> Cons: (1) Increases compile time.
>>       (2) list_for_each_entry and list_for_each_entry_safe would be merged
>>       into one, and list_for_each_entry_safe would gradually be deprecated.
> 
> Please don't do the macro tricks.
> 
>> 6. Make no changes, keep the current logic unchanged, and close the current
>> email discussion.
> 
> I like hiding the temporary stuff when possible.
> 
> BR,
> Jani.

Hi all,
If there are no objections, I will make the changes using the first approach.
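For reference, the first approach (a separate, fixed-arity helper that hides the temporary cursor, with no variadic macro tricks) can be sketched in userspace C roughly as follows, shown at the list_head level for brevity. This is a minimal sketch, assuming GCC/Clang extensions (`__typeof__`, `__COUNTER__`); `UNIQUE_NAME()` is a hypothetical stand-in for the kernel's `__UNIQUE_ID()`, and the tiny list implementation only mimics `include/linux/list.h`.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL; /* catch use after delete */
}

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
/* Hypothetical stand-in for __UNIQUE_ID(): a fresh identifier per use. */
#define UNIQUE_NAME(prefix) CONCAT(prefix, __COUNTER__)

/* 'tmp' is declared in the for-init, so it is scoped to the loop. */
#define list_for_each_mutable_internal(pos, tmp, head)			\
	for (__typeof__(pos) tmp = ((pos) = (head)->next)->next;	\
	     (pos) != (head);						\
	     (pos) = tmp, tmp = (pos)->next)

/* Two-argument form: the temporary cursor never appears at the call site. */
#define list_for_each_mutable(pos, head)				\
	list_for_each_mutable_internal(pos, UNIQUE_NAME(list_next_), head)

struct item {
	int val;
	struct list_head node;
};

#define node_to_item(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, node)))

/* Remove every even-valued item while iterating; return how many remain. */
static int drop_even(struct list_head *head)
{
	struct list_head *pos;
	int remaining = 0;

	list_for_each_mutable(pos, head) {
		if (node_to_item(pos)->val % 2 == 0)
			list_del(pos); /* safe: the hidden cursor already holds pos->next */
		else
			remaining++;
	}
	return remaining;
}
```

The two-level macro matters: `UNIQUE_NAME(list_next_)` is expanded once as a macro argument, so all three uses of the cursor inside the internal macro refer to the same identifier.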


Hi David Laight,
You previously expressed a different opinion. Do you have any further comments
on the proposed approach?

-- 
Thanks
Kaitao Cheng



* Re: [TEST] intel: low timeout
From: Pielech, Adrian @ 2026-07-01 11:08 UTC
  To: Jakub Kicinski
  Cc: Kitszel, Przemyslaw, netdev@vger.kernel.org, intel-wired-lan,
	leszek.pepiak
In-Reply-To: <20260630155022.27c9a271@kernel.org>

On 7/1/2026 12:50 AM, Jakub Kicinski wrote:
> On Tue, 30 Jun 2026 14:56:02 +0200 Pielech, Adrian wrote:
>> On 6/27/2026 6:54 PM, Jakub Kicinski wrote:
>>> Hi!
>>>
>>> Some of the tests need more than 5min, could you increase the timeout
>>> in the runner to 10 or 15min? Looks like it's hard-killing tests right
>>> now after 2min:
>>>
>>> https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-27--16-00/ice-E810-XXV4/xdp.py/stdout
>>>
>>> which leaks config across tests:
>>>
>>> https://netdev-ci-results.intel.com/ice-results/net-next-hw-2026-06-27--16-00/ice-E810-XXV4/irq.py/stdout
>>>
>>> BTW the JSON reports the timed out tests as pass.
>>
>> Hi Jakub,
>>
>> I've increased the timeout to 10 minutes per test run. It seems to help
>> the XDP test scores.
> 
> Great, thank you!
> 
>> I'll later take a look at the default behavior of the runner in case of timeouts.
> 
> default behavior == pass/fail status for the test?

Yes, pass/fail status selection. I've found the culprit, and starting with
the next run, timeouts should be reported as fail.


* Re: [PATCH] firmware: qcom: scm: add missing IRQ_DOMAIN select to QCOM_SCM
From: Konrad Dybcio @ 2026-07-01 11:10 UTC
  To: Julian Braha, andersson
  Cc: sumit.garg, linux-arm-msm, dri-devel, freedreno, linux-media,
	netdev, linux-wireless, ath12k, linux-remoteproc, konradybcio,
	robh, krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
	abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
	vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
	edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
	trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
	tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
	jens.wiklander, op-tee, apurupa, skare, linux-kernel, sumit.garg,
	harshal.dev
In-Reply-To: <20260701110344.1999068-1-julianbraha@gmail.com>

On 7/1/26 1:03 PM, Julian Braha wrote:
> 'drivers/firmware/qcom/qcom_scm.c' calls 'irq_create_fwspec_mapping'
> so it will fail to compile if IRQ_DOMAIN is disabled:
> 
> drivers/firmware/qcom/qcom_scm.c: In function ‘qcom_scm_get_waitq_irq’:
>   drivers/firmware/qcom/qcom_scm.c:2512:16: error: implicit declaration
> of function ‘irq_create_fwspec_mapping’; did you mean
> ‘irq_create_of_mapping’? [-Wimplicit-function-declaration]
>    2512 |         return irq_create_fwspec_mapping(&fwspec);
>         |                ^~~~~~~~~~~~~~~~~~~~~~~~~
>         |                irq_create_of_mapping
> 
> A patch-set in review proposes making QCOM_SCM visible in the kconfig
> frontend, so let's ensure that it's safe for users to enable:
> https://lore.kernel.org/lkml/akS_6izxrhgK-I22@sumit-xelite/
> 
> Signed-off-by: Julian Braha <julianbraha@gmail.com>
> ---

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>

Konrad


* Re: [PATCH v2 4/6] Bluetooth: Introduce Qualcomm IPQ5018 IPC based HCI driver
From: Konrad Dybcio @ 2026-07-01 11:19 UTC
  To: george.moussalem, Jens Axboe, Ulf Hansson, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Johannes Berg, Jeff Johnson,
	Bartosz Golaszewski, Marcel Holtmann, Luiz Augusto von Dentz,
	Balakrishna Godavarthi, Rocky Liao, Saravana Kannan, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Bjorn Andersson,
	Konrad Dybcio, Mathieu Poirier, Philipp Zabel
  Cc: linux-block, linux-kernel, linux-mmc, devicetree, linux-wireless,
	ath10k, linux-arm-msm, linux-bluetooth, netdev, linux-remoteproc
In-Reply-To: <20260629-ipq5018-bluetooth-v2-4-02770f03b6bb@outlook.com>

On 6/29/26 3:01 PM, George Moussalem via B4 Relay wrote:
> From: George Moussalem <george.moussalem@outlook.com>
> 
> Add support for the Bluetooth controller found in the IPQ5018 SoC.
> This driver implements firmware loading and the transport layer between
> the HCI core and the Bluetooth controller.
> 
> The firmware is loaded by the host into the dedicated reserved memory
> carveout and authenticated by TrustZone. A Secure Channel Manager (SCM)
> call safely brings the peripheral core out of reset.
> 
> A shared memory ring buffer topology handles runtime data frame
> transport between the host APSS and the controller.
> 
> An outgoing APCS IPC bit and an incoming GIC interrupt handle host/guest
> signaling.
> 
> Signed-off-by: George Moussalem <george.moussalem@outlook.com>
> ---

[...]

> +#include <linux/bits.h>
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/elf.h>
> +#include <linux/firmware.h>
> +#include <linux/firmware/qcom/qcom_scm.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_reserved_mem.h>
> +#include <linux/platform_device.h>
> +#include <linux/regmap.h>
> +#include <linux/reset.h>
> +#include <linux/skbuff.h>
> +#include <linux/slab.h>
> +#include <linux/soc/qcom/mdt_loader.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>

I don't know for sure, but the number of includes suggests some may
be unnecessary.

[...]

> +static void btqcomipc_update_stats(struct hci_dev *hdev, struct sk_buff *skb);

I don't think the forward-declaration is necessary


> +static struct ring_buffer_info *btss_get_tx_rbuf(struct qcom_btss *desc,
> +						 bool *is_sbuf_full)
> +{
> +	u8 idx;
> +	struct ring_buffer_info *rinfo;
> +
> +	for (rinfo = &(desc->tx_ctxt->sring_buf_info);	rinfo != NULL;
> +		rinfo = (struct ring_buffer_info *)(uintptr_t)(rinfo->next)) {
> +		idx = (rinfo->widx + 1) % (desc->tx_ctxt->smsg_buf_cnt);

That's one complex for-loop! Maybe move the assignments into the loop body
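A rough sketch of the suggested shape, with the cursor advance and index computation moved into the loop body. This assumes a hypothetical, simplified `struct ring_info` (a plain `next` pointer instead of the driver's integer-cast field) and assumes a ring counts as full when advancing `widx` would reach `ridx`; the rest of the driver's loop is elided above, so this is not its actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring descriptor; not the driver's actual layout. */
struct ring_info {
	uint8_t widx;           /* next slot the host will write */
	uint8_t ridx;           /* next slot the controller will read */
	struct ring_info *next; /* next ring in the chain, or NULL */
};

/* Return the first ring with a free slot, or NULL if every ring is full. */
static struct ring_info *get_tx_ring(struct ring_info *first, uint8_t slot_cnt,
				     bool *all_full)
{
	struct ring_info *rinfo = first;

	*all_full = false;
	while (rinfo) {
		uint8_t idx = (rinfo->widx + 1) % slot_cnt;

		if (idx != rinfo->ridx)
			return rinfo;
		rinfo = rinfo->next;
	}
	*all_full = true;
	return NULL;
}
```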

[...]

> +	/* Account for HCI packet type as it's not included in the skb payload */
> +	len = (skb) ? skb->len + 1 : 0;

Unnecessary parentheses, also in some other places

> +	memset(&aux_ptr, 0, sizeof(struct ipc_aux_ptr));

You can do aux_ptr = { } at declaration

Konrad


* [PATCH net] net: ti: icssg-prueth: Fix link-local addresses being forwarded out of slave ports
From: MD Danish Anwar @ 2026-07-01 11:25 UTC
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Meghana Malladi
  Cc: linux-arm-kernel, netdev, linux-kernel, danishanwar

Link-local multicast addresses (01:80:c2:00:00:0x) must only be
delivered to the host port (P0) and must not be forwarded out of
the physical slave ports. icssg_fdb_add_del() was programming these
addresses with P1/P2 membership bits set, causing the firmware to
forward them out of slave ports.

Clear P1/P2 membership and set only P0 membership when
is_link_local_ether_addr() returns true.
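The membership adjustment can be modeled in isolation as below. This is a sketch under stated assumptions: the `FDB_P*_MEMBERSHIP` bit values are hypothetical stand-ins for the `ICSSG_FDB_ENTRY_*` definitions, and `link_local()` reimplements the 01:80:c2:00:00:0X test that `is_link_local_ether_addr()` performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ICSSG_FDB_ENTRY_* definitions may differ. */
#define FDB_P0_MEMBERSHIP 0x01u
#define FDB_P1_MEMBERSHIP 0x02u
#define FDB_P2_MEMBERSHIP 0x04u

/* Matches 01:80:c2:00:00:0X, i.e. the last nibble is ignored. */
static bool link_local(const uint8_t addr[6])
{
	return addr[0] == 0x01 && addr[1] == 0x80 && addr[2] == 0xc2 &&
	       addr[3] == 0x00 && addr[4] == 0x00 && (addr[5] & 0xf0) == 0x00;
}

/* Restrict link-local entries to the host port only. */
static uint8_t fdb_membership(const uint8_t addr[6], uint8_t fid_c2)
{
	if (link_local(addr)) {
		fid_c2 |= FDB_P0_MEMBERSHIP;
		fid_c2 &= (uint8_t)~(FDB_P1_MEMBERSHIP | FDB_P2_MEMBERSHIP);
	}
	return fid_c2;
}
```

For example, an STP BPDU address (01:80:c2:00:00:00) requested with P1/P2 membership ends up with P0 membership only, while 01:80:c2:00:00:10 is left untouched.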

Fixes: 487f7323f39a ("net: ti: icssg-prueth: Add helper functions to configure FDB")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
---
 drivers/net/ethernet/ti/icssg/icssg_config.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_config.c b/drivers/net/ethernet/ti/icssg/icssg_config.c
index 3f8237c17d099..04a81402e3f3c 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_config.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_config.c
@@ -732,6 +732,16 @@ int icssg_fdb_add_del(struct prueth_emac *emac, const unsigned char *addr,
 	u8 fid = vid;
 	int ret;
 
+	/* Link-local addresses (01:80:c2:00:00:0x) must only be delivered to
+	 * the host port (P0). Clear P1/P2 membership to prevent the firmware
+	 * from forwarding them out of the physical slave ports.
+	 */
+	if (is_link_local_ether_addr(addr)) {
+		fid_c2 |= ICSSG_FDB_ENTRY_P0_MEMBERSHIP;
+		fid_c2 &= ~(ICSSG_FDB_ENTRY_P1_MEMBERSHIP |
+			    ICSSG_FDB_ENTRY_P2_MEMBERSHIP);
+	}
+
 	icssg_fdb_setup(emac, &fdb_cmd, addr, fid, add ? ICSS_CMD_ADD_FDB : ICSS_CMD_DEL_FDB);
 
 	fid_c2 |= ICSSG_FDB_ENTRY_VALID;

base-commit: a225f8c20712713406ae47024b8df42deacddd4a
-- 
2.34.1



* [PATCH net 0/2] octeon_ep, octeon_ep_vf: fix skb frags overflow in the RX path
From: Maoyi Xie @ 2026-07-01 11:28 UTC
  To: Veerasenareddy Burru, Sathesh Edara
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel

The PF (octeon_ep) and VF (octeon_ep_vf) RX paths add one skb fragment per
buffer_size chunk of a multi-buffer packet. Neither bounds the count
against MAX_SKB_FRAGS. The packet length comes from the device. A long packet
yields about 18 fragments. The default MAX_SKB_FRAGS is 17, so the last
skb_add_rx_frag() overflows frags[]. Both patches drop such a packet,
matching the recent fixes in atlantic and t7xx.

Patch 1 fixes the PF, patch 2 the VF.

This was posted as an inquiry on 2026-06-23 with no reply:
https://lore.kernel.org/r/178219996724.2539184.5129396914438743404@maoyixie.com

Maoyi Xie (2):
  octeon_ep: fix skb frags overflow in the RX path
  octeon_ep_vf: fix skb frags overflow in the RX path

 .../net/ethernet/marvell/octeon_ep/octep_rx.c   |  6 ++++++
 .../ethernet/marvell/octeon_ep_vf/octep_vf_rx.c | 17 +++++++++++++++++
 2 files changed, 23 insertions(+)

-- 
2.34.1


* [PATCH net 1/2] octeon_ep: fix skb frags overflow in the RX path
From: Maoyi Xie @ 2026-07-01 11:28 UTC
  To: Veerasenareddy Burru, Sathesh Edara
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260701112825.1653044-1-maoyixie.tju@gmail.com>

__octep_oq_process_rx() builds an skb for a multi-buffer packet by adding
one fragment per buffer_size chunk:

	data_len = buff_info->len - oq->max_single_buffer_size;
	while (data_len) {
		...
		skb_add_rx_frag(skb, shinfo->nr_frags, buff_info->page, 0,
				buff_info->len, buff_info->len);
		...
	}

buff_info->len comes from the device response header
(be64_to_cpu(resp_hw->length)). Nothing bounds the fragment count against
MAX_SKB_FRAGS. data_len can be close to 65535. buffer_size defaults to
about 3776 on 4K pages, so a full packet yields about 18 fragments. That
is one more than the default MAX_SKB_FRAGS of 17, so skb_add_rx_frag()
writes past shinfo->frags[].

The driver now drops a packet that would need more fragments than the skb
can hold. octep_oq_drop_rx() consumes its descriptors, as on the build_skb
failure path. The same class was fixed in other RX paths, including
commit 5ffcb7b890f6 ("net: atlantic: fix fragment overflow handling in RX
path") and commit f0813bcd2d9d ("net: wwan: t7xx: fix potential skb->frags
overflow in RX path").
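The arithmetic behind the bound can be checked with a standalone helper. This is a minimal sketch: `rx_frags_overflow()` is a hypothetical name, `MAX_SKB_FRAGS` is fixed at the default of 17 cited above (in the kernel it is configurable), and 3776 is the approximate 4K-page `buffer_size` from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* Default value cited in the commit message; configurable in the kernel. */
#define MAX_SKB_FRAGS 17

/* Would data_len bytes past the head buffer need more frags than fit? */
static bool rx_frags_overflow(unsigned int data_len, unsigned int buffer_size)
{
	return DIV_ROUND_UP(data_len, buffer_size) > MAX_SKB_FRAGS;
}
```

With `data_len` near 65535 and `buffer_size` of 3776, `DIV_ROUND_UP()` yields 18 fragments, one more than fits; exactly 17 full buffers still pass.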

Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support")
Co-developed-by: Kaixuan Li <kaixuan.li@ntu.edu.sg>
Signed-off-by: Kaixuan Li <kaixuan.li@ntu.edu.sg>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
 drivers/net/ethernet/marvell/octeon_ep/octep_rx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_rx.c b/drivers/net/ethernet/marvell/octeon_ep/octep_rx.c
index e6ebc7e44a00c..4ee911b6c0107 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_rx.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_rx.c
@@ -476,6 +476,12 @@ static int __octep_oq_process_rx(struct octep_device *oct,
 			skb_put(skb, oq->max_single_buffer_size);
 			shinfo = skb_shinfo(skb);
 			data_len = buff_info->len - oq->max_single_buffer_size;
+			if (DIV_ROUND_UP(data_len, oq->buffer_size) > MAX_SKB_FRAGS) {
+				dev_kfree_skb_any(skb);
+				octep_oq_drop_rx(oq, buff_info,
+						 &read_idx, &desc_used);
+				continue;
+			}
 			while (data_len) {
 				buff_info = (struct octep_rx_buffer *)
 					    &oq->buff_info[read_idx];
-- 
2.34.1



* [PATCH net 2/2] octeon_ep_vf: fix skb frags overflow in the RX path
From: Maoyi Xie @ 2026-07-01 11:28 UTC
  To: Veerasenareddy Burru, Sathesh Edara
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260701112825.1653044-1-maoyixie.tju@gmail.com>

__octep_vf_oq_process_rx() has the same unbounded fragment loop as the PF
driver. buff_info->len comes from the device response header, and one
fragment is added per buffer_size chunk with no check against
MAX_SKB_FRAGS. A long packet yields about 18 fragments, one past the
default MAX_SKB_FRAGS of 17, so skb_add_rx_frag() writes past
shinfo->frags[].

The driver now drops a packet that would need more fragments than the skb
can hold. It drains the descriptors the same way the build_skb failure
path does.
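The descriptor accounting of the drain loop can be modeled on its own. This is a sketch assuming a hypothetical `next_idx()` that wraps at the ring size (standing in for `octep_vf_oq_next_idx()`, whose implementation is not shown here); the DMA unmap and `buff_info->page` reset are omitted and noted in a comment.

```c
#include <assert.h>

/* Hypothetical model of octep_vf_oq_next_idx(): advance with wrap-around. */
static unsigned int next_idx(unsigned int idx, unsigned int max_count)
{
	return (idx + 1 == max_count) ? 0 : idx + 1;
}

/*
 * Mirror of the patch's drain loop: consume one descriptor per buffer_size
 * chunk of data_len. Returns descriptors used and advances *read_idx.
 */
static unsigned int drain_descs(unsigned int data_len, unsigned int buffer_size,
				unsigned int *read_idx, unsigned int max_count)
{
	unsigned int desc_used = 0;

	while (data_len) {
		/* The real loop also unmaps the page and clears buff_info->page. */
		if (data_len < buffer_size)
			data_len = 0;
		else
			data_len -= buffer_size;
		desc_used++;
		*read_idx = next_idx(*read_idx, max_count);
	}
	return desc_used;
}
```

The loop consumes exactly `DIV_ROUND_UP(data_len, buffer_size)` descriptors, so `read_idx` ends where the next packet starts even when it wraps past the end of the ring.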

Fixes: 1cd3b407977c ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Co-developed-by: Kaixuan Li <kaixuan.li@ntu.edu.sg>
Signed-off-by: Kaixuan Li <kaixuan.li@ntu.edu.sg>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
 .../ethernet/marvell/octeon_ep_vf/octep_vf_rx.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_rx.c b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_rx.c
index d982474082423..2e666df26b4c3 100644
--- a/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_rx.c
+++ b/drivers/net/ethernet/marvell/octeon_ep_vf/octep_vf_rx.c
@@ -463,6 +463,23 @@ static int __octep_vf_oq_process_rx(struct octep_vf_device *oct,
 
 			shinfo = skb_shinfo(skb);
 			data_len = buff_info->len - oq->max_single_buffer_size;
+			if (DIV_ROUND_UP(data_len, oq->buffer_size) > MAX_SKB_FRAGS) {
+				dev_kfree_skb_any(skb);
+				while (data_len) {
+					dma_unmap_page(oq->dev, oq->desc_ring[read_idx].buffer_ptr,
+						       PAGE_SIZE, DMA_FROM_DEVICE);
+					buff_info = (struct octep_vf_rx_buffer *)
+						    &oq->buff_info[read_idx];
+					buff_info->page = NULL;
+					if (data_len < oq->buffer_size)
+						data_len = 0;
+					else
+						data_len -= oq->buffer_size;
+					desc_used++;
+					read_idx = octep_vf_oq_next_idx(oq, read_idx);
+				}
+				continue;
+			}
 			while (data_len) {
 				dma_unmap_page(oq->dev, oq->desc_ring[read_idx].buffer_ptr,
 					       PAGE_SIZE, DMA_FROM_DEVICE);
-- 
2.34.1



* [PATCH net] net/packet: avoid fanout hook re-registration after unregister
From: David Lee @ 2026-07-01 11:39 UTC
  To: david.lee, Willem de Bruijn
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Dominik 'Disconnect3d' Czarnota, Simon Horman, netdev,
	linux-kernel

packet_set_ring() temporarily detaches a socket from packet delivery while
reconfiguring its ring. It records the previous running state, clears
po->num, unregisters the protocol hook when needed, drops po->bind_lock,
and later restores po->num and re-registers the hook from the saved
was_running value.

That unlocked window can race with NETDEV_UNREGISTER. The notifier can
observe the socket as not running, skip __unregister_prot_hook(), and
invalidate the per-socket binding by setting po->ifindex to -1 and clearing
po->prot_hook.dev. A one-member fanout group can still retain its shared
fanout hook device pointer. When packet_set_ring() resumes, re-registering
solely from the stale was_running state can re-add the fanout hook after
the device has been unregistered.

Treat po->ifindex == -1 as an invalidated binding after reacquiring
po->bind_lock. Restore po->num as before, but do not re-register the hook
if device unregister already detached the socket.
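The race can be reasoned about with a single-threaded model in which the notifier runs inside the unlocked window. This is a deliberately simplified sketch: `struct pkt_sock`, `netdev_unregister()` and `set_ring()` are hypothetical names, "running" is reduced to a `hook_registered` flag, and locking is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the packet socket state involved. */
struct pkt_sock {
	int num;              /* protocol number, cleared while detached */
	int ifindex;          /* -1 once NETDEV_UNREGISTER invalidated the binding */
	bool hook_registered; /* stands in for the "running" state */
};

/* Model of the NETDEV_UNREGISTER notifier for this socket. */
static void netdev_unregister(struct pkt_sock *po)
{
	if (po->hook_registered) /* not running: unregister is skipped */
		po->hook_registered = false;
	po->ifindex = -1;        /* binding invalidated either way */
}

/*
 * Model of packet_set_ring()'s detach/reattach. 'racing_event' runs while
 * bind_lock is dropped; 'fixed' selects the patched re-registration check.
 */
static void set_ring(struct pkt_sock *po, void (*racing_event)(struct pkt_sock *),
		     bool fixed)
{
	int num = po->num;
	bool was_running = po->hook_registered;

	po->num = 0;
	po->hook_registered = false; /* __unregister_prot_hook() */
	/* bind_lock dropped: the notifier may run here */
	if (racing_event)
		racing_event(po);
	/* bind_lock reacquired */
	po->num = num;
	if (was_running && (!fixed || po->ifindex != -1))
		po->hook_registered = true; /* register_prot_hook() */
}
```

Without the check, the stale `was_running` re-registers the hook although the binding was invalidated; with it, the hook stays down, while an undisturbed reconfiguration still restores it.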

Signed-off-by: Dominik 'Disconnect3d' Czarnota <dominik.czarnota@trailofbits.com>
Assisted-by: Codex:gpt-5
---
Bug found and triaged by David Lee from Trail of Bits.

Trail of Bits has a PoC that achieves local privilege escalation using this
bug on a custom kernel config with CONFIG_LIST_HARDENED disabled, which can
be shared further if needed.

 net/packet/af_packet.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8e6f3a734ba0..000000000000 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4561,7 +4561,11 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,

 	spin_lock(&po->bind_lock);
 	WRITE_ONCE(po->num, num);
-	if (was_running)
+	/*
+	 * NETDEV_UNREGISTER may have invalidated the binding while bind_lock
+	 * was dropped above.  Do not re-add a fanout hook to a dead device.
+	 */
+	if (was_running && READ_ONCE(po->ifindex) != -1)
 		register_prot_hook(sk);

 	spin_unlock(&po->bind_lock);
-- 
2.43.0


* [PATCH iwl-next v1] ixgbe: E610: force phy link to get down when interface is down
From: Jedrzej Jagielski @ 2026-07-01 11:35 UTC
  To: intel-wired-lan
  Cc: anthony.l.nguyen, netdev, Jedrzej Jagielski, Aleksandr Loktionov

For the E610 family, similarly to the E8xx adapters, the default behavior
is for the PHY link to remain up even when the corresponding OS interface
is down.

Add a function that clears the IXGBE_ACI_PHY_ENA_LINK bit in the PHY
config, which disables the PHY link.

Align the behavior with the implementation in the ice driver.

Let the user enable or disable link-down-on-close through ethtool.
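The two pieces of logic described above can be sketched in isolation. This is a minimal sketch: only the `BIT(25)` flag position comes from the patch; the `PHY_ENA_*` bit values and the `should_disable_phy_on_close()` helper are hypothetical, since the close-path change in ixgbe_main.c is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))
/* Flag position from the patch; the caps bit values below are hypothetical. */
#define FLAG2_LINK_DOWN_ON_CLOSE  BIT(25)
#define PHY_ENA_LINK              BIT(3)
#define PHY_ENA_AUTO_LINK_UPDT    BIT(5)

/* Caps to send when forcing the link down, as ixgbe_disable_phy_link() does. */
static uint8_t link_down_caps(uint8_t active_caps)
{
	active_caps &= (uint8_t)~PHY_ENA_LINK;
	active_caps |= PHY_ENA_AUTO_LINK_UPDT;
	return active_caps;
}

/* Hypothetical close-path decision: drop the PHY link only if opted in. */
static bool should_disable_phy_on_close(uint32_t flags2)
{
	return flags2 & FLAG2_LINK_DOWN_ON_CLOSE;
}
```

The active configuration is read back first and only the link-enable bit is changed, so other PHY settings survive the interface going down.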

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c | 35 ++++++++++++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h |  1 +
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  | 15 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 27 +++++++++++---
 5 files changed, 73 insertions(+), 6 deletions(-)
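
The capability-bit handling described above (clear the link-enable bit on
close, and force a set_phy_cfg on the next link setup when that bit is found
cleared) can be modelled in isolation. This is a minimal sketch only: the
DEMO_PHY_* bit values and the demo_* helpers are hypothetical stand-ins, not
the real IXGBE_ACI_PHY_* definitions or driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real IXGBE_ACI_PHY_* values are defined
 * in the ixgbe headers and may differ. */
#define DEMO_PHY_ENA_LINK		(1u << 3)
#define DEMO_PHY_ENA_AUTO_LINK_UPDT	(1u << 5)

/* Close path, modelled on ixgbe_disable_phy_link(): start from the active
 * caps, clear link enable and keep automatic link updates on. */
static uint8_t demo_caps_for_close(uint8_t active_caps)
{
	active_caps &= (uint8_t)~DEMO_PHY_ENA_LINK;
	active_caps |= DEMO_PHY_ENA_AUTO_LINK_UPDT;
	return active_caps;
}

/* Open path, modelled on ixgbe_setup_phy_link_e610(): if link enable was
 * cleared (e.g. by a previous close), a new PHY config must be sent even
 * when nothing else about the requested config changed. */
static bool demo_needs_set_phy_cfg(uint8_t cur_caps, bool other_fields_changed)
{
	bool force_on_required = !(cur_caps & DEMO_PHY_ENA_LINK);

	return other_fields_changed || force_on_required;
}
```

Without the force_on_required check, a link brought down on close would stay
down on the next open whenever the requested PHY types were unchanged.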

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 30f62174acf2..7bbb82dd962c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -685,6 +685,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG2_MOD_POWER_UNSUPPORTED	BIT(22)
 #define IXGBE_FLAG2_API_MISMATCH		BIT(23)
 #define IXGBE_FLAG2_FW_ROLLBACK			BIT(24)
+#define IXGBE_FLAG2_LINK_DOWN_ON_CLOSE		BIT(25)
 
 	/* Tx fast path data */
 	int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
index da445fb673fc..46d8a3ea86b8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
@@ -1923,6 +1923,33 @@ void ixgbe_fc_autoneg_e610(struct ixgbe_hw *hw)
 	hw->fc.current_mode = hw->fc.requested_mode;
 }
 
+/**
+ * ixgbe_disable_phy_link - force phy link to get down
+ * @hw: pointer to hardware structure
+ *
+ * Send 0x0601 with the IXGBE_ACI_PHY_ENA_LINK bit set down.
+ *
+ * Return: the exit code of the operation.
+ */
+int ixgbe_disable_phy_link(struct ixgbe_hw *hw)
+{
+	struct ixgbe_aci_cmd_get_phy_caps_data pcaps = {};
+	struct ixgbe_aci_cmd_set_phy_cfg_data pcfg = {};
+	int err;
+
+	err = ixgbe_aci_get_phy_caps(hw, false, IXGBE_ACI_REPORT_ACTIVE_CFG,
+				     &pcaps);
+	if (err)
+		return err;
+
+	ixgbe_copy_phy_caps_to_cfg(&pcaps, &pcfg);
+
+	pcfg.caps &= ~IXGBE_ACI_PHY_ENA_LINK;
+	pcfg.caps |= IXGBE_ACI_PHY_ENA_AUTO_LINK_UPDT;
+
+	return ixgbe_aci_set_phy_cfg(hw, &pcfg);
+}
+
 /**
  * ixgbe_disable_rx_e610 - Disable RX unit
  * @hw: pointer to hardware structure
@@ -2207,6 +2234,7 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
 	u8 rmode = IXGBE_ACI_REPORT_TOPO_CAP_MEDIA;
 	u64 sup_phy_type_low, sup_phy_type_high;
 	u64 phy_type_low = 0, phy_type_high = 0;
+	bool force_on_required;
 	int err;
 
 	err = ixgbe_aci_get_link_info(hw, false, NULL);
@@ -2272,6 +2300,11 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
 		phy_type_high |= IXGBE_PHY_TYPE_HIGH_10G_USXGMII;
 	}
 
+	/* If IXGBE_ACI_PHY_ENA_LINK has been explicitly disabled that means
+	 * we need to force interface enablement after reaching that point
+	 */
+	force_on_required = !(pcfg.caps & IXGBE_ACI_PHY_ENA_LINK);
+
 	/* Mask the set values to avoid requesting unsupported link types. */
 	phy_type_low &= sup_phy_type_low;
 	pcfg.phy_type_low = cpu_to_le64(phy_type_low);
@@ -2280,7 +2313,7 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
 
 	if (pcfg.phy_type_high != pcaps.phy_type_high ||
 	    pcfg.phy_type_low != pcaps.phy_type_low ||
-	    pcfg.caps != pcaps.caps) {
+	    pcfg.caps != pcaps.caps || force_on_required) {
 		pcfg.caps |= IXGBE_ACI_PHY_ENA_LINK;
 		pcfg.caps |= IXGBE_ACI_PHY_ENA_AUTO_LINK_UPDT;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
index 2cb76a3d30ae..59044d67ebeb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
@@ -50,6 +50,7 @@ int ixgbe_cfg_phy_fc(struct ixgbe_hw *hw,
 		     enum ixgbe_fc_mode req_mode);
 int ixgbe_setup_fc_e610(struct ixgbe_hw *hw);
 void ixgbe_fc_autoneg_e610(struct ixgbe_hw *hw);
+int ixgbe_disable_phy_link(struct ixgbe_hw *hw);
 void ixgbe_disable_rx_e610(struct ixgbe_hw *hw);
 int ixgbe_init_phy_ops_e610(struct ixgbe_hw *hw);
 int ixgbe_identify_phy_e610(struct ixgbe_hw *hw);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 4dfae53b4ea1..0fcb9d738984 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -139,6 +139,8 @@ static const char ixgbe_priv_flags_strings[][ETH_GSTRING_LEN] = {
 	"vf-ipsec",
 #define IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF	BIT(2)
 	"mdd-disable-vf",
+#define IXGBE_PRIV_LINK_DOWN_ON_CLOSE	BIT(3)
+	"link-down-on-close",
 };
 
 #define IXGBE_PRIV_FLAGS_STR_LEN ARRAY_SIZE(ixgbe_priv_flags_strings)
@@ -3842,6 +3844,9 @@ static u32 ixgbe_get_priv_flags(struct net_device *netdev)
 	if (adapter->flags2 & IXGBE_FLAG2_AUTO_DISABLE_VF)
 		priv_flags |= IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF;
 
+	if (adapter->flags2 & IXGBE_FLAG2_LINK_DOWN_ON_CLOSE)
+		priv_flags |= IXGBE_PRIV_LINK_DOWN_ON_CLOSE;
+
 	return priv_flags;
 }
 
@@ -3879,6 +3884,16 @@ static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
 		}
 	}
 
+	flags2 &= ~IXGBE_FLAG2_LINK_DOWN_ON_CLOSE;
+	if (priv_flags & IXGBE_PRIV_LINK_DOWN_ON_CLOSE) {
+		if (adapter->hw.mac.type == ixgbe_mac_e610) {
+			flags2 |= IXGBE_FLAG2_LINK_DOWN_ON_CLOSE;
+		} else {
+			e_info(probe, "Cannot set private flags: Unsupported hardware\n");
+			return -EOPNOTSUPP;
+		}
+	}
+
 	if (flags2 != adapter->flags2) {
 		adapter->flags2 = flags2;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 62c2d83e1577..58ee4a186039 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7544,6 +7544,17 @@ static void ixgbe_close_suspend(struct ixgbe_adapter *adapter)
 	ixgbe_free_all_rx_resources(adapter);
 }
 
+static void ixgbe_handle_link_down(struct ixgbe_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+
+	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
+		ixgbe_ptp_start_cyclecounter(adapter);
+
+	e_info(drv, "NIC Link is Down\n");
+	netif_carrier_off(netdev);
+}
+
 /**
  * ixgbe_close - Disables a network interface
  * @netdev: network interface device structure
@@ -7566,6 +7577,16 @@ int ixgbe_close(struct net_device *netdev)
 
 	ixgbe_fdir_filter_exit(adapter);
 
+	if (adapter->flags2 & IXGBE_FLAG2_LINK_DOWN_ON_CLOSE) {
+		int err;
+
+		err = ixgbe_disable_phy_link(&adapter->hw);
+		if (err)
+			e_warn(drv, "Cannot set PHY link down\n");
+
+		ixgbe_handle_link_down(adapter);
+	}
+
 	ixgbe_release_hw_control(adapter);
 
 	return 0;
@@ -8244,11 +8265,7 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 	if (ixgbe_is_sfp(hw) && hw->mac.type == ixgbe_mac_82598EB)
 		adapter->flags2 |= IXGBE_FLAG2_SEARCH_FOR_SFP;
 
-	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
-		ixgbe_ptp_start_cyclecounter(adapter);
-
-	e_info(drv, "NIC Link is Down\n");
-	netif_carrier_off(netdev);
+	ixgbe_handle_link_down(adapter);
 }
 
 static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
-- 
2.31.1



* [PATCH net-next 0/2] geneve: make geneve_fill_info() RTNL-less
From: Eric Dumazet @ 2026-07-01 12:04 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Andrew Lunn, netdev,
	eric.dumazet, Eric Dumazet

This series makes geneve_fill_info() independent of the RTNL lock by
converting the device configuration to an RCU-protected pointer.

Historically, geneve_changelink() updated the device configuration by
copying the new configuration over the old one using memcpy() under RTNL.

To prevent the transmit/receive data paths from reading torn values during
the copy, geneve_quiesce() was used to pause the data path and wait for
synchronize_net(), which caused packet loss and added latency.

By converting the configuration to an RCU-protected pointer, we can
perform atomic updates via RCU swap. This allows data path readers to
safely access the configuration locklessly under RCU read lock, and
removes the need to stop the data path during changelink.
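
The publish/read pattern can be illustrated outside the kernel. The sketch
below is a userspace analogue that uses C11 atomics in place of the kernel
RCU API: an acquire load stands in for rcu_dereference(), and an exchange
with release semantics stands in for publishing with rcu_assign_pointer().
struct demo_config and the demo_* helpers are hypothetical. The sketch omits
the grace period: in the kernel the old config is freed via call_rcu_hurry()
once pre-existing readers are done, whereas here the caller must free it
only when it knows no reader still holds it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down config; the real struct geneve_config
 * carries many more fields. */
struct demo_config {
	unsigned short port_min;
	unsigned short port_max;
	bool collect_md;
};

/* Plays the role of "struct geneve_config __rcu *cfg" in geneve_dev. */
static _Atomic(struct demo_config *) demo_cfg = NULL;

/* Reader side (analogue of rcu_dereference()): the acquire load pairs with
 * the writer's release, so a reader never sees a half-built config. */
static const struct demo_config *demo_read_cfg(void)
{
	return atomic_load_explicit(&demo_cfg, memory_order_acquire);
}

/* Writer side (analogue of changelink): build a complete copy first, then
 * swap it in with a single atomic exchange. The previous config is handed
 * back through *old so the caller can free it after readers are done. */
static int demo_update_cfg(const struct demo_config *src,
			   struct demo_config **old)
{
	struct demo_config *new_cfg = malloc(sizeof(*new_cfg));

	if (!new_cfg)
		return -1;
	memcpy(new_cfg, src, sizeof(*new_cfg));
	*old = atomic_exchange_explicit(&demo_cfg, new_cfg,
					memory_order_acq_rel);
	return 0;
}
```

A reader that loaded the old pointer before the swap keeps a consistent
snapshot, which is what removes the need to quiesce the data path.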

With the RCU infrastructure in place, geneve_fill_info() is then updated
to read the configuration under RCU read lock, removing its dependency
on RTNL.

Eric Dumazet (2):
  geneve: convert config to RCU-protected pointer
  geneve: make geneve_fill_info() RTNL independent

 drivers/net/geneve.c | 390 +++++++++++++++++++++++++------------------
 1 file changed, 229 insertions(+), 161 deletions(-)

-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH net-next 1/2] geneve: convert config to RCU-protected pointer
From: Eric Dumazet @ 2026-07-01 12:04 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Andrew Lunn, netdev,
	eric.dumazet, Eric Dumazet
In-Reply-To: <20260701120454.3533252-1-edumazet@google.com>

geneve_changelink() currently updates the configuration by copying it over
the old one using memcpy() under RTNL, forcing a data path pause via
geneve_quiesce() and synchronize_net() to avoid reading torn values.

Convert geneve->cfg to an RCU-protected pointer, allowing lockless
and safe reads under RCU read lock without synchronization overhead.

Key changes:
- Introduced geneve_config_alloc/free() helpers for lifecycle.
- geneve_configure() allocates config and publishes it via RCU.
- geneve_changelink() performs RCU swap; old config is freed via call_rcu_hurry().
- Allocated a fresh dst_cache for each new config during changelink, so
  the old and new configs never share per-CPU cache storage.
- Removed geneve_quiesce/unquiesce() and synchronize_net() from changelink.
- Added rcu_barrier() to module exit to wait for pending callbacks.
- Updated data path to use rcu_dereference().
- Updated geneve_fill_info() to use rtnl_dereference() for now.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/geneve.c | 366 +++++++++++++++++++++++++------------------
 1 file changed, 210 insertions(+), 156 deletions(-)
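
The deferred-free lifecycle in the key changes above relies on embedding an
rcu_head in the config and recovering the outer struct with container_of()
inside the callback, as geneve_config_free_rcu() does. The sketch below is a
single-threaded userspace model under stated assumptions: struct
demo_rcu_head, demo_call_rcu() and the other demo_* names are hypothetical.
Queued callbacks stand in for call_rcu_hurry(), and demo_run_callbacks()
plays the role of the grace period ending plus the wait that rcu_barrier()
performs at module exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_config {
	int port_min;
	int port_max;
	struct demo_rcu_head rcu;	/* embedded, as in struct geneve_config */
};

static struct demo_rcu_head *demo_pending;
static int demo_freed;

/* Like call_rcu(): queue the callback instead of running it now. */
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *))
{
	head->func = func;
	head->next = demo_pending;
	demo_pending = head;
}

/* Grace period end / rcu_barrier() analogue: run every queued callback. */
static void demo_run_callbacks(void)
{
	while (demo_pending) {
		struct demo_rcu_head *head = demo_pending;

		demo_pending = head->next;	/* read before head is freed */
		head->func(head);
	}
}

/* Mirrors geneve_config_free_rcu(): recover the outer struct, free it. */
static void demo_config_free_rcu(struct demo_rcu_head *head)
{
	struct demo_config *cfg =
		demo_container_of(head, struct demo_config, rcu);

	demo_freed++;
	free(cfg);
}
```

Skipping the rcu_barrier() step would let the module be unloaded while a
queued callback still points into its text, which is why module exit waits.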

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 396e1a113cd49e9a43bfe6c7cd78f15ca8963906..c777c1a1fa930d74768a659e16380d719f12a6b2 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -82,6 +82,7 @@ struct geneve_config {
 	u16			port_min;
 	u16			port_max;
 
+	struct rcu_head		rcu;
 	/* Must be last --ends in a flexible-array member. */
 	struct ip_tunnel_info	info;
 };
@@ -100,7 +101,7 @@ struct geneve_dev {
 #endif
 	struct list_head   next;	/* geneve's per namespace list */
 	struct gro_cells   gro_cells;
-	struct geneve_config cfg;
+	struct geneve_config __rcu *cfg;
 };
 
 struct geneve_sock {
@@ -182,8 +183,10 @@ static struct geneve_dev *geneve_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(node, vni_list_head, hlist) {
-		if (eq_tun_id_and_vni((u8 *)&node->geneve->cfg.info.key.tun_id, vni) &&
-		    addr == node->geneve->cfg.info.key.u.ipv4.dst)
+		const struct geneve_config *cfg = rcu_dereference(node->geneve->cfg);
+
+		if (eq_tun_id_and_vni((u8 *)&cfg->info.key.tun_id, vni) &&
+		    addr == cfg->info.key.u.ipv4.dst)
 			return node->geneve;
 	}
 	return NULL;
@@ -201,8 +204,10 @@ static struct geneve_dev *geneve6_lookup(struct geneve_sock *gs,
 	hash = geneve_net_vni_hash(vni);
 	vni_list_head = &gs->vni_list[hash];
 	hlist_for_each_entry_rcu(node, vni_list_head, hlist) {
-		if (eq_tun_id_and_vni((u8 *)&node->geneve->cfg.info.key.tun_id, vni) &&
-		    ipv6_addr_equal(&addr6, &node->geneve->cfg.info.key.u.ipv6.dst))
+		const struct geneve_config *cfg = rcu_dereference(node->geneve->cfg);
+
+		if (eq_tun_id_and_vni((u8 *)&cfg->info.key.tun_id, vni) &&
+		    ipv6_addr_equal(&addr6, &cfg->info.key.u.ipv6.dst))
 			return node->geneve;
 	}
 	return NULL;
@@ -386,11 +391,6 @@ static int geneve_init(struct net_device *dev)
 	if (err)
 		return err;
 
-	err = dst_cache_init(&geneve->cfg.info.dst_cache, GFP_KERNEL);
-	if (err) {
-		gro_cells_destroy(&geneve->gro_cells);
-		return err;
-	}
 	netdev_lockdep_set_classes(dev);
 	return 0;
 }
@@ -399,7 +399,6 @@ static void geneve_uninit(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
 
-	dst_cache_destroy(&geneve->cfg.info.dst_cache);
 	gro_cells_destroy(&geneve->gro_cells);
 }
 
@@ -675,10 +674,14 @@ static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	inner_proto = geneveh->proto_type;
 
-	if (unlikely((!geneve->cfg.inner_proto_inherit &&
-		      inner_proto != htons(ETH_P_TEB)))) {
-		dev_dstats_rx_dropped(geneve->dev);
-		goto drop;
+	{
+		const struct geneve_config *cfg = rcu_dereference(geneve->cfg);
+
+		if (unlikely(!cfg || (!cfg->inner_proto_inherit &&
+			      inner_proto != htons(ETH_P_TEB)))) {
+			dev_dstats_rx_dropped(geneve->dev);
+			goto drop;
+		}
 	}
 
 	opts_len = geneveh->opt_len * 4;
@@ -762,9 +765,10 @@ static int geneve_udp_encap_err_lookup(struct sock *sk, struct sk_buff *skb)
 }
 
 static struct sock *geneve_create_sock(struct net *net,
-				       struct geneve_dev *geneve, bool ipv6)
+				       struct geneve_dev *geneve,
+				       const struct geneve_config *cfg, bool ipv6)
 {
-	struct ip_tunnel_info *info = &geneve->cfg.info;
+	const struct ip_tunnel_info *info = &cfg->info;
 	struct udp_port_cfg udp_conf;
 	struct socket *sock;
 	int err;
@@ -775,7 +779,7 @@ static struct sock *geneve_create_sock(struct net *net,
 	if (ipv6) {
 		udp_conf.family = AF_INET6;
 		udp_conf.ipv6_v6only = 1;
-		udp_conf.use_udp6_rx_checksums = geneve->cfg.use_udp6_rx_checksums;
+		udp_conf.use_udp6_rx_checksums = cfg->use_udp6_rx_checksums;
 		udp_conf.local_ip6 = info->key.u.ipv6.src;
 	} else
 #endif
@@ -991,7 +995,8 @@ static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
 
 /* Create new listen socket if needed */
 static struct geneve_sock *geneve_socket_create(struct net *net,
-						struct geneve_dev *geneve, bool ipv6)
+						struct geneve_dev *geneve,
+						const struct geneve_config *cfg, bool ipv6)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
 	struct udp_tunnel_sock_cfg tunnel_cfg;
@@ -1003,7 +1008,7 @@ static struct geneve_sock *geneve_socket_create(struct net *net,
 	if (!gs)
 		return ERR_PTR(-ENOMEM);
 
-	sk = geneve_create_sock(net, geneve, ipv6);
+	sk = geneve_create_sock(net, geneve, cfg, ipv6);
 	if (IS_ERR(sk)) {
 		kfree(gs);
 		return ERR_CAST(sk);
@@ -1060,12 +1065,13 @@ static void geneve_sock_release(struct geneve_dev *geneve)
 }
 
 static struct geneve_sock *geneve_find_sock(struct net *net,
-					    struct geneve_dev *geneve, bool ipv6)
+					    struct geneve_dev *geneve,
+					    const struct geneve_config *cfg, bool ipv6)
 {
 	struct geneve_net *gn = net_generic(net, geneve_net_id);
-	struct ip_tunnel_info *info = &geneve->cfg.info;
+	const struct ip_tunnel_info *info = &cfg->info;
 	sa_family_t family = ipv6 ? AF_INET6 : AF_INET;
-	bool gro_hint = geneve->cfg.gro_hint;
+	bool gro_hint = cfg->gro_hint;
 	__be16 dst_port = info->key.tp_dst;
 	struct geneve_sock *gs;
 
@@ -1095,7 +1101,8 @@ static struct geneve_sock *geneve_find_sock(struct net *net,
 	return NULL;
 }
 
-static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
+static int geneve_sock_add(struct geneve_dev *geneve,
+			   const struct geneve_config *cfg, bool ipv6)
 {
 	struct net *net = geneve->net;
 	struct geneve_dev_node *node;
@@ -1103,19 +1110,19 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 	__u8 vni[3];
 	__u32 hash;
 
-	gs = geneve_find_sock(net, geneve, ipv6);
+	gs = geneve_find_sock(net, geneve, cfg, ipv6);
 	if (gs) {
 		gs->refcnt++;
 		goto out;
 	}
 
-	gs = geneve_socket_create(net, geneve, ipv6);
+	gs = geneve_socket_create(net, geneve, cfg, ipv6);
 	if (IS_ERR(gs))
 		return PTR_ERR(gs);
 
 out:
-	gs->collect_md = geneve->cfg.collect_md;
-	gs->gro_hint = geneve->cfg.gro_hint;
+	gs->collect_md = cfg->collect_md;
+	gs->gro_hint = cfg->gro_hint;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (ipv6) {
 		rcu_assign_pointer(geneve->sock6, gs);
@@ -1128,7 +1135,7 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 	}
 	node->geneve = geneve;
 
-	tunnel_id_to_vni(geneve->cfg.info.key.tun_id, vni);
+	tunnel_id_to_vni(cfg->info.key.tun_id, vni);
 	hash = geneve_net_vni_hash(vni);
 	hlist_add_head_rcu(&node->hlist, &gs->vni_list[hash]);
 	return 0;
@@ -1137,21 +1144,22 @@ static int geneve_sock_add(struct geneve_dev *geneve, bool ipv6)
 static int geneve_open(struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	bool dualstack = geneve->cfg.dualstack;
+	const struct geneve_config *cfg = rtnl_dereference(geneve->cfg);
+	bool dualstack = cfg->dualstack;
 	bool ipv4, ipv6;
 	int ret = 0;
 
-	ipv6 = geneve->cfg.info.mode & IP_TUNNEL_INFO_IPV6 || dualstack;
+	ipv6 = cfg->info.mode & IP_TUNNEL_INFO_IPV6 || dualstack;
 	ipv4 = !ipv6 || dualstack;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (ipv6) {
-		ret = geneve_sock_add(geneve, true);
+		ret = geneve_sock_add(geneve, cfg, true);
 		if (ret < 0 && ret != -EAFNOSUPPORT)
 			ipv4 = false;
 	}
 #endif
 	if (ipv4)
-		ret = geneve_sock_add(geneve, false);
+		ret = geneve_sock_add(geneve, cfg, false);
 	if (ret < 0)
 		geneve_sock_release(geneve);
 
@@ -1189,6 +1197,7 @@ static void geneve_build_header(struct genevehdr *geneveh,
 }
 
 static int geneve_build_gro_hint_opt(const struct geneve_dev *geneve,
+				     const struct geneve_config *cfg,
 				     struct sk_buff *skb)
 {
 	struct geneve_skb_cb *cb = GENEVE_SKB_CB(skb);
@@ -1201,7 +1210,7 @@ static int geneve_build_gro_hint_opt(const struct geneve_dev *geneve,
 	cb->gro_hint_len = 0;
 
 	/* Try to add the GRO hint only in case of double encap. */
-	if (!geneve->cfg.gro_hint || !skb->encapsulation)
+	if (!cfg->gro_hint || !skb->encapsulation)
 		return 0;
 
 	/*
@@ -1262,10 +1271,11 @@ static void geneve_put_gro_hint_opt(struct genevehdr *gnvh, int opt_size,
 
 static int geneve_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 			    const struct ip_tunnel_info *info,
-			    const struct geneve_dev *geneve, int ip_hdr_len)
+			    const struct geneve_dev *geneve,
+			    const struct geneve_config *cfg, int ip_hdr_len)
 {
 	bool udp_sum = test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags);
-	bool inner_proto_inherit = geneve->cfg.inner_proto_inherit;
+	bool inner_proto_inherit = cfg->inner_proto_inherit;
 	bool xnet = !net_eq(geneve->net, dev_net(geneve->dev));
 	struct geneve_skb_cb *cb = GENEVE_SKB_CB(skb);
 	struct genevehdr *gnvh;
@@ -1306,14 +1316,14 @@ static int geneve_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 }
 
 static u8 geneve_get_dsfield(struct sk_buff *skb, struct net_device *dev,
+			     const struct geneve_config *cfg,
 			     const struct ip_tunnel_info *info,
 			     bool *use_cache)
 {
-	struct geneve_dev *geneve = netdev_priv(dev);
 	u8 dsfield;
 
 	dsfield = info->key.tos;
-	if (dsfield == 1 && !geneve->cfg.collect_md) {
+	if (cfg && dsfield == 1 && !cfg->collect_md) {
 		dsfield = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
 		*use_cache = false;
 	}
@@ -1323,6 +1333,7 @@ static u8 geneve_get_dsfield(struct sk_buff *skb, struct net_device *dev,
 
 static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			   struct geneve_dev *geneve,
+			   const struct geneve_config *cfg,
 			   const struct ip_tunnel_info *info)
 {
 	struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
@@ -1335,35 +1346,35 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	__be16 sport;
 	int err;
 
-	if (skb_vlan_inet_prepare(skb, geneve->cfg.inner_proto_inherit))
+	if (skb_vlan_inet_prepare(skb, cfg->inner_proto_inherit))
 		return -EINVAL;
 
 	if (!gs4)
 		return -EIO;
 
 	use_cache = ip_tunnel_dst_cache_usable(skb, info);
-	tos = geneve_get_dsfield(skb, dev, info, &use_cache);
+	tos = geneve_get_dsfield(skb, dev, cfg, info, &use_cache);
 	sport = udp_flow_src_port(geneve->net, skb,
-				  geneve->cfg.port_min,
-				  geneve->cfg.port_max, true);
+				  cfg->port_min,
+				  cfg->port_max, true);
 
 	rt = udp_tunnel_dst_lookup(skb, dev, geneve->net, 0, &saddr,
 				   &info->key,
-				   sport, geneve->cfg.info.key.tp_dst, tos,
+				   sport, cfg->info.key.tp_dst, tos,
 				   use_cache ?
 				   (struct dst_cache *)&info->dst_cache : NULL);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
-	if (geneve->cfg.info.key.u.ipv4.src &&
-	    saddr != geneve->cfg.info.key.u.ipv4.src) {
+	if (cfg->info.key.u.ipv4.src &&
+	    saddr != cfg->info.key.u.ipv4.src) {
 		dst_release(&rt->dst);
 		return -EADDRNOTAVAIL;
 	}
 
 	err = skb_tunnel_check_pmtu(skb, &rt->dst,
 				    GENEVE_IPV4_HLEN + info->options_len +
-				    geneve_build_gro_hint_opt(geneve, skb),
+				    geneve_build_gro_hint_opt(geneve, cfg, skb),
 				    netif_is_any_bridge_port(dev));
 	if (err < 0) {
 		dst_release(&rt->dst);
@@ -1397,21 +1408,21 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	}
 
 	tos = ip_tunnel_ecn_encap(tos, ip_hdr(skb), skb);
-	if (geneve->cfg.collect_md) {
+	if (cfg->collect_md) {
 		ttl = key->ttl;
 
 		df = test_bit(IP_TUNNEL_DONT_FRAGMENT_BIT, key->tun_flags) ?
 		     htons(IP_DF) : 0;
 	} else {
-		if (geneve->cfg.ttl_inherit)
+		if (cfg->ttl_inherit)
 			ttl = ip_tunnel_get_ttl(ip_hdr(skb), skb);
 		else
 			ttl = key->ttl;
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
 
-		if (geneve->cfg.df == GENEVE_DF_SET) {
+		if (cfg->df == GENEVE_DF_SET) {
 			df = htons(IP_DF);
-		} else if (geneve->cfg.df == GENEVE_DF_INHERIT) {
+		} else if (cfg->df == GENEVE_DF_INHERIT) {
 			struct ethhdr *eth = skb_eth_hdr(skb);
 
 			if (ntohs(eth->h_proto) == ETH_P_IPV6) {
@@ -1425,13 +1436,13 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	err = geneve_build_skb(&rt->dst, skb, info, geneve,
+	err = geneve_build_skb(&rt->dst, skb, info, geneve, cfg,
 			       sizeof(struct iphdr));
 	if (unlikely(err))
 		return err;
 
 	udp_tunnel_xmit_skb(rt, gs4->sk, skb, saddr, info->key.u.ipv4.dst,
-			    tos, ttl, df, sport, geneve->cfg.info.key.tp_dst,
+			    tos, ttl, df, sport, cfg->info.key.tp_dst,
 			    !net_eq(geneve->net, dev_net(geneve->dev)),
 			    !test_bit(IP_TUNNEL_CSUM_BIT, info->key.tun_flags),
 			    0);
@@ -1441,6 +1452,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 #if IS_ENABLED(CONFIG_IPV6)
 static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 			    struct geneve_dev *geneve,
+			    const struct geneve_config *cfg,
 			    const struct ip_tunnel_info *info)
 {
 	struct geneve_sock *gs6 = rcu_dereference(geneve->sock6);
@@ -1452,35 +1464,35 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	__be16 sport;
 	int err;
 
-	if (skb_vlan_inet_prepare(skb, geneve->cfg.inner_proto_inherit))
+	if (skb_vlan_inet_prepare(skb, cfg->inner_proto_inherit))
 		return -EINVAL;
 
 	if (!gs6)
 		return -EIO;
 
 	use_cache = ip_tunnel_dst_cache_usable(skb, info);
-	prio = geneve_get_dsfield(skb, dev, info, &use_cache);
+	prio = geneve_get_dsfield(skb, dev, cfg, info, &use_cache);
 	sport = udp_flow_src_port(geneve->net, skb,
-				  geneve->cfg.port_min,
-				  geneve->cfg.port_max, true);
+				  cfg->port_min,
+				  cfg->port_max, true);
 
 	dst = udp_tunnel6_dst_lookup(skb, dev, geneve->net, gs6->sk, 0,
 				     &saddr, key, sport,
-				     geneve->cfg.info.key.tp_dst, prio,
+				     cfg->info.key.tp_dst, prio,
 				     use_cache ?
 				     (struct dst_cache *)&info->dst_cache : NULL);
 	if (IS_ERR(dst))
 		return PTR_ERR(dst);
 
-	if (!ipv6_addr_any(&geneve->cfg.info.key.u.ipv6.src) &&
-	    !ipv6_addr_equal(&saddr, &geneve->cfg.info.key.u.ipv6.src)) {
+	if (!ipv6_addr_any(&cfg->info.key.u.ipv6.src) &&
+	    !ipv6_addr_equal(&saddr, &cfg->info.key.u.ipv6.src)) {
 		dst_release(dst);
 		return -EADDRNOTAVAIL;
 	}
 
 	err = skb_tunnel_check_pmtu(skb, dst,
 				    GENEVE_IPV6_HLEN + info->options_len +
-				    geneve_build_gro_hint_opt(geneve, skb),
+				    geneve_build_gro_hint_opt(geneve, cfg, skb),
 				    netif_is_any_bridge_port(dev));
 	if (err < 0) {
 		dst_release(dst);
@@ -1513,22 +1525,22 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 	}
 
 	prio = ip_tunnel_ecn_encap(prio, ip_hdr(skb), skb);
-	if (geneve->cfg.collect_md) {
+	if (cfg->collect_md) {
 		ttl = key->ttl;
 	} else {
-		if (geneve->cfg.ttl_inherit)
+		if (cfg->ttl_inherit)
 			ttl = ip_tunnel_get_ttl(ip_hdr(skb), skb);
 		else
 			ttl = key->ttl;
 		ttl = ttl ? : ip6_dst_hoplimit(dst);
 	}
-	err = geneve_build_skb(dst, skb, info, geneve, sizeof(struct ipv6hdr));
+	err = geneve_build_skb(dst, skb, info, geneve, cfg, sizeof(struct ipv6hdr));
 	if (unlikely(err))
 		return err;
 
 	udp_tunnel6_xmit_skb(dst, gs6->sk, skb, dev,
 			     &saddr, &key->u.ipv6.dst, prio, ttl,
-			     info->key.label, sport, geneve->cfg.info.key.tp_dst,
+			     info->key.label, sport, cfg->info.key.tp_dst,
 			     !test_bit(IP_TUNNEL_CSUM_BIT,
 				       info->key.tun_flags),
 			     0);
@@ -1539,28 +1551,36 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	struct ip_tunnel_info *info = NULL;
+	const struct ip_tunnel_info *info = NULL;
+	const struct geneve_config *cfg;
 	int err;
 
-	if (geneve->cfg.collect_md) {
+	rcu_read_lock();
+	cfg = rcu_dereference(geneve->cfg);
+	if (unlikely(!cfg)) {
+		err = -ENODEV;
+		goto tx_err;
+	}
+
+	if (cfg->collect_md) {
 		info = skb_tunnel_info(skb);
 		if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
 			netdev_dbg(dev, "no tunnel metadata\n");
 			dev_kfree_skb(skb);
 			dev_dstats_tx_dropped(dev);
+			rcu_read_unlock();
 			return NETDEV_TX_OK;
 		}
 	} else {
-		info = &geneve->cfg.info;
+		info = &cfg->info;
 	}
 
-	rcu_read_lock();
 #if IS_ENABLED(CONFIG_IPV6)
 	if (info->mode & IP_TUNNEL_INFO_IPV6)
-		err = geneve6_xmit_skb(skb, dev, geneve, info);
+		err = geneve6_xmit_skb(skb, dev, geneve, cfg, info);
 	else
 #endif
-		err = geneve_xmit_skb(skb, dev, geneve, info);
+		err = geneve_xmit_skb(skb, dev, geneve, cfg, info);
 	rcu_read_unlock();
 
 	if (likely(!err))
@@ -1576,6 +1596,12 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	DEV_STATS_INC(dev, tx_errors);
 	return NETDEV_TX_OK;
+
+tx_err:
+	rcu_read_unlock();
+	dev_kfree_skb(skb);
+	DEV_STATS_INC(dev, tx_errors);
+	return NETDEV_TX_OK;
 }
 
 static int geneve_change_mtu(struct net_device *dev, int new_mtu)
@@ -1593,8 +1619,13 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	struct geneve_dev *geneve = netdev_priv(dev);
+	const struct geneve_config *cfg;
 	__be16 sport;
 
+	cfg = rcu_dereference(geneve->cfg);
+	if (unlikely(!cfg))
+		return -ENODEV;
+
 	if (ip_tunnel_info_af(info) == AF_INET) {
 		struct rtable *rt;
 		struct geneve_sock *gs4 = rcu_dereference(geneve->sock4);
@@ -1606,14 +1637,14 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 			return -EIO;
 
 		use_cache = ip_tunnel_dst_cache_usable(skb, info);
-		tos = geneve_get_dsfield(skb, dev, info, &use_cache);
+		tos = geneve_get_dsfield(skb, dev, cfg, info, &use_cache);
 		sport = udp_flow_src_port(geneve->net, skb,
-					  geneve->cfg.port_min,
-					  geneve->cfg.port_max, true);
+					  cfg->port_min,
+					  cfg->port_max, true);
 
 		rt = udp_tunnel_dst_lookup(skb, dev, geneve->net, 0, &saddr,
 					   &info->key,
-					   sport, geneve->cfg.info.key.tp_dst,
+					   sport, cfg->info.key.tp_dst,
 					   tos,
 					   use_cache ? &info->dst_cache : NULL);
 		if (IS_ERR(rt))
@@ -1633,14 +1664,14 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 			return -EIO;
 
 		use_cache = ip_tunnel_dst_cache_usable(skb, info);
-		prio = geneve_get_dsfield(skb, dev, info, &use_cache);
+		prio = geneve_get_dsfield(skb, dev, cfg, info, &use_cache);
 		sport = udp_flow_src_port(geneve->net, skb,
-					  geneve->cfg.port_min,
-					  geneve->cfg.port_max, true);
+					  cfg->port_min,
+					  cfg->port_max, true);
 
 		dst = udp_tunnel6_dst_lookup(skb, dev, geneve->net, gs6->sk, 0,
 					     &saddr, &info->key, sport,
-					     geneve->cfg.info.key.tp_dst, prio,
+					     cfg->info.key.tp_dst, prio,
 					     use_cache ? &info->dst_cache : NULL);
 		if (IS_ERR(dst))
 			return PTR_ERR(dst);
@@ -1653,7 +1684,7 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
 	}
 
 	info->key.tp_src = sport;
-	info->key.tp_dst = geneve->cfg.info.key.tp_dst;
+	info->key.tp_dst = cfg->info.key.tp_dst;
 	return 0;
 }
 
@@ -1709,7 +1740,49 @@ static void geneve_offload_rx_ports(struct net_device *dev, bool push)
 	}
 }
 
+static struct geneve_config *geneve_config_alloc(const struct geneve_config *src)
+{
+	struct geneve_config *cfg;
+	int err;
+
+	cfg = kmemdup(src, sizeof(*src), GFP_KERNEL);
+	if (!cfg)
+		return ERR_PTR(-ENOMEM);
+
+	cfg->info.dst_cache.cache = NULL;
+	err = dst_cache_init(&cfg->info.dst_cache, GFP_KERNEL);
+	if (err) {
+		kfree(cfg);
+		return ERR_PTR(err);
+	}
+
+	return cfg;
+}
+
+static void geneve_config_free(struct geneve_config *cfg)
+{
+	if (cfg) {
+		dst_cache_destroy(&cfg->info.dst_cache);
+		kfree(cfg);
+	}
+}
+
+static void geneve_config_free_rcu(struct rcu_head *head)
+{
+	struct geneve_config *cfg = container_of(head, struct geneve_config, rcu);
+
+	geneve_config_free(cfg);
+}
+
 /* Initialize the device structure. */
+static void geneve_free_dev(struct net_device *dev)
+{
+	struct geneve_dev *geneve = netdev_priv(dev);
+	struct geneve_config *cfg = rcu_dereference_protected(geneve->cfg, 1);
+
+	geneve_config_free(cfg);
+}
+
 static void geneve_setup(struct net_device *dev)
 {
 	ether_setup(dev);
@@ -1717,6 +1790,8 @@ static void geneve_setup(struct net_device *dev)
 	dev->netdev_ops = &geneve_netdev_ops;
 	dev->ethtool_ops = &geneve_ethtool_ops;
 	dev->needs_free_netdev = true;
+	dev->priv_destructor = geneve_free_dev;
+
 
 	SET_NETDEV_DEVTYPE(dev, &geneve_type);
 
@@ -1878,15 +1953,17 @@ static struct geneve_dev *geneve_find_dev(struct geneve_net *gn,
 	*tun_on_same_port = false;
 	*tun_collect_md = false;
 	list_for_each_entry(geneve, &gn->geneve_list, next) {
-		if (info->key.tp_dst == geneve->cfg.info.key.tp_dst &&
-		    (cfg->dualstack || geneve->cfg.dualstack ||
-		     geneve_saddr_conflict(info, &geneve->cfg.info))) {
-			*tun_collect_md |= geneve->cfg.collect_md;
+		const struct geneve_config *gcfg = rtnl_dereference(geneve->cfg);
+
+		if (info->key.tp_dst == gcfg->info.key.tp_dst &&
+		    (cfg->dualstack || gcfg->dualstack ||
+		     geneve_saddr_conflict(info, &gcfg->info))) {
+			*tun_collect_md |= gcfg->collect_md;
 			*tun_on_same_port = true;
 		}
-		if (info->key.tun_id == geneve->cfg.info.key.tun_id &&
-		    info->key.tp_dst == geneve->cfg.info.key.tp_dst &&
-		    !memcmp(&info->key.u, &geneve->cfg.info.key.u, sizeof(info->key.u)))
+		if (info->key.tun_id == gcfg->info.key.tun_id &&
+		    info->key.tp_dst == gcfg->info.key.tp_dst &&
+		    !memcmp(&info->key.u, &gcfg->info.key.u, sizeof(info->key.u)))
 			t = geneve;
 	}
 	return t;
@@ -1922,6 +1999,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 	struct geneve_dev *t, *geneve = netdev_priv(dev);
 	const struct ip_tunnel_info *info = &cfg->info;
 	bool tun_collect_md, tun_on_same_port;
+	struct geneve_config *new_cfg;
 	int err, encap_len;
 
 	if (cfg->collect_md && !is_tnl_info_zero(info)) {
@@ -1962,10 +2040,13 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 		}
 	}
 
-	dst_cache_reset(&geneve->cfg.info.dst_cache);
-	memcpy(&geneve->cfg, cfg, sizeof(*cfg));
+	new_cfg = geneve_config_alloc(cfg);
+	if (IS_ERR(new_cfg))
+		return PTR_ERR(new_cfg);
 
-	if (geneve->cfg.inner_proto_inherit) {
+	rcu_assign_pointer(geneve->cfg, new_cfg);
+
+	if (cfg->inner_proto_inherit) {
 		dev->header_ops = NULL;
 		dev->type = ARPHRD_NONE;
 		dev->hard_header_len = 0;
@@ -1975,10 +2056,15 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 
 	err = register_netdevice(dev);
 	if (err)
-		return err;
+		goto err_free_cfg;
 
 	list_add(&geneve->next, &gn->geneve_list);
 	return 0;
+
+err_free_cfg:
+	geneve_config_free(new_cfg);
+	RCU_INIT_POINTER(geneve->cfg, NULL);
+	return err;
 }
 
 static void init_tnl_info(struct ip_tunnel_info *info, __u16 dst_port)
@@ -2323,81 +2409,47 @@ static int geneve_newlink(struct net_device *dev,
 	return 0;
 }
 
-/* Quiesces the geneve device data path for both TX and RX.
- *
- * On transmit geneve checks for non-NULL geneve_sock before it proceeds.
- * So, if we set that socket to NULL under RCU and wait for synchronize_net()
- * to complete for the existing set of in-flight packets to be transmitted,
- * then we would have quiesced the transmit data path. All the future packets
- * will get dropped until we unquiesce the data path.
- *
- * On receive geneve dereference the geneve_sock stashed in the socket. So,
- * if we set that to NULL under RCU and wait for synchronize_net() to
- * complete, then we would have quiesced the receive data path.
+/* Update the device configuration under RTNL.
+ * We use RCU swap to update the configuration atomically, so the data path
+ * (both TX and RX) can continue running without interruption or packet loss.
  */
-static void geneve_quiesce(struct geneve_dev *geneve, struct geneve_sock **gs4,
-			   struct geneve_sock **gs6)
-{
-	*gs4 = rtnl_dereference(geneve->sock4);
-	rcu_assign_pointer(geneve->sock4, NULL);
-	if (*gs4)
-		rcu_assign_sk_user_data((*gs4)->sk, NULL);
-#if IS_ENABLED(CONFIG_IPV6)
-	*gs6 = rtnl_dereference(geneve->sock6);
-	rcu_assign_pointer(geneve->sock6, NULL);
-	if (*gs6)
-		rcu_assign_sk_user_data((*gs6)->sk, NULL);
-#else
-	*gs6 = NULL;
-#endif
-	synchronize_net();
-}
 
-/* Resumes the geneve device data path for both TX and RX. */
-static void geneve_unquiesce(struct geneve_dev *geneve, struct geneve_sock *gs4,
-			     struct geneve_sock __maybe_unused *gs6)
-{
-	rcu_assign_pointer(geneve->sock4, gs4);
-	if (gs4)
-		rcu_assign_sk_user_data(gs4->sk, gs4);
-#if IS_ENABLED(CONFIG_IPV6)
-	rcu_assign_pointer(geneve->sock6, gs6);
-	if (gs6)
-		rcu_assign_sk_user_data(gs6->sk, gs6);
-#endif
-}
 
 static int geneve_changelink(struct net_device *dev, struct nlattr *tb[],
 			     struct nlattr *data[],
 			     struct netlink_ext_ack *extack)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	struct geneve_sock *gs4, *gs6;
-	struct geneve_config cfg;
+	struct geneve_config *old_cfg = rtnl_dereference(geneve->cfg);
+	struct geneve_config *cfg;
 	int err;
 
 	/* If the geneve device is configured for metadata (or externally
 	 * controlled, for example, OVS), then nothing can be changed.
 	 */
-	if (geneve->cfg.collect_md)
+	if (old_cfg->collect_md)
 		return -EOPNOTSUPP;
 
 	/* Start with the existing info. */
-	memcpy(&cfg, &geneve->cfg, sizeof(cfg));
-	err = geneve_nl2info(tb, data, extack, &cfg, true);
+	cfg = geneve_config_alloc(old_cfg);
+	if (IS_ERR(cfg))
+		return PTR_ERR(cfg);
+
+	err = geneve_nl2info(tb, data, extack, cfg, true);
 	if (err)
-		return err;
+		goto err_free_cfg;
 
-	if (!geneve_dst_addr_equal(&geneve->cfg.info, &cfg.info)) {
-		dst_cache_reset(&cfg.info.dst_cache);
-		geneve_link_config(dev, &cfg.info, tb);
-	}
+	if (!geneve_dst_addr_equal(&old_cfg->info, &cfg->info))
+		geneve_link_config(dev, &cfg->info, tb);
 
-	geneve_quiesce(geneve, &gs4, &gs6);
-	memcpy(&geneve->cfg, &cfg, sizeof(cfg));
-	geneve_unquiesce(geneve, gs4, gs6);
+	rcu_assign_pointer(geneve->cfg, cfg);
 
+	call_rcu_hurry(&old_cfg->rcu, geneve_config_free_rcu);
 	return 0;
+
+err_free_cfg:
+	geneve_config_free(cfg);
+	return err;
 }
 
 static void geneve_dellink(struct net_device *dev, struct list_head *head)
@@ -2432,12 +2484,13 @@ static size_t geneve_get_size(const struct net_device *dev)
 static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	struct ip_tunnel_info *info = &geneve->cfg.info;
-	bool ttl_inherit = geneve->cfg.ttl_inherit;
-	bool metadata = geneve->cfg.collect_md;
+	struct geneve_config *cfg = rtnl_dereference(geneve->cfg);
+	struct ip_tunnel_info *info = &cfg->info;
+	bool ttl_inherit = cfg->ttl_inherit;
+	bool metadata = cfg->collect_md;
 	struct ifla_geneve_port_range ports = {
-		.low	= htons(geneve->cfg.port_min),
-		.high	= htons(geneve->cfg.port_max),
+		.low	= htons(cfg->port_min),
+		.high	= htons(cfg->port_max),
 	};
 	__u8 tmp_vni[3];
 	__u32 vni;
@@ -2468,17 +2521,17 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 #endif
 	}
 
-	if (!geneve->cfg.dualstack) {
+	if (!cfg->dualstack) {
 		if (ip_tunnel_info_af(info) == AF_INET) {
 			if ((info->key.u.ipv4.src ||
-			     geneve->cfg.collect_md) &&
+			     metadata) &&
 			    nla_put_in_addr(skb, IFLA_GENEVE_LOCAL,
 					    info->key.u.ipv4.src))
 				goto nla_put_failure;
 #if IS_ENABLED(CONFIG_IPV6)
 		} else {
 			if ((!ipv6_addr_any(&info->key.u.ipv6.src) ||
-			     geneve->cfg.collect_md) &&
+			     metadata) &&
 			    nla_put_in6_addr(skb, IFLA_GENEVE_LOCAL6,
 					     &info->key.u.ipv6.src))
 				goto nla_put_failure;
@@ -2491,7 +2544,7 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label))
 		goto nla_put_failure;
 
-	if (nla_put_u8(skb, IFLA_GENEVE_DF, geneve->cfg.df))
+	if (nla_put_u8(skb, IFLA_GENEVE_DF, cfg->df))
 		goto nla_put_failure;
 
 	if (nla_put_be16(skb, IFLA_GENEVE_PORT, info->key.tp_dst))
@@ -2502,21 +2555,21 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 
 #if IS_ENABLED(CONFIG_IPV6)
 	if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
-		       !geneve->cfg.use_udp6_rx_checksums))
+		       !cfg->use_udp6_rx_checksums))
 		goto nla_put_failure;
 #endif
 
 	if (nla_put_u8(skb, IFLA_GENEVE_TTL_INHERIT, ttl_inherit))
 		goto nla_put_failure;
 
-	if (geneve->cfg.inner_proto_inherit &&
+	if (cfg->inner_proto_inherit &&
 	    nla_put_flag(skb, IFLA_GENEVE_INNER_PROTO_INHERIT))
 		goto nla_put_failure;
 
 	if (nla_put(skb, IFLA_GENEVE_PORT_RANGE, sizeof(ports), &ports))
 		goto nla_put_failure;
 
-	if (geneve->cfg.gro_hint &&
+	if (cfg->gro_hint &&
 	    nla_put_flag(skb, IFLA_GENEVE_GRO_HINT))
 		goto nla_put_failure;
 
@@ -2671,6 +2724,7 @@ static void __exit geneve_cleanup_module(void)
 	rtnl_link_unregister(&geneve_link_ops);
 	unregister_netdevice_notifier(&geneve_notifier_block);
 	unregister_pernet_subsys(&geneve_net_ops);
+	rcu_barrier();
 }
 module_exit(geneve_cleanup_module);
 
-- 
2.55.0.rc0.799.gd6f94ed593-goog


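For readers less familiar with the pattern in the patch above, here is a minimal userspace sketch of copy, update and publish. It rests on some assumptions. C11 atomics stand in for rcu_assign_pointer()/rcu_dereference(). The names `cfg_sketch`, `dev_sketch`, `cfg_dup()`, `change_ports()` and `read_ports()` are hypothetical and are not geneve code. There is no grace-period machinery here, so freeing the retired config is left to the caller. In geneve_changelink(), call_rcu_hurry() fills that role.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct geneve_config. */
struct cfg_sketch {
	unsigned short port_min;
	unsigned short port_max;
	bool collect_md;
};

/* Stand-in for struct geneve_dev: the config is reached through one pointer. */
struct dev_sketch {
	_Atomic(struct cfg_sketch *) cfg;
};

/* Copy step, in the spirit of geneve_config_alloc(old_cfg). */
static struct cfg_sketch *cfg_dup(const struct cfg_sketch *old)
{
	struct cfg_sketch *n = malloc(sizeof(*n));

	if (n)
		memcpy(n, old, sizeof(*n));
	return n;
}

/* Update + publish, in the spirit of geneve_changelink(). Writers are assumed
 * to be serialized externally (RTNL in the kernel), so a relaxed load of the
 * current pointer is enough. The old config is returned through @retired and
 * may only be freed once no reader can still hold it. */
static int change_ports(struct dev_sketch *d, unsigned short lo,
			unsigned short hi, struct cfg_sketch **retired)
{
	struct cfg_sketch *old = atomic_load_explicit(&d->cfg, memory_order_relaxed);
	struct cfg_sketch *n;

	if (old->collect_md)
		return -EOPNOTSUPP;
	n = cfg_dup(old);
	if (!n)
		return -ENOMEM;
	n->port_min = lo;
	n->port_max = hi;
	/* Release store: a reader that observes n also observes its fields
	 * (the role rcu_assign_pointer() plays). */
	atomic_store_explicit(&d->cfg, n, memory_order_release);
	*retired = old;
	return 0;
}

/* Reader: load the pointer once and use only that snapshot (acquire is a
 * conservative stand-in for rcu_dereference()). */
static int read_ports(struct dev_sketch *d, unsigned short *lo,
		      unsigned short *hi)
{
	const struct cfg_sketch *c = atomic_load_explicit(&d->cfg, memory_order_acquire);

	if (!c)
		return -ENODEV;
	*lo = c->port_min;
	*hi = c->port_max;
	return 0;
}
```

The test frees the retired config immediately, which is safe only because it runs single-threaded. With concurrent readers, the free must wait for a grace period.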

* [PATCH net-next 2/2] geneve: make geneve_fill_info() RTNL independent
From: Eric Dumazet @ 2026-07-01 12:04 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Andrew Lunn, netdev,
	eric.dumazet, Eric Dumazet
In-Reply-To: <20260701120454.3533252-1-edumazet@google.com>

Now that geneve->cfg is an RCU-protected pointer, we can update
geneve_fill_info() to read the configuration under RCU read lock
instead of relying on RTNL.

Also add const qualifiers to the dereferenced pointers where appropriate.
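The new geneve_fill_info() has a single exit: every failure path goes through `out:`, so rcu_read_unlock() always runs. That shape can be sketched as follows. `sketch_read_lock()` and `sketch_read_unlock()` are hypothetical counters standing in for the RCU read lock. `put_u16()` is a hypothetical stand-in for the nla_put*() helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cfg_sketch {
	unsigned short port_min;
	unsigned short port_max;
};

static int read_lock_depth;	/* hypothetical stand-in for RCU read-side nesting */

static void sketch_read_lock(void) { read_lock_depth++; }
static void sketch_read_unlock(void) { read_lock_depth--; }

/* Hypothetical stand-in for nla_put*(): fails when the buffer is full. */
static int put_u16(unsigned short *buf, size_t cap, size_t *len,
		   unsigned short v)
{
	if (*len >= cap)
		return -1;
	buf[(*len)++] = v;
	return 0;
}

static int fill_info_sketch(const struct cfg_sketch **cfgp,
			    unsigned short *buf, size_t cap, size_t *len)
{
	const struct cfg_sketch *cfg;
	int err = 0;

	sketch_read_lock();
	cfg = *cfgp;		/* rcu_dereference() in the real code */
	if (!cfg) {
		err = -ENODEV;
		goto out;
	}
	if (put_u16(buf, cap, len, cfg->port_min) ||
	    put_u16(buf, cap, len, cfg->port_max))
		goto nla_put_failure;
out:
	sketch_read_unlock();	/* reached on every path */
	return err;

nla_put_failure:
	err = -EMSGSIZE;
	goto out;
}
```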

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/geneve.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index c777c1a1fa930d74768a659e16380d719f12a6b2..7197f90eeb0f3a381fac5be4a96d6570ca4fccb1 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -2483,17 +2483,28 @@ static size_t geneve_get_size(const struct net_device *dev)
 
 static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
-	struct geneve_dev *geneve = netdev_priv(dev);
-	struct geneve_config *cfg = rtnl_dereference(geneve->cfg);
-	struct ip_tunnel_info *info = &cfg->info;
-	bool ttl_inherit = cfg->ttl_inherit;
-	bool metadata = cfg->collect_md;
-	struct ifla_geneve_port_range ports = {
-		.low	= htons(cfg->port_min),
-		.high	= htons(cfg->port_max),
-	};
+	const struct geneve_dev *geneve = netdev_priv(dev);
+	struct ifla_geneve_port_range ports;
+	const struct geneve_config *cfg;
+	const struct ip_tunnel_info *info;
+	bool ttl_inherit;
+	bool metadata;
 	__u8 tmp_vni[3];
 	__u32 vni;
+	int err = 0;
+
+	rcu_read_lock();
+	cfg = rcu_dereference(geneve->cfg);
+	if (!cfg) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	info = &cfg->info;
+	ttl_inherit = cfg->ttl_inherit;
+	metadata = cfg->collect_md;
+	ports.low = htons(cfg->port_min);
+	ports.high = htons(cfg->port_max);
 
 	tunnel_id_to_vni(info->key.tun_id, tmp_vni);
 	vni = (tmp_vni[0] << 16) | (tmp_vni[1] << 8) | tmp_vni[2];
@@ -2573,10 +2584,13 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_flag(skb, IFLA_GENEVE_GRO_HINT))
 		goto nla_put_failure;
 
-	return 0;
+out:
+	rcu_read_unlock();
+	return err;
 
 nla_put_failure:
-	return -EMSGSIZE;
+	err = -EMSGSIZE;
+	goto out;
 }
 
 static struct rtnl_link_ops geneve_link_ops __read_mostly = {
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH v3 0/2] ptp: split non-host-disciplined PHC drivers into a dedicated subdirectory
From: Wen Gu @ 2026-07-01 12:08 UTC
  To: David Woodhouse, richardcochran, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, linux-kernel
  Cc: svens, nick.shi, ajay.kaher, alexey.makhalov,
	bcm-kernel-feedback-list, linux-s390, xuanzhuo, dust.li, mani,
	imran.shaik
In-Reply-To: <14969e704fb8e70deb549b2e1c8670f6756a8da7.camel@infradead.org>



On 2026/6/30 18:57, David Woodhouse wrote:
> On Tue, 2026-06-30 at 11:15 +0800, Wen Gu wrote:
> 
> I was thinking we'd move them to drivers/phc, and simplify them as we do.
> 
> Most of them are just a lot of PTP driver boilerplate, wrapping around
> one central function like
> 
> static int vmclock_get_crosststamp(struct vmclock_state *st,
>                                     struct ptp_system_timestamp *sts,
>                                     struct system_counterval_t *system_counter,
>                                     struct timespec64 *tspec)
> 
> ...which is called with different permutations of arguments depending
> on the actual PTP call.
> 
> I was thinking of reducing the duplication and having the PHC drivers
> provide *only* that central function. Let the common PHC code provide
> the interface to PTP (as well as to core timekeeping, for setting the
> clock at boot, for timekeeping_set_reference() in the vmclock case, and
> perhaps even for a PPS-like discipline from other clocks).
> 
> Here's a *very* hastily thrown together proof of concept; utterly
> untested and AI-produced, and I've only given it the bare minimum of
> oversight thus far (I have been meaning to do this for weeks but other
> things have taken precedence so far)...
> 
> https://git.infradead.org/?p=linux-phc.git;a=shortlog;h=refs/heads/next

Thanks for the POC. I've looked over it and agree this is a cleaner
approach overall. The v3 was meant as an interim step, but given the POC
it makes more sense to go straight to drivers/phc.

I'm happy to adapt the Alibaba CIPU PHC driver to the new framework
once it's in place, and glad to help along the way if needed.
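Here is a rough sketch of the split David describes. It assumes nothing about the actual API in the linked branch. The driver supplies one cross-timestamp callback, and common code derives the other calls from it. All names are hypothetical, including `phc_ops_sketch`, `phc_gettime()`, `phc_getcrosststamp()` and the fake driver. The argument list is also much simpler than that of vmclock_get_crosststamp().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct ts_sketch {
	int64_t sec;
	int64_t nsec;
};

/* The only thing a driver implements in this sketch. */
struct phc_ops_sketch {
	/* Read device time into *dev_time; if sys_counter is non-NULL, also
	 * report the system counter value captured at the same instant. */
	int (*get_crosststamp)(void *priv, struct ts_sketch *dev_time,
			       uint64_t *sys_counter);
};

struct phc_sketch {
	const struct phc_ops_sketch *ops;
	void *priv;
};

/* Common code: a gettime-style call built on the central callback. */
static int phc_gettime(struct phc_sketch *p, struct ts_sketch *ts)
{
	return p->ops->get_crosststamp(p->priv, ts, NULL);
}

/* Common code: a cross-timestamp-style call using the same callback. */
static int phc_getcrosststamp(struct phc_sketch *p, struct ts_sketch *ts,
			      uint64_t *counter)
{
	if (!counter)
		return -EINVAL;
	return p->ops->get_crosststamp(p->priv, ts, counter);
}

/* Fake driver for illustration: device time is a fixed number of seconds
 * and the "system counter" is a constant. */
static int fake_get_crosststamp(void *priv, struct ts_sketch *t, uint64_t *c)
{
	const int64_t *sec = priv;

	t->sec = *sec;
	t->nsec = 0;
	if (c)
		*c = 1000;
	return 0;
}

static const struct phc_ops_sketch fake_ops = { fake_get_crosststamp };
```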


* Re: [PATCH net-next 1/2] macvlan: annotate data-races around vlan->mode and vlan->flags
From: Nikolay Aleksandrov @ 2026-07-01 12:09 UTC
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet
In-Reply-To: <20260701082214.2974946-2-edumazet@google.com>

On 01/07/2026 11:22, Eric Dumazet wrote:
> Both fields can be changed in macvlan_changelink() while being read
> locklessly.
> 
> Add READ_ONCE()/WRITE_ONCE() annotations.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>   drivers/net/macvlan.c | 38 +++++++++++++++++++++-----------------
>   1 file changed, 21 insertions(+), 17 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
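Here is a userspace analogue of the annotations. It assumes C11 relaxed atomics in place of READ_ONCE()/WRITE_ONCE(), which are volatile accesses rather than C11 atomics. The writer publishes each field with a single store, and the lockless reader loads each field exactly once. `vlan_sketch` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of struct macvlan_dev. */
struct vlan_sketch {
	_Atomic uint32_t mode;
	_Atomic uint16_t flags;
};

/* Writer (changelink, serialized by RTNL in the kernel): ~ WRITE_ONCE(). */
static void vlan_set(struct vlan_sketch *v, uint32_t mode, uint16_t flags)
{
	atomic_store_explicit(&v->mode, mode, memory_order_relaxed);
	atomic_store_explicit(&v->flags, flags, memory_order_relaxed);
}

/* Lockless reader: ~ READ_ONCE(). Each field is loaded once into the caller's
 * variables, so later checks agree with each other even if a writer runs
 * concurrently. The two loads are independent, so a reader can still pair
 * an old mode with new flags; the annotations do not make the pair atomic. */
static void vlan_snapshot(struct vlan_sketch *v, uint32_t *mode,
			  uint16_t *flags)
{
	*mode = atomic_load_explicit(&v->mode, memory_order_relaxed);
	*flags = atomic_load_explicit(&v->flags, memory_order_relaxed);
}
```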


* Re: [PATCH net-next 2/2] macvlan: no longer rely on RTNL in macvlan_fill_info()
From: Nikolay Aleksandrov @ 2026-07-01 12:11 UTC
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet
In-Reply-To: <20260701082214.2974946-3-edumazet@google.com>

On 01/07/2026 11:22, Eric Dumazet wrote:
> Add READ_ONCE()/WRITE_ONCE() annotations on vlan->mode, vlan->flags,
> vlan->bc_queue_len_req and port->bc_cutoff.
> 
> Fill IFLA_MACVLAN_MACADDR_DATA nested attribute and compute
> on the fly the precise number of elements we put in it,
> to fill an accurate IFLA_MACVLAN_MACADDR_COUNT attribute
> as some user space applications could depend on its value
> and the attributes order.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>   drivers/net/macvlan.c | 71 +++++++++++++++++++++++++++++--------------
>   1 file changed, 48 insertions(+), 23 deletions(-)
> 

The snippet that sets macaddr_count gave me pause. :)
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>

[snip]
> +
> +	if (READ_ONCE(vlan->macaddr_count) > 0) {
>   		nest = nla_nest_start_noflag(skb, IFLA_MACVLAN_MACADDR_DATA);
>   		if (nest == NULL)
>   			goto nla_put_failure;
>   
>   		for (i = 0; i < MACVLAN_HASH_SIZE; i++) {
> -			if (macvlan_fill_info_macaddr(skb, vlan, i))
> +			cnt = macvlan_fill_info_macaddr(skb, vlan, i);
> +			if (cnt < 0)
>   				goto nla_put_failure;
> +			macaddr_count += cnt;
>   		}
> -		nla_nest_end(skb, nest);
> +		if (!macaddr_count)
> +			nla_nest_cancel(skb, nest);
> +		else if (nla_nest_end_safe(skb, nest) < 0)
> +			goto nla_put_failure;
>   	}
> -	if (nla_put_u32(skb, IFLA_MACVLAN_BC_QUEUE_LEN, vlan->bc_queue_len_req))
> +	*(u32 *)nla_data(attr) = macaddr_count;
> +
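The macaddr_count snippet follows a reserve-then-patch idea. This sketch shows it in a hypothetical flat TLV buffer with netlink-style 4-byte headers and 4-byte alignment. First, emit the count attribute with a placeholder. Next, add the entries that are actually present. Finally, write the real count into the reserved payload. The attribute types, `tlv_put()` and `fill_macaddrs()` are illustrative only. The nested IFLA_MACVLAN_MACADDR_DATA container and nla_nest_cancel() are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_ATTR_COUNT = 1, SKETCH_ATTR_ADDR = 2 };	/* hypothetical types */

struct tlv_hdr_sketch {
	uint16_t len;	/* header + payload, without padding */
	uint16_t type;
};

struct msg_sketch {
	unsigned char buf[256];	/* fixed tailroom, like an skb being filled */
	size_t len;
};

struct mac_entry_sketch {
	unsigned char addr[6];
	int present;	/* 0 if the entry went away while we were dumping */
};

/* Append one attribute, padded to 4 bytes. Returns a pointer to its payload,
 * or NULL if it does not fit. */
static void *tlv_put(struct msg_sketch *m, uint16_t type, const void *data,
		     uint16_t dlen)
{
	size_t hlen = sizeof(struct tlv_hdr_sketch);
	struct tlv_hdr_sketch h = { (uint16_t)(hlen + dlen), type };
	size_t total = (hlen + dlen + 3) & ~(size_t)3;
	unsigned char *p = m->buf + m->len;

	if (total > sizeof(m->buf) - m->len)
		return NULL;
	memcpy(p, &h, hlen);
	memcpy(p + hlen, data, dlen);
	memset(p + hlen + dlen, 0, total - hlen - dlen);
	m->len += total;
	return p + hlen;
}

/* Reserve the count first, dump what is actually there, then patch the count
 * so it always matches the number of address attributes emitted. */
static int fill_macaddrs(struct msg_sketch *m,
			 const struct mac_entry_sketch *e, size_t n)
{
	uint32_t count = 0;
	void *count_payload = tlv_put(m, SKETCH_ATTR_COUNT, &count, sizeof(count));

	if (!count_payload)
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (!e[i].present)
			continue;
		if (!tlv_put(m, SKETCH_ATTR_ADDR, e[i].addr, sizeof(e[i].addr)))
			return -1;
		count++;
	}
	memcpy(count_payload, &count, sizeof(count));
	return (int)count;
}
```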


* [PATCH net] ppp: defer channel free to an RCU grace period to fix pppol2tp RX UAF
From: Norbert Szetei @ 2026-07-01 12:14 UTC
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Qingfang Deng, Taegu Ha, Yue Haibing,
	Sebastian Andrzej Siewior, Kees Cook, linux-ppp, linux-kernel

pppol2tp_recv() runs in the L2TP UDP-encap softirq RX path:

 l2tp_udp_encap_recv() -> l2tp_recv_common() -> pppol2tp_recv()
   -> ppp_input(&po->chan)

It runs under rcu_read_lock() holding only an l2tp_session reference and
takes NO reference on the internal PPP channel (struct channel,
chan->ppp) that ppp_input() dereferences.

The pppox socket is SOCK_RCU_FREE, so 'po' and the embedded ppp_channel
are RCU-safe.  But the internal struct channel is a separate allocation
that ppp_release_channel() frees with a plain kfree():

 close(data socket) -> pppol2tp_release() -> pppox_unbind_sock()
   -> ppp_unregister_channel() -> ppp_release_channel() -> kfree(pch)

For a channel that is bound (PPPIOCGCHAN) but not attached to a ppp unit
(no PPPIOCCONNECT, pch->ppp == NULL) and not bridged, teardown skips
both ppp_disconnect_channel()'s synchronize_net() and
ppp_unbridge_channels()'s synchronize_rcu(), so the kfree() has no grace
period.  rcu_read_lock() in pppol2tp_recv() does not protect against a
plain kfree(), so an in-flight ppp_input() on one CPU can dereference
the channel just freed by close() on another CPU.

The bug is reachable by an unprivileged user.

Free the channel with kfree_rcu() instead of kfree() so the grace period
fences any in-flight ppp_input(). Done in ppp_release_channel(), this
covers all callers in one place.
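A single-threaded model shows why the deferral helps. All names are hypothetical. `defer_free()` plays the role of kfree_rcu(). `reader_enter()` and `reader_exit()` play rcu_read_lock() and rcu_read_unlock() around ppp_input(). Reclamation waits until no reader section is open. That rule is stricter than real RCU, which only waits for readers that were already running when the grace period began.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the internal struct channel. */
struct chan_sketch {
	int id;
	struct chan_sketch *next_pending;	/* plays the role of struct rcu_head */
};

static struct chan_sketch *pending_head;	/* objects waiting to be reclaimed */
static int readers_in_flight;			/* open read-side sections */

static void reader_enter(void) { readers_in_flight++; }	/* ~ rcu_read_lock() */
static void reader_exit(void) { readers_in_flight--; }	/* ~ rcu_read_unlock() */

/* ~ kfree_rcu(): queue the object instead of freeing it immediately. */
static void defer_free(struct chan_sketch *c)
{
	c->next_pending = pending_head;
	pending_head = c;
}

/* Reclaim queued objects only when no reader section is open.
 * Returns the number of objects freed. */
static int try_end_grace_period(void)
{
	int n = 0;

	if (readers_in_flight)
		return 0;
	while (pending_head) {
		struct chan_sketch *c = pending_head;

		pending_head = c->next_pending;
		free(c);
		n++;
	}
	return n;
}
```

With a plain free() in place of defer_free(), the reader in the test below would be left holding a dangling pointer.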

Fixes: ee40fb2e1eb5 ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
---
 drivers/net/ppp/ppp_generic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 57c68efa5ff8..cb8fe37170d3 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -184,6 +184,7 @@ struct channel {
 	struct list_head clist;		/* link in list of channels per unit */
 	spinlock_t	upl;		/* protects `ppp' and 'bridge' */
 	struct channel __rcu *bridge;	/* "bridged" ppp channel */
+	struct rcu_head	rcu;		/* for RCU-deferred free of the channel */
 #ifdef CONFIG_PPP_MULTILINK
 	u8		avail;		/* flag used in multilink stuff */
 	u8		had_frag;	/* >= 1 fragments have been sent */
@@ -3583,7 +3584,7 @@ static void ppp_release_channel(struct channel *pch)
 	}
 	skb_queue_purge(&pch->file.xq);
 	skb_queue_purge(&pch->file.rq);
-	kfree(pch);
+	kfree_rcu(pch, rcu);
 }

 static void __exit ppp_cleanup(void)
-- 
2.54.0


* Re: [PATCH v8 10/14] media: qcom: Pass proper PAS ID to set_remote_state API
From: Sumit Garg @ 2026-07-01 12:19 UTC
  To: Konrad Dybcio
  Cc: andersson, linux-arm-msm, dri-devel, freedreno, linux-media,
	netdev, linux-wireless, ath12k, linux-remoteproc, konradybcio,
	robh, krzk+dt, conor+dt, robin.clark, sean, akhilpo, lumag,
	abhinav.kumar, jesszhan0024, marijn.suijten, airlied, simona,
	vikash.garodia, bod, mchehab, elder, andrew+netdev, davem,
	edumazet, kuba, pabeni, jjohnson, mathieu.poirier,
	trilokkumar.soni, mukesh.ojha, pavan.kondeti, jorge.ramirez,
	tonyh, vignesh.viswanathan, srinivas.kandagatla, amirreza.zarrabi,
	jens.wiklander, op-tee, apurupa, skare, linux-kernel, Sumit Garg
In-Reply-To: <c93b47af-e291-4a6a-ae4b-cc46f25c422b@oss.qualcomm.com>

On Wed, Jul 01, 2026 at 01:01:52PM +0200, Konrad Dybcio wrote:
> On 7/1/26 9:44 AM, Sumit Garg wrote:
> > On Tue, Jun 30, 2026 at 02:42:25PM +0200, Konrad Dybcio wrote:
> >> On 6/26/26 3:34 PM, Sumit Garg wrote:
> >>> From: Sumit Garg <sumit.garg@oss.qualcomm.com>
> >>>
> >>> As per testing, the SCM backend just ignores it, while OP-TEE makes
> >>> use of it for proper bookkeeping purposes.
> >>>
> >>> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
> >>> Tested-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> # Lemans
> >>> Reviewed-by: Vikash Garodia <vikash.garodia@oss.qualcomm.com>
> >>> Signed-off-by: Sumit Garg <sumit.garg@oss.qualcomm.com>
> >>> ---
> >>>  drivers/media/platform/qcom/iris/iris_firmware.c | 2 +-
> >>>  drivers/media/platform/qcom/venus/firmware.c     | 2 +-
> >>>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/media/platform/qcom/iris/iris_firmware.c b/drivers/media/platform/qcom/iris/iris_firmware.c
> >>> index ea9654dd679e..d2e7ba4f37e3 100644
> >>> --- a/drivers/media/platform/qcom/iris/iris_firmware.c
> >>> +++ b/drivers/media/platform/qcom/iris/iris_firmware.c
> >>> @@ -110,5 +110,5 @@ int iris_fw_unload(struct iris_core *core)
> >>>  
> >>>  int iris_set_hw_state(struct iris_core *core, bool resume)
> >>>  {
> >>> -	return qcom_pas_set_remote_state(resume, 0);
> >>> +	return qcom_pas_set_remote_state(resume, IRIS_PAS_ID);
> >>>  }
> >>> diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
> >>> index 3a38ff985822..3c0727ea137d 100644
> >>> --- a/drivers/media/platform/qcom/venus/firmware.c
> >>> +++ b/drivers/media/platform/qcom/venus/firmware.c
> >>> @@ -59,7 +59,7 @@ int venus_set_hw_state(struct venus_core *core, bool resume)
> >>>  	int ret;
> >>>  
> >>>  	if (core->use_tz) {
> >>> -		ret = qcom_pas_set_remote_state(resume, 0);
> >>> +		ret = qcom_pas_set_remote_state(resume, VENUS_PAS_ID);
> >>
> >> This should not be in the middle of a mildly related series..
> >> The PAS IDs should be centralized into a single header. And the
> >> name of the driver shouldn't be part of the define. I would guesstimate
> >> that on the secure side it's probably called VPU or VIDEO
> > 
> > I agree with your comments, this is something I would also like to
> > consolidate on OP-TEE side as well: see discussion here [1].
> > 
> > However, the patch itself is needed for bookkeeping on the OP-TEE side,
> > but I can drop it since video isn't functional upstream yet anyway; it
> > depends on proper IOMMU support.
> 
> For this patch.. I think QCTZ may be ignoring the argument so it
> may not matter.. on a second thought you already have it reviewed
> and it's already a cross-subsys merge so might as well pull it in,
> worst case scenario it'll revert cleanly

Thanks, I will keep it then.

> 
> Once this lands, please move all PAS defines to.. hmm.. qcom_pas.h
> sounds like a good candidate?

Sure, I will propose that as a follow-up change. We have to agree on
common naming there.

-Sumit


* [PATCH net] amt: fix size calculation in amt_get_size()
From: Eric Dumazet @ 2026-07-01 12:23 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Kuniyuki Iwashima, Andrew Lunn, netdev,
	eric.dumazet, Eric Dumazet

amt_get_size() incorrectly used sizeof(struct iphdr) for the sizes of
IFLA_AMT_DISCOVERY_IP, IFLA_AMT_REMOTE_IP, and IFLA_AMT_LOCAL_IP.
These attributes contain IPv4 addresses (__be32), not full IP headers.

Replace sizeof(struct iphdr) with sizeof(__be32) to avoid over-allocating
netlink message space.
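Here are worked numbers for the change. The netlink sizing arithmetic is restated locally: nla_total_size(payload) is NLA_ALIGN(NLA_HDRLEN + payload), with NLA_HDRLEN = 4 and 4-byte alignment. The `_SKETCH` names are local to this example. The 20-byte value assumes an IPv4 header without options. Each address attribute shrinks from 24 to 8 bytes, so amt_get_size() drops by 48 bytes in total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local restatement of the netlink alignment rules; the suffix avoids
 * clashing with the real NLA_* macros. */
#define NLA_ALIGNTO_SKETCH	4
#define NLA_ALIGN_SKETCH(len) \
	(((len) + NLA_ALIGNTO_SKETCH - 1) & ~(size_t)(NLA_ALIGNTO_SKETCH - 1))
#define NLA_HDRLEN_SKETCH	4	/* aligned size of struct nlattr */

#define IPV4_HDR_LEN_SKETCH	20	/* sizeof(struct iphdr), no options */

static size_t nla_total_size_sketch(size_t payload)
{
	return NLA_ALIGN_SKETCH(NLA_HDRLEN_SKETCH + payload);
}

/* Space reserved for the three address attributes before the fix. */
static size_t amt_addr_attrs_old(void)
{
	return 3 * nla_total_size_sketch(IPV4_HDR_LEN_SKETCH);
}

/* Space reserved after the fix: each attribute carries one __be32. */
static size_t amt_addr_attrs_new(void)
{
	return 3 * nla_total_size_sketch(sizeof(uint32_t));
}
```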

Fixes: b9022b53adad ("amt: add control plane of amt interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/amt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 724a8163a5142a6835950abb63d80f29417b2654..951dd10e192b7924f9d3f05065a298ddcf8f4b25 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3301,9 +3301,9 @@ static size_t amt_get_size(const struct net_device *dev)
 	       nla_total_size(sizeof(__u16)) + /* IFLA_AMT_GATEWAY_PORT */
 	       nla_total_size(sizeof(__u32)) + /* IFLA_AMT_LINK */
 	       nla_total_size(sizeof(__u32)) + /* IFLA_MAX_TUNNELS */
-	       nla_total_size(sizeof(struct iphdr)) + /* IFLA_AMT_DISCOVERY_IP */
-	       nla_total_size(sizeof(struct iphdr)) + /* IFLA_AMT_REMOTE_IP */
-	       nla_total_size(sizeof(struct iphdr)); /* IFLA_AMT_LOCAL_IP */
+	       nla_total_size(sizeof(__be32)) + /* IFLA_AMT_DISCOVERY_IP */
+	       nla_total_size(sizeof(__be32)) + /* IFLA_AMT_REMOTE_IP */
+	       nla_total_size(sizeof(__be32)); /* IFLA_AMT_LOCAL_IP */
 }
 
 static int amt_fill_info(struct sk_buff *skb, const struct net_device *dev)
-- 
2.55.0.rc0.799.gd6f94ed593-goog



* Re: [PATCH net 4/9] netfilter: nf_conntrack_sip: validate skb_dst() before accessing it
From: Pablo Neira Ayuso @ 2026-07-01 12:23 UTC
  To: Florian Westphal
  Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
	Jakub Kicinski, netfilter-devel
In-Reply-To: <akTzomC4Qz8u8teJ@chamomile>

On Wed, Jul 01, 2026 at 01:01:58PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jul 01, 2026 at 08:36:11AM +0200, Florian Westphal wrote:
> > Florian Westphal <fw@strlen.de> wrote:
> > > From: Pablo Neira Ayuso <pablo@netfilter.org>
> > > 
> > > tc ingress and openvswitch do not guarantee routing information to be
> > > available. These subsystems use the conntrack helper infrastructure, and
> > > the SIP helper relies on the skb_dst() to be present if
> > > sip_external_media is set to 1 (which is disabled by default as a module
> > > parameter).
> > 
> > The sashiko drive-by appears real, I submitted a patch for it.
> > It's not a regression added by this patch but an unrelated issue.
> > 
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20260701062922.9660-1-fw@strlen.de/
> 
> Is skb_ensure_writable() bogus here?
> 
> As you said, the skb is already linearized. As for clones, they should
> only happen in br_netfilter? In that case, br_netfilter should be
> audited so that it does not pass cloned skbuffs when calling the
> inet hooks.

Forget this, this skb_ensure_writable() is really needed here.


* Re: [Intel-wired-lan] [PATCH iwl-next v1] ixgbe: E610: force phy link to get down when interface is down
From: Paul Menzel @ 2026-07-01 12:27 UTC
  To: Jedrzej Jagielski
  Cc: intel-wired-lan, anthony.l.nguyen, netdev, Aleksandr Loktionov
In-Reply-To: <20260701113519.49859-1-jedrzej.jagielski@intel.com>

Dear Jedrzej,


Thank you for your patch.

Am 01.07.26 um 13:35 schrieb Jedrzej Jagielski:
> For the E610 family, similarly to the E8xx adapters, the default behavior
> is for the PHY link to remain up even when the corresponding OS interface
> is down.
> 
> Add function setting down the PHY config IXGBE_ACI_PHY_ENA_LINK bit
> what leads to disabling PHY link.

I’d extend it a little:

… by factoring the code out into ixgbe_handle_link_down() and calling it
in ixgbe_close().

> Align functionality with the implementation of the ice driver.

Please add a paragraph detailing the regression potential. Are there users
that might depend on the current default, however uncommon that might be?

> Let the user configure link-down-on-close through ethtool.

Please provide examples and describe how to test your change. While doing
so, you can also paste the new log messages.

> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  1 +
>   drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c | 35 ++++++++++++++++++-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h |  1 +
>   .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  | 15 ++++++++
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 27 +++++++++++---
>   5 files changed, 73 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 30f62174acf2..7bbb82dd962c 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -685,6 +685,7 @@ struct ixgbe_adapter {
>   #define IXGBE_FLAG2_MOD_POWER_UNSUPPORTED	BIT(22)
>   #define IXGBE_FLAG2_API_MISMATCH		BIT(23)
>   #define IXGBE_FLAG2_FW_ROLLBACK			BIT(24)
> +#define IXGBE_FLAG2_LINK_DOWN_ON_CLOSE		BIT(25)
>   
>   	/* Tx fast path data */
>   	int num_tx_queues;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
> index da445fb673fc..46d8a3ea86b8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.c
> @@ -1923,6 +1923,33 @@ void ixgbe_fc_autoneg_e610(struct ixgbe_hw *hw)
>   	hw->fc.current_mode = hw->fc.requested_mode;
>   }
>   
> +/**
> + * ixgbe_disable_phy_link - force phy link to get down
> + * @hw: pointer to hardware structure
> + *
> + * Send 0x0601 with the IXGBE_ACI_PHY_ENA_LINK bit set down.
> + *
> + * Return: the exit code of the operation.

At least for me it’s not that helpful. Shouldn’t the return values be 
listed? What is success? What is failure?

> + */
> +int ixgbe_disable_phy_link(struct ixgbe_hw *hw)
> +{
> +	struct ixgbe_aci_cmd_get_phy_caps_data pcaps = {};
> +	struct ixgbe_aci_cmd_set_phy_cfg_data pcfg = {};
> +	int err;
> +
> +	err = ixgbe_aci_get_phy_caps(hw, false, IXGBE_ACI_REPORT_ACTIVE_CFG,
> +				     &pcaps);
> +	if (err)
> +		return err;
> +
> +	ixgbe_copy_phy_caps_to_cfg(&pcaps, &pcfg);
> +
> +	pcfg.caps &= ~IXGBE_ACI_PHY_ENA_LINK;
> +	pcfg.caps |= IXGBE_ACI_PHY_ENA_AUTO_LINK_UPDT;
> +
> +	return ixgbe_aci_set_phy_cfg(hw, &pcfg);
> +}
> +
>   /**
>    * ixgbe_disable_rx_e610 - Disable RX unit
>    * @hw: pointer to hardware structure
> @@ -2207,6 +2234,7 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
>   	u8 rmode = IXGBE_ACI_REPORT_TOPO_CAP_MEDIA;
>   	u64 sup_phy_type_low, sup_phy_type_high;
>   	u64 phy_type_low = 0, phy_type_high = 0;
> +	bool force_on_required;
>   	int err;
>   
>   	err = ixgbe_aci_get_link_info(hw, false, NULL);
> @@ -2272,6 +2300,11 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
>   		phy_type_high |= IXGBE_PHY_TYPE_HIGH_10G_USXGMII;
>   	}
>   
> +	/* If IXGBE_ACI_PHY_ENA_LINK has been explicitly disabled that means
> +	 * we need to force interface enablement after reaching that point

It’d be great, if you rephrased “that point”.

> +	 */
> +	force_on_required = !(pcfg.caps & IXGBE_ACI_PHY_ENA_LINK);
> +
>   	/* Mask the set values to avoid requesting unsupported link types. */
>   	phy_type_low &= sup_phy_type_low;
>   	pcfg.phy_type_low = cpu_to_le64(phy_type_low);
> @@ -2280,7 +2313,7 @@ int ixgbe_setup_phy_link_e610(struct ixgbe_hw *hw)
>   
>   	if (pcfg.phy_type_high != pcaps.phy_type_high ||
>   	    pcfg.phy_type_low != pcaps.phy_type_low ||
> -	    pcfg.caps != pcaps.caps) {
> +	    pcfg.caps != pcaps.caps || force_on_required) {
>   		pcfg.caps |= IXGBE_ACI_PHY_ENA_LINK;
>   		pcfg.caps |= IXGBE_ACI_PHY_ENA_AUTO_LINK_UPDT;
>   
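Here is a small model of the caps handling as these two hunks appear to work, using hypothetical bit values. After ixgbe_disable_phy_link(), the link-enable bit is cleared in the active configuration. On the next setup, the configuration copied from the active caps can then be identical to what firmware reports. In that case only force_on_required triggers the set-PHY-config command that turns the link back on. The helper names and the `phy_types_differ` parameter are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real IXGBE_ACI_PHY_ENA_LINK and
 * IXGBE_ACI_PHY_ENA_AUTO_LINK_UPDT values come from the driver headers. */
#define SKETCH_PHY_ENA_LINK		(1u << 3)
#define SKETCH_PHY_ENA_AUTO_LINK_UPDT	(1u << 5)

/* Like ixgbe_disable_phy_link(): start from the active caps, clear the
 * link-enable bit and request an automatic link update. */
static uint8_t sketch_disable_link_caps(uint8_t active_caps)
{
	uint8_t caps = active_caps;

	caps = (uint8_t)(caps & ~SKETCH_PHY_ENA_LINK);
	caps = (uint8_t)(caps | SKETCH_PHY_ENA_AUTO_LINK_UPDT);
	return caps;
}

/* Like the condition in ixgbe_setup_phy_link_e610(): decide whether a
 * set-PHY-config command has to be sent. cfg_caps is the configuration
 * derived from the active caps; active_caps is what firmware reports. */
static bool sketch_need_set_phy_cfg(uint8_t cfg_caps, uint8_t active_caps,
				    bool phy_types_differ)
{
	bool force_on_required = !(cfg_caps & SKETCH_PHY_ENA_LINK);

	return phy_types_differ || cfg_caps != active_caps || force_on_required;
}
```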
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
> index 2cb76a3d30ae..59044d67ebeb 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_e610.h
> @@ -50,6 +50,7 @@ int ixgbe_cfg_phy_fc(struct ixgbe_hw *hw,
>   		     enum ixgbe_fc_mode req_mode);
>   int ixgbe_setup_fc_e610(struct ixgbe_hw *hw);
>   void ixgbe_fc_autoneg_e610(struct ixgbe_hw *hw);
> +int ixgbe_disable_phy_link(struct ixgbe_hw *hw);
>   void ixgbe_disable_rx_e610(struct ixgbe_hw *hw);
>   int ixgbe_init_phy_ops_e610(struct ixgbe_hw *hw);
>   int ixgbe_identify_phy_e610(struct ixgbe_hw *hw);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> index 4dfae53b4ea1..0fcb9d738984 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> @@ -139,6 +139,8 @@ static const char ixgbe_priv_flags_strings[][ETH_GSTRING_LEN] = {
>   	"vf-ipsec",
>   #define IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF	BIT(2)
>   	"mdd-disable-vf",
> +#define IXGBE_PRIV_LINK_DOWN_ON_CLOSE	BIT(3)
> +	"link-down-on-close",
>   };
>   
>   #define IXGBE_PRIV_FLAGS_STR_LEN ARRAY_SIZE(ixgbe_priv_flags_strings)
> @@ -3842,6 +3844,9 @@ static u32 ixgbe_get_priv_flags(struct net_device *netdev)
>   	if (adapter->flags2 & IXGBE_FLAG2_AUTO_DISABLE_VF)
>   		priv_flags |= IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF;
>   
> +	if (adapter->flags2 & IXGBE_FLAG2_LINK_DOWN_ON_CLOSE)
> +		priv_flags |= IXGBE_PRIV_LINK_DOWN_ON_CLOSE;
> +
>   	return priv_flags;
>   }
>   
> @@ -3879,6 +3884,16 @@ static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
>   		}
>   	}
>   
> +	flags2 &= ~IXGBE_FLAG2_LINK_DOWN_ON_CLOSE;
> +	if (priv_flags & IXGBE_PRIV_LINK_DOWN_ON_CLOSE) {
> +		if (adapter->hw.mac.type == ixgbe_mac_e610) {
> +			flags2 |= IXGBE_FLAG2_LINK_DOWN_ON_CLOSE;
> +		} else {
> +			e_info(probe, "Cannot set private flags: Unsupported hardware\n");

Please print hw.mac.type and mention that it’s only supported on E610.

> +			return -EOPNOTSUPP;
> +		}
> +	}
> +
>   	if (flags2 != adapter->flags2) {
>   		adapter->flags2 = flags2;
>   
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 62c2d83e1577..58ee4a186039 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -7544,6 +7544,17 @@ static void ixgbe_close_suspend(struct ixgbe_adapter *adapter)
>   	ixgbe_free_all_rx_resources(adapter);
>   }
>   
> +static void ixgbe_handle_link_down(struct ixgbe_adapter *adapter)
> +{
> +	struct net_device *netdev = adapter->netdev;
> +
> +	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> +		ixgbe_ptp_start_cyclecounter(adapter);
> +
> +	e_info(drv, "NIC Link is Down\n");
> +	netif_carrier_off(netdev);
> +}
> +
>   /**
>    * ixgbe_close - Disables a network interface
>    * @netdev: network interface device structure
> @@ -7566,6 +7577,16 @@ int ixgbe_close(struct net_device *netdev)
>   
>   	ixgbe_fdir_filter_exit(adapter);
>   
> +	if (adapter->flags2 & IXGBE_FLAG2_LINK_DOWN_ON_CLOSE) {
> +		int err;
> +
> +		err = ixgbe_disable_phy_link(&adapter->hw);
> +		if (err)
> +			e_warn(drv, "Cannot set PHY link down\n");

Log the error?

> +
> +		ixgbe_handle_link_down(adapter);
> +	}
> +
>   	ixgbe_release_hw_control(adapter);
>   
>   	return 0;
> @@ -8244,11 +8265,7 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
>   	if (ixgbe_is_sfp(hw) && hw->mac.type == ixgbe_mac_82598EB)
>   		adapter->flags2 |= IXGBE_FLAG2_SEARCH_FOR_SFP;
>   
> -	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> -		ixgbe_ptp_start_cyclecounter(adapter);
> -
> -	e_info(drv, "NIC Link is Down\n");
> -	netif_carrier_off(netdev);
> +	ixgbe_handle_link_down(adapter);
>   }
>   
>   static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)


Kind regards,

Paul


* [syzbot] Monthly netfilter report (Jul 2026)
From: syzbot @ 2026-07-01 12:32 UTC (permalink / raw)
  To: fw, linux-kernel, netdev, netfilter-devel, pablo, syzkaller-bugs

Hello netfilter maintainers/developers,

This is a 31-day syzbot report for the netfilter subsystem.
All related reports/information can be found at:
https://syzkaller.appspot.com/upstream/s/netfilter

During the period, 0 new issues were detected and 0 were fixed.
In total, 11 issues are still open and 198 have already been fixed.
There are also 3 low-priority issues.

Some of the still happening issues:

Ref Crashes Repro Title
<1> 1759    Yes   INFO: task hung in addrconf_dad_work (5)
                  https://syzkaller.appspot.com/bug?extid=82ccd564344eeaa5427d
<2> 703     Yes   INFO: rcu detected stall in addrconf_rs_timer (6)
                  https://syzkaller.appspot.com/bug?extid=fecf8bd19c1f78edb255
<3> 137     Yes   INFO: rcu detected stall in NF_HOOK (2)
                  https://syzkaller.appspot.com/bug?extid=34c2df040c6cfa15fdfe
<4> 124     Yes   INFO: rcu detected stall in br_handle_frame (6)
                  https://syzkaller.appspot.com/bug?extid=f8850bc3986562f79619
<5> 78      No    INFO: task hung in nfnetlink_rcv_msg (5)
                  https://syzkaller.appspot.com/bug?extid=c4b20b80ee6a7a2f5012
<6> 2       No    WARNING in nf_conntrack_cleanup_net_list (2)
                  https://syzkaller.appspot.com/bug?extid=122256c3e2bf6ec9f928

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

To disable reminders for individual bugs, reply with the following command:
#syz set <Ref> no-reminders

To change a bug's subsystems, reply with:
#syz set <Ref> subsystems: new-subsystem

You may send multiple commands in a single email message.


* [PATCH net-next] gtp: annotate PDP lookups under RTNL
From: Runyu Xiao @ 2026-07-01 12:39 UTC (permalink / raw)
  To: pablo, laforge
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, osmocom-net-gprs,
	netdev, linux-kernel, runyu.xiao, jianhao.xu

The GTP PDP lookup helpers serve two kinds of callers: RCU-protected data
and report paths, and RTNL-protected control paths such as
gtp_genl_new_pdp(). The helpers walk RCU hlists, but they do not
currently pass the RTNL condition that covers the control-path lookups.

Pass lockdep_rtnl_is_held() to the PDP hlist iterators. Existing
RCU-reader callers remain valid because the RCU-list macros also accept
an active RCU read-side section; the added condition only documents the
non-RCU protection already used by RTNL control paths.

This was found by our static analysis tool and then manually reviewed
against the current tree. Dynamic triage produced a CONFIG_PROVE_RCU_LIST
warning matching the targeted lookups; the change is limited to
documenting the existing protection contract.

This is a lockdep annotation cleanup. It does not change PDP lifetime or
hash updates.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
 drivers/net/gtp.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 5cb59d72bc82..b94073c55f17 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -151,7 +151,8 @@ static struct pdp_ctx *gtp0_pdp_find(struct gtp_dev *gtp, u64 tid, u16 family)
 
 	head = &gtp->tid_hash[gtp0_hashfn(tid) % gtp->hash_size];
 
-	hlist_for_each_entry_rcu(pdp, head, hlist_tid) {
+	hlist_for_each_entry_rcu(pdp, head, hlist_tid,
+				 lockdep_rtnl_is_held()) {
 		if (pdp->af == family &&
 		    pdp->gtp_version == GTP_V0 &&
 		    pdp->u.v0.tid == tid)
@@ -168,7 +169,8 @@ static struct pdp_ctx *gtp1_pdp_find(struct gtp_dev *gtp, u32 tid, u16 family)
 
 	head = &gtp->tid_hash[gtp1u_hashfn(tid) % gtp->hash_size];
 
-	hlist_for_each_entry_rcu(pdp, head, hlist_tid) {
+	hlist_for_each_entry_rcu(pdp, head, hlist_tid,
+				 lockdep_rtnl_is_held()) {
 		if (pdp->af == family &&
 		    pdp->gtp_version == GTP_V1 &&
 		    pdp->u.v1.i_tei == tid)
@@ -185,7 +187,8 @@ static struct pdp_ctx *ipv4_pdp_find(struct gtp_dev *gtp, __be32 ms_addr)
 
 	head = &gtp->addr_hash[ipv4_hashfn(ms_addr) % gtp->hash_size];
 
-	hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
+	hlist_for_each_entry_rcu(pdp, head, hlist_addr,
+				 lockdep_rtnl_is_held()) {
 		if (pdp->af == AF_INET &&
 		    pdp->ms.addr.s_addr == ms_addr)
 			return pdp;
@@ -220,7 +223,8 @@ static struct pdp_ctx *ipv6_pdp_find(struct gtp_dev *gtp,
 
 	head = &gtp->addr_hash[ipv6_hashfn(ms_addr) % gtp->hash_size];
 
-	hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
+	hlist_for_each_entry_rcu(pdp, head, hlist_addr,
+				 lockdep_rtnl_is_held()) {
 		if (pdp->af == AF_INET6 &&
 		    ipv6_pdp_addr_equal(&pdp->ms.addr6, ms_addr))
 			return pdp;
-- 
2.34.1



* [PATCH v5 net 0/7] i40e: re-init and UAF fixes
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
	jacob.e.keller, Maciej Fijalkowski

v5:
- include three new patches to address the last Sashiko review
- do not release the irq lump in rebuild path in patch 7
- clear dangling pointers from rx and xdp rings arrays
v4:
- add a preceding patch that fixes the case where some of the re-init
  allocations failed and the netdev was not unregistered on the failure path
- pull out i40e_vsi_setup() changes onto separate patch
v3:
- address UAF when ring arrays were freed before q_vector's ring
  containers (Sashiko, Jacob)
- remove bool params from alloc/free array routines (Simon)
v2:
- NULL vsi->tx_rings in i40e_vsi_alloc_arrays() (Sashiko)


Maciej Fijalkowski (7):
  i40e: unregister netdev before clearing VSI on reinit failure
  i40e: avoid null ptr dereference in i40e_ptp_stop()
  i40e: make ring pointers unreachable before freeing via rcu
  i40e: avoid deadlock when calling unregister_netdev()
  i40e: fix potential UAF in i40e_vsi_setup()'s error path
  i40e: do not expose netdev too early
  i40e: keep q_vectors array in sync with channel count changes

 drivers/net/ethernet/intel/i40e/i40e_main.c | 128 ++++++++++++--------
 drivers/net/ethernet/intel/i40e/i40e_ptp.c  |   5 +-
 2 files changed, 82 insertions(+), 51 deletions(-)

-- 
2.43.0

